Statistical Tests

x = sample(letters, 1000, replace=TRUE)
table(x)/10  # per cent

x
  a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t 
4.6 4.3 3.5 3.5 4.4 3.0 3.6 2.9 3.8 4.2 4.2 4.9 2.8 4.4 2.5 3.8 3.8 3.4 3.8 4.3 
  u   v   w   x   y   z 
3.2 4.2 4.1 3.8 4.5 4.5

Probability Functions

unif    uniform distribution
norm    normal
pois    Poisson
t       Student's
weibull Weibull
...

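Each distribution name is combined with a one-letter prefix: p is the cumulative distribution function, d the density function, q the quantile function, and r generates random numbers according to this distribution (for example pnorm, dnorm, qnorm, rnorm).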

x = rnorm(100)
hist(x, freq=FALSE)
curve(dnorm, -3, 3, col="red", add=TRUE)
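As a minimal sketch of how the prefixes fit together, the following applies them to the normal distribution. The q prefix (quantile function, the inverse of p) is included as well; the variable names and the seed are chosen here for illustration only.

```r
# d: density of the standard normal at 0, which equals 1/sqrt(2*pi)
d0 <- dnorm(0)

# p: cumulative probability P(X <= 1.96)
p <- pnorm(1.96)

# q: quantile function, the inverse of p, so this recovers 1.96
q <- qnorm(p)

# r: random draws; set.seed (an assumption here) makes them reproducible
set.seed(42)
r <- rnorm(5)

print(c(density = d0, cumulative = p, quantile = q))
```

The same four prefixes work for the other distributions listed above, e.g. ppois or rweibull, each with its own parameters.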

Example: A company expects 7 service requests a day.
What is the probability that on a given day only 2 or fewer requests come in?
(Assume that the requests are Poisson-distributed.)

$$ Poisson(n, \lambda) = \frac{\lambda^n}{n!} \cdot e^{-\lambda} $$

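# Poisson probability of exactly n events, given the expected number lambda
poiss = function(n, lambda=1.0)
    lambda^n / factorial(n) * exp(-lambda)

# P(0) + P(1) + P(2) for lambda = 7
sum(poiss(0:2, 7))

# the same probability from the built-in cumulative distribution function
ppois(2, 7)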
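Both calls should return the same probability, roughly 3 per cent, because ppois sums the individual Poisson probabilities from 0 up to n. A small sketch that checks this with the built-in dpois instead of the hand-written function; the names lambda, n_max, p_sum, p_cdf and p_more are chosen here for illustration.

```r
lambda <- 7   # expected requests per day, from the example
n_max  <- 2   # "2 or fewer" requests

p_sum <- sum(dpois(0:n_max, lambda))  # sum of the individual probabilities
p_cdf <- ppois(n_max, lambda)         # cumulative distribution function

print(c(p_sum = p_sum, p_cdf = p_cdf))
print(all.equal(p_sum, p_cdf))

# Complement: probability of more than 2 requests on a given day
p_more <- ppois(n_max, lambda, lower.tail = FALSE)
print(p_more)
```

Using lower.tail = FALSE avoids computing 1 - ppois(...) by hand, which can lose precision for very small tail probabilities.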

Statistical Tests

qqnorm(x)  # quantile-quantile plot
qqline(x, lty=2)

shapiro.test(x)

	Shapiro-Wilk normality test

data:  x
W = 0.97669, p-value = 0.07317
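The p-value of 0.07317 lies above the conventional 0.05 level, so the test does not reject normality for this sample. A minimal sketch of reading the result programmatically; the seed and the level alpha are assumptions for illustration and do not reproduce the run shown above.

```r
set.seed(123)           # assumed seed, only for reproducibility
x <- rnorm(100)

res <- shapiro.test(x)  # returns a list of class "htest"
alpha <- 0.05           # assumed significance level

# A small p-value is evidence against normality
if (res$p.value < alpha) {
  message("Normality rejected at level ", alpha)
} else {
  message("No evidence against normality at level ", alpha)
}
```

A non-significant result does not prove that the data are normal; it only means the test found no clear deviation.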

"Two sample" Tests

ozdata <- read.csv("../data/ozone.csv")
T5 <- ozdata$Temp[ozdata$Month==5]
T7 <- ozdata$Temp[ozdata$Month==7]
T8 <- ozdata$Temp[ozdata$Month==8]
boxplot(T5, T7, T8, col="snow")

Student's t-Test

t.test(T5, T8)

	Welch Two Sample t-test

data:  T5 and T8
t = -10.789, df = 59.904, p-value = 1.138e-15
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -21.83446 -15.00425
sample estimates:
mean of x mean of y 
 65.54839  83.96774

t.test(T7, T8)

	Welch Two Sample t-test

data:  T7 and T8
t = -0.045624, df = 51.755, p-value = 0.9638
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.902417  2.773385
sample estimates:
mean of x mean of y 
 83.90323  83.96774
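The output is headed "Welch Two Sample t-test" because t.test does not assume equal variances by default. The classical Student's t-test with pooled variance can be requested with var.equal = TRUE. A sketch on synthetic data; the vectors a and b are made up here and merely stand in for the temperature samples.

```r
set.seed(1)
a <- rnorm(31, mean = 65, sd = 7)  # hypothetical sample
b <- rnorm(31, mean = 84, sd = 7)  # hypothetical sample

welch   <- t.test(a, b)                    # default: Welch, unequal variances
student <- t.test(a, b, var.equal = TRUE)  # Student: pooled variance

print(welch$method)
print(student$method)

# Degrees of freedom: Welch is fractional, Student is n1 + n2 - 2
print(c(welch = unname(welch$parameter), student = unname(student$parameter)))
```

When both samples have similar size and spread, the two variants give nearly identical results; the Welch version is the safer default otherwise.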

Kolmogorov-Smirnov Test

ks.test(T7, T8)

Warning message:
In ks.test(T7, T8): cannot compute exact p-value with ties

	Two-sample Kolmogorov-Smirnov test

data:  T7 and T8
D = 0.25806, p-value = 0.2532
alternative hypothesis: two-sided
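The statistic D is the largest vertical distance between the two empirical distribution functions. The warning presumably arises because the temperature readings contain repeated values, for which an exact p-value cannot be computed. A sketch that reproduces D by hand on synthetic, tie-free data; a and b are hypothetical samples, not the ozone temperatures.

```r
set.seed(2)
a <- rnorm(31)              # hypothetical sample
b <- rnorm(31, mean = 0.5)  # hypothetical sample

# Evaluate both empirical CDFs at every observed point
grid <- sort(c(a, b))
D_manual <- max(abs(ecdf(a)(grid) - ecdf(b)(grid)))

# Statistic as reported by ks.test
D_ks <- unname(ks.test(a, b)$statistic)

print(c(manual = D_manual, ks.test = D_ks))
```

Unlike the t-test, which compares only the means, this test reacts to any difference in the shape of the two distributions.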

Variance and Covariance

mean(x)
var(x)
sd(x)

# share of values within 1, 2 and 3 standard deviations of the mean
sum(abs(x - mean(x)) <= sd(x)) / length(x)
sum(abs(x - mean(x)) <= 2*sd(x)) / length(x)
sum(abs(x - mean(x)) <= 3*sd(x)) / length(x)
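For normally distributed data these three shares should come close to the theoretical values of the normal distribution. A short sketch computing them from pnorm; the names k and theory are chosen here for illustration.

```r
k <- 1:3

# Probability that a normal variable lies within k standard deviations
theory <- pnorm(k) - pnorm(-k)
names(theory) <- paste0(k, " sd")

print(round(theory, 4))
```

This gives about 68.3, 95.4 and 99.7 per cent; with only 100 random values the empirical shares will scatter around these numbers.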