## Statistical Tests

In [1]:
x = sample(letters, 1000, replace=TRUE)
table(x)/10  # relative frequencies in per cent (1000 draws, so count/10)

Out[1]:
x
a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t
4.6 4.3 3.5 3.5 4.4 3.0 3.6 2.9 3.8 4.2 4.2 4.9 2.8 4.4 2.5 3.8 3.8 3.4 3.8 4.3
u   v   w   x   y   z
3.2 4.2 4.1 3.8 4.5 4.5 

### Probability Functions

unif    uniform distribution
norm    normal
pois    Poisson
t       Student's t
weibull Weibull
...



Each distribution name is combined with a one-letter prefix: p is the cumulative distribution function, d the density function, q the quantile function, and r generates random numbers from the distribution.
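
As a minimal sketch of this naming scheme, the prefixes can be tried on the standard normal distribution. The choice of `norm`, the argument values, and the variable names are illustrative assumptions, and the `q` prefix (quantile function, the inverse of `p`) is included as well.

```r
# Prefix + distribution name, illustrated with the standard normal
d0   <- dnorm(0)        # density at 0, equals 1/sqrt(2*pi)
p0   <- pnorm(0)        # cumulative probability P(X <= 0), equals 0.5
q975 <- qnorm(0.975)    # quantile: value below which 97.5% of the mass lies

set.seed(1)             # fix the seed so the random draws are reproducible
r5 <- rnorm(5)          # five random draws from N(0, 1)

# p and q are inverses of each other
roundtrip <- pnorm(qnorm(0.975))
```

The same pattern applies to the other distributions in the list, for example `dpois`, `ppois`, `qpois` and `rpois`.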

In [2]:
x = rnorm(100)
hist(x, freq=FALSE)


Test: A company expects 7 service requests a day.
What is the probability that on a given day only 2 or fewer requests come in?
(Assume that requests are Poisson-distributed.)

$$\mathrm{Poisson}(n, \lambda) = \frac{\lambda^n}{n!} \cdot e^{-\lambda}$$

In [3]:
poiss = function(n, lambda=1.0)
lambda^n / factorial(n) * exp(-lambda)

sum(poiss(0:2, 7))

ppois(2, 7)

Out[3]:
0.0296361638805218
Out[3]:
0.0296361638805218
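
The same number can be cross-checked with the built-in density `dpois`, and the complementary event (more than 2 requests) follows from the upper tail. This is a small sketch, and the variable names are assumptions made for illustration.

```r
lambda <- 7                                     # expected requests per day (from the example)

p_le2 <- sum(dpois(0:2, lambda))                # P(N <= 2), summed from the built-in density
p_gt2 <- ppois(2, lambda, lower.tail = FALSE)   # P(N > 2), the complementary event

total <- p_le2 + p_gt2                          # the two events together cover all outcomes
```

With `lower.tail = FALSE`, `ppois` returns the upper tail directly, which avoids computing `1 - ppois(2, lambda)` by hand.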

## Statistical Tests

In [4]:
qqnorm(x)  # quantile-quantile plot
qqline(x, lty=2)

In [5]:
shapiro.test(x)

Out[5]:
	Shapiro-Wilk normality test

data:  x
W = 0.97669, p-value = 0.07317


### "Two sample" Tests

In [6]:
ozdata <- read.csv("../data/ozone.csv")
T5 <- ozdata$Temp[ozdata$Month==5]
T7 <- ozdata$Temp[ozdata$Month==7]
T8 <- ozdata$Temp[ozdata$Month==8]
boxplot(T5, T7, T8, col="snow")


Student's t-Test (by default, R's t.test runs the Welch variant, which does not assume equal variances)

In [7]:
t.test(T5, T8)

Out[7]:
	Welch Two Sample t-test

data:  T5 and T8
t = -10.789, df = 59.904, p-value = 1.138e-15
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-21.83446 -15.00425
sample estimates:
mean of x mean of y
65.54839  83.96774

In [8]:
t.test(T7, T8)

Out[8]:
	Welch Two Sample t-test

data:  T7 and T8
t = -0.045624, df = 51.755, p-value = 0.9638
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.902417  2.773385
sample estimates:
mean of x mean of y
83.90323  83.96774
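
To see where the `t` and `df` values in such an output come from, the Welch statistic and the Welch-Satterthwaite degrees of freedom can be computed by hand. The sketch below uses made-up random samples `a` and `b` (not the ozone data), and the helper name `welch_t` is hypothetical.

```r
# Welch's t statistic and Welch-Satterthwaite degrees of freedom, by hand
welch_t <- function(x, y) {
  nx <- length(x); ny <- length(y)
  sx <- var(x) / nx                  # squared standard error of mean(x)
  sy <- var(y) / ny                  # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(sx + sy)
  df <- (sx + sy)^2 / (sx^2 / (nx - 1) + sy^2 / (ny - 1))
  p <- 2 * pt(-abs(t_stat), df)      # two-sided p-value
  c(t = t_stat, df = df, p.value = p)
}

set.seed(42)                         # reproducible illustrative data
a <- rnorm(31, mean = 66, sd = 7)    # made-up sample, NOT the ozone data
b <- rnorm(31, mean = 84, sd = 6)    # made-up sample, NOT the ozone data

manual  <- welch_t(a, b)
builtin <- t.test(a, b)              # Welch test is the default (var.equal = FALSE)
```

Comparing `manual` with `builtin$statistic`, `builtin$parameter` and `builtin$p.value` should give matching values; the non-integer `df` is typical of the Welch correction.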


Kolmogorov-Smirnov Test

In [9]:
ks.test(T7, T8)

Warning message:
In ks.test(T7, T8): cannot compute exact p-value with ties
Out[9]:
	Two-sample Kolmogorov-Smirnov test

data:  T7 and T8
D = 0.25806, p-value = 0.2532
alternative hypothesis: two-sided


### Variance and Covariance

In [10]:
mean(x)
var(x)
sd(x)

Out[10]:
-0.0691765808173926
Out[10]:
0.735739737621592
Out[10]:
0.857752725219566
In [11]:
sum(abs(x) <= sd(x)) / length(x)
sum(abs(x) <= 2*sd(x)) / length(x)
sum(abs(x) <= 3*sd(x)) / length(x)

Out[11]:
0.63
Out[11]:
0.99
Out[11]:
1
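
The observed fractions can be compared with what a normal distribution predicts for intervals of one, two and three standard deviations around the mean. This is a short sketch; the variable names `k` and `theory` are assumptions for illustration.

```r
k <- 1:3                             # number of standard deviations
theory <- pnorm(k) - pnorm(-k)       # P(-k <= Z <= k) for a standard normal Z
round(theory, 4)                     # approximately 0.6827, 0.9545, 0.9973
```

With only 100 draws, deviations of a few percentage points from these theoretical values are to be expected.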