Statistical Tests

In [1]:
x = sample(letters, 1000, replace=TRUE)
table(x)/10  # per cent
Out[1]:
x
  a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t 
4.6 4.3 3.5 3.5 4.4 3.0 3.6 2.9 3.8 4.2 4.2 4.9 2.8 4.4 2.5 3.8 3.8 3.4 3.8 4.3 
  u   v   w   x   y   z 
3.2 4.2 4.1 3.8 4.5 4.5 

Probability Functions

unif    uniform distribution
norm    normal
pois    Poisson
t       Student's t
weibull Weibull
...

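Each name is combined with a one-letter prefix: p is the cumulative distribution function, d the density function, q the quantile function, and r generates random numbers according to this distribution (for example pnorm, dnorm, qnorm, rnorm).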
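As a small illustration of this prefix convention, here is a sketch for the normal distribution. It also uses the q prefix, the quantile function (the inverse of p), which follows the same naming scheme. The variable names and the seed are chosen here only for illustration.

```r
d0 <- dnorm(0)       # density of the standard normal at 0, i.e. 1/sqrt(2*pi)
p  <- pnorm(1.96)    # cumulative probability P(X <= 1.96), about 0.975
q  <- qnorm(0.975)   # quantile: the value whose cumulative probability is 0.975
set.seed(1)          # arbitrary seed, only to make the draw reproducible
r  <- rnorm(3)       # three random numbers from the standard normal
c(d0, p, q)
r
```

The same four prefixes work with the other distribution names listed above, e.g. dpois, ppois, qpois and rpois.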

In [2]:
x = rnorm(100)
hist(x, freq=FALSE)
curve(dnorm, -3, 3, col="red", add=TRUE)

Test: A company expects 7 service requests a day.
What is the probability that on a given day only 2 or fewer requests come in?
(Assume that the requests are Poisson-distributed.)
$$ Poisson(n, \lambda) = \frac{\lambda^n}{n!} \cdot e^{-\lambda} $$

In [3]:
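# Poisson probability of exactly n events, written out by hand
poiss = function(n, lambda=1.0)
    lambda^n / factorial(n) * exp(-lambda)

sum(poiss(0:2, 7))  # P(N <= 2): sum of the probabilities for 0, 1 and 2

ppois(2, 7)         # same value from the built-in cumulative distribution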
Out[3]:
0.0296361638805218
Out[3]:
0.0296361638805218
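The hand-written formula can also be cross-checked against the built-in Poisson density dpois, and the complementary probability (more than 2 requests) follows from the upper tail. A minimal sketch; the names lambda, probs, p_at_most_2 and p_more_than_2 are introduced here only for illustration.

```r
lambda <- 7                       # expected requests per day, as in the exercise
probs  <- dpois(0:2, lambda)      # built-in density: P(N = 0), P(N = 1), P(N = 2)
p_at_most_2 <- sum(probs)         # P(N <= 2)
all.equal(p_at_most_2, ppois(2, lambda))   # agrees with the cumulative function

# complementary event: more than 2 requests on a day
p_more_than_2 <- ppois(2, lambda, lower.tail = FALSE)
p_at_most_2 + p_more_than_2       # the two probabilities add up to 1
```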

Statistical Tests

In [4]:
qqnorm(x)  # quantile-quantile plot
qqline(x, lty=2)
In [5]:
shapiro.test(x)
Out[5]:
	Shapiro-Wilk normality test

data:  x
W = 0.97669, p-value = 0.07317
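With a p-value of 0.073, above the usual 0.05 level, the test does not reject the hypothesis that x is normally distributed. For contrast, here is a sketch on a sample that is clearly not normal. The exponential sample y and the seed are assumptions chosen only for illustration.

```r
set.seed(42)                # arbitrary seed for reproducibility
y <- rexp(100)              # hypothetical skewed sample (exponential)
res <- shapiro.test(y)
res$statistic               # W, further below 1 than for the normal sample
res$p.value                 # very small, so normality is rejected
```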

"Two sample" Tests

In [6]:
ozdata <- read.csv("../data/ozone.csv")
T5 <- ozdata$Temp[ozdata$Month==5]
T7 <- ozdata$Temp[ozdata$Month==7]
T8 <- ozdata$Temp[ozdata$Month==8]
boxplot(T5, T7, T8, col="snow")

Student's t-Test

In [7]:
t.test(T5, T8)
Out[7]:
	Welch Two Sample t-test

data:  T5 and T8
t = -10.789, df = 59.904, p-value = 1.138e-15
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -21.83446 -15.00425
sample estimates:
mean of x mean of y 
 65.54839  83.96774 
In [8]:
t.test(T7, T8)
Out[8]:
	Welch Two Sample t-test

data:  T7 and T8
t = -0.045624, df = 51.755, p-value = 0.9638
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.902417  2.773385
sample estimates:
mean of x mean of y 
 83.90323  83.96774 
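Note that both outputs are labelled "Welch Two Sample t-test": by default t.test does not assume equal variances in the two samples. The classical pooled-variance Student's test is requested with var.equal = TRUE. A sketch on synthetic data (the samples a and b are made up here and are not the ozone temperatures):

```r
set.seed(1)                          # arbitrary seed for reproducibility
a <- rnorm(31, mean = 65, sd = 7)    # hypothetical sample 1
b <- rnorm(31, mean = 84, sd = 6)    # hypothetical sample 2

welch   <- t.test(a, b)                    # default: Welch, unequal variances
student <- t.test(a, b, var.equal = TRUE)  # pooled variance, df = n1 + n2 - 2

welch$parameter     # approximate degrees of freedom, at most n1 + n2 - 2
student$parameter   # 60 degrees of freedom
student$p.value     # components of the result can be extracted by name
```

The result of t.test is a list, so values such as p.value, conf.int and estimate can be reused in later computations instead of being read off the printed output.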

Kolmogorov-Smirnov Test

In [9]:
ks.test(T7, T8)
Warning message:
In ks.test(T7, T8): cannot compute exact p-value with ties
Out[9]:
	Two-sample Kolmogorov-Smirnov test

data:  T7 and T8
D = 0.25806, p-value = 0.2532
alternative hypothesis: two-sided
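The warning arises because the temperature readings contain repeated values (ties), while the Kolmogorov-Smirnov test is designed for continuous distributions. A sketch on continuous synthetic samples, where ties practically never occur; u, v and w are hypothetical data, not taken from the ozone set.

```r
set.seed(2)                      # arbitrary seed for reproducibility
u <- rnorm(50)                   # sample from N(0, 1)
v <- rnorm(50)                   # second sample from the same distribution
w <- rnorm(50, mean = 1.5)       # sample shifted by 1.5

same    <- ks.test(u, v)         # compares the two empirical distribution functions
shifted <- ks.test(u, w)

same$statistic      # D: largest vertical distance between the two ECDFs
shifted$statistic   # D for the shifted sample
shifted$p.value     # small, so the two distributions differ
```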

Variance and Covariance

In [10]:
mean(x)
var(x)
sd(x)
Out[10]:
-0.0691765808173926
Out[10]:
0.735739737621592
Out[10]:
0.857752725219566
In [11]:
# fraction of values within 1, 2 and 3 standard deviations of 0 (the true mean of x)
sum(abs(x) <= sd(x)) / length(x)
sum(abs(x) <= 2*sd(x)) / length(x)
sum(abs(x) <= 3*sd(x)) / length(x)
Out[11]:
0.63
Out[11]:
0.99
Out[11]:
1
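These observed fractions can be compared with the theoretical values for a normal distribution (the 68-95-99.7 rule). A short sketch using pnorm; the names k and theory are introduced here only for illustration.

```r
k <- 1:3                          # number of standard deviations
theory <- pnorm(k) - pnorm(-k)    # P(|Z| <= k) for a standard normal Z
round(theory, 4)                  # 0.6827 0.9545 0.9973
```

With only 100 values, deviations of a few percentage points from these theoretical fractions are to be expected.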