| Assumptions | Test to use |
|---|---|
| not necessarily normal populations | Wilcoxon two-sample rank sum test |
| normal populations (with equal variances) | t test with pooled sample variance |

To do a Wilcoxon rank sum test for two independent samples, we can do
> x <- c(.8,.83,1.89,1.04,1.45,1.38,1.91,1.64,0.73,1.46)
> y <- c(1.15,.88,.9,.74,1.21)
> wilcox.test(x,y,alternative="greater")
Exact Wilcoxon rank-sum test
data: x and y
rank-sum statistic W = 90, n = 10, m = 5, p-value = 0.1272
alternative hypothesis: true mu is greater than 0
I have used the data from Example 4.1 in the textbook. In S-Plus,
alternative="greater" means that the location of x is
greater than that of y. This is the opposite of the textbook's convention.

S-Plus does not tabulate the null distribution of the Wilcoxon rank sum statistic. We can estimate it by simulation (or we can look at Table A.6 in the textbook):
**********4a*******
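n <- 6
m <- 8
nm <- n+m
N <- 10000
# xb[i] holds the rank sum of the i-th simulated sample of n ranks from 1..nm
xb <- numeric(N)
ind <- 1:nm
for(i in 1:N){
# s is a scratch vector, so the data vector y is left untouched
s <- sample(ind,n,replace=F)
xb[i] <- sum(s)
}
xb <- sort(xb)
r <- rle(xb)
a1 <- r$values     # distinct rank sums observed
a2 <- r$lengths/N  # estimated probabilities
a3 <- cumsum(a2)   # estimated cdf
a4 <- 1-a3
a <- cbind(a1,a2,a3,a4)
dimnames(a) <- list(NULL,c("value","probab.","cdf","1-cdf"))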
*******************
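
The simulation above only approximates the null distribution. For sample sizes this small it can also be enumerated exactly, because under the null hypothesis every set of n ranks out of 1..n+m is equally likely. The following is a minimal sketch in Python rather than S-Plus; the function name rank_sum_null is made up for illustration, and it assumes there are no ties. As a check, it also recomputes the p-value reported by wilcox.test for the Example 4.1 data (W = 90, n = 10, m = 5).

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def rank_sum_null(n, m):
    """Exact null distribution of the rank sum of a sample of size n
    pooled with a sample of size m (hypothetical helper, no ties)."""
    # Enumerate every possible set of n ranks and tally its sum.
    counts = Counter(sum(c) for c in combinations(range(1, n + m + 1), n))
    total = sum(counts.values())
    return {w: Fraction(k, total) for w, k in sorted(counts.items())}


# Same n and m as in the simulation program.
dist = rank_sum_null(6, 8)

# Rows of (value, probability, cdf, 1-cdf), like the matrix a in the program.
table = []
cdf = Fraction(0)
for w, p in dist.items():
    cdf += p
    table.append((w, float(p), float(cdf), float(1 - cdf)))

# Exact one-sided p-value for the example: P(W >= 90) with n = 10, m = 5.
example = rank_sum_null(10, 5)
p_value = sum(p for w, p in example.items() if w >= 90)
print(round(float(p_value), 4))  # 0.1272
```

Enumeration visits choose(n+m, n) subsets (3003 in both cases here), so it is only practical for small samples; for larger ones the simulation or the normal approximation is the realistic option.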
The following program finds the Hodges-Lehmann estimator of Delta and
a confidence interval for Delta, using the normal approximation.
*****4b*********
m <- length(x)
n <- length(y)
alpha <- .05
# all m*n pairwise differences y[j]-x[i]
differ <- as.vector(outer(y,x,"-"))
# Hodges-Lehmann estimate: median of the pairwise differences
k <- median(differ)
zalpha <- qnorm(1-alpha/2)
# normal approximation to the order-statistic indices of the interval
calpha1 <- (((m*n)/2)-zalpha*sqrt(m*n*(m+n+1)/12))
calpha1 <- round(calpha1)
calpha2 <- ((m*n)+1-calpha1)
# sdiff rather than diff, to avoid masking the built-in diff function
sdiff <- sort(differ)
con1 <- sdiff[calpha1]
con2 <- sdiff[calpha2]
conf <- c(con1,con2)
************
> source("4b")
> k
[1] -0.305
> conf
[1] -0.76 0.15
> calpha1
[1] 9
> calpha2
[1] 42
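
As a cross-check of the values printed above, the same computation can be sketched in Python (not S-Plus). The function name hodges_lehmann is made up for illustration. It follows the same steps as the program: form all pairwise differences y[j]-x[i], take their median, and pick the order statistics whose indices come from the normal approximation.

```python
from math import sqrt
from statistics import NormalDist, median

# Data from Example 4.1, as entered in the S-Plus session.
x = [0.8, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46]
y = [1.15, 0.88, 0.9, 0.74, 1.21]


def hodges_lehmann(x, y, alpha=0.05):
    """Hodges-Lehmann estimate of the shift of y relative to x, with a
    confidence interval from the normal approximation (hypothetical helper)."""
    m, n = len(x), len(y)
    # All m*n pairwise differences y[j] - x[i], sorted.
    diffs = sorted(yj - xi for xi in x for yj in y)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # 1-based indices of the order statistics that bound the interval.
    c1 = round(m * n / 2 - z * sqrt(m * n * (m + n + 1) / 12))
    c2 = m * n + 1 - c1
    return median(diffs), (diffs[c1 - 1], diffs[c2 - 1]), (c1, c2)


est, (lo, hi), (c1, c2) = hodges_lehmann(x, y)
print(round(est, 3), round(lo, 2), round(hi, 2), c1, c2)
# -0.305 -0.76 0.15 9 42
```

With m = 10 and n = 5 the approximation gives 25 - 1.96*sqrt(66.67), about 8.997, which rounds to the index 9 shown in the output.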
We get hat-Delta = -0.305, and the confidence interval for Delta is
(-0.76,0.15).

The usual t-statistic and t-confidence intervals are:
> t.test(x,y,alternative="greater")
Standard Two-Sample t-Test
data: x and y
t = 1.6061, df = 13, p-value = 0.0661
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
-0.03457769 NA
sample estimates:
mean of x mean of y
1.313 0.976
> t.test(x,y,conf.level=.95)
Standard Two-Sample t-Test
data: x and y
t = 1.6061, df = 13, p-value = 0.1323
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.1162891 0.7902891
sample estimates:
mean of x mean of y
1.313 0.976
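These results are similar to the rank-based ones once the sign convention is taken into account. The textbook's convention is the
opposite of the one in S-Plus: the program above estimates Delta from the differences y - x, while t.test(x,y) reports the mean of x minus the mean of y. Negating the rank-based interval (-0.76,0.15) gives (-0.15,0.76), which is close to the t interval (-0.116,0.790).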
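
For reference, the pooled-variance t statistic reported by t.test can be reproduced by hand. The following Python sketch (not S-Plus; the function name pooled_t is made up for illustration) assumes equal population variances, as in the table at the top, and computes only the statistic and its degrees of freedom, not the p-value.

```python
from math import sqrt
from statistics import mean

# Data from Example 4.1, as entered in the S-Plus session.
x = [0.8, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46]
y = [1.15, 0.88, 0.9, 0.74, 1.21]


def pooled_t(x, y):
    """Two-sample t statistic with pooled sample variance (hypothetical helper)."""
    nx, ny = len(x), len(y)
    mx, my = mean(x), mean(y)
    # Sums of squared deviations within each sample.
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    df = nx + ny - 2
    sp2 = (ssx + ssy) / df               # pooled variance estimate
    se = sqrt(sp2 * (1 / nx + 1 / ny))   # standard error of mean(x) - mean(y)
    return (mx - my) / se, df


t_stat, df = pooled_t(x, y)
print(round(t_stat, 4), df)  # 1.6061 13
```

The statistic is based on mean(x) - mean(y), which is why its sign is opposite to that of the Hodges-Lehmann estimate computed from y - x.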