Chapter 4.

The two sample location problem.


We have X1,...,Xm i.i.d. r.v.'s from population 1 and Y1,...,Yn i.i.d. r.v.'s from population 2. We assume that Y has the same distribution as X + Delta. We test H0: Delta = Delta0 versus H1: Delta > Delta0 (or H1: Delta < Delta0, or H1: Delta ≠ Delta0).

Assumptions                                 Test to use
not necessarily normal populations          Wilcoxon two-sample rank sum test
normal populations (with equal variances)   t test with pooled sample variance

To do a Wilcoxon rank sum test for two independent samples, we can do

> x <- c(.8,.83,1.89,1.04,1.45,1.38,1.91,1.64,0.73,1.46)
> y <- c(1.15,.88,.9,.74,1.21)
> wilcox.test(x,y,alternative="greater")

        Exact Wilcoxon rank-sum test

data:  x and y
rank-sum statistic W = 90, n = 10, m = 5, p-value = 0.1272
alternative hypothesis: true mu is greater than 0  
I have used the data from Example 4.1 in the textbook. In Splus, alternative="greater" means that the location of x is greater than that of y. This is the opposite of the textbook's convention, where the alternative refers to the shift of y relative to x.
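The rank-sum statistic in the output above can be checked by hand. The following is a minimal sketch in R (an assumption, since these notes use Splus); the names r, W.x, W.y and U.x are introduced only for illustration. Note that current R reports the Mann-Whitney form of the statistic (the rank sum of x minus its smallest possible value), so wilcox.test in R shows W = 35 for these data rather than 90.

```r
# Data from Example 4.1 (the same vectors as above).
x <- c(.8, .83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)
y <- c(1.15, .88, .9, .74, 1.21)

# Rank the pooled sample; the first length(x) ranks belong to x.
r <- rank(c(x, y))
W.x <- sum(r[seq_along(x)])     # rank sum of the x sample
W.y <- sum(r[-seq_along(x)])    # rank sum of the y sample

# Mann-Whitney form: subtract the smallest possible rank sum of x.
U.x <- W.x - length(x) * (length(x) + 1) / 2

print(c(W.x = W.x, W.y = W.y, U.x = U.x))
```

The value W.x agrees with the rank-sum statistic W = 90 printed by Splus.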

Splus does not print the null distribution of the Wilcoxon rank sum statistic. We can estimate it by simulation (or we can look at Table A.6 in the textbook):

**********4a*******
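# Simulate the null distribution of the rank sum of a sample of size n
# drawn without replacement from the pooled ranks 1..(n+m).
n <- 6
m <- 8
nm <- n + m
N <- 10000                  # number of simulated samples
xb <- numeric(N)            # simulated rank sums
ind <- 1:nm                 # pooled ranks
for (i in 1:N) {
  # named r, not y, so that the data vector y is not overwritten
  r <- sample(ind, n, replace = FALSE)
  xb[i] <- sum(r)
}
xb <- sort(xb)
a1 <- rle(xb)$values        # distinct values of the rank sum
a2 <- rle(xb)$lengths
a2 <- a2 / N                # estimated probabilities
a3 <- cumsum(a2)
a4 <- 1 - a3
a <- cbind(a1, a2, a3, a4)
dimnames(a) <- list(NULL, c("value", "probab.", "cdf", "1-cdf"))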
*******************
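For sample sizes this small, the simulation can be replaced by complete enumeration, since under H0 every subset of n ranks is equally likely. The following is a sketch in R (an assumption; combn may not be available in Splus), and the object names w, prob and exact are chosen here only for illustration.

```r
n <- 6                      # size of the sample whose ranks are summed
m <- 8                      # size of the other sample

# Rank sum of every n-subset of the pooled ranks 1..(n+m).
w <- combn(n + m, n, FUN = sum)

prob <- table(w) / length(w)            # exact P(W = value)
cdf <- cumsum(prob)
exact <- cbind(value = as.numeric(names(prob)),
               probab. = as.vector(prob),
               cdf = as.vector(cdf),
               "1-cdf" = 1 - as.vector(cdf))
print(exact[1:5, ])
```

The matrix exact has the same four columns as the simulated table a, so the two can be compared row by row to judge the simulation error.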
The following program finds the Hodges-Lehmann estimator of Delta and a confidence interval for Delta, using the normal approximation.
*****4b*********
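# Hodges-Lehmann estimate and normal-approximation confidence interval
# for Delta, computed from the data vectors x and y.
m <- length(x)
n <- length(y)
alpha <- .05
# All m*n pairwise differences y[j] - x[i].
differ <- as.vector(outer(y, x, "-"))
k <- median(differ)             # Hodges-Lehmann estimate
zalpha <- qnorm(1 - alpha/2)
# Index of the lower order statistic, from the normal approximation.
calpha1 <- (m*n)/2 - zalpha*sqrt(m*n*(m+n+1)/12)
calpha1 <- round(calpha1)
calpha2 <- m*n + 1 - calpha1
sdiffer <- sort(differ)         # not called diff, which is a function
con1 <- sdiffer[calpha1]
con2 <- sdiffer[calpha2]
conf <- c(con1, con2)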
************
> source("4b")
> k
[1] -0.305
> conf
[1] -0.76  0.15
> calpha1
[1] 9
> calpha2
[1] 42 
We have hat-Delta = -0.305, and the confidence interval for Delta is (-0.76, 0.15).
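The cutoffs 9 and 42 follow from the normal approximation with m = 10 and n = 5: the center is mn/2 = 25 and the standard deviation is sqrt(mn(m+n+1)/12) = sqrt(800/12), about 8.165. The sketch below repeats this arithmetic in R (an assumption; the notes use Splus) and, as a cross-check, asks R's wilcox.test for its own estimate. The names center, se and hl are illustrative only. R estimates the shift of x relative to y, so its estimate has the opposite sign.

```r
x <- c(.8, .83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)
y <- c(1.15, .88, .9, .74, 1.21)
m <- length(x); n <- length(y); alpha <- 0.05

center <- m * n / 2                         # 25
se <- sqrt(m * n * (m + n + 1) / 12)        # about 8.165
calpha1 <- round(center - qnorm(1 - alpha / 2) * se)
calpha2 <- m * n + 1 - calpha1
print(c(calpha1, calpha2))

# R's built-in estimate: the median of all x[i] - y[j].
hl <- wilcox.test(x, y, conf.int = TRUE)
print(-hl$estimate)                         # sign flipped to the y - x convention
```

R's interval is based on the exact null distribution for samples this small, so it is not guaranteed to match the normal-approximation interval in general.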

The usual t-statistic and t-confidence intervals are:

> t.test(x,y,alternative="greater")
	Standard Two-Sample t-Test
data:  x and y 
t = 1.6061, df = 13, p-value = 0.0661 
alternative hypothesis: true difference in means is greater than 0 
95 percent confidence interval:
 -0.03457769          NA 
sample estimates:
 mean of x mean of y 
     1.313     0.976

> t.test(x,y,conf.level=.95)
	Standard Two-Sample t-Test
data:  x and y 
t = 1.6061, df = 13, p-value = 0.1323 
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -0.1162891  0.7902891 
sample estimates:
 mean of x mean of y 
     1.313     0.976
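These are similar to the rank-based results found earlier. The sign convention in the textbook is the opposite of the one in Splus: the textbook works with y - x, while Splus reports x - y.

(-0.76,0.15) and (-0.116,0.790) are therefore almost the negatives of each other: in the Splus convention, the first interval becomes (-0.15,0.76).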
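The pooled t statistic and its interval can be reproduced from the definitions, which also makes the sign flip explicit. This is a sketch in R (an assumption; the notes use Splus), with sp2, se, tstat, ci.xy, ci.yx and tt as illustrative names. In R, t.test does not pool the variances by default, so var.equal=TRUE is needed to match the Splus output above.

```r
x <- c(.8, .83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)
y <- c(1.15, .88, .9, .74, 1.21)
m <- length(x); n <- length(y)

# Pooled sample variance and standard error of mean(x) - mean(y).
sp2 <- ((m - 1) * var(x) + (n - 1) * var(y)) / (m + n - 2)
se <- sqrt(sp2 * (1 / m + 1 / n))

tstat <- (mean(x) - mean(y)) / se
ci.xy <- mean(x) - mean(y) + c(-1, 1) * qt(0.975, m + n - 2) * se
ci.yx <- rev(-ci.xy)        # the same interval for y - x (textbook convention)

# Built-in equivalent in R, with the variances pooled explicitly.
tt <- t.test(x, y, var.equal = TRUE)
print(c(t = tstat, lower = ci.xy[1], upper = ci.xy[2]))
print(ci.yx)
```

Here ci.yx is the t interval in the same orientation as the rank-based interval (-0.76,0.15), which makes the two directly comparable.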

Comments to: Miguel A. Arcones