Chapter 4. The two sample location problem.

We have X₁,...,X_m i.i.d.r.v.'s from population 1. We have Y₁,...,Y_n i.i.d.r.v.'s from population 2. We assume that Y has the same distribution as X + Δ. We test H₀: Δ = Δ₀ versus H₁: Δ > Δ₀ (or H₁: Δ < Δ₀ or H₁: Δ ≠ Δ₀).

Assumptions                                  Test to use
not necessarily normal populations           Wilcoxon two-sample rank sum test
normal populations (with equal variances)    t test with pooled sample variance

To do a Wilcoxon rank sum test for two independent samples, we can do

> x <- c(.8,.83,1.89,1.04,1.45,1.38,1.91,1.64,0.73,1.46)
> y <- c(1.15,.88,.9,.74,1.21)
> wilcox.test(x,y,alternative="greater")

        Exact Wilcoxon rank-sum test

data:  x and y
rank-sum statistic W = 90, n = 10, m = 5, p-value = 0.1272
alternative hypothesis: true mu is greater than 0

I have used the data from Example 4.1 in the textbook. In Splus, the alternative="greater" means that the location of x is greater than that of y. This is the opposite of the textbook.
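As a cross-check outside Splus, the statistic and the exact p-value above can be reproduced by brute force. The following is only a minimal Python sketch, not what wilcox.test does internally: the helper names rank_sum and exact_upper_p are made up for illustration, and the code assumes there are no ties in the pooled sample (true for these data).

```python
from itertools import combinations

x = [.8, .83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46]
y = [1.15, .88, .9, .74, 1.21]

def rank_sum(first, second):
    """Rank sum of `first` within the pooled sample (assumes no ties)."""
    pooled = sorted(first + second)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    return sum(rank[value] for value in first)

def exact_upper_p(w, size_first, size_total):
    """P(W >= w) when every set of ranks for the first sample is equally likely."""
    count = total = 0
    for ranks in combinations(range(1, size_total + 1), size_first):
        total += 1
        count += sum(ranks) >= w
    return count / total

w = rank_sum(x, y)                              # rank sum of the x sample
p = exact_upper_p(w, len(x), len(x) + len(y))   # matches alternative="greater"
print(w, round(p, 4))                           # 90 0.1272
```

Full enumeration is feasible here because there are only 3003 ways to choose the 10 ranks of x among 15; for larger samples a table or the normal approximation is the practical route.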
Splus does not give the null distribution of the Wilcoxon rank sum statistic. We can estimate it by simulation (or we can look at Table A.6 in the textbook):
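**********4a*******
# Simulated null distribution of the rank sum of a sample of size n,
# when its ranks are drawn at random from 1,...,n+m.
n <- 6
m <- 8
nm <- n+m
N <- 10000
xb <- numeric(N)
ind <- c(1:nm)
for(i in 1:N){
  r <- sample(ind,n,replace=F)   # r, not y, so that the data in y are not overwritten
  xb[i] <- sum(r)
}
xb <- sort(xb)
a1 <- rle(xb)$values             # distinct values of the rank sum
a2 <- rle(xb)$lengths
a2 <- a2/N                       # estimated probabilities
a3 <- cumsum(a2)                 # estimated cdf
a4 <- 1-a3
a <- cbind(a1,a2,a3,a4)
dimnames(a) <- list(NULL,c("value","probab.","cdf","1-cdf"))
*******************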
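The simulation only estimates the null distribution. For small samples the same four columns (value, probability, cdf, 1-cdf) can be computed exactly by counting subsets of ranks. The Python sketch below is one possible way to do this, assuming no ties; the function name rank_sum_null is hypothetical and the counting scheme is a standard subset-sum recursion, not something taken from the textbook.

```python
from math import comb

def rank_sum_null(n, m):
    """Exact null distribution of the sum of n ranks drawn from 1..n+m.

    Returns rows (value, probability, cdf, 1 - cdf), like the matrix `a`
    built by the simulation, but without Monte Carlo error.
    """
    total = n + m
    max_sum = sum(range(m + 1, total + 1))      # sum of the n largest ranks
    # ways[k][s] = number of k-subsets of the ranks processed so far with sum s
    ways = [[0] * (max_sum + 1) for _ in range(n + 1)]
    ways[0][0] = 1
    for r in range(1, total + 1):
        # go downwards in k so that rank r is used at most once per subset
        for k in range(min(r, n), 0, -1):
            for s in range(r, max_sum + 1):
                ways[k][s] += ways[k - 1][s - r]
    denom = comb(total, n)
    rows, cdf = [], 0.0
    for s, count in enumerate(ways[n]):
        if count:
            cdf += count / denom
            rows.append((s, count / denom, cdf, 1 - cdf))
    return rows

table = rank_sum_null(6, 8)     # same n and m as in the simulation
for row in table[:3]:
    print(row)
```

Comparing this table with the simulated matrix a gives a quick check of how much the estimate with N = 10000 fluctuates.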
The following program finds the Hodges-Lehmann estimator of Δ and a confidence interval for Δ, using the normal approximation.
*****4b*********
# Hodges-Lehmann estimate of Delta and a confidence interval for Delta,
# based on the m*n differences y[j]-x[i] (normal approximation).
m <- length(x)
n <- length(y)
alpha <- .05
differ <- NULL
for(i in 1:m){
  for(j in 1:n){
    differ <- append(differ,y[j]-x[i])
  }
}
k <- median(differ)                 # Hodges-Lehmann estimate
zalpha <- qnorm(1-alpha/2)
calpha1 <- (((m*n)/2)-zalpha*sqrt(m*n*(m+n+1)/12))
calpha1 <- round(calpha1)
calpha2 <- ((m*n)+1-calpha1)
sdiff <- sort(differ)               # ordered differences
con1 <- sdiff[calpha1]
con2 <- sdiff[calpha2]
conf <- c(con1,con2)
************
> source("4b")
> k
[1] -0.305
> conf
[1] -0.76  0.15
> calpha1
[1] 9
> calpha2
[1] 42
We get Δ̂ = -0.305, and the approximate 95% confidence interval for Δ is (-0.76, 0.15).
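The same computation can be sketched in Python as a check on the numbers. This is a minimal sketch under the same normal approximation, with the data of Example 4.1 typed in again; the function name hodges_lehmann is an illustrative choice, and the code assumes the samples are large enough that the lower index is at least 1.

```python
from math import sqrt
from statistics import NormalDist, median

x = [.8, .83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46]
y = [1.15, .88, .9, .74, 1.21]

def hodges_lehmann(x, y, alpha=0.05):
    """Median of all differences y[j] - x[i], plus an interval for the shift."""
    m, n = len(x), len(y)
    diffs = sorted(yj - xi for xi in x for yj in y)      # the m*n differences
    z = NormalDist().inv_cdf(1 - alpha / 2)
    c_low = round(m * n / 2 - z * sqrt(m * n * (m + n + 1) / 12))
    c_high = m * n + 1 - c_low
    # the indices count order statistics from 1; Python lists start at 0
    interval = (diffs[c_low - 1], diffs[c_high - 1])
    return median(diffs), interval, (c_low, c_high)

estimate, interval, indices = hodges_lehmann(x, y)
print(round(estimate, 3), [round(v, 2) for v in interval], indices)
```

With these data the indices are 9 and 42, so the interval runs from the 9th to the 42nd of the 50 ordered differences.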
The usual t-statistic and t-confidence intervals are:
> t.test(x,y,alternative="greater")

        Standard Two-Sample t-Test

data:  x and y
t = 1.6061, df = 13, p-value = 0.0661
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 -0.03457769          NA
sample estimates:
 mean of x mean of y
     1.313     0.976

> t.test(x,y,conf.level=.95)

        Standard Two-Sample t-Test

data:  x and y
t = 1.6061, df = 13, p-value = 0.1323
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.1162891  0.7902891
sample estimates:
 mean of x mean of y
     1.313     0.976

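These results are similar to those found with the Wilcoxon procedures. The notation in the textbook is the opposite of the notation in Splus: Δ measures the shift from x to y (the differences y[j]-x[i]), while Splus reports the mean of x minus the mean of y.
This is why (-0.76,0.15) and (-0.116,0.790) are almost the negatives of each other.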
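The pooled t statistic and its two-sided interval can also be worked out by hand. The Python sketch below does this with the standard library only; the function name pooled_t is hypothetical, and the t quantile 2.160 (0.975 quantile with 13 degrees of freedom) is an assumption read from a t table rather than computed.

```python
from math import sqrt
from statistics import mean, variance

x = [.8, .83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46]
y = [1.15, .88, .9, .74, 1.21]

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance, for mean(x) - mean(y)."""
    m, n = len(x), len(y)
    df = m + n - 2
    # pooled sample variance: weighted average of the two sample variances
    sp2 = ((m - 1) * variance(x) + (n - 1) * variance(y)) / df
    se = sqrt(sp2 * (1 / m + 1 / n))
    diff = mean(x) - mean(y)
    return diff / se, df, diff, se

t, df, diff, se = pooled_t(x, y)
T_QUANTILE = 2.160   # assumed table value: 0.975 quantile of t with 13 df
ci = (diff - T_QUANTILE * se, diff + T_QUANTILE * se)
print(round(t, 4), df, [round(v, 3) for v in ci])
```

Because diff is the mean of x minus the mean of y, changing the sign of both endpoints and swapping them gives the interval in the textbook's direction (y minus x), which is the one to compare with the Hodges-Lehmann interval.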
Comments to: Miguel A. Arcones
