Assumptions | Test to use |
---|---|
not necessarily normal populations | Wilcoxon two-sample rank sum test |
normal populations (with equal variances) | t test with pooled sample variance |
To do a Wilcoxon rank sum test for two independent samples, we can do
    > x <- c(.8,.83,1.89,1.04,1.45,1.38,1.91,1.64,0.73,1.46)
    > y <- c(1.15,.88,.9,.74,1.21)
    > wilcox.test(x,y,alternative="greater")

            Exact Wilcoxon rank-sum test

    data:  x and y
    rank-sum statistic W = 90, n = 10, m = 5, p-value = 0.1272
    alternative hypothesis: true mu is greater than 0

I have used the data from Example 4.1 in the textbook. In Splus, alternative="greater" means that the location of x is greater than that of y. This is the opposite of the textbook.
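As a check on the output above, the statistic and the exact p-value can be reproduced by hand. The following is a minimal sketch in R syntax (whether `combn` is available in a given Splus version is an assumption we have not checked); the names `r`, `W`, `sums` and `p.exact` are ours. Under the null hypothesis the ranks of x are a random subset of size 10 from 1,...,15, so there are only 3003 subsets to enumerate.

```r
x <- c(.8,.83,1.89,1.04,1.45,1.38,1.91,1.64,0.73,1.46)
y <- c(1.15,.88,.9,.74,1.21)
n <- length(x)
m <- length(y)

# ranks in the combined sample; the first n entries belong to x
r <- rank(c(x, y))
W <- sum(r[1:n])                    # rank sum of x

# every possible set of n ranks out of 1..(n+m), one set per column
sums <- colSums(combn(n + m, n))
p.exact <- mean(sums >= W)          # P(rank sum >= W) under the null

print(W)
print(round(p.exact, 4))
```

Note that `wilcox.test` in R reports W = 35 for these data rather than 90, because it subtracts n(n+1)/2 = 55 from the rank sum of x; the p-value is unchanged.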
Splus does not provide the null distribution of the Wilcoxon rank sum statistic. We can estimate it by simulation (or we can look at Table A.6 in the textbook):
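    **********4a*******
    n <- 6            # size of the sample whose ranks are summed
    m <- 8            # size of the other sample
    nm <- n + m
    N <- 10000        # number of simulated rank sums
    xb <- c(1:N)      # storage for the simulated rank sums
    ind <- c(1:nm)
    for(i in 1:N){
      # under the null, the n ranks are a random subset of 1..(n+m);
      # the draw is called r so that the data vector y is not overwritten
      r <- sample(ind,n,replace=F)
      xb[i] <- sum(r)
    }
    xb <- sort(xb)
    a1 <- rle(xb)$values       # distinct values of the rank sum
    a2 <- rle(xb)$lengths
    a2 <- a2/N                 # estimated probabilities
    a3 <- cumsum(a2)           # estimated cdf
    a4 <- 1-a3
    a <- cbind(a1,a2,a3,a4)
    dimnames(a) <- list(NULL,c("value","probab.","cdf","1-cdf"))
    *******************

The following program finds the Hodges-Lehmann estimator of Delta and a confidence interval for Delta, using the normal approximation.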
    *****4b*********
    m <- length(x)
    n <- length(y)
    alpha <- .05
    differ <- c(y[1]-x[1])     # dummy first element, removed below
    for(i in 1:m){
      for(j in 1:n){
        differ <- append(differ,y[j]-x[i])
      }
    }
    differ <- differ[-1]       # all m*n differences y[j]-x[i]
    k <- median(differ)        # Hodges-Lehmann estimate
    zalpha <- qnorm(1-alpha/2)
    calpha1 <- (((m*n)/2)-zalpha*sqrt(m*n*(m+n+1)/12))
    calpha1 <- round(calpha1)
    calpha2 <- ((m*n)+1-calpha1)
    diff <- sort(differ)
    con1 <- diff[calpha1]
    con2 <- diff[calpha2]
    conf <- c(con1,con2)
    ************

    > source("4b")
    > k
    [1] -0.305
    > conf
    [1] -0.76  0.15
    > calpha1
    [1] 9
    > calpha2
    [1] 42

We get hat-Delta = -0.305, and the confidence interval for Delta is (-0.76, 0.15). Note that the program takes the differences as y - x.
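The double loop in 4b can also be written without loops. This is a sketch in R syntax, not a replacement for the program above: `outer` builds the whole table of differences y[j] - x[i] in one step, and the cutoffs use the same normal approximation as 4b. The names `D`, `hl`, `lo` and `hi` are ours.

```r
x <- c(.8,.83,1.89,1.04,1.45,1.38,1.91,1.64,0.73,1.46)
y <- c(1.15,.88,.9,.74,1.21)
m <- length(x)
n <- length(y)
alpha <- 0.05

# all m*n differences y[j] - x[i], sorted
D <- sort(as.vector(outer(y, x, "-")))
hl <- median(D)                       # Hodges-Lehmann estimate

# normal-approximation cutoffs, as in 4b
z <- qnorm(1 - alpha/2)
lo <- round(m*n/2 - z*sqrt(m*n*(m + n + 1)/12))
hi <- m*n + 1 - lo
conf <- c(D[lo], D[hi])

print(hl)
print(conf)
```

With the data above this should give the same estimate and interval as 4b, since only the way the differences are generated has changed.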
The usual t-statistic and t-confidence intervals are:
    > t.test(x,y,alternative="greater")

            Standard Two-Sample t-Test

    data:  x and y
    t = 1.6061, df = 13, p-value = 0.0661
    alternative hypothesis: true difference in means is greater than 0
    95 percent confidence interval:
     -0.03457769          NA
    sample estimates:
     mean of x mean of y
         1.313     0.976

    > t.test(x,y,conf.level=.95)

            Standard Two-Sample t-Test

    data:  x and y
    t = 1.6061, df = 13, p-value = 0.1323
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -0.1162891  0.7902891
    sample estimates:
     mean of x mean of y
         1.313     0.976

which are similar to those found above. The notation in the textbook is the opposite of the notation in Splus:
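(-0.76, 0.15) and (-0.116, 0.790) are almost the negatives of each other: the first interval is for y - x (the differences used in program 4b), while the second is for the mean of x minus the mean of y. Negating the first gives (-0.15, 0.76), which is close to the t interval.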
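For readers working in R rather than Splus, one detail matters when reproducing the t intervals: R's `t.test` uses the Welch (unequal variance) version by default, so `var.equal=TRUE` is needed to get the pooled-variance test shown above. The sketch below also swaps the arguments to obtain the interval for y - x, which puts the t interval on the same sign convention as the Hodges-Lehmann interval. The names `tt`, `ci.xy` and `ci.yx` are ours.

```r
x <- c(.8,.83,1.89,1.04,1.45,1.38,1.91,1.64,0.73,1.46)
y <- c(1.15,.88,.9,.74,1.21)

# pooled-variance two-sample t test, mean of x minus mean of y
tt <- t.test(x, y, var.equal=TRUE)
ci.xy <- as.vector(tt$conf.int)

# same test with the roles swapped: interval for mean of y minus mean of x
ci.yx <- as.vector(t.test(y, x, var.equal=TRUE)$conf.int)

print(round(ci.xy, 3))
print(round(ci.yx, 3))
```

The second interval is simply the first one negated and reversed, so it can be compared directly with (-0.76, 0.15).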