> x_c(-1.7,-1.3,-5.0,0.2,6.0,1.5,2.3,2.7)
> mean(x)
[1] 0.5875
> var(x)
[1] 11.09839
> var(x,unbiased=T)
[1] 11.09839
> var(x,unbiased=F)
[1] 9.711094
> stdev(x)
[1] 3.331425

These are the commands for finding quantiles of common distributions:
> qnorm(0.95)
[1] 1.644854
> qt(0.95,df=10)
[1] 1.812461
> help(qchi)
> qchisq(0.95,df=10)
[1] 18.30704

Splus has built-in functions to find different confidence intervals:
> x_c(-1.7,-1.3,-5.0,0.2,6.0,1.5,2.3,2.7)
> y_c(12,4,5,12,6,7,2,3,4,5,2)
> t.test(x)

        One-sample t-Test

data:  x
t = 0.4988, df = 7, p-value = 0.6332
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -2.197641  3.372641
sample estimates:
 mean of x
    0.5875

> t.test(y)

        One-sample t-Test

data:  y
t = 5.3401, df = 10, p-value = 0.0003
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 3.284595 7.988132
sample estimates:
 mean of y
  5.636364

> t.test(x,y)

        Standard Two-Sample t-Test

data:  x and y
t = -3.166, df = 17, p-value = 0.0056
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -8.413395 -1.684332
sample estimates:
 mean of x mean of y
    0.5875  5.636364

> t.test(x,y,var.equal=F)

        Welch Modified Two-Sample t-Test

data:  x and y
t = -3.1923, df = 15.679, p-value = 0.0058
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -8.407220 -1.690507
sample estimates:
 mean of x mean of y
    0.5875  5.636364
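The interval reported by t.test(x) can be reproduced by hand from the summary statistics and quantile commands above. The following is a minimal sketch (assuming R/S-Plus syntax; the object names are illustrative):

# Reproduce the 95% one-sample t interval for x by hand
x <- c(-1.7,-1.3,-5.0,0.2,6.0,1.5,2.3,2.7)
n <- length(x)
se <- sqrt(var(x)/n)        # estimated standard error of the mean
tq <- qt(0.975, df=n-1)     # t quantile for a 95% interval
c(mean(x) - tq*se, mean(x) + tq*se)
# should agree with the interval (-2.197641, 3.372641) reported by t.test(x)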
The following program (6b) simulates confidence intervals for the variance sigma^2. Each interval is based on a random sample of size n=10 from a normal distribution with mean 0 and standard deviation 1, and is computed from chi-square quantiles as ((n-1)s^2/chi^2_(1-alpha/2), (n-1)s^2/chi^2_(alpha/2)) with alpha=0.05. The program generates N=5,000 such intervals, stores the proportion containing the true value sigma^2 in the object "prop" (which should be roughly 1-alpha) and the average length in "leng". It then repeats the computation for k=99 alternative intervals that split the probability alpha unequally between the two tails, in order to compare coverage and average length.
************6b************
# This program simulates confidence intervals for sigma^2
n_10
N_5000
k_99
mu_0
sigma_1
alpha_0.05
rm(con1,con2,conn1,conn2)
talpha_qt((1-alpha/2),df=n-1)
# equal-tail chi-square quantiles
chi1_qchisq(1-alpha/2,df=n-1)
chi2_qchisq(alpha/2,df=n-1)
# k alternative splits of alpha between the two tails
chii1_rep(1,k)
chii2_chii1
for (j in 1:k) {
  chii1[j]_qchisq(1-alpha+(alpha*j/(k+1)),df=n-1)
  chii2[j]_qchisq(alpha*j/(k+1),df=n-1)
}
con1_rep(1,N)
con2_rep(1,N)
conn1_rep(1,N*k)
conn1_matrix(conn1,ncol=k,byrow=T)
conn2_conn1
for (i in 1:N) {
  x1_rnorm(n,mean=mu, sd=sigma)
  con1[i]_(n-1)*var(x1)/chi1
  con2[i]_(n-1)*var(x1)/chi2
  for (j in 1:k) {
    conn1[i,j]_(n-1)*var(x1)/chii1[j]
    conn2[i,j]_(n-1)*var(x1)/chii2[j]
  }
  print(i)
}
lengs_(con2-con1)
con_cbind(con1,con2,(con1< sigma**2)*(sigma**2< con2))
prop_mean((con1< sigma**2)*(sigma**2< con2))
leng_mean(lengs)
print(c(prop,leng))
propp_rep(1,k)
lengg_rep(1,k)
for (j in 1:k) {
  propp[j]_mean((conn1[,j]< sigma**2)*(sigma**2< conn2[,j]))
  lengg[j]_mean(conn2[,j]-conn1[,j])
  print(c(j,propp[j],lengg[j]))
}

If we run this program, we get
[1] 1.000000 0.945800 8.718593
[1] 2.000000 0.945200 7.272385
[1] 3.00000 0.94500 6.52363
[1] 4.000000 0.945800 6.031682
[1] 5.000000 0.945800 5.671128
[1] 6.000000 0.946200 5.389518
[1] 7.000000 0.946800 5.160185
[1] 8.000000 0.946200 4.967816
[1] 9.00000 0.94560 4.80285
[1] 10.000000 0.946200 4.658946
[1] 11.000000 0.946800 4.531688
[1] 12.000000 0.946600 4.417892
[1] 13.000000 0.946800 4.315186
[1] 14.000000 0.947000 4.221761
[1] 15.000000 0.946800 4.136207
[1] 16.000000 0.946800 4.057406
[1] 17.000000 0.946800 3.984456
[1] 18.000000 0.947000 3.916621
[1] 19.000000 0.947400 3.853291
[1] 20.000000 0.947000 3.793958
[1] 21.000000 0.946600 3.738193
[1] 22.000000 0.946600 3.685629
[1] 23.000000 0.946200 3.635954
[1] 24.000000 0.946000 3.588898
[1] 25.000000 0.946200 3.544225
[1] 26.000000 0.946000 3.501729
[1] 27.000000 0.946000 3.461231
[1] 28.000000 0.944800 3.422571
[1] 29.000000 0.945400 3.385607
[1] 30.000000 0.944600 3.350214
[1] 31.000000 0.945800 3.316277
[1] 32.000000 0.945800 3.283697
[1] 33.000000 0.946200 3.252382
[1] 34.000000 0.945600 3.222249
[1] 35.000000 0.945600 3.193224
[1] 36.000000 0.945200 3.165239
[1] 37.000000 0.945000 3.138231
[1] 38.000000 0.944400 3.112145
[1] 39.000000 0.944400 3.086928
[1] 40.000000 0.943800 3.062533
[1] 41.000000 0.943600 3.038916
[1] 42.000000 0.943400 3.016037
[1] 43.000000 0.944000 2.993858
[1] 44.000000 0.943600 2.972345
[1] 45.000000 0.944000 2.951466
[1] 46.000000 0.943600 2.931192
[1] 47.000000 0.944000 2.911495
[1] 48.00000 0.94480 2.89235
[1] 49.000000 0.944800 2.873732
[1] 50.000000 0.944800 2.855619
[1] 51.000000 0.944800 2.837992
[1] 52.00000 0.94500 2.82083
[1] 53.000000 0.944400 2.804115
[1] 54.00000 0.94460 2.78783
[1] 55.00000 0.94520 2.77196
[1] 56.00000 0.94500 2.75649
[1] 57.000000 0.945000 2.741406
[1] 58.000000 0.944800 2.726695
[1] 59.000000 0.945000 2.712344
[1] 60.000000 0.944600 2.698343
[1] 61.00000 0.94420 2.68468
[1] 62.000000 0.944400 2.671346
[1] 63.000000 0.945200 2.658332
[1] 64.000000 0.945800 2.645629
[1] 65.000000 0.946400 2.633228
[1] 66.000000 0.946800 2.621123
[1] 67.000000 0.946800 2.609306
[1] 68.000000 0.946200 2.597772
[1] 69.000000 0.947000 2.586514
[1] 70.000000 0.947000 2.575528
[1] 71.000000 0.947200 2.564808
[1] 72.000000 0.946800 2.554352
[1] 73.000000 0.947800 2.544154
[1] 74.000000 0.947800 2.534214
[1] 75.000000 0.947800 2.524527
[1] 76.000000 0.948400 2.515094
[1] 77.000000 0.948000 2.505913
[1] 78.000000 0.948400 2.496983
[1] 79.000000 0.947800 2.488307
[1] 80.000000 0.948200 2.479886
[1] 81.000000 0.949400 2.471724
[1] 82.000000 0.948800 2.463824
[1] 83.000000 0.949400 2.456193
[1] 84.000000 0.949400 2.448839
[1] 85.000000 0.949200 2.441772
[1] 86.000000 0.949000 2.435006
[1] 87.000000 0.948400 2.428557
[1] 88.000000 0.947800 2.422446
[1] 89.000000 0.948400 2.416703
[1] 90.000000 0.948000 2.411362
[1] 91.000000 0.948000 2.406473
[1] 92.000000 0.948200 2.402099
[1] 93.00000 0.94820 2.39833
[1] 94.000000 0.948800 2.395298
[1] 95.000000 0.948200 2.393198
[1] 96.000000 0.948400 2.392354
[1] 97.000000 0.948600 2.393349
[1] 98.00000 0.94900 2.39743
[1] 99.000000 0.948600 2.408301
The average length is smallest at j = 96 (2.392354), compared with 2.855619 for the equal-tail split j = 50; that is, dividing alpha unequally between the two tails gives a shorter interval while the coverage stays close to 0.95. The tail probabilities corresponding to j = 96 are:

> j_96
> 1-alpha+(alpha*j/(k+1))
[1] 0.998
> alpha*j/(k+1)
[1] 0.048
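These simulated average lengths can be checked against their theoretical values: since E[s^2] = sigma^2, the expected length of the interval ((n-1)s^2/chi.up, (n-1)s^2/chi.lo) is (n-1)*sigma^2*(1/chi.lo - 1/chi.up). A minimal sketch (assuming R/S-Plus syntax; chi.lo and chi.up are illustrative names for the lower and upper chi-square quantiles):

# Theoretical expected interval lengths for the equal-tail split (j=50)
# and for the split found above (j=96)
n <- 10; sigma <- 1; alpha <- 0.05; k <- 99
j <- c(50, 96)
chi.lo <- qchisq(alpha*j/(k+1), df=n-1)            # lower chi-square quantiles
chi.up <- qchisq(1-alpha+alpha*j/(k+1), df=n-1)    # upper chi-square quantiles
(n-1)*sigma^2*(1/chi.lo - 1/chi.up)
# should be close to the simulated averages 2.855619 (j=50) and 2.392354 (j=96)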
In this handout, we use simulation to check whether confidence intervals for the mean attain their required level; in particular, t confidence intervals need not have the required level for non-normal distributions. To do so, we generate many confidence intervals and find the proportion that contain the true mean; this proportion should be roughly 1-alpha. The following program generates N=10,000 confidence intervals for the mean. Each confidence interval is obtained by taking a random sample of size n=10 from a normal distribution with mean 0 and standard deviation 1. The proportions of confidence intervals containing the true mean are stored in the objects "prop.t", "prop.nu" and "prop.nb", which should be close to 1-alpha.
************6a************
# The following program finds N confidence intervals for the mean
# using the t-distribution and the normal approximation,
# with the unbiased and the biased sample variance.
# Each sample is taken from a normal distribution
# with mean 0 and standard deviation one. The sample size is n=10.
# prop.t = proportion of intervals containing the true mean
# leng.t = average length of these confidence intervals
# The ending .nu is the normal approximation with the unbiased sample variance.
# The ending .nb is the normal approximation with the biased sample variance.
n_10
N_10000
mu_0
sigma_1
alpha_0.05
rm(con1,con2,con)
talpha_qt((1-alpha/2),df=n-1)
zalpha_qnorm(1-alpha/2)
con.t1_rep(1,N)
con.t2_rep(1,N)
con.nu1_rep(1,N)
con.nu2_rep(1,N)
con.nb1_rep(1,N)
con.nb2_rep(1,N)
for (i in 1:N) {
  x1_rnorm(n,mean=mu, sd=sigma)
  con.t1[i]_mean(x1)-talpha*(stdev(x1))/sqrt(n)
  con.t2[i]_mean(x1)+talpha*(stdev(x1))/sqrt(n)
  con.nu1[i]_mean(x1)-zalpha*(stdev(x1))/sqrt(n)
  con.nu2[i]_mean(x1)+zalpha*(stdev(x1))/sqrt(n)
  s2_sqrt(var(x1,unbia=F))
  con.nb1[i]_mean(x1)-zalpha*s2/sqrt(n)
  con.nb2[i]_mean(x1)+zalpha*s2/sqrt(n)
  print(i)
}
lengs.t_(con.t2-con.t1)
con.t_cbind(con.t1,con.t2,(con.t1< mu)*(mu< con.t2))

If we run this program, we get that approximately 95% of the t confidence intervals contain the true mean, while the normal-approximation intervals fall somewhat short:

> lengs.t <- (con.t2 - con.t1)
> con.t <- cbind(con.t1, con.t2, (con.t1 < mu) * (mu < con.t2))
> prop.t <- mean((con.t1 < mu) * (mu < con.t2))
> leng.t <- mean(lengs.t)
> prop.t
[1] 0.9545
> leng.t
[1] 1.391343
> lengs.nu <- (con.nu2 - con.nu1)
> con.nu <- cbind(con.nu1, con.nu2, (con.nu1 < mu) * (mu < con.nu2))
> prop.nu <- mean((con.nu1 < mu) * (mu < con.nu2))
> leng.nu <- mean(lengs.nu)
> prop.nu
[1] 0.922
> leng.nu
[1] 1.205479
> lengs.nb <- (con.nb2 - con.nb1)
> con.nb <- cbind(con.nb1, con.nb2, (con.nb1 < mu) * (mu < con.nb2))
> prop.nb <- mean((con.nb1 < mu) * (mu < con.nb2))
> leng.nb <- mean(lengs.nb)
> prop.nb
[1] 0.9077
> leng.nb
[1] 1.143618

The t intervals attain the nominal level (coverage 0.9545), while the normal-approximation intervals undercover, especially the one based on the biased sample variance (0.922 and 0.9077).

Assume that we observe the number of successes x in n independent trials having the same probability of success p. Then x has a binomial distribution with parameters n and p, where n is known. We want to make inferences about p.
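A natural point estimate is phat = x/n, and the usual large-sample confidence interval is phat +/- zalpha*sqrt(phat*(1-phat)/n), with zalpha = qnorm(1-alpha/2) as in program 6a. A minimal sketch (assuming R/S-Plus syntax; the function name wald.ci is illustrative and not from the handout):

# Large-sample (Wald) confidence interval for a binomial proportion
wald.ci <- function(x, n, conf.level = 0.95) {
  phat <- x/n
  z <- qnorm(1 - (1 - conf.level)/2)
  se <- sqrt(phat*(1 - phat)/n)      # estimated standard error of phat
  c(phat - z*se, phat + z*se)
}
wald.ci(115, 300)   # data of the example below; roughly (0.328, 0.438)

This simple interval omits the continuity correction used by prop.test() below, so its limits differ slightly from those reported there.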
For example, suppose that we observe x=115 and n=300, and we want to estimate p. Splus finds the confidence interval for p using the command prop.test():
> prop.test(115,300,conf.level=.95)$conf.int
[1] 0.3285190 0.4411865
attr(, "conf.level"):
[1] 0.95
> prop.test(115,300,conf.level=.95)

        1-sample proportions test with continuity correction

data:  115 out of 300, null probability 0.5
X-square = 15.87, df = 1, p-value = 0.0001
alternative hypothesis: true P(success) in Group 1 is not equal to 0.5
95 percent confidence interval:
 0.3285190 0.4411865
sample estimates:
prop'n in Group 1
        0.3833333

The function prop.test() uses the normal approximation to the binomial distribution with continuity correction. We obtain that the confidence interval for p is (0.3285190, 0.4411865). However, other methods are possible. The following program finds the Clopper-Pearson confidence interval for a proportion p:
**********6e***************
# Clopper-Pearson (exact) confidence interval for a binomial proportion
x_115
n_300
alpha_.05
p_x/n
# lower confidence limit
dfr1_2*(n+1-x)
dfr2_2*x
q1_qf(1-alpha/2,df1=dfr1,df2=dfr2)
conf_(x/(x+((n+1-x)*q1)))
# upper confidence limit
dfr1_2*(x+1)
dfr2_2*(n-x)
q2_qf(1-alpha/2,df1=dfr1,df2=dfr2)
conf[2]_((x+1)*q2)/(n-x+((x+1)*q2))
*************************
> source("2a")
> conf
[1] 0.3280441 0.4409513
**************

We obtain that the confidence interval for p is (0.3280441, 0.4409513). We can do a binomial test using the command:
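Presumably this refers to binom.test(), which performs an exact test of a hypothesized success probability p; a minimal sketch (assuming R/S-Plus syntax):

> binom.test(115, 300, p=0.5)

In R, the object returned by binom.test() also contains the exact (Clopper-Pearson) confidence interval in its conf.int component, which should agree with the interval computed by the program above.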
Comments to: Miguel A. Arcones