> x_c(-1.7,-1.3,-5.0,0.2,6.0,1.5,2.3,2.7) > mean(x) [1] 0.5875 > var(x) [1] 11.09839 > var(x,unbiased=T) [1] 11.09839 > var(x,unbiased=F) [1] 9.711094 > stdev(x) [1] 3.331425These are the commands for finding quantiles of common distributions:
> qnorm(0.95) [1] 1.644854 > qt(0.95,df=10) [1] 1.812461 > help(qchi) > qchisq(0.95,df=10) [1] 18.30704Splus has build up functions to find different confidence intervals:
> x_c(-1.7,-1.3,-5.0,0.2,6.0,1.5,2.3,2.7)
> y_c(12,4,5,12,6,7,2,3,4,5,2)
> t.test(x)
One-sample t-Test
data: x
t = 0.4988, df = 7, p-value = 0.6332
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-2.197641 3.372641
sample estimates:
mean of x
0.5875
> y_c(12,4,5,12,6,7,2,3,4,5,2)
> t.test(y)
One-sample t-Test
data: y
t = 5.3401, df = 10, p-value = 0.0003
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
3.284595 7.988132
sample estimates:
mean of x
5.636364
> t.test(x,y)
Standard Two-Sample t-Test
data: x and y
t = -3.166, df = 17, p-value = 0.0056
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-8.413395 -1.684332
sample estimates:
mean of x mean of y
0.5875 5.636364
> t.test(x,y,var.equal=F)
Welch Modified Two-Sample t-Test
data: x and y
t = -3.1923, df = 15.679, p-value = 0.0058
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-8.407220 -1.690507
sample estimates:
mean of x mean of y
0.5875 5.636364
In this hand out, we see how the t confidence intervals do not have the required level for non normal distributions. To do we do simulations and find the proportion of confidence intervals containing the mean. This proportion should be roughly 1-alpha. The following program generates N=10,000 confidence intervals for the mean. Each confidence interval is obtained by taking a random sample of size 10 from a N(1,3) distribution. The proportion of confidence interval containing the true mean is stored in the object "prop". prop should be close to 1-alpha.
************6b************
# This program simulates confidence intervals for sigma^2
n_10
N_5000
k_99
mu_0
sigma_1
rm(con1,con2,conn1,conn2)
talpha_qt((1-alpha/2),df=n-1)
chi1_qchisq(1-alpha/2,df=n-1)
chi2_qchisq(alpha/2,df=n-1)
chii1_rep(1,k)
chii2_chii1
for (j in 1:k)
{
chii1[j]_qchisq(1-alpha+(alpha*j/(k+1)),df=n-1)
chii2[j]_qchisq(alpha*j/(k+1),df=n-1)
}
con1_rep(1,N)
con2_rep(1,N)
conn1_rep(1,N*k)
conn1_matrix(conn1,ncol=k,byrow=T)
conn2_conn1
for (i in 1:N)
{
x1_rnorm(n,mean=mu, sd=sigma)
con1[i]_(n-1)*var(x1)/chi1
con2[i]_(n-1)*var(x1)/chi2
for (j in 1:k)
{
conn1[i,j]_(n-1)*var(x1)/chii1[j]
conn2[i,j]_(n-1)*var(x1)/chii2[j]
}
print(i)
}
lengs_(con2-con1)
con_cbind(con1,con2,(con1< sigma**2)*(sigma**2< con2))
prop_mean((con1< sigma**2)*(sigma**2< con2))
leng_mean(lengs)
print(c(prop,leng))
propp_rep(1,k)
lengg_rep(1,k)
for (j in 1:k)
{
propp[j]_mean((conn1[,j]< sigma**2)*(sigma**2< conn2[,j]))
lengg[j]_mean(conn2[,j]-conn1[,j])
print(c(j,propp[j],lengg[j]))
}
If we run this program, we get
[1] 1.000000 0.945800 8.718593 [1] 2.000000 0.945200 7.272385 [1] 3.00000 0.94500 6.52363 [1] 4.000000 0.945800 6.031682 [1] 5.000000 0.945800 5.671128 [1] 6.000000 0.946200 5.389518 [1] 7.000000 0.946800 5.160185 [1] 8.000000 0.946200 4.967816 [1] 9.00000 0.94560 4.80285 [1] 10.000000 0.946200 4.658946 [1] 11.000000 0.946800 4.531688 [1] 12.000000 0.946600 4.417892 [1] 13.000000 0.946800 4.315186 [1] 14.000000 0.947000 4.221761 [1] 15.000000 0.946800 4.136207 [1] 16.000000 0.946800 4.057406 [1] 17.000000 0.946800 3.984456 [1] 18.000000 0.947000 3.916621 [1] 19.000000 0.947400 3.853291 [1] 20.000000 0.947000 3.793958 [1] 21.000000 0.946600 3.738193 [1] 22.000000 0.946600 3.685629 [1] 23.000000 0.946200 3.635954 [1] 24.000000 0.946000 3.588898 [1] 25.000000 0.946200 3.544225 [1] 26.000000 0.946000 3.501729 [1] 27.000000 0.946000 3.461231 [1] 28.000000 0.944800 3.422571 [1] 29.000000 0.945400 3.385607 [1] 30.000000 0.944600 3.350214 [1] 31.000000 0.945800 3.316277 [1] 32.000000 0.945800 3.283697 [1] 33.000000 0.946200 3.252382 [1] 34.000000 0.945600 3.222249 [1] 35.000000 0.945600 3.193224 [1] 36.000000 0.945200 3.165239 [1] 37.000000 0.945000 3.138231 [1] 38.000000 0.944400 3.112145 [1] 39.000000 0.944400 3.086928 [1] 40.000000 0.943800 3.062533 [1] 41.000000 0.943600 3.038916 [1] 42.000000 0.943400 3.016037 [1] 43.000000 0.944000 2.993858 [1] 44.000000 0.943600 2.972345 [1] 45.000000 0.944000 2.951466 [1] 46.000000 0.943600 2.931192 [1] 47.000000 0.944000 2.911495 [1] 48.00000 0.94480 2.89235 [1] 49.000000 0.944800 2.873732 [1] 50.000000 0.944800 2.855619 [1] 51.000000 0.944800 2.837992 [1] 52.00000 0.94500 2.82083 [1] 53.000000 0.944400 2.804115 [1] 54.00000 0.94460 2.78783 [1] 55.00000 0.94520 2.77196 [1] 56.00000 0.94500 2.75649 [1] 57.000000 0.945000 2.741406 [1] 58.000000 0.944800 2.726695 [1] 59.000000 0.945000 2.712344 [1] 60.000000 0.944600 2.698343 [1] 61.00000 0.94420 2.68468 [1] 62.000000 0.944400 2.671346 [1] 63.000000 0.945200 2.658332 [1] 64.000000 0.945800 2.645629 [1] 65.000000 0.946400 2.633228 [1] 66.000000 0.946800 2.621123 [1] 67.000000 0.946800 2.609306 [1] 68.000000 0.946200 2.597772 [1] 69.000000 0.947000 2.586514 [1] 70.000000 0.947000 2.575528 [1] 71.000000 0.947200 2.564808 [1] 72.000000 0.946800 2.554352 [1] 73.000000 0.947800 2.544154 [1] 74.000000 0.947800 2.534214 [1] 75.000000 0.947800 2.524527 [1] 76.000000 0.948400 2.515094 [1] 77.000000 0.948000 2.505913 [1] 78.000000 0.948400 2.496983 [1] 79.000000 0.947800 2.488307 [1] 80.000000 0.948200 2.479886 [1] 81.000000 0.949400 2.471724 [1] 82.000000 0.948800 2.463824 [1] 83.000000 0.949400 2.456193 [1] 84.000000 0.949400 2.448839 [1] 85.000000 0.949200 2.441772 [1] 86.000000 0.949000 2.435006 [1] 87.000000 0.948400 2.428557 [1] 88.000000 0.947800 2.422446 [1] 89.000000 0.948400 2.416703 [1] 90.000000 0.948000 2.411362 [1] 91.000000 0.948000 2.406473 [1] 92.000000 0.948200 2.402099 [1] 93.00000 0.94820 2.39833 [1] 94.000000 0.948800 2.395298 [1] 95.000000 0.948200 2.393198 [1] 96.000000 0.948400 2.392354 [1] 97.000000 0.948600 2.393349 [1] 98.00000 0.94900 2.39743 [1] 99.000000 0.948600 2.408301
> j_96 > 1-alpha+(alpha*j/(k+1)) [1] 0.998 > alpha*j/(k+1) [1] 0.048
> x_c(-1.7,-1.3,-5.0,0.2,6.0,1.5,2.3,2.7)
> y_c(12,4,5,12,6,7,2,3,4,5,2)
> t.test(x)
One-sample t-Test
data: x
t = 0.4988, df = 7, p-value = 0.6332
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-2.197641 3.372641
sample estimates:
mean of x
0.5875
> y_c(12,4,5,12,6,7,2,3,4,5,2)
> t.test(y)
One-sample t-Test
data: y
t = 5.3401, df = 10, p-value = 0.0003
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
3.284595 7.988132
sample estimates:
mean of x
5.636364
> t.test(x,y)
Standard Two-Sample t-Test
data: x and y
t = -3.166, df = 17, p-value = 0.0056
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-8.413395 -1.684332
sample estimates:
mean of x mean of y
0.5875 5.636364
> t.test(x,y,var.equal=F)
Welch Modified Two-Sample t-Test
data: x and y
t = -3.1923, df = 15.679, p-value = 0.0058
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-8.407220 -1.690507
sample estimates:
mean of x mean of y
0.5875 5.636364
> t.test(x,y,paired=T)
In this hand out, we see how the t confidence intervals do not have the required level for non normal distributions. To do we do simulations and find the proportion of confidence intervals containing the mean. This proportion should be roughly 1-alpha. The following program generates N=10,000 confidence intervals for the mean. Each confidence interval is obtained by taking a random sample of size 10 from a N(1,3) distribution. The proportion of confidence interval containing the true mean is stored in the object "prop". prop should be close to 1-alpha.
************6a************
# The following program finds N confidence intervals
# for the mean using the
# t-distribution and the normal approximation
# using the unbiased and the biased sample variance.
# Each sample is taken from a normal distribution
# with mean 0 and standard deviation one. The sample size is n=10.
# prop.t= proportion of intervals containing the true mean
# leng.t= average length of these confidence intervals
# The ending .nu is using the normal approximation
# and the unbiased sample variance.
# The ending .nb is using the normal approximation
# and the biased sample variance
n_10
N_10000
mu_0
sigma_1
alpha_0.05
rm(con1,con2,con)
talpha_qt((1-alpha/2),df=n-1)
zalpha_qnorm(1-alpha/2)
con.t1_rep(1,N)
con.t2_rep(1,N)
con.nu1_rep(1,N)
con.nu2_rep(1,N)
con.nb1_rep(1,N)
con.nb2_rep(1,N)
for (i in 1:N)
{
x1_rnorm(n,mean=mu, sd=sigma)
con.t1[i]_mean(x1)-talpha*(stdev(x1))/sqrt(n)
con.t2[i]_mean(x1)+talpha*(stdev(x1))/sqrt(n)
con.nu1[i]_mean(x1)-zalpha*(stdev(x1))/sqrt(n)
con.nu2[i]_mean(x1)+zalpha*(stdev(x1))/sqrt(n)
s2_sqrt(var(x1,unbia=F))
con.nb1[i]_mean(x1)-zalpha*s2/sqrt(n)
con.nb2[i]_mean(x1)+zalpha*s2/sqrt(n)
print(i)
}
lengs.t_(con.t2-con.t1)
con.t_cbind(con.t1,con.t2,(con.t1
If we run this program, we get that approximately 95 %
of the confidence intervals contain the true mean:
> lengs.t <- (con.t2 - con.t1)
> con.t <- cbind(con.t1, con.t2, (con.t1 < mu) * (mu <
con.t2))
> prop.t <- mean((con.t1 < mu) * (mu < con.t2))
> leng.t <- mean(lengs.t)
> prop.t
[1] 0.9545
> leng.t
[1] 1.391343
> lengs.nu <- (con.nu2 - con.nu1)
> con.nu <- cbind(con.nu1, con.t2,(con.nu1 < mu)*(mu prop.nu <- mean((con.nu1 < mu) * (mu < con.nu2))
> leng.nu <- mean(lengs.nu)
> prop.nu
[1] 0.922
> leng.nu
[1] 1.205479
> lengs.nb <- (con.nb2 - con.nb1)
> con.nb <- cbind(con.nb1, con.nb2, (con.nb1 < mu) * (mu <
con.nb2))
> prop.nb <- mean((con.nb1 < mu) * (mu < con.nb2))
> leng.nb <- mean(lengs.nb)
> prop.nb
[1] 0.9077
> leng.nb
[1] 1.143618
Assume that we observe the number of successes x in n independent trials having
the same probability of success p. x has a
binomial distribution with parameters n and p. n in known. We want to make inferences about p.
For example, suppose that we observe x=115 and n=300, and we want to estimate p.
Splus finds the confidence interval for p using the command prop.test():
> prop.test(115,300,conf.level=.95)$conf.int
[1] 0.3285190 0.4411865
attr(, "conf.level"):
[1] 0.95
> prop.test(115,300,conf.level=.95)
1-sample proportions test with continuity correction
data: 115 out of 300, null probability 0.5
X-square = 15.87, df = 1, p-value = 0.0001
alternative hypothesis: true P(success) in Group 1 is not equal to 0.5
95 percent confidence interval:
0.3285190 0.4411865
sample estimates:
prop'n in Group 1
0.3833333
The function prop.test() uses the normal approximation to
the binomial distribution with continuity correction.
We obtain that the confidence interval for p is
(0.3285190, 0.4411865).
However, other methods are possible. The following program finds the
Clopper-Pearson confidence interval
for a proportion p:
**********6e***************
x_115
n_300
alpha_.95
p_x/n
dfr1_2*(n+1-x)
dfr2_2*x
q1_qf(.975,df1=dfr1,df2=dfr2)
c2_(x/(x+((n+1-x)*q1)))
dfr1_2*(x+1)
dfr2_2*(n-x)
q2_qf(.975,df1=dfr1,df2=dfr2)
c2[2]_((x+1)*q2)/(n-x+((x+1)*q2))
*************************
> source("2a")
> conf
[1] 0.3280441 0.4409513
**************
We obtain that the confidence interval for p is (0.3280441,0.4409513).
We can do a binomial test using the command:
Comments to:
Miguel A. Arcones