*************************
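n <- 20                      # sample size
N <- 10000                   # number of simulated samples
theta1 <- numeric(N)         # will hold the sample means
theta2 <- numeric(N)         # will hold the sample medians
for (i in 1:N)
{
  x1 <- rnorm(n)             # standard normal sample, true center is 0
  theta1[i] <- mean(x1)
  theta2[i] <- median(x1)
}
hat.mean <- mean(theta1)     # estimated bias of the sample mean
hat.median <- mean(theta2)   # estimated bias of the sample median
mse.mean <- mean(theta1^2)   # estimated MSE, since the true parameter is 0
mse.median <- mean(theta2^2)
eff <- mse.mean/mse.median   # efficiency of the median relative to the mean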
*************************
> hat.mean
[1] -0.001049796
> hat.median
[1] -0.0053298
> mse.mean
[1] 0.05070676
> mse.median
[1] 0.07352595
> eff
[1] 0.6896445
*************************
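Here hat.mean and hat.median estimate the biases of the two estimators. As expected,
both are close to zero: the two estimators are unbiased.
The exact mean square error of the sample mean is 1/n=1/20=0.05; the simulation gives 0.05070676.
The estimated efficiency of the median relative to the mean is 0.6896445, which is
close to the asymptotic value 2/pi=0.64.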
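The theoretical values quoted above can be checked directly. The short sketch below assumes the standard large-sample result that the sample median of N(0,1) data has variance approximately pi/(2n); the names mse.mean.exact, mse.median.asym and eff.asym are introduced here only for illustration and are not part of the simulation.

```r
n <- 20
mse.mean.exact <- 1/n                        # exact MSE of the sample mean for N(0,1) data
mse.median.asym <- pi/(2*n)                  # large-sample approximation, not exact for n=20
eff.asym <- mse.mean.exact/mse.median.asym   # equals 2/pi, about 0.64
```

For n=20 the simulated MSE of the median (0.07352595) is somewhat below the large-sample value pi/40, which is why the simulated efficiency comes out slightly above 0.64.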
However, for contaminated data, the sample median is more accurate. For example, for data with 5% contamination (19 observations from N(0,1) and one from N(5,1)), we have the following:
*************************
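n <- 19                      # number of clean observations
n2 <- 1                      # number of contaminating observations
N <- 10000                   # number of simulated samples
theta1 <- numeric(N)
theta2 <- numeric(N)
mse1 <- numeric(N)           # squared errors of the sample mean
mse2 <- numeric(N)           # squared errors of the sample median
for (i in 1:N)
{
  x1 <- rnorm(n)
  y1 <- rnorm(n2, mean=5)    # outlier centered at 5
  z1 <- append(x1, y1)       # contaminated sample of size n+n2=20
  theta1[i] <- mean(z1)
  theta2[i] <- median(z1)
  mse1[i] <- theta1[i]^2
  mse2[i] <- theta2[i]^2
}
hat.mean <- mean(theta1)
hat.median <- mean(theta2)
mse.mean <- mean(mse1)
mse.median <- mean(mse2)
eff <- mse.mean/mse.median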
*************************
> hat.mean
[1] 0.2525606
> hat.median
[1] 0.06930347
> mse.mean
[1] 0.1141752
> mse.median
[1] 0.08155202
> eff
[1] 1.400029
*************************
The sample median is more robust to the outlier. Contaminating the data
with one observation barely changes the sample median, but it shifts the
sample mean. The average of the sample means is close to 5/20=.25. The average
of the sample medians is close to qnorm(10/19)=0.06601181. Now the sample median
has the smaller mean square error, so it is the more efficient estimator.
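These reference values follow from a short calculation, sketched below. The contaminated sample mean averages 19 N(0,1) observations and one N(5,1) observation, so its bias is 5/20 and its variance is 20/20^2. The names m, bias.mean, var.mean, mse.mean.exact and med.approx are illustrative and do not appear in the simulation.

```r
n <- 19                                    # clean observations
n2 <- 1                                    # contaminating observations
m <- n + n2                                # total sample size
bias.mean <- n2*5/m                        # expected value of the contaminated mean
var.mean <- (n + n2)/m^2                   # every observation has variance 1
mse.mean.exact <- var.mean + bias.mean^2   # variance plus squared bias
med.approx <- qnorm(10/19)                 # approximation for the median used in the text
```

The exact value mse.mean.exact is 0.05+0.0625=0.1125, in line with the simulated 0.1141752.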
The normal approximation to the binomial distribution given by the
central limit theorem is more accurate with the continuity correction than without it:
*************************
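n <- 10
p <- .3
x1 <- 0:n
p1 <- pbinom(x1, n, p)                  # exact binomial cumulative probabilities
y1 <- (x1-n*p)/sqrt(n*p*(1-p))
p2 <- pnorm(y1)                         # usual normal approximation
z1 <- (x1+.5-n*p)/sqrt(n*p*(1-p))
p3 <- pnorm(z1)                         # normal approximation with continuity correction
pro <- matrix(c(p1,p2,p3), ncol=3)
dimnames(pro) <- list(0:n, c("binom probab","norm approx","cont correc"))
d1 <- sqrt(mean((p1-p2)^2))             # root mean square error, usual approximation
d2 <- sqrt(mean((p1-p3)^2))             # root mean square error, continuity correction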
***************
> pro
   binom probab norm approx cont correc
0    0.02824752  0.01921697  0.04224897
1    0.14930835  0.08377314  0.15031149
2    0.38278279  0.24507648  0.36503486
3    0.64961072  0.50000000  0.63496514
4    0.84973167  0.75492352  0.84968851
5    0.95265101  0.91622686  0.95775103
6    0.98940792  0.98078303  0.99213735
7    0.99840961  0.99711225  0.99904955
8    0.99985631  0.99972005  0.99992629
9    0.99999410  0.99998266  0.99999636
10   1.00000000  0.99999932  0.99999989
> d1
[1] 0.07142332
> d2
[1] 0.008314308
*************************
The root mean square difference between the true cumulative probabilities and
the usual normal approximation is 0.07142332. With the continuity correction,
this difference is much smaller: 0.008314308.
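To repeat the comparison for other values of n and p, the computation can be wrapped in a function. This is only a sketch: the function name approx.error and the names of its components are hypothetical and chosen here for illustration.

```r
# Root mean square error of the two normal approximations
# to the binomial cumulative probabilities at 0, 1, ..., n.
approx.error <- function(n, p)
{
  x <- 0:n
  s <- sqrt(n*p*(1-p))                    # binomial standard deviation
  exact <- pbinom(x, n, p)
  usual <- pnorm((x - n*p)/s)             # no continuity correction
  corrected <- pnorm((x + .5 - n*p)/s)    # with continuity correction
  c(usual = sqrt(mean((exact - usual)^2)),
    corrected = sqrt(mean((exact - corrected)^2)))
}
err <- approx.error(10, .3)               # reproduces d1 and d2 above
```

Calling approx.error with a larger n, for example approx.error(50, .3), shows how both errors behave as the sample size grows.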