Section 7.1.


In the previous chapter, we obtained confidence intervals and hypothesis tests for distributions of a known type, such as the binomial and the normal. The statistical methods in Chapter 6 are known as parametric methods. We assumed that the distribution is of a determined type with an unknown parameter. For example, in the estimation of the mean, we assumed a normal distribution with unknown mean and variance. The methods in Chapter 6 are inaccurate if the actual distribution is far from being normal. If the sample size is not very large, we cannot tell whether the distribution is normal; and we may have a large sample whose distribution does not look normal at all. Because of these problems, we need statistical methods that apply to a very large collection of distributions, including distributions that are far from being normal. The statistical methods in this chapter are called nonparametric methods: we do not need to assume a parametric model. We will see two types of methods: bootstrap methods and classical nonparametric methods. The drawback of the methods in this chapter is that, for normal distributions, the methods in Chapter 6 are more efficient.

************************************

In this chapter, we will use macros a lot. A macro is a series of Minitab commands stored in a file ending in .MTB. This series of commands should finish with "end". To create a macro, use a text editor. To run a macro, say 1000 times, we do:

MTB > Execute 'C:\MTBSEW\ISTAT\macro-name.MTB' 1000.
Executing from file: C:\MTBSEW\ISTAT\macro-name.MTB

Here, "macro-name.MTB" is just an arbitrary name. In this case, we ran the macro 1000 times. We can also run a macro using the Windows menu: File > Other Files > Run an Exec. There are three types of macros: exec files, global macros, and local macros. We will only use exec files. A macro can call another macro.

********************************

We will use the command "stack". This command stacks blocks of columns and constants on top of each other. The last column specified in the STACK command is the target column. For example, "stack c3 k1 c3" appends the value of the constant k1 to the bottom of column c3 and stores the result back in c3; this is how the macros below accumulate the simulated values.

********Example 1*************

We want to test the proportion p of defective plates produced by a manufacturer, that is, we want to test Ho: p <= .1 versus H1: p > .1. We plan to check 50 plates, and we want to know how many defective plates we must find in order to reject the null hypothesis. The standard method is to use the .95 quantile of the binomial distribution with parameters n=50 and p=0.10:

MTB > invcdf .95;
SUBC> bino 50 .1.
     K  P( X LESS OR =  K)       K  P( X LESS OR =  K)
     8        0.9421             9        0.9755

So, we will reject Ho if X = number of defective plates in the sample is 9 or bigger.

Next, we will see another method to do this. First, we create an empirical reference distribution:

MTB > set c1
DATA> 90(0) 10(1)
DATA> end

We used 100 values, but we could have used any other large number. Then, we create the following macro:

********file 7-1A.MTB******************************
sample 50 c1 c2;
replace.
let k1=mean(c2)
stack c3 k1 c3
end
*********************************************

Next, we run the macro:

MTB > Execute 'C:\MTBSEW\ISTAT\7-1A.MTB' 1000.
Executing from file: C:\MTBSEW\ISTAT\7-1A.MTB

We get the distribution of the simulated proportions in c3. We can find the 95% quantile of this distribution, i.e. X(.95(n+1)) = X(950.95) = .05 X(950) + .95 X(951).
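
In general, to find the q quantile of the m simulated values, we sort them and look at position q(m+1). If q(m+1) = k + r, where k is an integer and 0 <= r < 1, we interpolate between the k-th and (k+1)-th sorted values: the quantile is (1-r) X(k) + r X(k+1). Here m = 1000 and q = .95, so q(m+1) = 950.95, k = 950 and r = .95, which gives .05 X(950) + .95 X(951).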

We do:

MTB > sort c3 c4
MTB > let k1=(.05*c4(950))+(.95*c4(951))
MTB > print k1
K1      0.160000

We get that the 95% quantile of the empirical reference distribution is .16. We reject the null hypothesis if the proportion of defective plates found in a sample of n=50 is bigger than .16. This means that we reject Ho if the number of defective plates is bigger than 50(0.16)=8, i.e., if we get 9 or more defective plates. This is an illustrative example: since we know that the distribution of X is binomial and its quantiles can be found exactly, it is better to use the quantiles of the binomial distribution.

***********Example 2*****************************

Similarly, we can handle the almpin case. We are interested in testing Ho: mu >= 60.1 versus H1: mu < 60.1. So, we reject Ho if x-bar is small enough.

MTB > Retrieve 'C:\ISTAT\ALMPIN.MTW'.
Retrieving worksheet from file: C:\ISTAT\ALMPIN.MTW

Since x-bar = 60.028, we recenter the data so that the reference distribution has mean 60.1, the boundary value of the null hypothesis:

MTB > let c7=c7-60.028+60.1

We run the following exec file 1000 times:

*********************************************
sample 70 c7 c8;
replace.
let k1=mean(c8)
stack c9 k1 c9
end
********************************************

MTB > Execute 'C:\ISTAT\7-1B.MTB' 1000.
Executing from file: C:\MTBSEW\ISTAT\7-1B.MTB

Then, we find the 1% quantile, which is X(.01(n+1)) = X(10.01):

MTB > sort c9 c10
MTB > let k1=(.99*c10(10))+(.01*c10(11))
MTB > print k1
K1      60.0864

We reject the null hypothesis if x-bar < 60.0864. Since x-bar = 60.028, we reject the null hypothesis. The usual statistical method rejects Ho for x-bar < 60.1 - s z(0.01)/sqrt(n):

MTB > invcdf .01 k2
MTB > let k3=stdev(c7)
MTB > let k4=60.1+k2*k3/sqrt(70)
MTB > print k4
K4      60.0866

We get almost the same number. The classical method (normal approximation) relies on the central limit theorem. This approximation can be poor for a distribution that is far from normal, in which case it makes more sense to use the simulation (macro) method.

********Example 3******************

We can simulate confidence intervals. Consider the macro CONFINT.MTB, which is in C:\ISTAT\CHA7\CONFINT.MTB:

*****confint.mtb**********
sample k1 c1 c2;
replace.
let k2=mean(c2)-2*stan(c2)/sqrt(k1)
let k3=mean(c2)+2*stan(c2)/sqrt(k1)
stack c3 k2 c3
stack c4 k3 c4
end
***************

We can find 100 confidence intervals for the mean of Tur_Diam, using samples of size n=20, by doing:

MTB > let c1=c3
MTB > erase c2-c10
MTB > let k1=20
MTB > Execute 'C:\ISTAT\CHA7\CONFINT.MTB' 100.

The 100 confidence intervals are in c3 and c4: the lower endpoints are in c3 and the upper endpoints are in c4. We can find how many confidence intervals cover the sample mean:

MTB > mean(c1)
   MEAN    =   35.514
MTB > code (0:35.514)1 (35.514:100)0 c3 c5.
MTB > code (0:35.514)0 (35.514:100)1 c4 c6.
MTB > let c7=c5*c6
MTB > mean c7
   MEAN    =   0.90000

Here c5 indicates whether the lower endpoint is at most 35.514, c6 indicates whether the upper endpoint is at least 35.514, and c7 = c5*c6 equals 1 exactly when the interval covers 35.514. We get that 90 of the 100 confidence intervals cover mean(c1) = 35.514. Since we are using simulations, you may get something else.
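
With only 100 intervals, the estimated coverage of 0.90 is itself subject to simulation error; running the macro more times gives a more stable estimate. The following is a minimal sketch, not part of the original session; it assumes that c1 (the Tur_Diam data) and k1=20 are still set as above, and it simply repeats the same commands with 1000 runs:

MTB > erase c3-c7
MTB > Execute 'C:\ISTAT\CHA7\CONFINT.MTB' 1000.
MTB > code (0:35.514)1 (35.514:100)0 c3 c5.
MTB > code (0:35.514)0 (35.514:100)1 c4 c6.
MTB > let c7=c5*c6
MTB > mean c7

The mean of c7 in the last line is again the proportion of intervals covering 35.514, now based on 1000 simulated intervals.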

Comments to: Miguel A. Arcones