Section 7.1.


In the previous chapter, we obtained confidence intervals and hypothesis tests for distributions of a known type, such as the binomial and the normal. The statistical methods in Chapter 6 are known as parametric methods. We assumed that the distribution is of a determined type with an unknown parameter. For example, in the estimation of the mean, we assumed a normal distribution with unknown mean and variance. The methods in Chapter 6 are inaccurate if the actual distribution is far from being normal. If the sample size is not very large, we cannot tell whether the distribution is normal; and we may have a large sample whose distribution does not look normal at all. Because of these problems, we need statistical methods that apply to a very large collection of distributions, including distributions that are far from being normal. The statistical methods in this chapter are called nonparametric methods: we do not need to assume a parametric model. We will see two types of methods: bootstrap methods and classical nonparametric methods. The drawback of the methods in this chapter is that, for normal distributions, the methods in Chapter 6 are more efficient.

************************************

In this chapter, we will use macros a lot. A macro is a series of Minitab commands stored in a file ending in .MTB. This series of commands should finish with "end". To create a macro, use a text editor. To run a macro, say 1000 times, we do:

MTB > Execute 'C:\MTBSEW\ISTAT\macro-name.MTB' 1000.
Executing from file: C:\MTBSEW\ISTAT\macro-name.MTB

Here, "macro-name.MTB" is just an arbitrary name. In this case, we ran the macro 1000 times. We can also run a macro using the Windows menu: File > Other Files > Run an Exec. There are three types of macros: exec files, global macros, and local macros. We will only use exec files. A macro can call another macro.

********************************

We will use the command "stack". This command stacks blocks of columns and constants on top of each other. The last column specified in the STACK command is the target column. For example, "stack c3 k1 c3" appends the value of the constant k1 to the bottom of column c3 and stores the result back in c3; this is how the macros below accumulate the simulated values.

********Example 1*************

We want to test the proportion p of defective plates produced by a manufacturer, that is, we want to test Ho: p <= .1 versus H1: p > .1. We plan to check 50 plates, and we want to know how many defective plates we must find in order to reject the null hypothesis. The standard method is to use the .95 quantile of the binomial distribution with parameters n=50 and p=0.10:

MTB > invcdf .95;
SUBC> bino 50 .1.
     K  P( X LESS OR =  K)       K  P( X LESS OR =  K)
     8        0.9421             9        0.9755

So, we will reject Ho if X = number of defective plates in the sample is 9 or bigger.

Next, we will see another method to do this. First, we create an empirical reference distribution:

MTB > set c1
DATA> 90(0) 10(1)
DATA> end

We used 100 values, but we could have used any other large number. Then, we create the following macro:

********file 7-1A.MTB******************************
sample 50 c1 c2;
replace.
let k1=mean(c2)
stack c3 k1 c3
end
*********************************************

Next, we run the macro:

MTB > Execute 'C:\MTBSEW\ISTAT\7-1A.MTB' 1000.
Executing from file: C:\MTBSEW\ISTAT\7-1A.MTB

We get the distribution of the simulated proportions in c3. We can find the 95% quantile of this distribution, i.e. X(.95(n+1)) = X(950.95) = .05 X(950) + .95 X(951).
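
In general, to find the q quantile of the m simulated values, we sort them and look at position q(m+1). If q(m+1) = k + r, where k is an integer and 0 <= r < 1, we interpolate between the k-th and (k+1)-th sorted values: the quantile is (1-r) X(k) + r X(k+1). Here m = 1000 and q = .95, so q(m+1) = 950.95, k = 950 and r = .95, which gives .05 X(950) + .95 X(951).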

We do:

MTB > sort c3 c4
MTB > let k1=(.05*c4(950))+(.95*c4(951))
MTB > print k1
K1      0.160000

We get that the 95% quantile of the empirical reference distribution is .16. We reject the null hypothesis if the proportion of defective plates found in a sample of n=50 is bigger than .16. This means that we reject Ho if the number of defective plates is bigger than 50(0.16)=8, i.e., if we get 9 or more defective plates. This is an illustrative example: since we know that the distribution of X is binomial and its quantiles can be found exactly, it is better to use the quantiles of the binomial distribution.

***********Example 2*****************************

Similarly, we can handle the almpin case. We are interested in testing Ho: mu >= 60.1 versus H1: mu < 60.1. So, we reject Ho if x-bar is small enough.

MTB > Retrieve 'C:\ISTAT\ALMPIN.MTW'.
Retrieving worksheet from file: C:\ISTAT\ALMPIN.MTW

Since x-bar = 60.028, we recenter the data so that the reference distribution has mean 60.1, the boundary value of the null hypothesis:

MTB > let c7=c7-60.028+60.1

We run the following exec file 1000 times:

*********************************************
sample 70 c7 c8;
replace.
let k1=mean(c8)
stack c9 k1 c9
end
********************************************

MTB > Execute 'C:\ISTAT\7-1B.MTB' 1000.
Executing from file: C:\MTBSEW\ISTAT\7-1B.MTB

Then, we find the 1% quantile, which is X(.01(n+1)) = X(10.01):

MTB > sort c9 c10
MTB > let k1=(.99*c10(10))+(.01*c10(11))
MTB > print k1
K1      60.0864

We reject the null hypothesis if x-bar < 60.0864. Since x-bar = 60.028, we reject the null hypothesis. The usual statistical method rejects Ho for x-bar < 60.1 - s z(0.01)/sqrt(n):

MTB > invcdf .01 k2
MTB > let k3=stdev(c7)
MTB > let k4=60.1+k2*k3/sqrt(70)
MTB > print k4
K4      60.0866

We get almost the same number. The classical method (normal approximation) relies on the central limit theorem. This approximation can be poor for a distribution that is far from normal, in which case it makes more sense to use the simulation (macro) method.

********Example 3******************

We can simulate confidence intervals. Consider the macro CONFINT.MTB, which is in C:\ISTAT\CHA7\CONFINT.MTB:

*****confint.mtb**********
sample k1 c1 c2;
replace.
let k2=mean(c2)-2*stan(c2)/sqrt(k1)
let k3=mean(c2)+2*stan(c2)/sqrt(k1)
stack c3 k2 c3
stack c4 k3 c4
end
***************

We can find 100 confidence intervals for the mean of Tur_Diam, using samples of size n=20, by doing:

MTB > let c1=c3
MTB > erase c2-c10
MTB > let k1=20
MTB > Execute 'C:\ISTAT\CHA7\CONFINT.MTB' 100.

The 100 confidence intervals are in c3 and c4: the lower endpoints are in c3 and the upper endpoints are in c4. We can find how many confidence intervals cover the sample mean:

MTB > mean(c1)
   MEAN    =   35.514
MTB > code (0:35.514)1 (35.514:100)0 c3 c5.
MTB > code (0:35.514)0 (35.514:100)1 c4 c6.
MTB > let c7=c5*c6
MTB > mean c7
   MEAN    =   0.90000

Here c5 indicates whether the lower endpoint is at most 35.514, c6 indicates whether the upper endpoint is at least 35.514, and c7 = c5*c6 equals 1 exactly when the interval covers 35.514. We get that 90 of the 100 confidence intervals cover mean(c1) = 35.514. Since we are using simulations, you may get something else.
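
With only 100 intervals, the estimated coverage of 0.90 is itself subject to simulation error; running the macro more times gives a more stable estimate. The following is a minimal sketch, not part of the original session; it assumes that c1 (the Tur_Diam data) and k1=20 are still set as above, and it simply repeats the same commands with 1000 runs:

MTB > erase c3-c7
MTB > Execute 'C:\ISTAT\CHA7\CONFINT.MTB' 1000.
MTB > code (0:35.514)1 (35.514:100)0 c3 c5.
MTB > code (0:35.514)0 (35.514:100)1 c4 c6.
MTB > let c7=c5*c6
MTB > mean c7

The mean of c7 in the last line is again the proportion of intervals covering 35.514, now based on 1000 simulated intervals.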

Comments to: Miguel A. Arcones