Section 6.3.


Hypothesis testing has some similarities with the judicial system.

In the judicial system, we assume that the defendant is innocent
unless he or she is proven guilty.

In a hypothesis testing problem, we assume that the null hypothesis 
is true unless the data strongly support the alternative hypothesis.

In the judicial system, to conclude that the defendant is guilty, we 
need to find characteristics of the perpetrator that match those of 
the defendant and that hold for only a very small proportion of the 
population. If we know the weight, blood type, race and other 
characteristics of the perpetrator, these characteristics match the 
defendant, and they hold for a very small proportion of the people 
(let us say one in a million), then we conclude that the defendant is 
guilty. It might happen by chance that an innocent defendant matches 
the characteristics of the perpetrator, but this is extremely 
unlikely. On the other hand, if the characteristics that the suspect 
and the perpetrator share hold for a not so small proportion of the 
people (let us say 10 %), then we do not have enough evidence to 
conclude that the suspect is the perpetrator: it is quite possible 
that the perpetrator and the suspect have those characteristics in 
common by chance. So, we reach a conclusion according to how likely 
it is that the matches between defendant and perpetrator occur by 
chance, assuming that the defendant is an innocent person from the 
general population.

In hypothesis testing, we reject the null hypothesis if the value 
of the test statistic is very unlikely to appear assuming that the 
null hypothesis is true. It might happen that by chance we get 
that value of the statistic under the null hypothesis, 
but this is extremely unlikely. On the other hand, if the value of 
the test statistic is within the range of common values of the test 
statistic, then we do not have enough evidence to reject Ho. So, we 
either reject or do not reject Ho according to how likely the observed 
value of the test statistic is when Ho is true.

The significance level alpha is the largest probability of a type I error
that we allow. It is the largest probability of rejecting Ho when Ho is true. 
The significance level determines how much evidence we require to reject the
null hypothesis. The smaller alpha is, the more evidence we require to 
reject Ho. If alpha=0.05, we are asking for evidence in such a way that, 
when Ho is true, we make a mistake 5 % of the time, because we get a
value of the statistic among the 5 % most extreme values of the statistic.
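
The meaning of alpha=0.05 can be checked by simulation. The following 
Python sketch is not part of the original minitab material; the function 
name type_one_error_rate, the seed and the choice of a two sided z statistic 
are assumptions made for illustration. It draws the test statistic many 
times with Ho true and counts how often a level .05 test rejects.

```python
import random
from statistics import NormalDist


def type_one_error_rate(alpha=0.05, repetitions=20000, seed=1):
    """Estimate how often a two sided z-test rejects Ho when Ho is true."""
    rng = random.Random(seed)
    # Reject when |z| exceeds the upper alpha/2 point of the standard normal.
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(repetitions):
        z = rng.gauss(0.0, 1.0)  # a z statistic drawn assuming Ho is true
        if abs(z) > cutoff:
            rejections += 1
    return rejections / repetitions


rate = type_one_error_rate()
print(f"estimated type I error rate: {rate:.3f}")
```

The estimated rate should be close to alpha. Choosing a smaller alpha 
moves the cutoff further out, so that more extreme data are needed to 
reject Ho.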

The p-value of the test is the smallest significance level 
for which we reject the null hypothesis. The p-value is the probability,
assuming that Ho is true, that we get a value of the test statistic as 
extreme or more extreme than the value from the data. The smaller the 
p-value is, the more evidence we have to reject the null hypothesis. 

For example, we test whether a coin is fair using 1000
throws of the coin. Suppose that we throw the coin 1000 times and 
we get 560 heads. We test Ho: p=1/2 versus Ha: p /= 1/2. Using minitab,
we can simulate 1000 throws of a coin as many times as we want. Let us
simulate 2000 repetitions of 1000 throws of a coin:
MTB > random 2000 c1;
SUBC> bino 1000 .5.
MTB > desc c1

                N     MEAN   MEDIAN   TRMEAN    STDEV   SEMEAN
C1           2000   500.25   501.00   500.28    15.83     0.35

              MIN      MAX       Q1       Q3
C1         443.00   554.00   489.00   511.00

MTB > histo c1


Looking at the simulations, we see that 560 is a value very far from the
values that we obtain when we throw a fair coin 1000 times. We almost never 
get a value as large as 560: the largest of the 2000 simulated values is 554. 
The p-value of the test is essentially zero (well below 0.001) and we 
have very strong evidence to reject Ho.
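
The same experiment can be sketched outside minitab. The Python code 
below is an illustration only: the function names, the seed and the use 
of random bits to play the role of coin throws are assumptions, not part 
of the original session. It repeats the simulation of 2000 sets of 1000 
throws and also computes the exact two sided p-value for 560 heads from 
the Binomial(1000, 1/2) distribution.

```python
import random
from math import comb

N_THROWS = 1000
OBSERVED = 560


def simulate_head_counts(repetitions=2000, seed=0):
    """Simulate the number of heads in N_THROWS fair throws, many times."""
    rng = random.Random(seed)
    # The number of 1 bits among N_THROWS random bits is Binomial(N_THROWS, 1/2).
    return [bin(rng.getrandbits(N_THROWS)).count("1") for _ in range(repetitions)]


def exact_two_sided_p_value(observed, n=N_THROWS):
    """P(|X - n/2| >= |observed - n/2|) for X ~ Binomial(n, 1/2)."""
    distance = abs(observed - n / 2)
    # Count the outcomes at least as far from n/2 as the observed value.
    favourable = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= distance)
    return favourable / 2 ** n


counts = simulate_head_counts()
as_extreme = sum(1 for c in counts if abs(c - N_THROWS / 2) >= abs(OBSERVED - N_THROWS / 2))
p_exact = exact_two_sided_p_value(OBSERVED)

print("simulated mean:", sum(counts) / len(counts))
print("min and max:", min(counts), max(counts))
print("simulations as extreme as 560:", as_extreme, "out of", len(counts))
print("exact two sided p-value:", p_exact)
```

The proportion of simulations as extreme as the observed value is a 
simulation estimate of the p-value; with only 2000 repetitions it is 
a rough estimate, which is why the exact calculation is also shown.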

If we do a test at the level .05, we reject Ho when the number 
of heads X is either bigger than x(1-alpha/2) or smaller than x(alpha/2),
where 
P(Bino(1000,.5) =< x(.025)) = .025
P(Bino(1000,.5) =< x(.975)) = .975.

By the central limit theorem,
x(.025)=np-z(alpha/2)*sqrt(np(1-p))=500-1.96*15.8114=469.010
and
x(.975)=np+z(alpha/2)*sqrt(np(1-p))=500+1.96*15.8114=530.990
where z(alpha/2)=1.96 is the upper alpha/2 point of the standard normal
distribution. So, we reject Ho if either X =< 469 or X >= 531.

Approximately 95 % of our simulations fall between
470 and 530. The value 560 is outside the interval of reasonable outcomes. 
So, we reject Ho.
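
The critical values from the central limit theorem can be recomputed 
with a few lines of Python. This is a sketch under the same normal 
approximation used above; the variable names and the helper reject are 
chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

n, p0, alpha = 1000, 0.5, 0.05

mean = n * p0                             # np = 500
sd = sqrt(n * p0 * (1 - p0))              # sqrt(np(1-p)), about 15.8114
z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point, about 1.96

lower = mean - z * sd                     # x(.025)
upper = mean + z * sd                     # x(.975)


def reject(x):
    """Two sided test at level alpha: reject Ho when x is in either tail."""
    return x <= lower or x >= upper


print(f"x(.025) = {lower:.3f}, x(.975) = {upper:.3f}")
print("reject Ho for 560 heads:", reject(560))
```

Since the number of heads is an integer, comparing against these 
values is the same as rejecting when X =< 469 or X >= 531.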
***********************************************************
We can use minitab to test hypotheses about the mean of a normal 
distribution, both when the variance is known and when it is not.

If we know the variance, we do a z-test.

For example, in the car data, assuming that sigma=3, 
we can test Ho: mu=35 versus Ha: mu>35:

MTB > Retrieve 'C:\ISTAT\CAR.MTW'.
Retrieving worksheet from file: C:\ISTAT\CAR.MTW
MTB > ztest 35 3 c3;
SUBC> alte 1.
TEST OF MU = 35.000 VS MU G.T. 35.000
The assumed sigma = 3.00

                N     MEAN    STDEV  SE MEAN        Z  P VALUE
Tur_Diam      109   35.514    3.321    0.287     1.79    0.037

We conclude that there is enough evidence at the level .05 to reject 
Ho. The evidence is mild, not very strong. 

In general, to do a z-test we type:
MTB > ztest mu_o sigma variable

From the output, n=109, x-bar=35.514, s=3.321, sigma/sqrt(n)=0.287, 
z=sqrt(n)*(x-bar - mu_o)/sigma=1.79, and the p-value=0.037.

By default, minitab does a two sided test. If we do not specify mu_o, 
minitab takes mu_o=0 as default. To do one sided tests, we use the 
subcommand alternative (abbreviated alte above). 
If ALTERNATIVE = -1, then Ha: mu < mu_o is used. 
If ALTERNATIVE = 1, then Ha: mu > mu_o is used. 

If we do not know the variance, we do a t-test:

MTB > ttest 35 c3;
SUBC> alte 1.
TEST OF MU = 35.000 VS MU G.T. 35.000

                N     MEAN    STDEV  SE MEAN        T  P VALUE
Tur_Diam      109   35.514    3.321    0.318     1.62    0.055

We conclude that there is not enough evidence at the level .05 to 
reject Ho: mu could be 35. From the output, n=109, x-bar=35.514, 
s=3.321, s/sqrt(n)=0.318, and t=(x-bar - mu_o)/(s/sqrt(n))=1.62.

Note that with the z-test we reject the null hypothesis and with the 
t-test we do not. This happens because sigma=3 is smaller than 
s=3.321. The smaller the standard deviation used in the test is, the 
more likely we are to reject the null hypothesis.
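
Both tests can be reproduced from the summary statistics in the 
minitab output. The Python sketch below is an illustration only and is 
not how minitab does the calculation: the helper names t_pdf and 
t_upper_tail are made up here, and the tail probability of the t 
distribution is approximated by numerical integration (Simpson's rule) 
because the Python standard library has no t distribution.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import NormalDist

# Summary statistics taken from the minitab output for Tur_Diam.
n, xbar, s = 109, 35.514, 3.321
mu0, sigma = 35.0, 3.0

# z-test: sigma is assumed known.
z = (xbar - mu0) / (sigma / sqrt(n))
p_z = 1 - NormalDist().cdf(z)          # one sided, Ha: mu > mu0

# t-test: sigma is replaced by the sample standard deviation s.
t = (xbar - mu0) / (s / sqrt(n))
df = n - 1


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t_value, df, steps=2000):
    """Approximate P(T > t_value) as 1/2 minus the integral from 0 to t_value."""
    h = t_value / steps                # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


p_t = t_upper_tail(t, df)              # one sided, Ha: mu > mu0

print(f"z = {z:.2f}, one sided p-value = {p_z:.3f}")
print(f"t = {t:.2f}, one sided p-value = {p_t:.3f}")
```

The only difference between the two statistics is the standard 
deviation in the denominator, which is why the larger value s=3.321 
gives a smaller statistic and a larger p-value.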

Comments to: Miguel A. Arcones
