y=c(48, 51, 39, 24, 24, 27, 12, 27, 24)
x1=c(1, 1, 1, 0, 0, 1, 0, 1, 1)
x2=c(0, 1, 0, 1, 1, 1, 1, 0, 1)
# r labels the three distinct (x1, x2) combinations: 1 = (1,0), 2 = (0,1), 4 = (1,1)
r= c(1, 4, 1, 2, 2, 4, 2, 1, 4)
r=factor(r)
t=1:9
m2=lm(y~x1+x2-1)
m1=lm(y~x1-1)
m3=lm(y~x1+x2+t-1)
m4=lm(y~x1+x2+x1*x2-1)
m5=lm(y~x1+x2+x1*x2+t-1)
m6=lm(y~x1+x2+x1*x2+t)
m7=lm(y~r+t-1)
m8=lm(y~r+t+I(t^2)-1)
# fitted values, used to compare the models
predict(m3)
predict(m5)
predict(m6)
predict(m7)
predict(m8)
anova(m1,m2)
# printing a model shows its coefficients
m5
m6
m7
m9=lm(y~x1+x2+r-1)
m0=lm(y~x1+r-1)
anova(m9)
summary(m9)
anova(m0)
summary(m0)
m11=lm(y~x1+x2+t+r-1)
anova(m11)
q()

2. Among the 8 models below, determine which pair of models are equivalent, in the sense that their $\hat y$ are the same; which pair of models can be simplified; and which pair of models can be simplified using the first method of the model checking test.

Answer to 2.1. m5, m6, m7 are equivalent; this can be seen from the predictions or the residuals.

> predict(m3)
        1         2         3         4         5         6         7         8
32.714055 49.463651 29.660743 12.169628 10.642973 43.357027  7.589661 22.027464
        9
38.777060
> predict(m5)
       1        2        3        4        5        6        7        8
48.46386 46.78916 41.48795 24.65060 21.16265 32.83735 14.18675 24.04819
       9
22.37349
> predict(m6)
       1        2        3        4        5        6        7        8
48.46386 46.78916 41.48795 24.65060 21.16265 32.83735 14.18675 24.04819
       9
22.37349
> predict(m7)
       1        2        3        4        5        6        7        8
48.46386 46.78916 41.48795 24.65060 21.16265 32.83735 14.18675 24.04819
       9
22.37349

Discussion: In both m5 and m7, there are 4 parameters (beta). In m6 there are 5 parameters (beta).

m5=lm(y~x1+x2+x1*x2+t-1)
m6=lm(y~x1+x2+x1*x2+t)
m7=lm(y~r+t-1)

Why are they equivalent?
Reason: the (x1, x2, x1*x2, t) matrix for m5 is

(1, 1, 1, 0, 0, 1, 0, 1, 1)   x1
(0, 1, 0, 1, 1, 1, 1, 0, 1)   x2
(0, 1, 0, 0, 0, 1, 0, 0, 1)   x1*x2
(1, 2, 3, 4, 5, 6, 7, 8, 9)   t

the (1, x1, x2, x1*x2, t) matrix for m6 is

(1, 1, 1, 1, 1, 1, 1, 1, 1)   1
(1, 1, 1, 0, 0, 1, 0, 1, 1)   x1
(0, 1, 0, 1, 1, 1, 1, 0, 1)   x2
(0, 1, 0, 0, 0, 1, 0, 0, 1)   x1*x2
(1, 2, 3, 4, 5, 6, 7, 8, 9)   t

which is equivalent to (taking row 1 minus row 2, and row 3 minus row 4)

(0, 0, 0, 1, 1, 0, 1, 0, 0)   1-2
(0, 0, 0, 1, 1, 0, 1, 0, 0)   3-4
(1, 2, 3, 4, 5, 6, 7, 8, 9)   t

Rows 1-2 and 3-4 are identical, so their difference is (0, 0, 0, 0, 0, 0, 0, 0, 0); that is, 1 = x1 + x2 - x1*x2. The intercept column of m6 is therefore a linear combination of the columns already in m5, so m6 has only 4 estimable parameters and R reports NA for the aliased coefficient. Likewise, the dummy columns of r in m7 are r1 = x1 - x1*x2, r2 = x2 - x1*x2 and r4 = x1*x2, so m7 spans the same column space as m5.

> m5
     x1       x2        t    x1:x2
 51.952   38.602   -3.488  -36.789
> m6
(Intercept)           x1           x2            t        x1:x2
     36.789       15.163        1.813       -3.488           NA
> m7
     r1       r2       r4        t
 51.952   38.602   53.765   -3.488

B.3. The goodness-of-fit test can be applied to m1 and m2, but not to m3 (or the others): $m=3$ and $p=1$ or $2$ in m1 or m2, so $m-p>0$, whereas $m=3$ and $p=3$ in m3, so $m-p=0$.

m3=lm(y~x1+x2+t-1)

> m9=lm(y~x1+x2+r-1)
> m0=lm(y~x1+r-1)
> anova(m9)
          Df Sum Sq Mean Sq F value   Pr(>F)
x1         1   7776    7776 61.7143 0.000225 ***
x2         1    648     648  5.1429 0.063861 .
r          1    576     576  4.5714 0.076351 .
Residuals  6    756     126
> anova(m0)
          Df Sum Sq Mean Sq F value   Pr(>F)
x1         1   7776    7776 61.7143 0.000225 ***
r          2   1224     612  4.8571 0.055663 .
Residuals  6    756     126
> anova(m1,m2)
Model 1: y ~ x1 - 1
Model 2: y ~ x1 + x2 - 1
  Res.Df  RSS Df Sum of Sq      F Pr(>F)
1      8 1980
2      7 1332  1       648 3.4054 0.1075
> m11=lm(y~x1+x2+t+r-1)
> anova(m11)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq  F value    Pr(>F)
x1         1 7776.0  7776.0 469.4218 3.886e-06 ***
x2         1  648.0   648.0  39.1185  0.001532 **
t          1  160.3   160.3   9.6769  0.026527 *
r          1 1088.9  1088.9  65.7333  0.000463 ***
Residuals  5   82.8    16.6
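
The equivalence of m5, m6 and m7 discussed in the answer to 2.1 can also be checked numerically rather than by eye. The following is a minimal sketch, not part of the original script: it refits the three models from the same data, compares their fitted values, and computes the rank of each design matrix. The names same_56, same_57, rank5, rank6, rank7 and dep are introduced here for illustration only.

```r
# Data copied from the script at the top of these notes.
y  = c(48, 51, 39, 24, 24, 27, 12, 27, 24)
x1 = c(1, 1, 1, 0, 0, 1, 0, 1, 1)
x2 = c(0, 1, 0, 1, 1, 1, 1, 0, 1)
r  = factor(c(1, 4, 1, 2, 2, 4, 2, 1, 4))
t  = 1:9

m5 = lm(y ~ x1 + x2 + x1*x2 + t - 1)
m6 = lm(y ~ x1 + x2 + x1*x2 + t)
m7 = lm(y ~ r + t - 1)

# Equivalent models give the same fitted values (up to rounding error).
same_56 = isTRUE(all.equal(fitted(m5), fitted(m6)))
same_57 = isTRUE(all.equal(fitted(m5), fitted(m7)))

# Rank of each design matrix = number of estimable parameters.
# m6 has 5 columns, but one of them is redundant.
rank5 = qr(model.matrix(m5))$rank
rank6 = qr(model.matrix(m6))$rank
rank7 = qr(model.matrix(m7))$rank

# The linear dependence behind the NA in m6: 1 - x1 equals x2 - x1*x2.
dep = all(1 - x1 == x2 - x1 * x2)

print(c(same_56 = same_56, same_57 = same_57, dep = dep))
print(c(rank5 = rank5, rank6 = rank6, rank7 = rank7))
```

If the three ranks agree and the fitted values match, the models span the same column space, which is the sense of "equivalent" used in question 2.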