Section 3.4. Robust correlation and regression.

We work with the aluminum data in ALMPIN.mtw:

    MTB > Retrieve 'C:\MTBSEW\INDUST~1\MTW\ALMPIN.MTW'.

We find the robust correlation as in Example 3.8 of the textbook. We use only the 4th and 7th boards of the data, taking tet-dev (c4) as the x-variable and y-dev (c3) as the y-variable. First, we select those boards:

    MTB > Retrieve 'C:\MTBSEW\INDUST~1\MTW\PLACE.MTW'.
    MTB > copy c3 c13;
    SUBC> use c1=4,7.
    MTB > copy c4 c14;
    SUBC> use c1=4,7.

Boxplots of c13 and c14 show that these data have outliers, so it makes sense to use robust methods. We are going to find the robust correlation of c13 and c14, using the method on pages 63-66 of the book. First, we find the trimmed sample mean and trimmed sample standard deviation of each variable, dropping the two smallest and the two largest of the 32 observations:

    MTB > sort c13 c23
    MTB > sort c14 c24
    MTB > delete 1,2 31,32 c23 c24
    MTB > desc c24

                 N      MEAN    MEDIAN    TRMEAN     STDEV    SEMEAN
    C24         28   0.01612   0.00063   0.01453   0.02366   0.00447

               MIN       MAX        Q1        Q3
    C24   -0.00332   0.07677  -0.00144   0.03400

    MTB > desc c23

                 N      MEAN    MEDIAN    TRMEAN     STDEV    SEMEAN
    C23         28  -0.00209  -0.00210  -0.00210   0.00050   0.00010

               MIN       MAX        Q1        Q3
    C23   -0.00281  -0.00113  -0.00257  -0.00176

Next, we standardize c14 and c13 with these trimmed statistics and form the sum Z_1 (c31) and the difference Z_2 (c32) of the standardized values:

    MTB > let c31=((c14-mean(c24))/stde(c24))+((c13-mean(c23))/stde(c23))
    MTB > let c32=((c14-mean(c24))/stde(c24))-((c13-mean(c23))/stde(c23))

Printing c14, c13, c31 and c32 gives Table 3.12 on page 85 of the book:

    MTB > print c14 c13 c31 c32

Next, we find the robust (trimmed) variances of the samples Z_1 and Z_2, again dropping the two smallest and the two largest values, and from them the robust correlation:

    MTB > sort c31 c35
    MTB > sort c32 c36
    MTB > delete 1,2 31,32 c35 c36
    MTB > let k1=stdev(c35)**2
    MTB > let k2=stdev(c36)**2
    MTB > let k3=(k1-k2)/(k1+k2)
    MTB > print k1 k2 k3

    K1       2.04002
    K2       1.70318
    K3       0.0899896

    MTB > corre c13 c14

    Correlation of C13 and C14 = 0.366

The robust correlation is 0.0899896, which is much smaller than the Pearson correlation 0.366.
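The same steps can be sketched outside Minitab. The Python below is only an illustration of the procedure in the session above, not the textbook's own code: the function names trim and robust_correlation, the default of two trimmed values per tail, and the demo data are all assumptions made for this example (the demo numbers are invented and are not the PLACE data).

```python
import statistics


def trim(values, k):
    """Sort the values and drop the k smallest and the k largest."""
    ordered = sorted(values)
    return ordered[k:len(ordered) - k]


def robust_correlation(x, y, k=2):
    """Robust correlation from trimmed variances of Z_1 and Z_2.

    k is the number of observations trimmed from each tail
    (an assumed default that mirrors 'delete 1,2 31,32').
    """
    # Trimmed mean and trimmed sample standard deviation (divisor n-1).
    tx, ty = trim(x, k), trim(y, k)
    mx, sx = statistics.mean(tx), statistics.stdev(tx)
    my, sy = statistics.mean(ty), statistics.stdev(ty)

    # Standardize ALL observations with the trimmed statistics.
    u = [(a - mx) / sx for a in x]
    v = [(b - my) / sy for b in y]

    # Z_1 is the sum and Z_2 the difference of the standardized values.
    z1 = [a + b for a, b in zip(u, v)]
    z2 = [a - b for a, b in zip(u, v)]

    # Trimmed variances of Z_1 and Z_2, then the ratio (k1-k2)/(k1+k2).
    v1 = statistics.variance(trim(z1, k))
    v2 = statistics.variance(trim(z2, k))
    return (v1 - v2) / (v1 + v2)


# Invented demo data with one gross outlier in y (not the PLACE data).
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
y_demo = [2.1, 1.8, 3.5, 3.9, 5.2, 5.8, 7.1, 30.0, 8.7, 10.2, 10.9, 12.3]
print(robust_correlation(x_demo, y_demo))
```

For exactly linear data the difference Z_2 is zero up to rounding, so the ratio equals 1 (or -1 for a decreasing line), which is a quick sanity check on the formula.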
We can also find the Spearman rank-order correlation:

    MTB > rank c13 c23
    MTB > rank c14 c24
    MTB > corre c24 c23

    Correlation of C24 and C23 = 0.148

****************************************************

Next, we find the robust regression line for the solar cell data (socell.dat). We want the regression line of c2 on c1. Since there are 16 observations, we order the data by c1 and use the first 5 observations to find the medians of the lower group and the last 5 observations to find the medians of the upper group:

    MTB > sort c1 c8
    MTB > sort c2 c9;
    SUBC> by c1.
    MTB > Copy C8 c10;
    SUBC> Use 1:5.
    MTB > Copy C9 c11;
    SUBC> Use 1:5.
    MTB > Copy C8 c14;
    SUBC> Use 12:16.
    MTB > Copy C9 c15;
    SUBC> Use 12:16.

The slope k7 is the ratio of the difference of the group medians of y to the difference of the group medians of x, and the intercept k8 makes the line pass through the overall medians (k1, k2):

    MTB > let k1=medi(c8)
    MTB > let k2=medi(c9)
    MTB > let k3=medi(c10)
    MTB > let k4=medi(c11)
    MTB > let k5=medi(c14)
    MTB > let k6=medi(c15)
    MTB > let k7=(k6-k4)/(k5-k3)
    MTB > let k8=k2-(k7*k1)
    MTB > print k8 k7

    K8       0.595205
    K7       0.904110

The resistant regression line is

    Time_2 = 0.595205 + 0.904110 Time_1

By comparison, the usual (least-squares) regression line is:

    MTB > regr c2 1 c1

    The regression equation is
    Time_2 = 0.536 + 0.929 Time_1
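The median-based line can also be sketched in Python. This is a minimal illustration of the steps in the session above, under stated assumptions: the function name resistant_line and its parameter m (the size of each end group, 5 in the session) are made up for this example, and the demo data are invented rather than taken from socell.dat.

```python
import statistics


def resistant_line(x, y, m):
    """Return (intercept, slope) of the median-based resistant line.

    m is the number of observations in the lower and in the upper group
    (an assumed parameter; the session above uses 5 out of 16).
    """
    # Order the pairs by x, like 'sort c1 c8' and 'sort c2 c9; by c1.'
    pairs = sorted(zip(x, y), key=lambda p: p[0])
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]

    # Medians of the lower group (first m) and the upper group (last m).
    x_low, y_low = statistics.median(xs[:m]), statistics.median(ys[:m])
    x_up, y_up = statistics.median(xs[-m:]), statistics.median(ys[-m:])

    # Slope from the group medians (k7), intercept through the
    # overall medians (k8 = k2 - k7*k1).
    slope = (y_up - y_low) / (x_up - x_low)
    intercept = statistics.median(ys) - slope * statistics.median(xs)
    return intercept, slope


# Invented demo data: an exact line y = 3 + 0.5 x on 16 points.
x_demo = list(range(1, 17))
y_demo = [3 + 0.5 * a for a in x_demo]
print(resistant_line(x_demo, y_demo, 5))
```

Because the slope depends only on the medians of the two end groups, a single wild value in the middle of the data leaves it unchanged, which is the sense in which the line is resistant.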

Comments to: Miguel A. Arcones