Xuhua Xia
Fitting Several Regression Lines
Many applications of statistical analysis involve a continuous dependent variable (DV) together with both continuous and categorical independent variables (IVs).
– If the relationship between the DV and the continuous IV is linear and the slope is the same in all groups: ANCOVA.
– If the slopes differ among groups: full model.
An illustrative data set will make this clear.
Fitting Several Regression Lines
Muscle strength (MS) depends on the diameter of the muscle fiber (D) and the type of muscle (TM). Identify the DV and the IVs. How do we incorporate the qualitative variable into the model? With dummy variables.

TM  D  MS
A   1  11.5
A   2  13.8
A   3  14.4
A   4  16.8
A   5  18.7
B   1  10.8
B   2  12.3
B   3  13.7
B   4  14.2
B   5  16.6
C   1  13.1
C   2  16.2
C   3  19.0
C   4  22.9
C   5  26.5
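As a minimal sketch of the dummy-variable idea, the data listed above can be typed into R and the two 0/1 indicators built by hand. Typing the data in directly (rather than reading a file) is an assumption made so that the snippet is self-contained, and the column names DUMB and DUMC simply follow the naming used later in the slides.

```r
# Data typed in from the slide: muscle type (TM), fiber diameter (D),
# muscle strength (MS)
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)

# Hand-made 0/1 dummy variables; type A is the baseline and gets no dummy
md$DUMB <- as.numeric(md$TM == "B")
md$DUMC <- as.numeric(md$TM == "C")

print(md)
```

A row with DUMB = 0 and DUMC = 0 therefore belongs to muscle type A.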
Two Scenarios
– Same intercept, different slopes: full model
– Different intercepts, same slope: ANCOVA
Two Scenarios
Same intercept, different slopes (multiplicative effect):
$Y_1 = a + b_1 X$
$Y_2 = a + b_2 X$
$Y_1 - Y_2 = (b_1 - b_2) X$
Different intercepts, same slope (additive effect):
$Y_1 = a_1 + b X$
$Y_2 = a_2 + b X$
$Y_1 - Y_2 = a_1 - a_2$
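The two scenarios can be checked numerically with a small sketch. All coefficient values below are made up purely for illustration and do not come from the muscle data.

```r
X <- 0:4

# Scenario 1: same intercept, different slopes (hypothetical values)
a  <- 2
b1 <- 1
b2 <- 3
diff_slopes <- (a + b1 * X) - (a + b2 * X)        # equals (b1 - b2) * X

# Scenario 2: different intercepts, same slope (hypothetical values)
a1 <- 2
a2 <- 5
b  <- 1
diff_intercepts <- (a1 + b * X) - (a2 + b * X)    # equals a1 - a2

print(rbind(X, diff_slopes, diff_intercepts))
```

In the first scenario the gap between the two lines grows in proportion to X; in the second it is the same at every X.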
Objectives
– Obtain regression equations relating MS to D for each TM.
– Compare the mean MS for the three TMs at a given level of D.
– Is it meaningful to compare the mean MS for the three TMs without specifying the level of D?
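One way to make these objectives concrete, sketched under the assumption that the data are typed in from the slide rather than read from a file, is to fit a separate line for each TM and evaluate each line at chosen values of D. The helper name at_D is hypothetical.

```r
# Data typed in from the slide (TM = muscle type, D = diameter, MS = strength)
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)

# One simple regression per muscle type; the result is a 2 x 3 matrix with
# rows "(Intercept)" and "D" and columns A, B, C
eqs <- sapply(split(md, md$TM), function(g) coef(lm(MS ~ D, data = g)))
print(round(eqs, 2))

# Hypothetical helper: mean MS predicted by each line at diameter d
at_D <- function(d) eqs["(Intercept)", ] + eqs["D", ] * d

print(round(at_D(1), 2))
print(round(at_D(3), 2))
print(round(at_D(5), 2))
```

With these data the three lines sit much closer together at D = 1 than at D = 5, which suggests that a comparison of mean MS among the TMs only has a clear meaning once the level of D is stated.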
Explaining the R functions
Every 'factor' variable (TM in our case) used in lm model fitting creates k-1 dummy variables, where k is the number of factor levels:
DUMA: not created (A is the reference level)
DUMB = 1 if TM = B, 0 otherwise
DUMC = 1 if TM = C, 0 otherwise
$MS = \beta_0 + \beta_1 DUMB + \beta_2 DUMC + \beta_3 D + \beta_4 DUMB \cdot D + \beta_5 DUMC \cdot D + \varepsilon$
summary(fit) prints the estimates of the model coefficients.
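The dummy variables that R builds can be inspected with model.matrix. This is a sketch that assumes the data are typed in from the slide and that R's default treatment contrasts are in effect; R names the columns TMB and TMC rather than DUMB and DUMC.

```r
# Data typed in from the slide (TM = muscle type, D = diameter, MS = strength)
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)

# Design matrix for the full model MS ~ D * TM
X <- model.matrix(~ D * TM, data = md)
print(colnames(X))

# Rows 1, 6 and 11 are the first observation of types A, B and C
print(X[c(1, 6, 11), ])
```

The interaction columns are simply the dummy variable multiplied by D, so they are zero for every row that does not belong to the corresponding muscle type.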
R functions
    # Read the data (columns TM, D, MS) and store TM as a factor
    md <- read.table("DiffSlopeMuscle.txt", header=TRUE, stringsAsFactors=TRUE)
    attach(md)

    # Scatter plot of MS against D, one color per muscle type
    minX <- min(D)
    maxX <- max(D)
    minY <- min(MS)
    maxY <- max(MS)
    plot(D[TM=="A"], MS[TM=="A"], xlab="D", ylab="MS",
         xlim=c(minX,maxX), ylim=c(minY,maxY), pch=16)
    points(D[TM=="B"], MS[TM=="B"], col='red', pch=16)
    points(D[TM=="C"], MS[TM=="C"], col='blue', pch=16)

    # Does D differ among the three muscle types?
    fitANOVA <- aov(D~TM); anova(fitANOVA)
    # No significant difference in D, so the three muscle types are
    # comparable in fiber diameter. Does MS differ among muscle types
    # when D is ignored?
    fitANOVA <- aov(MS~TM); anova(fitANOVA)

    # Check the plot for slope heterogeneity
    # Explicit test of slope heterogeneity
    fit <- lm(MS~D*TM)
    anova(fit)
    summary(fit)

    # Check the D:TM term: if it is not significant, then do ANCOVA
    fitANCOVA <- lm(MS~D+TM)
    anova(fitANCOVA)
R Output
    > anova(fit)
    Analysis of Variance Table

    Response: MS
              Df  Sum Sq Mean Sq F value    Pr(>F)
    D          1 138.245 138.245 704.534 7.392e-10
    TM         2  98.001  49.001 249.720 1.306e-08
    D:TM       2  22.481  11.240  57.284 7.595e-06
    Residuals  9   1.766   0.196

    > summary(fit)
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   9.8200     0.4646  21.137 5.57e-09
    D             1.7400     0.1401  12.422 5.73e-07
    TMB          -0.3500     0.6570  -0.533   0.6071
    TMC          -0.3300     0.6570  -0.502   0.6275
    D:TMB        -0.3900     0.1981  -1.969   0.0805
    D:TMC         1.6100     0.1981   8.127 1.95e-05

The D:TM interaction is highly significant.
MS = 9.82 + 1.74*D - 0.35*B - 0.33*C - 0.39*D*B + 1.61*D*C, where B and C are the 0/1 dummy variables for TM = B and TM = C.
A: MS = 9.82 + 1.74*D
B: MS = 9.82 + 1.74*D - 0.35 - 0.39*D = 9.47 + 1.35*D
C: MS = 9.82 + 1.74*D - 0.33 + 1.61*D = 9.49 + 3.35*D
It might help to show regression with dummy variables in Excel.
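The per-group equations can also be assembled directly from the fitted coefficients rather than by hand. The sketch below assumes the data are typed in from the slides; the object names intercepts and slopes are illustrative.

```r
# Data typed in from the slides (TM = muscle type, D = diameter, MS = strength)
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)

fit <- lm(MS ~ D * TM, data = md)
b <- coef(fit)

# Type A is the baseline; B and C add their dummy and interaction coefficients
intercepts <- b[["(Intercept)"]] + c(A = 0, B = b[["TMB"]],   C = b[["TMC"]])
slopes     <- b[["D"]]           + c(A = 0, B = b[["D:TMB"]], C = b[["D:TMC"]])

print(round(rbind(intercept = intercepts, slope = slopes), 2))
```

The three intercepts are nearly identical while the slopes differ, which matches the "same intercept, different slopes" scenario.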
Type I and Type III SS
Type I SS and F-test:
    > anova(fit)
    Analysis of Variance Table

    Response: MS
              Df  Sum Sq Mean Sq F value    Pr(>F)
    D          1 138.245 138.245 704.534 7.392e-10 ***
    TM         2  98.001  49.001 249.720 1.306e-08 ***
    D:TM       2  22.481  11.240  57.284 7.595e-06 ***
    Residuals  9   1.766   0.196

Type III SS and F-test:
    > drop1(fit,~.,test="F")
    Single term deletions

    Model:
    MS ~ D * TM
           Df Sum of Sq    RSS     AIC F value    Pr(>F)
    <none>               1.766 -20.090
    D       1   30.2760 32.042  21.385 154.294 5.735e-07 ***
    TM      2    0.0702  1.836 -23.505   0.179     0.839
    D:TM    2   22.4807 24.247  15.203  57.284 7.595e-06 ***
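The drop1 rows can be cross-checked, as a sketch, by comparing the full model with explicitly reduced models. The data are typed in from the slides, and the object names (full, no_interaction, same_intercept) are illustrative rather than taken from the original script.

```r
# Data typed in from the slides (TM = muscle type, D = diameter, MS = strength)
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)

full <- lm(MS ~ D * TM, data = md)

# Dropping D:TM leaves the ANCOVA model (common slope)
no_interaction <- lm(MS ~ D + TM, data = md)
cmp_interaction <- anova(no_interaction, full)
print(cmp_interaction)

# Dropping the TM dummies but keeping D:TM leaves a model with one
# common intercept and a separate slope for each muscle type
same_intercept <- lm(MS ~ D + D:TM, data = md)
cmp_tm <- anova(same_intercept, full)
print(cmp_tm)
```

Read this way, the non-significant TM row in the drop1 table says that the three lines do not differ detectably at D = 0 (their intercepts); it does not say that muscle type has no effect on MS.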