# Xuhua Xia: Fitting Several Regression Lines


Many applications of statistical analysis involve a continuous variable as the dependent variable (DV) but both continuous and categorical variables as independent variables (IVs).

- If the relationship between the DV and the continuous IV is linear and the slope is the same in all groups, the model is an analysis of covariance (ANCOVA).
- If the slopes differ among groups, use the full model, which includes the group-by-covariate interaction.

An illustrative data set will make this clear.

The muscle strength (MS) depends on the diameter of the muscle fiber (D) and the type of muscle (TM). Identify the DV and the IVs. How do we incorporate the qualitative variable into the model? With dummy variables.

| TM | D | MS |
|----|---|------|
| A | 1 | 11.5 |
| A | 2 | 13.8 |
| A | 3 | 14.4 |
| A | 4 | 16.8 |
| A | 5 | 18.7 |
| B | 1 | 10.8 |
| B | 2 | 12.3 |
| B | 3 | 13.7 |
| B | 4 | 14.2 |
| B | 5 | 16.6 |
| C | 1 | 13.1 |
| C | 2 | 16.2 |
| C | 3 | 19.0 |
| C | 4 | 22.9 |
| C | 5 | 26.5 |
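The R script later reads these data from `DiffSlopeMuscle.txt`. As a minimal sketch, assuming that file is not at hand, the same table can be typed in directly as a data frame. The name `md` matches the later script; storing `TM` as a factor (an assumption about how the file is read) lets `lm()` code it with dummy variables.

```r
# Muscle-strength data typed in from the table above
# (assumed to match DiffSlopeMuscle.txt)
# TM = type of muscle, D = fiber diameter, MS = muscle strength
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)
str(md)

# Mean MS for each muscle type: A = 15.04, B = 13.52, C = 19.54
grp_means <- tapply(md$MS, md$TM, mean)
print(grp_means)

# Optional: write a whitespace-delimited file that read.table(..., header = TRUE) can read
# write.table(md, "DiffSlopeMuscle.txt", quote = FALSE, row.names = FALSE)
```

Every D value from 1 to 5 occurs once in each muscle type, so the design is balanced in D.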

## Two Scenarios

[Figure: regression lines for two groups in two scenarios: same intercept with different slopes (full model), and different intercepts with the same slope (ANCOVA).]

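Same intercept, different slopes (full model):

$$Y_1 = a + b_1 X,\qquad Y_2 = a + b_2 X,\qquad Y_1 - Y_2 = (b_1 - b_2)X$$

The difference between the groups depends on X (multiplicative effect).

Different intercepts, same slope (ANCOVA):

$$Y_1 = a_1 + bX,\qquad Y_2 = a_2 + bX,\qquad Y_1 - Y_2 = a_1 - a_2$$

The difference between the groups is the same at every X (additive effect).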
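The two scenarios can be checked numerically on the muscle data. A minimal sketch, assuming the data are typed in as `md` and using the hypothetical names `fit_full` and `fit_ancova`: the full model `MS ~ D * TM` lets the slopes differ, so the predicted C-minus-A difference changes with D; the ANCOVA model `MS ~ D + TM` forces a common slope, so the difference is the same at every D.

```r
# Data typed in from the muscle-strength table (assumption)
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)
fit_full   <- lm(MS ~ D * TM, data = md)  # separate slopes (full model)
fit_ancova <- lm(MS ~ D + TM, data = md)  # common slope (ANCOVA)

# Muscle types A and C at a small and a large fiber diameter
# Row order from expand.grid: (D=1, A), (D=5, A), (D=1, C), (D=5, C)
new <- expand.grid(D = c(1, 5), TM = factor(c("A", "C"), levels = levels(md$TM)))
p_full   <- predict(fit_full, new)
p_ancova <- predict(fit_ancova, new)

# C minus A at D = 1 and at D = 5
diff_full   <- unname(p_full[3:4] - p_full[1:2])      # 1.28, 7.72: grows with D
diff_ancova <- unname(p_ancova[3:4] - p_ancova[1:2])  # 4.50, 4.50: constant
print(rbind(full = diff_full, ancova = diff_ancova))
```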

## Plot of MS vs D by TM

[Figure: scatter plot of MS against D for the three muscle types (TM).]

## Objectives

- Obtain regression equations relating MS to D for each TM.
- Compare the mean MS for the three TMs at a given level of D.
- Is it meaningful to compare the mean MS for the three TMs without specifying the level of D?
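For the second objective, one option is to predict the mean MS of each TM at a chosen D from the full model, with 95% confidence limits. A sketch, assuming the data are typed in as `md`; `D0 = 3` (the middle of the observed range) is an arbitrary, hypothetical choice. Rerunning with another `D0` shows how the comparison shifts when the slopes differ, which bears on the third question.

```r
# Data typed in from the muscle-strength table (assumption)
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)
fit <- lm(MS ~ D * TM, data = md)  # full model

D0 <- 3  # hypothetical comparison point: middle of the observed D range
new <- data.frame(D = D0, TM = factor(levels(md$TM), levels = levels(md$TM)))
ci <- predict(fit, new, interval = "confidence", level = 0.95)
rownames(ci) <- levels(md$TM)
print(round(ci, 2))  # fitted means at D = 3: A 15.04, B 13.52, C 19.54
```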

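## Explaining the R functions

Every factor variable with k levels (TM in our case) used in `lm` model fitting creates k - 1 dummy variables. With R's default treatment contrasts, the first level (A) is the baseline, and no dummy variable is created for it:

- DUMB = 1 if TM = B, 0 otherwise
- DUMC = 1 if TM = C, 0 otherwise

The full model is

$$\mathrm{MS} = \beta_0 + \beta_1\,\mathrm{DUMB} + \beta_2\,\mathrm{DUMC} + \beta_3\,D + \beta_4\,\mathrm{DUMB}\cdot D + \beta_5\,\mathrm{DUMC}\cdot D + \varepsilon$$

`summary(fit)` prints the estimates of the model coefficients.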
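`model.matrix()` shows the dummy variables that `lm()` builds internally. A sketch, assuming the data are typed in as `md`: with default treatment contrasts, the columns `TMB`, `TMC`, `D:TMB` and `D:TMC` play the roles of DUMB, DUMC, DUMB*D and DUMC*D, and the baseline level A gets no column of its own.

```r
# Data typed in from the muscle-strength table (assumption)
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)
# Design matrix used by lm(MS ~ D * TM)
X <- model.matrix(~ D * TM, data = md)
print(colnames(X))  # "(Intercept)" "D" "TMB" "TMC" "D:TMB" "D:TMC"
print(X)
# Treatment coding of TM: level A is the all-zero baseline row
print(contrasts(md$TM))
```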

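## Illustration with Excel

| MS | TM | D | DUMB | DUMC | DUMB*D | DUMC*D |
|------|----|---|------|------|--------|--------|
| 11.5 | A | 1 | 0 | 0 | 0 | 0 |
| 13.8 | A | 2 | 0 | 0 | 0 | 0 |
| 14.4 | A | 3 | 0 | 0 | 0 | 0 |
| 16.8 | A | 4 | 0 | 0 | 0 | 0 |
| 18.7 | A | 5 | 0 | 0 | 0 | 0 |
| 10.8 | B | 1 | 1 | 0 | 1 | 0 |
| 12.3 | B | 2 | 1 | 0 | 2 | 0 |
| 13.7 | B | 3 | 1 | 0 | 3 | 0 |
| 14.2 | B | 4 | 1 | 0 | 4 | 0 |
| 16.6 | B | 5 | 1 | 0 | 5 | 0 |
| 13.1 | C | 1 | 0 | 1 | 0 | 1 |
| 16.2 | C | 2 | 0 | 1 | 0 | 2 |
| 19.0 | C | 3 | 0 | 1 | 0 | 3 |
| 22.9 | C | 4 | 0 | 1 | 0 | 4 |
| 26.5 | C | 5 | 0 | 1 | 0 | 5 |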
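The spreadsheet layout can be reproduced in R by building the dummy columns by hand and fitting an ordinary multiple regression. A sketch, assuming the data are typed in as `md`; the column names `DUMBD` and `DUMCD` are hypothetical stand-ins for DUMB*D and DUMC*D, because R formulas treat `*` as an operator. The estimates should equal those of the factor-coded model.

```r
# Data typed in from the muscle-strength table (assumption)
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)
# Dummy variables built by hand, mirroring the spreadsheet columns
md$DUMB  <- as.numeric(md$TM == "B")
md$DUMC  <- as.numeric(md$TM == "C")
md$DUMBD <- md$DUMB * md$D  # DUMB*D
md$DUMCD <- md$DUMC * md$D  # DUMC*D

fit_dummy  <- lm(MS ~ DUMB + DUMC + D + DUMBD + DUMCD, data = md)
fit_factor <- lm(MS ~ D * TM, data = md)

# beta0 ... beta5 in the order of the model equation:
# 9.82, -0.35, -0.33, 1.74, -0.39, 1.61
print(round(coef(fit_dummy), 2))
# Same fit as the factor-coded model (same residual sum of squares)
print(c(dummy = deviance(fit_dummy), factor = deviance(fit_factor)))
```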

## R functions

```r
md <- read.table("DiffSlopeMuscle.txt", header = TRUE)
attach(md)
minX <- min(D)
maxX <- max(D)
minY <- min(MS)
maxY <- max(MS)
plot(D[TM == "A"], MS[TM == "A"], xlab = "D", ylab = "MS",
     xlim = c(minX, maxX), ylim = c(minY, maxY), pch = 16)
points(D[TM == "B"], MS[TM == "B"], col = 'red', pch = 16)
points(D[TM == "C"], MS[TM == "C"], col = 'blue', pch = 16)
# Does fiber diameter (D) differ among the three muscle types?
fitANOVA <- aov(D ~ TM); anova(fitANOVA)
# No significant difference in D, so the muscle types were measured over
# similar fiber diameters. Given similar D, does mean MS differ among the
# muscle types?
fitANOVA <- aov(MS ~ TM); anova(fitANOVA)
# Check the plot for slope heterogeneity
# Explicit test of slope heterogeneity
fit <- lm(MS ~ D * TM)
anova(fit)
# Check the D:TM row: if it is not significant, fit the ANCOVA model instead
fitANCOVA <- lm(MS ~ D + TM)
anova(fitANCOVA)
```
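The explicit test of slope heterogeneity can also be phrased as a comparison of two nested models: ANCOVA (common slope) versus the full model (separate slopes). A sketch, assuming the data are typed in as `md` and using the hypothetical names `fit_ancova`, `fit_full` and `cmp`. Because D:TM is the last term in `MS ~ D * TM`, the F value from `anova(fit_ancova, fit_full)` should match the D:TM row of `anova(fit)`.

```r
# Data typed in from the muscle-strength table (assumption)
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)
fit_ancova <- lm(MS ~ D + TM, data = md)  # common slope
fit_full   <- lm(MS ~ D * TM, data = md)  # separate slopes
# Extra-sum-of-squares F-test for the two added slope terms (D:TMB, D:TMC)
cmp <- anova(fit_ancova, fit_full)
print(cmp)  # F on 2 and 9 df, about 57.28
```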

## R Output

```
> anova(fit)
Analysis of Variance Table

Response: MS
          Df  Sum Sq Mean Sq F value    Pr(>F)
D          1 138.245 138.245 704.534 7.392e-10
TM         2  98.001  49.001 249.720 1.306e-08
D:TM       2  22.481  11.240  57.284 7.595e-06
Residuals  9   1.766   0.196
> summary(fit)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   9.8200     0.4646  21.137 5.57e-09
D             1.7400     0.1401  12.422 5.73e-07
TMB          -0.3500     0.6570  -0.533   0.6071
TMC          -0.3300     0.6570  -0.502   0.6275
D:TMB        -0.3900     0.1981  -1.969   0.0805
D:TMC         1.6100     0.1981   8.127 1.95e-05
```

The D:TM interaction is highly significant, so the slopes differ among muscle types. The fitted model is

`MS = 9.82 + 1.74*D - 0.35*DUMB - 0.33*DUMC - 0.39*DUMB*D + 1.61*DUMC*D`

Setting the dummy variables for each muscle type gives one line per type:

- A (DUMB = DUMC = 0): `MS = 9.82 + 1.74*D`
- B (DUMB = 1): `MS = 9.82 + 1.74*D - 0.35 - 0.39*D = 9.47 + 1.35*D`
- C (DUMC = 1): `MS = 9.82 + 1.74*D - 0.33 + 1.61*D = 9.49 + 3.35*D`

It may help to show the regression with dummy variables in Excel.
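The substitution above can be automated by adding the relevant coefficients. A sketch, assuming the data are typed in as `md` and default treatment contrasts, so that the coefficient names are `(Intercept)`, `D`, `TMB`, `TMC`, `D:TMB` and `D:TMC` as in the printed summary; `lines_by_TM` is a hypothetical name.

```r
# Data typed in from the muscle-strength table (assumption)
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)
fit <- lm(MS ~ D * TM, data = md)
b <- coef(fit)
# Each type's line = baseline (A) coefficients + that type's adjustments
lines_by_TM <- rbind(
  A = c(intercept = b[["(Intercept)"]],              slope = b[["D"]]),
  B = c(intercept = b[["(Intercept)"]] + b[["TMB"]], slope = b[["D"]] + b[["D:TMB"]]),
  C = c(intercept = b[["(Intercept)"]] + b[["TMC"]], slope = b[["D"]] + b[["D:TMC"]])
)
print(round(lines_by_TM, 2))  # A: 9.82, 1.74; B: 9.47, 1.35; C: 9.49, 3.35
```

Fitting a separate simple regression to each muscle type gives the same intercepts and slopes, because the full model places no constraint linking the three lines.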

## Type I and Type III SS

Type I (sequential) SS and F-tests, from `anova()`:

```
> anova(fit)
Analysis of Variance Table

Response: MS
          Df  Sum Sq Mean Sq F value    Pr(>F)    
D          1 138.245 138.245 704.534 7.392e-10 ***
TM         2  98.001  49.001 249.720 1.306e-08 ***
D:TM       2  22.481  11.240  57.284 7.595e-06 ***
Residuals  9   1.766   0.196                      
```

Type III SS and F-tests, from `drop1()` with scope `~.`, which tests each term after all the others:

```
> drop1(fit, ~., test = "F")
Single term deletions

Model:
MS ~ D * TM
       Df Sum of Sq    RSS     AIC F value    Pr(>F)    
<none>              1.766 -20.090                      
D       1   30.2760 32.042  21.385 154.294 5.735e-07 ***
TM      2    0.0702  1.836 -23.505   0.179     0.839    
D:TM    2   22.4807 24.247  15.203  57.284 7.595e-06 ***
```

For D:TM, the last term in the model, the two tables agree.
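Because D:TM stays in the model when `drop1()` removes TM, the TM test compares the three intercepts, that is, the lines at D = 0, outside the observed range of D (1 to 5). One way to see this, sketched below under the assumption that the data are typed in as `md`: fit a hypothetical reduced model `MS ~ D + D:TM` (separate slopes, one common intercept) and compare it with the full model. Its F value should equal the TM row from `drop1()`. The fitted intercepts (9.82, 9.47, 9.49) are close, which is why this F is small even though the lines diverge strongly at larger D.

```r
# Data typed in from the muscle-strength table (assumption)
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)
fit <- lm(MS ~ D * TM, data = md)
d1 <- drop1(fit, ~., test = "F")

# Hypothetical reduced model: remove only the TM main-effect columns (TMB, TMC)
# but keep D:TM, i.e. separate slopes forced through one intercept at D = 0
fit_common_int <- lm(MS ~ D + D:TM, data = md)
cmp <- anova(fit_common_int, fit)

print(d1)
print(cmp)                       # same F (about 0.18) as the TM row of drop1()
print(deviance(fit_common_int))  # RSS about 1.836, as in the TM row of drop1()
```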

## R functions

```r
# Predictions come from the full model, because the slopes differ among muscle types
fit <- lm(MS ~ D * TM)
nd1 <- subset(md, subset = (TM == "A"))
nd2 <- subset(md, subset = (TM == "B"))
nd3 <- subset(md, subset = (TM == "C"))
nd1 <- nd1[order(nd1$D), ]
nd2 <- nd2[order(nd2$D), ]
nd3 <- nd3[order(nd3$D), ]
# Columns of each result: fit, lwr, upr (95% confidence limits for the mean)
y1 <- predict(fit, nd1, interval = "confidence")
y2 <- predict(fit, nd2, interval = "confidence")
y3 <- predict(fit, nd3, interval = "confidence")
par(mfrow = c(1, 3))
plot(D[TM == "A"], MS[TM == "A"], xlab = "D", ylab = "MS",
     xlim = c(minX, maxX), ylim = c(minY, maxY), pch = 16)
points(D[TM == "B"], MS[TM == "B"], col = 'red', pch = 16)
points(D[TM == "C"], MS[TM == "C"], col = 'blue', pch = 16)
lines(nd1$D, y1[, 1], col = "black")  # fitted line
lines(nd1$D, y1[, 2], col = "black")  # lower limit
lines(nd1$D, y1[, 3], col = "black")  # upper limit
lines(nd2$D, y2[, 1], col = "red")
lines(nd2$D, y2[, 2], col = "red")
lines(nd2$D, y2[, 3], col = "red")
lines(nd3$D, y3[, 1], col = "blue")
lines(nd3$D, y3[, 2], col = "blue")
lines(nd3$D, y3[, 3], col = "blue")
```

Call `plot()` before `lines()`, because `lines()` only adds to an existing plot.
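The script above evaluates the confidence limits only at the five observed D values and joins them with straight segments. Confidence bands for a regression line are curved, so a finer grid of D gives a smoother picture. A sketch with a loop over muscle types, assuming the data are typed in as `md`; the 50-point grid, dashed limits, and names such as `grid_D` and `bands` are illustrative choices, not part of the original.

```r
# Data typed in from the muscle-strength table (assumption)
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)
fit <- lm(MS ~ D * TM, data = md)  # full model
grid_D <- seq(min(md$D), max(md$D), length.out = 50)

# 95% confidence limits for mean MS of each muscle type along the grid
bands <- lapply(setNames(levels(md$TM), levels(md$TM)), function(g) {
  nd <- data.frame(D = grid_D, TM = factor(g, levels = levels(md$TM)))
  predict(fit, nd, interval = "confidence")
})

cols <- c(A = "black", B = "red", C = "blue")
# plot() first, so that matlines() has a plot to draw on
plot(MS ~ D, data = md, col = cols[as.character(md$TM)], pch = 16,
     ylim = range(md$MS, unlist(bands)))
for (g in names(bands)) {
  # fitted line solid, lower and upper limits dashed
  matlines(grid_D, bands[[g]], col = cols[[g]], lty = c(1, 2, 2))
}
```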

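## 95% CI plots

[Figure: fitted regression lines of MS on D with 95% confidence limits for muscle types A (black), B (red), and C (blue).]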
