# Topic 15: General Linear Tests and Extra Sum of Squares.

Outline

- Extra sums of squares with applications
- Partial correlations
- Standardized regression coefficients

General Linear Tests

- Recall: a different way to look at the comparison of models
- Look at the difference
  - in SSE (reduce unexplained SS)
  - in SSM (increase explained SS)
- Because SSM + SSE = SST, these two comparisons are equivalent

General Linear Tests

- The models we compare are hierarchical in the sense that one (the full model) includes all of the explanatory variables of the other (the reduced model)
- We can compare models with different explanatory variables, such as
  1. $X_1, X_2$ vs $X_1$
  2. $X_1, X_2, X_3, X_4, X_5$ vs $X_1, X_2, X_3$
- Note that in each pair the first model includes all the X's of the second

General Linear Tests

- We will get an F test that compares the two models
- We are testing the null hypothesis that the regression coefficients for the extra variables are all zero
- For $X_1, X_2, X_3, X_4, X_5$ vs $X_1, X_2, X_3$
  - $H_0$: $\beta_4 = \beta_5 = 0$
  - $H_1$: $\beta_4$ and $\beta_5$ are not both 0

General Linear Tests

- Degrees of freedom for the F statistic are the number of extra variables (numerator) and the error df, $df_E$, for the larger model (denominator)
- Suppose n = 100 and we compare models with $X_1, X_2, X_3, X_4, X_5$ vs $X_1, X_2, X_3$
  - Numerator df is 2
  - Denominator df is n - 6 = 94 (the full model has five slopes plus an intercept)
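The degrees-of-freedom rule can be sketched in a few lines of Python. The helper name `glt_df` and its argument names are illustrative and do not come from the slides; the sketch assumes both models include an intercept.

```python
def glt_df(n, k_full, k_reduced):
    """Return (numerator df, denominator df) for a general linear test.

    n         -- number of observations
    k_full    -- number of explanatory variables in the full model
    k_reduced -- number of explanatory variables in the reduced model
    Assumes both models include an intercept (an assumption of this sketch).
    """
    if not 0 <= k_reduced < k_full:
        raise ValueError("the reduced model must have fewer variables than the full model")
    numerator_df = k_full - k_reduced    # number of extra variables being tested
    denominator_df = n - (k_full + 1)    # error df of the full model
    return numerator_df, denominator_df


print(glt_df(100, 5, 3))  # (2, 94)
```

With n = 100, five variables in the full model and three in the reduced model, this reproduces the 2 and 94 quoted on the slide.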

Notation for Extra SS

- $SSE(X_1, X_2, X_3, X_4, X_5)$ is the SSE for the full model
- $SSE(X_1, X_2, X_3)$ is the SSE for the reduced model
- $SSE(X_4, X_5 \mid X_1, X_2, X_3)$ is the difference in the SSEs (reduced minus full): $SSE(X_1, X_2, X_3) - SSE(X_1, X_2, X_3, X_4, X_5)$
- Recall that we can replace SSE with SSM, in which case the difference is taken as full minus reduced

F test

- Numerator: $SSE(X_4, X_5 \mid X_1, X_2, X_3) / 2$
- Denominator: $MSE(X_1, X_2, X_3, X_4, X_5)$
- Under $H_0$, $F \sim F(2, n-6)$
- Reject $H_0$ if the P-value ≤ 0.05 and conclude that either $X_4$ or $X_5$ or both contain additional information useful for predicting Y in a linear model that also includes $X_1$, $X_2$, and $X_3$
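Written for any pair of nested models, the same statistic takes the form below. The symbols $SSE(R)$, $SSE(F)$, $df_R$, and $df_F$ (error sums of squares and error degrees of freedom for the reduced and full models) are notation assumed here for illustration and do not appear on the slide.

```latex
F = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F}
\;\sim\; F(df_R - df_F,\; df_F) \quad \text{under } H_0
```

For the comparison on this slide, $df_R = n - 4$ and $df_F = n - 6$, which gives the numerator df of 2 and the denominator $MSE(X_1, X_2, X_3, X_4, X_5)$.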

Examples

- Predict bone density using age, weight and height; does diet add any useful information?
- Predict GPA using 3 HS grade variables; do SAT scores add any useful information?
- Predict yield of an industrial process using temperature and pH; does the supplier of the raw material (categorical) add any useful information?

Extra SS Special Cases

- Compare models that differ by one explanatory variable: $F(1, n-p) = t^2(n-p)$
- SAS's individual parameter t-tests are equivalent to the general linear test based on $SSM(X_i \mid X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_{p-1})$

Add one variable at a time

- Consider 4 explanatory variables and the extra sums of squares
  - $SSM(X_1)$
  - $SSM(X_2 \mid X_1)$
  - $SSM(X_3 \mid X_1, X_2)$
  - $SSM(X_4 \mid X_1, X_2, X_3)$
- $SSM(X_1) + SSM(X_2 \mid X_1) + SSM(X_3 \mid X_1, X_2) + SSM(X_4 \mid X_1, X_2, X_3) = SSM(X_1, X_2, X_3, X_4)$

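One Variable added

- Numerator df is 1 for each of these tests
- F = (SSM / 1) / MSE(full) ~ F(1, n-p)
- These sequential sums of squares are the SAS Type I SS
- We typically use Type II SS, where each variable is adjusted for all the others in the model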

KNNL Example, p. 257

- 20 healthy female subjects
- Y is body fat
- $X_1$ is triceps skinfold thickness
- $X_2$ is thigh circumference
- $X_3$ is midarm circumference
- Underwater weighing is the "gold standard" used to obtain Y

Input and data check

    options nocenter;
    data a1;
      infile '../data/ch07ta01.dat';
      input skinfold thigh midarm fat;
    proc print data=a1;
    run;

Proc reg

    proc reg data=a1;
      model fat=skinfold thigh midarm;
    run;

Output

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 3 | 396.98461 | 132.32820 | 21.52 | <.0001 |
| Error | 16 | 98.40489 | 6.15031 | | |
| Corrected Total | 19 | 495.38950 | | | |

As a group, the predictors are helpful in predicting percent body fat.

Output

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | 117.08469 | 99.78240 | 1.17 | 0.2578 |
| skinfold | 1 | 4.33409 | 3.01551 | 1.44 | 0.1699 |
| thigh | 1 | -2.85685 | 2.58202 | -1.11 | 0.2849 |
| midarm | 1 | -2.18606 | 1.59550 | -1.37 | 0.1896 |

None of the individual t-tests are significant.

Summary

- The P-value for the F test is <.0001
- But the P-values for the individual regression coefficients are 0.1699, 0.2849, and 0.1896
- None of these are below our standard significance level of 0.05
- What is the reason?

Look at this using extra SS

    proc reg data=a1;
      model fat=skinfold thigh midarm / ss1 ss2;
    run;

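Output

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| | Type I SS | Type II SS |
|---|---|---|---|---|---|---|---|
| skinfold | 1 | 4.33409 | 3.01551 | 1.44 | 0.1699 | 352.26980 | 12.70489 |
| thigh | 1 | -2.85685 | 2.58202 | -1.11 | 0.2849 | 33.16891 | 7.52928 |
| midarm | 1 | -2.18606 | 1.59550 | -1.37 | 0.1896 | 11.54590 | 11.54590 |

Notice how different the Type I and Type II SS are for skinfold and thigh. For midarm, the last variable entered, the two coincide.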

Interpretation

- Fact: the Type I and Type II SS are very different
- If we reorder the variables in the model statement we will get
  - Different Type I SS
  - The same Type II SS
- Could the variables be explaining the same SS and canceling each other out?
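A quick arithmetic check on the reported output illustrates the difference between the two types. The numbers are copied from the SAS output for the body fat example; the midarm Type II value is the numerator sum of squares of the single-variable test on midarm. The variable names are illustrative only.

```python
import math

# Type I (sequential) SS for the order skinfold, thigh, midarm.
type1_ss = {"skinfold": 352.26980, "thigh": 33.16891, "midarm": 11.54590}
# Type II SS: each variable added last, given the other two.
type2_ss = {"skinfold": 12.70489, "thigh": 7.52928, "midarm": 11.54590}
ssm_full = 396.98461  # Model SS from the ANOVA table

type1_total = sum(type1_ss.values())
type2_total = sum(type2_ss.values())

print(f"sum of Type I SS  = {type1_total:.5f} (model SS = {ssm_full:.5f})")
print(f"sum of Type II SS = {type2_total:.5f}")
```

The Type I SS add up to the model SS, as the sequential decomposition requires. The Type II SS add up to far less here, which is consistent with the predictors sharing much of the explained variation.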

Run additional models

Rerun with skinfold as the only explanatory variable:

    proc reg data=a1;
      model fat=skinfold;
    run;

Output

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | -1.49610 | 3.31923 | -0.45 | 0.6576 |
| skinfold | 1 | 0.85719 | 0.12878 | 6.66 | <.0001 |

Skinfold by itself is a highly significant linear predictor.

Use general linear test to see if other predictors contribute beyond skinfold

    proc reg data=a1;
      model fat=skinfold thigh midarm;
      thimid: test thigh, midarm;
    run;

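Output

Test thimid Results for Dependent Variable fat

| Source | DF | Mean Square | F Value | Pr > F |
|---|---|---|---|---|
| Numerator | 2 | 22.35741 | 3.64 | 0.0500 |
| Denominator | 16 | 6.15031 | | |

Yes, thigh and midarm together help after skinfold is in the model, although the P-value sits right at the 0.05 level. Perhaps the best model includes only two predictors.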
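The F value and P-value in this test can be recomputed from sums of squares reported elsewhere in the output. This is a sketch under stated assumptions: the function name is made up for illustration, the extra SS is taken as the sum of the Type I SS for thigh and midarm (entered after skinfold), and the closed-form tail probability used below holds only when the numerator df is 2.

```python
def f_test_two_extra(extra_ss, mse_full, df_error):
    """F statistic and P-value for a general linear test with numerator df = 2."""
    f_stat = (extra_ss / 2) / mse_full
    # For an F(2, d) distribution, P(F > f) = (1 + 2f/d) ** (-d/2).
    p_value = (1 + 2 * f_stat / df_error) ** (-df_error / 2)
    return f_stat, p_value


# SSM(thigh | skinfold) + SSM(midarm | skinfold, thigh) from the Type I SS column
extra_ss = 33.16891 + 11.54590
f_stat, p_value = f_test_two_extra(extra_ss, mse_full=6.15031, df_error=16)
print(f"F = {f_stat:.3f}, P = {p_value:.4f}")
```

The result agrees with the SAS output to the precision shown there, up to rounding of the inputs.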

Use general linear test to assess midarm

    proc reg data=a1;
      model fat=skinfold thigh midarm;
      midarm: test midarm;
    run;

Output

Test midarm Results for Dependent Variable fat

| Source | DF | Mean Square | F Value | Pr > F |
|---|---|---|---|---|
| Numerator | 1 | 11.54590 | 1.88 | 0.1896 |
| Denominator | 16 | 6.15031 | | |

With skinfold and thigh in the model, midarm is not a significant predictor. This is just the t-test for the midarm coefficient in the full model: the P-value 0.1896 is the same.
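The equivalence between this one-variable F test and the t-test can be checked numerically from the reported estimate, standard error, Type II SS, and MSE. The variable names are illustrative.

```python
# midarm coefficient and its standard error from the full-model output
t_midarm = -2.18606 / 1.59550

# Type II SS for midarm over 1 df, divided by MSE of the full model
f_midarm = (11.54590 / 1) / 6.15031

print(f"t = {t_midarm:.2f}, t^2 = {t_midarm ** 2:.2f}, F = {f_midarm:.2f}")
```

Squaring the t statistic gives the F statistic, up to rounding in the printed output.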

Other uses of general linear test

- The test statement can be used to perform a significance test for any hypothesis involving a linear combination of the regression coefficients
- Examples
  - $H_0$: $\beta_4 = \beta_5$
  - $H_0$: $\beta_4 - 3\beta_5 = 12$

Partial correlations

- Measure the strength of a linear relation between two variables, taking into account other variables
- Procedure to find the partial correlation of $X_i$ and Y
  - Predict Y using the other X's
  - Predict $X_i$ using the other X's
  - Find the correlation between the two sets of residuals
- KNNL use the term coefficient of partial determination for the squared partial correlation

Pcorr2 option

    proc reg data=a1;
      model fat=skinfold thigh midarm / pcorr2;
    run;

Output

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| | Squared Partial Corr Type II |
|---|---|---|---|---|---|---|
| Intercept | 1 | 117.08469 | 99.78240 | 1.17 | 0.2578 | . |
| skinfold | 1 | 4.33409 | 3.01551 | 1.44 | 0.1699 | 0.11435 |
| thigh | 1 | -2.85685 | 2.58202 | -1.11 | 0.2849 | 0.07108 |
| midarm | 1 | -2.18606 | 1.59550 | -1.37 | 0.1896 | 0.10501 |

Skinfold and midarm explain the most remaining variation when added last.
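The squared partial correlations can be recovered from the Type II SS and the full-model SSE. The reasoning, not spelled out on the slide, is that the SSE of the model without a variable equals SSE(full) plus that variable's Type II SS, so the fraction of that remaining variation it explains is Type II SS / (Type II SS + SSE(full)). The function name is illustrative.

```python
sse_full = 98.40489  # Error SS of the full model
type2_ss = {"skinfold": 12.70489, "thigh": 7.52928, "midarm": 11.54590}


def squared_partial_corr(extra_ss, sse_full):
    """Share of the variation left by the other predictors that this one explains."""
    return extra_ss / (extra_ss + sse_full)


partial_r2 = {name: squared_partial_corr(ss, sse_full) for name, ss in type2_ss.items()}
for name, r2 in partial_r2.items():
    print(f"{name}: {r2:.5f}")
```

The values agree with the Squared Partial Corr Type II column in the output above.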

Standardized Regression Model

- Can help reduce round-off errors in calculations
- Puts regression coefficients in common units
- Units for the usual coefficients are units for Y divided by units for X

Standardized Regression Model

- Standardized coefficients can be obtained from the usual ones by multiplying by the ratio of the standard deviation of X to the standard deviation of Y
- Interpretation: a one standard deviation increase in X corresponds to a change of "standardized beta" standard deviations in Y

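Standardized Regression Model

$$
\begin{aligned}
Y &= \dots + \beta X + \dots \\
  &= \dots + \beta \left(\frac{s_X}{s_Y}\right)\left(\frac{s_Y}{s_X}\right) X + \dots \\
  &= \dots + \beta \left(\frac{s_X}{s_Y}\right)\left(\left(\frac{s_Y}{s_X}\right) X\right) + \dots \\
  &= \dots + \beta \left(\frac{s_X}{s_Y}\right)(s_Y)\left(\frac{X}{s_X}\right) + \dots
\end{aligned}
$$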

Standardized Regression Model

- Standardize Y and all X's (subtract the mean and divide by the standard deviation)
- Then divide by $\sqrt{n-1}$
- The regression coefficients for variables transformed in this way are the standardized regression coefficients
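The rescaling rule can be illustrated with one predictor, where the standardized coefficient should equal the ordinary correlation between X and Y. The data below are hypothetical toy values made up for this sketch; they are not from the body fat example.

```python
import math

# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx                    # usual least-squares slope, in units of Y per unit of X
s_x = math.sqrt(sxx / (n - 1))   # sample standard deviation of X
s_y = math.sqrt(syy / (n - 1))   # sample standard deviation of Y
b_std = b * s_x / s_y            # standardized coefficient: multiply by s_X / s_Y
r = sxy / math.sqrt(sxx * syy)   # Pearson correlation of X and Y

print(f"b = {b:.4f}, standardized b = {b_std:.4f}, r = {r:.4f}")
```

With several predictors the standardized coefficients are no longer simple correlations, but the same multiplication by $s_X / s_Y$ applies to each coefficient.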

STB option

    proc reg data=a1;
      model fat=skinfold thigh midarm / stb;
    run;

Output

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| | Standardized Estimate |
|---|---|---|---|---|---|---|
| Intercept | 1 | 117.08469 | 99.78240 | 1.17 | 0.2578 | 0 |
| skinfold | 1 | 4.33409 | 3.01551 | 1.44 | 0.1699 | 4.26370 |
| thigh | 1 | -2.85685 | 2.58202 | -1.11 | 0.2849 | -2.92870 |
| midarm | 1 | -2.18606 | 1.59550 | -1.37 | 0.1896 | -1.56142 |

Skinfold and thigh suggest the largest standardized change.

Reading

- We went over KNNL 7.1 - 7.5
- We used program topic15.sas to generate the output
