Published by Brooke Fowler. Modified over 3 years ago.

1
SADC Course in Statistics
Multiple Linear Regression: Further issues and anova results (Session 07)

2
Learning Objectives
At the end of this session, you will be able to
- appreciate the requirements and limitations of variables used in a multiple regression
- recognise the dependence of anova results on the order of fitting variables
- interpret anova results when terms are fitted sequentially
- understand the difference in interpretation between t-probabilities and anova F-probabilities when there are 2 or more xs.

3
The crimes example again!
Recall that in the example relating the number of acts regarded as crimes to age, college years and parents' income, the college variable was non-significant.
Although a quantitative variable, college had only 3 possible values!
This is NOT a problem, since college is an x-variable and there were many observations at each of these values.
It would be a problem if the y-variable had only a few distinct values: the normality assumption would then be violated.

4
Points to note about the variables
In the regression analyses considered so far,
1. the y-variable is a quantitative measurement, assumed to have an approximately normal distribution.
2. the x-variables are quantitative variates, each contributing 1 d.f. to the model.
However, some xs could be categorical factors, each contributing d.f. = (number of levels - 1) to the model. This latter case will be discussed later!

5
But care is sometimes needed…
If an x-variable has only a few values, pay attention to the number of observations at each value.
In practical 6, the variable empl was highly significant (p=0.006).
The residual plot looked OK, apart from one outlier (where just 1 household had 3 employed members).
But… will empl remain significant if the outlier is removed?

6
Results after deleting outlier
--------+-----------------------------------
lnexpdf |    Coef.  Std. Err.      t   P>|t|
--------+-----------------------------------
 hhsize |  -.06194    .03031   -2.04   0.047
   empl |   .23483    .28690    0.82   0.418
 const. |   9.2177    .16843   54.73   0.000
--------+-----------------------------------
Note that empl is now non-significant!
It is dangerous to use a model where conclusions depend on just 1 observation!
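The danger can be sketched numerically. The following is a minimal illustration in Python using made-up data (not the practical 6 data): x takes the values 0 to 3, with only one observation at x = 3. The names x, y and slope are hypothetical, chosen for this sketch only.

```python
def slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data: only the last observation has x = 3.
x = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
y = [9.0, 9.2, 9.1, 9.1, 9.0, 9.3, 9.2, 9.0, 9.1, 11.0]

slope_all = slope(x, y)                 # all 10 observations
slope_dropped = slope(x[:-1], y[:-1])   # the single x = 3 observation removed

print(f"slope with all observations:  {slope_all:.3f}")
print(f"slope without the x = 3 case: {slope_dropped:.3f}")
```

With these invented values the slope is roughly 0.35 when all observations are used and essentially zero once the single x = 3 case is dropped, so the apparent relationship rests on one observation.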

7
ANOVA for 2-variables (sequential)
We return again to the crimes example to show the effect of the order of fitting terms.
---------+-------------------------------------
Source   | df    Seq.SS       MS      F  Prob>F
---------+-------------------------------------
age      |  1    92.676   92.676   3.20  0.0808
college  |  1   263.387  263.387   9.10  0.0043
Residual | 42  1216.248   28.958
---------+-------------------------------------
Total    | 44  1572.311   35.734
---------+-------------------------------------
Here, age is fitted first, then college, hence the F-probs need to be interpreted accordingly.
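As a check on how the columns of this table relate, the mean squares and F-ratios can be recomputed from the sums of squares and degrees of freedom. This is a small sketch in Python using the values shown on this slide; the variable names are ours.

```python
# Values copied from the sequential anova table (age fitted first).
seq_ss = {"age": 92.676, "college": 263.387}
df = {"age": 1, "college": 1}
resid_ss, resid_df = 1216.248, 42

# Residual mean square is the denominator of every F-ratio.
resid_ms = resid_ss / resid_df
for term in seq_ss:
    ms = seq_ss[term] / df[term]
    f_ratio = ms / resid_ms
    print(f"{term:8s} MS = {ms:8.3f}  F = {f_ratio:5.2f}")

# Sequential SS plus residual SS add up to the total SS.
total_ss = sum(seq_ss.values()) + resid_ss
print(f"Residual MS = {resid_ms:.3f}, Total SS = {total_ss:.3f}")
```

The sequential sums of squares and the residual sum of squares add up to the total sum of squares, which is a property of sequential fitting.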

8
ANOVA for 2-variables (sequential)
Consider now the anova with the order of fitting terms changed…
---------+-------------------------------------
Source   | df    Seq.SS       MS      F  Prob>F
---------+-------------------------------------
college  |  1     2.781    2.781   0.10  0.7582
age      |  1   353.282  353.282  12.20  0.0011
Residual | 42  1216.248   28.958
---------+-------------------------------------
Total    | 44  1572.311   35.734
---------+-------------------------------------
Here, college is fitted first, then age. Note the change in the F-probs from the previous slide. Why is this?
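The order dependence can be reproduced on any data set where the xs are correlated. The following is a minimal sketch in plain Python, assuming small made-up data (x1, x2 and y are hypothetical, not the crimes data). Each sequential sum of squares is computed as the drop in residual sum of squares when a term is added to the terms already fitted.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def residual_ss(y, xs):
    """Residual SS from a least-squares fit of y on a constant plus the columns in xs."""
    n = len(y)
    design = [[1.0] + [col[i] for col in xs] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(design[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((y[i] - sum(b * v for b, v in zip(beta, design[i]))) ** 2 for i in range(n))

def sequential_ss(y, ordered_xs):
    """Sequential SS for each term in the order given, plus the final residual SS."""
    out, fitted = [], []
    previous = residual_ss(y, [])  # constant-only model: this is the total SS
    for col in ordered_xs:
        fitted.append(col)
        current = residual_ss(y, fitted)
        out.append(previous - current)
        previous = current
    return out, previous

# Hypothetical data with two correlated explanatory variables.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3.1, 2.9, 7.2, 6.8, 11.1, 10.7, 15.3, 14.6]

seq_12, rss_12 = sequential_ss(y, [x1, x2])  # x1 first, then x2
seq_21, rss_21 = sequential_ss(y, [x2, x1])  # x2 first, then x1

print("x1 then x2:", [round(s, 3) for s in seq_12], "residual", round(rss_12, 3))
print("x2 then x1:", [round(s, 3) for s in seq_21], "residual", round(rss_21, 3))
```

In both orders the residual sum of squares and the combined sum of squares for the two terms are the same. Only the split between the terms changes, because the first term fitted takes credit for any variation the two xs share.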

9
Discussion…
What is the same and what is different across slides 7 and 8 above?
Order of fitting seems to matter! What do the results mean?
How do the F-probs from above and the t-probs below for the model estimates compare?
-----------------------------
 crimes |     Coef.    P>|t|
--------+--------------------
    age |   1.30876    0.001
college | -6.448684    0.004
 const. |  2.324681    0.590
-----------------------------
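One way to approach the comparison: for a term with 1 d.f. that is fitted last, the sequential F-statistic equals the square of that term's t-statistic in the full model, so the two probabilities agree. The short Python check below uses the F-values for the last-fitted term on slides 7 and 8. The t-statistics themselves are not shown on this slide, so the values printed are only the magnitudes implied by this relationship.

```python
import math

# F-statistics for the term fitted LAST in each sequential anova (slides 7 and 8).
f_last = {"college": 9.10, "age": 12.20}

for term, f_value in f_last.items():
    # With 1 d.f., F for the last-fitted term equals t squared in the full model.
    implied_abs_t = math.sqrt(f_value)
    print(f"{term:8s} F = {f_value:5.2f}  implied |t| = {implied_abs_t:.2f}")
```

This matches the tables: the F-probs for the last-fitted terms (0.0043 for college, 0.0011 for age) agree with the t-probs above (0.004 and 0.001) to the precision shown.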

10
Exercise: 2nd example: Q2, Pract. 6
Open penrain.dta from Q2 of the previous practical. Note down below the anova results from a regression of rain on elevation, then altitude.

Source      d.f.   S.S.   M.S.   F   Prob.
Elevation     1
Altitude      1
Residual     13
Total        15

Interpretation of F-probs:

11
Changing order of fitting:
Now fit altitude, then elevation. Note down the results below.

Source      d.f.   S.S.   M.S.   F   Prob.
Altitude      1
Elevation     1
Residual     13
Total        15

Interpretation of F-probs:

12
Model parameter estimates:
Finally, note down the parameter estimates and the corresponding t-probabilities:

Parameter    Estimate of model parameter    t-Prob.
Altitude
Elevation
Constant

Overall conclusions:

13
Adjusted sums of squares
Some software packages present adjusted sums of squares, combining the results from the anova tables in slides 10 and 11 into one single anova table:

Source      df    Adj. SS   Adj. MS      F   Prob.
Altitude     1    2096.70            18.81  0.0008
Elevation    1     165.32             1.48  0.2450
Residual    13    1449.37    111.49
Total       15   13669.29

Note that the sums of squares now do not add to the total S.S.
What do the F-probabilities now represent?
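The arithmetic behind this table can be checked directly. The sketch below, in Python, uses only the values shown on this slide (the variable names are ours). It recomputes each F-ratio as adjusted SS divided by the residual mean square, and shows by how much the adjusted sums of squares fall short of the total.

```python
# Values copied from the adjusted anova table on this slide.
adj_ss = {"Altitude": 2096.70, "Elevation": 165.32}
resid_ss, resid_df = 1449.37, 13
total_ss = 13669.29

resid_ms = resid_ss / resid_df
# Each term has 1 d.f., so its adjusted MS equals its adjusted SS.
f_ratios = {term: ss / resid_ms for term, ss in adj_ss.items()}
for term, f_ratio in f_ratios.items():
    print(f"{term:10s} F = {f_ratio:.2f}")

# Adjusted SS plus residual SS fall short of the total SS.
shortfall = total_ss - (sum(adj_ss.values()) + resid_ss)
print(f"Total SS minus (adjusted SS + residual SS) = {shortfall:.2f}")
```

The shortfall can be read, informally, as variation that either variable could account for on its own. Because each adjusted sum of squares measures only what a term adds after the other is already in the model, this shared part is attributed to neither.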

14
Key Points
- Recognise the type of variable (y) being modelled. The methods discussed apply when y is quantitative.
- The explanatory variables (the xs) can be variables of any type, but so far we have only considered quantitative xs.
- Take care when interpreting anova F-probs to check whether the sums of squares are sequential or adjusted.
- Note that all t-probabilities (associated with the parameter estimates) are adjusted for all other terms in the model.

15
Practical work follows to ensure learning objectives are achieved…
