SADC Course in Statistics Multiple Linear Regression: Further issues and anova results (Session 07)


1 SADC Course in Statistics Multiple Linear Regression: Further issues and anova results (Session 07)

2 Learning Objectives
At the end of this session, you will be able to
- appreciate the requirements and limitations of variables used in a multiple regression
- recognise the dependence of anova results on the order of fitting variables
- interpret anova results when terms are fitted sequentially
- understand the difference between the interpretation of t-probabilities and anova F-probabilities when there are 2 or more xs.

3 The crimes example again!
Recall that in the example relating the number of acts regarded as crimes to age, college years and parents' income, the college variable was non-significant. Although a quantitative variable, college had only 3 possible values! This is NOT a problem, since college is an x-variable and there were many observations at each of these values. It would be a problem if the y-variable had only a few distinct values, because the normality assumption would then be violated.

4 Points to note about the variables
In the regression analyses so far considered,
1. The y-variable is a quantitative measurement, assumed to have an approximate normal distribution.
2. The x-variables are quantitative variates, each contributing 1 d.f. to the model.
However, some xs could be categorical factors, each contributing d.f. = number of levels - 1 to the model. The latter case will be discussed later!
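As a rough illustration of the "number of levels - 1" rule, the sketch below codes a categorical factor as indicator (dummy) columns, with one level taken as the baseline. The factor `region` and its levels are hypothetical and are not part of the course data; the choice of baseline is also just an assumption.

```python
def dummy_columns(values):
    """Code a factor as 0/1 indicator columns, dropping one baseline level."""
    levels = sorted(set(values))
    others = levels[1:]  # levels[0] is treated as the baseline (an arbitrary choice)
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in others}

# Hypothetical factor with 3 levels.
region = ["north", "south", "east", "south", "north", "east"]
dummies = dummy_columns(region)

n_levels = len(set(region))
factor_df = len(dummies)  # one model d.f. per indicator column

print(f"levels = {n_levels}, d.f. contributed = {factor_df}")
```

Each indicator column would enter the regression like a quantitative x, so a factor with 3 levels uses 2 d.f.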

5 But care is sometimes needed…
If an x-variable has only a few values, pay attention to the number of observations at each value. In practical 6, the variable empl was highly significant (p=0.006). The residual plot looked OK, apart from one outlier (where just 1 household had 3 employed members). But… will empl remain significant if the outlier is removed?

6 Results after deleting outlier
------------------------------------------
lnexpdf|    Coef.  Std. Err.      t   P>|t|
-------+----------------------------------
 hhsize|  -.06194     .03031  -2.04   0.047
   empl|   .23483     .28690   0.82   0.418
 const.|   9.2177     .16843  54.73   0.000
------------------------------------------
Note that empl is now non-significant! Dangerous to use a model where conclusions depend on just 1 observation!
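A minimal sketch of the same kind of check on made-up data: an x-variable with only a few values and a single observation at the largest value. The numbers are hypothetical (they are not the practical 6 data), and a simple one-x regression is used for brevity.

```python
# Hypothetical data: x takes only the values 0-3, and just one
# observation has x = 3 (compare the single household with 3 employed).
x = [0, 0, 0, 1, 1, 1, 2, 2, 3]
y = [5, 6, 7, 6, 5, 7, 6, 7, 12]

def slope(xs, ys):
    """Least-squares slope from a simple regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    return sxy / sxx

slope_all = slope(x, y)                # all 9 observations
slope_dropped = slope(x[:-1], y[:-1])  # lone x = 3 observation removed

print(f"slope with all data:         {slope_all:.3f}")
print(f"slope without the x=3 point: {slope_dropped:.3f}")
```

With these made-up values the estimated slope drops sharply once the single x = 3 observation is left out, which is the kind of dependence on one point that the slide warns against.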

7 ANOVA for 2-variables (sequential)
We return again to the crimes example to show the effect of the order of fitting terms.
---------+-------------------------------------------
Source   |  df     Seq.SS        MS       F   Prob>F
---------+-------------------------------------------
age      |   1     92.676    92.676    3.20   0.0808
college  |   1    263.387   263.387    9.10   0.0043
Residual |  42   1216.248    28.958
---------+-------------------------------------------
Total    |  44   1572.311    35.734
---------+-------------------------------------------
Here, age is fitted first, then college, hence the F-probs need to be interpreted accordingly.

8 ANOVA for 2-variables (sequential)
Consider now the anova with the order of fitting terms changed…
---------+-------------------------------------------
Source   |  df     Seq.SS        MS       F   Prob>F
---------+-------------------------------------------
college  |   1      2.781     2.781    0.10   0.7582
age      |   1    353.282   353.282   12.20   0.0011
Residual |  42   1216.248    28.958
---------+-------------------------------------------
Total    |  44   1572.311    35.734
---------+-------------------------------------------
Here, college is fitted first, then age. Note the change in F-probs from the previous slide. Why is this?
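One way to see what changes between the two tables is to recompute the F-ratios from the sums of squares printed on slides 7 and 8. The sketch below does only that arithmetic (it does not refit the model); the variable names are illustrative.

```python
# Values copied from the two sequential anova tables (slides 7 and 8).
resid_ss, resid_df = 1216.248, 42
age_first = {"age": 92.676, "college": 263.387}
college_first = {"college": 2.781, "age": 353.282}  # 2.781 as in the M.S. column

resid_ms = resid_ss / resid_df

def f_ratios(seq_ss):
    """F = (sequential M.S.) / (residual M.S.); each term here has 1 d.f."""
    return {term: ss / resid_ms for term, ss in seq_ss.items()}

f_age_first = f_ratios(age_first)
f_college_first = f_ratios(college_first)

# The two terms together explain the same amount in either order...
model_ss_a = sum(age_first.values())
model_ss_b = sum(college_first.values())

# ...but how that amount is split between them depends on the order.
for label, fs in (("age first", f_age_first), ("college first", f_college_first)):
    print(label, {term: round(f, 2) for term, f in fs.items()})
print("model S.S.:", round(model_ss_a, 3), round(model_ss_b, 3))
```

In a sequential table, the first line measures a term on its own, while the second line measures what that term adds after the first is already in the model.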

9 Discussion…
What is the same and what is different across slides 7 and 8 above? Order of fitting seems to matter! What do the results mean? How do the F-probs from above and the t-probs below for model estimates compare?
-----------------------------
crimes  |     Coef.    P>|t|
--------+--------------------
age     |   1.30876    0.001
college | -6.448684    0.004
const.  |  2.324681    0.590
-----------------------------
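A point worth checking numerically: for a term with 1 d.f., the sequential F-ratio when that term is fitted last equals the square of its t-statistic, so the two probabilities agree. The sketch below illustrates this on small hypothetical data (not the crimes data), using only simple regressions: x2 and y are each first adjusted for x1, and the adjusted values are then related to each other.

```python
import math

# Hypothetical data with two correlated x-variables.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3, 4, 8, 7, 12, 11, 15, 16]
n = len(y)

def residuals_on(v, x):
    """Residuals from a simple regression (with intercept) of v on x."""
    v_bar, x_bar = sum(v) / len(v), sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (vi - v_bar) for xi, vi in zip(x, v)) / sxx
    return [(vi - v_bar) - b * (xi - x_bar) for xi, vi in zip(x, v)]

y_adj = residuals_on(y, x1)    # y with x1 removed
x2_adj = residuals_on(x2, x1)  # x2 with x1 removed

sxx_adj = sum(r * r for r in x2_adj)
b2 = sum(a * b for a, b in zip(x2_adj, y_adj)) / sxx_adj  # coefficient of x2 in the full model

rss_x1 = sum(r * r for r in y_adj)                                # residual S.S., x1 only
rss_full = sum((a - b2 * b) ** 2 for a, b in zip(y_adj, x2_adj))  # residual S.S., x1 and x2
resid_ms = rss_full / (n - 3)                                     # 3 parameters estimated

t_x2 = b2 / math.sqrt(resid_ms / sxx_adj)   # t-statistic for x2
f_x2_last = (rss_x1 - rss_full) / resid_ms  # sequential F for x2 fitted after x1

print(f"t^2 = {t_x2 ** 2:.4f}, F (x2 fitted last) = {f_x2_last:.4f}")
```

This is why the t-probabilities in the table of estimates match the F-probability of whichever term is fitted last, and not necessarily that of a term fitted first.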

10 Exercise: 2nd example: Q2, Pract. 6
Open penrain.dta from Q2 of the previous practical. Note down the anova results below from a regression of rain on elevation, then altitude.
Source      d.f.   S.S.   M.S.   F   Prob.
Elevation     1
Altitude      1
Residual     13
Total        15
Interpretation of F-probs:

11 Changing order of fitting:
Now fit altitude, then elevation. Note down the results below.
Source      d.f.   S.S.   M.S.   F   Prob.
Altitude      1
Elevation     1
Residual     13
Total        15
Interpretation of F-probs:

12 Model parameter estimates:
Finally, note down the parameter estimates and the corresponding t-probabilities:
Parameter   Estimate of model parameter   t-Prob.
Altitude
Elevation
Constant
Overall conclusions:

13 Adjusted sums of squares
Some software packages present adjusted sums of squares, combining the results from the anova tables in slides 10 and 11 into one single anova:
Source      df    Adj. SS    Adj. MS       F    Prob.
Altitude     1    2096.70    2096.70   18.81   0.0008
Elevation    1     165.32     165.32    1.48   0.2450
Residual    13    1449.37     111.49
Total       15   13669.29
Note that the sums of squares now do not add to the total S.S. What do the F-probabilities now represent?
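The non-additivity can be checked directly from the printed figures. The sketch below compares the sum of the adjusted sums of squares with the model sum of squares (total minus residual), first for the rain table on slide 13 and then for the crimes example, where each term's adjusted S.S. is taken to be its sequential S.S. when fitted last (slides 7 and 8). It is plain arithmetic on the slide values; the variable names are illustrative.

```python
# Rain example: values copied from the adjusted anova table on slide 13.
rain_total_ss, rain_resid_ss = 13669.29, 1449.37
rain_adj_ss = {"altitude": 2096.70, "elevation": 165.32}

# Crimes example: each term's S.S. when fitted last (slides 7 and 8),
# i.e. adjusted for the other term.
crimes_total_ss, crimes_resid_ss = 1572.311, 1216.248
crimes_adj_ss = {"age": 353.282, "college": 263.387}

def compare(total_ss, resid_ss, adj_ss):
    """Return (model S.S., sum of adjusted S.S.) for one anova table."""
    model_ss = total_ss - resid_ss
    return model_ss, sum(adj_ss.values())

rain_model_ss, rain_adj_sum = compare(rain_total_ss, rain_resid_ss, rain_adj_ss)
crimes_model_ss, crimes_adj_sum = compare(crimes_total_ss, crimes_resid_ss, crimes_adj_ss)

print(f"rain:   model S.S. = {rain_model_ss:.2f}, sum of adj. S.S. = {rain_adj_sum:.2f}")
print(f"crimes: model S.S. = {crimes_model_ss:.3f}, sum of adj. S.S. = {crimes_adj_sum:.3f}")
```

In the rain example the adjusted sums of squares add to far less than the model S.S., while in the crimes example they add to more, so adjusted sums of squares should not be read as a partition of the explained variation.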

14 Key Points
- Recognise the type of variable (y) being modelled. Methods discussed apply when y is quantitative.
- The explanatory variables (the xs) can be variables of any type, but so far we have only considered quantitative xs.
- Take care when interpreting anova F-probs to check whether the sums of squares are sequential or adjusted.
- Note that all t-probabilities (associated with the parameter estimates) are adjusted for all other terms in the model.

15 Practical work follows to ensure learning objectives are achieved…

