© Willett, Harvard University Graduate School of Education, 12/16/2015. S052/I.1(d).


Slide 1. S052/§I.1(d): Applied Data Analysis. Roadmap of the Course – What Is Today’s Topic Area?

More details can be found in the “Course Objectives and Content” handout on the course webpage.

Multiple Regression Analysis (MRA)
- Do your residuals meet the required assumptions?
  - Test for residual normality.
  - Use influence statistics to detect atypical datapoints.
- If your residuals are not independent, replace OLS by GLS regression analysis.
  - Use individual growth modeling.
  - Specify a multi-level model.
- If your sole predictor is continuous, MRA is identical to correlational analysis.
- If your sole predictor is dichotomous, MRA is identical to a t-test.
- If your several predictors are categorical, MRA is identical to ANOVA.
- If time is a predictor, you need discrete-time survival analysis…
- If your outcome is categorical, you need to use…
  - Binomial logistic regression analysis (dichotomous outcome).
  - Multinomial logistic regression analysis (polytomous outcome).
- If you have more predictors than you can deal with:
  - Create taxonomies of fitted models and compare them.
  - Form composites of the indicators of any common construct.
  - Conduct a Principal Components Analysis.
  - Use Cluster Analysis.
- If your outcome vs. predictor relationship is non-linear:
  - Use non-linear regression analysis.
  - Transform the outcome or predictor.
- How do you deal with missing data?

[Callout on the roadmap: Today’s Topic Area]

Slide 2. S052/§I.1(d): Applied Data Analysis. Printed Syllabus – What Is Today’s Topic?

Please check inter-connections among the Roadmap, the Daily Topic Area, the Printed Syllabus, and the content of today’s class when you pre-read the day’s materials.

Syllabus Section I.1(d), on Checking the Assumptions on the Residuals, includes:
- The Story So Far – What Have We Missed? (Slide 3).
- What Are The “Usual” Assumptions On The Residuals, And Why Are They Important? (Slides 4-5).
- How Can Residual Assumptions Be Checked Analytically? (Slide 6).
- Estimating Residual Diagnostics With PC-SAS (Slide 7).
- Testing for Residual Normality (Slides 8-11).
- Inspecting for Residual Homoscedasticity (Slides 12-15).
- Appendix 1: Library Of NPP Plots (Slide 16).

Slide 3. S052/§I.1(d): Checking Assumptions on the Residuals. The Story So Far – What Have We Ignored?

Is there any reason we might not trust the parameter estimates, statistical inference and goodness-of-fit statistics obtained in this “final” model?

Two Issues Have Gone Unexamined:
- Atypical data points may be present in the point-cloud and driving the findings.
  - Need to check this.
  - Make sure all is well.
- Need to check that the usual regression assumptions are met!

Unfortunately, the Two Issues Are Intimately Linked:
- You always have to worry about violations of the residual assumptions.
- But, if atypical data-points are present, particularly with large PRESS residuals, then it is even more likely that the residual assumptions will be violated.

And, if you violate the assumptions, you can’t trust the findings!

Slide 4. S052/§I.1(d): Checking Assumptions on the Residuals. CAUTION: More Than The Residual Assumptions Are Critical in OLS Regression Analysis

It’s important to remember that two other conditions are assumed to hold in OLS regression analysis, before you even begin to worry about the assumptions on the residuals. These are:

Assumption: Predictor Infallibility
  What does the assumption require? All predictors must be perfectly reliable, that is, measured without error.
  How does failure of the assumption affect OLS regression analysis? If predictors are fallible, OLS-estimated parameters will be biased, and you will get the wrong answer to your RQs.

Assumption: Linearity of Outcome/Predictor Relationship
  What does the assumption require? The population bivariate relationship between the outcome and each predictor must be linear.
  How does failure of the assumption affect OLS regression analysis? If the relationship is not linear, then it will be misrepresented by the OLS linear regression analysis, and the fundamental underpinnings of the entire analysis are at risk:
  - OLS-estimated regression slopes will not represent the population relationship.
  - Residuals will be mis-estimated.
  - Assumptions on the residuals will be violated.
  - Statistical inference will be incorrect.

We deal with the linearity assumption in more detail later in the course. For now, you should handle the predictor infallibility assumption by making sure your measures are always as reliable as possible!
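To make the predictor infallibility point on Slide 4 concrete, here is a minimal simulation sketch, written in Python rather than the course's PC-SAS. Everything in it is an assumption made for illustration: the one-predictor model, the true slope of 0.5, the error standard deviations, the sample size and the seed. It fits the same outcome once on an error-free predictor and once on a copy of that predictor with random measurement error added.

```python
import random


def ols_slope(x, y):
    """Slope of the simple OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate(n=5000, true_slope=0.5, error_sd=1.0, seed=52):
    """Return (slope with the error-free predictor, slope with the fallible one).

    All settings are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    x_true = [rng.gauss(0, 1) for _ in range(n)]
    y = [2 + true_slope * xi + rng.gauss(0, 1) for xi in x_true]
    # Fallible version of the predictor: true score plus random measurement error.
    x_observed = [xi + rng.gauss(0, error_sd) for xi in x_true]
    return ols_slope(x_true, y), ols_slope(x_observed, y)


slope_true_x, slope_fallible_x = simulate()
print(round(slope_true_x, 3), round(slope_fallible_x, 3))
```

With these settings the fallible predictor has reliability 1 / (1 + 1) = 0.5, so under this simple one-predictor setup the fitted slope should sit near half the true slope rather than near 0.5. With several predictors the direction of the bias is harder to predict.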

Slide 5. S052/§I.1(d): Checking Assumptions on the Residuals. What Assumptions Are Made on the Residuals in OLS Regression Analysis?

Once you are sure that the linearity and predictor infallibility conditions have been met, then you should worry about the assumptions on the residuals, in the following order of priority:

Assumption: Residual Normality
  What does the assumption require? Residuals must be normally distributed.
  How does failure of the assumption affect OLS regression analysis? If residuals are not normally distributed, OLS-estimation of the critical values of test statistics and of the p-values will be wrong. Statistical inference will be incorrect.

Assumption: Residuals uncorrelated with predictors
  What does the assumption require? Residuals must be uncorrelated with predictors.
  How does failure of the assumption affect OLS regression analysis? If residuals are correlated with predictors, then OLS-estimates of parameters will be biased, and you will obtain the wrong answer to your RQs.

Assumption: Residual Independence
  What does the assumption require? Residuals must be independent from case to case in the sample.
  How does failure of the assumption affect OLS regression analysis? If residuals are correlated with each other, then OLS-estimated standard errors will be too small. So, t-statistics will be inflated, and null hypotheses rejected more frequently than is correct.

Assumption: Residual Homoscedasticity
  What does the assumption require? Residual variance must be identical at each level of every predictor.
  How does failure of the assumption affect OLS regression analysis? If residuals are heteroscedastic, OLS-estimation will be inefficient, and standard errors will be incorrect. Consequently, t-statistics and statistical inference will be flawed.

Slide 6. S052/§I.1(d): Checking Assumptions on the Residuals. How Do You Check Assumptions On The Residuals in OLS Regression Analysis?

Resolving problems caused by correlations between the predictors and the residuals is a topic in its own right. Register for the S290 course, Spring 2010.

Assumption: Residual Normality
  How can it be checked?
  - Can check visually by inspecting the distribution of the standardized residuals, and by inspecting a Normal Probability Plot (NPP).
  - Can conduct a test of residual normality.
  What do you look for?
  - Need to check that about 95% of standardized residuals fall within ±2 of zero.
  - NPP is coming later today! SW Test is coming later today!

Assumption: Residual Independence
  How can it be checked? You can’t check the residual independence assumption empirically, because it’s built into the data by the nature of the population, the research design and the sampling process.
  What do you look for? You can assume residual independence if you have sampled randomly from a non-clustered population. If folks in the population are grouped naturally within larger units, use multilevel modeling, not MRA.

Assumption: Residual Homoscedasticity
  How can it be checked?
  - You can check visually by plotting raw residuals vs. predictors, and raw residuals vs. predicted values.
  - There are tests of homoscedasticity, but they are very sensitive to violations of the residual normality assumption.
  What do you look for?
  - Need to check that the vertical spread of residuals is approximately equal at every predictor value and at every predicted value.
  - This is a veritable Catch-22: suggest you avoid formal tests of homoscedasticity.
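The "about 95% within ±2" check from Slide 6 is easy to automate. The sketch below is in Python, not the course's PC-SAS, and the function name and the ten residual values are hypothetical, made up only to show the counting.

```python
def share_within(std_residuals, cutoff=2.0):
    """Proportion of standardized residuals whose absolute value is <= cutoff."""
    if not std_residuals:
        raise ValueError("need at least one residual")
    inside = sum(1 for r in std_residuals if abs(r) <= cutoff)
    return inside / len(std_residuals)


# Hypothetical standardized residuals, for illustration only.
example = [-2.4, -1.1, -0.7, -0.3, 0.0, 0.2, 0.5, 0.9, 1.3, 2.1]
share = share_within(example)
print(share)
```

In a real analysis you would pass in the standardized PRESS residuals from the fitted model. A share well below 0.95 suggests tails that are heavier than a normal distribution would give, but in a small sample the share is coarse, so treat it as a prompt to inspect the plots and not as a test.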

Slide 7. S052/§I.1(d): Checking Assumptions on the Residuals. Exploratory Inspection of Simple Residual Diagnostics

You already know the “standard practices”; let’s embellish them a little. From Data-Analytic Handout I.2(d).1:

    PROC REG DATA=ILLCAUSE;
      VAR ILLCAUSE ILL AGE SES;
      M6: MODEL ILLCAUSE = ILL AGE ILLxAGE SES;
      * Output influence stats into temporary SAS dataset for diagnosis;
      OUTPUT OUT=DIAGNOSE PREDICTED=PRED PRESS=PRESS RSTUDENT=STDPRESS;
    RUN;

    *--------------------------------------------------------------------*
      Checking the assumptions on the PRESS residuals
     *--------------------------------------------------------------------*;

    * Checking the assumption of residual homoscedasticity;
    * Working with the raw PRESS residuals;
    PROC PLOT DATA=DIAGNOSE;
      PLOT PRESS*(PRED ILL AGE SES) = '+';
    RUN;

    * Checking the assumption of residual normality;
    * By conducting a formal test, working with the raw PRESS residuals;
    PROC UNIVARIATE PLOT NORMAL DATA=DIAGNOSE;
      VAR PRESS;
    RUN;

    * By inspection, working with the standardized PRESS residuals;
    * Count the number of cases with extreme values of STDPRESS;
    PROC UNIVARIATE NEXTROBS=20 DATA=DIAGNOSE;
      VAR STDPRESS;
    RUN;

    * Plot STDPRESS vs. predicted values and predictors, for inspection;
    PROC PLOT DATA=DIAGNOSE;
      PLOT STDPRESS*(PRED ILL AGE SES) = '+' / VREF=2,-2;
    RUN;

Annotations:
- OUTPUT OUT=DIAGNOSE: create a new (temporary) diagnostic dataset, called DIAGNOSE, and OUTput the selected statistics into it.
- PREDICTED=PRED: put PREDICTED values into the DIAGNOSE dataset, and label them PRED.
- PRESS=PRESS RSTUDENT=STDPRESS: output the raw and standardized PRESS residuals into the DIAGNOSE dataset, for inspection and analysis.
- PLOT PRESS*(PRED ILL AGE SES): plot the unstandardized PRESS residuals vs. predicted values and predictors to check residual homoscedasticity.
- PROC UNIVARIATE ... VAR PRESS: obtain univariate descriptive statistics on the PRESS residuals, to check residual normality. The NORMAL option provides tests of residual normality; the PLOT option provides a normal probability plot.
- PLOT STDPRESS*(PRED ILL AGE SES) with VREF=2,-2: plot the standardized PRESS residuals vs. predicted values and predictors to check for residual normality using the ±2 rule.
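To show what the PRESS= and RSTUDENT= keywords on Slide 7 are asking for, here is a minimal Python sketch (not the course's PC-SAS) for the one-predictor case. It uses the standard leverage shortcut: the raw PRESS residual is the ordinary residual divided by (1 - leverage), and the standardized version divides the ordinary residual by the residual standard deviation estimated with that case deleted, times the square root of (1 - leverage). The function name and the six data points are hypothetical, and the sketch does not reproduce model M6, which has several predictors.

```python
import math


def press_diagnostics(x, y):
    """Raw and standardized PRESS residuals for a one-predictor OLS fit.

    Returns a list of (predicted, press, rstudent) tuples, one per case.
    """
    n = len(x)
    p = 2  # parameters estimated: intercept and slope
    if n <= p + 1:
        raise ValueError("need at least four cases")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * xi for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]
    sse = sum(e * e for e in residuals)
    rows = []
    for xi, pred, e in zip(x, predicted, residuals):
        leverage = 1 / n + (xi - mean_x) ** 2 / sxx
        # Raw PRESS residual: prediction error for a case left out of the fit.
        press = e / (1 - leverage)
        # Residual variance re-estimated with this case deleted.
        deleted_var = (sse - e * e / (1 - leverage)) / (n - p - 1)
        rstudent = e / math.sqrt(deleted_var * (1 - leverage))
        rows.append((pred, press, rstudent))
    return rows


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 7.5]
diagnostics = press_diagnostics(x, y)
for pred, press, rstudent in diagnostics:
    print(round(pred, 3), round(press, 3), round(rstudent, 3))
```

The shortcut avoids refitting the model once per case; refitting with each case deleted in turn and predicting the deleted case gives the same raw PRESS residuals.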

Slide 8. S052/§I.1(d): Checking Assumptions on the Residuals. Evidence of Normality? Examining the Stem-Leaf Plot of Raw PRESS Residuals.

Are the unstandardized PRESS residuals normally distributed?

Slide 9. S052/§I.1(d): Checking Assumptions on the Residuals. Evidence of Normality? Plot of Standardized PRESS Residuals vs. Predicted Values.

Are the standardized PRESS residuals normally distributed?

Slide 10. S052/§I.1(d): Checking Assumptions on the Residuals. Evidence of Normality? Examining the Normal Probability Plot of Raw PRESS Residuals.

Data-Analytic Handout I.2(d).1 provides a Normal Probability Plot (NPP) for better diagnosis of residual normality.

[Figure: Normal Probability Plot of the raw PRESS residuals, showing the “data” (asterisks) and a “reference” line.]

- Vertical axis: actual values of the unstandardized PRESS residuals.
- Horizontal axis: “normalized” values of the unstandardized PRESS residuals, that is, the values they would have if they actually were drawn from a normal distribution.

Decision Rule: If the “data” (asterisks) fall on the reference line, then the PRESS residuals are normally distributed. This looks pretty good!
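The two axes of the NPP on Slide 10 can be built by hand. The Python sketch below (not the course's PC-SAS) sorts the residuals and pairs each one with the standard-normal quantile for its rank. The plotting positions (i - 0.375) / (n + 0.25) are Blom's, which is an assumption here: the transcript does not say which positions the handout's plot uses, and other choices give slightly different scores. The five residual values are hypothetical.

```python
from statistics import NormalDist


def normal_scores(values):
    """Pair each sorted value with its expected standard-normal quantile.

    Returns a list of (normal_score, actual_value) tuples, ready to plot with
    the score on the horizontal axis and the actual value on the vertical axis.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    standard_normal = NormalDist()
    ordered = sorted(values)
    return [
        (standard_normal.inv_cdf((rank - 0.375) / (n + 0.25)), value)
        for rank, value in enumerate(ordered, start=1)
    ]


# Hypothetical raw PRESS residuals, for illustration only.
points = normal_scores([0.4, -1.2, 0.1, 2.3, -0.5])
for score, value in points:
    print(round(score, 3), value)
```

If the residuals are close to normal, the printed pairs fall near a straight line; curvature at either end is what the decision rule on the slide asks you to look for.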

Slide 11. S052/§I.1(d): Checking Assumptions on the Residuals. Evidence of Normality? Tests of Normality on the Raw PRESS Residuals.

Data-Analytic Handout I.2(d).1 also provides several tests of Residual Normality:

    Tests for Normality
    Test                  --Statistic--    ---p Value---
    Shapiro-Wilk          W     0.989      Pr < W   0.124
    Kolmogorov-Smirnov    D     0.065      Pr > D   0.045

Shapiro-Wilk (SW) Test: if the PRESS residuals were indeed normally distributed, then their actual and normalized values must be very highly correlated. The Shapiro-Wilk “W” statistic is essentially the square of their correlation!
- H0: PRESS residuals are normally distributed, in the population.
- W = 0.989, p = 0.124.
- p > .05, so do not reject H0. The PRESS residuals are consistent with a normal distribution.

Kolmogorov-Smirnov Test: the SW Test is optimal in small samples, but is replaced by the Kolmogorov-Smirnov Test in large samples.
- Same H0, p-value to test, as usual.
- Notice the disagreement with the SW Test!
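The "square of their correlation" idea on Slide 11 can be sketched directly. The Python function below (not the course's PC-SAS) squares the correlation between the sorted residuals and their normal scores. Strictly, this is the Shapiro-Francia form of the statistic, a close relative of Shapiro-Wilk's W that weights the ordered values more simply, so it will not reproduce the W printed by PROC UNIVARIATE exactly, and it gives no p-value. The Blom plotting positions and both example datasets are assumptions made for illustration.

```python
from statistics import NormalDist


def squared_npp_correlation(values):
    """Squared correlation between sorted values and their normal scores."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    standard_normal = NormalDist()
    ordered = sorted(values)
    # Blom plotting positions; an assumption, other conventions exist.
    scores = [standard_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mean_v = sum(ordered) / n
    mean_s = sum(scores) / n
    cross = sum((v - mean_v) * (s - mean_s) for v, s in zip(ordered, scores))
    ss_v = sum((v - mean_v) ** 2 for v in ordered)
    ss_s = sum((s - mean_s) ** 2 for s in scores)
    if ss_v == 0:
        raise ValueError("values are constant")
    return cross * cross / (ss_v * ss_s)


# Hypothetical residuals: one roughly symmetric set, one with a long upper tail.
symmetric = [-1.6, -1.0, -0.6, -0.4, -0.1, 0.1, 0.4, 0.7, 1.0, 1.5]
long_upper_tail = [1, 1, 1, 1, 2, 2, 3, 5, 9, 20]
print(round(squared_npp_correlation(symmetric), 3))
print(round(squared_npp_correlation(long_upper_tail), 3))
```

Values near 1 mean the points on the NPP hug a straight line; the long-tailed set scores noticeably lower.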

Slide 12. S052/§I.1(d): Checking Assumptions on the Residuals. Evidence of Heteroscedasticity? Plot of Raw PRESS Residuals vs. Predicted Values.

Does the assumption of residual homoscedasticity hold in Model M6, versus the predicted values?

Slide 13. S052/§I.1(d): Checking Assumptions on the Residuals. Evidence of Heteroscedasticity? Plot of PRESS Residuals vs. Predictor ILL.

Does the assumption of residual homoscedasticity hold in Model M6, versus predictor ILL?

Slide 14. S052/§I.1(d): Checking Assumptions on the Residuals. Evidence of Heteroscedasticity? Plot of PRESS Residuals vs. Predictor AGE.

Does the assumption of residual homoscedasticity hold in Model M6, versus predictor AGE?

Slide 15. S052/§I.1(d): Checking Assumptions on the Residuals. Evidence of Heteroscedasticity? Plot of PRESS Residuals vs. Predictor SES.

Does the assumption of residual homoscedasticity hold in Model M6, versus predictor SES?
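The plots on Slides 12-15 ask whether the vertical spread of the residuals stays roughly constant. As a rough numeric companion to that visual check (not a formal test, and not part of the course's PC-SAS handout), the Python sketch below sorts cases by predicted value, splits them into equal-count bands, and reports the standard deviation of the residuals in each band. The function name, the choice of three bands and the fan-shaped example data are all hypothetical.

```python
from statistics import pstdev


def spread_by_band(predicted, residuals, bands=3):
    """Standard deviation of residuals within equal-count bands of predicted values."""
    if len(predicted) != len(residuals):
        raise ValueError("predicted and residuals must have the same length")
    n = len(predicted)
    if n < 2 * bands:
        raise ValueError("need at least two cases per band")
    pairs = sorted(zip(predicted, residuals))
    spreads = []
    for b in range(bands):
        start = b * n // bands
        stop = (b + 1) * n // bands
        spreads.append(pstdev([r for _, r in pairs[start:stop]]))
    return spreads


# Hypothetical data in which the residual spread grows with the predicted value.
predicted = list(range(1, 13))
residuals = [0.1 * p * (1 if i % 2 == 0 else -1) for i, p in enumerate(predicted)]
spreads = spread_by_band(predicted, residuals)
print([round(s, 3) for s in spreads])
```

Band spreads that climb or fall steadily point to the fan shape you would see in the plot. The same function works with a predictor in place of the predicted values. There is no agreed cutoff for "too different", so the plot remains the main evidence.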

Slide 16. S052/§I.1(d): Checking Assumptions on the Residuals. Appendix 1: Library of Sample NPP Plots. Supplement With Your Own.

Each panel plots Raw PRESS (vertical axis) against Normed PRESS (horizontal axis); + = reference line, * = “data”.

- Panel 1: Raw PRESS residuals in the upper tail are pushed too far up. This means that there is positive skewness and a long upper tail.
- Panel 2: Raw PRESS residuals in the lower tail are pushed too far down. This means that there is negative skewness and a long lower tail.
- Panel 3: Raw PRESS residuals in both the upper and lower tails are pushed too far out. Tails are too thick, center is too thin and pointed (positive kurtosis).
- Panel 4: Raw PRESS residuals in both the upper and lower tails are pulled in. Tails are unpopulated, center is too thick and flat (negative kurtosis).
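The four NPP patterns in the appendix correspond to the sign of the skewness and of the excess kurtosis of the residuals. The Python sketch below (not the course's PC-SAS) computes both from simple moments. These are the plain moment estimators, which is an assumption: statistical packages usually apply small-sample adjustments, so their printed values will differ a little. The example datasets are hypothetical.

```python
def shape_summary(values):
    """Return (skewness, excess kurtosis) using simple moment estimators."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("values are constant")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skewness, excess_kurtosis


# Hypothetical residuals, for illustration only.
long_upper_tail = [1, 1, 1, 2, 2, 3, 5, 9, 20]
flat_and_even = [1, 2, 3, 4, 5]
print(shape_summary(long_upper_tail))
print(shape_summary(flat_and_even))
```

Positive skewness matches Panel 1, negative skewness Panel 2, positive excess kurtosis Panel 3 and negative excess kurtosis Panel 4.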

