Diagnostics and Remedial Measures


Diagnostics and Remedial Measures KNNL – Chapter 3

Diagnostics for Predictor Variables

Problems can occur when:
- Outliers exist among the X levels
- X levels are associated with run order when the experiment is run sequentially

Useful plots of the X levels:
- Dot plot (for discrete data)
- Histogram
- Box plot
- Sequence plot (X versus run #)

Residuals

The residuals are the n differences e_i = Y_i − Ŷ_i, where Y_i is an observation and Ŷ_i the corresponding fitted value; they serve as estimates of the unobserved error terms.

Model Departures Detected With Residuals and Plots

Departures:
- Relation between Y and X is not linear
- Errors have non-constant variance
- Errors are not uncorrelated
- Existence of outlying observations
- Non-normal errors
- Missing predictor variable(s)

Common plots:
- Residuals/absolute residuals versus predictor variable
- Residuals/absolute residuals versus predicted values
- Residuals versus omitted variables
- Residuals versus time
- Box plots, histograms, normal probability plots
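All of these plots start from the same fitted quantities. As a minimal pure-Python sketch (the data below are made up for illustration), an ordinary least-squares fit always yields residuals that sum to zero and are uncorrelated with the predictor, so any visible pattern in a residual plot reflects a model departure, not the fitting method:

```python
# Fit a simple linear regression by least squares and verify two
# algebraic properties of its residuals: sum(e_i) = 0 and sum(x_i * e_i) = 0.

def fit_slr(x, y):
    """Return (b0, b1), the least-squares intercept and slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Made-up data, roughly y = 2x with small deviations.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

b0, b1 = fit_slr(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sum_e = sum(resid)                                  # ~0 by the normal equations
sum_xe = sum(xi * ei for xi, ei in zip(x, resid))   # ~0 by the normal equations
```

Because these two sums are forced to zero by the estimation itself, the diagnostic content of a residual plot lies entirely in its shape, not its level or tilt.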

Detecting Nonlinearity of Regression Function

Plot residuals versus X:
- Random cloud around 0 → linear relation
- U-shape or inverted U-shape → nonlinear relation

Maps Distribution Example (Table 3.1, Figure 3.5, p. 10)
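The U-shape can be checked numerically as well as visually. In this sketch (made-up quadratic data, not the Table 3.1 example), a straight-line fit to y = x² leaves residuals that are negative in the middle and positive at the extremes, so they correlate almost perfectly with the squared, centered predictor:

```python
import math

def fit_slr(x, y):
    """Least-squares intercept and slope for y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    den = math.sqrt(sum((ai - ma) ** 2 for ai in a) *
                    sum((bi - mb) ** 2 for bi in b))
    return num / den

# Truly quadratic data fit with a straight line.
x = [i / 10 for i in range(-20, 21)]
y = [xi ** 2 for xi in x]

b0, b1 = fit_slr(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Residuals trace a U-shape: near-perfect correlation with the squared predictor.
u_corr = corr(resid, [xi ** 2 for xi in x])
```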

Non-Constant Error Variance / Outliers / Non-Independence

Plot residuals versus X or predicted values:
- Random cloud around 0 → linear relation
- Funnel shape → non-constant variance
- Outliers fall far above (positive) or far below (negative) the general cloud pattern

Plot absolute residuals, squared residuals, or the square root of absolute residuals:
- Positive association → non-constant variance

Measurements made over time: plot residuals versus time order (expect a random cloud if independent):
- Linear trend → process "improving" or "worsening" over time
- Cyclical trend → measurements close in time are similar
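The funnel pattern also shows up numerically as a positive association between the absolute residuals and the predictor. A hedged sketch on simulated data (the model and error structure below are invented for illustration):

```python
import math
import random

def fit_slr(x, y):
    """Least-squares intercept and slope for y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    den = math.sqrt(sum((ai - ma) ** 2 for ai in a) *
                    sum((bi - mb) ** 2 for bi in b))
    return num / den

random.seed(0)  # reproducible simulation

# Simulated data whose error standard deviation grows with x:
# the classic "funnel" shape in a residual plot.
n = 300
x = [1 + 9 * i / (n - 1) for i in range(n)]
y = [2 + 3 * xi + random.gauss(0, 0.5 * xi) for xi in x]

b0, b1 = fit_slr(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Positive association between |residual| and x flags non-constant variance.
funnel_corr = corr([abs(e) for e in resid], x)
```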

Homoscedasticity?

Non-Normal Errors

- Box plot of residuals – can confirm symmetry and lack of outliers
- Check the proportion of residuals that lie within 1 standard deviation of 0, 2 SD, etc., where SD = sqrt(MSE)
- Normal probability plot of residuals versus their expected values under normality – should fall approximately on a straight line (only works well with moderate to large samples); qqnorm(e); qqline(e) in R
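The proportion check can be sketched directly in pure Python (simulated data with genuinely normal errors, so the empirical proportions should sit near the theoretical 68% and 95%):

```python
import math
import random

def fit_slr(x, y):
    """Least-squares intercept and slope for y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

random.seed(1)  # reproducible simulation

# Simulated regression with N(0, 2^2) errors.
n = 500
x = [i / 10 for i in range(n)]
y = [5 + 0.4 * xi + random.gauss(0, 2) for xi in x]

b0, b1 = fit_slr(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sse = sum(e ** 2 for e in resid)
s = math.sqrt(sse / (n - 2))  # SD = sqrt(MSE)

# Under normality, expect roughly 68% within 1 SD of 0 and 95% within 2 SD.
p1 = sum(abs(e) <= s for e in resid) / n
p2 = sum(abs(e) <= 2 * s for e in resid) / n
```

A heavy-tailed error distribution would show up as p1 and p2 drifting well away from 0.68 and 0.95.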

Normal probability plots

Problems with normality

Omission of Important Predictors

If data are available on other variables, plots of residuals versus X, drawn separately for each level (or sub-range) of a new variable, can suggest adding that variable as a predictor. For instance, after fitting a regression of salary on experience, we could view residuals versus experience for males and females separately; if one group tends to have positive residuals and the other negative, gender should be added to the model.
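This salary example can be made concrete with a deterministic sketch (the salary numbers are invented): one group's salaries are shifted up by a constant at every experience level, the experience-only fit splits the difference, and the group means of the residuals land on opposite sides of zero:

```python
def fit_slr(x, y):
    """Least-squares intercept and slope for y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Invented salaries: group A is paid a flat 10 more at every experience level.
exper = list(range(20))
salary_a = [30 + 2 * e + 10 for e in exper]
salary_b = [30 + 2 * e for e in exper]

x = exper + exper
y = salary_a + salary_b

b0, b1 = fit_slr(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Residuals split cleanly on the omitted group variable.
mean_a = sum(resid[:20]) / 20   # +5: group A sits above the pooled fit
mean_b = sum(resid[20:]) / 20   # -5: group B sits below it
```

Adding the group indicator to the model would absorb exactly this ±5 offset.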

Omission of important predictors

Equal (Homogeneous) Variance - I

Equal (Homogeneous) Variance - II

Linearity of Regression

Remedial Measures

- Nonlinear relation – add polynomial terms, fit an exponential regression function, or transform Y and/or X
- Non-constant variance – weighted least squares, transform Y and/or X, or fit a generalized linear model
- Non-independence of errors – transform Y or use generalized least squares
- Non-normality of errors – Box-Cox transformation, or fit a generalized linear model
- Omitted predictors – include the important predictors in a multiple regression model
- Outlying observations – robust estimation

Transformations for Non-Linearity – Constant Variance

- X' = 1/X
- X' = e^(−X)
- X' = X²
- X' = e^X
- X' = √X
- X' = ln(X)
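A small deterministic sketch of one of these predictor transformations (the logarithmic data are made up): if the true relation is Y = 3 + 2·ln(X), regressing Y on X' = ln(X) recovers an exactly linear fit:

```python
import math

def fit_slr(x, y):
    """Least-squares intercept and slope for y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Curved in x, exactly linear in ln(x).
x = list(range(1, 21))
y = [3 + 2 * math.log(xi) for xi in x]

xt = [math.log(xi) for xi in x]  # X' = ln(X)
b0, b1 = fit_slr(xt, y)
```

Transforming only X leaves the error distribution of Y untouched, which is why this family is reserved for the constant-variance case.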

Transformations for Non-Linearity – Non-Constant Variance

- Y' = 1/Y
- Y' = √Y
- Y' = ln(Y)
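When the relation is exponential, Y' = ln(Y) both straightens the curve and (for multiplicative errors) stabilizes the variance. A deterministic sketch on invented exponential data:

```python
import math

def fit_slr(x, y):
    """Least-squares intercept and slope for y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Exponential growth: Y = exp(1 + 0.5 x), so ln(Y) = 1 + 0.5 x exactly.
x = [i / 2 for i in range(1, 21)]
y = [math.exp(1 + 0.5 * xi) for xi in x]

yt = [math.log(yi) for yi in y]  # Y' = ln(Y)
b0, b1 = fit_slr(x, yt)
```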

Box-Cox Transformations

- Automatically selects a transformation from the power family with the goal of obtaining normality, linearity, and constant variance (not always successful, but widely used)
- Goal: fit the model W = b0 + b1·X + e for various power transformations of Y, selecting the transformation that produces the minimum SSE (equivalently, the maximum likelihood)
- Procedure: over a range of λ from, say, −2 to +2, obtain the standardized transformed values W_i and regress W_i on X (assuming all Y_i > 0; adding a constant to Y won't affect the shape or spread of its distribution)
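The grid search can be sketched as follows, using the standardized form W_i = K1·(Y_i^λ − 1) with K1 = 1/(λ·K2^(λ−1)), K2 the geometric mean of Y, and W_i = K2·ln(Y_i) at λ = 0, which makes the SSEs comparable across λ. The data are invented so that √Y is exactly linear in X, so the search should land on λ = 0.5:

```python
import math

def sse_slr(x, y):
    """SSE from the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

def box_cox_w(y, lam):
    """Standardized Box-Cox transform, comparable across lambda."""
    k2 = math.exp(sum(math.log(yi) for yi in y) / len(y))  # geometric mean
    if lam == 0:
        return [k2 * math.log(yi) for yi in y]
    k1 = 1 / (lam * k2 ** (lam - 1))
    return [k1 * (yi ** lam - 1) for yi in y]

# Invented data: sqrt(Y) is exactly linear in X, so lambda = 0.5 is ideal.
x = list(range(1, 21))
y = [(2 + 0.3 * xi) ** 2 for xi in x]

lambdas = [l / 2 for l in range(-4, 5)]  # -2, -1.5, ..., 2
best_lambda = min(lambdas, key=lambda lam: sse_slr(x, box_cox_w(y, lam)))
```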

Lowess (Smoothed) Plots

- Nonparametric method of obtaining a smooth plot of the regression relation between Y and X
- Fits a regression within a small neighborhood around each point along the X axis
- Weights observations closer to the specific point more heavily than more distant points
- Re-weights after fitting, putting lower weights on observations with larger residuals (in absolute value)
- Obtains a fitted value for each point after the "final" regression is fit
- The lowess curve is plotted along with the linear fit and its confidence bands; the linear fit is adequate if the lowess curve lies within the bands
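The local-fitting idea can be sketched in a single pass (tricube weights, a local straight line at each point; the robustness reweighting step is omitted for brevity). On data that are exactly linear, every local weighted fit reproduces the same line, so the smooth coincides with the linear fit:

```python
def lowess_fit(x, y, frac=0.5):
    """One pass of locally weighted linear regression with tricube weights."""
    n = len(x)
    k = max(2, round(frac * n))  # neighborhood size
    fitted = []
    for x0 in x:
        # Bandwidth: distance to the k-th nearest neighbor of x0.
        d = sorted(abs(xi - x0) for xi in x)[k - 1]
        w = [(1 - (abs(xi - x0) / d) ** 3) ** 3 if abs(xi - x0) < d else 0.0
             for xi in x]
        # Weighted least-squares line within the neighborhood.
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        b1 = sxy / sxx if sxx > 0 else 0.0
        b0 = my - b1 * mx
        fitted.append(b0 + b1 * x0)
    return fitted

# Exactly linear made-up data: the smooth should reproduce the line.
x = [float(i) for i in range(20)]
y = [1 + 0.7 * xi for xi in x]
smooth = lowess_fit(x, y)

max_gap = max(abs(si - yi) for si, yi in zip(smooth, y))
```

On curved data, the same function would bend with the local trend, which is exactly what the comparison against the straight-line fit and its bands exploits.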