Regression Assumptions

Presentation transcript:

Regression Assumptions Farrokh Alemi, Ph.D. This brief presentation, organized by Dr. Alemi, reviews the assumptions of ordinary regression.

Regression Assumption 1: Normal Distribution of Errors A key assumption of regression is that the errors have a normal distribution. This assumption does not always hold.

Normal Distribution of Errors Ordinary regression assumes that the error term is normally distributed. This can be checked visually with a "normal probability plot" or "normal quantile plot" of the residuals. In these plots, quantiles of the observed residuals are plotted against quantiles of a standard normal distribution. Here we see an example. A normal distribution is symmetric; in contrast, this plot shows a long asymmetric tail in the density of the residuals. If the errors were normally distributed, we would see a symmetric density function. The Q-Q plot to the right also shows a radical departure from normality: a quick look shows that the observed quantiles do not fall where the normal-distribution quantiles are expected. Clearly, the assumption of normally distributed errors is not reasonable here.
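A minimal sketch of this check, not from the slides: the Python code below simulates data with skewed errors, fits a line, and then inspects the residuals with a histogram and a normal Q-Q plot. The data and variable names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 + 1.5 * x + rng.exponential(scale=2.0, size=200)  # skewed (non-normal) errors

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(residuals, bins=30)                      # a long asymmetric tail is a warning sign
ax1.set_title("Density of residuals")
stats.probplot(residuals, dist="norm", plot=ax2)  # observed vs. standard normal quantiles
ax2.set_title("Normal Q-Q plot")
plt.show()
```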

Regression Assumption 2: Independent Observations A key assumption of regression is that each observation is independent of the others. This assumption does not always hold.

Independent Observations An autocorrelation plot of the observations can reveal dependence among them. The Y-axis shows the correlation and the X-axis shows the lag in the data. A lag of 2 means that observations two time periods apart are correlated with each other. In this graph, observations at lags 1 through 30 have relatively small correlations with each other, so it may be reasonable to assume that the observations are independent.
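A minimal sketch of producing such a plot, assuming residuals from an already-fitted model; the simulated residuals here are purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(1)
residuals = rng.normal(size=300)  # stand-in for residuals from a fitted regression

# Bars that stay inside the confidence band at lags 1 through 30 suggest
# that the independence assumption is reasonable.
plot_acf(residuals, lags=30)
plt.show()
```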

Regression Assumption 3: Homoscedasticity Regression assumes that the standard deviation of the errors does not change across the values of the independent variables. Violation of the homoscedasticity assumption is called heteroscedasticity; it refers to the situation where the standard deviation of the errors changes over time or across the independent variables.

Heteroscedasticity Heteroscedasticity can be detected by plotting residuals over time or over any of the independent variables. If the dispersion of the residuals is increasing or decreasing, then the assumption may be violated. The figure shows a situation where the residuals become larger as the value of the independent variable increases. The plot of residuals over time suggests that the variation in the residuals is changing over time: the model is becoming less accurate (bigger residuals) over time. The Q-Q plot also shows a violation of the normality assumption. The variance of the error terms has increased over time.
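The sketch below, assuming simulated data whose error spread grows with X, shows a residuals-versus-fitted plot and a Breusch-Pagan test as one way to detect heteroscedasticity; it is illustrative rather than the presenter's method.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)
x = np.linspace(1, 10, 200)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5 * x)  # residual spread increases with x

model = sm.OLS(y, sm.add_constant(x)).fit()

plt.scatter(model.fittedvalues, model.resid)   # a fan shape suggests heteroscedasticity
plt.axhline(0, color="gray")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()

# Breusch-Pagan test: a small p-value is evidence of heteroscedasticity.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, model.model.exog)
print("Breusch-Pagan p-value:", lm_pvalue)
```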

Regression Assumption 4: Model Form is Correct A key assumption of regression is that the model form is correct and only the parameters of the model need to be estimated from the data. This assumption does not always hold.

Is Model Form Correct? If the assumption of a linear relationship between the model and the dependent variable is correct, then you would expect to see a linear relationship between Y and the model predictions.

Is Model Form Correct? This is not always the case. Here we see a non-linear relationship between the regression predictions and the dependent variable.

Is Model Form Correct? Here again we see a non-linear relationship; in this case the true relationship is exponential.

Is Model Form Correct? Here again we see another non-linear relationship. The shape of diagnostic plots can tell us whether the linearity assumption is met. The easiest way to see this is with X-Y plots of the observed and predicted values; other diagnostic plots, such as the Q-Q plot, may also be necessary to see violations of the linearity assumption.
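As a minimal sketch with made-up data whose true relationship is exponential, the code below plots observed Y against the linear model's predictions; points that bend away from the 45-degree line indicate a misspecified form.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = np.linspace(0.5, 5, 100)
y = np.exp(x) + rng.normal(scale=5.0, size=100)  # true relationship is exponential

slope, intercept = np.polyfit(x, y, 1)
predicted = intercept + slope * x

plt.scatter(predicted, y)
lims = [min(predicted.min(), y.min()), max(predicted.max(), y.max())]
plt.plot(lims, lims, color="gray")  # points should hug this line if the linear form is right
plt.xlabel("Linear model prediction")
plt.ylabel("Observed Y")
plt.show()
```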

If Non-Linear, Transform Data before Regression If the relationship between the dependent and independent variables is not linear, then the data should be transformed before fitting the regression.

If the dependent variable is a function of a constant raised to the power of the independent variable, then the log of the dependent variable will be linearly related to the independent variable. The left-hand side shows the relationship between Y and X before transformation; it is not linear. The right-hand side shows the relationship between log of Y and X, which is now linear. This example shows the importance of transforming the data before doing a linear regression.
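A minimal sketch of this transformation, assuming simulated data of the form Y = c * a**X with multiplicative noise; regressing log(Y) on X recovers the constants.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(1, 10, 100)
y = 2.0 * 1.5 ** x * np.exp(rng.normal(scale=0.1, size=100))

slope, intercept = np.polyfit(x, np.log(y), 1)  # log(Y) is linear in X
print("estimated a:", np.exp(slope), "estimated c:", np.exp(intercept))
```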

If the dependent variable is a function of the independent variable raised to a power, then the log of the dependent variable will be linearly related to the log of the independent variable. The left-hand side shows the relationship between Y and X before transformation; it is not linear. The right-hand side shows the relationship between log of Y and log of X, which is now linear. This example shows the importance of transforming the data before doing a linear regression.
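A similar sketch for the power relationship, assuming simulated data of the form Y = c * X**b; regressing log(Y) on log(X) recovers the exponent.

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(1, 10, 100)
y = 3.0 * x ** 2.0 * np.exp(rng.normal(scale=0.1, size=100))

slope, intercept = np.polyfit(np.log(x), np.log(y), 1)  # log(Y) is linear in log(X)
print("estimated b:", slope, "estimated c:", np.exp(intercept))
```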

If the dependent variable is a function of 1 divided by a linear function of the independent variable, then 1 divided by Y is linearly related to the independent variable. The left-hand side shows the relationship between Y and X before transformation; it is not linear. The right-hand side shows the relationship between the inverse of Y and X, which is now linear. These examples show the importance of transforming the data before doing a linear regression.
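And a sketch for the reciprocal relationship, assuming simulated data of the form Y = 1 / (b0 + b1 * X); regressing 1/Y on X recovers the coefficients.

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(1, 10, 100)
y = 1.0 / (0.5 + 0.3 * x) + rng.normal(scale=0.01, size=100)

slope, intercept = np.polyfit(x, 1.0 / y, 1)  # 1/Y is linear in X
print("estimated b1:", slope, "estimated b0:", intercept)
```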

Find the Best Fit In practice, the relationship between X and Y is not known. It may make sense to fit several different equations to the data to find the form that best describes the relationship. Excel can be used to create a scatter plot, and a trend line equation can be fitted to the data in the scatter plot. Polynomial, power, logarithmic, and exponential are examples of trend lines that can be fitted to the data within Excel. Once the best-fitting line has been determined, the right transformation of the data can be established.
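Outside of Excel, the same idea can be sketched in Python: fit several candidate forms and compare how well each describes the data on the original scale. The data and the set of candidate forms below are illustrative assumptions, not the presenter's method.

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(1, 10, 100)
y = 4.0 * x ** 1.7 * np.exp(rng.normal(scale=0.1, size=100))  # true form unknown in practice

def r_squared(observed, fitted):
    ss_res = np.sum((observed - fitted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

fits = {}
b, a = np.polyfit(x, y, 1)                      # linear: Y ~ X
fits["linear"] = r_squared(y, a + b * x)
b, a = np.polyfit(x, np.log(y), 1)              # exponential: log(Y) ~ X, back-transformed
fits["exponential"] = r_squared(y, np.exp(a + b * x))
b, a = np.polyfit(np.log(x), np.log(y), 1)      # power: log(Y) ~ log(X), back-transformed
fits["power"] = r_squared(y, np.exp(a + b * np.log(x)))

for name, r2 in fits.items():
    print(name, round(r2, 3))
```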

Regression Assumptions Must Be Verified To use ordinary regression, verify the assumptions and transform the data as needed.