Advanced Quantitative Techniques

Advanced Quantitative Techniques Lab 9: regression diagnostics II Nov 10th 2016

Today
Recap of the OLS assumptions:
- Errors are normally distributed around the mean.
- Homoskedasticity (errors should not get larger as X gets larger).
- Errors are independent of each other (no autocorrelation).
Diagnostics: homoskedasticity, multicollinearity, linearity.

OLS requirements (assumptions) and their diagnostic tests:
1. Errors are normally distributed around the mean. Test: plot and identify influential points (studentized residuals, leverage, Cook's D, DFITS).
2. Homoskedasticity (errors should not get larger as X gets larger). Test: rvfplot.
3. Errors are independent of each other (no autocorrelation). Test: Durbin-Watson (not covered today).
4. Variables are interval numbers (most important for the dependent variable). Remedy: use a logit or probit regression for 0/1 dependent variables (or another model, such as Poisson for counts; not covered here).
5. Linearity. Remedy: use another functional form if the dependent variable has a non-linear distribution; some dependent variables (like income) make more sense as ln(Y).

Prep dataset
Download from the IADB website. Open in Excel; format with pivot tables.
gen walktrans = estimatedpercentageofcommuterswh + estimatedpercentageofjourneytowo
BUT: gen only creates the sum for observations that have BOTH data points.
egen greencommute = rowtotal(estimatedpercentageofcommuterswh estimatedpercentageofjourneytowo)
Use this instead! rowtotal() assumes zero for missing values. Not perfect, but better for what we need.
replace greencommute = . if greencommute==0
Make sure that we don't count cities with no data as zero % of 'green' commuters!
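The gen-versus-rowtotal distinction can be sketched outside Stata. This plain-Python sketch uses invented walk/transit shares, with None standing in for Stata's missing value; the function names are made up for the example.

```python
# Plain-Python sketch of the two summation semantics (hypothetical data;
# None stands in for Stata's missing value ".").
rows = [(30.0, 10.0), (25.0, None), (None, None)]

def gen_sum(walk, transit):
    # Like `gen walktrans = walk + transit`: missing if either part is missing.
    if walk is None or transit is None:
        return None
    return walk + transit

def rowtotal(walk, transit):
    # Like `egen greencommute = rowtotal(...)`: missing counts as zero.
    a = walk if walk is not None else 0.0
    b = transit if transit is not None else 0.0
    return a + b

def cleaned(walk, transit):
    # The slide's follow-up fix: a total of zero is set back to missing so a
    # city with no data is not counted as 0% 'green' commuters.
    total = rowtotal(walk, transit)
    return None if total == 0 else total

print([gen_sum(w, t) for w, t in rows])   # [40.0, None, None]
print([rowtotal(w, t) for w, t in rows])  # [40.0, 25.0, 0.0]
print([cleaned(w, t) for w, t in rows])   # [40.0, 25.0, None]
```

Note the trade-off the slide flags as "not perfect": the zero-to-missing fix would also wipe out a city whose green-commuter share is genuinely 0%.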

REGRESS: does the level of 'green' commuters relate to density, service distribution (schools), safety, centrality (proximity to a transit stop), and average travel time?
reg greencommute maximumallowabledensityinnewhous whatistheaveragetraveltimeinminu theshareoftheareaofthecityinneig theestimatedpercentageofthecityw theaveragetimeofthejourneytowork

predict pr
list pr greencommute in 1/10
predict res, residual
list res in 1/10

Residuals
rvpplot greencommute

Studentized Residuals
Studentized residuals are a type of standardized residual that can be used to identify outliers.
predict r, rstudent
sort r
list id r in 1/10
list id r in -10/l
list r id bwt age lwt smoke ht ui ftv black if abs(r) > 2
display 189*0.05
Rule of thumb: by chance alone, about 5% of N will have |r| > 2 and about 1% of N will have |r| > 3.
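The quantity that `predict r, rstudent` reports can be sketched in numpy. This is an illustration on simulated data (not the lab's dataset); the planted outlier at index 10 is invented for the example.

```python
import numpy as np

# numpy sketch of externally studentized residuals on simulated data
# with one planted outlier (index 10); all numbers are invented.
rng = np.random.default_rng(42)
n = 50
x = np.linspace(0.0, 10.0, n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, n)
y[10] += 15.0                        # plant a gross outlier

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta                     # ordinary residuals
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # leverage values
p = X.shape[1]                       # parameters, incl. intercept

# Leave-one-out error variance, then studentize each residual.
s2 = e @ e / (n - p)
s2_i = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
r = e / np.sqrt(s2_i * (1 - h))      # externally studentized residuals

flagged = np.where(np.abs(r) > 2)[0]  # the slide's |r| > 2 screen
print(flagged)                        # the planted outlier (10) is flagged
```

Studentizing with the leave-one-out variance keeps a huge outlier from inflating its own denominator, which is why the planted point stands out so clearly.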

Leverage
Leverage measures how far an observation's predictor values deviate from the means of those variables.
predict lev, leverage
Generally, a point with leverage greater than (2k+2)/n should be carefully examined.
k = number of predictors (in our example 7); n = number of observations (in our example 189).
display (2*7+2)/189
list bwt age lwt smoke ht ui ftv black id lev if lev >.08465608
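Leverage values are the diagonals of the hat matrix; the computation behind `predict lev, leverage` can be sketched in numpy on simulated data (the extreme x value at index 0 is invented for the example).

```python
import numpy as np

# numpy sketch of leverage (hat-matrix diagonals) on simulated data with
# one observation far from the mean of x.
rng = np.random.default_rng(0)
n = 30
x = rng.uniform(0.0, 1.0, n)
x[0] = 10.0                          # one extreme x value
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)

X = np.column_stack([np.ones(n), x])
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # leverage of each point

k = 1                                # predictors, excluding the intercept
cutoff = (2 * k + 2) / n             # the (2k+2)/n rule from the slide
print(round(cutoff, 3), round(h[0], 3))   # h[0] far exceeds the cutoff
```

The leverages always sum to the number of estimated parameters (k+1 here), so their average is (k+1)/n; the (2k+2)/n rule simply flags points above twice that average.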

Cook's D
Cook's D combines information on the residual and the leverage into a single measure of influence. The lowest value Cook's D can assume is zero, and the higher it is, the more influential the point. The conventional cut-off for undue influence from a single observation is 4/n.
predict d, cooksd
list id bwt age lwt smoke ht ui ftv black d if d>4/189
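The formula behind `predict d, cooksd` can be sketched in numpy; this is an illustration on simulated data where one invented point is both high-leverage and pulled off the line.

```python
import numpy as np

# numpy sketch of Cook's D on simulated data; index 0 is planted to be
# influential (high leverage and a large shift in y).
rng = np.random.default_rng(1)
n = 40
x = rng.uniform(0.0, 1.0, n)
x[0] = 5.0                            # high leverage ...
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3, n)
y[0] += 4.0                           # ... and pulled off the line

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
p = X.shape[1]
s2 = e @ e / (n - p)

# Cook's D folds residual size and leverage into one influence number.
d = e**2 * h / (p * s2 * (1 - h)**2)

influential = np.where(d > 4 / n)[0]  # the 4/n convention from the slide
print(influential)                    # the planted point (0) is flagged
```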

DFITS
DFITS is similar to Cook's D; the two scale differently but give similar answers. DFITS can be either positive or negative, with values close to zero corresponding to points with little or no influence. The cut-off point for DFITS is 2*sqrt(k/n).
predict dfit, dfits
list id bwt age lwt smoke ht ui ftv black dfit if abs(dfit)>2*sqrt(7/189)
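DFITS multiplies the externally studentized residual by a leverage factor, which is why it carries a sign while Cook's D does not. A numpy sketch on the same kind of simulated data (the influential point at index 0 is invented):

```python
import numpy as np

# numpy sketch of DFITS; index 0 is planted to be both high-leverage
# and outlying, so it should exceed the cut-off.
rng = np.random.default_rng(2)
n = 40
x = rng.uniform(0.0, 1.0, n)
x[0] = 5.0
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3, n)
y[0] += 4.0

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
p = X.shape[1]
s2 = e @ e / (n - p)

# Externally studentized residuals, then DFITS = r * sqrt(h / (1 - h)),
# so the sign of DFITS follows the sign of the residual.
s2_i = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
r = e / np.sqrt(s2_i * (1 - h))
dfits = r * np.sqrt(h / (1 - h))

k = 1
cutoff = 2 * np.sqrt(k / n)           # the slide's 2*sqrt(k/n) cut-off
big = np.where(np.abs(dfits) > cutoff)[0]
print(big)                            # index 0 is flagged
```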

We find that id = 226 is an observation with both a large residual and large leverage. Such points are potentially the most influential. To see how much it drives the results, refit without it:
regress bwt age lwt smoke ht ui ftv black if id!=226

Diagnostics 3: Checking Homoscedasticity of Residuals
A commonly used graphical method is to plot the residuals against the predicted values. If the model is well fitted, there should be no pattern in the residuals plotted against the fitted values; if the variance of the residuals is non-constant, the residuals are said to be "heteroscedastic". We do this with the rvfplot command; the yline(0) option puts a reference line at y = 0.
rvfplot, yline(0)
In our case the pattern of the data points gets a little narrower towards the right end, which is an indication of heteroscedasticity, but the narrowing of the error band is minor.
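rvfplot is graphical, but the same idea can be checked numerically. This numpy sketch (a crude Goldfeld-Quandt-style variance ratio on simulated data, not a Stata command) splits the residuals at the median fitted value:

```python
import numpy as np

# Crude numeric analogue of eyeballing rvfplot: compare residual variance
# below and above the median fitted value (simulated, heteroscedastic data).
rng = np.random.default_rng(3)
n = 400
x = rng.uniform(1.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n) * x   # error sd grows with x

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
e = y - fitted

order = np.argsort(fitted)
lo, hi = e[order[: n // 2]], e[order[n // 2:]]
ratio = hi.var() / lo.var()
print(ratio)    # well above 1: the residual band widens with the fit
```

A ratio near 1 would correspond to the flat residual band a well-fitted rvfplot shows; here the ratio is large because the error variance was built to grow with x.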

Diagnostics 4: Checking for Multicollinearity
Multicollinearity will arise if we have put in too many variables that measure the same thing. Check the variance inflation factors; as a rule of thumb, each VIF should be < 10.
estat vif
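What `estat vif` reports can be computed by hand: regress each predictor on the others and take 1/(1-R²). A numpy sketch on simulated data, where x3 is built to nearly duplicate x1 and x2 (all variables invented):

```python
import numpy as np

# numpy sketch of VIF by hand on simulated data.
rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 0.5 * x2 + rng.normal(0.0, 0.1, n)   # near-collinear with x1, x2
x4 = rng.normal(size=n)                        # unrelated control
X = np.column_stack([x1, x2, x3, x4])

def vif(X, j):
    # Regress column j on the other columns (plus a constant); the worse
    # it is predicted, the closer R^2 is to 0 and VIF to 1.
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(X)), others])
    resid = X[:, j] - Z @ np.linalg.lstsq(Z, X[:, j], rcond=None)[0]
    r2 = 1.0 - resid.var() / X[:, j].var()
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(4)]
print([round(v, 1) for v in vifs])   # x1, x2, x3 blow past 10; x4 stays near 1
```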

Diagnostics 5: Checking Linearity
Bivariate regression: overlay the scatter, the linear fit, and a lowess smoother, and compare the two fitted curves.
twoway (scatter bwt lwt) (lfit bwt lwt) (lowess bwt lwt)

Diagnostics 5: Checking Linearity
Multiple regression: the most straightforward approach is to plot the residuals against each of the predictor variables in the model. If there is a clear nonlinear pattern, there is a problem of nonlinearity; otherwise, each plot should show just a random scatter of points.
scatter res age
scatter res lwt
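The residuals-vs-predictor check can be sketched in numpy: fit a straight line to data with an invented quadratic truth, and the leftover curvature shows up as structure in the residuals.

```python
import numpy as np

# numpy sketch of the linearity check on simulated, quadratic data.
rng = np.random.default_rng(5)
n = 200
x = rng.uniform(-3.0, 3.0, n)
y = 1.0 + x + 0.5 * x**2 + rng.normal(0.0, 0.2, n)   # truth is quadratic

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta

# In a scatter of e against x this appears as a U shape; numerically,
# the residuals correlate strongly with x squared.
corr = np.corrcoef(e, x**2)[0, 1]
print(corr)    # close to 1: clear nonlinearity
```

With truly linear data the residuals would be uncorrelated with any function of x, and this correlation would hover near zero.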

Diagnostics 6: Autocorrelation
Durbin-Watson test. The D-W statistic is:
close to 2.0 if there is no autocorrelation,
equal to 0 if there is perfect positive autocorrelation,
equal to 4.0 if there is perfect negative autocorrelation.
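The statistic itself is simple to compute; a numpy sketch on two simulated error series (white noise versus an invented AR(1) process with coefficient 0.9):

```python
import numpy as np

# numpy sketch of the Durbin-Watson statistic on simulated residual series.
def durbin_watson(e):
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(6)
n = 500
white = rng.normal(size=n)           # independent errors

ar = np.empty(n)                     # strongly autocorrelated errors
ar[0] = rng.normal()
for t in range(1, n):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()

dw_white = durbin_watson(white)
dw_ar = durbin_watson(ar)
print(dw_white)   # near 2: no autocorrelation
print(dw_ar)      # near 2*(1 - 0.9) = 0.2: strong positive autocorrelation
```

For large samples the statistic is approximately 2(1 - rho), where rho is the lag-1 autocorrelation, which is why 0, 2, and 4 are the landmarks.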

Evaluate importance of independent variables
Standardized betas: convert the change into SD units.
regress y x1 x2 x3, beta
Partitioning variance: compare the F statistic of the full model with and without the variable of interest.
nestreg: regress y x1 x2 x3
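The standardized betas that `regress ..., beta` reports can be obtained two equivalent ways, sketched here in numpy on simulated data (variable names and scales invented): rescale the raw coefficients by sd(x)/sd(y), or refit on z-scored variables.

```python
import numpy as np

# numpy sketch of standardized betas two ways; both express the effect of
# a 1-SD change in x in SD units of y.
rng = np.random.default_rng(7)
n = 300
x1 = rng.normal(0.0, 1.0, n)
x2 = rng.normal(0.0, 5.0, n)          # deliberately different scale
y = 2.0 * x1 + 0.4 * x2 + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Way 1: rescale the raw coefficients, beta_std = b * sd(x) / sd(y).
beta_std = b[1:] * X[:, 1:].std(axis=0) / y.std()

# Way 2: refit on z-scored variables; the slopes match way 1 exactly.
Z = (X[:, 1:] - X[:, 1:].mean(axis=0)) / X[:, 1:].std(axis=0)
zy = (y - y.mean()) / y.std()
b_z = np.linalg.lstsq(np.column_stack([np.ones(n), Z]), zy, rcond=None)[0]

print(beta_std)
print(b_z[1:])                        # the two pairs agree
```

On the raw scale x2's coefficient (0.4) looks smaller than x1's (2.0), but its standardized beta is comparable because x2 varies five times as much; that is the point of reporting effects in SD units.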