1 Module II Lecture 3: Misspecification: Non-linearities Graduate School Quantitative Research Methods Gwilym Pryce

2 Summary of Lecture 2: 1. ANOVA in regression 2. Prediction 3. F-Test 4. Regression assumptions 5. Properties of OLS estimates

3 TSS = REGSS + RSS The sum of squared deviations of y from its mean (i.e. the numerator in the variance of y equation) is called the TOTAL SUM OF SQUARES (TSS) The sum of squared residuals e is called the RESIDUAL SUM OF SQUARES (RSS) –sometimes called the “error sum of squares” The difference between TSS and RSS is called the REGRESSION SUM OF SQUARES (REGSS) –the REGSS is sometimes called the “explained sum of squares” or “model sum of squares” Hence TSS = REGSS + RSS and R² = REGSS/TSS
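The decomposition above can be checked numerically. Below is a minimal sketch in Python (numpy only, made-up numbers — the course itself uses SPSS, so this is purely illustrative):

```python
import numpy as np

# Hypothetical data, chosen only to illustrate the identity
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# OLS fit of y = a + b x (polyfit returns [slope, intercept])
b, a = np.polyfit(x, y, 1)
y_hat = a + b * x

tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
rss = np.sum((y - y_hat) ** 2)           # residual ("error") sum of squares
regss = np.sum((y_hat - y.mean()) ** 2)  # regression ("explained") sum of squares

# For OLS with an intercept, TSS = REGSS + RSS holds exactly
r_squared = regss / tss
```

The identity holds exactly only for least-squares fits that include an intercept, which is why R² = REGSS/TSS lies between 0 and 1.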

4 4. Regression assumptions For estimation of a and b and for regression inference to be correct: 1. Equation is correctly specified: –Linear in parameters (can still transform variables) –Contains all relevant variables –Contains no irrelevant variables –Contains no variables with measurement errors 2. Error Term has zero mean 3. Error Term has constant variance

5 4. Error term is not autocorrelated –i.e. not correlated with error terms from previous time periods 5. Explanatory variables are fixed –we observe the normal distribution of y for repeated fixed values of x 6. No linear relationship between RHS variables –i.e. no “multicollinearity”

6 5. Properties of OLS estimates If the above assumptions are met, OLS estimates are said to be BLUE: –Best i.e. most efficient = least variance –Linear i.e. best amongst linear estimates –Unbiased i.e. in repeated samples, mean of b = β –Estimates i.e. estimates of the population parameters.

7 Plan of Lecture 3: 1. Consequences of non-linearities 2. Testing for non-linearities –(a) visual inspection of plots –(b) t-statistics –(c) structural break tests 3. Solutions –(a) transform variables –(b) split the sample –(c) dummies –(d) use non-linear estimation techniques

8 1. Consequences of non-linearities Depending on how severe the non-linearity is, a and b will be misleading: –estimates may be “biased” –i.e. they will not reflect the “true” values of α and β

9 b̃ is a biased estimator of β

10 2. Testing for non-linearities: (a) visual inspection of plots scatter plots of two variables: –if you only have two or three variables then looking at scatter plots of these variables can help identify non-linear relationships in the data –but when there are more than 3 variables, non-linearities can be very complex and difficult to identify visually:

11 –What can appear to be random variation of data points around a linear line of best fit in a 2-D plot can turn out to have a systematic cause when a third variable is included and a 3-D scatter plot is examined. The same is true when comparing 3-D plots with higher dimensions. E.g. suppose there is a quadratic relationship between x, y and z, but that it is only visible in the data if one controls for the influence of a fourth variable, w. Not knowing this, one looks only at x, y and z, which appear to have a linear relationship.

12 2. Testing for non-linearities: (b) t-statistics Sometimes variables that we would expect (from intuition or theory) to have a strong effect on the dependent variable turn out to have low t-values. –If so, then one might suspect non-linearities. –Try transforming the variable (e.g. take logs) and re-examine the t-values e.g. HOUSING DEMAND = a + b AGE OF BORROWER –surprisingly, age of borrower may not be that significant –but this might be because of a non-linearity: housing demand rises with age until mid-life, then starts to decrease as children leave home. Try Age² instead and check its t-value.

13 There may be non-linearities caused by interactions between variables: –try interacting explanatory variables and examining the t-values e.g. HOUSE PRICE = a + b SIZE OF WINDOW + c VIEW But the size of a window may only add value to a house if there is a nice view, and having a nice view may only add value if there are windows. Try including an interaction term as well/instead: –HOUSE PRICE = a +…+ d SIZE OF WINDOW * VIEW In SPSS you would do this by creating a new variable using the COMPUTE command: –COMPUTE SIZE_VEW = SIZE OF WINDOW * VIEW and then including the new variable in the regression.
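The same interaction idea can be sketched outside SPSS. In the Python sketch below, the data are synthetic and built so that window size only adds value when there is a view; all variable names and numbers are illustrative, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
window = rng.uniform(1, 10, n)               # hypothetical "size of window"
view = rng.integers(0, 2, n).astype(float)   # 1 = nice view, 0 = no view

# True model: windows add value only in combination with a view
price = 100 + 5.0 * window * view + rng.normal(0, 1, n)

size_vew = window * view                     # the COMPUTE SIZE_VEW step
X = np.column_stack([np.ones(n), window, view, size_vew])
coefs, *_ = np.linalg.lstsq(X, price, rcond=None)
# coefs[3] estimates the interaction effect d (about 5 here);
# coefs[1] and coefs[2] should be near zero
```

A significant t-value on the interaction term, alongside insignificant main effects, is exactly the pattern the slide describes.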

14 2. Testing for non-linearities: (c) shifts & structural break tests Sometimes certain observations display consistently higher y values. If this difference can be modelled as a parallel shift of the regression line, then we can incorporate it into our model simply by including an appropriate dummy variable –e.g. male = 1 or 0;

15 Apparent Intercept Shift in data:

16 Data shifts in 3-Dimensions: (NB: the shift is the slightly lower prices for terrace = 1)

17 However, sometimes there is an apparent shift in the slope not just/instead of the intercept. Being able to observe this visually is difficult if you have lots of variables since the visual symptoms will only reveal themselves if the data has been ordered appropriately.

18 Apparent slope shift:

19 Solutions: (a) Transforming Variables Note that “linear” regression analysis does not preclude analysis of non-linear relationships (a common misconception). –It merely precludes estimation of certain types of non-linear relationship, i.e. those that are non-linear in the parameters: e.g. y = αx + βz + αβxz

20 However, so long as the non-linearity fits within the basic structure of y = a + bx (i.e. the model is linear in parameters), we can make suitable transformations of the variables and estimate by OLS:

21 –e.g. 1: y = a + b x² we can simply create a new variable, z = x², and run the regression y = a + b z including the square of x is appropriate if the scatter plot of y on x is “n” shaped or “u” shaped –e.g. 2: y = a + b x³ we can create a new variable, z = x³, and run the regression y = a + b z including the cube of x is appropriate if the scatter plot of y on x is “s” shaped or has a back-to-front “s” shape.
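As a sketch of e.g. 1 (synthetic “u”-shaped data, numpy only), creating z = x² and regressing y on z should fit far better than regressing y on x directly:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 100)
y = 2.0 + 0.5 * x**2 + rng.normal(0, 0.2, 100)   # "u"-shaped relationship

def r2(y, y_hat):
    """R-squared of a fitted line."""
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Naive linear fit: y = a + b x
b1, a1 = np.polyfit(x, y, 1)
r2_linear = r2(y, a1 + b1 * x)

# Transformed fit: y = a + b z, with z = x**2 (still linear in parameters)
z = x**2
b2, a2 = np.polyfit(z, y, 1)
r2_quad = r2(y, a2 + b2 * z)
# r2_quad should be close to 1, r2_linear close to 0 for symmetric data
```

The regression on z is still ordinary OLS — only the variable, not the estimation method, has changed.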

22 E.g.1 Scatter plot suggests a quadratic relationship

23 Regressing y on the square of x should give a better fit

24 E.g. 2 Scatter plot suggests a cubic relationship

25 Regressing y on the cube of x should give a better fit:

26 E.g. 3 Scatter Plot suggests a cubic relationship

27 Cubing x should give a better fit

28 E.g. 4 Scatter plot suggests a quadratic relationship

29 Squaring x should give a better fit

30 Log-log and log-linear models One of the most common transformations of the dependent variable and/or the explanatory variables is to take logs. –It is appropriate to transform x if the scatter plot of y on x has either an “r” shape or an “L” shape.
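A minimal sketch of the log transformation, with synthetic data built to have an “r”-shaped (diminishing-returns) relationship:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(1, 50, 100)
y = 3.0 + 2.0 * np.log(x) + rng.normal(0, 0.1, 100)  # "r"-shaped data

def r2(y, y_hat):
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Untransformed fit: y = a + b x
b1, a1 = np.polyfit(x, y, 1)
r2_raw = r2(y, a1 + b1 * x)

# Log-linear fit: y = a + b ln(x)
lx = np.log(x)
b2, a2 = np.polyfit(lx, y, 1)
r2_log = r2(y, a2 + b2 * lx)
# The log-linear fit should be markedly better
```

As with the square and cube examples, taking logs keeps the model linear in parameters, so OLS still applies.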

31 E.g 5 scatter plot suggests a logarithmic relationship

32 Taking the log of x should result in a better fit

33 E.g.6 scatter plot suggests a logarithmic relationship

34 Taking the log of x should result in a better fit

35 E.g. 7 scatter plot suggests an exponential relationship

36 Taking the exponential of x should result in a better fit

37 Solutions: (b) Split the sample

38 Quite a drastic measure: –split the sample and estimate two OLS lines separately –in practice it’s not easy to decide exactly where to split the sample –we can use an F-test (“Chow test”) to test whether there really is a structural break –but even if the F-test shows that there is a break, it can often be remedied by squaring the offending variable, or by using slope dummies...
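A simplified sketch of the Chow-style F-test: compare the pooled RSS with the sum of the RSS from the two sub-sample regressions. The data here are synthetic, with a deliberate slope break halfway through the sample:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.uniform(0, 10, n)
group = np.arange(n) >= n // 2          # second half of the sample
# Slope shifts from 1 to 3 at the break point
y = 1.0 + np.where(group, 3.0, 1.0) * x + rng.normal(0, 0.5, n)

def rss(x, y):
    """RSS from a simple OLS fit y = a + b x."""
    b, a = np.polyfit(x, y, 1)
    return np.sum((y - (a + b * x)) ** 2)

k = 2                                   # parameters per regression (a, b)
rss_pooled = rss(x, y)
rss_split = rss(x[~group], y[~group]) + rss(x[group], y[group])
F = ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))
# A large F (relative to the F(k, n - 2k) critical value) indicates a break
```

In practice the break point must be chosen in advance, which is exactly the difficulty the slide notes.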

39 Solutions: (c) Dummy variables A dummy variable is one that takes the values 0 or 1: –e.g. 1 if male, 0 if female If we include the dummy as a separate variable in the regression we call it an Intercept Dummy If we multiply it by one of the explanatory variables, then we call it a Slope Dummy

40 Intercept Dummies: Original equation: y = a + bx now add a dummy (e.g. D = 0 if white, D = 1 if non-white): y = a + bx + cD c measures how much higher (or lower, if c is negative) the dependent variable is for non-whites
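A minimal numeric sketch of the intercept dummy (synthetic data; the true shift c = 4 is recovered by OLS):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(0, 10, n)
D = rng.integers(0, 2, n).astype(float)  # dummy: 1 or 0
# True model: parallel shift of 4 for the D = 1 group
y = 2.0 + 1.5 * x + 4.0 * D + rng.normal(0, 0.3, n)

X = np.column_stack([np.ones(n), x, D])
(a, b, c), *_ = np.linalg.lstsq(X, y, rcond=None)
# c estimates the parallel intercept shift (about 4 here)
```
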

41 Slope Dummies: Suppose race has an effect on the slope of the regression line rather than the intercept. You can account for this by simply multiplying the relevant explanatory variable by the race dummy: y = a + bx + cD*x c measures how much higher (lower if c is negative) the b slope parameter would be for non-whites
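The slope dummy can be sketched the same way: multiply the dummy by x before including it (synthetic data, true slope shift c = 2):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(0, 10, n)
D = rng.integers(0, 2, n).astype(float)
# Slope is b = 1.5 when D = 0, and b + c = 3.5 when D = 1
y = 2.0 + 1.5 * x + 2.0 * D * x + rng.normal(0, 0.3, n)

X = np.column_stack([np.ones(n), x, D * x])   # D*x is the slope dummy
(a, b, c), *_ = np.linalg.lstsq(X, y, rcond=None)
# c estimates how much steeper the slope is for the D = 1 group (about 2)
```

Intercept and slope dummies can of course be included together, which amounts to fitting two separate lines within a single regression.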

42 Solutions: (d) Non-linear estimation When you can’t satisfactorily deal with the non-linearity by simply transforming variables, you can fit a non-linear curve to the data These methods are usually based on some sort of grid search (i.e. trial and error) for the correct value of the non-linear parameter. –e.g. y = a + b₁e^(b₂x) + b₃z cannot be transformed to linearity in a way that would allow us to derive estimates for b₂ and b₃
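A hedged sketch of the grid-search idea for this model: for each trial value of the non-linear parameter b₂, the remaining parameters (a, b₁, b₃) enter linearly and can be estimated by OLS; we keep the b₂ that minimises the RSS. The data are synthetic, with true b₂ = 0.7:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x = rng.uniform(0, 2, n)
z = rng.uniform(0, 5, n)
# True model: y = 1 + 2*exp(0.7*x) + 0.5*z + noise
y = 1.0 + 2.0 * np.exp(0.7 * x) + 0.5 * z + rng.normal(0, 0.1, n)

best_rss, best_b2 = np.inf, None
for b2 in np.arange(0.1, 1.5, 0.01):          # grid of trial values for b2
    X = np.column_stack([np.ones(n), np.exp(b2 * x), z])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ coefs) ** 2)
    if rss < best_rss:
        best_rss, best_b2 = rss, b2
# best_b2 should land close to the true value 0.7
```

Real non-linear least-squares routines refine this idea with iterative optimisation rather than a fixed grid, but the trial-and-error logic is the same.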

43 SPSS does allow non-linear estimation –go to Analyse, Regression, non-linear But we shall not cover this topic in any more detail on this course since most types of non-linearity in data can be adequately dealt with using transformations of the variables.

44 Summary 1. Consequences of non-linearities 2. Testing for non-linearities –(a) visual inspection of plots –(b) t-statistics –(c) structural break tests 3. Solutions –(a) transform variables –(b) split the sample –(c) dummies –(d) use non-linear estimation techniques

45 Reading: Kennedy (1998) “A Guide to Econometrics”, Chapters 3, 5 and 6