Lecture 26 Omitted Variable Bias formula revisited Specially constructed variables –Interaction variables –Polynomial terms for curvature –Dummy variables for categorical variables.


Lecture 26
Omitted Variable Bias formula revisited
Specially constructed variables:
–Interaction variables
–Polynomial terms for curvature
–Dummy variables for categorical variables

Omitted Variable Bias Formula Revisited
From the paper "Re-examining Criminal Behavior: The Importance of Omitted Variable Bias," by David Mustard, Review of Economics and Statistics.
Question: To what extent do changes in the arrest rate alter the willingness of individuals to engage in criminal activity?
Becker's economic theory of crime: those involved in illegal activities respond to incentives in much the same way as those who engage in legal activities respond.

Regressions for Economic Theory of Crime
Goal: figure out the causal effect of an increase in the arrest rate on the number of crimes committed.
When Y = log number of crimes committed in a city is regressed on X = log arrest rate in the city, the coefficient on X is negative for the murder, assault, and burglary rates (for burglary, a coefficient of –1.17 in the log-log regression implies that a 1% increase in the arrest rate for burglaries is associated with a 1.17% decrease in the number of burglaries).
The simple regression omits the confounding variable of the conviction rate. What is the direction of the omitted variables bias?

Omitted Variables Bias Formula
x = the explanatory variable for which we want to find its causal effect on y.
w1, ..., wp = confounding variables we control for by including them in the regression.
q = an omitted confounding variable.
Suppose the true model is E(y | x, w1, ..., wp, q) = β0 + β1x + β2w1 + ... + γq, but we fit the regression that omits q. Then the coefficient β1* on x in the short regression satisfies
β1* = β1 + γδ1, or equivalently bias = β1* – β1 = γδ1,
where δ1 is the coefficient on x in the regression of q on x, w1, ..., wp.
The formula tells us about the direction and magnitude of the bias from omitting a variable in estimating a causal effect. The formula also applies to the least squares estimates, i.e., b1* = b1 + g·d1 for the fitted coefficients.
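The formula above can be checked numerically. A minimal simulation sketch (all parameter values and variable names here are illustrative, not taken from Mustard's paper):

```python
# Numeric check of the omitted variables bias formula on simulated data.
# beta1 = true causal effect of x; gamma = effect of the omitted confounder q;
# delta1 = slope of q on x. All values are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta1, gamma, delta1 = -1.0, -0.5, -0.8

x = rng.normal(size=n)
q = delta1 * x + rng.normal(size=n)       # omitted confounder, correlated with x
y = beta1 * x + gamma * q + rng.normal(size=n)

# Short regression (omits q): its slope converges to beta1 + gamma*delta1.
b_short = np.polyfit(x, y, 1)[0]
print(round(b_short, 2))                  # close to -1.0 + (-0.5)*(-0.8) = -0.6
```

Note that with gamma and delta1 both negative the bias gamma*delta1 is positive, which is exactly the direction argued on the next slides.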

Application of the OVB Formula
y = log crime rate; x = log arrest rate; q = conviction rate (omitted).
Here γ is probably negative: an increase in the conviction rate should reduce crime, holding the other variables fixed.
Mustard presents evidence that δ1 is negative: as more people are arrested for a given offense level, the amount of evidence against each arrestee decreases, so the conviction rate falls as the arrest rate rises.
If γ and δ1 are both negative, then the bias γδ1 is positive, so β1* > β1. The estimate that a 1% increase in the arrest rate would reduce the burglary rate by 1.17% is an underestimate of the impact of an increase in the arrest rate on reducing the burglary rate (i.e., the true coefficient in the log-log regression is < –1.17; the reduction is greater than 1.17%).

Specially Constructed Explanatory Variables
–Interaction variables
–Squared and higher-order polynomial terms for curvature
–Dummy variables for categorical variables

Interaction
Interaction is a three-variable concept. One of these is the response variable (Y) and the other two are explanatory variables (X1 and X2).
There is an interaction between X1 and X2 if the impact of an increase in X2 on Y depends on the level of X1.
To incorporate interaction in the multiple regression model, we add the product X1·X2 as an explanatory variable.
There is evidence of an interaction if the coefficient on X1·X2 is significant (t-test has p-value < .05).
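A minimal sketch of fitting an interaction model by least squares (simulated data, not the course's JMP workflow; the coefficient values are illustrative):

```python
# Fit y = b0 + b1*x1 + b2*x2 + b3*(x1*x2) by including the product column
# x1*x2 in the design matrix. Data are simulated with a true b3 of 0.3.
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 2.0 + 1.0 * x1 - 0.5 * x2 + 0.3 * x1 * x2 + rng.normal(0, 1, n)

# Design matrix: intercept, x1, x2, and the interaction x1*x2.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

# With an interaction, the effect on the mean of y of a one-unit increase
# in x2 is b2 + b3*x1: it depends on the level of x1.
effect_of_x2_at_x1_0 = b2 + b3 * 0.0
effect_of_x2_at_x1_10 = b2 + b3 * 10.0
```

This is the same idea as adding a cross-product column in JMP before running Fit Model.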

Interaction Model for Pollution Data

Polynomials and Interactions Example
An analyst working for a fast food chain is asked to construct a multiple regression model to identify new locations that are likely to be profitable. For a sample of 25 locations, the analyst has the annual gross revenue of the restaurant (y), the mean annual household income, and the mean age of children in the area. Data in fastfoodchain.jmp.

Polynomial Terms for Curvature
To model a curved relationship between y and x, we can add squared (and cubic or higher-order) terms as explanatory variables.
Fit E(Y|X) = β0 + β1X + β2X² as a multiple regression with two explanatory variables, X and X².
To draw a plot of the estimated mean of Y|X after Fit Model: click the red triangle next to Response, Save Columns, Predicted Values. Then click Graph, Overlay Plot, put Predicted Revenue and Revenue into Y, Columns and Income into X. Left-click on the box next to Predicted Revenue in the legend and select Connect Points.

Interpreting Coefficients and Tests for the Polynomial Model
The coefficients are not directly interpretable: the change in the mean of Y associated with a one-unit increase in X depends on X.
To test whether the multiple regression model with X and X² as predictors provides better predictions than the model with just X, use the p-value of the t-test on the X² coefficient (the null hypothesis is that X² has a zero coefficient).
Plot residuals vs. X to determine whether the quadratic model is appropriate. If there is still a pattern in the mean, try a cubic model with X, X², and X³.
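A minimal sketch of fitting the quadratic model and seeing why the coefficients are not directly interpretable (simulated data, not the fastfoodchain.jmp data):

```python
# Fit E(Y|X) = b0 + b1*X + b2*X^2 by treating X and X^2 as two predictors.
# Simulated with a true curve 5 + 2x - 0.25x^2, so the mean rises then falls.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 200)
y = 5.0 + 2.0 * x - 0.25 * x**2 + rng.normal(0, 0.5, 200)

X = np.column_stack([np.ones_like(x), x, x**2])   # intercept, X, X^2
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# The slope of the fitted mean is b1 + 2*b2*x, so the effect of increasing
# x depends on where you are: positive at x = 2, negative at x = 8 here.
slope_at_2 = b1 + 2 * b2 * 2
slope_at_8 = b1 + 2 * b2 * 8
```

Plotting the residuals y - X @ [b0, b1, b2] against x is the numpy analogue of the residual-vs-X check described above.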

Regression Model for Fast Food Chain Data
Interactions and polynomial terms can be combined in a multiple regression model.
For the fast food chain data, we consider the model
E(Revenue) = β0 + β1·Income + β2·Age + β3·Income² + β4·Age² + β5·Income·Age.
This is called a second-order model because it includes all squares and interactions of the original explanatory variables.
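Building the full second-order design matrix is mechanical; a sketch with placeholder arrays standing in for the income and age columns (not the real data):

```python
# Second-order design matrix for two explanatory variables: the original
# columns, their squares, and their interaction. Values are made up.
import numpy as np

income = np.array([20.0, 35.0, 50.0, 65.0, 80.0])
age = np.array([4.0, 6.0, 8.0, 10.0, 12.0])

X = np.column_stack([
    np.ones_like(income),   # intercept
    income, age,            # first-order terms
    income**2, age**2,      # squared terms for curvature
    income * age,           # interaction term
])
print(X.shape)              # 5 locations by 6 model columns
```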

fastfoodchain.jmp Results
Strong evidence of a quadratic relationship between revenue and age, and between revenue and income. Moderate evidence of an interaction between age and income.

Categorical Variables
Categorical (nominal) variables: variables that define group membership, e.g., sex (male/female), color (blue/green/red), county (Bucks County, Chester County, Delaware County, Philadelphia County).
Categorical variables can be incorporated into regression through dummy variables. They can also be incorporated directly, as is done in JMP.

FedEx Data Set
Before it was a well-known company, FedEx undertook a campaign to promote the use of its Courier packages (now called FedEx Paks). Sales representatives visited customers and worked to increase their use of the packages. Some of the customers were already aware of the Courier packaging before the promotion began, but it was unknown to others.
Response variable: number of Courier package shipments per month.
Explanatory variables: (1) number of contact hours the customer had with the sales representative (hours of effort), (2) categorical variable indicating whether or not the customer was already aware of the product (aware).
Question: Was this promotion more effective for customers who were already aware of the product, or was it more effective for those who had been unaware?

Two-Sample Analysis
Problem with a two-sample analysis: hours of effort may be a confounding variable.

Parallel lines regression model

Parallel Regression Lines Model:
E(Shipments | Hours, Aware) = β0 + β1·Hours + β2·Aware,
where Aware = 1 if the customer was already aware of the product and Aware = 0 otherwise.
This gives two regression lines with the same slope β1 and intercepts β0 (unaware) and β0 + β2 (aware): parallel lines. β2 is the gap between the groups, holding hours of effort fixed.
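A minimal sketch of the parallel-lines model on simulated data (standing in for the FedEx example; all coefficient values are illustrative):

```python
# Dummy-variable regression: y = b0 + b1*hours + b2*aware. The dummy shifts
# the intercept only, so the two groups get parallel fitted lines.
import numpy as np

rng = np.random.default_rng(3)
n = 300
hours = rng.uniform(0, 20, n)
aware = rng.integers(0, 2, n).astype(float)   # 1 = already aware, 0 = not
y = 10.0 + 1.5 * hours + 4.0 * aware + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), hours, aware])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# b2 estimates the vertical gap between the aware and unaware lines,
# holding hours of effort fixed; both lines share the slope b1.
intercept_unaware = b0
intercept_aware = b0 + b2
```

Because hours of effort is in the model, b2 compares the two groups at equal effort, which is exactly how the model addresses the confounding problem from the two-sample analysis.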