Class 17: Tuesday, Nov. 9
–Another example of interpreting multiple regression coefficients
–Steps in multiple regression analysis and an example analysis
–Omitted Variables Bias
–Discuss final project

Interpreting Multiple Regression Coefficients: Another Example
A marketing firm studied the demand for a new type of personal digital assistant (PDA). The firm surveyed a sample of 75 consumers. Each respondent was shown the new device and asked to rate the likelihood of purchase on a scale of 1 to 10, with 1 meaning little chance of purchase and 10 meaning almost certain purchase. Each respondent's age (in years) and income (in thousands of dollars) were also recorded. The data are in pda.JMP.

Simple Regressions to Predict Rating (Likelihood of Purchase)
As income rises, the likelihood of purchase also rises: a $10,000 increase in income is associated with a 0.7-point increase in rating. As age increases, the likelihood of purchase also rises: a 10-year increase in age is associated with a 0.9-point increase in rating.

Multiple Regression
For any fixed level of income, the average rating decreases by 0.7 if age increases by 10 years. That is, at any fixed income level, older consumers have lower average ratings than younger consumers, while at any fixed age, average ratings increase as income rises. The positive association between age and rating in the simple regression is a result of the positive association between age and income.
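A small simulation can reproduce this pattern. The data below are synthetic (a sketch with made-up coefficients, not the actual pda.JMP data), but they show how a positive simple-regression slope on age can coexist with a negative multiple-regression slope once income is held fixed:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 75

# Age and income are positively correlated, as in the PDA survey.
age = rng.uniform(20, 60, n)                      # years
income = 20 + 1.5 * age + rng.normal(0, 10, n)    # thousands of dollars

# True (made-up) model: rating falls with age at fixed income, rises with income.
rating = 2 - 0.07 * age + 0.07 * income + rng.normal(0, 1, n)

# Simple regression on age alone: the slope comes out positive,
# because age is standing in for the income it is correlated with.
simple = sm.OLS(rating, sm.add_constant(age)).fit()
print("simple slope on age:", simple.params[1])

# Multiple regression recovers the negative age effect at fixed income.
X = sm.add_constant(np.column_stack([age, income]))
multiple = sm.OLS(rating, X).fit()
print("multiple slopes (age, income):", multiple.params[1:])
```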

Air Pollution and Mortality
The data set pollution.JMP provides information about the relationship between pollution and mortality for 60 cities. The variables are:
–MORT (y) = total age-adjusted mortality, in deaths per 100,000 population
–PRECIP = mean annual precipitation (in inches)
–EDUC = median number of school years completed for persons 25 and older
–NONWHITE = percentage of the 1960 population that is nonwhite
–NOX = relative pollution potential of NOx (related to the tons of NOx emitted per day per square kilometer)
–SO2 = relative pollution potential of SO2

Multiple Regression: Steps in Analysis
1. Preliminaries: define the question of interest, review the design of the study, and correct errors in the data.
2. Explore the data: use graphical tools (e.g., a scatterplot matrix), consider transformations of the explanatory variables, fit a tentative model, and check for outliers and influential points.
3. Formulate an inferential model: word the questions of interest in terms of model parameters.

Multiple Regression: Steps in Analysis, Continued
4. Check the model: (a) check the model assumptions of linearity, constant variance, and normality; (b) if needed, return to step 2 and change the model (e.g., add transformations or terms for interaction and curvature); (c) drop variables that are not of central interest and are not significant.
5. Infer the answers to the questions of interest using appropriate inferential tools (e.g., confidence intervals, hypothesis tests, prediction intervals).
6. Presentation: communicate the results to the intended audience.
A script skeleton for steps 2-5 is sketched below.
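Assuming the JMP table has been exported to CSV (the filename pollution.csv and this snippet are illustrative, not part of the lecture), steps 2-5 map onto a short script skeleton:

```python
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Step 2: explore -- scatterplot matrix of the response and the predictors.
df = pd.read_csv("pollution.csv")   # hypothetical export of pollution.JMP
pd.plotting.scatter_matrix(df[["MORT", "PRECIP", "EDUC", "NONWHITE", "NOX", "SO2"]])
plt.show()

# Step 3: formulate a tentative inferential model.
X = sm.add_constant(df[["PRECIP", "EDUC", "NONWHITE", "NOX", "SO2"]])
fit = sm.OLS(df["MORT"], X).fit()

# Step 4: check assumptions (residual plots, influence diagnostics; see below).
# Step 5: inference -- coefficient table with t-tests and confidence intervals.
print(fit.summary())
```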

Air Pollution and Mortality
Question of interest: What is the association between the air pollution variables (NOX and SO2) and mortality once environmental variables (precipitation) and demographic variables have been taken into account?

There is curvature in the relationship between Mortality and SO2; Tukey's Bulging Rule suggests transforming SO2 to log SO2 as a possible remedy. The scatterplot of Mortality vs. NOX is "crunched," with most points squeezed against one edge. When a scatterplot of a response against an explanatory variable is crunched, transforming the explanatory variable to log(explanatory variable) is a good idea.
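A sketch of the remedy, again assuming the hypothetical pollution.csv export: take logs of both pollution variables before fitting.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("pollution.csv")    # hypothetical export of pollution.JMP
df["logNOX"] = np.log(df["NOX"])     # unbends the "crunched" NOX scatterplot
df["logSO2"] = np.log(df["SO2"])     # straightens the curved SO2 relationship

X = sm.add_constant(df[["PRECIP", "EDUC", "NONWHITE", "logNOX", "logSO2"]])
fit = sm.OLS(df["MORT"], X).fit()
```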

Initial Model
Checking for influential points: New Orleans has a Cook's distance of 1.75 and leverage 0.45, which exceeds the cutoff of 3*6/60 = 0.30.

Because New Orleans is an influential point with leverage 0.45 > 3*6/60 = 0.30, we remove it, noting that it has unusual explanatory variable values and that our conclusions therefore do not apply to cities with explanatory variables in the range of New Orleans.
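Continuing from the transformed fit in the sketch above, Cook's distances and leverages can be read off the fitted model with statsmodels' influence measures:

```python
influence = fit.get_influence()
cooks_d = influence.cooks_distance[0]   # one Cook's distance per city
leverage = influence.hat_matrix_diag    # leverages h_ii

p = int(fit.df_model) + 1               # number of coefficients (6 here)
n = int(fit.nobs)                       # 60 cities
print("leverage cutoff 3p/n =", 3 * p / n)   # 3*6/60 = 0.30

# Flag influential, high-leverage observations (New Orleans in the lecture
# data) and refit without them, remembering that the refit model does not
# apply to cities with explanatory variables in that range.
mask = (leverage > 3 * p / n) & (cooks_d > 1)
fit_reduced = sm.OLS(df["MORT"][~mask], X[~mask]).fit()
```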

Checking the Model
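The checks themselves come down to residual diagnostics (the plots on the original slide are omitted from this transcript). Continuing from fit_reduced above, a sketch:

```python
import matplotlib.pyplot as plt
from scipy import stats

# Residuals vs. fitted values: curvature suggests nonlinearity,
# a funnel shape suggests nonconstant variance.
plt.scatter(fit_reduced.fittedvalues, fit_reduced.resid)
plt.axhline(0, color="gray")
plt.xlabel("fitted values"); plt.ylabel("residuals")
plt.show()

# Normal quantile plot of the residuals: checks the normality assumption.
stats.probplot(fit_reduced.resid, plot=plt)
plt.show()
```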

Model Building
Model parsimony: if a variable is not of central interest and is not significant, we remove it from the model. Here we can remove Education. We do not remove log NOX, because it is of central interest.

Inference About the Questions of Interest
There is strong evidence that mortality is positively associated with SO2 at fixed levels of precipitation, education, nonwhite percentage, and NOX. There is no strong evidence that mortality is associated with NOX at fixed levels of precipitation, education, nonwhite percentage, and SO2.
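Continuing the sketch: drop EDUC (not of central interest, not significant), keep log NOX, and read the inferences off the coefficient table.

```python
# Parsimonious model: drop EDUC; keep logNOX because it is of central interest.
X2 = sm.add_constant(df[["PRECIP", "NONWHITE", "logNOX", "logSO2"]])[~mask]
final = sm.OLS(df["MORT"][~mask], X2).fit()

print(final.summary())    # t-statistics and p-values for each coefficient
print(final.conf_int())   # 95% confidence intervals; an all-positive interval
                          # for logSO2 and one covering 0 for logNOX would
                          # match the conclusions stated above
```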

Multiple Regression and Causal Inference
Goal: figure out what the causal effect on mortality would be of decreasing air pollution while keeping everything else in the world fixed.
Lurking variable: a variable that is associated with both air pollution in a city and mortality in a city. To figure out whether air pollution causes mortality, we want to compare mean mortality among cities that have different air pollution levels but the same values of the confounding variables. If we include all of the lurking variables in the multiple regression model, then the coefficient on air pollution represents the change in mean mortality caused by a one-unit increase in air pollution.

Omitted Variables
What happens if we omit a lurking variable from the regression, e.g., the percentage of smokers? Suppose we are interested in the causal effect of $x_1$ on $y$, believe that there is a lurking variable $x_2$, and take $\beta_1$ in the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + e$ to be the causal effect of $x_1$ on $y$. If we omit the confounding variable $x_2$ and regress $y$ on $x_1$ alone, the regression estimates some coefficient $\beta_1^*$ on $x_1$. How different are $\beta_1$ and $\beta_1^*$?

Omitted Variables Bias Formula
Suppose that
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + e \quad\text{and}\quad x_2 = \gamma_0 + \gamma_1 x_1 + u.$$
Then the coefficient on $x_1$ in the regression of $y$ on $x_1$ alone is
$$\beta_1^* = \beta_1 + \beta_2 \gamma_1,$$
so the omitted variable bias is $\beta_1^* - \beta_1 = \beta_2 \gamma_1$. The formula tells us the direction and magnitude of the bias from omitting a variable when estimating a causal effect. It also applies to the least squares estimates, i.e., $\hat{\beta}_1^* = \hat{\beta}_1 + \hat{\beta}_2 \hat{\gamma}_1$, where $\hat{\gamma}_1$ is the sample slope from regressing $x_2$ on $x_1$.
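The formula is easy to verify numerically. A minimal sketch on synthetic data (all coefficients here are made up) showing that the short-regression slope lands on $\beta_1 + \beta_2 \gamma_1$:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 10_000

beta0, beta1, beta2 = 1.0, 2.0, 3.0   # true model: y = b0 + b1*x1 + b2*x2 + e
gamma0, gamma1 = 0.5, 0.8             # lurking variable: x2 = g0 + g1*x1 + u

x1 = rng.normal(0, 1, n)
x2 = gamma0 + gamma1 * x1 + rng.normal(0, 1, n)
y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(0, 1, n)

# Short regression omits the lurking variable x2.
short = sm.OLS(y, sm.add_constant(x1)).fit()
print("estimated slope on x1:", short.params[1])         # close to 4.4
print("beta1 + beta2*gamma1 =", beta1 + beta2 * gamma1)  # 4.4 exactly
```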