Multiple Regression Models: Interactions and Indicator Variables

Today’s Data Set A collector of antique grandfather clocks knows that the price received for a clock increases linearly with its age. Moreover, the collector hypothesizes that the auction price will also increase linearly with the number of bidders. (Let’s hypothesize a first-order MLR model.)

First-order model: y = β0 + β1x1 + β2x2 + ε, where
y = auction price
x1 = age of clock (years)
x2 = number of bidders

The regression equation is:

AuctionPrice = -1339 + 12.7 AgeOfClock + 86.0 Bidders

Analysis of Variance
Source          DF       SS        MS        F       P
Regression       2   4283063   2141531   120.19   0.000
Residual Error  29    516727     17818
Total           31   4799790

Tests of Individual β Parameters

Predictor    Coef      SE Coef    T       P
Constant     -1339.0   173.8     -7.70   0.000
AgeOfClock   12.7406   0.9047    14.08   0.000
Bidders      85.953    8.729      9.85   0.000
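A first-order fit like the one above can be sketched with ordinary least squares. The sketch below uses synthetic data (the real auction data set is not reproduced here); the coefficients used to generate the data are assumptions chosen to roughly mimic the fitted equation on the slide.

```python
import numpy as np

# Synthetic stand-in for the clock-auction data; the true coefficients
# below are assumptions that mimic the slide's fitted equation.
rng = np.random.default_rng(0)
n = 32
age = rng.uniform(100, 200, n)        # x1: age of clock (years)
bidders = rng.integers(5, 16, n)      # x2: number of bidders
price = -1339 + 12.7 * age + 86.0 * bidders + rng.normal(0, 10, n)

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones(n), age, bidders])
beta, *_ = np.linalg.lstsq(X, price, rcond=None)
print(beta)  # [b0, b1, b2], close to (-1339, 12.7, 86.0)
```

With low noise and a well-spread design, the least-squares estimates land close to the generating coefficients, mirroring the small standard errors in the slide's output.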

What if the relationship between E(y) and one of the independent variables depends on the value of the other? In that case, the two independent variables interact, and we model the interaction with a cross-product term: y = β0 + β1x1 + β2x2 + β3x1x2 + ε.

For our example: Do age and the number of bidders interact? In other words, is the rate of increase of the auction price with age driven upward by a large number of bidders? If so, as the number of bidders increases, the slope of the price-versus-age line increases. To investigate, the number of bidders has been separated into three groups:
A: 0-6 bidders
B: 7-10 bidders
C: 11-15 bidders

Are these slopes parallel, or do they change with the number of bidders?
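One way to see whether the slopes are parallel is to fit the interaction model and read off how the age slope changes with the number of bidders. The sketch below uses synthetic data with a built-in interaction (hypothetical numbers, not the real auction data).

```python
import numpy as np

# Interaction model y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + e, fit on
# synthetic data in which the price-vs-age slope grows with bidders.
rng = np.random.default_rng(1)
n = 200
age = rng.uniform(100, 200, n)
bidders = rng.integers(0, 16, n)
price = (-300 + 2.0 * age + 50.0 * bidders
         + 0.9 * age * bidders + rng.normal(0, 20, n))

# The cross-product column is just the elementwise product x1*x2.
X = np.column_stack([np.ones(n), age, bidders, age * bidders])
b0, b1, b2, b3 = np.linalg.lstsq(X, price, rcond=None)[0]

# The slope of price vs age now depends on the number of bidders:
for k in (3, 8, 13):
    print(f"bidders={k}: slope of price vs age = {b1 + b3 * k:.2f}")
```

A nonzero b3 means the lines are not parallel: the age slope is b1 + b3·(number of bidders), so more bidders steepen the price-versus-age line.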

Caution! Once an interaction has been deemed important in a model, all associated first-order terms should be kept in the model, regardless of the magnitude of their p-values.

Another Example: Graph and interpret the following findings. Let’s say we want to study how hard students work on tests. We have some achievement-oriented students and some achievement-avoiders. We split each sample into two random halves and give half of each sample a challenging test and the other half an easy test. We measure how hard the students work on the test. The means of this study are:

                   Achievement-oriented (n=100)   Achievement-avoiders (n=100)
Challenging test   10                             5
Easy test          (not shown in transcript)

Conclusions: E(y) = β0 + β1x1 + β2x2 + β3x1x2. The effect of test difficulty (x1) on effort (y) depends on a student’s achievement orientation (x2). Thus, achievement orientation and test difficulty interact in their effect on effort. This is an example of a two-way interaction between achievement orientation and test difficulty.

Basic premises up to this point: We have used quantitative variables (we can assume that a value of 2 on a variable means having twice as much of it as a value of 1). But we often work with categorical variables whose values have no real numerical relationship with each other (race, political affiliation, sex, marital status, …): Democrat (1), Independent (2), Republican (3). Is a Republican three times as politically affiliated as a Democrat? How do we resolve this problem?

Dummy Variables: A dummy variable is a numerical variable used in regression analysis to represent subgroups of the sample in your study. Dummy variables take two values, 0 and 1. On a "Republican" variable, someone assigned a 1 is Republican and someone assigned a 0 is not. Dummy variables act like switches that turn various parameters on and off in an equation.

Creating Dummy Variables: In Minitab, we can recode the categorical variable into a set of dummy variables, each of which has two levels. In the regression model, we use all but one of the original levels. The level that is not included in the analysis is the category to which all other categories will be compared (the base level); you decide which level this is. The coefficient on each dummy variable in your regression shows the effect of being in that category on your dependent variable.
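The recoding Minitab does can be sketched by hand: turn each level of the categorical variable into a 0/1 column, then drop one column as the base level. The party labels below are just the example from the earlier slide.

```python
import numpy as np

# Manual stand-in for Minitab's "Make Indicator Variables": one 0/1
# column per category, then drop one level as the base.
party = np.array(["Democrat", "Independent", "Republican",
                  "Republican", "Democrat", "Independent"])

levels = ["Democrat", "Independent", "Republican"]
dummies = {lvl: (party == lvl).astype(int) for lvl in levels}

# Choose "Democrat" as the base level: only the other two columns
# enter the regression design matrix.
X = np.column_stack([dummies["Independent"], dummies["Republican"]])
print(X)
```

Note that the three dummy columns always sum to 1 across a row, which is exactly why one of them must be dropped: keeping all three would make the design matrix collinear with the intercept.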

Returning to the Clocks at Auction Data Set: The collector of antique grandfather clocks knows that the price received for a clock increases linearly with its age, and he hypothesized that the auction price will also increase linearly with the number of bidders. But let’s say he doesn’t have the exact number of bidders; he only knows whether there was a high number of bidders (we’ll say 9 and above) or a low number (below 9).

Let’s Create a Dummy Variable in Minitab
- We’ll use the Bidders2Cat column
- Calc > Make Indicator Variables
- In the top box, specify that you want to make indicator variables for Bidders2Cat
- Store the results in C10-C11
- Once you have created the variables, name columns 10 and 11 ManyBidders and FewBidders
- Which is which? Why?

Before we run the analysis: Let’s say we decide to include ManyBidders (making FewBidders the base level). Because FewBidders is not included, we can determine whether ManyBidders predicts a different auction price than FewBidders. If ManyBidders is significant in our regression, with a positive β coefficient, we conclude that having many bidders significantly raises the price of the clocks at auction.
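The interpretation above can be sketched numerically: with a single 0/1 dummy in the model, its coefficient is the estimated shift in mean auction price for many-bidder sales relative to the few-bidder base level, holding age fixed. The data and the 400-unit shift below are synthetic assumptions, not the real auction data.

```python
import numpy as np

# Regression with age plus a ManyBidders dummy (1 = many, 0 = few).
# Synthetic data; the generating coefficients are assumptions.
rng = np.random.default_rng(3)
n = 100
age = rng.uniform(100, 200, n)
many = rng.integers(0, 2, n)              # 1 = many bidders, 0 = few
price = -1000 + 12.0 * age + 400.0 * many + rng.normal(0, 30, n)

X = np.column_stack([np.ones(n), age, many])
b0, b1, b2 = np.linalg.lstsq(X, price, rcond=None)[0]
print(b2)  # ≈ 400: many-bidder clocks fetch about 400 more, age held fixed
```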

Thinking Through the Variables: What is x1? Let’s hypothesize the model in plain English, just looking at high/low bidders. What’s the null hypothesis?

Run the Analysis Results of the t-test? What would happen if we used ManyBidders as our base?

Let’s look at your Journal Application What does it mean to create a dummy variable and when is it appropriate to do this? What are all the terms in the original model? This researcher started with a complex model and simplified it. Which model was better? How can we know?

Nested Models: Two models are nested if one contains all of the terms of the other plus at least one additional term. For example, the straight-line model E(y) = β0 + β1x is nested within the curvilinear model E(y) = β0 + β1x + β2x². The smaller model is the reduced model and the larger one is the full or complete model.

Which is better? How do we decide? In this example, we would test H0: the additional β parameters all equal zero, against Ha: at least one of them differs from zero. To test, we compare the SSE for the reduced model (SSER) and the SSE for the complete model (SSEC). Which will be larger?

Error is Always Greater for the Reduced Model: SSER > SSEC. Is the drop in SSE from fitting the complete model "large enough"? We use an F-test to compare the models. Here, we test the null hypothesis that the curvature coefficients simultaneously equal zero.

Test statistic:

F = [(SSER − SSEC) / (number of βs being tested)] / [SSEC / (n − k − 1)]

where k is the number of β terms (excluding β0) in the complete model, so the denominator is s², the error variance estimate for the larger model. Table C4 (p. 766) gives the critical value for F:
df for numerator (v1): number of βs being tested
df for denominator (v2): n − k − 1, the df associated with s² for the complete model
If F ≥ the critical F value, reject H0: at least one of the additional terms contributes information about the response.
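The nested-model F-test above can be sketched end to end: fit the reduced (straight-line) and complete (quadratic) models, then form F from the drop in SSE. The data are synthetic with genuine curvature (hypothetical numbers, not from the text), so the test should favour the complete model.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
x = rng.uniform(0, 10, n)
y = 5 + 2 * x + 0.5 * x**2 + rng.normal(0, 1, n)  # true curvature present

def sse(X, y):
    """Sum of squared errors after a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

X_red = np.column_stack([np.ones(n), x])         # reduced: b0 + b1*x
X_com = np.column_stack([np.ones(n), x, x**2])   # complete: adds b2*x^2
sse_r, sse_c = sse(X_red, y), sse(X_com, y)

k_tested = 1                   # number of betas being tested (b2)
df_den = n - X_com.shape[1]    # n - k - 1 for the complete model
F = ((sse_r - sse_c) / k_tested) / (sse_c / df_den)
print(F)  # a large F says the curvature term matters: reject H0
```

Because the complete model can always fit at least as well, sse_r ≥ sse_c by construction; the F statistic asks whether the improvement is large relative to the complete model's error variance.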

Conclusions? Parsimonious models are preferable to big models as long as both have similar predictive power. A model is parsimonious when it has a small number of predictors. In the end, the choice of model is partly subjective.

Question 3 (Journal) What type of error do we risk making by conducting multiple t-tests? Pages 184, 188