Stat 112: Lecture 20 Notes Chapter 7.2: Interaction Variables. Chapter 8: Model Building. I will e-mail Homework 6 by Friday. It will be due on Friday,

Stat 112: Lecture 20 Notes
Chapter 7.2: Interaction Variables. Chapter 8: Model Building. I will e-mail Homework 6 by Friday. It will be due on Friday, Dec. 1st (the Friday after Thanksgiving).

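Interaction
Interaction is a three-variable concept. One of the variables is the response variable (Y) and the other two are explanatory variables ($X_1$ and $X_2$). There is an interaction between $X_1$ and $X_2$ if the impact of an increase in $X_2$ on Y depends on the level of $X_1$. To incorporate an interaction into a multiple regression model with two continuous explanatory variables, we add the explanatory variable $X_1 X_2$:
$$E(Y \mid X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{12} X_1 X_2.$$
Under this model, a one-unit increase in $X_2$ changes the mean of Y by $\beta_2 + \beta_{12} X_1$, which depends on the level of $X_1$. There is evidence of an interaction if the coefficient $\beta_{12}$ on $X_1 X_2$ is significant (t-test p-value < 0.05).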
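A minimal sketch, in Python with only the standard library, of fitting a model that includes the interaction variable $X_1 X_2$ and reading off how the effect of $X_2$ changes with $X_1$. The data, the coefficient values and the helper names (`solve`, `ols`, `x2_effect`) are hypothetical. The fit uses the normal equations, a textbook shortcut that is fine for a small illustration but is not necessarily what statistical software does internally.

```python
# Sketch: least-squares fit of Y = b0 + b1*X1 + b2*X2 + b12*(X1*X2),
# standard library only. Data and coefficients are hypothetical and noise-free,
# so the fit should recover the coefficients used to generate Y.

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data: X1 = 1..5 observed at two levels of X2.
x1 = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
x2 = [2, 2, 2, 2, 2, 6, 6, 6, 6, 6]
true_coefs = (10.0, 1.5, -2.0, 0.5)  # b0, b1, b2, b12 (made up)
y = [true_coefs[0] + true_coefs[1] * u + true_coefs[2] * v + true_coefs[3] * u * v
     for u, v in zip(x1, x2)]

# Design matrix columns: intercept, X1, X2 and the interaction variable X1*X2.
rows = [[1.0, u, v, u * v] for u, v in zip(x1, x2)]
b0, b1, b2, b12 = ols(rows, y)


def x2_effect(at_x1):
    """Change in mean Y for a one-unit increase in X2, with X1 held at at_x1."""
    return b2 + b12 * at_x1


print(round(x2_effect(1), 6), round(x2_effect(5), 6))  # -1.5 0.5
```

The effect of $X_2$ is negative when $X_1 = 1$ and positive when $X_1 = 5$: that dependence on the level of $X_1$ is exactly what an interaction means.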

Accidents Example
The number of car accidents on a stretch of highway seems to be related to the number of vehicles that travel over it and the speed at which they are traveling. A city alderman has decided to ask the county sheriff to provide him with statistics covering the last few years, with the intention of examining these data statistically so that he can introduce new speed laws that will reduce traffic accidents. The file accidents.JMP contains data for different time periods on the number of cars passing along the stretch of road, the average speed of the cars and the number of accidents during the time period.

Toy Factory Manager Data

Model without Interaction
The lines in the plot show the mean run time given run size for each of the three managers. The lines are parallel because the model assumes no interaction.
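A small sketch of why the no-interaction lines are parallel. The coefficients are made up (the fitted values for the toy factory are not reproduced here), manager a is the baseline, and the names `mean_time`, `B0`, `B1` and `OFFSET` are hypothetical.

```python
# Hypothetical no-interaction model for run time:
#   mean time = B0 + B1 * run_size + OFFSET[manager]
# All values are made up for illustration; manager "a" is the baseline.
B0 = 100.0
B1 = 0.25  # one common slope shared by every manager
OFFSET = {"a": 0.0, "b": -40.0, "c": -55.0}


def mean_time(manager: str, run_size: float) -> float:
    """Predicted mean run time under the no-interaction (parallel lines) model."""
    return B0 + B1 * run_size + OFFSET[manager]


# The vertical gap between two managers' lines is the same at every run size,
# which is what "parallel lines" means.
for size in (50, 150, 300):
    print(size, mean_time("a", size) - mean_time("b", size))
```

Because the manager only shifts the intercept, the difference between managers never depends on run size; allowing it to depend on run size requires interaction terms.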

Interaction Model Involving Categorical Variables in JMP
To add interactions involving categorical variables in JMP, follow the same procedure as with two continuous variables. Run Fit Model and add the usual explanatory variables first. Then highlight one of the variables in the interaction in the Construct Model Effects box, highlight the other variable in the interaction in the Columns box, and click Cross in the Construct Model Effects box.
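A sketch of which columns crossing Manager with Run Size adds to the model, assuming indicator (0/1) coding with manager a as the baseline. JMP's own default coding of nominal variables is different (+1/-1 effect coding), but the fitted lines are the same under either coding. The function name `design_row` is hypothetical.

```python
# Indicator-coded design row for the Manager * Run Size interaction model.
MANAGERS = ("a", "b", "c")
BASELINE = "a"


def design_row(manager: str, run_size: float) -> list:
    """Columns: intercept, run size, dummies for b and c, then dummy * run size cross terms."""
    dummies = [1.0 if manager == m else 0.0 for m in MANAGERS if m != BASELINE]
    crosses = [d * run_size for d in dummies]  # the columns that "Cross" adds
    return [1.0, run_size, *dummies, *crosses]


print(design_row("a", 100))  # baseline manager: dummies and cross terms are all zero
print(design_row("b", 100))  # manager b: its own dummy and its own slope adjustment
```

With these columns, manager b's slope is the run size coefficient plus the coefficient on the b * run size column, so each manager gets a separate slope.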

Interaction Model

Interaction between run size and manager: the effect on mean run time of increasing run size by one is different for different managers.
Effect test for the interaction (Manager*Run Size): the effect test tests the null hypothesis that there is no interaction (the effect on mean run time of increasing run size is the same for all managers) vs. the alternative hypothesis that there is an interaction between run size and manager. The p-value of the effect test gives evidence that there is an interaction.
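With three managers, the Manager*Run Size effect involves two coefficients (one extra slope for each manager beyond the first), so its effect test is an F test rather than a single t-test. For the highest-order term in a least-squares model, this test is equivalent to the extra-sum-of-squares (partial) F test comparing the models with and without the interaction. A minimal sketch of the statistic; the sums of squares are made up, and the p-value, which comes from an F distribution with q and df_full degrees of freedom, is not computed because the Python standard library has no F distribution.

```python
def partial_f(sse_reduced: float, sse_full: float, q: int, df_full: int) -> float:
    """Partial (extra sum of squares) F statistic.

    sse_reduced: error sum of squares of the model WITHOUT the interaction terms
    sse_full:    error sum of squares of the model WITH the interaction terms
    q:           number of interaction coefficients being tested
    df_full:     error degrees of freedom of the full model (n minus number of coefficients)
    """
    return ((sse_reduced - sse_full) / q) / (sse_full / df_full)


# Hypothetical numbers: three managers, so the interaction has q = 2 coefficients.
print(partial_f(sse_reduced=120.0, sse_full=100.0, q=2, df_full=50))  # 5.0
```

For the toy factory interaction model (intercept, run size, two manager terms, two cross terms), df_full would be n - 6.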

The runs supervised by Manager a appear abnormally time consuming. Manager b has a higher fixed setup time than Manager c (a larger intercept) but a lower per-unit production time (slope 0.136 < 0.259).
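Because manager b has the larger intercept but the smaller slope, the two fitted lines cross: below some run size c is expected to finish faster, above it b is. A sketch of that crossover run size, assuming straight fitted lines for each manager. The intercepts for b and c are not recoverable from these notes, so the usage line uses placeholder numbers; with the slide's slopes the crossover would be (intercept of b - intercept of c) / (0.259 - 0.136). The function name is hypothetical.

```python
def crossover_run_size(int_b: float, slope_b: float, int_c: float, slope_c: float) -> float:
    """Run size at which two managers' fitted lines give the same mean run time.

    Solves int_b + slope_b * x = int_c + slope_c * x for x; needs unequal slopes.
    """
    if slope_b == slope_c:
        raise ValueError("parallel lines never cross")
    return (int_b - int_c) / (slope_c - slope_b)


# Placeholder intercepts and slopes, not the toy factory estimates.
print(crossover_run_size(10.0, 1.0, 4.0, 2.0))  # 6.0
```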

Interaction Profile Plot
The lower left-hand plot shows mean run time vs. run size for the three managers a, b and c.

Interactions Involving Categorical Variables: General Approach
First fit a model with an interaction between the categorical explanatory variable and the continuous explanatory variable. Use the effect test on the interaction to see if there is evidence of an interaction. If there is evidence of an interaction (p-value < 0.05 for the effect test), use the interaction model. If there is not strong evidence of an interaction (p-value > 0.05 for the effect test), use the model without interactions. The model without interactions is easier to interpret but should only be used if there is not strong evidence of an interaction.
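The decision rule above, written as a tiny function. The 0.05 cutoff is the one used in these notes; the function name is hypothetical, and a p-value of exactly 0.05 (which the rule does not address) is treated here as "not strong evidence".

```python
ALPHA = 0.05  # significance level used in these notes


def choose_model(interaction_effect_p: float) -> str:
    """Pick a model from the p-value of the effect test for the interaction."""
    if interaction_effect_p < ALPHA:
        return "interaction model"
    return "model without interactions"


print(choose_model(0.001))
print(choose_model(0.40))
```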

Example: A Sex Discrimination Lawsuit
Did a bank discriminatorily pay higher starting salaries to men than to women? Harris Trust and Savings Bank was sued by a group of female employees who accused the bank of paying lower starting salaries to women. The data in harrisbank.JMP are the starting salaries for all 32 male and all 61 female skilled, entry-level clerical employees hired by the bank between 1969 and 1977, as well as the education levels and sex of the employees.

No evidence of an interaction between Sex and Education. Fit model without interactions.

Discrimination Case Regression Results
There is strong evidence of a difference between the mean starting salaries of women and men with the same education level. Estimated difference: men have a higher mean starting salary than women of the same education level. Because JMP codes the two-level Sex variable as +1/-1 by default, the difference between the sexes is twice the Sex coefficient, so the 95% confidence interval for the mean difference is (2*$214.55, 2*$477.25) = ($429.10, $954.50). Bank's defense: omitted variable bias. Variables such as seniority, age and experience also need to be controlled for.
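The factor of 2 above can be made explicit with a short sketch, assuming the Sex coefficient comes from JMP's default +1/-1 coding of a two-level nominal variable (which is what the doubling on the slide indicates). Under that coding the two groups differ by 2 times the coefficient, so both endpoints of the coefficient's confidence interval double. The function name is hypothetical.

```python
def level_difference(coef_interval):
    """Convert a CI for a +1/-1 coded coefficient into a CI for the difference between the two levels."""
    lo, hi = coef_interval
    return (2 * lo, 2 * hi)


lo, hi = level_difference((214.55, 477.25))
print(f"(${lo:.2f}, ${hi:.2f})")  # ($429.10, $954.50)
```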

Model Building
When we have many potential explanatory variables, how should we decide which ones to use?
1. Suppose our goal is to estimate the causal effect of a variable, controlling for all lurking variables (e.g., the effect of pollution on mortality). Then it is best to include all possible lurking variables. Omitting variables, even ones that do not appear significant, can lead to bias.
2. Suppose our goal is to understand the association of certain variable(s) with a response, holding certain other variables fixed. We should think carefully about which variables we want to hold fixed.
3. Suppose our goal is to predict the response from the explanatory variables, and we are not particularly interested in interpreting the coefficients on individual variables. Then it is a good idea to use only those explanatory variables that are of value for predicting the response. Using too many variables uses up degrees of freedom and can hurt our out-of-sample predictions.
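One simple way to see the degrees-of-freedom cost in point 3: the root mean square error divides the error sum of squares (SSE) by n - k - 1, so a variable that barely reduces SSE can make the root MSE larger. The sums of squares below are hypothetical.

```python
import math


def root_mse(sse: float, n: int, k: int) -> float:
    """Root mean square error for a regression with k explanatory variables and n observations.

    The error degrees of freedom are n - k - 1 (the intercept uses one more).
    """
    return math.sqrt(sse / (n - k - 1))


# Hypothetical: a third variable lowers SSE only slightly (170 -> 165)
# but costs one error degree of freedom, so the root MSE goes up.
print(root_mse(170.0, n=20, k=2))  # sqrt(170 / 17) = sqrt(10)
print(root_mse(165.0, n=20, k=3))  # sqrt(165 / 16)
```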

Model Building for Prediction
The handout “Model Building for Prediction” discusses how to select a subset of the explanatory variables to use in the model when our goal is to make the best predictions, and we are not concerned with interpreting the coefficients of variables.
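The handout's own procedure is not reproduced here. As a generic illustration of subset selection, a sketch of exhaustive ("all subsets") search in which the caller supplies the score to minimize, hypothetically an estimate of out-of-sample prediction error such as the error on held-out data. The function name and the toy score are made up.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple


def best_subset(candidates: Sequence[str],
                score: Callable[[Tuple[str, ...]], float]) -> Tuple[Tuple[str, ...], float]:
    """Return the subset of candidate variables with the smallest score.

    Every subset, including the empty one, is scored, so this needs
    2 ** len(candidates) model fits and is only practical for a modest
    number of candidate variables.
    """
    best, best_score = (), score(())
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            s = score(subset)
            if s < best_score:
                best, best_score = subset, s
    return best, best_score


# Toy score (made up): each variable changes the "error" by a fixed amount.
toy = {"x1": -3.0, "x2": 0.5, "x3": -1.0}
print(best_subset(list(toy), lambda sub: 10.0 + sum(toy[v] for v in sub)))
```

In practice the score would come from refitting the regression on each subset, which is why stepwise methods are often used instead when there are many candidate variables.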