Stat 112 Notes 20 Today: –Interaction Variables (Chapter 7.1-7.2) –Interpreting slope when Y is logged but not X –Model Building (Chapter 8)



Interaction Interaction is a three-variable concept. One of these is the response variable (Y) and the other two are explanatory variables (X1 and X2). There is an interaction between X1 and X2 if the impact of an increase in X2 on Y depends on the level of X1.
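A minimal sketch of this definition, assuming the mean of Y is linear in X1 and X2 plus a product term X1*X2. The function names and the coefficient values are made up for illustration and do not come from the lecture data.

```python
def mean_response(x1, x2, b0, b1, b2, b3):
    """Mean of Y under a model with an X1*X2 interaction term."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def effect_of_x2(x1, b2, b3):
    """Change in mean Y per one-unit increase in X2, at a given level of X1."""
    return b2 + b3 * x1


# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2, b3 = 10.0, 2.0, 1.5, 0.5

for x1 in (0, 1, 2):
    # One-unit increase in X2 (from 3 to 4) at this level of X1.
    diff = mean_response(x1, 4, b0, b1, b2, b3) - mean_response(x1, 3, b0, b1, b2, b3)
    print(f"X1={x1}: effect of a one-unit increase in X2 = {diff:.2f}")
```

When b3 is zero the effect of X2 is the same at every level of X1, which is the no-interaction case.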

Interaction Example What would you prefer, a leader with good intentions or a leader with evil intentions? What would you prefer, a smart leader or a dumb leader? Is a dumb leader with evil intentions the worst case or is there an interaction between a leader’s intelligence and intentions?

Toy Factory Manager Data

Model without Interaction The lines in the plot show the mean time for run given run size for each of the three managers. The lines are parallel because the model assumes no interaction.

Interaction Model involving Categorical Variables in JMP To add interactions involving categorical variables in JMP, follow the same procedure as with two continuous variables. Run Fit Model in JMP, add the usual explanatory variables first, then highlight one of the variables in the interaction in the Construct Model Effects box and highlight the other variable in the interaction in the Columns box and then click Cross in the Construct Model Effects box.

Interaction Model

The runs supervised by Alice appear abnormally time consuming. Bob has higher initial fixed setup costs than Carol ( > ) but has lower per-unit production time (0.136 < 0.259). (Figure: regression plot)

Interaction Model Interaction between run size and Manager: The effect on mean run time of increasing run size by one is different for different managers. Is there strong evidence that there really is an interaction in the population? Effect Test for Interaction: Manager*Run Size. The effect test tests the null hypothesis that there is no interaction (the effect on mean run time of increasing run size is the same for all managers) vs. the alternative hypothesis that there is an interaction between run size and manager. p-value = [value missing]. Conclusion: evidence that there is an interaction.
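One common way to carry out a test of this kind is an extra-sum-of-squares F-test comparing the model with and without the Manager*Run Size terms. The sketch below assumes that form of the test; it is not JMP's output. The run data, the helper names `ols_rss` and `design`, and the 0/1 dummy coding are all made up for illustration and are not the toy factory data.

```python
def ols_rss(X, y):
    """Residual sum of squares from least squares of y on the columns of X.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def design(rows, interaction):
    """Design matrix: intercept, run size, manager dummies, optional products."""
    X = []
    for manager, size, _ in rows:
        d_bob = 1.0 if manager == "Bob" else 0.0
        d_carol = 1.0 if manager == "Carol" else 0.0
        row = [1.0, float(size), d_bob, d_carol]
        if interaction:
            row += [size * d_bob, size * d_carol]
        X.append(row)
    return X


# Made-up data: each manager has their own intercept and slope, plus small noise.
sizes = [50, 100, 150, 200, 250, 300]
noise = [1.0, -1.0, 2.0, -2.0, 1.0, -1.0]
lines = {"Alice": (40.0, 0.25), "Bob": (30.0, 0.14), "Carol": (10.0, 0.26)}
rows = [(m, s, a + b * s + e) for m, (a, b) in lines.items() for s, e in zip(sizes, noise)]
y = [t for _, _, t in rows]

rss_reduced = ols_rss(design(rows, False), y)  # parallel lines, no interaction
rss_full = ols_rss(design(rows, True), y)      # separate slope per manager
n, p_full, extra = len(y), 6, 2
f_stat = ((rss_reduced - rss_full) / extra) / (rss_full / (n - p_full))
print(f"F = {f_stat:.1f} on {extra} and {n - p_full} degrees of freedom")
# A p-value would come from the F distribution, e.g. scipy.stats.f.sf(f_stat, 2, 12).
```

A large F statistic means the separate-slopes model fits much better than the parallel-lines model, which is evidence of an interaction.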

Interaction Profile Plot Lower left hand plot shows mean time for run vs. run size for the three managers Alice, Bob and Carol. Upper right hand plot shows mean run time for three managers for a low run size of 58 and a high run size of 345.

Interactions Involving Categorical Variables: General Approach First fit model with an interaction between categorical explanatory variable and continuous explanatory variable. Use effect test on interaction to see if there is evidence of an interaction. If there is evidence of an interaction (p-value <0.05 for effect test), use interaction model. If there is not strong evidence of an interaction (p-value >0.05 for effect test), use model without interactions. The model without interactions is easier to interpret but should only be used if there is not strong evidence for interactions.

Example: A Sex Discrimination Lawsuit Did a bank discriminatorily pay higher starting salaries to men than to women? Harris Trust and Savings Bank was sued by a group of female employees who accused the bank of paying lower starting salaries to women. The data in harrisbank.JMP are the starting salaries for all 32 male and all 61 female skilled, entry-level clerical employees hired by the bank between 1969 and 1977, as well as the education levels and sex of the employees.

No evidence of an interaction between Sex and Education. Fit model without interactions.

Discrimination Case Regression Results Strong evidence that there is a difference in the mean starting salaries of women and men of the same education level. Estimated difference: Men have ( )=$ higher mean starting salaries than women of the same education level. 95% confidence interval for mean difference = (2*$214.55, 2*$477.25) = ($429.10, $954.50). Bank’s defense: Omitted variable bias. Variables such as Seniority, Age, Experience also need to be controlled for.
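The factor of 2 in the interval can be checked with a few lines of arithmetic. The sketch assumes the two-level Sex variable is coded +1/-1, so that the two groups are two coding units apart and the men-minus-women difference is twice the coefficient. The coefficient bounds are taken as printed on the slide; the function name is made up for illustration.

```python
# 95% interval for the Sex coefficient, in dollars, as printed on the slide.
coef_low, coef_high = 214.55, 477.25


def difference_interval(low, high):
    """Interval for the group difference under assumed +1/-1 coding:
    twice the interval for the coefficient."""
    return 2 * low, 2 * high


diff_low, diff_high = difference_interval(coef_low, coef_high)
print(f"(${diff_low:.2f}, ${diff_high:.2f})")  # prints ($429.10, $954.50)
```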

Interpreting Coefficients When Y is Logged But Not X Example: In an industrial laboratory, under uniform conditions, batches of electrical insulating fluid were subjected to constant voltages until the insulating property of the fluids broke down. Goal: Estimate E(Y|X), where Y = Breakdown Time and X = Voltage Level. The log Y transformation works well for these data.

Interpreting Coefficients When Y is Logged But Not X
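A sketch of the usual interpretation, assuming the fitted model is mean of log Y = b0 + b1*X with natural logs, and that log Y is roughly symmetric about its mean. Under those assumptions, a one-unit increase in X multiplies the median of Y by exp(b1). The slope value below is hypothetical, not the estimate from the breakdown-time data.

```python
import math


def multiplicative_change(b1, delta_x=1.0):
    """Factor by which the median of Y changes when X increases by delta_x,
    under the model mean(log Y) = b0 + b1 * X (natural log)."""
    return math.exp(b1 * delta_x)


b1 = -0.5  # hypothetical slope of log(breakdown time) on voltage
factor = multiplicative_change(b1)
percent = 100 * (factor - 1)
print(f"One more unit of X multiplies median Y by {factor:.3f} ({percent:+.1f}%)")
```

A negative slope gives a factor below 1 (Y shrinks as X grows), and a slope of zero gives a factor of exactly 1.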

Deciding on Variables When we have many potential explanatory variables, how should we decide which to use? 1. Suppose our goal is to estimate the causal effect of a variable, controlling for all lurking variables (e.g., effect of pollution on mortality). Then it is best to include all possible lurking variables. Omitting variables, even if they do not appear significant, leads to potential bias. 2. Suppose our goal is to understand the association of certain variable(s) with a response, holding certain other variables fixed. We should think carefully about which variables we want to hold fixed.

Model Building for Prediction: General Principles 1. Include all variables that, for substantive reasons, might be expected to be important in predicting the outcome. 2. For explanatory variables with large effects, consider their interactions as well. For each interaction considered, add the interaction if the p-value < 0.05 on the interaction term.

Excluding Variables From a Model Should we ever drop a variable from a model? If there are many explanatory variables, then including variables that are not useful can worsen the estimates of the variables that are useful. We suggest the following strategy: 1. If an explanatory variable is statistically significant (p-value < 0.05 for two-sided test) and has the expected sign, then we should definitely keep it in the model. 2. If an explanatory variable is not statistically significant and has the expected sign, it is generally fine to keep the variable in the model. 3. If an explanatory variable is not statistically significant and does not have the expected sign and is not of primary interest in and of itself, we can consider removing it from the model. 4. If an explanatory variable is statistically significant but does not have the expected sign, then we should keep it in the model, but think hard about whether the sign makes sense. Try to gather data on potential lurking variables and include them in the analysis.
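The four rules above can be written as a small decision function. This is only a sketch: the function name and return strings are made up, and the branch for a non-significant, wrong-sign variable that is of primary interest is an assumption, since the strategy does not address that case.

```python
def keep_or_drop(significant, expected_sign, primary_interest=False):
    """Apply the four-rule strategy to one explanatory variable.

    significant: p-value < 0.05 for the two-sided test
    expected_sign: coefficient has the sign we expected
    primary_interest: variable is of interest in and of itself
    """
    if significant and expected_sign:
        return "keep"                                    # rule 1
    if not significant and expected_sign:
        return "generally fine to keep"                  # rule 2
    if not significant and not expected_sign:
        if primary_interest:
            return "keep"                                # assumption, not in the rules
        return "consider removing"                       # rule 3
    return "keep, but question the sign and look for lurking variables"  # rule 4


print(keep_or_drop(significant=False, expected_sign=False))
print(keep_or_drop(significant=True, expected_sign=False))
```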

SAT Data
Y = Average score on 1982 SAT for the state.
Explanatory Variables:
–X1 = Takers (% of Total Eligible Students in the state who took the exam)
–X2 = Income (Median Income of Families of Test Takers)
–X3 = Years (Average Number of Years That Test Takers Had Formal Studies in Social Sciences, Natural Sciences and Humanities)
–X4 = Public (Percentage of Test Takers who attend public schools)
–X5 = Expend (Total State Expenditure on Secondary Schools, expressed in hundreds of dollars per student)
–X6 = Rank (Median percentile ranking of test takers within their secondary classes)

Expected Signs
–X1 = Takers (% of Total Eligible Students in the state who took the exam) (Expected sign: -)
–X2 = Income (Median Income of Families of Test Takers) (Expected sign: +)
–X3 = Years (Average Number of Years That Test Takers Had Formal Studies in Social Sciences, Natural Sciences and Humanities) (Expected sign: +)
–X4 = Public (Percentage of Test Takers who attend public schools) (Expected sign: -)
–X5 = Expend (Total State Expenditure on Secondary Schools, expressed in hundreds of dollars per student) (Expected sign: +)
–X6 = Rank (Median percentile ranking of test takers within their secondary classes) (Expected sign: +)
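Once a model is fitted, the expected signs can be checked mechanically. The sketch below records the signs listed above and flags any coefficient whose sign disagrees. The coefficient values are hypothetical placeholders, not estimates from the SAT data, and the function name is made up.

```python
# Expected signs as listed above: +1 for "+", -1 for "-".
EXPECTED_SIGN = {
    "Takers": -1,
    "Income": +1,
    "Years": +1,
    "Public": -1,
    "Expend": +1,
    "Rank": +1,
}


def unexpected_signs(coefs):
    """Names of variables whose fitted coefficient has the opposite sign
    from the expected one."""
    return [name for name, b in coefs.items()
            if name in EXPECTED_SIGN and b * EXPECTED_SIGN[name] < 0]


# Hypothetical coefficients for illustration only.
hypothetical = {"Takers": -0.5, "Income": -0.01, "Years": 20.0,
                "Public": -0.1, "Expend": 2.0, "Rank": 4.0}
print(unexpected_signs(hypothetical))  # prints ['Income']
```

Variables flagged this way are the ones to examine under the exclusion strategy, together with their p-values.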