Lecture 21 – Thurs., Nov. 20
Review of Interpreting Coefficients and Prediction in Multiple Regression
Strategy for Data Analysis and Graphics (Chapters 9.4 – 9.5)
Specially Constructed Explanatory Variables (Chapter 9.3)
–Polynomial terms for curvature
–Interaction terms
–Sets of indicator variables for nominal variables

Interpreting Coefficients
Multiple Linear Regression Model: $\mu\{Y \mid X_1, \ldots, X_p\} = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$
Interpretation of the coefficient $\beta_j$: the change in the mean of Y that is associated with increasing $X_j$ by one unit while not changing $X_1, \ldots, X_{j-1}, X_{j+1}, \ldots, X_p$.
The interpretation holds even if $X_1, \ldots, X_p$ are correlated.
The same warning about extrapolation beyond the observed $X_1, \ldots, X_p$ points applies as in simple linear regression.
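Equivalently, $\beta_j$ can be written as the difference of two means that differ only in $X_j$:

$$
\beta_j = \mu\{Y \mid X_1, \ldots, X_j + 1, \ldots, X_p\} - \mu\{Y \mid X_1, \ldots, X_j, \ldots, X_p\}
$$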

Coefficients in Mammal Study
It is estimated that:
–A 1 kg increase in body weight, with gestation period and litter size held fixed, is associated with a 0.90 g mean increase in brain weight [95% CI: (0.80, 1.17)].
–A 1 day increase in gestation period, with body weight and litter size held fixed, is associated with a 1.81 g mean increase in brain weight [95% CI: (1.10, 2.51)].
–A 1 animal increase in litter size, with body weight and gestation period held fixed, is associated with a 27.65 g mean increase in brain weight [95% CI: (−6.94, 62.23)].

Prediction from Multiple Regression
The estimated mean brain weight (= predicted brain weight) for a mammal that has a body weight of 3 kg, a gestation period of 180 days, and a litter size of 1 is found by substituting these values into the fitted regression equation.
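As a sketch of the calculation, using the slope estimates from the previous slide and writing $\hat{\beta}_0$ for the fitted intercept (whose value is not shown on these slides):

$$
\hat{\mu}\{\text{brain} \mid \text{body} = 3,\ \text{gestation} = 180,\ \text{litter} = 1\} = \hat{\beta}_0 + 0.90(3) + 1.81(180) + 27.65(1) = \hat{\beta}_0 + 356.15 \text{ g}
$$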

Strategy for Data Analysis and Graphics
Strategy for data analysis: Display 9.9 in Chapter 9.4.
A good graphical method for initial exploration of the data is a matrix of pairwise scatterplots. To display this in JMP, click Analyze, Multivariate, and then put all the variables in Y, Columns.
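Outside JMP, the same display takes only a few lines of Python; this is a minimal sketch assuming the mammal data sit in a hypothetical file mammals.csv with columns brain, body, gestation, and litter:

```python
# Minimal sketch of a scatterplot matrix (pandas + matplotlib).
# Assumes a hypothetical file mammals.csv with columns: brain, body, gestation, litter.
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

df = pd.read_csv("mammals.csv")

# One panel per pair of variables, histograms on the diagonal --
# the same idea as JMP's Analyze > Multivariate display.
scatter_matrix(df[["brain", "body", "gestation", "litter"]],
               diagonal="hist", figsize=(8, 8))
plt.show()
```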

Specially Constructed Explanatory Variables
The scope of multiple linear regression can be dramatically expanded by using specially constructed explanatory variables:
–Powers of the explanatory variables ($X_j^k$) can be used to model curvature in the regression function.
–Indicator variables can be used to model the effects of nominal variables.
–Products of explanatory variables can be used to model interactive effects of explanatory variables.

Curved Regression Functions
The linearity assumption in simple linear regression is violated. Transformations would not work here because the regression function is not monotonic.

Squared Term for Curvature
Multiple linear regression model with a squared term for curvature (here for yield and rainfall): $\mu\{\text{yield} \mid \text{rainfall}\} = \beta_0 + \beta_1\,\text{rainfall} + \beta_2\,\text{rainfall}^2$

Terms for Curvature
There are two ways to incorporate squared or higher polynomial terms for curvature in JMP:
–In Fit Model, create a variable rainfall² and include it in the model along with rainfall.
–In Fit Y by X, under the red triangle next to Bivariate Fit of Yield by Rainfall, click Fit Polynomial and then 2, Quadratic instead of Fit Line (a model with both a squared and a cubed term can be fit by clicking 3, Cubic).
The coefficients are not directly interpretable: the change in the mean of Y that is associated with a one-unit increase in X depends on X, as the calculation below shows.
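To see why, compare the mean of Y at $X + 1$ and at $X$ under the quadratic model $\mu\{Y \mid X\} = \beta_0 + \beta_1 X + \beta_2 X^2$:

$$
\mu\{Y \mid X + 1\} - \mu\{Y \mid X\} = \beta_1 + \beta_2(2X + 1),
$$

which depends on the value of X, so there is no single "slope" for X.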

Interaction Terms
Two variables are said to interact if the effect that one of them has on the mean response depends on the value of the other. An explanatory variable for an interaction can be constructed as the product of the two explanatory variables that are thought to interact.

Interaction in Meadowfoam
Does the effect of light intensity on the mean number of flowers depend on the timing of the light regime?
Multiple linear regression model with a term for the interaction: $\mu\{\text{flowers} \mid \text{light}, \text{early}\} = \beta_0 + \beta_1\,\text{light} + \beta_2\,\text{early} + \beta_3\,(\text{light} \times \text{early})$
The model is equivalent to $\mu\{\text{flowers} \mid \text{light}, \text{early}\} = (\beta_0 + \beta_2\,\text{early}) + (\beta_1 + \beta_3\,\text{early})\,\text{light}$: the change in the mean number of flowers for a one-unit increase in light intensity depends on the timing onset.
The coefficients are not easily interpretable. The best method for communicating findings when there is an interaction is a table or graph of estimated means at various combinations of the interacting variables.

Interaction in Meadowfoam
There is not much evidence of an interaction: the p-value for the test that the interaction coefficient is zero is large.

Displaying Interaction – Coded Scatterplots (Section 9.5.2)
A coded scatterplot is a scatterplot that uses different symbols to distinguish two or more groups.

Coded Scatterplots in JMP
Split the Y variable by the group identity variable (click Tables, Split, then put the Y variable in Split and the group identity variable in Col ID). Then click Graph, Overlay Plot, put the columns corresponding to the Y's for the different groups in Y, and put the X variable (light intensity) in X.
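The same plot is easy to sketch in Python; this assumes a hypothetical file meadowfoam.csv with columns flowers, light, and timing:

```python
# Sketch: a coded scatterplot -- a different plotting symbol per timing group.
# Assumes a hypothetical meadowfoam.csv with columns flowers, light, timing.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("meadowfoam.csv")

for timing, marker in [("early", "o"), ("late", "x")]:
    group = df[df["timing"] == timing]
    plt.scatter(group["light"], group["flowers"], marker=marker, label=timing)

plt.xlabel("Light intensity")
plt.ylabel("Number of flowers")
plt.legend(title="Timing")
plt.show()
```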

Parallel vs. Separate Regression Lines
The model without an interaction between time onset and light intensity is a "parallel regression lines" model; the model with the interaction is a "separate regression lines" model. Both are written out below.
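Written out with the early indicator, a sketch using the same coding as above:

$$
\text{Parallel lines:} \quad \mu\{\text{flowers} \mid \text{light}, \text{early}\} = \beta_0 + \beta_1\,\text{light} + \beta_2\,\text{early}
$$
$$
\text{Separate lines:} \quad \mu\{\text{flowers} \mid \text{light}, \text{early}\} = \beta_0 + \beta_1\,\text{light} + \beta_2\,\text{early} + \beta_3\,(\text{light} \times \text{early})
$$

In the parallel-lines model the two timing groups share the slope $\beta_1$ and differ only in intercept; in the separate-lines model the early group has slope $\beta_1 + \beta_3$.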

Polynomials and Interactions Example
An analyst working for a fast food chain is asked to construct a multiple regression model to identify new locations that are likely to be profitable. For a sample of 25 locations, the analyst has the annual gross revenue of the restaurant (y), the mean annual household income, and the mean age of children in the area. The data are in fastfoodchain.jmp.
The relationship between y and each explanatory variable might be quadratic, because restaurants attract mostly middle-income households and children in the middle age ranges. A sketch of the corresponding model appears below.
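As a sketch (not the course's JMP workflow), the quadratic-plus-interaction model could be fit in Python with statsmodels; the column names Revenue, Income, and Age are assumptions about the layout of fastfoodchain.jmp:

```python
# Sketch: quadratic terms plus an interaction, fit by least squares.
# Assumes hypothetical columns Revenue, Income, Age in fastfoodchain.csv
# (a CSV export of fastfoodchain.jmp).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("fastfoodchain.csv")

# I(...) creates the squared terms; Income:Age is the interaction product.
model = smf.ols(
    "Revenue ~ Income + I(Income**2) + Age + I(Age**2) + Income:Age",
    data=df,
).fit()
print(model.summary())  # t-tests on I(Income**2), I(Age**2), and Income:Age
```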

fastfoodchain.jmp results
There is strong evidence of a quadratic relationship between revenue and age and between revenue and income, and moderate evidence of an interaction between age and income.

Nominal Variables
To incorporate nominal variables in a multiple regression analysis, we use indicator variables.
Indicator variable to distinguish between two groups: the time onset (early vs. late) is a nominal variable. To incorporate it into the multiple regression analysis, we used the indicator variable early, which equals 1 if the timing is early and 0 if it is late.

Nominal Variables with More than Two Categories
To incorporate nominal variables with more than two categories, we use multiple indicator variables. If there are k categories, we need k − 1 indicator variables; the omitted category serves as the baseline.

Nominal Explanatory Variables Example: Auction Car Prices
A car dealer wants to predict the auction price of a car.
–The dealer believes that odometer reading and car color are variables that affect a car's price (data from a sample of cars in auctionprice.JMP).
–Three color categories are considered: white, silver, and other colors.
Note: color is a nominal variable.

Indicator Variables in Auction Car Prices
$I_1$ = 1 if the color is white, 0 if the color is not white.
$I_2$ = 1 if the color is silver, 0 if the color is not silver.
The category "Other colors" is defined by $I_1 = 0$ and $I_2 = 0$.

Auction Car Price Model
The proposed model is $\text{Price} = \beta_0 + \beta_1\,\text{Odometer} + \beta_2 I_1 + \beta_3 I_2 + \varepsilon$.
(The accompanying plot shows price against odometer reading, coded for white, silver, and other-color cars.)

Example: Auction Car Price – The Regression Equation
From JMP we get the regression equation
$\text{PRICE} = \hat{\beta}_0 - 0.0555\,\text{Odometer} + 90.48\,I_1 + \hat{\beta}_3\,I_2$
Substituting $(I_1, I_2) = (0, 0)$, $(1, 0)$, and $(0, 1)$ gives the equations for an "other color" car, a white car, and a silver car, respectively: three lines with the same slope in odometer reading that differ only in their intercepts.

Example: Auction Car Price – The Regression Equation
Interpreting the fitted coefficients:
–A white car sells, on average, for $90.48 more than a car in the "Other colors" category, holding odometer reading fixed.
–A silver car sells, on average, for more than a car in the "Other colors" category, holding odometer reading fixed.
–For each additional mile on the odometer, the auction price decreases by 5.55 cents, holding color fixed.
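For comparison, a minimal Python sketch of the same dummy-variable regression; the file name and the Price, Odometer, and Color column names are assumptions about the layout of auctionprice.JMP:

```python
# Sketch: regression with indicator (dummy) variables for color.
# Assumes hypothetical columns Price, Odometer, Color in auctionprice.csv,
# where Color takes the values "white", "silver", and "other".
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("auctionprice.csv")

# Treatment("other") makes "other" the baseline category, so the two fitted
# dummy coefficients compare white and silver cars to it -- the same coding
# as I1 and I2 on the slides.
model = smf.ols(
    'Price ~ Odometer + C(Color, Treatment("other"))',
    data=df,
).fit()
print(model.params)
```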

Example: Auction Car Price – The Regression Equation
–There is insufficient evidence to infer that a white car and a car of "other color" sell for different auction prices.
–There is sufficient evidence to infer that a silver car sells for a higher price than a car in the "Other colors" category.
(Data: Xm18-02b)

Shorthand Notation for Nominal Variables
Shorthand notation for regression models with nominal variables: use all capital letters for nominal variables.
–Parallel regression lines model: $\mu\{\text{flowers} \mid \text{light}, \text{TIME}\} = \text{light} + \text{TIME}$
–Separate regression lines model: $\mu\{\text{flowers} \mid \text{light}, \text{TIME}\} = \text{light} + \text{TIME} + \text{light} \times \text{TIME}$

Nominal Variables in JMP
It is not necessary to create the indicator variables yourself to represent a nominal variable.
–Make sure that the nominal variable's modeling type is in fact nominal.
–Include the nominal variable in the Construct Model Effects box in Fit Model; JMP will create the indicator variables.
–The brackets in the output indicate the category of the nominal variable for which the indicator variable is 1.
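The same automatic coding can be mimicked in Python; this is a sketch with a hypothetical Color column:

```python
# Sketch: building indicator variables automatically, as JMP's Fit Model does.
# Assumes a hypothetical Color column taking the values "white", "silver", "other".
import pandas as pd

df = pd.DataFrame({"Color": ["white", "silver", "other", "white"]})

# One 0/1 column per category; passing drop_first=True would drop the
# alphabetically first category (here "other"), leaving exactly the k - 1
# indicators I1 and I2 with "other" as the baseline.
indicators = pd.get_dummies(df["Color"], prefix="I", dtype=int)
print(indicators)
```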