Statistics for Business and Economics Chapter 11 Multiple Regression and Model Building

Learning Objectives As a result of this class, you will be able to: Explain the Linear Multiple Regression Model; Describe Inference About Individual Parameters; Test Overall Significance; Explain Estimation and Prediction; Describe Various Types of Models; Describe Model Building; Explain Residual Analysis; Describe Regression Pitfalls.

Types of Regression Models With 1 explanatory variable: Simple regression. With 2+ explanatory variables: Multiple regression. Each can be Linear or Non-linear. This taxonomy is based on the number of explanatory variables & nature of relationship between X & Y.

Multiple Regression Model General form: E(y) = β0 + β1x1 + β2x2 + … + βkxk, with k independent variables. The x1, x2, …, xk may themselves be functions of other variables, e.g. x2 = (x1)².

Regression Modeling Steps Hypothesize deterministic component Estimate unknown model parameters Specify probability distribution of random error term Estimate standard deviation of error Evaluate model Use model for prediction and estimation

First–Order Multiple Regression Model Relationship between 1 dependent and 2 or more independent variables is a linear function: y = β0 + β1x1 + … + βkxk + ε, where β0 is the population Y-intercept, β1, …, βk are the population slopes, ε is the random error, y is the dependent (response) variable, and x1, …, xk are the independent (explanatory) variables.

First-Order Model With 2 Independent Variables Relationship between 1 dependent and 2 independent variables is a linear function. Model: E(y) = β0 + β1x1 + β2x2. Assumes no interaction between x1 and x2: the effect of x1 on E(y) is the same regardless of x2 values.

Population Multiple Regression Model [Figure: bivariate model. The response plane E(y) = β0 + β1x1 + β2x2 in (x1, x2, y) space; an observed y at (x1i, x2i) lies off the plane by the random error εi.]

Sample Multiple Regression Model [Figure: bivariate model. The fitted response plane ŷ = β̂0 + β̂1x1 + β̂2x2; an observed y at (x1i, x2i) lies off the fitted plane by the residual ε̂i.]

Regression Modeling Steps Hypothesize Deterministic Component Estimate Unknown Model Parameters Specify Probability Distribution of Random Error Term Estimate Standard Deviation of Error Evaluate Model Use Model for Prediction & Estimation

Multiple Linear Regression Equations Too complicated by hand! Ouch! (Let software do it.)

1st Order Model Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) and newspaper circulation (000) on the number of ad responses (00). Estimate the unknown parameters. You've collected the following data:

Resp (y)  Size (x1)  Circ (x2)
   1          1          2
   4          8          8
   1          3          1
   3          5          7
   2          6          4
   4         10          6

See ResponsesVsAdsizeAndCirculationData.jmp. Is this model specified correctly? What other variables could be used (color, photos, etc.)?
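The slides fit this model in JMP; as a sketch of what the software does, the least squares estimates can be reproduced by solving the normal equations (X′X)b = X′y directly. This is only an illustration with the six rows above, not the JMP procedure itself.

```python
# Fitting y = b0 + b1*x1 + b2*x2 by solving the 3x3 normal equations
# (X'X)b = X'y in plain Python, using the six ad-response rows above.
y  = [1, 4, 1, 3, 2, 4]        # responses (00)
x1 = [1, 8, 3, 5, 6, 10]       # ad size (sq. in.)
x2 = [2, 8, 1, 7, 4, 6]        # circulation (000)

X = [[1.0, a, b] for a, b in zip(x1, x2)]   # design matrix with intercept

# Build X'X and X'y
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]

def solve(A, v):
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    A = [row[:] + [vi] for row, vi in zip(A, v)]
    n = len(A)
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]

b0, b1, b2 = solve(XtX, Xty)
print(round(b0, 4), round(b1, 4), round(b2, 4))   # 0.064 0.2049 0.2805
```

The slopes match the .2049 and .2805 interpreted on the next slide.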

0 ^ 1 ^ 2 ^

Interpretation of Coefficients Solution Slope (β̂1): the number of responses to the ad is expected to increase by .2049 (i.e., 20.49 responses) for each 1 sq. in. increase in ad size, holding circulation constant. The Y-intercept is difficult to interpret: how can you have any responses with no circulation? Slope (β̂2): the number of responses is expected to increase by .2805 (28.05 responses) for each 1 unit (1,000) increase in circulation, holding ad size constant.

Regression Modeling Steps Hypothesize Deterministic Component Estimate Unknown Model Parameters Specify Probability Distribution of Random Error Term Estimate Standard Deviation of Error Evaluate Model Use Model for Prediction & Estimation

Estimation of σ² For a model with k predictors (k + 1 parameters): s² = SSE / (n − (k + 1)).

More About JMP Output: s (also called the "standard error of the regression"); s² = SSE / (n − (k + 1)) (also called the "mean squared error"); and SSE, the sum of squared errors.
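As a quick numerical check of the quantities on the JMP output, s² and s follow from SSE = .2503 (from the ANOVA table later in these slides), n = 6 observations, and k = 2 predictors:

```python
# s^2 = SSE / (n - (k + 1)): mean squared error from the ad-response fit
import math

SSE, n, k = 0.2503, 6, 2
s2 = SSE / (n - (k + 1))   # mean squared error
s = math.sqrt(s2)          # standard error of the regression
print(round(s2, 4))        # 0.0834
```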

Regression Modeling Steps Hypothesize Deterministic Component Estimate Unknown Model Parameters Specify Probability Distribution of Random Error Term Estimate Standard Deviation of Error Evaluate Model Use Model for Prediction & Estimation

Evaluating Multiple Regression Model Steps Examine variation measures Test parameter significance Individual coefficients Overall model Do residual analysis

Inference for an Individual β Parameter Confidence Interval (rarely used in regression). Hypothesis Test (used all the time!): H0: βi = 0 vs. Ha: βi ≠ 0 (or < or >). Test Statistic (how far is the sample slope from zero?): t = β̂i / s(β̂i), with df = n − (k + 1).

Easy way: Just examine p-values Both coefficients significant! Reject H0 for both tests

Testing Overall Significance Shows if there is a linear relationship between all x variables together and y. Hypotheses: H0: β1 = β2 = ... = βk = 0 (no linear relationship); Ha: at least one coefficient is not 0 (at least one x variable affects y). Less chance of error than separate t-tests on each coefficient: doing a series of t-tests leads to a higher overall Type I error than α.
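The Type I error inflation mentioned above is easy to quantify: if k independent t-tests are each run at level α, the chance of at least one false rejection is 1 − (1 − α)^k.

```python
# Family-wise Type I error for k separate tests at level alpha,
# assuming the tests are independent (an idealization)
alpha, k = 0.05, 5
overall = 1 - (1 - alpha) ** k
print(round(overall, 3))   # 0.226: far above the nominal .05
```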

Testing Overall Significance Test Statistic: F = MS(Model) / MS(Error). Degrees of Freedom: ν1 = k, ν2 = n − (k + 1), where k = number of independent variables and n = sample size.

Testing Overall Significance Computer Output

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Value   Prob>F
Model       2       9.2497         4.6249      55.440    0.0043
Error       3       0.2503         0.0834
C Total     5       9.5000

Here DF(Model) = k and DF(Error) = n − (k + 1); the Mean Squares are MS(Model) and MS(Error), and Prob>F is the P-value.
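The F statistic in the table above can be reproduced from the sums of squares:

```python
# F = MS(Model) / MS(Error) from the ANOVA table for the ad-response fit
SS_model, SS_error = 9.2497, 0.2503
k, n = 2, 6
MS_model = SS_model / k                # 4.6249
MS_error = SS_error / (n - (k + 1))    # 0.0834
F = MS_model / MS_error
print(round(F, 2))                     # about 55.43 (table shows 55.440)
```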


Types of Regression Models With quantitative explanatory variables: 1st order, 2nd order, 3rd order, and interaction models (2 or more variables). With 1 qualitative variable: dummy variable models. This taxonomy is based on the number of explanatory variables & nature of relationship between X & Y.

Interaction Model With 2 Independent Variables Hypothesizes interaction between pairs of x variables: the response to one x variable varies at different levels of another x variable. Contains two-way cross product terms. Can be combined with other models, e.g. the dummy-variable model.

Interaction Model Relationships E(y) = 1 + 2x1 + 3x2 + 4x1x2. At x2 = 1: E(y) = 1 + 2x1 + 3(1) + 4x1(1) = 4 + 6x1. At x2 = 0: E(y) = 1 + 2x1 + 3(0) + 4x1(0) = 1 + 2x1. The effect (slope) of x1 on E(y) depends on the x2 value.
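The slide's point, that the slope on x1 changes with x2, can be checked numerically using the same illustrative coefficients:

```python
# Interaction example from the slide: E(y) = 1 + 2*x1 + 3*x2 + 4*x1*x2.
# The slope on x1 is (2 + 4*x2), so it depends on x2.
def E_y(x1, x2):
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

def slope_at(x2):
    # rise in E(y) per unit increase in x1, holding x2 fixed
    return E_y(1, x2) - E_y(0, x2)

print(slope_at(0), slope_at(1))   # 2 6
```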

Interaction Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Conduct a test for interaction. Use α = .05. Is this model specified correctly? What other variables could be used (color, photos, etc.)?

Adding Interactions in JMP is Easy Analyze >> Fit Model Click on the response variable and click the Y button Highlight the two X variables and click on the Add button While the two X variables are highlighted, click on the Cross button Run Model You can also combine steps 3 and 4 into one step: Highlight the two X variables and, from the "Macros" pull-down menu, choose "Factorial to Degree." The default for degree is 2, so you will get all two-factor interactions in the model.

JMP Interaction Output Interaction not important: p-value > .05

Types of Regression Models With quantitative explanatory variables: 1st order, 2nd order, 3rd order, and interaction models (2 or more variables). With 1 qualitative variable: dummy variable models. This taxonomy is based on the number of explanatory variables & nature of relationship between X & Y.

Second-Order Model With 1 Independent Variable Relationship between 1 dependent and 1 independent variable is a quadratic function. Useful first model if a non-linear relationship is suspected. Model: E(y) = β0 + β1x + β2x², where β1x is the linear effect and β2x² the curvilinear effect. Note the potential problem with multicollinearity between x and x²; this is solved somewhat by centering x on its mean.
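The centering remedy mentioned above can be seen with a small sketch (made-up data, x = 1..10): x and x² are nearly collinear, while the centered versions are not.

```python
# Correlation between a predictor and its square, before and after centering
def corr(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

x = list(range(1, 11))             # made-up predictor values
xbar = sum(x) / len(x)
xc = [a - xbar for a in x]         # centered predictor

print(round(corr(x, [a * a for a in x]), 2))    # 0.97: x and x^2 nearly collinear
print(round(corr(xc, [a * a for a in xc]), 2))  # 0.0: centering removes it
```

For these symmetric centered values the correlation is exactly zero; with real data it merely drops sharply.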

Second-Order Model Relationships 2 > 0 2 > 0 y y x1 x1 2 < 0 2 < 0 y y x1 x1 49

Types of Regression Models Linear (first order): Ŷi = β̂0 + β̂1Xi. Quadratic (second order): Ŷi = β̂0 + β̂1Xi + β̂2Xi². Cubic (third order): Ŷi = β̂0 + β̂1Xi + β̂2Xi² + β̂3Xi³. This taxonomy is based on the number of explanatory variables & nature of relationship between X & Y.

2nd Order Model Example The data shows the number of weeks employed and the number of errors made per day for a sample of assembly line workers. Find a 2nd order model, conduct the global F–test, and test if β2 ≠ 0. Use α = .05 for all tests.

Analyze >> Fit Y by X From the hot spot menu choose: Fit Polynomial >> 2, quadratic You could also use: Analyze >> Fit Model, select Y, then highlight X and, from the "Macros" pull-down menu, choose "Polynomial to Degree." The default for degree is 2, so you will get the quadratic (2nd order) polynomial. But from Fit Model, you won't get the cool fitted line plot.

Types of Regression Models With quantitative explanatory variables: 1st order, 2nd order, 3rd order, and interaction models (2 or more variables). With 1 qualitative variable: dummy variable models. This taxonomy is based on the number of explanatory variables & nature of relationship between X & Y.

Second-Order (Response Surface) Model With 2 Independent Variables Relationship between 1 dependent and 2 independent variables is a quadratic function. Useful first model if a non-linear relationship is suspected. Model: E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2².

Second-Order Model Relationships [Figure: response surfaces in (x1, x2, y) space. β4 + β5 > 0: surface opens upward; β4 + β5 < 0: surface opens downward; β3² > 4β4β5: saddle-shaped surface.]

From JMP: To specify the model, all you need to do is: Analyze >> Fit Model Highlight the X variables From the "Macros" pull-down menu, choose "Response Surface." The default for degree is 2, so you will get the full second-order model having all squared terms and all cross products.

Types of Regression Models With quantitative explanatory variables: 1st order, 2nd order, 3rd order, and interaction models (2 or more variables). With 1 qualitative variable: dummy variable models. This taxonomy is based on the number of explanatory variables & nature of relationship between X & Y.

Qualitative-Variable Model Involves a categorical x variable with 2 levels, e.g. male-female, college-no college. Variable levels are coded 0 and 1. The number of dummy variables is 1 less than the number of levels of the variable. May be combined with quantitative variables (1st order or 2nd order model).
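The coding rule above (c levels need c − 1 zero-one columns, with one level as baseline) can be sketched in a few lines. The helper name `dummies` is just for illustration; JMP builds these columns internally, as a later slide notes.

```python
# Build 0/1 dummy columns for a categorical variable: one column per
# non-baseline level, so c levels yield c - 1 columns
def dummies(values, baseline):
    levels = [lv for lv in sorted(set(values)) if lv != baseline]
    return {lv: [1 if v == lv else 0 for v in values] for lv in levels}

gender = ["male", "female", "female", "male"]
cols = dummies(gender, baseline="female")
print(cols)   # {'male': [1, 0, 0, 1]}: one column for two levels
```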


Qualitative Predictors in JMP Analyze >> Fit Model Specify a qualitative variable JMP will automatically create the needed zero-one variables for you and run the regression! (All transparent: it does not save the zero-one variables in your data table.) You need to do one thing (only once): Go to JMP >> Preferences. Click the "Platforms" icon on the left panel, then click on "Fit Least Squares." Check the box on the right marked "Indicator Parameterization Estimates." If you don't do this your regression will still be correct, but JMP will use a different form of zero-one (dummy) variables.

First 32 (out of 100) rows of the salary data. Now do Analyze >> Fit Model

Residual Analysis

Residual Analysis Graphical analysis of residuals: plot the estimated errors versus the xi values (the estimated errors, or residuals, are the differences between the actual yi and predicted ŷi), and plot a histogram or stem-&-leaf of the residuals. Purposes: examine the functional form (linear vs. non-linear model) and evaluate violations of assumptions.
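As a concrete example of computing residuals, here is the ad-response data with the fitted coefficients from earlier in the slides (the intercept .0640 is not quoted on the slides; it is implied by the data, and all three values are rounded, so treat them as approximate). A property worth knowing: least squares residuals from a model with an intercept sum to zero.

```python
# Residuals e_i = y_i - yhat_i for the ad-response fit, using rounded
# coefficients; their sum should be ~0 (exactly 0 with exact coefficients)
y  = [1, 4, 1, 3, 2, 4]
x1 = [1, 8, 3, 5, 6, 10]
x2 = [2, 8, 1, 7, 4, 6]
b0, b1, b2 = 0.0640, 0.2049, 0.2805   # rounded estimates

resid = [yi - (b0 + b1 * a + b2 * b) for yi, a, b in zip(y, x1, x2)]
print(round(sum(resid), 3))   # 0.0 up to coefficient rounding
```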

Residual Plot for Functional Form [Figure: two plots of residuals ê vs. x. A curved pattern means the specification is wrong (add an x² term); a random scatter means correct specification.]

Residual Plot for Equal Variance [Figure: two plots of residuals ê vs. x. A fan shape indicates unequal variance; a uniform band indicates correct specification. Standardized residuals are typically used.]

Residual Plot for Independence [Figure: two plots of residuals ê vs. x, reflecting the sequence in which the data were collected. A systematic pattern indicates the errors are not independent; a random scatter indicates correct specification.]

Residual Analysis Using JMP 1. Fit the full model and examine the "residual plot" of residuals (Y) vs. predicted values (Ŷ). This plot automatically appears. Look for outliers, curvature, or non-constant variance. Hope for a random, shotgun scatter: everything is OK then. 2. Save the residuals (red tab >> Save Columns >> residuals). 3. Analyze >> Distribution (of saved residuals) and obtain a normal quantile (probability) plot using the red tabs. 4. Use the red tab to obtain a goodness of fit test for normality of the residuals. 5. If the data were collected sequentially over time, obtain a plot of residuals vs. row number to see if there are any patterns related to time. This is one check of the "independence" of residuals assumption.

Residual Analysis via JMP Step 1: From the regression of SalePrice on three predictors; so far so good.

Residual Analysis via JMP Step 2: Save residuals for more analysis. Steps 3 and 4: Normality OK (sort of, approximately anyway). Step 5: Only needed if data are in time order. Graph >> Overlay Plot (specify residuals for Y, no X needed! See next page.)

Residual Analysis via JMP Step 2: Save residuals for more analysis. Steps 3 and 4: Normality OK (sort of, approximately anyway). Step 5: Only needed if data are in time order. Graph >> Overlay Plot (no pattern apparent).

Selecting Variables in Model Building

Model Building with Computer Searches Rule: Use as few x variables as possible Stepwise Regression Computer selects x variable most highly correlated with y Continues to add or remove variables depending on SSE Best subset approach Computer examines all possible sets

Subset Selection Simple models tend to work best 1. Give best predictions 2. Simplest explanations of underlying phenomena 3. Avoids multicollinearity (redundant X variables)

Manual Stepwise Regression: 1. Start with the full model. 2. If all p-values < .05, stop. Otherwise, drop the variable that has the largest p-value. 3. Refit the model. Go to step 2.
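The three steps above can be sketched as a loop. This is only an illustration of the control flow: `fit_pvalues` is a hypothetical stand-in for your regression routine (JMP, or any package that refits the model and returns p-values), and the canned p-value table below is made up.

```python
# Backward elimination: repeatedly drop the largest-p variable until
# every remaining p-value is below alpha
def backward_stepwise(fit_pvalues, variables, alpha=0.05):
    vars_ = list(variables)
    while vars_:
        pvals = fit_pvalues(vars_)          # refit on the current variables
        worst = max(vars_, key=lambda v: pvals[v])
        if pvals[worst] < alpha:            # everything significant: done
            break
        vars_.remove(worst)                 # drop the weakest variable
    return vars_

# Toy illustration with canned (made-up) p-values for each candidate model
canned = {
    frozenset({"x1", "x2", "x3"}): {"x1": 0.01, "x2": 0.60, "x3": 0.04},
    frozenset({"x1", "x3"}): {"x1": 0.005, "x3": 0.03},
}
kept = backward_stepwise(lambda vs: canned[frozenset(vs)], ["x1", "x2", "x3"])
print(kept)   # ['x1', 'x3']: x2 dropped, then both survivors significant
```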

Automatic Stepwise Regression: Let the computer do it for you! 1. Stepwise Regression. Backward stepwise automates the manual stepwise procedure 2. Best subsets regression. Computes all possible models and summarizes each.

JMP Stepwise Example: Car Leasing To appropriately price new car leases, car dealers need to accurately predict the value of the cars at the conclusion of the leases. These resale values are generally determined at wholesale auctions. Data collected on 54 1997 new car models are listed on the next two pages. (a) Use backward stepwise regression to find the best predictors of resale value, y. (b) Use forward stepwise regression to find the best predictors of resale value. Does your answer agree with what you had already found in part (a)? (c) Use all possible regressions to find the model that minimizes s (root mean square error). Does this agree with either part (a) or (b)? (d) What would you choose for a final model and why?

Leasing Data Y: Resale value in 2000 X1: 1997 price X2: Price increase in model from 1997-2000 X3: Consumer Reports quality index X4: Consumer Reports reliability index X5: Number of model vehicles sold in 1997 X6: Yes if a minor change was made in the model in 1998, 1999, or 2000; No if not X7: Yes if a major change was made in the model in 1998, 1999, or 2000; No if not

Backward Stepwise: Analyze >> Fit Model and specify the model Change "Personality" to Stepwise Enter all model terms and press "Go" (Change "Prob to Leave" to .1 or .05) Forward Stepwise: Analyze >> Fit Model and specify the model Change "Personality" to Stepwise Press "Go" (Change "Prob to Enter" to .1 or .05)

All Possible Regressions Output: Under Stepwise Fit, use the red hot spot to select "All Possible Models." Request the best (1) model of each model size. The best model minimizes RMSE (that is, s; the same as maximizing adjusted R²). Or you could choose to minimize AICc (corrected Akaike Information Criterion).

Regression Pitfalls Parameter Estimability: the number of different x-values must be at least one more than the order of the model. Multicollinearity: two or more x-variables in the model are correlated. Extrapolation: predicting y-values outside the sampled range. Correlated Errors.

Multicollinearity High correlation between x variables Coefficients measure combined effect Leads to unstable coefficients depending on x variables in model Always exists – matter of degree Example: using both age and height as explanatory variables in same model

Detecting Multicollinearity Significant correlations between pairs of x variables (often stronger than their correlations with y) Non-significant t-tests for most of the individual parameters, even though the overall model test is significant Estimated parameters have the wrong sign Always do a scatterplot matrix of your data before analysis: look for outliers and relationships between x variables (Graph >> Scatterplot Matrix)
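The first symptom above, highly correlated predictor pairs, is easy to screen for numerically as well as graphically. Here is a sketch using made-up age and height values in the spirit of the earlier age/height example:

```python
# Pairwise correlation as a multicollinearity screen (hypothetical data)
def corr(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / (sum((a - mu) ** 2 for a in u) ** 0.5
                  * sum((b - mv) ** 2 for b in v) ** 0.5)

age    = [5, 7, 9, 11, 13]          # made-up values for illustration
height = [105, 118, 130, 141, 152]  # made-up heights (cm), growing with age

r = corr(age, height)
print(round(r, 3))   # 0.999: using both as predictors invites collinearity
```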

Any problems? Outliers? Collinearity? What are the best predictors of Resale (y)?

Solutions to Multicollinearity Eliminate one or more of the correlated x variables Center predictors before computing polynomial terms (squares, cross products); JMP does this automatically! Avoid inference on individual parameters Do not extrapolate

Extrapolation [Figure: y vs. x showing the sampled range of x, with interpolation inside the range and extrapolation beyond either end.] Extrapolation: prediction outside the range of x values used to develop the equation. Interpolation: prediction within the range of x values used to develop the equation (based on the smallest & largest x values).

Conclusion Explained the Linear Multiple Regression Model Described Inference About Individual Parameters Tested Overall Significance Explained Estimation and Prediction Described Various Types of Models Described Model Building Explained Residual Analysis Described Regression Pitfalls