Multiple Linear Regression Model


Multiple Linear Regression Model
Until now we have only established the relationship between 2 variables: 1 independent variable and 1 dependent variable. But we would like to do better: several independent variables together can be a better predictor, so we would like to establish a relationship with several independent variables and 1 dependent variable.

Analysis of Relationship among Several Variables
Linear regression estimates the "line of best fit" ("line" because we are fitting a linear model).

Analysis of Relationship among Several Variables
The model is Y = a + b1*X1 + b2*X2 + ... + bk*Xk + e.
Each bj is a slope coefficient: it measures how much the dependent variable Y changes when the independent variable Xj changes by one unit, holding all other independent variables constant.
a is the intercept (constant) of the model: it is the value of Y if all the X's are 0.
The coefficients are estimated by the process of least squares; in practice, software programs are used to estimate the multiple regression model.
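
For example, here is a minimal sketch of estimating such a model in Python with statsmodels; the data are simulated and all names are assumptions for illustration, not part of the original slides.

```python
# A minimal multiple-regression fit with statsmodels, on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))                           # three independent variables X1..X3
y = 5 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)

X_const = sm.add_constant(X)                          # prepend the intercept column for 'a'
results = sm.OLS(y, X_const).fit()                    # ordinary least squares
print(results.params)                                 # estimated a, b1, b2, b3
print(results.summary())                              # full output, like the table below
```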

Inference about Parameters

Parameter   Estimate    Std. Error   t Value   Pr > |t|
CONST1      -51.8861    27.1756      -1.91     0.0594
X1            0.02065    0.04028      0.51     0.6094
X2            0.47620    0.22749      2.09     0.0391
X3            0.07123    0.14718      0.48     0.6296
X4           -2.02110    1.10141     -1.84     0.0698
X5            0.00447    0.03138      0.14     0.8870
X6            3.79589    2.39372      1.59     0.1163
X7            0.26862    0.11720      2.29     0.0242
X8           -1.91116    1.06776     -1.79     0.0768
X9            3.26388    1.14130      2.86     0.0053
X10           3.64432    1.27539      2.86     0.0053

Assumptions of the Multiple Regression Model
1. The relationship between the dependent variable Y and the independent variables X1, X2, ..., Xk is linear.
2. The independent variables (X1, X2, ..., Xk) are not random, and no exact linear relation exists between two or more of them.
3. The error term is normally distributed.
4. The expected value (mean) of the error term, conditional on the independent variables, is 0.
5. The variance of the error term is the same for all observations (homoskedasticity).
A quick residual check is sketched below.
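
Not from the original slides, but a hedged sketch of checking assumptions 3 and 4 on the fitted `results` object from the earlier sketch:

```python
# Quick checks of the error-term assumptions on the fitted model above.
from statsmodels.stats.stattools import jarque_bera

resid = results.resid
print("mean of residuals:", resid.mean())            # assumption: mean of the errors is 0
jb_stat, jb_pvalue, skew, kurt = jarque_bera(resid)
print("Jarque-Bera p-value:", jb_pvalue)             # a large p-value is consistent with normal errors
```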

Inference about Model
If the model is correctly specified, R2 is a useful measure of goodness of fit.
However, adding a variable to a regression always increases R2 (by construction).
This fact can be exploited to push R2 toward 100% simply by adding variables, but that does not mean the model is any good.
The adjusted R2 should therefore be reported alongside R2.

Inference about Model
Adjusted R2 is a measure of goodness of fit that penalizes additional explanatory variables. With n observations and k independent variables:
Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)
Unlike R2, adjusted R2 can fall when a variable that adds little explanatory power is included.
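
A small sketch, assuming the fitted `results` object from earlier, showing that this formula agrees with the value statsmodels reports:

```python
# Adjusted R-squared by hand versus the statsmodels attribute.
n_obs = int(results.nobs)                 # number of observations n
k = int(results.df_model)                 # number of slope coefficients k
adj_r2 = 1 - (1 - results.rsquared) * (n_obs - 1) / (n_obs - k - 1)
print(adj_r2, results.rsquared_adj)       # the two values agree
```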

Inference about Parameters
The coefficients (b1, b2, ..., bk) are estimated with a confidence interval around each estimate.
To know whether a specific independent variable Xi is influential in predicting the dependent variable Y, we test whether the corresponding coefficient bi is statistically different from 0 (null hypothesis: bi = 0). We do so by calculating the t-statistic for the coefficient.
If the t-statistic is sufficiently large, bi is significantly different from 0, indicating that the term bi * Xi plays a role in determining Y, as the sketch after the following output table shows.

Inference about Parameters

Parameter   Estimate    Std. Error   t Value   Pr > |t|
CONST1      -51.8861    27.1756      -1.91     0.0594
X1            0.02065    0.04028      0.51     0.6094
X2            0.47620    0.22749      2.09     0.0391
X3            0.07123    0.14718      0.48     0.6296
X4           -2.02110    1.10141     -1.84     0.0698
X5            0.00447    0.03138      0.14     0.8870
X6            3.79589    2.39372      1.59     0.1163
X7            0.26862    0.11720      2.29     0.0242
X8           -1.91116    1.06776     -1.79     0.0768
X9            3.26388    1.14130      2.86     0.0053
X10           3.64432    1.27539      2.86     0.0053
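
As an illustrative sketch (using the fitted `results` object from the earlier simulated model, not the table above):

```python
# t-statistics and p-values for each coefficient of the fitted model.
t_stats = results.params / results.bse    # estimate divided by its standard error
print(t_stats)
print(results.tvalues)                    # statsmodels' own t-statistics (identical values)
print(results.pvalues)                    # two-sided p-values, as in the Pr > |t| column
# Rule of thumb: |t| > 2 (roughly p < 0.05) suggests the coefficient differs from 0.
```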

Predicting the Dependent Variable
To predict the value of a dependent variable using a multiple linear regression model, we follow these three steps:
1. Obtain estimates of the regression parameters.
2. Determine the assumed values of the independent variables.
3. Compute the predicted value of Y by substituting those values into the estimated equation.

Predictions
1. Estimate a model.
2. Assess the model's predictive ability.
3. Assess the significance of each independent variable.
If these are satisfactory, we use the existing model; otherwise we re-estimate using different independent variables.

Predictions
We then set the independent variables to the levels we would like predictions for and derive the corresponding Y.
This is the predicted value of Y as a function of the chosen values of the independent variables; a prediction sketch follows below.
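
A minimal prediction sketch, assuming the fitted `results` object from earlier; the chosen X values are arbitrary:

```python
# Predicting Y at chosen values of the independent variables (values are illustrative).
import numpy as np

new_x = np.array([[1.0, 0.5, -1.0, 2.0]])  # 1 for the intercept, then X1, X2, X3
print(results.predict(new_x))              # predicted Y at those levels
```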

Using Dummy Variables
Sometimes an independent variable isn't numeric. Examples:
- the relationship between day of the week and alcohol consumption
- the relationship between college major and income
More relevant examples from finance:
- industry effects on returns (do technology stocks earn higher returns on the same investments?)
- do emerging markets provide greater returns for the same risk?

Using Dummy Variables
A dummy variable is a qualitative variable: it takes the value 1 if a particular condition is true and 0 if that condition is false.
In the weekday/alcohol-consumption example, each day of the week gets its own dummy variable, equal to 1 on observations from that day and 0 otherwise; a sketch of building such dummies follows.
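
A sketch of building weekday dummies with pandas; the tiny dataset and column names are invented for illustration:

```python
# Day-of-week dummies with pandas (data made up for illustration).
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "day": ["Mon", "Tue", "Fri", "Sat", "Sun", "Mon", "Fri"],
    "drinks": [1, 0, 4, 5, 3, 2, 3],
})
dummies = pd.get_dummies(df["day"], prefix="day", drop_first=True).astype(float)
X = sm.add_constant(dummies)               # drop_first avoids perfect collinearity with the intercept
res = sm.OLS(df["drinks"], X).fit()
print(res.params)                          # each coefficient: shift relative to the dropped day
```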

Using Dummy Variables

Month-of-the-Year Effects on Small Stock Returns
Suppose we want to test whether total returns to one small-stock index, the Russell 2000 Index, differ by month. We can use dummy variables to estimate a regression of monthly returns on month-of-the-year dummies, for example an intercept (for the base month) plus eleven monthly dummies:
Return_t = a + b2*Feb_t + b3*Mar_t + ... + b12*Dec_t + e_t
A sketch of this regression follows.
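
A minimal sketch with simulated returns standing in for the Russell 2000 series; with real data, the random draw would be replaced by the index's monthly returns:

```python
# Month-of-the-year dummy regression on simulated monthly returns.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
months = np.tile(np.arange(1, 13), 20)                     # 20 years of month labels 1..12
returns = rng.normal(0.01, 0.05, size=months.size)         # stand-in for index returns

dummies = pd.get_dummies(pd.Series(months), prefix="m", drop_first=True).astype(float)
X = sm.add_constant(dummies)                               # January is the base month
res = sm.OLS(returns, X).fit()
print(res.summary())      # each m_j coefficient: average return of month j minus January's
```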

Month-of-the-Year Effects on Small Stock Returns

Violations of Regression Assumptions
Inference based on an estimated regression model rests on the assumptions above being met. When they are violated, the inferences drawn from the model may be invalid.

Heteroskedasticity
Heteroskedasticity occurs when the variance of the errors differs across observations.
It does not affect the consistency of the coefficient estimates.
However, t-tests for the significance of individual regression coefficients are unreliable, because heteroskedasticity introduces bias into estimators of the standard errors of the regression coefficients. A sketch of detecting and correcting for it follows.
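
A sketch, assuming the fitted `results` object from earlier, of one common test (Breusch-Pagan) and one common correction (White-style robust standard errors):

```python
# Testing for heteroskedasticity and using robust standard errors.
from statsmodels.stats.diagnostic import het_breuschpagan

lm, lm_pvalue, fval, f_pvalue = het_breuschpagan(results.resid, results.model.exog)
print("Breusch-Pagan p-value:", lm_pvalue)               # a small p-value flags heteroskedasticity

robust = results.get_robustcov_results(cov_type="HC1")   # heteroskedasticity-consistent errors
print(robust.bse)                                        # corrected standard errors
```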

Regressions with Homoskedasticity

Regressions with Heteroskedasticity

Serial Correlation
When regression errors are correlated across observations, we say that they are serially correlated (or autocorrelated).
Serial correlation most typically arises in time-series regressions.
The principal problem it causes in a linear regression is that the standard errors, and hence the critical values used in significance tests, are calculated incorrectly. A sketch of detecting and correcting for it follows.
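
A sketch, again on the assumed `results` object, using the Durbin-Watson statistic to detect first-order serial correlation and Newey-West (HAC) standard errors to correct for it; the lag length is an assumption:

```python
# Checking for serial correlation and using Newey-West (HAC) standard errors.
from statsmodels.stats.stattools import durbin_watson

print("Durbin-Watson:", durbin_watson(results.resid))           # near 2 suggests no AR(1) errors

hac = results.get_robustcov_results(cov_type="HAC", maxlags=4)  # 4 lags is an assumption
print(hac.bse)                                                  # autocorrelation-robust errors
```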

Multicollinearity
Multicollinearity occurs when two or more independent variables (or combinations of independent variables) are highly (but not perfectly) correlated with each other.
It does not affect the consistency of the regression coefficients, but the estimates become extremely imprecise and unreliable.
The classic symptom of multicollinearity is a high R2 even though the t-statistics on the estimated slope coefficients are not significant.
The most direct solution is excluding one or more of the correlated regression variables. A sketch of diagnosing it with variance inflation factors follows.
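
A sketch of the variance inflation factor diagnostic on the assumed `results` object:

```python
# Variance inflation factors, a standard multicollinearity diagnostic.
from statsmodels.stats.outliers_influence import variance_inflation_factor

exog = results.model.exog                  # design matrix of the fitted model
for i in range(1, exog.shape[1]):          # column 0 is the constant, so skip it
    print(f"X{i}: VIF = {variance_inflation_factor(exog, i):.2f}")  # VIF > 10 is a common red flag
```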

Problems in Regression & Solutions

Problem              Effect                           Solution
Heteroskedasticity   Incorrect standard errors        Correct for conditional heteroskedasticity
Serial correlation   Incorrect t-values               Correct for serial correlation
Multicollinearity    High R2 but low t-statistics     Remove 1 or more independent variables

Model Specification
Model specification refers to the set of variables included in the regression and the regression equation's functional form. Possible misspecifications include:
- One or more important variables are omitted from the regression.
- One or more of the regression variables need to be transformed (for example, by taking the natural logarithm of the variable) before estimating the regression.
- The regression model pools data from different samples that should not be pooled.

Discrete Dependent Variable Models
Discrete dependent variable models are used when the dummy variable is the dependent variable rather than an independent variable. There are mainly 2 models:
- The probit model, based on the normal distribution, estimates the probability that Y = 1 (a condition is fulfilled) given the values of the independent variables X.
- The logit model is identical, except that it is based on the logistic distribution rather than the normal distribution.
A sketch of both fits follows.
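
A sketch on simulated data; the coefficients, seed, and sample size are assumptions for illustration:

```python
# Fitting logit and probit models to a simulated binary outcome.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(200, 2)))
true_prob = 1 / (1 + np.exp(-(X @ np.array([-0.5, 1.0, -1.0]))))  # logistic link
y = rng.binomial(1, true_prob)                                     # binary dependent variable

logit_res = sm.Logit(y, X).fit(disp=False)     # logistic distribution
probit_res = sm.Probit(y, X).fit(disp=False)   # standard normal distribution
print(logit_res.params)
print(probit_res.params)
```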

Discrete Dependent Variable Models
Logistic regression