Part 24: Multiple Regression – Part 4 (24-1/45). Statistics and Data Analysis. Professor William Greene, Stern School of Business, IOMS Department, Department of Economics.


Hypothesis Tests in Multiple Regression
- Simple regression: test β = 0
- Tests about individual coefficients in a multiple regression
- R² as the fit measure in a multiple regression
  - Testing R² = 0
  - Tests about sets of coefficients
  - Testing whether two groups have the same model

Regression Analysis
- Question: is the coefficient in a regression model really nonzero?
- Testing procedure:
  - Model: y = α + βx + ε
  - Hypothesis: H0: β = 0
  - Rejection region: the least squares coefficient is far from zero
- Test:
  - α level for the test = 0.05 as usual
  - Compute t = b / StandardError
  - Reject H0 if |t| is above the critical value: 1.96 for a large sample, or the value from the t table for a small sample
  - Equivalently, reject H0 if the reported P value is less than the α level
- Degrees of freedom for the t statistic: N − 2
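The testing procedure above can be sketched numerically. This is a minimal illustration on simulated data (the values are made up, not from the slides' examples):

```python
import numpy as np
from scipy import stats

# Testing H0: beta = 0 in the model y = alpha + beta*x + eps.
# Simulated data for illustration only -- not the slides' data.
rng = np.random.default_rng(0)
N = 50
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(size=N)

# Least squares fit of [alpha, beta]
X = np.column_stack([np.ones(N), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
s2 = e @ e / (N - 2)                          # residual variance, N-2 df
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

t_stat = b[1] / se[1]                         # t = b / StandardError
p_value = 2 * stats.t.sf(abs(t_stat), df=N - 2)
reject = p_value < 0.05                       # alpha = 0.05 as usual
```

A regression package (Minitab in the slides) reports the same t statistic and P value directly; this sketch just shows where they come from.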

Application: Monet Paintings
- Does the size of the painting really explain the sale prices of Monet's paintings?
- Investigate: compute the regression.
- Hypothesis: the slope is actually zero.
- Rejection region: slope estimates that are very far from zero.
- Result: the hypothesis that β = 0 is rejected.

An Equivalent Test
- Is there a relationship?
- H0: no correlation.
- Rejection region: large R².
- Test statistic: F = R² / [(1 − R²)/(N − 2)]
- Reject H0 if F > 4 (rough large-sample critical value at α = 0.05).
- Math result: F = t². Degrees of freedom for the F statistic are 1 and N − 2.
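The F = t² identity can be verified directly. A short sketch on simulated data (not the Monet data):

```python
import numpy as np

# In simple regression, the overall F statistic for "no relationship"
# equals the square of the t statistic on the slope:
# F = R² / [(1 - R²)/(N - 2)], with 1 and N-2 degrees of freedom.
rng = np.random.default_rng(1)
N = 40
x = rng.normal(size=N)
y = 3.0 - 1.5 * x + rng.normal(size=N)

X = np.column_stack([np.ones(N), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
s2 = e @ e / (N - 2)
se_b1 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
t = b[1] / se_b1                              # t statistic on the slope

R2 = 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)
F = R2 / ((1.0 - R2) / (N - 2))               # F statistic from R²
```

The two statistics agree up to floating-point rounding, so the t test and the F test reach the same conclusion in simple regression.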

Partial Effects in a Multiple Regression
- Hypothesis: if we include the signature effect, size does not explain the sale prices of Monet paintings.
- Test: compute the multiple regression; then test H0: β1 = 0.
- α level for the test = 0.05 as usual.
- Rejection region: large value of b1 (the coefficient).
- Test based on t = b1 / StandardError.

Regression Analysis: ln (US$) versus ln (SurfaceArea), Signed (Minitab output; the coefficient estimates, standard errors, T and P values did not survive transcription)
R-Sq = 46.2%   R-Sq(adj) = 46.0%

Conclusion: reject H0. Degrees of freedom for the t statistic: N − 3 = N − (number of predictors) − 1.

Use individual "T" statistics. T > +2 or T < −2 suggests the variable is "significant." The T for LogPCMacs (value lost in transcription) is large.

Women appear to assess health satisfaction differently from men.

Or do they? Not when other things are held constant.


Confidence Interval for a Regression Coefficient
- Coefficient on OwnRent (the estimate and standard error did not survive transcription).
- Confidence interval: estimate ± 1.96 × standard error (large sample).
- Exercise: form a confidence interval for the coefficient on SelfEmpl. (Left for the reader.)
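A minimal sketch of the large-sample interval. The OwnRent numbers from the slide were lost in transcription, so the estimate and standard error below are hypothetical placeholders:

```python
# 95% large-sample confidence interval: estimate +/- 1.96 * standard error.
b = 0.030    # hypothetical coefficient on OwnRent (not the slide's value)
se = 0.008   # hypothetical standard error (not the slide's value)

lower = b - 1.96 * se
upper = b + 1.96 * se
# The interval's width is always 2 * 1.96 * se, regardless of the estimate.
```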

Model Fit
- How well does the model fit the data?
- R² measures fit: the larger, the better.
  - Time series: expect .9 or better.
  - Cross sections: it depends.
    - Social science data: .1 is good.
    - Industry or market data: .5 is routine.
- Use R² to compare models and find the right model.

Dear Prof William,
I hope you are doing great. I have got one of your presentations on Statistics and Data Analysis, particularly on regression modeling. There you said that an R-squared value could come in around .2 and not be bad for large-scale survey data. Currently, I am working on a large-scale survey data set (1975 samples) and the R-squared value came out as .30, which is low. So, I need to justify this. I thought to consider your presentation in this case. However, do you have any reference book which I can refer to while justifying the low R-squared value of my findings? The purpose is a scientific article.

Pretty Good Fit: R² = .722. Regression of Fuel Bill on Number of Rooms.

A Huge Theorem
- R² always goes up when you add variables to your model.
- Always.

The Adjusted R-Squared
- Adjusted R² penalizes your model for obtaining its fit with lots of variables:
  Adjusted R² = 1 − [(N − 1)/(N − K − 1)] × (1 − R²)
- Adjusted R² is conventionally denoted R̄² ("R-bar squared").
- Adjusted R² is not the mean of anything, and it is not a square. This is just a name.
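The penalty formula above is easy to compute directly. In this sketch the sample size and number of predictors are hypothetical (the Movie Madness N and K were not preserved in the transcript); only the formula itself comes from the slide:

```python
# Adjusted R² = 1 - [(N-1)/(N-K-1)] * (1 - R²),
# where N = sample size and K = number of predictors.
def adjusted_r2(r2, n, k):
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)

# With a large N, R² and adjusted R² barely differ
# (hypothetical N = 2198 and K = 20):
big_sample = adjusted_r2(0.570, 2198, 20)

# With a small N and many predictors, the penalty bites:
small_sample = adjusted_r2(0.500, 20, 5)
```

Note that adding a nearly useless variable raises R² slightly (the "huge theorem") but can lower adjusted R², which is why it is the better yardstick for comparing models of different sizes.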

The Adjusted R-Squared
Minitab output (numerical values lost in transcription):
R-Sq = 57.0%   R-Sq(adj) = 56.6%
Analysis of Variance: Source, DF, SS, MS, F, P for Regression, Residual Error, and Total.
If N is very large, R² and adjusted R² will not differ by very much. This sample is quite large for that purpose.

Success Measure
- Hypothesis: there is no regression.
- Equivalent hypothesis: R² = 0.
- How to test: for now, a rough rule. Look for F > 2 for multiple regression. (The critical F was 4 for simple regression.)
- The F for Movie Madness (value lost in transcription) is large.

Testing "The Regression"
F = (R²/K) / [(1 − R²)/(N − K − 1)]
Degrees of freedom for the F statistic are K and N − K − 1.

The F Test for the Model
- Determine the appropriate "critical" value from the table.
- Is the F from the computed model larger than the theoretical F from the table?
  - Yes: conclude the relationship is significant.
  - No: conclude R² = 0.
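The table lookup can be done in software. A sketch of the whole comparison, assuming hypothetical values N = 2198 and K = 20 (the Movie Madness figures were lost in transcription) and the slide's R-Sq of 57.0%:

```python
from scipy import stats

# Overall F test of H0: R² = 0.
# F = (R²/K) / [(1 - R²)/(N - K - 1)], with K and N-K-1 degrees of freedom.
def overall_f(r2, n, k):
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

N, K = 2198, 20                 # hypothetical sample size and predictor count
F = overall_f(0.570, N, K)

# Critical value from the F table at alpha = 0.05
F_crit = stats.f.ppf(0.95, dfn=K, dfd=N - K - 1)
significant = F > F_crit        # "Yes: the relationship is significant"
```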

n1 = number of predictors; n2 = sample size − number of predictors − 1.

Movie Madness Regression
Minitab output (numerical values lost in transcription):
R-Sq = 57.0%   R-Sq(adj) = 56.6%
Analysis of Variance: Source, DF, SS, MS, F, P for Regression, Residual Error, and Total.

Compare Sample F to Critical F
- The F for Movie Madness (value lost in transcription) exceeds the critical value from the table.
- Reject the hypothesis of no relationship.

An Equivalent Approach
- What is the "P value"?
- We observed a particular F (whatever its value is).
- If there really were no relationship, how likely is it that we would have observed an F this large (or larger)? That depends on N and K.
- This probability is reported with the regression results as the P value.

The F Test
Minitab output (numerical values lost in transcription): R-Sq = 57.0%, R-Sq(adj) = 56.6%, and the Analysis of Variance table with its F statistic and P value.

A Cost "Function" Regression
- The regression is "significant." F is huge.
- Which variables are significant? Which variables are not significant?

What About a Group of Variables?
- Is Genre significant in the movie model?
  - There are 12 genre variables.
  - Some are "significant" (fantasy, mystery, horror); some are not.
  - Can we conclude the group as a whole is?
- Maybe. We need a test.

Theory for the Test
- A larger model has a higher R² than a smaller one.
- ("Larger" means the model has all the variables in the smaller one, plus some additional ones.)
- The test statistic, which you can compute with a calculator:
  F = [(R²large − R²small)/J] / [(1 − R²large)/(N − Klarge − 1)], where J is the number of added variables.

Is Genre Significant?
- With the 12 Genre indicator variables: R² = 57.0%.
- Without the 12 Genre indicator variables: R² = 55.4%.
- The critical value shown by Minitab (Calc → Probability Distributions → F…) is 1.76.
- The F statistic (value lost in transcription) is greater than the critical value.
- Reject the hypothesis that all the genre coefficients are zero.

Now What?
- If the value Minitab shows you is less than your F statistic, then your F statistic is large.
- That is, conclude that the group of coefficients is "significant."
- This means that at least one is nonzero, not that all necessarily are.

Application: Part of a Regression Model
- The regression model includes variables x1, x2, … (I am sure of these variables) and maybe variables z1, z2, … (I am not sure of these).
- Model: y = α + β1 x1 + β2 x2 + δ1 z1 + δ2 z2 + ε
- Hypothesis: δ1 = 0 and δ2 = 0.
- Strategy: start with the model including x1 and x2 and compute its R²; then compute the new model that also includes z1 and z2.
- Rejection region: R² increases a lot.

Test Statistic
F = [(R²with − R²without)/J] / [(1 − R²with)/(N − Kwith − 1)],
with J numerator and N − Kwith − 1 denominator degrees of freedom, where J is the number of variables being tested.

Gasoline Market

Gasoline Market
Regression Analysis: logG versus logIncome, logPG (Minitab output; coefficient estimates, standard errors, T and P values, and the ANOVA sums of squares lost in transcription)
R-Sq = 93.6%   R-Sq(adj) = 93.4%
R² = Regression SS / Total SS = 0.936

Gasoline Market
Regression Analysis: logG versus logIncome, logPG, logPNC, logPUC, logPPT (Minitab output; coefficient estimates, standard errors, T and P values, and the ANOVA sums of squares lost in transcription)
R-Sq = 96.0%   R-Sq(adj) = 95.6%
Now R² = Regression SS / Total SS = 0.960. Previously, R² = 0.936.


n1 = number of predictors; n2 = sample size − number of predictors − 1.

Improvement in R²
Inverse cumulative distribution function: F distribution with 3 DF in the numerator and 46 DF in the denominator; P(X ≤ x) = 0.95 at x = (value lost in transcription).
The null hypothesis is rejected. Notice that none of the three individual variables is "significant," but the three of them together are.
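The improvement-in-R² test can be sketched with the gasoline-market numbers from the slides: R² = 0.960 with the three price variables (logPNC, logPUC, logPPT), R² = 0.936 without them, and 3 and 46 degrees of freedom:

```python
from scipy import stats

# Partial F test: do the three added variables improve the fit significantly?
# F = [(R²with - R²without)/J] / [(1 - R²with)/(N - Kwith - 1)]
r2_full, r2_restricted = 0.960, 0.936   # from the two gasoline regressions
J, dfd = 3, 46                          # 3 added variables; 46 denominator df

F = ((r2_full - r2_restricted) / J) / ((1.0 - r2_full) / dfd)

# Critical value via the inverse CDF, as in the Minitab output on the slide
F_crit = stats.f.ppf(0.95, dfn=J, dfd=dfd)
reject = F > F_crit                      # null of "no improvement" is rejected
```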

Application
- Health satisfaction depends on many factors: age, income, children, education, marital status.
- Do these factors figure differently in a model for women compared to one for men?
- Investigation: multiple regression.
- Null hypothesis: the regressions are the same.
- Rejection region: estimated regressions that are very different.

Equal Regressions
- Setting: two groups of observations (men/women, countries, two different time periods, firms, etc.).
- Regression model: y = α + β1 x1 + β2 x2 + … + ε
- Hypothesis: the same model applies to both groups.
- Rejection region: large values of F.

Procedure: Equal Regressions
- There are N1 observations in group 1 and N2 in group 2.
- There are K variables plus the constant term in the model.
- This test requires you to compute three regressions and retain the sum of squared residuals from each:
  - SS1 = sum of squares from the N1 observations in group 1
  - SS2 = sum of squares from the N2 observations in group 2
  - SSALL = sum of squares from the NALL = N1 + N2 observations when the two groups are pooled
- Test statistic: F = [(SSALL − SS1 − SS2)/(K + 1)] / [(SS1 + SS2)/(NALL − 2K − 2)]
- The hypothesis of equal regressions is rejected if F is larger than the critical value from the F table (K + 1 numerator and NALL − 2K − 2 denominator degrees of freedom).
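A minimal sketch of this three-regression test (the Chow test). The sums of squares and sample sizes below are hypothetical placeholders, since the health-satisfaction values appear only in the output on a later slide:

```python
from scipy import stats

# Chow test for equal regressions across two groups.
def chow_test(ss1, ss2, ss_all, n1, n2, k, alpha=0.05):
    """ss1, ss2: residual sums of squares from the separate group regressions.
    ss_all: residual sum of squares from the pooled regression.
    k: number of variables; the constant adds one more parameter per group."""
    n_all = n1 + n2
    num = (ss_all - (ss1 + ss2)) / (k + 1)
    den = (ss1 + ss2) / (n_all - 2 * k - 2)
    F = num / den
    F_crit = stats.f.ppf(1 - alpha, dfn=k + 1, dfd=n_all - 2 * k - 2)
    return F, F_crit, F > F_crit

# Hypothetical example: pooling the groups costs a noticeable amount of fit
F, F_crit, reject = chow_test(ss1=480.0, ss2=510.0, ss_all=1020.0,
                              n1=500, n2=520, k=5)
```

Pooling can never fit better than the two separate regressions, so SSALL ≥ SS1 + SS2 always; the test asks whether the gap is larger than chance would allow.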

Health Satisfaction Models: Men vs. Women
German survey data over 7 years, 1984 to 1991 (with a gap): 27,326 observations on health satisfaction and several covariates.
Three regressions (coefficient estimates, standard errors, T statistics, P values, and variable means lost in transcription):
- Women (NW = 13083): Constant, AGE, EDUC, HHNINC, HHKIDS, MARRIED
- Men (NM = 14243): Constant, AGE, EDUC, HHNINC, HHKIDS, MARRIED
- Both (NALL = 27326): Constant, AGE, EDUC, HHNINC, HHKIDS, MARRIED

Computing the F Statistic
Output for the Women, Men, and pooled (All) samples (numerical values lost in transcription): mean and standard deviation of HEALTH, number of observations, model size (parameters and degrees of freedom), residual sum of squares, standard error of e, R², and the model F test with its P value for each of the three regressions.

Summary
- Simple regression: test β = 0
- Tests about individual coefficients in a multiple regression
- R² as the fit measure in a multiple regression
  - Testing R² = 0
  - Tests about sets of coefficients
  - Testing whether two groups have the same model