Multiple Regression & OLS violations Week 4 Lecture MG461 Dr. Meredith Rolfe.


1 Multiple Regression & OLS violations Week 4 Lecture MG461 Dr. Meredith Rolfe

2 Which group are you in? 1.Group 1 2.Group 2 3.Group 3 4.Group 4 5.Group 5 6.Group 6 7.Group 7 8.Group 8

3 Key Goals of the Week
- What is multiple regression?
- How to interpret regression results:
  - estimated regression coefficients
  - significance tests for coefficients
- Violations of OLS assumptions
  - Diagnostics
  - What to do
MG461, Week 3 Seminar 3

4 MULTIPLE REGRESSION

5 When to use Regression
- We want to know whether the outcome, y, varies depending on x
- Continuous variables (but many exceptions)
- Observational data (mostly)
- The relationship between x and y is linear

6 Simple Linear Model

7 1.of one variable on another variable. 2.of one variable on one or more other variables.

8 Multiple Regression
Compensation: Performance, Size of Company, Years worked
Ratings of Supervisor: Opportunity to learn, Critical of poor performance, Handles complaints

9 1.Does not allow special privileges. 2.Opportunity to learn. 3.Too critical of poor performance. 4.Handles employee complaints.

10 Simple linear model: Rating vs. No Special Privileges

                         Estimate   (s.e.)
(Constant)               42.11***   (9.27)
No special privileges     0.42*     (0.17)

n = 30, R² = 0.15
Note on significance of coefficients: *** p < 0.001, ** p < 0.01, * p < 0.05, . p < 0.1
Source: Chatterjee et al., Regression Analysis by Example

11 SPSS output -> Regression Table

                         Estimate   (s.e.)
(Constant)               42.11***   (9.27)
No special privileges     0.42*     (0.17)

n = 30, R² = 0.15
Labels on the SPSS output: x variable; β̂₀, β̂₁; se(β̂₀), se(β̂₁); t(β̂₀ − 0), t(β̂₁ − 0); ignore

12 42% of employees value supervisors who don’t grant special privileges? 1.Yes 2.No 32% 68%

                         Estimate   (s.e.)
(Constant)               42.11***   (9.27)
No special privileges     0.42*     (0.17)

n = 30, R² = 0.15
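To see why the 0.42 on this slide is a slope and not a percentage, the fitted line can be evaluated directly. This is a minimal sketch using the two estimates from the table; the function name `predicted_rating` and the example scores 50 and 51 are illustrative and do not come from the lecture.

```python
# Estimates from the simple regression table (Rating vs. No special privileges).
INTERCEPT = 42.11
SLOPE = 0.42

def predicted_rating(no_privileges_score):
    """Fitted supervisor rating for a given 'no special privileges' score."""
    return INTERCEPT + SLOPE * no_privileges_score

# A one-unit increase in x changes the predicted rating by the slope.
low = predicted_rating(50)   # hypothetical score
high = predicted_rating(51)  # one unit higher
print(f"Predicted rating at x = 50: {low:.2f}")
print(f"Change for a one-unit increase in x: {high - low:.2f}")
```

The coefficient is therefore read as "one more point on the predictor goes with about 0.42 more points of overall rating", not as a share of employees.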

13 Simple linear model #2: Rating vs. Opportunity to Learn

                         Estimate   (s.e.)
(Constant)               28.17***   (8.81)
Opportunity to learn      0.65*     (0.15)

n = 30, R² = 0.37
Note on significance of coefficients: *** p < 0.001, ** p < 0.01, * p < 0.05, . p < 0.1

14 Estimates with standard errors in parentheses

                                  Model 1   Model 2   Model 3   Model 4   Model 5   Model 6
(Constant)                        42.11***  28.17***  14.38*    19.98     50.24**   56.76***
                                  (9.27)    (8.81)    (6.62)    (11.69)   (17.31)   (9.74)
No special privileges             0.42*
                                  (0.17)
Opportunity to learn                        0.65*
                                            (0.15)
Handles complaints                                    0.75***
                                                      (0.15)
Raises based on performance                                     0.69***
                                                                (0.18)
Too critical of poor performance                                          0.19
                                                                          (0.23)
Rate of advancing to better jobs                                                    0.18
                                                                                    (0.22)
n                                 30        30        30        30        30        30
R²                                0.15      0.37      0.68      0.35      0.02      0.02

15 1.Yes 2.No

16 Multiple potential explanations…
- Experimental Controls: Random assignment, Experimental Design
- Observational data analysis: Statistical Controls
Ratings of Supervisor: No special privileges, Opportunity to learn, Critical of poor performance, Handles complaints

17 Multiple Regression Model
Labels on the equation: Dependent Variable; Independent Variables; Intercept; Coefficients; Error
Observation or data point, i, goes from 1…n
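The equation that these labels annotate did not survive in the transcript. A standard way to write the multiple regression model, assuming p independent variables and the symbols used on the next slide (β₀, x₁,ᵢ, βₚ, σ²), is:

```latex
y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \dots + \beta_p x_{p,i} + \varepsilon_i,
\qquad i = 1, \dots, n
```

Here y_i is the dependent variable, the x's are the independent variables, β₀ is the intercept, β₁ to βₚ are the coefficients, and ε_i is the error, assumed to have variance σ².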

18 1. β₀ 2. x₁,ᵢ 3. βₚ 4. σ²

19 Multiple Regression OLS Estimates (matrix) Y = Xβ +ε
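In matrix form the OLS estimate solves the normal equations (X'X)β̂ = X'Y. The sketch below does this in plain Python so that each step is visible; the helper names and the small data set are made up for illustration, and real analyses would normally rely on a statistics package instead.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ols(x_rows, y):
    """Return beta_hat for Y = X beta + e, adding a column of ones for the intercept."""
    X = [[1.0] + list(row) for row in x_rows]
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xv * yv for xv, yv in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)

# Made-up data generated without noise from y = 1 + 2*x1 + 3*x2.
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]
beta_hat = ols(x_rows, y)
print([round(b, 6) for b in beta_hat])
```

Because the example data contain no error term, the estimates recover the intercept and both slopes used to generate y.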

20 Estimates with standard errors in parentheses

                                  Model 1   Model 2   Model 3   Model 4   Model 5   Model 6   ALL
(Constant)                        42.11***  28.17***  14.38*    19.98     50.24**   56.76***  10.79
                                  (9.27)    (8.81)    (6.62)    (11.69)   (17.31)   (9.74)    (11.59)
No special privileges             0.42*                                                       -0.07
                                  (0.17)                                                      (0.14)
Opportunity to learn                        0.65*                                             0.32
                                            (0.15)                                            (0.16)
Handles complaints                                    0.75***                                 0.61***
                                                      (0.15)                                  (0.16)
Raises based on performance                                     0.69***                       0.082
                                                                (0.18)                        (0.22)
Too critical of poor performance                                          0.19                0.038
                                                                          (0.23)              (0.14)
Rate of advancing to better jobs                                                    0.18      -0.21
                                                                                    (0.22)    (0.17)
n                                 30        30        30        30        30        30        30
R²                                0.15      0.37      0.68      0.35      0.02      0.02      0.73

21 Significance of Results

Model Significance
- H₀: None of the 1 (or more) independent variables covary with the dependent variable
- H_A: At least one of the independent variables covaries with the d.v.
- Application: compare two fitted models
- Test: Anova/F-test
- **assumes errors (eᵢ) are normally distributed

Coefficient Significance
- H₀: β₁ = 0, there is no relationship (covariation) between x and y
- H_A: β₁ ≠ 0, there is a relationship (covariation) between x and y
- Application: a single estimated coefficient
- Test: t-test
- **assumes errors (eᵢ) are normally distributed
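The coefficient test can be reproduced by hand from a regression table: the t statistic is the estimate minus its hypothesised value (0), divided by its standard error. A minimal sketch using the "No special privileges" row of the simple model (0.42, s.e. 0.17, n = 30); the critical value of about 2.048 for 28 degrees of freedom is taken from a standard t table and is an assumption of this example, not a number from the slides.

```python
# Values from the simple regression of Rating on "No special privileges".
estimate = 0.42
std_error = 0.17
n = 30
n_predictors = 1

# t statistic for H0: beta_1 = 0.
t_stat = (estimate - 0.0) / std_error

# Residual degrees of freedom: n minus the predictors minus the intercept.
df = n - n_predictors - 1

# Two-sided 5% critical value for df = 28, from a t table (assumed here).
T_CRIT_5PCT = 2.048
significant = abs(t_stat) > T_CRIT_5PCT

print(f"t = {t_stat:.2f} on {df} df, significant at 5%: {significant}")
```

The result agrees with the single star (p < 0.05) attached to this coefficient in the table.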

22 Comparing Models: Anova

Estimates with standard errors in parentheses

                                  Complaints only   Complaints & Learn   ALL
(Constant)                        14.38* (6.62)     9.87 (7.06)          10.79 (11.59)
No special privileges                                                    -0.07 (0.14)
Opportunity to learn                                0.21 (0.13)          0.32 (0.16)
Handles complaints                0.75*** (0.15)    0.64*** (0.12)       0.61*** (0.16)
Raises based on performance                                              0.082 (0.22)
Too critical of poor performance                                         0.038 (0.14)
Rate of advancing to better jobs                                         -0.21 (0.17)
n                                 30                30                   30
R²                                0.68              0.71                 0.73

Anova Model Comparison
- All Variables (Full) vs. Complaints & Learn: F = 0.53, p = 0.72
- Complaints & Learn vs. Complaints: F = 2.47, p = 0.13
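The F statistic for comparing two nested models can be approximated from their R² values alone. The sketch below applies the usual formula to the rounded R² values shown on this slide; the function name `f_stat_from_r2` is illustrative. Because the inputs are rounded to two decimals, the results are only close to the reported F = 2.47 and F = 0.53, which were presumably computed from unrounded sums of squares.

```python
def f_stat_from_r2(r2_reduced, r2_full, n, p_full, q):
    """F statistic for q extra predictors, from the R^2 of two nested models.

    n is the sample size and p_full the number of predictors in the full model.
    """
    numerator = (r2_full - r2_reduced) / q
    denominator = (1.0 - r2_full) / (n - p_full - 1)
    return numerator / denominator

# Complaints & Learn (2 predictors) vs. Complaints only (1 predictor), n = 30.
f_learn = f_stat_from_r2(0.68, 0.71, n=30, p_full=2, q=1)

# ALL (6 predictors) vs. Complaints & Learn (2 predictors), n = 30.
f_all = f_stat_from_r2(0.71, 0.73, n=30, p_full=6, q=4)

print(f"Complaints & Learn vs. Complaints: F is roughly {f_learn:.2f}")
print(f"ALL vs. Complaints & Learn:        F is roughly {f_all:.2f}")
```

Both values are small, which matches the slide's conclusion that neither set of added variables improves the model significantly.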

23 SPEED PRACTICE: INTERPRETING REGRESSION RESULTS
1) p-values & significance
2) Significant coefficients in tables
3) Substantive interpretation of coefficients

24 Does “Critical” have an effect on supervisor ratings? 1.Yes 2.No 33% 67%

                                    Coefficient   s.e.     t       p-value (sig)
(Constant)                          10.79         11.59    0.93    0.36
No special privileges               -0.07         0.14     -0.54   0.60
Opportunity to learn                0.32          0.16     1.90    0.07
Handles complaints                  0.61          0.16     3.81    0.0009
Raises based on performance         0.082         0.22     0.37    0.72
Too critical of poor performance    0.038         0.14     0.26    0.80
Rate of advancing to better jobs    -0.21         0.17     -1.22   0.24

R² = 0.73, n = 336

25 Does Income have an effect on Immigration Rate? 1.Yes 2.No 50%

                                    Coefficient   s.e.        t        p-value (sig)
(Intercept)                         -149.6        117.9       -1.27    0.21
Average Income                      5.077e-06     1.640e-03   0.003    0.998
% Metropolitan                      -5.062e-03    3.129e-01   -0.016   0.987
Average Taxes                       -3.974e-02    1.505e-02   -2.64    0.012
Average Education                   2.73          1.22        2.25     0.030
Temperature                         0.76          0.90        0.84     0.41

R² = 0.28, n = 48

26 Does having a HS Degree affect salary? 1.Yes 2.No

                                    Coefficient   s.e.      t        p-value (sig)
Intercept                           11031.81      383.22    28.79    0.000
Years Experience                    546.18        30.52     17.90    0.000
HS Degree                           -2996.21      411.75    -7.28    0.000
B.S. Degree                         147.82        387.66    0.38     0.705
Management (1=Yes)                  6883.53       313.9     21.90    0.000

R² = 0.957, n = 46

27 Do strike outs affect salary? 1.Yes 2.No 95% 5%

                                    Coefficient   s.e.     t        p-value (sig)
(Intercept)                         5.32          0.10     50.86    0.000
Runs                                0.0045        0.004    1.00     0.32
Hits                                0.012         0.002    5.14     0.00
Home Runs                           0.039         0.008    4.81     0.00
Strike Outs                         -0.008        0.002    -3.63    0.0003

R² = 0.49, n = 337

28 Does %Female affect Cigarette Sales? 1.Yes 2.No 11% 89%

                                    Coefficient   s.e.     t        p-value (sig)
(Intercept)                         103.3         245.6    0.42     0.67
Average age                         4.52          3.22     1.40     0.17
% with HS Degree                    -0.062        0.81     -0.076   0.94
Average Income                      0.019         0.010    1.86     0.070
% Black                             0.36          0.48     0.73     0.47
% Female                            -1.05         5.56     -0.19    0.85
Avg. Price of Cigarettes            -3.25         1.03     -3.16    0.0029

R² = 0.32, n = 50

29 PRACTICE 2: SIGNIFICANT COEFFICIENTS IN TABLES

30 Does Total Employment affect CEO Compensation? 1.Yes 2.No 86% 14%

31 Does Restructuring Affect Firm ROA? 1.Yes 2.No 14% 86%

32 Does firm sales growth affect the length of CEO tenure? 1.Yes 2.No 75% 25%

33 Does Total Employment affect CEO Compensation? 1.Yes 2.No 82% 18%

34 Are employees more aggressive when their job is stressful? 1.Yes 2.No 44% 56%

35 Does employee turnover affect Firm Productivity? 1.Yes 2.No 91% 9%

36 PRACTICE 3: INTERPRETING COEFFICIENTS

37 High values of 1983 centralization produce a(n) ….. in current centralization 1.Increase 2.Decrease 2% 98%

38 Corporations are more likely to enter petitions when their market share is… 1.High 2.Low 81% 19%

39 Starting compensation is a good predictor of current compensation? 1.True 2.False 68% 32%

40 Managers at larger firms get paid more? 1.True 2.False 18% 82%

41 More centralized companies invest more in Research? 1.True 2.False 60% 40%

42 Participant Scores
15  Participant 313C7D
15  Participant 313C99
15  Participant 254CFE
15  Participant 313C41
15  Participant 313CB2

43 Fastest Responders (in seconds)

44 Team Scores
14.24  Group 2
13.23  Group 4
13.15  Group 7
12.48  Group 8
12.13  Group 1
11.72  Group 3
11.7   Group 5
11.17  Group 6

45 Team MVP
Points  Team     Participant
15      Group 2  313C7D
15      Group 4  313C99
15      Group 7  313CB2
14      Group 8  313D44
15      Group 1  313C41
14      Group 3  313C84
14      Group 5  2D180F
14      Group 6  254D62

46 OLS VIOLATIONS & OTHER ISSUES

47 Assumptions of OLS Regression
- Correctly specified model
- Linear relationship
- Errors are normally distributed
- Errors have mean of 0: E(εᵢ) = 0
- Homoscedastic: Var(εᵢ) = σ²
- Uncorrelated errors: Cov(εᵢ, εⱼ) = 0 for i ≠ j
- No multicollinearity

48 When is a model linear?
- Linear in the parameters
- Transformations of x and/or y variables can turn a relationship that isn’t linear initially into one that is linear in the parameters
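A short illustration of "linear in the parameters": an exponential relationship y = a·exp(b·x) is not a straight line in x, but taking logs gives log(y) = log(a) + b·x, which OLS can fit. This is a sketch on made-up, noise-free data; the helper `simple_ols` and the values a = 3 and b = 0.5 are assumptions of the example.

```python
import math

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up curved relationship: y = 3 * exp(0.5 * x).
x = [0, 1, 2, 3, 4, 5]
y = [3.0 * math.exp(0.5 * xi) for xi in x]

# After the log transformation the model is linear in its parameters.
log_y = [math.log(yi) for yi in y]
intercept, slope = simple_ols(x, log_y)

print(f"slope b = {slope:.3f}, a = exp(intercept) = {math.exp(intercept):.3f}")
```

The regression on log(y) recovers both parameters of the original curve, which a straight-line fit on the untransformed y could not do.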

49 Example: The Challenger disaster

50 Example: Challenger Shuttle disaster 30°

51 What the managers didn’t see…

52 Diagnosis of Non-linearity and/or Errors not normally distributed
- Theoretical expectations
- Scatterplots of y against x variables prior to estimating model
- Scatterplot of ŷᵢ against êᵢ (predicted y-values against estimated residuals)
- Normal Probability Plot
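The residual diagnostic can be tried on a tiny example. The sketch below fits a straight line to made-up data that actually follow y = x² and lists fitted values next to residuals; the data and the helper `simple_ols` are illustrative only.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data with a curved (quadratic) relationship.
x = [0, 1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]

intercept, slope = simple_ols(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# In a residual plot these pairs would be the points (fitted value, residual).
for fi, ei in zip(fitted, residuals):
    print(f"fitted = {fi:6.2f}   residual = {ei:6.2f}")
```

The residuals are positive at both ends and negative in the middle. A systematic U-shape like this, as opposed to a patternless cloud, is the signature of a missed non-linear term.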

53 Example: Number of Supervisors & Number of Employees

54 Re-estimated, including x²

55 Solutions to Non-linearity
- Better Model of Structure (transformations)
  - Power terms (squared, cubed)
  - Logs or natural logs (heteroscedasticity)
  - Proportional scaling (divide by x or y)
- If outliers cause the problem, omit them or use robust regression

56 Assumptions of OLS Regression
- Correctly specified model
- Linear relationship
- Errors have mean of 0: E(εᵢ) = 0
- Homoscedastic: Var(εᵢ) = σ²
- Uncorrelated errors: Cov(εᵢ, εⱼ) = 0 for i ≠ j
- No multicollinearity

57 Diagnosis of Heteroscedasticity (like non-linearity)
- Theoretical expectations
- Scatterplots of y against x variables prior to estimating model
- Scatterplot of ŷᵢ against êᵢ (predicted y-values against estimated residuals)
- Scatterplot of xᵢ against êᵢ (observed x-values against estimated residuals)
- Normal Probability Plot
- Statistical Tests (Breusch-Pagan, White, Goldfeld-Quandt)
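One of the listed tests can be sketched in a few lines. The version below is the studentized (Koenker) form of the Breusch-Pagan test for a single predictor: regress the squared residuals on x and compare n·R² from that auxiliary regression with a chi-square critical value (3.841 at the 5% level with 1 degree of freedom). The data are made up so that the error spread grows with x, and all function names are illustrative.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def r_squared(x, y):
    """R^2 of the simple regression of y on x."""
    a, b = simple_ols(x, y)
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def breusch_pagan_lm(x, y):
    """Koenker's n * R^2 statistic from regressing squared residuals on x."""
    a, b = simple_ols(x, y)
    squared_resid = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, squared_resid)

# Made-up data: the size of the error term is proportional to x.
x = list(range(1, 21))
y = [2.0 * xi + 0.5 * xi * (-1) ** xi for xi in x]

lm = breusch_pagan_lm(x, y)
CHI2_CRIT_5PCT_DF1 = 3.841  # chi-square critical value, 1 df, 5% level
print(f"LM = {lm:.2f}, heteroscedasticity flagged: {lm > CHI2_CRIT_5PCT_DF1}")
```

With several predictors the auxiliary regression would include all of them, and the degrees of freedom would equal the number of predictors.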

58 OLS estimates of Regression Line
Salary = -34 + 27.47*Runs

59 Distribution of D.V. (Salary)

60 Normal Probability Plot of Salary

61 Baseball Salary and Performance: Residuals vs. Fitted Values

62 Transformed Dependent Variable log(Salary) = 5.3 + 0.026*Runs
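With a logged dependent variable the slope is read as a proportional change. A minimal sketch using the coefficient 0.026 from this slide; the variable names are illustrative.

```python
import math

# Slope from the fitted model log(Salary) = 5.3 + 0.026 * Runs.
SLOPE_RUNS = 0.026

# One extra run multiplies predicted salary by exp(slope).
multiplier = math.exp(SLOPE_RUNS)
percent_change = (multiplier - 1.0) * 100.0

print(f"Each additional run multiplies predicted salary by {multiplier:.4f}")
print(f"That is an increase of about {percent_change:.2f}%")
```

For small coefficients the percentage change is close to 100 times the coefficient, which is why 0.026 is often read directly as "about 2.6% per run".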

63 Residual Plot of model with Log (Salary)

64 Normal Probability Plot of Residuals

65 Another Example: Salary

                                    Coefficient   s.e.      t        p-value (sig)
Intercept                           11031.81      383.22    28.79    0.000
Years Experience                    546.18        30.52     17.90    0.000
HS Degree                           -2996.21      411.75    -7.28    0.000
B.S. Degree                         147.82        387.66    0.38     0.705
Management (1=Yes)                  6883.53       313.9     21.90    0.000

R² = 0.957, n = 46

66 Plot of Residuals vs. Education (I.V.)

67 Plot of Residuals vs. Education × Manager

68 Solution: Include Interaction Term

                                    Coefficient   s.e.      t        p-value (sig)
Intercept                           11023.50      79.07     141.7    0.000
Years Experience                    496.98        5.57      89.3     0.000
HS Degree                           -1730.69      105.33    -16.4    0.000
B.S. Degree                         -349.03       97.57     -3.6     0.0009
Management (1=Yes)                  7047.32       102.60    68.7     0.000
HS × Management                     -3066.04      149.33    -20.5    0.000
BS × Management                     1836.49       131.17    14.0     0.000

R² = 0.999, n = 46
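The interaction coefficients mean that the management premium differs by education level. The sketch below plugs the estimates from this table into the fitted equation; it assumes that HS Degree, B.S. Degree and Management are 0/1 dummy variables and that the omitted education category is the baseline, which the slide does not state explicitly. Function and key names are illustrative.

```python
# Estimates from the interaction model on this slide.
COEF = {
    "intercept": 11023.50,
    "years": 496.98,
    "hs": -1730.69,
    "bs": -349.03,
    "mgmt": 7047.32,
    "hs_x_mgmt": -3066.04,
    "bs_x_mgmt": 1836.49,
}

def predicted_salary(years, hs, bs, mgmt):
    """Fitted salary; hs, bs and mgmt are assumed to be 0/1 dummies."""
    return (COEF["intercept"]
            + COEF["years"] * years
            + COEF["hs"] * hs
            + COEF["bs"] * bs
            + COEF["mgmt"] * mgmt
            + COEF["hs_x_mgmt"] * hs * mgmt
            + COEF["bs_x_mgmt"] * bs * mgmt)

def management_premium(hs, bs):
    """Difference in fitted salary between managers and non-managers."""
    return predicted_salary(0, hs, bs, 1) - predicted_salary(0, hs, bs, 0)

print(f"Premium, baseline education: {management_premium(0, 0):.2f}")
print(f"Premium, HS degree:          {management_premium(1, 0):.2f}")
print(f"Premium, B.S. degree:        {management_premium(0, 1):.2f}")
```

Without the interaction terms the model forces a single management premium on all three groups, which is the pattern the residual plots against Education × Manager pick up.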

69 Results from Salary Model

70 Solutions for Heteroscedasticity:
- Better Model of Structure: Interaction terms, Transformation
- Robust Standard Errors
- Weighted GLM
- ARCH models (in time series)

71 Assumptions of OLS Regression
- Correctly specified model
- Linear relationship
- Errors have mean of 0: E(εᵢ) = 0
- Homoscedastic: Var(εᵢ) = σ²
- Uncorrelated errors: Cov(εᵢ, εⱼ) = 0 for i ≠ j
- No multicollinearity

72 Violation 2: Errors not Independent
- Across time
- Across cases (diffusion, network models)
- Time series data, panel data, cluster samples, hierarchical data, repeated measures data, longitudinal data, and other data with dependencies

73 Example: Consumer Spending vs. Money

74 Diagnosis & Solutions:

Diagnosis
- Type of Data
- Durbin-Watson Statistic
- Residual Plots

Solution
- Incorporate dependencies into estimates
- Difference Variables (Cochrane-Orcutt)
- Variables for Seasonality
- Various Time Series Models
- Various network/spatial dependence models
- Structural Models (SUR, SEM)
- GLS (generalized least squares)
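The Durbin-Watson statistic is simple enough to compute by hand from the residuals, taken in time order: the sum of squared successive differences divided by the sum of squared residuals. A minimal sketch on made-up residual series; values near 2 suggest no first-order autocorrelation, values near 0 positive autocorrelation, and values near 4 negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Made-up residual series for illustration.
persistent = [1, 1, 1, -1, -1, -1]    # long runs of the same sign
alternating = [1, -1, 1, -1]          # sign flips every period

print(f"Persistent residuals:  DW = {durbin_watson(persistent):.2f}")
print(f"Alternating residuals: DW = {durbin_watson(alternating):.2f}")
```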

75 Assumptions of OLS Regression
- Correctly specified model
- Linear relationship
- Errors have mean of 0: E(εᵢ) = 0
- Uncorrelated errors: Cov(εᵢ, εⱼ) = 0 for i ≠ j
- Homoscedastic: Var(εᵢ) = σ²
- No multicollinearity

76 Problem: Multicollinearity

Diagnosis
- High Correlation between two or more IVs
- Standard errors “blow up”
- Large changes in coefficients between estimated models
- Statistical tests (VIF)

Solutions
- Are the two x’s measuring the same thing: create an index or use PCA
- Get more data!
- Centering of x variables
- Instrumental variables
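For the special case of two predictors, the variance inflation factor follows directly from their correlation r: VIF = 1 / (1 − r²). The sketch below uses made-up data and illustrative function names; with more than two predictors, r² is replaced by the R² from regressing each predictor on all of the others.

```python
def correlation(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor regression."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Made-up predictors: one correlated pair and one uncorrelated pair.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
vif_correlated = vif_two_predictors(x1, x2)
vif_uncorrelated = vif_two_predictors([-1, 0, 1], [1, -2, 1])

print(f"Correlated predictors:   VIF = {vif_correlated:.2f}")
print(f"Uncorrelated predictors: VIF = {vif_uncorrelated:.2f}")
```

A VIF of 1 means the predictor's standard error is not inflated at all; the larger the VIF, the more the standard errors "blow up".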

