Presentation on theme: "Economics 105: Statistics GH 24 due Wednesday. Hypothesis Tests on Several Regression Coefficients Consider the model (expanding on GH 22) Is “race” as."— Presentation transcript:
Multiple Regression: Example where Sign Switches
Survey of 75 consumers
Rating = rating of likelihood of purchase of a PDA (e.g., Palm Pilot) on a scale of 1-10, with 10 indicating the highest likelihood
Age = age in years
Income = income in thousands of dollars
Correlations
          Rating   Age     Income
Rating    1.000    0.587   0.885
Age       0.587    1.000   0.829
Income    0.885    0.829   1.000
Multiple Regression: Example where Sign Switches
Regression of Rating on Age
Term        Estimate   Std Error   t Ratio   Prob>|t|
Intercept    2.067      0.487        4.24    <.0001
Age          0.059      0.009        6.19    <.0001
Regression of Rating on Income
Term        Estimate   Std Error   t Ratio   Prob>|t|
Intercept   -0.596      0.352       -1.69    0.0951
Income       0.070      0.004       16.20    <.0001
Multiple Regression: Example where Sign Switches
Multiple Regression Estimates
Term        Estimate   Std Error   t Ratio   Prob>|t|
Intercept   -0.736      0.295       -2.50    0.0149
Age         -0.047      0.008       -5.74    <.0001
Income       0.101      0.006       15.63    <.0001
Conclusions?
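One way to see where the sign switch comes from is to work directly from the correlation table. The sketch below uses the textbook identity for standardized slopes in a two-regressor model, (r_y1 - r_y2 * r_12) / (1 - r_12^2). The variable names are illustrative and not from the slides, and the results are standardized slopes, so they are on a different scale from the raw estimates in the table.

```python
# Pairwise correlations copied from the correlation table in this example.
r_rating_age = 0.587
r_rating_income = 0.885
r_age_income = 0.829

# Standardized slopes for a two-regressor model, expressed in correlations.
# Each slope is the regressor's own correlation with Rating, minus the part
# that is explained by the other regressor, rescaled by 1 - r_12^2.
denom = 1.0 - r_age_income ** 2
beta_age = (r_rating_age - r_rating_income * r_age_income) / denom
beta_income = (r_rating_income - r_rating_age * r_age_income) / denom

print(f"standardized Age slope:    {beta_age:.3f}")
print(f"standardized Income slope: {beta_income:.3f}")
```

The Age slope comes out negative (about -0.47) even though Age and Rating are positively correlated, because 0.885 * 0.829 exceeds 0.587: once Income is held fixed, the remaining association between Age and Rating is negative.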
Variable Selection (or Model Building)
OLS Assumption #1 (and #2 and #5)
Use theory and prior research
Use your hypotheses
But what if you don’t have much theoretical guidance?
– Parsimony = f(simplicity, fit)
– Using adjusted R² … fit, controlling for complexity
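As a rough illustration of "fit, controlling for complexity", the sketch below uses the standard definition of adjusted R², 1 - (1 - R²)(n - 1)/(n - k - 1), where k counts regressors excluding the intercept. The function name and the R² values are hypothetical; only n = 75 is borrowed from the survey example.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k regressors (intercept not counted)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical R^2 values (not from the slides); n = 75 matches the survey size.
base = adjusted_r2(0.800, n=75, k=2)    # two regressors
bigger = adjusted_r2(0.801, n=75, k=3)  # a third regressor adds almost no fit

print(f"adjusted R^2 with 2 regressors: {base:.4f}")
print(f"adjusted R^2 with 3 regressors: {bigger:.4f}")
```

In this made-up case R² rises slightly but adjusted R² falls, which is the penalty for complexity at work.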
Empirical Indicators in Model Building
When adding a variable, check for:
– Improved prediction (increase in adjusted R²)
– Statistically and substantively significant estimated coefficients
– Stability of model coefficients
  Do other coefficients change when adding the new one?
  Particularly look for sign changes
Risks in Model Building
Including irrelevant X’s
– Increases complexity
– Reduces adjusted R²
– Increases model variability across samples
Omitting relevant X’s
– Fails to capture fit
– Can bias other estimated coefficients
  Where omitted X is related to both other X’s and to the dependent variable (Y)
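The bias from omitting a relevant X can be sketched with a small simulation. All numbers below are hypothetical (the slopes, the 0.8 link between the regressors, the seed and the sample size); the variable names only echo the Rating/Age/Income example. The short regression of rating on age alone should recover roughly b_age + b_income * Cov(age, income) / Var(age), not b_age.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance; cov(a, a) is the sample variance."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


random.seed(105)
n = 20000
b_age, b_income = -0.5, 1.3  # hypothetical true slopes, not from the slides

# income is the variable we will omit; age is built to be correlated with it.
income = [random.gauss(0, 1) for _ in range(n)]
age = [0.8 * inc + 0.6 * random.gauss(0, 1) for inc in income]
rating = [b_age * a + b_income * inc + random.gauss(0, 1)
          for a, inc in zip(age, income)]

# Slope from regressing rating on age alone (income omitted).
short_slope = cov(age, rating) / cov(age, age)

# What the omitted-variable-bias formula says that slope should be near.
predicted = b_age + b_income * cov(age, income) / cov(age, age)

print(f"true age slope:          {b_age:.3f}")
print(f"slope with income left out: {short_slope:.3f}")
print(f"bias formula prediction: {predicted:.3f}")
```

With these settings the omitted variable is related to both the included X and to Y, so the short-regression slope is pushed from negative to positive, the same kind of sign switch as in the survey example.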
More Risks: Samples Can Mislead
Remember: we are using sample data
– About 5% of the time (when testing at the 5% level), our sample will include random observations of X’s that result in beta-hats that pass classical hypothesis tests even though the true betas are zero
– Or the betas may be important, but the sample data will randomly include observations of X that do not pass the statistical tests
That’s why we rely on theory, prior hypotheses, and replication