Unit 6: The basics of multiple regression Class 14… Class 15…
© Andrew Ho, Harvard Graduate School of Education
Where is Unit 6 in our 11-Unit Sequence?

Building a solid foundation
- Unit 1: Introduction to simple linear regression
- Unit 2: Correlation and causality
- Unit 3: Inference for the regression model

Mastering the subtleties
- Unit 4: Regression assumptions: Evaluating their tenability
- Unit 5: Transformations to achieve linearity

Adding additional predictors
- Unit 6: The basics of multiple regression
- Unit 7: Statistical control in depth: Correlation and collinearity

Generalizing to other types of predictors and effects
- Unit 8: Categorical predictors I: Dichotomies
- Unit 9: Categorical predictors II: Polychotomies
- Unit 10: Interaction and quadratic effects

Pulling it all together
- Unit 11: Regression in practice. Common extensions.
In this unit, we're going to cover…

- Various representations of the multiple regression model: an algebraic representation, a three-dimensional graphic representation, and a two-dimensional graphic representation
- Multiple regression: how it works and helps improve predictions
- Estimating the parameters of the multiple regression model
- Holding predictors constant: what does this really mean?
- Plotting the fitted multiple regression model: deciding how to construct the plot, choosing prototypical values, and learning how to actually construct the plot (and interpret it correctly!)
- R² and the Analysis of Variance (ANOVA) in multiple regression
- Inference in multiple regression: the omnibus F test and individual t tests
- How might we summarize multiple regression results in both tables and figures?

SEGUE: To do this, we need to learn about hypothesis testing, which is a focus of this unit. But to set our work in context, let's begin with an example.
US News and World Report education school rankings from 2006

How do student characteristics like GRE scores and size of the doctoral class predict peer ratings of Ed Schools? Do schools gain in reputation for graduating large numbers of high-achieving students?
As always, our starting point
Scatterplot matrix: graph matrix
Use graph combine to be more succinct.

Given the variable (enrollment) and the shape of these plots, a log transformation is worth exploring.
Logarithmic transformation of the docgrad variable

Comparison of regression models predicting US News peer ratings from the size and log(size) of the doctoral cohort (n = 87):

                          Model A: docgrad    Model B: log(docgrad)
  Estimated slope (S.E.)  .687* (.128)        27.274* (5.649)
  t statistic             5.36                4.83
  R²                      25.27%              21.52%
  * p < .001

A marginal case. Residuals suggest the transformation results in a modest improvement. Keep the original docgrad variable to simplify interpretation.
From simple to multiple regression

Two simple regressions:
  peerrate-hat = β̂0 + .691 gre
  peerrate-hat = β̂0 + .687 docgrad

One multiple regression:
  peerrate = β0 + β1 gre + β2 docgrad + ε

More generally, let X1, X2, …, Xk represent k predictors. How does multiple regression help us?
- Simultaneous consideration of many contributing factors
- We explain more of the variation in Y; equivalently, more accurate predictions (smaller residuals)
- Provides a separate understanding of each predictor, accounting for the effects of the other predictors in the model (an attempt at holding the values of the other predictors constant)
- Allows us to build models that can help support theories and hypotheses about causal and associative mechanisms
The Stata syntax: regress peerrate gre docgrad (Y, then Xs)

Population model: peerrate = β0 + β1 gre + β2 docgrad + ε
Sample prediction equation: peerrate-hat = β̂0 + .614 gre + .540 docgrad

A 10-point increment in average GREs predicts a 6.14-point increment in peer ratings, assuming docgrad can be held constant. (Simple linear regression coefficient: 6.91.)
A 10-student increment in the size of the graduating doctoral cohort predicts a 5.4-point increment in peer ratings, assuming gre can be held constant. (Simple linear regression coefficient: 6.87.)
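The two-predictor fit can be cross-checked outside Stata with numpy alone. The sketch below uses simulated stand-in data (not the US News sample, which we don't have here), so only the structure, not the estimates, matches the deck; variable names and generating coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 87  # same sample size as the US News example

# Simulated stand-ins for the real variables (hypothetical values):
gre = rng.normal(600, 30, n)
docgrad = rng.normal(46, 20, n)
peerrate = 50 + 0.6 * gre + 0.5 * docgrad + rng.normal(0, 20, n)

# regress peerrate gre docgrad  <->  least squares on [1, gre, docgrad]
X = np.column_stack([np.ones(n), gre, docgrad])
b0, b_gre, b_doc = np.linalg.lstsq(X, peerrate, rcond=None)[0]

# Interpretation mirrors the slide: a 10-point GRE increment predicts a
# 10 * b_gre change in peerrate, holding docgrad constant in the model.
gre_effect_per_10 = 10 * b_gre
```

The estimated slopes land near the generating values (.6 and .5) because the model is correctly specified; with the real data, the analogous quantities are the .614 and .540 reported above.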
From Unit 1: Three "best-fit" regression lines, the line you want, and why. The OLS criterion minimizes the sum of vertical squared residuals. Other definitions of "best fit" are possible:
- Vertical squared residuals (OLS)
- Horizontal squared residuals (X on Y)
- Orthogonal residuals
Minimize vertical squared residuals to the best-fit plane

peerrate-hat = β̂0 + .614 gre + .540 docgrad
Regression decomposition from Unit 1

Point to mean-plane = plane to mean-plane + point to plane:

  (Yi − Ȳ) = (Ŷi − Ȳ) + (Yi − Ŷi)

Squaring and summing:

  ∑(Yi − Ȳ)² = ∑(Ŷi − Ȳ)² + ∑(Yi − Ŷi)²
  Sum of Squares Total (SST) = Sum of Squares Model (SSM) + Sum of Squares Error (SSE)

  R² = SSM/SST = ∑(Ŷi − Ȳ)² / ∑(Yi − Ȳ)²

The proportion of total variation that is accounted for by the model. What is SSE/SST?

  SSE/SST = (SST − SSM)/SST = 1 − SSM/SST = 1 − R²
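The identity SST = SSM + SSE, and both routes to R², can be checked numerically. A minimal numpy sketch on simulated data (any dataset would do; the generating values here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)

# Fit Y on X by OLS and form the three sums of squares.
X = np.column_stack([np.ones(n), x])
yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]

sst = np.sum((y - y.mean()) ** 2)      # total: point to mean-plane
ssm = np.sum((yhat - y.mean()) ** 2)   # model: plane to mean-plane
sse = np.sum((y - yhat) ** 2)          # error: point to plane

r2 = ssm / sst                         # equals 1 - sse/sst
```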
Analysis of Variance regression decomposition in Stata

∑(Yi − Ȳ)² = ∑(Ŷi − Ȳ)² + ∑(Yi − Ŷi)²

R² = SSM/SST = 1 − SSE/SST = 1 − 73190/175172 = 58.22%

Interpreting R²: 58.22 percent of the variation in the peer ratings is "attributable to" or "accounted for by" or "explained by" or "associated with" or "predicted by" the size of the doctoral cohort and the GRE scores of the admits.

What about the remaining 41.78%? Research, funding, famous people, location, history, measurement error, random error, individual variation, alien abductions… Error is what we haven't modeled yet.

The ubiquitous R²: the variance of Y that is accounted for by X. The single most widespread and easily interpretable summary statistic derivable from a single regression analysis. Essential to describing the overall predictive function of the model.
Multiple regression supports inferences about "statistical control"

peerrate-hat = β̂0 + .614 gre + .540 docgrad

A 10-point increment in average GREs predicts a 6.14-point increment in peer ratings, assuming docgrad can be held constant. (Simple linear regression coefficient: 6.91.)
A 10-student increment in the size of the graduating doctoral cohort predicts a 5.4-point increment in peer ratings, assuming gre can be held constant. (Simple linear regression coefficient: 6.87.)

Even accounting for one variable, the other variable still has predictive utility, and vice versa. Let's see what the model implies about "holding docgrad constant" or "accounting/adjusting for docgrad" for a typical small school, a midsized school, and a large school.
Distribution of the size of the doctoral cohort in 2004

Let's pick a small school with a doctoral cohort of 20, a midsized school with a doctoral cohort of 45, and a large school with a doctoral cohort of 80.
Conditional regression lines: Visualizing the multiple regression model

peerrate-hat = β̂0 + .614 gre + .540 docgrad

Small school, docgrad = 20:     peerrate-hat = (β̂0 + .540·20) + .614 gre = (β̂0 + 10.8) + .614 gre
Midsized school, docgrad = 45:  peerrate-hat = (β̂0 + .540·45) + .614 gre = (β̂0 + 24.3) + .614 gre
Large school, docgrad = 80:     peerrate-hat = (β̂0 + .540·80) + .614 gre = (β̂0 + 43.2) + .614 gre

Every conditional line has the same slope in gre (.614); only the intercept shifts with docgrad.
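Plugging prototypical docgrad values into the fitted equation can be sketched mechanically. The slopes (.614, .540) come from the deck; b0 is a hypothetical stand-in intercept, since the actual intercept is not legible in the source.

```python
b0 = -300.0            # hypothetical stand-in intercept
b_gre, b_doc = 0.614, 0.540

def peerrate_hat(gre, docgrad):
    return b0 + b_gre * gre + b_doc * docgrad

# Conditional intercepts for small / midsized / large schools:
lines = {size: b0 + b_doc * size for size in (20, 45, 80)}

# Parallel lines: the gre slope is .614 at every docgrad...
slope_small = peerrate_hat(610, 20) - peerrate_hat(600, 20)
slope_large = peerrate_hat(610, 80) - peerrate_hat(600, 80)

# ...and the vertical gap between the small and large lines is
# .540 * (80 - 20) = 32.4 points at any value of gre.
gap = peerrate_hat(600, 80) - peerrate_hat(600, 20)
```

Whatever the true intercept is, the slopes and the spacing between the conditional lines are unaffected by it, which is the point of the plot.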
Our Data: Peer Rating

If each of you were a school of education, with peer ratings, cohort sizes, and average GRE scores determined by your location in the 3D space of Larsen G08…
Our Data: Cohort Size
Our Data: Average GREs
Our Data: Peer Rating on Cohort Size

This is clearly a weak, near-zero relationship, far weaker than the empirical relationship between peer rating and cohort size. How can you visualize this in this classroom?
Our Data: Peer Rating on Average GRE

This is a weak-to-moderate relationship, slightly stronger than the empirical relationship between peer rating and average GRE scores. How can you visualize this in this classroom?
Multiple Regression: Finding the best-fit (hyper)plane

Vertical squared residuals (OLS)
Multiple Regression on Our Data in Stata

ourrate-hat = β̂0 + .108 oursize + .477 ourgre

A 100-point increment in average GREs predicts a 47.7-point increment in peer ratings, assuming oursize can be held constant. A 100-student increment in the size of the graduating doctoral cohort predicts a 10.8-point increment in peer ratings, assuming ourgre can be held constant. 48.07% of the variation in ratings is accounted for by the two predictor variables.

On your own and in discussion and consultation with your neighbors: Write down your name, and then estimate your PeerRate, DocGrad, and GRE values from your location in this room. Use the prediction equation above to calculate your Ŷ (your predicted peer rating, your fitted value). Write down your residual. How can you interpret this residual?
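The in-class exercise (fitted value, then residual) is just arithmetic. A sketch with the slide's slopes (.108, .477), a hypothetical stand-in intercept, and made-up self-placed values:

```python
b0 = -200.0            # hypothetical intercept (not shown in the deck)
b_size, b_gre = 0.108, 0.477

def ourrate_hat(oursize, ourgre):
    return b0 + b_size * oursize + b_gre * ourgre

# Suppose you place yourself at oursize = 40, ourgre = 620,
# with an actual ourrate of 130 (all made-up values):
fitted = ourrate_hat(40, 620)   # your Y-hat, the model's prediction for you
residual = 130 - fitted         # positive: rated above what the plane predicts
```

A positive residual means your actual rating sits above the fitted plane; a negative one, below it.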
Constructing a conditional regression plot

Select 2-5 prototypical values:
- Substantively interesting values
- A range of percentiles (10 or 25, 50, 75 or 90)
- The sample mean ± .5 or 1 standard deviation
- Easily communicated values, whole numbers or fractions
- When the variable is already ordinal (ordered categories, e.g., coded 0, 1, and 2 for small, medium, and large), so much the better

We can plot peerrate on gre with different docgrad lines, or we can plot peerrate on docgrad with different gre lines. Generally, we place the primary predictor of interest on the X axis (let's say gre) and the "control" predictor or "covariate" on the legend. With our data, we can pick prototypical oursize values of 20, 60, and 100.
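The prototypical-value strategies above (percentiles, mean ± SD, rounded values) are one-liners in any language. A numpy sketch on a simulated right-skewed cohort-size variable (hypothetical data, chosen only to resemble a skewed size distribution):

```python
import numpy as np

rng = np.random.default_rng(2)
docgrad = rng.gamma(shape=4.0, scale=11.5, size=87)  # skewed, mean near 46

# Strategy 1: a range of percentiles
p10, p50, p90 = np.percentile(docgrad, [10, 50, 90])

# Strategy 2: the sample mean +/- 1 standard deviation
m, s = docgrad.mean(), docgrad.std(ddof=1)
lo, mid, hi = m - s, m, m + s

# Strategy 3: easily communicated whole numbers near those values
prototypes = sorted(round(v, -1) for v in (p10, p50, p90))
```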
Conditional regression lines for our data

(Figure: conditional lines for large schools, oursize = 100*; midsized schools, oursize = 60*; and small schools, oursize = 20*.)
* Differences between the conditional regression lines are not statistically significant.
Conditional regression lines for our data

(Figure: conditional lines for high-GRE schools, ourgre = 700*; mid-range GRE schools, ourgre = 600*; and low-GRE schools, ourgre = 500*.)
* Slopes of the conditional regression lines are not statistically significant (cannot be distinguished from 0).
Interpreting the conditional regression plot for empirical data

We imagine that the plot extends into the slide along a third axis, docgrad, where larger schools are deeper and smaller schools are closer (or vice versa). The regression model fits the best-fit *plane* through this scatterplot in three-dimensional space, minimizing the vertical squared residuals between each point and the plane. Instead of a single regression line of peerrate on gre, we have a regression line for every level of docgrad, extending along the plane. Note that the lines are parallel, and they must be, given our regression model:

Y = β0 + β1 X1 + β2 X2 + … + βk Xk + ε

We loosen this assumption in Unit 10.
Interpreting a conditional regression plot

Adding the other predictor improves prediction, reduces residuals, and reduces residual variance, but residual variance obviously remains. The magnitude of the axis predictor's coefficient can be seen in the slope of the lines. The magnitude of the legend predictor's coefficient can be seen in the spacing between the lines.
Illustrative contrasts in regression lines

- Prediction of peer rating by average GRE scores for large schools (docgrad = 80)
- Prediction of peer rating by average GRE scores without accounting for school size: the unconditional regression line. This is not the prediction for an average-sized school; this is the prediction if we knew nothing about size!
- Prediction of peer rating by average GRE scores for average-sized schools (docgrad = 45.68): the conditional regression line
- Prediction of peer rating by average GRE scores for small schools (docgrad = 20)
Another perspective on the implications of the same model…

Let's pick a low-scoring school with an average GRE of 500, a midrange school at 550, and a high-scoring school at 600.
Another perspective on conditional regression lines

peerrate-hat = β̂0 + .614 gre + .540 docgrad

Low-scoring school, gre = 500:   peerrate-hat = (β̂0 + .614·500) + .540 docgrad = (β̂0 + 307.0) + .540 docgrad
Midrange school, gre = 550:      peerrate-hat = (β̂0 + .614·550) + .540 docgrad = (β̂0 + 337.7) + .540 docgrad
High-scoring school, gre = 600:  peerrate-hat = (β̂0 + .614·600) + .540 docgrad = (β̂0 + 368.4) + .540 docgrad

A 10-person increment in the size of the doctoral cohort predicts a 5.4-point increment in the predicted mean peer rating by deans, assuming average GREs can be held constant. A 50-point increment in average GRE scores predicts a 30.7-point increment in the predicted mean peer rating by deans, assuming the size of the doctoral cohort can be held constant.
ANalysis Of VAriance (ANOVA) regression decomposition for multiple regression

  Source              Sum of Squares   df         Mean Square
  Model (Regression)  SSM              k          MSR = SSM/k
  Residual (Error)    SSE              n − k − 1  MSE = SSE/(n − k − 1)
  Total               SST              n − 1      MST = SST/(n − 1)

MS stands for Mean Square, an average sum of squares, like a variance. In fact, MS Total *is* your Y variance, the unconditional variance of the peerrate variable, the square of the standard deviation: ∑(Yi − Ȳ)²/(n − 1). MS Model, the model mean square, is a measure of the variation accounted for by the model (not easily interpretable on the Y scale); the bigger the better. MS Residual, your residual variance, is interpretable as error variance (RMSE²). Everything to this point is the same as simple linear regression, except that we are paying more attention to the mean squares.
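With the sums of squares from this example's output (SST ≈ 175,172 and SSE ≈ 73,190, n = 87, k = 2, as read off the ANOVA table), the mean squares fall out by dividing each sum of squares by its degrees of freedom:

```python
n, k = 87, 2
sst, sse = 175172.0, 73190.0   # sums of squares from the regression output
ssm = sst - sse                # model sum of squares

mst = sst / (n - 1)            # MS Total: the unconditional variance of Y
msr = ssm / k                  # MS Model: model mean square
mse = sse / (n - k - 1)        # MS Residual: error variance (RMSE squared)

r2 = ssm / sst                 # about .5822, matching the deck's R-squared
```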
The F distribution, a sampling distribution for variance ratios

If we sample two variances (MSR and MSE) from two populations with equal variances, and if we take repeated ratios (MSR/MSE), sometimes the numerator (MSR) will be larger, and sometimes the denominator (MSE) will be larger. The distribution of these ratios will be loosely centered on 1 and will always be positive. If one variance is craaaaazy bigger than the other, we might conclude that our null hypothesis of equal population variances is incorrect and accept the alternative: unequal population variances. The F distribution has two degree-of-freedom parameters: the sample size from the first distribution (minus 1) and the sample size from the second distribution (minus 1).
The omnibus F test for multiple regression

MSR and MSE are scaled (by dividing each sum of squares by its df) such that, under the null hypothesis, they act like sample variances from populations with equal variance. This particular F statistic, F = MSR/MSE, represents variance accounted for by the regression model over error variance (unaccounted for); in other words, good variance over bad variance. We want F to be as large as possible.

Under the null hypothesis, there is no predictive value to *any* of your predictors in the population: H0: β1 = β2 = … = βk = 0. If the model variance is sufficiently greater than the error variance, we can reject H0 and accept Ha: one or more population slopes are nonzero.

The numerator degrees of freedom is k, the number of predictors. The denominator degrees of freedom is n − k − 1, a quantity that increases with sample size.
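For this example, F = MSR/MSE can be computed directly from the sums of squares above, and it has an equivalent form in terms of R². Both routes give the F(2, 84) = 58.52 reported later in the deck:

```python
n, k = 87, 2
sst, sse = 175172.0, 73190.0   # sums of squares from the regression output
ssm = sst - sse

F = (ssm / k) / (sse / (n - k - 1))          # MSR / MSE

# Equivalent R-squared form of the same statistic:
r2 = ssm / sst
F_alt = (r2 / k) / ((1 - r2) / (n - k - 1))
```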
The omnibus F test

Null hypothesis: The predictors account for no Y variance in the population. The predictors have no predictive utility in the population.
Test statistic: F = MSR/MSE, the ratio of regression model variance to error variance: variance accounted for over variance unaccounted for.
Decision rule: If F > Fcrit, reject H0 in favor of Ha: one or more population slopes are nonzero; the model has some predictive utility; some βj ≠ 0, where j indexes one of the k predictor slopes.

Critical values of F (α = .05), numerator df = k (the number of predictors), denominator df = n − k − 1:

  df denom \ df num     1     2     3     4     5    10    20   120  1000
  25                 4.24  3.39  2.99  2.76  2.60  2.24  2.01  1.77  1.72
  50                 4.03  3.18  2.79  2.56  2.40  2.03  1.78  1.64  1.56
  100                3.94  3.09  2.70  2.46  2.31  1.93  1.68  1.38  1.30
  inf                3.84  3.00  2.60  2.37  2.21  1.83  1.57  1.22  1.00

Stata returns F critical values as: display invFtail(k, n-k-1, .05)

The probability of sampling an F ratio of 58.52 or larger under the null hypothesis of no predictive utility is very, very low, so we reject the null hypothesis and conclude that one or more predictors account for some variance in the population.
The omnibus F test vs. slope tests

Omnibus F test: Across all of my predictors, is regression helping at all in the population?
- H0: In the population, regression does not help at all.
- Ha: In the population, this set of predictors has some predictive utility.

t-tests for slopes: In the population, does this variable help prediction above and beyond the other included variables?
- H0: In the population, this predictor has no effect when accounting for the other predictors in the model.
- Ha: In the population, this predictor has an effect when accounting for the other predictors in the model.
Three equivalent tests in simple linear regression

Let's revisit a simple linear regression model analogous to one you're considering in Assignment #3: Math Achievement (TIMSS) on the log of income (per capita GDP) for countries.

Test 1: The omnibus F test evaluates whether any of the population slopes are nonzero. With only one slope…
Test 2: The t-test for slope evaluates whether a particular population slope is nonzero…
Test 3: The r-test for correlation (pwcorr command) evaluates whether a population correlation is nonzero.

Fun fact 1: When there is only one predictor (that is, the degrees of freedom in the numerator is 1), then F = t², here 4.97 ≈ (2.23)², and the p-values from the two tests will be identical (.0315).
Fun fact 2: When k = 1, R² = r².

With a single predictor, the significance of the omnibus prediction, the significance of a single slope, and the significance of an association are identical. With k > 1 predictors, this equivalence does not hold.
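The F = t² equivalence for a single predictor can be verified numerically on any simulated one-predictor dataset; the generating values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
mse = resid @ resid / (n - 2)                   # k = 1, so df = n - 2

# t statistic for the slope
se_b1 = np.sqrt(mse * np.linalg.inv(X.T @ X)[1, 1])
t = b[1] / se_b1

# Omnibus F from the ANOVA decomposition (numerator df = 1)
ssm = np.sum((X @ b - y.mean()) ** 2)
F = (ssm / 1) / mse
```

The equivalence is algebraic, not approximate: SSM = b1²·Sxx and Var(b1) = MSE/Sxx, so F and t² coincide to floating-point precision.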
Summarizing regression output from multiple models: estout

Find and install the estout package with the following command: findit estout. Scroll past the search results to the third listed package under the Web Resources heading, click the link, and then click the link to the right labeled "click here to install."

As always, you will begin by conducting a thorough review of the univariate and bivariate distributions and statistics, exploring various models and diagnostics, and considering remedial options. At a certain point, you may consider saving promising, helpful, illustrative, or otherwise benchmark-setting models for your review. This can be done with the estout package.
Important esttab options

Add a descriptive title and short but descriptive coeflabels. Descriptive model titles (mtitles) should be added for reference, and model numbers suppressed (nonumbers). With many models, you may need the compress option, although this will limit the length of your coeflabels and mtitles. APA guidelines require that figures and tables be self-contained, so a table note (addnote) can be useful. Finally, note our desired statistics: R-sq, adj-R-sq, F, and our degrees of freedom for the model and for the residual.
Other esttab features to note

If you store too many models with eststo, clear its memory with eststo clear. Then you'll have to use eststo: quietly regress … to store again from zero. Never forget the importance of help, e.g., help esttab.

When presenting tables, it is generally more common to include standard errors, not t statistics, in parentheses; to do this, use the se option. Likewise, for parsimony, the F statistic and degrees of freedom are not commonly reported, although they can be helpful for model selection. The R-sq is a necessity. The adjusted R-sq helps in model selection (you'll soon see why), but it is less commonly reported unless model selection is the focus.
Reporting results for the multiple regression model

Reporting statistical significance: The two-predictor model with average GRE score and the size of the doctoral cohort accounts for 58.2% of the variation in peer ratings. The omnibus null hypothesis of no predictive utility can be rejected, F(2, 84) = 58.52, p < .001. (The omnibus p-value is not shown in this table and must be gathered from the original regression output as Prob > F.)

The coefficient for the average GRE score is statistically significant, t(84) = 8.14, p < .001. The model suggests that an increment of 10 points in average GRE score is associated with an increment of 6.14 points on the peer rating scale, if the size of the doctoral cohort could be held constant.

The coefficient for the size of the doctoral cohort is statistically significant, t(84) = 5.51, p < .001. The model implies that an increment of 10 students in doctoral degree conferral predicts an increment of 5.40 points on the peer rating scale, accounting for the average GRE score of the students.
Beginning to look across the models

Each predictor's estimated coefficient decreases in magnitude when the other predictor is included in the model. However, the signs are unchanged, the decline does not appear to be substantively significant, and the statistical significance of the two predictors remains unchanged. Our inferences about the prediction of peer ratings are reasonably robust across the model forms and compositions shown in this table. Given that our interpretations are robust to model choice, and the two-predictor model is substantively defensible, we opt for the model that supports the best predictions.

The average GRE score alone accounts for 43.1% of the variance in peer ratings, and the size of the graduating doctoral cohort alone accounts for 25.3% of the variance in peer ratings. Together, these two variables account for 58.2% of the variance in peer ratings. Note that 43.1% + 25.3% = 68.4% of the variance, yet only 58.2% of the variance is accounted for by both variables. Why the difference? The percentages will sum tidily if and only if the predictors have zero correlation with each other, a veritable impossibility with real data outside of a coding error.
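The non-additivity of R² across correlated predictors is easy to demonstrate on simulated data. The sketch below (hypothetical generating values) builds two positively correlated predictors and shows that the joint R² falls short of the sum of the marginal R²s:

```python
import numpy as np

def r_squared(X, y):
    """R-squared from an OLS fit of y on [1, X]."""
    X1 = np.column_stack([np.ones(len(y)), X])
    yhat = X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
    return np.sum((yhat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + 0.7 * rng.normal(size=n)   # x2 correlated with x1
y = x1 + x2 + rng.normal(size=n)

r2_x1 = r_squared(x1[:, None], y)
r2_x2 = r_squared(x2[:, None], y)
r2_both = r_squared(np.column_stack([x1, x2]), y)
# With correlated predictors, r2_both < r2_x1 + r2_x2, just as
# 58.2% < 43.1% + 25.3% in the deck; the overlap is counted twice
# in the sum of the marginal R-squareds.
```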
Diagnostic plots for multiple regression

Our regression assumptions haven't changed: independent, normally distributed residuals with equal variance centered on 0. Our plots remain the same as well: residuals vs. fitted values (rvfplot), leverage vs. discrepancy (lvr2plot), and Cook's distance, to name a few. We speak now in terms of discrepancy from, leverage on, and influence upon the estimated regression plane.
Midterm Check-in

Assignment #3 will be passed back over the next 24 hours. Assignment #4 has been posted; partners are mandatory. Notify me by Friday if you do not have a partner. No exceptions. Partnerships take work: open communication, constant upkeep, attendance. Come to class regularly. Resources: lectures, videos, sections, partnerships, study groups, my office hours. Course philosophies: disciplined perception, layering, language, collaboration, sensitivity, statistical vs. substantive, exploratory analysis.
What are the takeaways from this unit?

Multiple regression holds several advantages over single-predictor regression:
- Simultaneous consideration of multiple correlated predictors
- Improved prediction of outcomes
- Parsimony in model expression
- Support of inferences like, "the effect of X while accounting for Z"
- More control over models consistent with our theories and hypotheses

We also covered:
- Construction and conceptualization of 2D and 3D representations of two-predictor models in terms of best-fit planes
- ANOVA decomposition and R-sq for multiple regression
- The omnibus F test vs. t-tests for slopes
- The estout package and the tabular juxtaposition of models
- Regression diagnostics in multiple regression