R-Squared Example
Measure of the proportion of variance in Y explained by the IVs

FULL SAMPLE
                        Coef.   St.Err.       T       P
Bush FT                 -.165     .019     -8.72   0.000
Party Identification    7.354     .278     26.44   0.000
Constant                65.28     .962     67.89   0.000

10 RANDOM CASES
                        Coef.   St.Err.       T       P
Bush FT                 -.090     .489     -0.18   0.860
Party Identification    12.31     7.47      1.65   0.143
Constant                50.16    18.23      2.75   0.028
R² = .5336
Obama FT = 50.16 + (-.090)(Bush FT) + 12.31(Party Identification)
First, we need the total variation in Y (the sum of squared deviations from the mean).
Mean = 66, so:

Observed   (Observed - Mean)   (Observed - Mean)²
   50             -16                 256
   30             -36                1296
  100              34                1156
   50             -16                 256
   70               4                  16
   30             -36                1296
   85              19                 361
  100              34                1156
   85              19                 361
   60              -6                  36
Total sum of squares = 6190
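The total above can be checked with a few lines of Python. A minimal sketch, assuming the ten observed Obama FT values are typed in by hand in the order listed on this slide:

```python
# Observed Obama feeling-thermometer scores for the 10 random cases
observed = [50, 30, 100, 50, 70, 30, 85, 100, 85, 60]

mean_y = sum(observed) / len(observed)  # 66.0

# Total sum of squares: squared deviations from the mean, added up
deviations = [y - mean_y for y in observed]
total_ss = sum(d ** 2 for d in deviations)

print(mean_y, total_ss)  # 66.0 6190.0
```

Dividing this total by n - 1 = 9 would give the sample variance (about 687.8); the R² calculation uses the undivided total.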
Bush FT    PID    Predicted   Observed   (Observed - Predicted)   (Observed - Predicted)²
 25.00    0.00      47.92       50.00            2.08                     4.32
  0.00    2.00      74.78       30.00          -44.78                  2005.64
  0.00    3.00      87.10      100.00           12.90                   166.53
 40.00    0.00      46.58       50.00            3.42                    11.71
 30.00    2.00      72.10       70.00           -2.10                     4.39
 60.00   -1.00      32.48       30.00           -2.48                     6.13
  0.00    3.00      87.10       85.00           -2.10                     4.39
  0.00    3.00      87.10      100.00           12.90                   166.53
  1.00    1.00      62.38       85.00           22.62                   511.49
  0.00    1.00      62.47       60.00           -2.47                     6.12
SSR (Sum of Squared Residuals) = 2887.25
Total sum of squares of Y = 6190
$$R^2 = \frac{6190 - 2887.25}{6190} = .5336$$
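The same R² can be reproduced from the Predicted and Observed columns. A minimal sketch; it takes the printed two-decimal predicted values as given, so its SSR comes out slightly below 2887.25, presumably because the slide's residuals were computed from unrounded predictions. The total of 6190 is copied from the previous slide rather than recomputed.

```python
# Predicted and observed Obama FT for the 10 random cases (columns of the table)
predicted = [47.92, 74.78, 87.10, 46.58, 72.10, 32.48, 87.10, 87.10, 62.38, 62.47]
observed = [50, 30, 100, 50, 70, 30, 85, 100, 85, 60]

TOTAL_SS = 6190  # sum of squared deviations of Y from its mean (previous slide)

# Residual = observed - predicted; SSR adds up the squared residuals
residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
ssr = sum(r ** 2 for r in residuals)

# R-squared: share of the total variation in Y that the model accounts for
r_squared = (TOTAL_SS - ssr) / TOTAL_SS

print(round(ssr, 2), round(r_squared, 4))  # 2886.83 0.5336
```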
What is a good R²?
Predict feelings about Obama with:
– Party ID and feelings about Bush
– Education
– Zodiac sign
Non-continuous IVs
Dealing with Dichotomous and Nominal Variables
Democratic Peace
Is the sum of the two countries' democracy scores the right measure?
Alternative: are both countries in the pair democracies?
Indicator (dummy, dichotomous) variable:
– 1 if both countries have democracy scores > 5
– 0 otherwise
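Constructing the indicator is a one-line rule. A minimal sketch in Python; the function name `democratic_pair` and the example scores are hypothetical, and the threshold of 5 (both scores strictly greater than 5) is the one stated on this slide.

```python
def democratic_pair(score_a: float, score_b: float, threshold: float = 5) -> int:
    """Return 1 if both countries' democracy scores exceed the threshold, else 0."""
    return 1 if score_a > threshold and score_b > threshold else 0

# Hypothetical country pairs (democracy scores made up for illustration)
pairs = [(8, 9), (8, 3), (5, 10), (6, 6)]
print([democratic_pair(a, b) for a, b in pairs])  # [1, 0, 0, 1]
```

The pair (6, 6) has a lower sum (12) than (5, 10) (15) but is coded 1, while (5, 10) is coded 0: the summed measure and the indicator can rank pairs differently.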
Dichotomous IV
DV: Years at peace

                              Coef    SE Coef        T       P
Democratic Pair (1=yes)       5.18     0.362     14.31   0.000
Constant                     24.35     0.171    142.45   0.000
R-squared = 0.0057

                              Coef    SE Coef        T       P
Democratic Pair (1=yes)       4.74     0.369     12.84   0.000
Military Spending ($mil)     0.053     0.002     25.59   0.000
Constant                     22.21     0.204    108.98   0.000
R-squared = 0.0242
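Because the indicator only takes the values 0 and 1, its coefficient shifts the intercept: it is the predicted gap in years at peace between democratic pairs and all other pairs. A minimal sketch computing predictions from the coefficients above; the military spending value of $100 million is hypothetical.

```python
# Model 1: years at peace = 24.35 + 5.18 * democratic_pair
def predict_model1(democratic_pair: int) -> float:
    return 24.35 + 5.18 * democratic_pair

# Model 2: adds military spending (in $ millions)
def predict_model2(democratic_pair: int, military_spending: float) -> float:
    return 22.21 + 4.74 * democratic_pair + 0.053 * military_spending

print(round(predict_model1(0), 2), round(predict_model1(1), 2))  # 24.35 29.53

# Holding spending fixed (hypothetical $100 million), the gap is the dummy's coefficient
gap = predict_model2(1, 100) - predict_model2(0, 100)
print(round(gap, 2))  # 4.74
```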
Nominal variables
Speed dating survey: "You have 100 points to distribute among the following attributes: give more points to those attributes that are more important in a potential date, and fewer points to those attributes that are less important in a potential date."
– Attractive
– Fun
– Intelligent
– Sincere
– Ambitious
– Shared Interests
How do people's perspectives/goals affect what's important to them?
What is your primary goal in participating in this event?
– Seemed like a fun night out = 1
– To meet new people = 2
– To get a date = 3
– Looking for a serious relationship = 4
– To say I did it = 5
Does this make sense as a linear scale?
Who is likely to say each of the following is important? Attractiveness? Fun?
– Seemed like a fun night out = 1
– To meet new people = 2
– To get a date = 3
– Looking for a serious relationship = 4
– To say I did it = 5
Does this make sense as a linear scale?
Effects of Nominal Variable
One variable:
– Seemed like a fun night out = 1
– To meet new people = 2
– To get a date = 3
– Looking for a serious relationship = 4
– To say I did it = 5
Five variables:
– Seemed like a fun night out (1=yes)
– To meet new people (1=yes)
– To get a date (1=yes)
– Looking for a serious relationship (1=yes)
– To say I did it (1=yes)
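Turning the single 1–5 code into five 0/1 variables can be sketched in plain Python. A minimal sketch; the short labels mirror the variable names used in the regressions that follow, and `to_indicators` is a hypothetical helper, not a Minitab command.

```python
# Codes for "What is your primary goal in participating in this event?"
GOALS = {
    1: "Seemed Fun",
    2: "Meet People",
    3: "Date",
    4: "Serious Relationship",
    5: "Say Did",
}

def to_indicators(code: int) -> dict:
    """Map one 1-5 response to five indicator variables (1=yes, 0=no)."""
    if code not in GOALS:
        raise ValueError(f"unknown code: {code}")
    return {label: int(c == code) for c, label in GOALS.items()}

print(to_indicators(3))
# {'Seemed Fun': 0, 'Meet People': 0, 'Date': 1, 'Serious Relationship': 0, 'Say Did': 0}
```

Exactly one indicator equals 1 for each respondent, so the five variables impose no ordering on the categories.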
$$\text{Importance of Attribute} = \beta_0 + \beta_1(\text{Seemed Fun}) + \beta_2(\text{Meet People}) + \beta_3(\text{Date}) + \beta_4(\text{Serious Relationship}) + \beta_5(\text{Say Did}) + u$$
What would $\beta_0$ correspond to in this model?
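One way to see the problem with this model: for every respondent the five indicators add up to 1, which is exactly the value of the constant term that β₀ multiplies. A minimal sketch in plain Python with a few hypothetical responses:

```python
# Hypothetical responses coded 1-5 (one per respondent)
responses = [1, 3, 4, 2, 5, 3]

rows = []
for code in responses:
    indicators = [int(code == c) for c in range(1, 6)]  # five 0/1 columns
    rows.append([1] + indicators)  # leading 1 = column for the constant (beta_0)

# The constant column equals the sum of the five indicator columns in every row,
# so the design matrix has a perfect linear dependence and beta_0 is not identified.
collinear = all(row[0] == sum(row[1:]) for row in rows)
print(collinear)  # True
```

Dropping any one indicator removes the dependence; in a model containing only these indicators, β₀ then estimates the omitted (reference) group's value.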
Reference Group
Leave one indicator out (here, Say Did):
$$\text{Importance of Attribute} = \beta_0 + \beta_1(\text{Seemed Fun}) + \beta_2(\text{Meet People}) + \beta_3(\text{Date}) + \beta_4(\text{Serious Relationship}) + u$$
(Remember: the reference group is "to say I did it")
Attractiveness           Coef.   SE Coef.       T       P
Seemed Fun              -4.011     0.883     -4.54   0.000
Meet People             -3.843     0.891     -4.31   0.000
Date                    -3.186     1.033     -3.09   0.002
Serious Relationship    -6.320     1.084     -5.83   0.000
Constant                22.566     0.846     26.68   0.000
What if we want to know whether people who want a date and those who want a serious relationship differ in how important they think attractiveness is?
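Since this model contains only the reason-for-attending indicators, each group's predicted value is the constant plus that group's coefficient, and the Date versus Serious Relationship gap is the difference of the two coefficients. A minimal sketch of that arithmetic; it gives the point estimate only, because the standard error of the difference would also need the covariance of the two coefficients, which the table does not show.

```python
# Attractiveness model, reference group = "to say I did it"
constant = 22.566
coefs = {
    "Seemed Fun": -4.011,
    "Meet People": -3.843,
    "Date": -3.186,
    "Serious Relationship": -6.320,
    "Say Did": 0.0,  # reference group: no coefficient of its own
}

# Predicted points for each group = constant + group coefficient
group_pred = {g: constant + b for g, b in coefs.items()}

# Re-express every group relative to "Serious Relationship"
new_ref = group_pred["Serious Relationship"]
relative = {g: round(v - new_ref, 3) for g, v in group_pred.items()}

print(round(new_ref, 3))  # 16.246
print(relative["Date"])   # 3.134
```

These values match the constant and coefficients obtained when the model is re-estimated with Serious Relationship as the reference category on the next slide.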
Easiest way: change the reference category (here, to Serious Relationship)
$$\text{Importance of Attribute} = \beta_0 + \beta_1(\text{Seemed Fun}) + \beta_2(\text{Meet People}) + \beta_3(\text{Date}) + \beta_5(\text{Say Did}) + u$$
Attractiveness           Coef.   SE Coef.       T       P
Seemed Fun               2.309     0.723      3.19   0.001
Meet People              2.477     0.733      3.38   0.001
Date                     3.134     0.900      3.48   0.001
Say I Did                6.320     1.084      5.83   0.000
Constant                16.246     0.678     23.95   0.000
Do people who want a date and those who want a serious relationship differ in how important they think attractiveness is?
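The answer comes from the Date row: the t statistic is the coefficient divided by its standard error. A minimal sketch of that arithmetic; the p-value uses a normal approximation to the t distribution, an assumption that is reasonable only because this survey has roughly 2,500 respondents (n = 2484 in the F-test example later), so it will not match Minitab's exact t-based value digit for digit.

```python
import math

coef, se = 3.134, 0.900  # Date vs. Serious Relationship (reference group)

t = coef / se
# Two-sided p-value under a standard normal approximation: P(|Z| >= |t|)
p = math.erfc(abs(t) / math.sqrt(2))

print(round(t, 2), p < 0.05)  # 3.48 True
```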
Nominal and Dichotomous IVs
Attractiveness           Coef.   SE Coef.       T       P
Seemed Fun               1.852     0.696      2.66   0.008
Meet People              2.516     0.705      3.57   0.000
Date                     2.998     0.865      3.46   0.001
Say I Did                6.303     1.042      6.05   0.000
Gender (1=male)          4.689     0.326     14.38   0.000
Constant                14.084     0.669     21.06   0.000
Estimated points allocated to attractiveness for men who attended because it seemed fun?
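For a man who attended because it seemed fun, Gender = 1 and Seemed Fun = 1 while the other reason indicators are 0, so the estimate is the constant plus those two coefficients. A minimal sketch of that calculation; `predict` is a hypothetical helper.

```python
# Attractiveness model with gender; reference group = "Serious Relationship"
constant = 14.084
coefs = {"Seemed Fun": 1.852, "Meet People": 2.516, "Date": 2.998,
         "Say I Did": 6.303, "Gender (1=male)": 4.689}

def predict(values: dict) -> float:
    """Constant plus each coefficient times its 0/1 value (missing keys count as 0)."""
    return constant + sum(b * values.get(name, 0) for name, b in coefs.items())

man_seemed_fun = {"Seemed Fun": 1, "Gender (1=male)": 1}
print(round(predict(man_seemed_fun), 3))  # 20.625
```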
F-Tests
Testing the joint significance of variables
F-test
A way of testing the joint significance of variables, i.e., whether a set of variables significantly improves explanatory power
When to use:
– Nominal variables
– Variables that are likely to be highly correlated but are important predictors
Terminology
– Unrestricted model: includes the IVs whose joint significance you want to test
– Restricted model: the same model, excluding the IVs being tested
– SSR: Sum of Squared Residuals
Formula
q = number of variables being tested
n = number of cases
k = number of IVs in the unrestricted model
$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-(k+1))}$$
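The formula translates directly into a small function. A minimal sketch; the name `f_statistic` and its argument names are hypothetical and simply mirror the symbols defined on this slide.

```python
def f_statistic(ssr_r: float, ssr_ur: float, q: int, n: int, k: int) -> float:
    """F statistic for the joint significance of q variables.

    ssr_r  -- sum of squared residuals, restricted model
    ssr_ur -- sum of squared residuals, unrestricted model
    q      -- number of variables being tested
    n      -- number of cases
    k      -- number of IVs in the unrestricted model
    """
    numerator = (ssr_r - ssr_ur) / q       # drop in SSR per tested variable
    denominator = ssr_ur / (n - (k + 1))   # SSR of the unrestricted model per residual df
    return numerator / denominator
```

With the figures from the "Who values fun people?" example, `f_statistic(41429.133, 40819.896, q=4, n=2484, k=5)` returns about 9.25.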
Who values fun people?
Fun                      Coef.   SE Coef.       T       P
Seemed Fun               0.537     0.349      1.54   0.124
Meet People             -0.058     0.354     -0.17   0.869
Date                    -1.235     0.434     -2.84   0.004
Say I Did               -0.271     0.523     -0.52   0.605
Gender (1=male)          0.254     0.164      1.55   0.121
Constant                17.139     0.336     51.06   0.000
What if we want to know whether the reason-for-attending variables, as a group, improve the explanatory power of the model?
q = number of variables being tested
n = number of cases
k = number of IVs in the unrestricted model
$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-(k+1))}$$

UNRESTRICTED    Sum of Squares      df          MS
Model                672.078         5    134.4156
Residual           40819.896      2478    16.47292
Total              41491.974      2483    16.71042

RESTRICTED      Sum of Squares      df          MS
Model                 62.841         1    62.84063
Residual           41429.133      2482    16.69183
Total              41491.974      2483    16.71042

$$F = \frac{(41429.133 - 40819.896)/4}{40819.896/(2484-(5+1))} = 9.25$$
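The same number can be read off the two ANOVA tables without counting variables by hand: q is the difference in residual degrees of freedom, and the denominator is the residual MS of the unrestricted model. A minimal sketch using the residual rows printed above:

```python
# Residual rows of the two ANOVA tables
ss_resid_r, df_resid_r = 41429.133, 2482    # restricted (gender only)
ss_resid_ur, df_resid_ur = 40819.896, 2478  # unrestricted (gender + 4 reason indicators)
ms_resid_ur = ss_resid_ur / df_resid_ur     # 16.47292 in the table

q = df_resid_r - df_resid_ur                # 4 variables dropped in the restricted model
f = ((ss_resid_r - ss_resid_ur) / q) / ms_resid_ur

print(q, round(f, 2))  # 4 9.25
```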
Statistical significance of F-test
What does an F value of 9.25 mean?
Similar idea to a t-test, but the shape of the F-distribution depends (heavily) on its degrees of freedom:
– Numerator = number of IVs being tested (q)
– Denominator = n - k - 1, where k is the number of IVs in the unrestricted model
– Here: 4 and 2478 (2484 - 5 - 1)
Look up the critical value in a table, or use Minitab: Calc > Probability Distributions > F
Note: this gives you the area under the curve up to your F value, P(X ≤ x), so the p-value is 1 minus that number.

Cumulative Distribution Function
F distribution with 4 DF in numerator and 2478 DF in denominator
x       P( X <= x )
9.25    1.00000
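Without Minitab, the same tail probability can be computed in a few lines. A minimal sketch using only the Python standard library; it relies on a closed form of the F survival function that holds only when the numerator df is even (as here, 4), so it is not a general replacement for a statistics library routine such as `scipy.stats.f.sf(9.25, 4, 2478)`. The function name is hypothetical.

```python
def f_survival_even_df1(f: float, df1: int, df2: float) -> float:
    """P(X > f) for X ~ F(df1, df2), valid only when df1 is even.

    Uses W = df1*f / (df1*f + df2) ~ Beta(df1/2, df2/2) and the finite-sum
    form of the regularized incomplete beta function for integer m = df1/2:
        P(X > f) = (1 - w)**b * sum_{j=0}^{m-1} [Gamma(b+j) / (Gamma(b) j!)] * w**j
    with b = df2/2.
    """
    if df1 <= 0 or df1 % 2 != 0:
        raise ValueError("this closed form needs a positive, even numerator df")
    w = df1 * f / (df1 * f + df2)
    b = df2 / 2
    term, total = 1.0, 1.0
    for j in range(1, df1 // 2):
        term *= (b + j - 1) / j * w  # ratio of successive terms in the sum
        total += term
    return (1 - w) ** b * total

p = f_survival_even_df1(9.25, 4, 2478)
print(f"P(X <= 9.25) = {1 - p:.5f}, p-value = {p:.1e}")
# P(X <= 9.25) = 1.00000, p-value = 2.0e-07
```

The cumulative probability rounds to 1.00000, as in the Minitab output, and the p-value is far below .05, so the reason-for-attending variables are jointly significant.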
Notes and Next Time
– Graded homework will be handed back next time, and model answers will be posted online early next week
– New homework will be handed out next time (due next Thursday)
– Next time: functional form in multivariate regression