1 Introduction to Statistics: Political Science (Class 5): Non-Linear Relationships
2 Thus far
Focus on examining and controlling for linear relationships: each one-unit increase in an IV is associated with the same expected change in the DV.
Ordinary least squares (OLS) regression can only estimate relationships that are linear in the variables as we enter them.
But we can “trick” regression into estimating non-linear relationships by transforming our independent (and/or dependent) variables.
3 When to transform an IV
- Theoretical expectation
- Look at the data (sometimes tricky in multivariate analysis or when you have thousands of cases)
Today: three types of transformations
- Logarithm
- Squared terms
- Converting to indicator variables
4 Logarithm
The power to which a base must be raised to produce a given value.
We'll focus on natural logarithms, where $\ln(x)$ is the power to which $e$ (≈ 2.718) must be raised to get $x$.
Example: $\ln(4) = 1.386$, because $e^{1.386} \approx 4$.
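To make the definition concrete, here is a minimal check with Python's standard `math` module (one-argument `math.log` is the natural log). It only verifies the slide's arithmetic; the variable names are illustrative.

```python
import math

# ln(x) is the power to which e must be raised to get x.
ln4 = math.log(4)                # one-argument math.log is the natural log
back = math.exp(round(ln4, 3))   # e raised to the rounded value 1.386

print(f"e       = {math.e:.5f}")
print(f"ln(4)   = {ln4:.3f}")
print(f"e^1.386 = {back:.3f}")   # close to 4; not exact because 1.386 is rounded
```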
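5 Moving from 1 to 5 in the original measure is a 1.609 change in the logged value ($\ln 5 - \ln 1 = 1.609$).
So the effect of a one-unit change in x depends on whether the change is from 1 to 2 ($\ln 2 - \ln 1 = 0.693$) or from 2 to 3 ($\ln 3 - \ln 2 = 0.405$).
$$Y = \beta_0 + \beta_1 \ln(x) + u$$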
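A short sketch of what the logged model implies: equal one-unit steps in x produce shrinking steps in ln(x), so the expected change in Y shrinks too. The slope `HYPOTHETICAL_B1 = 2.0` is made up purely for illustration; it is not from any model in these slides.

```python
import math

# Equal one-unit steps in x give shrinking steps in ln(x).
for lo, hi in [(1, 2), (2, 3), (3, 4), (1, 5)]:
    print(f"x: {lo} -> {hi}   change in ln(x): {math.log(hi) - math.log(lo):.3f}")

# In Y = b0 + b1*ln(x) + u, the expected change in Y when x moves from
# lo to hi is b1 * (ln(hi) - ln(lo)). HYPOTHETICAL_B1 is a made-up slope.
HYPOTHETICAL_B1 = 2.0

def expected_change_in_y(lo, hi, b1=HYPOTHETICAL_B1):
    return b1 * (math.log(hi) - math.log(lo))

print(expected_change_in_y(1, 2), expected_change_in_y(2, 3))
```

One consequence worth noticing: every doubling of x (1 to 2, 2 to 4, 50 to 100) shifts the expected Y by the same amount, $\beta_1 \times 0.693$.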
6 When to log an IV
- “Diminishing returns” as X gets large
- The data are skewed (e.g., income)
7 Income and home value
Example household: $60,000/year income, $200,000 home.
Bill Gates makes about $175 million/year: $175,000,000 ≈ 2,917 × $60,000.
Should we expect him to have a home worth 2,917 × $200,000 ($583,400,000)?
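The arithmetic on this slide, plus the same income gap measured on the log scale. The point of the sketch: a 2,917-fold income difference is only about 8 units apart once income is logged, which is why a logged IV tames extreme values like this one. The figures are the slide's; the variable names are illustrative.

```python
import math

typical_income = 60_000
typical_home = 200_000
gates_income = 175_000_000

ratio = gates_income / typical_income              # about 2,916.7, i.e. ~2,917x
proportional_home = round(ratio) * typical_home    # what a strictly proportional relationship implies
log_gap = math.log(gates_income) - math.log(typical_income)  # equals ln(ratio)

print(f"income ratio:                     {ratio:,.1f}")
print(f"home if value scaled with income: ${proportional_home:,}")
print(f"income gap on the log scale:      {log_gap:.2f}")
```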
18 Quadratic (squared) models
- Curved, like the logarithm
- Key difference: quadratics allow for a “U-shaped” relationship
- Enter the original variable and its squared term: $Y = \beta_0 + \beta_1 x + \beta_2 x^2 + u$
- Allows for a direct test (the t-test on $\beta_2$) of whether allowing the line to curve significantly improves the predictive power of the model
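A minimal sketch of "enter the original variable and the squared term", using only the standard library. The data are simulated (a made-up U-shape, not the slides' survey data), and `ols` is a hypothetical helper that solves the normal equations by Gauss-Jordan elimination. It returns coefficients only; the t-test on $\beta_2$ would also need standard errors, which this sketch omits.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Simulated (made-up) data with a true U-shape: y = 2 - 0.08x + 0.0008x^2 + noise
rng = random.Random(0)
x = [rng.uniform(18, 80) for _ in range(500)]
y = [2 - 0.08 * xi + 0.0008 * xi ** 2 + rng.gauss(0, 0.1) for xi in x]

# Enter the original variable AND its square as separate regressors.
b0, b1, b2 = ols([[1.0, xi, xi ** 2] for xi in x], y)

# With b2 > 0 the fitted curve is U-shaped; its minimum is where b1 + 2*b2*x = 0.
turning_point = -b1 / (2 * b2)
print(f"b0={b0:.3f}  b1={b1:.4f}  b2={b2:.5f}  minimum near x={turning_point:.1f}")
```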
20 Age and Political Ideology

Linear model:
                 Coef.      SE        T        P
Age             -0.007    0.004   -1.740    0.082
Constant         0.122    0.209    0.580    0.561

What would we conclude from this analysis?

Quadratic model:
                 Coef.      SE        T        P
Age             -0.065    0.025   -2.630    0.009
Age-squared      0.001    0.000    2.390    0.017
Constant         1.554    0.635    2.450    0.015
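21 Age and Political Ideology
Predicted values from the quadratic model on slide 20: Constant + (Age coef. × Age) + (Age² coef. × Age²). The product columns were computed with unrounded coefficients (e.g., 0.181 / 324 ≈ 0.00056), so the Age² column is not simply 0.001 × Age².

Age    Age²    Age coef. × Age    Age² coef. × Age²    Constant    Predicted value
18      324         -1.178              0.181            1.554          0.557
28      784         -1.832              0.437            1.554          0.159
38     1444         -2.487              0.805            1.554         -0.128
48     2304         -3.141              1.284            1.554         -0.303
58     3364         -3.795              1.875            1.554         -0.366
68     4624         -4.450              2.577            1.554         -0.319
78     6084         -5.104              3.391            1.554         -0.159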
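The table can be reproduced in a few lines. One caveat, stated loudly: the slope values below are approximations back-solved from the slide's product columns (for example −1.178 / 18 and 0.181 / 324), because the printed coefficients are rounded to three decimals. They are not the estimates reported by the original software. The sketch also shows why rounding matters: the turning point computed from the printed 0.001 lands far from where the table bottoms out.

```python
# Constant from the slide; the two slopes are APPROXIMATIONS back-solved
# from the slide's product columns (e.g. -1.178 / 18 and 0.181 / 324),
# because the printed -0.065 and 0.001 are heavily rounded.
B0 = 1.554
B_AGE = -0.06544      # approximate; printed as -0.065
B_AGE_SQ = 0.0005573  # approximate; printed as 0.001

def predicted_ideology(age):
    return B0 + B_AGE * age + B_AGE_SQ * age ** 2

for age in range(18, 79, 10):
    print(age, round(predicted_ideology(age), 3))

# Minimum of the U-shape: where B_AGE + 2 * B_AGE_SQ * age = 0.
turning_point = -B_AGE / (2 * B_AGE_SQ)
# The same formula with the rounded, printed coefficients.
rounded_turning_point = 0.065 / (2 * 0.001)
print(f"bottom of the curve: about age {turning_point:.1f} "
      f"(rounded coefficients would say {rounded_turning_point:.1f})")
```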
23 Age and Political Ideology
Note: We are using two variables (Age and Age-squared) to measure the relationship between age and ideology.
Interpretation:
- There is a statistically significant relationship between age and ideology (we can confirm this with an F-test of Age and Age-squared jointly).
- The squared term significantly contributes to the predictive power of the model (P = 0.017).
24 If you add a linear and a squared term (e.g., age and age²) to a model and neither is independently statistically significant, this does not necessarily mean that age is not significantly related to the outcome. Why?
What we want to know is whether age and age² jointly improve the predictive power of the model. How can we test this?
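25 Check whether the value is above the critical value in the F-distribution
Formula:
$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - (k+1))}$$
- $SSR_r$ = sum of squared residuals from the restricted model (without the variables being tested); $SSR_{ur}$ = sum of squared residuals from the unrestricted model (with them)
- q = number of variables being tested
- n = number of cases
- k = number of IVs in the unrestricted model
The critical value depends on the degrees of freedom: numerator = number of IVs being tested (q); denominator = N - (number of IVs in the unrestricted model) - 1.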
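The formula translates directly into a function. The SSR values and sample size in the usage line are hypothetical, chosen only to show the mechanics. For the p-value or critical value, SciPy's `scipy.stats.f.sf(F, q, df_denominator)` and `scipy.stats.f.ppf(0.95, q, df_denominator)` are one option; they are left out here to keep the sketch dependency-free.

```python
def f_statistic(ssr_restricted, ssr_unrestricted, q, n, k):
    """F for testing q restrictions (e.g. dropping age and age-squared together).

    ssr_restricted:   sum of squared residuals WITHOUT the tested variables
    ssr_unrestricted: sum of squared residuals WITH them
    q: number of variables being tested (numerator degrees of freedom)
    n: number of cases
    k: number of IVs in the unrestricted model; denominator df = n - (k + 1)
    """
    df_denominator = n - (k + 1)
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_denominator
    return numerator / denominator

# Hypothetical numbers: dropping 2 variables raises the SSR from 500 to 520
# in a model with 5 IVs estimated on 400 cases.
F = f_statistic(520.0, 500.0, q=2, n=400, k=5)
print(f"F = {F:.2f} with (2, 394) degrees of freedom")
```

Here F = 7.88, comfortably above the 5% critical value for (2, 394) degrees of freedom, which is a little above 3.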
26 Don’t worry about the F-test formula. The point is:
F-tests are a way to test whether adding a set of variables reduces the sum of squared residuals enough to justify throwing these new variables into the model. This depends on:
- How much the sum of squared residuals is reduced
- How many variables we’re adding
- How many cases we have to work with
It is more “acceptable” to add variables if you have a lot of cases.
Intuition: explaining 10 cases with 10 variables vs. explaining 1,000 cases with 10 variables?
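The sample-size intuition can be seen numerically: hold the SSR reduction fixed and change only the number of cases. The SSR values (100 down to 90 after adding 10 variables) are hypothetical; only the formula from slide 25 is taken from the source.

```python
def f_statistic(ssr_r, ssr_ur, q, n, k):
    # Same F formula as on slide 25: ((SSR_r - SSR_ur)/q) / (SSR_ur/(n - (k + 1)))
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - (k + 1)))

# Hypothetical: adding q = 10 variables (k = 10 in the unrestricted model)
# cuts the SSR from 100 to 90 in both a small and a large sample.
small = f_statistic(100.0, 90.0, q=10, n=25, k=10)
large = f_statistic(100.0, 90.0, q=10, n=1000, k=10)
print(f"n = 25:   F = {small:.2f}")
print(f"n = 1000: F = {large:.2f}")
```

The same 10% reduction produces an F well below 1 with 25 cases but an F near 11 with 1,000 cases, because the denominator degrees of freedom grow with n.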
27 TVs and Infant Mortality
Squared term or logarithm?
                              Coef.       SE         T        P
TVs per capita                          29.949              0.000
TVs per capita (squared)     51.629      7.960
Constant                     90.197      3.353    26.900
28 Which is “better”?
Two basic ways to decide:
- Theory
- Which yields a better fit?
29 Run two models and compare R-squared… or possibly…
                              Coef.       SE         T        P
TVs per capita                          74.056    -0.410    0.683
TVs per capita (squared)     63.413     81.652     0.780    0.439
TVs per capita (logged)                  5.155    -4.780    0.000
Constant                     -9.465     20.417    -0.460    0.644
What might we conclude from these model estimates?
We should probably also do an F-test of the joint significance of TVs per capita and TVs per capita-squared. Why?
That F-test returned a significance level of … So we can conclude that…
Ultimately, you’re best off relying on theory about the shape of the relationship.
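A sketch of the "run two models and compare R-squared" approach on made-up data whose true shape is logarithmic (this is not the TV and infant-mortality data). `ols` and `r_squared` are hypothetical helpers written with the standard library only. Keep in mind that the quadratic model spends one extra parameter, and that a better R² on one sample is weaker evidence than a good theoretical reason for the shape.

```python
import math
import random

def ols(X, y):
    """OLS coefficients via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def r_squared(X, y):
    """1 - SSR/SST for an OLS fit of y on the columns of X."""
    b = ols(X, y)
    fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
    mean_y = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ssr / sst

# Made-up data with a truly logarithmic shape (not the slide's data).
rng = random.Random(1)
x = [rng.uniform(0.01, 0.8) for _ in range(300)]
y = [90 - 20 * math.log(xi) + rng.gauss(0, 5) for xi in x]

quad = r_squared([[1.0, xi, xi ** 2] for xi in x], y)
logged = r_squared([[1.0, math.log(xi)] for xi in x], y)
print(f"R-squared, linear + squared: {quad:.3f}")
print(f"R-squared, logged:           {logged:.3f}")
```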
30 Ordered IVs: Indicators
Sometimes we have reason to expect the relationship between an IV and the outcome to be more complex.
We can address this using more polynomial terms (e.g., variable³, variable⁴, etc.).
We won’t go there… instead…
Example: Party identification and evaluations of candidates and issues
31 Standard “branching” PID Items
Generally speaking, do you usually think of yourself as a Republican, a Democrat, an Independent, or something else?
- If Republican or Democrat, ask: Would you call yourself a strong (Republican/Democrat) or a not very strong (Republican/Democrat)?
- If Independent or something else, ask: Do you think of yourself as closer to the Republican or Democratic party?
32 Party Identification Measure
-3  Strong Republican
-2  Weak Republican
-1  Lean Republican
 0  Independent
 1  Lean Democrat
 2  Weak Democrat
 3  Strong Democrat
(Strong and weak partisans are the people who say Democrat or Republican in response to the first question.)
Question: Is the change from -2 to -1 (or 1 to 2) the same as the change from 0 to 1 or from 2 to 3?
33 Party Identification (-3 to 3): Create Indicators
Party Identification (-3 to 3) → seven variables:
- Strong Republican (1 = yes)
- Weak Republican (1 = yes)
- Lean Republican (1 = yes)
- Pure Independent (1 = yes)
- Lean Democrat (1 = yes)
- Weak Democrat (1 = yes)
- Strong Democrat (1 = yes)
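A minimal sketch of turning the 7-point scale into 0/1 indicators. Because the seven indicators always sum to 1, entering all of them alongside a constant is perfectly collinear, so one category must be left out as the reference; the slide 37 regression omits Pure Independent, and this sketch does the same by default. `PID_LABELS` and `pid_indicators` are illustrative names, not from any library.

```python
# 7-point party ID scale: -3 = Strong Republican ... 3 = Strong Democrat.
PID_LABELS = {
    -3: "Strong Republican",
    -2: "Weak Republican",
    -1: "Lean Republican",
    0: "Pure Independent",
    1: "Lean Democrat",
    2: "Weak Democrat",
    3: "Strong Democrat",
}

def pid_indicators(pid, reference="Pure Independent"):
    """Return 0/1 indicators for each category except the reference one."""
    return {label: int(pid == value)
            for value, label in PID_LABELS.items()
            if label != reference}

print(pid_indicators(-2))  # only "Weak Republican" is 1
print(pid_indicators(0))   # reference category: every indicator is 0
```

Each category now gets its own coefficient, so the model no longer assumes that moving from -2 to -1 matters as much as moving from 0 to 1.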
37 DV: Obama Favorability
                          Coef.      SE        T        P
Strong Republican        -1.652    0.161             0.000
Weak Republican          -0.704    0.197   -3.580
Lean Republican          -1.229    0.181   -6.790
Lean Democrat             0.654    0.195    3.340    0.001
Weak Democrat             0.457    0.187    2.440    0.015
Strong Democrat           0.579    0.158    3.650
Gender (female=1)         0.072    0.087    0.830    0.405
Age                      -0.041    0.019   -2.140    0.033
Age²                      0.044    0.018    2.430
Constant                  3.784    0.509    7.430
(Pure Independent is the omitted reference category.)
Predicted value for a Pure Independent male, age 20?
Remember: Always interpret these coefficients as the estimated relationships holding the other variables in the model constant (or controlling for the other variables).
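A sketch of how the predicted value is assembled: for a Pure Independent man, every party indicator and the gender indicator are 0, so only the constant and the age terms contribute. One loud assumption: the slide does not say whether Age² was rescaled before estimation, and the Age and Age² coefficients being of similar size hints that it may have been. The `age_sq_divisor` parameter is therefore hypothetical; the sketch shows the raw-square result and a divide-by-100 variant side by side rather than guessing.

```python
# Coefficients from the slide 37 table (DV: Obama favorability).
COEF = {
    "Constant": 3.784,
    "Strong Republican": -1.652, "Weak Republican": -0.704,
    "Lean Republican": -1.229, "Lean Democrat": 0.654,
    "Weak Democrat": 0.457, "Strong Democrat": 0.579,
    "Gender (female=1)": 0.072, "Age": -0.041, "Age-squared": 0.044,
}

def predict(party, female, age, age_sq_divisor=1.0):
    """Predicted favorability. party=None means Pure Independent (the
    omitted category, so no party coefficient is added).
    age_sq_divisor is a HYPOTHETICAL knob: the slide does not say how
    Age-squared was scaled, so the caller must choose."""
    value = COEF["Constant"]
    if party is not None:
        value += COEF[party]
    value += COEF["Gender (female=1)"] * female
    value += COEF["Age"] * age
    value += COEF["Age-squared"] * age ** 2 / age_sq_divisor
    return value

print(predict(None, 0, 20))                      # Age-squared entered as raw age^2
print(predict(None, 0, 20, age_sq_divisor=100))  # if Age-squared were age^2 / 100
```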
38 Notes and Next Time
- Homework due next Thursday (11/18)
- Next homework handed out next Tuesday; not due until the Tuesday after Fall Break
Next time: Dealing with situations where you expect the relationship between an IV and a DV to depend on the value of another IV