Sociology 601 Class 25: November 24, 2009 Homework 9 Review –dummy variable example from ASR (finish) –regression results for dummy variables Quadratic.

1 Sociology 601 Class 25: November 24, 2009

Homework 9
Review
  – dummy variable example from ASR (finish)
  – regression results for dummy variables
Quadratic effects
  – example: earnings and age
  – plotting
F-tests comparing models
Example from Sociology of Religion

2 Review: Regression with Dummy Variables

Create dummy variables for age: why? Age is an interval variable, so what advantage is there to creating a series of dummies?

gen byte age25=0 if age<.    /* new variable, age25, will be missing if age is missing */
replace age25=1 if age>=25 & age<=29
gen byte age30=0 if age<.
replace age30=1 if age>=30 & age<=34
gen byte age35=0 if age<.
replace age35=1 if age>=35 & age<=39
gen byte age40=0 if age<.
replace age40=1 if age>=40 & age<=44
gen byte age45=0 if age<.
replace age45=1 if age>=45 & age<=49
gen byte age50=0 if age<.
replace age50=1 if age>=50 & age<=55

* check age dummies (agecheck should =1 for all cases)
egen byte agecheck=rowtotal(age25-age50)
tab agecheck, missing

3 Stata Shortcut for Dummy Variables

* restrict the sample to ages 25-54 first, so that tab creates exactly six dummies (age1-age6)
drop if age<25 | age>54

gen byte agecat= floor(age/5)*5
tab agecat, gen(age)
* floor function deletes decimal places:
* e.g., at age 23: floor(23/5)*5 = floor(4.6)*5 = 4*5 = 20

* check age dummies (agecheck should =1 for all cases)
egen byte agecheck=rowtotal(age1-age6)
tab agecheck, missing
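The banding logic behind the floor shortcut can also be sketched outside Stata. The Python fragment below is only an illustration: the function names `age_band` and `band_dummies` and the list of ages are made up for the example and are not part of the course materials.

```python
# Illustrative sketch: mirrors floor(age/5)*5 and the "exactly one dummy = 1" check.
def age_band(age):
    """Lower bound of the 5-year band containing age, e.g. 23 -> 20."""
    return (age // 5) * 5

def band_dummies(age, bands=(25, 30, 35, 40, 45, 50)):
    """One 0/1 indicator per band; exactly one is 1 for ages 25-54."""
    return [1 if age_band(age) == b else 0 for b in bands]

ages = [25, 29, 30, 41, 54]  # made-up example ages within the 25-54 range
for a in ages:
    d = band_dummies(a)
    assert sum(d) == 1       # same idea as tabulating agecheck
    print(a, age_band(a), d)
```

An age outside 25-54 would get all zeros here, which is the situation the agecheck tabulation is meant to catch.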

4 Regression with Age Dummy Variables

. regress conrinc age2-age6 if sex==1

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  5,   719) =   12.79
       Model |  3.8044e+10     5  7.6089e+09           Prob > F      =  0.0000
    Residual |  4.2773e+11   719   594895739           R-squared     =  0.0817
-------------+------------------------------           Adj R-squared =  0.0753
       Total |  4.6577e+11   724   643334846           Root MSE      =   24390

-------------------------------------------------------------------------------
     conrinc |      Coef.  Std. Err.        t   P>|t|      [95% Conf. Interval]
-------------+-----------------------------------------------------------------
        age2 |   8220.236   3143.413     2.62   0.009     2048.872      14391.6
        age3 |    16495.6   3122.571     5.28   0.000     10365.16     22626.05
        age4 |    17274.8    3112.55     5.55   0.000     11164.03     23385.57
        age5 |   21532.53   3288.812     6.55   0.000      15075.7     27989.35
        age6 |   20013.57   3406.607     5.87   0.000     13325.48     26701.66
       _cons |    26954.2   2325.541    11.59   0.000     22388.54     31519.86
-------------------------------------------------------------------------------

Same R-squared and overall F, but different b's and t's (although same relative order):

. regress conrinc age1-age5 if sex==1

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  5,   719) =   12.79
       Model |  3.8044e+10     5  7.6089e+09           Prob > F      =  0.0000
    Residual |  4.2773e+11   719   594895739           R-squared     =  0.0817
-------------+------------------------------           Adj R-squared =  0.0753
       Total |  4.6577e+11   724   643334846           Root MSE      =   24390

-------------------------------------------------------------------------------
     conrinc |      Coef.  Std. Err.        t   P>|t|      [95% Conf. Interval]
-------------+-----------------------------------------------------------------
        age1 |  -20013.57   3406.607    -5.87   0.000    -26701.66    -13325.48
        age2 |  -11793.33   3266.455    -3.61   0.000    -18206.26    -5380.405
        age3 |  -3517.968   3246.403    -1.08   0.279    -9891.531     2855.595
        age4 |  -2738.771   3236.766    -0.85   0.398    -9093.413     3615.872
        age5 |   1518.956   3406.607     0.45   0.656     -5169.13     8207.043
       _cons |   46967.77   2489.343    18.87   0.000     42080.52     51855.02
-------------------------------------------------------------------------------
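Why the two runs agree on fit but not on coefficients can be checked with simple arithmetic: changing the omitted category only shifts every dummy coefficient by the coefficient of the new reference group, and moves the constant the other way. The sketch below uses the coefficients printed above; the helper name `rebase` is hypothetical and introduced only for this illustration.

```python
# Coefficients from the first run (omitted category: age1, i.e. ages 25-29).
b_ref_age1 = {"age1": 0.0, "age2": 8220.236, "age3": 16495.6,
              "age4": 17274.8, "age5": 21532.53, "age6": 20013.57}
cons_ref_age1 = 26954.2

def rebase(coefs, cons, new_ref):
    """Re-express dummy coefficients relative to a different omitted category."""
    shift = coefs[new_ref]
    new_coefs = {name: round(b - shift, 3) for name, b in coefs.items()}
    return new_coefs, round(cons + shift, 3)

# Make age6 (ages 50-54) the omitted category, as in the second run.
b_ref_age6, cons_ref_age6 = rebase(b_ref_age1, cons_ref_age1, "age6")
print(b_ref_age6)
print(cons_ref_age6)
```

Up to rounding in the printed output, the rebased values match the second regression, which is why R-squared and the overall F are unchanged.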

5 Plot Earnings by Age

. tab age, sum(conrinc)

            |   Summary of respondent income in
     age of |          constant dollars
 respondent |        Mean   Std. Dev.       Freq.
------------+------------------------------------
         25 |   16277.936   10757.323          47
         26 |     22712.5   12540.689          46
         27 |   21188.725   11802.539          40
         28 |   25593.444    18395.24          54
         29 |   27021.244   17314.169          45
         30 |   29687.902   16242.466          61
         31 |   30723.709   21631.857          55
         32 |   30218.871   19739.067          62
         33 |   26096.263   15751.154          57
         34 |    30685.51       20528          51
         35 |   37709.106   26704.259          47
         36 |   29178.255   21877.287          51
         37 |   33702.843    20378.26          70
         38 |   39046.871   30994.531          62
         39 |   40338.326   29449.024          43
         40 |   35442.909   23448.711          55
         41 |   38218.979   31804.641          48
         42 |   34377.678   26582.113          59
         43 |   37867.069   25189.647          58
         44 |   34885.268    23017.34          41
         45 |   35212.378   20559.449          45
         46 |   41641.308   28233.297          39
         47 |    39708.14   29503.584          50
         48 |   41391.807   26493.252          57
         49 |   38324.964   23601.741          55
         50 |   42443.892   29193.688          37
         51 |   37255.357   25395.935          42
         52 |   35165.655   20471.181          29
         53 |   44005.892   30812.439          37
         54 |   36918.065   26556.129          31
------------+------------------------------------
      Total |   33571.775   24047.119        1474

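6 Regression Test for Curvilinearity

Test whether x has a curvilinear relationship with y. Testing for a quadratic relationship is the most common, but not the only, method of testing for curvilinearity.

$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$

Test whether $\beta_2 \neq 0$:
  o if $\beta_2 > 0$, then U-shaped curve (or part of one)
  o if $\beta_2 < 0$, then inverted-U curve (or part of one)
  o if $\beta_2$ is not significantly different from 0 (neither > 0 nor < 0), then revert to the linear equation by dropping $x^2$

$\beta_1$ is rather irrelevant in this test:
  o if neither $\beta_2$ nor $\beta_1$ is statistically significant (both p > .05) in the quadratic model, that does not mean there is no linear relationship; x and $x^2$ are highly correlated, so $\beta_1$ has to be re-tested in the linear model after $x^2$ is dropped.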

7 Curvilinear Regression Equation: $\beta_2$

$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$

$\beta_2$ (the quadratic coefficient) determines how steeply the curve accelerates:

$y = 2x^2$;  $y = x^2$;  $y = 0.5x^2$

8 Curvilinear Regression Equation: $\beta_2 < 0$

$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$

If $\beta_2$ (the quadratic coefficient) < 0, then the curve is an inverted U:

$y = -2x^2$;  $y = -x^2$;  $y = -0.5x^2$

9 Curvilinear Regression Equation: Turning Point = Maximum | Minimum

$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$

Turning point (sometimes loosely called the "inflexion point") = value of x when y is a maximum or minimum = $-\beta_1 / (2\beta_2)$

$y = -20x^2 + 800x + 62000$
  turning point = -800 / (-20 * 2) = 20 (i.e., below observed x values)

$y = -100x^2 + 8000x - 90000$
  turning point = -8000 / (-100 * 2) = 40 (i.e., within the x range)

$y = -20x^2 + 2400x + 800$
  turning point = -2400 / (-20 * 2) = 60 (i.e., above observed values)
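The formula for the x value at which y reaches its maximum or minimum follows from setting the slope of the quadratic to zero. A short derivation, using the same symbols as the equation on this slide:

```latex
\begin{align*}
\hat{y} &= \beta_0 + \beta_1 x + \beta_2 x^2 \\
\frac{d\hat{y}}{dx} &= \beta_1 + 2\beta_2 x = 0
  \quad\Longrightarrow\quad x^{*} = -\frac{\beta_1}{2\beta_2} \\
\frac{d^{2}\hat{y}}{dx^{2}} &= 2\beta_2
\end{align*}
```

The sign of the second derivative gives the shape: when $\beta_2 < 0$ the point $x^{*}$ is a maximum (inverted U), and when $\beta_2 > 0$ it is a minimum (U shape).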

10 Curvilinear Regression Equation: Turning Point = Maximum | Minimum

$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$

For completeness, when $\beta_2$ is positive:

Turning point = value of x when y is a maximum or minimum = $-\beta_1 / (2\beta_2)$

$y = 20x^2 - 800x + 50000$
  turning point = -(-800) / (20 * 2) = 20 (i.e., below observed x values)

$y = 100x^2 - 8000x + 205000$
  turning point = -(-8000) / (100 * 2) = 40 (i.e., within the x range)

$y = 20x^2 - 2400x + 114000$
  turning point = -(-2400) / (20 * 2) = 60 (i.e., above observed values)
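The same arithmetic can be wrapped in a small helper that also reports whether the point is a maximum or a minimum. This is a sketch for checking the worked examples; the function name `turning_point` is hypothetical.

```python
def turning_point(b1, b2):
    """Return (x*, kind) for y = b0 + b1*x + b2*x**2, where x* = -b1 / (2*b2)."""
    if b2 == 0:
        raise ValueError("b2 = 0: the equation is linear and has no turning point")
    kind = "maximum" if b2 < 0 else "minimum"
    return -b1 / (2 * b2), kind

# (b1, b2) pairs taken from the three positive-b2 equations on this slide
for b1, b2 in [(-800, 20), (-8000, 100), (-2400, 20)]:
    print(turning_point(b1, b2))
```

Note that the intercept plays no role: it moves the curve up or down without changing where it turns.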

11 Example: Regression with Curvilinear Age

. gen int agesq=age*age

. summarize age agesq

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |      1860    38.84355    8.309941         25         54
       agesq |      1860    1577.839     655.309        625       2916

. regress conrinc age agesq if sex==1

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  2,   722) =   32.08
       Model |  3.8016e+10     2  1.9008e+10           Prob > F      =  0.0000
    Residual |  4.2776e+11   722   592463841           R-squared     =  0.0816
-------------+------------------------------           Adj R-squared =  0.0791
       Total |  4.6577e+11   724   643334846           Root MSE      =   24341

-------------------------------------------------------------------------------
     conrinc |      Coef.  Std. Err.        t   P>|t|      [95% Conf. Interval]
-------------+-----------------------------------------------------------------
         age |   4764.733   1134.778     4.20   0.000     2536.875     6992.591
       agesq |  -50.27083   14.30126    -3.52   0.000    -78.34785    -22.19381
       _cons |  -65221.92   21786.08    -2.99   0.003    -107993.6    -22450.29
-------------------------------------------------------------------------------

t for agesq = -3.52; p < .001, so: curvilinear.
b for agesq is negative, so: inverted U.
Turning point = $-b_{age} / (2 \cdot b_{agesq})$ = -4764.7 / (2 * -50.27) = 47.4, so predicted earnings peak at about age 47.4.
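To see what the fitted quadratic implies across the observed age range, the coefficients from the output above can be plugged into the prediction equation. This is a rough sketch; the function name `predicted_earnings` and the choice of ages to print are assumptions made for the illustration.

```python
# Coefficients copied from the quadratic regression output (men only).
b0, b_age, b_agesq = -65221.92, 4764.733, -50.27083

def predicted_earnings(age):
    """Predicted conrinc from the fitted quadratic: b0 + b_age*age + b_agesq*age^2."""
    return b0 + b_age * age + b_agesq * age ** 2

peak_age = -b_age / (2 * b_agesq)  # age at which predicted earnings are highest
print("peak age:", round(peak_age, 1))
for age in (25, 35, 45, 54):
    print(age, round(predicted_earnings(age)))
```

Predicted earnings rise up to the peak age and decline slightly afterwards, which is the inverted-U pattern indicated by the negative agesq coefficient.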

12 Cubic Polynomials

Occasionally (actually, rarely), it is worthwhile to investigate whether a more complex polynomial would better describe the curvilinear relationship.

Add a cubic term ($x^3$) to the previous quadratic equation:

$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + e_i$

Test $\beta_3 \neq 0$:
  o if you can't show $\beta_3 \neq 0$, then revert to the quadratic model
  o if the p-value for $\beta_3$ is > .05, then don't interpret $\beta_2$ and $\beta_1$ from the cubic model

If $\beta_3 \neq 0$, then the curve can have two bends (although not necessarily over the range of observed x's).

13 Cubic Polynomials: Earnings and Age Example

. regress conrinc age agesq agecu if sex==1

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  3,   721) =   21.36
       Model |  3.8020e+10     3  1.2673e+10           Prob > F      =  0.0000
    Residual |  4.2775e+11   721   593278929           R-squared     =  0.0816
-------------+------------------------------           Adj R-squared =  0.0778
       Total |  4.6577e+11   724   643334846           Root MSE      =   24357

-------------------------------------------------------------------------------
     conrinc |      Coef.  Std. Err.        t   P>|t|      [95% Conf. Interval]
-------------+-----------------------------------------------------------------
         age |   3971.837    8901.06     0.45   0.656    -13503.26     21446.93
       agesq |  -29.64795   230.0667    -0.13   0.897    -481.3286     422.0327
       agecu |  -.1739568   1.936886    -0.09   0.928    -3.976566     3.628653
       _cons |  -55354.68     112007    -0.49   0.621    -275253.4     164544.1
-------------------------------------------------------------------------------

Note: after age cubed is entered, none of the coefficients is statistically significant (even though age and age squared were significant in the quadratic model).

So, since the coefficient for age cubed is not statistically significant, revert to the quadratic model (DON'T conclude that age has no relationship with earnings!)
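The bends of a cubic sit where its slope, $\beta_1 + 2\beta_2 x + 3\beta_3 x^2$, equals zero. The sketch below solves that quadratic for the point estimates in the output above, purely to illustrate that a bend can fall outside the observed age range; because none of these coefficients is statistically significant, the result should not be interpreted substantively. The function name `cubic_turning_points` is hypothetical.

```python
import math

def cubic_turning_points(b1, b2, b3):
    """Real roots of dy/dx = b1 + 2*b2*x + 3*b3*x**2 (assumes b3 != 0)."""
    a, b, c = 3 * b3, 2 * b2, b1
    disc = b * b - 4 * a * c
    if disc <= 0:
        return []  # no bends: the curve rises or falls throughout
    r = math.sqrt(disc)
    return sorted([(-b - r) / (2 * a), (-b + r) / (2 * a)])

# Point estimates for age, agesq, agecu from the cubic regression
pts = cubic_turning_points(3971.837, -29.64795, -0.1739568)
in_range = [x for x in pts if 25 <= x <= 54]  # observed ages run from 25 to 54
print(pts)
print(in_range)
```

With these estimates only one of the two bends lies between ages 25 and 54, close to the peak age found with the quadratic model.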

14 Cubic Polynomials: Actual Results

[Figure not preserved in the transcript]

15 Inferences: F-tests Comparing models

Comparing Regression Models, Agresti & Finlay, p. 409:

$F = \dfrac{(R_c^2 - R_r^2) / (k - g)}{(1 - R_c^2) / [N - (k + 1)]}$, with $df_1 = k - g$ and $df_2 = N - (k + 1)$

Where:
  $R_c^2$ = R-square for complete model,
  $R_r^2$ = R-square for reduced model,
  k = number of explanatory variables in complete model,
  g = number of explanatory variables in reduced model, and
  N = number of cases.
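The comparison of a complete and a reduced model uses the two R-squares and the counts defined on this slide: the gain in R-square per added variable is divided by the unexplained share per residual degree of freedom. A minimal sketch follows; the function name `nested_f` and the numbers in the example call are made up for illustration and are not results from the course data.

```python
def nested_f(r2_complete, r2_reduced, k, g, n):
    """F statistic for comparing nested regression models.

    k = explanatory variables in the complete model,
    g = explanatory variables in the reduced model,
    n = number of cases.
    Returns (F, df1, df2).
    """
    df1 = k - g
    df2 = n - (k + 1)
    f = ((r2_complete - r2_reduced) / df1) / ((1 - r2_complete) / df2)
    return f, df1, df2

# Hypothetical example: 4 vs 2 predictors, 105 cases
f, df1, df2 = nested_f(0.30, 0.25, k=4, g=2, n=105)
print(round(f, 2), df1, df2)
```

The resulting F is compared with the critical value for (df1, df2) degrees of freedom in an F table.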

16 Example: F-tests Comparing models

Complete model: men's earnings on age, age squared, age cubed, education, and currently married dummy.

Reduced model: men's earnings on education and currently married dummy.

The F-test comparing the two models asks whether the age variables, as a group, have a significant relationship with earnings after controlling for education and marital status.

17 Example: F-tests Comparing models

Complete model: men's earnings

. regress conrinc age agesq agecu educ married if sex==1

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  5,   719) =   45.08
       Model |  1.1116e+11     5  2.2233e+10           Prob > F      =  0.0000
    Residual |  3.5461e+11   719   493199914           R-squared     =  0.2387
-------------+------------------------------           Adj R-squared =  0.2334
       Total |  4.6577e+11   724   643334846           Root MSE      =   22208

-------------------------------------------------------------------------------
     conrinc |      Coef.  Std. Err.        t   P>|t|      [95% Conf. Interval]
-------------+-----------------------------------------------------------------
         age |   5627.049   8127.377     0.69   0.489    -10329.18     21583.27
       agesq |  -75.30909   210.0421    -0.36   0.720    -487.6781     337.0599
       agecu |   .1985975   1.768176     0.11   0.911    -3.272807     3.670003
        educ |   3555.331   317.9738    11.18   0.000     2931.063     4179.599
     married |   8664.627   1690.098     5.13   0.000      5346.51     11982.74
       _cons |  -127148.4   102508.3    -1.24   0.215    -328399.8     74103.01
-------------------------------------------------------------------------------

Note: none of the three age coefficients is, by itself, statistically significant.

$R_c^2 = .2387$; k = 5.

18 Example: F-tests Comparing models

Reduced model: men's earnings

. regress conrinc educ married if sex==1

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  2,   722) =   80.20
       Model |  8.4666e+10     2  4.2333e+10           Prob > F      =  0.0000
    Residual |  3.8111e+11   722   527850916           R-squared     =  0.1818
-------------+------------------------------           Adj R-squared =  0.1795
       Total |  4.6577e+11   724   643334846           Root MSE      =   22975

-------------------------------------------------------------------------------
     conrinc |      Coef.  Std. Err.        t   P>|t|      [95% Conf. Interval]
-------------+-----------------------------------------------------------------
        educ |   3650.611   328.1065    11.13   0.000     3006.454     4294.767
     married |   10721.42   1716.517     6.25   0.000     7351.457     14091.38
       _cons |   -16381.3   4796.807    -3.42   0.001    -25798.65    -6963.944
-------------------------------------------------------------------------------

$R_r^2 = .1818$; g = 2.

19 Inferences: F-tests Comparing models

$F = \dfrac{(0.2387 - 0.1818) / (5 - 2)}{(1 - 0.2387) / (725 - 6)} = \dfrac{0.0569 / 3}{0.7613 / 719} = 17.91$

$df_1 = 5 - 2 = 3$; $df_2 = 725 - 6 = 719$

F = 17.91, df = (3, 719), p < .001 (Agresti & Finlay, table D, page 673)
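As a cross-check on the arithmetic, the same F statistic can be computed from the residual sums of squares printed in the two Stata outputs (complete and reduced models), which is algebraically equivalent to the R-square version. Both routes divide the numerator by k - g = 3. The variable names below are chosen for this sketch only.

```python
# Residual sums of squares copied from the two regression outputs.
sse_reduced = 3.8111e11    # educ + married
sse_complete = 3.5461e11   # age, agesq, agecu + educ + married
n, k, g = 725, 5, 2

df1 = k - g          # number of age terms being tested
df2 = n - (k + 1)    # residual degrees of freedom in the complete model

f_from_sse = ((sse_reduced - sse_complete) / df1) / (sse_complete / df2)
f_from_r2 = ((0.2387 - 0.1818) / df1) / ((1 - 0.2387) / df2)
print(df1, df2)
print(round(f_from_sse, 2), round(f_from_r2, 2))
```

The two values agree to within rounding of the printed output, at roughly 17.9 with (3, 719) degrees of freedom, so the age terms are jointly significant even though none is individually significant in the complete model.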

20 Next: Regression with Interaction Effects

Examples with earnings:
  married x gender
  age x gender
  age x education
  marital status x gender

