
1 Multiple Regression in SPSS GV917

2 Multiple Regression
Multiple regression involves more than one predictor variable. For example, in the turnout model:

Y_i = a + b_1 X_i1 + b_2 X_i2 + e_i

If Ŷ_i = a + b_1 X_i1 + b_2 X_i2, then Y_i − Ŷ_i = e_i, where:

Y_i is the observed value of Reported Turnout
X_i1 is the observed value of Actual Turnout
X_i2 is the Effective Number of Parties index
a is the intercept and the b_j are the slope coefficients relating Reported Turnout to Actual Turnout and to the Effective Number of Parties
Ŷ_i is the value of Reported Turnout predicted from its linear relationship with Actual Turnout and the Effective Number of Parties
e_i is the residual or error term
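In SPSS syntax this model can be fitted with the REGRESSION command. A minimal sketch, assuming the three variables are named reported, actual and enp (illustrative names, not taken from the dataset):

* Reported turnout regressed on actual turnout and the parties index.
* Variable names reported, actual and enp are assumed for illustration.
regression
  /statistics coeff r anova
  /dependent reported
  /method=enter actual enp.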

3 Add an Effective Number of Parties Index to the Turnout Model
This measure was devised by Laakso and Taagepera (Comparative Political Studies, 1979). It is designed to summarize the degree of fragmentation of the party system in a country. It is defined as:

1 / Σ (P_v)^2

where P_v is each party's proportion of the total vote.

4 Two Examples
Suppose there is a two-party system in a country and the votes are shared 60% to 40%. This is not a fragmented system, so:

1 / Σ (P_v)^2 = 1 / ((0.60)^2 + (0.40)^2) = 1.92

Intuitively this means that the party system contains 1.92 'equally sized' parties. But suppose in the country next door the vote is divided among four parties as follows: 35%, 30%, 20%, 15%. This is much more fragmented:

1 / Σ (P_v)^2 = 1 / ((0.35)^2 + (0.30)^2 + (0.20)^2 + (0.15)^2) = 3.64

In this case there are 3.64 'equally sized' parties.
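The index is easy to compute in SPSS syntax. A sketch, assuming each country's vote shares sit in hypothetical variables p1 to p4 (as proportions, with unused slots set to 0):

* Effective number of parties; ** is the SPSS exponentiation operator.
* Variables p1 to p4 are hypothetical, for illustration only.
compute enp = 1 / (p1**2 + p2**2 + p3**2 + p4**2).
execute.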

5
Country          Reported Turnout   Actual Turnout   Effective No. of Parties
Austria                80.88             84.30              3.02
Belgium                78.71             90.60              8.84
Switzerland            54.14             43.20              5.87
Czech Republic         63.43             57.90              4.82
Germany                77.89             79.10              4.09
Denmark                88.33             87.10              4.69
Spain                  71.40             68.70              3.12
Finland                71.43             65.30              6.03
France                 62.84             60.30              5.22
Britain                67.18             59.40              3.33
Greece                 83.37             75.00              2.64
Hungary                78.59             73.50              2.94
Ireland                75.57             62.60              4.13
Israel                 71.15             67.80              7.05
Italy                  84.28             81.40              6.32
Luxembourg             80.99             79.10              4.71
Netherlands            81.03             75.00              6.04
Norway                 61.42             46.20              6.19
Poland                 68.71             62.80              4.50
Portugal               81.38             80.10              3.03
Slovenia               74.10             70.40              5.15

6 Reported Turnout Regression with Two Predictors

7 Why this effect?
Note that the fragmentation of parties tends to reduce reported turnout. This effect has been attributed to information-processing costs: if the average citizen has to choose among many alternatives before voting, the costs of voting rise, which reduces turnout. The parties effect is independent of the actual turnout effect, since in multiple regression we identify the effect of one predictor controlling for all other predictors.

8 In the Turnout model we are fitting a regression plane to a Three Dimensional Scattergram

9 How Does Controlling Work?
Step One: Regress Reported Turnout on the Effective Number of Parties:

Y_i = a + b_1 X_i2 + v_i

Note that v_i represents the variation in Reported Turnout NOT accounted for by the Effective Number of Parties. We have removed the number of parties as an influence on Reported Turnout.

Step Two: Regress Actual Turnout on the Effective Number of Parties:

X_i1 = a + b_2 X_i2 + u_i

Thus u_i represents the variation in Actual Turnout NOT accounted for by the Effective Number of Parties. We have removed the number of parties as an influence on Actual Turnout.

10 Controlling in Multiple Regression
Step Three: In the multiple regression model

Y_i = a + b_1 X_i1 + b_2 X_i2 + e_i

b_1, the effect of Actual Turnout on Reported Turnout, can be found by regressing the residuals v_i on the residuals u_i, because both are independent of the Effective Number of Parties. This is in effect what multiple regression does.

[Diagram showing the relationships among Actual Turnout, the Effective Number of Parties and Reported Turnout]
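Steps one to three can be run directly in SPSS syntax. A minimal sketch, again assuming the variables are named reported, actual and enp (illustrative names); the /SAVE RESID() subcommand stores each regression's residuals as a new variable:

* Step one: save the residuals v_i from reported turnout on the parties index.
regression /dependent reported /method=enter enp /save resid(v_i).
* Step two: save the residuals u_i from actual turnout on the parties index.
regression /dependent actual /method=enter enp /save resid(u_i).
* Step three: regress one set of residuals on the other; the slope is b_1.
regression /dependent v_i /method=enter u_i.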

11 Controlling in Regression
In this model we are regressing the residuals of Reported Turnout (v_i) on the residuals of Actual Turnout (u_i). This produces the same regression coefficient (0.636) as in the earlier multivariate model.

12 Another Look at ANOVA and the F test in Multiple Regression
The F test compares the Regression Mean Square with the Residual Mean Square. If it has a high value, then the regression explains much more variation than is left unexplained; if it has a low value, the regression explains very little variation. The theoretical F distribution gives the probability that the F statistic will take on a particular value if the null hypothesis (the regression explains nothing) is correct.

13 F Test in Multiple Regression

Regression Mean Square = Regression Sum of Squares / Degrees of Freedom = 1330.07 / 2 = 665.04

Residual Mean Square = Residual Sum of Squares / Degrees of Freedom = 214.31 / 18 = 11.91

F = Regression Mean Square / Residual Mean Square = 665.04 / 11.91 = 55.86
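The same arithmetic can be reproduced with COMPUTE statements (a sketch; these simply add constant-valued variables to the active file):

* Recompute the mean squares and the F ratio from the ANOVA table.
compute msreg = 1330.07 / 2.
compute msres = 214.31 / 18.
compute f = msreg / msres.
execute.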

14 What are Degrees of Freedom?
They are usable bits of information.

Total: If we had one observation we could not say anything about the total variation; we need more than one case. This is why the degrees of freedom, or usable bits of information, is n − 1, or 20 (given 21 cases).

Residual: If we had two observations we could fit the regression line in a bivariate model, since the shortest distance between two points is a straight line, but there would be no residuals, since the line would fit perfectly. In a three-variable model we would need three observations to fit the regression plane, since it sits in a three-dimensional space. So to define residuals we need n − 3, or 18, degrees of freedom.

Since Total Variation = Explained Variation + Residual Variation,
Explained Variation = Total Variation − Residual Variation, so the
explained degrees of freedom = (n − 1) − (n − 3) = 2.

15 The F test
F = Regression Mean Square / Residual Mean Square follows an F distribution. If we start by assuming that the regression explains nothing, the F ratio will still not be zero, because by chance we might get a small positive value. The F distribution maps the probability that a ratio of a given size will occur if the regression actually explains nothing. The larger the value of F, the smaller the likelihood that it occurred by chance when the regression explains nothing. In this case the probability of an F of 55.86 occurring by chance is much smaller than 0.05, so we can say that the F statistic is significant at the 0.05 level.
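That probability can be calculated directly in SPSS: CDF.F is the cumulative F distribution function, so one minus it gives the p-value (a sketch):

* p-value for F = 55.86 on 2 and 18 degrees of freedom.
compute pvalue = 1 - cdf.f(55.86, 2, 18).
execute.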

16 The F Distribution (named after Ronald Fisher)

17 Another Model – Explaining Happiness in the ESS 2002 Dataset

happy: How happy are you
                             Frequency   Percent   Valid %   Cumulative %
Valid   0 Extremely unhappy       247        .6        .6           .6
        1                         238        .6        .6          1.2
        2                         450       1.1       1.1          2.2
        3                         943       2.2       2.2          4.5
        4                        1149       2.7       2.7          7.2
        5                        4128       9.7       9.8         17.0
        6                        3349       7.9       7.9         24.9
        7                        7169      16.9      17.0         41.9
        8                       11859      28.0      28.1         70.1
        9                        7555      17.8      17.9         88.0
        10 Extremely happy       5069      12.0      12.0        100.0
        Total                   42157      99.5     100.0
Missing 77 Refusal                 29        .1
        88 Don't know             118        .3
        99 No answer               54        .1
        Total                     201        .5
Total                           42358     100.0

18 Income Scale in the European Social Survey 2002

hinctnt: Household's total net income, all sources
                        Frequency   Percent   Valid %   Cumulative %
Valid   1  J                 713       1.7       2.1          2.1
        2  R                1752       4.1       5.3          7.4
        3  C                2762       6.5       8.3         15.7
        4  M                4722      11.1      14.2         29.9
        5  F                4736      11.2      14.2         44.2
        6  S                4113       9.7      12.4         56.5
        7  K                3738       8.8      11.2         67.8
        8  P                3136       7.4       9.4         77.2
        9  D                4719      11.1      14.2         91.4
        10 H                1978       4.7       5.9         97.4
        11 U                 554       1.3       1.7         99.0
        12 N                 326        .8       1.0        100.0
        Total              33248      78.5     100.0
Missing 77 Refusal          4876      11.5
        88 Don't know       3573       8.4
        99 No answer         660       1.6
        Total               9110      21.5
Total                      42358     100.0

19 Does Money Buy Happiness?

Model Summary
Model    R       R Square   Adjusted R Square   Std. Error of the Estimate
1        .271a     .073           .073                   1.857
a. Predictors: (Constant), income

ANOVA(b)
Model          Sum of Squares      df     Mean Square        F        Sig.
1 Regression        9043.539        1       9043.539    2621.315    .000a
  Residual        114347.222    33144          3.450
  Total           123390.761    33145
a. Predictors: (Constant), income
b. Dependent Variable: happy How happy are you

Coefficients(a)
Model          Unstandardized Coefficients   Standardized Coefficients        t      Sig.
                   B          Std. Error            Beta
1 (Constant)     6.150           .027                                    228.961    .000
  income          .208           .004                .271                 51.199    .000
a. Dependent Variable: happy How happy are you

20 Is the Specification Correct? Perhaps we should use a Quadratic Version of the Income Variable

* Calculating quadratic functions in the ESS 2002.
compute income = hinctnt.
compute incomsq = hinctnt*hinctnt.

where incomsq is the square of the hinctnt (household income) variable. If we use incomsq in the model in addition to income, this captures a non-linear relationship between income and happiness: more income increases happiness, but at a declining rate of change.
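The quadratic specification is then fitted by entering both terms. A sketch using the variables just created (happy and hinctnt are the ESS variable names):

* Happiness regressed on income and its square.
regression
  /dependent happy
  /method=enter income incomsq.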

21 Regression of Happiness on Income in the ESS 2002 – Does Money Buy Happiness?

22 Quadratic Relationship Between Two Variables

23 Suppose we want to use Occupational Status as a predictor in the Happiness model – we would have to create this variable

This is done with the assistance of the variable iscoco, a classification of the many occupations which exist in Europe. For example:

iscoco   Occupation
100      Armed forces
1100     Legislators and senior officials
1110     Legislators, senior government officials
1140     Senior officials of special-interest org
1141     Senior officials of political-party org
1142     Senior officials of economic-interest org

To put this in a form which is usable in the regression model we recode it as follows:

recode iscoco (2000 thru 2470=6)(1000 thru 1319=5)(3000 thru 3480=4)(4000 thru 4223=3)(5000 thru 8340=2)(9000 thru 9330=1)(else=sysmis) into occup.
value labels occup
 1 'unskilled or semi-skilled manual workers'
 2 'skilled manual workers'
 3 'white collar clerical & administrative workers'
 4 'white collar technical workers'
 5 'middle managers'
 6 'professionals and senior managers'.
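A quick way to confirm that the recode behaved as intended is to tabulate the new variable (a sketch):

* Check the distribution of the recoded occupation variable.
frequencies variables=occup.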

24 The Recoded Occupational Status Variable in the ESS 2002 Data

25 Suppose we want to add a gender variable – to see if women are happier than men

IF statements can be used to create new variables in SPSS. These are recodes which are carried out if certain conditions are met. For example:

compute female=0.
if (gndr eq 2) female=1.

The first line creates a new variable consisting only of zeroes; the second changes it to a score of 1 if the existing variable gndr has a score of 2.
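The same dummy can be created in a single step with RECODE; a sketch, assuming (as above) that gndr is coded 2 for women and 1 for men:

* One-step alternative to the compute/if pair above.
recode gndr (2=1)(1=0)(else=sysmis) into female.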

26 If Statements in SPSS – gndr and Female

27 Revised Happiness Model

ANOVA(b)
Model          Sum of Squares      df     Mean Square       F       Sig.
1 Regression        9393.557        4       2348.389    700.984    .000a
  Residual         95396.295    28475          3.350
  Total           104789.852    28479
a. Predictors: (Constant), incomsq, female, occup, income
b. Dependent Variable: happy How happy are you

Coefficients(a)
Model          Unstandardized Coefficients   Standardized Coefficients        t      Sig.
                   B          Std. Error            Beta
1 (Constant)     5.050           .063                                     80.658    .000
  female          .090           .022                .024                  4.160    .000
  occup           .035           .007                .029                  4.818    .000
  income          .565           .020                .741                 27.645    .000
  incomsq        -.029           .002               -.481                -17.942    .000
a. Dependent Variable: happy How happy are you

Model Summary
Model    R       R Square   Adjusted R Square   Std. Error of the Estimate
1        .299a     .090           .090                   1.830
a. Predictors: (Constant), incomsq, female, occup, income
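The syntax behind this output would look roughly as follows, using the variables created on the earlier slides:

* Revised happiness model with four predictors.
regression
  /statistics coeff r anova
  /dependent happy
  /method=enter income incomsq occup female.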

28 Conclusions
Multiple regression is a relatively simple extension of two-variable regression. Unlike two-variable regression, in multiple regression we are controlling for the influence of additional variables when examining the relationship between each independent variable and the dependent variable – it is a bit like a statistical experiment. The great majority of social science models are multivariate, and so multiple regression is commonly used.

