Statistical Analysis SC504/HS927, Spring Term 2008. Session 6 (Week 22, 29th February). OLS (3): multiple regression and dummy coding

2 Example from Suicide data:

3 E.g.: Using the data set alcohol

4 Hierarchical Regression

5 ‘Dummy’ variables
  Variables given the value 1 or 0, typically to indicate ‘yes’ or ‘no’, e.g. 1 = ‘female’, 0 = ‘not female’ (i.e. male).
  Including a dummy changes the value of the constant (intercept) for the group coded 1, but NOT the gradient (b) on the other predictors.
  NB: create only one dummy variable, even though there are two possible genders.
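
As a minimal sketch of the coding rule above, the snippet below turns a list of gender labels into a single 0/1 dummy. The function name `to_dummy` and the sample labels are hypothetical, made up for illustration only.

```python
def to_dummy(values, coded_one="female"):
    """Return 1 for cases in the category coded 1, and 0 for all other cases."""
    return [1 if v == coded_one else 0 for v in values]

# Hypothetical cases, not taken from any course data set.
genders = ["female", "male", "male", "female"]
female = to_dummy(genders)  # one dummy is enough for two categories
```

Only one column is created: a case that is not coded 1 is male by elimination, so a second ‘male’ dummy would add no information.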

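6 Working out the predicted Y value
  You need to work out 2 regression equations, one for each value of the dummy:
  Females (dummy = 1): $Y = a + b_1(\text{age}) + b_2(1) = (a + b_2) + b_1(\text{age})$
  Males (dummy = 0): $Y = a + b_1(\text{age}) + b_2(0) = a + b_1(\text{age})$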
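
The two equations can be sketched in a few lines of Python. The coefficient values here are hypothetical, chosen only to make the arithmetic easy to follow; they are not estimates from any data set used in the course.

```python
def predict(age, female, a, b1, b2):
    """Predicted Y from Y = a + b1*age + b2*female, where female is 1 or 0."""
    return a + b1 * age + b2 * female

# Hypothetical coefficients (assumed for illustration, not fitted values).
a, b1, b2 = 10.0, 0.5, 2.0

female_40 = predict(40, 1, a, b1, b2)  # uses intercept a + b2
male_40 = predict(40, 0, a, b1, b2)    # uses intercept a
gap = female_40 - male_40              # equals b2 whatever the age
```

The gap between the two predictions is b2 at every age: the dummy shifts the line up or down without changing its slope.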

7 Dummy variables with multiple categories
  Suppose you want to investigate the effect of housing tenure, and you have a variable coded:
    1 = owns outright
    2 = owns with a mortgage
    3 = part owns, part rents
    4 = rents
    5 = rent free
  NB: the coding is arbitrary. You could equally have 5 = owns outright, 3 = owns with a mortgage, 1 = rents, 2 = rent free, 4 = part owns, part rents.

8 Multiple categories (cont.)
  If the categorical variable has z categories, create z-1 dummy variables, e.g.
    $d_1 = 1$ if owns with a mortgage, 0 otherwise
    $d_2 = 1$ if part owns, part rents, 0 otherwise
    $d_3 = 1$ if rents, 0 otherwise
    $d_4 = 1$ if rent free, 0 otherwise
  The omitted category is known as the ‘reference’ or ‘baseline’ category.
  Each case will have at most one dummy coded 1; outright owners will have them all coded 0.
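
A minimal sketch of the z-1 rule, assuming the tenure variable uses the codes 1 to 5 listed on the previous slide with ‘owns outright’ (code 1) as the reference. The function name `dummy_code` is hypothetical.

```python
def dummy_code(code, categories=(1, 2, 3, 4, 5), reference=1):
    """Return the z-1 dummies [d1, d2, d3, d4] for one case.

    The reference category gets no dummy of its own, so a case in the
    reference category has every dummy coded 0.
    """
    return [1 if code == c else 0 for c in categories if c != reference]

owns_outright = dummy_code(1)   # reference category: all zeros
with_mortgage = dummy_code(2)   # d1 = 1, the rest 0
rent_free = dummy_code(5)       # d4 = 1, the rest 0
```

Because the function only tests whether a code equals a category, the numeric values of the codes never enter the regression, which is why the original coding can be arbitrary.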

9 Choice of reference category
  Any category can be the reference.
  Choose it for ease of interpretation and meaningful comparison, e.g.
    the ‘norm’ or most common category
    the majority ethnic group, in a study of the consequences of being from a minority ethnic group
    unemployed people, if you are studying the effects of unemployment
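
To see why the choice affects interpretation, consider a regression whose only predictors are the dummies. In that special case the OLS intercept is the mean of the reference group and each coefficient is that group's mean minus the reference mean. The sketch below uses this fact directly rather than fitting a model; the scores and group names are hypothetical.

```python
from statistics import mean

def dummy_only_regression(groups, reference):
    """Intercept and coefficients for a model containing only dummy predictors.

    Intercept = mean of the reference group.
    Coefficient for each other group = its mean minus the reference mean.
    """
    means = {name: mean(scores) for name, scores in groups.items()}
    intercept = means[reference]
    coefs = {name: m - intercept for name, m in means.items() if name != reference}
    return intercept, coefs

# Hypothetical outcome scores, invented for illustration.
groups = {
    "employed": [6.0, 8.0, 7.0],
    "unemployed": [3.0, 5.0, 4.0],
    "retired": [5.0, 7.0, 6.0],
}

a_emp, b_emp = dummy_only_regression(groups, reference="employed")
a_unemp, b_unemp = dummy_only_regression(groups, reference="unemployed")
```

Switching the reference changes the intercept and every coefficient, but the predicted value for each group (intercept plus that group's coefficient) stays the same. Only the comparison being reported changes.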

