1 Multilevel modelling short course Mark Tranmer, CCSR
2 What is multilevel analysis? Many populations have a group structure of some kind: hierarchical or non-hierarchical. For example, pupils can be grouped into schools. Individuals can be grouped into areas. Pupils can be grouped by school, and by neighbourhood. Suppose we wish to assess area variations in income, possibly with respect to other factors.
3 What is multilevel analysis? If we have district level data we can estimate a district level relationship, e.g. between average income and average age in each district. If we have individual level data we can estimate an individual level relationship, e.g. we can relate a person’s income to a person’s age.
4 What is multilevel analysis? But how do we assess the relationships at the district level and the individual level at the same time? We can do this with a multilevel model. We can fit this kind of model with specialist software such as MLwiN, which we will use today.
5 The ecological fallacy We could assume that an equation we estimate at the district level also holds at the individual level, that is, make a cross-level inference. But this is generally not sensible: individuals vary within each district with respect to the variables we wish to relate. Hence we could well make invalid inferences about the relationship at the individual level. This phenomenon is referred to as ‘the ecological fallacy’.
6 Problems of ignoring population structure If we carry out the analysis at the individual level we do not recognise in our analysis that ‘similar’ individuals tend to live within the same small sub-areas of our population. That is, ‘clustering’ occurs. Ignoring this clustering may lead to biased estimates of summary statistics, especially variances, standard deviations and standard errors. Hence we might falsely attribute statistical significance (or non-significance) to results if we ignore the clustering.
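A minimal sketch of why a cross-level inference can mislead. The district names and the (age, income) values below are invented purely for illustration; they are not from the course data. The sketch compares the correlation between district means with the correlation among individuals inside each district.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical (age, income) pairs for three districts; invented for illustration.
districts = {
    "A": [(30, 24), (35, 22), (40, 20)],
    "B": [(40, 34), (45, 32), (50, 30)],
    "C": [(50, 44), (55, 42), (60, 40)],
}

# District-level relationship: correlate the district means.
mean_age = [mean(age for age, _ in people) for people in districts.values()]
mean_income = [mean(inc for _, inc in people) for people in districts.values()]
r_district = pearson(mean_age, mean_income)

# Individual-level relationship, computed separately within each district.
r_within = {
    name: pearson([age for age, _ in people], [inc for _, inc in people])
    for name, people in districts.items()
}

print(f"district-level r = {r_district:+.2f}")
for name, r in r_within.items():
    print(f"within district {name}: r = {r:+.2f}")
```

In these invented data the district means rise together, while income falls with age inside every district, so the district-level equation would give the wrong sign for individuals.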
10 Some substantive multilevel examples Time as a level. Level 2: person; level 1: occasion. Multivariate. Level 2: pupil; level 1: subject of exam score.
11 Terminology Nesting. Level k-1 units are contained in level k units. E.g. classes at level 2 nested in schools at level 3: classes are the level 2 units, schools are the level 3 units. Cross-classification. Higher level units that are not nested within one another, e.g. school and neighbourhood at level 2, pupil at level 1.
12 Continuous and Binary Response variables For a continuous response we use a multilevel model that is an extension of the standard multiple regression model – as we will see this morning.For a binary response we use a multilevel model that is an extension of the logistic regression model – as we will see this afternoon.
13 Data requirements What are the data requirements for multilevel modelling? The standard requirement is a dataset that includes an indicator of the group to which each individual unit belongs. For example, information for a sample of pupils that includes an indicator of the school that they attend. Another example is a sample of individuals that includes an indicator of the area in which they live.
14 Fixed effects What about fixed effects analysis? If we had information on pupils that attended three schools, we could carry out a fixed effects analysis to compare the three schools based on these sample data. We would do this with an analysis that includes two dummy variables, which allow us to compare the schools. We could make inferences from our results about how the three schools compare, but we would not want to make wider inferences about ‘all schools’ based on information on only 3 schools.
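A minimal sketch of the dummy coding described on this slide, assuming three schools labelled "A", "B" and "C" with "A" taken as the reference category. The labels, the choice of reference and the helper name `dummy_code` are hypothetical.

```python
def dummy_code(categories, reference):
    """Return the non-reference levels and one 0/1 indicator row per observation."""
    levels = sorted(set(categories) - {reference})
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows


# Hypothetical school membership for five pupils.
schools = ["A", "B", "C", "A", "C"]
levels, rows = dummy_code(schools, reference="A")

print("dummy variables:", levels)
for school, row in zip(schools, rows):
    print(school, row)
```

Three schools need two dummy variables because the reference school is identified by both dummies being 0. In general, k groups need k - 1 dummies.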
15 Multilevel modelling For multilevel modelling we would have information on a ‘reasonable number’ of higher level units. What is ‘reasonable’? Snijders and Bosker (1999) recommend at least 10 groups; 20 or more is better. We essentially assume we have a representative sample of higher level units in multilevel modelling, so 30 is a good number to have in mind.
16 Multilevel modelling Suppose we had data for pupils based on 30 schools. We could carry out a fixed effects analysis on these data by using 29 dummy variables. Or we could use multilevel modelling, which assumes the schools are themselves a sample. Hence we do not need to estimate so many model parameters using multilevel modelling, and it is desirable in this situation. Multilevel modelling also takes into account group size in estimation: estimates of residuals for small groups (e.g. a school with 2 pupils) are ‘shrunken’ towards the mean.
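A minimal sketch of how shrinkage depends on group size, assuming the standard result for a two-level variance components model: the estimated group residual is the raw mean residual multiplied by var_u / (var_u + var_e / n_j), where var_u is the between-group variance, var_e the within-group variance and n_j the group size. The variance values and the raw residual below are hypothetical.

```python
def shrinkage_factor(var_u, var_e, n_j):
    """Weight (between 0 and 1) applied to a group's raw mean residual."""
    return var_u / (var_u + var_e / n_j)


def shrunken_residual(raw_residual, var_u, var_e, n_j):
    """Estimated group residual after shrinkage towards the overall mean."""
    return shrinkage_factor(var_u, var_e, n_j) * raw_residual


# Hypothetical variance components and raw school-level residual.
var_u, var_e = 0.2, 0.8
raw = 0.5

for n_j in (2, 20, 200):
    k = shrinkage_factor(var_u, var_e, n_j)
    est = shrunken_residual(raw, var_u, var_e, n_j)
    print(f"n_j = {n_j:3d}  factor = {k:.3f}  shrunken residual = {est:.3f}")
```

The factor approaches 1 as n_j grows, so a school with 2 pupils is pulled much further towards the mean than a school with 200.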
17 Theory: Single level models Suppose we have data for 4059 pupils in 65 schools. How could we model the data? Model 1: pupil level model based on the 4059 pupils. $\mathrm{Var}(y_i) = \sigma^2$.
18 Single level models Model 2: or a school level model based on aggregate data for the 65 schools; that is, the school means.
19 Multilevel models: model 3 ‘variance components’ model $\mathrm{Var}(y_{ij}) = \sigma^2_u + \sigma^2_e = \sigma^2$. i is the pupil subscript; j is the school subscript. $\sigma^2_u$ measures variation between schools. $\sigma^2_e$ measures variation between pupils within schools.
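The model equation for this slide is not reproduced in the transcript. In the usual textbook notation, which is an assumption here rather than a copy of the slide, the variance components model and its variance decomposition can be written as:

```latex
\begin{align*}
y_{ij} &= \beta_0 + u_j + e_{ij}, \\
u_j &\sim N(0, \sigma^2_u), \qquad e_{ij} \sim N(0, \sigma^2_e), \\
\mathrm{Var}(y_{ij}) &= \sigma^2_u + \sigma^2_e = \sigma^2 .
\end{align*}
```

Here $\beta_0$ is the overall mean, $u_j$ is the school-level residual and $e_{ij}$ is the pupil-level residual, with $u_j$ and $e_{ij}$ assumed independent.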
20 Intra-‘class’ correlation $\sigma^2_u / \sigma^2$ = the intra-class correlation: the proportion of the overall variation in exam score attributable to schools, i.e. how similar exam scores are within schools.
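A minimal sketch of the intra-class correlation calculation, assuming the total variance is the sum of the school-level and pupil-level variances. The two variance values are hypothetical, not estimates from the exam data.

```python
def intraclass_correlation(var_u, var_e):
    """Proportion of total variance attributable to the higher level units."""
    total = var_u + var_e
    return var_u / total


# Hypothetical variance components: school level, pupil level.
var_u, var_e = 0.2, 0.8
icc = intraclass_correlation(var_u, var_e)
print(f"intra-class correlation = {icc:.2f}")
```

A value near 0 means pupils in the same school are hardly more alike than pupils in different schools; a value near 1 means almost all the variation lies between schools.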
21 Random intercepts model Model 4: 2-level model: pupils in schools, with an explanatory variable.
22 Random slopes model Model 5: random slopes. Where the ‘random slopes’ coefficient is: Or alternatively, but equivalently, we can write the model as:
23 Group level variables We can also add group level variables to the model, e.g. the type of school (mixed or single sex), or the percentage of pupils taking free school meals in the school.
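The equations for Models 4 and 5 are not reproduced in the transcript. In the usual textbook notation, which is an assumption here rather than a copy of the slides, with $x_{ij}$ the explanatory variable, $u_{0j}$ the school-level intercept residual and $u_{1j}$ the school-level slope residual, they can be written as:

```latex
\begin{align*}
\text{Model 4 (random intercepts):}\quad
  y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_{0j} + e_{ij} \\
\text{Model 5 (random slopes):}\quad
  y_{ij} &= \beta_0 + \beta_{1j} x_{ij} + u_{0j} + e_{ij},
  \qquad \beta_{1j} = \beta_1 + u_{1j} \\
\text{equivalently:}\quad
  y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij}
\end{align*}
```

Substituting $\beta_{1j} = \beta_1 + u_{1j}$ into the second line gives the third, which separates the fixed part ($\beta_0 + \beta_1 x_{ij}$) from the random part ($u_{0j} + u_{1j} x_{ij} + e_{ij}$).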
24 Binary response variables Many response variables are ‘binary’ (‘0/1’, ‘dichotomous’). E.g. whether or not a person is unemployed or has a limiting long term illness. Risk of unemployment may be associated with personal characteristics and/or where people live. We can use multilevel logistic models to investigate these issues.
25 Binary response variables Let’s suppose we are looking at the risk of people being unemployed given some demographic characteristics, and also given some information about the area in which they live. We can look at this problem using multilevel logistic regression models.
26 Multilevel logistic regression models Model 6: The basic (two level) multilevel model for a binary response is written as follows, where $y_{ij}$ takes the value 0 or 1 for each individual i in group j (0 = not unemployed, 1 = unemployed), $p_{ij}$ is the predicted probability of unemployment for individual i in area j, and $e_{ij}$ is an individual level error.
27 Multilevel logistic regression models Where $\beta_0$ is the ‘intercept’ and $\beta_1$ to $\beta_p$ are the coefficients of the p explanatory variables.
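A minimal sketch of how a predicted probability follows from the coefficients, assuming the usual two-level random intercept logistic form logit(p_ij) = beta_0 + beta_1 x_1ij + ... + beta_p x_pij + u_j. The area-level residual u_j, the function names and all numeric values are assumptions for illustration; the model equation itself is not reproduced in the transcript.

```python
import math


def inv_logit(eta):
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probability(beta0, betas, xs, u_j=0.0):
    """Probability of the outcome for one individual in area j."""
    eta = beta0 + sum(b * x for b, x in zip(betas, xs)) + u_j
    return inv_logit(eta)


# Hypothetical intercept, one coefficient and one explanatory value.
beta0 = -2.0
betas = [0.5]
xs = [1.0]

# The same individual placed in areas with different (hypothetical) residuals.
for u_j in (-0.5, 0.0, 0.5):
    p = predicted_probability(beta0, betas, xs, u_j)
    print(f"u_j = {u_j:+.1f}  predicted probability = {p:.3f}")
```

With identical personal characteristics, a larger area residual gives a higher predicted probability of unemployment, which is how the model expresses area variation in risk.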
28 MLwiN for binary response variables MLwiN could be used to fit a multilevel model based on the example of unemployment as a response variable and some demographic information as explanatory variables. For this analysis we could use 1991 UK Census data from the Samples of Anonymised Records (SARs). The MLwiN procedure for binary response variables is slightly more involved than that for continuous response variables. See chapter 9 of the MLwiN user guide.
29 SPSS for multilevel modelling In SPSS versions 11.5 and later it is possible to fit multilevel models for dependent variables measured on an interval scale. The syntax on the following slides shows how variance components, random intercepts and random intercepts/slopes models can be fitted for a 2-level example: pupils in schools.
30 SPSS for multilevel modelling Random intercepts and slopes (on standlrt) model for pupils in schools (normexam is the continuous response; standlrt is a continuous explanatory variable). Syntax is as follows.
    mixed normexam with standlrt
      / print = solution
      / fixed standlrt
      / random intercept standlrt | subject(school) covtype(UN).
[To access via SPSS menus: Analyze > Mixed Models]
34 Variance components model only:
    mixed normexam
      / print = solution
      / random intercept | subject(school) covtype(UN).
Random intercepts model only:
    mixed normexam with standlrt
      / fixed standlrt
      / random intercept | subject(school) covtype(UN).
35 Reading list Books: Plewis, I. (1997) Statistics in Education. Edward Arnold. Snijders, T. and Bosker, R. (1999) Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. Sage Publications. Goldstein, H. (1995) Multilevel Statistical Models. Edward Arnold. Web: http://www.cmm.bristol.ac.uk NB: new version of MLwiN (2.10) just released: see website.