Analysing Variability Between Neighbourhoods By Exploiting Survey Design Features
Paper for the "What is Multilevel Modelling?" session, Research Methods Festival, Oxford, 2 July
Ian Plewis, Institute of Education, University of London
www.ioe.ac.uk/bedfordgroup
Sample surveys with a clustered design tend to be more cost-efficient than surveys using simple random samples. Clustering does, however, complicate the analysis because cases within a cluster are more similar, on average, than cases in different clusters. The degree of similarity is represented by the intra-cluster (or intra-class) correlation. A number of statistical packages can adjust standard errors to allow for clustering.
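One common way to write the intra-cluster correlation, assuming a simple variance-components model in which each response is an overall mean plus a cluster effect and an individual residual (the symbols are illustrative and are not taken from the slides):

```latex
y_{ij} = \beta_0 + u_j + e_{ij}, \qquad
u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)

\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
```

Here i indexes cases and j indexes clusters, so the correlation is the share of the total variance that lies between clusters.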
However, the clustering might be informative in the sense that the clusters represent neighbourhoods (or institutions) that could exert an independent or contextual effect on a social or developmental process. In other words, clustering is not necessarily a statistical nuisance. Rather it can be exploited to throw more light on social processes.
The MCS Population

The Millennium Cohort Study population is a population of children defined as: all children born between 1 September 2000 and 31 August 2001 (for England and Wales), and between 23 November 2000 and 11 January 2002 (for Scotland and Northern Ireland), alive and living in the UK at age nine months, and eligible to receive Child Benefit at that age; and, after nine months, for as long as they remain living in the UK at the time of sampling.
MCS Target Sample, Sweep 1

All children living in the selected wards:

Country      Stratum          Wards
England      Advantaged         110
England      Disadvantaged       71
England      Ethnic              19
Wales        Advantaged          23
Wales        Disadvantaged       50
Scotland     Advantaged          32
Scotland     Disadvantaged       30
N. Ireland   Advantaged          23
N. Ireland   Disadvantaged       40
Total                           398
The observed mean cluster size across the UK is 47, but the range is from 7 to 403.
Example from Sweep 1 of MCS:

We can generate a measure of the main respondent's perceptions of her neighbourhood from a set of five items about vandalism, pollution etc. This measure can vary from 0 to 15 and, although skewed to the right, will be treated as having a Normal distribution.
We will attempt to explain the variation in this measure, initially in terms of individual characteristics, using a multiple regression model:

    Mother's age
    Number of children
    Lone parent status
    Receiving benefits
    Ethnic group (8 categories)
Figure 1: Within, between, and total regressions. (Snijders, T and Bosker, R (1999), Multilevel Analysis. London: Sage Publications)
Table 1: Multiple Regression Estimates

                        Estimate    s.e.
Mother's age
No. of children
Lone parent status
On benefits
Ethnic group:
  Mixed
  Indian
  Pakistani
  Bangladeshi
  Black Caribbean
  Black African
  Other

The model also includes dummies for stratum to allow for the unequal probabilities of selection. R² = 0.18
The multiple regression model ignores ward, and we would expect variation between wards for measures of neighbourhoods. We first fit a simple two-level model, including just a random intercept, to estimate the variation between wards (level-two variance) and compare it with the variation within wards (level-one variance). We can represent the relative strengths of the two sources of variation by the intra-cluster correlation. The estimate is 0.26, which is substantial, and there is therefore a prima facie case for including ward in any model.
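To make the comparison of between-ward and within-ward variation concrete, here is a minimal sketch that estimates an intra-cluster correlation with the one-way ANOVA estimator. This is a swapped-in technique, not the multilevel estimation used for the talk; it assumes equal cluster sizes (which the MCS wards do not have), and the function name and toy data are hypothetical.

```python
from statistics import mean


def anova_icc(groups):
    """One-way ANOVA estimate of the intra-cluster correlation.

    groups: list of clusters, each a list of numeric responses.
    Assumes every cluster has the same size (a simplification).
    """
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal cluster sizes")

    grand_mean = mean(v for g in groups for v in g)
    group_means = [mean(g) for g in groups]

    # Mean square between clusters and mean square within clusters.
    msb = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    msw = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    ) / (k * (n - 1))

    # Method-of-moments variance components; truncate at zero.
    var_between = max((msb - msw) / n, 0.0)
    var_within = msw
    return var_between / (var_between + var_within)


# Toy data (hypothetical): three clusters of three responses each.
toy = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(round(anova_icc(toy), 3))
```

With unequal cluster sizes, as in the MCS, a multilevel model fitted by likelihood-based methods is the more natural route to the two variance components.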
Table 2: Comparing Estimates from a Multiple Regression and a Two Level Model

                        Estimate (MR)   s.e.   Estimate (MLM)   s.e.
Mother's age
No. of children
Lone parent status
On benefits
Ethnic group:
  Mixed
  Indian
  Pakistani
  Bangladeshi
  Black Caribbean
  Black African
  Other
Between ward variance   n.a.
Within ward variance
We have one external measure at the ward level – the Child Poverty Index (part of the Index of Multiple Deprivation or IMD2000). Does this explain variation between wards in neighbourhood satisfaction?
Table 3: Comparing Estimates from Two Level Models without and with CPI

                        Estimate (MLM)   s.e.   Estimate (+CPI)   s.e.
Mother's age
No. of children
Lone parent status
On benefits
Ethnic group:
  Mixed
  Indian
  Pakistani
  Bangladeshi
  Black Caribbean
  Black African
  Other
CPI
Between ward variance
Within ward variance
If CPI is included in the single level multiple regression model then the estimate is: with a much lower standard error of
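A rough sense of why a single-level standard error for a ward-level variable is too low comes from Kish's design-effect approximation, deff = 1 + (m - 1)ρ. The sketch below plugs in the mean cluster size (47) and the intra-cluster correlation (0.26) quoted earlier. It assumes equal cluster sizes and uses the unconditional correlation, neither of which holds exactly here, so the result is only indicative; the function names are hypothetical.

```python
from math import sqrt


def design_effect(mean_cluster_size, icc):
    """Kish approximation: variance inflation due to clustering."""
    return 1 + (mean_cluster_size - 1) * icc


def se_inflation(mean_cluster_size, icc):
    """Approximate factor by which a naive standard error is understated."""
    return sqrt(design_effect(mean_cluster_size, icc))


# Values quoted in the talk: mean ward size 47, ICC 0.26.
print(round(design_effect(47, 0.26), 2))   # variance inflation
print(round(se_inflation(47, 0.26), 2))    # standard-error inflation
```

Under these assumptions, a standard error that ignores ward could be understated several-fold for a variable that only varies between wards.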
Why did the estimated coefficients for some of the ethnic groups change so much when we moved from multiple regression (where the estimate is a function of the within-group and between-group estimates) to a multilevel model (where the within and between regressions are assumed to be the same)? Perhaps the ethnic group estimates vary from ward to ward. In other words, perhaps there are random slopes. It would be difficult to allow each ethnic group to have its own random slope, so instead let us look at a white/non-white split.
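The remark that the single-level estimate is a function of the within-group and between-group estimates can be checked numerically for one predictor. In the sketch below, the total slope equals a weighted average of the between and within slopes, with the weight given by the share of the predictor's variation that lies between groups. The function name and the toy data are hypothetical.

```python
from statistics import mean


def decompose_slopes(groups):
    """groups: list of clusters, each a list of (x, y) pairs.

    Returns (within, between, total, eta2), where eta2 is the share of
    the sum of squares of x that lies between clusters.
    """
    points = [p for g in groups for p in g]
    grand_x = mean(x for x, _ in points)
    grand_y = mean(y for _, y in points)

    sxx_w = sxy_w = sxx_b = sxy_b = 0.0
    for g in groups:
        mx = mean(x for x, _ in g)
        my = mean(y for _, y in g)
        # Within-cluster sums of squares and cross-products.
        sxx_w += sum((x - mx) ** 2 for x, _ in g)
        sxy_w += sum((x - mx) * (y - my) for x, y in g)
        # Between-cluster sums, weighted by cluster size.
        sxx_b += len(g) * (mx - grand_x) ** 2
        sxy_b += len(g) * (mx - grand_x) * (my - grand_y)

    within = sxy_w / sxx_w
    between = sxy_b / sxx_b
    total = (sxy_w + sxy_b) / (sxx_w + sxx_b)
    eta2 = sxx_b / (sxx_w + sxx_b)
    return within, between, total, eta2


# Toy data (hypothetical): slope is -1 inside each cluster, +2 across means.
toy = [
    [(0, 1), (1, 0), (2, -1)],
    [(3, 7), (4, 6), (5, 5)],
    [(6, 13), (7, 12), (8, 11)],
]
within, between, total, eta2 = decompose_slopes(toy)
print(within, between, total)
# The total slope is a weighted average of the other two.
print(abs(total - (eta2 * between + (1 - eta2) * within)) < 1e-12)
```

When the within and between slopes differ this much, a model that forces them to be equal will give a coefficient that depends heavily on how the two sources are weighted.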
Table 4: Random Slopes Model

                                    Estimate (MLM)   s.e.
Non white
% Coverage interval                 -0.39 to 1.9
Between ward variance (intercept)
Between ward variance (slope)
Correlation: intercept & slope      -0.45
Within ward variance

There are good reasons to suppose that some of the ward variation in both intercept and slope can be explained by the proportion of white respondents in the ward.
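A coverage interval of the kind reported in Table 4 describes the range within which a stated share of the ward-specific slopes is expected to fall, given a mean slope and a between-ward slope variance. The sketch below assumes Normally distributed slopes and a 95% interval (multiplier 1.96); both are assumptions, and the input values are hypothetical rather than the estimates from the talk.

```python
from math import sqrt


def coverage_interval(mean_slope, slope_variance, z=1.96):
    """Range expected to contain most ward-specific slopes.

    z=1.96 corresponds to 95% coverage under Normality (an assumption).
    """
    sd = sqrt(slope_variance)
    return mean_slope - z * sd, mean_slope + z * sd


# Hypothetical inputs, chosen only to illustrate the calculation.
low, high = coverage_interval(mean_slope=1.0, slope_variance=0.25)
print(round(low, 2), round(high, 2))
```

An interval that spans zero, as the one in Table 4 does, indicates that the direction of the white/non-white difference is not the same in every ward.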
Table 5: Random Slopes Model with Proportion White

                                    Estimate (MLM)   s.e.
Non white
Proportion white
Non white*proportion white          0.34
Between ward variance (intercept)
Between ward variance (slope)
Correlation: intercept & slope      -0.51
Within ward variance
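The structure of the model in Table 5 can be written out as follows. This is a sketch that omits any other covariates and uses illustrative symbols not taken from the slides: x_ij is 1 for a non-white respondent and 0 otherwise, and w_j is the proportion of white respondents in ward j.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + \beta_2 w_j + \beta_3 x_{ij} w_j
         + u_{0j} + u_{1j} x_{ij} + e_{ij}

\text{(non-white minus white difference in ward } j) = \beta_1 + \beta_3 w_j + u_{1j}
```

The interaction coefficient β₃ (0.34 in Table 5) describes how the average white/non-white difference shifts as the proportion of white respondents in the ward changes, while u_1j captures the ward-to-ward variation in that difference that remains unexplained.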
Conclusions

1. Multilevel modelling, carefully used, can throw light on complex processes and give us a better understanding of within and between group relations.
2. Our results show that there are differences between white and non-white respondents in their perceptions of their neighbourhood. However, the differences between the two groups are more marked in wards with a low proportion of white respondents.