www.ioe.ac.uk/bedfordgroup
Analysing Variability Between Neighbourhoods By Exploiting Survey Design Features
Paper for the "What is Multilevel Modelling?" session, Research Methods Festival, Oxford, 2 July 2004.
Ian Plewis, Institute of Education, University of London
email@example.com
Sample surveys with a clustered design tend to be more efficient than surveys using simple random samples. Clustering does, however, introduce complexities in the analysis because cases within a cluster are more similar, on average, than cases in different clusters. The degree of similarity is represented by the intra-cluster (or intra-class) correlation. We can adjust standard errors to allow for clustering within a number of statistical packages.
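As a rough illustration of how the intra-cluster correlation feeds into standard errors, the sketch below uses the standard approximation deff = 1 + (m - 1) * rho for clusters of equal size m. The function names and the input values are hypothetical, chosen only for illustration, and the approximation is cruder when cluster sizes vary widely.

```python
import math


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Approximate design effect for clusters of (roughly) equal size."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n: int, deff: float) -> float:
    """Size of the simple random sample that would give the same precision."""
    return n / deff


# Hypothetical values, not taken from the Millennium Cohort Study.
deff = design_effect(mean_cluster_size=20, icc=0.05)
se_inflation = math.sqrt(deff)  # factor by which a naive s.e. is too small
n_eff = effective_sample_size(1000, deff)
```

With these illustrative inputs the design effect is 1.95, so standard errors that ignore the clustering would be too small by a factor of about 1.4.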
However, the clustering might be informative in the sense that the clusters represent neighbourhoods (or institutions) that could exert an independent or contextual effect on a social or developmental process. In other words, clustering is not necessarily a statistical nuisance. Rather it can be exploited to throw more light on social processes.
The MCS Population
The Millennium Cohort Study population is a population of children defined as: all children born between 1 September 2000 and 31 August 2001 (for England and Wales), and between 23 November 2000 and 11 January 2002 (for Scotland and Northern Ireland), alive and living in the UK at age nine months, and eligible to receive Child Benefit at that age; and, after nine months: for as long as they remain living in the UK at the time of sampling.
MCS Target Sample, Sweep 1
All children living in the selected wards:

Country      Stratum          Number of wards
ENGLAND      Advantaged       110
ENGLAND      Disadvantaged     71
ENGLAND      Ethnic            19
WALES        Advantaged        23
WALES        Disadvantaged     50
SCOTLAND     Advantaged        32
SCOTLAND     Disadvantaged     30
N. IRELAND   Advantaged        23
N. IRELAND   Disadvantaged     40
TOTAL                         398
The observed mean cluster size across the UK is 47, but cluster sizes range from 7 to 403.
Example from Sweep 1 of MCS:
We can generate a measure of the main respondent's perceptions of her neighbourhood from a set of five items about vandalism, pollution etc. This measure can vary from 0 to 15 and, although skewed to the right, will be treated as having a Normal distribution.
We will attempt to explain the variation in this measure initially in terms of individual characteristics using a multiple regression model:
  Mother's age
  Number of children
  Lone parent status
  Receiving benefits
  Ethnic group (8 categories)
Figure 1: Within, between, and total regressions. (Snijders, T and Bosker, R (1999), Multilevel Analysis. London: Sage Publications)
Table 1: Multiple Regression Estimates

                       Estimate   s.e.
Mother's age            0.059     0.004
No. of children        -0.16      0.023
Lone parent status     -0.44      0.067
On benefits            -0.80      0.054
Ethnic group:
  Mixed                -0.41      0.22
  Indian                0.44      0.15
  Pakistani             0.99      0.12
  Bangladeshi           1.0       0.18
  Black Caribbean       0.25      0.19
  Black African        -0.092     0.17
  Other                 0.28      0.16

The model also includes dummies for stratum to allow for the unequal probabilities of selection. R² = 0.18
The multiple regression model ignores ward, yet we would expect measures of neighbourhood perception to vary between wards. We first fit a simple two-level model, just including a random intercept, to estimate the variation between wards (level-two variance) and compare it with the variation within wards (level-one variance). We can represent the relative strengths of the two sources of variation by the intra-cluster correlation. The estimate is 0.26, which is substantial, so there is a prima facie case for including ward in any model.
Table 2: Comparing Estimates from a Multiple Regression and a Two Level Model

                        Estimate (MR)   s.e.    Estimate (MLM)   s.e.
Mother's age             0.059          0.004    0.056           0.004
No. of children         -0.16           0.023   -0.17            0.021
Lone parent status      -0.44           0.067   -0.36            0.063
On benefits             -0.80           0.054   -0.72            0.051
Ethnic group:
  Mixed                 -0.41           0.22     0.10            0.21
  Indian                 0.44           0.15     0.64            0.15
  Pakistani              0.99           0.12     [garbled]       [garbled]
  Bangladeshi            1.0            0.18     1.2             0.19
  Black Caribbean        0.25           0.19     1.0             0.19
  Black African         -0.092          0.17     0.97            0.17
  Other                  0.28           0.16     0.83            0.16
Between ward variance    n.a.                    1.3             0.11
Within ward variance     8.5            0.090    7.3             0.078
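The intra-cluster correlation is the between-ward variance divided by the sum of the between-ward and within-ward variances. The sketch below applies that ratio to the two variance estimates reported for the two level model in Table 2; the function name is an assumption made for illustration. Because that model already includes the individual-level covariates, the result is a residual intra-cluster correlation and need not equal the 0.26 obtained from the random-intercept-only model, whose variance components are not reported in the slides.

```python
def intra_cluster_correlation(between_var: float, within_var: float) -> float:
    """Share of the total variance that lies between clusters."""
    return between_var / (between_var + within_var)


# Variance estimates for the two level model in Table 2.
between_ward = 1.3
within_ward = 7.3

residual_icc = intra_cluster_correlation(between_ward, within_ward)
print(f"Residual intra-cluster correlation: {residual_icc:.2f}")
```

On these figures, about 15% of the variation left unexplained by the individual characteristics lies between wards.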
We have one external measure at the ward level – the Child Poverty Index (part of the Index of Multiple Deprivation or IMD2000). Does this explain variation between wards in neighbourhood satisfaction?
Table 3: Comparing Estimates from Two Level Models without and with CPI

                        Estimate (MLM)   s.e.    Estimate (+CPI)   s.e.
Mother's age             0.056           0.004    0.055            0.004
No. of children         -0.17            0.021   -0.17             0.021
Lone parent status      -0.36            0.063   -0.35             0.063
On benefits             -0.72            0.051   -0.71             0.051
Ethnic group:
  Mixed                  0.10            0.21     0.09             0.21
  Indian                 0.64            0.15     0.61             0.15
  Pakistani              [garbled]       [garbled] [garbled]       0.13
  Bangladeshi            1.2             0.19     [garbled]        0.19
  Black Caribbean        1.0             0.19     0.98             0.19
  Black African          0.97            0.17     0.95             0.17
  Other                  0.83            0.16     0.81             0.16
CPI                                              -0.044            0.0054
Between ward variance    1.3             0.11     [garbled]        0.092
Within ward variance     7.3             0.078    7.3              0.078
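If CPI is included in the single-level multiple regression model, then its estimate is -0.041, with a much lower standard error of 0.0021. The single-level model treats CPI as if it varied independently from respondent to respondent rather than from ward to ward, so it overstates the precision of the estimate.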
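To make the comparison concrete, the sketch below computes the ratio of estimate to standard error for the CPI coefficient under each model, using only the figures quoted in the slides. The helper name `wald_z` is an assumption made for illustration.

```python
def wald_z(estimate: float, standard_error: float) -> float:
    """Ratio of an estimate to its standard error."""
    return estimate / standard_error


# CPI coefficient from the two level model (Table 3).
z_multilevel = wald_z(-0.044, 0.0054)

# CPI coefficient from the single-level multiple regression.
z_single_level = wald_z(-0.041, 0.0021)

# How much larger the multilevel standard error is.
se_ratio = 0.0054 / 0.0021

print(f"z (two level model):    {z_multilevel:.1f}")
print(f"z (single-level model): {z_single_level:.1f}")
print(f"s.e. ratio:             {se_ratio:.2f}")
```

The point estimates are similar, but the multilevel standard error is about two and a half times larger, so the single-level analysis makes the CPI effect look far more precisely estimated than it is.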
Why did the estimated coefficients for some of the ethnic groups change so much when we move from multiple regression (where the estimate is a function of within and between group estimates) to a multilevel model (where the within and between regressions are assumed to be the same)? Perhaps the ethnic group estimates vary from ward to ward. In other words, perhaps there are random slopes. It would be a little difficult to allow each ethnic group to have its own random slope so instead let us look at a white/non white split.
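The statement that the multiple regression estimate is a function of the within and between group estimates can be checked on a small example. For a single explanatory variable, the total slope equals eta² times the between-group slope plus (1 - eta²) times the within-group slope, where eta² is the share of the variance of x that lies between groups. The sketch below uses made-up data for three hypothetical wards (not MCS data) in which the within-ward and between-ward slopes have opposite signs.

```python
from statistics import fmean


def slope_decomposition(groups):
    """Return (within, between, total, eta_sq) for the regression of y on x.

    `groups` maps a group label to a list of (x, y) pairs.
    """
    pooled = [pt for pts in groups.values() for pt in pts]
    grand_x = fmean([x for x, _ in pooled])
    grand_y = fmean([y for _, y in pooled])

    sxx_w = sxy_w = sxx_b = sxy_b = 0.0
    for pts in groups.values():
        mean_x = fmean([x for x, _ in pts])
        mean_y = fmean([y for _, y in pts])
        # Within-group sums of squares and cross-products.
        for x, y in pts:
            sxx_w += (x - mean_x) ** 2
            sxy_w += (x - mean_x) * (y - mean_y)
        # Between-group sums, weighted by group size.
        n = len(pts)
        sxx_b += n * (mean_x - grand_x) ** 2
        sxy_b += n * (mean_x - grand_x) * (mean_y - grand_y)

    within = sxy_w / sxx_w
    between = sxy_b / sxx_b
    total = (sxy_w + sxy_b) / (sxx_w + sxx_b)
    eta_sq = sxx_b / (sxx_w + sxx_b)
    return within, between, total, eta_sq


# Made-up data for three hypothetical wards.
toy_wards = {
    "A": [(1, 5), (2, 4), (3, 3)],
    "B": [(4, 8), (5, 7), (6, 6)],
    "C": [(7, 11), (8, 10), (9, 9)],
}
within, between, total, eta_sq = slope_decomposition(toy_wards)
weighted = eta_sq * between + (1 - eta_sq) * within
```

Here the within-ward slope is -1, the between-ward slope is 1, and the total slope of 0.8 is dominated by the between-ward relation because most of the variation in x lies between wards. A coefficient can therefore change markedly once ward is brought into the model.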
Table 4: Random Slopes Model

                                     Estimate (MLM)   s.e.
Non white                             0.75            0.10
95% coverage interval                -0.39 to 1.9
Between ward variance (intercept)     1.3             0.24
Between ward variance (slope)         0.33            0.15
Correlation: intercept & slope       -0.45
Within ward variance                  7.3             0.078

There are good reasons to suppose that some of the ward variation in both intercept and slope can be explained by the proportion of white respondents in the ward.
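The coverage interval in Table 4 can be approximately reproduced from the mean slope and the slope variance, assuming the ward-specific slopes are Normally distributed. The same variance components also give the between-ward variance for non white respondents, using Var(u0 + u1) = var(intercept) + 2 * cov(intercept, slope) + var(slope). This is a sketch under those assumptions; the function names are hypothetical, and the inputs are the rounded figures from the table, so small discrepancies from the published interval are expected.

```python
import math
from statistics import NormalDist


def coverage_interval(mean_slope, slope_var, level=0.95):
    """Range expected to contain `level` of the ward-specific slopes."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(slope_var)
    return mean_slope - half_width, mean_slope + half_width


def group_variance(intercept_var, slope_var, correlation, x=1.0):
    """Between-ward variance at value x of the random-slope variable."""
    covariance = correlation * math.sqrt(intercept_var * slope_var)
    return intercept_var + 2 * x * covariance + x ** 2 * slope_var


# Rounded estimates from Table 4.
low, high = coverage_interval(mean_slope=0.75, slope_var=0.33)

# x = 0 for white respondents, x = 1 for non white respondents.
var_white = group_variance(1.3, 0.33, -0.45, x=0.0)
var_non_white = group_variance(1.3, 0.33, -0.45, x=1.0)

print(f"95% coverage interval: {low:.2f} to {high:.2f}")
print(f"Between ward variance, white:     {var_white:.2f}")
print(f"Between ward variance, non white: {var_non_white:.2f}")
```

The interval comes out at roughly -0.38 to 1.9, close to the reported -0.39 to 1.9. The negative correlation between intercept and slope implies a somewhat smaller between-ward variance for non white respondents (about 1.0) than for white respondents (1.3).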
Table 5: Random Slopes Model with Proportion White

                                     Estimate (MLM)   s.e.
Non white                             1.5             0.24
Proportion white                      2.4             0.69
Non white*proportion white            [missing]       0.34
Between ward variance (intercept)     1.3             0.23
Between ward variance (slope)         0.26            0.14
Correlation: intercept & slope       -0.51
Within ward variance                  7.3             0.078
Conclusions
1. Multilevel modelling, carefully used, can throw light on complex processes and give us a better understanding of within and between group relations.
2. Our results show that there are differences between white and non white respondents in their perceptions of their neighbourhood. However, the differences between the two groups are more marked in wards with a low proportion of white respondents.