1 Studying place effects on health by synthesising individual and area-level outcomes using a new class of multilevel models
Nicky Best, Chris Jackson, Sylvia Richardson
Department of Epidemiology and Public Health, Imperial College, London
http://www.bias-project.org.uk

2 Outline
- Introduction and motivating example
- Models for analysing individual and contextual effects
  - Standard multilevel model
  - Ecological regression
  - Hierarchical related regression
- Concluding remarks

3 A: Introduction and motivating example

4 BIAS project: Overall goals
- To develop a set of statistical frameworks for combining data from multiple sources
- To improve our capacity to handle biases inherent in the analysis of observational data
- Key statistical tools: Bayesian hierarchical models and ideas from graphical models form the basic building blocks for these developments

5 Example: Socioeconomic predictors of health
Question
- Characterising individual-level socio-demographic predictors of limiting long term illness (LLTI) and heart disease
- Is there evidence of contextual effects?
Design
- Data synthesis using
  - Individual-level survey data: Health Survey for England
  - Area-level administrative data: Census small-area statistics and Hospital Episode Statistics
Methodological issues
- Sparse individual data per area (0-9 subjects per area), so it is difficult to estimate contextual effects
- Can't separate individual and contextual effects using only aggregate data (ecological bias)
- Improve power and reduce bias by combining data

6 B: Models for analysing individual and contextual effects

7 [Diagram comparing three designs]
- Target analysis: individual exposure $x_{ij}$ and aggregate exposure $Z_i$, $X_i$ → individual outcome $y_{ij}$
- Ecological regression: aggregate exposure $Z_i$, $X_i$ → aggregate outcome $Y_i$
- Hierarchical Related Regression (HRR): individual exposure $x_{ij}$ and aggregate exposure $Z_i$, $X_i$ → individual outcome $y_{ij}$ and aggregate outcome $Y_i$

8-12 Multilevel model for individual data
[Diagram: directed graph with plates for area $i$ and person $j$; nodes $y_{ij}$, $x_{ij}$, $Z_i$, $\alpha_i$, $\beta$, $\gamma$, $\sigma^2$]
- $y_{ij} \sim \text{Bernoulli}(p_{ij})$, person $j$, area $i$
- $\operatorname{logit} p_{ij} = \alpha_i + \beta x_{ij} + \gamma Z_i$
- $\alpha_i \sim \text{Normal}(0, \sigma^2)$
- Weak priors on $\sigma^2$, $\beta$, $\gamma$
- $\beta$ = individual-level effects
- $\gamma$ = contextual effects
- $\alpha_i$ = "unexplained" area effects
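
To make the model on these slides concrete, here is a minimal Python sketch that simulates one area's individual outcomes and evaluates their Bernoulli log-likelihood. The Greek letters were lost from the transcript, so the names used here (alpha_i for the area effect, beta for the individual-level effect, gamma for the contextual effect, sigma for the random-effect standard deviation) are assumed notation, and all numerical values are illustrative rather than estimates from the study.

```python
import math
import random


def inv_logit(v):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-v))


def simulate_area(alpha_i, beta, gamma, z_i, x_values, rng):
    """Draw y_ij ~ Bernoulli(p_ij) with logit p_ij = alpha_i + beta*x_ij + gamma*Z_i."""
    ys = []
    for x in x_values:
        p = inv_logit(alpha_i + beta * x + gamma * z_i)
        ys.append(1 if rng.random() < p else 0)
    return ys


def log_lik_area(ys, alpha_i, beta, gamma, z_i, x_values):
    """Bernoulli log-likelihood of one area's individual outcomes."""
    total = 0.0
    for y, x in zip(ys, x_values):
        p = inv_logit(alpha_i + beta * x + gamma * z_i)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


rng = random.Random(1)
sigma, beta, gamma = 0.3, 0.7, 0.4   # illustrative values, not study estimates
alpha_i = rng.gauss(0.0, sigma)      # area effect: alpha_i ~ Normal(0, sigma^2)
x_values = [0, 1, 1, 0, 1]           # binary individual exposures in one area
z_i = 1.0                            # area-level covariate (hypothetical value)
ys = simulate_area(alpha_i, beta, gamma, z_i, x_values, rng)
print(ys, log_lik_area(ys, alpha_i, beta, gamma, z_i, x_values))
```

With only a handful of subjects per area, as in the survey data described earlier, a log-likelihood like this carries little information about alpha_i or gamma, which is the motivation for bringing in aggregate data.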

13 Data sources
INDIVIDUAL DATA: Health Survey for England (ward codes made available under special license)
- Health outcomes: self-reported limiting long term illness; self-reported hospitalisation for heart disease
- Individual predictors: age and sex, ethnicity, social class, car access, income, etc.
AREA (WARD) DATA: Census small area statistics
- Contextual effect: Carstairs deprivation index

14 Results from analysis of individual survey data: Heart Disease (n=5226)
[Figure: estimates from univariate and multiple regression for area deprivation, no car, social class IV/V and non-white]

15 Results from analysis of individual survey data: Limiting Long Term Illness (n=1155)
[Figure: estimates from univariate and multiple regression for area deprivation, female, non-white and doubled income]

16 Comments
- Confidence intervals are wide and most effects are not significant
- Some evidence of a contextual effect of area deprivation for both heart disease and LLTI
  - Adjusting for individual risk factors (compositional effects) appears to explain the contextual effect for heart disease
  - Unclear whether the contextual effect remains for LLTI after adjustment for individual factors
  - Survey data lack power to provide reliable answers about contextual effects
- What can we learn from aggregate data?

17 Area-level data
AREA (WARD) DATA
- Census small area statistics
  - Carstairs deprivation index
  - population count by age and sex
  - proportion reporting LLTI
  - proportion non-white
  - proportion in social class IV/V
  - proportion with no car access
- PayCheck (CACI)
  - mean & variance of household income
- Hospital Episode Statistics
  - number of admissions for heart disease
These provide aggregate health outcomes, aggregate versions of individual predictors, and the contextual effect.

18 Ecological inference
Standard ecological model: $Y_i \sim \text{Binomial}(q_i, N_i)$; $\operatorname{logit}(q_i) = a_i + b X_i + c Z_i$
- $Y_i$ is the number of disease cases in area $i$
- $N_i$ is the population in area $i$
- $X_i$ is the mean of $x_{ij}$ in area $i$
- $q_i$ is the area-specific risk of disease
- $\exp(b)$ = odds ratio associated with mean exposure $X_i$
This is the group-level association. It is not necessarily equal to the individual-level association, i.e. $b \neq \beta$ → ecological bias

19-22 Standard ecological regression model
[Diagram: directed graph with a plate for area $i$; nodes $Y_i$, $N_i$, $X_i$, $Z_i$, $a_i$, $b$, $c$, $\sigma^2$]
- $Y_i \sim \text{Binomial}(q_i, N_i)$, area $i$
- $\operatorname{logit} q_i = a_i + b X_i + c Z_i$
- $a_i \sim \text{Normal}(0, \sigma^2)$
- Priors on $\sigma^2$, $b$, $c$

23 Comparison of individual and ecological regressions: Heart Disease
[Figure: individual and ecological estimates for area deprivation, no car, social class IV/V and non-white]

24 Comparison of individual and ecological regressions: Limiting Long Term Illness
[Figure: individual and ecological estimates for area deprivation, female, non-white and doubled income]

25 Ecological bias
Bias in ecological studies can be caused by:
- Confounding
  - confounders can be area-level (between-area) or individual-level (within-area)
  - Solution: try to account for confounders in model
- Non-linear exposure-response relationship, combined with within-area variability of exposure
  - No bias if exposure is constant in area (contextual effect)
  - Bias increases as within-area variability increases
  - …unless models are refined to account for this hidden variability
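
The second source of bias can be illustrated numerically. The sketch below assumes a logistic individual-level risk and a normally distributed exposure within the area (both are assumptions made for illustration, as are the intercept and slope values), and compares the true average risk with the risk obtained by plugging the area mean exposure into the individual-level curve.

```python
import math


def inv_logit(v):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-v))


def average_risk(mean_x, sd_x, intercept, slope, n_grid=2001):
    """Average individual risk over x ~ Normal(mean_x, sd_x^2) using a weighted grid sum."""
    if sd_x == 0.0:
        return inv_logit(intercept + slope * mean_x)
    lo = mean_x - 8.0 * sd_x
    step = 16.0 * sd_x / (n_grid - 1)
    total = 0.0
    weight_sum = 0.0
    for k in range(n_grid):
        x = lo + k * step
        w = math.exp(-0.5 * ((x - mean_x) / sd_x) ** 2)  # unnormalised normal density
        total += w * inv_logit(intercept + slope * x)
        weight_sum += w
    return total / weight_sum


intercept, slope, mean_x = -3.0, 1.0, 0.0       # illustrative values
plug_in = inv_logit(intercept + slope * mean_x)  # risk evaluated at the area mean
gaps = [average_risk(mean_x, sd, intercept, slope) - plug_in
        for sd in (0.0, 0.5, 1.0, 2.0)]
print(gaps)
```

The gap is zero when exposure is constant within the area and grows with the within-area standard deviation, which matches the two sub-points on the slide.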

26 Improving ecological inference
- Alleviate bias associated with within-area exposure variability.
- Obtain information on within-area distribution $f_i(x)$ of exposures, e.g. from individual-level exposure data.
- Use this to form a well-specified model for ecological data by integrating (averaging) the underlying individual-level model.
$Y_i \sim \text{Binomial}(q_i, N_i)$; $q_i = \int p_{ij}(x)\, f_i(x)\, dx$
- $q_i$ is average group-level risk
- $p_{ij}(x)$ is individual-level risk given covariates $x$
- $f_i(x)$ is distribution of exposure $x$ within area $i$ (or joint distribution of multiple exposures)

27 Improving ecological inference
Suppose we have a single binary covariate $x$
Individual-level model
- $\log p_{ij} = \alpha + \beta x_{ij}$ (log link assumed for simplicity)
- → $p_{ij} = e^{\alpha}$ if person $j$ is unexposed ($x_{ij}=0$)
- $p_{ij} = e^{\alpha+\beta}$ if person $j$ is exposed ($x_{ij}=1$)
Integrated group-level model
- $X_i$ = proportion exposed in area $i$ (mean of $x_{ij}$)
- $q_i$ = average risk (prevalence) of disease in area $i$ $= \sum_j p_{ij}/N_i = e^{\alpha}(1-X_i) + e^{\alpha+\beta} X_i$
For multiple covariates, need information on joint within-area distribution (not just marginal X's)
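
A small worked example may help show why the integrated form matters for the single binary covariate case. The sketch below writes the baseline log risk as alpha and the individual-level log relative risk as beta (assumed notation, since the Greek letters are missing from the transcript), uses made-up values for both, and takes two hypothetical areas with 10% and 30% exposed. It compares the slope from a naive model that treats log q_i as linear in X_i with the value recovered from the integrated model, in which q_i is linear in X_i.

```python
import math

alpha = math.log(0.05)  # baseline log risk (illustrative value)
beta = math.log(2.0)    # individual-level log relative risk (illustrative value)


def area_risk(prop_exposed):
    """Integrated model: q_i = e^alpha (1 - X_i) + e^(alpha + beta) X_i."""
    return (math.exp(alpha) * (1.0 - prop_exposed)
            + math.exp(alpha + beta) * prop_exposed)


x1, x2 = 0.1, 0.3                      # proportions exposed in two hypothetical areas
q1, q2 = area_risk(x1), area_risk(x2)  # noise-free area-level risks

# Naive ecological slope: treats log q_i as linear in X_i.
naive_b = (math.log(q2) - math.log(q1)) / (x2 - x1)

# Integrated model: q_i = e^alpha + e^alpha (e^beta - 1) X_i is linear in X_i,
# so the intercept and slope of that line identify alpha and beta.
slope = (q2 - q1) / (x2 - x1)
baseline = q1 - slope * x1             # equals e^alpha
beta_hat = math.log(1.0 + slope / baseline)

print(naive_b, beta_hat, beta)
```

In this noise-free setting the integrated model recovers beta exactly, while the naive slope overstates it; with real data both would be estimated with error.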

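28 Standard ecological regression model
[Diagram: directed graph with a plate for area $i$; nodes $Y_i$, $N_i$, $X_i$, $Z_i$, $a_i$, $b$, $c$, $\sigma^2$]
- $Y_i \sim \text{Binomial}(q_i, N_i)$, area $i$
- $\operatorname{logit} q_i = a_i + b X_i + c Z_i$
- $a_i \sim \text{Normal}(0, \sigma^2)$
- Priors on $\sigma^2$, $b$, $c$

29 Integrated ecological regression model
[Diagram: directed graph with a plate for area $i$; nodes $Y_i$, $N_i$, $X_i$, $Z_i$, $\alpha_i$, $\beta$, $\gamma$, $\sigma^2$]
- $Y_i \sim \text{Binomial}(q_i, N_i)$, area $i$
- $q_i = \int p_{ij}(x_{ij}, Z_i, \alpha_i, \beta, \gamma)\, f_i(x)\, dx$
- $\alpha_i \sim \text{Normal}(0, \sigma^2)$
- Priors on $\sigma^2$, $\beta$, $\gamma$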

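30 Combining individual and aggregate data
[Diagrams side by side]
- Multilevel model for individual data: plates for area $i$ and person $j$; nodes $y_{ij}$, $x_{ij}$, $Z_i$, $\alpha_i$, $\beta$, $\gamma$, $\sigma^2$
- Integrated ecological model: plate for area $i$; nodes $Y_i$, $N_i$, $X_i$, $Z_i$, $\alpha_i$, $\beta$, $\gamma$, $\sigma^2$

31 Combining individual and aggregate data
Hierarchical Related Regression (HRR) model
[Diagram: single graph with plates for area $i$ and person $j$; nodes $y_{ij}$, $x_{ij}$, $Y_i$, $X_i$, $N_i$, $Z_i$, $\alpha_i$, $\beta$, $\gamma$, $\sigma^2$]
Joint likelihood for $y_{ij}$ and $Y_i$ depending on shared parameters $\beta$, $\gamma$, $\sigma^2$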
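
One way to write out the joint likelihood mentioned on this slide is sketched below. It assumes that, given the parameters, the individual outcomes and the area counts can be treated as independent (plausible when the survey covers a very small fraction of each area's population, but an assumption nonetheless), and it uses assumed symbols: $\alpha_i$, $\beta$, $\gamma$, $\sigma^2$ for the shared parameters and $I$ for the number of areas.

```latex
\begin{align*}
L(\beta,\gamma,\sigma^2,\alpha_1,\dots,\alpha_I)
  &= \prod_{i=1}^{I}\Big[\,
     \underbrace{\prod_{j} p_{ij}^{\,y_{ij}}\,(1-p_{ij})^{1-y_{ij}}}_{\text{individual data}}
     \times
     \underbrace{\binom{N_i}{Y_i}\, q_i^{\,Y_i}\,(1-q_i)^{N_i-Y_i}}_{\text{aggregate data}}
     \Big],\\
\operatorname{logit} p_{ij} &= \alpha_i + \beta x_{ij} + \gamma Z_i,
  \qquad q_i = \int p_{ij}(x)\, f_i(x)\, dx,
  \qquad \alpha_i \sim \text{Normal}(0,\sigma^2).
\end{align*}
```

Both factors depend on the same $\alpha_i$, $\beta$ and $\gamma$, which is what lets the aggregate counts sharpen inference about individual-level effects.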

32 Extending HRR model to multiple covariates
[Diagram: HRR graph with plates for area $i$ and person $j$; nodes $y_{ij}$, $x_{ij}$, $Y_i$, $X_i$, $N_i$, $Z_i$, $\alpha_i$, $\beta$, $\gamma$, $\sigma^2$]

33 Extending HRR model to multiple covariates
[Diagram: as on the previous slide, with individual covariates $x_{ij1},\dots,x_{ijQ}$ and area-level means $X_{i1},\dots,X_{iQ}$]

34 Extending HRR model to multiple covariates
[Diagram: as on the previous slide, adding $\phi_i$ and the covariates $x_{dk1},\dots,x_{dkQ}$ of person $k$ in district $d$]

35 Extending HRR model to multiple covariates
- Suppose $x_1, \dots, x_Q$ are all binary variables
- $R = 2^Q$ possible combinations
- $\phi_i = [\phi_{i1}, \dots, \phi_{iR}]$ where $\phi_{ir}$ is the probability that an individual in area $i$ has covariate combination $r$ ($r = 1, \dots, R$)
- We estimate $\phi_i$ using the Q-way cross-tabulation of covariates in district $d(i)$ from the Sample of Anonymised Records (SAR)…
- …with the constraint that marginal probabilities for each covariate match observed ward proportions from the Census
  - Assumes within-district correlations are representative of within-ward correlations for all wards in a district
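
The slides do not say which algorithm was used to impose the marginal constraint. One standard technique for this kind of problem is iterative proportional fitting, sketched below for Q = 2 binary covariates; the district table and ward proportions are made-up numbers, and the function name is hypothetical.

```python
def fit_to_margins(district_table, row_target, col_target, n_iter=200, tol=1e-12):
    """Rescale a two-way table of joint probabilities until its margins match the targets.

    Uses iterative proportional fitting: alternately rescale rows and columns.
    """
    table = [row[:] for row in district_table]
    for _ in range(n_iter):
        # Rescale each row so row sums match the ward margin for covariate 1.
        for r, row in enumerate(table):
            s = sum(row)
            table[r] = [v * row_target[r] / s for v in row]
        # Rescale each column so column sums match the ward margin for covariate 2.
        for c in range(len(col_target)):
            s = sum(row[c] for row in table)
            for row in table:
                row[c] *= col_target[c] / s
        # Stop once the row margins are also still satisfied after the column step.
        if all(abs(sum(row) - t) < tol for row, t in zip(table, row_target)):
            break
    return table


# Hypothetical district-level joint distribution of two binary covariates
# (rows: covariate 1 = 0/1, columns: covariate 2 = 0/1).
district = [[0.50, 0.20],
            [0.10, 0.20]]
# Hypothetical ward proportions: 40% with covariate 1, 50% with covariate 2.
ward = fit_to_margins(district, row_target=[0.6, 0.4], col_target=[0.5, 0.5])
odds_ratio = (ward[0][0] * ward[1][1]) / (ward[0][1] * ward[1][0])
print(ward, odds_ratio)
```

Row and column rescaling leave the cross-product odds ratio of the table unchanged, so the fitted ward table keeps the district-level association between covariates while matching the ward margins. That is one concrete reading of the assumption stated in the last bullet of the slide.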

36 Combined data
AREA (WARD) DATA
- Census small area statistics, PayCheck (CACI), Hospital Episode Statistics
  - aggregate health outcomes
  - aggregate covariates (marginal)
INDIVIDUAL DATA
- Health Survey for England
  - health outcomes and covariates
  - ward code available under special license
- Sample of Anonymised Records (SAR)
  - 2% sample of individual data from Census
  - district code available
  - provides estimate of within-area distribution of covariates
  - assume same distribution for all wards within a district

37 Comparison of results from different regression models: Heart Disease
[Figure: estimates from individual, standard ecological, integrated ecological and HRR models for area deprivation, no car, social class IV/V and non-white]

38 Comparison of results from different regression models: Limiting Long Term Illness
[Figure: estimates from individual, standard ecological, integrated ecological and HRR models for area deprivation, female, non-white and doubled income]

39 Unexplained area variability in risk
- Random effects account for unexplained differences in risk between areas, after accounting for observed covariates
  - Large variance $\sigma^2$ → large unexplained differences
- Median odds ratio (Larsen & Merlo 2005) is a simple transformation of $\sigma^2$ to the scale of an odds ratio
  - $\text{MOR} = \exp\left(\sqrt{2\sigma^2}\,\Phi^{-1}(0.75)\right)$
  - MOR = median of the residual odds ratios over all pairs of areas
  - Directly comparable to odds ratio for an observed covariate
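
The median odds ratio is easy to compute from a random-effect variance. The sketch below implements the standard formula, MOR = exp(sqrt(2 * sigma^2) * Phi^{-1}(0.75)), together with its inverse; the function names are illustrative and the input value is made up.

```python
import math
from statistics import NormalDist

# 75th percentile of the standard normal distribution, Phi^{-1}(0.75).
Z75 = NormalDist().inv_cdf(0.75)


def median_odds_ratio(sigma2):
    """Median odds ratio for a random-intercept variance sigma2 on the log-odds scale."""
    return math.exp(math.sqrt(2.0 * sigma2) * Z75)


def variance_from_mor(mor):
    """Invert the MOR formula to recover the random-intercept variance."""
    return (math.log(mor) / Z75) ** 2 / 2.0


print(median_odds_ratio(1.0))   # MOR for a variance of 1 (illustrative input)
print(variance_from_mor(1.5))   # variance implied by an MOR of 1.5
```

A variance of zero gives MOR = 1, meaning no unexplained differences between areas, and the MOR rises monotonically with the variance.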

40 Unexplained area variability in risk of Heart Disease
[Figure: estimates from individual and HRR models for area deprivation, no car, social class IV/V, non-white and MOR]

41 Unexplained area variability in risk of LLTI
[Figure: estimates from individual and HRR models for area deprivation, female, non-white, doubled income and MOR]

42 Comments
- Integrated ecological model yields odds ratios that are consistent with individual-level estimates from survey
- Large gains in precision achieved by using aggregate data
- Significant contextual effect of area deprivation for LLTI but not heart disease
- For LLTI, unexplained area variation is small compared to that explained by deprivation (MOR=1.2, deprivation OR=2.6)
- For heart disease, there is more unexplained area variation (MOR=1.5)

43 Comments
- Little difference between estimates based on aggregate data alone and combined individual + aggregate data
  - Individual sample size very small (~0.1% of population represented by aggregate data)
- In other applications with larger individual sample sizes and/or less informative aggregate data, combined HRR model yields greater improvements (see simulation study)
- Care needed to check consistency between data sources

44 Simulation Study
[Figure: estimated log RR of disease for exposed compared with the true log RR, using individual data, area data, and area data + sample of 10 individuals; panels by range of % exposed across areas: 0-25% (100 areas), 0-50% (100 areas), 0-100% (100 areas), 0-25% (25 areas)]

45 Are aggregate and individual data consistent?
[Figure: Health Survey for England, aggregated over areas, compared with 1991 Census]

46 LLTI
  - Health Survey for England: 23%
  - Census: 13%
- Similar discrepancies noted by other authors
- May reflect differences between interview and self-completed surveys
- Remedy: include fixed offset in regression model for Census data

47 C: Concluding remarks

48 Concluding remarks
- Aggregate data can be used for individual-level inference if an appropriate integrated model is used
  - requires large exposure contrasts between areas
  - requires information on within-area distribution of covariates
- Combining samples of individual data with administrative data can yield improved inference
  - improves ability to investigate contextual effects
  - increases statistical power compared to analysis of survey data alone
  - requires geographical identifiers for individual data
- Important to check compatibility of different data sources when combining data
- Important to explore sensitivity to different model assumptions and data sources

49 Jackson C, Best N and Richardson S. (2008) Studying place effects on health by synthesising individual and area-level outcomes. Social Science and Medicine, to appear. Papers available from www.bias-project.org.uk Thank you for your attention!

