Creating synthetic sub-regional baseline populations
Dr Paul Williamson
Dept. of Geography, University of Liverpool
Collaborators: Robert Tanton (NATSEM, Australia); Ludi Simpson (CCSR, UK); Maja Zaloznik (Liverpool, UK)
1. Context
a) What do we want?
   Local area microdata containing local-area distributions [e.g. smoking by income by sub-region]
b) What have we got?
Large-scale survey
Over-exaggerate problem?
2% sample
Minimally multivariate
Not based on minorities (e.g. unemployed ethnic minority)
Min. geog. threshold: 120k
Decadal
Solution
Survey distribution [smoking x income]
Local smoking distribution
Local income distribution
Reweight survey data... BUT weighting DOWN instead of up
Synthetic microdata
2. IPF (Raking)
N.B. IPF = Raking = IPF
Understanding IPF…
Q. What is IPF/Raking doing?
A. Preserving the odds ratios...
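The odds-ratio answer can be checked numerically. What follows is a minimal sketch of two-way IPF in plain Python, not the implementation used in the work presented. The survey table, the local margins and the helper names `ipf` and `odds_ratio` are all invented for illustration.

```python
def ipf(seed, row_targets, col_targets, iterations=100):
    """Rake a two-way table so that its margins match the targets."""
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            table[i] = [cell * target / total for cell in table[i]]
        # Scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / total
    return table


def odds_ratio(t):
    """Cross-product ratio of a 2 x 2 table."""
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])


# Illustrative survey cross-tabulation (assumed numbers).
# Rows: smoker, non-smoker. Columns: low income, high income.
survey = [[30.0, 10.0], [20.0, 40.0]]

# Illustrative local margins (assumed numbers).
local = ipf(survey, row_targets=[25.0, 75.0], col_targets=[60.0, 40.0])

print(odds_ratio(survey), odds_ratio(local))
```

Each pass only multiplies whole rows or whole columns by a constant, so the cross-product ratio of the survey table carries over to the local estimate while the margins move to the local targets.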
CAVEAT: variation independence
3. Combinatorial Optimisation
TARGET:
   Male 5, Female 5
   Young 2, Old 8
Guided incremental weight adjustment
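A minimal sketch of the idea, assuming the target margins are Male 5, Female 5, Young 2, Old 8. The survey records, the function names and the search strategy are assumptions: a plain hill-climb is swapped in for brevity, and it is not claimed to be the search used in the work presented. Each record's weight is the number of times it is selected, so every accepted swap is an incremental integer weight adjustment.

```python
import random

# Hypothetical survey records (sex, age group), invented for illustration.
SURVEY = [
    ("male", "young"), ("male", "old"), ("female", "young"), ("female", "old"),
    ("male", "old"), ("female", "old"), ("male", "young"), ("female", "young"),
]
# Target margins: Male 5, Female 5, Young 2, Old 8.
TARGET = {"male": 5, "female": 5, "young": 2, "old": 8}


def total_absolute_error(selection):
    """Sum of absolute differences between selected and target counts."""
    counts = dict.fromkeys(TARGET, 0)
    for idx in selection:
        sex, age = SURVEY[idx]
        counts[sex] += 1
        counts[age] += 1
    return sum(abs(counts[k] - TARGET[k]) for k in TARGET)


def combinatorial_optimisation(size=10, max_steps=5000, seed=0):
    """Hill-climb over integer selections of survey records."""
    rng = random.Random(seed)
    selection = [rng.randrange(len(SURVEY)) for _ in range(size)]
    error = total_absolute_error(selection)
    for _ in range(max_steps):
        if error == 0:
            break
        # Propose replacing one selected record with a random survey record.
        pos = rng.randrange(size)
        old = selection[pos]
        selection[pos] = rng.randrange(len(SURVEY))
        new_error = total_absolute_error(selection)
        if new_error <= error:
            error = new_error      # keep the swap
        else:
            selection[pos] = old   # undo the swap
    return selection, error


selection, error = combinatorial_optimisation()
print(error, [SURVEY[i] for i in selection])
```

The guidance comes from the acceptance rule: a swap is kept only if it does not increase the misfit to the target margins.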
4. IPF/Raking v CO
Comparison for margin-constrained tables
Target: age x sex x tenure x economic position (64 counts) at district level (17 districts)
[Table: % NFC (17 district average); columns: % SAR, IPF U, IPF N, CO]
Simpson & Tranmer (2005)
Target: Car ownership (2) x Tenure (3) (6 counts; 3%s) for residents at ward level
5. GREGWT
Understanding GREGWT…
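As a point of reference, generalised regression weighting in its simplest linear form adjusts each design weight d_i to w_i = d_i (1 + x_i . lambda), with lambda chosen so that the weighted auxiliary totals hit the targets. The sketch below shows that one-step linear calibration for two auxiliary variables only. It is not the GREGWT routine: GREGWT is understood to add bounds on the weights and to iterate, and none of that is reproduced here. The records, targets and the function name `linear_calibration` are assumptions for illustration.

```python
def linear_calibration(design_weights, aux, targets):
    """Linear (chi-square distance) calibration for two auxiliary variables.

    Returns weights w_i = d_i * (1 + x_i . lam) such that
    sum_i w_i * x_i equals the targets.
    """
    # Build M = sum_i d_i x_i x_i^T and the shortfall r = targets - sum_i d_i x_i.
    m00 = m01 = m11 = 0.0
    r0, r1 = targets
    for d, (x0, x1) in zip(design_weights, aux):
        m00 += d * x0 * x0
        m01 += d * x0 * x1
        m11 += d * x1 * x1
        r0 -= d * x0
        r1 -= d * x1
    # Solve the 2 x 2 system M lam = r by Cramer's rule.
    det = m00 * m11 - m01 * m01
    lam0 = (r0 * m11 - m01 * r1) / det
    lam1 = (m00 * r1 - m01 * r0) / det
    return [d * (1.0 + x0 * lam0 + x1 * lam1)
            for d, (x0, x1) in zip(design_weights, aux)]


# Four hypothetical survey records with x = (1, is_young).
design_weights = [2.5, 2.5, 2.5, 2.5]
aux = [(1, 1), (1, 0), (1, 1), (1, 0)]

# Illustrative targets: 10 people in total, 2 of them young.
weights = linear_calibration(design_weights, aux, targets=(10.0, 2.0))
print(weights)
```

Unlike the integer selections of combinatorial optimisation, the resulting weights are fractional, and in less well-behaved examples the unbounded linear form can produce negative weights.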
6. GREGWT v CO
Fit to constraint variables (74 counts): GREGWT ‘convergent’ SLAs in NSW
[Table: measure of fit (OTAE, OTAE/HH, OTAPE, OR Z) for GREGWT and CO (min. R Z 2); the only surviving value is 0.1 for OTAE/HH]

Fit to constraint variables (74 counts): GREGWT ‘NON-convergent’ SLAs in NSW
[Table: measure of fit (OTAE, OTAE/HH, OTAPE, OR Z) for GREGWT and CO (min. R Z 2)]
Fit to margin-constrained distribution (household income x mortgage/rent): GREGWT ‘convergent’ SLAs

Unaffordable households (n)
State                      ABS (Census)     GREGWT    CO (min. R Z 2)
Aust. Capital Territory           5,526      6,147              5,924
New South Wales                 169,823    194,394            191,720
Combined                        175,349    200,541            197,644

Unaffordable households (%)
[Rows: Aust. Capital Territory, New South Wales, Combined; values not recoverable]
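The household counts on this slide can be turned into differences from the Census figure with a few lines of arithmetic. The sketch below uses only the counts (n) given on the slide. The dictionary layout and the helper name `differences` are assumptions, and the percentage computed is the difference relative to the Census count, not the missing "Unaffordable households (%)" rows.

```python
# Unaffordable household counts (n) as given on the slide.
counts = {
    "Aust. Capital Territory": {"census": 5526, "GREGWT": 6147, "CO": 5924},
    "New South Wales": {"census": 169823, "GREGWT": 194394, "CO": 191720},
    "Combined": {"census": 175349, "GREGWT": 200541, "CO": 197644},
}


def differences(row, method):
    """Absolute and percentage difference of an estimate from the Census count."""
    absolute = row[method] - row["census"]
    return absolute, 100.0 * absolute / row["census"]


for area, row in counts.items():
    for method in ("GREGWT", "CO"):
        absolute, percent = differences(row, method)
        print(f"{area:25s} {method:7s} {absolute:+7d} {percent:+6.1f}%")
```

Both sets of estimates sit above the Census counts in every row, with the CO figures closer to the Census than the GREGWT figures.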
7. Variation independence (again...)
UNIVARIATE constraints (158 constrained counts)
1: Age x Sex
2: Marital status
3: Country of Birth
4: Ethnicity
5: Religion
6: Health
7: Unpaid care
8: Long-term illness
9: Migration
10: Qualifications
11: Time since last worked
12: Economic activity
13: NS-SEC
14: NS-SEC of Ref. Person
15: Distance to work
16: Mode of travel to work
17: Hours worked
18: Accommodation type
19: Tenure
20: Family type
21: No of people in h/hold
22: Comm. Estab. Res. status
23: Communal estab. type
24: Persons per room
25: Household amenities
26: Occupancy rating
27: Floor level
28: Cars in household
29: Households
BIVARIATE constraints (586 constrained counts)
% of non-fitting synthetic combinations: PARTIALLY CONSTRAINED DISTRIBUTIONS
Columns: (1) Rural (South West); (2) ‘Middling England’ (East Midlands); (3) Deprived industrial (North); (4) Deprived urban (Outer London)

Distribution                              (1)    (2)    (3)    (4)
SEG / Household composition                 0      0      0      0
SEG / Rooms
Household composition / Dependants          0      0      0      0
Dependants / Tenure                         0      0      0      0
Sex / marital status / tenure
Illness / sex                               0    1.5      0      0
% of non-fitting synthetic combinations: UNCONSTRAINED DISTRIBUTIONS
Columns: (1) Rural (South West); (2) ‘Middling England’ (East Midlands); (3) Deprived industrial (North); (4) Deprived urban (Outer London)

Distribution                              (1)    (2)    (3)    (4)
Economic activity / age / sex               0      0      0      0
Migration / age
Cars / adults
Headship / age / sex / marital status       1      0      0      0
Ethnic group / country of birth
Qualifications / age / sex
8. Conclusion
(a) Accuracy of estimates (fitness for purpose?)
(b) Unanswered questions
(c) Applications in the real world…
   Survey data
   [District-level socio-demographics]
   Local socio-economics
   Estimated GP Patient socio-economic characteristics
   GP Patient age, sex, location
   Estimated HE Student socio-economic characteristics
   HE Student age, sex, location