Matching of administrative data to validate the 2011 Census in England and Wales
NRS & RSS, Edinburgh, October 2012
AGENDA
- Context: 2011 Census quality assurance and the role of administrative data
- Data matching challenges and solutions
- Data to be matched
- Matching methods and interpretation
- Substantive results so far...
An overview of the methods
[Diagram; labels in slide order]
Product: 5 yr age/sex, CCS areas; 5 yr age/sex, EA/LA level; 1 yr age/sex, OA level
Method: DSE, bias adj, overcount; DSE, bias adj, overcount; ratio estimator, nat adj; coverage imputation
Quality assurance: supplementary analysis; core checks; Main QA Panel; High Level QA Panel; First Release; QA review and sign-off
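DSE in the overview stands for dual system estimation. As a rough illustration only, the basic two-list estimator divides the product of the two counts by the number of records found on both lists. The function and argument names below are hypothetical, and the bias adjustment and overcount steps named on the slide are not shown.

```python
def dual_system_estimate(census_count, ccs_count, matched_count):
    """Basic dual system (capture-recapture) estimate of a population.

    census_count  -- people counted by the Census in the area
    ccs_count     -- people counted by the coverage survey in the same area
    matched_count -- people found in both counts
    """
    if matched_count <= 0:
        raise ValueError("at least one matched record is needed")
    return census_count * ccs_count / matched_count


# Example: 900 Census records, 100 survey records, 90 of them in both.
estimate = dual_system_estimate(900, 100, 90)
```

In practice such estimates would be computed separately for each age/sex group and area, which is why the slide lists them by geography.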
Challenges and solutions
Issue | Solution
Matching limited to small QA ‘window’ | Match selected LAs ahead of QA
Some data not available in advance | Flexible data architecture so new sources can be added
Research questions only emerge during QA | Stratified approach to matching so the methods were tailored to the questions
Scale of matching task potentially huge | Initially restrict matching to CCS postcode clusters
One:many address matches | Revised address data architecture
Data to be matched
Census | Non-Census
Post-out Address Register | NHS Patient Register
Address Register History File | Higher Education Statistics Agency (HESA) data
Census returns | English and Welsh School Censuses
‘Associated Address’ data | Electoral Registers
Census Management Information System | Valuation Office Agency data
Methods
- Data cleaning, de-duplication, standardisation, quality analysis
- Definitional alignment with Census enumeration base
- Exact matching (dwelling: address / person: name, DoB, gender and postcode)
- Score-based address matching
- Probabilistic person matching
- Clerical resolution of candidate pairs from automatch
- Clerical search for unmatched residuals
- Resolution of unmatched residuals against the Address Register History File and Census ‘associated addresses’
- Evidence-based assessment of residuals
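The person-matching steps above can be sketched as a small pipeline: standardise, match exactly, then score the residuals and send borderline pairs to clerical review. The slides do not say which probabilistic model was used; the sketch below assumes Fellegi-Sunter style agreement weights, and the `Person` fields, the m/u probabilities and the two thresholds are all hypothetical values chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    dob: str        # ISO date string, e.g. "1990-04-27"
    gender: str
    postcode: str


def standardise(p):
    """Normalise case and whitespace so trivially different values compare equal."""
    return Person(
        name=" ".join(p.name.upper().split()),
        dob=p.dob.strip(),
        gender=p.gender.strip().upper(),
        postcode="".join(p.postcode.upper().split()),
    )


# Hypothetical probabilities, for illustration only:
# m = P(field agrees | same person), u = P(field agrees | different people).
M_U = {
    "name": (0.95, 0.01),
    "dob": (0.97, 0.003),
    "gender": (0.99, 0.5),
    "postcode": (0.90, 0.001),
}


def match_weight(a, b):
    """Sum of log2 agreement or disagreement weights over the four fields."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if getattr(a, field) == getattr(b, field):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def link(census, register, upper=10.0, lower=0.0):
    """Exact match first, then score the residual register records.

    Returns (matches, clerical, unmatched). Pairs scoring between the two
    assumed thresholds go to clerical review. The sketch does not enforce
    one-to-one matching and assumes a non-empty census list.
    """
    census = [standardise(p) for p in census]
    census_set = set(census)
    matches, clerical, unmatched = [], [], []
    for r in (standardise(p) for p in register):
        if r in census_set:                       # exact match on all fields
            matches.append((r, r))
            continue
        best = max(census, key=lambda c: match_weight(r, c))
        w = match_weight(r, best)
        if w >= upper:
            matches.append((r, best))             # automatch
        elif w >= lower:
            clerical.append((r, best, w))         # candidate pair for review
        else:
            unmatched.append(r)                   # residual
    return matches, clerical, unmatched
```

Comparing every residual against every Census record is only workable on small sets, which fits the earlier point about restricting matching to postcode clusters.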
Interpretation: Who is actually present?
Non-URs
- Census non-usual residents (matched and unmatched to PR)
- PR records unmatched to Census respondents and assessed as not present:
  - Matched to address deactivated in the field
  - Matched to unoccupied or vacant/absent/2nd res dummy
  - Matched to ARHF invalid address
  - UR elsewhere, this is Usual Address 1 Year Ago
  - Matched to Census UR elsewhere
Unaccounted
- Unmatched and unaccounted for
PR records unmatched to Census respondents and assessed present
- PR matched to Census missed/unaccounted-for address
- PR matched to address with ‘occupied’ dummy
- PR validated through other administrative sources
PR/Census confirmed URs
- PR/Census matched records
- Census URs unmatched to PR
Data mining to address specific Census/PR anomalies
University Hall of Residence: GP registrations / Hall capacity
Female students living in halls in April 2011 by NHS Authority acceptance date
Male students living in halls in April 2011 by NHS Authority acceptance date
LA summary: proportion of F4s and proportion unresolved, within CCS postcode clusters
LA summary: concentration of Flag 4s in the PR residual
LA summary: LA types, residual size and Flag 4s
Further investigations
- Planned analysis of the PR residuals’ addresses and households to identify ‘ghost’ records
- Longitudinal matching of the 2012 Patient Register to 2011 data to identify registrations that have been cancelled by GP practices in the year following Census
- Cluster analysis of all E&W LAs to see whether the typology of LAs identified through matching is mirrored in list inflation patterns nationally
- Multi-level modelling to summarise results, with individual and area level explanatory variables
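The longitudinal check listed above could be sketched as a simple comparison of two register extracts. This assumes each record carries a stable patient identifier shared by both years, which the slides do not confirm; the function and argument names are hypothetical.

```python
def dropped_from_register(pr_2011_ids, pr_2012_ids, residual_ids):
    """Return residual IDs present on the 2011 register but absent in 2012.

    pr_2011_ids  -- identifiers on the 2011 Patient Register extract
    pr_2012_ids  -- identifiers on the 2012 Patient Register extract
    residual_ids -- 2011 records left unmatched to the Census
    """
    in_2011 = set(pr_2011_ids)
    in_2012 = set(pr_2012_ids)
    return sorted(i for i in set(residual_ids) if i in in_2011 and i not in in_2012)


# Example with made-up identifiers.
dropped = dropped_from_register(
    ["P1", "P2", "P3", "P4"],
    ["P1", "P3"],
    ["P2", "P3"],
)
```

Absence from the later extract is only a proxy: a record can also drop off because of a move or a death, so cancellations by GP practices would need to be separated out using whatever removal reason the register holds.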