Presentation is loading. Please wait.

Presentation is loading. Please wait.

Beyond 2011 The future for population statistics? IMA Mathematics 2012 Pete Benton Beyond 2011 Programme Director Office for National Statistics.

Similar presentations

Presentation on theme: "Beyond 2011 The future for population statistics? IMA Mathematics 2012 Pete Benton Beyond 2011 Programme Director Office for National Statistics."— Presentation transcript:

1 Beyond 2011 The future for population statistics? IMA Mathematics 2012 Pete Benton Beyond 2011 Programme Director Office for National Statistics

2 Outline Background to the Census The Beyond 2011 Programme Statistical options for the future Key mathematical challenges Timeframes Next steps

3 The purpose of the census The basis for national decision making: Service planning where to locate schools, hospitals, etc. housing plans transport Resource allocation health and local govt £100bn each per year Policy making and monitoring Equality – age, sex, ethnicity, disability Ageing population – pensions etc Academic and social research

4 Key Census outputs Benchmark statistics on: Population units: people and housing with key demographics (age, sex, ethnicity) Population structures: households, families Population and housing attributes For small areas and small population groups With multivariate analysis Consistent and comparable

5 The 2011 Census Very successful - 94% response overall - Over 90% across London overall - Over 80% response in every Local Authority Significant improvement in key Local Authorities The result of extensive mathematical modelling - Response targets to achieve required output quality - Predicted initial response from key groups / areas - Numbers of field staff required to reach final targets - Daily live response rate modelling to support operational decisions

6 The Beyond 2011 Programme Why change? – Why look beyond 2011? Rapidly changing society Evolving user requirements New opportunities – data sharing Traditional census – costly and infrequent?? UK Statistics Authority to Minister for Cabinet Office “As a Board we have been concerned about the increasing costs and difficulties of traditional Census-taking. We have therefore already instructed the ONS to work urgently on the alternatives, with the intention that the 2011 Census will be the last of its kind.”

7 Beyond 2011 : Statistical options Aggregate analysis 100% linkage to create ‘statistical population spine’ (Intermediate) Sample linkage e.g. 1% of postcodes Address register + Survey Administrative data options Traditional Census (long form to everyone) Rolling Census (over 5/10 year period) Short Form (everyone), Long form (Sample) Short Form + Annual Survey (US model) Census options Survey option(s)

8 SOURCESFRAMEDATA ESTIMATION OUTPUTS All National to Small Area Beyond 2011 – statistical options Population Data Socio demographic Attribute Data Address Register Household Communal Maintained national address gazetteer – provides frame for population data & surveys Population estimates Attribute estimates Interactional Analysis E.g. TTWA Longitudinal data Household structure etc CENSUS Adjusting for Adjusting for non response Coverage Assessment incl. under & over-coverage - by survey and admin data? missing data and error bias in survey (or sources) Quality measurement Population distribution provides weighting for attributes Socio demographic Survey(s ) Admin Source Commercial sources? Comm Source ?? increasing later? Surveys to fill gaps

9 Potential data sources Population data NHS Patient Register DWP/HMRC Customer Information System Electoral roll (> 17 yrs) School Census (5-16 yrs) Higher Education Statistics Agency data (Students) Birth and Death registrations Socio-demographic sources Surveys DVLA? Commercial sources? Utilities? TV licensing?

10 DWP CIS population counts compared with ONS Mid Year population estimates

11 Patient Register population counts compared with ONS Mid Year population estimates

12 Electoral Roll population counts compared with ONS Mid Year population estimates

13 Higher Education Students Customer Information System Coverage Of Main Administrative Sources Extras includes: Some duplicates International students on short-term courses Students ceased studying, not formally deregistered Extras includes: Short-term migrant children Missing includes: Under 17s Ineligible voters Non responders Missing includes: Non school aged people Independent school children Home schooled children Missing includes: Some migrant worker dependants Some international students Undocumented asylum seekers Missing includes: Migrants not (yet) registered Newborn babies Some private only patients Missing includes: Non higher education students Independent University students Extras includes: Some duplicates Some ex-pats Some deceased Short-term migrants Extras includes: Multiple registrations Some ex-pats Some deceased Short-term migrants Extras includes: Some ex-pats Some deceased Short-term migrants Missing includes: Non-drivers Under 17’s Some foreign-licence holders Extras includes: Some ex-pats Some deceased UK Driving Licence Resident Population CIS PRD Electoral Roll Patient Register Data School Census SC ER SC ER DVLA HESA CIS PRD

14 Key risks of non census alternatives Public opinion Technical challenge Changes in administrative datasets UK harmonisation Getting a decision

15 Key mathematical challenges Methods for Production of statistics Coverage assessment and adjustment Data matching Correcting for missing data Small area population attribute modelling Methods for Protection of confidentiality Data pre-processing and encryption Statistical Disclosure Control Evaluation Quantifying financial benefits Defining what is an ‘acceptable’ level of quality

16 Coverage assessment How many fish in your pond? Day 1, catch 100, tag them, put them back Day 2, catch 50, find 25 already tagged How many fish in your pond? Answer: 200 (ish) According to day 2, half in the pond are marked We marked 100, so there must be about 200 altogether “Dual System Estimation”

17 Application to the census We ‘fish’ twice, in 1% of postcodes Census Then census coverage survey (CCS) 6 weeks later No need for tags They have names, addresses, dates of birth We match the two separate lists of people (500k) to work out What percentage of people in the CCS had first been ‘caught’ in the census Thus, the total population in each postcode

18 Coverage adjustment Apply the adjustment factor to the other 99% of postcodes where we did no CCS With appropriate stratification Add ‘synthetic’ records Extra households Extra people With the right key characteristics In roughly the right locations Using ‘Donor imputation’ to complete each record So that all the final tables add up to the right number

19 Dual system estimation - formulae Counted By CCS? Yes NoTOTAL Counted Yesn 11 n 10 n 1+ By Census? Non 01 n 00 n 0+ TOTALn +1 n +0 n ++ Total population n ++ = n 1+  n +1 n 11 We can make life very complicated for people who aren’t mathematicians!

20 Application to administrative data Administrative data sources also have undercount But the bigger problems are due to time lags - Emigration; deaths Results in overcount in administrative sources - Internal migration Results in people recorded in the wrong location - overcount in one area, undercount in another Just applying Dual System Estimation would result in significant over-estimation

21 Potential overcount estimation approaches (1) Redesigned coverage survey asking: who usually lives here? when did you move in? where are you registered to vote? where are you registered with a GP? who lived here before you? where do they live now? does John Smith still live here? Increasing sensitivity Reducing appropriateness / legality

22 Potential overcount estimation approaches (2) Match new coverage survey to admin data Measure coverage patterns, develop models Intermediate model Match records only in CS postcodes Full linkage model Match records in all sources across all postcodes Keep records if same location on all datasets => more likely to be correct Particularly if recently recorded ‘activity’ Develop intelligent rules to resolve residual records Reduces scale of overcount - but increases undercount

23 Small Area Estimation Surveys only give sufficient precision at relatively high levels of geography Users require information at lower levels Census ‘output area’ ~ 125 households / 300 people SAE - family of methods to increase precision of survey estimates at lower geographies by “borrowing strength” from other, more detailed data sources, or neighbouring areas Widely used by National Statistical Institutes e.g. unemployment, income, households in poverty - but generally univariate, estimating means

24 CVsSample size= 1,000,000 people Prevalence 0.5%1%5%10%15%20%50% Population size National 50,000,0001.4%1.0%0.4%0.3%0.2% 0.1% Region 5,500,0004.3%3.0%1.3%0.9%0.7%0.6%0.3% LA 150,00025.8%18.2%8.0%5.5%4.3%3.7%1.8% LA (small) 50,00044.6%31.5%13.8%9.5%7.5%6.3%3.2% MSOA (avg) 7,200117.6%82.9%36.3%25.0%19.8%16.7%8.3% MSOA (min) 5,000141.1%99.5%43.6%30.0%23.8%20.0%10.0% LSOA (avg) 1,600249.4%175.9%77.1%53.0%42.1%35.4%17.7% LSOA (min) 1,000315.4%222.5%97.5%67.1%53.2%44.7%22.4% OA 300575.9%406.2%178.0%122.5%97.2%81.6%40.8% Ward (Eng) 7,000119.2%84.1%36.8%25.4%20.1%16.9%8.5% Ward (Wales) 3,500168.6%118.9%52.1%35.9%28.5%23.9%12.0% Precision of direct survey outputs

25 Potential components (Very?) Large survey Administrative sources aggregate (area based) or unit record available for lower geographic levels than survey outputs Possible models Generalised Linear Models (GLM): multi-level models spatial / temporal extensions can add power Bayesian or frequentist estimation frameworks Micro-simulation

26 Small area modelling - issues Quality of ancillary data is absolutely critical Most existing applications use census covariates More powerful models incorporate time and space effects, but are more complex Every variable is different, and requires different models There’s often no substitute for geography as a predictor ‘similar people gather in similar areas’ BUT clear academic view – the methods exist, it just depends on data

27 201520162017201820192020202120222023 population estimates population characteristics outputs detailed design procure / develop develop / test ADMIN DATA SOLUTION 201520162017201820192020202120222023 detailed design procure / develop develop / test rehearserunoutputs TRADITIONAL CENSUS SOLUTION 2011201220132014 research / definition initiation BEYOND 2011 ‘Phase 1’ Sept 2014 recommendation & decision point Beyond 2011 - Timeline - the key decision

28 201520162017201820192020202120222023 population estimates population characteristics outputs detailed design procure / develop develop / test 2011201220132014 research / definition initiation 2024 address register admin sources required on an ongoing basis – ideally the National Address Gazetteer – subject to confirmation of quality public sector & commercial ? developing over time coverage surveys testing continuous assessment attribute surveys info from existing surveys – e.g. labour force survey, integrated household survey etc supplemented by new targeted surveys as required modelling increasing modelling over time Beyond 2011 - Timeline (non census solution) test linkage increasing linkage over time

29 202720282029203020312032203320342035 2023202420252026 2036 address register required on an ongoing basis administrative sources will change and disappear and be added & develop over time continuous coverage survey existing surveys increasing linkage over time increasing modelling over time need for attribute surveys declines over time ? 20372038 regular production of population and attribute estimates ongoing methodology refinement Beyond 2011 - and into the future

30 Improving quality & quantity accuracy of population estimates accuracy of characteristics estimates range of topics small area detail multivariate small area detail experimental statistics develop to become national statistics 201320312021

31 Statistical benefit profile 2011202120312041 Benefit CensusAlternative method loss gain loss gain

32 Cost profile (real terms) 2011202120312041 Cost Census ??? Alternative method

33 Next steps Research potential methods and models Using census data To understand coverage patterns in admin data To simulate new survey designs As a gold standard – how well can we replicate census results? Assess quality, costs, benefits, risks Discuss with stakeholders (!) Public acceptability research Report progress every six months Make recommendations in 2014

34 Advice and assistance very welcome!

Download ppt "Beyond 2011 The future for population statistics? IMA Mathematics 2012 Pete Benton Beyond 2011 Programme Director Office for National Statistics."

Similar presentations

Ads by Google