Statistical Research Update Becky Tinsley Louise Morris.


1 Statistical Research Update Becky Tinsley Louise Morris

2 Overview
- Brief reminder of what you saw last April
- Update on the research we have been doing on population estimates using admin data
- Findings from some case studies we have undertaken to produce statistics about population characteristics using admin data

3 Population Estimates

4 Framework for producing population estimates using linked admin data

5

6 Last time... on matching
Recall = percentage of true matches made, out of all possible true matches available
Precision = percentage of true matches, out of all the matches that Beyond 2011 made
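The two definitions on this slide can be sketched in a few lines of Python. This is only an illustration, assuming matches are represented as sets of record-pair identifiers; the function name and the example pairs are hypothetical and are not taken from the Beyond 2011 matching work.

```python
def precision_recall(made_matches, true_matches):
    """Return (precision, recall) as percentages for two sets of record pairs."""
    made = set(made_matches)
    truth = set(true_matches)
    correct = made & truth  # matches that were made and are true
    precision = 100.0 * len(correct) / len(made) if made else 0.0
    recall = 100.0 * len(correct) / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical example: four true matches exist, three matches were made,
# and two of the matches made are true.
true_pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
made_pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
precision, recall = precision_recall(made_pairs, true_pairs)
```

Here precision is two out of the three matches made, and recall is two out of the four true matches available.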

7 New developments – associative matching

8

9 Framework for producing population estimates using linked admin data

10 Last time... on SPDs
Diagram labels: NHS patient register; DWP/HMRC Customer Information System; HESA data (students); 1% coverage survey; Statistical Population Dataset (SPD); population estimates

11 SPD 5
Legend: admin data method lower than 2011 Census; admin data method higher than 2011 Census

12 Population Pyramids using admin data

13 New developments – SPDs by quinary age, sex, and LA

14 New developments – SPDs at OA level
Chart axes: Percentage of OAs; Percentage difference from 2011 Census Estimates
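As a rough illustration of the quantities on this chart, the sketch below computes a percentage difference from the Census for each area and the share of areas within a tolerance. It assumes the difference is measured as (admin minus Census) divided by Census, which the slides do not state explicitly; the function names, area codes, and counts are hypothetical.

```python
def percentage_difference(admin_count, census_count):
    """Percentage difference of the admin-based count from the Census count."""
    return 100.0 * (admin_count - census_count) / census_count


def share_within(differences, tolerance):
    """Percentage of areas whose absolute difference is within the tolerance."""
    within = sum(1 for d in differences if abs(d) <= tolerance)
    return 100.0 * within / len(differences)


# Hypothetical (admin, census) counts for four output areas.
areas = {"OA1": (310, 300), "OA2": (285, 300), "OA3": (300, 300), "OA4": (360, 300)}
diffs = {oa: percentage_difference(a, c) for oa, (a, c) in areas.items()}
share = share_within(list(diffs.values()), tolerance=5.0)
```

A negative difference means the admin data method is lower than the Census, a positive one that it is higher.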

15 New developments – evidence-based rules
Research findings: Some overseas students do not work whilst they are at university, and some overseas students are unable to work due to the nature of their visas. As a result, they are unlikely to register for a National Insurance Number, and so do not appear on the CIS dataset. Under the current rules, this means that they will not be counted in the SPD, resulting in an undercount of students.
Refinement: A new rule: if a student appears on the HESA dataset and is linked with either the PR or the CIS, include them in the SPD.
Research findings: Children are only included in the CIS dataset if their parents claim Child Benefit. We have seen an undercount of children in the SPD which supports this and in particular affects 0-4 year olds. This will worsen as changes to Child Benefit eligibility rules are implemented.
Refinement: A new rule: all children aged 0-4 on the PR will be included in the SPD, even if they are not found on the CIS.
Research findings: School Census data contains accurate, regularly updated address information for children.
Refinement: A new rule: all children aged 5 to 15 who are either on PR or CIS and on School Census are included in the SPD. The SC address takes precedence over addresses on the other sources (much in the same way as HESA does for students).
Research findings: In some instances, people who have died are not removed from some data sources.
Refinement: A new rule: any records that link to a death registration are removed from the SPD.
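The four refinements above could be expressed as an inclusion check along the following lines. This is a minimal sketch, not the actual SPD implementation: the record layout and function names are hypothetical, and the fallback rule (a record must appear on both the PR and the CIS) is an assumption inferred from the statement that students missing from the CIS are not counted under the current rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkedRecord:
    """Hypothetical linked record: flags say which sources the person appears on."""
    age: int
    on_pr: bool = False
    on_cis: bool = False
    on_hesa: bool = False
    on_school_census: bool = False
    linked_to_death: bool = False


def include_in_spd(r: LinkedRecord) -> bool:
    # Records that link to a death registration are removed.
    if r.linked_to_death:
        return False
    # Children aged 0-4 on the PR are included even if not found on the CIS.
    if 0 <= r.age <= 4 and r.on_pr:
        return True
    # Children aged 5 to 15 on School Census and on either PR or CIS are included.
    if 5 <= r.age <= 15 and r.on_school_census and (r.on_pr or r.on_cis):
        return True
    # Students on HESA linked with either the PR or the CIS are included.
    if r.on_hesa and (r.on_pr or r.on_cis):
        return True
    # ASSUMPTION: otherwise fall back to requiring both PR and CIS.
    return r.on_pr and r.on_cis


def address_source(r: LinkedRecord) -> Optional[str]:
    """Which source's address takes precedence; None where the slide is silent."""
    if 5 <= r.age <= 15 and r.on_school_census:
        return "SC"
    if r.on_hesa:
        return "HESA"
    return None


overseas_student = LinkedRecord(age=20, on_pr=True, on_hesa=True)
toddler = LinkedRecord(age=2, on_pr=True)
school_child = LinkedRecord(age=10, on_cis=True, on_school_census=True)
deceased = LinkedRecord(age=80, on_pr=True, on_cis=True, linked_to_death=True)
pr_only_adult = LinkedRecord(age=40, on_pr=True)
```

The death check comes first so that a record is removed regardless of which other rules it would satisfy.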

16 SPD 5

17 Using only PR for 0-4 year olds SPD 9

18 New developments – the potential of ‘activity’ data

19

20 Framework for producing population estimates using linked admin data

21 Questions

22 Population Characteristics

23 Population Characteristics – last time
Statistics about population and household characteristics
- Population - ethnicity, education qualifications, health status etc
- Household & housing – household size, accommodation type etc
- Single variable and cross-classifications
- Range of geographic areas
Integrated system:
- Combine admin data & direct data collection (survey)
- Survey (census) initially - more use of admin data and modelling as coverage of topic improves, methods develop
Trade-off – detail vs. frequency:
- Need to better understand requirements for small area attribute data?
- Outputs from 4% survey design – combining across time and consistency issues (complex output structure)

24 Consultation & Research Conclusions
Clear need for multivariate statistics for small populations, within small geographic areas
Admin data will be key to production of these outputs
- Survey based approach cannot provide detail
  o Insufficient power to measure differences within LAs
  o Between subgroups, or through time
- Significant further research to explore the potential of admin data in this context
  o Information available
  o Methods of application

25 Admin data research – last time
High level assessment of wide range of data sources (approx 150) for socio-demographic coverage
- Considered direct and indirect use
- Focussed on shortlist of priority topics
Coverage of characteristics data but also challenges:
- For some sources - limited population coverage
- Quality issues (definitional differences, limited response etc)
- Priority list of topics and sources identified (M12)
Initial thinking on other applications:
- Covariates – for modelling/integrated system approach
- Modelling health variable using health records
- Combining census and admin records on education qualifications

26 Further research - case studies
Initial relatively simple applications
- Based on assessment of available data sources
Significant further research needed
- Better understand sources & develop methods of application
Considered options for potential applications
- E.g. direct use, model-based application
Research of an exploratory nature - no conclusions drawn
Objective: to help inform development of a long-term plan

27 Case studies – topics & aims
Topic: Ethnicity
  Aim: Comparison of admin sources with 2011 census responses to compare consistency and provide initial assessment of coverage/definitional differences
  Data: Access to record level data
Topic: Household Estimates; Unemployment Estimates
  Aim: Potential for model-based estimates at lower geographies
  Data: Access to aggregate data
Topic: Income
  Aim: Initial assessment of issues in direct use of admin data
  Data: No data access

28 Census and England School Census Ethnicity
Cross-tabulation of 2011 Census ethnicity against English School Census ethnicity.
Categories: White British; Irish; Irish Traveller/Gypsy/Romany; Indian; Bangladeshi; Pakistani; White and Asian; Other Asian; Chinese; African; White and Black African; Caribbean; White and Black Caribbean; Other White; Other Black; Other Mixed; Other Ethnicity; Missing. Final column: Total (denominator).
Values are listed in row order; column alignment is not recoverable from this transcript.
White British: 95% 0.50% 2% 0.50% 2% | Total 5,048,672
Irish: 41% 47% 1% 0.50% 1% 0.50% 5% 0.50% 3% 0.50% 2% | Total 22,609
Irish Traveller/Gypsy/Romany: 35% 2% 54% 0.50% 6% 0.50% 1% 2% | Total 9,150
Indian: 0.50% 89% 0.50% 1% 5% 0.50% 1% 0.50% 2% | Total 169,609
Bangladeshi: 0.50% 92% 1% 0.50% 2% 0.50% 4% | Total 99,905
Pakistani: 0.50% 1% 0.50% 86% 1% 4% 0.50% 1% 3% 4% | Total 252,189
White and Asian: 11% 0.50% 1% 0.50% 2% 54% 3% 0.50% 3% 0.50% 15% 3% 4% | Total 82,152
Other Asian: 1% 0.50% 12% 0.50% 2% 58% 1% 0.50% 4% 17% 2% | Total 84,028
Chinese: 2% 0.50% 1% 2% 83% 0.50% 7% 2% | Total 27,577
African: 1% 0.50% 1% 0.50% 83% 1% 0.50% 1% 7% 2% 1% 3% | Total 190,489
White and Black African: 6% 0.50% 8% 55% 1% 3% 14% 2% 4% | Total 38,611
Caribbean: 1% 0.50% 3% 0.50% 77% 3% 0.50% 9% 3% 1% 4% | Total 71,256
White and Black Caribbean: 12% 0.50% 2% 3% 62% 1% 2% 12% 1% 4% | Total 108,920
Other White: 8% 0.50% 1% 0.50% 1% 0.50% 75% 0.50% 6% 5% 3% | Total 169,626
Other Black: 1% 0.50% 34% 1% 30% 2% 0.50% 20% 6% 1% 4% | Total 27,625
Other Mixed: 10% 0.50% 1% 0.50% 1% 5% 3% 1% 2% 3% 2% 8% 5% 4% 47% 5% | Total 23,763
Other Ethnicity: 5% 0.50% 2% 12% 0.50% 2% 1% 0.50% 10% 2% 10% 50% 4% | Total 66,760
Missing: 69% 0.50% 2% 1% 3% 1% 2% 0.50% 4% 0.50% 2% 1% 5% 1% 2% 3% | Total 222,193

29 Comparison of Percentage of Households of Each Size for selected LAs
Figure: one panel per LA comparing Census with Administrative Data Method. Axes: Household Size (1, 2, 3, 4, 5+); Percentage of Households (10 to 40).
Panels: Birmingham; Boston; Bournemouth; Brent; Cambridge; Camden; Cardiff; Ceredigion; Cheshire East; Chesterfield; Coventry; East Devon; Eastbourne; Forest Heath; Herefordshire, County of; Hillingdon; Kensington and Chelsea; Kingston upon Thames; Lambeth; Leicester; Manchester; Newcastle upon Tyne; Newham; Northumberland; Oxford; Powys; Reading; Richmondshire; Rotherham; Stratford-on-Avon; Tonbridge and Malling; Waltham Forest; Warwick; Waveney; Westminster; Wirral

30 Unemployment estimates
Overview
Methodology focus – model based estimation
Extending current LA level approach:
- No census covariates
- Uses DWP Jobseeker's Allowance data (aggregate)
Predicting unemployment as a proportion of 16-64 population within MSOA
Findings
Did not perform well compared to other SAE models
Confidence intervals showed high level of uncertainty:
- Limited ability to distinguish differences between MSOAs
Assessment of CVs provided comparison against quality standards
- Standard – attribute for 3% of population estimated with CV of 20% or less
- Model: CVs consistently high - >20.1%, median 30%
Highlights need to think about methodological issues
Models applied for one topic / geographic area don't necessarily work elsewhere
Access to record level data will help understand sources and allow scope for other applications
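The CV comparison on this slide can be illustrated with a short sketch. It assumes the usual definition of the coefficient of variation (standard error divided by the estimate, as a percentage) and takes the 20% threshold from the quality standard quoted above; the MSOA codes, estimates, and standard errors are hypothetical, not results from the case study.

```python
from statistics import median


def coefficient_of_variation(estimate, standard_error):
    """CV as a percentage: standard error relative to the estimate."""
    return 100.0 * standard_error / estimate


def meets_standard(cv_percent, threshold=20.0):
    """True if the CV is at or below the quality threshold (20% or less)."""
    return cv_percent <= threshold


# Hypothetical modelled unemployment proportions and standard errors by MSOA.
msoa_estimates = {
    "MSOA_A": (0.05, 0.015),
    "MSOA_B": (0.08, 0.02),
    "MSOA_C": (0.03, 0.012),
}
cvs = [coefficient_of_variation(est, se) for est, se in msoa_estimates.values()]
median_cv = median(cvs)
all_fail = all(not meets_standard(cv) for cv in cvs)
```

In this made-up example every CV exceeds 20%, the same pattern the slide reports for the model.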

31 Income case study
Income – key topic of interest to users
- Not included – response concerns
Potential sources:
- Admin data - definitional issues
- Surveys - limited geographic level
Available sources:
- Range of administrative sources – PAYE, benefits, pension etc
- Survey and model based estimates – ONS and DWP (FRS), HMRC
Opportunities – collaborative working with other departments:
- Better understand sources
- Review statistical methods / applications – Survey of Personal Income
Solution – combining admin data and survey sources?

32 Admin data case studies - findings
Potential for using admin and survey data in combination to produce statistics about population characteristics
- High level of agreement for children between ethnicity on School Census and Census (but differed by ethnic group)
- Distribution of household size and composition on admin sources similar to Census
However differences due to definitions, collection processes, classifications and lags
- And need to think about methods
Will carry on research – more on this tomorrow

33 Questions

