1 Philip Clarke and Denise Silva Development of Small Area Estimation at ONS.

Presentation transcript:

1 Philip Clarke and Denise Silva Development of Small Area Estimation at ONS

2 Outline 1. Small Area Estimation Problem 2. History and current provision 3. Development in progress 4. Wider research 5. Consultancy service

3 1. Small Area Estimation Problem “Official statistics provide an indispensable element in the information system of a democratic society” (Fundamental Principles of Official Statistics, UNSD). Sample surveys are used to provide estimates of target parameters at population (or national) level and also for subpopulations or domains of study. However, implementation in a small area context is challenging.

4 Small Area Estimation Problem In small areas/domains sample sizes are usually not large enough to provide reliable estimates using classical design based methods. Small area estimation problem refers to SMALL SAMPLE SIZES (or none at all) in the domain or area of interest.
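
To make the sample-size point concrete, here is a minimal Python sketch. It assumes simple random sampling and a hypothetical 5% unemployment proportion; the sample sizes are illustrative and not taken from any ONS survey. It shows how the standard error of a direct estimate grows as the area sample shrinks.

```python
import math

def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion under simple random sampling."""
    if n <= 0:
        raise ValueError("a direct estimate needs at least one sampled unit")
    return math.sqrt(p * (1.0 - p) / n)

# Hypothetical sample sizes: a national sample versus progressively smaller areas.
P = 0.05
for n in (60000, 500, 50, 10):
    se = proportion_standard_error(P, n)
    print(f"n={n:>6}  SE={se:.4f}  relative SE={se / P:.0%}")
```

With only a handful of respondents the standard error is of the same order as the quantity being estimated, and with no respondents a direct estimate cannot be formed at all.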

5 2. History Small area estimation in the UK began as a research project in the late 1990s, in response to calls for locally focussed information in many different areas: environmental; business; social, e.g. health, housing, deprivation, unemployment. There were also calls for more general domain estimation, e.g. cross-classifications by age/sex or occupation. Initial experimental studies were on mental health estimation for DoH.

6 Developing alternative methodology Purpose: – To enable production of reliable estimates of characteristics of interest for small areas or domains based on very small or no sample. – To assess the quality (precision) of estimates. Several years of research and development (since 1995): – Partnership work with universities and Statistics Finland. – The EURAREA project: research programme funded by Eurostat to ‘enhance techniques to meet European needs’ (from )

7 Basis of Approach: Relax the Survey Restriction ‘Borrow strength’ by removing the isolation of depending solely on the survey and solely on respondents in a given area. – Widen the class of respondents for a given area by pooling together similar areas. – Widen the class of respondents by taking past period respondents into account. – Take advantage of other related data sources which are not sample survey based, known as auxiliary data, e.g. administrative data or census data which are available for all areas/domains.

8 Model based estimation All approaches detailed are based on an implicit or explicit model. The approach currently adopted in the UK combines auxiliary data with survey data from all areas. – Borrows strength nationally. – Uses an explicit statistical model to represent the relationship between the survey variable of interest and auxiliary data. • Dependent variable is the survey variable of interest. • Independent variables are certain auxiliary data variables, known as covariates. • Model is fitted using sample data and assumed to apply generally. • Model is then used to obtain area/domain estimates.

9 Outline of a model structure Suppose the variable of interest, Y, in an area j is linearly related to a single covariate X. A possible model structure is given by $\bar{Y}_j = \alpha + \beta X_j$, where $\bar{Y}_j$ is the mean of Y in area j. This is a deterministic structure, so we need to add some random variability.

10 Obtain $\bar{Y}_j = \alpha + \beta X_j + u_j$, where the $u_j$ represent random area differences from the deterministic value and $\sigma_u^2 = \mathrm{Var}(u_j)$ represents variability between areas.
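
As a rough illustration of this structure, the Python sketch below simulates area means from a linear relationship with one covariate plus a random area term, then recovers the coefficients by ordinary least squares. The "true" values, the number of areas and the names are all hypothetical. For simplicity it treats the area means as observed without sampling error, which the real procedure does not assume.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = alpha + beta * x (single covariate)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta

rng = random.Random(0)
ALPHA, BETA, SIGMA_U = 5.0, 2.0, 1.0  # hypothetical "true" values

# One covariate value and one random area term per area.
x = [rng.uniform(0, 10) for _ in range(200)]
area_mean = [ALPHA + BETA * xj + rng.gauss(0, SIGMA_U) for xj in x]

alpha_hat, beta_hat = fit_line(x, area_mean)

# The residual variance estimates the between-area variability.
residuals = [m - (alpha_hat + beta_hat * xj) for xj, m in zip(x, area_mean)]
sigma_u2_hat = sum(r * r for r in residuals) / (len(x) - 2)
print(alpha_hat, beta_hat, sigma_u2_hat)
```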

11 Model fitting Fit the model using direct survey estimates for each area. This introduces additional sampling variability. Unit level sampling variability giving rise to additional area level sampling variability

14 Estimating from the model Once the model is fitted, estimate for area j by using parameter estimates : Estimate of mean squared error given by Modelling success measured by obtaining estimates with high precision based on low mean squared errors.

15 Current provision SAEP – a generic methodology for application to variables from household based surveys. –Mean household income based on Family Resources Survey published as Experimental Statistics for wards in 1998/99, 2001/02 and for middle layer super output areas 2004/05 Specialised methodology for labour market estimation of unemployment from Labour Force Survey. –Unemployment levels and rates routinely published quarterly as National Statistics for Local Authority Districts in Great Britain.

16 SAEP methodology and income estimation SAEP methodology: is derived from the outlined model-based approach, BUT is based on a unit (household)/area multilevel model; borrows strength across areas using multivariate area level auxiliary data (covariates); can model a transformation of the variable of interest if required; is adapted for estimating at ward/middle layer super output area (MSOA) level from customary ONS clustered design household sample surveys.

17 Application to income estimation - Response Variable Income value for each household sampled in Family Resources Survey (FRS). ~ 3,300 MSOAs in England and Wales with sample in 2004/05, ~ 21,500 total responding households. But not a simple random sample. –Clustered design with primary sampling units as postcode sectors, ~ 1,500 sampled postcode sectors.

18 Coping with design clustering Samples are random samples of postcode sectors; –So random terms are around postcode sectors, indexed by j Estimation is required for geographically distinct wards or middle layer super output areas; –So covariates are for these areas, indexed by d –For estimation, covariates must be known for all areas not just sampled areas.

19 SAEP model and estimator structure for income estimation Multilevel structure gives rise to unit level random term replacing area sampling variability Logarithmic transformation of income taken because of positive skewness of income distribution Model :
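
The model equation itself is not preserved in this transcript. One form consistent with the description (log income for household i, a random term for postcode sector j, covariates for area d, and a unit level random term) would be the following. This is a sketch: the symbols $\boldsymbol{\beta}$, $\sigma_u^2$ and $\sigma_e^2$ and the normality assumptions are not taken from the slide.

```latex
\ln y_{ij} = \mathbf{x}_d^{\top}\boldsymbol{\beta} + u_j + e_{ij},
\qquad u_j \sim N(0, \sigma_u^2),
\qquad e_{ij} \sim N(0, \sigma_e^2)
```

Here $y_{ij}$ is the income of household i in sampled postcode sector j, $\mathbf{x}_d$ holds the covariates of the ward or MSOA d containing the household, $u_j$ is the postcode sector random term and $e_{ij}$ is the household level random term.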

20 SAEP model fitting procedure Create a dataset containing: – The variable of interest from individual household responses to the survey. – Values of a large number of administrative and census variables for the household's area of residence which we believe could affect the variable of interest, e.g. census variables, DWP social benefit claimant rates, council tax band proportions.

21 SAEP model fitting procedure (cont.) Starting with a null model, fit covariates in a stepwise manner in order of significance using specialised multilevel software, e.g. MLwiN or SAS PROC MIXED. In this way select a set of significant covariates and fit an accepted model. Use diagnostic techniques to check the model against its assumptions, e.g. randomness of residuals, unbiasedness of predictions.
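
The stepwise idea can be sketched in plain Python as a forward selection loop. This is a simplification under stated assumptions: it uses ordinary single-level regression rather than a multilevel model, it does not reproduce MLwiN or SAS PROC MIXED, the covariate names are hypothetical, and the stopping rule (a minimum relative reduction in residual sum of squares) is an arbitrary stand-in for a significance test.

```python
import random

def solve(a, b):
    """Solve the linear system a @ coef = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * coef[c] for c in range(i + 1, n))
        coef[i] = (m[i][n] - tail) / m[i][i]
    return coef

def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[r] * row[c] for row in design) for c in range(k)] for r in range(k)]
    xty = [sum(row[r] * yi for row, yi in zip(design, y)) for r in range(k)]
    coef = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(coef, row))) ** 2
               for row, yi in zip(design, y))

def forward_select(candidates, y, min_gain=0.2):
    """Repeatedly add the covariate that reduces RSS most; stop when the
    relative reduction falls below min_gain (an illustrative rule only)."""
    chosen, current = [], rss([], y)  # start from the null (intercept-only) model
    remaining = dict(candidates)
    while remaining:
        scores = {name: rss([candidates[c] for c in chosen] + [col], y)
                  for name, col in remaining.items()}
        best = min(scores, key=scores.get)
        if (current - scores[best]) / current < min_gain:
            break
        chosen.append(best)
        current = scores[best]
        del remaining[best]
    return chosen

# Simulated data: the response depends on two of the three hypothetical covariates.
rng = random.Random(1)
n = 200
covariates = {name: [rng.gauss(0, 1) for _ in range(n)]
              for name in ("benefit_rate", "noise", "tax_band_share")}
log_income = [3.0 * b + 1.0 * t + rng.gauss(0, 0.1)
              for b, t in zip(covariates["benefit_rate"], covariates["tax_band_share"])]
selected = forward_select(covariates, log_income)
print(selected)
```

In practice the selected model would still need the diagnostic checks described above before being accepted.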

23 Estimator and mean squared error Estimator on log income scale : A synthetic estimator is used omitting the random area terms : Mean squared error

25 Converting to raw income scale Need to make allowance for mean(log) ≠ log(mean). Area estimate Confidence interval
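
The correction formula used on the slide is not preserved here. The Python sketch below illustrates the underlying issue with simulated data, using the standard log-normal result that a variable whose logarithm has mean mu and variance sigma² has mean exp(mu + sigma²/2). Treating income as exactly log-normal, and the parameter values chosen, are assumptions made for illustration only.

```python
import math
import random

def lognormal_mean(mu: float, sigma2: float) -> float:
    """Mean of a log-normal variable whose logarithm has mean mu and variance sigma2."""
    return math.exp(mu + sigma2 / 2.0)

rng = random.Random(42)
MU, SIGMA = 6.0, 0.5  # hypothetical log-scale mean and standard deviation
incomes = [math.exp(rng.gauss(MU, SIGMA)) for _ in range(50_000)]

raw_mean = sum(incomes) / len(incomes)
# Naive back-transformation: exponentiate the mean of the logs.
naive = math.exp(sum(math.log(v) for v in incomes) / len(incomes))
# Back-transformation with the log-normal variance correction.
corrected = lognormal_mean(MU, SIGMA ** 2)
print(round(raw_mean), round(naive), round(corrected))
```

The naive value underestimates the mean income, while the corrected value is close to it, which is why an allowance is needed when moving from the log scale back to the raw scale.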

26 Actual model for ward estimation of income in 2004/05 phrpman = proportion of household reference persons aged who are in professional or managerial occupations. lnphrpecac = logit of proportion of household reference persons aged who are economically active. lnphhtype1 = logit of proportion of one person households. engegh = proportion of council tax band G&H dwellings for England. pcgeo = proportion of people aged 60 and over claiming pension credit (guarantee element only).

28 Income estimation outputs Estimates obtained of sufficient precision for publication and acceptable to user community. Accredited as Experimental Statistics Placed on Neighbourhood Statistics website together with user guides and technical documentation.

29 Estimation of unemployment at local authority level BACKGROUND: Unemployment is a key indicator and is used for policy making and resource allocation. The official UK measure of unemployment follows the International Labour Organisation (ILO) definition. ILO unemployment is estimated via the Labour Force Survey (national level). LFS sample sizes are small for some local areas.

30 Features of Labour Force Survey A rotating panel survey –Roughly 60,000 households surveyed each quarter –Each household remains in sample for 5 quarters (waves 1 to 5) then drops out Waves 1 and 5 respondents for last four quarters used to obtain an annual ‘local labour force survey’ dataset of about 90,000 independent households. Unclustered survey design – giving a sample in each LAD.

31 Features of unemployment modelling Unclustered LFS design means –direct estimates available for each LAD –availability of estimated random area terms in LAD estimation However –low precision of direct survey estimates due to small sample sizes –need for better precision model-based estimates Availability of a highly correlated covariate – number of claimants of unemployment benefit/job seekers allowance –Eliminates need for model fitting to a range of possible covariates on each occasion.

32 The small area estimation model A LOGISTIC multilevel model by local authority (d) and six age/sex classes (i). It relates the probability $p_{di}$ of an individual being unemployed to auxiliary data. Response variable: proportion of unemployed individuals in the LFS in each age/sex class of each local authority (logit transformed). Covariate data: – Benefit data: the logit of the claimant proportion of job seekers allowance in each age/sex class within each local authority and also over all age/sex classes. – The age/sex class: male/female for age groups (16 to 24; 25 to 49; 50 and over). – Geographical region: the 12 government office regions (GOR). – ONS area classification: 7 categories under the National Statistics Area Classification for Local Authorities.

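33 The model used to link $p_{di}$ with the auxiliary data is a binomial linear mixed model with a logistic link function: $\operatorname{logit}(p_{di}) = \mathbf{x}_{di}^{\top}\boldsymbol{\beta} + u_d$, where $\mathbf{x}_{di}$ are the covariates and $u_d$ is the area random effect.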

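34 Estimator from model After fitting the model, the model-based estimator of the proportion unemployed in each age/sex group of each LAD is given by $\hat{p}_{di} = \exp(\mathbf{x}_{di}^{\top}\hat{\boldsymbol{\beta}} + \hat{u}_d) / \{1 + \exp(\mathbf{x}_{di}^{\top}\hat{\boldsymbol{\beta}} + \hat{u}_d)\}$, where $\mathbf{x}_{di}$ are the covariates, $\hat{\boldsymbol{\beta}}$ the estimated coefficients and $\hat{u}_d$ the estimated area random effect. Note the use of the term $\hat{u}_d$ in the estimator, as it is now available for each LAD.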
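
A minimal Python sketch of the back-transformation from the logit scale follows. The function names and the numerical values are hypothetical; the fixed part stands for the fitted covariate contribution and the area effect for the estimated random term of one LAD.

```python
import math

def inverse_logit(eta: float) -> float:
    """Map a linear predictor on the logit scale back to a probability."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # avoids overflow for large negative eta
    return z / (1.0 + z)

def predicted_proportion(fixed_part: float, area_effect: float) -> float:
    """Predicted proportion for one age/sex class in one area."""
    return inverse_logit(fixed_part + area_effect)

# Same covariate contribution, without and with a positive area effect.
print(predicted_proportion(-3.0, 0.0))
print(predicted_proportion(-3.0, 0.4))
```

Including the area effect shifts the prediction for that LAD away from the purely synthetic value implied by its covariates.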

35 Model has estimated a proportion at each age/sex group This is converted into an estimate of unemployment level at each LAD by : –multiplying each proportion estimate by the LFS estimate of population unsampled –adding those sampled and found unemployed –summing the age/sex group estimates Final Estimator for unemployment level for area d is: Model-based estimate for Unemployment 6 age-sex groups
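
The three steps listed above can be sketched in Python as follows. The field names and figures are hypothetical, and the unsampled population is taken here to be the population estimate minus the sample size, which is an assumption about how the slide's "population unsampled" is obtained.

```python
from dataclasses import dataclass

@dataclass
class AgeSexCell:
    model_proportion: float     # model-based proportion unemployed in the cell
    population_estimate: float  # LFS population estimate for the cell
    sample_size: int            # respondents sampled in the cell
    sampled_unemployed: int     # sampled respondents found unemployed

def unemployment_level(cells) -> float:
    """Sum over age/sex cells of predicted unsampled unemployed plus observed unemployed."""
    total = 0.0
    for cell in cells:
        unsampled = cell.population_estimate - cell.sample_size
        total += cell.model_proportion * unsampled + cell.sampled_unemployed
    return total

# Two illustrative cells (the model described above uses six per LAD).
example = [
    AgeSexCell(0.10, 1000, 100, 12),
    AgeSexCell(0.05, 2000, 200, 8),
]
print(unemployment_level(example))
```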

36 LAD Estimation of unemployment rate The estimate of unemployment rate is obtained using model-based estimate of unemployment level and the direct estimate of employment : Direct survey estimate of Employment Model-based estimate of Unemployment
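
The rate formula is not preserved in this transcript. A sketch consistent with the description, assuming the usual definition of the unemployment rate as unemployed divided by the economically active (unemployed plus employed), is shown below; the function name and the figures are hypothetical.

```python
def unemployment_rate(unemployed_model: float, employed_direct: float) -> float:
    """Unemployment rate from a model-based unemployment level and a direct employment estimate."""
    active = unemployed_model + employed_direct
    if active <= 0:
        raise ValueError("economically active population must be positive")
    return unemployed_model / active

print(unemployment_rate(200.0, 3800.0))
```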

37 Precision of Estimates The mean squared error (MSE) for the unemployment level estimates in LAD d is given by several components: G1 and G2 come from the uncertainty in estimating the coefficients and u in the model; G3 arises because we have estimated the variance of u; G4 is necessary because the model estimates actual values rather than means; G5 is the additional variance component due to the estimation of population size in each LAD.

38 Unemployment estimates publication The standard errors of the model-based estimates were found to be smaller than the corresponding direct standard errors in each LAD. Model-based estimates have been accredited as National Statistics and are now published quarterly in Labour Market statistics releases.

39 3. Developments in progress Labour Market area – Consistent estimation of all three labour market states: employed, not economically active, unemployed. – Currently, Local Authority labour market estimates are: model-based estimates for unemployment; direct survey estimates for economic inactivity and employment. – Now developing a multivariate model to estimate concurrently the numbers of unemployed, employed and economically inactive people by local authority.

40 Compositional data The proportions of individuals classified in each category are: • Proportions bounded between 0 and 1 and subject to a unity-sum constraint. • Multinomial Logistic model to relate labour market probabilities with auxiliary data for all categories is therefore defined with only 2 equations.
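
Why two equations suffice for three states can be sketched in Python. One state is treated as the reference category with its linear predictor fixed at zero, so only the other two need their own equations, and the unity-sum constraint then determines all three probabilities. Choosing "employed" as the reference and the predictor values used are assumptions for illustration.

```python
import math

def labour_market_probabilities(eta_unemployed: float, eta_inactive: float) -> dict:
    """Probabilities of three labour market states from two linear predictors.

    The reference state (here "employed") has its predictor fixed at zero.
    """
    shift = max(0.0, eta_unemployed, eta_inactive)  # for numerical stability
    e_ref = math.exp(0.0 - shift)
    e_unemp = math.exp(eta_unemployed - shift)
    e_inact = math.exp(eta_inactive - shift)
    total = e_ref + e_unemp + e_inact
    return {
        "employed": e_ref / total,
        "unemployed": e_unemp / total,
        "inactive": e_inact / total,
    }

probs = labour_market_probabilities(-2.5, -1.0)
print(probs, sum(probs.values()))
```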

41 Multinomial Logistic Model

42 Multinomial Logistic Model Then:

43 The Model Relates the probabilities of labour market states to the following predictors: age/sex group; geographical region and ONS area classification; benefit data: claimant proportions (JSA) and incapacity benefit. Other variables will be tested (e.g. income support).

44 Model estimates a proportion for each labour market state at each age/sex group Final Estimator for a labour market state j for area d is: Model-based estimate for all Labour Market States 6 age-sex groups All labour market states

45 Development stage of multinomial model Current stage: – Development of SAS programs to calculate precision of the multinomial estimates based on methodology proposed by Saei (2006). – Model selection and test of other covariates. – Model cross validation including several time periods. Up to now: – Implementation of the multinomial model indicates that plausible estimates can be obtained for all labour market states when simultaneously modelled.

46 Developments in progress (cont.) Labour Market area –Unemployment estimation at Parliamentary constituency level Non-nested geography but with certain matching areas Issue here is to ensure consistency with local authority estimates at comparable areas Model developed and estimates likely to become available in the coming year

47 Developments in progress (cont.) Income estimation – Estimation at local authority level: the clustered survey design entails a modification of the SAEP framework; currently in development. – Estimation of poverty (proportion of households below a threshold): currently being developed for MSOA/local authority level.

48 4. Wider research activities In conjunction with academic partners: – Estimation of change over time. Current work is confined to single point-in-time estimation, but users would like an indication of progress over time, particularly in relation to funding. – Estimation of poverty using M-quantile modelling. Research using FRS data by Nikos Tzavidis. – Models incorporating spatial relationships. Preliminary investigation of spatial relationship in unemployment model in conjunction with Ayoub Saei at Southampton University. Link with work at Imperial College by Nicky Best and Virgilio Gomez-Rubio.

49 5. Methodology Consultancy Service ONS is currently establishing a methodology consultancy service –To undertake and support statistical work by other government departments and public sector organisations. –Resource for assessment/quality improvement –Currently working with Health and Safety Executive on small area estimation of incidence of work related illness at local authority level.

50 References Small Area Estimation Project Report. Model-Based Small Area Estimation Series No. 2, ONS, January 2003. Developments in small area estimation in UK with focus in current research. Clarke, P., Mcgrath K., Chandra, H., Tzavidis, N. (2007). IASS Satellite Meeting on Small Area Estimation, Pisa. Model Based Estimates of Income for Middle Layer Super Output Areas 2004/05 Technical Report, ONS, September. Report 2004_05 v2 - Final_tcm pdf dPDF.do?downloadId=21704 http://neighbourgood.statistics.gov.uk/HTMLDocs/images/Technical Report 2004_05 v2. Development of improved estimation methods for local area unemployment levels and rates. Labour Market Trends, vol. 111, no 1. Summary publication accompanying the publication of the 2003 unemployment estimates, November pdf