1 Philip Clarke and Denise Silva Development of Small Area Estimation at ONS
2 Outline 1. Small Area Estimation Problem 2. History and current provision 3. Development in progress 4. Wider research 5. Consultancy service
3 1. Small Area Estimation Problem “Official statistics provide an indispensable element in the information system of a democratic society” (Fundamental Principles of Official Statistics, UNSD). Sample surveys are used to provide estimates of target parameters at population (or national) level and also for subpopulations or domains of study. However, implementation in a small area context is challenging.
4 Small Area Estimation Problem In small areas/domains, sample sizes are usually not large enough to provide reliable estimates using classical design-based methods. The small area estimation problem refers to SMALL SAMPLE SIZES (or no sample at all) in the domain or area of interest.
5 2. History Small area estimation in the UK began as a research project in the late 1990s, in response to calls for locally focussed information in many different areas: environmental; business; social, e.g. health, housing, deprivation, unemployment. There were also calls for more general domain estimation, e.g. cross-classifications by age/sex, occupation. Initial experimental studies were on mental health estimation for DoH.
6 Developing alternative methodology Purpose: –To enable production of reliable estimates of characteristics of interest for small areas or domains based on very small or no sample. –To assess the quality (precision) of estimates. Several years of research and development (since 1995): –Partnership work with universities and Statistics Finland. –The EURAREA project: a research programme funded by Eurostat to ‘enhance techniques to meet European needs’ (2001-2004).
7 Basis of Approach: Relax the Survey Restriction ‘Borrow strength’ by removing the isolation of depending solely on the survey and solely on respondents in a given area. –Widen the class of respondents for a given area by pooling together similar areas. –Widen the class of respondents by taking past period respondents into account. –Take advantage of other related data sources which are not sample survey based. These are known as auxiliary data, e.g. administrative data or census data, which are available for all areas/domains.
8 Model based estimation All the approaches detailed are based on an implicit or explicit model. Using auxiliary data together with survey data from all areas is the approach currently adopted in the UK. –Borrows strength nationally. –Uses an explicit statistical model to represent the relationship between the survey variable of interest and the auxiliary data. The dependent variable is the survey variable of interest. The independent variables are certain auxiliary data variables, known as covariates. The model is fitted using sample data and assumed to apply generally. The model is then used to obtain area/domain estimates.
9 Outline of a model structure Suppose the variable of interest, Y, in an area j is linearly related to a single covariate X. A possible model structure is given by: $\bar{Y}_j = \alpha + \beta X_j$, where $\bar{Y}_j$ is the mean of Y in area j. This is a deterministic structure, so we need to add some random variability.
10 Obtain: $\bar{Y}_j = \alpha + \beta X_j + u_j$, where the $u_j$ represent random area differences from the deterministic value and $\sigma_u^2 = \mathrm{Var}(u_j)$ represents the variability between areas.
11 Model fitting Fit the model using direct survey estimates for each area. This introduces additional sampling variability: unit-level sampling variability gives rise to additional area-level sampling variability.
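Written out in full, the three stages described on these slides (deterministic structure, random area differences, sampling variability of the direct estimates) can be sketched as follows. The symbols $\alpha$, $\beta$, $u_j$, $\varepsilon_j$, $\sigma_u^2$ and $\psi_j$, and the normal distributions, are assumed notation for illustration and are not confirmed by the transcript.

```latex
\begin{align*}
\bar{Y}_j &= \alpha + \beta X_j \\
\bar{Y}_j &= \alpha + \beta X_j + u_j, \qquad u_j \sim N(0, \sigma_u^2) \\
\hat{\bar{Y}}_j^{\,\text{direct}} &= \alpha + \beta X_j + u_j + \varepsilon_j, \qquad \varepsilon_j \sim N(0, \psi_j)
\end{align*}
```

Here $\varepsilon_j$ stands for the area-level sampling error that arises because the direct estimate for area j is computed from a sample of units rather than from the whole area.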
14 Estimating from the model Once the model is fitted, the mean for area j is estimated using the parameter estimates, together with an estimate of its mean squared error. Modelling success is measured by obtaining estimates with high precision, i.e. low mean squared errors.
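The fit-then-estimate sequence can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the ONS procedure: the data are simulated, ordinary least squares stands in for whatever fitting method was actually used, the estimate omits any estimated area term, and all names (`fit_area_level_model`, `synthetic_estimate`, the true parameter values) are hypothetical.

```python
import random


def fit_area_level_model(x, y):
    """Ordinary least squares fit of y = alpha + beta * x; returns (alpha, beta)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def synthetic_estimate(alpha, beta, x_j):
    """Model-based estimate for an area, using only its covariate value."""
    return alpha + beta * x_j


# Simulated example (all values are illustrative assumptions).
rng = random.Random(42)
TRUE_ALPHA, TRUE_BETA = 10.0, 2.0
covariate = [rng.uniform(0, 5) for _ in range(200)]  # X_j is known for every area
sampled = list(range(0, 200, 2))                     # only half the areas have survey data

direct = {}
for j in sampled:
    u_j = rng.gauss(0, 0.5)    # random area difference
    eps_j = rng.gauss(0, 1.0)  # sampling error of the direct estimate
    direct[j] = TRUE_ALPHA + TRUE_BETA * covariate[j] + u_j + eps_j

# Fit on sampled areas only, then estimate for all areas, sampled or not.
alpha_hat, beta_hat = fit_area_level_model(
    [covariate[j] for j in sampled], [direct[j] for j in sampled]
)
estimates = [synthetic_estimate(alpha_hat, beta_hat, x) for x in covariate]
```

The point of the sketch is that the fitted relationship yields an estimate for every area with a known covariate value, including the areas with no sample.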
15 Current provision SAEP – a generic methodology for application to variables from household-based surveys. –Mean household income based on the Family Resources Survey, published as Experimental Statistics for wards in 1998/99 and 2001/02, and for middle layer super output areas in 2004/05. Specialised methodology for labour market estimation of unemployment from the Labour Force Survey. –Unemployment levels and rates routinely published quarterly as National Statistics for Local Authority Districts in Great Britain.
16 SAEP methodology and income estimation SAEP methodology: is derived from the model-based approach outlined above, but is based on a unit (household)/area multilevel model; borrows strength across areas using multivariate area-level auxiliary data (covariates); can model a transformation of the variable of interest if required; is adapted for estimating at ward/middle layer super output area (MSOA) level from the customary ONS clustered-design household sample surveys.
17 Application to income estimation - Response Variable Income value for each household sampled in Family Resources Survey (FRS). ~ 3,300 MSOAs in England and Wales with sample in 2004/05, ~ 21,500 total responding households. But not a simple random sample. –Clustered design with primary sampling units as postcode sectors, ~ 1,500 sampled postcode sectors.
18 Coping with design clustering Samples are random samples of postcode sectors; –So random terms are around postcode sectors, indexed by j Estimation is required for geographically distinct wards or middle layer super output areas; –So covariates are for these areas, indexed by d –For estimation, covariates must be known for all areas not just sampled areas.
19 SAEP model and estimator structure for income estimation The multilevel structure gives rise to a unit-level random term replacing the area sampling variability. A logarithmic transformation of income is taken because of the positive skewness of the income distribution.
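A model of this kind can be sketched from the description on the preceding slides. The form below is an assumption, not the published specification: $y_{id}$ is the income of household $i$ in area $d$, $j$ is the postcode sector containing that household, $\mathbf{X}_d$ is the vector of area-level covariates, and the normal distributions and variance symbols are assumed notation.

```latex
\ln(y_{id}) = \alpha + \boldsymbol{\beta}^{T}\mathbf{X}_d + u_j + e_{id},
\qquad u_j \sim N(0, \sigma_u^2), \quad e_{id} \sim N(0, \sigma_e^2)
```

In this form the random term $u_j$ follows the sampling clusters (postcode sectors), while the covariates follow the estimation areas, which is why the covariates must be known for every area and not only for sampled ones.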
20 SAEP model fitting procedure Create a dataset containing: –The variable of interest from individual household responses to the survey. –Values of a large number of administrative and census variables for the household's area of residence which we believe could affect the variable of interest, e.g. census variables, DWP social benefit claimant rates, council tax band proportions.
21 SAEP model fitting procedure (cont.) Starting with a null model, fit covariates in a stepwise manner in order of significance using specialised multilevel software, e.g. MLwiN or SAS PROC MIXED. In this way select a set of significant covariates and fit an accepted model. Use diagnostic techniques to check the model against its assumptions, e.g. randomness of residuals, unbiasedness of predictions.
23 Estimator and mean squared error Estimator on the log income scale: a synthetic estimator is used, omitting the random area terms. A mean squared error is obtained for this estimator.
25 Converting to raw income scale Allowance must be made for the fact that mean(log) ≠ log(mean) when producing the area estimate and its confidence interval.
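One common way to allow for the gap between mean(log) and log(mean) is the lognormal correction, which adds half the total variance on the log scale before exponentiating. The sketch below assumes that correction; the transcript does not show which adjustment was actually used, and the function names and input values are hypothetical.

```python
import math


def back_transform(log_estimate, var_area, var_unit):
    """Convert a log-scale estimate to the raw income scale.

    Assumes log income is normally distributed with total variance
    var_area + var_unit, so that E[Y] = exp(mu + sigma^2 / 2).
    """
    return math.exp(log_estimate + 0.5 * (var_area + var_unit))


def confidence_interval(log_estimate, mse_log, var_area, var_unit, z=1.96):
    """Approximate interval: build it on the log scale, then back-transform the end points."""
    half_width = z * math.sqrt(mse_log)
    shift = 0.5 * (var_area + var_unit)
    lower = math.exp(log_estimate - half_width + shift)
    upper = math.exp(log_estimate + half_width + shift)
    return lower, upper


# Illustrative values only (not taken from the published estimates).
estimate = back_transform(6.0, var_area=0.05, var_unit=0.35)
lower, upper = confidence_interval(6.0, mse_log=0.01, var_area=0.05, var_unit=0.35)
```

Because the interval is built on the log scale, it is not symmetric around the point estimate once converted to the raw income scale.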
26 Actual model for ward estimation of income in 2004/05 phrpman = proportion of household reference persons aged 16-74 who are in professional or managerial occupations. lnphrpecac = logit of proportion of household reference persons aged 16-74 who are economically active. lnphhtype1 = logit of proportion of one person households. engegh = proportion of council tax band G&H dwellings for England. pcgeo = proportion of people aged 60 and over claiming pension credit (guarantee element only).
28 Income estimation outputs Estimates obtained were of sufficient precision for publication and acceptable to the user community. Accredited as Experimental Statistics. Placed on the Neighbourhood Statistics website together with user guides and technical documentation.
29 Estimation of unemployment at local authority level BACKGROUND Unemployment is a key indicator and is used for policy making and resource allocation. The official UK measure of unemployment follows the International Labour Organisation (ILO) definition. ILO unemployment is estimated via the Labour Force Survey (LFS) at national level. Sample sizes in the LFS are small for some local areas.
30 Features of Labour Force Survey A rotating panel survey –Roughly 60,000 households surveyed each quarter –Each household remains in sample for 5 quarters (waves 1 to 5) then drops out Waves 1 and 5 respondents for last four quarters used to obtain an annual ‘local labour force survey’ dataset of about 90,000 independent households. Unclustered survey design – giving a sample in each LAD.
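The size of the annual dataset follows from the rotation pattern. The sketch below assumes five equally sized waves per quarter and no attrition, neither of which is stated on the slide, and labels each household cohort by the quarter in which it entered the sample.

```python
HOUSEHOLDS_PER_QUARTER = 60_000  # "roughly 60,000" on the slide
WAVES = 5

# Assumption: the quarterly sample is split evenly across the five waves.
per_wave = HOUSEHOLDS_PER_QUARTER // WAVES

quarters = [1, 2, 3, 4]  # the last four quarters
used_waves = [1, 5]      # only waves 1 and 5 are used

# A household in wave w during quarter q entered the sample in quarter q - (w - 1).
# Distinct entry quarters correspond to distinct, non-overlapping cohorts.
cohorts = {q - (w - 1) for q in quarters for w in used_waves}

annual_households = per_wave * len(cohorts)
```

Under these assumptions there are eight distinct cohorts and 96,000 households, of the same order as the "about 90,000" quoted. No household is counted twice, because a wave 1 household does not reach wave 5 until four quarters later.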
31 Features of unemployment modelling Unclustered LFS design means: –direct estimates are available for each LAD; –estimated random area terms are available in LAD estimation. However: –direct survey estimates have low precision due to small sample sizes; –model-based estimates with better precision are needed. Availability of a highly correlated covariate, the number of claimants of unemployment benefit/job seekers allowance, eliminates the need for model fitting to a range of possible covariates on each occasion.
32 The small area estimation model A LOGISTIC multilevel model by local authority (d) and six age/sex classes (i). It models the probability $p_{di}$ that an individual in age/sex class i of local authority d is unemployed. Response variable: proportion of unemployed individuals in the LFS in each age/sex class of each local authority (logit transformed). Covariate data: –Benefit data: the logit of the claimant proportion of job seekers allowance in each age/sex class within each local authority and also over all age/sex classes. –Age/sex class: male/female for age groups 16 to 24, 25 to 49, and 50 and over. –Geographical region: the 12 government office regions (GOR). –ONS area classification: 7 categories under the National Statistics Area Classification for Local Authorities.
33 The model used to link $p_{di}$ with the auxiliary data is a binomial generalised linear mixed model with a logistic link function, including an area random effect $u_d$.
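Written out, a binomial model with a logistic link and an area random effect takes the following general form. The symbols $y_{di}$ (number unemployed in the sample), $n_{di}$ (sample size), $\mathbf{x}_{di}$ (covariate vector), $\boldsymbol{\beta}$ and $\sigma_u^2$ are assumed notation, and the normal distribution for $u_d$ is the usual assumption rather than something stated on the slide.

```latex
y_{di} \mid p_{di} \sim \mathrm{Binomial}(n_{di},\, p_{di}), \qquad
\mathrm{logit}(p_{di}) = \ln\!\left(\frac{p_{di}}{1 - p_{di}}\right)
  = \mathbf{x}_{di}^{T}\boldsymbol{\beta} + u_d, \qquad
u_d \sim N(0, \sigma_u^2)
```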
34 Estimator from model After fitting the model, the model-based estimator of the proportion unemployed in each age/sex group of each LAD is obtained by inverting the logit link. Note the use of the estimated area random term $\hat{u}_d$ in the estimator, as it is now available for each LAD.
35 The model estimates a proportion for each age/sex group. This is converted into an estimate of the unemployment level in each LAD by: –multiplying each proportion estimate by the LFS estimate of the unsampled population; –adding those sampled and found unemployed; –summing over the 6 age/sex group estimates. This gives the final model-based estimator of the unemployment level for area d.
36 LAD Estimation of unemployment rate The estimate of the unemployment rate is obtained using the model-based estimate of the unemployment level and the direct survey estimate of employment.
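The steps from fitted proportions to a level and a rate can be sketched as below. This is an illustration under assumptions: the function names and all input numbers are hypothetical, and the rate is taken as unemployed divided by unemployed plus employed, which is the standard ILO-style definition but is not spelled out in the transcript.

```python
import math


def expit(eta):
    """Inverse of the logit link: maps a linear predictor to a proportion."""
    return 1.0 / (1.0 + math.exp(-eta))


def unemployment_level(p_hat, sampled_unemployed, sample_size, population):
    """Sum over age/sex groups of: sampled unemployed + p_hat * unsampled population."""
    total = 0.0
    for p, y, n, big_n in zip(p_hat, sampled_unemployed, sample_size, population):
        total += y + p * (big_n - n)
    return total


def unemployment_rate(level_model, employment_direct):
    """Model-based unemployment level over the economically active total."""
    return level_model / (level_model + employment_direct)


# Hypothetical inputs for one LAD with six age/sex groups.
linear_predictor = [-2.0, -2.5, -3.0, -2.2, -2.8, -3.2]  # fixed part plus area term
p_hat = [expit(eta) for eta in linear_predictor]
sampled_unemployed = [3, 2, 1, 2, 1, 1]
sample_size = [30, 60, 50, 28, 62, 55]
population = [9000, 21000, 18000, 8800, 21500, 19500]

level = unemployment_level(p_hat, sampled_unemployed, sample_size, population)
rate = unemployment_rate(level, employment_direct=70000.0)
```

Only the unsampled part of each group is predicted from the model; the sampled households contribute their observed count directly.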
37 Precision of Estimates The mean squared error (MSE) for the unemployment level estimates in LAD d is made up of several components: $G_1$ and $G_2$ come from the uncertainty in estimating the coefficients and u in the model; $G_3$ arises because the variance of u has been estimated; $G_4$ is necessary because the model estimates actual values rather than means; $G_5$ is the additional variance component due to the estimation of the population size in each LAD.
38 Unemployment estimates publication The standard errors of the model-based estimates were found to be smaller than the corresponding direct standard errors in each LAD. Model-based estimates have been accredited as National Statistics and are now published quarterly in Labour Market statistics releases. (http://www.statistics.gov.uk/StatBase/Product.asp?vlnk=14160)
39 3. Developments in progress Labour Market area –Consistent estimation of all three labour market states: employed, not economically active, unemployed. –Currently, Local Authority labour market estimates are: model-based estimates for unemployment; direct survey estimates for economic inactivity and employment figures. Now developing a multivariate model to estimate concurrently the number of unemployed, employed and economically inactive people by local authority.
40 Compositional data The proportions of individuals classified in each category are bounded between 0 and 1 and subject to a unity-sum constraint. A multinomial logistic model relating the labour market probabilities to auxiliary data for all categories is therefore defined with only 2 equations.
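The reason two equations suffice for three categories can be shown with a short sketch. It assumes "employed" as the reference category and uses two hypothetical linear predictors (log-odds of unemployed and of inactive relative to employed); the choice of reference category is not given in the transcript.

```python
import math


def labour_market_probabilities(eta_unemployed, eta_inactive):
    """Three probabilities from two linear predictors (reference category: employed).

    eta_unemployed = log(p_unemployed / p_employed)
    eta_inactive   = log(p_inactive / p_employed)
    The third probability is fixed by the unity-sum constraint.
    """
    denom = 1.0 + math.exp(eta_unemployed) + math.exp(eta_inactive)
    return {
        "employed": 1.0 / denom,
        "unemployed": math.exp(eta_unemployed) / denom,
        "inactive": math.exp(eta_inactive) / denom,
    }


# Illustrative values only.
probs = labour_market_probabilities(-2.5, -1.0)
```

Whatever values the two predictors take, the three probabilities lie between 0 and 1 and sum to one, so the compositional constraints are met by construction.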
43 The Model Relates the probabilities of labour market states to the following predictors: age/sex group; geographical region and ONS area classification; benefit data: claimant proportions (JSA) and incapacity benefit. Other variables will be tested (e.g. income support).
44 The model estimates a proportion for each labour market state in each age/sex group. The final estimator for labour market state j in area d is a model-based estimate summed over the 6 age/sex groups, and is available for all labour market states.
45 Development stage of multinomial model Current stage: –Development of SAS programs to calculate precision of the multinomial estimates based on methodology proposed by Saei (2006). –Model selection and test of other covariates. –Model cross validation including several time periods. Up to now: –Implementation of the multinomial model indicates that plausible estimates can be obtained for all labour market states when simultaneously modelled.
46 Developments in progress (cont.) Labour Market area –Unemployment estimation at Parliamentary constituency level. Non-nested geography but with certain matching areas. The issue here is to ensure consistency with local authority estimates at comparable areas. Model developed and estimates likely to become available in the coming year.
47 Developments in progress (cont.) Income estimation –Estimation at local authority level. The clustered survey design requires a modification of the SAEP framework. Currently in development. –Estimation of poverty: proportion of households below a threshold. Currently being developed for MSOA/local authority level.
48 4. Wider research activities In conjunction with academic partners –Estimation of change over time. Current work is confined to single point-in-time estimation, but users would like an indication of progress over time, particularly in relation to funding. –Estimation of poverty using M-quantile modelling. Research using FRS data by Nikos Tzavidis. –Models incorporating spatial relationships. Preliminary investigation of spatial relationship in unemployment model in conjunction with Ayoub Saei at Southampton University. Link with work at Imperial College by Nicky Best and Virgilio Gomez-Rubio.
49 5. Methodology Consultancy Service ONS is currently establishing a methodology consultancy service –To undertake and support statistical work by other government departments and public sector organisations. –Resource for assessment/quality improvement –Currently working with Health and Safety Executive on small area estimation of incidence of work related illness at local authority level.
50 References
Small Area Estimation Project Report. Model-Based Small Area Estimation Series No. 2, ONS, January 2003.
Developments in small area estimation in UK with focus in current research. Clarke, P., Mcgrath, K., Chandra, H., Tzavidis, N. (2007). IASS Satellite Meeting on Small Area Estimation, Pisa.
Model Based Estimates of Income for Middle Layer Super Output Areas 2004/05 Technical Report, ONS, September 2007. http://neighbourgood.statistics.gov.uk/HTMLDocs/images/Technical Report 2004_05 v2 - Final_tcm97-53513.pdf http://neighbourhood.statistics.gov.uk/dissemination/MetadataDownloadPDF.do?downloadId=21704
Development of improved estimation methods for local area unemployment levels and rates. Labour Market Trends, vol. 111, no. 1. www.statistics.gov.uk/cci/article.asp?id=372
Summary publication accompanying the publication of the 2003 unemployment estimates, November 2004. http://www.statistics.gov.uk/downloads/theme_labour/ALALFS/AnnexA.pdf