Enhancing Small Area Estimation Methods: Applications to Istat’s Survey Data. Ranalli M.G. ~ Università di Perugia; D’Alo’ M., Di Consiglio L., Falorsi.

Presentation transcript:

1 Enhancing Small Area Estimation Methods: Applications to Istat’s Survey Data
Ranalli M.G. ~ Università di Perugia
D’Alo’ M., Di Consiglio L., Falorsi S., Solari F. ~ Istat
Pratesi M., Salvati N. ~ Università di Pisa
Q2008 ~ Rome, July 11th

2 OUTLINE
■ Italian Labour Force Survey
■ Standard small area estimators for LFS
■ Small area estimators that incorporate spatial information
■ Model based direct estimator (MBDE)
■ Semi-parametric models (based on p-splines)
■ Experimental study
■ Analysis of results
■ Final remarks

3 Labour Force Survey description
The Labour Force Survey (LFS) is a quarterly two-stage survey with partial overlap of sampling units according to a 2-2-2 rotation scheme. In each province the municipalities are classified as Self-Representing Areas (SRAs) or Non Self-Representing Areas (NSRAs). From each SRA a sample of households is selected. In NSRAs the sample is based on a stratified two-stage sampling design: the municipalities are the Primary Sampling Units (PSUs), while the households are the Secondary Sampling Units (SSUs). Each quarterly sample involves about 1350 municipalities and 200,000 individuals.

4 Small area estimation on LFS
■ Since 2000, ISTAT has disseminated yearly LFS estimates of employed and unemployed counts for the 784 Local Labour Market Areas (LLMAs).
■ LLMAs are unplanned domains obtained as clusters of municipalities cutting across provinces, which are the finest planned domains of the LFS.
■ The direct estimates are unstable because LLMA sample sizes are very small (more than 100 LLMAs have zero sample size), so SAE methods are necessary.
■ Until 2003, a design based composite type estimator was adopted.
■ Starting from 2004, after the redesign of the LFS sampling strategy, a unit-level EBLUP estimator with spatially autocorrelated random area effects has been used.

5 Standard small area estimators – design based: Direct and GREG estimator
■ The direct estimator is computed from the sample units of the area alone.
■ The GREG estimator is based on the standard linear model and can be expressed as an adjustment of the direct estimator for differences between the sample and population area means of the covariates.
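A sketch of the two design based estimators in a common textbook notation. The notation is an assumption for illustration and is not taken from the slides: $s_d$ is the sample in area $d$, $w_i$ the survey weight of unit $i$, $\bar{\mathbf X}_d$ the known population mean of the covariates in area $d$, and $\hat{\boldsymbol\beta}$ the estimated regression coefficient vector. The direct estimator is written here in ratio (Hájek) form.

```latex
% Direct estimator of the area mean (weighted mean of the area sample)
\hat{\bar Y}_d^{DIR} = \frac{\sum_{i \in s_d} w_i\, y_i}{\sum_{i \in s_d} w_i}

% Standard linear working model
y_{di} = \mathbf{x}_{di}^{T}\boldsymbol\beta + e_{di}

% GREG: direct estimator adjusted for the gap between the population
% and the (directly estimated) sample means of the covariates
\hat{\bar Y}_d^{GREG} = \hat{\bar Y}_d^{DIR}
  + \left(\bar{\mathbf X}_d - \hat{\bar{\mathbf X}}_d^{DIR}\right)^{T}\hat{\boldsymbol\beta}
```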

6 Standard small area estimators – model based: Unit level Synthetic and EBLUP
■ The Synthetic estimator assumes a standard linear mixed model with unit-specific auxiliary variables, random area-specific effects and errors that are independently normally distributed. It predicts the area value from the fixed part of the model only.
■ The EBLUP estimator assumes the same model but also includes the predicted random area effect.
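The difference between the two model based estimators can be sketched numerically. The code below is a minimal illustration under assumptions not stated in the slides: the regression coefficients and the variance components (`sigma2_u`, `sigma2_e`) are treated as already estimated and plugged in, the sampling fraction is taken as negligible, and all function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def shrinkage(sigma2_u, sigma2_e, n_d):
    """Weight given to the area residual: sigma2_u / (sigma2_u + sigma2_e / n_d)."""
    if n_d == 0:
        return 0.0  # no sample in the area, so no area effect can be predicted
    return sigma2_u / (sigma2_u + sigma2_e / n_d)


def synthetic(xbar_pop, beta):
    """Synthetic estimate: fixed part of the model at the population covariate means."""
    return dot(xbar_pop, beta)


def eblup(xbar_pop, beta, ybar_s, xbar_s, sigma2_u, sigma2_e, n_d):
    """Synthetic estimate plus the predicted random area effect (plug-in BLUP)."""
    if n_d == 0:
        return synthetic(xbar_pop, beta)
    gamma = shrinkage(sigma2_u, sigma2_e, n_d)
    u_hat = gamma * (ybar_s - dot(xbar_s, beta))  # shrunken mean residual of the area
    return synthetic(xbar_pop, beta) + u_hat
```

For an area with no sample the two estimators coincide, which is why the synthetic form is the fallback for non-sampled areas.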

7 Enhanced small area estimators 1. Unit level EBLUP with spatial correlation of area effects
■ The EBLUP-S estimator is based on a unit level linear mixed model whose random area effects are spatially correlated.
■ The matrix A in the variance structure of the area effects depends on the distances among the areas and on an unknown parameter connected to the spatial correlation coefficient among the areas.
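How a distance-based matrix A might be built can be sketched as follows. The functional form is an assumption for illustration only: the transcript does not preserve the authors' definition, so an exponential decay exp(-distance / alpha) is used here, with `alpha` standing in for the unknown spatial parameter and area centroids as hypothetical inputs.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y) centroids."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def spatial_corr_matrix(centroids, alpha):
    """Correlation matrix whose entries decay with distance (assumed exponential form)."""
    return [
        [math.exp(-distance(p, q) / alpha) for q in centroids]
        for p in centroids
    ]


# Two areas 5 distance units apart, with decay parameter alpha = 5
A = spatial_corr_matrix([(0.0, 0.0), (3.0, 4.0)], alpha=5.0)
```

Larger values of `alpha` make distant areas more strongly correlated, while small values push A towards the identity matrix, that is, towards independent area effects as in the standard EBLUP.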

8 Enhanced small area estimators 2. Model Based Direct Estimator (Chambers & Chandra, 2006)
The MBD estimator is based on a unit level linear mixed model and is given by a weighted mean of the sample values in the area, where the weights are such that the weighted sum of the sample values is the (E)BLUP of the population total under the model (Royall, 1976).
■ Calibrated with respect to the total of x.
■ Reduces bias vs EBLUP.
■ Does not allow estimation for non-sampled areas.
■ Less efficient than EBLUP.
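The MBD estimator can be written compactly as below. The notation is assumed for illustration and is not taken from the slides: $s$ is the whole sample, $s_d$ its part in area $d$, $U$ the population, and $w_j$ the model based weights.

```latex
% Model based direct estimator of the area mean
\hat{\bar Y}_d^{MBD} = \frac{\sum_{j \in s_d} w_j\, y_j}{\sum_{j \in s_d} w_j}

% Defining property of the weights: the weighted sample sum is the
% (E)BLUP of the population total under the linear mixed model
\sum_{j \in s} w_j\, y_j = \widehat{T}_y^{\,(E)BLUP},
\qquad T_y = \sum_{j \in U} y_j
```

Because the sums in the estimator run over $s_d$ only, the expression is undefined when the area has no sample, which is the reason the method cannot be used for non-sampled areas.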

9 Enhanced small area estimators 3. Nonparametric EBLUP (Opsomer et al., 2008)
■ In the literature there are many nonparametric regression methods (kernel, local polynomial, wavelets…), BUT they are difficult to incorporate in a small area model.
■ Methods based on penalized splines (Eilers and Marx, 1996; Ruppert et al., 2003) can be estimated by means of mixed models, which makes them a promising candidate for SAE methods.
■ Great flexibility in the definition of the model.
■ Estimable with existing software using REML.
■ Hard to estimate efficiency and to test the significance of terms (via bootstrap?).
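The link between penalized splines and mixed models can be illustrated with a truncated linear basis: the columns (x - knot)+ form the design matrix Z of an additional random effect, and treating its coefficients as random with a common variance gives the penalized fit. The sketch below only builds the basis. The function names and the quantile-based knot placement are assumptions for illustration, not the authors' implementation.

```python
def quantile_knots(x, n_knots):
    """Place knots at equally spaced sample quantiles of x (a simple, assumed rule)."""
    xs = sorted(x)
    return [xs[(k * (len(xs) - 1)) // (n_knots + 1)] for k in range(1, n_knots + 1)]


def truncated_linear_basis(x, knots):
    """Z matrix with entries (x_i - knot_k)+, one row per unit and one column per knot."""
    return [[max(0.0, xi - k) for k in knots] for xi in x]


# Four units and two knots, chosen by hand for readability
Z = truncated_linear_basis([0.0, 1.0, 2.0, 3.0], knots=[1.0, 2.0])
```

The matrix Z would then enter a mixed model alongside the area effects, so that the smoothing parameter is estimated as a variance ratio by REML in standard mixed model software.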

10 LFS empirical study
■ The simulation study on the LFS has been carried out to estimate the unemployment rate at LLMA level.
■ 500 two-stage LFS samples have been drawn from the 2001 census data set.
■ The performance of the methods has been evaluated for the estimation of the unemployment rate in the 127 LLMAs belonging to the geographical area “Center of Italy”.
■ The GREG, Synthetic and EBLUP small area estimators have been applied considering two different sets of auxiliary variables:
o Case A – LFS real covariates = sex by 14 age classes + employment indicator at the previous census;
o Case B – LFS real covariates + geographic coordinates (latitude and longitude of the municipality the sampling unit belongs to).

11 Enhanced small area estimators
■ Spatial EBLUP: a spatial correlation in the variance matrix of the random effects has been considered (EBLUP SP), with Case A covariates.
■ MBD: model based direct estimation is performed on sampled LLMAs, while a synthetic estimator based on a unit level linear mixed model is used for non-sampled LLMAs (Case A covariates).
■ Nonparametric EBLUP: two semiparametric representations based on penalized splines have been applied (fitted as additional random effects):
o geographical coordinates of the municipality (EBLUP-SPLINE SP): this allows for a finer representation of the spatial component than EBLUP SP (at municipality level instead of LLMA level);
o age (EBLUP-SPLINE AGE & EBLUP SP-SPLINE AGE).

12 Evaluation Criteria
■ % Relative Bias (RB)
■ % Relative Root Mean Squared Error (RRMSE)
■ Average Absolute RB (AARB)
■ Average RRMSE (ARRMSE)
■ Maximum Absolute RB (MARB)
■ Maximum RRMSE (MRRMSE)
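The six criteria can be computed from simulation output as sketched below. The definitions are the usual ones and are assumed here, since the slide's formulas are not preserved: for each area, RB is the percentage mean relative error over the replicated samples, RRMSE is the percentage root mean squared error relative to the true value, and the four summaries are averages or maxima over areas. Function names and the input layout (one list of replicate estimates per area) are hypothetical.

```python
import math


def relative_bias(estimates, truth):
    """% relative bias of one area over the simulated samples."""
    return 100.0 * sum((e - truth) / truth for e in estimates) / len(estimates)


def rrmse(estimates, truth):
    """% relative root mean squared error of one area over the simulated samples."""
    mse = sum((e - truth) ** 2 for e in estimates) / len(estimates)
    return 100.0 * math.sqrt(mse) / truth


def summarize(estimates_by_area, truth_by_area):
    """Average and maximum of |RB| and RRMSE across areas."""
    abs_rb = [abs(relative_bias(estimates_by_area[d], truth_by_area[d]))
              for d in truth_by_area]
    rr = [rrmse(estimates_by_area[d], truth_by_area[d]) for d in truth_by_area]
    return {
        "AARB": sum(abs_rb) / len(abs_rb),
        "ARRMSE": sum(rr) / len(rr),
        "MARB": max(abs_rb),
        "MRRMSE": max(rr),
    }


# Toy input: two areas, two replicated estimates each (made-up numbers)
toy = summarize({"a": [0.09, 0.11], "b": [0.24, 0.24]}, {"a": 0.10, "b": 0.20})
```

In the study described above, each area would have 500 replicate estimates and the true value would be the census unemployment rate of the LLMA.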

13 Results – A: LFS covariates; B = A + geographic coordinates of the municipality
Table columns: ESTIMATOR, AARB, ARRMSE, MARB, MRRMSE
Table rows (estimators): DIRECT; GREG A; GREG B; SYNTH A; SYNTH B; EBLUP A; EBLUP B; EBLUP SP; MBD; EBLUP-SPLINE SP; EBLUP-SPLINE AGE; EBLUP SP-SPLINE AGE

14 Analysis of results
■ Area level estimators (not shown here) perform a little better in terms of bias but much worse in terms of MSE.
■ GREG, SYNTH and EBLUP in case B, when geographical information is included in the fixed term, perform better in terms of bias.
■ In terms of MSE, the standard estimators in case A outperform those in case B if ARRMSE is taken as the overall evaluation criterion, while better results are obtained in case B if MRRMSE is considered.

15 Analysis of results
■ EBLUP SP can be compared with the unit level EBLUP with geographical information included as covariates and with the EBLUP-SPLINE SP.
o EBLUP SP shows better performance in terms of MSE, while the unit level EBLUP outperforms the other estimators in terms of bias.
o The EBLUP-SPLINE SP performs in between the other two estimators.
■ EBLUP-SPLINE AGE performs similarly to the unit level EBLUP in Case A.
o Using age in a nonparametric way is an alternative use of the auxiliary information. With respect to Case A the model is more parsimonious.
■ As expected, MBDE shows better results in terms of bias but performs worse in terms of MSE than the other SAE methods.
■ Using the autocorrelation structure together with the spline on the variable age does not improve performance.

16 Final remarks
■ Sensitivity to the choice of smoothing parameters in the splines approach has to be investigated.
■ The introduction of the sampling weights should be considered, to try to achieve benchmarking with the direct estimates produced at regional level.
■ The response is a 0-1 variable: a logistic mixed model is currently being investigated.
■ The model group is a small portion of Italy (the Center); hence the area-specific effects are smaller than they could be if an overall model were considered for the whole country. The introduction of geographical information should be analyzed considering a larger model group.