Marie Reijo, Population and Social Statistics

Presentation transcript:

Improving accuracy of poverty and social exclusion estimates for cities and functional urban areas
Marie Reijo, Population and Social Statistics (marie.reijo@stat.fi)
Ari Veijanen, Standards and Methods (ari.veijanen@stat.fi)

Background, Urban Audit statistics
- Urban Audit: ESS statistics on cities, towns and suburbs
- Spatial units:
  - City: local administrative unit (LAU) with at least 50 000 inhabitants
  - Greater city: stretches beyond the LAU
  - Functional urban area (FUA): city and its commuting zone
- Data on several topics, e.g. demography, social aspects, economic aspects, education, environment
- Data from many sources: household income, AROPE and its component AROP are collected from NSIs, with EU-SILC as the source
2018

Objectives, Finnish EU-SILC
- To produce standard techniques and procedures for annual estimation for small areas from FI-SILC
- To provide more accurate estimates compared with the national SILC estimates
- To provide estimates coherent with the EU-SILC regional estimates ((NUTS2, NUTS3) * degree of urbanisation)

Background, Finnish EU-SILC
- Integrated with the national IDS
- Four-year rotating panel design since 2010
- The Finnish EU-SILC sampling and estimation design:
  - Two-phase stratified sampling: 1. systematic sampling by location, 2. SWOR with non-proportional allocation by strata (socio-economic status)
  - => Fixed strata, non-fixed domains (cities, urban areas)
  - Auxiliary information from TSID (Total statistics on income distribution) is used for calibration, e.g. direct NUTS3 population numbers with the capital area separated
- Small sample sizes by domain (cities, urban areas)
  - => Estimates for cities and FUAs are not accurate
  - Estimates for NUTS2 and NUTS3 are rather accurate

Cities (LAUs) and FUAs by municipality districts

Sample sizes and SILC AROP and AROPE estimates for cities (LAUs)
                  AROPE                             AROP
City           n      %      CV  95%CL_L  95%CL_U      %      CV  95%CL_L  95%CL_U
FI001C2    2 365   12.9   1.644  10.4010  15.4269    8.1   1.066   6.0445  10.0922
FI002C1      921   19.2   6.768  14.1359  24.3347   15.6   6.432  10.6769  20.6194
FI003C1      740   25.8   9.429  19.7637  31.8017   16.5   6.429  11.5657  21.5062
FI004C3      723   18.7   7.469  13.3162  24.0308   13.9   5.540   9.3151  18.5427
FI005C1    1 195   11.2   4.523   7.0082  15.3459    5.9   2.402   2.8172   8.8930
FI006C1      732   11.5   5.025   7.1266  15.9147    3.7   1.748   1.1258   6.3094
FI007C2      448   11.6   7.907   6.0948  17.1187    9.7   7.198   4.4661  14.9844
FI008C3      510   22.1  15.575  14.3555  29.8273   18.4  14.400  10.9923  25.8693
FI009C1      536   25.0  18.698  16.4876  33.4399      ?  15.745  10.9314  26.4875
Others    16 648   15.3   0.387  14.0607  16.4987   11.8   0.344  10.6009  12.9017
Total     24 818   15.7   0.218  14.7438  16.5760      ?   0.190  10.6037  12.3122
? = value not present in the transcript.
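A quick arithmetic check on the city table: the confidence limits are consistent with a normal-approximation interval, estimate ± 1.96·sqrt(CV), if the column labelled CV is read as the variance of the percentage estimate. That reading is an assumption inferred from the figures, not something the slide states, and the function name `normal_ci` is illustrative only.

```python
import math

def normal_ci(estimate, variance, z=1.96):
    """Normal-approximation interval: estimate +/- z * sqrt(variance).

    Assumption: the slide's "CV" column is treated here as a variance
    of the percentage estimate (inferred from the numbers, not stated).
    """
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width

# FI001C2, AROPE: estimate 12.9, "CV" 1.644; slide limits 10.4010 and 15.4269
lower, upper = normal_ci(12.9, 1.644)
print(round(lower, 2), round(upper, 2))
```

The small remaining gap to the published limits is what one would expect if the slide computed them from the unrounded estimate.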

Sample sizes and SILC AROP and AROPE estimates for urban areas (FUA)
                  AROPE                             AROP
FUA            n      %      CV  95%CL_L  95%CL_U      %      CV  95%CL_L  95%CL_U
FI001L3    5 868   11.6   0.673  10.0399  13.2560    6.8   0.420   5.5229   8.0641
FI002L3    1 868   16.4   3.173  12.9028  19.8861   12.7   2.820   9.4269  16.0103
FI003L4    1 578   19.9   3.987  15.9732  23.8010   13.5   2.811  10.1763  16.7497
FI004L4    1 015   15.6   4.986  11.2536  20.0073   12.0   3.856   8.1771  15.8758
FI007L2      760   12.2   4.922   7.8193  16.5171    9.9   4.324   5.7996  13.9517
FI008L2      782   17.0   8.485  11.2802  22.6998   14.3   7.710   8.8295  19.7153
FI009L2      792   24.6  13.523  17.3869  31.8038   18.9  11.452  12.2167  25.4835
Others    12 155   16.7   0.614  15.1653  18.2370   12.9   0.549  11.4520  14.3567
Total     24 818   15.7   0.218        ?        ?   11.5   0.190  10.6037  12.3122
FI001K2    4 338      ?   0.924  14.7438  16.5760    6.7   0.530   5.2693   8.1231
? = value not present in the transcript.

SILC and TSID estimates for cities (LAUs), persons in total

SILC and TSID estimates for FUAs, persons aged 15-24

SAE estimation, estimator selection
- Generalized regression (GREG) estimators: design-based, model-assisted estimation
  - Characteristics: small bias; small variance unless domains are small; MSE close to variance
  - Exploitation of domain population data
- Unplanned and non-fixed domains (cities, urban areas): mixed models with fixed effects and random (domain) effects
- Especially MLGREG-D as the method: a generalised linear mixed model (GLMM) within the generalised linear regression model framework (binary distribution), with ML estimation in model fitting
- Approximate variance estimation (≈ Poisson sampling, because the domains/strata are unplanned)
- For SAE method characteristics, see e.g. Lehtonen and Pahkinen 2004

SAE estimation, estimation phases
- EU-SILC sample, $k \in s$: information on PIN, household, strata, weights, and measured study variables s(y) (AROPE, AROP, JOBLESS, MATDEP)
- Total statistics on income distribution (TSID), $k \in U$: information on PIN and auxiliary variables
- Merging EU-SILC and TSID data at unit level
- Modelling y for sample units $k \in s$ with an assisting model: specifying the final model, fitting and diagnosing, obtaining parameters $\beta$
- Calculating predictions $\hat{y}_k$ for all TSID units $k \in U$, using $\beta$, for each s(y)
- Calculating residuals $\hat{e}_k = y_k - \hat{y}_k$ for EU-SILC units $k \in s$
- Calculating domain GREG estimates for the output variables
- Evaluation of the estimates, incl. variance estimates and comparisons with the initial estimates

Modelling s(y), final model
Auxiliary variables $z_k' = (z_{1k}, \ldots, z_{Jk})'$ are as follows:
- sum of earned income (amount/10000)
- sum of unemployment benefits (amount/10000)
- des1, ..., des7 = number of persons in decile groups
- age10-14, ..., age80-84, age85+ = number of persons in age groups
- sum of AROP persons
- number of males
- number of persons with earned income
- number of persons with unemployment benefits
- sum of equivalence scale units (of household-dwellings)
Domain variables d for cities and FUAs are used as random terms.

Modelling s(y), final model
Generalized linear mixed model, specifically a binary logistic (logit) model, fitted separately for each s(y). For $k \in s$:
$E_m(y_{ik} \mid u_d) = \dfrac{\exp\big(z_k'(\beta_i + u_{id})\big)}{1 + \exp\big(z_k'(\beta_i + u_{id})\big)}$
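As a minimal sketch of how a fitted logit model of this form turns auxiliary values into a predicted probability for one unit, assuming the random term for domain d is a vector added coefficient-wise to the fixed effects (a random-intercept model would simply have zeros everywhere except the first entry). The function name and all numbers are hypothetical, not estimates from the slides.

```python
import math

def predicted_probability(z, beta, u_d):
    """Inverse logit of the linear predictor z'(beta + u_d).

    z     -- auxiliary values for unit k (first entry 1.0 for an intercept)
    beta  -- fixed-effect coefficients (hypothetical values below)
    u_d   -- random effects of the unit's domain, same length as beta
    """
    eta = sum(z_j * (b_j + u_j) for z_j, b_j, u_j in zip(z, beta, u_d))
    # 1 / (1 + exp(-eta)) is algebraically exp(eta) / (1 + exp(eta))
    return 1.0 / (1.0 + math.exp(-eta))

# Illustrative unit: intercept plus one auxiliary variable, random intercept 0.3
p = predicted_probability(z=[1.0, 0.5], beta=[-2.0, 1.0], u_d=[0.3, 0.0])
print(round(p, 4))
```

Predictions of this kind, computed for every unit in the register, are what the GREG estimator then sums over each domain.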

Construction of GREG estimators
Predictions $\hat{y}_k$ and residuals $e_k = y_k - \hat{y}_k$ are incorporated into the GREG estimators. With known domain sizes:
$\hat{t}_{d,\mathrm{GREG}} = \sum_{k \in U_d} \hat{y}_k + \sum_{k \in s_d} \omega_k (y_k - \hat{y}_k)$
where $\omega_k = 1/\pi_k$, $s_d = s \cap U_d$ and $d = 1, \ldots, D$.
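The estimator can be read as a register-based sum of predictions plus a weighted sample correction. A minimal sketch for one domain, assuming the predictions and inclusion probabilities are already available; the function name and the toy numbers are hypothetical.

```python
def greg_domain_total(pred_population, sample):
    """GREG estimate of a domain total.

    pred_population -- predictions y_hat_k for every unit k in U_d
    sample          -- (y_k, y_hat_k, pi_k) for every sampled unit k in s_d
    """
    synthetic_part = sum(pred_population)
    # Design-weighted residuals, with weight omega_k = 1 / pi_k
    correction = sum((y - y_hat) / pi for y, y_hat, pi in sample)
    return synthetic_part + correction

# Toy domain: four register units, two of them sampled with pi_k = 0.5
predictions = [0.2, 0.1, 0.4, 0.3]
sampled = [(1, 0.4, 0.5), (0, 0.1, 0.5)]
print(greg_domain_total(predictions, sampled))
```

If the model predicted every sampled unit perfectly, the correction term would be zero and the estimate would equal the sum of predictions.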

Variance for GREG
An unbiased estimator for the design variance of the HT estimator of a domain total is:
$\hat{V}(\hat{t}_{d,\mathrm{HT}}) = \sum_{k \in s_d} \sum_{l \in s_d} (a_k a_l - a_{kl})\, y_k y_l$
where $a_k$ and $a_l$ are design weights and $a_{kl}$ is the inverse of the second-order inclusion probability $\pi_{kl}$. Usually the equation is impractical. Under the assumption of Poisson sampling, however, $\pi_{kl} = \pi_k \pi_l$ for $k \neq l$, and the double sum is then easy to calculate. To apply the equation to MLGREG estimation, the model residuals are substituted for the original y-values.
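Under the Poisson assumption the off-diagonal terms vanish, because $a_{kl} = a_k a_l$ for $k \neq l$, and with $a_{kk} = a_k$ the double sum reduces to $\sum_{k \in s_d} a_k (a_k - 1) e_k^2$ once residuals $e_k$ replace the y-values. A minimal sketch comparing the two forms; the function names, weights and residuals are made up for illustration.

```python
def variance_double_sum(a, e):
    """Full double sum with Poisson pair weights a_kl."""
    n = len(a)

    def a_pair(k, l):
        # a_kk = a_k; for k != l Poisson sampling gives a_kl = a_k * a_l
        return a[k] if k == l else a[k] * a[l]

    return sum((a[k] * a[l] - a_pair(k, l)) * e[k] * e[l]
               for k in range(n) for l in range(n))

def variance_poisson(a, e):
    """Simplified single sum: only the diagonal terms survive."""
    return sum(a_k * (a_k - 1.0) * e_k ** 2 for a_k, e_k in zip(a, e))

weights = [2.0, 4.0, 5.0]      # hypothetical design weights a_k = 1 / pi_k
residuals = [0.5, -0.2, 0.1]   # hypothetical model residuals e_k
print(variance_double_sum(weights, residuals), variance_poisson(weights, residuals))
```

Both functions return the same value, which is why the approximation makes the variance cheap to compute for unplanned domains.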

AROPE, GREG estimates for cities (LAUs)

AROP, GREG estimates for cities (LAUs)

JOBLESS, GREG estimates for cities (LAUs)

MATDEP, GREG estimates for cities (LAUs)

AROPE, GREG estimates for FUAs

AROP, GREG estimates for FUAs

JOBLESS, GREG estimates for FUAs

MATDEP, GREG estimates for FUAs

AROPE and AROP GREG estimates for cities
            AROPE                                                 AROP
City       GREG %  GREG CV   SILC %  SILC CV   ≈ Bias  ≈ Error   GREG %  GREG CV   SILC %  SILC CV   ≈ Bias  ≈ Error
FI001C2      14.1    4.101     12.9    1.644      1.2    2.844       10    4.182      8.1    1.066     -3.0    4.066
FI002C1      16.3    6.779     19.2    6.768     -2.9    9.668     15.1    5.344     15.6    6.432     -0.3    6.732
FI003C1      21.7    7.323     25.8    9.429     -4.1   13.529     12.7    9.713     16.5    6.429     -0.9    7.329
FI004C3         ?    7.713     18.7    7.469     -2.2    9.669     13.6    7.643     13.9    5.540        ?    7.740
FI005C1      14.9    5.451     11.2    4.523      3.7    8.223      6.8    7.008      5.9    2.402        ?    5.402
FI006C1      15.2    4.964     11.5    5.025        ?    8.725      5.2   10.462        ?    1.748     -3.2    4.948
FI007C2      11.9    7.908     11.6    7.907      0.3    8.207        ?    7.729      9.7    7.198     -0.7    7.898
FI008C3      21.3    6.217     22.1   15.575     -0.8   16.375     18.6    5.192     18.4   14.400      8.2   22.600
FI009C1      25.5    7.141       25   18.698      0.5   19.198     16.1    7.457        ?   15.745      8.6   24.345
Others          ?        ?     15.3    0.387        ?        ?        ?        ?     11.8    0.344        ?        ?
Total           ?        ?     15.7    0.218        ?        ?        ?        ?        ?    0.190        ?        ?
? = value not present in the transcript.
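The slides do not define the ≈ Error column. In the rows where all values are present, it coincides with the SILC CV value plus the absolute value of ≈ Bias; the sketch below only reproduces that arithmetic. This is an observation from the figures, not a definition given by the authors, and the function name is hypothetical.

```python
def approx_error(silc_cv, approx_bias):
    """Reproduce the '≈ Error' figures as SILC CV + |≈ Bias|.

    Assumption: this relation is inferred from the table values only.
    """
    return silc_cv + abs(approx_bias)

# FI001C2, AROPE: SILC CV 1.644, bias 1.2, slide shows 2.844
# FI002C1, AROPE: SILC CV 6.768, bias -2.9, slide shows 9.668
print(round(approx_error(1.644, 1.2), 3), round(approx_error(6.768, -2.9), 3))
```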

JOBLESS and MATDEP GREG estimates for cities
            JOBLESS                                               MATDEP
City       GREG %  GREG CV   SILC %  SILC CV   ≈ Bias  ≈ Error   GREG %  GREG CV   SILC %  SILC CV   ≈ Bias  ≈ Error
FI001C2       9.4    5.537      7.3    1.033     -2.1    3.133      3.9   10.787      2.2    0.341     -1.7    2.041
FI002C1      10.9    9.991     11.3    5.251      0.4    5.651      2.7   31.019      2.6    1.055     -0.1    1.155
FI003C1      12.9   10.121     14.7    6.533      1.8    8.333      5.6   19.264      4.4    3.148     -1.2    4.348
FI004C3         ?   10.206     12.6    5.885      1.7    7.585      1.4   51.587      1.9    1.083      0.5    1.583
FI005C1       8.0   10.071      6.5    2.779     -1.5    4.279        ?   39.876        ?    0.579      0.3    0.879
FI006C1      10.6    9.548      9.7    4.537     -0.9    5.437        ?   32.129        ?    0.885     -0.5    1.385
FI007C2       7.4   12.693      4.9    3.196     -2.5    5.696        ? -139.272        ?    0.065        ?    0.565
FI008C3      10.8   15.612     11.2    9.486        ?    9.886      0.8  120.622        ?    1.060      0.6    1.660
FI009C1      13.6   14.979     15.1   16.959      1.5   18.459        ?   19.486      5.3    6.260     -2.7    8.960
Others          ?        ?      6.4    0.171        ?        ?        ?        ?        ?    0.045        ?        ?
Total           ?        ?      7.6    0.120        ?        ?        ?        ?     11.5    0.190        ?        ?
? = value not present in the transcript.

AROPE and AROP GREG estimates for FUAs
            AROPE                                                 AROP
FUA        GREG %  GREG CV   SILC %  SILC CV   ≈ Bias  ≈ Error   GREG %  GREG CV   SILC %  SILC CV   ≈ Bias  ≈ Error
FI001L3      13.0    2.110     11.6    0.673     -1.4    2.073      7.2    2.663      6.8    0.420     -0.4    0.820
FI002L3      14.3    5.008     16.4    3.173      2.1    5.273     10.7    5.068     12.7    2.820      2.0    4.820
FI003L4      15.0    6.045     19.9    3.987      4.9    8.887      9.3    7.680     13.5    2.811      4.2    7.011
FI004L4      13.9    7.762     15.6    4.986      1.7    6.686     12.9    6.787     12.0    3.856     -0.9    4.756
FI007L2      16.6    4.798     12.2    4.922     -4.4    9.322     13.3    4.669      9.9    4.324     -3.4    7.724
FI008L2         ?    6.937     17.0    8.485        ?   10.485     15.1    4.854        ?    7.710     -0.8    8.510
FI009L2      30.2    4.717     24.6   13.523     -5.6   19.123     18.5    5.091     18.9   11.452      0.4   11.852
FI001K2        ..        ?        ?    0.924        ?        ?        ?        ?      6.7    0.530        ?        ?
? = value not present in the transcript.

JOBLESS and MATDEP GREG estimates for FUAs
            JOBLESS                                               MATDEP
FUA        GREG %  GREG CV   SILC %  SILC CV   ≈ Bias  ≈ Error   GREG %  GREG CV   SILC %  SILC CV   ≈ Bias  ≈ Error
FI001L3       7.9    3.320      6.8    0.448     -1.1    1.548      2.6    8.578      2.0    0.139     -0.6    0.739
FI002L3       8.8    7.930      9.2    2.154      0.4    2.554      2.3   21.503      2.2    0.438     -0.1    0.538
FI003L4         ?    8.874     10.5    2.567      1.7    4.267      3.3   20.040      3.1    1.091     -0.2    1.291
FI004L4       8.7   10.147      9.3    3.337      0.6    3.937      1.0   54.062      1.4    0.587        ?    0.987
FI007L2       7.2   10.731      4.6    1.784     -2.6    4.384        ? -171.466      0.2    0.026      0.3    0.326
FI008L2         ?   14.328      8.2    4.606        ?    5.206      1.3   54.799        ?    0.663      0.1    0.763
FI009L2      12.0   13.211     13.8   11.712      1.8   13.512      7.3   16.907      5.0    4.353     -2.3    6.653
FI001K2        ..        ?      7.5    0.627        ?        ?        ?        ?        ?    0.046        ?        ?
? = value not present in the transcript.

AROPE GREG estimates for LAUs and SILC estimates for densely-populated areas (DB100=1) by NUTS2, persons

AROPE GREG estimates for LAUs and SILC estimates for densely-populated areas (DB100=1) by NUTS3, persons

Conclusions
- Smaller bias but increased variance in some domains (accuracy close to variance); overall accuracy improves with GREG estimators
- AROPE accuracy increases clearly in the domains with the smallest sample sizes, e.g. FI007C2, FI008C3, FI009C1, FI007L2, FI008L2, FI009L2, among them especially the FUAs
- For other domains with larger sample sizes, the AROPE accuracy improvement is less significant because of the variance increase, which is however moderate, especially for FI001C2
- AROP variance increase is marked for some cities
- MATDEP variances are high
- Coherence with the NUTS2 (and NUTS3) estimates for LAUs from the EU-SILC data could be better (e.g. FI1B)

Conclusions
- Rather stable variables and processes for updating estimates for LAUs and FUAs annually; the assisting model for variable selection should be tested with annual SILC data
- Further development: procedures are not complete
- Techniques, models and parameter tests are further needed for:
  - other regions, e.g. rural areas and thinly-populated areas
  - inclusion of geographic areas (NUTS2) for estimation
  - other SILC variables, subjective type in particular
  - estimates by classification variables
- Data users' needs

Many thanks for your attendance!