Marie Reijo, Population and Social Statistics


1 Improving accuracy of poverty and social exclusion estimates for cities and functional urban areas
Marie Reijo, Population and Social Statistics
Ari Veijanen, Standards and Methods

2 Background, Urban Audit statistics
Urban Audit: ESS statistics on cities, towns and suburbs
Spatial units:
  - City: local administrative unit (LAU) with a minimum number of inhabitants
  - Greater city: stretches beyond the LAU
  - Functional urban area (FUA): city and its commuting zone
Data on several topics, e.g. demography, social aspects, economic aspects, education, environment
Data from many sources: household income, AROPE and its component AROP are collected from the NSIs, with EU-SILC as the source
2018

3 Objectives, Finnish EU-SILC
To produce standard techniques and procedures for annual small area estimation from FI-SILC
To provide estimates that are more accurate than the national SILC estimates
To provide estimates that are coherent with the EU-SILC regional estimates ((NUTS2, NUTS3) × degree of urbanisation)

4 Background, Finnish EU-SILC
Integrated with the national IDS
Four-year rotating panel design since 2010
The Finnish EU-SILC sampling and estimation design:
  - Two-phase stratified sampling: 1. systematic sampling by location, 2. SWOR with non-proportional allocation by strata (socio-economic status)
  - => Fixed strata, non-fixed domains (cities, urban areas)
  - Auxiliary information from TSID (Total statistics on income distribution) is used for calibration, e.g. direct NUTS3 population numbers with the capital area treated separately
Small sample sizes by domain (cities, urban areas)
  - => Estimates for cities and FUAs are not accurate
  - Estimates for NUTS2 and NUTS3 are rather accurate

5 Cities (LAUs) and FUAs by municipality districts

6 Sample sizes and SILC AROP and AROPE estimates for cities (LAUs)
Columns: City, n; AROPE: %, CV, 95%CL_L, 95%CL_U; AROP: %, CV, 95%CL_L, 95%CL_U
FI001C2 2 365 12.9 1.644 8.1 1.066 6.0445
FI002C1 921 19.2 6.768 15.6 6.432
FI003C1 740 25.8 9.429 16.5 6.429
FI004C3 723 18.7 7.469 13.9 5.540 9.3151
FI005C1 1 195 11.2 4.523 7.0082 5.9 2.402 2.8172 8.8930
FI006C1 732 11.5 5.025 7.1266 3.7 1.748 1.1258 6.3094
FI007C2 448 11.6 7.907 6.0948 9.7 7.198 4.4661
FI008C3 510 22.1 15.575 18.4 14.400
FI009C1 536 25.0 18.698 15.745
Others 16 648 15.3 0.387 11.8 0.344
Total 24 818 15.7 0.218 0.190

7 Sample sizes and SILC AROP and AROPE estimates for urban areas (FUA)
Columns: FUA, n; AROPE: %, CV, 95%CL_L, 95%CL_U; AROP: %, CV, 95%CL_L, 95%CL_U
FI001L3 5 868 11.6 0.673 6.8 0.420 5.5229 8.0641
FI002L3 1 868 16.4 3.173 12.7 2.820 9.4269
FI003L4 1 578 19.9 3.987 13.5 2.811
FI004L4 1 015 15.6 4.986 12.0 3.856 8.1771
FI007L2 760 12.2 4.922 7.8193 9.9 4.324 5.7996
FI008L2 782 17.0 8.485 14.3 7.710 8.8295
FI009L2 792 24.6 13.523 18.9 11.452
Others 12 155 16.7 0.614 12.9 0.549
Total 24 818 15.7 0.218 11.5 0.190
FI001K2 4 338 0.924 6.7 0.530 5.2693 8.1231

8 SILC and TSID estimates for cities (LAUs), persons in total

9 SILC and TSID estimates for FUAs, persons aged 15-24

10 SAE estimation, estimator selection
Generalized regression (GREG) estimators: design-based, model-assisted estimation
  - Characteristics: small bias; small variance unless the domain is small; MSE close to variance
  - Exploitation of domain population data
Unplanned and non-fixed domains (cities, urban areas): mixed models with fixed effects and random (domain) effects
Method: MLGREG-D in particular, i.e. a generalised linear mixed model (GLMM) within the generalised linear regression model framework (binary distribution), fitted by maximum likelihood (ML)
Approximate variance estimation (≈ Poisson sampling, because the domains/strata are unplanned)
For the characteristics of SAE methods, see e.g. Lehtonen and Pahkinen 2004

11 SAE estimation, estimation phases
EU-SILC sample, k ∈ s: information on PIN, household, strata, weights and the measured study variables s(y) (AROPE, AROP, JOBLESS, MATDEP)
Total statistics on income distribution (TSID), k ∈ U: information on PIN and auxiliary variables
Merging the EU-SILC and TSID data at unit level
Modelling y for the sample units k ∈ s with an assisting model: specifying the final model, fitting and diagnosing it, obtaining the parameters β
Calculating predictions ŷ_k for all TSID units k ∈ U, using β, for each s(y)
Calculating residuals ê_k = y_k − ŷ_k for the EU-SILC units k ∈ s
Calculating domain GREG estimates for the output variables
Evaluating the estimates, incl. variance estimates and comparisons with the initial estimates

12 Modelling s(y), final model
The auxiliary variables z_k' = (z_1k, …, z_Jk)' are as follows:
  - sum of earned income (amount/10000)
  - sum of unemployment benefits (amount/10000)
  - des1, …, des7 = number of persons in decile groups
  - age10-14, …, age80-84, age85+ = number of persons in age groups
  - sum of AROP persons
  - number of males
  - number of persons with earned income
  - number of persons with unemployment benefits
  - sum of equivalence scale units (of household-dwellings)
The domain variables d for cities and FUAs are used as random terms.

13 Modelling s(y), final model
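Generalized linear mixed model, specifically a binary logistic model (logit model), fitted separately for each s(y). For k ∈ s:

$$E_m(y_{ik} \mid u_d) = \frac{\exp\left(z_k'(\beta_i + u_{id})\right)}{1 + \sum_{r} \exp\left(z_k'(\beta_r + u_{rd})\right)}$$

In the binary case the sum over r reduces to the single term r = i.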
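A minimal sketch of how a prediction follows from the binary logit model above, once the fixed effects β and a domain's random effects u_d are available. The function name and all numeric values are hypothetical and chosen only for illustration; the model fitting itself (ML estimation of the GLMM) is not shown.

```python
import numpy as np

def predict_probability(z, beta, u_d):
    """Predicted probability exp(eta) / (1 + exp(eta)) with eta = z'(beta + u_d).

    z    : (n, J) auxiliary variables of n units in one domain
    beta : (J,) fixed effects
    u_d  : (J,) random effects of that domain (zeros where no random term is used)
    """
    z = np.asarray(z, dtype=float)
    eta = z @ (np.asarray(beta, dtype=float) + np.asarray(u_d, dtype=float))
    # 1 / (1 + exp(-eta)) is algebraically the same as exp(eta) / (1 + exp(eta))
    return 1.0 / (1.0 + np.exp(-eta))

# Hypothetical values, not taken from the presentation
beta = np.array([-2.0, 0.5])             # intercept and one slope
u_d = np.array([0.3, 0.0])               # random intercept for one domain
z = np.array([[1.0, 0.0], [1.0, 2.0]])   # two units: constant term plus one covariate
p = predict_probability(z, beta, u_d)
```

Here the domain effect shifts only the intercept, which is one common way to include a random domain term; the presentation does not state which coefficients carry random effects.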

14 Construction of GREG estimators
The predictions ŷ_k and the residuals e_k = y_k − ŷ_k are incorporated into the GREG estimators. With known domain sizes:

$$\hat{t}_{d,\mathrm{GREG}} = \sum_{k \in U_d} \hat{y}_k + \sum_{k \in s_d} \omega_k \,(y_k - \hat{y}_k)$$

where $\omega_k = 1/\pi_k$, $s_d = s \cap U_d$ and $d = 1, \ldots, D$.
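The GREG construction can be sketched in a few lines: sum the predictions over every population unit of a domain, then add the design-weighted sample residuals of that domain. This is an illustrative sketch only; the function name, argument names and toy data are assumptions, not part of the presentation.

```python
import numpy as np

def greg_domain_totals(y_hat_pop, dom_pop, y_s, y_hat_s, pi_s, dom_s):
    """GREG estimate of the domain total for every domain present in the population.

    y_hat_pop, dom_pop : predictions and domain labels for all population units (k in U)
    y_s, y_hat_s       : observed values and predictions for the sample units (k in s)
    pi_s, dom_s        : inclusion probabilities and domain labels for the sample units
    """
    totals = {}
    for d in np.unique(dom_pop):
        synthetic = y_hat_pop[dom_pop == d].sum()            # sum of predictions over U_d
        in_d = dom_s == d                                     # sample units in s_d
        correction = ((y_s[in_d] - y_hat_s[in_d]) / pi_s[in_d]).sum()
        totals[str(d)] = float(synthetic + correction)
    return totals

# Toy data (hypothetical): six population units in two domains, three of them sampled
y_hat_pop = np.array([0.2, 0.4, 0.1, 0.3, 0.5, 0.5])
dom_pop = np.array(["A", "A", "A", "A", "B", "B"])
y_s = np.array([0.0, 1.0, 1.0])
y_hat_s = np.array([0.2, 0.4, 0.5])
pi_s = np.array([0.5, 0.5, 0.5])
dom_s = np.array(["A", "A", "B"])

totals = greg_domain_totals(y_hat_pop, dom_pop, y_s, y_hat_s, pi_s, dom_s)
```

A domain with no sample units gets a zero correction, so its estimate falls back to the sum of the predictions.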

15 Variance for GREG
An unbiased estimator for the design variance of the HT estimator of a domain total is:

$$\hat{V}(\hat{t}_{d,\mathrm{HT}}) = \sum_{k \in s_d} \sum_{l \in s_d} (a_k a_l - a_{kl})\, y_k y_l$$

where $a_k$ and $a_l$ are design weights and $a_{kl}$ is the inverse of the second-order inclusion probability $\pi_{kl}$. Usually the equation is impractical. Under the assumption of Poisson sampling, however, $\pi_{kl} = \pi_k \pi_l$ for $k \neq l$, so the terms with $k \neq l$ vanish and the double sum is easy to calculate. To apply the equation in MLGREG estimation, the model residuals are substituted for the original y-values.
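As a hedged illustration of the Poisson-sampling shortcut: when the second-order inclusion probability equals the product of the first-order ones for k ≠ l, and equals the first-order probability for k = l, the double sum collapses to a single sum of a_k(a_k − 1)e_k² over the sample units of the domain. The sketch below compares the collapsed form with the explicit double sum; the function names and the toy residuals are assumptions made for this example.

```python
import numpy as np

def poisson_variance(e, pi):
    """Collapsed form under Poisson sampling: sum of a_k (a_k - 1) e_k^2, with a_k = 1 / pi_k."""
    e = np.asarray(e, dtype=float)
    a = 1.0 / np.asarray(pi, dtype=float)
    return float(np.sum(a * (a - 1.0) * e ** 2))

def double_sum_variance(e, pi):
    """Explicit double sum of (a_k a_l - a_kl) e_k e_l with Poisson second-order probabilities."""
    e = np.asarray(e, dtype=float)
    a = 1.0 / np.asarray(pi, dtype=float)
    total = 0.0
    for k in range(len(e)):
        for l in range(len(e)):
            # pi_kl = pi_k * pi_l for k != l, and pi_kk = pi_k
            a_kl = a[k] if k == l else a[k] * a[l]
            total += (a[k] * a[l] - a_kl) * e[k] * e[l]
    return total

# Hypothetical residuals and inclusion probabilities for the sample units of one domain
residuals = np.array([-0.2, 0.6])
pi = np.array([0.5, 0.5])

v_fast = poisson_variance(residuals, pi)
v_slow = double_sum_variance(residuals, pi)
```

The collapsed form needs only first-order inclusion probabilities, which is what makes the approximation practical for unplanned domains.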

16 AROPE, GREG estimates for cities (LAUs)

17 AROP, GREG estimates for cities (LAUs)

18 JOBLESS, GREG estimates for cities (LAUs)

19 MATDEP, GREG estimates for cities (LAUs)

20 AROPE, GREG estimates for FUAs

21 AROP, GREG estimates for FUAs

22 JOBLESS, GREG estimates for FUAs

23 MATDEP, GREG estimates for FUAs

24 AROPE and AROP GREG estimates for cities
SILC % CV ≈ Bias ≈ Error
FI001C2 14.1 4.101 12.9 1.644 1.2 2.844 10 4.182 8.1 1.066 -3.0 4.066
FI002C1 16.3 6.779 19.2 6.768 -2.9 9.668 15.1 5.344 15.6 6.432 -0.3 6.732
FI003C1 21.7 7.323 25.8 9.429 -4.1 13.529 12.7 9.713 16.5 6.429 -0.9 7.329
FI004C3 7.713 18.7 7.469 -2.2 9.669 13.6 7.643 13.9 5.540 7.740
FI005C1 14.9 5.451 11.2 4.523 3.7 8.223 6.8 7.008 5.9 2.402 5.402
FI006C1 15.2 4.964 11.5 5.025 8.725 5.2 10.462 1.748 -3.2 4.948
FI007C2 11.9 7.908 11.6 7.907 0.3 8.207 7.729 9.7 7.198 -0.7 7.898
FI008C3 21.3 6.217 22.1 15.575 -0.8 16.375 18.6 5.192 18.4 14.400 8.2 22.600
FI009C1 25.5 7.141 25 18.698 0.5 19.198 16.1 7.457 15.745 8.6 24.345
Others 15.3 0.387 11.8 0.344
Total 15.7 0.218 0.190

25 JOBLESS and MATDEP GREG estimates for cities
SILC % CV ≈ Bias ≈ Error
FI001C2 9.4 5.537 7.3 1.033 -2.1 3.133 3.9 10.787 2.2 0.341 -1.7 2.041
FI002C1 10.9 9.991 11.3 5.251 0.4 5.651 2.7 31.019 2.6 1.055 -0.1 1.155
FI003C1 12.9 10.121 14.7 6.533 1.8 8.333 5.6 19.264 4.4 3.148 -1.2 4.348
FI004C3 10.206 12.6 5.885 1.7 7.585 1.4 51.587 1.9 1.083 0.5 1.583
FI005C1 8.0 10.071 6.5 2.779 -1.5 4.279 39.876 0.579 0.3 0.879
FI006C1 10.6 9.548 9.7 4.537 -0.9 5.437 32.129 0.885 -0.5 1.385
FI007C2 7.4 12.693 4.9 3.196 -2.5 5.696 0.065 0.565
FI008C3 10.8 15.612 11.2 9.486 9.886 0.8 1.060 0.6 1.660
FI009C1 13.6 14.979 15.1 16.959 1.5 18.459 19.486 5.3 6.260 -2.7 8.960
Others 6.4 0.171 0.045
Total 7.6 0.120 11.5 0.190

26 AROPE and AROP GREG estimates for FUAs
SILC % CV ≈ Bias ≈ Error
FI001L3 13.0 2.110 11.6 0.673 -1.4 2.073 7.2 2.663 6.8 0.420 -0.4 0.820
FI002L3 14.3 5.008 16.4 3.173 2.1 5.273 10.7 5.068 12.7 2.820 2.0 4.820
FI003L4 15.0 6.045 19.9 3.987 4.9 8.887 9.3 7.680 13.5 2.811 4.2 7.011
FI004L4 13.9 7.762 15.6 4.986 1.7 6.686 12.9 6.787 12.0 3.856 -0.9 4.756
FI007L2 16.6 4.798 12.2 4.922 -4.4 9.322 13.3 4.669 9.9 4.324 -3.4 7.724
FI008L2 6.937 17.0 8.485 10.485 15.1 4.854 7.710 -0.8 8.510
FI009L2 30.2 4.717 24.6 13.523 -5.6 19.123 18.5 5.091 18.9 11.452 0.4 11.852
FI001K2 .. 0.924 6.7 0.530

27 JOBLESS and MATDEP GREG estimates for FUAs
SILC % CV ≈ Bias ≈ Error
FI001L3 7.9 3.320 6.8 0.448 -1.1 1.548 2.6 8.578 2.0 0.139 -0.6 0.739
FI002L3 8.8 7.930 9.2 2.154 0.4 2.554 2.3 21.503 2.2 0.438 -0.1 0.538
FI003L4 8.874 10.5 2.567 1.7 4.267 3.3 20.040 3.1 1.091 -0.2 1.291
FI004L4 8.7 10.147 9.3 3.337 0.6 3.937 1.0 54.062 1.4 0.587 0.987
FI007L2 7.2 10.731 4.6 1.784 -2.6 4.384 0.2 0.026 0.3 0.326
FI008L2 14.328 8.2 4.606 5.206 1.3 54.799 0.663 0.1 0.763
FI009L2 12.0 13.211 13.8 11.712 1.8 13.512 7.3 16.907 5.0 4.353 -2.3 6.653
FI001K2 .. 7.5 0.627 0.046

28 AROPE GREG estimates for LAUs and SILC estimates for densely-populated areas (DB100=1) by NUTS2, persons

29 AROPE GREG estimates for LAUs and SILC estimates for densely-populated areas (DB100=1) by NUTS3, persons

30 Conclusions
Smaller bias but increased variance in some domains (accuracy close to variance); overall accuracy improves with the GREG estimators
AROPE accuracy increases clearly in the domains with the smallest sample sizes, e.g. FI007C2, FI008C2, FI009C2, FI007L2, FI008L2, FI009L2, and among them especially in the FUAs
For the other domains, with larger sample sizes, the AROPE accuracy improvement is less marked because of the variance increase, which is nevertheless moderate, especially for FI001C2
The AROP variance increase is marked for some cities
MATDEP variances are high
Coherence with the NUTS2 (and NUTS3) estimates for LAUs from the EU-SILC data could be better (e.g. FI1B)

31 Conclusions
Rather stable variables and processes for updating the estimates for LAUs and FUAs annually; the variable selection of the assisting model should be tested with annual SILC data
Further development is needed, as the procedures are not complete
Further tests of techniques, models and parameters are needed for:
  - other regions, e.g. rural areas and thinly-populated areas
  - inclusion of geographic areas (NUTS2) in the estimation
  - other SILC variables, subjective ones in particular
  - estimates by classification variables
Data users' needs

32 Many thanks for your attendance!

