Sampling strategy for the dual-system correction of the under-coverage in the Register Supported 2011 Italian Population Census Loredana Di Consiglio,



1 Sampling strategy for the dual-system correction of the under-coverage in the Register Supported 2011 Italian Population Census Loredana Di Consiglio, Marco Fortini, Stefano Falorsi ISTAT Q2010 European Conference on Quality in Official Statistics - Helsinki, 4 - 6 May 2010

2 Outline
Purpose: to plan a sampling strategy that accounts for municipal under-coverage in the next Italian Census round
- Sketch of the 2011 Italian Census
- Sources of data useful in planning the Post Enumeration Survey (PES)
- Sampling strategies considered for comparison
- Construction of a fictitious, but plausible, population to simulate the sampling universe
- Results of the simulation study

3 Key innovations of the 2011 Italian census
From the traditional enumeration method…
- Search for households and people in the field
…to a register-supported census
- Municipal population registers used to mail out questionnaires to people
- Data collection based on web, mail-back and municipal data collection centres
- Reduction in the number of enumerators
- Data collection from late respondents
- Coverage evaluation activities

4 Coverage evaluation program
Requested for the Eurostat quality report, it is in any case crucial in this context of extensive innovation in process and methods
Over-coverage: people no longer living in the municipality who are still listed in the population registers
- Checked by interviewers when contacting late respondents
Under-coverage: people living in the municipality who are not yet listed in the population registers
- Supplemental lists of people
- Extensive search in the field
- Statistical estimation based on capture-recapture techniques

5 Overview of Italian census undercount
Gross under-coverage of population registers
- Estimated by Fortini and Gallo (2009) at about 400,000 people (up to 560,000), using administrative data and mixture model analysis to account for under-reporting in the source
Gross under-coverage of the 2001 Census (enumeration based)
- The 2001 Post Enumeration Survey estimates that about 800,000 people were missed
Both estimates are based on strong assumptions
However, this evidence makes it reasonable to use municipal population registers as the main source for household enumeration

6 Capture-Recapture Approach
Correction for population register undercount through a second source based on independent field enumeration
- $x_{1+}$: people listed in the municipal register
- $\hat{x}_{+1}$: estimate of the municipal population based on a field enumeration survey in a sample of enumeration areas (EAs)
- $\hat{x}_{11}$: estimate of the people who would have been counted by both sources if field enumeration had been carried out over the whole municipal area
Petersen estimator of the hidden population (Wolter, 1986): $\hat{N} = x_{1+}\,\hat{x}_{+1} / \hat{x}_{11}$
Main goal: municipality estimates of population counts
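
A minimal sketch of the Petersen (dual-system) calculation, assuming the standard form in which the register count is multiplied by the survey count and divided by the count found in both sources. The function name `petersen_estimate` is hypothetical. The inputs are the municipal totals of the worked example in the pseudo-population slides (register 746, survey 752, both 743), treated here as full enumerations rather than sample-based estimates.

```python
def petersen_estimate(x_register: float, x_survey: float, x_both: float) -> float:
    """Dual-system estimate: register count * survey count / count in both."""
    if x_both <= 0:
        raise ValueError("x_both must be positive")
    return x_register * x_survey / x_both


# Municipal totals from the example table: register, survey, both.
n_hat = petersen_estimate(746, 752, 743)
print(round(n_hat, 1))  # 755.0
```

With these totals the estimate lands on the true count of 755 assigned to the example municipality.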

7 Sampling design for the 2011 Post-Enumeration Survey
About 1,300 municipalities and 1,200,000 people will be sampled
Two alternative two-stage sampling designs, with municipalities and enumeration areas as primary and secondary sampling units
- Design A: region by class of population size (less than 5,000; 5,000-20,000; 20,000-50,000; more than 50,000)
- Design B: aggregation of provinces within region by the 4 classes of population size (helps in reducing the bias of SAE)
Stratification and selection of municipalities according to their population size is considered for both designs
It is necessary to sample among municipalities in order to control costs

8 Estimators
Direct estimates of census counts are available only at planned domain level
- Small area estimation methods are needed, at least for municipalities not included in the sample
Possible predictors available for area-level modelling
- Population counts coming from the register
- Demographic indicators (e.g. dependency ratios)
- Socio-economic indicators
In what follows we consider
- Direct estimation at regional level (planned domains)
- Synthetic estimator at municipality level
Assumption: under-coverage rates are invariant among the municipalities of a planned domain

9 Direct Estimators
Expansion estimators
- Simple: weight given by the inverse of the selection probability
- Calibrated: final weight

10 Synthetic Estimator
Based on the assumption that under-coverage rates are invariant for municipalities belonging to the same planned domain
- For each system of weights, the coverage ratio is computed at domain level
- From the ratios, simple and calibrated synthetic estimators are obtained for municipalities
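
The slide formulas did not survive in this transcript, so the following is only a sketch of how a synthetic estimator of this kind could work. It assumes the domain coverage ratio is the direct estimate of the domain population divided by the domain register total, and that each municipality's register count is scaled by that ratio. The function name, municipality labels and numbers are hypothetical.

```python
def synthetic_estimates(domain_direct_estimate, register_counts):
    """Scale each municipal register count by one domain-level ratio.

    Assumption (not taken from the slides): ratio = direct estimate of the
    domain population / register total of the domain.
    """
    domain_register_total = sum(register_counts.values())
    if domain_register_total <= 0:
        raise ValueError("register total must be positive")
    ratio = domain_direct_estimate / domain_register_total
    return {m: ratio * x for m, x in register_counts.items()}


# Hypothetical planned domain: three municipalities, direct estimate 10,100.
register = {"m1": 746, "m2": 4254, "m3": 5000}
estimates = synthetic_estimates(10_100, register)
```

Because a single ratio is applied to every municipality, the estimates add up to the domain-level direct estimate, and any real variation in coverage between municipalities is ignored.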

11 Empirical study
Based on a simulation study
Two pseudo-populations of 335,643 Italian EAs were considered
Sources of information
- 2001 Italian Post Enumeration Census
- Administrative data on changes of residence that occurred after the 2001 census (from November 2002 to December 2005)
For every non-empty EA belonging to the 8,101 Italian municipalities, the following counts were generated
- Observed count from the population register ($X_{1+}$)
- True population count ($N$)
- Field enumeration count ($X_{+1}$)
- Count of people enumerated by both sources ($X_{11}$)

12 Assemble the Pseudo-population
For each municipality:

| Munic. Id | EA Id | True N | P. Reg | Survey | Both |
|---|---|---|---|---|---|
| 1015 | 1 | | 535 | | |
| 1015 | 2 | | 37 | | |
| 1015 | 3 | | 53 | | |
| 1015 | 4 | | 40 | | |
| 1015 | 5 | | 4 | | |
| 1015 | 6 | | 64 | | |
| 1015 | 7 | | 13 | | |
| Tot. | | | 746 | | |

EA population register counts come from 2001 Census counts

13 Assign True population counts to municipality
For each municipality:

| Munic. | EA | True N | P. Reg | Survey | Both |
|---|---|---|---|---|---|
| 1015 | 1 | | 535 | | |
| 1015 | 2 | | 37 | | |
| 1015 | 3 | | 53 | | |
| 1015 | 4 | | 40 | | |
| 1015 | 5 | | 4 | | |
| 1015 | 6 | | 64 | | |
| 1015 | 7 | | 13 | | |
| Tot. | | 755 | 746 | | |

EA population register counts come from 2001 Census counts
True municipal population counts: P. Reg inflated by 1/r, where the coverage rate 'r' is estimated by the model in Fortini and Gallo (2009) (2 different populations)
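
As a worked check on the example municipality, and assuming the inflation is a plain division of the register total by $r$ followed by rounding (the slides do not state the rounding rule), the totals 746 and 755 imply:

$$N = \frac{X_{1+}}{r}, \qquad r \approx \frac{746}{755} \approx 0.988$$

That is, the register in this example covers roughly 98.8% of the true population.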

14 Assign True population counts to EAs
For each municipality:

| Munic. | EA | True N | P. Reg | Survey | Both |
|---|---|---|---|---|---|
| 1015 | 1 | 538 | 535 | | |
| 1015 | 2 | 37 | 37 | | |
| 1015 | 3 | 58 | 53 | | |
| 1015 | 4 | 40 | 40 | | |
| 1015 | 5 | 4 | 4 | | |
| 1015 | 6 | 65 | 64 | | |
| 1015 | 7 | 13 | 13 | | |
| Tot. | | 755 | 746 | | |

EA population register counts come from 2001 Census counts
True municipal population counts: P. Reg inflated by 1/r, where the coverage rate 'r' is estimated by the model in Fortini and Gallo (2009) (2 different populations)
True N is allocated among EAs by a hierarchical Dirichlet/Multinomial model with parameter vector p given by the distribution of the P. Reg population among EAs
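
One way to picture the Dirichlet/Multinomial allocation is the sketch below, which uses only the Python standard library. The `concentration` parameter, which controls how closely the drawn shares follow the register shares p, is an assumption: the slides do not report it, nor the details of the hierarchy. The function name is hypothetical.

```python
import random
from collections import Counter


def allocate_true_counts(true_total, register_counts, concentration=1000.0, rng=None):
    """Split a municipal true count among EAs with a Dirichlet/multinomial draw."""
    rng = rng or random.Random()
    if any(x <= 0 for x in register_counts):
        raise ValueError("every EA needs a positive register count")
    register_total = sum(register_counts)
    # Dirichlet draw: normalise independent gamma variates whose shape
    # parameters are proportional to the register shares p.
    gammas = [rng.gammavariate(concentration * x / register_total, 1.0)
              for x in register_counts]
    gamma_total = sum(gammas)
    shares = [g / gamma_total for g in gammas]
    # Multinomial draw: assign each of the true_total people to one EA.
    draws = Counter(rng.choices(range(len(shares)), weights=shares, k=true_total))
    return [draws.get(i, 0) for i in range(len(shares))]


# Register counts of the seven EAs in the example municipality.
register = [535, 37, 53, 40, 4, 64, 13]
true_counts = allocate_true_counts(755, register, rng=random.Random(2011))
```

The allocated counts always sum to the municipal true count. Nothing in this sketch forces an EA's true count to be at least its register count, so a production version would need whatever constraint the authors applied.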

15 Assign survey counts to EAs
Each municipality:

| Munic. | EA | True N | P. Reg | Survey | Both |
|---|---|---|---|---|---|
| 1015 | 1 | 538 | 535 | 536 | |
| 1015 | 2 | 37 | 37 | | |
| 1015 | 3 | 58 | 53 | | |
| 1015 | 4 | 40 | 40 | | |
| 1015 | 5 | 4 | 4 | | |
| 1015 | 6 | 65 | 64 | | |
| 1015 | 7 | 13 | 13 | | |
| Tot. | | 755 | 746 | | |

EA survey counts: True N multiplied by the coverage rate 'rs'
'rs' comes from a beta-binomial distribution, with "alpha" and "beta" chosen so that the mean and variance of the 2001 PES coverage rates are reproduced (5 macro regions by 4 classes of municipal population size)
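
The moment matching for "alpha" and "beta" can be sketched as follows. The mean and variance used here (0.98 and 0.0004) are hypothetical placeholders, not the 2001 PES values. Drawing the count as a binomial given a beta-distributed 'rs' is one reading of "beta-binomial"; the slides could equally mean rounding True N times 'rs'.

```python
import random


def beta_parameters(mean, variance):
    """Method-of-moments alpha and beta for a Beta distribution."""
    if not 0 < mean < 1 or not 0 < variance < mean * (1 - mean):
        raise ValueError("need 0 < mean < 1 and 0 < variance < mean * (1 - mean)")
    common = mean * (1 - mean) / variance - 1
    return mean * common, (1 - mean) * common


def draw_survey_count(true_n, alpha, beta, rng):
    """Draw rs ~ Beta(alpha, beta), then count each person with probability rs."""
    rs = rng.betavariate(alpha, beta)
    return sum(rng.random() < rs for _ in range(true_n))


# Hypothetical stratum mean and variance of coverage rates.
alpha, beta = beta_parameters(0.98, 0.0004)
survey = draw_survey_count(538, alpha, beta, random.Random(2011))
```

In the design described on the slide, one pair of parameters would be fitted for each of the 20 strata (5 macro regions by 4 size classes).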

16 Assign survey counts to municipality
Each municipality:

| Munic. | EA | True N | P. Reg | Survey | Both |
|---|---|---|---|---|---|
| 1015 | 1 | 538 | 535 | 536 | |
| 1015 | 2 | 37 | 37 | 37 | |
| 1015 | 3 | 58 | 53 | 58 | |
| 1015 | 4 | 40 | 40 | 39 | |
| 1015 | 5 | 4 | 4 | 4 | |
| 1015 | 6 | 65 | 64 | 65 | |
| 1015 | 7 | 13 | 13 | 13 | |
| Tot. | | 755 | 746 | 752 | |

The municipal count is obtained by summing the values of the EAs

17 Assign number of people enumerated by both the lists
Each municipality:

| Munic. | EA | True N | P. Reg | Survey | Both |
|---|---|---|---|---|---|
| 1015 | 1 | 538 | 535 | 536 | 533 |
| 1015 | 2 | 37 | 37 | 37 | |
| 1015 | 3 | 58 | 53 | 58 | |
| 1015 | 4 | 40 | 40 | 39 | |
| 1015 | 5 | 4 | 4 | 4 | |
| 1015 | 6 | 65 | 64 | 65 | |
| 1015 | 7 | 13 | 13 | 13 | |
| Tot. | | 755 | 746 | 752 | |

People enumerated by both lists: hypergeometric distribution at EA level with parameters True N, P. Reg and Survey
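
A minimal sketch of the hypergeometric step, assuming the two lists are independent: the EA holds True N people, the first P. Reg of them are on the register, and the survey picks Survey people at random without replacement. The function name is hypothetical.

```python
import random


def draw_both(true_n, register, survey, rng):
    """Number of register-listed people among those caught by the survey."""
    if not (0 <= register <= true_n and 0 <= survey <= true_n):
        raise ValueError("register and survey counts must lie in [0, true_n]")
    # Label people 0..true_n-1; labels below `register` are on the register.
    surveyed = rng.sample(range(true_n), survey)
    return sum(1 for person in surveyed if person < register)


# EA 1 of the example municipality: True N 538, P. Reg 535, Survey 536.
both = draw_both(538, 535, 536, random.Random(2011))
```

For EA 1 the draw can only fall between 533 and 535, since at most 3 people are missing from the register and at most 2 from the survey; the value 533 shown in the table is within that range.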

18 Assign number of people enumerated by both the lists
Each municipality:

| Munic. | EA | True N | P. Reg | Survey | Both |
|---|---|---|---|---|---|
| 1015 | 1 | 538 | 535 | 536 | 533 |
| 1015 | 2 | 37 | 37 | 37 | 37 |
| 1015 | 3 | 58 | 53 | 58 | 53 |
| 1015 | 4 | 40 | 40 | 39 | 39 |
| 1015 | 5 | 4 | 4 | 4 | 4 |
| 1015 | 6 | 65 | 64 | 65 | 64 |
| 1015 | 7 | 13 | 13 | 13 | 13 |
| Tot. | | 755 | 746 | 752 | 743 |

The municipal count is obtained by summing the EAs

19 St. dev. of coverage rates among municipalities
- About 400,000 and 900,000 missing people were generated for the pseudo-Register and pseudo-Survey respectively
- Population register variability is larger for POP2 than for POP1
- Survey variability is larger than the corresponding population register variability (because of its lower coverage rate)
- Survey variability is not very close to PES variability, even though the order of magnitude is the same

20 Variability of coverage rates among EAs: Population registers
Pseudo-coverage of the register vs size of EAs (left) is compared with the distribution of EA coverage rates in the 2001 Italian PES (1,098 EAs)
Figure annotation: too many points here
Simulated EAs show too many large units with a very small coverage rate, which does not seem realistic in our context

21 Variability of coverage rates among EAs: Control survey
Pseudo-coverage of the survey vs size of EAs (left) is compared with the distribution of EA coverage rates in the 2001 Italian PES (1,098 EAs)
Figure annotation: too few points here
In this case, simulated EAs show too few small units with a small coverage rate

22 Simulation of the sampling space
Four tests: designs A and B for populations 1 and 2
- Each simulation is based on 500 sample replications
- Sampling of municipalities with probability proportional to their population size
- Simple random sampling of EAs within municipalities
- Simple and weighted direct estimation at domain level
- Synthetic estimation at municipality level
Population counts coming from population registers are used here as the benchmark for comparisons
- Downward biased, but available at zero cost
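
The replication loop can be sketched as below, under simplifying assumptions that are not taken from the slides: municipalities are drawn with replacement (so a Hansen-Hurwitz type expansion applies), there is a single stratum, and the target is the total survey count. Function names and the toy population are hypothetical.

```python
import random


def draw_two_stage_sample(population, n_municipalities, n_eas, rng):
    """population maps municipality id -> list of (register, survey) EA counts."""
    ids = list(population)
    sizes = [sum(reg for reg, _ in population[m]) for m in ids]
    # Stage 1: municipalities with probability proportional to register size
    # (with replacement, a simplifying assumption).
    chosen = rng.choices(ids, weights=sizes, k=n_municipalities)
    sample = []
    for m in chosen:
        eas = population[m]
        # Stage 2: simple random sample of EAs without replacement.
        picked = rng.sample(range(len(eas)), min(n_eas, len(eas)))
        sample.append((m, picked))
    return sample


def expansion_estimate(population, sample):
    """Hansen-Hurwitz type estimate of the total survey count."""
    total_size = sum(reg for eas in population.values() for reg, _ in eas)
    estimates = []
    for m, picked in sample:
        eas = population[m]
        p = sum(reg for reg, _ in eas) / total_size  # stage-1 draw probability
        municipal_total = len(eas) / len(picked) * sum(eas[j][1] for j in picked)
        estimates.append(municipal_total / p)
    return sum(estimates) / len(estimates)


# Toy population with identical EAs inside each municipality.
population = {"A": [(10, 9)] * 4, "B": [(20, 18)] * 3}
rng = random.Random(2011)
replicates = [
    expansion_estimate(population, draw_two_stage_sample(population, 2, 2, rng))
    for _ in range(500)
]
```

The toy population is built so that every replicate reproduces the survey total of 90, which is a quick check on the weights. With realistic EA-level variability the 500 replicates would scatter, and their mean and spread give the bias and MSE reported in the results.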

23 Results: Bias of registers vs. synthetic estimates
Main results
- Direct estimates perform well in terms of bias and MSE at domain level
- Calibrated estimates outperform the simple ones in terms of MSE, for both direct and synthetic estimators
- The less aggregated design B does not significantly improve the estimates, so only design A is shown here
In terms of bias, the synthetic estimator improves on the registers. Improvements decrease for larger municipalities. These results are more evident for population 1 than for population 2
In terms of maximum bias the improvement is not so noticeable

24 Bias of synthetic estimator vs register counts
Population 1, design A, by class of municipality size
Figure panels: Less than 5,000; 5,000-19,000; 20,000-49,000; 50,000 and more
Bisectors delimit the zone where synthetic estimates are better than simple register counts in terms of bias
- The synthetic estimator almost always improves on the registers in terms of bias
- However, the improvement does not seem very prominent

25 Bias of synthetic estimator vs register count
Population 2, design A, by class of municipality size
- Same conclusion for POP2, with worse results for larger municipalities

26 Results: MSE of synthetic and direct estimators
- The direct estimator can be applied to self-representative municipalities
- It is reported in the table for the two classes of larger municipalities
- On average, the synthetic estimator outperforms the direct one, which does not seem useful even in sampled municipalities
- The MSE of the synthetic estimates is much larger than the bias (in Table 2)
- Since this does not happen in real cases, it could be evidence that the variability of the pseudo-populations at EA level is too high

27 Difference between synthetic and direct estimator in terms of MSE: municipalities larger than 50,000 inh.
- Most municipalities larger than 50,000 inh. show a better synthetic MSE (negative values)
- Direct and synthetic estimates are equivalent for the largest municipalities (>250,000 inh.), but only in POP1

28 Concluding Remarks
- The sampling strategy of the next Italian Census PES is evaluated here through pseudo-populations and simulated experiments
- Synthetic estimates yield a slight improvement over census counts from registers
- Though the Census PES is required by EU regulation for evaluation purposes, our present results do not endorse the use of the PES to correct Census counts
- Although not discussed here, direct estimation with calibration achieved suitable results at domain level in terms of both bias and variance
Further developments
- Better definition of pseudo-populations with respect to coverage ratios between EAs
- Use of model-based estimation (EBLUP) was promising in our previous studies, carried out in a simplified framework

