EDITING THE INTEGRATED CENSUS IN ISRAEL
Prepared by Eva Rotenberg, Central Bureau of Statistics, Israel (1)
(1) I would like to thank Ari Paltiel, who edited this paper.
The paper describes the editing and imputation procedures used for the demographic variables in the integrated 2008 census in Israel.
Background
The Israeli census is an "Integrated Census": it combines administrative data for 100% of the population with data obtained from a large sample survey (approximately 17% of the households in the country).
The main administrative data source for the Improved Administrative File is the National Population Register (NPR).
All register records are identified by a unique personal identity number (PIN), which can be used for matching records.
The NPR contains personal records for all citizens and permanent residents of Israel and includes demographic and residential information.
Background
The area survey of the census serves two main purposes:
1. The survey results provide parameters for calculating a weight that represents the probability that a person actually resides in the Statistical Area in which he/she is registered in the NPR. A Statistical Area is a group of consecutive buildings/blocks with an average of 5,000 inhabitants. See http://www.cbs.gov.il/mifkad/integ_census.pdf
2. The survey collects socio-economic information such as labor force characteristics, household typology, education, housing, ownership of durable goods, and disability.
Patterns of Demographic Data in the NPR
The demographic variables which were edited and imputed are: year of birth, sex, marital status, year of immigration, country of birth, and parents' country of birth.
Edit checks, which were implemented with CANCEIS software, include standard checks of relationships between variables, such as ages of parents and children, marital status and age, and year of immigration and year of birth.
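The kinds of edit checks listed above can be illustrated with a minimal Python sketch. It is not CANCEIS syntax and not the rule set actually used in the census: the field names, the two thresholds, and the `failed_edits` function are assumptions made for illustration only.

```python
from typing import Optional

CENSUS_YEAR = 2008        # reference year of the census described in the paper
MIN_PARENT_AGE_GAP = 12   # hypothetical threshold, not taken from the paper
MIN_MARRIAGE_AGE = 15     # hypothetical threshold, not taken from the paper


def failed_edits(record: dict, parent: Optional[dict] = None) -> list:
    """Return the names of the (illustrative) edit checks that a record fails."""
    failures = []
    yob = record.get("year_of_birth")
    yoi = record.get("year_of_immigration")
    status = record.get("marital_status")

    # Year of immigration versus year of birth: immigration cannot come first.
    if yob is not None and yoi is not None and yoi < yob:
        failures.append("immigration_before_birth")

    # Marital status versus age: very young persons are expected to be single.
    if yob is not None and status is not None:
        age = CENSUS_YEAR - yob
        if status != "single" and age < MIN_MARRIAGE_AGE:
            failures.append("marital_status_vs_age")

    # Ages of parents and children: a parent must be sufficiently older.
    if parent is not None and yob is not None:
        parent_yob = parent.get("year_of_birth")
        if parent_yob is not None and yob - parent_yob < MIN_PARENT_AGE_GAP:
            failures.append("parent_child_age_gap")

    return failures
```

A record that fails at least one check would then be a candidate for the imputation stages; a record with no failures and no missing values is left untouched.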
Patterns of Demographic Data in the NPR
Missing values of country of birth and parents' country of birth are concentrated in the records of older persons in the NPR.
Missing values of year of immigration were dispersed among younger persons born abroad.
The choice of imputation methods is dictated by such special population patterns and by the relationships between variables such as country of birth, parents' country of birth, year of immigration, and year of birth.
Methodology of Editing and Imputation
Cold-deck imputation
Deterministic imputation
Statistical imputation
NIM (Nearest-neighbor Imputation Methodology) using CANCEIS software (Canadian Census Edit and Imputation System)
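To give a feel for the nearest-neighbor idea behind NIM, the following Python sketch copies a missing value from the most similar complete record (the donor). It is a deliberately simplified illustration, not the CANCEIS algorithm: the mismatch-count distance, the field names, and the function names are all assumptions.

```python
def distance(recipient: dict, donor: dict, match_fields: list) -> int:
    """Count the matching fields on which the two records differ (assumed metric)."""
    return sum(1 for f in match_fields if recipient.get(f) != donor.get(f))


def nearest_neighbor_impute(recipient: dict, donors: list, target: str, match_fields: list) -> dict:
    """Return a copy of the recipient with `target` taken from the closest donor."""
    # Only records that actually carry a value for the target can act as donors.
    candidates = [d for d in donors if d.get(target) is not None]
    imputed = dict(recipient)
    if not candidates:
        return imputed  # no donor available; the value stays missing
    # min() keeps the first of several equally close donors.
    best = min(candidates, key=lambda d: distance(recipient, d, match_fields))
    imputed[target] = best[target]
    return imputed
```

In this sketch the donor pool is assumed to contain only records that already passed all edit checks; a production system would also verify that the imputed record itself passes the edits.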
Methodology of Editing and Imputation
Cold-deck imputation: imputation from external data sources, namely the census area sample survey and previous (traditional) censuses.
The imputation process is based on failed edits in the administrative source.
When a discrepancy in edited items is found between valid records of the administrative source and valid records of the census area survey, we prefer the administrative source as the more reliable source of data for most variables.
Choice of methodology
The imputation sequence progresses by degree of accuracy, from strong to weak imputation.
Once the cold-deck imputation stage is exhausted, we weigh the possibilities of different imputation methods.
NIM does not apply to all cases for which imputation is needed, either because:
there are more certain possibilities, or
the data source does not meet the preconditions for hot-deck imputation.
For these cases we used other imputation methods: deterministic imputation and statistical imputation.
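As an illustration of statistical imputation, which the results for year of immigration label as imputation by means, a group-mean imputation could be sketched in Python as follows. The grouping variables, the rounding to a whole year, and the function name are assumptions; the paper does not state how the groups were defined.

```python
from collections import defaultdict
from statistics import mean


def impute_group_mean(records: list, target: str, group_fields: list) -> list:
    """Fill missing `target` values with the rounded mean of the record's group."""
    observed = defaultdict(list)
    for r in records:
        if r.get(target) is not None:
            key = tuple(r.get(f) for f in group_fields)
            observed[key].append(r[target])

    result = []
    for r in records:
        r = dict(r)  # work on a copy so the input records are not modified
        key = tuple(r.get(f) for f in group_fields)
        if r.get(target) is None and key in observed:
            r[target] = round(mean(observed[key]))
        result.append(r)
    return result
```

Records whose group has no observed value are left missing here, which matches the idea that a later stage in the sequence would handle them.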
The Process of Editing and Imputation
1. Strong deterministic imputation
2. Completion from the census sample survey
3. Matching with previous censuses
4. Weak deterministic imputation
5. Statistical imputation
6. Nearest-neighbor Imputation Methodology (NIM)
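The ordered process above can be sketched as a simple cascade in which each stage is tried only if all earlier stages failed to supply a value. This Python sketch is an assumption about how such a sequence might be organized: `run_cascade`, `build_stages`, and the placeholder rules are hypothetical, and only the stage order and the matching of records by PIN come from the paper.

```python
def run_cascade(record: dict, stages: list) -> tuple:
    """Try each (name, function) stage in order; return (value, stage name)."""
    for name, impute in stages:
        value = impute(record)
        if value is not None:
            return value, name
    return None, "unresolved"


def build_stages(survey_by_pin: dict, previous_census_by_pin: dict, field: str) -> list:
    """Assemble the six stages in the order given on the slide."""
    # Placeholder: the actual deterministic, statistical and NIM rules
    # are not given in the paper, so these stages never supply a value here.
    no_rule = lambda record: None
    return [
        ("strong_deterministic", no_rule),
        ("census_survey", lambda r: survey_by_pin.get(r["pin"], {}).get(field)),
        ("previous_censuses", lambda r: previous_census_by_pin.get(r["pin"], {}).get(field)),
        ("weak_deterministic", no_rule),
        ("statistical", no_rule),
        ("nim", no_rule),
    ]
```

Returning the stage name together with the value makes it straightforward to tabulate how many records were imputed at each stage.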
Results
The relative proportions of imputations at each stage of the process were determined by the data patterns of the different demographic variables in the NPR and by the tailoring of the combination of imputation methods.
Results: Year of immigration

Imputation stage                    Individuals   Percent
Strong Deterministic Imputation          180426       8.8
Imputation by Census Survey                5431       0.3
Imputation by Previous Censuses           16569       0.8
Weak Deterministic Imputation              1842       0.1
Statistical Imputation (means)            15957       0.8
NIM                                        6914       0.3
TOTAL                                    227139      11.0
Results: Country of birth

Imputation stage                    Individuals   Percent
Imputation by Census Survey                  82       0.0
Imputation by Previous Censuses             565       0.0
NIM                                        2790       0.0
TOTAL                                      3437       0.0
Results: Father's country of birth

Imputation stage                    Individuals   Percent
Imputation by Census Survey               53216       1.3
Imputation by Previous Censuses          304882       7.5
Weak Deterministic Imputation              8668       0.2
NIM                                       31581       0.8
TOTAL                                    398347       9.8
Summary
In this paper we have shown how the methodology of the Integrated Census in Israel, characterized by a combination of an administrative source and a field survey, dictated the choice of imputation methods.
The imputation process as a whole, and the relative proportions of imputations at each stage of the process, were determined by the data patterns of the different demographic variables in the NPR and by the tailoring of the combination of imputation methods.