Paolo Valente - UNECE Statistical Division
Slide 1: Technology for census data coding, editing and imputation
Paolo Valente (UNECE)
UNECE Workshop on Census Technology for SPECA and CIS member countries (Astana, 7-8 June 2007)

Slide 2: Content
1. Coding
2. Editing and imputation
Reference material:
- Handbook on Census Management for Population and Housing Censuses (Chapter IV, sections D-F)
- Handbook on Population and Housing Census Editing

Slide 3: 1. Census data coding
Questions:
- How did you code the data in the last census?
- Were you satisfied with the coding?
- What problems did you find in coding?
- Were there problems with specific variables?

Slide 4: Census data coding
- Data coding = assigning classification codes to the responses written on the census form
- Coding systems:
  a) Manual
  b) Computer-assisted
  c) Automatic
  d) A mix of a), b) or c)
- Coding methodologies:
  - Simple (1 or 2 words), e.g. place of birth
  - Structured (more than one question), e.g. occupation
  - Hierarchical, e.g. address

Slide 5: Manual data coding
- Clerks look up the code in "code books" and write it on the census form for later processing
- Pros:
  - Easy to implement
  - No technology needed
- Cons:
  - Time-consuming
  - Labor-intensive
  - Risk of inconsistency

Slide 6: Computer-assisted coding
- Coding is assisted by a computerized system
- Code books are computer-based
- How it works:
  - The coder types only a few characters
  - The system selects a list of matching entries
  - The coder chooses the right code
  - The code is automatically recorded by the system
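
A minimal sketch of the lookup step described above, assuming a tiny in-memory code book. The entries, the codes (C01, C02, ...) and the field name `occupation_code` are hypothetical; a real system would load a complete classification and could also apply coding rules.

```python
# Hypothetical electronic code book: description -> code (illustrative values only).
CODE_BOOK = {
    "Accountant": "C01",
    "Architect": "C02",
    "Farm labourer": "C03",
    "Teacher, primary school": "C04",
    "Teacher, secondary school": "C05",
}

def candidates(typed, code_book=CODE_BOOK):
    """List (description, code) pairs whose description starts with the typed characters."""
    prefix = typed.strip().lower()
    return sorted(
        (desc, code) for desc, code in code_book.items()
        if desc.lower().startswith(prefix)
    )

def record_choice(record, choices, index):
    """Store the code the coder picked from the candidate list; the original text is kept."""
    _description, code = choices[index]
    return {**record, "occupation_code": code}

choices = candidates("tea")
# [('Teacher, primary school', 'C04'), ('Teacher, secondary school', 'C05')]
record = record_choice({"person_id": 7, "occupation_text": "teacher"}, choices, 1)
# record["occupation_code"] -> 'C05'
```

The coder only ever picks from the short list, so the recorded code is always a valid entry of the code book.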

Slide 7: Computer-assisted coding
- Pros:
  - Efficiency
  - Good quality
  - Particularly suitable for structured coding (coding rules can be built in)
- Cons:
  - Relatively complex system
  - Long development time
  - Relatively high cost

Slide 8: Automatic coding
- Based on computerized algorithms
- No human intervention
- Text captured by ICR (intelligent character recognition) is matched against indexes
- The system assigns a score to each matched response:
  - If the score is above a set level, the response is accepted
  - If the score is below that level, human intervention is needed (computer-assisted coding)
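
The score-and-threshold logic can be sketched as follows. This is not the algorithm of ACTR or any other production coder: the score here is simply `difflib.SequenceMatcher.ratio()` from the Python standard library, and the index entries, the codes and the 0.8 acceptance level are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical index of place-of-birth spellings -> codes (illustrative only).
INDEX = {
    "astana": "P01",
    "almaty": "P02",
    "milan": "P03",
    "rome": "P04",
}
ACCEPT_LEVEL = 0.8  # hypothetical acceptance threshold

def normalise(text):
    """Lower-case and collapse whitespace before matching."""
    return " ".join(text.lower().split())

def auto_code(response):
    """Return (code, score); code is None when the response must go to computer-assisted coding."""
    text = normalise(response)
    best_entry, best_score = None, 0.0
    for entry in INDEX:
        score = SequenceMatcher(None, text, entry).ratio()
        if score > best_score:
            best_entry, best_score = entry, score
    if best_entry is not None and best_score >= ACCEPT_LEVEL:
        return INDEX[best_entry], best_score
    return None, best_score  # below the level: refer to a human coder

auto_code("Almati")  # -> ('P02', 0.833...), accepted despite the misspelling
auto_code("Paris")   # -> (None, ...), referred to computer-assisted coding
```

Raising the acceptance level reduces wrong automatic codes but sends more responses to human coders, which is the trade-off behind the matching rates on the next slide.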

Slide 9: Automatic coding
- Matching rates depend on the algorithms used and the type of variable
- Maximum matching rates in ideal circumstances:
  - Simple variables (place of birth): approx. 80%
  - Complex variables (occupation, industry): approx. 50%
- All unmatched responses must be processed with computer-assisted coding

Slide 10: Automatic coding
- Pros:
  - High efficiency
  - Good quality (if the system is developed carefully)
  - Consistency
  - Particularly suitable for structured coding (coding rules can be built in)
- Cons:
  - Very complex system
  - Long development time
  - High cost
  - Risk of systematic errors if the matching algorithms or indexes are faulty

Slide 11: Coding: practices in the 2000 round
- CIS countries generally used manual coding
- About half of UNECE countries used automatic coding, in combination with computer-assisted or manual coding
- In most cases the software was developed in-house
- Software for automatic coding: ACTR (Automated Coding by Text Recognition), developed by Statistics Canada and also used by Italy and the UK
- Integrated software system that includes computer-assisted coding: CSPro (US Census Bureau)
See "Measuring Population and Housing", Chapter III

Slide 12: Coding in the 2010 census round
Questions:
- What are your plans for coding the data of the next census?
- Are you considering computer-assisted coding?
- Why? ...or why NOT?

Slide 13: Editing and imputation
Questions on editing:
- Which data did you edit in the last census?
- How did you edit the data?
- Did you have any problems?

Slide 14: Editing and imputation
Questions on imputation:
- Did you impute any missing data? If yes:
  - For which variables?
  - What method and software did you use?
  - Did you produce statistics on imputation rates?

Slide 15: Editing and imputation
- Editing = detecting and correcting errors in census data
- Imputation = assigning values to missing data
- The two concepts are related, and the two terms are sometimes used in different ways

Slide 16: Editing and imputation
- Different types of errors:
  - Coverage errors (e.g. omissions, duplicates)
  - Enumerator errors
  - Respondent errors
  - Coding errors
  - Data entry errors
  - ...but also editing errors!

Slide 17: Editing and imputation
- It is important not only to detect errors but also to identify their causes, in order to take appropriate measures and improve overall quality
- Objectives of editing and imputation:
  - Improve the quality of census data
  - Facilitate analysis of census data
  - Identify types and sources of errors

Slide 18: Editing and imputation
- Dilemma: what should be edited and what should NOT be edited?
- Complex editing systems can be difficult and expensive to implement, and in some cases may introduce distortions
- Go for a relatively simple editing system!

Slide 19: Editing and imputation
- In general, the editing system should be:
  - Minimalist (only obvious errors)
  - Automated (as much as possible)
  - Systematic
  - Consistent with other procedures of the national statistical institute (NSI)
  - Compliant with international standards

Slide 20: Editing and imputation
General guidelines for editing:
- Make as few changes as possible
- Eliminate obvious inconsistencies
- Supply entries for erroneous or missing items by using, as a guide, other entries for the housing unit, the person, or other persons in the household or a comparable group
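
The last guideline can be illustrated with a deterministic rule that takes a missing value from another member of the same household. The item name `mother_tongue`, the codes "A" and "B", and the donor order (reference person first, then other members in listing order) are all hypothetical choices for this sketch.

```python
def impute_from_household(household, item):
    """Fill a missing `item` from another household member who reported it.

    Hypothetical rule: prefer the reference person, then other members in listing order.
    Imputed values are flagged so that imputation rates can be computed later.
    """
    reporters = [p for p in household if p.get(item) is not None]
    # sorted() is stable: False (reference person) sorts before True (everyone else)
    reporters = sorted(reporters, key=lambda p: p["relationship"] != "reference person")
    filled = []
    for person in household:
        if person.get(item) is None and reporters:
            person = {**person, item: reporters[0][item], item + "_imputed": True}
        filled.append(person)
    return filled

household = [
    {"id": 1, "relationship": "reference person", "mother_tongue": "A"},
    {"id": 2, "relationship": "spouse", "mother_tongue": "B"},
    {"id": 3, "relationship": "child", "mother_tongue": None},
]
result = impute_from_household(household, "mother_tongue")
# result[2] -> {'id': 3, 'relationship': 'child', 'mother_tongue': 'A', 'mother_tongue_imputed': True}
```

Only the missing entry is changed and the reported values are left untouched, in line with making as few changes as possible.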

Slide 21: Editing and imputation
Example of inconsistent information 1: the reference person and the spouse have the same sex
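
The first example translates into a simple household-level check. The field names and relationship labels are hypothetical; the sketch only flags the records and leaves the correction to a later step.

```python
def same_sex_spouse(household):
    """Return ids of spouses whose sex equals that of the reference person."""
    ref = next((p for p in household if p["relationship"] == "reference person"), None)
    if ref is None:
        return []  # no reference person: a structure edit should catch this instead
    return [
        p["id"] for p in household
        if p["relationship"] == "spouse" and p["sex"] == ref["sex"]
    ]

household = [
    {"id": 1, "relationship": "reference person", "sex": "M"},
    {"id": 2, "relationship": "spouse", "sex": "M"},
    {"id": 3, "relationship": "child", "sex": "F"},
]
same_sex_spouse(household)  # -> [2]
```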

Slide 22: Editing and imputation
Example of inconsistent information 2: excessive age difference between a mother and her children
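
A sketch of the second check, assuming each person record carries a hypothetical `mother_id` field pointing to the mother's person id within the household. The limits of 12 and 55 years are illustrative only; each census would set its own.

```python
MIN_GAP, MAX_GAP = 12, 55  # hypothetical plausible mother-child age gaps, in years

def implausible_age_gaps(household):
    """Return (child_id, gap) for children whose age gap with their mother is outside the limits."""
    by_id = {p["id"]: p for p in household}
    flagged = []
    for person in household:
        mother = by_id.get(person.get("mother_id"))
        if mother is None:
            continue  # mother not in the household or not identified
        gap = mother["age"] - person["age"]
        if not MIN_GAP <= gap <= MAX_GAP:
            flagged.append((person["id"], gap))
    return flagged

household = [
    {"id": 1, "age": 40},
    {"id": 2, "age": 12, "mother_id": 1},
    {"id": 3, "age": 35, "mother_id": 1},
]
implausible_age_gaps(household)  # -> [(3, 5)]
```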

Slide 23: Editing and imputation
Editing approaches:
- Top-down: items are edited in sequence, from first to last
- Multiple-variable (Fellegi-Holt):
  - A set of statements and relationships among variables is checked for the household
  - The edit keeps track of all false statements
  - The system assesses how best to change the data
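
The difference between the two approaches can be sketched with a brute-force version of the Fellegi-Holt idea: find the smallest set of fields that can be changed so that the record passes every edit. Real implementations derive implied edits and solve this far more efficiently; the domains and edit rules below are hypothetical.

```python
from itertools import combinations, product

# Hypothetical categorical domains for one person record.
DOMAINS = {
    "age_group": ["0-14", "15-64", "65+"],
    "marital_status": ["single", "married", "widowed"],
    "relationship": ["reference person", "spouse", "child"],
}

# Hypothetical edits: each returns True when the record FAILS the edit.
EDITS = [
    lambda r: r["age_group"] == "0-14" and r["marital_status"] != "single",
    lambda r: r["relationship"] == "spouse" and r["marital_status"] != "married",
]

def failed_edits(record):
    """Indices of the edits that the record fails."""
    return [i for i, edit in enumerate(EDITS) if edit(record)]

def localise_errors(record):
    """Smallest set of fields whose values can be changed so that all edits pass.

    Returns (fields_to_change, one_consistent_record). Brute force over field subsets
    of increasing size: fine for a handful of small domains, not for production use.
    """
    fields = list(DOMAINS)
    for size in range(len(fields) + 1):
        for subset in combinations(fields, size):
            for values in product(*(DOMAINS[f] for f in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if not failed_edits(candidate):
                    return set(subset), candidate
    return None, None  # the edits cannot all be satisfied

record = {"age_group": "0-14", "marital_status": "married", "relationship": "spouse"}
fields, fixed = localise_errors(record)
# fields -> {'age_group'}; fixed["age_group"] -> '15-64'
```

A top-down procedure that corrected marital status first (to "single") would then fail the spouse edit, whereas the minimal-change search changes only the age group. The value in `fixed` is just the first consistent one found; in practice the field would then be imputed, for example from a donor.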

Slide 24: Editing and imputation
Imputation methods:
- Static imputation (or "cold deck"):
  - Used mainly for missing values
  - Value assigned from a predetermined set, or from a distribution of valid responses
  - The set of values does not change over time
- Dynamic imputation (or "hot deck"):
  - Used for missing or inconsistent values
  - Value assigned from a "donor" with similar characteristics; the donor changes constantly
  - Imputed responses change over time
See "Handbook on Census Editing", Ch. II.E and Annex V
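
A minimal sketch of the two methods for a single item (marital status), handling missing values only. The imputation classes, starting values and field names are hypothetical. The hot deck here is a sequential one: it starts from the same predetermined values (assumed here as the initialisation) and replaces them with each valid record it reads, so the donor for a class changes as processing proceeds.

```python
# Hypothetical imputation classes (sex, age group) -> predetermined marital status.
STARTING_VALUES = {
    ("F", "15-64"): "married",
    ("M", "15-64"): "married",
    ("F", "65+"): "widowed",
    ("M", "65+"): "married",
}

def impute(records, dynamic):
    """Fill missing marital_status; if dynamic, each valid record becomes the class donor."""
    deck = dict(STARTING_VALUES)  # copy, so the predetermined set is never modified
    filled = []
    for rec in records:
        key = (rec["sex"], rec["age_group"])
        if rec["marital_status"] is None:
            rec = {**rec, "marital_status": deck.get(key), "imputed": True}
        elif dynamic:
            deck[key] = rec["marital_status"]  # hot deck: update the donor for this class
        filled.append(rec)
    return filled

records = [
    {"id": 1, "sex": "F", "age_group": "15-64", "marital_status": "single"},
    {"id": 2, "sex": "F", "age_group": "15-64", "marital_status": None},
]
impute(records, dynamic=False)[1]["marital_status"]  # cold deck -> 'married'
impute(records, dynamic=True)[1]["marital_status"]   # hot deck  -> 'single'
```

The cold deck always returns the same value for a class, while the hot deck reflects the most recent similar respondent, which is why its imputations change over time.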

Slide 25: Editing and imputation
- Types of edits:
  - Fatal edits identify errors with certainty
  - Query edits identify suspected errors
  - Structure edits: check coverage and relations between different units (persons, households, housing units, enumeration areas, etc.)
  - Edits for population and housing items
See "Handbook on Census Editing", Chapters III, IV and V
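
One way to organise these edit types is to tag each person-level edit as fatal or query and to run structure edits on the household as a whole. The rules and age limits below are hypothetical examples, not recommendations.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Edit:
    name: str
    kind: str  # "fatal" (certain error) or "query" (suspected error)
    fails: Callable[[dict], bool]

# Hypothetical person-level edits.
PERSON_EDITS = [
    Edit("age outside 0-120", "fatal", lambda p: not 0 <= p["age"] <= 120),
    Edit("age over 100", "query", lambda p: 100 < p["age"] <= 120),
]

def structure_errors(household):
    """Structure edit: a household should have exactly one reference person."""
    n = sum(1 for p in household if p["relationship"] == "reference person")
    return [] if n == 1 else [f"{n} reference persons (expected 1)"]

def run_edits(household):
    """Collect structure, fatal and query findings for one household."""
    report = {"structure": structure_errors(household), "fatal": [], "query": []}
    for person in household:
        for edit in PERSON_EDITS:
            if edit.fails(person):
                report[edit.kind].append((person["id"], edit.name))
    return report

household = [
    {"id": 1, "relationship": "reference person", "age": 103},
    {"id": 2, "relationship": "spouse", "age": 150},
]
run_edits(household)
# {'structure': [], 'fatal': [(2, 'age outside 0-120')], 'query': [(1, 'age over 100')]}
```

Fatal findings go to correction or imputation, while query findings would typically be reviewed before anything is changed.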

Slide 26: Editing and imputation: practices in the 2000 round
- Most ECE countries (33 out of 40) performed computer-supported editing, including several CIS countries
- 22 countries performed automatic imputation
- Most countries developed specific software
- Some countries used SAS, Oracle, SQL or CSPro
See "Measuring Population and Housing", Chapter III

Slide 27: Editing and imputation: plans for the 2010 round
Questions:
- What are your plans for editing and imputation?
- What editing approaches/methods are you considering?

Slide 28: Editing and imputation: plans for the 2010 round
Questions:
- For which variables would you consider imputation of missing values?