Treatment of Missing Data Pres. 8

Presentation transcript:

Treatment of Missing Data Pres. 8 United Nations Regional Workshop on the 2010 World Programme on Population and Housing Censuses: Census Evaluation and Post Enumeration Surveys, Bangkok, Thailand, 10-14 May 2010

Treatment of Missing Data
Why are some data missed?
- Refusals
- Item non-response
- Time constraints
- Paucity of resources
- Lax enumerators
- Units not found
- Insufficient data for matching, etc.

Treatment of Missing Data
Four types of missing data:
- Unit missing data: household non-interview
- Item missing data: some information for a household or person is available and some is not
- Unresolved match or residence status: match or residence status in the P-sample could not be determined for PES estimation
- Unresolved enumeration status: correct or erroneous enumeration status in the E-sample could not be determined for PES estimation

How to treat missing data?
A. Do nothing
B. Use only the complete records
C. Use a weighting method
D. Impute a missing value

A. Doing nothing
- If very few data are missing, the effect on data use may be negligible and the missing values can be ignored
- Requires working with an incomplete dataset that still contains missing data

B. Use only the complete records
- Easy but risky option. The subset of respondents may be:
  - too small to support reliable estimates
  - not representative of the total population under study
- Estimates may be seriously biased unless non-response is independent of the variables of interest
- This option can be envisaged only for a rapid descriptive analysis

C. Use a weighting method
- Unit non-response: increase the respondents’ weights to compensate for the non-respondents. The objective is to produce approximately unbiased estimates
- Item non-response: reweighting is possible, but its main disadvantage is that the same record ends up with different weights (one for each variable). This is why it is generally not used for item non-response
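The unit non-response adjustment can be illustrated with a weighting-class sketch: within each adjustment cell, respondents' weights are inflated so that they also carry the weight of the non-respondents in that cell. The cell variable, the field names and the sample values below are hypothetical and are not taken from the presentation.

```python
from collections import defaultdict


def adjust_weights(units):
    """Weighting-class adjustment for unit non-response.

    units: list of dicts with keys 'cell', 'weight', 'responded'
    (hypothetical field names). Returns respondents only, with weights
    scaled by (total cell weight) / (responding cell weight).
    """
    total = defaultdict(float)
    responding = defaultdict(float)
    for u in units:
        total[u["cell"]] += u["weight"]
        if u["responded"]:
            responding[u["cell"]] += u["weight"]

    adjusted = []
    for u in units:
        if u["responded"]:
            factor = total[u["cell"]] / responding[u["cell"]]
            adjusted.append({**u, "weight": u["weight"] * factor})
    return adjusted


# Hypothetical sample: one urban household did not respond.
sample = [
    {"id": 1, "cell": "urban", "weight": 10.0, "responded": True},
    {"id": 2, "cell": "urban", "weight": 10.0, "responded": False},
    {"id": 3, "cell": "rural", "weight": 20.0, "responded": True},
    {"id": 4, "cell": "rural", "weight": 20.0, "responded": True},
]
adjusted = adjust_weights(sample)
```

The adjusted weights sum to the same total as the original sample weights, which is the sense in which respondents "compensate" for non-respondents. The estimates are only approximately unbiased if respondents and non-respondents within a cell are similar.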

D. Imputation
- The process of imputation changes one or more responses or missing values in a record, or in several records, so that the resulting records are internally coherent
- Before using any imputation method, the best strategy is to start with a manual study of the responses; imputation can then handle the remaining unresolved edit failures
- Two methods of imputation: Cold Deck and Hot Deck
Cold Deck Imputation:
- Used mainly for missing or unknown values (not for inconsistent/invalid values)
- Values are imputed on a proportional basis from a distribution of valid responses (e.g., from a previous census)
- In doing so, cold deck draws values from a fixed (but possibly outdated) distribution of values
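A minimal sketch of cold deck imputation, assuming the fixed distribution is given as a mapping from valid values to proportions. The function name, the field names, the missing code and the proportions are illustrative assumptions, not figures from any census.

```python
import random


def cold_deck_impute(records, field, distribution, missing_code, rng):
    """Replace missing values of `field` by draws from a fixed distribution.

    distribution: dict mapping valid value -> proportion (assumed to come
    from an earlier source such as a previous census).
    Returns new records; the input list is left unchanged.
    """
    values = list(distribution.keys())
    proportions = list(distribution.values())
    result = []
    for rec in records:
        rec = dict(rec)  # copy so the original record is preserved
        if rec[field] == missing_code:
            # Draw proportionally to the fixed (possibly outdated) distribution.
            rec[field] = rng.choices(values, weights=proportions, k=1)[0]
            rec[field + "_imputed"] = True
        else:
            rec[field + "_imputed"] = False
        result.append(rec)
    return result


# Hypothetical proportions for sex (1=Male, 2=Female); 9 = missing.
previous_census_sex = {1: 0.49, 2: 0.51}
records = [{"id": 1, "sex": 1}, {"id": 2, "sex": 9}, {"id": 3, "sex": 2}]
imputed = cold_deck_impute(
    records, "sex", previous_census_sex, missing_code=9, rng=random.Random(0)
)
```

Because the distribution never changes during processing, the imputed values reflect the earlier source and not the data currently being edited.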

D. Imputation (contd.)
Hot Deck or Dynamic Imputation:
- Used for both missing data and inconsistent/invalid items
- Uses one or more variables to estimate the likely response, based on data about individuals with similar characteristics
- The “donor set” (or imputation matrix) constantly changes through updating; therefore, imputations change dynamically during the process of editing all the records
- Thus, hot deck draws from a distribution that changes dynamically with each imputation and eventually (through modifications) “approaches” the distribution of the current data set
- Caution: if different items for the same record have unknown values, hot deck may not use the same “donor” to impute each of them; in this case, it is preferable to use the same donor for all of them
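A sequential hot deck for age can be sketched as follows, assuming the imputation matrix is keyed by (sex, relationship) and that 99 codes a missing age. The records and starting values are hypothetical and do not reproduce the worked example from the presentation.

```python
MISSING_AGE = 99  # assumed missing-value code for age


def hot_deck_age(records, initial_matrix):
    """Sequential hot deck for age.

    initial_matrix: dict mapping (sex, relationship) -> starting age.
    It must contain every (sex, relationship) pair found in the records.
    Each valid age overwrites its cell; each missing age is replaced by
    the current value of its cell.
    """
    matrix = dict(initial_matrix)  # copy: the matrix is updated as we go
    result = []
    for rec in records:
        rec = dict(rec)
        key = (rec["sex"], rec["relationship"])
        if rec["age"] == MISSING_AGE:
            rec["age"] = matrix[key]      # take the most recent donor value
            rec["age_imputed"] = True
        else:
            matrix[key] = rec["age"]      # this record becomes the new donor
            rec["age_imputed"] = False
        result.append(rec)
    return result, matrix


# Hypothetical data. Sex: 1=Male, 2=Female. Relationship: 1=Head, 2=Spouse, 3=Child.
initial = {(1, 1): 40, (2, 2): 38, (1, 3): 10}
household = [
    {"id": 1, "relationship": 1, "sex": 1, "age": 45},
    {"id": 2, "relationship": 2, "sex": 2, "age": 99},  # no donor seen yet
    {"id": 3, "relationship": 3, "sex": 1, "age": 13},
    {"id": 4, "relationship": 3, "sex": 1, "age": 99},  # donor is record 3
]
edited, final_matrix = hot_deck_age(household, initial)
```

Record 2 receives the starting value for its cell because no donor has been processed yet, while record 4 receives the age of record 3, the most recent donor with the same sex and relationship. This is how the matrix gradually moves from its initial values toward the distribution of the data being edited.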

D. Imputation (contd.)
Unresolved match or residence status in P-sample:
- Estimate probabilities of match (residence) status
- Form cells/groups to estimate the probabilities
- Each cell should be homogeneous with respect to the probability to be estimated
- Probabilities should differ (be heterogeneous) between cells/groups
- Use the reasons for field follow-up to form cells
Unresolved enumeration status in E-sample:
- Estimate probabilities of correct enumeration
- Probabilities should differ (be heterogeneous) between cells/groups
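One simple way to implement the cell-based approach is to assign each unresolved P-sample case the match rate observed among the resolved cases in its cell. This is a sketch under that assumption; the cell labels (follow-up reasons), the status codes and the data are hypothetical.

```python
from collections import defaultdict


def impute_match_probability(cases):
    """Assign a match probability to every case.

    cases: list of dicts with 'cell' and 'status', where status is one of
    'match', 'nonmatch', 'unresolved' (assumed coding).
    Resolved cases get 1.0 or 0.0; unresolved cases get the match rate of
    the resolved cases in the same cell.
    """
    matched = defaultdict(int)
    resolved = defaultdict(int)
    for c in cases:
        if c["status"] != "unresolved":
            resolved[c["cell"]] += 1
            if c["status"] == "match":
                matched[c["cell"]] += 1

    result = []
    for c in cases:
        if c["status"] == "match":
            p = 1.0
        elif c["status"] == "nonmatch":
            p = 0.0
        else:
            if resolved[c["cell"]] == 0:
                raise ValueError(f"no resolved cases in cell {c['cell']!r}")
            p = matched[c["cell"]] / resolved[c["cell"]]
        result.append({**c, "p_match": p})
    return result


# Hypothetical cells formed from the reason for field follow-up.
cases = [
    {"id": 1, "cell": "moved", "status": "match"},
    {"id": 2, "cell": "moved", "status": "nonmatch"},
    {"id": 3, "cell": "moved", "status": "unresolved"},
    {"id": 4, "cell": "incomplete_name", "status": "match"},
    {"id": 5, "cell": "incomplete_name", "status": "match"},
    {"id": 6, "cell": "incomplete_name", "status": "match"},
    {"id": 7, "cell": "incomplete_name", "status": "nonmatch"},
    {"id": 8, "cell": "incomplete_name", "status": "unresolved"},
]
with_probabilities = impute_match_probability(cases)
```

The same structure would apply to unresolved enumeration status in the E-sample, with correct and erroneous enumeration in place of match and non-match.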

D. Imputation (contd.)
- Essential to evaluation, process planning and management: i) number of cases of each type of error; ii) non-response rates for each item; iii) imputation rates for each item, ...
- It is important to generate an edit trail showing all data changes and substituted values, with their tallies
- If the original value of a data item is changed in any way, a flag should be added to each item that is changed or imputed
- This information is critical for planning future censuses; e.g., as a means to investigate the age threshold below which a female with “child ever born” triggers a query edit, and to decide whether the threshold should be modified for future rounds
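The edit trail, the item flags and the per-item imputation rates can be sketched together. The helper names, the record layout and the reason codes are illustrative assumptions; the sketch also assumes each item is changed at most once per record.

```python
from collections import Counter


def apply_edit(record, field, new_value, reason, trail):
    """Change one item, log the change in the trail and flag the item."""
    old_value = record[field]
    if old_value != new_value:
        trail.append({
            "id": record["id"], "field": field,
            "old": old_value, "new": new_value, "reason": reason,
        })
        record[field] = new_value
        # The flag is stored on the record next to the changed item.
        record.setdefault("flags", {})[field] = reason


def imputation_rates(trail, n_records):
    """Share of records changed, per item (one change per record assumed)."""
    counts = Counter(entry["field"] for entry in trail)
    return {field: count / n_records for field, count in counts.items()}


# Hypothetical records; 9 and 99 are assumed missing-value codes.
records = [
    {"id": 1, "sex": 1, "age": 39},
    {"id": 2, "sex": 9, "age": 99},
    {"id": 3, "sex": 2, "age": 99},
    {"id": 4, "sex": 1, "age": 10},
]
trail = []
apply_edit(records[1], "sex", 2, "hot_deck", trail)
apply_edit(records[1], "age", 35, "hot_deck", trail)
apply_edit(records[2], "age", 37, "hot_deck", trail)
rates = imputation_rates(trail, len(records))
```

Keeping the old value in the trail means every substitution can be reviewed or reversed later, and the tallies can be compared across items when planning the next census round.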

A useful reference
Handbook on Population and Housing Census Editing, Rev. 1
Available on the UNSD website and currently being printed

Thank You!

Example of Hot Deck for Sample Household (Sex Only)
Columns: ID number, Relationship, Sex, Age, Dynamic Imputation Matrix
1 39 2 35 3 13 4 9 1 10 5 40 6 99* 7 8 9 2 9 44 36
Missing Information: 9, 99
Relationship: 1=Head; 2=Spouse; 3=Child; 4=Other Relative; 5=Non-Relative
Sex: 1=Male; 2=Female

Example of Hot Deck for Age (Sex and Relationship)
Initial Imputation Matrix for Age Based on Sex and Relationship
Relationship: Head of Household (1), Spouse (2), Son/Daughter (3), Other Relative (4), Non-Relative (5)
Male (1): 35 12 40
Female (2): 32 37

Example of Hot Deck for Age (Sex and Relationship)
Columns: ID number, Relationship, Sex, Age
1 39 2 35 3 13 4 9 1 10 5 40 6 99 40 7 8 9 2 99 37 9 44 36
Missing Information: 9, 99
Relationship: 1=Head; 2=Spouse; 3=Child; 4=Other Relative; 5=Non-Relative
Sex: 1=Male; 2=Female

Example of Hot Deck for Age (Sex and Relationship)
Initial Imputation Matrix for Age Based on Sex and Relationship
Relationship: Head of Household (1), Spouse (2), Son/Daughter (3), Other Relative (4), Non-Relative (5)
Male (1): 35 12 40
Female (2): 32 37
39* 13* 44* 35* 36*
Dynamic Imputation Matrix After Multiple Changes