Edit and Imputation of the 2011 Abu Dhabi Census Glenn Hui and Hanan AlDarmaki Statistics Centre - Abu Dhabi UNECE CES Work Session on Statistical Data.


1 Edit and Imputation of the 2011 Abu Dhabi Census
Glenn Hui and Hanan AlDarmaki
Statistics Centre - Abu Dhabi
UNECE CES Work Session on Statistical Data Editing (Oslo, Norway, 26 September 2012)

2 Outline
- Census Overview
- Edit and Imputation Methodologies
- Societal Differences and Challenges
- Performance Analysis
- Data Editing in the 2005 Census
- Conclusions

3 2011 Abu Dhabi Census Overview
- First census conducted by SCAD
- Main collection via CAPI, October 2011
- 20 questions
- Three methodologies used for edit and imputation, each with its own purpose: donor, deterministic, and manual

4 Edit and Imputation Methodologies
Donor Imputation
- Canadian Census Edit and Imputation System v4.5 (CANCEIS) hot deck module
- Substitutes an invalid value with the value from a “donor” record
Deterministic Imputation
- Corrects data via hard-coded rules (SAS)
- Applied mostly for out-of-scope responses
Manual Imputation
- Data checked and modified manually
- Used for difficult cases such as very large households
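The donor (hot deck) idea on this slide can be illustrated with a minimal Python sketch. This is not the CANCEIS algorithm: the field names, the 0 to 120 age range, and the simple mismatch-count distance are all assumptions chosen for illustration only.

```python
from typing import Optional

def is_valid_age(age: Optional[int]) -> bool:
    # Assumed validity rule for illustration: age present and within 0..120.
    return age is not None and 0 <= age <= 120

def distance(a: dict, b: dict, match_fields) -> int:
    # Count how many matching variables differ between two records.
    return sum(1 for f in match_fields if a[f] != b[f])

def hot_deck_impute(records, match_fields=("sex", "marital_status")):
    """Replace invalid ages with the age of the closest valid (donor) record."""
    donors = [r for r in records if is_valid_age(r["age"])]
    if not donors:
        raise ValueError("no valid donor records available")
    result = []
    for rec in records:
        fixed = dict(rec)  # copy so the input records are left untouched
        if not is_valid_age(rec["age"]):
            # Pick the donor with the fewest mismatches (first one wins ties).
            donor = min(donors, key=lambda d: distance(rec, d, match_fields))
            fixed["age"] = donor["age"]
        result.append(fixed)
    return result

# Hypothetical toy records; record 3 has a missing age, record 4 an impossible one.
records = [
    {"id": 1, "sex": "F", "marital_status": "married", "age": 34},
    {"id": 2, "sex": "M", "marital_status": "single", "age": 19},
    {"id": 3, "sex": "F", "marital_status": "married", "age": None},
    {"id": 4, "sex": "M", "marital_status": "single", "age": 250},
]
imputed = hot_deck_impute(records)
```

In this toy data, record 3 takes its age from record 1 and record 4 from record 2, because those donors agree on both matching variables. A production system would also check that the imputed record passes every edit rule, which this sketch does not do.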

5 Societal and Cultural Differences
- Very large household sizes: ~5 persons on average
- In contrast to typical averages of ~2.5 in Western countries
- Error rates increase with family size; less exacting decision logic tables (DLTs) were used to account for this
- Households of 17+ persons treated as individual records, with some manual imputation as well

6 Societal Differences continued
Complex relationships in large households
- Extended families
- High proportion of household servants
- Multiple wives: special consistency rules required
Large Expatriate Population
- Many live in shared living arrangements
- Significant portion live in employer-provided camps
- Shares and collectives treated as 1-person households
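The household-size handling described on slides 5 and 6 can be sketched as a small routing step. The 17-person threshold comes from the slides; the household structure, the type labels, and the reading of "treated as 1-person households" as "each resident becomes their own unit" are assumptions made for this example.

```python
LARGE_HOUSEHOLD_THRESHOLD = 17  # from the slides: households of 17+ persons

# Hypothetical labels for shared and collective living arrangements.
SPLIT_TYPES = {"shared", "collective"}

def processing_units(household: dict) -> list:
    """Return the groups of members that would be edited together.

    Ordinary households are edited as one unit; very large households,
    shares, and collectives are split so each person is edited alone.
    """
    members = household["members"]
    if household["type"] in SPLIT_TYPES or len(members) >= LARGE_HOUSEHOLD_THRESHOLD:
        return [[m] for m in members]
    return [list(members)]

family = {"type": "family", "members": ["p1", "p2", "p3", "p4", "p5"]}
large_family = {"type": "family", "members": [f"p{i}" for i in range(17)]}
labour_camp = {"type": "collective", "members": ["w1", "w2", "w3"]}

family_units = processing_units(family)
large_units = processing_units(large_family)
camp_units = processing_units(labour_camp)
```

Splitting removes between-person consistency checks for these records, which is presumably why the slides pair it with some manual imputation for the largest households.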

7 Imputation Performance
Example Statistics
- Predictive Accuracy: R^2, generated by regressing true on imputed values, used to assess predictive ability.
- Estimation Accuracy: m_1, the difference in means of true and imputed values, used to assess aggregate imputation accuracy.
Imputation performance for Age
- Test Data: Starting with clean data, introduced two types of errors: missing data and “interchange” errors.
- Most performance measures are from the Euredit project (Charlton, 2003).
Charlton, J. C. (2003). “Evaluating New Methods for Data Editing and Imputation - Results from the Euredit Project”, UNECE Statistical Data Editing Work Session. Madrid, Spain.
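A minimal Python sketch of the evaluation set-up on this slide follows. Several details are assumptions, not taken from the presentation: R^2 is computed as the squared Pearson correlation (which equals the R^2 of a simple linear regression of true on imputed values), m_1 is computed as an unweighted absolute difference in means, and an "interchange" error is interpreted as swapping the values of two records. The ages are made-up numbers.

```python
import random
from statistics import mean

def r_squared(true_vals, imputed_vals):
    """R^2 of a simple linear regression of true values on imputed values."""
    mx, my = mean(imputed_vals), mean(true_vals)
    sxy = sum((x - mx) * (y - my) for x, y in zip(imputed_vals, true_vals))
    sxx = sum((x - mx) ** 2 for x in imputed_vals)
    syy = sum((y - my) ** 2 for y in true_vals)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either series is constant")
    return sxy ** 2 / (sxx * syy)

def m1(true_vals, imputed_vals):
    """Unweighted absolute difference between imputed and true means (assumed form)."""
    return abs(mean(imputed_vals) - mean(true_vals))

def inject_errors(values, n_missing, n_swaps, seed=0):
    """Damage clean data with swaps (assumed 'interchange' errors) and blanks."""
    rng = random.Random(seed)
    damaged = list(values)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(damaged)), 2)
        damaged[i], damaged[j] = damaged[j], damaged[i]
    for i in rng.sample(range(len(damaged)), n_missing):
        damaged[i] = None  # None stands in for a missing response
    return damaged

# Hypothetical true and imputed ages for five people.
true_age = [34, 19, 52, 7, 41]
imputed_age = [36, 19, 48, 9, 43]

r2 = r_squared(true_age, imputed_age)
bias = m1(true_age, imputed_age)
damaged_age = inject_errors(true_age, n_missing=2, n_swaps=1)
```

In a full evaluation, `damaged_age` would be passed through the imputation system and its output compared against `true_age` using the two measures; here the imputed values are supplied by hand to keep the example short.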

8 Data Editing in the 2005 Census
2005: Manual and deterministic imputation
- Phase 1: validation edits, outlier detection via SQL
- Small subset imputed via deterministic imputation
- Phase 2: most failed records corrected manually
Comparison to 2011
- 2005 performance unknown
- 2005: three methodologists, several months’ preparation; 15 data clerks, 4+ months
- 2011: two methodologists, 5 months total

9 Conclusions
- Modern edit and imputation methodology successfully applied in a distinct cultural context
- Reliable results
- Measurable changes
- More efficient approach
Special thanks to the CANCEIS E&I unit, Statistics Canada

