Topic (ii): New and Emerging Methods. Maria Garcia (USA), Jeroen Pannekoek (Netherlands). UNECE Work Session on Statistical Data Editing, Paris, France, 28-30 April 2014.

Presentation transcript:

Topic (ii): New and Emerging Methods
Maria Garcia (USA), Jeroen Pannekoek (Netherlands)
UNECE Work Session on Statistical Data Editing, Paris, France, April 2014
Introduction

Topic (ii): New and Emerging Methods
- Papers under this topic cover new and emerging methods or other methods for improving and/or optimizing the process of data editing and imputation (E&I).
- There are eight papers addressing improvement in the following areas:
  - Development of E&I for new kinds of data sets
  - Improving imputation by selecting better methods
  - Detection of outlying, suspect or erroneous values
  - Evaluation methods/measures for E&I

E&I for new kinds of data sets
- Imputation with multi-source data: the case of Italian SBS (Istat)
  - Construction of a micro-data file from different administrative sources, with imputation to fill in the gaps of non-observed values.
  - Many constraints on the variables.
- Data editing and scanner data (France)
  - Compute the CPI using scanner data instead of data collected by persons.
  - The vast amount of data is a performance challenge for automatic editing.

Improving imputation for particular applications by better methods
- A new imputation methodology for the Agricultural Resource Management Survey (USA/NASS)
  - Old method: conditional mean
  - New method: Iterative Sequential Regression (ISR)
- Multiple imputation methods for imputing earnings in the SIPP (USA Census Bureau)
  - Old method: randomized hot-deck
  - New method: Sequential Regression Multivariate Imputation (SRMI)
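The randomized hot-deck named as the old SIPP method can be illustrated with a short sketch. This is only a toy version under stated assumptions, not the Census Bureau's production procedure: the record layout, the `region` class variable and the helper name `random_hot_deck` are invented for this example.

```python
import random

def random_hot_deck(records, target, class_var, rng):
    """Fill missing `target` values with a value drawn at random from
    respondents (donors) in the same imputation class.

    Hypothetical helper for illustration; records are plain dicts and
    None marks a missing value.
    """
    # Group the observed values by imputation class.
    donors = {}
    for rec in records:
        if rec[target] is not None:
            donors.setdefault(rec[class_var], []).append(rec[target])

    imputed = []
    for rec in records:
        new_rec = dict(rec)  # do not modify the input records
        if rec[target] is None:
            pool = donors.get(rec[class_var])
            if pool:  # leave the gap if the class has no donor
                new_rec[target] = rng.choice(pool)
        imputed.append(new_rec)
    return imputed

# Made-up example data.
records = [
    {"region": "N", "earnings": 100},
    {"region": "N", "earnings": 120},
    {"region": "N", "earnings": None},
    {"region": "S", "earnings": 300},
    {"region": "S", "earnings": None},
    {"region": "W", "earnings": None},  # class without donors
]
completed = random_hot_deck(records, "earnings", "region", random.Random(1))
```

Every imputed value is an actually observed value from the same class, which is why a hot-deck preserves plausible values but cannot use more covariates than the class definition allows.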

Detection of outlying, suspect or erroneous values
- Development of outlier treatment in HCSO (Hungary)
  - Overview of outlier detection methodology.
  - Current method: much work by experts, not reproducible.
  - Alternative: develop a quicker, more automated method.
- A generalized Fellegi-Holt paradigm for automatic editing (Netherlands)
  - Automatic error localization of random errors.
  - Generalizes the FH paradigm with the purpose of making automatic editing more similar to manual editing.
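For readers unfamiliar with the classical Fellegi-Holt paradigm that the Netherlands paper generalizes, the sketch below finds a smallest set of fields that can be changed so that a record passes all edits. It is a brute-force toy over small finite domains, not the algorithm of the paper or of any production system; the field names, domains, edit rules and the function name `localize_errors` are assumptions for this example.

```python
from itertools import combinations, product

def localize_errors(record, domains, edits):
    """Return a smallest set of fields that can be changed so that the
    record satisfies every edit, together with one consistent record.

    Brute force over finite domains: fine for a toy example, not a
    production algorithm.
    """
    fields = list(record)
    for size in range(len(fields) + 1):
        for subset in combinations(fields, size):
            # Try every combination of values for the fields in `subset`.
            for values in product(*(domains[f] for f in subset)):
                candidate = dict(record, **dict(zip(subset, values)))
                if all(edit(candidate) for edit in edits):
                    return set(subset), candidate
    return None  # the edits cannot be satisfied within the domains

# Hypothetical categorical record and edit rules.
domains = {
    "age": ["child", "adult"],
    "marital": ["single", "married"],
    "employment": ["none", "employed"],
}
edits = [
    lambda r: not (r["age"] == "child" and r["marital"] == "married"),
    lambda r: not (r["age"] == "child" and r["employment"] == "employed"),
]
record = {"age": "child", "marital": "married", "employment": "employed"}

changed, fixed = localize_errors(record, domains, edits)
```

Here both edits fail, and changing `age` alone repairs the record, whereas changing `marital` or `employment` alone does not. A generalized paradigm would count edit operations rather than changed fields.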

Evaluation methods/measures for E&I
- Simulating multiple imputation (Germany)
  - Aim is to support the choice of imputation model with a tool in R.
  - Simulates different missing-data mechanisms and compares different multiple imputation techniques using a number of performance measures.
- Implementation and evaluation of automatic editing (Netherlands)
  - Implementation of automatic editing as a sequence of process steps performing different tasks.
  - Evaluation measures across process steps, used to optimize their configuration.
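The idea of simulating missing-data mechanisms can be sketched in a few lines. The German tool is written in R; the Python sketch below is an independent illustration, and the data model, the deletion rates and all function names are assumptions. It compares the bias of the respondent mean under MCAR (missing completely at random) and MAR (missing at random, here depending on an observed covariate `x`).

```python
import random
from statistics import mean

def simulate(n, rng):
    """Generate (x, y) pairs with y linearly related to x."""
    data = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 2 * x + rng.gauss(0, 1)
        data.append((x, y))
    return data

def delete_mcar(data, rate, rng):
    """MCAR: every y is deleted with the same probability."""
    return [(x, None if rng.random() < rate else y) for x, y in data]

def delete_mar(data, rate_low, rate_high, rng):
    """MAR: the deletion probability depends only on the observed x."""
    return [(x, None if rng.random() < (rate_high if x > 0 else rate_low) else y)
            for x, y in data]

def complete_case_bias(full, incomplete):
    """Bias of the respondent mean of y relative to the full-data mean."""
    true_mean = mean(y for _, y in full)
    observed_mean = mean(y for _, y in incomplete if y is not None)
    return observed_mean - true_mean

rng = random.Random(2014)
full = simulate(5000, rng)
mcar_bias = complete_case_bias(full, delete_mcar(full, 0.3, rng))
mar_bias = complete_case_bias(full, delete_mar(full, 0.1, 0.6, rng))
```

Under MCAR the respondent mean stays close to the full-data mean, while under this MAR mechanism it is pulled downward because units with large `x` (and hence large `y`) are deleted more often. An evaluation tool would repeat such runs many times and replace the respondent mean with the imputation methods under comparison.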

Enjoy the presentations!
Agenda
- Italy: Imputation with multi-source data: the case of Italian Structural Business Statistics
- Hungary: Presentation and development of outlier treatment in HCSO
- Germany: Simulating Multiple Imputation of Water Consumption in the Agricultural Census
- United States Census Bureau: Multiple Imputation Methods for Imputing Earnings in the Survey of Income and Program Participation
Break (30 minutes)
- United States/NASS: Assessing the Impact of a New Imputation Methodology for the Agricultural Resource Management Survey
- Netherlands: A generalized Fellegi-Holt paradigm for automatic editing
- France: Data editing and scanner data
- Netherlands: Implementation and evaluation of automatic editing
Lunch
Discussion

Topic (ii): New and Emerging Methods Summary

- Italy
  - Imputation of a multi-source administrative data set for SBS.
  - Imputation involved a sequence of 4 methods to take the properties of the variables into account.
  - The largest difference from the traditional method is due to sampling error, not measurement error.
- Hungary
  - Reviews the current outlier detection method and considers a number of alternative indicators.
  - Aim is to improve automation, reproducibility and analysis.
  - Considers development of a tool in R.

Summary  Germany Development of a new R software tool for repeated simulation of different types of nonresponse and different imputation techniques. Test application: Multiple imputation. Linear regression with PMM or hot-deck residuals. PMM performs better.  USA Census Bureau Use the Sequential Regression Multivariate Imputation (SRMI) method in a population survey. Results show SRMI is a feasible alternative to the Hot deck and it has the potential to improve estimates. Imputations are set up in a multiple imputation framework.

Summary  USA National Agricultural Statistics Service Use Iterative Sequential Regression for imputation in an agricultural survey Results show the new method preserves important relationships and distributions of response data while providing a better estimate of uncertainty.  Netherlands – Generalization of FH Extension of the definition of the error localization problem: finding the minimum number of edit operations rather than finding the minimum number of fields to impute. Simulation study shows improvements over FH.

Summary  France Uses large volume scanner data for the CPI. Confidence intervals used in data editing can be more precise, but the performance of computations with “Big data” is a problem.  Netherlands – Evaluation of E&I A a step-by-step implementation and evaluation of the E&I process. Feedback on edit rules, deterministic and model based methods and data.

Summary  EUROSTAT (Tabled paper ) Imputation of continuous variables using four separate data mining procedures

Topic (ii): Discussion

Topic (ii): Points for discussion
- How difficult is it to select and specify the correct imputation method/model, and to reconfigure it when variables or sources change? Can this be done in routine production, or must it be done by methodology staff?
- We see several applications of multiple imputation (MI), but MI is still not much applied in standard production at NSIs. What are the reasons for or against using MI? Are other methods for variance estimation also considered (replication, model-assisted)?
- Some problems are reported with outliers when using imputations drawn from a distribution (Germany, USA/NASS). Is there a remedy?
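As background to the variance-estimation question: under multiple imputation, the point estimates and variances from the m completed data sets are usually combined with Rubin's rules, where the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The sketch below shows that calculation; the function name `pool_rubin` and the input numbers are made up for illustration.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine m completed-data estimates with Rubin's rules.

    estimates: point estimates, one per imputed data set
    variances: their estimated sampling variances
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Made-up results from m = 3 imputed data sets.
pooled, total_var = pool_rubin([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
```

The between-imputation term is what a single imputation cannot provide, which is the usual argument for MI when the uncertainty due to missing data matters.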

Topic (ii): Points for discussion
- The Fellegi-Holt approach to error localization (minimum number of fields to change) does not always identify the errors that human editors identify. Are there experiences with shortcomings of the FH approach? Are there alternatives for error localization?
- What are the differences between ISR (Iterative Sequential Regression), used by USA/NASS, and SRMI (Sequential Regression Multivariate Imputation), used by the Census Bureau?

New and Emerging Methods: Closing Remarks
