IAOS 2014 Conference – Meeting the Demands of a Changing World Da Nang, Vietnam, 8-10 October 2014 Diagnosing the Imputation of Missing Values in Official.

IAOS 2014 Conference – Meeting the Demands of a Changing World
Da Nang, Vietnam, 8-10 October 2014
Diagnosing the Imputation of Missing Values in Official Economic Statistics via Multiple Imputation: Unveiling the Invisible Missing Values
National Statistics Center (Japan)
Masayoshi Takahashi
Notes: The views and opinions expressed in this presentation are the authors’ own, not necessarily those of the institution.

Outline
1. Problems of Missing Values and Imputation
2. Theory of MI and the EMB Algorithm
3. Mechanism Behind the Diagnostic Algorithm
4. Data and Missing Mechanism
5. Assessment of the Diagnostic Algorithm
6. Conclusions and Future Work

Problems of Missing Values
Section: 1. Problems of Missing Values and Imputation
• Prevalence of missing values
• Effects of missing values
  - Reduction in efficiency
  - Introduction of bias
• Assumptions and solution
  - Missing At Random (MAR)
  - Imputation

Problematic Nature of Single Imputation (SI)
• Deterministic SI
• Stochastic SI (adds random noise)
• There is only one set of regression coefficients.
• A hat (^) denotes the OLS estimate.
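
The contrast between deterministic and stochastic SI can be sketched in a few lines. This is a minimal illustration, not the presenter's code: it assumes a single predictor, made-up data, and a made-up rule that every fifth value is missing. The helper name `ols_fit` is hypothetical.

```python
import random
import statistics

def ols_fit(x, y):
    """Return intercept, slope, and residual SD of a one-predictor OLS fit."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma = (rss / (len(x) - 2)) ** 0.5
    return intercept, slope, sigma

# Hypothetical data: y depends linearly on x, every fifth y is treated as missing.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 0.5) for xi in x]
missing = [i % 5 == 0 for i in range(200)]

# Fit one set of coefficients on the observed cases only.
x_obs = [xi for xi, m in zip(x, missing) if not m]
y_obs = [yi for yi, m in zip(y, missing) if not m]
b0, b1, sigma = ols_fit(x_obs, y_obs)

# Deterministic SI: impute the fitted value itself.
det = [b0 + b1 * xi for xi, m in zip(x, missing) if m]
# Stochastic SI: impute the fitted value plus random noise.
sto = [b0 + b1 * xi + rng.gauss(0, sigma) for xi, m in zip(x, missing) if m]
```

Both variants rely on the same single coefficient estimate `(b0, b1)`, which is the limitation the slide points to: neither reflects uncertainty about the coefficients themselves.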

Multiple Imputation (MI) Comes to the Rescue
Section: 2. Theory of Multiple Imputation and the EMB Algorithm
• A tilde (~) denotes random sampling from a posterior distribution.
• Multiple sets of regression coefficients
• Need multiple values of

Likelihood of Observed Data
• Random sampling from observed likelihood
  - Not easy!!
• Solution
  - Various computation algorithms

Computational Algorithms
• EMB algorithm
  - Expectation-Maximization Bootstrapping
  - Most computationally efficient
• Other MI algorithms
  - MCMC
  - FCS
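
A rough sketch of the bootstrap idea behind EMB follows. It is a simplified stand-in, not the Amelia II implementation: it assumes one predictor and a single incomplete variable, in which case the EM step on each bootstrap sample is replaced here by OLS on the observed cases of that sample. The names `ols_fit` and `emb_style_impute` and all data are hypothetical.

```python
import random
import statistics

def ols_fit(x, y):
    """Return intercept, slope, and residual SD of a one-predictor OLS fit."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, (rss / (len(x) - 2)) ** 0.5

def emb_style_impute(x, y, m=5, seed=0):
    """Return m completed copies of y; None marks a missing value in y."""
    rng = random.Random(seed)
    n = len(x)
    completed = []
    while len(completed) < m:
        # Bootstrap: resample rows (observed and incomplete) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx if y[i] is not None]
        ys = [y[i] for i in idx if y[i] is not None]
        if len(set(xs)) < 3:  # skip degenerate resamples
            continue
        # Stand-in for the EM step: estimate parameters on this resample.
        b0, b1, sigma = ols_fit(xs, ys)
        # Impute the original missing cells using this resample's parameters.
        completed.append([
            yi if yi is not None else b0 + b1 * xi + rng.gauss(0, sigma)
            for xi, yi in zip(x, y)
        ])
    return completed

# Hypothetical data with every fifth y missing.
rng = random.Random(2)
x = [rng.gauss(0, 1) for _ in range(100)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 0.5) for xi in x]
y_mis = [None if i % 5 == 0 else yi for i, yi in enumerate(y)]
imputations = emb_style_impute(x, y_mis, m=5)
```

Each completed copy uses a different set of coefficients, so the imputed values for one cell vary across copies while observed values stay fixed.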

Graphical Presentation of the EMB Algorithm
[Figure: diagram of the EMB algorithm]

Paradox in Imputation
Section: 3. Mechanism Behind the Diagnostic Algorithm
• Imputed values
  - Estimates, not true values
  - Diagnosis
• True values
  - Always missing
  - Cannot compare the imputed values with the truth
• How do we go about imputation diagnostics?

Solution to the Paradox
• Indirect diagnostics of imputation
  - Abayomi, Gelman, and Levy (2008)
  - Honaker and King (2010)
• MI
  - Within-imputation variance
  - Between-imputation variance
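
For readers unfamiliar with the two variance components, the standard combining rules from Rubin (1987) can be sketched as below. The function name `pool` and the example numbers are illustrative assumptions, not values from the presentation.

```python
import statistics

def pool(estimates, variances):
    """Combine M point estimates and their variances using Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (divides by M - 1)
    total = within + (1 + 1 / m) * between     # total variance
    return q_bar, within, between, total

# Hypothetical estimates from M = 3 imputed datasets.
q_bar, within, between, total = pool([10.0, 12.0, 11.0], [4.0, 4.0, 4.0])
```

The between-imputation component is the part that reflects disagreement among imputed datasets, which is the quantity an MI-based diagnostic can exploit.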

Disadvantage of Multiple Imputation
• Dozens of imputed datasets
• Computational burden
• Multiple values for one cell
• Unrealistic to directly use in official statistics

Proposal in this Research
• Two-step procedure (New!!)
  - Imputation step: Stochastic SI
  - Diagnostic step: MI
• Advantage
  - Can have only one imputed value (advantage of SI)
  - Can know the confidence about each imputed value (advantage of MI)

Multiple Imputation as a Diagnostic Tool
• Variation among M imputed datasets
  - Estimation uncertainty in imputation
• Our diagnostic algorithm
  - Utilizes this variability
  - Can examine the stability & confidence of imputation models
• What does this mean?
  - See the next slide for illustration

Illustration: Two Cases of Variation in Imputations
[Figure: two cases of variation in imputations]

Mathematical Representation
• Imputation Step: Stochastic SI
• Diagnostic Step: MI
• If [equation], then no uncertainties
• What we actually check is whether [equation]
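
The formulas on this slide did not survive in the transcript, so the following is only one plausible reading of the diagnostic step: measure, for each missing cell, how much its M imputed values disagree. The function names, the example numbers, and the cutoff rule (three times the median spread) are assumptions made for illustration and are not the presenter's criterion.

```python
import statistics

def cell_spread(imputations):
    """imputations[m][j] is the m-th imputed value for missing cell j."""
    n_cells = len(imputations[0])
    return [statistics.stdev([imp[j] for imp in imputations])
            for j in range(n_cells)]

def flag_unstable(spreads, ratio=3.0):
    """Flag cells whose spread exceeds ratio times the median spread (hypothetical rule)."""
    cutoff = ratio * statistics.median(spreads)
    return [s > cutoff for s in spreads]

# Hypothetical imputations: M = 4 draws for three missing cells.
imputations = [
    [10.1, 20.0, 5.0],
    [10.0, 35.0, 5.1],
    [9.9, 8.0, 4.9],
    [10.2, 27.0, 5.0],
]
spreads = cell_spread(imputations)
flags = flag_unstable(spreads)
```

In this toy input the second cell's draws range from 8 to 35, so it is the one flagged; a spread of zero would correspond to the "no uncertainties" case mentioned on the slide.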

Data
Section: 4. Data and Missing Mechanism
• Multivariate log-normal distribution
  - Simulated dataset
• Mean vector & variance-covariance matrix
  - Manufacturing Sector, 2012 Japanese Economic Census
• Number of observations
  - 1,000
• Variables
  - turnover, capital, worker
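
A simulated dataset of this kind could be generated as sketched below. The mean vector and covariance matrix on the log scale are placeholders, since the census-based values are not given in the transcript, and the helper names are hypothetical.

```python
import math
import random

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def simulate_lognormal(mean, cov, n, seed=0):
    """Draw n rows whose logarithms are multivariate normal(mean, cov)."""
    rng = random.Random(seed)
    low = cholesky(cov)
    k = len(mean)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0, 1) for _ in range(k)]
        logs = [mean[i] + sum(low[i][j] * z[j] for j in range(i + 1))
                for i in range(k)]
        rows.append([math.exp(v) for v in logs])
    return rows

# Placeholder parameters on the log scale (NOT the census values).
mean = [6.0, 4.0, 3.0]            # log turnover, log capital, log worker
cov = [[1.0, 0.6, 0.5],
       [0.6, 1.0, 0.4],
       [0.5, 0.4, 1.0]]
data = simulate_lognormal(mean, cov, n=1000)
```

Each row of `data` holds one simulated establishment in the order turnover, capital, worker.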

Missing Mechanism
• Target variable
  - turnover
• Missing rate
  - 20%
• Missing mechanism
  - MAR
  - A logistic regression to estimate the probability of missingness according to the values of explanatory variables (capital and worker)
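
One way to impose such a MAR mechanism is sketched here. The logistic coefficients, the use of log-transformed predictors, and the bisection step that tunes the intercept to reach the 20% rate are all assumptions for illustration; the transcript does not give the presenter's settings.

```python
import math
import random

def mar_mask(capital, worker, rate=0.20, b_cap=0.5, b_wrk=0.5, seed=0):
    """Return (mask, probs); mask[i] is True when turnover i is set to missing."""
    rng = random.Random(seed)
    # Linear predictor depends only on fully observed variables (MAR).
    score = [b_cap * math.log(c) + b_wrk * math.log(w)
             for c, w in zip(capital, worker)]
    # Bisection on the intercept so the mean missingness probability equals rate.
    lo, hi = -50.0, 50.0
    for _ in range(100):
        mid = (lo + hi) / 2
        mean_p = sum(1 / (1 + math.exp(-(mid + s))) for s in score) / len(score)
        if mean_p < rate:
            lo = mid
        else:
            hi = mid
    intercept = (lo + hi) / 2
    probs = [1 / (1 + math.exp(-(intercept + s))) for s in score]
    mask = [rng.random() < p for p in probs]
    return mask, probs

# Hypothetical predictors for 1,000 units.
rng = random.Random(3)
capital = [rng.lognormvariate(4.0, 1.0) for _ in range(1000)]
worker = [rng.lognormvariate(3.0, 1.0) for _ in range(1000)]
mask, probs = mar_mask(capital, worker)
```

Because the probability of missingness is a function of capital and worker only, and never of turnover itself, the mechanism is MAR rather than missing not at random.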

R-Function diagimpute
Section: 5. Assessment of the Diagnostic Algorithm
• New function developed in R
• Graphical detection of problematic imputations as outliers
• Graphical presentation of the stability of imputation via control chart
• Not yet publicly available
  - A work in progress
  - Once finalized, planning to make it publicly available

Preliminary Result
[Figures: preliminary results, shown on two slides]

Conclusions
Section: 6. Conclusions and Future Work
• MI as a diagnostic tool
  - A novel way
• Diagnostic algorithm
  - Still a work in progress
  - A preliminary assessment given
  - Useful to detect problematic imputations
• Helps us strengthen the validity of official economic statistics

Future Work
• Intend to further refine the algorithm
• Test it against a variety of real datasets
• Use several imputation models

References 1
1. Abayomi, Kobi, Andrew Gelman, and Marc Levy. (2008). “Diagnostics for Multivariate Imputations,” Applied Statistics vol.57, no.3, pp
2. Allison, Paul D. (2002). Missing Data. CA: Sage Publications.
3. Congdon, Peter. (2006). Bayesian Statistical Modelling, Second Edition. West Sussex: John Wiley & Sons Ltd.
4. de Waal, Ton, Jeroen Pannekoek, and Sander Scholtus. (2011). Handbook of Statistical Data Editing and Imputation. Hoboken, NJ: John Wiley & Sons.
5. Honaker, James and Gary King. (2010). “What to do About Missing Values in Time Series Cross-Section Data,” American Journal of Political Science vol.54, no.2, pp.561–
6. Honaker, James, Gary King, and Matthew Blackwell. (2011). “Amelia II: A Program for Missing Data,” Journal of Statistical Software vol.45, no
7. King, Gary, James Honaker, Anne Joseph, and Kenneth Scheve. (2001). “Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation,” American Political Science Review vol.95, no.1, pp
8. Little, Roderick J. A. and Donald B. Rubin. (2002). Statistical Analysis with Missing Data, Second Edition. New Jersey: John Wiley & Sons.

References 2
9. Oakland, John S. and Roy F. Followell. (1990). Statistical Process Control: A Practical Guide. Oxford: Heinemann Newnes.
10. Rubin, Donald B. (1978). “Multiple Imputations in Sample Surveys - A Phenomenological Bayesian Approach to Nonresponse,” Proceedings of the Survey Research Methods Section, American Statistical Association, pp
11. Rubin, Donald B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons.
12. Schafer, Joseph L. (1997). Analysis of Incomplete Multivariate Data. London: Chapman & Hall/CRC.
13. Scrucca, Luca. (2014). “Package qcc: Quality Control Charts,” project.org/web/packages/qcc/qcc.pdf.
14. Statistics Bureau of Japan. (2012). “Economic Census for Business Activity,”
15. Takahashi, Masayoshi and Takayuki Ito. (2012). “Multiple Imputation of Turnover in EDINET Data: Toward the Improvement of Imputation for the Economic Census,” Work Session on Statistical Data Editing, UNECE, Oslo, Norway, September 24-26,

References 3
16. Takahashi, Masayoshi and Takayuki Ito. (2013). “Multiple Imputation of Missing Values in Economic Surveys: Comparison of Competing Algorithms,” Proceedings of the 59th World Statistics Congress of the International Statistical Institute, Hong Kong, China, August 25-30, 2013, pp
17. Takahashi, Masayoshi. (2014a). “An Assessment of Automatic Editing via the Contamination Model and Multiple Imputation,” Work Session on Statistical Data Editing, United Nations Economic Commission for Europe, Paris, France, April 28-30,
18. Takahashi, Masayoshi. (2014b). “Keiryouchi Data no Kanrizu (Control Chart for Continuous Data),” Excel de Hajimeru Keizai Toukei Data no Bunseki (Statistical Data Analysis for Economists Using Excel), 3rd edition. Tokyo: Zaidan Houjin Nihon Toukei Kyoukai
19. van Buuren, Stef. (2012). Flexible Imputation of Missing Data. London: Chapman & Hall/CRC.

Thank you