www.bls.gov

Evaluation of Multiple Components of Error in the Collection and Integration of Survey and Administrative Record Data
John L. Eltinge
International Total Survey Error Workshop
June 15, 2010

Acknowledgements and Disclaimer
The author thanks John Bosley, Moon Jung Cho, Larry Cox, Mike Davern, Mark Denbaly, Jennifer Edgar, Gretchen Falk, Bob Fay, Scott Fricker, Jenna Fulton, Karen Goldenberg, Jeff Gonzalez, Mike Horrigan, Bill Iwig, Alan Karr, Frauke Kreuter, Francois Laflamme, Judy Lessler, Shelly Martinez, Bill Mockovak, Jay Ryan, Adam Safir and Clyde Tucker for many helpful discussions. The views expressed in this paper are those of the author and do not necessarily reflect the policies of the Bureau of Labor Statistics.

Overview:
I. Introduction: Types and Uses of Administrative Records
II. Example: Prospective Redesign of the U.S. Consumer Expenditure Survey
III. Expansion of TSE to “Total Statistical Risk”
IV. Cost Issues
V. Mathematical Structures

I. Introduction: Types and Uses of Administrative Records
A. Goals in Statistical Work with Administrative Data
  1. State of nature: current levels, comparisons across groups, changes over time
  2. Evaluation of a current policy or program
  3. Evaluation of a prospective policy or program
How are (1), (2) and (3) linked with the mission, formal constraints, incentive structures and institutional culture of a statistical agency or a “record-originating” agency?

B. Statistical Uses of Administrative Records
  1. Frames and auxiliary information for surveys
  2. Direct use: simple aggregates, complex modeling, use in microsimulation
  3. Complex integration of survey and administrative data
  4. Suggestion today: work with administrative records will require us to expand the idea of “total survey error” to incorporate “total statistical risk”

II. Example: Prospective Revision of the U.S. Consumer Expenditure Survey
A. Population Structure
  1. Population: all consumer purchases (transactions) in specified categories, classified by:
    - Six-digit Universal Classification Code (UCC)
    - Geography
    - Characteristics of the consumer unit (household) and outlet (store)
    - Time period

  2. Inferential goals: vary widely across stakeholders
    a. Means of consumer expenditures in specified categories
    b. Analytic uses (regression, generalized linear models, quantiles)
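To make the population structure concrete, here is a minimal Python sketch. It assumes that transactions are stored as records keyed by the four classification dimensions listed above and computes goal (a) as a population mean expenditure per consumer unit (CU) for a set of UCCs in one period. The field names, the UCC codes in the usage example and the per-CU definition of the mean are illustrative assumptions, not the survey's production definitions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """One consumer purchase, classified along the dimensions listed above."""
    ucc: str        # six-digit Universal Classification Code (example values are hypothetical)
    region: str     # geography
    cu_id: int      # consumer unit (household) identifier
    outlet: str     # outlet (store) type
    period: str     # time period label, e.g. "2010Q1"
    amount: float   # expenditure in dollars


def mean_expenditure(transactions, cu_ids, uccs, period):
    """Mean expenditure per consumer unit in the given UCCs and period.

    Every CU in cu_ids counts in the denominator, so CUs with no
    qualifying purchases contribute zero to the mean."""
    totals = defaultdict(float)
    for t in transactions:
        if t.ucc in uccs and t.period == period:
            totals[t.cu_id] += t.amount
    return sum(totals[cu] for cu in cu_ids) / len(cu_ids)


txns = [
    Transaction("100110", "Northeast", 1, "grocery", "2010Q1", 50.0),
    Transaction("100110", "Northeast", 2, "grocery", "2010Q1", 30.0),
    Transaction("200111", "South", 2, "pharmacy", "2010Q1", 20.0),
]
print(mean_expenditure(txns, cu_ids=[1, 2, 3], uccs={"100110"}, period="2010Q1"))
```

Analytic goals such as (b) would fit a regression or compute quantiles over the per-CU totals instead of taking the final average.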

B. Prospective Data Sources
  1. Traditional sample surveys: diary and interview
  2. Administrative record systems
    a. Full population
    b. Superset of full population (Ex: aggregate sales in specified UCCs)

    c. Specialized subsets of the full population (Ex: “loyalty card” data linked with sample CUs, conditional on informed consent and confidentiality protections)
  3. Specialized surveys to calibrate data from administrative records (cf. Lessler, 2006)

III. Expansion of “Total Survey Error” to “Total Statistical Risk”
A. Working Model for Methodological Properties
  X = Frame and weight information
  Y = Sample survey data
  Z = Additional auxiliary information
  Properties of an estimator are based on variability from:
    1. Population structure (superpopulation model)
    2. Administrative and survey collection processes (“filters” including all TSE components)
    3. Homogeneity of (1) and (2) across cases

B. Formal Evaluation of Properties
  Evaluate the expectation with respect to each component in (III.A).
  What information is currently available at the conceptual and empirical levels?
  Understanding the underlying processes for collection and reporting of administrative data is critically important.
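One way to evaluate expectations "with respect to each component" is a nested Monte Carlo. The outer loop draws a finite population from a superpopulation model (item 1), and the inner loop repeats the collection "filters" (item 2) on that fixed population. The law of total variance then splits the estimator's variance into a collection part and a population part. This is a minimal sketch under assumed models: the normal superpopulation, simple random sampling, Bernoulli response and additive measurement error are all illustrative choices, not the author's specification.

```python
import random
import statistics


def draw_population(rng, n_pop=500, mu=100.0, sigma=20.0):
    """Superpopulation model (assumed): y_i ~ Normal(mu, sigma^2), independently."""
    return [rng.gauss(mu, sigma) for _ in range(n_pop)]


def collect(rng, pop, n=40, resp_prob=0.8, meas_sd=5.0):
    """Collection 'filter' (assumed): simple random sample, Bernoulli response,
    additive measurement error; returns the respondent mean of reported values."""
    sample = rng.sample(pop, n)
    reported = [y + rng.gauss(0.0, meas_sd) for y in sample if rng.random() < resp_prob]
    return statistics.fmean(reported)


def variance_decomposition(n_outer=100, n_inner=40, seed=2010):
    """Nested Monte Carlo: outer loop over populations, inner loop over collections.

    Returns (total, within, between) with total == within + between, where
    within  = average over populations of Var(estimate | population)
    between = variance over populations of E[estimate | population]."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_outer):
        pop = draw_population(rng)
        groups.append([collect(rng, pop) for _ in range(n_inner)])
    all_estimates = [est for group in groups for est in group]
    total = statistics.pvariance(all_estimates)
    within = statistics.fmean([statistics.pvariance(g) for g in groups])
    between = statistics.pvariance([statistics.fmean(g) for g in groups])
    return total, within, between


total, within, between = variance_decomposition()
print(f"total={total:.3f} within={within:.3f} between={between:.3f}")
```

With a balanced design and population (divide-by-n) variances, the identity total = within + between holds exactly for the simulated draws. Item 3, homogeneity across cases, could be explored by letting `resp_prob` or `meas_sd` vary across units.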

C. Prior Literature (Examples)
  Davern (2007, 2009), FCSM (1980, SPWP #6), Herzog, Winkler and Scheuren (2007), Jabine and Scheuren (1985), Jeskanen-Sundstrom (2007), Ord and Iglarsh (2007), Penneck (2007), Royce (2007), Winkler (2009)

D. From Prior Literature: Two Concepts of Data Quality
  1. Per Davern (2007), extend the usual ideas of “total survey error” (TSE) to administrative data:
     (Estimator) - (True value) = (frame error) + (sampling error) + (nonresponse effects) + (measurement error) + (processing effects)
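The TSE identity can be illustrated with a minimal Python sketch. It simulates one population and one realized collection, computes the mean at each stage (population, frame, sample, respondents, reported values, processed values), and takes each component as the difference between consecutive stages, so the components telescope to (Estimator) - (True value). Every mechanism below (skewed values, undercoverage and nonresponse of large units, 0-15% underreporting, rounding during capture) is a hypothetical assumption chosen for illustration.

```python
import random
import statistics


def tse_decomposition(seed=7):
    """Split (estimator - true value) into sequential TSE components
    for one simulated population and one realized collection."""
    rng = random.Random(seed)
    fmean = statistics.fmean

    # True values for a finite population (assumed skewed, like expenditures).
    pop = [rng.lognormvariate(4.0, 0.8) for _ in range(10_000)]

    # Frame: large units are missed more often (assumed undercoverage pattern).
    frame = [y for y in pop if rng.random() < (0.95 if y < 100 else 0.80)]

    # Simple random sample from the frame.
    sample = rng.sample(frame, 500)

    # Nonresponse: large units respond less often (assumed).
    resp = [y for y in sample if rng.random() < (0.85 if y < 100 else 0.65)]

    # Measurement: respondents under-report by 0-15% (assumed).
    reported = [y * rng.uniform(0.85, 1.0) for y in resp]

    # Processing: values rounded to the nearest 10 during capture (assumed).
    processed = [round(y, -1) for y in reported]

    components = {
        "frame error": fmean(frame) - fmean(pop),
        "sampling error": fmean(sample) - fmean(frame),
        "nonresponse effect": fmean(resp) - fmean(sample),
        "measurement error": fmean(reported) - fmean(resp),
        "processing effect": fmean(processed) - fmean(reported),
    }
    total_error = fmean(processed) - fmean(pop)
    return total_error, components


total_error, components = tse_decomposition()
for name, value in components.items():
    print(f"{name:>20}: {value:8.3f}")
print(f"{'sum':>20}: {sum(components.values()):8.3f}  total error: {total_error:8.3f}")
```

Because the components are computed stage by stage, the attribution depends on the order of the stages. A different ordering, or interactions between stages, can shift error from one component to another even though the total is unchanged.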

  2. Consider broader definitions of data quality, e.g., Brackstone (1999): Accuracy (all components of TSE), AND Timeliness, Relevance, Interpretability, Accessibility, Coherence
  3. Risk: failure in any component of data quality
    a. Aggregate risks: historical focus of quantitative work
    b. Systemic risks: often very important for statistical programs; cf. “complex and tightly coupled systems” (Perrow, 1984)
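A toy calculation shows why an aggregate summary can hide systemic risk. Both hypothetical models below give each of n quality components the same failure probability p, so the expected number of failures is identical. In the assumed "common shock" model, however, all components fail together, which concentrates risk into rare joint events. This is an illustrative sketch only, not a formalization of Perrow's framework.

```python
def independent_risk(n, p):
    """n components fail independently, each with probability p (aggregate view)."""
    return {
        "expected failures": n * p,
        "variance of failures": n * p * (1 - p),
        "P(any failure)": 1 - (1 - p) ** n,
        "P(all fail)": p ** n,
    }


def common_shock_risk(n, p):
    """Extreme tight coupling (assumed): one shock with probability p takes
    down all n components at once; each component still fails with probability p."""
    return {
        "expected failures": n * p,
        "variance of failures": n * n * p * (1 - p),
        "P(any failure)": p,
        "P(all fail)": p,
    }


for model in (independent_risk, common_shock_risk):
    print(model.__name__, model(10, 0.05))
```

The means agree, but under coupling the variance is n times larger and a total failure is far more likely. This is the kind of difference that a focus on expected values alone would miss.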

IV. Cost Issues
A. Statistical products (including surveys and administrative records) are capital-intensive, primarily in intangible capital
  1. Data originators:
    - Initial administrative purpose (amortize?)
    - Accommodating the statistical agency (data quality, learning curve, systems)
  2. Statistical agencies:
    - Learning curves
    - Systems for acquisition, edit/imputation
    - Disclosure limitation
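The "(amortize?)" question can be made concrete with the standard capital recovery factor. This formula converts a one-time capital outlay into an equivalent annual cost over an assumed useful life at an assumed discount rate. In the minimal sketch below, the dollar amount, rate and lifetime are hypothetical. How much of an originator's capital cost should be charged to statistical use remains an open question, as the slide's question mark indicates.

```python
def annualized_cost(capital_cost, rate, years):
    """Equivalent annual cost of a one-time capital outlay, using the standard
    capital recovery factor rate / (1 - (1 + rate) ** -years)."""
    if years <= 0:
        raise ValueError("years must be positive")
    if rate == 0:
        return capital_cost / years
    return capital_cost * rate / (1 - (1 + rate) ** -years)


# Hypothetical example: a systems investment for acquisition and edit/imputation,
# amortized over 5 years at a 3% discount rate.
print(round(annualized_cost(2_000_000, 0.03, 5)))
```

By construction, the discounted value of the annual payments over the lifetime equals the original outlay.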

B. Broad acknowledgement of substantial costs
C. Less empirical information is generally available on:
  1. Relative magnitudes of specific cost components
  2. Extent of homogeneity of results from (1) with respect to:
    i. Type of administrative agency
    ii. Type of administrative records
    iii. Subpopulation
    iv. Other factors

D. Level of precision available for cost information
  1. Purely qualitative
  2. Order-of-magnitude
  3. Relatively precise
E. Practical uses of cost information
  1. Qualitative decisions among options
  2. Fine-tuning of specific procedures
F. Sources of cost information (F. Laflamme, 2008)
  1. Special studies (risks: Hawthorne effects, incomplete accounting)
  2. Cost-recovery contract accounting
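The link between precision (D) and use (E) can be sketched in a few lines. With only order-of-magnitude bounds, a qualitative choice is possible when one option's worst case is no higher than every other option's best case. When the ranges overlap, more precise cost information would be needed. In this hypothetical helper, costs are represented as (low, high) bounds and the option names are placeholders.

```python
def clearly_cheapest(cost_ranges):
    """cost_ranges maps option name -> (low, high) cost bounds.

    Returns the option whose high bound is no greater than every other
    option's low bound, or None when the ranges overlap and more precise
    cost information would be needed to choose."""
    for name, (_, high) in cost_ranges.items():
        if all(high <= other_low
               for other, (other_low, _) in cost_ranges.items() if other != name):
            return name
    return None


order_of_magnitude = {"option A": (1e5, 1e6), "option B": (1e6, 1e7)}
overlapping = {"option A": (1e5, 1e6), "option B": (5e5, 5e6)}
print(clearly_cheapest(order_of_magnitude), clearly_cheapest(overlapping))
```

The first comparison can be settled with order-of-magnitude information alone. The second cannot, which corresponds to the fine-tuning case (E.2) where relatively precise cost data matter.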

V. Mathematical Structure for Full-Population Inference Based on Integration of Data from a General Survey and Specialized Administrative Records
A. Population: Goal: Inference for

B. Data Sources
  1. General survey for (most of) the full population
  2. Administrative records: Estimators
  3. Integrate (2) with general-survey data? Costs? Risks? Improvements in precision?
  4. Example: U.S. Consumer Expenditure Survey
     Supplement the usual (expensive) collection with specialized data from retail administrative records and transaction intermediaries (with permission)?
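A minimal sketch of one way to carry out item 3 follows. It assumes the frame flags membership in a subpopulation A that is covered by the administrative source, as in the easy case below. The full-population total is then the administrative total for A plus a weighted survey estimate for the complement. The helper names and data layout are hypothetical, and the weights are assumed to be inverse inclusion probabilities.

```python
def integrated_total(survey_units, admin_total_a):
    """Full-population total when the frame flags which units belong to
    subpopulation A, whose total is taken from administrative records.

    survey_units: iterable of (weight, y, in_a) for sampled units.
    Units in A are not used for the estimate here (they could serve for
    quality checks of the administrative source)."""
    complement_total = sum(w * y for w, y, in_a in survey_units if not in_a)
    return admin_total_a + complement_total


def survey_only_total(survey_units):
    """Weighted (Horvitz-Thompson type) total from the survey alone, for comparison."""
    return sum(w * y for w, y, _ in survey_units)


units = [(10, 5.0, False), (10, 3.0, True), (20, 2.0, False)]
print(integrated_total(units, admin_total_a=100.0), survey_only_total(units))
```

The integrated estimate inherits any error in the administrative total for A directly. This is why the quality and risk questions in items 3 and E matter.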

C. Easy case:
  1. Frame allows partition of
  2. Use high-quality estimators for direct use in
D. Harder cases:
  1. Screening questions, multiple-frame surveys
  2. Adjust or downweight use of due to quality problems?
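For the "downweight" option in D.2, a standard device is a composite estimator, shown here as one possible technique and not necessarily the author's. It combines an unbiased survey estimate S (variance V_S) with an administrative estimate A (variance V_A, bias b) as λA + (1 - λ)S. If the two estimates are independent, the MSE is λ²(V_A + b²) + (1 - λ)²V_S, which is minimized at λ = V_S / (V_S + V_A + b²). Quality problems (large b) therefore push the weight on the administrative source toward zero. In practice b is unknown and would itself have to be estimated. The inputs below are hypothetical.

```python
def composite_estimate(survey_est, survey_var, admin_est, admin_var, admin_bias):
    """MSE-minimizing combination of a survey estimate and an admin-record estimate.

    Assumes the survey estimate is unbiased, the two estimates are independent,
    and admin_bias approximates the admin estimate's bias.
    Returns (estimate, weight_on_admin, mse_of_estimate)."""
    admin_mse = admin_var + admin_bias ** 2
    lam = survey_var / (survey_var + admin_mse)  # smaller when admin data are poor
    estimate = lam * admin_est + (1 - lam) * survey_est
    mse = lam ** 2 * admin_mse + (1 - lam) ** 2 * survey_var
    return estimate, lam, mse


print(composite_estimate(100.0, 4.0, 110.0, 1.0, 3.0))
```

In this example, the bias of 3 raises the administrative MSE from 1 to 10, so the administrative source receives weight 4/14 rather than the 4/5 it would get if it were unbiased.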

E. Risk Factors
  1. Each of the usual data-quality issues: accuracy, timeliness, relevance, interpretability, accessibility, coherence
  2. Operational risk: degradation of the quality of the administrative source
  3. Costs for:
    a. Supplementary data source
    b. Screening for subpopulation membership
    c. Microdata review, edit and imputation
    d. Production systems for integration
    e. Investments in human resources
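Operational risk (E.2) calls for some ongoing check on the administrative source. A minimal, hypothetical monitor compares the administrative estimate with a survey benchmark in each period and flags periods whose admin/survey ratio drifts from its first-period baseline by more than a relative tolerance. The periods, values and 5% tolerance are illustrative assumptions. A realistic tolerance would also need to account for sampling error in the survey benchmark.

```python
def flag_degradation(series, tolerance=0.05):
    """series: list of (period, admin_estimate, survey_estimate) in time order.

    Uses the admin/survey ratio in the first period as a baseline and returns
    the periods whose ratio departs from it by more than `tolerance` (relative)."""
    _, admin0, survey0 = series[0]
    baseline = admin0 / survey0
    flagged = []
    for period, admin, survey in series:
        if abs((admin / survey) / baseline - 1) > tolerance:
            flagged.append(period)
    return flagged


history = [("2010Q1", 100.0, 100.0), ("2010Q2", 102.0, 100.0), ("2010Q3", 80.0, 100.0)]
print(flag_degradation(history))
```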

VI. Closing Remarks: Prospective Integration of Administrative Records with Survey Data
A. Good opportunities for:
  1. Expanded statistical information for stakeholders
  2. Reduction of overall production costs
B. Suggestion: expand evaluation from “Total Survey Error” to “Total Statistical Risk”
C. Evaluation of dominant factors of statistical risk and aggregate costs
D. Example: Prospective Revision of the U.S. Consumer Expenditure Survey
The papers today help us to understand more about some components in (A)-(C).
Lots of interesting work for future years.

Contact Information
John L. Eltinge
Associate Commissioner
Office of Survey Methods Research