Expanding the Role of Synthetic Data at the U.S. Census Bureau. 59th ISI World Statistics Congress, August 28th, 2013. By Ron S. Jarmin, U.S. Census Bureau.

Presentation transcript:

Expanding the Role of Synthetic Data at the U.S. Census Bureau
59th ISI World Statistics Congress, August 28th, 2013
By Ron S. Jarmin, U.S. Census Bureau
Thomas A. Louis, U.S. Census Bureau and Johns Hopkins University
Javier Miranda, U.S. Census Bureau
Any opinions and conclusions expressed herein are those of the authors and do not necessarily represent the views of the U.S. Census Bureau. All results have been reviewed to ensure that no confidential information is disclosed.

Tension between Confidentiality and Information Content of Statistical Products
National statistical offices (NSOs) use a wealth of information to fulfill their mandate to provide information about the economy and population, including
  – Survey collections
  – Administrative records
Individuals often are legally required to provide this information
  – High response rates
  – High quality data
  – Structured data (representative of the underlying population)
NSOs are entrusted to use this information only for statistical purposes, with the explicit understanding that the data are confidential and won't be disclosed
  – Disclosure protection often codified in statutes
This necessarily limits the types of data products that can be publicly released

Increasing the variety and information in statistical products while maintaining confidentiality
Statistical products are constructed via suitable aggregation of confidential survey, census or administrative microdata
  – Products cannot reveal data for individual persons, households or firms
However, products constructed from microdata so that re-identification is sufficiently difficult can potentially reveal details and features that traditional aggregations must mask to maintain confidentiality
One approach to avoid literal masking is to use multiple imputation to fully or partially synthesize key variables or entire datasets
The resulting “synthetic” data set can be used to construct safe data products

Public-Use Synthetic Microdata
Some sophisticated users require access to detailed microdata
The traditional solution was to provide access through
  – Anonymized microdata
  – Secure environments such as a Research Data Center (RDC)
Anonymized microdata are becoming riskier, and so is restricted access, even in RDCs
Synthetic datasets can address both concerns, and Census has produced two experimental versions
  – SIPP-Synthetic Beta: household data augmented with administrative data
  – Synthetic Longitudinal Business Database (SynLBD): synthetic panel of business establishments

Synthetic Data
Synthetic data are the result of a process of data anonymization
Synthetic data are generated by modeling real data
  – Sampling from the posterior predictive distribution
  – Sequential regression multivariate imputation
The resulting data try to preserve as many moments of the real data as possible, as well as summary aggregates
Higher confidentiality protection, since respondent information is modeled rather than released directly
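
The "sampling from posterior predictive distribution" step can be illustrated with a minimal sketch. It is not the Census Bureau's production method: it assumes a single sensitive variable, one predictor, a normal linear model and a flat prior, and the names (synthesize, employment, payroll) and the toy data are made up for illustration. A sequential regression approach would repeat a step like this variable by variable, conditioning each model on the variables already synthesized.

```python
import math
import random


def synthesize(x, y, m=5, seed=0):
    """Return m synthetic copies of y, each drawn from the posterior
    predictive distribution of a normal linear regression of y on x
    (flat prior). x is centered so intercept and slope are independent
    given the error variance."""
    rng = random.Random(seed)
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xc = [xi - x_bar for xi in x]
    sxx = sum(v * v for v in xc)
    slope_hat = sum(v * yi for v, yi in zip(xc, y)) / sxx
    rss = sum((yi - y_bar - slope_hat * v) ** 2 for v, yi in zip(xc, y))
    df = n - 2
    copies = []
    for _ in range(m):
        # Draw the error variance: rss / chi-square(df);
        # a chi-square(df) variate is Gamma(shape=df/2, scale=2).
        sigma = math.sqrt(rss / rng.gammavariate(df / 2.0, 2.0))
        # Draw regression coefficients given the variance.
        intercept = rng.gauss(y_bar, sigma / math.sqrt(n))
        slope = rng.gauss(slope_hat, sigma / math.sqrt(sxx))
        # Draw new values of y for every record from the fitted model.
        copies.append([intercept + slope * v + rng.gauss(0.0, sigma) for v in xc])
    return copies


# Toy "confidential" data (entirely made up): payroll depends on employment.
data_rng = random.Random(42)
employment = [data_rng.uniform(1, 100) for _ in range(200)]
payroll = [30.0 * e + data_rng.gauss(0, 50) for e in employment]

# Three synthetic replacements for the payroll variable.
synthetic_payroll = synthesize(employment, payroll, m=3)
```

Each synthetic copy keeps the modeled relationship between the two variables while replacing every reported payroll value with a model draw, which is the sense in which respondent information is "modeled".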

Development of Synthetic Data Products at the Census Bureau
Collaboration between academe and Census research staff
  – Cornell and Duke Universities, National Institute for Statistical Sciences
  – Center for Economic Studies, Center for Disclosure Avoidance Research, and Social, Economic and Household Statistics Division
Partial funding from the National Science Foundation
Census staff benefit greatly from working with leading academics
Collaborations resulted in two classes of new data products employing synthetic or partially synthetic data
  – Synthetic Public Use Micro Datasets
  – Online tools

Synthetic Data to Support Online Tools and Apps
Partially synthetic data allow users to select custom geographies in “OnTheMap”
  – Commuting patterns
  – Natural disasters

Future of Synthetic Data in Official Statistics
Data users will access official statistics in apps developed outside of NSOs
  – The Census Bureau has an open data API that allows developers to call tabulated estimates from several surveys
  – Allowing an API to call from synthetic microdata would allow developers much more latitude in creating apps that give users maximum flexibility
Synthetic data can play a prominent role in producing public-use micro datasets as it gets more difficult to release anonymized data that protects confidentiality

Challenges
Synthetic data aren’t valid for all analysis goals, and determining validity can be challenging
Users may not accept or “trust” synthetic data
On the other hand, users may put too much faith in estimates from synthetic data, taking the values as “truth”
  – Already observed for multi-year estimates from the American Community Survey
Statistical inferences require additional computations, requiring a higher level of user sophistication or flexible algorithms
NSOs may not have sufficient staff with the training and skills to develop, deploy and support synthetic data products
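
As an illustration of the "additional computations" that inference requires, the sketch below applies the standard combining rules for partially synthetic data: the same analysis is run on each of the m synthetic datasets, and the point estimates and their variances are then pooled. The function name and the example numbers are hypothetical; fully synthetic data use a different variance formula, which is not shown here.

```python
from statistics import mean, variance


def combine_partially_synthetic(estimates, variances):
    """Pool one estimate per synthetic dataset.

    estimates: point estimates q_1..q_m, one from each synthetic dataset
    variances: their estimated variances u_1..u_m
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two datasets and matching lengths")
    q_bar = mean(estimates)   # pooled point estimate
    b = variance(estimates)   # between-dataset variance (divisor m - 1)
    u_bar = mean(variances)   # average within-dataset variance
    total_var = u_bar + b / m  # rule for partially synthetic data
    # Reference t distribution; infinite if the estimates do not vary.
    df = (m - 1) * (1 + u_bar / (b / m)) ** 2 if b > 0 else float("inf")
    return q_bar, total_var, df


# Hypothetical estimates of a mean from three synthetic datasets.
q_bar, total_var, df = combine_partially_synthetic([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
```

Using the variance from a single synthetic dataset alone would ignore the between-dataset term b / m and understate the uncertainty, which is one way users can put too much faith in synthetic estimates.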

Next Steps
Research and development of synthetic data products continues
  – SynLBD v3 will be expanded to include firm characteristics and full time coverage
NSOs need to pay special attention to educating the public on correct use of synthetic data and to make sure they know when they are using it
  – This might require developing software code to facilitate correct computation of statistical inferences
Partnerships with academe should be reinforced
  – Mutually beneficial and key to developing internal human capital
  – Universities can develop courses targeted at NSO staff
More aggressively explore partnerships with private industry to develop easy-to-use, multi-platform synthetic data applications
Resources are limited, but interest and potential are sufficiently high that public/private partnerships are possible