Challenges of Census Data Harmonization: IPUMS-International Matt Sobek Minnesota Population Center

Presentation transcript:

Challenges of Census Data Harmonization: IPUMS-International Matt Sobek Minnesota Population Center

IPUMS-International Development Process
1. Acquisition
2. Metadata Preparation
3. Data Preparation
4. Harmonization
5. Data Enhancements
6. Dissemination

IPUMS-International Development Process
1. Acquisition
   Data
   Data dictionary
   Census questionnaire and instructions
   Sample design

IPUMS-International Development Process
2. Metadata Preparation
   English translation
   Data dictionaries

Original Data Dictionary (Kenya 1989)

Original Data Dictionary (Romania 1992)

Original Data Dictionary (China 1982)

Original Data Dictionary (Mexico 1990)

Variable Labels File – IPUMS Metadata (Costa Rica 2000)

IPUMS-International Development Process
2. Metadata Preparation
   English translation
   Data dictionaries
   Questionnaires and instructions

Census Questionnaire (Mexico 2000): Water Access

Text of Census Questionnaire (Mexico 2000)

XML-Tagged Census Questionnaire (Mexico 2000): source variables MX00A016, MX00A017, MX00A018 (water access)

XML-Tagged Census Instructions (Mexico 2000): source variable MX00A018

IPUMS-International Development Process
3. Data Preparation
   Data reformatting

Reformat Rectangular Sample (Brazil 1980)
Person records only; household data duplicated on person records.
Before: every person record (head, spouse, child) repeats the geography and housing fields.
After: one geography/housing record per household, followed by that household's person records.
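
The reformatting on this slide can be illustrated with a minimal sketch. It assumes a household identifier (here called `serial`) is available on every person record and that the file is already sorted by it; the field names and values are hypothetical and are not taken from the Brazil 1980 file.

```python
from itertools import groupby

# Hypothetical rectangular input: every person row repeats the household fields.
rectangular = [
    {"serial": 1, "geo": "A", "housing": "owned", "relate": "head"},
    {"serial": 1, "geo": "A", "housing": "owned", "relate": "child"},
    {"serial": 2, "geo": "B", "housing": "rented", "relate": "head"},
    {"serial": 2, "geo": "B", "housing": "rented", "relate": "spouse"},
    {"serial": 2, "geo": "B", "housing": "rented", "relate": "child"},
]

# Assumed split between household-level and person-level fields.
HOUSEHOLD_FIELDS = ("serial", "geo", "housing")


def to_hierarchical(rows):
    """Emit one household record ("H") followed by its person records ("P").

    Rows must already be sorted by serial, because groupby only groups
    adjacent rows.
    """
    out = []
    for serial, group in groupby(rows, key=lambda r: r["serial"]):
        members = list(group)
        # Household data is identical on every member, so take the first copy.
        out.append(("H", {f: members[0][f] for f in HOUSEHOLD_FIELDS}))
        for pernum, row in enumerate(members, start=1):
            out.append(("P", {"serial": serial, "pernum": pernum,
                              "relate": row["relate"]}))
    return out


hierarchical = to_hierarchical(rectangular)
```

Storing the household fields once per household removes the duplication and gives each person a position number within the household.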

Reformat Dwelling-Household-Person Sample (Chile 1992)
Separate dwelling and household records.
Before: a dwelling record, then one or more household records, each followed by its person records.
After: dwelling and household information combined into one record per household, followed by that household's person records.

Merge Separate Household and Person Files (Brazil 2000)
Household file: serial 001 geog & housing; serial 002 geog & housing; serial 003 geog & housing
Person file: serial 001 head; serial 001 spouse; serial 002 head; serial 002 child; serial 003 head
Merged file: serial 001 household; serial 001 head; serial 001 spouse; serial 002 household; serial 002 head; serial 002 child; serial 003 household; serial 003 head
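
A minimal sketch of this merge, using the serial numbers and relationships shown on the slide. The data structures, the `geo` and `housing` values, and the orphan check are assumptions added for illustration, not the project's actual program.

```python
from collections import defaultdict

# Household file: one entry per serial (field values are hypothetical).
households = {
    1: {"geo": "A", "housing": "owned"},
    2: {"geo": "A", "housing": "rented"},
    3: {"geo": "B", "housing": "owned"},
}

# Person file: (serial, relationship) pairs, as on the slide.
persons = [(1, "head"), (1, "spouse"), (2, "head"), (2, "child"), (3, "head")]


def merge_files(households, persons):
    """Interleave household and person records by serial number.

    Returns the merged list plus the serials of any person records that
    have no matching household record.
    """
    by_serial = defaultdict(list)
    for serial, relate in persons:
        by_serial[serial].append(relate)

    merged = []
    for serial in sorted(households):
        merged.append(("H", serial, households[serial]))
        for relate in by_serial.get(serial, []):
            merged.append(("P", serial, relate))

    orphans = sorted(set(by_serial) - set(households))
    return merged, orphans


merged, orphans = merge_files(households, persons)
```

Reporting unmatched serials separately makes linkage problems between the two files visible instead of silently dropping person records.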

Data File before Reformatting

Data File after Reformatting

IPUMS-International Development Process 3. Data Preparation Data reformatting Data reformatting Draw samples Draw samples Confidentiality measures Confidentiality measures Stratified using geography, ethnicity, hh size, Stratified using geography, ethnicity, hh size, hh type, SES; adjusted as necessary for census hh type, SES; adjusted as necessary for census Limit geographic specificity Limit geographic specificity Swap across geographic units Swap across geographic units Randomize order within geographies Randomize order within geographies Merge small variable categories Merge small variable categories Top-code sensitive numeric variables Top-code sensitive numeric variables
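
Two of the measures listed on this slide, top-coding and merging small categories, can be sketched as follows. The cap, the minimum cell count, and the catch-all code are hypothetical parameters; the slide does not give the thresholds actually used.

```python
from collections import Counter


def top_code(values, cap):
    """Replace every value above the cap with the cap itself."""
    return [min(v, cap) for v in values]


def merge_small_categories(codes, min_count, other_code):
    """Recode categories with fewer than min_count cases to a catch-all code."""
    counts = Counter(codes)
    return [c if counts[c] >= min_count else other_code for c in codes]


# Hypothetical incomes and ethnicity codes.
incomes = top_code([10, 50, 900], cap=100)
groups = merge_small_categories(["a", "a", "b", "c", "a"],
                                min_count=2, other_code="other")
```

Both functions reduce the chance that a rare value singles out a respondent, at the cost of some detail in the tails of the distribution.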

IPUMS-International Development Process
4. Harmonization
   Data
   Translation tables

Translation Table – Marital Status (columns: China 1982, Colombia 1973, Kenya 1989, Mexico 1970, U.S.A. 1990)

General Codes
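
A translation table can be thought of as a lookup from each sample's original code to a single harmonized code. The sketch below assumes a composite coding scheme in which the leading digit of a three-digit harmonized code is the general category; the sample names, source codes, and harmonized codes are all hypothetical and are not copied from the slide.

```python
# Hypothetical translation table: (sample, source code) -> harmonized code.
TRANSLATION = {
    ("mexico1970", 1): 100,  # single
    ("mexico1970", 2): 210,  # married
    ("mexico1970", 3): 220,  # consensual union
    ("usa1990", 1): 210,     # married
    ("usa1990", 6): 100,     # never married
}

UNKNOWN = 999  # assumed code for values with no entry in the table


def harmonize(sample, source_code):
    """Look up the harmonized detailed code for one source value."""
    return TRANSLATION.get((sample, source_code), UNKNOWN)


def general_code(detailed):
    """Collapse a three-digit detailed code to its leading (general) digit."""
    return detailed // 100


married_mx = harmonize("mexico1970", 2)
married_us = harmonize("usa1990", 1)
union_general = general_code(harmonize("mexico1970", 3))
```

Under this scheme, samples that distinguish consensual unions keep that detail in the trailing digits, while the general code remains comparable across all samples.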

IPUMS-International Development Process
4. Harmonization
   Data
   Translation tables
   Supplemental programming

Supplementary Variable Programming (INCTOT)

IPUMS-International Development Process
4. Harmonization
   Data
   Correspondence tables
   Supplemental programming
   Documentation
   Integration
   Mark-up for web delivery

XML-Tagged Variable Text (Literacy)

Variable Description on Website (Literacy)

IPUMS-International Development Process
5. Data Enhancements
   Data editing
   Consistency edits
   Hot-deck imputation

Missing Data Allocation Script (Occupation variable, USA): 5-dimensional table, 324 cells
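
Hot-deck imputation fills a missing value with the value of a recently seen "donor" who falls in the same cell of an allocation table. The sketch below is a generic sequential hot deck, not the script shown on the slide: it uses two hypothetical dimensions (sex and age group) rather than the five dimensions and 324 cells mentioned there, and the `allocated` flag is an added assumption.

```python
def hot_deck(records, target, cell_keys):
    """Sequential hot deck.

    Walk the file in order. A record with a reported target value becomes
    the current donor for its cell; a record with a missing value receives
    the current donor's value for its cell, if one exists.
    """
    last_donor = {}
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the input is left untouched
        rec["allocated"] = False
        cell = tuple(rec[k] for k in cell_keys)
        if rec[target] is None:
            if cell in last_donor:
                rec[target] = last_donor[cell]
                rec["allocated"] = True
        else:
            last_donor[cell] = rec[target]
        out.append(rec)
    return out


# Hypothetical person records with two missing occupations.
people = [
    {"sex": "male", "agegroup": "25-44", "occ": "farmer"},
    {"sex": "female", "agegroup": "25-44", "occ": "teacher"},
    {"sex": "male", "agegroup": "25-44", "occ": None},
    {"sex": "female", "agegroup": "45-64", "occ": None},
]

imputed = hot_deck(people, target="occ", cell_keys=("sex", "agegroup"))
```

The last record stays missing because no donor has yet appeared in its cell; a production script would typically seed every cell with starting values to avoid this.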

IPUMS-International Development Process
5. Data Enhancements
   Data editing
   Consistency edits
   Hot-deck imputation
   Family interrelationship “pointers”

IPUMS “Pointer” Variables (Simple household)
Pernum  Relate  Age  Sex     Marst    Chborn
1       head    46   male    married  n/a
2       spouse  44   female  married  3
3       aunt    77   female  widow    7
4       child   15   female  single   0
5       child   13   female  single   n/a
6       child   11   male    single   n/a
Pointer columns shown on the slide: Spouse’s Location, Mother’s Location, Father’s Location.

IPUMS “Pointer” Variables (Complex household)
Pernum  Relationship  Age  Sex     Marst      Chborn
1       head          53   female  separated  6
2       child         28   male    single     n/a
3       child         22   male    single     n/a
4       child         21   male    single     n/a
5       child         25   female  married    2
6       child-in-law  28   male    married    n/a
7       grandchild    3    male    single     n/a
8       grandchild    1    male    single     n/a
9       non-relative  32   female  separated  2
10      non-relative  10   male    single     n/a
11      non-relative  5    female  single     n/a
Pointer columns shown on the slide: Spouse’s Location, Father’s Location, Mother’s Location.

Rules for “Mother’s Location” Variable
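
The rules on this slide are not reproduced in the transcript, so the following is only a simplified sketch of how pointer variables could be derived from relationship to head and sex. It handles the simple household shown earlier; the complex household (children-in-law, grandchildren, non-relatives) needs further rules using age, marital status, children ever born, and record order, which this sketch does not attempt.

```python
# The simple household from the earlier slide.
household = [
    {"pernum": 1, "relate": "head", "sex": "male"},
    {"pernum": 2, "relate": "spouse", "sex": "female"},
    {"pernum": 3, "relate": "aunt", "sex": "female"},
    {"pernum": 4, "relate": "child", "sex": "female"},
    {"pernum": 5, "relate": "child", "sex": "female"},
    {"pernum": 6, "relate": "child", "sex": "male"},
]


def pointers(members):
    """Return {pernum: (spouse_loc, mother_loc, father_loc)}; 0 means no link.

    Assumed rules: the head and a single listed spouse point at each other,
    and each child of the head points at the head and the head's spouse
    according to their sex.
    """
    heads = [p for p in members if p["relate"] == "head"]
    spouses = [p for p in members if p["relate"] == "spouse"]
    head = heads[0] if len(heads) == 1 else None
    spouse = spouses[0] if len(spouses) == 1 else None
    parents = [p for p in (head, spouse) if p is not None]

    result = {}
    for p in members:
        sploc = momloc = poploc = 0
        if head is not None and spouse is not None:
            if p is head:
                sploc = spouse["pernum"]
            elif p is spouse:
                sploc = head["pernum"]
        if p["relate"] == "child":
            for parent in parents:
                if parent["sex"] == "female":
                    momloc = parent["pernum"]
                else:
                    poploc = parent["pernum"]
        result[p["pernum"]] = (sploc, momloc, poploc)
    return result


links = pointers(household)
```

A known limitation of this simplification is that it treats the head's spouse as a parent of every child of the head, which would mislabel step-parents.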

IPUMS-International Development Process
6. Dissemination
   Documentation system
   Preferences and dynamic content delivery

Variable Codes Page – Current IPUMS System

IPUMS-International Development Process
6. Dissemination
   Documentation system
   Preferences and dynamic content delivery
   Data extraction system
   Sample, variable, and case selection
   Advanced extract features
   Input variables
   Full disclosure
   Reverse engineering

End