Entrusting census microdata and metadata for timely integration and dissemination via the IPUMS-EurAsia and IECM initiatives, 2010-2014 * * * Robert McCaa,

Presentation transcript:

Entrusting census microdata and metadata for timely integration and dissemination via the IPUMS-EurAsia and IECM initiatives, 2010-2014 * * * Robert McCaa, Albert Esteve and Patricia Kelly-Hall, Minnesota Population Center and Centre d’Estudis Demogràfics

Outline (no. of slides in parentheses)
1. IPUMS-International: “Best practice” (3)
2. The IECM Project: a European Flavor (5)
3. Census output needs (4)
   a. Form “A”: succinct descriptions of both census and microdata
   b. Metadata: questionnaires, instructions, dictionaries, codebooks as images, .txt, .doc, .xls, .pdf, XML, SDMX, CSPro, IMPS, DDI, etc.
   c. Microdata: to prepare, choose 1 of 4 modalities; entrust as encrypted, executable files ( or fax password)
4. Conclusion (2)

What is IPUMS-International?
“…best practice for a data repository of international statistical data”
--Dennis Trewin, chair, UNECE task force on Statistical Confidentiality & Microdata Access

IPUMS-International:
» Begun in 1999, IPUMS-International is the world’s largest integrated demographic database:
  » 130 integrated, anonymized census samples (44 countries)
  » 279 million person records; 3,000+ approved researchers
» The database is likely to double over the next five years through the addition of:
  » 2010 round samples of 17 current partners: Austria, Belarus, Canada, France, Greece, Hungary, Israel, Italy, Kyrgyzstan, Netherlands, Portugal, Romania, Slovenia, Spain, Switzerland, UK, USA, etc.
  » Samples for 5 countries currently in development: Belgium, Czech Republic, Ireland, Germany, Turkey
  » Future partners? Albania? Bulgaria? Croatia? Estonia? Finland? Kazakhstan? Latvia? Lithuania? Poland? Russian Federation? Serbia? Slovakia? Ukraine? FYR Macedonia? Others?

IPUMS-International (map, Mollweide projection)
dark green = integrated and disseminating (44 countries, 130 censuses, 279 million person records)
green = to be integrated (35 countries, 90 censuses, 150 million)
IPUMS-EurAsia : Germany, Indonesia, Ireland, Nepal, Pakistan, Switzerland, Thailand : why not yours?

The IPUMS-International team, May 14, 2009, with the NSF oversight board
(Not present: computer gurus, some researchers, research assistants, civil service employees, and others who were absent from the National Science Foundation Board meeting)
Steven Ruggles, inventor of IPUMS, Professor of History, and Director of the Minnesota Population Center

Constructing the IPUMS-International integrated metadata and microdata system
» IPUMS-International NEVER disseminates source microdata!
» 5 step process of integration (2+ years invested in integrating metadata and microdata):
  1. *Confirm the integrity and validity of source microdata and metadata
  2. *Draw and anonymize high precision samples
  3. Integrate microdata sample
  4. Integrate metadata
  5. Confirm the integrity and validity of the integrated microdata sample and metadata
» *Steps 1 & 2 conducted by commissioned senior staff
» Original source microdata never disseminated
» Violation of confidentiality: subject to civil fine ($250,000) and/or criminal prosecution

5 step process of integration in the IPUMS system
3. Integrate microdata
Composite coding scheme to
  1) preserve every significant detail and
  2) harmonize every code
Example: marital status
  …
  200 = married
  210 = married, formal
  211 = married, civil
  212 = married, religious
  …
  220 = married, informal (consensual)
  …
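The marital status example suggests a hierarchical code in which each extra digit adds detail. The following is a minimal sketch of that idea, not the IPUMS implementation: the digit-by-digit reading of the codes is inferred from the slide, and the names `MARITAL_STATUS`, `collapse` and `label` are hypothetical.

```python
# Hypothetical excerpt of a composite coding scheme; only the codes shown on
# the slide are included.
MARITAL_STATUS = {
    200: "married",
    210: "married, formal",
    211: "married, civil",
    212: "married, religious",
    220: "married, informal (consensual)",
}


def collapse(code: int, level: int) -> int:
    """Truncate a three-digit composite code to its first `level` digits.

    level=1 keeps only the general category (211 -> 200),
    level=2 keeps the intermediate category (211 -> 210),
    level=3 returns the code unchanged.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    factor = 10 ** (3 - level)
    return (code // factor) * factor


def label(code: int) -> str:
    """Look up the label of a code, with a fallback for codes not listed here."""
    return MARITAL_STATUS.get(code, "unknown code")


if __name__ == "__main__":
    # A census that records civil vs. religious marriage keeps that detail,
    # yet can still be compared with a census that only records "married".
    detailed = 211
    print(detailed, label(detailed))
    print(collapse(detailed, 2), label(collapse(detailed, 2)))
    print(collapse(detailed, 1), label(collapse(detailed, 1)))
```

Under this reading, every detail in the source is preserved in the last digits, while the first digit gives a category that is comparable across all samples.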

5 step process of integration in the IPUMS system
4. Integrate metadata (XML): document every census, sample, variable and code
» Source documents (pdf) in official language and English
» Dynamic metadata system: compare any combination of countries and samples:
  » wording of any census question and instructions to field workers
  » Characteristics of each census and sample
  » Describe each variable: “universe”, definition, comparability, etc.

5 step process of integration in the IPUMS system
5. Confirm integrity and validity of each sample
» Before launch, each sample is scrupulously checked
» Test each integrated variable against the non-harmonized variable
» Each integration decision may be checked by any researcher using integrated vs. non-harmonized variables
» External evaluation by INDEC-Argentina (commissioned by IPUMS), 4 censuses ( )
  » Compared each variable, code and metadata against original source data and documentation
  » Tens of thousands of words, codes, and frequencies tested; only a handful of errors, misinterpretations or misunderstandings.
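One way to picture the test of an integrated variable against its non-harmonized source is a frequency comparison through a crosswalk. The sketch below is an illustration under stated assumptions, not the procedure used by IPUMS or INDEC: the function `check_recode` and the crosswalk in the usage example are hypothetical.

```python
from collections import Counter


def check_recode(source_values, crosswalk, harmonized_values):
    """Compare a harmonized variable with its source through a crosswalk.

    `crosswalk` maps each source code to its harmonized code. The function
    returns a list of human-readable problems; an empty list means that the
    harmonized frequencies match what the crosswalk predicts.
    """
    problems = []

    # Every source code must have a harmonized equivalent.
    unmapped = sorted(set(source_values) - set(crosswalk))
    if unmapped:
        problems.append(f"source codes without a harmonized code: {unmapped}")

    # Frequencies predicted from the source data vs. frequencies observed.
    expected = Counter(crosswalk[v] for v in source_values if v in crosswalk)
    observed = Counter(harmonized_values)
    for code in sorted(set(expected) | set(observed)):
        if expected[code] != observed[code]:
            problems.append(
                f"code {code}: expected {expected[code]}, found {observed[code]}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical source codes (1 = civil, 2 = religious, 3 = consensual).
    crosswalk = {1: 211, 2: 212, 3: 220}
    source = [1, 1, 2, 3]
    harmonized = [211, 211, 212, 220]
    print(check_recode(source, crosswalk, harmonized))
```

Because the non-harmonized variables remain available alongside the integrated ones, a researcher can run the same kind of comparison independently.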

The IECM project: Integrated European Census Microdata

PROJECT OVERVIEW | COORDINATION | HARMONIZATION | DISSEMINATION
Disseminating: Austria, Belarus, France, Greece, Hungary, Italy, Netherlands, Portugal, Romania, Spain, Slovenia, United Kingdom
Harmonizing: Czech Republic, Germany, Ireland, Switzerland (next release), Turkey
Negotiating: Belgium, Bulgaria, Latvia, Poland, Russia, Ukraine
Contacted: Finland, Iceland, Lithuania, Moldova, Norway, Slovak Republic

Variables Included in Extracts Under-represented: geography, migration, ethnicity Harmonization increases usability and accessibility

User statistics, July – Dec 2008

Samples extracted:
  France 634
  Greece 537
  Spain 441
  Austria 408
  Hungary 404
  Portugal 340
  United Kingdom 185
  Netherlands 179
  Belarus 85

Extracts by user’s country of residence:
  Spain 164
  Italy 105
  France 102
  Germany 90
  United Kingdom 81
  Greece 45
  Netherlands 37
  Belgium 21
  Czech Republic 18
  Denmark 17
  Switzerland 17
  Austria 16
  Ireland 12
  Romania 6
  Portugal 6
  Poland 2

PROJECT OVERVIEW | COORDINATION | HARMONIZATION | DISSEMINATION
Integrated European Census Microdata
Coordination | Harmonization | Dissemination
» Meetings: Barcelona 2005, Paris 2006, Lisbon 2007, Barcelona 2008
» Integrated documentation
» Intra-European classifications
» Mirror site
» Additional documentation
» Data Browser / Online Tabulator

The IECM project: addendum. New tools for data analysis
Prototype of on-line tabulator of integrated variables
PROJECT OVERVIEW | COORDINATION | HARMONIZATION | DISSEMINATION
How are we currently disseminating the IECM census microdata?
- Through an extraction system where users can create custom-tailored microdata samples
Why a data browser?
- Fast and convenient tool to explore the contents of the database before making an extract
- It saves users from downloading microdata (if only basic figures are needed)
Some caveats
- We are not providing official statistics
- Frequencies are not based on 100% population counts
- Sampling errors must be calculated
- Compared to microdata, cross-tabulated data have less analytical power
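The caveat about sampling errors can be made concrete with a rough calculation. The sketch below assumes simple random sampling without replacement and uses the textbook approximation for the standard error of a proportion with a finite population correction; real census samples are drawn by household and stratified, so their true errors will differ. The function name `estimate_count` and the figures in the usage example are hypothetical.

```python
import math


def estimate_count(sample_hits: int, sample_size: int, sampling_fraction: float):
    """Estimate a population count and its approximate standard error.

    sample_hits       -- sample records that have the characteristic
    sample_size       -- total records in the sample
    sampling_fraction -- share of the population in the sample (0.10 = 10%)

    Assumes simple random sampling without replacement (an assumption of
    this sketch, not a description of any particular census sample).
    """
    if not 0 < sampling_fraction <= 1:
        raise ValueError("sampling_fraction must be in (0, 1]")
    if sample_size <= 0 or not 0 <= sample_hits <= sample_size:
        raise ValueError("need 0 <= sample_hits <= sample_size and sample_size > 0")

    p = sample_hits / sample_size                    # sample proportion
    population_size = sample_size / sampling_fraction
    fpc = 1 - sampling_fraction                      # finite population correction
    se_p = math.sqrt(p * (1 - p) / sample_size * fpc)

    return p * population_size, se_p * population_size


if __name__ == "__main__":
    # Hypothetical tabulation: 500 of 10,000 sampled persons in a 10% sample.
    estimate, se = estimate_count(500, 10_000, 0.10)
    print(f"estimated count: {estimate:.0f}, standard error: {se:.1f}")
    print(f"approx. 95% interval: {estimate - 1.96 * se:.0f} to {estimate + 1.96 * se:.0f}")
```

A frequency read off the tabulator is therefore an estimate with a margin around it, which is why it should not be quoted as an official count.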

The online tabulator based on Redatam

CENSUS MICRODATA FANS …

Census Output Needs:
1. Succinct description of census and microdata (Form “A”)
2. Comprehensive metadata: questionnaires, instructions, codebooks
3. Encrypted microdata
Ship FEDEX prepaid ( for account #) to:
Prof. Robert McCaa
Minnesota Population Center
50 Willey Hall, th Ave. S.
Minneapolis MN
Tel ,

1. Need for succinct, authoritative documentation of census and microdata: Form “A”
» Efficient processing of metadata & microdata
» Form “A”:
  » See Appendix A for details
  » Appendix B is the completed form for Spain (censuses of 1981, 1991, 2001)
  » click the name of a country to view samples
» Describe the census: name, population universe, reference date, field work period, etc.
» Describe the microdata: source, sample design, sample unit, sample fraction, size, weights, etc.
» Define units in the microdata: private household, collective dwelling, included/excluded populations, etc.

2. Metadata needs
see paragraphs for additional details
» Documents in any form: .pdf, .txt, .doc, .xls, XML, SDMX, DDI, CSPro, IMPS, etc.
» Copies in official language and English:
Essential:
  1. Questionnaires
  2. Instructions to interviewers
  3. Codebooks, data dictionaries
Helpful:
  4. Correspondence tables (e.g., occupation with ISCO-08/88)
  5. Summary official results
  6. Technical, methodological reports
  7. Sample design: preferred, every tenth private household; for collective dwellings (e.g., hospitals), every tenth person.
  8. Boundary files for administrative geography coded in microdata

3. Microdata needs
see paragraphs for additional details
» 2 goals:
  1. Permanently archive source microdata against loss (copies provided exclusively to the National Statistical Agency owner)
  2. Integrate high precision, anonymized household samples into database
» We prefer 100% microdata, particularly from developing countries where microdata are at risk of loss
» Note: some European statistical offices can no longer locate census microdata for the 1960s, 1970s, 1980s and even 1990s!
» Or, even where they can locate the data, are unable to make them usable
» 4 modalities for entrusting microdata:
  1. 100% microdata to MPC: 38 countries
  2. Samples provided by National Statistical Office:
  3. Multi-use samples also entrusted to MPC:
  4. Samples constructed by Research Institute upon request of NSO: 6
» License fee: US$5,000 for dataset of 1 million plus records

3. Microdata needs
see paragraphs for additional details
» High precision, household samples
  » 10 percent: 70 of 130 samples currently available
  » 5 percent: 28
  » <5 percent: 32 (8 constitute all that survives)
» Systematic random samples:
  » every nth private household after a random start
  » Collective dwellings: every nth person
  » extremely fine geographic stratification with proportional weighting
  » NUTS-2, NUTS-3
» Anonymization, performed by NSO or MPC
In addition to sampling, 6 layers of technical protections:
  1. Suppress small places of residence, work, school, etc.
  2. Suppress codes of social categories with small counts
  3. Top and bottom coding of continuous variables
  4. Suppress sensitive variables
  5. Swap small % of households into different place of residence
  6. Randomly order all households
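The sampling rule and two of the protection layers listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used by any statistical office or by MPC: the function names, the thresholds and the "other" code are hypothetical, and the geographic stratification is only approximated by assuming that the household list is already sorted by geography.

```python
import random
from collections import Counter


def systematic_sample(households, n, rng=None):
    """Select every n-th household after a random start in the first n.

    If `households` is sorted by geographic unit, the sample is spread
    proportionally across units (implicit stratification).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = rng or random.Random()
    start = rng.randrange(n)          # random start: 0 .. n-1
    return households[start::n]


def top_bottom_code(value, low, high):
    """Layer 3: clamp a continuous value into the range [low, high]."""
    return max(low, min(high, value))


def suppress_small_categories(values, min_count, other_code):
    """Layer 2: recode categories with fewer than min_count cases."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_code for v in values]


if __name__ == "__main__":
    households = list(range(1000))    # stand-in for a sorted household list
    sample = systematic_sample(households, 10, random.Random(2010))
    print(len(sample), "households selected; each carries a weight of 10")

    ages = [3, 47, 104, 88]
    print([top_bottom_code(a, 0, 95) for a in ages])

    groups = ["a", "a", "a", "b"]
    print(suppress_small_categories(groups, 2, "other"))
```

With a fixed interval of 10, each sampled household stands for 10 households in the population, which is where the proportional weights come from.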

Conclusion
» Thanks to:
  » National Statistical Offices for trust and cooperation
  » International organizations for support and encouragement
  » Researchers for using IPUMS integrated datasets
» Invitation to:
  » National Statistical Office partners to entrust 2010 round microdata and metadata with Form “A”
  » National Statistical Offices that are not yet cooperating to participate by integrating pre-2010 census microdata
  » And…

…to the 58th Session ISI: Dublin, Aug 21-26,
» IPUMS Workshop, Aug
» Microdata sessions
» IPUMS Funding for delegates from developing countries
» IPUMS booth

Thank you!!