Presentation on theme: "Disseminating census microdata: the IPUMS and IECM experiences, 2002-2010 (and plans for beyond) * * * Robert McCaa and Albert Esteve Minnesota Population."— Presentation transcript:
Disseminating census microdata: the IPUMS and IECM experiences, 2002-2010 (and plans for beyond)
* * *
Robert McCaa and Albert Esteve
Minnesota Population Center and Centre d’Estudis Demogràfics
(Global) (Europe portal)
“Only used statistics are useful statistics.” -- Joint UNECE/Eurostat Meeting on Population and Housing Censuses inf.1
3 goals of presentation: IPUMS/IECM census microdata projects
1. Discuss dissemination statistics from 59,170 extracts downloaded by IPUMS registered users
2. Invite 21 European partners to entrust 2010 round samples as expeditiously as possible
3. Invite non-partners to entrust samples of historical censuses (2000 and earlier rounds) as well as for the 2010 round
Outline: Integrating census samples and metadata for timely dissemination via the IPUMS-International and IECM initiatives
1. IPUMS-International: massive, global dissemination (7 slides)
2. IPUMS-International: usage statistics (9 slides)
3. Conclusion (2 slides)
1. IPUMS-International: Massive, Global Integration and Dissemination
“…best practice for a data repository of international statistical data” -- Dennis Trewin, chair, UNECE task force on Statistical Confidentiality & Microdata Access
See also:
» 2006: "IPUMS-Europe: Confidentiality measures for licensing and disseminating restricted access census microdata extracts to academic users," Monographs of official statistics: Work session on statistical data confidentiality.
» 2009: "Entrusting census microdata and metadata for timely integration and dissemination via the IPUMS-EurAsia and IECM initiatives," ECE/CES/GE.41/2009/23
IPUMS-International:
» Begun in 1999, IPUMS-International is the world’s largest integrated demographic database:
  » 159 integrated, anonymized census samples (55 countries)
  » 325 million person records; 3,600 approved researchers
» Database is likely to double over the next five years, by the addition of:
  » 2010 round samples of 17 current Eur-Asian partners: Armenia, Austria, Belarus, Canada, France, Greece, Hungary, Italy, Kyrgyzstan, Netherlands, Portugal, Romania, Slovenia, Spain, Switzerland, UK, USA, etc.
  » Samples for 8 Eur-Asian countries currently in development: Belgium, Czech Republic, Ireland, Germany, Poland, Turkey, Turkmenistan, Ukraine
  » Future partners? Albania? Bulgaria? Croatia? Estonia? Finland? …
59,170 extracts (586,643 variables) disseminated; extracts jumped 10% in June, with the 2010 launch
» IPUMS-International NEVER disseminates source microdata!
» 4 IPUMS-constructed variables ranked in the top 30:
  » Spouse’s location in household
  » Mother’s location in household
  » Father’s location in household
  » Spouse rule for inferring location in household
» These variables are constructed from household samples
» 3 countries with person samples are invited to construct household samples:
  » Canada
  » Netherlands
  » UK
IPUMS-International (world map, Mollweide projection)
dark green = integrated and disseminating (55 countries, 159 censuses, 325 million person records)
green = to be integrated (35 countries, 90 censuses, 150 million)
IPUMS-International 2011: Cambodia 2008, Egypt 2006, France 2006, Germany, Indonesia, Ireland, etc.
2012: why not yours?
2011 launch at the 58th Session ISI: Dublin, Aug 21-26
» European samples to be launched
  » France, 2006
  » Germany ( ; DFR ‘71, ‘81)
  » Ireland ( )
» Beyond Europe, samples for:
  » Cambodia 2008
  » Egypt 2006
  » Jamaica,
  » Iran 2006
  » Etc.
» Successive annual launches planned for 2012, 2013, 2014.
Dissemination of microdata extracts via IPUMS-International
» IPUMS-International NEVER disseminates source microdata!
» Usage is restricted to bona fide researchers who agree to stringent conditions of use to protect statistical confidentiality
» IPUMS disseminates extracts custom-tailored to researchers' needs
  » Unlike most statistical agencies, which disseminate an identical entire sample to every user
Dissemination of microdata and metadata extracts
» The massive scale of IPUMS requires users to be selective:
  » Select country (or countries)
  » Select samples (census years)
  » Select variables (e.g., age, sex, educational attainment, etc.)
  » Select sub-populations (e.g., nurses)
  » Select sample density
» Once an extract request is submitted, the IPUMS extract engine:
  » Constructs the microdata extract
  » Constructs the metadata
  » E-mails the researcher to retrieve the extract (password protected; transmission is encrypted with 128-bit SSL)
» The researcher downloads the extract, unzips it and analyzes it
» Extract system validated as usage has soared
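The selection steps on this slide can be illustrated with a minimal sketch. This is not the IPUMS extract engine or its API: the `ExtractRequest` class, the `build_extract` function, the record layout (`COUNTRY`, `YEAR`, `SERIAL` as a household identifier) and the toy records are all assumptions made for illustration. The sketch only shows how choosing samples, variables, a sub-population and a sample density narrows a large database to a custom extract.

```python
from dataclasses import dataclass, field
import random


@dataclass
class ExtractRequest:
    """Hypothetical extract request; not the real IPUMS interface."""
    samples: set                                      # (country, census year) pairs
    variables: list                                   # variables to keep in the extract
    case_filter: dict = field(default_factory=dict)   # e.g. {"OCC": {"nurse"}}
    density: float = 1.0                              # share of households to keep


def build_extract(records, request, seed=0):
    """Apply the request to person records and return the reduced extract."""
    rng = random.Random(seed)
    household_kept = {}
    extract = []
    for rec in records:
        # 1-2. Select countries and census years.
        if (rec["COUNTRY"], rec["YEAR"]) not in request.samples:
            continue
        # 5. Sample density: keep or drop whole households, not single persons.
        key = (rec["COUNTRY"], rec["YEAR"], rec["SERIAL"])
        if key not in household_kept:
            household_kept[key] = rng.random() < request.density
        if not household_kept[key]:
            continue
        # 4. Select sub-populations.
        if any(rec.get(var) not in allowed
               for var, allowed in request.case_filter.items()):
            continue
        # 3. Select variables.
        extract.append({var: rec.get(var) for var in request.variables})
    return extract


# Toy records (invented values, for illustration only).
records = [
    {"COUNTRY": "Mexico", "YEAR": 2000, "SERIAL": 1, "AGE": 34, "SEX": "F", "OCC": "nurse"},
    {"COUNTRY": "Mexico", "YEAR": 2000, "SERIAL": 1, "AGE": 36, "SEX": "M", "OCC": "teacher"},
    {"COUNTRY": "France", "YEAR": 1999, "SERIAL": 7, "AGE": 51, "SEX": "F", "OCC": "nurse"},
    {"COUNTRY": "Brazil", "YEAR": 2000, "SERIAL": 3, "AGE": 29, "SEX": "F", "OCC": "nurse"},
]
request = ExtractRequest(
    samples={("Mexico", 2000), ("France", 1999)},
    variables=["COUNTRY", "YEAR", "AGE", "SEX"],
    case_filter={"OCC": {"nurse"}},
)
extract = build_extract(records, request)
print(extract)
```

With the default density of 1.0 every household is kept, so the toy request returns the two nurses from the Mexico 2000 and France 1999 samples, each reduced to the four requested variables.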
2. IPUMS-International Usage statistics
See card hand-out for list of current samples and usage statistics
Usage Statistics (June 4, 2010)
» 59,170 extracts (jumped 10% in June)
» Average: about 1,000 extracts per country
» Smallest number of extracts: Kyrgyz Republic, 116 (census of 1999; first year of availability)
» Largest number of extracts: Mexico, 7,637 (6 censuses, 8 years of availability)
  » Mexico 2000: 2,464 extracts
» Usage statistics by country: see Table 2
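As a quick check of the figures on this slide, the arithmetic can be worked through in a few lines. All inputs come from the slides themselves (59,170 extracts, 55 countries, 7,637 extracts for Mexico, 2,464 for Mexico 2000); the derived shares are simple ratios and are not reported in the presentation.

```python
total_extracts = 59_170
countries = 55
mexico_extracts = 7_637
mexico_2000_extracts = 2_464

# Mean extracts per country: the slide rounds this to 1,000.
average_per_country = total_extracts / countries

# Mexico's share of all extracts, and the 2000 census share within Mexico.
mexico_share = mexico_extracts / total_extracts
mexico_2000_share = mexico_2000_extracts / mexico_extracts

print(f"Average per country: {average_per_country:,.0f}")
print(f"Mexico share of all extracts: {mexico_share:.1%}")
print(f"Mexico 2000 share of Mexican extracts: {mexico_2000_share:.1%}")
```

The mean works out to roughly 1,076 extracts per country, Mexico accounts for about 12.9% of all extracts, and the 2000 census alone for about 32.3% of Mexico's.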
Table 2. Extract Rank and Sample Details for the Top Five and all European Countries
Rank | Country | Sample %* | Variables (n)* | Years of census samples | Extracts
1 | Mexico | | | p, 70, 90, 95, 2000, 05 | 7,637
2 | Brazil | | | , 70, 80, 91, 2000 | 5,191
3 | United States | | | , 70, 80, 90, 2000, 05 | 4,559
4 | Colombia | | | p, 72, 85, 93, 2005 | 3,428
5 | France | | | , 68, 75, 82, 90, 99 | 2,795
10 | Canada | | | p, 81p, 91p, 2001p | 1,614
12 | Spain | | | , 91, 2001 | 1,514
13 | Greece | | | , 81, 91, 2001 | 1,496
19 | Hungary | | | , 80, 90, 2001 | 1,132
21 | Austria | | | , 81, 91, 2001 | 1,087
22 | Portugal | | | , 91, 2001 | 1,028
23 | Romania | | | , 92, 2002 | 1,012
29 | UK | | | , 2001p | 657
30 | Netherlands | | | p, 71p, 2001p | 570
32 | Belarus | | | |
 | Italy | | | |
 | Slovenia | | | |
Total extracts from the IPUMS-International database for 55 countries (158 samples), Jun 4, 2010: 59,170
*2000 round census; refers to all integrated variables, including IPUMS constructed variables. “p” = person sample; all other samples are of households
Table 3. Thirty-two most popular variables in IPUMS-International
Rank | Label | Extracts | Mnemonic | Comment
1 | Educational attainment | 19,307 | EDATTAN |
2 | Age (single years to 85+) | 19,009 | AGE | Grouped age n=3,838
3 | Employment status | 18,490 | EMPSTAT |
4 | Marital status | 18,214 | MARST |
5 | Person weight | 17,511 | WTPER | Technical variable
6 | Relationship to head | 15,783 | RELATE |
7 | Sex | 14,595 | SEX |
8 | Class of work | 12,583 | CLASSWK |
9 | Ownership of dwelling | 8,050 | OWNRSHP |
10 | Occupation ISCO recode | 8,004 | OCCISCO |
11 | School attendance | 7,919 | SCHOOL |
12 | Years of schooling | 7,576 | YRSCHL |
13 | Literate | 7,290 | LIT |
14 | Urban/rural | 7,098 | URBAN |
15 | Industry-general code | 7,044 | INDGEN |
16 | Household weight | 6,656 | WTHH | Technical variable
17 | Children ever born | 6,363 | CHBORN |
18 | Nativity (native/foreign born) | 6,332 | NATIVTY |
19 | Occupation | 6,246 | OCC |
20 | Country of birth | 6,153 | BPLCTRY |
21 | Religion | 6,075 | RELIG |
22 | Industry | 5,670 | IND |
23 | Location of spouse in household | 5,007 | SPLOC | Constructed (household)
24 | Rule for locating spouse | 4,171 | SPRULE | Constructed (household)
25 | Location of mother in household | 4,153 | MOMLOC | Constructed (household)
26 | Number of children surviving | 4,074 | CHSURV |
27 | Place of residence 5 years ago | 4,064 | MGRATE5 |
28 | Location of father in household | 3,983 | POPLOC | Constructed (household)
29 | Total household income | 3,965 | INCTOT | Household variable
30 | Earned income | 3,655 | INCEARN |
31 | Number of rooms | 3,465 | ROOMS |
32 | Consensual union | 3,443 | CONSENS |
For uses, see
And: search scholar.google.com for “IPUMS” plus the name of a country, subject, etc.
Minimum Standards for Samples Entrusted to IPUMS for dissemination
1. Household samples only
2. High precision: 5% minimum, 10% preferred
3. Broad set of variables: omit only those required for statistical confidentiality (low-level geography, low frequency attributes)
4. Detailed codes
  » Age: single year to 85
  » Occupation, industry: 3 digit ISCO, ISIC
  » Country of birth: detail individual countries consistent with statistical confidentiality
» Thanks to INSEE France for the sample of the recensement rénové: 20 million person records to be launched next year.
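The numeric standards above lend themselves to a simple pre-submission check. The sketch below is only an illustration: the `check_minimum_standards` function and the metadata field names (`unit`, `density_pct`, `age_top_code`, `occupation_digits`, `industry_digits`) are hypothetical, and reading "single year to 85" as "top code of at least 85" is an assumption. The qualitative standards (breadth of variables, country-of-birth detail) are not covered.

```python
def check_minimum_standards(sample):
    """Return a list of unmet standards; an empty list means none were found.

    `sample` is a hypothetical metadata dictionary describing one census sample.
    """
    problems = []
    if sample["unit"] != "household":
        problems.append("household samples only")
    if sample["density_pct"] < 5:
        problems.append("sample density below the 5% minimum")
    if sample["age_top_code"] < 85:
        problems.append("age should be in single years to 85")
    if sample["occupation_digits"] < 3:
        problems.append("occupation should use 3-digit ISCO codes")
    if sample["industry_digits"] < 3:
        problems.append("industry should use 3-digit ISIC codes")
    return problems


def meets_preferred_density(sample):
    """True when the sample reaches the preferred 10% density."""
    return sample["density_pct"] >= 10


# Two invented samples, for illustration only.
household_sample = {"unit": "household", "density_pct": 10, "age_top_code": 85,
                    "occupation_digits": 3, "industry_digits": 3}
person_sample = {"unit": "person", "density_pct": 1, "age_top_code": 85,
                 "occupation_digits": 3, "industry_digits": 2}

print(check_minimum_standards(household_sample))
print(check_minimum_standards(person_sample))
```

Returning a list of problems rather than a single pass/fail flag makes it easier to see which standards a re-drawn sample would need to address.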
Conclusion: Invitation to continued cooperation
» In 1999, our dream: integrate samples of 21 countries in 10 years
  » Thanks to generous cooperation of 55 National Statistical Offices
  » Undreamed-of technological innovations
  » By 2009, integrated samples for 44 countries
  » Number of users and usage far exceeded expectations
» For the 2010 decade, our dream:
  » Double the number of users
  » Double the number of integrated samples
  » Re-draw samples that do not meet minimum standards, where feasible
» Participating statistical agencies: please entrust 2010 samples in due course
» Other statistical agencies: entrust series of samples for each census for which microdata exist
…and to the 58th Session ISI: Dublin, Aug 21-26
» IPUMS Workshop, Aug
  » New IPUMS initiatives
  » Reports by IPUMS users
  » Reports by National Statistical Office-partners
» IPUMS sponsorship for delegates from participating countries:
  » economy air,
  » registration fees,
  » 8 nights' accommodation and modest per diem
» Simultaneous interpretation: Russian/French/English