Challenges of Census Data Harmonization: IPUMS-International Matt Sobek Minnesota Population Center

Presentation transcript:

1 Challenges of Census Data Harmonization: IPUMS-International Matt Sobek Minnesota Population Center sobek@pop.umn.edu

2 IPUMS-International Development Process 1. Acquisition 2. Metadata Preparation 3. Data Preparation 4. Harmonization 5. Data Enhancements 6. Dissemination

3 IPUMS-International Development Process 1. Acquisition: Data; Data dictionary; Census questionnaire and instructions; Sample design

4 IPUMS-International Development Process 2. Metadata Preparation: English translation; Data dictionaries

5 Original Data Dictionary (Kenya 1989)

6 Original Data Dictionary (Romania 1992)

7 Original Data Dictionary (China 1982)

8 Original Data Dictionary (Mexico 1990)

9 Variable Labels File – IPUMS Metadata (Costa Rica 2000)

10 IPUMS-International Development Process 2. Metadata Preparation: English translation; Data dictionaries; Questionnaires and instructions

11 Census Questionnaire (Mexico 2000): water access

12 Text of Census Questionnaire (Mexico 2000)

13 XML-Tagged Census Questionnaire (Mexico 2000): source variables MX00A016, MX00A017, MX00A018 (water access)

14 XML-Tagged Census Instructions (Mexico 2000): source variable MX00A018

15 IPUMS-International Development Process 3. Data Preparation: Data reformatting

16 Reformat Rectangular Sample (Brazil 1980): person records only; household data duplicated on person records.
[Diagram: in the input, every record combines geography, housing, and person fields (head, spouse, child). After reformatting, each household has one geography/housing record followed by its person records.]
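The reformatting on this slide can be sketched in a few lines of Python. This is only an illustration, not the project's actual program: the field names (`serial`, `geo`, `housing`, `relate`), the `rectype` flags, and the sample values are all hypothetical, and it assumes each record carries a household identifier.

```python
from itertools import groupby

# Hypothetical rectangular input: every person record repeats the
# household-level fields (serial, geo, housing).
rectangular = [
    {"serial": 1, "geo": "A", "housing": "owned", "relate": "head"},
    {"serial": 1, "geo": "A", "housing": "owned", "relate": "child"},
    {"serial": 2, "geo": "B", "housing": "rented", "relate": "head"},
    {"serial": 2, "geo": "B", "housing": "rented", "relate": "spouse"},
    {"serial": 2, "geo": "B", "housing": "rented", "relate": "child"},
]

HOUSEHOLD_FIELDS = ("serial", "geo", "housing")


def to_hierarchical(records):
    """Emit one household record, then that household's person records."""
    out = []
    # groupby only groups adjacent items, so sort by household first;
    # sorted() is stable, so person order within a household is kept.
    ordered = sorted(records, key=lambda r: r["serial"])
    for serial, group in groupby(ordered, key=lambda r: r["serial"]):
        persons = list(group)
        # Household data is duplicated on every person; take the first copy.
        household = {k: persons[0][k] for k in HOUSEHOLD_FIELDS}
        out.append({"rectype": "H", **household})
        for pernum, person in enumerate(persons, start=1):
            out.append({"rectype": "P", "serial": serial,
                        "pernum": pernum, "relate": person["relate"]})
    return out


hierarchical = to_hierarchical(rectangular)
```

The household fields are stored once per household instead of once per person, which is the point of the reformat.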

17 Reformat Dwelling-Household-Person Sample (Chile 1992): separate dwelling and household records.
[Diagram: the input has a dwelling record, then one or more household records, each followed by its person records (head, spouse, child). After reformatting, each household has one combined dwelling/household record followed by its person records.]

18 Merge Separate Household and Person Files (Brazil 2000)
Household file: serial 001 geog & housing; serial 002 geog & housing; serial 003 geog & housing
Person file: serial 001 head; serial 001 spouse; serial 002 head; serial 002 child; serial 003 head
Merged file: serial 001 household; serial 001 head; serial 001 spouse; serial 002 household; serial 002 head; serial 002 child; serial 003 household; serial 003 head
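A minimal sketch of this merge, assuming both files share a `serial` key as the slide suggests. The function name, record layout, and `H`/`P` flags are hypothetical; the serials and relationships mirror the slide.

```python
# Hypothetical in-memory versions of the two files on the slide.
household_file = [
    {"serial": "001", "data": "geog & housing"},
    {"serial": "002", "data": "geog & housing"},
    {"serial": "003", "data": "geog & housing"},
]
person_file = [
    {"serial": "001", "relate": "head"},
    {"serial": "001", "relate": "spouse"},
    {"serial": "002", "relate": "head"},
    {"serial": "002", "relate": "child"},
    {"serial": "003", "relate": "head"},
]


def merge_files(households, persons):
    """Interleave the files: each household, then its persons."""
    # Index persons by serial, preserving their order in the person file.
    persons_by_serial = {}
    for person in persons:
        persons_by_serial.setdefault(person["serial"], []).append(person)

    merged = []
    for household in sorted(households, key=lambda h: h["serial"]):
        merged.append(("H", household))
        for person in persons_by_serial.get(household["serial"], []):
            merged.append(("P", person))
    return merged


merged = merge_files(household_file, person_file)
for rectype, record in merged:
    print(rectype, record["serial"], record.get("relate", "household"))
```

In practice one would also want to report households with no persons and persons with no matching household; this sketch silently keeps the former and drops the latter.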

19 Data File before Reformatting

20 Data File after Reformatting

21 IPUMS-International Development Process 3. Data Preparation: Data reformatting; Draw samples; Confidentiality measures; Stratified using geography, ethnicity, hh size, hh type, SES (adjusted as necessary for census); Limit geographic specificity; Swap across geographic units; Randomize order within geographies; Merge small variable categories; Top-code sensitive numeric variables
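Two of the confidentiality measures listed here, top-coding and merging small categories, are simple enough to sketch. The thresholds, category labels, and function names below are hypothetical; the slide does not state the actual cut-offs used.

```python
from collections import Counter


def top_code(values, cap):
    """Replace any value above `cap` with `cap` itself."""
    return [min(v, cap) for v in values]


def merge_small_categories(codes, min_count, other_code):
    """Recode categories with fewer than `min_count` cases to `other_code`."""
    counts = Counter(codes)
    return [c if counts[c] >= min_count else other_code for c in codes]


# Hypothetical income values and a hypothetical cap of 1000.
incomes = top_code([10, 500, 99999], cap=1000)

# Hypothetical category codes; "b" and "c" each occur only once.
groups = merge_small_categories(["a", "a", "a", "b", "c"],
                                min_count=2, other_code="other")
```

Both steps trade detail for protection: extreme incomes and rare categories are the values most likely to identify an individual.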

22 IPUMS-International Development Process 4. Harmonization: Data; Translation tables

23 Translation Table – Marital Status: China 1982, Colombia 1973, Kenya 1989, Mexico 1970, U.S.A. 1990
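The table itself is not reproduced in the transcript, but the idea of a translation table can be sketched as a per-sample lookup from source codes to one shared coding scheme. Everything in the example is hypothetical: the sample names, the source codes, and the harmonized codes are invented for illustration and are not the actual IPUMS codes.

```python
# Hypothetical harmonized scheme:
# 1 = single, 2 = married, 3 = separated/divorced, 4 = widowed, 9 = unknown
UNKNOWN = 9

# One mapping per sample, from that census's own codes to the shared codes.
TRANSLATION = {
    "sample_a": {1: 1, 2: 2, 3: 4, 4: 3},           # numeric source codes
    "sample_b": {"S": 1, "M": 2, "D": 3, "W": 4},   # letter source codes
}


def harmonize(sample, source_code):
    """Translate one source code; unmapped codes become UNKNOWN."""
    return TRANSLATION[sample].get(source_code, UNKNOWN)


print(harmonize("sample_a", 3))    # widowed in sample_a's scheme
print(harmonize("sample_b", "W"))  # widowed in sample_b's scheme
```

Keeping the mappings as data rather than code makes it possible to review each census's recoding side by side, which is what a translation table is for.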

24 General Codes

25 IPUMS-International Development Process 4. Harmonization: Data; Translation tables; Supplemental programming

26 Supplementary Variable Programming (INCTOT)

27 IPUMS-International Development Process 4. Harmonization: Data; Correspondence tables; Supplemental programming; Documentation; Integration; Mark-up for web delivery

28 XML-Tagged Variable Text (Literacy)

29 Variable Description on Website (Literacy)

30 IPUMS-International Development Process 5. Data Enhancements: Data editing; Consistency edits; Hot-deck imputation

31 Missing Data Allocation Script (Occupation variable, USA): 5-dimensional table, 324 cells
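The script on this slide is not included in the transcript. As a rough illustration of hot-deck allocation in general, the sketch below uses a sequential hot deck: records are read in order, each reported value is stored in a table cell defined by the person's characteristics, and a missing value is filled from the most recent donor in the same cell. It uses two hypothetical dimensions (`sex`, `agegroup`) rather than the five mentioned on the slide, and all names and values are assumptions.

```python
def hot_deck(records, target, cell_vars):
    """Fill missing `target` values from the latest donor in the same cell."""
    deck = {}   # cell -> most recently seen reported value
    out = []
    for original in records:
        rec = dict(original)  # do not modify the caller's records
        cell = tuple(rec[v] for v in cell_vars)
        rec["imputed"] = False
        if rec[target] is None:
            if cell in deck:
                rec[target] = deck[cell]
                rec["imputed"] = True
            # With no donor yet, the value stays missing in this sketch.
        else:
            deck[cell] = rec[target]
        out.append(rec)
    return out


# Hypothetical person records; None marks a missing occupation.
people = [
    {"sex": "f", "agegroup": "25-44", "occ": "teacher"},
    {"sex": "m", "agegroup": "25-44", "occ": "farmer"},
    {"sex": "f", "agegroup": "25-44", "occ": None},
    {"sex": "m", "agegroup": "25-44", "occ": None},
    {"sex": "m", "agegroup": "65+", "occ": None},
]

allocated = hot_deck(people, target="occ", cell_vars=("sex", "agegroup"))
```

A production version would usually seed every cell with a starting value so that no case is left without a donor, and would keep the `imputed` flag so users can exclude allocated values.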

32 IPUMS-International Development Process 5. Data Enhancements: Data editing; Consistency edits; Hot-deck imputation; Family interrelationship “pointers”

33 IPUMS “Pointer” Variables (Simple household)

Pernum  Relate  Age  Sex     Marst    Chborn  Spouse’s  Mother’s  Father’s
                                              Location  Location  Location
1       head    46   male    married  n/a     2         0         0
2       spouse  44   female  married  3       1         0         0
3       aunt    77   female  widow    7       0         0         0
4       child   15   female  single   0       0         2         1
5       child   13   female  single   n/a     0         2         1
6       child   11   male    single   n/a     0         2         1

34 IPUMS “Pointer” Variables (Complex household)

Pernum  Relationship  Age  Sex     Marst      Chborn  Spouse’s  Father’s  Mother’s
                                                      Location  Location  Location
1       head          53   female  separated  6       0         0         0
2       child         28   male    single     n/a     0         0         1
3       child         22   male    single     n/a     0         0         1
4       child         21   male    single     n/a     0         0         1
5       child         25   female  married    2       6         0         1
6       child-in-law  28   male    married    n/a     5         0         0
7       grandchild    3    male    single     n/a     0         6         5
8       grandchild    1    male    single     n/a     0         6         5
9       non-relative  32   female  separated  2       0         0         0
10      non-relative  10   male    single     n/a     0         0         9
11      non-relative  5    female  single     n/a     0         0         9

35 Rules for “Mother’s Location” Variable
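The rules on this slide are not reproduced in the transcript. As an illustration only, the sketch below applies one deliberately simplified rule: a person listed as a child of the head is linked to the head if the head is female, otherwise to the head's spouse if the spouse is female. The function name and this rule are assumptions, not the project's actual logic, which also has to handle grandchildren, non-relatives, and ambiguous cases such as those in the complex household shown earlier.

```python
def mother_location(household):
    """Return {pernum: mother's pernum, or 0 if no mother is identified}."""
    head = next((p for p in household if p["relate"] == "head"), None)
    spouse = next((p for p in household if p["relate"] == "spouse"), None)

    momloc = {}
    for person in household:
        mom = 0
        if person["relate"] == "child":
            # Simplified rule: try the head first, then the head's spouse.
            for candidate in (head, spouse):
                if candidate is not None and candidate["sex"] == "female":
                    mom = candidate["pernum"]
                    break
        momloc[person["pernum"]] = mom
    return momloc


# The simple household from the pointer-variable slide.
simple_household = [
    {"pernum": 1, "relate": "head", "sex": "male"},
    {"pernum": 2, "relate": "spouse", "sex": "female"},
    {"pernum": 3, "relate": "aunt", "sex": "female"},
    {"pernum": 4, "relate": "child", "sex": "female"},
    {"pernum": 5, "relate": "child", "sex": "female"},
    {"pernum": 6, "relate": "child", "sex": "male"},
]

momloc = mother_location(simple_household)
```

For this household the three children point to person 2 and everyone else gets 0. A fuller rule set would also check age differences and children ever born before accepting a link.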

36 IPUMS-International Development Process 6. Dissemination: Documentation system; Preferences and dynamic content delivery

37 Variable Codes Page – Current IPUMS System

38 IPUMS-International Development Process 6. Dissemination: Documentation system; Preferences and dynamic content delivery; Data extraction system; Sample, variable, and case selection; Advanced extract features; Input variables; Full disclosure; Reverse engineering

39 End sobek@pop.umn.edu

