1 Population and Housing Census Editing Department of Economic and Social Development United Nations Statistics Division Studies in Methods, Series F,


Slide 1. Population and Housing Census Editing. Department of Economic and Social Development, United Nations Statistics Division. Studies in Methods, Series F, No. 82

Slide 2. U.S. Census Bureau, International Programs Center: Microcomputer Processing of Censuses and Surveys (using the Census and Survey Processing System, CSPro)

Slide 3. Forms to Products
[Diagram labels: Processing of Census, Survey or Other Form; Computer; Data User]

Slide 4. Two Ways of Thinking
[Diagram labels: Questionnaire; Information; Computer; Data File; Products: Reports, Tables, Thematic Maps, Graphs, …]

Slide 5. Data Processing Stages
1. Get ready for enumeration
2. Monitor and evaluate enumeration
3. Capture the data
4. Validate the data (edit)
5. Produce products

Slide 6. Is there a magic button to help us?


Slide 8. Forms to Data Capture
[Diagram labels: Data Capture; Data File]

Slide 9. Data Products
• Tabulations
• Graphs
• Maps

Slide 10. The Goal
• Produce useful products from census or survey information.
• Useful products are those that meet the needs of the user community.
• Produce these products quickly and efficiently.

Slide 11. Resource Criteria for Census and Survey Processing
• Time
• Accuracy
• Money
• Staff
• Regularity
• Products

Slide 12. Data Processing: Easy as 1-2-3?
1. Capture the information
2. Validate the information
3. Produce the data products

Slide 13. What Software?
• There is a lot of data processing software available.
• Which best fits your needs?
• Do you need training?
• Do you need money?
• Do you need help?

Slide 14. Why Use CSPro?
• Designed for census and survey processing
• Easy to use
• Modular in design
• Usable by novices and experts alike
• Free
• Excellent support
• Runs in a Windows environment

Slide 15. Census and Survey Processing System (CSPro)
• Tabulations
• File descriptions (dictionary)
• Data entry applications
• Edit applications
• Dissemination products

Slide 16. CSPro
The Census and Survey Processing System is a public-domain software package for entering, editing, tabulating and mapping census and survey data in order to:
1. Create products (tables, maps, etc.)
2. Disseminate the results
CSPro was designed and implemented through a joint effort among the developers of IMPS and ISSA: the United States Census Bureau, Macro International, and Serpro, S.A. Funding for the development is provided by the Office of Population of the United States Agency for International Development. CSPro is designed to eventually replace both IMPS and ISSA.

Slide 17. Data Dictionary
• The data dictionary is the foundation for most parts of CSPro, including:
  • Data entry (CSEntry)
  • Data editing (CSBatch)
  • Tabulation

Slide 18. Data File Design
• How are data stored in the data file?
• What is a case?
• What is on a record?
• How many records?

Slide 19. Objectives
1. Understand the elements of a data file
2. Describe a field, a record and a questionnaire
3. Describe data file structures
4. Learn how the CSPro data dictionary defines these elements

Slide 20. Information Needed About the Data File
• Identification fields
• Information (data) fields
• Size (how many characters each field occupies)
• Valid values and codes

Slide 21. Data File Structure (ASCII text)
• All data on one record (line), or
• Different types of data on different records (lines)

Slide 22. Data Processing and Data File Terminology
• Item / variable / field
• Record
• Questionnaire (case)
• Data file

Slide 23. Item / Variable / Field
• Is a single piece of information
• Has the attributes of size and type (numeric or alphanumeric)
• Examples: Age = 51 (numeric), Sex = M (alphanumeric), Income

Slide 24. Record
• A collection of related items forming a single line of information. For example:
  • The housing record contains information about the house
  • The population record contains information about each person in the house

Slide 25. Case / Questionnaire
• All the records, of all types, for one processing unit such as a household

Slide 26. A Data File Is
• A collection of all the questionnaires (cases)

Slide 27. CSPro Data Dictionary
1. Field names and labels
2. Field size
3. Field location
4. Field attributes
5. Record names
6. Record types
7. Record IDs
8. Records allowed by type
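To show what a dictionary's field sizes, locations and record types do, here is a minimal Python sketch. It is not CSPro. The layout is hypothetical: column 1 holds the record type ("1" = housing, "2" = person), and columns 2-5 hold a household ID that ties the records of one case together. The sketch uses those entries to cut fixed-width lines into named items and to group the records into cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    start: int   # 1-based starting column, as a data dictionary records it
    length: int  # number of characters (the item's "size")


# Hypothetical dictionary: column 1 is the record type ("1" = housing,
# "2" = person); columns 2-5 hold the household ID shared by the case.
DICTIONARY = {
    "1": [Item("HHID", 2, 4), Item("ROOMS", 6, 2), Item("TENURE", 8, 1)],
    "2": [Item("HHID", 2, 4), Item("RELAT", 6, 1), Item("SEX", 7, 1),
          Item("AGE", 8, 3)],
}


def parse_record(line: str) -> dict:
    """Split one fixed-width line into named items using the dictionary."""
    record = {"RECTYPE": line[0]}
    for item in DICTIONARY[line[0]]:
        first = item.start - 1
        record[item.name] = line[first:first + item.length]
    return record


def read_cases(lines):
    """Group records into cases (questionnaires) by household ID."""
    cases = {}
    for line in lines:
        record = parse_record(line)
        cases.setdefault(record["HHID"], []).append(record)
    return cases


data_file = ["10001031", "2000111043", "2000122040"]
cases = read_cases(data_file)
print(cases["0001"][1])
# {'RECTYPE': '2', 'HHID': '0001', 'RELAT': '1', 'SEX': '1', 'AGE': '043'}
```

Here the whole data file is a single case: one housing record and two person records.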

Slide 28. Questionnaire sections

Slide 29. One section ==> one record

Slide 30. From the questionnaire to the data file (one record type)

Slide 31. From the questionnaire to the data file (what are the data?)

Slide 32. From the questionnaire to the data file (where are the data?)

Slide 33. The data dictionary describes the data file

Slide 34. CSPro Support
• Web site:
[End of CSPro demonstration]

Slide 35. UN Editing Handbook
• Uses the UN Principles and Recommendations for Population and Housing Censuses as its base
• Covers how editing fits into the whole census process
• Describes different types of edits
• Gives examples

Slide 36. Purpose of the Handbook
• No census data are ever perfect
• Changes are made, often with little documentation
• Promote communication between subject-matter specialists and programmers
• A "cookbook" of suggestions that presents possible resolutions
• But country edit teams must decide

Slide 37. Major Elements in a Census
• Preparatory work
• Enumeration
• Data processing: keying, editing and tabulation
• Building databases and dissemination
• Evaluation of results
• Analysis of results

Slide 38. Errors in the Census Process
• Coverage errors
• Questionnaire design
• Enumerator and respondent errors
• Coding errors
• Data entry errors
• Computer editing errors
• Tabulation errors

Slide 39. Errors Generated During Census Processing
(Each activity feeds into the next, from enumeration down to publication.)

Activity            Type of error
------------------  ------------------------------------------
Enumeration         Respondent errors; enumerator errors
Field editing       Field checking; office checking
Office coding       Miscodes
Data capture        Miskeys
Computer editing    Logic errors; misallocation; miscorrection
Tabulation          Distribution of unknowns
Publication         Misprints

Slide 40. Editing in Historical Perspective
• Before computers: manual editing
• With computers: increased complexity
• Automated changes
• Generalized editing packages
• New philosophies of editing
• Personal computers
• Appropriate levels of computer editing

Slide 41. Editing Team
• Appropriate internal subject-matter specialists
• Computer programmers
• Working together as a team
• Edit specifications as the means of communication
• Outside experts: academics
• Outside experts: private sector

Slide 42. What Census Editing Should Do
1. Give users measures of the quality of the data
2. Identify the types and sources of error
3. Provide adjusted census results

Slide 43. Sample table with and without unknowns

Slide 44. Table showing trends with unknowns

Slide 45. Basics of Census Editing
• Systematic inspection and change (not always correction)
• Fatal edits: invalid or missing entries
• Query edits: inconsistencies
• Must preserve the original data as much as possible
• Quality enumeration is more important than editing
• Editing does not improve data quality; it makes the data more aesthetically pleasing
• The team must determine how far to go

Slide 46. More Basics
• Over-editing is harmful
• Treatment of unknowns
• Spurious changes
• Determining tolerances
• Learning from the edit process
• Quality assurance
• Costs of editing
• Imputation
• Archiving

Slide 47. How Over-editing Is Harmful
• Timeliness
• Finances
• Distortion of true values
• A false sense of security

Slide 48. Editing Applications
• Manual versus automatic correction
• Guidelines for correcting data
• Validity and consistency checks
• Methods of correcting and imputing data
• Other editing systems

Slide 49. Manual Versus Automatic Correction
• Manual correction: takes a long time and is very error-prone
• Automatic correction: faster and consistent
  • Not necessarily correct, just consistent
  • Can look at many variables at the same time
  • Can keep an audit trail
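The audit trail is one advantage that automatic correction has over manual correction. The slides make changes with CSPro's impute(). The Python sketch below only imitates the idea and is not the CSPro mechanism. The class and method names (EditLog, change) are hypothetical. Every automatic change is logged with its case, item, old value, new value and reason, and the log keeps a count of changes for each item.

```python
from collections import Counter


class EditLog:
    """Records every automatic change so the edit leaves an audit trail."""

    def __init__(self):
        self.entries = []        # (case_id, item, old_value, new_value, reason)
        self.counts = Counter()  # number of changes per item

    def change(self, record, item, new_value, case_id, reason):
        old_value = record.get(item)
        if old_value == new_value:
            return  # nothing to change, nothing to log
        record[item] = new_value
        self.entries.append((case_id, item, old_value, new_value, reason))
        self.counts[item] += 1


log = EditLog()
person = {"SEX": 7, "AGE": 34}
log.change(person, "SEX", 2, case_id="0001", reason="invalid sex")
print(person["SEX"], log.counts["SEX"])  # 2 1
```

Counts like these are also what an edit team would review when deciding whether an item is being over-edited.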

Slide 50. Guidelines for Correcting Data
• Make the fewest possible changes to the originally collected data
• Eliminate obvious inconsistencies among the entries
• Systematically supply entries for erroneous or missing items by using other entries for the housing unit, the person, or other persons in the household or group
• When appropriate, use "not reported"


Slide 52. Dangers in Editing
• A male has fertility information, so the fertility entry is deleted
• The second male in the head-spouse pair is made female
• Then this female has no fertility information, so fertility is imputed
• So where there was one error before, the initial error remains and we now have three MORE errors

Slide 53. Example of B-A-D Edit Changes

Unedited data
Person  Relationship       Sex     Children ever born
1       Head of household  Male    03
2       Spouse             Male    BLANK

Data after editing for sex
Person  Relationship       Sex     Children ever born
1       Head of household  Female  03
2       Spouse             Male    BLANK

Slide 54. Sample House for Hot Deck Example
[Table with columns ID number, Relationship, Sex and Age; the only surviving entries are the ages 13, 44 and 36]

Slide 55. Initial and Final Hot Deck Values for a Single Family
[Two matrices, "Initial values" and "Values after changes". Rows: Male (1), Female (2). Columns (relationship): Head of household (1), Spouse (2), Son/daughter (3), Other relative (4), Non-relative (5)]

Slide 56. Validity and Consistency Checks
1. Top-down editing approach
2. Multiple-variable edit
3. Coding considerations

Slide 57. 1. Top-Down Approach: Order of Edits
Housing variables as they appear on the questionnaire:
• Type of dwelling
• Rooms
• Walls
• Roof
• Tenure
Housing variables in the order they are edited:
• Tenure
• Type of dwelling
• Rooms
• Walls
• Roof

Slide 58. 2. Multiple-Variable Approach: Young Widowed Head with 3 Children
[Table: each rule is checked against the variables Relation, Sex, Age, MarStat and Fertility, and a Totals row sums the failures for each variable; in this example, marks appear against rules 1, 6 and 8]
1. Head of household should be 15 years or older
2. Spouse should be 15 years or older
3. A "spouse" should be married
4. If a spouse is present, the head of household should be married
5. If a spouse is present, the head of household and spouse should be of opposite sex
6. A person less than 15 years old should be never married
7. A male should have no fertility
8. A female less than 15 years old should have no fertility
9. For a female 15 years or older, the fertility entry should not be blank
10. A "child" should be younger than the head of household
11. A "parent" should be older than the head of household
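One way to read the Totals row is as follows. For each variable, count how many failed rules involve it. The variable with the highest total is the most likely one to change. Below is a Python sketch that makes this assumption. Only four of the slide's rules are encoded, and the value codes ("head", "widowed", and so on) are hypothetical.

```python
from collections import Counter

# A few of the slide's rules: (number, description, variables involved, test).
# Each test returns True when the rule is satisfied. Codes are hypothetical.
RULES = [
    (1, "Head of household should be 15 years or older",
     ("RELAT", "AGE"),
     lambda p: p["RELAT"] != "head" or p["AGE"] >= 15),
    (6, "Person less than 15 years old should be never married",
     ("AGE", "MARSTAT"),
     lambda p: p["AGE"] >= 15 or p["MARSTAT"] == "never married"),
    (7, "Male should have no fertility",
     ("SEX", "FERTILITY"),
     lambda p: p["SEX"] != "male" or p["FERTILITY"] in (None, 0)),
    (8, "Female less than 15 years old should have no fertility",
     ("SEX", "AGE", "FERTILITY"),
     lambda p: p["SEX"] != "female" or p["AGE"] >= 15
     or p["FERTILITY"] in (None, 0)),
]


def tally_failures(person):
    """Return the failed rule numbers and how often each variable is involved."""
    failed, totals = [], Counter()
    for number, _description, variables, holds in RULES:
        if not holds(person):
            failed.append(number)
            totals.update(variables)
    return failed, totals


head = {"RELAT": "head", "SEX": "female", "AGE": 12,
        "MARSTAT": "widowed", "FERTILITY": 3}
failed, totals = tally_failures(head)
print(failed)                 # [1, 6, 8]
print(totals.most_common(1))  # [('AGE', 3)]
```

Age is involved in all three failures, so changing age alone (rather than relationship, marital status and fertility) resolves the case with the fewest changes.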

Slide 59. 3. Common Codes Assist in Editing
Columns: Group, Birthplace, Citizenship, Language, Ethnicity (one common code serves all four)

Group                   Code
France/French           10
Spain/Spanish           20
Latin America
Philippines/Filipino    30
  Ilokano               32
  Tagalog               32
England/English         40
Canada
USA

Slide 60. Methods of Correcting and Imputing Data
1. Change to unknown
2. Static or "cold deck" imputation
3. Dynamic or "hot deck" imputation

Slide 61. 1. Changing to Unknown: When You Don't Have Enough Information
• In censuses, we usually don't have enough information to make good estimates of occupation and industry for paid workers:

    if not OCCUPATION in 001:997 then
      errmsg("Occupation is invalid, assign unknown");
      OCCUPATION = 998;
    endif;

Slide 62. Changing to Unknown: Countries Choosing Not to Impute
• These days, most countries impute at least the items needed for planning and policy determination
• If a country still decides not to impute,
• then staff might assign "unknown" even to items used for planning:

    if not SEX in 1:2 then
      SEX = 9;
    endif;
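The change-to-unknown edits on slides 61 and 62 reduce to a range check followed by a fixed "unknown" code. Here is a Python sketch of the same logic. It reuses the slides' codes (valid occupations 1-997, unknown occupation 998, unknown sex 9). The function names are hypothetical, and None stands for a blank entry.

```python
OCCUPATION_UNKNOWN = 998  # "unknown" code used on slide 61
SEX_UNKNOWN = 9           # "unknown" code used on slide 62


def edit_occupation(occupation):
    """Valid occupation codes are 1-997; anything else (including blank) becomes unknown."""
    if occupation is None or not 1 <= occupation <= 997:
        return OCCUPATION_UNKNOWN
    return occupation


def edit_sex_no_imputation(sex):
    """For a country that chooses not to impute: invalid sex becomes unknown."""
    return sex if sex in (1, 2) else SEX_UNKNOWN
```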

Slide 63. 2. Static Imputation: Making Young People "Never Married"
• In static imputation, the same value or values are always assigned:

    if AGE < 15 then
      if MARITAL_STATUS <> NEVER_MARRIED then
        errmsg("Young person not never married");
        MARITAL_STATUS = NEVER_MARRIED;
      endif;
    endif;
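Static ("cold deck") imputation always gives the same value to every failing record. Here is a Python version of slide 63. The numeric code for "never married" is a hypothetical choice, and the message list stands in for errmsg.

```python
NEVER_MARRIED = 1  # hypothetical code for "never married"


def edit_marital_status(age, marital_status, messages):
    """Static imputation: every young person who fails gets the same value."""
    if age < 15 and marital_status != NEVER_MARRIED:
        messages.append("Young person not never married")
        return NEVER_MARRIED
    return marital_status
```

Every failing record receives the same value, whatever its other characteristics. That is what separates static imputation from the hot deck.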

Slide 64. A Kind of Static Imputation: Changing Using Logical Values
• Since there are only two sexes, we can alternate between them when invalid or inconsistent values appear
• Keep a holding variable, XSEX, in memory (initialized to 1 or 2):

    if not SEX in 1:2 then
      SEX = XSEX;        { assign the stored sex }
      XSEX = 3 - XSEX;   { switch to the other sex for the next use }
    endif;

Slide 65. 3. Hot Deck Imputation
• Geographic considerations
• Use of related items
• Sequence of the items
• Complexity of the matrices
• Standardized hot decks
• Size of hot decks: too big (audit trail), too small (difficult items)
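The language edit later in these slides uses a hot deck indexed by ten-year age group and sex, ALANGUAGE(AGE10, SEX). Here is a minimal Python sketch of that kind of matrix. It assumes the data file is sorted by geography, so the most recent valid record in a cell comes from a nearby, similar person. Empty cells start from a single cold-deck value. The class and function names are hypothetical.

```python
class HotDeck:
    """A hot deck keyed by ten-year age group and sex.

    Every valid record replaces the value in its cell; a record with an
    invalid entry receives the value most recently stored there. Cells
    that have not been filled yet return a cold-deck starting value.
    """

    def __init__(self, initial_value):
        self.initial_value = initial_value
        self.cells = {}

    @staticmethod
    def _key(age, sex):
        return (min(age // 10, 9), sex)  # age groups 0-9, 10-19, ..., 90+

    def update(self, age, sex, value):
        self.cells[self._key(age, sex)] = value

    def donate(self, age, sex):
        return self.cells.get(self._key(age, sex), self.initial_value)


def edit_item(records, item, is_valid, deck):
    """Process records in file order (assumed geographic order)."""
    for record in records:
        if is_valid(record[item]):
            deck.update(record["AGE"], record["SEX"], record[item])
        else:
            record[item] = deck.donate(record["AGE"], record["SEX"])
    return records
```

For example, a woman of 38 with a blank entry receives the value of the last valid woman in her thirties processed before her. A man in his thirties with no earlier donor receives the cold-deck value.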

Slide 66. Types of Edits
• Structure edits: bookkeeping, getting each locality within each minor civil division within each major civil division
• Content edits: housing items
• Content edits: population items
• Content edits: inter-record checking

Slide 67. Standard Edit: Language Edit
• If this is the head and language is missing, first look for someone else in the house with a valid language, and assign that.
• If this is the head, language is missing and no one else in the house has a language, use a neighboring head with similar characteristics to assign a best guess.
• If this is someone else in the house and language is missing, assign the head's language.

Slide 68. Language Edit Code (CSPro)

    PROC LANGUAGE
      errmsg(" ******* Language ************ ") summary;
      { ************************************
        *          Language edit           *
        ************************************ }

      if LANGUAGE in 1:17 then
        if RELAT = 1 then
          ALANGUAGE(AGE10, SEX) = LANGUAGE;   { valid head: update the hot deck }
        endif;
      else
        if RELAT = 1 then
          { head: look for another person in the house with a valid language }
          PERSONPTR = 0;
          do varying i = 1 until i > TOTOCC(POP_EDT)
            if LANGUAGE(i) in 1:17 then
              PERSONPTR = i;
            endif;
          enddo;
          if PERSONPTR = 0 then
            { no one else has language: impute from the hot deck by age and sex }
            errmsg("*D05-2* LANGUAGE imputed from Age and Sex, pn= %02d, lang= %01d",
                   PERSNUM, LANGUAGE) denom = denomPop summary;
            F1F2();
            write("*D05-2* LANGUAGE imputed from Age and Sex, pn= %02d, lang= %01d",
                  PERSNUM, LANGUAGE);
            impute(LANGUAGE, ALANGUAGE(AGE10, SEX));
          else
            { someone else has language: assign it to the head }
            errmsg("*D05-3* Head's LANGUAGE imputed from Other's LANGUAGE, pn= %02d, lang= %01d, personptr= %02d, pr-lang %01d",
                   PERSNUM, LANGUAGE, PERSONPTR, LANGUAGE(PERSONPTR)) denom = denomPop summary;
            F1F2();
            write("*D05-3* Head's LANGUAGE imputed from Other's LANGUAGE, pn= %02d, lang= %01d, personptr= %02d, pr-lang %01d",
                  PERSNUM, LANGUAGE, PERSONPTR, LANGUAGE(PERSONPTR));
            impute(LANGUAGE, LANGUAGE(PERSONPTR));
          endif;
        else
          { everyone else: assign the head's language }
          F1F2();
          errmsg("*D05-4* LANGUAGE imputed from Head's LANGUAGE, lang %d langhd %d",
                 LANGUAGE, LANGUAGE(headpt)) denom = denomPop summary;
          write("*D05-4* LANGUAGE imputed from Head's LANGUAGE, pn= %02d, lang= %01d, langhd= %01d",
                PERSNUM, LANGUAGE, LANGUAGE(headpt));
          impute(LANGUAGE, LANGUAGE(headpt));
        endif;
      endif;
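Here is the same three-step logic as a Python sketch for readers who do not use CSPro. It works on one household at a time. It assumes exactly one head (RELAT = 1), valid language codes 1-17, and a hot deck stored as a dict keyed like ALANGUAGE(AGE10, SEX). The function names and the default_language cold-deck value are hypothetical. One difference from the CSPro loop: the CSPro loop keeps the last valid person it finds, while this sketch takes the first.

```python
VALID_LANGUAGES = range(1, 18)  # codes 1-17, as in the CSPro edit
HEAD = 1                        # RELAT code for head of household


def deck_key(age, sex):
    """Ten-year age group and sex, like ALANGUAGE(AGE10, SEX)."""
    return (min(age // 10, 9), sex)


def edit_language(household, deck, default_language):
    """household: list of dicts with RELAT, AGE, SEX and LANGUAGE."""
    head = next(p for p in household if p["RELAT"] == HEAD)

    if head["LANGUAGE"] in VALID_LANGUAGES:
        # Valid head: refresh the hot deck for later households.
        deck[deck_key(head["AGE"], head["SEX"])] = head["LANGUAGE"]
    else:
        donor = next((p for p in household
                      if p is not head and p["LANGUAGE"] in VALID_LANGUAGES),
                     None)
        if donor is not None:
            # Someone else in the house has a language: give it to the head.
            head["LANGUAGE"] = donor["LANGUAGE"]
        else:
            # No one has a language: use a previous head of the same age group and sex.
            head["LANGUAGE"] = deck.get(deck_key(head["AGE"], head["SEX"]),
                                        default_language)

    for person in household:
        if person is not head and person["LANGUAGE"] not in VALID_LANGUAGES:
            person["LANGUAGE"] = head["LANGUAGE"]  # others take the head's language
    return household
```

As in the CSPro version, only heads with valid entries feed the hot deck, so imputed values are never recycled as donors.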

Slide 69. Language OK and Head: Update the Hot Deck
• In the standard edit, if the variable is valid, we update the hot deck.
• This is the code:

    if LANGUAGE in 1:17 then
      if RELAT = 1 then
        ALANGUAGE(AGE10, SEX) = LANGUAGE;
      endif;

Slide 70. Single-Person House: Get Language From a Nearby House
• Normally, we want to look for others in the house with a valid entry for the variable.
• But in one-person houses there is no one else to look at, so we have to impute from the hot deck:

    if RELAT = 1 then
      if TOTOCC(POP_EDT) = 1 then
        errmsg("*D05-2A* Single person house: Language imputed from Age and Sex, pn= %02d, lang = %01d",
               PERSNUM, LANGUAGE) denom = denomPop summary;
        F1F2();
        write("*D05-2A* Single person house: Language imputed from Age and Sex, pn= %02d, lang= %01d",
              PERSNUM, LANGUAGE);
        impute(LANGUAGE, ALANGUAGE(AGE10, SEX));

Slide 71. Look for Someone Else in the House With Language; If None, Impute From Age and Sex
• Search the house for another person with a valid language. PERSONPTR stays 0 if no one is found, and then the head's language is imputed from the hot deck by age and sex:

      else
        PERSONPTR = 0;
        do varying i = 1 until i > TOTOCC(POP_EDT)
          if LANGUAGE(i) in 1:17 then
            PERSONPTR = i;
          endif;
        enddo;
        if PERSONPTR = 0 then
          errmsg("*D05-2* LANGUAGE imputed from Age and Sex, pn= %02d, lang= %01d",
                 PERSNUM, LANGUAGE) denom = denomPop summary;
          F1F2();
          write("*D05-2* LANGUAGE imputed from Age and Sex, pn= %02d, lang= %01d",
                PERSNUM, LANGUAGE);
          impute(LANGUAGE, ALANGUAGE(AGE10, SEX));

Slide 72. If Someone Else in the House Has Language, Assign It to the Head
• Otherwise (PERSONPTR > 0), another household member has a valid entry for this item, so the head takes that person's language:

        else
          errmsg("*D05-3* Head's LANGUAGE imputed from Other's LANGUAGE, pn= %02d, lang= %01d, personptr= %02d, pr-lang %01d",
                 PERSNUM, LANGUAGE, PERSONPTR, LANGUAGE(PERSONPTR)) denom = denomPop summary;
          F1F2();
          write("*D05-3* Head's LANGUAGE imputed from Other's LANGUAGE, pn= %02d, lang= %01d, personptr= %02d, pr-lang %01d",
                PERSNUM, LANGUAGE, PERSONPTR, LANGUAGE(PERSONPTR));
          impute(LANGUAGE, LANGUAGE(PERSONPTR));

Slide 73. For Others in the House, Assign the Head's Language
• Once the head has a valid entry for the variable, the others can obtain theirs from the head:

      F1F2();
      errmsg("*D05-4* LANGUAGE imputed from Head's LANGUAGE, lang %d langhd %d",
             LANGUAGE, LANGUAGE(headpt)) denom = denomPop summary;
      write("*D05-4* LANGUAGE imputed from Head's LANGUAGE, pn= %02d, lang= %01d, langhd= %01d",
            PERSNUM, LANGUAGE, LANGUAGE(headpt));
      impute(LANGUAGE, LANGUAGE(headpt));

Slide 74. Language Edit: Within House
• Example of a WRITE statement in CSPro to help find the error
• Note: the display shows the household before and after the edit, with what the edit did in between
• Assigning the head's language from other people

Slide 75. Language Edit: Head Imputed From Previous Household Head
• No one has a language, so the head first gets a language from the previous head of the same age and sex
• Then the others in the house get their language from the head

Slide 76. Series of Edit Problems
• We are going to do three exercises that look at different kinds of editing problems
• Note: these are all simplified; most of the time edits must be more complicated
• But these cover the basics

Slide 77. Countries Choosing Not to Impute
• Exercise 1 in the packet: simple edits for population items
• These days, most countries impute at least the items needed for planning and policy determination
• If a country still decides not to impute,
• then staff might assign "unknown" even to items used for planning:

    if not SEX in 1:2 then
      SEX = 9;
    endif;

Slide 78. A Simple Kind of Edit: When SEX Is Invalid
• Since there are only two sexes, the easiest way to edit is to alternate between them:

    SEXCHANGE = 2;   { initial value }
    ...
    if not SEX in 1:2 then
      SEX = SEXCHANGE;
      SEXCHANGE = 3 - SEXCHANGE;
    endif;

• The program assigns the stored sex and then changes the holding variable to await the next instance of a "bad" sex.

Slide 79. What If This Person Has Complete Fertility Information?
• Use other variables on the same record to assist in an edit. If this person has an invalid entry for sex but has fertility information:

    if not SEX in 1:2 then
      if FERTILITY <> notappl then
        errmsg("Has fertility info, so female");
        SEX = 2;
      endif;
    endif;

Slide 80. But What If This Is the Spouse and We Know the Head's Sex?
• For programmers: use inter-record information when it is available
• So when the head has sex reported but the spouse does not:

    if RELAT = 2 and not SEX in 1:2 then
      if SEX(1) in 1:2 then
        errmsg("Sex of spouse from sex of head");
        SEX = 3 - SEX(1);
      endif;
    endif;
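Slides 78 to 80 describe three ways to fix an invalid SEX. Here is a Python sketch that combines them in one possible priority order:

1. Fertility information means female.
2. A spouse takes the sex opposite to the head's.
3. Otherwise, alternate using a holding variable.

The order, the class name and the spouse code RELAT = 2 are assumptions for illustration. Valid codes are 1 = male and 2 = female, as on the slides.

```python
class SexEdit:
    """Edit for invalid SEX (valid codes: 1 = male, 2 = female)."""

    SPOUSE = 2  # hypothetical RELAT code for spouse

    def __init__(self):
        self.sexchange = 2  # holding variable for the alternating fallback

    def edit(self, person, head_sex=None):
        if person["SEX"] in (1, 2):
            return person["SEX"]
        if person.get("FERTILITY") is not None:
            person["SEX"] = 2                    # has fertility information, so female
        elif person["RELAT"] == self.SPOUSE and head_sex in (1, 2):
            person["SEX"] = 3 - head_sex         # spouse: opposite of the head's sex
        else:
            person["SEX"] = self.sexchange       # fall back to alternating
            self.sexchange = 3 - self.sexchange
        return person["SEX"]
```

Using intra-record and inter-record information first keeps the purely mechanical alternation for the cases where nothing better is known. Slide 86 notes that the opposite-sex assumption for spouses may no longer hold in some countries.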

Slide 81. Exercise 2: Housing Edits
• Since housing items do not usually require cross-tabulations, except by geography, housing edits tend to be simpler
• But we still must edit for invalid entries and certain inconsistencies

Slide 82. Housing Edits: Rooms and Bedrooms
• When a census collects both rooms and bedrooms, the number of bedrooms should not be more than the number of rooms
• Some countries collect the information independently (rooms other than bedrooms, then bedrooms), so this edit would not work for them
• Edit: if bedrooms > rooms, then make them the same

Slide 83. Housing Edits: Walls and Roof
• Each variable needs a separate edit
• If you use a hot deck, invalid entries are assigned from the nearest neighbor with similar characteristics
• Then you need to check for inconsistencies
• For example, in a house with a concrete roof but thatch walls, the roof would collapse the walls, so you need an edit to correct this
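The housing edits on slides 82 and 83 can be sketched in Python as follows. This is a minimal version, and several choices in it are assumptions:

- All wall and roof codes are hypothetical.
- The bedrooms value is the one lowered when bedrooms exceed rooms; the edit team would decide which value to change.
- The walls hot deck is a single "last valid value" cell.
- The roof hot deck is keyed by wall type and pre-filled with consistent starting values, so a concrete roof is never donated to thatch walls.

```python
# Hypothetical codes for this sketch.
WALL_CODES = {1: "concrete", 2: "brick", 3: "wood", 4: "thatch"}
ROOF_CODES = {1: "concrete", 2: "metal", 3: "tile", 4: "thatch"}
THATCH_WALLS, CONCRETE_ROOF = 4, 1


def edit_housing(house, wall_deck, roof_deck):
    """wall_deck: {'last': code}; roof_deck: {wall_code: roof_code}, pre-filled."""
    # Rooms and bedrooms: bedrooms may not exceed rooms, so make them the same.
    if house["BEDROOMS"] > house["ROOMS"]:
        house["BEDROOMS"] = house["ROOMS"]

    # Walls: keep (and remember) valid codes; otherwise take the last valid one.
    if house["WALLS"] in WALL_CODES:
        wall_deck["last"] = house["WALLS"]
    else:
        house["WALLS"] = wall_deck["last"]

    # Roof: must be valid and consistent with the walls (no concrete on thatch).
    walls = house["WALLS"]
    roof_ok = house["ROOF"] in ROOF_CODES and not (
        walls == THATCH_WALLS and house["ROOF"] == CONCRETE_ROOF)
    if roof_ok:
        roof_deck[walls] = house["ROOF"]
    else:
        house["ROOF"] = roof_deck[walls]  # nearest neighbor with the same walls
    return house
```

Walls are edited before the roof, so the consistency check always compares the roof against a valid wall code.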

Slide 84. Inter-record Checking: One Record Type
• Sometimes you need to look between records, not just within a record
• For example, each household should have one and only one head (this is Exercise 3)
• So you need to look through the house counting the heads
• You need to make sure there is exactly one head: at least one head and not more than one
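Here is a Python sketch of the one-and-only-one-head check. The slide does not say how to repair a failing household, so the repairs below are assumptions:

- If there is no head, the first person listed becomes the head.
- If there are extra heads, all but the first are recoded to a hypothetical "other relative" code.

```python
HEAD = 1
OTHER_RELATIVE = 4  # hypothetical code given to any extra "heads"


def edit_heads(household):
    """Ensure exactly one head: at least one, and not more than one."""
    heads = [p for p in household if p["RELAT"] == HEAD]
    if not heads:
        household[0]["RELAT"] = HEAD      # assumption: first person listed becomes head
    for extra in heads[1:]:
        extra["RELAT"] = OTHER_RELATIVE   # keep the first head, recode the rest
    return household
```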

Slide 85. Inter-record Checking for Spouses
• Does every household have to have one and only one spouse?
• Consider polygamous households: do multiple spouses even live together?
• What about other types of household structures?

Slide 86. Other Types of Inter-record Checking
• If a spouse is present, the sex of the head and the sex of the spouse should be opposite (this may no longer hold in some countries)
• If a spouse is present, both the head and the spouse should be reported as "married" or in a "common-law" arrangement, and their entries should be the same

Slide 87. Inter-record Checking for Population Edits
• Age of head and age of spouse
Figure 4. Example of household with potential inconsistencies in age reporting
[Diagram: Father, head of household (age 43); Spouse (age 70); Son (age 10); Daughter (age 8)]

Slide 88. Figure 4. Example of household with potential inconsistencies in age reporting
[Diagram: Father, head of household (age 43); Spouse (age 70); Son (age 10); Daughter (age 8)]
WHAT IS WRONG HERE?
• Note: the head is 43 years old and the spouse is 70
• Note: the children are 10 and 8
• So: we need to change the age of the spouse

Slide 89. Inter-record Checking for Age
• You need a hot deck, and you have choices:
  • a hot deck with the ages of head and spouse from previous households, or
  • a hot deck with the age differences between heads and spouses
• In either case, keep separate categories for males and females, because they behave differently
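Below is a Python sketch of the second option on slide 89: a hot deck of age differences (head minus spouse), kept separately by the head's sex. Three values are hypothetical settings the edit team would choose: the tolerance (max_gap), the minimum spouse age of 15, and the cold-deck starting gaps. Applied to Figure 4, the spouse's age is recomputed from the head's age and the last acceptable gap.

```python
class AgeGapDeck:
    """Hot deck of (head age - spouse age), kept separately by sex of head."""

    def __init__(self, initial_gaps):
        self.gaps = dict(initial_gaps)  # cold-deck starting values, e.g. {1: 4, 2: -4}

    def update(self, head_sex, head_age, spouse_age):
        self.gaps[head_sex] = head_age - spouse_age

    def spouse_age(self, head_sex, head_age):
        return max(head_age - self.gaps[head_sex], 15)  # hypothetical minimum age


def edit_spouse_age(head, spouse, deck, max_gap=20):
    """max_gap is a hypothetical tolerance the edit team would set."""
    if abs(head["AGE"] - spouse["AGE"]) <= max_gap:
        deck.update(head["SEX"], head["AGE"], spouse["AGE"])
    else:
        spouse["AGE"] = deck.spouse_age(head["SEX"], head["AGE"])
    return spouse["AGE"]
```

If an earlier household had a male head aged 50 with a spouse aged 46, the Figure 4 spouse (70, with a 43-year-old male head) would become 43 - 4 = 39.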

Slide 90. Inter-record Checking: Between Record Types
• Until now we have looked at one record type, population or housing, but sometimes we need to compare them
• Vacant houses should have no people, and occupied houses should have people
• CSPro code:

    if TENURE = 5 then                 { vacant unit }
      if TOTOCC(POP) <> 0 then         { people in a vacant unit }
        { determine tenure (owning or renting); code not shown here }
      endif;
    else                               { owned or rented units }
      if TOTOCC(POP) = 0 then          { no people in an occupied unit }
        impute(TENURE, 5);             { make this a vacant unit }
      endif;
    endif;
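Here is a Python sketch of the same housing-versus-population check. The slide leaves out the "determine tenure" branch, so this sketch fills it with an assumed hot deck that holds the last valid occupied tenure. The occupied tenure codes 1-4 are hypothetical. Only the vacant code 5 comes from the slide.

```python
VACANT = 5                     # tenure code for a vacant unit, as in the CSPro code
OCCUPIED_CODES = (1, 2, 3, 4)  # hypothetical codes for owned, rented, etc.


def edit_tenure(housing, persons, tenure_deck):
    """tenure_deck: {'last': code} holding the last valid occupied tenure."""
    if housing["TENURE"] == VACANT:
        if persons:  # people in a vacant unit: treat it as occupied
            housing["TENURE"] = tenure_deck["last"]  # assumption: hot deck donor
    elif not persons:  # no people in an occupied unit
        housing["TENURE"] = VACANT
    elif housing["TENURE"] in OCCUPIED_CODES:
        tenure_deck["last"] = housing["TENURE"]
    return housing["TENURE"]
```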

Slide 91. THANK YOU
UN Editing Specifications Workshop

