1 Archiving
Michael J. Levin
Harvard Center for Population and Development Studies
Michael.levin@yahoo.com

2 Two types of “archiving”
I. Data
II. Metadata

3 I. Data archiving
Every effort must be made to keep all versions of the data set. Separate series of data sets need to be preserved for the pilot, the census itself, and the PES (post-enumeration survey). For the census, data archiving needs to start with the output of the scanning or keying operation, that is, the completely unedited data.
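
One way to make “keep all versions” concrete is an archive step that only ever adds files. The sketch below is hypothetical: the series names `pilot`, `census`, and `pes`, the `vNNN_<stage>` file naming, and the SHA-256 checksum file are illustrative assumptions, not a prescribed procedure. Each call copies a data file into its series as the next numbered version and never overwrites an earlier one.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

# Hypothetical series names for the three operations named on the slide.
SERIES = {"pilot", "census", "pes"}


def archive_version(src: Path, archive_root: Path, series: str, stage: str) -> Path:
    """Copy a data file into its series as the next numbered version.

    Earlier versions are never overwritten, so every stage (unedited,
    partly edited, final) stays available.
    """
    if series not in SERIES:
        raise ValueError(f"unknown series: {series}")
    series_dir = archive_root / series
    series_dir.mkdir(parents=True, exist_ok=True)
    # Count archived data files (not their checksum files) to get the next number.
    existing = [p for p in series_dir.glob("v*") if p.suffix != ".sha256"]
    dest = series_dir / f"v{len(existing) + 1:03d}_{stage}{src.suffix}"
    if dest.exists():
        raise FileExistsError(dest)
    shutil.copy2(src, dest)
    # Store a checksum so later staff can confirm the archived file is unchanged.
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    Path(str(dest) + ".sha256").write_text(digest + "\n")
    return dest


if __name__ == "__main__":
    # Demo: archive the unedited keying output as the first census version.
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "keyed_output.dat"
        raw.write_text("010203 1 34\n")
        print(archive_version(raw, Path(tmp) / "archive", "census", "unedited").name)
```

Numbering versions instead of overwriting a single “current” file keeps the unedited keying output (version 1) permanently available alongside every later edited version.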

4 Why preserve the unedited data
The most important reason to keep the unedited data is that they are closest to the respondents and therefore represent the respondents' “thoughts and feelings” before coding, editing, and tabulation operations. As staff edit the data, they can refer back to this data set as needed to see what changes are being made, both at the individual level and, through frequency distributions, at the aggregate level.
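
The two kinds of comparison mentioned above can be sketched in a few lines, assuming the unedited and edited values of one item are available as parallel lists in the same record order. This is an illustrative sketch only; the codes used in the usage note (“1”, “2”, an invalid “9”, and blank for missing) are hypothetical.

```python
from collections import Counter


def compare_frequencies(unedited, edited):
    """Aggregate level: {value: (unedited_count, edited_count)} for one item."""
    before = Counter(unedited)
    after = Counter(edited)
    # Counter returns 0 for values that do not occur, so every value gets a pair.
    values = sorted(set(before) | set(after), key=str)
    return {v: (before[v], after[v]) for v in values}


def changed_records(unedited, edited):
    """Individual level: positions of records whose value changed during editing."""
    if len(unedited) != len(edited):
        raise ValueError("unedited and edited lists must cover the same records")
    return [i for i, (u, e) in enumerate(zip(unedited, edited)) if u != e]
```

For example, with hypothetical sex codes `unedited = ["1", "2", "9", "", "1"]` and `edited = ["1", "2", "2", "1", "1"]`, `changed_records` reports records 2 and 3, and `compare_frequencies` shows the invalid “9” and the blank disappearing while the counts of “1” and “2” rise.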

5 Another reason to keep the original, unedited data
As new demographic techniques, both direct and indirect, are developed, they can be tested on these data. Without the original data, techniques developed to alleviate systematic problems in these data, or in census data in general, cannot be tested as easily, or in some cases at all.

6 Keeping original responses on the records
Original responses should be kept on the population and housing records as part of the editing process. In this way, both original and edited responses are always available to staff and researchers. For some items, such as fertility, intermediate values are also kept.
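
A minimal sketch of an item that carries its original response, its latest edited value, and any intermediate values is shown below. The class and field names are hypothetical. The one rule it enforces is that editing never overwrites the original response; a superseded edited value moves into the intermediate history instead.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ItemValue:
    """One item on a population or housing record, keeping its full history."""

    original: str                                          # as scanned or keyed; never modified
    edited: Optional[str] = None                           # latest value after editing
    intermediate: List[str] = field(default_factory=list)  # superseded edited values

    def apply_edit(self, new_value: str) -> None:
        # Move the previous edited value (if any) into the intermediate history
        # instead of discarding it; the original response is left untouched.
        if self.edited is not None:
            self.intermediate.append(self.edited)
        self.edited = new_value

    @property
    def current(self) -> str:
        """Value used for tabulation: the edited value if there is one."""
        return self.original if self.edited is None else self.edited
```

As a hypothetical fertility example, a blank children-ever-born response might first be imputed as “0” and later revised to “3”. The record then holds the blank original, “3” as the edited value, and “0” as an intermediate value.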

7 Flags
Countries that use flags to indicate changes in individual items should include these flags on the final, archived data. Flags can be:
– a simple “no/yes” flag, or
– a more complicated scheme.
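
The two kinds of flag can be sketched as functions of an item's original and edited values. The slide does not define the “more complicated scheme”, so the three codes below (unchanged, imputed from a blank, corrected from a non-blank value) are a hypothetical example of one.

```python
# Hypothetical codes for a "more complicated" flag scheme; the slide does not define one.
UNCHANGED = 0
IMPUTED_FROM_BLANK = 1
CORRECTED_VALUE = 2


def yes_no_flag(original: str, edited: str) -> int:
    """Simple flag: 1 if editing changed the item, 0 otherwise."""
    return 1 if original != edited else 0


def detailed_flag(original: str, edited: str) -> int:
    """Multi-valued flag that also records why the item changed."""
    if original == edited:
        return UNCHANGED
    if original.strip() == "":
        return IMPUTED_FROM_BLANK  # the respondent's answer was missing
    return CORRECTED_VALUE         # a reported value was replaced
```

The yes/no flag answers only *whether* an item changed. A multi-valued flag also lets researchers separate imputation for missing answers from corrections of reported values, at the cost of a code list that must itself be documented in the metadata.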

8 The final data set
The final data set should be named in a strong, unambiguous way for current and future staff. A country may choose to have several “final” data sets. For most purposes, neither the original data nor the flags are needed for daily work in the office, that is, for answering user requests.
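
Because the original values and flags are rarely needed for answering user requests, a lighter working file can be derived from the full archived records. The sketch below assumes a hypothetical field-naming convention in which `<item>_orig` holds the original response and `<item>_flag` holds the edit flag. It also assumes a naming pattern that states the census, the year, which “final” data set this is, and its version.

```python
def working_record(full_record: dict) -> dict:
    """Drop original-response and flag fields, keeping only edited values.

    Assumes the hypothetical convention "<item>_orig" / "<item>_flag".
    """
    return {
        name: value
        for name, value in full_record.items()
        if not (name.endswith("_orig") or name.endswith("_flag"))
    }


def final_name(census: str, year: int, kind: str, version: int) -> str:
    """Build an unambiguous name for one of possibly several "final" data sets."""
    return f"{census}{year}_final_{kind}_v{version}"
```

The full records, with originals and flags, remain in the archive. The working file is a derived copy, so it can be regenerated if the naming or contents need to change.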

9 De facto and de jure data sets
Respondents fall into three groups: (1) persons resident in the household, (2) visitors to the household, and (3) persons usually resident in the household but away on the reference date. So the de facto file would have the persons indicating (1) or (2), and the de jure data set would have those indicating (1) or (3). With these separate files, no “universe” would need to be selected for the tabulation runs.
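
The selection rule for the two files can be written directly from the three groups. In this sketch, the field name `residence` and the enumeration are hypothetical stand-ins for whatever residence item a country's data dictionary actually uses.

```python
from enum import Enum


class Residence(Enum):
    RESIDENT = 1             # (1) resident in the household
    VISITOR = 2              # (2) visitor to the household
    USUAL_RESIDENT_AWAY = 3  # (3) usually resident, away on the reference date


def de_facto(persons):
    """Persons present on the reference date: groups (1) and (2)."""
    keep = (Residence.RESIDENT, Residence.VISITOR)
    return [p for p in persons if p["residence"] in keep]


def de_jure(persons):
    """Usual residents, whether present or away: groups (1) and (3)."""
    keep = (Residence.RESIDENT, Residence.USUAL_RESIDENT_AWAY)
    return [p for p in persons if p["residence"] in keep]
```

Group (1) appears in both files, while visitors appear only in the de facto file and absent usual residents only in the de jure file. Once the two files exist, each tabulation run simply reads the appropriate file instead of selecting a universe.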

10 II. Metadata
“Metadata” are basically “data about data” of any sort in any medium: text, tables, charts, maps, and other images that describe what users want or need to know about the census or survey. The users include both individuals and groups. The census metadata aid in clarifying and finding the actual data.

11 More on metadata
The metadata include the definitions of the items, their use, and their interactions; information about the pretest and the post-enumeration survey; daily records of progress; weekly and monthly reports; and reports by activity. The data-processing metadata include the structure of the data dictionary, the keying screens for keyed data and the verifying screens for scanned data, the structure and content edits, the tabulations, and the dissemination plans and activities. They also include the procedural history of the census, described below.

12 The procedural history
The procedural history is crucial to the complete success of a census. Without it, even the best tables could be lost, along with the ability to produce further tables after the initial processing ends.

13 Each step in the process
Starting at the very beginning of the census operations, each operation needs to be recorded, so that staff can later answer the question “what did we do last time?” For each operation, record:
– when it starts,
– when it ends,
– what is expected to be done,
– what is actually done,
– problems encountered, and
– knowledge gained.
Sometimes a form is created so that staff can fill in the blanks as individual operations take place.
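
The fill-in-the-blanks form can be mirrored by a simple record type with one field per item in the list above. This is a sketch under assumptions: the class name, field names, and JSON serialization are illustrative choices, not a standard format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class OperationRecord:
    """One entry in the procedural history, one field per item on the form."""

    operation: str                                     # name of the operation
    started: date                                      # when it starts
    expected: str                                      # what is expected to be done
    ended: Optional[date] = None                       # when it ends (None while running)
    actual: str = ""                                   # what is actually done
    problems: List[str] = field(default_factory=list)  # problems encountered
    lessons: List[str] = field(default_factory=list)   # knowledge gained

    def to_json(self) -> str:
        """Serialize with ISO dates so entries can be stored and searched later."""
        data = asdict(self)
        data["started"] = self.started.isoformat()
        data["ended"] = self.ended.isoformat() if self.ended else None
        return json.dumps(data, ensure_ascii=False)
```

An entry is opened when the operation starts and filled in as it runs. For example, `OperationRecord("Pilot keying", date(2009, 3, 2), "Key all pilot questionnaires")` would be a hypothetical entry whose `ended`, `actual`, `problems`, and `lessons` fields are completed later.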

14 Dedicated staff
A group of staff (or, in very small operations, a single staff member) should be assigned to collect, for each operation, the:
– questionnaires,
– forms and manuals,
– dictionaries and screens,
– edits and tabulations, and
– metadata.
These various pieces of information need to be put in a database or umbrella directory (like the TRS) and indexed for easy access, both during the census and afterwards.
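
For the umbrella-directory approach, an index can be built by walking the directory tree. The sketch assumes a hypothetical layout of `<root>/<operation>/<category>/<file>`, with one category folder per item in the list above. It does not describe how the TRS itself is organized. It also reports which categories are still empty for an operation, which helps the dedicated staff see what remains to be collected.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical category folder names, one per item in the list on the slide.
CATEGORIES = {"questionnaires", "forms_manuals", "dictionaries_screens",
              "edits_tabulations", "metadata"}


def build_index(root: Path) -> dict:
    """Map (operation, category) to a sorted list of relative file paths."""
    index = defaultdict(list)
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        # Only files filed as <operation>/<category>/... are indexed.
        if len(parts) >= 3 and parts[1] in CATEGORIES:
            index[(parts[0], parts[1])].append(path.relative_to(root).as_posix())
    return {key: sorted(files) for key, files in index.items()}


def missing_categories(index: dict, operation: str) -> set:
    """Categories with no files yet for an operation, i.e. still to be collected."""
    present = {category for (op, category) in index if op == operation}
    return CATEGORIES - present


if __name__ == "__main__":
    # Demo on a throwaway directory containing one metadata file.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        doc = root / "pilot_keying" / "metadata" / "daily_report.txt"
        doc.parent.mkdir(parents=True)
        doc.write_text("Day 1: keying started\n")
        index = build_index(root)
        print(index)
        print(sorted(missing_categories(index, "pilot_keying")))
```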

15 Documenting table series
Include:
– the item or items,
– definitions,
– how the question was asked,
– how the information derived from the question is used for planning and policy formation, and
– limitations of the data item or items.
Information on compatibility with other censuses and surveys is also helpful to users.

16 Finally!
All metadata, including
– publicity announcements, both on paper and electronic, and
– notes, memos, emails, and so forth,
need to be saved and organized by date and topic. Only by being able to see the scope and flow of the work can the best planning be done.

