Data Management (1): “Application of Information and Communication Technology to Production and Dissemination of Official statistics”


2 1 Data Management (1)
“Application of Information and Communication Technology to Production and Dissemination of Official statistics”
10 May – 11 July 2006
M Q Hasan, Lecturer/Statistician
UN Statistical Institute for Asia and the Pacific, Chiba, Japan
Email: hasan@unsiap.or.jp

3 2 Overview
Data management
Data management planning
Data management procedures
Data management software
Hands-on experience
References

4 3 Data management and the NSO
Data management during production
– Individual case
Data management after production
– Individual case
Data management
– All cases, long term

5 4 Data management
Management of data files
Management of files during analysis
Management of files afterwards

6 5 Data management
Management of data files
– Labeling data files
– Documentation

7 6 Data management
Management of files during analysis
– Version management
– Subset data
– Arrange files in different folders
– Index files

8 7 Data management
Management of files afterwards
– Pass them to the system administrator for future reference

9 8 DATA MANAGEMENT

10 9 These will lead to …
Production of credible data
Design of a robust, efficient, flexible and accessible storage system
Efficient procedures for sharing data with others

11 10 Data management before and during data processing

12 11 During DP planning:
Define the relevant aspects of a dataset.
Formulate a data preservation strategy.
Design an access procedure.

13 12 Defining the relevant aspects of a dataset
File format and file structure
Naming files
Creation and naming of variables
Variable labels

14 13 Defining the relevant aspects of a dataset
Choose the file structure according to the available computing resources and the experience of the data processors.

15 14 Defining the relevant aspects of a dataset
Documentation
– Assign responsibility for logging all processing activities
– Problems encountered
– How problems are to be solved
– Major decisions taken

16 15 DP: Documentation
Can be time consuming.
Should contain all information about the data, such as survey method, sample information, time of collection, information about variables, missing values, etc.
Should start well before actual data processing.
Follow standards.
Preferably keep one file with references to other files.

17 16 DP: Documentation
Title: Child labour in Portugal: Social characterization of school-age children and their families, 1998.
Subtitle: Child labour in Portugal, 1998.
Alternative title: SIMPOC Portugal survey, 1998.
Parallel title: Trabalho Infantil em Portugal: Caracterização social dos menores em idade escolar e suas famílias, 1998 files.

18 17 DP: Documentation
Keywords. National survey, child, economic activity, child labour, household, household chores, etc.
Abstract. Purpose, nature, and scope of the child labour data collection; special characteristics of the contents, etc.
Time period covered. If the data were collected in 1999 and one question was “did you work last year?”, the time period should be 1998-99.

19 18 DP: Documentation
Date of collection. Date(s) when the data were collected.
Country. Name of the country where the survey was conducted.
Geographic coverage. Total geographic scope of the data.
Geographic unit. Lowest level of geographic aggregation covered by the data, for example province, state, or district.
Unit of analysis. For most child labour surveys, the basic unit of analysis or observation is the individual person.

20 19 DP: Documentation
Time method. Panel, cross-sectional, trend, time-series, etc.
Data collector. Entity responsible for administering the questionnaire or interview, or for compiling the data, e.g. the NSO.
Frequency of data collection. For example, first-time collection.
Sampling procedure. Reference to sampling documents.

21 20 DP: Documentation
Mode of data collection. CAPI, CATI, etc.
Type of research instrument. Structured, semi-structured, open-ended questions, etc.
Actions to minimize losses. E.g. follow-up visits, supervisory checks, historical matching, etc.
Control operations. Methods used to facilitate data control.

22 21 DP: Documentation
Weighting. Reference to the appropriate document.
Cleaning operation. E.g. consistency checking, wild code checking, etc.
Response rate. Percentage of sample members who provided information.
Estimates of sampling error. Indication of how precisely one can estimate a population value from a given sample.
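
The response rate and sampling error entries can be illustrated with a small sketch. The function names and figures below are hypothetical, and the standard error formula assumes simple random sampling with no design effect or finite population correction, which a real survey design would usually need to account for.

```python
import math


def response_rate(responding: int, sampled: int) -> float:
    """Percentage of sample members who provided information."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    if not 0 <= responding <= sampled:
        raise ValueError("responding must lie between 0 and sampled")
    return 100.0 * responding / sampled


def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of an estimated proportion p from n respondents.

    Assumption: simple random sampling, no design effect and no
    finite population correction.
    """
    if not 0.0 <= p <= 1.0 or n <= 0:
        raise ValueError("need 0 <= p <= 1 and n > 0")
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical figures, for illustration only.
rate = response_rate(responding=450, sampled=500)
se = proportion_standard_error(p=0.5, n=100)
print(f"response rate: {rate:.1f}%  standard error: {se:.3f}")
```

Recording both figures in the documentation lets later users judge how far the estimates can be trusted.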

23 22 DP: Documentation
Location. Say where the data are currently stored (e.g. a national statistics office).
Availability status. Provide a statement of data availability.
Extent of data. Number of physical files that exist in a dataset.
Completeness of dataset. Describe any items of collected information that were not included in the data file.

24 23 DP: Documentation
Access authority. Contact person or organization that controls access to the data collection.
Data use statement. Reference to the terms of use for the data collection, if any.
Citation requirement. Specify any text that should be cited in publications based on analysis of the data.

25 24 DP: Documentation
File contents. Short description of the file(s).
File structure. E.g. hierarchical, rectangular, relational, etc.
Record or record group. Describe the record groupings for hierarchical or relational files.
Label (of record). Detailed information for each record group.
Dimensions (of record). Physical characteristics of the record, such as number of variables per record, number of cases, etc.

26 25 DP: Documentation
Overall case count. Number of cases or observations.
Overall variable count. Number of variables.
Data format. Delimited format, free format, software dependent, etc.
Missing data. Provide information about missing data, e.g. that missing data have been standardized across the collection, or that missing data are the result of merging.
Software. Identify the software used to create the file, including the software version number.
Version statement. Version statement for the data file.

27 26 DP: Documentation
List of variables with the following:
– whether the variable is a weight and, if not, the reference weight variable for this variable;
– question ID for the variable;
– which format has been used (e.g. SAS, SPSS);
– the number of decimal places in the variable;
– whether the variable is discrete or continuous;
– which record type this variable belongs to.
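
One possible way to hold the variable-level items listed above is a small record type. This is only a sketch: the class name `VariableDoc`, its field names and the two example variables are hypothetical and do not come from the presentation or from any documentation standard.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class VariableDoc:
    """Documentation entry for one variable (hypothetical layout)."""

    name: str
    question_id: str       # question ID for the variable
    storage_format: str    # e.g. "SAS" or "SPSS"
    decimals: int          # number of decimal places
    is_discrete: bool      # True for discrete, False for continuous
    record_type: str       # record type the variable belongs to
    is_weight: bool = False
    weight_variable: Optional[str] = None  # reference weight, if not a weight

    def __post_init__(self) -> None:
        if self.is_weight and self.weight_variable is not None:
            raise ValueError("a weight variable does not reference another weight")
        if self.decimals < 0:
            raise ValueError("decimals must be non-negative")


# Hypothetical variables, for illustration only.
variables = [
    VariableDoc("hh_weight", "", "SPSS", 4, False, "household", is_weight=True),
    VariableDoc("age", "Q3", "SPSS", 0, True, "person", weight_variable="hh_weight"),
]
codebook_rows = [asdict(v) for v in variables]
```

Keeping the entries as plain dictionaries (`codebook_rows`) makes it easy to write them out alongside the data file.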

28 27 DP: Conversion of data files to other formats as required
Data are usually generated in a package-specific format.
Convert data into other formats, if possible.
Convert data into ASCII and generate a codebook.
Reload the ASCII data using the same codebook.
Recheck the data.
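
The convert, reload and recheck steps can be sketched as a round trip. This is a minimal illustration under stated assumptions: the data are written as comma-delimited ASCII, the codebook is a JSON list of variable names and types, and the file names, function names and sample records are all hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path

# Codebook type names mapped to the Python casts used on reload.
CASTS = {"int": int, "float": float, "str": str}


def export_ascii(records, codebook, data_path, codebook_path):
    """Write records as delimited ASCII and save the codebook beside them."""
    names = [v["name"] for v in codebook]
    with open(data_path, "w", newline="", encoding="ascii") as f:
        writer = csv.DictWriter(f, fieldnames=names)
        writer.writeheader()
        writer.writerows(records)
    Path(codebook_path).write_text(json.dumps(codebook, indent=2), encoding="ascii")


def reload_ascii(data_path, codebook_path):
    """Reload the ASCII data, restoring types from the same codebook."""
    codebook = json.loads(Path(codebook_path).read_text(encoding="ascii"))
    with open(data_path, newline="", encoding="ascii") as f:
        rows = list(csv.DictReader(f))
    return [
        {v["name"]: CASTS[v["type"]](row[v["name"]]) for v in codebook}
        for row in rows
    ]


def round_trip_ok(records, codebook):
    """Recheck: the reloaded data must equal the original records."""
    with tempfile.TemporaryDirectory() as tmp:
        data_path = Path(tmp) / "survey.csv"
        codebook_path = Path(tmp) / "codebook.json"
        export_ascii(records, codebook, data_path, codebook_path)
        return reload_ascii(data_path, codebook_path) == records


# Hypothetical records, for illustration only.
records = [
    {"case_id": 1, "age": 12, "weight": 1.5},
    {"case_id": 2, "age": 9, "weight": 0.75},
]
codebook = [
    {"name": "case_id", "type": "int"},
    {"name": "age", "type": "int"},
    {"name": "weight", "type": "float"},
]
round_trip_matches = round_trip_ok(records, codebook)
```

If `round_trip_matches` is false, the ASCII file or the codebook lost information during conversion and should not be archived as is.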

29 28 DATA MANAGEMENT: Storage of all files
Possible list/type of files
– Data in a package-specific format
– Data in ASCII with necessary data dictionary
– Public use data
– Public use data in ASCII with necessary data dictionary
– Final documentation
– Questionnaire

30 29 DATA MANAGEMENT: Storage of all files
Possible list/type of files (contd.)
– Logical rules for consistency checks
– Computer program files
– Interviewer and/or supervisor’s instruction manual
– Coding file(s)
– Sampling and weight files

31 30 DATA MANAGEMENT: Storage of all files
Group the files by version, type, etc.
Create an index file for each sub-directory.
In the index file, add a short description of each file based on its contents.
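
A per-directory index file could be generated along the following lines. This is a sketch only: the index file name `INDEX.txt`, the tab-separated layout, the placeholder text and the example file names are assumptions, not something the presentation prescribes.

```python
import tempfile
from pathlib import Path


def write_index(directory, descriptions, index_name="INDEX.txt"):
    """Write one 'file name<TAB>description' line per file in a sub-directory."""
    directory = Path(directory)
    lines = []
    for path in sorted(directory.iterdir(), key=lambda p: p.name):
        if path.is_file() and path.name != index_name:
            # Files without a description get a visible placeholder.
            text = descriptions.get(path.name, "NO DESCRIPTION YET")
            lines.append(f"{path.name}\t{text}")
    index_path = directory / index_name
    index_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return index_path


# Hypothetical folder and files, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "final_v1"
    folder.mkdir()
    for name in ("survey_v1.dat", "codebook.txt", "notes.txt"):
        (folder / name).write_text("placeholder\n", encoding="utf-8")
    index_text = write_index(
        folder,
        {
            "survey_v1.dat": "Final data in ASCII",
            "codebook.txt": "Data dictionary for survey_v1.dat",
        },
    ).read_text(encoding="utf-8")
print(index_text)
```

The placeholder makes undocumented files easy to spot before the folder is handed over for storage.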

32 31 DATA MANAGEMENT: Formulating a data preservation strategy
Hardware
Automation software
Directory structure

33 32 DATA MANAGEMENT

34 33 DATA MANAGEMENT

35 34 DATA MANAGEMENT

36 35 DATA MANAGEMENT: Designing an access procedure
Access policy
– Safekeeping person: system administrator
– Contact person: supervisor
– Content-modifying authority: supervisor
Finalize access conditions for each file
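
The access conditions for each file could be recorded in a simple lookup table. The sketch below is illustrative only: the file names, role names and the rule that only the supervisor may modify content are assumptions drawn loosely from the roles named on the slide.

```python
# Hypothetical role with content-modifying authority.
MODIFYING_AUTHORITY = "supervisor"

# Hypothetical access conditions: file name -> roles allowed to read it.
ACCESS_CONDITIONS = {
    "microdata_final.dat": {"system_administrator", "supervisor"},
    "public_use.csv": {"system_administrator", "supervisor", "public"},
}


def can_read(role: str, filename: str) -> bool:
    """A role may read a file only if the file's access condition lists it."""
    return role in ACCESS_CONDITIONS.get(filename, set())


def can_modify(role: str, filename: str) -> bool:
    """Only the content-modifying authority may change a registered file."""
    return filename in ACCESS_CONDITIONS and role == MODIFYING_AUTHORITY


print(can_read("public", "public_use.csv"))
print(can_modify("system_administrator", "microdata_final.dat"))
```

Files that are not registered in the table are readable by no one, so a missing access condition fails closed.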

37 36 DATA DISSEMINATION: Data type
Micro data
Aggregate tables
Executive summary
Reports

38 37 DATA DISSEMINATION: Methods
Online: direct access through the internet in real time
Offline: available on request

39 38 DATA MANAGEMENT: Designing an access procedure
Backup policy
– During data processing: data processors’ responsibility
– After finalization of data and documentation: system administrator’s responsibility

40 39 END

