Farm Household Surveys DATABASE ORGANISATION AND DATA CLEANING Glwadys Aymone GBETIBOUO C4ECOSOLUTIONS, CAPE TOWN Economics analyses of climate change.

1 Farm Household Surveys
DATABASE ORGANISATION AND DATA CLEANING
Glwadys Aymone GBETIBOUO
C4ECOSOLUTIONS, CAPE TOWN
Economics analyses of climate change impacts workshop
Accra, Ghana

2 Database organisation and cleaning, or data management, is generally seen as a set of tasks belonging to the tabulation phase of the survey: in other words, activities conducted towards the end of the survey project, using computers in clean offices. However, survey data management should begin concurrently with questionnaire design.
Key points to consider:
– Nature and identification of the statistical units observed
– Built-in redundancies
– Length and complexity of the questionnaire
– Sample size and design
– Survey timing and scheduling

3 DATA ENTRY : “flat file”

4 Codification of the statistical unit
ADM0          ADM1          ADM2      CADM0  CADM1  CADM2  CODE
South Africa  Eastern Cape  Aberdeen  7      1      1      700101

5 Household code
Each household is identified by an 8-digit code.
HHCODE 70010101
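A minimal Stata sketch of how the household code could be validated after data entry. The three codes are taken from the data entry slide. The storage type `long` is an assumption about how the variable would be declared. It matters because Stata's default `float` type cannot hold 8-digit integers exactly.

```stata
* toy dataset: three household codes from the survey
clear
input long hhcode
70010101
70010102
70010103
end

* every household code must have exactly 8 digits
assert inrange(hhcode, 10000000, 99999999)

* hhcode must uniquely identify each observation; isid stops with an error if not
isid hhcode
```

If `isid` fails, `duplicates report hhcode` shows how many codes are repeated.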

6 DATA ENTRY SYSTEM
A complex household survey typically contains hundreds of variables. For example, the household survey dataset of the 2003 GEF study has 1342 variables.
After the survey instrument has been finalized, you develop the data entry system and provide a protocol for data entry.
– Coding questionnaire
– Coding sheet
– Household data: 12 worksheets
– Climate data, soil data, runoff data

7 DATA ENTRY
hhcode    TIB    farmtype  relhead  hhsize  gender1  age1
HHCODE    TIB    1.0.1     1.0.2    1.1     1.2.1.1  1.2.2.1
70010101  13:50  3         3        4       1        34
70010102  14:30  1         1        8       1        83
70010103  13:55  1         1        3       1        68
70010104  17:30  3         1        2       1        71
70010105  09:25  3         1        4       1        45
70010106  15:30  1         3        6       1        -99
70010107  07:30  3         -99      6       1        38
70010108  13:00  1         1        3       1        75
70010109  08:36  3         -99      5       1

8 Data cleaning
Generally, data is subjected to three control mechanisms:
1. range checks,
2. consistency checks, and
3. typographical checks

9 Range checks
Every variable in the survey should contain only data within a limited domain of valid values.
tab farmtype, missing
   farmtype |      Freq.     Percent        Cum.
------------+-----------------------------------
        -99 |          4        0.99        0.99
          1 |        191       47.16       48.15
          2 |         71       17.53       65.68
          3 |        138       34.07       99.75
          9 |          1        0.25      100.00
------------+-----------------------------------
      Total |        405      100.00
      hhcode    farmtype  remark
 39.  70013308  9         CHECK DATA FOR THIS OBS.
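The flagging step behind a listing like the one above could be sketched as follows. The valid domain (1, 2, 3, plus -99 as the missing-value code) is an assumption read off the frequency table, not a documented codebook. The four rows are taken from the slides.

```stata
* toy dataset: farm types for four households
clear
input long hhcode byte farmtype
70010101 3
70010102 1
70010106 1
70013308 9
end

* assumed valid domain: 1, 2, 3, plus -99 for missing
generate remark = "CHECK DATA FOR THIS OBS." if !inlist(farmtype, 1, 2, 3, -99)

* show only the observations that fall outside the domain
list hhcode farmtype remark if remark != ""
```

Flagging with a remark variable, rather than deleting or recoding, keeps the original entry available for checking against the paper questionnaire.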

10 Consistency checks
Values from one question should be consistent with values from another question.
– Demographic consistency of the household
– Consistency of age and other individual characteristics
gen test=hhmales+hhfemales
list hhcode hhsize hhmales hhfemales test remark if test!=hhsize
hhcode    hhsize  hhmales  hhfemales  test  remark
70013319  18      3        3          6     CHECK DATA FOR THIS OBS.
70030507  14      4        4          8     CHECK DATA FOR THIS OBS.
tab age5
hhcode    age5  remark
70041703  281   CHECK DATA FOR THIS OBS.
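An implausible age such as 281 can also be flagged directly rather than spotted in a frequency table. The sketch below uses the one observation shown on the slide. The upper bound of 120 years is an assumed plausibility limit, and -99 is treated as the missing-value code.

```stata
* toy dataset: the flagged observation from the slide
clear
input long hhcode int age5
70041703 281
end

* assumed plausibility limit of 0 to 120 years; -99 and . are left unflagged
generate remark = "CHECK DATA FOR THIS OBS." ///
    if !inrange(age5, 0, 120) & age5 != -99 & !missing(age5)

list hhcode age5 remark if remark != ""
```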

11 Typographical checks
A typical typographical error is the transposition of digits, such as entering 41 rather than 14. This error can be checked through double data entry of all questionnaires.
Another is entering -999 rather than -99 in a numerical input:
* assumes all variables are numeric; replace stops with a type mismatch on a string variable
foreach var of varlist _all {
    replace `var'=-99 if `var'==-999
    replace `var'=. if `var'==-99
}
Use the tab command to obtain frequency tables of the data.
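One possible way to compare two keying passes in Stata is the `cf` command, sketched below. Both toy datasets are hypothetical stand-ins for the two data entry files. The second pass contains the 41-for-14 transposition used as the example above. `cf` requires both files to have the same observations in the same order, hence the sort on hhcode.

```stata
* second keying pass (hypothetical): hhsize mistyped as 41
clear
input long hhcode int hhsize
70013319 18
70030507 41
end
sort hhcode
tempfile second
save `second'

* first keying pass (hypothetical)
clear
input long hhcode int hhsize
70013319 18
70030507 14
end
sort hhcode

* cf lists each variable and observation where the two passes disagree;
* it exits with an error when differences exist, so capture keeps the do-file running
capture noisily cf _all using `second', verbose
```

Each reported mismatch is then resolved by going back to the paper questionnaire.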

