# SADC Course in Statistics: Assessing data critically (Module B1, Session 17)



## Objectives

At the end of this session the students will be able to:

- apply basic techniques for error detection;
- ask relevant questions that allow for the explanation or correction of discrepancies.

## Detecting errors in primary data

Checks to detect errors in primary data should be made at several stages:

- immediately after data collection (and during data entry);
- after data computerisation;
- during exploratory data analysis.

## Checking for errors after data collection

- Have all questions been answered? If not, are the reasons for non-response clear?
- Are recorded values within their expected range?
- Do all questions or items have meaningful entries? Are they internally consistent?
- Are any zero entries genuinely zeros?
- Are IDs unique?
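Several of the questions above can be partly automated once a batch of questionnaires has been captured. The sketch below is a minimal illustration, assuming each questionnaire is stored as a Python dict in which `None` means "not answered". The field names, the expected ranges and the `<field>_reason` convention for recording why a question was not answered are all hypothetical and should be replaced by those of your own survey.

```python
from collections import Counter

# Hypothetical expected ranges; replace with the ranges from your own questionnaire.
EXPECTED_RANGES = {"age": (0, 110), "household_size": (1, 30)}

def check_records(records, id_field="id"):
    """Return (record_id, problem) pairs for a batch of questionnaires.

    A value of None means the question was not answered; a key "<field>_reason"
    (a hypothetical convention) holds the reason for non-response.
    """
    problems = []
    # Are IDs unique?
    for rid, n in Counter(r[id_field] for r in records).items():
        if n > 1:
            problems.append((rid, f"ID appears {n} times"))
    for r in records:
        rid = r[id_field]
        for field, (low, high) in EXPECTED_RANGES.items():
            value = r.get(field)
            if value is None:
                # Non-response: is the reason recorded?
                if not r.get(f"{field}_reason"):
                    problems.append((rid, f"{field} missing with no reason given"))
            elif not low <= value <= high:
                problems.append((rid, f"{field}={value} outside {low}-{high}"))
            elif value == 0:
                # A zero is not an error in itself, but should be confirmed as genuine.
                problems.append((rid, f"{field} is zero: confirm it is a genuine zero"))
    return problems

# Made-up records for illustration only.
records = [
    {"id": 1, "age": 34, "household_size": 5},
    {"id": 2, "age": 250, "household_size": 4},
    {"id": 2, "age": None, "household_size": 3},
]
print(check_records(records))
# → [(2, 'ID appears 2 times'), (2, 'age=250 outside 0-110'), (2, 'age missing with no reason given')]
```

The output is a list of questions to take back to the field team, not a list of corrections: each flagged record still needs to be checked against the paper questionnaire.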

## Checking for errors after data entry

Compute new (temporary) variables to check that:

- rates recorded per 1000 of population are less than 1000;
- percentages expected to be less than 100% are indeed so;
- there is internal consistency amongst variables and between tables. For example:
  - the date of interview should be earlier than the date when the supervisor checked the questionnaire;
  - totals should be consistent across different tables;
  - sub-totals should add to overall totals;
- codes for missing values have been identified correctly according to the reason the value is missing, and have been set as missing in the database to be used for analysis.
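A minimal sketch of such temporary check variables, assuming hypothetical field names (`rate_per_1000`, `percentage`, `interview_date`, `supervisor_check_date`) and a hypothetical set of missing-value codes. Each flag is `True` when its check fails, so flagged records can be listed and queried.

```python
from datetime import date

# Hypothetical missing-value codes; use the codebook of your own survey.
MISSING_CODES = {-9: "refused", -8: "don't know", -7: "not applicable"}

def recode_missing(value):
    """Return (value, reason): coded missing values become None, with their reason kept."""
    if value in MISSING_CODES:
        return None, MISSING_CODES[value]
    return value, None

def entry_checks(row):
    """Compute temporary check flags for one entered record (a dict with hypothetical fields)."""
    rate, _ = recode_missing(row["rate_per_1000"])
    pct, _ = recode_missing(row["percentage"])
    return {
        # Rates per 1000 of population should be less than 1000.
        "rate_too_high": rate is not None and rate >= 1000,
        # Percentages expected to be less than 100% should be so.
        "pct_too_high": pct is not None and pct >= 100,
        # The interview must come before the supervisor's check of the questionnaire.
        "dates_out_of_order": row["interview_date"] >= row["supervisor_check_date"],
    }

def subtotals_add_up(subtotals, total):
    """True when the sub-totals add to the overall total."""
    return sum(subtotals) == total

row = {
    "rate_per_1000": 1200,
    "percentage": -9,  # coded "refused", so treated as missing rather than as a value
    "interview_date": date(2000, 3, 5),
    "supervisor_check_date": date(2000, 3, 2),
}
print(entry_checks(row))
# → {'rate_too_high': True, 'pct_too_high': False, 'dates_out_of_order': True}
```

Recoding the missing-value codes before the range checks matters: a code such as -9 left in the data would otherwise be treated as a genuine (and impossible) value.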

## Tips for error detection

- Look for counts or categories that do not make sense.
- If you have a series of data in chronological order, look for jumps in the data; they may be errors.
- Always check your totals:
  - make sure they add to the expected total (e.g. 100%);
  - when looking at multiple tables from a single study, the sample size should be consistent across all tables.
- What is expected to tally should tally!
- Don't just look at the numbers; look at the definitions that the numbers represent.
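The two tips about totals can be expressed as simple functions. This is a sketch only: the rounding tolerance is an assumption, and a table whose sample size differs from the others may be legitimately filtered (for example, a question asked only of a subgroup), so a flag is a prompt to check the definitions rather than proof of an error.

```python
from collections import Counter

def percentages_tally(percentages, expected=100.0, tolerance=0.5):
    """True if a column of percentages adds to the expected total.

    The tolerance (an assumption) allows for rounding of published figures.
    """
    return abs(sum(percentages) - expected) <= tolerance

def inconsistent_sample_sizes(table_totals):
    """Given {table_name: grand_total}, return the tables whose total differs
    from the most common total across the study."""
    usual_n, _ = Counter(table_totals.values()).most_common(1)[0]
    return sorted(name for name, n in table_totals.items() if n != usual_n)

print(percentages_tally([33.3, 33.3, 33.4]))  # → True
print(inconsistent_sample_sizes({"Table 1": 463, "Table 2": 463, "Table 3": 460}))  # → ['Table 3']
```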

## Checks during exploratory data analysis

Simple one-way or two-way tables can help identify errors.

(a) Results from a socio-economic survey in Uganda. Are these results reasonable?

| Average number of meals taken by household (HH) in past week | Frequency |
|---|---|
| 0 | 6 |
| 1 | 699 |
| 2 | 5547 |
| 3 | 3285 |
| 4 | 113 |
| 5 | 1 |
| 7 | 1 |
| Total | 9652 |
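One way to make "are these results reasonable?" explicit is to check that the frequencies add to the stated total and to list categories outside a plausible range. The counts below are those of the Uganda table above; the plausible range (1 to 4) is an assumption made purely for illustration.

```python
# Frequency table from the Uganda socio-economic survey (as shown above).
meals = {0: 6, 1: 699, 2: 5547, 3: 3285, 4: 113, 5: 1, 7: 1}
REPORTED_TOTAL = 9652

# Assumed plausible range for the average number of meals; an illustration only.
PLAUSIBLE = range(1, 5)

def review_frequency_table(freq, reported_total, plausible):
    """Return (total_matches, suspicious), where suspicious maps doubtful values to their counts."""
    total_matches = sum(freq.values()) == reported_total
    suspicious = {value: n for value, n in freq.items() if value not in plausible}
    return total_matches, suspicious

print(review_frequency_table(meals, REPORTED_TOTAL, PLAUSIBLE))
# → (True, {0: 6, 5: 1, 7: 1})
```

The total tallies, so the questions are about the few households at the extremes: their questionnaires, and the definition of "average number of meals", are what need checking.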

(b) A second example, from the British Crime Survey, 2000.

| Number of times something was stolen from respondent's hands, pockets, bag or case since 1 Jan 99 | Frequency |
|---|---|
| 0 | 413 |
| 1 | 39 |
| 2 | 4 |
| 3 | 2 |
| 5 | 1 |
| 10 | 1 |
| 15 | 1 |
| 36 | 1 |
| 97 | 1 |
| Total | 463 |

Can the last value in the table (97 thefts) be correct?
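A useful question for a value like 97 is how much weight it carries. The sketch below computes the share of all reported incidents contributed by each value. On the figures above, the 463 respondents report 216 thefts in total, and the single respondent reporting 97 accounts for about 45% of them. This does not show that the value is wrong, but it shows why it deserves to be checked against the original questionnaire before any rate is estimated.

```python
# Frequency table from the British Crime Survey 2000 example (as shown above):
# number of thefts reported -> number of respondents.
thefts = {0: 413, 1: 39, 2: 4, 3: 2, 5: 1, 10: 1, 15: 1, 36: 1, 97: 1}

def share_of_incidents(freq):
    """Fraction of all reported incidents contributed by respondents at each non-zero value."""
    total_incidents = sum(value * n for value, n in freq.items())
    return {value: value * n / total_incidents for value, n in freq.items() if value > 0}

shares = share_of_incidents(thefts)
print(sum(thefts.values()))   # → 463
print(round(shares[97], 2))   # → 0.45
```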

(c) Detection rate of property crimes in one police force (data are fictitious).

| Property crime | Jan | Feb | Mar |
|---|---|---|---|
| Vandalism | 10 | 13 | 14 |
| Burglary | 14 | 19 | 16 |
| Vehicle thefts | 15 | 81 | 17 |
| Bicycle thefts | 4 | 3 | 3 |
| Thefts from person | 3 | 2 | 5 |
| Other thefts | 7 | 9 | 11 |
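The "look for jumps" tip can be applied row by row to a table like this one. In the sketch below, a month is flagged when its value is more than twice (or less than half) the median of its row and also differs from that median by at least 5. Both thresholds are assumptions, chosen so that small counts such as bicycle thefts are not flagged for minor changes. On the fictitious figures above, only the February figure of 81 for vehicle thefts is flagged; it may be a data-entry error and should be checked against the source records.

```python
from statistics import median

# Fictitious figures from the table above (Jan, Feb, Mar).
crimes = {
    "Vandalism": [10, 13, 14],
    "Burglary": [14, 19, 16],
    "Vehicle thefts": [15, 81, 17],
    "Bicycle thefts": [4, 3, 3],
    "Thefts from person": [3, 2, 5],
    "Other thefts": [7, 9, 11],
}
MONTHS = ["Jan", "Feb", "Mar"]

def flag_jumps(series, months, ratio=2.0, min_diff=5):
    """Flag (row, month, value) where a value departs sharply from its row median.

    A value is flagged when it is more than `ratio` times the median (or less than
    median / ratio) AND differs from the median by at least `min_diff`.
    Both thresholds are illustrative assumptions.
    """
    flags = []
    for name, values in series.items():
        m = median(values)
        for month, v in zip(months, values):
            big_ratio = v > ratio * m or v < m / ratio
            if big_ratio and abs(v - m) >= min_diff:
                flags.append((name, month, v))
    return flags

print(flag_jumps(crimes, MONTHS))  # → [('Vehicle thefts', 'Feb', 81)]
```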

Consistency checks across related variables can also reveal errors. The following examples show:

- (i) current number of cars in the household versus whether the respondent was worried about having their car stolen;
- (ii) current number of cars in the household versus whether the respondent was worried about having things stolen from their car;
- (iii) distance to reach any type of formal court versus distance to the nearest Magistrate's Court.
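Check (iii) can be written as a rule: since a Magistrate's Court is itself a formal court, the distance to the nearest formal court of any type should never exceed the distance to the nearest Magistrate's Court. A minimal sketch, assuming hypothetical field names (distances in kilometres) and made-up records:

```python
def court_distance_inconsistencies(records):
    """Return IDs where the distance to any formal court exceeds the distance
    to the nearest Magistrate's Court.

    A Magistrate's Court is itself a formal court, so the nearest formal court of
    any type can be no further away. Field names are hypothetical.
    """
    return [
        r["id"]
        for r in records
        if r["km_any_formal_court"] > r["km_magistrates_court"]
    ]

# Hypothetical records for illustration only.
records = [
    {"id": 101, "km_any_formal_court": 3.0, "km_magistrates_court": 3.0},
    {"id": 102, "km_any_formal_court": 12.5, "km_magistrates_court": 4.0},
    {"id": 103, "km_any_formal_court": 2.0, "km_magistrates_court": 9.0},
]
print(court_distance_inconsistencies(records))  # → [102]
```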

## Use of cross-tabulations

Table 1. Cross-tabulation of current number of cars in the household versus the extent to which the respondent is worried about having their car stolen (Source: BCS, 2000).

Table 2. Cross-tabulation of current number of cars in the household versus the extent to which the respondent is worried about having things stolen from the car (Source: BCS, 2000).
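Cross-tabulations such as Tables 1 and 2 serve as consistency checks: they can reveal, for example, households with no car that nevertheless report a level of worry about car theft. The sketch below builds a two-way table with `collections.Counter` and lists such cells. The records, the answer labels and the `"not applicable"` convention are hypothetical, not the BCS coding; a flagged cell is a case to query (the questionnaire routing may explain it), not automatically an error.

```python
from collections import Counter

def crosstab(records, row_field, col_field):
    """Two-way frequency table as a Counter keyed by (row value, column value)."""
    return Counter((r[row_field], r[col_field]) for r in records)

def no_car_but_worried(table, not_applicable="not applicable"):
    """Cells where a household with no car gave a substantive answer about car theft."""
    return {
        (cars, worry): n
        for (cars, worry), n in table.items()
        if cars == 0 and worry != not_applicable
    }

# Hypothetical records; the answer labels are illustrative, not the BCS codes.
records = [
    {"cars": 0, "worry_car_stolen": "not applicable"},
    {"cars": 0, "worry_car_stolen": "very worried"},
    {"cars": 1, "worry_car_stolen": "fairly worried"},
    {"cars": 2, "worry_car_stolen": "not at all worried"},
    {"cars": 0, "worry_car_stolen": "not applicable"},
]
table = crosstab(records, "cars", "worry_car_stolen")
print(no_car_but_worried(table))  # → {(0, 'very worried'): 1}
```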

## Detecting errors in secondary data

Procedures similar to those above can be used, but in addition:

- Ask questions about the source from which the data arose, e.g. to assess competence, adequacy of funding, motivation for the study, etc.
- Ask about the data collection procedure and the associated documentation. In particular, seek answers to what, who, why, when, where and how.
- It is important to follow the whole data chain.
