Maintaining data quality: fundamental steps

1 Maintaining data quality: fundamental steps

2 Agenda
The whole process
Questionnaire design
Data collection
Software design
Data entry

3 The whole process
Questionnaire design: asking the right questions, in the right way; structure the questionnaire effectively → pilot and back-translate
Data collection: veracity; quality of survey; quality of filling questionnaires → back-checks and accompaniments
Software design, data entry and management: minimize data entry errors; organize data in an effective way; clean data → double entry and error checking

4 Agenda
The whole process
Questionnaire design
Data collection
Software design
Data entry

5 Questionnaire design
Clear skip patterns whenever needed. The software designer will then need to include those in the data entry software.
Grids
Single/multiple options
Interviewer checkpoints
When coding your questions, make sure that all options are included. For example, if there is even a small chance that people will say “I don’t know”, include the code “-999” in the question.

6 Pilot and translate survey
Pilot: in non-research areas, but in a similar setting. Depending on how ready the questionnaire is, run 30 to 40 pilots. You can also pilot some sections more intensively.
Translation: back-translation is MANDATORY.

7 Agenda
The whole process
Questionnaire design
Data collection
Software design
Data entry

8 Data collection: surveyors
Selection
Training: before the survey, and ongoing
Before the survey: classroom and field; questionnaire + field instructions + behavior in the field; training on the issue of interest. Also, if you have time to write an instruction manual, it is useful.
Ongoing: keep going to the field with them and do reminder trainings (e.g. when you notice they prompt too much).
Maintain motivation: go out with them, bonuses, etc.
STAY IN THE FIELD WITH THEM

9 Data collection: quality checks
Team structure
One supervisor for five surveyors
A field monitor, if your team is big, to help you manage the team
Monitoring in the field
Accompaniments by supervisor: all the time
Accompaniments by monitor: 75% of the time
Accompaniments by yourself: maybe 15% of the time
Back-checks by field monitor: 15% of questionnaires, some sections (mandatory!)
Do some back-checks yourself
Analyse the data from back-checks right away!
If you use a survey company, you still need to do your own back-checks and some accompaniments

10 Questionnaire quality: scrutiny
Scrutinize questionnaires
Have surveyors and supervisors do it
But also do it yourself! If you have a project assistant, ask them to scrutinize 100%, but still scrutinize 50% or so yourself (at least the trickiest sections)
Examples of instances where only you can catch mistakes: codes for activity, logical consistency
When scrutinizing, write all codes, even if not pre-coded: “-777” for missing, or “-999” for “I don’t know”
If you find too much missing data, or data that is not consistent, send surveyors back to the field

11 Agenda
The whole process
Questionnaire design
Data collection
Software design
Data entry

12 Data management: goals
Quality
Timing
Timing is important, and you need to monitor the Data Entry Officers (DEOs) or the Data Entry (DE) company carefully to make sure they stick to timelines, but by no means should you sacrifice any step related to quality checks (if you save time on those steps, you’ll lose time later).

13 Data entry software
You need to think about the software as soon as the questionnaire is close to final.
It could be done by the survey company or outsourced to someone else (less expensive, or someone you trust more).
The goal is that the DEO should make as few mistakes as possible.

14 Data entry software
Software development: send the developer a detailed spreadsheet with instructions for each question (the range of acceptable values, logical checks, etc.). The more detailed it is, the more time you’ll save later.
Software testing: when a software designer builds the software, you need to test it yourself by entering a batch of questionnaires (e.g. pilot questionnaires, or invented responses; just make sure you test all the parts of the software).
Check output: then look at the output carefully and make sure it looks fine, and also send it to the professors you work with to make sure they are satisfied with it.
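As a rough illustration of what the per-question instructions on slide 14 could translate to, here is a minimal Python sketch of range and logical checks. The field names, the ranges and the rule "years of schooling cannot exceed age" are all invented for the example; only the special codes -999 and -777 come from the slides.

```python
# Hypothetical spec: field names, ranges and the logical rule are invented
# for illustration; a real spec would come from the questionnaire itself.
SPEC = {
    "age": (0, 120),
    "hh_size": (1, 30),
    "years_school": (0, 25),
}

# Codes used in the slides: -999 for "I don't know", -777 for missing.
SPECIAL_CODES = {-999, -777}


def check_record(record):
    """Return a list of problems found in one entered questionnaire."""
    problems = []

    # Range checks: every field must be present and inside its range,
    # unless it carries one of the special codes.
    for field, (low, high) in SPEC.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: no value entered")
        elif value in SPECIAL_CODES:
            continue
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside {low}-{high}")

    # Logical check (assumed rule): schooling cannot exceed age.
    age = record.get("age")
    school = record.get("years_school")
    answered = all(
        v is not None and v not in SPECIAL_CODES for v in (age, school)
    )
    if answered and school > age:
        problems.append("years_school exceeds age")

    return problems
```

Running the same checks on the pilot questionnaires you enter while testing is one way to confirm that the software rejects what the spreadsheet says it should reject.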

15 Checking output
When checking output, try to imagine yourself analysing the data!
All fields need to be numerical (except text fields, like comments or “others – specify”). Again, there is not much you can do with text fields when you analyse.
One example: when questions have multiple-choice responses (let’s say the question is “where do you take your water from?” and there are 5 options: “well, tap, etc.”), the question should be treated as 5 questions (1. Do you take your water from the well? Yes or no. 2. Do you take your water from the tap? Yes or no. Etc.). The response to each of these questions is a binary variable, i.e. either 1 (yes) or 0 (no). This becomes obvious if you put yourself in the shoes of the person who will analyse the data (among others, you!). If this is treated as only one question, and the DEO fills “1, 2, 5” in the single response field, you cannot do anything with that data!
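The water-source example can be sketched in a few lines of Python. The slide names only "well" and "tap"; the other three option labels and the `water_` variable prefix are assumptions made up for the illustration.

```python
# Hypothetical code list: the slide names only "well" and "tap";
# the remaining three labels are invented for this example.
WATER_SOURCES = {1: "well", 2: "tap", 3: "river", 4: "vendor", 5: "other"}


def to_indicators(selected_codes):
    """Turn one multiple-choice answer into one 0/1 variable per option."""
    selected = set(selected_codes)
    unknown = selected - set(WATER_SOURCES)
    if unknown:
        raise ValueError(f"codes not in the code list: {sorted(unknown)}")
    return {
        f"water_{name}": int(code in selected)
        for code, name in WATER_SOURCES.items()
    }


# The answer "1, 2, 5" becomes five analysable binary variables.
row = to_indicators([1, 2, 5])
```

Here `row` holds 1 for the well, tap and other variables and 0 for the two remaining ones, which is the shape the data entry output should already have.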

16 Agenda
The whole process
Questionnaire design
Data collection
Software design
Data entry

17 Data entry
Timing: data entry should start as soon as possible after data collection starts, and before collection is over!
Double entry: mandatory. Must be written in the contract. One output. Two outputs, reconciled.
Error checking: check the error rate on a regular basis (batches of 200 or 300 questionnaires), and before you do any cleaning.
Payment to DE company: include a clause in the contract stating that the first payment will be made only after 200 or so questionnaires have been given to you, and the error rate has been checked by you and found to be less than 0.5%. Pay only after that.
Get bad data re-entered entirely, whatever the nature of the errors.
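One way to picture the reconciliation of two entry passes is the Python sketch below. It assumes each pass is available as a mapping from questionnaire ID to entered fields; that data layout and the function name are hypothetical, not something the slides specify.

```python
def find_discrepancies(first_pass, second_pass):
    """List every field where the two entry passes disagree.

    Each pass is assumed to map questionnaire id -> {field: value}.
    A questionnaire or field missing from one pass counts as a mismatch.
    """
    diffs = []
    for qid in sorted(first_pass.keys() | second_pass.keys()):
        a = first_pass.get(qid, {})
        b = second_pass.get(qid, {})
        for field in sorted(a.keys() | b.keys()):
            if a.get(field) != b.get(field):
                diffs.append((qid, field, a.get(field), b.get(field)))
    return diffs


first = {"Q001": {"age": 34, "hh_size": 5}, "Q002": {"age": 27, "hh_size": 3}}
second = {"Q001": {"age": 43, "hh_size": 5}, "Q002": {"age": 27, "hh_size": 3}}
to_resolve = find_discrepancies(first, second)
```

Each tuple in `to_resolve` is a field to look up on the paper questionnaire before the reconciled output is produced.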

18 Error rate checking
What is it?
For each batch, re-enter a sample of data fields and compare this data with the data given by the company (for those fields).
You need approximately 3000 fields per batch.
How to do it?
Divide your data into sub-sections (of about 25 questions). In some cases you will receive your data split in tabs; you can use those tabs as sub-sections if they are small enough.
For each sub-section, randomly select 5% of the questionnaires in your batch.
Enter data from that section of the selected questionnaires (using an Excel spreadsheet, or the data entry software).
Compare your dataset with the original data (use Stata, Excel, or comparison software), and check against the physical questionnaire who made the mistake.
Error rate: number of errors made by the company / number of fields (one error is one field with a mistake, not one question!).
Calculate the error rate for each section, and overall.
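The sampling and the error-rate formula on slide 18 can be sketched in Python as follows. The data layout (dictionaries keyed by questionnaire ID), the function names and the fixed random seed are assumptions for the example. The sketch also assumes that every mismatch has already been checked against the paper questionnaire and attributed to the company, so your own re-entered value is treated as correct.

```python
import random


def sample_questionnaires(ids, share=0.05, seed=0):
    """Randomly pick a share of questionnaire ids (5% by default)."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    k = max(1, round(len(ids) * share))
    return sorted(rng.sample(list(ids), k))


def error_rates(company, reentered):
    """Error rate per section and overall, counted per field.

    company:   questionnaire id -> {field: value} as delivered
    reentered: section -> questionnaire id -> {field: value} you re-entered
    """
    rates = {}
    total_errors = total_fields = 0
    for section, sample in reentered.items():
        errors = fields = 0
        for qid, checked_values in sample.items():
            for field, value in checked_values.items():
                fields += 1
                # One error is one field with a mistake, not one question.
                if company[qid].get(field) != value:
                    errors += 1
        rates[section] = errors / fields if fields else 0.0
        total_errors += errors
        total_fields += fields
    rates["overall"] = total_errors / total_fields if total_fields else 0.0
    return rates
```

With a batch of 200 questionnaires, `sample_questionnaires` returns 10 IDs per sub-section; the overall rate from `error_rates` is the number to compare against the 0.5% threshold in the contract.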

19 Data cleaning and organizing
Clean your data in a different file
Rename and label variables
Check for logical errors
Look at ranges and outliers
Do basic data summaries
Check for duplicate data
Check for missing data
Look at distribution of data by surveyors/teams
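Two of the checks listed above, duplicates and missing data broken down by surveyor, could look like this in Python. The row layout and the field names `qid` and `surveyor` are hypothetical; -777 is the missing code used earlier in the deck.

```python
from collections import Counter

MISSING = -777  # code for missing data, as used in the slides


def duplicate_ids(rows, id_field="qid"):
    """Return questionnaire ids that appear more than once."""
    counts = Counter(row[id_field] for row in rows)
    return sorted(qid for qid, n in counts.items() if n > 1)


def missing_share_by_surveyor(rows, field, surveyor_field="surveyor"):
    """Share of rows per surveyor where `field` is absent or coded missing."""
    total = Counter()
    missing = Counter()
    for row in rows:
        surveyor = row[surveyor_field]
        total[surveyor] += 1
        if row.get(field) in (None, MISSING):
            missing[surveyor] += 1
    return {s: missing[s] / total[s] for s in total}


rows = [
    {"qid": "Q001", "surveyor": "A", "age": 34},
    {"qid": "Q002", "surveyor": "A", "age": MISSING},
    {"qid": "Q003", "surveyor": "B", "age": 51},
    {"qid": "Q003", "surveyor": "B", "age": 51},
]
dupes = duplicate_ids(rows)
shares = missing_share_by_surveyor(rows, "age")
```

A surveyor whose share of missing values stands out from the rest of the team is a candidate for a reminder training or extra back-checks.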
