1 Data entry – principles and practices Module 2 Session 2
2 Overview This session is concerned with the principles and practices of data entry so that participants can: i. advise others on how to do effective data entry ii. explain principles of good data entry through practice with a small set of data
3 Data management cycle Conception; Design survey; Design questionnaire; Enumerators collect data in the field; Data entered onto computer; Manual checking, editing etc.; Data analysis; Reporting of results; Computer data management. Now we start looking at entering data.
4 Contents Review different types of questions that can be found on questionnaires; review different data types; enter a small dataset onto the computer; summarise steps in data entry, and principles of good data entry. The Epi Info software is used for data entry.
5 Learning Objectives At the end of this session participants should be able to: enter questionnaire data onto the computer; summarise the steps in the data entry process; produce a checklist of data entry principles; describe double data entry.
6 Questions and data types A preliminary review of the questions on a questionnaire gives the data entry person an idea of: the types of data to be entered; the complexity of the data to be entered; the quality of the data on the questionnaires. It is also essential for designing the computer data entry screens. Here we look at some example questions.
7 Types of questions What is (NAME'S) age in completed years? For how long has (NAME) stayed in the household during the last 12 months? (In months) These are examples of numeric data. The first takes values of 1, 2, 3, etc.; units are years. What is the maximum value? The second can take values of 0, 1, 2, 3, etc.; units are months. Duration cannot be more than 12 months.
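A range check on these two numeric questions could be sketched as below. The field names and the limits for age are assumptions made for illustration (the slide leaves the maximum age as an open question); only the 12-month limit comes from the question itself.

```python
# Hypothetical field names; the age limits (0 to 120) are assumptions.
RANGES = {
    "age_years": (0, 120),           # assumed limits, not from the questionnaire
    "months_in_household": (0, 12),  # duration cannot be more than 12 months
}


def out_of_range(record):
    """Return the names of fields that are absent or outside their allowed range."""
    problems = []
    for field, (low, high) in RANGES.items():
        value = record.get(field)
        if value is None or not (low <= value <= high):
            problems.append(field)
    return problems


# A record with 14 months in the household is flagged for checking.
print(out_of_range({"age_years": 34, "months_in_household": 14}))
```

A check like this only flags values for follow-up against the paper questionnaire; it does not decide what the correct value is.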
8 Types of questions Sex: Male, Female. This is an example of categorical data. It has two possible values, male and female, coded as 1 and 2. The codes are entered onto the computer. Other similar examples are Yes/No types of response. Coding is often Yes = 1, No = 0; or Yes = 1, No = 2. [Should be consistent throughout.]
9 Types of questions This is also categorical data. There are 12 possible values; coded 1 to 12. Need sufficient space in computer system to be able to enter up to 2-digit numbers.
10 Types of questions This is also a categorical variable. Are the categories in any particular order? Are the categories mutually exclusive?
11 Multiple response questions Multiple response questions can be in the form of: multiple dichotomy; responses listed but not ordered; ranked, e.g. list 1st, 2nd, 3rd. How should these be entered?
12 Example: Multiple dichotomy Question from UNHS. S5b10. Does this household own any of the following? (Yes = 1, No = 2) Motor vehicle: 1; Motor cycle: 2; Bicycle: 1; Boat/canoe: 2; Donkey: 2
13 Example: Listed but not ordered multiple responses UNHS S3a3. What sort of sickness/injury did [X] suffer? (column 3) Codes: Malaria 01; Respiratory 02; Measles 03; Diarrhoea 04; Aids 05; Pregnancy related problems 06; Dental 07; Accident 08; Intestinal worms 09; Sick infections 10; Others 11. If code 01 (malaria) in column (3): 5. What type of drug did [X] take? Codes: None 1; Chloroquine 2; Fansidar 3; Camaquine 4; Quinine 5; Panadol 6; Aspirin 7; Herbs 8; Others 9. Example entry: (5a) 2, (5b) 5, (5c) 3.
14 Example: Ranked multiple responses UNHS S3bq3: What are the main channels of communication from which you receive AIDS/HIV information and education? (Note that the channels should be ranked in order of the three most important; use codes at the bottom of page.) Example entry: 1st, col. (3) = 08; 2nd, col. (4) = 01; 3rd, col. (5) = 07. Channels of communication (codes for col. (3), (4) and (5)): Radio 01; TV 02; Film 03; Drama 04; Posters 05; Billboards 06; Family 07; Friends 08; Teachers 09; Political leaders 10; Trad. leaders 11; Religious leaders 12.
15 Computerisation Dichotomous multiple response questions require one column for each Yes/No (or 1/0) response, each indicating whether the respondent ticked or did not tick that item in the list. Ordered or ranked multiple responses can have as many columns as there are alternatives in the question; the first column records the most important response, the second the next most important, etc.
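As a rough illustration of these two layouts, the sketch below turns the household ownership responses and the ranked AIDS/HIV information channels from the earlier examples into one row of columns. The column names, the helper functions and the household ID are hypothetical; this is not how Epi Info itself stores the data.

```python
# Items of the ownership question, in questionnaire order (names are hypothetical).
ITEMS = ["motor_vehicle", "motor_cycle", "bicycle", "boat_canoe", "donkey"]


def dichotomy_columns(responses):
    """One column per item, holding its Yes/No code (Yes = 1, No = 2)."""
    if len(responses) != len(ITEMS):
        raise ValueError("expected one response per item")
    return {f"owns_{item}": code for item, code in zip(ITEMS, responses)}


def ranked_columns(codes, prefix="channel"):
    """One column per rank; the first column holds the most important code."""
    return {f"{prefix}_rank{i}": code for i, code in enumerate(codes, start=1)}


row = {"hh_id": 1}  # hypothetical household ID
row.update(dichotomy_columns([1, 2, 1, 2, 2]))  # ownership responses
row.update(ranked_columns([8, 1, 7]))           # 1st = Friends, 2nd = Radio, 3rd = Family
print(row)
```

Keeping one column per item, or per rank, means each value can be checked against its own list of valid codes.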
16 More complex questions How should these data be entered?
17 Missing values Surveys will always have missing data. Data can be missing for a variety of reasons: respondent did not know the answer; respondent refused to answer; question was not applicable; question was missed by the fieldworker; response was not recorded clearly; etc.
18 Coding missing data Assigning codes to missing data avoids blanks in the data. The code must not be a possible value. For numeric data (e.g. Age) a negative value is often used (e.g. -99). For categorical data use a code higher than any valid code for the question (e.g. 99).
19 Missing value codes Different codes could be used for different types of missing data: 99 or -99 = question missed by fieldworker; 88 or -88 = question not applicable; 77 or -77 = don't know or refused to answer. Should be consistent throughout.
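One possible way to handle these codes after entry is sketched below: a single lookup table keeps the codes consistent, one helper checks that no missing code clashes with a valid value, and another separates real values from missing ones. The function names are hypothetical.

```python
# Missing value codes from the slide, kept in one place so they stay consistent.
MISSING_CODES = {
    99: "question missed by fieldworker",
    -99: "question missed by fieldworker",
    88: "question not applicable",
    -88: "question not applicable",
    77: "don't know or refused to answer",
    -77: "don't know or refused to answer",
}


def clashing_codes(valid_values):
    """Return missing codes that are also valid values (there should be none)."""
    return sorted(set(valid_values) & set(MISSING_CODES))


def split_missing(value):
    """Return (value, None) for real data, or (None, reason) for a missing code."""
    if value in MISSING_CODES:
        return None, MISSING_CODES[value]
    return value, None


print(clashing_codes(range(1, 13)))  # a question with codes 1 to 12
print(split_missing(-99))
print(split_missing(34))
```

Running `clashing_codes` on each question's valid codes before data entry starts is a cheap way to confirm that a missing code is never a possible value.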
20 Unique identifier Each set of data should have a unique identifier, often referred to as a primary key. In household surveys, for example, you often have a Household ID. This is unique for each household and enables you to easily find the data for that household.
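A simple check that the identifier really is unique might look like the sketch below. The field name `hh_id` and the sample records are made up for illustration.

```python
from collections import Counter


def duplicate_ids(records, key="hh_id"):
    """Return, in sorted order, every identifier that occurs more than once."""
    counts = Counter(record[key] for record in records)
    return sorted(id_ for id_, n in counts.items() if n > 1)


# Hypothetical entered records: household 101 has been entered twice.
records = [{"hh_id": 101}, {"hh_id": 102}, {"hh_id": 101}]
print(duplicate_ids(records))
```

An empty result means every household can be found unambiguously by its ID.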
21 Activity 2 In pairs. Look at questionnaires. Identify types of questions, and types of data. Class discussion.
22 Brief introduction to Epi Info… Epi Info is a series of freely distributable programs for Microsoft Windows for managing databases (especially public health ones). With it you can customize the data entry process (layout similar to the questionnaire), and enter and analyse data.
23 Brief introduction to Epi Info… Epi Info contains: Projects (file, .mdb), which have: View: information about the screen appearance, or how the survey looks, and how data is entered into the data table. It has fields (variables) which are created to hold data. Data tables: store the data.
24 Data entry in Epi Info Points to note: a View can span several pages; space is assigned for "Other, specify" text; questions can be skipped if not relevant. Demonstrate data entry using the Household Survey data.
25 Activity 4 & 5 Entering a small dataset into Epi Info. Record some principles of good data entry. Record the steps in the data entry process.
26 Double data entry Data entry needs to be checked. If the dataset is small, you can print it out and check manually. If the dataset is large, this can be resource-intensive and time-consuming. How many records do you need to check? Double data entry: the dataset is entered twice (by different people) and the two datasets are compared. Discrepancies are checked and corrected.
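The comparison step of double data entry can be sketched in a few lines. This is only an illustration of the idea, not the Epi Info utility; the function name, the field names and the sample records are hypothetical, and both entries are assumed to share the same structure and a unique `hh_id`.

```python
def compare_entries(first, second, key="hh_id"):
    """Return (id, field, value_in_first, value_in_second) for every discrepancy.

    Assumes both datasets have the same fields and a unique identifier.
    """
    second_by_id = {rec[key]: rec for rec in second}
    first_ids = {rec[key] for rec in first}
    discrepancies = []
    for rec in first:
        other = second_by_id.get(rec[key])
        if other is None:
            # Record entered by the first person only.
            discrepancies.append((rec[key], None, "present", "absent"))
            continue
        for field, value in rec.items():
            if field != key and value != other.get(field):
                discrepancies.append((rec[key], field, value, other.get(field)))
    for rec in second:
        if rec[key] not in first_ids:
            # Record entered by the second person only.
            discrepancies.append((rec[key], None, "absent", "present"))
    return discrepancies


# Hypothetical double entry: the ages for household 1 disagree (34 vs 43).
entry1 = [{"hh_id": 1, "age": 34, "sex": 1}, {"hh_id": 2, "age": 27, "sex": 2}]
entry2 = [{"hh_id": 1, "age": 43, "sex": 1}, {"hh_id": 2, "age": 27, "sex": 2}]
print(compare_entries(entry1, entry2))
```

Each discrepancy is then checked against the paper questionnaire and corrected in both datasets.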
27 Data Compare Utility Utilities -> Data Compare; File -> New Script. Step 1: Epi Info View, select the files to compare. Step 2: Checks that the structure of the files is the same. Step 3: Select the unique identifier. Step 4: Select the fields to compare (all). View -> Read-Only. Demonstration of Data Compare using data1 and data2.
28 Activity 7 Use the Data Compare utility to compare data entered in Activity 4 with data entered by another group.