TIMOTHY SERVINSKY PROJECT MANAGER CENTER FOR SURVEY RESEARCH Data Preparation: An Introduction to Getting Data Ready for Analysis.


Data Preparation Overview
• Getting Started
• Organizing Data
• Cleaning Data
  ◦ Data Entry Verification
  ◦ Checking for Errors and Consistency
  ◦ Formatting for Analysis: Data Transformations

Getting Started

Bridging the Gap Research Methodology Analysis

What Is Data Preparation?
• Creating and preparing a dataset to be analyzed
• Data can come from one or more data sources
  ◦ Survey data (phone, web, mail)
  ◦ Data tracked by your organization (internal reports)
  ◦ Customer/client databases
  ◦ Program data
  ◦ Quality control data
  ◦ Coded interviews

What Is Data Preparation?
• Before you start analysis, your data should be:
  ◦ Organized
  ◦ Consistently recorded
  ◦ Error-free
  ◦ Formatted for analysis

Allow for Ample Time
• Data preparation can take up 50% or more of the time you dedicate to analysis
• Rushing/skipping data preparation → Data errors → Low confidence → Starting analysis over

Document, Document, Document!
• The data collection and compilation process should be replicable
  ◦ Where did you get the data?
  ◦ How did you obtain it?
• Document problems
  ◦ Collecting
  ◦ Recording
  ◦ Extracting

Document, Document, Document!
• Methodology report
  ◦ Survey/Instrument/Experiment/Process Design
  ◦ Sampling procedures (if applicable)
  ◦ Response rates (if applicable)
  ◦ Limitations

Organizing Data

Compile Your Data
• Start by figuring out all of the data components you will need for analysis.
  ◦ What are the sources? Do you have access to all data that you need?
  ◦ How do you get access to data that you need?
  ◦ Track contact/retrieval information and date expectations
  ◦ How much time do you need to build into your timeline?

Create Codes
• Code = A number or set of letters that stands for something else
• Question codes provide a way to reference a question
• Response codes provide a way to easily record results or answers

Create Codes

Create a Codebook

Enter Data Into a Database
• Two Methods:
  ◦ Export from existing database
    - SurveyMonkey
    - Excel
  ◦ Data entry
    - Paper survey data
    - Excel or SPSS
    - Create a codebook
    - Verification

Data Entry
• Enter information into your database one record at a time
• Use your codebook to determine what you should enter into your database
  ◦ Statistics program (e.g., SPSS): enter answer codes for data and define the codes in the program (e.g., Male = 1; Female = 2)
  ◦ Excel or other spreadsheet program: enter answer labels into the spreadsheet (e.g., Male or Female)
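The relationship between a codebook, response codes, and response labels can be sketched in a few lines of Python. This is only an illustration: the question code `Q1_GENDER` and the helper names are hypothetical, and the Male = 1 / Female = 2 coding is the example from the slide.

```python
# Hypothetical codebook: question code -> {response code: response label}.
# The 1/2 coding mirrors the slide's example; the question code is made up.
CODEBOOK = {
    "Q1_GENDER": {1: "Male", 2: "Female"},
}


def label_to_code(question, label):
    """Return the response code for a label, or None if it is not in the codebook."""
    wanted = label.strip().lower()
    for code, known_label in CODEBOOK[question].items():
        if known_label.lower() == wanted:
            return code
    return None


def code_to_label(question, code):
    """Return the response label for a code, or None if it is not in the codebook."""
    return CODEBOOK[question].get(code)
```

Keeping the mapping in one place means a spreadsheet that stores labels and a statistics file that stores codes can both be checked against the same codebook.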

Data Entry Considerations
• Create a unique ID for each record
• Illegible handwriting or unclear markings
• Missing data
  ◦ How much can you tolerate?
  ◦ Key questions?
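As a rough sketch of two of these considerations, the Python below assigns a sequential ID to each record and reports what share of records is missing a given field. The function names, the field name, and the choice to treat `None` and empty strings as missing are all assumptions made for illustration.

```python
def assign_ids(records, start=1):
    """Give each record a unique sequential ID (hypothetical numbering scheme)."""
    return {record_id: record for record_id, record in enumerate(records, start=start)}


def missing_share(records_by_id, field):
    """Return the fraction of records where the field is absent, None, or empty."""
    total = len(records_by_id)
    if total == 0:
        return 0.0
    missing = sum(
        1 for record in records_by_id.values() if record.get(field) in (None, "")
    )
    return missing / total


# Four made-up survey records; two have no usable age.
survey = assign_ids([{"age": 34}, {"age": None}, {"age": ""}, {"age": 51}])
share_missing_age = missing_share(survey, "age")
```

Whether a given share is tolerable is a judgment call that depends on how central the question is to the analysis.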

Response Codes or Response Labels?

Data Cleaning

Data Entry Validation
• GIGO: Garbage In, Garbage Out
• Double user verification
• Double entry verification
  ◦ Enter the exact same data into two Excel tabs
  ◦ Use a formula on a third tab to check the first two tabs
  ◦ BLE_DATA_ENTRY.html
• Data entry software / SPSS (using Compare Datasets)
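The idea behind double entry verification can also be sketched outside Excel. The Python below is a minimal illustration, assuming each entry pass is stored as a dictionary keyed by record ID; the function name and the sample records are hypothetical.

```python
def compare_entries(first_pass, second_pass):
    """Return (record_id, field, value_1, value_2) for every disagreement."""
    mismatches = []
    for record_id in sorted(set(first_pass) | set(second_pass)):
        rec1 = first_pass.get(record_id, {})
        rec2 = second_pass.get(record_id, {})
        for field in sorted(set(rec1) | set(rec2)):
            v1, v2 = rec1.get(field), rec2.get(field)
            if v1 != v2:
                mismatches.append((record_id, field, v1, v2))
    return mismatches


# Made-up data: the second pass transposed the digits of one age.
entry_a = {101: {"age": 34, "q1": 2}, 102: {"age": 51, "q1": 4}}
entry_b = {101: {"age": 34, "q1": 2}, 102: {"age": 15, "q1": 4}}
mismatches = compare_entries(entry_a, entry_b)
```

Each mismatch points back to a record ID, so the original paper form can be pulled to decide which entry is right.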

Checking the Data: Ranges
• Are all answers within the accepted minimum and maximum values?
  ◦ Age values of 150 or 16
  ◦ Value of “7” on a 5-point scale
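A range check like this can be automated. The sketch below is illustrative only: the field names and the bounds (ages 18 to 110, a 1 to 5 scale) are assumptions, not values from the presentation.

```python
# Hypothetical accepted (minimum, maximum) values for each field.
VALID_RANGES = {"age": (18, 110), "satisfaction": (1, 5)}


def out_of_range(records, valid_ranges=VALID_RANGES):
    """Return (record_id, field, value) for each value outside its accepted range."""
    problems = []
    for record_id, record in records.items():
        for field, (low, high) in valid_ranges.items():
            value = record.get(field)
            if value is not None and not (low <= value <= high):
                problems.append((record_id, field, value))
    return problems


# Made-up records that reproduce the slide's examples.
records = {
    1: {"age": 150, "satisfaction": 3},
    2: {"age": 16, "satisfaction": 7},
    3: {"age": 40, "satisfaction": 5},
}
range_problems = out_of_range(records)
```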

Checking the Data: Data Types
• Is data formatted in the correct way?
  ◦ Age entered as “thirty” instead of 30
  ◦ Date entered instead of
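One simple way to catch the first kind of problem is to try converting each entry to a number and flag the ones that fail. The helper name below is hypothetical, and the sketch covers whole numbers only.

```python
def parse_whole_number(raw):
    """Return the entry as an int, or None if it is not written as a whole number."""
    try:
        return int(str(raw).strip())
    except ValueError:
        return None


# Made-up age column: the second entry was typed as a word.
ages_raw = ["30", "thirty", " 45 "]
ages_parsed = [parse_whole_number(value) for value in ages_raw]
```

Entries that come back as `None` need to be traced to the source and corrected rather than silently dropped.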

Checking the Data: Data Length
• Do all of your data entries have the correct number of digits or letters?
• Examples
  ◦ Zip code with four digits instead of five or nine
  ◦ Phone number with 11 digits instead of 10
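The two examples translate directly into length checks. This sketch assumes US-style zip codes (five or nine digits) and 10-digit phone numbers, as on the slide, and ignores punctuation such as hyphens; the function names are hypothetical.

```python
def digits_only(raw):
    """Strip everything except digits, so '717-555-0100' becomes '7175550100'."""
    return "".join(ch for ch in str(raw) if ch.isdigit())


def valid_zip_length(raw):
    """A zip code should have five or nine digits."""
    return len(digits_only(raw)) in (5, 9)


def valid_phone_length(raw):
    """A phone number should have exactly 10 digits."""
    return len(digits_only(raw)) == 10
```

For example, `valid_zip_length("1705")` is false because only four digits were entered, and `valid_phone_length("1-717-555-0100")` is false because the leading 1 makes 11 digits.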

Checking the Data: Fixing Errors
• Can the data be clarified with assistance?
  ◦ Trace data back to point of origin
  ◦ Review original data/database/instrument/source
  ◦ If someone answered a survey, can you contact that person for clarification?

Checking the Data: Fixing Errors
• Is it reasonably correctable on your own?
  ◦ If valid values are 1 – 5 and you have “11”, entering a “1” might be considered reasonable
  ◦ If the valid values are between 0 and 100 and you have “232”, you cannot make a reasonable determination between 23 and 32. Do not guess or choose a value at random; make it a missing value
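The two cases above can be expressed as a cautious rule. The sketch below is one possible reading, not a method from the presentation: it assumes that an out-of-range entry made of a single repeated digit (such as "11") was a doubled keystroke, and it treats everything else that is out of range as missing. The function name is hypothetical, and any suggested correction should still be reviewed by a person.

```python
def suggest_correction(raw, low, high):
    """Return a value within [low, high], or None to mark the entry as missing."""
    text = str(raw).strip()
    try:
        value = int(text)
    except ValueError:
        return None
    if low <= value <= high:
        return value
    # Assumed rule: a repeated single digit (e.g. "11") is a doubled keystroke.
    if len(set(text)) == 1:
        candidate = int(text[0])
        if low <= candidate <= high:
            return candidate
    # Ambiguous (e.g. "232" could be 23 or 32): do not guess.
    return None
```

With this rule, `suggest_correction("11", 1, 5)` yields 1, while `suggest_correction("232", 0, 100)` yields `None` so the value is recorded as missing.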

Checking the Data: Missing Data
• Data can be missing for a variety of reasons:
  ◦ Unanswered questions (forgot/declined to answer, illegible handwriting)
  ◦ Data point was not applicable for a portion of records
  ◦ Errors in recording data
  ◦ Manually removed because you lacked confidence in the data

Checking the Data: Missing Data
• Create a unique code for missing values
  ◦ Missing responses:
  ◦ Not applicable fields:
  ◦ Don’t know responses: 8888, 888, 88
  ◦ Declined to answer responses: 9999, 999, 99

Checking the Data: Missing Data
• Statistical programs can remove your missing data from analysis
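To see why missing-value codes must be excluded, consider a 5-point scale where 88 means "don't know" and 99 means "declined to answer", as in the codes listed earlier. The Python below is a minimal sketch with made-up scores; the function name is hypothetical, and statistics packages such as SPSS handle this through their own missing-value settings.

```python
from statistics import mean

# Codes taken from the slide: 88 = don't know, 99 = declined to answer.
MISSING_CODES = {88, 99}


def mean_excluding_missing(values, missing_codes=MISSING_CODES):
    """Average only the real answers; return None if none remain."""
    valid = [v for v in values if v is not None and v not in missing_codes]
    return mean(valid) if valid else None


# Made-up responses on a 5-point scale, including two missing-value codes.
scores = [4, 5, 99, 3, 88, 4]
naive_mean = mean(scores)                     # wrongly treats 88 and 99 as answers
clean_mean = mean_excluding_missing(scores)   # uses only 4, 5, 3, 4
```

The naive average lands far above the top of the scale, which is the kind of error that unique missing codes make easy to spot and prevent.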

Statistical Analysis in Excel
• You can do statistical analysis in Excel
• Excel 2010 and 2013 (Windows):
  ◦ File → Options → Add-Ins
  ◦ New Data Analysis option appears under the Data tab

Statistical Analysis in Excel

Data Transformations in SPSS

Questions?

TIMOTHY SERVINSKY PROJECT MANAGER CENTER FOR SURVEY RESEARCH Thank you!