Stata as a Data Entry Management Tool


Stata as a Data Entry Management Tool
Ryan Knight
Innovations for Poverty Action
Stata Conference 2011

Why Pay Attention to Data Entry?
It sounds so easy…
Surveys, type, type, type… Data!

…but it is not!
Excellent Opportunities for DISASTER
No one checked data quality. Turns out, there’s no unique ID variable. Lost data.
No one monitored the data entry contractor. Turns out, they copied and pasted data and changed the IDs. Lost data.
The RA didn’t know that append forces the string/numeric type of the master file onto the using file, and deleted the originals. Lost data.
Records existed in multiple datasets and were different. Data lost in the merging process.
And many more!
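Two of the failures above (no unique ID, an unnoticed string/numeric mismatch) can be caught early with built-in Stata commands. The following is a minimal sketch, not part of the original talk; the file name and the variables uniqueid and age are assumptions for illustration.

```stata
* Sketch only: file and variable names are hypothetical.
use "first entry.dta", clear

* Report how many observations share an ID, then list the duplicated ones
duplicates report uniqueid
duplicates list uniqueid

* Stop the do-file with an error unless uniqueid is unique
isid uniqueid

* Stop with an error unless age is stored as numeric,
* so a type mismatch is caught before any append
confirm numeric variable age
```

Running these checks on every raw file before it is appended or merged makes the do-file fail loudly instead of silently losing data.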

Data Entry Quality Control
Use two unique identifiers for every survey
Extensive testing of data entry interface
Double entry
Double entry of first and second entry reconciliation
Independent audit

Managing Double Entry
[Flow diagram, steps in the order they appear: Questionnaire; 1st Entry and 2nd Entry; Discrepancies; 1st Reconciliation and 2nd Reconciliation; Discrepancies; Final Reconciliation; Final Dataset. Several steps are labeled "Stata".]

Generating a List of Discrepancies
cfout [varlist] using filename, id(varname) [options]
Compares the dataset in memory to another dataset and outputs a list of discrepancies.
Can ignore differences in punctuation, spacing and case.
Substantially faster than looping through observations.
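To make "ignoring differences in punctuation, spacing and case" concrete, the same normalization can be done by hand with built-in string functions before comparing two entries. This is only an illustration of the idea, not how cfout works internally, and village is a hypothetical string variable.

```stata
* Sketch only: village is a hypothetical string variable.
* Lowercase, trim leading/trailing blanks, collapse repeated internal blanks
generate village_clean = lower(trim(itrim(village)))

* Remove common punctuation (the final "." means "all occurrences")
replace village_clean = subinstr(village_clean, ".", "", .)
replace village_clean = subinstr(village_clean, ",", "", .)
```

After this step, "St. Mary ,  East" and "st mary east" compare as equal, which is the kind of difference a reconciliation usually should not flag.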

Correcting Discrepancies
March down the output from cfout, indicating which value is correct.

Replacing Discrepancies
readreplace using filename, id(varname)
Reads a 3-column .csv file (ID, question, correct value) and makes all of the replacements in your dataset.
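Conceptually, each row of the corrections file becomes one replace statement. The sketch below shows that idea with built-in commands only; it is not readreplace's actual implementation. It assumes a numeric ID, numeric corrected values, and a corrections file whose header row names the columns uniqueid, question and correctvalue (all hypothetical).

```stata
* Conceptual sketch only: not readreplace's actual implementation.
* Read the corrections and store each row in local macros
insheet using "corrected values.csv", clear names
local n = _N
forvalues i = 1/`n' {
    local id_`i'  = uniqueid[`i']
    local var_`i' = question[`i']
    local val_`i' = correctvalue[`i']
}

* Apply each correction to the entered data
use "second entry.dta", clear
forvalues i = 1/`n' {
    replace `var_`i'' = `val_`i'' if uniqueid == `id_`i''
}
```

Keeping the corrections in a separate file, rather than editing values by hand, means the replacements can be rerun and audited.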

The whole process
* Load the data
insheet using "raw first entry.csv"
save "first entry.dta", replace
insheet using "raw second entry.csv", clear
save "second entry.dta", replace
* Compare the files
cfout region-no_good_at_all using "first entry.dta", id(uniqueid)
* Make replacements using corrected data
readreplace using "corrected values.csv", id(uniqueid)

Other Useful Commands
mergeall merges all of the files in a folder, checking for string/numeric differences and duplicate IDs before merging.
cfby calculates the number of discrepancies “by” a variable. Useful for calculating error rates.
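For a single pair of files, the built-in merge command (Stata 11 or later syntax) already performs part of the duplicate-ID check described above. A minimal sketch with hypothetical file names follows; it is not mergeall itself.

```stata
* Sketch only: file names are hypothetical.
use "second batch.dta", clear

* merge 1:1 exits with an error if uniqueid is duplicated in either file
merge 1:1 uniqueid using "first batch.dta"

* _merge: 1 = only in memory, 2 = only in using file, 3 = in both
tabulate _merge
```

Inspecting _merge after every merge shows which records failed to match, so none are dropped unnoticed.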

Why Use Stata for Reconciliations Instead of Data Entry Software?
Choose the best data entry software for each project
Independent correction of discrepancies is more accurate than checks against existing values
Synergy with physical workflow management
More control over merging
Reproducibility
Analyze errors and performance over time