Mannheim Research Institute for the Economics of Aging (www.mea.uni-mannheim.de)
SHARE Data Cleaning
Stephanie Stuck, MEA
Vienna, November 5/6



2 General philosophy
- Respondents are the experts on their own lives; in general we (still) take their answers very seriously.
- Only change data if you are sure it is wrong. If an answer seems implausible but you are not sure what to do, indicate this via a flag variable.

3 General rules
- Please use the data files with the original sampid to check and correct data (do not use the data version with sampid2).
- Always write programs to correct data (Stata do-files or SPSS sps files). Please never change data directly (e.g. no changes in data editors).

4 General rules
- Keep the original variables (name: "varname_original").
- Add flag variables to indicate changes (name: "varname_flag").
- Save corrected data files under a new name (e.g. "filename_corrected").
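
The naming rules above can be sketched in a few lines. This is only an illustration in Python, not the SHARE procedure itself; in practice the same logic would live in a Stata do-file or an SPSS syntax file. The variable yrbirth, the example records and the two-digit-year correction rule are assumptions made up for the example.

```python
def correct_variable(records, varname, should_fix, fix):
    """Return corrected copies of the records; the input is never edited in place."""
    corrected = []
    for rec in records:
        new = dict(rec)
        new[varname + "_original"] = rec[varname]  # keep the original value
        if should_fix(rec):
            new[varname] = fix(rec)
            new[varname + "_flag"] = 1  # 1 = value was changed
        else:
            new[varname + "_flag"] = 0  # 0 = value left untouched
        corrected.append(new)
    return corrected


# Hypothetical example: a year of birth typed with two digits only.
raw = [
    {"sampid": "A1", "yrbirth": 1941},
    {"sampid": "A2", "yrbirth": 38},
]
clean = correct_variable(
    raw,
    "yrbirth",
    should_fix=lambda r: 0 <= r["yrbirth"] < 100,
    fix=lambda r: 1900 + r["yrbirth"],
)
```

The corrected records would then be written to a new file (e.g. "filename_corrected"), so that the original file stays unchanged and every change can be traced through the flag variable.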

5 General rules
- Don't always take wave 1 information for granted; it can be wrong, too.
- Sometimes we will have to change wave 1 data, too.
- We will have another release of wave 1 data together with the public release of wave 2.
- We will probably already have a minor update of the release early next year.

6 Very next steps
- Check for country-specific deviations, e.g. routing errors in particular, ep071, ep098, the hc module etc.
- Send information on all country-specific deviations to MEA; please don't forget an English translation or explanation of the deviations.
- Information on important deviations in central variables should be available to all FRB authors together with release 0.

7 Very next steps
Check financial amounts for implausible values, e.g. negative or very high amounts:
- outliers
- zero values
- wrong currencies
- typing errors
- "drunken interviewers" problem
Also consider frequencies of payments etc.
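
A minimal sketch of such an amount check, written in Python for illustration only. The upper limit of 100000 and the example amounts are assumptions; a sensible bound depends on the variable, the country and the currency. Amounts are first put on a yearly basis so that different payment frequencies are comparable.

```python
def annual_amount(amount, payments_per_year):
    """Put an amount on a yearly basis so payment frequencies are comparable."""
    return amount * payments_per_year


def flag_amount(amount, upper_limit):
    """Return a label for one amount; upper_limit is an assumed plausibility bound."""
    if amount is None:
        return "missing"
    if amount < 0:
        return "negative"
    if amount == 0:
        return "zero"
    if amount > upper_limit:
        return "very_high"
    return "ok"


# Hypothetical yearly amounts; the last one was reported monthly.
amounts = [1200.0, 0.0, -50.0, 950000.0, None, annual_amount(900.0, 12)]
flags = [flag_amount(a, upper_limit=100000.0) for a in amounts]
```

A label other than "ok" only marks a case for inspection. Whether the value is a typing error, a wrong currency or a true value still has to be decided case by case.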

8 Wrong sampid, cvid or respid
MEA already checks for mismatches within and between waves.
- Please ask the survey agencies and send all information you have on renamed cases, mismatches etc. to MEA.
- Whenever you find new information on mismatches, e.g. in remarks, send the information to MEA.
- Please send data files with old and new ids for renamed cases to MEA, and provide information on date and reason (if possible) in additional variables.
- Sometimes only the CV or only the individual modules (DN etc.) have to be renamed (especially, but not only, if respondents are exchanged within households). Please don't forget to provide information on where changes have to be made.
MEA will correct the files and send lists of hard cases to the country teams to check or to ask the survey agencies again.
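
One possible layout for such a rename list is sketched below in Python, for illustration only. The field names old_id, new_id, date, reason and modules, as well as the ids and the date, are made up for the example and are not a format prescribed by MEA.

```python
def apply_renames(records, renames, module, id_var="respid"):
    """Apply only those renames that are meant for the given module."""
    lookup = {r["old_id"]: r for r in renames if module in r["modules"]}
    renamed = []
    for rec in records:
        new = dict(rec)
        hit = lookup.get(rec[id_var])
        new[id_var + "_original"] = rec[id_var]  # keep the old id
        new[id_var + "_flag"] = 1 if hit else 0  # 1 = id was renamed
        if hit:
            new[id_var] = hit["new_id"]
        renamed.append(new)
    return renamed


# Hypothetical rename list: one case that is renamed in the DN module only.
renames = [
    {"old_id": "X-01", "new_id": "X-02", "date": "2007-10-15",
     "reason": "respondents exchanged within household", "modules": ["dn"]},
]
cv = [{"respid": "X-01"}]
dn = [{"respid": "X-01"}]

cv_out = apply_renames(cv, renames, module="cv")
dn_out = apply_renames(dn, renames, module="dn")
```

Keeping the module list next to each rename records where the change has to be made, so the CV stays untouched when only an individual module is affected.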

9 General checks
- Corrections based on checks of frequency distributions, e.g. outliers, values out of range
- Corrections based on consistency checks within and between modules and waves
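
A frequency check of this kind can be sketched as follows (Python, for illustration only). The variable gender with valid codes 1 and 2 is an assumption made for the example; the real code lists come from the questionnaire.

```python
from collections import Counter


def tabulate(records, var):
    """Frequency distribution of one variable."""
    return Counter(r[var] for r in records)


def out_of_range(records, var, valid):
    """Records whose value is not among the valid codes."""
    return [r for r in records if r[var] not in valid]


# Hypothetical data: code 7 is not a valid gender code.
data = [
    {"respid": "R1", "gender": 1},
    {"respid": "R2", "gender": 2},
    {"respid": "R3", "gender": 1},
    {"respid": "R4", "gender": 7},
]
freq = tabulate(data, "gender")
suspicious = [r["respid"] for r in out_of_range(data, "gender", valid={1, 2})]
```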

10 More concrete
- Check for empty cases
- Check for duplicates
- Check year of birth between coverscreen (cv_r and cv_h) and the dn module, drop-offs and vignettes respectively, and possibly with the gross sample
- Check gender: CV/DN vs. drop-off/vignettes
- Check for consistency of dates
- Check information on marital status
- Check respondent dummies
- Check the ch module against the coverscreen
- Check relation to coverscreen respondent
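
The first three checks in this list can be sketched as below (Python, for illustration only). The records and the variable name yrbirth are assumptions for the example; in the real data each module would be loaded from its own file.

```python
from collections import Counter


def find_duplicates(records, id_var):
    """Ids that occur more than once."""
    counts = Counter(r[id_var] for r in records)
    return sorted(i for i, n in counts.items() if n > 1)


def find_empty_cases(records, id_var):
    """Ids for which every variable other than the id is missing."""
    return [r[id_var] for r in records
            if all(v in (None, "") for k, v in r.items() if k != id_var)]


def mismatches(left, right, id_var, var):
    """Ids present in both files whose values of var differ."""
    right_by_id = {r[id_var]: r[var] for r in right}
    return [r[id_var] for r in left
            if r[id_var] in right_by_id and r[var] != right_by_id[r[id_var]]]


# Hypothetical coverscreen (cv) and dn records.
cv = [
    {"respid": "R1", "yrbirth": 1940},
    {"respid": "R2", "yrbirth": 1952},
    {"respid": "R2", "yrbirth": 1952},
    {"respid": "R3", "yrbirth": None},
]
dn = [
    {"respid": "R1", "yrbirth": 1940},
    {"respid": "R2", "yrbirth": 1925},
]

duplicates = find_duplicates(cv, "respid")
empty = find_empty_cases(cv, "respid")
birth_mismatch = mismatches(dn, cv, "respid", "yrbirth")
```

A mismatch only shows that the two sources disagree. Which of them is wrong (or whether a different person answered) has to be checked before any correction is made.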

11 Interviewer remarks
- Go through the remarks: a lot of them are not helpful, but some are very important (e.g. exchanged respondent, amounts apply to all family members, different time horizons etc.)
- Categorize problems as much as possible
- Write programs to correct data if possible
- Flag cases where you are unsure
- Collect information on questions that caused a lot of problems or didn't work, for future waves
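
A first rough sorting of remarks could be done by keyword, as in the sketch below (Python, for illustration only). The category names follow the examples on the slide; the keyword lists and the example remarks are assumptions and would have to be adapted to the language of each country's remarks.

```python
# Hypothetical keyword lists; real remarks are written in the national languages.
CATEGORIES = {
    "exchanged_respondent": ("wrong respondent", "spouse answered", "exchanged"),
    "household_amount": ("whole family", "all family members"),
    "time_horizon": ("per month", "per year", "monthly", "yearly"),
}


def categorize(remark):
    """Return all categories whose keywords occur in the remark."""
    text = remark.lower()
    hits = [cat for cat, keys in CATEGORIES.items()
            if any(k in text for k in keys)]
    return hits or ["uncategorized"]


labels = categorize("Amount is for the whole family, given per month")
other = categorize("Respondent was tired")
```

Keyword matching only pre-sorts the remarks. Each flagged remark still has to be read before a correction program is written for it.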

12 Open questions
- Go through the open questions and code answers into the original values if possible
- Priority list of variables: education, employment status

13 How to go on
- Your experience is much appreciated
- Please send information on what you have done, what problems you found etc. to MEA
- MEA will send out more information: results of our discussion now, "checking lists", "common problems" etc.
- We should have another meeting/workshop, maybe in February, or we could have an extra meeting, e.g. in Mannheim