Some birds, a cool cat and a wolf


Tricks of the trade: Some birds, a cool cat and a wolf. Dick Wiggins, City University, London. Gopal Netuveli, Imperial College, University of London. RSS Official Statistics/Statistical Computing Section, 18th May 2005.

Acknowledgments: Economic and Social Research Council, Human Capability and Resilience Network.

Missing data is a pervasive fact of life.

Sample dataset. 100 records randomly selected from the British Household Panel Survey, with the condition that all cases had complete information on age, sex and socio-economic position. The data contain variables selected from wave 1 and wave 11.

Terminology. Unit nonresponse: complete absence of any information from a sampled individual or case. Item nonresponse: an individual cooperates but for some reason has missing values for certain items. Attrition: in longitudinal data, the cumulative rate of unit nonresponse across waves.

Levels of measurement. Nominal: values are just names, e.g. 1 = male, 2 = female. Ordinal: inherent ranking, but intervals are not equal, e.g. RG's social class. Interval: numerical, intervals are meaningful, but no true zero, e.g. the temperature scales Celsius and Fahrenheit. Ratio: numerical, meaningful intervals, zero defined, e.g. height, income.

How is your measure distributed? The distribution of the measure is important and needs to be specified.

Pattern of missingness: monotone. Percentage of missingness (Lambda) = (number of missing values / number of values) × 100. Lambda for both monotone and non-monotone missingness = (820 / 3500) × 100 = 23.4.
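As a rough illustration of the Lambda calculation and of what "monotone" means, here is a minimal Python sketch. The toy rows, the function names and the use of `None` for a missing value are assumptions made for this example; they are not from the slides. A pattern is treated as monotone when, with columns ordered (for example by wave), every value after the first missing one in a row is also missing.

```python
from typing import Optional, Sequence

Row = Sequence[Optional[float]]


def percent_missing(rows: Sequence[Row]) -> float:
    """Lambda: missing values as a percentage of all values."""
    total = sum(len(row) for row in rows)
    missing = sum(value is None for row in rows for value in row)
    return 100.0 * missing / total


def is_monotone(rows: Sequence[Row]) -> bool:
    """True if no row has an observed value after a missing one.

    Assumes the columns are already in a meaningful order (e.g. by wave).
    """
    for row in rows:
        seen_missing = False
        for value in row:
            if value is None:
                seen_missing = True
            elif seen_missing:
                return False
    return True


# Hypothetical toy data: 4 cases, 3 ordered variables, 3 missing values each.
monotone = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, None],
    [1.0, None, None],
    [1.0, 2.0, 3.0],
]
non_monotone = [
    [1.0, None, 3.0],
    [1.0, 2.0, None],
    [None, 2.0, 3.0],
    [1.0, 2.0, 3.0],
]

print(percent_missing(monotone), percent_missing(non_monotone))
print(is_monotone(monotone), is_monotone(non_monotone))
print(round(100 * 820 / 3500, 1))  # the figure quoted on the slide
```

The two toy datasets have the same Lambda but different patterns, which is the point of reporting both.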

Process of missingness. Missing completely at random (MCAR) assumes that missing values are a simple random sample of all data values. Missing at random (MAR) assumes that missing values are a simple random sample of all data values within subclasses defined by observed data. Missing not at random (MNAR) assumes that missingness depends on the unobserved values themselves, even within those subclasses.

MCAR, MAR, MNAR. Let Y represent the data, which consists of Yobs (observed data) and Ymis (missing data). Let the missingness be described by a binary variable R, where R = 1 if a value is missing and 0 otherwise. A simple way of describing the missingness is then to ask how the probability P(R=1|Y) depends on the data Y. In MCAR the probability does not depend on Y at all. In MAR we assume the probability can be evaluated using Yobs alone; Ymis is not needed. In MNAR we need both Yobs and Ymis to evaluate the probability.
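The three mechanisms can be made concrete with a small simulation. The sketch below is only an illustration under assumed settings: the variables `x` (always observed) and `y` (sometimes missing), the missingness probabilities and the sample size are all invented for this example. It reports how far the mean of the observed `y` values drifts from the mean of all `y` values.

```python
import random


def observed_mean_gap(mechanism: str, n: int = 20000, seed: int = 1) -> float:
    """Mean of the observed y values minus the mean of all y values."""
    rng = random.Random(seed)
    all_y, observed_y = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)      # fully observed covariate (part of Yobs)
        y = x + rng.gauss(0.0, 1.0)  # variable subject to missingness
        if mechanism == "MCAR":
            p_missing = 0.3                    # ignores the data entirely
        elif mechanism == "MAR":
            p_missing = 0.5 if x > 0 else 0.1  # depends on observed x only
        elif mechanism == "MNAR":
            p_missing = 0.5 if y > 0 else 0.1  # depends on y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        r = 1 if rng.random() < p_missing else 0  # R = 1 means missing
        all_y.append(y)
        if r == 0:
            observed_y.append(y)
    return sum(observed_y) / len(observed_y) - sum(all_y) / len(all_y)


for mechanism in ("MCAR", "MAR", "MNAR"):
    print(mechanism, round(observed_mean_gap(mechanism), 3))
```

Under MCAR the gap should be close to zero. Under MAR and MNAR the unadjusted mean is biased. The difference is that the MAR bias can be removed by conditioning on `x`, while the MNAR bias cannot be removed using observed data alone.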

Dick’s menagerie: the Ostrich, the Hawk, the Cuckoo, the Owl, the Pussy Cat, the Wolf.

The Ostrich, aka listwise deletion. Ignores missingness, i.e. assumes MCAR and drops all cases with missing values. The Hawk, aka ad hoc methods. The ad hoc methods in use are pairwise deletion, mean substitution and last value carried forward.
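To show what these simple strategies actually do to a dataset, here is a minimal Python sketch of listwise deletion, mean substitution and last value carried forward. The toy records, the function names and the use of `None` for a missing value are assumptions for illustration only. Pairwise deletion is left out because it applies per analysis rather than per dataset.

```python
from statistics import mean
from typing import List, Optional

Record = List[Optional[float]]


def listwise_delete(records: List[Record]) -> List[Record]:
    """Keep only the cases with no missing values at all."""
    return [rec for rec in records if all(v is not None for v in rec)]


def mean_substitute(records: List[Record]) -> List[Record]:
    """Replace each missing value with the mean of its column's observed values."""
    n_cols = len(records[0])
    col_means = [
        mean(rec[j] for rec in records if rec[j] is not None)
        for j in range(n_cols)
    ]
    return [
        [col_means[j] if v is None else v for j, v in enumerate(rec)]
        for rec in records
    ]


def last_value_carried_forward(record: Record) -> Record:
    """Fill each missing value with the most recent earlier observed value."""
    filled: Record = []
    last: Optional[float] = None
    for value in record:
        if value is not None:
            last = value
        filled.append(last)  # stays None until a first value has been seen
    return filled


# Hypothetical toy data: 4 cases measured at two waves.
records = [[10.0, 12.0], [20.0, None], [None, 18.0], [30.0, 30.0]]

print(listwise_delete(records))
print(mean_substitute(records))
print(last_value_carried_forward([5.0, None, None, 7.0, None]))
```

Listwise deletion halves this toy sample, and mean substitution pulls the filled values towards the centre of each column, which understates variability.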

The Cuckoo, aka hot decking. Like the cuckoo, hot decking ‘steals’ from other complete records to replace missing records. The choice of the complete record is based on a set of observed variables, so that the complete and the missing records are as similar as possible. Substituting from an adjacent record is a very simple application of this principle, on the assumption that adjacent records will be very similar.
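A stratified hot deck can be sketched in a few lines of Python. This is a simplified illustration, not the procedure used in the talk: the record layout, the `hot_deck` function and the toy CASP-19 scores are hypothetical, and the donor is drawn at random from complete records in the same stratum.

```python
import random
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def hot_deck(records: List[Record], target: str,
             strata: Sequence[str], seed: int = 0) -> List[Record]:
    """Fill missing `target` values from a random donor in the same stratum."""
    rng = random.Random(seed)
    # Pool the observed target values by stratum (e.g. age group and sex).
    donors: Dict[tuple, List[Any]] = {}
    for rec in records:
        if rec[target] is not None:
            key = tuple(rec[s] for s in strata)
            donors.setdefault(key, []).append(rec[target])
    completed = []
    for rec in records:
        new = dict(rec)  # copy, so the input records are not modified
        if new[target] is None:
            pool = donors.get(tuple(rec[s] for s in strata))
            if pool:  # leave the value missing if the stratum has no donor
                new[target] = rng.choice(pool)
        completed.append(new)
    return completed


# Hypothetical toy records.
records = [
    {"agegr": 1, "sex": 1, "casp19": 40.0},
    {"agegr": 1, "sex": 1, "casp19": 44.0},
    {"agegr": 1, "sex": 1, "casp19": None},
    {"agegr": 2, "sex": 2, "casp19": 35.0},
    {"agegr": 2, "sex": 2, "casp19": None},
    {"agegr": 3, "sex": 1, "casp19": None},
]

for rec in hot_deck(records, target="casp19", strata=("agegr", "sex")):
    print(rec)
```

The last toy record has no complete donor in its stratum, so it stays missing. A real application would need a rule for that case, such as coarsening the strata.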

The Owl, aka multiple imputation. Works with standard complete-data analysis methods. One set of imputations may be used for many analyses. Can be highly efficient.

Efficiency = 1 / (1 + (proportion missing / number of imputations))
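The efficiency formula is easy to tabulate. The sketch below plugs in the proportion missing from the sample dataset (Lambda = 23.4%, i.e. 0.234) for a few numbers of imputations; the function name is an assumption for this example. Rubin's formula is usually stated in terms of the fraction of missing information, for which the proportion missing serves here as a stand-in.

```python
def relative_efficiency(proportion_missing: float, n_imputations: int) -> float:
    """Efficiency relative to an infinite number of imputations."""
    return 1.0 / (1.0 + proportion_missing / n_imputations)


for m in (1, 3, 5, 10, 20):
    print(m, round(relative_efficiency(0.234, m), 3))
```

With about a quarter of the values missing, five imputations already give an efficiency above 95%, which is why small values of m are often considered adequate.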

Rubin’s rules for combining estimates. Point estimate: the average of the point estimates from each imputed sample. Variance estimate: the average within-imputation variance plus the between-imputation variance, the latter inflated by a factor equal to (1 + (1 / number of imputations)).
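The combining rules translate directly into code. The following is a minimal sketch for a single coefficient; the function name and the five toy estimates and variances are invented for illustration and are not results from the talk.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float]:
    """Combine per-imputation estimates and their squared standard errors."""
    m = len(estimates)
    point = mean(estimates)        # average of the point estimates
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return point, total


# Hypothetical results from m = 5 imputed datasets.
estimates = [2.0, 2.2, 1.8, 2.1, 1.9]
variances = [0.04, 0.04, 0.04, 0.04, 0.04]

point, total_variance = pool_rubin(estimates, variances)
print(round(point, 3), round(total_variance, 3), round(total_variance ** 0.5, 3))
```

The pooled standard error is the square root of the total variance. It is larger than any single within-imputation standard error because it also carries the uncertainty due to the missing values.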

The Pussy Cat: modelling (Heckman two-step procedure). What is modelled? The probability of having a missing value, based on fully observed characteristics (e.g. age, sex, socio-economic status), AND the model of interest (e.g. predictors of CASP-19).

Equations. Step 1: P(R=1) = f(age, sex, ses). Step 2: CASP-19 = f(age, sex, financial situation, social network, P(R=1)).
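For readers who want the two steps in more explicit form, the usual textbook statement of the Heckman two-step estimator is sketched below. The notation is an assumption and does not come from the slides: S = 1 - R marks a case whose CASP-19 score is observed, z collects the step 1 variables, x collects the step 2 predictors, and the term that carries the step 1 information into step 2 is the inverse Mills ratio rather than the raw probability. The function λ(·) here is unrelated to the Lambda used for the percentage of missingness.

```latex
% Assumed notation (not in the slides):
%   S_i = 1 - R_i              indicator that CASP-19 is observed for case i
%   z_i = (1, age, sex, ses)   step 1 variables
%   x_i = (1, age, sex, financial situation, social network)   step 2 predictors
\begin{align}
  \text{Step 1 (probit):}\quad
    & P(S_i = 1 \mid z_i) = \Phi(z_i^{\top}\gamma) \\
  \text{Step 2 (observed cases only):}\quad
    & E[\text{CASP-19}_i \mid x_i, S_i = 1]
      = x_i^{\top}\beta + \rho\,\sigma\,\lambda(z_i^{\top}\gamma),
    \qquad \lambda(u) = \frac{\phi(u)}{\Phi(u)}
\end{align}
```

In practice the estimate of γ from step 1 is plugged into λ(·) before fitting step 2. Here ρ is the correlation between the error terms of the two equations and σ is the standard deviation of the step 2 error, so a test of ρ = 0 is a test of whether the selection term is needed.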

Strengths and weaknesses. Strength: useful for sensitivity analysis. If the error terms in step 1 and step 2 are significantly correlated, then MNAR should be considered. Weakness: full information is needed on the variables in step 1.

Setting up the illustration in Stata. Listwise deletion: default. Hot deck: single imputation. Multiple imputation: m = 5. Heckman: ML.

Comparison of results from different methods used to manage missingness. Significant coefficients are emboldened. Hot deck: stratification by agegr and sex. Heckman sample selection equation = -0.08 agegr + 0.06 sex - 0.12 ses. Rho (the correlation of the error terms in the selection and substantive equations) is significantly different from 0 (p < 0.0001), so MNAR is to be considered.

Advice. Don’t be an Ostrich. Ignore the Hawk. Be the Cuckoo if Lambda is small; otherwise, use the Owl. Always stroke the Pussy Cat. Await the Wolf.