
Missing Data.. What do we mean by missing data? Missing observations which were intended to be collected but: –Never collected –Lost accidently –Wrongly.


1 Missing Data.

2 What do we mean by missing data? Missing observations which were intended to be collected but were: –Never collected –Lost accidentally –Wrongly collected, so deleted. Can affect outcomes and/or explanatory variables

3 Effect of Missing Data Can cause: –Biased estimates (means, regression parameters) –Biased standard errors, resulting in incorrect P-values and confidence intervals

4 Missing data mechanism 1. Missing Completely At Random: MCAR –Missingness does not depend on observed or unobserved values –E.g. FBC missing because a tube of blood is accidentally broken –BP missing due to a broken machine

5 Missing data mechanism 2. Missing At Random: MAR –Missingness depends on observed data, but not on unobserved data –E.g. 18-25 year olds are less likely to respond to a follow-up postal questionnaire (they are more likely to change address several times)

6 Missing data mechanism 3. Missing Not At Random: MNAR –Given all available observed information, the probability of being missing still depends on the unobserved data –E.g. a patient misses an appointment because they feel ill, and the illness (e.g. flu) is related to the measurement that was to be made (e.g. temperature)
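The three mechanisms can be made concrete with a small simulation. The sketch below is illustrative only: the variables `x` (always observed) and `y` (sometimes missing), the logistic missingness model and the 30% MCAR rate are all assumptions, not part of the study in these slides. It compares the true mean of `y` with the mean among records where `y` was observed.

```python
import math
import random
import statistics


def simulate(mechanism, n=20000, seed=1):
    """Return (mean of y over all records, mean of y over observed records).

    x is always observed; y may be missing. All numbers are illustrative.
    """
    rng = random.Random(seed)
    all_y, observed_y = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = x + rng.gauss(0, 1)  # y is correlated with x
        if mechanism == "MCAR":
            p_missing = 0.3  # constant: ignores both x and y
        elif mechanism == "MAR":
            p_missing = 1 / (1 + math.exp(-x))  # depends on observed x only
        elif mechanism == "MNAR":
            p_missing = 1 / (1 + math.exp(-y))  # depends on the value that goes missing
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        all_y.append(y)
        if rng.random() >= p_missing:
            observed_y.append(y)
    return statistics.fmean(all_y), statistics.fmean(observed_y)


for mechanism in ("MCAR", "MAR", "MNAR"):
    true_mean, complete_case_mean = simulate(mechanism)
    print(mechanism, round(true_mean, 3), round(complete_case_mean, 3))
```

In this setup the complete-case mean is close to the truth only under MCAR. Under MAR the bias can in principle be removed by using `x`; under MNAR it cannot be removed with the observed data alone.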

7 The Assumptions –Cannot tell from the data at hand whether the missing values are MAR or MNAR –Can distinguish between MCAR and MAR –MAR can be made more plausible by looking at associations between missingness and the observed values of the explanatory variables
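One simple way to look for an association between missingness and an observed variable is to compare that variable between records with and without the missing value. The sketch below uses a Welch t-statistic on simulated data; the variable names, the simulated data and the helper functions are hypothetical. A large statistic is evidence against MCAR, while a small one does not prove MCAR and says nothing about MAR versus MNAR.

```python
import math
import random
import statistics


def welch_t(group_a, group_b):
    """Welch t-statistic for the difference in means of two samples."""
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return diff / se


def missingness_check(records):
    """records: list of (x1, x2) pairs, with x1 = None when missing.

    Compares the fully observed x2 between records with x1 missing
    and records with x1 observed.
    """
    x2_when_missing = [x2 for x1, x2 in records if x1 is None]
    x2_when_observed = [x2 for x1, x2 in records if x1 is not None]
    return welch_t(x2_when_missing, x2_when_observed)


def make_records(depends_on_x2, n=2000, seed=3):
    """Simulate illustrative data in which x1 is sometimes missing."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        x2 = rng.gauss(0, 1)
        x1 = x2 + rng.gauss(0, 1)
        p_missing = 1 / (1 + math.exp(-x2)) if depends_on_x2 else 0.3
        records.append((None if rng.random() < p_missing else x1, x2))
    return records


t_mcar = missingness_check(make_records(depends_on_x2=False))
t_mar = missingness_check(make_records(depends_on_x2=True))
print(round(t_mcar, 2), round(t_mar, 2))
```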

8 Simple methods to handle missing data –Complete Case (CC) analysis –Mean imputation –Regression imputation –Stochastic imputation Problem: single imputation treats filled-in values as if they had been observed, which makes results look too certain
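The "too certain" problem can be seen with mean imputation. In the sketch below (simulated values, with an assumed 30% deleted completely at random), filling the gaps with the observed mean leaves the mean unchanged but shrinks the standard deviation, and the apparent standard error shrinks further because the sample looks larger than it really is.

```python
import random
import statistics

rng = random.Random(0)

# Illustrative complete data, then delete about 30% completely at random.
complete = [rng.gauss(50, 10) for _ in range(500)]
observed = [value for value in complete if rng.random() >= 0.3]
n_missing = len(complete) - len(observed)

# Mean imputation: every missing value is replaced by the observed mean.
observed_mean = statistics.fmean(observed)
mean_imputed = observed + [observed_mean] * n_missing

sd_observed = statistics.stdev(observed)
sd_imputed = statistics.stdev(mean_imputed)

# Naive standard errors of the mean, treating imputed values as real data.
se_observed = sd_observed / len(observed) ** 0.5
se_imputed = sd_imputed / len(mean_imputed) ** 0.5

print(round(sd_observed, 2), round(sd_imputed, 2))
print(round(se_observed, 3), round(se_imputed, 3))
```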

9 Multiple Imputation (MI) –Under the MAR assumption, gives less biased estimates and SEs than CC analysis –Covers many different data structures –Never the absolute best thing to do

10 Multiple Imputation (MI)

Data with missing values ("." marks a missing x1, shown as "?" on the slide):

  ID     x1      x2
   1    32.4    204
   2      .     5.8
   3    26.7    308
   4    13.3    15.9
   5      .     10.4
   6    10.1    6.0

Imputed values shown on the slide for the missing x1 entries: 14.2, 6.8 and 12.2, 5.6

11 Key Idea behind Imputation –Express our uncertainty about the missing data by creating ‘m’ imputed data sets –Analyse each of these in the usual way –Combine the estimates using particular rules (Rubin’s rules)
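The combining step can be sketched as follows. Rubin's rules average the m point estimates, then add the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The function name and the example numbers are made up for illustration; for hazard ratios the pooling would normally be done on the log scale.

```python
import statistics


def rubins_rules(estimates, variances):
    """Pool m estimates and their squared standard errors.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates, each with a variance")
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)      # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# Hypothetical results from m = 3 imputed data sets.
estimate, standard_error = rubins_rules([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(estimate, standard_error)
```

The pooled standard error is larger than any single-data-set standard error would suggest whenever the estimates disagree across imputations, which is how the uncertainty about the missing values enters the final result.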

12 Two variables: X1 and X2 –X1 missing in some records –X2 not missing, observed in every unit –Learn the relationship between X1 and X2 –Complete the data set by drawing the missing observations from X1 | X2
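A minimal sketch of this two-variable case is given below, assuming a linear relationship with normal errors. It is not the algorithm used by any particular package: to let the fitted relationship vary between imputations it refits the line on a bootstrap sample of the complete cases each time, which is one simple approximation. The simulated data and function names are hypothetical.

```python
import random
import statistics


def fit_line(x2, x1):
    """Least-squares fit of x1 = a + b * x2; returns (a, b, residual sd)."""
    mean_x2, mean_x1 = statistics.fmean(x2), statistics.fmean(x1)
    sxx = sum((u - mean_x2) ** 2 for u in x2)
    sxy = sum((u - mean_x2) * (v - mean_x1) for u, v in zip(x2, x1))
    b = sxy / sxx
    a = mean_x1 - b * mean_x2
    residuals = [v - (a + b * u) for u, v in zip(x2, x1)]
    sd = (sum(r * r for r in residuals) / (len(x1) - 2)) ** 0.5
    return a, b, sd


def impute(records, m=5, seed=0):
    """records: list of (x1, x2) pairs, with x1 = None when missing.

    Returns m completed copies of the data set.
    """
    rng = random.Random(seed)
    complete = [(x1, x2) for x1, x2 in records if x1 is not None]
    datasets = []
    for _ in range(m):
        # Bootstrap the complete cases so each imputation uses a different fitted line.
        boot = [rng.choice(complete) for _ in complete]
        a, b, sd = fit_line([r[1] for r in boot], [r[0] for r in boot])
        # Draw each missing x1 from the fitted distribution of X1 given X2.
        filled = [(x1 if x1 is not None else a + b * x2 + rng.gauss(0, sd), x2)
                  for x1, x2 in records]
        datasets.append(filled)
    return datasets


# Illustrative data: x1 depends on x2 and is missing in about 30% of records.
rng = random.Random(42)
records = []
for _ in range(200):
    x2 = rng.gauss(0, 1)
    x1 = 2 + 3 * x2 + rng.gauss(0, 1)
    records.append((None if rng.random() < 0.3 else x1, x2))

datasets = impute(records, m=5)
```

Each completed data set keeps the observed values unchanged and differs only in the drawn values, so the spread across data sets reflects the uncertainty about what was missing.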

13 Example 1 Longitudinal Breast Cancer study –Outcome: Early death or disease recurrence –Explanatory variables: age, meno, tam –Analysis: Cox regression

14 How much is missing?

  variables with no mv's: id meno rectime censrec _st _d _t _t0 lnt

  Variable     | type     obs   mv   variable label
  -------------+------------------------------------------------
  age          | float    554  132   age, years
  tam          | byte     557  129   hormonal therapy
  --------------------------------------------------------------
  N: 686

15 CC Analysis

  Cox regression -- Breslow method for ties

  No. of subjects =          452               Number of obs   =        452
  No. of failures =          193
  Time at risk    =  1412.848734
                                               LR chi2(3)      =       5.15
  Log likelihood  =   -1073.9288               Prob > chi2     =     0.1613

  ------------------------------------------------------------------------------
            _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
  -------------+----------------------------------------------------------------
           age |    .993877   .0108284    -0.56   0.573     .9728787    1.015328
           tam |    .723719   .1162513    -2.01   0.044      .528252     .991514
          meno |   1.312512   .2877824     1.24   0.215       .85402    2.017151
  ------------------------------------------------------------------------------

16 MI in Practice STATA: ICE –Multiple Imputation by Chained Equations (MICE) –Univariate imputation: uvis –Multivariate imputation: ice

17 [Figure: density plots with x-axis Age (years), graphs by agemiss (panels 0 and 1)]

18 MI Analysis

  mim: stcox age tam meno

  Multiple-imputation estimates (stcox)          Imputations =        5
                                                 Minimum obs =      686
                                                 Minimum dof =     69.9

  ------------------------------------------------------------------------------
            _t | Haz. Rat.  Std. Err.      t    P>|t|    [95% Conf. Int.]    FMI
  -------------+----------------------------------------------------------------
           age |   .985514   .010088    -1.43   0.158   .965598   1.00584  0.247
           tam |   .724898   .101434    -2.30   0.023    .54933   .956578  0.191
          meno |   1.42128   .276051     1.81   0.072   .968226   2.08633  0.160
  ------------------------------------------------------------------------------

19 Summary –Most studies will have missing data –MI is suitable: it gives less biased estimates and SEs under MAR and MCAR –MI is a useful tool for dealing with missing data

