A REVIEW By Chi-Ming Kam and Surajit Ray, April 23, 2001.

1 A REVIEW By Chi-Ming Kam and Surajit Ray, April 23, 2001

2 Imputation Techniques Implemented in SOLAS 3.0 SINGLE IMPUTATION: Hot Decking; Predicted Mean Imputation; Last Value Carried Forward. MULTIPLE IMPUTATION: Propensity Score Based Imputation; Predictive Model Based Imputation.

3 Method 1: Propensity Score Based Imputation This was the only method in Version 1. The method is similar to Lavori, Dawson, Shea (1995), "A multiple imputation strategy for clinical trials with truncation of patient data". GOAL: To impute missing values with minimal distributional assumptions.

4 How it Works Let R be the indicator for the missingness pattern (R = 0 or 1). [Data layout: one row per case, with columns $X_1, X_2, \dots, X_P$, the outcome Y (missing entries shown as "?"), and the indicator R.] Model R from $X_1, X_2, \dots, X_P$ using logistic regression: $p = \Pr(R = 1 \mid X_1, X_2, \dots, X_P)$ for each case, yielding N values $p_i$.
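
The propensity-score step on this slide can be sketched in a few lines of standard-library Python. This is only an illustration, not SOLAS's implementation: the function names (`sigmoid`, `fit_logistic`, `propensity_scores`), the plain gradient-ascent fitting routine, and the simulated data are all assumptions made for the example, and R is used exactly as the slide's 0/1 indicator.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def linear_predictor(beta, row):
    # beta[0] is the intercept; beta[1:] pair up with the covariates in row.
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

def fit_logistic(X, r, steps=500, lr=0.5):
    """Fit P(R=1 | X) by gradient ascent on the mean log-likelihood.

    X is a list of covariate rows, r a list of 0/1 indicators.
    (A hypothetical helper; real software would use a dedicated solver.)
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for row, ri in zip(X, r):
            err = ri - sigmoid(linear_predictor(beta, row))
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    # One fitted probability p_i per case.
    return [sigmoid(linear_predictor(beta, row)) for row in X]

# Simulated example (an assumption): R depends on X_1 but not on X_2.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
r = [1 if rng.random() < sigmoid(0.5 + 1.0 * row[0]) else 0 for row in X]

beta = fit_logistic(X, r)
p = propensity_scores(X, beta)
```

The list `p` holds the N fitted values $p_i$ that the next slides group into quintiles.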

5 How it works… (Approximate Bayesian bootstrap, Rubin, 1987) Group the units (the grouping is user specified) by the quintiles of p. Suppose that within a particular group there are $n_1$ observed and $n_0$ missing values.
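
As a rough sketch of the grouping step, the snippet below assigns each unit to one of five groups by the rank of its propensity score. The function name `quintile_groups` and the rank-based rule (equal-sized groups, ties broken by input order) are assumptions for illustration; the slide does not say how SOLAS handles ties or boundaries.

```python
def quintile_groups(p, n_groups=5):
    """Return a group index 0..n_groups-1 for each propensity score in p."""
    # Sort unit indices by propensity score, then cut the ranking into equal parts.
    order = sorted(range(len(p)), key=lambda i: p[i])
    groups = [0] * len(p)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(p)
    return groups

# Made-up propensity scores for ten units.
p = [0.91, 0.15, 0.42, 0.67, 0.08, 0.55, 0.73, 0.29, 0.84, 0.36]
groups = quintile_groups(p)
```

Within each resulting group, the units with observed Y serve as the donor pool for the units with missing Y.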

6 Sample $n_1 + n_0$ units with replacement from the observed values. From the sampled pool, subsample $n_0$ units with replacement. Use these $n_0$ units as the imputed values for the $n_0$ missing values. Repeat the procedure m times to get m imputations. [Diagram: $n_1$ observed values, resampled with replacement into a pool of $n_0 + n_1$, resampled with replacement into $n_0$ imputed values.]
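
The two-stage resampling described on this slide can be sketched as follows. The pool size follows the slide ($n_1 + n_0$); the names `abb_impute` and `multiply_impute`, the use of `None` to mark a missing value, and the `pool_size` parameter (a hypothetical knob for trying other pool sizes) are assumptions made for the example.

```python
import random

def abb_impute(observed, n_missing, rng, pool_size=None):
    """One approximate-Bayesian-bootstrap draw within a single group."""
    if not observed:
        raise ValueError("group has no observed values to draw from")
    if pool_size is None:
        pool_size = len(observed) + n_missing      # n_1 + n_0, as on the slide
    pool = rng.choices(observed, k=pool_size)      # stage 1: resample with replacement
    return rng.choices(pool, k=n_missing)          # stage 2: subsample with replacement

def multiply_impute(y, groups, m, seed=0):
    """Return m completed copies of y, imputing separately within each group."""
    rng = random.Random(seed)
    completed = []
    for _ in range(m):
        y_new = list(y)
        for g in sorted(set(groups)):
            idx = [i for i, gi in enumerate(groups) if gi == g]
            obs = [y[i] for i in idx if y[i] is not None]
            mis = [i for i in idx if y[i] is None]
            if mis:
                for i, value in zip(mis, abb_impute(obs, len(mis), rng)):
                    y_new[i] = value
        completed.append(y_new)
    return completed

# Toy data: two propensity groups, three missing values in total.
y = [1.2, None, 0.7, 3.1, None, 2.8, None, 3.5]
groups = [0, 0, 0, 1, 1, 1, 1, 1]
imputations = multiply_impute(y, groups, m=5)
```

Each element of `imputations` is one completed dataset; imputed values are always copies of observed values from the same group, which is why no distributional model for Y is needed.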

7 Theoretical Justification It produces an imputed distribution of Y that has been corrected for biases due to missingness related to X. It is similar in spirit to reweighting, but here we have a multiple imputation version of it. The method produces unbiased estimates for the marginal distribution of Y.

8 Problems/Drawbacks The method does not preserve the association between Y and the individual $X_i$'s. Reasoning: The only aspect of the $X_i$'s that is used here is the linear predictor ($\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$) in the logistic model. This is the function that predicts the missingness of Y (R) but not Y itself.

9 Problems/Drawbacks (Continued…) Suppose $X_1$ is highly correlated with Y but is unrelated to P(R=1). $X_1$ will drop out of the logistic model and will not be used in the imputation. As a result, the model will misrepresent the correlation of $X_1$ and Y. Also, by not using $X_1$ in the imputation, we are failing to impute Y efficiently.
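
A small simulation can make this drawback concrete. It is a simplified sketch under stated assumptions, not a reproduction of SOLAS: missingness is generated independently of $X_1$, so all units would share roughly the same propensity and the method reduces to drawing donors at random from the observed Y values (a single group, one simple draw per missing value). The data-generating model, sample size, and names such as `corr` are made up for the example.

```python
import math
import random

def corr(a, b):
    # Pearson correlation, written out to stay within the standard library.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

rng = random.Random(1)
n = 2000
x1 = [rng.gauss(0, 1) for _ in range(n)]
y_full = [x + rng.gauss(0, 0.5) for x in x1]        # Y strongly related to X_1
missing = [rng.random() < 0.5 for _ in range(n)]    # missingness unrelated to X_1

# Donor-based imputation that ignores X_1 entirely.
donors = [y for y, mis in zip(y_full, missing) if not mis]
y_imputed = [rng.choice(donors) if mis else y for y, mis in zip(y_full, missing)]

corr_full = corr(x1, y_full)        # correlation before any values are deleted
corr_imputed = corr(x1, y_imputed)  # correlation in the completed data
```

Because the imputed Y values carry no information about $X_1$, `corr_imputed` comes out well below `corr_full`, which is the attenuation the slide warns about.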

10 Simulation Results Using SOLAS 1.1 Data Generation Mechanism: $Y = X + Z + \varepsilon$, where [...] and $\varepsilon \sim N(0, 1)$. Source: Paul D. Allison, "Multiple Imputation for Missing Data: A Cautionary Tale"

11 Some Comments About the Propensity Score Based Method The method can provide valid but possibly inefficient inferences about the marginal distribution of Y. The method can lead to very misleading inferences about the relationships between Y and other variables.

12 Method 2: Predictive Model Based Multiple Imputation This method is implemented in SOLAS 2.0 and 3.0. HOW IT WORKS: Regress Y on $X_1, X_2, \dots, X_p$. Get the estimates of $\beta_0, \beta_1, \beta_2, \dots, \beta_p$ and $\sigma$. Draw $\beta_0^*, \beta_1^*, \beta_2^*, \dots, \beta_p^*, \sigma^*$ from an approximate posterior distribution. Impute $Y^* = \beta_0^* + \beta_1^* X_1 + \beta_2^* X_2 + \dots + \beta_p^* X_p + \varepsilon^*$, where $\varepsilon^* \sim \text{Normal}(0, \sigma^*)$. Repeat m times to get the m imputed datasets.
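
The steps on this slide can be sketched for the simplest case of a single predictor. This is an illustration under assumptions rather than SOLAS's algorithm: it uses the textbook approximate posterior for a normal linear regression (residual variance drawn as RSS divided by a chi-square variate, coefficients drawn around their least-squares estimates), treats $\sigma^*$ as a standard deviation, centers X so that intercept and slope can be drawn independently, and marks missing Y with `None`. The names `impute_once` and `predictive_mi` and the simulated data are hypothetical.

```python
import math
import random

def impute_once(x, y, rng):
    """Return one completed copy of y using a regression of y on x."""
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(obs)
    if n < 3:
        raise ValueError("need at least three observed cases")
    xbar = sum(xi for xi, _ in obs) / n
    ybar = sum(yi for _, yi in obs) / n
    sxx = sum((xi - xbar) ** 2 for xi, _ in obs)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in obs) / sxx
    rss = sum((yi - ybar - slope * (xi - xbar)) ** 2 for xi, yi in obs)
    df = n - 2
    # Draw sigma*: RSS divided by a chi-square(df) variate (Gamma(df/2, scale 2)).
    sigma_star = math.sqrt(rss / rng.gammavariate(df / 2.0, 2.0))
    # Draw the coefficients around their estimates; with centered x the
    # intercept (at the mean of x) and the slope are independent.
    a_star = ybar + sigma_star * rng.gauss(0, 1) / math.sqrt(n)
    b_star = slope + sigma_star * rng.gauss(0, 1) / math.sqrt(sxx)
    # Impute Y* = a* + b* (x - xbar) + eps*, eps* ~ Normal(0, sd = sigma*).
    return [
        yi if yi is not None
        else a_star + b_star * (xi - xbar) + sigma_star * rng.gauss(0, 1)
        for xi, yi in zip(x, y)
    ]

def predictive_mi(x, y, m, seed=0):
    """Repeat the draw-and-impute step m times."""
    rng = random.Random(seed)
    return [impute_once(x, y, rng) for _ in range(m)]

# Simulated data (an assumption): Y = 2 + 1.5 X + noise, about 40% of Y missing.
data_rng = random.Random(42)
x = [data_rng.gauss(0, 1) for _ in range(500)]
y = [2.0 + 1.5 * xi + data_rng.gauss(0, 1) for xi in x]
y = [None if data_rng.random() < 0.4 else yi for yi in y]

imputations = predictive_mi(x, y, m=5)
```

Redrawing the parameters for every imputation, rather than reusing the point estimates, is what lets the m datasets reflect uncertainty about the regression itself as well as the residual noise.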

13 Good points The method provides correct model-based MI under the regression model and MAR. It also preserves the correlation between the $X_i$'s and Y. What is the difference from NORM? NORM does the same thing with MCMC. Under the multivariate normal model, both methods give the same results.

14 Which Software is More General? "I work for arbitrary missingness patterns." "I work for non-linear relations of Y on X." "But that's probably very similar to NORM with rounding."

15 Concluding Remarks SOLAS is the first commercial missing data software. It has a good graphical interface. Data import from and export to other software is easy. It performs well under a monotone missingness pattern. Estimates are not always unbiased.

