
Missing data – issues and extensions
For multilevel data we need to impute missing data for variables defined at higher levels. We need a valid procedure for discrete variables. It is useful to include sampling weights. Can we deal with partially missing data?

Consider the imputation stage with a set of multivariate responses. We illustrate first with a simple model in which the joint distribution of the responses is multivariate normal (MVN) and there are responses at two levels. To illustrate how such a model is specified, consider repeated measures of children's heights: the repeated measurements are at level 1, and the level 2 response is the child's adult height.

Child heights + adult height
Child height is modelled as a cubic polynomial in age, with the intercept and slope random at level 2. Both of these random effects are correlated with the adult height random effect, giving a 3-variate normal distribution at level 2. This allows us to model level 1 and level 2 variables jointly when either has missing data (see Goldstein and Kounali, JRSSA, 2009).
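One way to write the model just described, as a sketch: the symbols are ours, not the slide's. Here t_ij is assumed to be the age at the i-th measurement of child j, h_ij the height at that age, and y_j the adult height. The covariances in Omega_u between u_2j and (u_0j, u_1j) are what link the level 1 growth curve to the level 2 response, so a missing value at either level can be imputed from the other.

```latex
\begin{aligned}
h_{ij} &= \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 t_{ij}^3
          + u_{0j} + u_{1j} t_{ij} + e_{ij}, \\
y_j    &= \alpha + u_{2j}, \\
(u_{0j}, u_{1j}, u_{2j})^{\top} &\sim N(\mathbf{0}, \Omega_u), \qquad
e_{ij} \sim N(0, \sigma_e^2).
\end{aligned}
```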

Results:
If data are missing at either level 1 or level 2, they are imputed by the MCMC algorithm.
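Within each MCMC iteration, the missing components of an MVN vector can be drawn from their normal distribution conditional on the observed components. The sketch below shows that standard conditional-normal step; it is an illustration, not the MLwiN/REALCOM code. The function name impute_mvn_missing and the use of NaN to mark missing entries are our assumptions, and mu and sigma stand for the mean and covariance at the current iteration.

```python
import numpy as np


def impute_mvn_missing(x, mu, sigma, rng):
    """Draw the missing entries of one MVN vector given its observed entries.

    x: 1-D array with np.nan marking missing components.
    mu, sigma: current-iteration mean vector and covariance matrix.
    rng: a numpy.random.Generator.
    """
    x = np.asarray(x, dtype=float).copy()
    miss = np.isnan(x)
    if not miss.any():
        return x  # nothing to impute
    obs = ~miss
    s_mm = sigma[np.ix_(miss, miss)]
    if not obs.any():
        # Everything missing: draw from the marginal distribution
        x[miss] = rng.multivariate_normal(mu[miss], s_mm)
        return x
    s_mo = sigma[np.ix_(miss, obs)]
    s_oo = sigma[np.ix_(obs, obs)]
    # Regression of missing on observed components: S_mo S_oo^{-1}
    coef = np.linalg.solve(s_oo, s_mo.T).T
    cond_mean = mu[miss] + coef @ (x[obs] - mu[obs])
    cond_cov = s_mm - coef @ s_mo.T
    x[miss] = rng.multivariate_normal(cond_mean, cond_cov)
    return x


# Illustrative numbers: two level 2 random effects observed, adult height missing
rng = np.random.default_rng(1)
mu = np.array([0.0, 0.0, 170.0])
sigma = np.array([[4.0, 0.5, 3.0],
                  [0.5, 1.0, 0.8],
                  [3.0, 0.8, 36.0]])
draw = impute_mvn_missing(np.array([1.2, -0.3, np.nan]), mu, sigma, rng)
```

Solving against S_oo rather than inverting it is the usual numerically safer choice.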

Mixed response types
For ordered or unordered categorical data we can specify corresponding latent normal distributions. For an ordered response we can use a probit threshold model. In this model, the cumulative probability of being in one of the categories 1,…,s is a normal cumulative distribution function of a category threshold and the linear predictor. The associated latent normal model places the response in category s when the latent variable lies between the thresholds for categories s-1 and s. For a p-category unordered response we can define a latent (p-1)-variate normal distribution. We can define MCMC steps that sample, from the observed categorical responses, an underlying normal or MVN variable. Note that these draws are further conditioned on the remaining set of (correlated) normal variables. For details see Goldstein, H., Carpenter, J., Kenward, M. and Levin, K. (2009) Multilevel models with multivariate mixed response types. Statistical Modelling (to appear).
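For an ordered response, the MCMC step that samples the underlying normal value amounts to drawing from a normal distribution truncated to the interval between the two thresholds that bracket the observed category. A minimal sketch using only the Python standard library follows. The threshold convention (category s observed when thresholds[s-2] < z <= thresholds[s-1]), the unit latent variance, and the name sample_latent_ordered are assumptions for illustration. In the full model the mean would itself come from conditioning on the other correlated normal variables.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def sample_latent_ordered(category, thresholds, mean, rng):
    """Draw the latent N(mean, 1) value behind an observed ordered category.

    Assumed convention: categories are 1..S, thresholds holds S-1 sorted cut
    points, and category s means thresholds[s-2] < z <= thresholds[s-1],
    with -inf and +inf at the two ends.
    """
    cuts = [-math.inf, *thresholds, math.inf]
    lower, upper = cuts[category - 1], cuts[category]
    # Inverse-CDF sampling from the truncated normal
    p_lo = STD_NORMAL.cdf(lower - mean) if math.isfinite(lower) else 0.0
    p_hi = STD_NORMAL.cdf(upper - mean) if math.isfinite(upper) else 1.0
    u = p_lo + (p_hi - p_lo) * rng.random()
    # inv_cdf requires 0 < u < 1 strictly
    u = min(max(u, 1e-300), math.nextafter(1.0, 0.0))
    return mean + STD_NORMAL.inv_cdf(u)


rng = random.Random(7)
z = sample_latent_ordered(2, [-0.5, 0.7], mean=0.2, rng=rng)
```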

Imputation
So, with any mixture of categorical and normal variables at any level, we sample at each MCMC iteration an MVN set of variables that includes the imputed values. Imputation is therefore standard, and the reverse transformation is used to obtain imputed values on the categorical scales. For non-normal continuous data we can use, for example, a Box-Cox normalising transformation to sample a latent normal variable. Further extensions for Poisson and other discrete distributions are also available. Release 2.10 of MLwiN has a link to REALCOM that allows these extensions.
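The reverse transformations that map imputed latent normal values back to the observed scales can be sketched as below. The conventions are assumptions, not taken from the slides: the ordered case uses the same thresholds rule as the latent sampling step; the unordered case uses one common multinomial probit convention, in which category 1 is the reference with latent value fixed at 0 and the category with the largest latent value is the one observed; and the Box-Cox inverse assumes the usual one-parameter form. All function names are hypothetical.

```python
import bisect
import math


def ordered_from_latent(z, thresholds):
    """Ordered category 1..S, where category s means thresholds[s-2] < z <= thresholds[s-1]."""
    return bisect.bisect_left(thresholds, z) + 1


def unordered_from_latent(latent):
    """Unordered category 1..p from p-1 latent values for categories 2..p.

    Assumed convention: category 1 is the reference with latent value 0,
    and the category with the largest latent value is observed.
    """
    values = [0.0, *latent]
    return max(range(len(values)), key=values.__getitem__) + 1


def box_cox(x, lam):
    """One-parameter Box-Cox transform of a positive value x."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def inverse_box_cox(y, lam):
    """Map a (latent) normal-scale value back to the original scale."""
    if lam == 0:
        return math.exp(y)
    base = lam * y + 1.0
    if base <= 0:
        raise ValueError("value outside the range of the Box-Cox transform")
    return base ** (1.0 / lam)
```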

Partially observed (coarsened) data:
Where we have a prior (estimated) probability distribution (PD) for the value of a missing discrete (or continuous) variable, we simply insert an extra MCMC step. This step accepts the standard multiple imputation (MI) value with probability equal to the probability the PD gives that value. A corresponding step is used for normal data. This uses all of the data efficiently: no data are discarded, provided it is possible to assign a PD. Applications include record matching and rating scales with uncertain responses. Several completed data sets are produced and combined as in standard MI.
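For a discrete variable, the extra acceptance step can be sketched as a rejection sampler: propose a value from the standard MI step, then accept it with probability given by the record's PD. The slide does not say what happens after a rejection. Here we assume the candidate is simply redrawn, so accepted values follow a distribution proportional to (proposal probability) x (PD probability). The names coarsened_draw, propose and prior_pd are hypothetical.

```python
import random


def coarsened_draw(propose, prior_pd, rng, max_tries=10_000):
    """Draw an imputed category for a partially observed (coarsened) value.

    propose(rng) returns a candidate category from the standard MI step.
    prior_pd maps each category to its prior probability for this record.
    Assumption: a rejected candidate is redrawn until one is accepted.
    """
    for _ in range(max_tries):
        candidate = propose(rng)
        # Accept with probability equal to the PD for this candidate
        if rng.random() < prior_pd.get(candidate, 0.0):
            return candidate
    raise RuntimeError("no candidate accepted; check prior_pd against the proposals")


# Example: the MI step proposes A or B equally; the PD strongly favours A
rng = random.Random(3)
draws = [coarsened_draw(lambda r: r.choice(["A", "B"]), {"A": 0.9, "B": 0.1}, rng)
         for _ in range(20_000)]
```

With an equal-probability proposal, the accepted draws should be close to the PD itself (about 90% A).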

Sampling weights: briefly
Consider a 2-level model with level 2 weights and, for the j-th level 2 unit, conditional level 1 weights. The final level 1 weights are the product of the two. We use a variable based on the final level 1 weights as the level 1 random part explanatory variable, instead of the constant 1. This will be used both for imputation and for the model of interest (MOI). Work is ongoing to incorporate this into MLwiN-REALCOM.
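The slide's formula for the random part variable did not survive transcription. The sketch below is therefore a labelled assumption. It takes the final weight as the product of the level 2 weight and the conditional level 1 weight and rescales the final weights to mean 1. It then treats the level 1 variance as sigma_e^2 / w_ij, which gives the explanatory variable w_ij ** -0.5 in place of the constant 1. The helper name level1_random_part_variable is hypothetical.

```python
import math


def level1_random_part_variable(w2, w1_given_2):
    """Hypothetical helper: level 1 random part explanatory variable.

    w2: dict mapping level 2 unit j to its level 2 weight.
    w1_given_2: list of (j, level 1 weight within unit j) pairs.
    Assumptions: final weight w_ij = w_j * w_(i|j), rescaled to mean 1,
    and level 1 variance sigma_e^2 / w_ij, i.e. variable w_ij ** -0.5.
    """
    final = [w2[j] * w for j, w in w1_given_2]
    mean_w = sum(final) / len(final)
    return [1.0 / math.sqrt(w / mean_w) for w in final]


z = level1_random_part_variable({1: 2.0, 2: 1.0},
                                [(1, 1.0), (1, 3.0), (2, 2.0), (2, 2.0)])
```

Higher-weight units get a smaller random part variable, so their level 1 variance is reduced and they count for more.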