Latent Class Regression Model Graphical Diagnostics Using an MCMC Estimation Procedure Elizabeth S. Garrett Scott L. Zeger Johns Hopkins University


Overview
Latent class models can be useful tools for measuring latent constructs.
Latent class model checking is complicated because we cannot "check" model fit using standard approaches, which rely on comparing fitted values to observed values.
After fitting a latent class regression model, what can we do to see whether the key assumptions hold?
– Conditional independence?
– Non-differential measurement?

What is the association between depression and socio-economic status?
Epidemiologic Catchment Area (ECA) Study: N=1126 in 1993 in Baltimore
Symptoms (DSM-IV):
– dysphoria
– weight/appetite change
– sleep problems
– slow/increased movement
– loss of interest/pleasure
– fatigue
– guilt
– concentration problems
– thoughts of death
Covariates of interest:
– gender
– age
– marital status
– education
– income
How are education and income associated with depression?
From a standard LC model fit:
– The symptoms listed above define depression
– Depression is a latent class variable with 3 classes
– Classes are "ordered": None, Mild, Severe

Latent Class Regression Model: Main Ideas
There are J classes of individuals. p_j represents the proportion of individuals in the population in class j (j = 1, …, J).
Each person is a member of one of the J classes, but we do not know which. The latent class of individual i is denoted by c_i.
Symptom prevalences vary by class. The prevalence of symptom m in class j is denoted by π_mj.
We assume that covariates, x, are associated with class membership.
Given class membership, the symptoms are independent of each other (CONDITIONAL INDEPENDENCE).
Given class membership, the symptoms are independent of the covariates (NON-DIFFERENTIAL MEASUREMENT).
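The data-generating process above can be sketched in a few lines of code. This is an illustrative simulation, not the ECA analysis: all sizes and parameter values (beta, gamma, the prevalence grid) are invented.

```python
# Sketch of the latent class regression data-generating process.
# Hypothetical parameter values; only the model structure matches the slides.
import numpy as np

rng = np.random.default_rng(0)
N, J, M = 1000, 3, 9          # people, classes, symptoms

# Class membership depends on a covariate x via a multinomial logit
# (non-differential measurement: x affects class, not symptoms).
x = rng.normal(size=N)
beta = np.array([0.0, -0.5, -1.0])   # hypothetical class intercepts
gamma = np.array([0.0, 0.4, 0.8])    # hypothetical covariate effects
eta = beta + gamma * x[:, None]      # N x J linear predictors
p = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)

# Each person belongs to exactly one (unobserved) class c_i.
c = np.array([rng.choice(J, p=p_i) for p_i in p])

# Symptom prevalences pi[m, j] vary by class; given class, the M
# symptoms are independent Bernoulli draws (conditional independence).
pi = np.linspace(0.05, 0.9, J)[None, :] * np.ones((M, 1))
y = (rng.random((N, M)) < pi[:, c].T).astype(int)  # N x M binary symptoms
print(y.shape, np.bincount(c, minlength=J))
```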

Assumptions
Conditional Independence:
– given an individual's depression class, his/her symptoms are independent
– P(y_ig, y_ih | c_i) = P(y_ig | c_i) P(y_ih | c_i)
Non-differential Measurement:
– given an individual's depression class, covariates are not associated with symptoms
– P(y_ig | x_i, c_i) = P(y_ig | c_i)
Latent Class Regression Likelihood:
L(y | x) = ∏_i ∑_j p_j(x_i) ∏_m π_mj^{y_im} (1 − π_mj)^{1 − y_im}
where p_j(x_i) = P(c_i = j | x_i) is modeled by a multinomial logistic regression on the covariates.

Latent Class Regression Results
(Table of estimated symptom prevalences by class; numeric values omitted. Columns: Class 1: Non-depressed; Class 2: Mild depression; Class 3: Severe depression. Rows: dysphoria; loss of interest/pleasure; weight/appetite change; sleep problems; slow/increased movement; fatigue; guilt; concentration problems; thoughts of death; class size.)

Depression Example: LCR coefficients (log ORs)
(Table of covariate coefficients by class; numeric values omitted.)
* indicates significance at the 0.10 level.
Note: class 1 is non-depressed, class 2 is mild, class 3 is severe.

Checking the Conditional Independence Assumption
For each pair of symptoms (h and g), in each class (j), consider the within-class odds ratio
OR_gh|j = [P(y_g=1, y_h=1 | c=j) P(y_g=0, y_h=0 | c=j)] / [P(y_g=1, y_h=0 | c=j) P(y_g=0, y_h=1 | c=j)]
If the assumption holds, this OR will be approximately equal to 1 (the log OR approximately equal to 0).
Why might this get tricky?
– We don't KNOW class assignments.
– We need a strategy for assigning individuals to classes.
Checking the Non-differential Measurement Assumption
For each symptom (h), covariate (x), and class (j) combination, we can estimate an odds ratio. In the binary covariate case:
OR_hx|j = [P(y_h=1 | x=1, c=j) P(y_h=0 | x=0, c=j)] / [P(y_h=0 | x=1, c=j) P(y_h=1 | x=0, c=j)]
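A minimal sketch of the conditional-independence check for one symptom pair within one class: compute the log odds ratio from the 2x2 table. The data here are simulated, and the 0.5 continuity correction is an implementation choice (not from the slides) to guard against empty cells.

```python
# Log OR between two binary symptom vectors within an assigned class.
import numpy as np

def pairwise_log_or(y_g, y_h):
    """Log odds ratio from the 2x2 table, with a 0.5 continuity correction."""
    a = np.sum((y_g == 1) & (y_h == 1)) + 0.5
    b = np.sum((y_g == 1) & (y_h == 0)) + 0.5
    c = np.sum((y_g == 0) & (y_h == 1)) + 0.5
    d = np.sum((y_g == 0) & (y_h == 0)) + 0.5
    return np.log(a * d / (b * c))

# Symptoms drawn independently within a class give a log OR near 0,
# which is what the assumption predicts.
rng = np.random.default_rng(1)
y_g = rng.integers(0, 2, 5000)
y_h = rng.integers(0, 2, 5000)
print(round(pairwise_log_or(y_g, y_h), 3))
```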

Model Estimation: Markov Chain Monte Carlo Procedure (Bayesian Approach)
Quantify beliefs about p, π, and c before and after observing data.
Prior probability: what we believe about the unknown parameters before observing data.
Posterior probability: what we believe about the parameters after observing data.
Model specification:
– Specify the prior probability distribution: P(p, π, c)
– Combine the prior with the likelihood to obtain the posterior distribution: P(p, π, c | Y) ∝ P(p, π, c) × L(Y | p, π, c)
– Estimate the posterior distribution of each parameter using an iterative procedure, marginalizing over the others, e.g. P(p_1 | Y) = ∫ P(p, π, c | Y)

Bayesian Estimation Approach
The Gibbs sampler is an iterative process used to estimate the posterior distributions of the parameters.
– We sample each parameter from its conditional distribution given the others, e.g. P(p_1 | Y, p_2, …, p_J, c, π).
– At each iteration, we get 'sampled' values of p, π, and c.
– We use the samples from the iterations to estimate posterior distributions by averaging over the other parameter values. This is a key feature of these methods!
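One conceptual Gibbs step is the update for the class memberships: given current values of the class proportions p and the symptom prevalences π, draw each c_i from its full conditional P(c_i = j | y_i, p, π). This sketches the structure of that one step only, with toy parameter values; it is not the full sampler for the ECA model.

```python
# One Gibbs update for class memberships given p and pi.
import numpy as np

def sample_classes(y, p, pi, rng):
    """y: N x M binary; p: length-J proportions; pi: M x J prevalences."""
    # log P(y_i | c_i = j) under conditional independence
    loglik = y @ np.log(pi) + (1 - y) @ np.log(1 - pi)   # N x J
    logpost = loglik + np.log(p)                          # add log prior p_j
    w = np.exp(logpost - logpost.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                     # P(c_i = j | y_i)
    return np.array([rng.choice(len(p), p=wi) for wi in w])

rng = np.random.default_rng(2)
pi = np.array([[0.1, 0.9]] * 4)        # 4 symptoms, 2 well-separated classes
y = np.vstack([np.zeros((50, 4)), np.ones((50, 4))])
c = sample_classes(y, np.array([0.5, 0.5]), pi, rng)
print(c[:5], c[-5:])
```

With well-separated classes the symptom-free rows are almost always assigned to class 0 and the all-symptom rows to class 1; in the real sampler these draws vary from iteration to iteration, which is exactly what the diagnostics below exploit.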

Checking Assumptions: MCMC (Bayesian) Approach
At each iteration of the Gibbs sampler, individuals are automatically assigned to classes, so there is no need to assign them "manually."
At each iteration, simply calculate the log ORs of interest. Then "marginalize," i.e. average over all iterations.
The result is the posterior distribution of each log OR.
From the posterior distribution, we have both a point estimate and a precision estimate of the log OR. We can calculate "posterior intervals" (similar to confidence intervals) to see if there is evidence that the assumptions are violated.
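Turning the per-iteration log ORs into a posterior summary is just a matter of summarizing the saved draws: the posterior mean is their average and a 95% posterior interval comes from their percentiles. The chain below is simulated for illustration; in practice it would be the log OR computed at each post-burn-in Gibbs iteration.

```python
# Posterior mean and 95% posterior interval from a chain of log OR draws.
# Simulated chain; stands in for per-iteration Gibbs output.
import numpy as np

rng = np.random.default_rng(3)
log_or_chain = rng.normal(loc=0.05, scale=0.3, size=2000)

post_mean = log_or_chain.mean()
lo, hi = np.percentile(log_or_chain, [2.5, 97.5])
print(f"posterior mean {post_mean:.2f}, 95% interval ({lo:.2f}, {hi:.2f})")

# Evidence against conditional independence would be an interval
# that excludes 0.
violates = not (lo <= 0.0 <= hi)
print("violation flagged:", violates)
```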

Checking Conditional Independence

Checking Non-Differential Measurement

Implementation
"Canned" implementation:
– BUGS (Unix and Linux)
– WinBUGS (Windows)
Scripts can be (and have been) written in:
– R, S-Plus
– SAS

Checking Assumptions: Maximum Likelihood Approach
Using an ML approach, we can get results that will likely be quite similar:
– (a) assign individuals to "pseudo-classes" based on the posterior probability of class membership
– (b) calculate ORs within classes
– (c) repeat (a) and (b) at least a few times
– (d) compare the ORs to 1
Drawbacks:
– In ML, additional post hoc computations are necessary.
– We don't get precision estimates as we do in the MCMC approach.
– The MCMC approach is designed for computing posterior distributions of functions of the parameters.
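Steps (a)–(d) can be sketched as follows. Everything here is illustrative: the membership probabilities are made up (maximally uncertain, 0.5/0.5), the log OR is computed for class 0 only, and the 0.5 continuity correction is an implementation choice.

```python
# Pseudo-class check: draw classes from posterior membership
# probabilities, compute within-class log ORs, repeat, compare to 0.
import numpy as np

def pseudo_class_log_ors(y_g, y_h, post_prob, n_draws, rng):
    """post_prob: N x J posterior class-membership probabilities."""
    N, J = post_prob.shape
    out = []
    for _ in range(n_draws):
        # (a) assign each person to a pseudo-class
        cls = np.array([rng.choice(J, p=pi) for pi in post_prob])
        # (b) log OR within class 0 (repeat per class in practice)
        m = cls == 0
        a = np.sum(y_g[m] & y_h[m]) + 0.5
        b = np.sum(y_g[m] & ~y_h[m]) + 0.5
        c = np.sum(~y_g[m] & y_h[m]) + 0.5
        d = np.sum(~y_g[m] & ~y_h[m]) + 0.5
        out.append(np.log(a * d / (b * c)))
    return np.array(out)  # (c) one value per repetition

rng = np.random.default_rng(4)
N = 2000
post_prob = np.full((N, 2), 0.5)               # invented probabilities
y_g = rng.integers(0, 2, N).astype(bool)       # independent symptoms,
y_h = rng.integers(0, 2, N).astype(bool)       # so log ORs should be near 0
vals = pseudo_class_log_ors(y_g, y_h, post_prob, 5, rng)
print(np.round(vals, 3))                       # (d) compare to 0 (OR to 1)
```

Note that the spread of these few repetitions understates the real uncertainty, which is the precision drawback listed above.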