# Methods for Evaluating the Performance of Diagnostic Tests in the Absence of a Gold Standard: A Latent Class Model Approach Elizabeth Garrett-Mayer Division.

Methods for Evaluating the Performance of Diagnostic Tests in the Absence of a Gold Standard: A Latent Class Model Approach

Elizabeth Garrett-Mayer
Division of Biostatistics, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University
March 5, 2004
esg@jhu.edu
http://astor.som.jhmi.edu/~esg

Latent Variables

We call variables latent if they cannot be directly measured or observed.

- Examples of latent variables: major depression, quality of life, pain, IQ, socio-economic status.
- Examples of measured variables: blood pressure, age, eye color, CD4 cell count, number of symptoms on a symptom checklist, vital status.

Latent Variables

- There is some debate over what counts as latent.
  - Bollen: all variables are latent.
- Approaches to latent variable situations:
  - Find a suitable measured variable, for example the total number of symptoms, a pain score on a scale of 1-10, or the sum of items on a scale.

Latent Variables

- Approaches to latent variable situations (cont.):
  - Model the latent variable:
    - Factor analysis: continuous latent variables
    - Latent class analysis: categorical latent variables
    - Latent trait analysis / Rasch modeling / item response theory: continuous latent variables
    - Structural equation modeling

Latent Variables

- When modeling the latent variable:
  - We need some observed variables to inform us about the latent variable.
  - These can be continuous or categorical.
  - Usually, we like at least 3 or 4.
  - Examples:
    - Depression: sadness, sleeping problems, guilty feelings, etc.
    - Quality of life: on a scale of 1-10, how much…
    - IQ: multiple questions on an exam

Latent variables in relation to gold standards

- If we can measure a disease, disorder, or condition exactly, we call this a gold-standard measure.
- The gold standard provides a benchmark for evaluating diagnoses based on other approaches.
- Examples:
  - A mammogram is an imperfect measure of breast cancer. Tissue biopsy is a gold-standard measure of breast cancer.
  - An ELISA test is an imperfect measure of HIV infection. The HIV PCR test is a gold-standard measure of HIV infection.

Evaluating Diagnostic Criteria

- However, relatively few areas of medicine have true gold-standard tests, where the test is perfectly accurate.
  - Pathognomonic indicators:
    - When the indicator is present, disease is present.
    - When the indicator is absent, disease is absent.
- Other situations:
  - A combination of signs and symptoms provides a very accurate diagnosis.
  - The disease process is not well understood: controversy exists about how to define the diagnosis.
  - The disease process is well understood, but measuring disease via signs and symptoms is difficult.

Diagnostic Criteria in Psychiatry

- Currently, the DSM (Diagnostic and Statistical Manual of Mental Disorders) is the standard for defining mental disorders.
- Diagnostic algorithms are provided with which a determination of disorder absence or presence can be made.
- Examples: major depression, schizophrenia, autism, alcoholism, generalized anxiety disorder.
- All of these diagnostic algorithms are constructed to measure latent variables.

Major Depressive Episode, as diagnosed by the DSM-IV (APA, 1994)

A. A person who suffers from major depressive disorder must either have depressed mood or a loss of interest or pleasure in daily activities for at least a 2-week period.

B. The disorder is characterized by the presence of five or more of the following nine symptoms:

1. Depressed mood most of the day, nearly every day, as indicated by either subjective report or observation made by others.
2. Markedly diminished interest or pleasure in all, or almost all, activities most of the day, nearly every day.
3. Significant weight loss when not dieting or weight gain, or decrease or increase in appetite nearly every day.
4. Insomnia or hypersomnia nearly every day.
5. Psychomotor agitation or retardation nearly every day.
6. Fatigue or loss of energy nearly every day.
7. Feelings of worthlessness or excessive inappropriate guilt nearly every day.
8. Diminished ability to think or concentrate, or indecisiveness, nearly every day.
9. Recurrent thoughts of death, recurrent suicidal ideation without a specific plan, or a suicide attempt or a specific plan for committing suicide.

Symptoms are not better accounted for by bereavement, i.e., the symptoms persist for longer than 2 months or are characterized by marked functional impairment, morbid preoccupation with worthlessness, suicidal ideation, psychotic symptoms, or psychomotor retardation.

How do we validate the DSM criteria?

- How can we be sure that these definitions are valid measures?
- How can we determine the sensitivity and specificity of these measures?
- Is there a gold standard? Is a psychiatrist's diagnosis a gold standard?
- What types of individuals are the diagnostic criteria diagnosing as depressed?
- How often are individuals misdiagnosed?
- What are the implications of a positive or negative diagnosis?

Example: Major Depression

- Epidemiologic Catchment Area Study (ECA): collected mental health data on individuals in 5 cities, beginning in 1981.
- Our sample: an epidemiologic sample of 1322 individuals in the East Baltimore area, collected in 1993 (wave 3).
- Depression questions are from the Diagnostic Interview Schedule, which has been shown to be valid and reliable (Robins et al., 1981).
- A symptom is coded as present if it occurred within two weeks of the interview.
- Symptom groups: some questions ask about the same type of symptom:
  - Have you had trouble sleeping?
  - Have you had trouble waking?
  - Do you sleep too much?
- Related symptoms are categorized into the same symptom group.

Distribution of Symptoms

| Group | Symptoms | Prevalence |
|---|---|---|
| 1 | Depressed mood | 0.06 |
| 2 | Disinterest in sex; less fun; loss of enjoyment | 0.08 |
| 3 | Reduced energy/fatigued | 0.05 |
| 4 | Reduced concentration; slow thoughts; indecisive | 0.04 |
| 5* | Feel inferior; lacking self-confidence | 0.03 |
| 6 | Guilty/sinful | 0.02 |
| 7 | Ideas of self-harm; want to die; suicidal thoughts; suicide attempts | 0.05 |
| 8 | Trouble falling asleep; trouble waking; sleep too much | 0.09 |
| 9 | Loss of appetite; weight loss; increased appetite; weight gain | 0.08 |
| 10 | Slow movement; fast movement, fidgety | 0.04 |

ECA Wave 3, N = 1322

Evaluating the DSM Criteria

- Without an available gold standard, we resort to other methods.
- Suppose that the proposed symptoms (or symptom groups) define depression.
- Without relying on the DSM definition of depression, but imposing model assumptions, what types of symptom patterns are observed in the data?
- Do individuals tend to cluster into categories based on symptom response patterns?
- We can evaluate this using a latent class model, the categorical analog of factor analysis.

The Latent Class Model

- Assumes that each individual in the population is a member of one of $M$ latent classes.
- Each class is defined by a vector of symptom prevalences, $p_m = (p_{1m}, p_{2m}, \ldots, p_{Km})$, where there are $K$ symptoms and $m = 1, \ldots, M$.
- The vector $y_i = (y_{i1}, y_{i2}, \ldots, y_{iK})$ is individual $i$'s binary vector of symptom responses, $i = 1, \ldots, N$.
- The proportion of individuals in class $m$ is denoted by $\pi_m$.
- The true, yet unobserved, latent class of individual $i$ is denoted by $\eta_i$, where $\eta_i \in \{1, 2, \ldots, M\}$.
- The symptoms define the latent variable of interest.
- $M$ is fixed.
- Conditional independence: given class membership, symptoms are independent.

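Graphical Depiction of the Latent Class Model

[Figure: class 1 ($\eta = 1$), class 2 ($\eta = 2$), and class 3 ($\eta = 3$), each with its own symptom prevalences ($p_{11}, p_{21}, \ldots, p_{K1}$; $p_{12}, p_{22}, \ldots, p_{K2}$; $p_{13}, p_{23}, \ldots, p_{K3}$), linked to the observed responses $y_{i1}, y_{i2}, \ldots, y_{iK}$.]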

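Statistical Details

Probability distribution of $Y_i$, where $\pi_m$ is the proportion of individuals in class $m$ and $p_{km}$ is the prevalence of symptom $k$ in class $m$:

$$P(Y_i = y_i) = \sum_{m=1}^{M} \pi_m \prod_{k=1}^{K} p_{km}^{y_{ik}} (1 - p_{km})^{1 - y_{ik}}$$

Likelihood function:

$$L(p, \pi \mid Y) = \prod_{i=1}^{N} \sum_{m=1}^{M} \pi_m \prod_{k=1}^{K} p_{km}^{y_{ik}} (1 - p_{km})^{1 - y_{ik}}$$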
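As a rough illustration of the likelihood, the sketch below evaluates the log-likelihood of a latent class model for binary symptoms, assuming conditional independence within class. The function name, the array layout (`p` as a K-by-M matrix, `pi` as the vector of class proportions), and the numbers in the usage lines are assumptions made for this example; they are not the ECA estimates.

```python
import numpy as np

def lcm_log_likelihood(Y, p, pi):
    """Log-likelihood of a latent class model with binary items.

    Y  : (N, K) array of 0/1 symptom responses
    p  : (K, M) array, p[k, m] = P(symptom k present | class m), strictly in (0, 1)
    pi : (M,) array of class proportions summing to 1
    """
    Y = np.asarray(Y, dtype=float)
    p = np.asarray(p, dtype=float)
    pi = np.asarray(pi, dtype=float)
    # log P(y_i | class m) under conditional independence, shape (N, M)
    log_cond = Y @ np.log(p) + (1.0 - Y) @ np.log(1.0 - p)
    # add log class proportions, then sum over classes on the log scale
    joint = log_cond + np.log(pi)
    row_max = joint.max(axis=1, keepdims=True)
    log_mix = row_max[:, 0] + np.log(np.exp(joint - row_max).sum(axis=1))
    # individuals are independent, so the log-likelihood is a sum over i
    return log_mix.sum()

# Hypothetical two-symptom, two-class example
Y = np.array([[0, 0], [1, 1]])
p = np.array([[0.1, 0.8], [0.2, 0.9]])
pi = np.array([0.7, 0.3])
loglik = lcm_log_likelihood(Y, p, pi)
```

The mixture sum is taken on the log scale only to avoid underflow when K is large; it is the same quantity as the product-of-sums form of the likelihood.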

Interpretation

- Two-class model:
  - A non-depressed class, which reports on average no symptoms (93% of sample)
  - A depressed class, which reports on average 4 to 5 of the 10 symptoms
- Three-class model:
  - A non-depressed class, which reports on average no symptoms (88% of sample)
  - A mildly depressed class, which reports on average 2 to 3 of the 10 symptoms (9% of sample)
  - A severely depressed class, which reports on average 6 to 7 of the 10 symptoms (3% of sample)
- The three-class model is deemed more appropriate from a statistical standpoint (model fit, adherence to model assumptions) (Garrett and Zeger, 2000).

Results of Estimation

- $p$ matrix
- $\pi$ vector
- Posterior probability of class membership:
  - Tells us the probability that individual $i$ is in each of the classes, given his or her response pattern.
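The posterior probability of class membership follows from Bayes' rule: the probability of class m given a response pattern is proportional to the class proportion times the probability of that pattern within the class. A minimal sketch, assuming the symptom prevalences are stored as a K-by-M matrix `p` and the class proportions as a vector `pi`; the function name and the numbers are hypothetical and are not the fitted ECA values.

```python
import numpy as np

def posterior_class_probs(y, p, pi):
    """P(class m | response pattern y) for every class m.

    y  : (K,) 0/1 response pattern for one individual
    p  : (K, M) symptom prevalences by class
    pi : (M,) class proportions
    """
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    pi = np.asarray(pi, dtype=float)
    # P(y | class m): product over symptoms, by conditional independence
    lik = np.prod(p ** y[:, None] * (1.0 - p) ** (1.0 - y[:, None]), axis=0)
    joint = pi * lik              # numerator of Bayes' rule
    return joint / joint.sum()    # normalize over classes

# Hypothetical values: two symptoms, two classes
p = np.array([[0.1, 0.8], [0.2, 0.9]])
pi = np.array([0.7, 0.3])
post_all_present = posterior_class_probs([1, 1], p, pi)
post_all_absent = posterior_class_probs([0, 0], p, pi)
```

With these made-up parameters, a person reporting every symptom is placed almost entirely in the second class, while a person reporting none is placed almost entirely in the first.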

Examples: Assume M = 2 Individual reports absence of all symptoms:

Examples: Assume M = 2 Individual reports only fatigue and sleep problems: Individual reports all symptoms except self-esteem and guilt:

Estimation Options

- Maximum likelihood approach
  - Widely available
  - Accepted approach
- Bayesian approach
  - Markov chain Monte Carlo (MCMC) estimation
  - Easily implemented in WinBUGS (Imperial College of Science, Technology and Medicine: http://www.mrc-bsu.cam.ac.uk/bugs/)
  - Benefits:
    - Model checking methods
    - Identifiability can be assessed (Garrett and Zeger, 2000)
- The MCMC approach allows estimation of ANY function of the parameters, along with its standard error.

Bayesian Estimation Approach

- The Gibbs sampler is an iterative process used to estimate posterior distributions of parameters.
  - We sample each parameter from its conditional distribution given the data and the other parameters, e.g. $P(\pi_1 \mid Y, p, \eta, \pi_2, \pi_3)$.
  - At each iteration, we get sampled values of $p$ (symptom prevalences), $\pi$ (class proportions), and $\eta$ (class memberships).
  - We use the samples from the iterations to estimate posterior distributions by averaging over other parameter values.
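To make the iteration concrete, here is a minimal Gibbs sampler sketch for a latent class model with binary symptoms. It is not the WinBUGS implementation referred to in the talk. The priors (Beta(1, 1) on each symptom prevalence, Dirichlet(1, …, 1) on the class proportions), the function name, the starting values, and the simulated data are all assumptions chosen for illustration. Classes are indexed from 0, and label switching and burn-in are not handled.

```python
import numpy as np

def gibbs_lcm(Y, M, n_iter=500, seed=0):
    """Gibbs sampler sketch for a latent class model with binary items.

    Assumed priors (not from the slides): p[k, m] ~ Beta(1, 1),
    pi ~ Dirichlet(1, ..., 1). Returns all draws of p, pi and eta.
    """
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    N, K = Y.shape
    eps = 1e-10
    p = rng.uniform(0.2, 0.8, size=(K, M))    # arbitrary starting values
    pi = np.full(M, 1.0 / M)
    draws = {"p": [], "pi": [], "eta": []}
    for _ in range(n_iter):
        # Step 1: draw each eta_i from P(eta_i = m | y_i, p, pi)
        log_w = Y @ np.log(p) + (1.0 - Y) @ np.log(1.0 - p) + np.log(pi)
        w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
        cum = np.cumsum(w / w.sum(axis=1, keepdims=True), axis=1)
        u = rng.random(N)
        eta = np.minimum((cum < u[:, None]).sum(axis=1), M - 1)
        # Step 2: draw pi from its Dirichlet full conditional
        counts = np.bincount(eta, minlength=M)
        pi = rng.dirichlet(1.0 + counts)
        # Step 3: draw each p[k, m] from its Beta full conditional
        onehot = np.eye(M)[eta]                # (N, M) class indicators
        present = Y.T @ onehot                 # symptom-present counts, (K, M)
        absent = counts[None, :] - present     # symptom-absent counts, (K, M)
        p = np.clip(rng.beta(1.0 + present, 1.0 + absent), eps, 1.0 - eps)
        draws["p"].append(p)
        draws["pi"].append(pi)
        draws["eta"].append(eta)
    return {name: np.array(vals) for name, vals in draws.items()}

# Simulated binary data, purely illustrative
sim_rng = np.random.default_rng(1)
Y_sim = (sim_rng.random((60, 4)) < 0.3).astype(int)
chain = gibbs_lcm(Y_sim, M=2, n_iter=20)
```

Because the class memberships are drawn at every iteration, any quantity computed from them can be summarized across iterations in the same way as the model parameters.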

Evaluating Depression Diagnosis

- Assumption: treat the latent class model as our gold-standard definition of depression.
- We can use the symptom responses to evaluate the DSM-IV diagnosis of depression.
- Compare the DSM diagnosis to the latent class diagnosis using the standard definitions of sensitivity and specificity.
- Assume two classes of depression:
  - Class 1 is the non-depressed class.
  - Class 2 is the depressed class.

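More specifically…

$$SE = P(\text{DSM-IV}^{+} \mid \eta = 2) = \sum_{r \in R} P(Y = y_r \mid \eta = 2)$$

$$SP = P(\text{DSM-IV}^{-} \mid \eta = 1) = 1 - \sum_{r \in R} P(Y = y_r \mid \eta = 1)$$

where $\{y_r : r \in R\}$ is the set of symptom patterns that are classified as a positive diagnosis by the DSM-IV.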
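One way to evaluate these sums is to enumerate every possible symptom pattern, keep those the diagnostic rule classifies as positive, and add up their probabilities within each class. The sketch below does this for a generic rule passed in as a function. The rule used in the example ("both symptoms present") is a hypothetical stand-in and is not the DSM-IV algorithm; the parameter values are invented, and classes are indexed from 0, so class 1 here plays the role of the depressed class.

```python
import itertools
import numpy as np

def se_sp_from_model(p, is_positive, depressed=1, non_depressed=0):
    """Sensitivity and specificity of a diagnostic rule under a latent class model.

    p           : (K, M) symptom prevalences by class
    is_positive : function mapping a (K,) 0/1 pattern to True/False
    """
    p = np.asarray(p, dtype=float)
    K = p.shape[0]
    se = 0.0   # P(rule positive | depressed class)
    fp = 0.0   # P(rule positive | non-depressed class)
    for pattern in itertools.product([0, 1], repeat=K):
        y = np.array(pattern, dtype=float)
        if not is_positive(y):
            continue
        # P(pattern | class m) for every class, by conditional independence
        prob = np.prod(p ** y[:, None] * (1.0 - p) ** (1.0 - y[:, None]), axis=0)
        se += prob[depressed]
        fp += prob[non_depressed]
    return se, 1.0 - fp

# Hypothetical prevalences and a hypothetical rule
p = np.array([[0.1, 0.8], [0.2, 0.9]])
se, sp = se_sp_from_model(p, lambda y: y.sum() >= 2)
```

Enumeration is feasible here because ten binary symptom groups give only 1024 patterns.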

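Predictive Values

Positive and negative predictive values are simply transformations of SE and SP, together with the class proportions. For the two-class model, with $\pi_1$ and $\pi_2$ the proportions of the non-depressed and depressed classes:

$$PPV = P(\eta = 2 \mid \text{DSM-IV}^{+}) = \frac{SE \, \pi_2}{SE \, \pi_2 + (1 - SP) \, \pi_1}$$

$$NPV = P(\eta = 1 \mid \text{DSM-IV}^{-}) = \frac{SP \, \pi_1}{SP \, \pi_1 + (1 - SE) \, \pi_2}$$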
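For the two-class case, the transformation is Bayes' rule applied to sensitivity, specificity, and the proportion of the depressed class. A small sketch follows; the function name and the input values are made up for illustration and are not results from the talk.

```python
def predictive_values(se, sp, prevalence):
    """PPV and NPV from sensitivity, specificity and prevalence (two-class case).

    prevalence is the proportion of the depressed class.
    """
    # P(depressed | positive diagnosis)
    ppv = se * prevalence / (se * prevalence + (1.0 - sp) * (1.0 - prevalence))
    # P(not depressed | negative diagnosis)
    npv = sp * (1.0 - prevalence) / (sp * (1.0 - prevalence) + (1.0 - se) * prevalence)
    return ppv, npv

# Hypothetical inputs
ppv, npv = predictive_values(se=0.72, sp=0.98, prevalence=0.3)
```

Applied to each MCMC iteration's values of sensitivity, specificity, and class proportion, the same function would yield posterior draws of PPV and NPV.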

Class assignment?

- Complication: the latent class model provides us with posterior probabilities of class membership.
- We don't know the true latent classes, $\eta$, for individuals in the dataset.
- Example: M = 3
  - Posterior probabilities of class membership for a particular symptom pattern are 0.48, 0.48, 0.04.
  - To which class should this individual be assigned?
  - How do we account for the uncertainty in the assignment?

One Approach to Class Assignment

- Pseudo-classes (maximum likelihood)
  - Assign individuals to pseudo-classes at random, based on their posterior probabilities of class membership (Bandeen-Roche et al., 1997).
  - E.g., an individual with posterior probabilities of 0.20, 0.05, 0.75 has a better chance of being assigned to class 3, but is not necessarily assigned to class 3.
- Using the class assignment, we can calculate sensitivity and specificity.
- We can repeat the assignment procedure T times, where T is large.
- On average, the sensitivity and specificity estimates will be correct.
- Drawback: we don't get precision associated with the estimates.
  - The standard deviation of the repeated estimates does not account for imprecision in the estimates of $p$ and $\pi$.
  - Confidence intervals based on the T repeats will be too narrow.
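The pseudo-class procedure can be sketched as follows for a two-class model: draw a class for each individual from his or her posterior probabilities, compute sensitivity and specificity of the diagnosis against the drawn classes, and average over T repeats. The function name, the array layout, and the tiny data set are assumptions made for this example; classes are indexed from 0, with class 1 as the depressed class.

```python
import numpy as np

def pseudo_class_se_sp(post, dx_positive, T=1000, depressed=1, seed=0):
    """Average sensitivity and specificity over T pseudo-class assignments.

    post        : (N, M) posterior class probabilities from a fitted model
    dx_positive : (N,) booleans, True if the diagnostic criteria are met
    """
    rng = np.random.default_rng(seed)
    post = np.asarray(post, dtype=float)
    dx_positive = np.asarray(dx_positive, dtype=bool)
    N, M = post.shape
    cum = post.cumsum(axis=1)
    se_draws, sp_draws = [], []
    for _ in range(T):
        # draw one pseudo-class per individual from the posterior probabilities
        u = rng.random(N)
        eta = np.minimum((cum < u[:, None]).sum(axis=1), M - 1)
        dep = eta == depressed
        if dep.any() and (~dep).any():   # skip draws with an empty class
            se_draws.append(dx_positive[dep].mean())
            sp_draws.append((~dx_positive[~dep]).mean())
    return float(np.mean(se_draws)), float(np.mean(sp_draws))

# Hypothetical example with degenerate (0/1) posteriors, so the result is exact
post = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
dx = np.array([False, True, True, True])
se_hat, sp_hat = pseudo_class_se_sp(post, dx, T=50)
```

The spread of the T repeated estimates reflects only the randomness of the assignment, which is why intervals built from them are too narrow: `post` is held fixed at its estimated value throughout.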

MCMC Approach to Class Assignment

- $\eta$ is a vector of parameters.
- At each iteration in the Gibbs sampler, each parameter is drawn from its conditional distribution.
- At each iteration in the Gibbs sampler, individuals are automatically assigned to classes, so there is no need to assign them manually.
- For each of the W iterations of the chain, we can calculate sensitivity and specificity.
- Sensitivity and specificity are simply additional parameters.
- Due to the nature of the MCMC approach, the standard deviation of the posterior distribution of sensitivity represents its standard error.
- Precision estimates for sensitivity and specificity are valid.
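Treating sensitivity and specificity as additional parameters amounts to computing them once per iteration from that iteration's sampled class memberships, then summarizing the resulting draws. A sketch for the two-class case is below. The function name, the layout of `eta_draws` (one row per iteration), and the toy values are assumptions for illustration; classes are indexed from 0.

```python
import numpy as np

def posterior_se_sp(eta_draws, dx_positive, depressed=1):
    """Per-iteration sensitivity and specificity from sampled class memberships.

    eta_draws   : (W, N) integer class labels, one row per MCMC iteration
    dx_positive : (N,) booleans, True if the diagnostic criteria are met
    """
    eta_draws = np.asarray(eta_draws)
    dx_positive = np.asarray(dx_positive, dtype=bool)
    dep = eta_draws == depressed                       # (W, N)
    n_dep = dep.sum(axis=1)
    n_non = (~dep).sum(axis=1)
    valid = (n_dep > 0) & (n_non > 0)                  # both classes non-empty
    tp = (dep & dx_positive).sum(axis=1)
    tn = (~dep & ~dx_positive).sum(axis=1)
    se = tp[valid] / n_dep[valid]
    sp = tn[valid] / n_non[valid]
    return {
        "se": se,
        "sp": sp,
        "se_mean": se.mean(),
        "se_sd": se.std(ddof=1) if se.size > 1 else float("nan"),
        "se_interval": np.percentile(se, [2.5, 97.5]),
        "sp_mean": sp.mean(),
        "sp_sd": sp.std(ddof=1) if sp.size > 1 else float("nan"),
        "sp_interval": np.percentile(sp, [2.5, 97.5]),
    }

# Toy chain: two iterations, four individuals
eta_draws = np.array([[0, 0, 1, 1], [0, 1, 1, 1]])
dx = np.array([False, False, True, True])
summary = posterior_se_sp(eta_draws, dx)
```

Unlike the pseudo-class repeats, the class labels here change together with the sampled prevalences and class proportions, so the spread of `se` and `sp` carries the uncertainty in those parameters as well.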

Operating Characteristics of Depression Diagnoses

- Several definitions of depression:
  - DSM-III
  - DSM-IV
  - ICD-10a (mild)
  - ICD-10b (moderate)
  - ICD-10c (severe)
- We calculate sensitivity and specificity for each of the five diagnoses above, for models with M = 2 and M = 3.
- We do the same for PPV and NPV.
- Vertical lines represent 95% posterior intervals.

Interpreting results from the three-class model

- Diagnoses have only two possibilities: depressed or not depressed.
- The two-class model also has two possibilities.
- The three-class model has a non-depressed class and two depression classes (mild and severe).
- Should we think of BOTH or just SEVERE as the treatment class?
- Why does it matter?
  - Clinical decision making
  - Pre-clinical depression?
- Which is better?

Misclassification probabilities for identifying severe depression using the DSM-IV criteria

| | Two-class model | Three-class model |
|---|---|---|
| P(false positive) | < 0.001 | 0.004 |
| P(false negative) | 0.035 | 0.002 |
| P(misclassification) | 0.035 | 0.006 |

Misclassification probabilities for identifying any depression using the DSM-IV criteria

| | Two-class model | Three-class model |
|---|---|---|
| P(false positive) | < 0.001 | |
| P(false negative) | 0.035 | 0.078 |
| P(misclassification) | 0.035 | 0.078 |

Revisiting questions…

- Recall that the three-class model was chosen over the two-class model as more appropriate.
- We answer the questions posed earlier by examining the agreement between the DSM-IV and the three-class model.

What types of individuals are the diagnostic criteria diagnosing as depressed?

- The DSM-IV tends to diagnose individuals who are in class 3 of the three-class model (i.e., our severe depression class).
- The mildly depressed class tends to be ignored.
- Not necessarily a bad thing:
  - DSM criteria are developed for deciding treatment.
  - If mild depression does not require any treatment, then the DSM-IV diagnosis is adequate.
- But what if:
  - Class 2 individuals (i.e., mildly depressed) would benefit from treatment?
  - Class 2 is a pre-clinical class, so that intervention could prevent transition to severe depression?

How often are individuals misdiagnosed?

- Assuming that diagnosis of severely depressed individuals is the intent of the DSM-IV, there is a LOW probability of misclassification: P(misclassification) = 0.006.
- If the intent is to diagnose ANY depression (i.e., mild or severe), then there is a much higher probability of misclassification: P(misclassification) = 0.078. (Note that of these 8%, almost all are false negatives.)

What are the implications of a positive or negative diagnosis?

- The DSM-IV has high PPV for severe depression: PPV(3) 0.90
- High NPV for no depression: NPV(1) 0.90
- Essentially no information is provided as to an individual's likelihood of mild depression given either a negative or a positive diagnosis: PPV(2) 0.10, NPV(2) 0.10

Issues and Concerns

- Operating characteristics assume that the two types of diagnosis being compared are determined independently.
  - The methods of assessment are different.
  - But there is a large overlap of symptoms.
  - They are possibly, even probably, not truly independent.

Issues and Concerns

- Conditional independence of tests given simply the presence or absence of disease is a common problem.
  - Tests may be independent given the level of disease on a continuum, but not when disease status is simply categorized.
  - However, the latent class model does not definitively assign individuals to classes. Instead, a posterior probability is estimated.
  - Because individuals are assigned posterior probabilities, we can more easily think of a continuum of disease.
  - This is true even for classes that are not ordinal in nature, because the posterior probabilities for each class are continuous.

Conclusions

- The DSM-IV appears to be a valid approach for diagnosing severe depression.
- There appears to be another class of milder depression that is not identified by any of the depression definitions.
- By using an MCMC approach to latent class model estimation, we can estimate operating characteristics of tests and their standard errors in a straightforward way.
- This approach can be used quite generally for other medical diagnoses:
  - Psychiatric diagnoses
  - Arthritis
- More information?
  - esg@jhu.edu
  - http://astor.som.jhmi.edu/~esg/talks.html
  - Garrett ES, Eaton WW, Zeger S. (2002). Methods for evaluating the performance of diagnostic tests in the absence of a gold standard: a latent class model approach. Statistics in Medicine, 21(9):1289-1307.
