
1 Propensity Scores Methodology for Receiver Operating Characteristic (ROC) Analysis. Marina Kondratovich, Ph.D., U.S. Food and Drug Administration, Center for Devices and Radiological Health. No official support or endorsement by the Food and Drug Administration of this presentation is intended or should be inferred. September 2003.

2 Outline
–Introduction
–Place for propensity scores
–Distributions of covariates (details)
–Distributions of New Test results (details)
–Bias of naïve AUC estimation
–Matching for one covariate
–Weighted ROC analysis
–Stratification for one covariate
–Relationship between AUC by matching and by stratification
–Propensity score: pre-test risk of disease
–Conjunction of a New Test with other diagnostic tests

3 ROC Analysis. The New Test is quantitative. New Test variable: X for the Diseased population, Y for the Non-Diseased population. ROC curve = relationship between sensitivity and specificity of a New Test over all possible cut-off values. The AUC (area under the curve) is the most common measure of test performance. AUC = sensitivity averaged over all values of specificity = specificity averaged over all values of sensitivity. AUC = P{X>Y}, the probability that a randomly selected Diseased subject has a test value larger than that of a randomly chosen Non-Diseased subject.

4 In order to correctly estimate the diagnostic accuracy of a New Test, we should compare the values of the New Test for subjects when Diseased with the values of the New Test for the same subjects when Non-Diseased. Each subject has two potential values of the New Test: a value X that would be observed if the subject were Diseased and a value Y that would be observed if the subject were Non-Diseased. But X and Y cannot be observed jointly for the same subject. Subject = {New Test, Covariates (e.g., C_1 = Age, C_2 = BMI)}. If we were able to randomly assign subjects to the Diseased and Non-Diseased clinical states, then the Diseased and Non-Diseased groups would be comparable with respect to covariates, and the diagnostic accuracy of the Test would be evaluated correctly. But such a random assignment is impossible.

5 Problem: Consider M randomly selected Diseased subjects and N randomly selected Non-Diseased subjects. Naïve estimation of AUC is biased (usually overstated). Biased estimators of AUC occur if
I. the distributions of covariates are different for the Diseased and Non-Diseased study groups; and
II. the distributions of New Test results are different for different sets of covariates.
Consider these two situations in more detail for one covariate, Age.

6 I. Different Age distributions in Diseased and Non-Diseased study groups.
Target population age distribution (t_1, t_2, t_3): t_1 = 0.5; t_2 = 0.3; t_3 = 0.2.
Pre-test risk of Disease by age, π_i:
  Age group   Pre-test risk π_i
  Age 1       0.1
  Age 2       0.3
  Age 3       0.5
π_population = π_1·t_1 + π_2·t_2 + π_3·t_3 = 0.24

7 Age distributions. I. Study Groups: M randomly selected Diseased subjects, N randomly selected Non-Diseased subjects.
M = m_1 + m_2 + m_3; N = n_1 + n_2 + n_3; E[m_i/M] = p_i; E[n_i/N] = q_i.
Diseased: p_1 = 0.21; p_2 = 0.38; p_3 = 0.41.
Non-Diseased: q_1 = 0.59; q_2 = 0.28; q_3 = 0.13.
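The study-group age distributions follow from the population quantities on slide 6 by Bayes' rule. The following is a minimal Python sketch, assuming simple random sampling of Diseased and Non-Diseased subjects from the target population; the variable names are illustrative and not taken from the presentation.

```python
# Population age distribution t and pre-test risk pi by age group (slide 6).
t = [0.5, 0.3, 0.2]
pi = [0.1, 0.3, 0.5]

# Overall prevalence: pi_population = sum_i pi_i * t_i.
pi_population = sum(p_i * t_i for p_i, t_i in zip(pi, t))

# Bayes' rule: age distribution among Diseased (p) and Non-Diseased (q).
p = [pi_i * t_i / pi_population for pi_i, t_i in zip(pi, t)]
q = [(1 - pi_i) * t_i / (1 - pi_population) for pi_i, t_i in zip(pi, t)]

print(round(pi_population, 2))      # 0.24
print([round(v, 3) for v in p])     # [0.208, 0.375, 0.417]
print([round(v, 3) for v in q])     # [0.592, 0.276, 0.132]
```

These values agree with the p_i and q_i quoted on the slide up to rounding.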

8 I. Study Groups: M randomly selected Diseased subjects, N randomly selected Non-Diseased subjects.
The pre-test risk of Disease in the study for Age i is related to the pre-test risk of Disease in the population: it is a monotonic function of π_i that depends on π_population and π_study.
Pre-test risk of Disease:
  Age group   Study (N=M)   Population
  Age 1       0.26          0.1
  Age 2       0.58          0.3
  Age 3       0.80          0.5

9 II. The distribution of the Test variable depends on Age. The New Test variables of Diseased subjects: X_1, X_2, X_3 with c.d.f. F_1(x), F_2(x), F_3(x). Non-Diseased subjects: Y_1, Y_2, Y_3 with c.d.f. G_1(y), G_2(y), G_3(y). Example. Disease = Fracture, Non-Diseased = No Fracture, New Test = ultrasound test for a body site. This is a hypothetical relationship between the average ultrasound test value and age. Usually, ultrasound values decrease with increasing age. By contrast, PSA test values (for prostate cancer) and BNP test values (for congestive heart failure) increase with increasing age.

10 This is a typical picture of the data (ultrasound test for bone status). I. The age distributions for Diseased and Non-Diseased subjects are different. II. The values of the New Test depend on age. The same pattern arises elsewhere: prostate cancer is more prevalent in older men, and congestive heart failure is more prevalent in older people.

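11 PROBLEM: Naïve estimation of AUC is biased (usually overstated). Indeed, the naïve estimate is the Wilcoxon-Mann-Whitney statistic
$\widehat{AUC} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \Psi(X_i, Y_j)$, where Ψ(A,B) = 1 if A>B; ½ if A=B; 0 if A<B.
Its expected value is
$E[\widehat{AUC}] = \sum_{k} \sum_{s} p_k q_s AUC_{k,s} = p^T AUC \, q$,
where AUC_{k,s} = P{X_k > Y_s} is the area under the ROC curve when the Diseased subjects are Age k and the Non-Diseased subjects are Age s.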
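As an illustration of the kernel Ψ, the following Python sketch computes the naïve Wilcoxon-Mann-Whitney estimate of AUC from two samples. The function names and the toy values are hypothetical.

```python
def psi(a, b):
    """Mann-Whitney kernel: 1 if a > b, 1/2 if a == b, 0 if a < b."""
    if a > b:
        return 1.0
    if a == b:
        return 0.5
    return 0.0


def naive_auc(diseased, non_diseased):
    """Average of psi over all (Diseased, Non-Diseased) pairs."""
    total = sum(psi(x, y) for x in diseased for y in non_diseased)
    return total / (len(diseased) * len(non_diseased))


# Toy data (hypothetical): 3 Diseased and 4 Non-Diseased test values.
x = [2.1, 3.4, 1.7]
y = [1.2, 2.1, 0.9, 1.9]
print(naive_auc(x, y))   # 9.5 of 12 pairs, about 0.792
```

The estimate pools all ages, which is why it inherits any age imbalance between the two groups.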

12 Example. X_1, Y_1 ~ N(1, 1/4); X_2, Y_2 ~ N(2, 1/4); X_3, Y_3 ~ N(3, 1/4). The New Test does not have diagnostic ability: it cannot discriminate Diseased and Non-Diseased subjects within any age group. AUC matrix (rows: Diseased, Age 1 to Age 3; columns: Non-Diseased, Age 1 to Age 3): [matrix entries not preserved in the transcript]. The age distribution of the Diseased subjects is p^T = (0.21; 0.38; 0.41); the age distribution of the Non-Diseased subjects is q^T = (0.59; 0.28; 0.13). As a result, the two groups, Diseased and Non-Diseased, appear different with respect to the values of the New Test.

13 Example (continued). If the age distribution of the Diseased subjects is p^T = (0.21; 0.38; 0.41) and the age distribution of the Non-Diseased subjects is q^T = (0.59; 0.28; 0.13), then the mean value of the Wilcoxon-Mann-Whitney statistic, p^T AUC q, is 0.68. The matrix element AUC_{3,1} = 0.98, which corresponds to the largest age group of Diseased subjects (p_3 = 0.41) and the largest age group of Non-Diseased subjects (q_1 = 0.59), makes the largest contribution to the bilinear form p^T AUC q computed for the vectors p and q. AUC matrix (rows: Diseased, Age 1 to Age 3; columns: Non-Diseased, Age 1 to Age 3): [matrix entries not preserved in the transcript].
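The bilinear form p^T AUC q can be evaluated directly. The sketch below assumes normally distributed test values and, as a labeled assumption, a unit standard deviation for the difference X_k − Y_s; that scale is chosen because it reproduces the quoted AUC_{3,1} = 0.98 and the mean of 0.68. Reading N(μ, 1/4) strictly as a variance of 1/4 would give a standard deviation of about 0.71 for the difference and somewhat larger off-diagonal entries. All names are illustrative.

```python
from math import erf, sqrt


def phi_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


means = [1.0, 2.0, 3.0]   # age-group means, identical for X_k and Y_k
sd_diff = 1.0             # ASSUMPTION: standard deviation of X_k - Y_s

# auc[k][s] = P{X_k > Y_s}; rows are Diseased ages, columns Non-Diseased ages.
auc = [[phi_cdf((mk - ms) / sd_diff) for ms in means] for mk in means]

p = [0.21, 0.38, 0.41]    # age distribution, Diseased
q = [0.59, 0.28, 0.13]    # age distribution, Non-Diseased


def bilinear(u, a, v):
    """Compute u^T A v for plain Python lists."""
    return sum(u[k] * a[k][s] * v[s]
               for k in range(len(u)) for s in range(len(v)))


naive = bilinear(p, auc, q)     # expected naive AUC with unmatched ages
matched = bilinear(p, auc, p)   # both groups share the age distribution p
print(round(auc[2][0], 2), round(naive, 2), round(matched, 2))   # 0.98 0.68 0.5
```

Although every within-age AUC is 0.5, the unmatched mean is 0.68; giving both groups the same age distribution brings it back to 0.5.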

14 Adjustments for one covariate. Three common methods of adjusting for one confounding covariate:
–Matching
–Stratification
–Covariate adjustment through logistic regression

15 Matching. Matching of Diseased and Non-Diseased subjects means that the age distributions of these subjects are the same. Let the Diseased and Non-Diseased subjects be matched with the common age distribution φ^T = (φ_1, φ_2, φ_3). Theorem. Suppose a New Test cannot discriminate the Diseased and Non-Diseased populations within each age group. Then the expected value of the Mann-Whitney statistic is 0.5 for any age distribution in the age-matched samples of Diseased and Non-Diseased subjects. The Wilcoxon-Mann-Whitney statistic correctly evaluates the test performance (area under the ROC curve) only for age-matched samples.
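A short derivation shows why the theorem holds. It is a sketch that assumes continuous test values (ties have probability zero) and independent subjects. If the New Test cannot discriminate within any age group, then X_k and Y_k have the same distribution for every k, so AUC_{k,k} = 1/2 and the off-diagonal entries pair up to 1:

```latex
\begin{align*}
AUC_{k,s} + AUC_{s,k}
  &= P\{X_k > Y_s\} + P\{X_s > Y_k\}
   = P\{X_k > Y_s\} + P\{Y_s > X_k\} = 1, \\
\varphi^{T} AUC \, \varphi
  &= \sum_{k} \varphi_k^{2} \, AUC_{k,k}
   + \sum_{k<s} \varphi_k \varphi_s \left( AUC_{k,s} + AUC_{s,k} \right) \\
  &= \tfrac{1}{2} \sum_{k} \varphi_k^{2} + \sum_{k<s} \varphi_k \varphi_s
   = \tfrac{1}{2} \Bigl( \sum_{k} \varphi_k \Bigr)^{2} = \tfrac{1}{2}.
\end{align*}
```

The second equality in the first line uses the fact that X_s and Y_s, and likewise X_k and Y_k, are identically distributed.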

16 Matching (continued). By matching, we create a quasi-randomized experiment. That is, if we find two subjects, one in the Diseased and one in the Non-Diseased group, with the same pre-test risk of Disease (same age), then we can imagine that there was one subject for whom the value of the New Test was observed both when this subject was Diseased and when this subject was Non-Diseased. The age-matched study groups are similar with respect to Age (the AUC for the covariate Age is exactly 0.5). We can then be sure that the difference in the New Test distributions for the Diseased and Non-Diseased groups is not due to a difference in age. Problem: the data of unmatched subjects are not used in the AUC estimate. In that case, weighted ROC analysis should be used.

17 Weighted ROC Analysis. Data set: Diseased and Non-Diseased subjects are not age-matched. We want these two samples to be age-matched with the common age distribution φ, where φ_k = d_k/D, d_k = min(m_k, n_k), and D = d_1 + d_2 + d_3.
  Age group   For matching   Diseased   Non-Diseased
  Age 1       d_1 = 3        m_1 = 3    n_1 = 5
  Age 2       d_2 = 3        m_2 = 4    n_2 = 3
  Age 3       d_3 = 1        m_3 = 2    n_3 = 1

18 Weighted ROC Analysis (continued). For each age group Age k, we can take
–some set of size d_k of the m_k Diseased subjects; there are $\binom{m_k}{d_k}$ different variants;
–some set of size d_k of the n_k Non-Diseased subjects; there are $\binom{n_k}{d_k}$ different variants.
For Age 1, 10 variants; for Age 2, 4 variants; for Age 3, 2 variants. Total number of different matched sets: 80 (= 10 x 4 x 2). Using a particular age-matched set of D Diseased and D Non-Diseased subjects, we can estimate the age-matched AUC using the Wilcoxon statistic. We then consider all possible matched sets, estimate AUC for each set, and take the average of AUC over all these sets.
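The count of matched sets can be reproduced with binomial coefficients. This Python sketch uses the group sizes from the example table (m = 3, 4, 2 Diseased and n = 5, 3, 1 Non-Diseased); the variable names are illustrative.

```python
from math import comb, prod

m = [3, 4, 2]   # Diseased subjects per age group
n = [5, 3, 1]   # Non-Diseased subjects per age group
d = [min(mk, nk) for mk, nk in zip(m, n)]   # matched size per age group

# Ways to choose d_k of m_k Diseased and d_k of n_k Non-Diseased subjects.
variants = [comb(mk, dk) * comb(nk, dk) for mk, nk, dk in zip(m, n, d)]
total = prod(variants)

print(d, variants, total)   # [3, 3, 1] [10, 4, 2] 80
```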

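19 Weighted ROC Analysis (continued). This is equivalent to the calculation of AUC with all M Diseased subjects with weights d_k/m_k and with all N Non-Diseased subjects with weights d_k/n_k:
$\widehat{AUC}_w = \frac{1}{D^2} \sum_{k} \sum_{s} \frac{d_k}{m_k} \frac{d_s}{n_s} \sum_{i=1}^{m_k} \sum_{j=1}^{n_s} \Psi(X_{k,i}, Y_{s,j})$,
where X_{k,i} is the test value of the i-th Diseased subject in Age k and Y_{s,j} is the test value of the j-th Non-Diseased subject in Age s. The weighted ROC analysis is equivalent to considering all possible variants of age-matching with the common age distribution φ. The weighted estimate of AUC can also be obtained using the bootstrap technique.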

20 Weighted ROC Analysis (continued)
  Age group   For matching   Diseased   Non-Diseased   Weights (Diseased)   Weights (Non-Diseased)
  Age 1       d_1 = 3        m_1 = 3    n_1 = 5        1, 1, 1              3/5, 3/5, 3/5, 3/5, 3/5
  Age 2       d_2 = 3        m_2 = 4    n_2 = 3        3/4, 3/4, 3/4, 3/4   1, 1, 1
  Age 3       d_3 = 1        m_3 = 2    n_3 = 1        1/2, 1/2             1
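The equivalence between the weighted estimate (weights d_k/m_k for Diseased and d_k/n_k for Non-Diseased subjects) and the average over all age-matched sets can be checked numerically. The sketch below uses the group sizes of the example with hypothetical test values; the function names are illustrative.

```python
from itertools import combinations, product


def psi(a, b):
    """Mann-Whitney kernel: 1 if a > b, 1/2 if a == b, 0 if a < b."""
    return 1.0 if a > b else 0.5 if a == b else 0.0


# Hypothetical test values by age group, with the group sizes of the example.
diseased = {1: [5.1, 4.8, 6.0], 2: [5.5, 6.2, 4.9, 5.8], 3: [6.4, 5.9]}
non_diseased = {1: [4.2, 5.0, 3.9, 4.7, 5.3], 2: [5.1, 4.6, 5.6], 3: [5.7]}

ages = sorted(diseased)
d = {k: min(len(diseased[k]), len(non_diseased[k])) for k in ages}
D = sum(d.values())


def weighted_auc():
    """Weighted AUC over all subjects, normalized by D * D."""
    total = 0.0
    for k in ages:
        w = d[k] / len(diseased[k])            # weight of a Diseased subject
        for s in ages:
            v = d[s] / len(non_diseased[s])    # weight of a Non-Diseased subject
            pairs = sum(psi(x, y)
                        for x in diseased[k] for y in non_diseased[s])
            total += w * v * pairs
    return total / (D * D)


def matched_set_aucs():
    """Ordinary AUC for every possible age-matched set."""
    dis_sets = product(*[combinations(diseased[k], d[k]) for k in ages])
    non_sets = list(product(*[combinations(non_diseased[k], d[k]) for k in ages]))
    aucs = []
    for dis in dis_sets:
        xs = [x for group in dis for x in group]
        for non in non_sets:
            ys = [y for group in non for y in group]
            aucs.append(sum(psi(x, y) for x in xs for y in ys) / (D * D))
    return aucs


aucs = matched_set_aucs()
print(len(aucs))                                              # 80
print(abs(weighted_auc() - sum(aucs) / len(aucs)) < 1e-12)    # True
```

The two numbers agree because each Diseased subject of Age k appears in a fraction d_k/m_k of the matched sets, each Non-Diseased subject of Age s in a fraction d_s/n_s, and the two selections are made independently.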

21 Weighted ROC Analysis (continued). The weighted AUC is an unbiased estimate of the φ-age-matched AUC. The variance of the weighted estimate is: [expression not preserved in the transcript]. If d_k ≤ min(m_k, n_k) (so that no weight exceeds 1), then this variance is smaller than the variance for a single matched set.

22 Stratification. The strata are defined, and Diseased and Non-Diseased subjects who are in the same stratum are compared.
  Age group   Diseased   Non-Diseased   Stratum AUC
  Age 1       m_1 = 3    n_1 = 5        AUC_{1,1}
  Age 2       m_2 = 4    n_2 = 3        AUC_{2,2}
  Age 3       m_3 = 2    n_3 = 1        AUC_{3,3}

23 Stratification (continued). The overall diagnostic accuracy of the New Test can be summarized by a weighted average of AUC_{1,1}, AUC_{2,2}, and AUC_{3,3}. We can consider the linear combination
$AUC_{strat} = \varphi_1 AUC_{1,1} + \varphi_2 AUC_{2,2} + \varphi_3 AUC_{3,3}$,
where φ is the same as in matching, φ_k = d_k/D (d_k = min(m_k, n_k)). If AUC_{1,1} = AUC_{2,2} = AUC_{3,3} = AUC, then the weights φ_k are similar to weights inversely proportional to the variances of the stratum AUCs. Is there a relationship between AUC by matching and AUC by stratification?

24 Example. New Test = ultrasound test for bone status. The results of the ultrasound test are normal variables with means that differ by age and with the same standard deviation of 130 m/sec.
  Age group   Mean for Diseased (m/sec)   Mean for Non-Diseased (m/sec)
  Age 1       4,005                       4,027
  Age 2       3,904                       3,953
  Age 3       3,885                       3,942
Matrix AUC: [matrix entries not preserved in the transcript]. φ^T = (0.2; 0.5; 0.3).
AUC by matching: φ^T AUC φ = 0.624
AUC by stratification: 0.639

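25 Relationship between AUC by matching and AUC by stratification. Theorem. Let φ^T = (φ_1, φ_2, φ_3) be the age distribution in the age-matched Diseased and Non-Diseased groups. Then AUC by matching and AUC by stratification are related through a symmetric matrix Δ [the displayed identity, the elements of Δ, and the matrix Δ for the previous Example are not preserved in the transcript]. For a broad class of distributions, AUC by matching ≤ AUC by stratification, as in the previous Example (0.624 versus 0.639).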
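One identity consistent with the theorem's statement can be derived directly. This is a sketch: the definition of Δ below is an assumption, and the presentation's own normalization of Δ may differ, for example by a constant factor. Using the fact that the φ_k sum to 1:

```latex
\begin{align*}
\sum_{k} \varphi_k \, AUC_{k,k}
  &= \sum_{k} \varphi_k^{2} \, AUC_{k,k}
   + \sum_{k<s} \varphi_k \varphi_s \left( AUC_{k,k} + AUC_{s,s} \right), \\
\varphi^{T} AUC \, \varphi
  &= \sum_{k} \varphi_k^{2} \, AUC_{k,k}
   + \sum_{k<s} \varphi_k \varphi_s \left( AUC_{k,s} + AUC_{s,k} \right), \\
\sum_{k} \varphi_k \, AUC_{k,k} - \varphi^{T} AUC \, \varphi
  &= \sum_{k<s} \varphi_k \varphi_s \, \Delta_{k,s}
   = \tfrac{1}{2} \, \varphi^{T} \Delta \, \varphi, \\
\Delta_{k,s} &= AUC_{k,k} + AUC_{s,s} - AUC_{k,s} - AUC_{s,k}.
\end{align*}
```

With this definition Δ is symmetric with a zero diagonal, and AUC by matching does not exceed AUC by stratification whenever every Δ_{k,s} is non-negative.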

26 Covariates (C_1, C_2, …, C_L). Matching: matching based on many covariates is difficult. Stratification: as the number of covariates increases, the number of strata grows exponentially.

27 Propensity Scores. Replace the collection of confounding covariates with one scalar function of these covariates: the propensity score. Propensity score (PS): the conditional probability of being in the Diseased group rather than the Non-Diseased group, given a collection of observed covariates. PS(C_1, C_2, …, C_L) = Pr(Disease | C_1, C_2, …, C_L). Propensity score = pre-test risk of Disease given a collection of covariates C_1, C_2, …, C_L.

28 Construction of propensity score (pre-test risk). Method: logistic regression or others (neural networks, etc.). Outcome: Disease = 1, Non-Disease = 0. Predictors: all measured covariates, some interaction terms or squared terms, and so on. The New Test is not included. AUC for the combined covariates: a measure of covariate imbalance. The distributions of the X and Y variables, the values of a New Test for the Diseased and Non-Diseased groups, depend on the covariates, but this dependence is approximated well through the pre-test risk: F(x, C_1, C_2, …, C_L) = F(x, PS(C_1, C_2, …, C_L)); G(y, C_1, C_2, …, C_L) = G(y, PS(C_1, C_2, …, C_L)).
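A minimal sketch of estimating the propensity score by logistic regression is shown below. It is written in plain Python with batch gradient ascent so that it runs without extra packages; the fitting routine, the learning rate, the iteration count, and the data are all assumptions for illustration, and in practice a statistics library would normally be used. Note that the New Test value is not among the predictors.

```python
from math import exp


def sigmoid(z):
    """Logistic function, guarded against overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def fit_logistic(rows, labels, lr=0.1, n_iter=5000):
    """Fit intercept plus linear terms by gradient ascent on the log-likelihood."""
    beta = [0.0] * (len(rows[0]) + 1)          # beta[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for x, y in zip(rows, labels):
            z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
            err = y - sigmoid(z)               # residual for this subject
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        beta = [b + lr * g / len(rows) for b, g in zip(beta, grad)]
    return beta


def propensity(beta, x):
    """Estimated pre-test risk Pr(Disease | covariates)."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


# Hypothetical standardized covariates (age, BMI); 1 = Disease, 0 = Non-Disease.
covariates = [(-1.2, 0.3), (-0.8, -0.5), (-0.1, 0.9), (0.2, -1.1),
              (0.6, 0.4), (1.0, -0.2), (1.4, 0.8), (-0.4, -0.9)]
disease = [0, 0, 0, 1, 1, 0, 1, 0]

beta = fit_logistic(covariates, disease)
scores = [propensity(beta, x) for x in covariates]
print([round(s, 2) for s in scores])
```

Interaction or squared terms can be added by appending the corresponding products to each covariate tuple before fitting.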

29 Propensity Scores (continued)
–Calculate estimated propensity scores (pre-test risk) for all subjects using the propensity score model.
–Sort all subjects by propensity score.
–Divide the subjects into strata that have similar PS.
–Estimate AUC by matching (use the weighted AUC) or AUC by stratification.
Figure: five strata based on a logistic regression model of Age and BMI (linear terms), with m_k Diseased and n_k Non-Diseased subjects in stratum k.
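The steps above can be sketched in Python as follows. The records are hypothetical (estimated propensity score, Diseased flag, New Test value), the strata are taken as five equal-size groups of the sorted subjects, and the strata are combined with the weights φ_k = d_k/D used earlier for stratification; these choices are illustrative assumptions rather than the presentation's exact procedure.

```python
def psi(a, b):
    """Mann-Whitney kernel: 1 if a > b, 1/2 if a == b, 0 if a < b."""
    return 1.0 if a > b else 0.5 if a == b else 0.0


# Hypothetical records: (propensity score, diseased flag, New Test value).
records = [
    (0.05, 0, 1.1), (0.08, 0, 0.7), (0.10, 1, 1.6), (0.12, 0, 1.3),
    (0.20, 0, 1.5), (0.22, 1, 2.4), (0.25, 0, 1.9), (0.30, 0, 2.6),
    (0.40, 1, 2.8), (0.45, 0, 2.2), (0.50, 1, 2.1), (0.55, 0, 2.5),
    (0.60, 1, 3.3), (0.65, 1, 2.9), (0.70, 0, 3.0), (0.72, 1, 3.6),
    (0.80, 1, 3.9), (0.85, 1, 3.2), (0.90, 0, 3.5), (0.95, 1, 4.1),
]

n_strata = 5
records.sort(key=lambda r: r[0])           # sort subjects by propensity score
size = len(records) // n_strata            # equal-size strata (20 / 5 = 4)
strata = [records[i * size:(i + 1) * size] for i in range(n_strata)]

rows = []
for stratum in strata:
    xs = [t for _, dis, t in stratum if dis == 1]   # Diseased test values
    ys = [t for _, dis, t in stratum if dis == 0]   # Non-Diseased test values
    d_k = min(len(xs), len(ys))
    if d_k == 0:
        continue                           # such a stratum has weight phi_k = 0
    auc_kk = sum(psi(x, y) for x in xs for y in ys) / (len(xs) * len(ys))
    rows.append((d_k, auc_kk))

D = sum(d_k for d_k, _ in rows)
auc_stratified = sum(d_k / D * auc_kk for d_k, auc_kk in rows)
print(round(auc_stratified, 3))            # 0.667
```

Because the comparison is made only within strata of similar pre-test risk, differences in the covariates between the Diseased and Non-Diseased groups contribute far less to the estimate than in the pooled analysis.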

30 Propensity Scores (continued). Example: conjunction of a New Test with other diagnostic tests. A New Test is used in conjunction with other clinical tests to detect the clinical state Disease. The propensity score technique is a convenient tool for matching based on all available prior information (covariates) about the subjects. Example: Disease = any stenosis during coronary angiography; New Test; C_1 = Age; C_2 = Gender; C_3 = Total cholesterol; C_4 = HDL (good cholesterol); C_5 = LDL (bad cholesterol). In order to correctly evaluate the diagnostic ability of a New Test, matched AUC analysis should be performed. Matching based on the propensity score is recommended.

31 Use of matched ROC analysis when New Test results do not depend on the covariates. Suppose the distribution of the New Test results is the same in each stratum (F_1 = F_2 = F_3 = F, G_1 = G_2 = G_3 = G), but we do not know this and use the matched ROC analysis anyway. How is the matched estimate of AUC related to the usual empirical estimate? Theorem. The matched estimate of the AUC is an unbiased estimate of AUC, but the variance of the matched estimate is inflated. The proof is based on Hölder's inequality (see [1]).

32 Summary
–If the results of a New Test depend on covariates and the distributions of covariates in the Diseased and Non-Diseased groups are different, then only matched ROC analysis correctly evaluates the diagnostic accuracy of the New Test.
–Matching based on propensity scores (pre-test risk of Disease) reduces bias.
–The propensity score is seriously degraded when important covariates influencing pre-test risk have not been collected.
–Weighted ROC analysis makes more effective use of all the data.

33 References. The propensity scores technique is well developed in the context of observational studies and studies of therapeutic devices. In the context of diagnostic studies, however, there have been few papers.
1. Kondratovich, Marina V. (2000). Methodology of removing the effect of confounding variables in receiver operating characteristic (ROC) analysis. Proceedings of the 2000 Joint Statistical Meeting, Biopharmaceutical Section, Indianapolis, IN.
2. Kondratovich, Marina V. (2002). Matched receiver operating characteristic (ROC) analysis and propensity scores. Proceedings of the 2002 Joint Statistical Meeting, Biopharmaceutical Section, New York, NY.
3. Zweig, M.H. and Campbell, G. (1993). Receiver operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clinical Chemistry, 39, p. 561-577.

34 References for the propensity scores technique
–Rubin, DB, Estimating causal effects from large data sets using propensity scores. Ann Intern Med 1997; 127:757-763
–Grunkemeier, GL, et al., Propensity score analysis of stroke after off-pump coronary artery bypass grafting. Ann Thorac Surg 2002; 74:301-305
–Wolfgang, C., et al., Comparing mortality of elder patients on hemodialysis versus peritoneal dialysis: A propensity score approach. J Am Soc Nephrol 2002; 13:2353-2362
–Rosenbaum, PR, Rubin, DB, Reducing bias in observational studies using subclassification on the propensity score. JASA 1984; 79:516-524
–Blackstone, EH, Comparing apples and oranges. J. Thoracic and Cardiovascular Surgery, January 2002; 1:8-15
–D'Agostino, RB, Jr., Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Statistics in Medicine 1998; 17:2265-2281
