On ranking in survival analysis: Bounds on the concordance index


1 On ranking in survival analysis: Bounds on the concordance index
Vikas C. Raykar, Harald Steck, Balaji Krishnapuram: CAD & Knowledge Solutions (IKM CKS), Siemens Medical Solutions USA, Inc., Malvern, USA
Cary Dehing-Oberije, Philippe Lambin: Maastro clinic, University Hospital Maastricht, University Maastricht-GROW, The Netherlands
NIPS 2007

2 Organization
Motivation
Brief review of survival analysis
Concordance index
Our proposed ranking approach
Connections to survival analysis
Results

3 Motivation: Personalized medicine
Predict the survival time of lung cancer patients from the kind of treatment (chemotherapy/radiotherapy dosage) and from patient characteristics (age, gender, health). The dataset is available from MAASTRO hospital, our collaborator.

4 Why not use regression?
The problem is not amenable to standard statistical/machine learning methods because of censored data. It is well studied in statistics as survival analysis.

5 Review: Survival Analysis
Branch of statistics that deals with the time until the occurrence of an event. When did a patient die? When did the disease manifest? When did the machine fail? Widely used in medical statistics, epidemiology, reliability engineering, economics, sociology, marketing, insurance, etc.

6 What is censored data?
[Figure: timeline of the study, marked 2001 and 2005, from the start of the study (data collected at this time) to the end of the study.] Some patients die during the study period. At the end of the study a lot of patients may still survive. A patient may move to a different town and thus be no longer available for follow-up. For such cases (censored data) the exact survival time may be longer than the observation period.
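A minimal sketch of how the three cases above turn into data, assuming each patient is recorded as an observed time plus an event flag. The names `Observation` and `observe` and the calendar-year encoding are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    time: float   # observed follow-up time in years
    event: bool   # True if death was observed, False if censored


def observe(entry: float, death: Optional[float],
            lost: Optional[float], study_end: float) -> Observation:
    """Turn one patient's history into an (observed time, event flag) pair."""
    # Follow-up stops at whichever comes first: loss to follow-up or study end.
    cutoff = study_end if lost is None else min(lost, study_end)
    if death is not None and death <= cutoff:
        return Observation(time=death - entry, event=True)
    # Otherwise we only know the patient survived at least until the cutoff.
    return Observation(time=cutoff - entry, event=False)


died = observe(2001.0, death=2003.0, lost=None, study_end=2005.0)
survived = observe(2001.0, death=None, lost=None, study_end=2005.0)
moved_away = observe(2001.0, death=None, lost=2002.5, study_end=2005.0)
```

For the censored patients the recorded time is only a lower limit on the true survival time.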

7 Censoring provides only partial information
Typically a large portion of the data is censored. [Figure: survival time for observed data and for censored data.]

8 Notation: Survival analysis
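A common notation for right-censored survival data is sketched below. The symbol names are assumptions and are not necessarily the ones used on the slide.

```latex
\begin{aligned}
x_i &\in \mathbb{R}^d && \text{covariates of subject } i, \\
T_i & && \text{true survival time}, \\
C_i & && \text{censoring time}, \\
y_i &= \min(T_i, C_i) && \text{observed time}, \\
\delta_i &= \mathbf{1}[T_i \le C_i] && \text{event indicator (1 = death observed, 0 = censored)}.
\end{aligned}
```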

9 Proportional Hazard (PH) Model
Has become a standard model for studying the effect of covariates on survival time distributions. The hazard is modelled as $h(t \mid x) = h_0(t)\exp(w^\top x)$, where $h_0(t)$ is the baseline hazard function, $\exp(w^\top x)$ is the relative hazard function, $x$ is the covariate vector and $w$ are the unknown regression parameters. Parameter estimates for the PH model are obtained by maximizing Cox's partial likelihood.
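As a rough illustration of what is being maximized, the sketch below evaluates Cox's partial log-likelihood for a linear score. It assumes no tied event times; the function name and argument layout are hypothetical, not taken from the talk.

```python
import math
from typing import Sequence


def cox_partial_log_likelihood(w: Sequence[float],
                               X: Sequence[Sequence[float]],
                               times: Sequence[float],
                               events: Sequence[bool]) -> float:
    """Sum over observed events of score_i - log(sum of exp(score) over the risk set)."""
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in X]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = [scores[j] for j, t_j in enumerate(times) if t_j >= t_i]
        m = max(risk)  # log-sum-exp shift for numerical stability
        log_sum = m + math.log(sum(math.exp(s - m) for s in risk))
        total += scores[i] - log_sum
    return total
```

With `w` set to zero every subject in a risk set is equally likely to fail, so each event contributes minus the log of the risk-set size.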

10 Concordance Index or c-index
Standard performance measure for model assessment in survival analysis. Generalization of the area under the ROC curve to regression problems and censored data. Among all pairs of subjects whose survival times can be ordered, it is the fraction in which the subject with the higher predicted survival is the one who actually survived longer.

11 Concordance Index-no censoring
[Figure: survival time against a covariate for five subjects, numbered 1 to 5.] C = 1: perfect prediction accuracy. C = 0.5: as good as a random predictor.

12 Concordance Index-with censoring
[Figure: survival times of five subjects, numbered 1 to 5, some of them censored.] No arrow can go above a censored point.
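The counting rule can be sketched as follows. A pair is treated as orderable only when the subject with the shorter observed time had an observed event, which is one reading of "no arrow can go above a censored point". Counting tied predictions as one half is an assumption borrowed from common practice, and the function name is hypothetical.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[bool],
                      predicted: Sequence[float]) -> float:
    """Fraction of orderable pairs that the predictions rank correctly."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the shorter-lived one of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0
                elif predicted[i] == predicted[j]:
                    concordant += 0.5  # assumption: tied predictions count half
    if comparable == 0:
        raise ValueError("no orderable pairs")
    return concordant / comparable
```

Setting every entry of `events` to `True` gives the case without censoring. Only the order of `predicted` matters, so any increasing transform of the predictions leaves the value unchanged.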

13 Proposed approach: Maximize CI directly
While the CI is widely used to evaluate a learnt model, it is not generally used as an objective function for training. The CI is invariant to any monotone transformation of the survival times. Hence the model learnt by maximizing the CI is a ranking function (an N-partite ranking problem).

14 Lower bounds on the CI
Maximizing the CI directly is a discrete optimization problem. Use a differentiable concave lower bound instead. The bound is related to the PH model.
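To make the idea concrete, here are two concave functions that lie below the 0/1 indicator of a correctly ordered pair: a scaled log-sigmoid and an exponential. That these are exactly the bounds shown on the slide is an assumption; the function names are hypothetical.

```python
import math


def indicator(z: float) -> float:
    """1 if the pair is ranked correctly (positive score difference), else 0."""
    return 1.0 if z > 0 else 0.0


def log_sigmoid(z: float) -> float:
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_sigmoid_bound(z: float) -> float:
    """1 + log(sigmoid(z)) / log(2); equals 0 at z = 0 and approaches 1 from below."""
    return 1.0 + log_sigmoid(z) / math.log(2.0)


def exponential_bound(z: float) -> float:
    """1 - exp(-z); equals 0 at z = 0 and approaches 1 from below."""
    return 1.0 - math.exp(-z)
```

Both functions are smooth in `z`, so summing them over orderable pairs gives an objective that gradient-based methods can handle.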

15 Maximize lower bounds on the CI
Use linear ranking functions and add a regularization term. Use gradient-based methods to maximize the resulting objective.
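A minimal sketch of this recipe, assuming the scaled log-sigmoid bound, an L2 penalty and plain fixed-step gradient ascent. All three choices, the hyperparameter values and the function names are assumptions made for illustration, not the settings used in the talk.

```python
import math
from typing import List, Sequence, Tuple


def comparable_pairs(times: Sequence[float],
                     events: Sequence[bool]) -> List[Tuple[int, int]]:
    """Pairs (i, j) where i had an observed event and j was observed longer."""
    n = len(times)
    return [(i, j) for i in range(n) if events[i]
            for j in range(n) if times[i] < times[j]]


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_ranker(X: Sequence[Sequence[float]], times: Sequence[float],
               events: Sequence[bool], lam: float = 0.01,
               lr: float = 0.1, steps: int = 500) -> List[float]:
    """Gradient ascent on mean(1 + log sigmoid(w.(x_j - x_i)) / log 2) - lam/2 * |w|^2."""
    pairs = comparable_pairs(times, events)
    if not pairs:
        raise ValueError("no orderable pairs")
    d = len(X[0])
    w = [0.0] * d
    scale = 1.0 / (len(pairs) * math.log(2.0))
    for _ in range(steps):
        grad = [-lam * wk for wk in w]  # gradient of the L2 penalty
        for i, j in pairs:
            diff = [xj - xi for xi, xj in zip(X[i], X[j])]
            z = sum(wk * dk for wk, dk in zip(w, diff))
            g = scale * sigmoid(-z)  # derivative of log sigmoid(z) is sigmoid(-z)
            for k in range(d):
                grad[k] += g * diff[k]
        w = [wk + lr * gk for wk, gk in zip(w, grad)]
    return w
```

Here a larger score `w.x` means a longer predicted survival, so a covariate that shortens survival should receive a negative weight.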

16 Connection to the PH model
Log-likelihood for correct ranking. For a proportional hazard model we can show that the probability that one subject outlives another is a logistic function of the difference between their linear scores. This is a common assumption made in the ranking literature. We have shown that if we use PH models this is exactly the case.
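One way to see this is the short derivation below. It assumes independent survival times $T_i$ and $T_j$ with hazards $h_0(t)\exp(w^\top x_i)$ and $h_0(t)\exp(w^\top x_j)$, a cumulative baseline hazard $H_0(t) = \int_0^t h_0(s)\,ds$ that grows without bound, and $\sigma(z) = 1/(1 + e^{-z})$. The symbols are assumptions and may differ from the slide's.

```latex
\begin{aligned}
\Pr[T_i > T_j]
  &= \int_0^\infty f_j(t)\, S_i(t)\, dt \\
  &= \int_0^\infty h_0(t)\, e^{w^\top x_j}
     \exp\!\big(-H_0(t)\,(e^{w^\top x_i} + e^{w^\top x_j})\big)\, dt \\
  &= \frac{e^{w^\top x_j}}{e^{w^\top x_i} + e^{w^\top x_j}}
   = \sigma\!\big(w^\top (x_j - x_i)\big).
\end{aligned}
```

Here $f_j$ is the density of $T_j$ and $S_i$ is the survival function of $T_i$. The baseline hazard cancels, so the pairwise probability depends only on the difference of the linear scores.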

17 Penalized log-likelihood
Compare this with the objective function obtained with the lower bound approach.

18 Cox partial likelihood
Our proposed method explicitly maximizes a lower bound on the CI. Cox's method maximizes the partial likelihood. Experimental results indicate that both do well. Conjecture: Is Cox's partial likelihood also a lower bound on the CI?

19 Cox partial likelihood (cont.)

20 Results
The proposed method is slightly better than Cox-PH. However, the differences are not significant.

21 Thank You ! | Questions ?

