
1 Active learning based survival regression for censored data. Bhanukiran Vinzamuri (bhanukiranv@wayne.edu), Yan Li (rockli_yan@wayne.edu), Chandan K. Reddy (reddy@cs.wayne.edu)

2 Index of presentation: Introduction; Problem Description; Cox Regression; Regularized Cox Regression; Coordinate Majorization Descent solver for EN-COX; KEN-COX; Model discriminative Gradient based Sampling; Active learning based Regularized Cox Regression; Experimental Evaluation; Conclusions and Future Work; References

3 Introduction Censored data arise in many real-world applications, such as clinical health informatics, genomic analysis and finance. Mining censored data poses a unique challenge to the data mining community because standard regression models cannot be applied to them directly. Each censored record associates two entries with the feature vector: a time-to-event variable that records the observed time, and a binary indicator variable that records the censored status.

4 Problem description Consider 10 patients followed for 12 days after discharge from their index hospitalization. Patients (2,3,5) have the event of interest observed. Patients (1,8,9) do not experience the event of interest before the end of the 12th day. Finally, patients (4,7) drop out of the study. Both of the latter groups are right-censored: their event times are known only to exceed the last time they were observed.
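The grouping in this example maps directly onto the binary censoring indicator. The following is a minimal sketch of that encoding, assuming the common convention that 1 marks an observed event and 0 a right-censored record; the names `censoring_status`, `STUDY_END_DAY` and the reason strings are hypothetical. Patients 6 and 10 are not described on the slide, so the sketch leaves them undefined rather than guessing.

```python
# Hypothetical encoding of the 12-day follow-up example.
# Assumed convention: delta = 1 means the event was observed,
# delta = 0 means the record is right-censored.
STUDY_END_DAY = 12

event_observed = {2, 3, 5}      # event of interest seen during follow-up
reached_study_end = {1, 8, 9}   # no event by the end of day 12
dropped_out = {4, 7}            # left the study before day 12


def censoring_status(patient_id):
    """Return (delta, reason), or None if the slide does not describe the patient."""
    if patient_id in event_observed:
        return 1, "event"
    if patient_id in reached_study_end:
        return 0, "censored at study end"
    if patient_id in dropped_out:
        return 0, "censored at dropout"
    return None


status = {pid: censoring_status(pid) for pid in range(1, 11)}
for pid, value in status.items():
    print(pid, value)
```

Only patients 1, 8 and 9 have a known observed time here (day 12, the end of the study); the slide gives no event or dropout days for the others.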

5 Notations used in this paper
Name               Description
X                  n x m matrix of feature vectors
T                  n x 1 vector of failure times
K                  number of unique failure times
δ                  n x 1 binary vector of censored status
β                  m x 1 regression coefficient vector
L(β)               partial log-likelihood
h(t|X)             conditional hazard probability
(symbol missing)   base hazard rate
(symbol missing)   base survival rate
S(t|X)             conditional survival probability
Ke                 column wise kernel matrix

6 Cox regression The Cox model describes the effect of covariates on the hazard rate but leaves the baseline hazard rate unspecified. It does NOT assume knowledge of absolute risk, and it estimates relative rather than absolute risk. It relies on the proportional hazards assumption, which states that the hazard for any individual is a fixed proportion of the hazard for any other individual at all times.
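The proportional hazards assumption can be checked numerically. The sketch below assumes the standard Cox form h(t|x) = h0(t) exp(x·β); the coefficient values, feature vectors and the two baseline hazards are made up for illustration. It shows that the hazard ratio between two individuals is the same at every time point and for every choice of baseline, which is why the baseline can be left unspecified when estimating relative risk.

```python
import numpy as np


def hazard(t, x, beta, baseline):
    """Cox hazard h(t|x) = h0(t) * exp(x . beta), with h0 given by `baseline`."""
    return baseline(t) * np.exp(x @ beta)


beta = np.array([0.5, -1.0])   # illustrative coefficients (assumption)
x_a = np.array([1.0, 0.0])     # illustrative feature vectors (assumption)
x_b = np.array([0.0, 1.0])

# Two arbitrary baseline hazards: one constant, one increasing in t.
baselines = [lambda t: 0.1, lambda t: 0.05 * t ** 2]

ratios = [
    hazard(t, x_a, beta, h0) / hazard(t, x_b, beta, h0)
    for h0 in baselines
    for t in (1.0, 5.0, 12.0)
]

# The baseline cancels, leaving exp((x_a - x_b) . beta) in every case.
expected_ratio = np.exp((x_a - x_b) @ beta)
print(ratios, expected_ratio)
```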

7 Mathematical formulation of Cox regression
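The equations on this slide are not part of the transcript. As a reference point, the standard textbook form of the Cox model is sketched here in the notation of the notation slide; it is not necessarily the exact form the authors showed. Two assumptions: the baseline hazard is written h_0(t), and δ_i = 1 is taken to mean that the event was observed for instance i. X_i denotes row i of X, and tied failure times are ignored.

```latex
% Conditional hazard under the proportional hazards assumption
h(t \mid X_i) = h_0(t)\,\exp(X_i \beta)

% Partial log-likelihood over instances with an observed event;
% R_i is the risk set of instances still under observation at time T_i
L(\beta) = \sum_{i:\,\delta_i = 1}
  \left[ X_i \beta - \log \sum_{j \in R_i} \exp(X_j \beta) \right],
\qquad R_i = \{\, j : T_j \ge T_i \,\}
```

The baseline hazard h_0(t) cancels inside each term of L(β), so β can be estimated without specifying it.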

8 Hazard probabilities predicted for EHR data

9 Regularized Cox Regression

10 CMD solver for EN-COX

11 Regularized Cox Regression Algorithm

12 Extending solver for KEN-COX

13 Model discriminative gradient based sampling

14 Active Regularized Cox Regression (ARC)

15 Flowchart for ARC with KEN-COX

16 Metrics for evaluation in survival analysis

17 Experimental Setup We evaluate our ARC framework on EHR data from Henry Ford Hospital in Detroit, Michigan, and on publicly available censored datasets. Time-to-event (30-day readmission) values are calculated from the prior admission and discharge dates, and patients are right-censored at the end of the 30-day readmission study period. For the other survival datasets, right-censoring information is provided with the data. We also generate synthetic datasets, drawing the feature vectors from a normal distribution and the response times from a Weibull distribution.
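The synthetic data step can be illustrated as follows. The slide does not say how the Weibull times were tied to the features, so this is a sketch under stated assumptions, not the authors' generator: features are standard normal, times follow a Weibull proportional hazards model sampled by inverse transform, and records are right-censored at a fixed study end. The function name `make_synthetic_survival`, the `shape` value, the coefficient scale and the study end of 2.0 are all hypothetical; only the sizes (500 instances, 15 features) come from the Syn1 row of the dataset table.

```python
import numpy as np


def make_synthetic_survival(n, m, study_end, shape=1.5, seed=0):
    """Draw normal features and Weibull event times, right-censored at study_end."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, m))             # feature vectors ~ N(0, 1)
    beta = rng.normal(scale=0.5, size=m)    # hypothetical true coefficients

    # Weibull proportional hazards model with unit scale:
    #   S(t|x) = exp(-t**shape * exp(x . beta))
    # Setting S(t|x) = u with u uniform on (0, 1] and solving for t gives:
    u = 1.0 - rng.uniform(size=n)           # in (0, 1], avoids log(0)
    event_time = (-np.log(u) / np.exp(X @ beta)) ** (1.0 / shape)

    delta = (event_time <= study_end).astype(int)   # 1 = event observed
    T = np.minimum(event_time, study_end)           # observed (censored) time
    return X, T, delta


X, T, delta = make_synthetic_survival(n=500, m=15, study_end=2.0)
print(X.shape, T.shape, delta.mean())
```

Raising or lowering `study_end` changes the share of censored records, which is a convenient knob when testing how a method behaves under heavier censoring.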

18 # Instances, # Features and Active Learning Sampling Size in Dataset
Dataset   #Inst   #Feat   Train (Samp Size)
Breast    686     10      100 (20)
Colon     311     19      50 (10)
PBC       888     15      200 (20)
EHR 1     5675    98      500 (100)
EHR 2     4379    98      500 (100)
EHR 3     3543    98      500 (100)
EHR 4     2826    98      500 (50)
Syn1      500     15      100 (15)
Syn2      500     50      100 (15)
Syn3      100     50      50 (1)

19 Experimental Results
Dataset   L-COX   EN-COX   C-Boost   RSF     GBCI    ARC-L    ARC-EN   ARC-KEN
Breast    0.61    0.63     0.67      0.68    0.69    0.65     0.6856   0.734
Colon     0.651   0.65     0.62      0.60    0.64    0.738    0.735    0.859
PBC       0.735   0.759    0.86      0.863   0.79    0.81     0.825    0.862
EHR 1     0.54    0.55     0.59      0.58    0.59    0.60     0.64     0.671
EHR 2     0.56    0.5822   0.60      0.61    0.601   0.66     0.68     0.71
EHR 3     0.533   0.553    0.59      -       0.58    0.575    0.58     0.601
EHR 4     0.54    0.55     0.58      0.569   0.56    0.585    0.581    0.645
Syn1      0.59    0.628    0.60      0.61    0.589   0.7823   0.838    0.92
Syn2      0.801   0.815    0.86      0.94    0.93    0.86     0.867    0.921
Syn3      0.67    0.688    0.64      -       0.664   0.73     0.78     0.81

20 Table with columns Dataset, ARC(LASSO), ARC(EN), ARC(KEN) and rows Breast, Colon, PBC, EHR 1, EHR 2, EHR 3, EHR 4, Syn1, Syn2, Syn3 (cell contents not preserved)

21 Active learning curves (figure panels: Breast, EHR 2, EHR 1, Colon)

22 Active learning curves, continued (figure panels: PBC, Synthetic1, Synthetic3, Synthetic2)

23 Conclusions and Future Work We present an active learning extension to the Cox regression framework using a novel model discriminative gradient based sampling procedure. The proposed method uses a fast and scalable optimization method to converge efficiently. Experimental results on EHR data and censored datasets indicate that the proposed models have good discriminative ability and outperform other competing survival regression methods. Future work includes extending this active learning model to transfer learning using regularized Cox regression and accelerated failure time models.


