Survival-Time Classification of Breast Cancer Patients and Chemotherapy Yuh-Jye Lee, Olvi Mangasarian & W. H. Wolberg UW Madison & UCSD La Jolla Computational.


1 Survival-Time Classification of Breast Cancer Patients and Chemotherapy Yuh-Jye Lee, Olvi Mangasarian & W. H. Wolberg UW Madison & UCSD La Jolla Computational and Applied Mathematics Seminar April 19, 2005

2 Breast Cancer Estimates (American Cancer Society & World Health Organization)
• Breast cancer is the most common cancer among women in the US.
• The ACS estimates 212,930 new cases of breast cancer in the US in 2005: 211,240 in women and 1,690 in men.
• The ACS estimates 40,870 deaths from breast cancer in the US in 2005: 40,410 among women and 460 among men.
• WHO estimates: more than 1.2 million people worldwide were diagnosed with breast cancer in 2001, and 0.5 million died from breast cancer in 2000.

3 Key Objective
• Identify breast cancer patients for whom chemotherapy prolongs survival time
• Main difficulty: comparative tests cannot be carried out on human subjects
  - Similar patients must be treated similarly
• Our approach: classify patients into Good, Intermediate & Poor groups such that:
  - The Good group does not need chemotherapy
  - The Intermediate group benefits from chemotherapy
  - The Poor group is not likely to benefit from chemotherapy

4 Outline
• Tools used
  - Support vector machines (linear & nonlinear SVMs)
    - Feature selection & classification
  - Clustering (the k-Median algorithm, not k-Means)
• Cluster into good, intermediate & poor classes
  - Cluster no-chemo patients into 2 groups: good & poor
  - Cluster chemo patients into 2 groups: good & poor
  - Generate three final classes
    - Good class (Good from the no-chemo cluster group)
    - Poor class (Poor from the chemo cluster group)
    - Intermediate class: remaining patients (chemo & no-chemo)
• Generate survival curves for the three classes
• Use SSVM (smooth SVM) to classify new patients into one of the above three classes
• Data description

5 Cell Nuclei of a Fine Needle Aspirate

6 Thirty Cytological Features Collected at Diagnosis Time

7 Two Histological Features Collected at Surgery Time

8 Breast Cancer Diagnosis Based on 3 FNA Features
• 97% ten-fold cross-validation correctness
• 780 patients: 494 benign, 286 malignant
• Research by Mangasarian, Street & Wolberg

9 1-Norm Support Vector Machines: Maximize the Margin between Bounding Planes
[Figure: point sets A+ and A- separated by bounding planes]

10 Support Vector Machine Algebra of the 2-Category Linearly Separable Case
• Given m points in n-dimensional space
• Represented by an m-by-n matrix A
• Membership of each point $A_i$ in class +1 or -1 specified by:
  - An m-by-m diagonal matrix D with +1 & -1 entries
• Separate by two bounding planes, $x'w = \gamma \pm 1$:
  $A_i w \ge \gamma + 1$ for $D_{ii} = +1$, and $A_i w \le \gamma - 1$ for $D_{ii} = -1$
• More succinctly: $D(Aw - e\gamma) \ge e$, where e is a vector of ones
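A minimal sketch of the separation condition, assuming plain Python lists for A, the diagonal of D, w and γ (the function name `separates` and the example data are hypothetical). It checks $D(Aw - e\gamma) \ge e$ row by row, i.e. whether every point lies on the correct side of its bounding plane $x'w = \gamma \pm 1$.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def separates(A: Matrix, d: Sequence[int], w: Sequence[float], gamma: float) -> bool:
    """Return True if D(Aw - e*gamma) >= e holds for every row.

    A: m-by-n data matrix (one point per row)
    d: diagonal of D, each entry +1 or -1
    w, gamma: normal and offset of the bounding planes x'w = gamma +/- 1
    """
    for row, label in zip(A, d):
        score = sum(a * wj for a, wj in zip(row, w)) - gamma  # A_i w - gamma
        if label * score < 1:  # point is inside the margin or on the wrong side
            return False
    return True


# Two points on each side of the line x1 = 0 in the plane.
A = [[2.0, 0.0], [3.0, 1.0], [-2.0, 0.5], [-1.5, -1.0]]
d = [1, 1, -1, -1]
print(separates(A, d, w=[1.0, 0.0], gamma=0.0))  # True
print(separates(A, d, w=[0.5, 0.0], gamma=0.0))  # False
```

Halving w moves the bounding planes apart, so the points closest to the origin fall inside the margin and the condition fails.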

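11 Feature Selection Using 1-Norm Linear SVM Classification Based on Lymph Node Status
• Feature selection by the 1-norm SVM, with $\nu > 0$:
  $\min_{w,\gamma,y} \; \nu e'y + \|w\|_1 \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \; y \ge 0$
  where A+ denotes patients with lymph node > 0 and A- patients with lymph node = 0
• The 1-norm term drives many components of w to zero; the features with $w_j \neq 0$ are the ones selected
• Features selected: 6 out of 31 by the above SVM:
  - 5 out of 30 cytological features that describe nuclear size, shape and texture from the fine needle aspirate
  - Tumor size from surgery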

12 Features Selected by Support Vector Machine

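13 Nonlinear SVM for Classifying New Patients
• Linear SVM (linear separating surface $x'w = \gamma$), a linear program (LP):
  $\min_{w,\gamma,y} \; \nu e'y + \|w\|_1 \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \; y \ge 0$
• By QP duality, $w = A'Du$. Maximizing the margin in the "dual space" gives:
  $\min_{u,\gamma,y} \; \nu e'y + \|u\|_1 \quad \text{s.t.} \quad D(AA'Du - e\gamma) + y \ge e, \; y \ge 0$
• Replace $AA'$ by a nonlinear kernel $K(A, A')$:
  $\min_{u,\gamma,y} \; \nu e'y + \|u\|_1 \quad \text{s.t.} \quad D(K(A, A')Du - e\gamma) + y \ge e, \; y \ge 0$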

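14 The Nonlinear Classifier
• The nonlinear classifier: $K(x', A')Du = \gamma$; a new point x is assigned to class +1 or -1 according to the sign of $K(x', A')Du - \gamma$
• Here K is a nonlinear kernel, e.g.:
  - Gaussian (radial basis) kernel: $K(A, A')_{ij} = \exp(-\mu \|A_i - A_j\|_2^2)$, $i, j = 1, \dots, m$
• The $ij$-entry of $K(A, A')$ represents the "similarity" between the data points $A_i$ and $A_j$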
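A minimal sketch of evaluating the Gaussian-kernel classifier, the sign of $K(x', A')Du - \gamma$, in plain Python. It assumes u and γ have already been obtained by solving the kernel SVM; the multipliers in the usage example are hypothetical, and the names `gaussian_kernel` and `classify` are illustrative.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, mu: float) -> float:
    """exp(-mu * ||x - y||_2^2): 1 for identical points, decaying toward 0 with distance."""
    return math.exp(-mu * sum((a - b) ** 2 for a, b in zip(x, y)))


def classify(x: Vector, A: Sequence[Vector], d: Sequence[int],
             u: Sequence[float], gamma: float, mu: float) -> int:
    """Sign of K(x', A')Du - gamma, i.e. which side of the nonlinear surface x lies on."""
    value = sum(gaussian_kernel(x, a_j, mu) * d_j * u_j
                for a_j, d_j, u_j in zip(A, d, u)) - gamma
    return 1 if value >= 0 else -1


A = [[0.0, 0.0], [4.0, 4.0]]  # training points
d = [1, -1]                   # diagonal of D
u = [1.0, 1.0]                # hypothetical multipliers; normally from the SVM solution
print(classify([0.5, 0.0], A, d, u, gamma=0.0, mu=0.5))  # 1
print(classify([3.5, 4.0], A, d, u, gamma=0.0, mu=0.5))  # -1
```

Each new point is scored by its kernel "similarity" to the training points, weighted by their labels and multipliers.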

15 Clustering in Data Mining: General Objective
• Given: a dataset of m points in n-dimensional real space
• Problem: extract hidden distinct properties by clustering the dataset into k clusters

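16 Concave Minimization Formulation of the 1-Norm Clustering Problem (k-Median)
• Given: a set of m points in $R^n$ represented by the matrix $A \in R^{m \times n}$, and a number k of desired clusters
• Find: cluster centers $C_1, \dots, C_k \in R^n$ that minimize the sum of 1-norm distances of each point $A_1, \dots, A_m$ to its closest cluster center:
  $\min_{C_\ell, D_{i\ell}} \; \sum_{i=1}^{m} \min_{\ell = 1, \dots, k} e'D_{i\ell} \quad \text{s.t.} \quad -D_{i\ell} \le A_i' - C_\ell \le D_{i\ell}, \; i = 1, \dots, m, \; \ell = 1, \dots, k$
• Objective function: a sum of m minima of linear functions, hence piecewise-linear concave
• Difficulty: minimizing a general piecewise-linear concave function over a polyhedral set is NP-hard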

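17 Clustering via Finite Concave Minimization
• Minimize the sum of 1-norm distances between each data point $A_i$ and the closest cluster center $C_\ell$:
  $\min_{C_\ell, D_{i\ell}} \; \sum_{i=1}^{m} \min_{\ell = 1, \dots, k} e'D_{i\ell} \quad \text{s.t.} \quad -D_{i\ell} \le A_i' - C_\ell \le D_{i\ell}$
• Equivalent bilinear reformulation:
  $\min_{C_\ell, D_{i\ell}, T_{i\ell}} \; \sum_{i=1}^{m} \sum_{\ell=1}^{k} T_{i\ell} \, e'D_{i\ell} \quad \text{s.t.} \quad -D_{i\ell} \le A_i' - C_\ell \le D_{i\ell}, \; \sum_{\ell=1}^{k} T_{i\ell} = 1, \; T_{i\ell} \ge 0$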
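Why the bilinear form is equivalent: for fixed centers $C_\ell$, the smallest feasible $e'D_{i\ell}$ is $\|A_i - C_\ell\|_1$, and a linear function of $T_{i\cdot}$ over the simplex is minimized at a vertex, i.e. by putting weight 1 on a nearest center. A small plain-Python sketch of that argument (function names and data are hypothetical) checks that the two objectives agree at such a T.

```python
from typing import List, Sequence

Vector = Sequence[float]


def l1_distance(x: Vector, c: Vector) -> float:
    return sum(abs(a - b) for a, b in zip(x, c))


def optimal_assignment(A: Sequence[Vector], centers: Sequence[Vector]) -> List[List[int]]:
    """For fixed centers, an optimal T: each row puts weight 1 on a nearest center."""
    T = []
    for a in A:
        dists = [l1_distance(a, c) for c in centers]
        nearest = dists.index(min(dists))
        T.append([1 if l == nearest else 0 for l in range(len(centers))])
    return T


def bilinear_objective(A, centers, T) -> float:
    """sum_i sum_l T_il * ||A_i - C_l||_1 (each D_il at its smallest feasible value)."""
    return sum(t * l1_distance(a, c)
               for a, row in zip(A, T)
               for t, c in zip(row, centers))


def k_median_objective(A, centers) -> float:
    """sum_i min_l ||A_i - C_l||_1, the concave-minimization objective."""
    return sum(min(l1_distance(a, c) for c in centers) for a in A)


A = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]]
C = [[0.0, 0.0], [10.0, 10.0]]
T = optimal_assignment(A, C)
print(T)  # [[1, 0], [1, 0], [0, 1], [0, 1]]
print(bilinear_objective(A, C, T), k_median_objective(A, C))  # 2.0 2.0
```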

18 k-Median Clustering Algorithm: Finite Termination at a Local Solution
• Step 0 (Initialization): pick 2 initial cluster centers as the medians of (L = 0 & T < 2) and (L ≥ 5 or T ≥ 4), where L is lymph node status and T is tumor size
• Step 1 (Cluster Assignment): assign each point to the cluster with the nearest cluster center in the 1-norm
• Step 2 (Center Update): recompute the center of each cluster as the cluster median (the coordinate-wise median, which minimizes the sum of 1-norm distances to all points in the cluster)
• Step 3 (Stopping Criterion): stop if the cluster centers are unchanged, else go to Step 1
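Steps 1 to 3 can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the initial centers are passed in (Step 0 is left to the caller), the names are hypothetical, and keeping the old center when a cluster becomes empty is an assumption the slides do not address.

```python
from statistics import median
from typing import List, Sequence

Vector = Sequence[float]


def l1_distance(x: Vector, c: Vector) -> float:
    return sum(abs(a - b) for a, b in zip(x, c))


def coordinate_median(points: Sequence[Vector]) -> List[float]:
    """Coordinate-wise median: minimizes the sum of 1-norm distances to the points."""
    return [float(median(col)) for col in zip(*points)]


def k_median(A: Sequence[Vector], centers: Sequence[Vector], max_iter: int = 100):
    """k-Median from given initial centers; returns (centers, labels).

    An empty cluster keeps its previous center (an assumption, not from the slides).
    """
    centers = [list(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Step 1: assign each point to the nearest center in the 1-norm
        labels = []
        for a in A:
            dists = [l1_distance(a, c) for c in centers]
            labels.append(dists.index(min(dists)))
        # Step 2: move each center to the median of its cluster
        new_centers = []
        for l, c in enumerate(centers):
            members = [a for a, lab in zip(A, labels) if lab == l]
            new_centers.append(coordinate_median(members) if members else c)
        # Step 3: stop once the centers no longer change
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


A = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [9.0, 9.0], [10.0, 9.0], [9.0, 10.0]]
centers, labels = k_median(A, [[1.0, 1.0], [8.0, 8.0]])
print(centers, labels)  # [[0.0, 0.0], [9.0, 9.0]] [0, 0, 0, 1, 1, 1]
```

Using the median rather than the mean in Step 2 is what makes this k-Median instead of k-Means: the median is the 1-norm minimizer, matching the objective being minimized.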

19 Feature Selection & Initial Cluster Centers
• 6 out of 31 features selected by the 1-norm SVM
  - SVM separating lymph node positive (Lymph > 0) from lymph node negative (Lymph = 0)
• Apply the k-Median algorithm in the 6-dimensional input space
• Initial cluster centers used: medians of Good1 & Poor1
  - Good1: patients with Lymph = 0 AND Tumor < 2
  - Poor1: patients with Lymph > 4 OR Tumor ≥ 4
    - Typical indicator for chemotherapy
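Computing the two initial centers amounts to filtering patients by the Good1 and Poor1 rules and taking coordinate-wise medians of their selected features. A minimal sketch, assuming a hypothetical `Patient` record (field names are illustrative) and shortening the 6 features to 2 for readability:

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Sequence, Tuple


@dataclass
class Patient:
    """Hypothetical record layout; field names are illustrative only."""
    lymph: int             # number of positive lymph nodes
    tumor: float           # tumor size
    features: List[float]  # SVM-selected features (6 in the talk; 2 below)


def coordinate_median(points: Sequence[Sequence[float]]) -> List[float]:
    """Coordinate-wise median of a non-empty list of points."""
    return [float(median(col)) for col in zip(*points)]


def initial_centers(patients: Sequence[Patient]) -> Tuple[List[float], List[float]]:
    """Medians of the Good1 and Poor1 groups in feature space."""
    good1 = [p.features for p in patients if p.lymph == 0 and p.tumor < 2]
    poor1 = [p.features for p in patients if p.lymph >= 5 or p.tumor >= 4]
    return coordinate_median(good1), coordinate_median(poor1)


patients = [
    Patient(0, 1.0, [1.0, 2.0]),   # Good1
    Patient(0, 1.5, [3.0, 4.0]),   # Good1
    Patient(6, 2.5, [8.0, 9.0]),   # Poor1 (lymph >= 5)
    Patient(1, 4.5, [10.0, 7.0]),  # Poor1 (tumor >= 4)
    Patient(2, 3.0, [5.0, 5.0]),   # neither group
]
print(initial_centers(patients))  # ([2.0, 3.0], [9.0, 8.0])
```

For an integer lymph node count, `lymph >= 5` and "Lymph > 4" select the same patients.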

20 Overall Clustering Process
• 253 patients (113 NoChemo, 140 Chemo)
• Compute initial cluster centers using the 6 features:
  - Good1: Lymph = 0 AND Tumor < 2; compute the median
  - Poor1: Lymph ≥ 5 OR Tumor ≥ 4; compute the median
• Cluster the 113 NoChemo patients with the k-Median algorithm (initial centers: medians of Good1 & Poor1): 69 NoChemo Good, 44 NoChemo Poor
• Cluster the 140 Chemo patients with the k-Median algorithm (initial centers: medians of Good1 & Poor1): 67 Chemo Good, 73 Chemo Poor
• Final classes:
  - Good: the 69 NoChemo Good patients
  - Intermediate: the 44 NoChemo Poor and 67 Chemo Good patients
  - Poor: the 73 Chemo Poor patients
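The rule that combines treatment status with each group's own k-Median cluster can be written as a small function. In this sketch (the name `final_class` and the string labels are hypothetical), the usage example reproduces the final class sizes from the counts on this slide: 69 Good, 44 + 67 = 111 Intermediate, and 73 Poor.

```python
from collections import Counter


def final_class(chemo: bool, cluster: str) -> str:
    """Combine chemo status with the k-Median cluster ('good' or 'poor')
    found within that patient's own treatment group."""
    if not chemo and cluster == "good":
        return "Good"
    if chemo and cluster == "poor":
        return "Poor"
    return "Intermediate"  # NoChemo-poor and Chemo-good patients


groups = ([(False, "good")] * 69 + [(False, "poor")] * 44
          + [(True, "good")] * 67 + [(True, "poor")] * 73)
counts = Counter(final_class(chemo, cluster) for chemo, cluster in groups)
print(counts["Good"], counts["Intermediate"], counts["Poor"])  # 69 111 73
```

The Intermediate class is the only one containing both treated and untreated patients, which is what makes a chemo versus no-chemo comparison within it possible.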

21 Survival Curves for Good, Intermediate & Poor Groups (Nonlinear SSVM for New Patients)
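The slides do not state how the survival curves were estimated. Assuming they are Kaplan-Meier curves (the standard estimator for censored survival times, named here as an assumption), a minimal plain-Python sketch of that estimator looks like this; the function name and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: True if death was observed at that time, False if censored
    Returns (t, S(t)) at each distinct time with at least one observed death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # group all patients sharing the same time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk  # conditional survival past t
            curve.append((t, survival))
        at_risk -= removed  # deaths and censorings at t leave the risk set
    return curve


curve = kaplan_meier([2, 3, 3, 5, 8], [True, False, True, True, False])
print([(t, round(s, 3)) for t, s in curve])  # [(2, 0.8), (3, 0.6), (5, 0.3)]
```

Censored patients contribute to the risk set until their last follow-up time, which is why the estimator, rather than a simple fraction of survivors, is used for data with incomplete follow-up.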

22 Survival Curves for Intermediate Group: Split by Chemo & NoChemo

23 Survival Curves for Overall Patients: With & Without Chemotherapy

24 Survival Curves for Intermediate Group Split by Lymph Node & Chemotherapy

25 Survival Curves for Overall Patients Split by Lymph Node Positive & Negative

26 Conclusion
• Used five cytological features & tumor size to cluster breast cancer patients into 3 groups:
  - Good: no chemotherapy recommended
  - Intermediate: chemotherapy likely to prolong survival
  - Poor: chemotherapy may or may not enhance survival
• The 3 groups have very distinct survival curves
• First categorization of a breast cancer group for which chemotherapy enhances longevity
• An SVM-based procedure assigns new patients to one of the above three survival groups

27 Talk & Paper Available on the Web
• www.cs.wisc.edu/~olvi
• Y.-J. Lee, O. L. Mangasarian & W. H. Wolberg: Computational Optimization and Applications, Volume 25, 2003, pages 151-166

