Survival-Time Classification of Breast Cancer Patients
DIMACS Workshop on Data Mining and Scalable Algorithms, August 22-24, 2001, Rutgers University
Y.-J. Lee, O. L. Mangasarian & W. H. Wolberg
Second Annual Review, June 1, 2001
Data Mining Institute, University of Wisconsin - Madison

American Cancer Society 2001 Breast Cancer Estimates
- Breast cancer, the most common cancer among women, is the second leading cause of cancer deaths in women (after lung cancer)
- 192,200 new cases of breast cancer in women will be diagnosed in the United States
- 40,600 deaths will occur from breast cancer (40,200 among women, 400 among men) in the United States
- According to the World Health Organization, more than 1.2 million people will be diagnosed with breast cancer this year worldwide

Key Objective
- Identify breast cancer patients for whom adjuvant chemotherapy prolongs survival time
- Main Difficulty: Cannot carry out comparative tests on human subjects
  - Similar patients must be treated similarly
- Our Approach: Classify patients into Good, Intermediate & Poor groups
  - Classification based on: 5 cytological features plus tumor size
  - Classification criteria: Tumor size & lymph node status

Principal Results for 253 Breast Cancer Patients
- All 69 patients in the Good group:
  - Had the best survival rate
  - Had no chemotherapy
- All 73 patients in the Poor group:
  - Had the worst survival rate
  - Had chemotherapy
- For the 111 patients in the Intermediate group:
  - The 67 patients who had chemotherapy had a better survival rate than the 44 patients who did not have chemotherapy
- This last result reverses the role of chemotherapy observed in both the overall population and the Good & Poor groups

Outline
- Tools used
  - Support vector machines (SVMs)
    - Feature selection
    - Classification
  - Clustering
    - k-Median (k-Mean fails!)
- Cluster chemo patients into chemo-good & chemo-poor
- Cluster no-chemo patients into no-chemo-good & no-chemo-poor
- Three final classes
  - Good = No-chemo good
  - Poor = Chemo poor
  - Intermediate = Remaining patients
- Generate survival curves for three classes
- Use SVM to classify new patients into one of the above three classes

Support Vector Machines Used in this Work
- Feature selection: SVM with 1-norm approach
  $$\min_{w,\gamma,y} \; \nu e^{T} y + \|w\|_{1} \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \; y \ge 0,$$
  where $D_{ii} = \pm 1$ denotes Lymph node > 0 or Lymph node = 0
- 6 out of 31 features selected:
  - 5 out of 30 cytological features that describe nuclear size, shape and texture
  - Tumor size
- Classification: Use SSVMs with Gaussian kernel
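The slide names a Gaussian kernel for the classification step without giving its form. As a minimal sketch, assuming the common definition $K(x, y) = \exp(-\mu \|x - y\|_2^2)$, the kernel matrix over a handful of 6-feature patient vectors could be computed as below. The parameter name `mu`, its default value, and the toy feature values are illustrative assumptions, not taken from the slides.

```python
import math


def gaussian_kernel(x, y, mu=0.1):
    """Gaussian kernel value exp(-mu * ||x - y||_2^2); mu is a hypothetical width parameter."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-mu * sq_dist)


def kernel_matrix(rows, mu=0.1):
    """Symmetric matrix of pairwise kernel values between all rows."""
    return [[gaussian_kernel(a, b, mu) for b in rows] for a in rows]


# Toy 6-feature vectors (made-up numbers, for illustration only).
patients = [
    [0.2, 0.1, 0.4, 0.3, 0.5, 1.5],
    [0.2, 0.1, 0.4, 0.3, 0.5, 1.5],
    [0.9, 0.8, 0.7, 0.6, 0.9, 4.5],
]
K = kernel_matrix(patients)
```

Identical patients get a kernel value of 1, and the value decays toward 0 as the feature vectors move apart, which is what lets a kernel SVM draw a nonlinear separating surface.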

Clustering in Data Mining: General Objective
- Given: A dataset of m points in n-dimensional real space
- Problem: Extract hidden distinct properties by clustering the dataset

Concave Minimization Formulation of Clustering Problem
- Given: Set of m points in $R^n$ represented by the matrix $A \in R^{m \times n}$, and a number $k$ of desired clusters
- Problem: Determine centers $C_\ell$, $\ell = 1, \dots, k$, in $R^n$ such that the sum of the minima over $\ell \in \{1, \dots, k\}$ of the 1-norm distance between each point $A_i$, $i = 1, \dots, m$, and cluster centers $C_\ell$, $\ell = 1, \dots, k$, is minimized
- Objective: Sum of m minima of linear functions, hence it is piecewise-linear concave
- Difficulty: Minimizing a general piecewise-linear concave function over a polyhedral set is NP-hard
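To make the objective concrete, here is a small sketch that evaluates it for fixed centers: for each data point take the 1-norm distance to its closest center, then sum over all points. The function names and the toy points are assumptions chosen for illustration.

```python
def one_norm(p, q):
    """1-norm (sum of absolute coordinate differences) between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def clustering_objective(points, centers):
    """Sum over points of the 1-norm distance to the closest center."""
    return sum(min(one_norm(p, c) for c in centers) for p in points)


# Toy data: two well-separated pairs of points in the plane.
points = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]]
centers = [[0.0, 0.0], [10.0, 10.0]]
objective = clustering_objective(points, centers)
single_center_objective = clustering_objective(points, [[0.0, 0.0]])
```

Evaluating the objective is cheap; the hard part stated on the slide is choosing the centers that minimize it.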

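Clustering via Concave Minimization
- Reformulation:
  $$\min_{C_\ell,\, D_{i\ell}} \; \sum_{i=1}^{m} \min_{\ell = 1, \dots, k} \{ e^{T} D_{i\ell} \} \quad \text{s.t.} \quad -D_{i\ell} \le A_i^{T} - C_\ell \le D_{i\ell}, \quad i = 1, \dots, m, \; \ell = 1, \dots, k$$
- Minimize the sum of 1-norm distances between each data point $A_i$ and the closest cluster center $C_\ell$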

Finite K-Median Clustering Algorithm (Minimizing Piecewise-linear Concave Function)
- Step 0 (Initialization): Given k initial cluster centers
  - Different initial centers will lead to different clusters
- Step 1 (Cluster Assignment): Assign points to the cluster with the nearest cluster center in 1-norm
- Step 2 (Center Update): Recompute location of center for each cluster as the cluster median (closest point to all cluster points in 1-norm)
- Step 3 (Stopping Criterion): Stop if the cluster centers are unchanged, else go to Step 1
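The four steps above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' code: the coordinate-wise median is used as the center update because it minimizes the sum of 1-norm distances to the cluster's points, and two details the slide does not specify are assumptions here, namely the `max_iter` safeguard and keeping the old center when a cluster becomes empty.

```python
from statistics import median


def one_norm(p, q):
    """1-norm distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def k_median(points, initial_centers, max_iter=100):
    """Alternate nearest-center assignment and median update until centers stop changing."""
    centers = [list(c) for c in initial_centers]  # Step 0: given initial centers
    labels = []
    for _ in range(max_iter):  # max_iter is a safeguard (assumption)
        # Step 1: assign each point to the nearest center in 1-norm.
        labels = [
            min(range(len(centers)), key=lambda l: one_norm(p, centers[l]))
            for p in points
        ]
        # Step 2: move each center to the coordinate-wise median of its cluster.
        new_centers = []
        for l, old_center in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == l]
            if members:
                new_centers.append([median(col) for col in zip(*members)])
            else:
                new_centers.append(old_center)  # empty cluster: keep center (assumption)
        # Step 3: stop when the centers are unchanged.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Toy data: two groups of three points in the plane.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
          [10.0, 10.0], [11.0, 10.0], [10.0, 11.0]]
centers, labels = k_median(points, [[1.0, 1.0], [9.0, 9.0]])
```

Because the result depends on the starting centers, the choice of initial centers matters as much as the iteration itself.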

Clustering Process: Feature Selection & Initial Cluster Centers
- 6 out of 31 features selected by a linear SVM (1-norm approach)
  - SVM separating lymph node positive (Lymph > 0) from lymph node negative (Lymph = 0)
- Perform k-Median algorithm in 6-dimensional feature space
- Initial cluster centers used: Medians of Good1 & Poor1
  - Good1: Patients with Lymph = 0 AND Tumor < 2
  - Poor1: Patients with Lymph > 4 OR Tumor >= 4
    - Typical indicator for chemotherapy

Clustering Process
- 253 Patients (113 NoChemo, 140 Chemo)
- Compute initial cluster centers
  - Good1: Lymph=0 AND Tumor<2; compute median using 6 features
  - Poor1: Lymph>=5 OR Tumor>=4; compute median using 6 features
- Cluster 113 NoChemo patients: use k-Median algorithm with initial centers: medians of Good1 & Poor1
  - 69 NoChemo Good
  - 44 NoChemo Poor
- Cluster 140 Chemo patients: use k-Median algorithm with initial centers: medians of Good1 & Poor1
  - 67 Chemo Good
  - 73 Chemo Poor
- Final groups
  - Good: 69 NoChemo Good
  - Poor: 73 Chemo Poor
  - Intermediate: 44 NoChemo Poor and 67 Chemo Good
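A short sketch of the bookkeeping in this process: the Good1/Poor1 rules that pick the patients whose medians seed the clustering, and the rule that merges the four clusters into three final groups. The function names and the string labels are hypothetical; the thresholds and the cluster sizes are the ones stated on the slide.

```python
from collections import Counter


def initial_group(lymph, tumor):
    """Seed group used for the initial centers, or None if the patient is in neither."""
    if lymph == 0 and tumor < 2:
        return "Good1"
    if lymph >= 5 or tumor >= 4:
        return "Poor1"
    return None


def final_class(had_chemo, cluster):
    """Merge the four k-Median clusters into the three final groups."""
    if not had_chemo and cluster == "good":
        return "Good"
    if had_chemo and cluster == "poor":
        return "Poor"
    return "Intermediate"  # NoChemo Poor and Chemo Good


# Cluster sizes as reported on the slide: (had_chemo, cluster label).
cohort = (
    [(False, "good")] * 69
    + [(False, "poor")] * 44
    + [(True, "good")] * 67
    + [(True, "poor")] * 73
)
counts = Counter(final_class(chemo, cluster) for chemo, cluster in cohort)
```

With these sizes the Intermediate group holds the 44 NoChemo Poor and 67 Chemo Good patients, and the three groups together account for all 253 patients.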

Survival Curves for Good, Intermediate & Poor Groups
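The slides do not say how the survival curves were estimated. One common choice for this kind of plot is the Kaplan-Meier estimator, sketched below purely as an assumption; the function name, the input layout (follow-up time plus a flag for whether death was observed), and the toy data are all illustrative.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time where a death was observed.

    times:  follow-up time for each patient
    events: True if death was observed at that time, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve


# Toy follow-up data (made-up numbers, for illustration only).
curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
```

Running the same function separately on each group gives one step curve per group, which is the form of comparison the survival-curve slides present.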

Survival Curves for Intermediate Group: Split by Chemo & NoChemo

Survival Curves for All Patients Split by Chemo & NoChemo

Survival Curves for Intermediate Group Split by Lymph Node & Chemotherapy

Survival Curves for All Patients Split by Lymph Node Positive & Negative

Nonlinear SVM Classifier: 82.7% Tenfold Test Correctness
- Four groups from the clustering result: Good, Intermediate (ChemoGood), Intermediate (NoChemoPoor), Poor
- Good2: Good & ChemoGood
- Poor2: NoChemoPoor & Poor
- [Flowchart: SVM stages, with a "Compute LI(x) & CI(x)" step on each branch; one branch ends in Good / Intermediate, the other in Poor / Intermediate]
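For readers unfamiliar with the term, tenfold test correctness is the fraction of held-out patients classified correctly when the data is split into ten folds and each fold is tested on a model trained on the other nine. The sketch below shows that procedure only. The nearest-class-mean classifier is a hypothetical stand-in for the nonlinear SVM, the round-robin fold assignment is an assumption, and the data is synthetic.

```python
def tenfold_correctness(samples, labels, train, predict, folds=10):
    """Fraction of samples predicted correctly when each fold is held out in turn."""
    n = len(samples)
    correct = 0
    for f in range(folds):
        test_idx = [i for i in range(n) if i % folds == f]   # round-robin folds (assumption)
        train_idx = [i for i in range(n) if i % folds != f]
        model = train([samples[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct += sum(predict(model, samples[i]) == labels[i] for i in test_idx)
    return correct / n


def train_centroids(xs, ys):
    """Placeholder classifier: store the mean feature value of each class."""
    sums = {}
    for x, y in zip(xs, ys):
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def predict_centroid(model, x):
    """Predict the class whose stored mean is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))


# Synthetic one-feature data with two well-separated classes.
samples = [float(i) for i in range(20)] + [float(i) for i in range(100, 120)]
labels = ["Good"] * 20 + ["Poor"] * 20
score = tenfold_correctness(samples, labels, train_centroids, predict_centroid)
```

The function pools correct predictions over all folds; with equal fold sizes this matches averaging the per-fold correctness.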

Conclusion
- Used five features from a fine needle aspirate & tumor size to cluster breast cancer patients into 3 groups:
  - Good – No chemotherapy recommended
  - Intermediate – Chemotherapy likely to prolong survival
  - Poor – Chemotherapy may or may not enhance survival
- The 3 groups have very distinct survival curves
- First categorization of a breast cancer group for which chemotherapy enhances longevity
- Prescribed an SVM classification procedure to classify new patients into one of the above three groups

Simplest Support Vector Machine: Linear Surface Maximizing the Margin
[Figure: point sets A+ and A- separated by a linear surface]

Key Objective
- Identify breast cancer patients for whom adjuvant chemotherapy prolongs survival time
- Main Difficulty: Cannot carry out comparative tests on human subjects
  - Similar patients must be treated similarly
- Our Approach: Classify patients into Good, Intermediate & Poor groups
  - Characterize classes by: Tumor size & lymph node status
  - Classification based on: 5 cytological features plus tumor size

Clustering Process: Feature Selection & Initial Cluster Centers
- 6 out of 31 features selected by a linear SVM
  - SVM separating lymph node positive (Lymph>0) from lymph node negative (Lymph=0)
- Clustering performed in 6-dimensional feature space
- Initial cluster centers used:
  - Good: Median in 6-dimensional space of patients with Lymph=0 AND Tumor<2
  - Poor: Median in 6-dimensional space of patients with Lymph>4 OR Tumor>=4
    - Typical indicator for chemotherapy

Conclusion
- By using five features from a fine needle aspirate & tumor size, breast cancer patients can be classified into 3 classes:
  - Good – Requiring no chemotherapy
  - Intermediate – Chemotherapy recommended for longer survival
  - Poor – Chemotherapy may or may not enhance survival
- The 3 classes have very distinct survival curves
- First categorization of a breast cancer group for which chemotherapy enhances longevity
