Survival-Time Classification of Breast Cancer Patients DIMACS Workshop on Data Mining and Scalable Algorithms August 22-24, 2001- Rutgers University Y.-J.

Survival-Time Classification of Breast Cancer Patients
DIMACS Workshop on Data Mining and Scalable Algorithms, August 22-24, 2001, Rutgers University
Y.-J. Lee & O. L. Mangasarian
Second Annual Review, June 1, 2001
Data Mining Institute, University of Wisconsin - Madison

American Cancer Society 2001 Breast Cancer Estimates
- Breast cancer, the most common cancer among women, is the second leading cause of cancer deaths in women (after lung cancer)
- 192,200 new cases of breast cancer in women will be diagnosed in the United States
- 40,600 deaths will occur from breast cancer in the United States (40,200 among women, 400 among men)
- According to the World Health Organization, more than 1.2 million people will be diagnosed with breast cancer this year worldwide

Key Objective
- Identify breast cancer patients for whom adjuvant chemotherapy prolongs survival time
- Main Difficulty: Cannot carry out comparative tests on human subjects
  - Similar patients must be treated similarly
- Our Approach: Classify patients into good, intermediate & poor groups
  - Characterize classes by: Tumor size & lymph node status
  - Classification based on: 5 cytological features plus tumor size

Principal Results for 253 Breast Cancer Patients
- All 69 patients in the good group:
  - Had no chemotherapy
  - Had the best survival rate
- All 73 patients in the poor group:
  - Had chemotherapy
  - Had the worst survival rate
- For the 111 patients in the intermediate group:
  - The 67 patients who had chemotherapy had a better survival rate than the 44 patients who did not have chemotherapy
- This last result reverses the apparent role of chemotherapy seen in the overall population and in the good & poor groups

Outline
- Tools used
  - Support vector machines (SVMs)
    - Feature selection
    - Classification
  - Clustering
    - k-Median (k-Mean fails!)
- Cluster chemo patients into chemo-good & chemo-poor
- Cluster no-chemo patients into no-chemo-good & no-chemo-poor
- Three final classes
  - Good = No-chemo good
  - Poor = Chemo poor
  - Intermediate = Remaining patients
- Generate survival curves for three classes
- Use SVM to classify new patients into one of above three classes

Simplest Support Vector Machine: Linear Surface Maximizing the Margin
[Figure: linear separating surface with maximum margin between the point sets A+ and A-]
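The margin idea on this slide can be illustrated with a few lines of Python. This is only a sketch under assumed notation that does not appear in the transcript: the separating plane is written as x'w = gamma, with bounding planes x'w = gamma + 1 and x'w = gamma - 1, so the margin between them is 2/||w||. The function names are hypothetical.

```python
import math

def margin_width(w):
    """Distance between the assumed bounding planes x'w = gamma + 1 and x'w = gamma - 1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def classify(x, w, gamma):
    """Return +1 for the A+ side and -1 for the A- side of the plane x'w = gamma."""
    score = sum(xi * wi for xi, wi in zip(x, w)) - gamma
    return 1 if score >= 0 else -1

# Toy weights, not taken from the talk.
w, gamma = (3.0, 4.0), 1.0
width = margin_width(w)
side = classify((1.0, 1.0), w, gamma)
```

A linear SVM chooses w and gamma so that this width is as large as possible while keeping the two point sets on their own sides, up to a penalized error.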

Clustering in Data Mining: General Objective
- Given: A dataset of m points in n-dimensional real space
- Problem: Extract hidden distinct properties by clustering the dataset

Concave Minimization Formulation of Clustering Problem
- Given: Set of m points in $R^n$ represented by the matrix $A \in R^{m \times n}$, and a number $k$ of desired clusters
- Problem: Determine centers $C_\ell$, $\ell = 1, \ldots, k$, in $R^n$ such that the sum of the minima over $\ell \in \{1, \ldots, k\}$ of the 1-norm distance between each point $A_i$, $i = 1, \ldots, m$, and cluster centers $C_\ell$, $\ell = 1, \ldots, k$, is minimized
- Objective: Sum of m minima of $k$ linear functions, hence it is piecewise-linear concave
- Difficulty: Minimizing a general piecewise-linear concave function over a polyhedral set is NP-hard
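To make the objective concrete, the quantity being minimized can be evaluated directly for a given set of centers. The following is a minimal sketch in plain Python; the function names and the toy points are illustrative assumptions, not part of the talk.

```python
def one_norm(p, q):
    """1-norm (sum of absolute coordinate differences) between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))

def clustering_objective(points, centers):
    """Sum over all points of the 1-norm distance to the closest center."""
    return sum(min(one_norm(p, c) for c in centers) for p in points)

# Toy data: three points, two candidate centers.
points = [(0, 0), (1, 0), (10, 10)]
centers = [(0, 0), (10, 10)]
value = clustering_objective(points, centers)
```

Each point contributes the minimum of several distance terms, one per center, which is what makes the overall objective a sum of minima.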

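Clustering via Concave Minimization
- Reformulation:
  $$\min_{C_\ell,\, D_{i\ell}} \ \sum_{i=1}^{m} \min_{\ell = 1, \ldots, k} \{ e' D_{i\ell} \} \quad \text{s.t.} \quad -D_{i\ell} \le A_i' - C_\ell \le D_{i\ell}, \quad i = 1, \ldots, m, \ \ell = 1, \ldots, k$$
  where $D_{i\ell} \in R^n$ bounds the componentwise absolute difference between point $A_i$ and center $C_\ell$, and $e$ is a vector of ones
- Minimize the sum of 1-norm distances between each data point $A_i$ and the closest cluster center $C_\ell$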

Finite K-Median Clustering Algorithm (Minimizing Piecewise-linear Concave Function)
- Step 0 (Initialization): Given k initial cluster centers
  - Different initial centers will lead to different clusters
- Step 1 (Cluster Assignment): Assign points to the cluster with the nearest cluster center in 1-norm
- Step 2 (Center Update): Recompute location of center for each cluster as the cluster median (closest point to all cluster points in 1-norm)
- Step 3 (Stopping Criterion): Stop if the cluster centers are unchanged, else go to Step 1
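The four steps above can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code: the function names, the `max_iter` safeguard, and the choice to keep the old center when a cluster becomes empty are all assumptions. The center update uses the coordinate-wise median, which minimizes the summed 1-norm distance to the cluster's points.

```python
from statistics import median

def one_norm(p, q):
    """1-norm distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))

def k_median(points, initial_centers, max_iter=100):
    """Return (centers, labels) following Steps 0-3 of the k-median algorithm."""
    centers = [tuple(c) for c in initial_centers]            # Step 0
    labels = []
    for _ in range(max_iter):                                # max_iter is an assumed safeguard
        # Step 1: assign each point to the nearest center in the 1-norm.
        labels = [min(range(len(centers)), key=lambda l: one_norm(p, centers[l]))
                  for p in points]
        # Step 2: move each center to the coordinate-wise median of its cluster.
        new_centers = []
        for l, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == l]
            if members:
                new_centers.append(tuple(median(col) for col in zip(*members)))
            else:
                new_centers.append(old)                      # assumption: keep an empty cluster's center
        # Step 3: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Toy data with two well-separated groups.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centers, labels = k_median(points, [(1, 0), (11, 10)])
```

As Step 0 notes, the result depends on the initial centers, so different starting points may produce different clusters on less clearly separated data.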

Clustering Process: Feature Selection & Initial Cluster Centers
- 6 out of 31 features selected by a linear SVM
  - SVM separating lymph node positive (Lymph>0) from lymph node negative (Lymph=0)
- Clustering performed in 6-dimensional feature space
- Initial cluster centers used:
  - Good: Median in 6-dimensional space of patients with Lymph=0 AND Tumor<2
  - Poor: Median in 6-dimensional space of patients with Lymph>4 OR Tumor>4
    - Typical indicator for chemotherapy
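A minimal sketch of how such rule-based initial centers might be computed is given below. The record layout (`lymph`, `tumor`, `features`) and the toy patients are hypothetical, and the toy feature vectors have 2 coordinates instead of the 6 selected features; only the two selection rules come from the slide.

```python
from statistics import median

def coordinate_median(rows):
    """Coordinate-wise median of a list of equal-length feature vectors."""
    return tuple(median(col) for col in zip(*rows))

def initial_centers(patients):
    """Return (good_center, poor_center) using the slide's selection rules."""
    good1 = [p["features"] for p in patients if p["lymph"] == 0 and p["tumor"] < 2]
    poor1 = [p["features"] for p in patients if p["lymph"] > 4 or p["tumor"] > 4]
    return coordinate_median(good1), coordinate_median(poor1)

# Hypothetical toy records, not data from the study.
toy_patients = [
    {"lymph": 0, "tumor": 1.0, "features": (1.0, 2.0)},
    {"lymph": 0, "tumor": 1.5, "features": (3.0, 2.0)},
    {"lymph": 0, "tumor": 1.2, "features": (2.0, 5.0)},
    {"lymph": 6, "tumor": 3.0, "features": (8.0, 9.0)},
    {"lymph": 1, "tumor": 5.0, "features": (6.0, 7.0)},
    {"lymph": 2, "tumor": 3.0, "features": (4.0, 4.0)},
]
good_center, poor_center = initial_centers(toy_patients)
```

The last toy patient satisfies neither rule and therefore does not influence either starting center.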

Clustering Process
- 253 Patients, split by rule into:
  - Good1: Lymph=0 AND Tumor<2 (compute median using 6 features)
  - Poor1: Lymph>=5 OR Tumor>=4 (compute median using 6 features)
  - Intermediate1: (0<Lymph<5 & Tumor<4) OR (Lymph<5 & 2<=Tumor<4)
- Cluster 113 NoChemo patients: use k-Median algorithm with initial centers: medians of Good1 & Poor1
  - 69 NoChemo Good
  - 44 NoChemo Poor
- Cluster 140 Chemo patients: use k-Median algorithm with initial centers: medians of Good1 & Poor1
  - 67 Chemo Good
  - 73 Chemo Poor
- Final classes:
  - Good: 69 NoChemo Good
  - Intermediate: 44 NoChemo Poor & 67 Chemo Good
  - Poor: 73 Chemo Poor
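The way the four clusters combine into the three final classes can be checked with a short sketch. The function name and the dictionary layout are illustrative assumptions; the cluster sizes are the ones reported on the slide.

```python
from collections import Counter

def final_class(had_chemo, cluster):
    """Map a (chemo status, k-median cluster) pair to one of the three final classes."""
    if not had_chemo and cluster == "good":
        return "good"            # Good = no-chemo good
    if had_chemo and cluster == "poor":
        return "poor"            # Poor = chemo poor
    return "intermediate"        # Intermediate = remaining patients

# Cluster sizes as reported on the slide: (had_chemo, cluster) -> patient count.
cluster_sizes = {
    (False, "good"): 69,
    (False, "poor"): 44,
    (True, "good"): 67,
    (True, "poor"): 73,
}

class_sizes = Counter()
for (had_chemo, cluster), count in cluster_sizes.items():
    class_sizes[final_class(had_chemo, cluster)] += count
total = sum(class_sizes.values())
```

With these counts the intermediate class contains 44 + 67 = 111 patients, and the three classes together account for all 253.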

Survival Curves for Good, Intermediate & Poor Groups

Survival Curves for Intermediate Group: Split by Chemo & NoChemo

Survival Curves for All Patients Split by Chemo & NoChemo

Survival Curves for Intermediate Group Split by Lymph Node & Chemotherapy

Survival Curves for All Patients Split by Lymph Node Positive & Negative

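Nonlinear SVM Classifier: 82.7% Tenfold Test Correctness
- Training groups: Good, Poor, ChemoGood, NoChemoPoor
  - Good2: Good & ChemoGood
  - Poor2: NoChemoPoor & Poor
- First SVM separates Good2 (Not Poor) from Poor2 (Not Good)
- Not Poor branch: Compute LI(x) & CI(x), then SVM separates Good from Intermediate
- Not Good branch: Compute LI(x) & CI(x), then SVM separates Poor from Intermediate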
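One way to read this flowchart is as a two-stage cascade of binary classifiers, sketched below. This is an interpretation rather than the authors' implementation: the three classifiers are passed in as hypothetical callables, the toy threshold rules merely stand in for trained nonlinear SVMs, and the LI(x) and CI(x) step is omitted because the transcript does not define those quantities.

```python
def classify_patient(x, top_svm, good_vs_intermediate, poor_vs_intermediate):
    """Route x through a first-stage classifier, then one of two second-stage classifiers."""
    if top_svm(x) == "not_poor":
        # Not Poor branch: decide between Good and Intermediate.
        return "good" if good_vs_intermediate(x) == "good" else "intermediate"
    # Not Good branch: decide between Poor and Intermediate.
    return "poor" if poor_vs_intermediate(x) == "poor" else "intermediate"

# Hypothetical stand-ins for trained classifiers, acting on a single toy score.
toy_top = lambda x: "not_poor" if x < 5 else "not_good"
toy_good_vs_int = lambda x: "good" if x < 2 else "intermediate"
toy_poor_vs_int = lambda x: "poor" if x > 8 else "intermediate"

results = [classify_patient(x, toy_top, toy_good_vs_int, toy_poor_vs_int)
           for x in (1, 3, 6, 9)]
```

Under this reading a patient can reach the intermediate class from either branch, which matches the intermediate group being built from both ChemoGood and NoChemoPoor patients.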

Conclusion
- By using five features from a fine needle aspirate & tumor size, breast cancer patients can be classified into 3 classes
  - Good – Requiring no chemotherapy
  - Intermediate – Chemotherapy recommended for longer survival
  - Poor – Chemotherapy may or may not enhance survival
- 3 classes have very distinct survival curves
- First categorization of a breast cancer group for which chemotherapy enhances longevity
