Y.-J. Lee, O. L. Mangasarian & W.H. Wolberg


1 Survival-Time Classification of Breast Cancer Patients and Chemotherapy
ISMP-2003, Copenhagen, August 18-22, 2003
Y.-J. Lee, O. L. Mangasarian & W. H. Wolberg
Data Mining Institute, University of Wisconsin - Madison

2 Breast Cancer Estimates: American Cancer Society & World Health Organization
Breast cancer is the most common cancer among women in the United States.
212,600 new cases of breast cancer will be diagnosed in the United States in 2003: 211,300 in women, 1,300 in men.
40,200 deaths will occur from breast cancer in the United States in 2003: 39,800 in women, 400 in men.
WHO estimates: More than 1.2 million people worldwide were diagnosed with breast cancer in 2001, and 0.5 million died from breast cancer in 2000.

3 Key Objective
Identify breast cancer patients for whom chemotherapy prolongs survival time
Main Difficulty: Cannot carry out comparative tests on human subjects
  Similar patients must be treated similarly
Our Approach: Classify patients into Good, Intermediate & Poor groups such that:
  Good group does not need chemotherapy
  Intermediate group benefits from chemotherapy
  Poor group is not likely to benefit from chemotherapy

4 Outline
Data description
Tools used
  Support vector machines (Linear & Nonlinear SVMs): Feature selection & classification
  Clustering (k-Median algorithm, not k-Means)
Cluster into chemo & no-chemo groups
  Cluster chemo patients into 2 groups: good & poor
  Cluster no-chemo patients into 2 groups: good & poor
Merge into three final classes
  Good (No-chemo)
  Poor (Chemo)
  Intermediate: Remaining patients (chemo & no-chemo)
Generate survival curves for three classes
Use SSVM to classify new patients into one of above three classes

5 Cell Nuclei of a Fine Needle Aspirate

6 Thirty Cytological Features Collected at Diagnosis Time

7 Two Histological Features Collected at Surgery Time

8 Features Selected by Support Vector Machine

9 1-Norm Support Vector Machines: Maximize the Margin between Bounding Planes

10 Support Vector Machine Algebra of 2-Category Linearly Separable Case
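Given m points in n-dimensional space, represented by an m-by-n matrix $A$
Membership of each point $A_i$ in class +1 or -1 is specified by an m-by-m diagonal matrix $D$ with +1 & -1 entries
Separate by two bounding planes, $x'w = \gamma + 1$ and $x'w = \gamma - 1$:
  $A_i w \ge \gamma + 1$ for $D_{ii} = +1$
  $A_i w \le \gamma - 1$ for $D_{ii} = -1$
More succinctly: $D(Aw - e\gamma) \ge e$, where $e$ is a vector of ones.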
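As a minimal sketch of the algebra on this slide, the check below assumes the two bounding planes are x'w = γ + 1 and x'w = γ - 1, so that the succinct separability condition reads D(Aw - eγ) ≥ e. The toy points, `w` and `gamma` are made up for illustration and are not data from the talk.

```python
# Toy check of the separability condition D(Aw - e*gamma) >= e.
# A, d, w and gamma are illustrative values, not data from the talk.

def satisfies_bounding_planes(A, d, w, gamma):
    """Return True if every row A_i obeys d_i * (A_i . w - gamma) >= 1."""
    for row, label in zip(A, d):
        score = sum(a * wj for a, wj in zip(row, w)) - gamma
        if label * score < 1:
            return False
    return True

A = [[3.0, 3.0], [4.0, 2.0], [0.0, 0.0], [1.0, 0.0]]  # m = 4 points, n = 2
d = [+1, +1, -1, -1]                                   # diagonal entries of D
w = [1.0, 1.0]
gamma = 3.0

separable = satisfies_bounding_planes(A, d, w, gamma)
```

Scaling `w` and `gamma` down (for example to `[0.1, 0.1]` and `0.3`) keeps the same separating plane but makes the bounding planes too far apart for these points, so the check fails.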

11 Feature Selection Using 1-Norm Linear SVM: Classification Based on Lymph Node Status
Feature selection: 1-norm SVM:
  $\min_{w, \gamma, y} \; \nu e'y + \|w\|_1$  s.t.  $D(Aw - e\gamma) + y \ge e$, $y \ge 0$,
  where $D_{ii} = \pm 1$ denotes Lymph node > 0 or Lymph node = 0
Features selected: 6 out of 31 by above SVM:
  5 out of 30 cytological features that describe nuclear size, shape and texture from fine needle aspirate
  Tumor size from surgery
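To see how a 1-norm objective ties into feature selection, here is a minimal sketch. It assumes the linear program has the usual 1-norm SVM form, min ν e'y + ||w||_1 subject to D(Aw - eγ) + y ≥ e, y ≥ 0. The helper names and toy numbers are hypothetical, and no LP solver is called: for a fixed (w, γ) the smallest feasible slack is y_i = max(0, 1 - d_i(A_i w - γ)), and the features with nonzero weight are the ones "selected".

```python
# Evaluate the assumed 1-norm SVM objective for a fixed (w, gamma).
# All names and numbers here are illustrative, not from the talk.

def one_norm_svm_objective(A, d, w, gamma, nu):
    """nu * sum(y) + ||w||_1, using the smallest feasible slacks y."""
    slacks = []
    for row, label in zip(A, d):
        margin = label * (sum(a * wj for a, wj in zip(row, w)) - gamma)
        slacks.append(max(0.0, 1.0 - margin))
    return nu * sum(slacks) + sum(abs(wj) for wj in w)

def selected_features(w, tol=1e-8):
    """Indices of features whose weight is not (numerically) zero."""
    return [j for j, wj in enumerate(w) if abs(wj) > tol]

A = [[2.0, 5.0, 1.0], [3.0, 1.0, 0.0], [0.0, 4.0, 1.0], [-1.0, 2.0, 0.0]]
d = [+1, +1, -1, -1]
w = [1.0, 0.0, 0.0]   # sparse weight vector: only feature 0 is used
gamma = 1.0

objective = one_norm_svm_objective(A, d, w, gamma, nu=1.0)
features = selected_features(w)
```

Because the 1-norm penalizes every nonzero weight linearly, minimizers tend to have many exactly-zero components, which is what makes this formulation usable as a feature selector.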

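12 Nonlinear SVM for Classifying New Patients
Linear SVM (linear separating surface: $x'w = \gamma$):
  (LP)  $\min_{w, \gamma, y} \; \nu e'y + \|w\|_1$  s.t.  $D(Aw - e\gamma) + y \ge e$, $y \ge 0$
By QP "duality": $w = A'Du$. Maximizing the margin in the "dual space" gives:
  $\min_{u, \gamma, y} \; \nu e'y + \|u\|_1$  s.t.  $D(AA'Du - e\gamma) + y \ge e$, $y \ge 0$
Replace $AA'$ by a nonlinear kernel $K(A, A')$:
  $\min_{u, \gamma, y} \; \nu e'y + \|u\|_1$  s.t.  $D(K(A, A')Du - e\gamma) + y \ge e$, $y \ge 0$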

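13 The Nonlinear Classifier
$K(x', A')Du = \gamma$
where $K$ is a nonlinear kernel, e.g. the Gaussian (Radial Basis) Kernel:
  $(K(A, A'))_{ij} = \exp(-\mu \|A_i - A_j\|_2^2)$, $i, j = 1, \dots, m$
The $ij$-entry of $K(A, A')$ represents "similarity" between the data points $A_i$ and $A_j$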
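A minimal sketch of a Gaussian kernel and the resulting classifier, assuming the kernel entry is exp(-μ ||A_i - A_j||²) and the separating surface is K(x', A')Du = γ. The values of `mu`, `u`, `gamma` and the points are placeholders; in the talk, u and γ would come from solving the SVM optimization problem.

```python
import math

def gaussian_kernel(x, z, mu):
    """exp(-mu * ||x - z||^2): close to 1 for nearby points, near 0 for distant ones."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-mu * sq_dist)

def classify(x, A, d, u, gamma, mu):
    """Sign of K(x', A') D u - gamma, returned as +1 or -1."""
    score = sum(gaussian_kernel(x, row, mu) * di * ui
                for row, di, ui in zip(A, d, u)) - gamma
    return 1 if score >= 0 else -1

# Illustrative training points, labels and (assumed) solution values.
A = [[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [4.0, 3.0]]
d = [+1, +1, -1, -1]
u = [1.0, 1.0, 1.0, 1.0]
gamma = 0.0
mu = 0.5

near_first_group = classify([0.0, 0.5], A, d, u, gamma, mu)
near_second_group = classify([3.5, 3.0], A, d, u, gamma, mu)
```

A new point is pulled toward the class of the training points it is most "similar" to, since distant points contribute almost nothing to the sum.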

14 Clustering in Data Mining
General Objective
  Given: A dataset of m points in n-dimensional real space
  Problem: Extract hidden distinct properties by clustering the dataset into k clusters

15 Concave Minimization Formulation of 1-Norm Clustering Problem (k-Median)
Given: Set of m points in $R^n$ represented by the matrix $A \in R^{m \times n}$, and a number $k$ of desired clusters
Find: Cluster centers $C_1, \dots, C_k \in R^n$ that minimize the sum of 1-norm distances of each point $A_1, \dots, A_m$ to its closest cluster center
Objective Function: Sum of m minima of linear functions, hence it is piecewise-linear concave
Difficulty: Minimizing a general piecewise-linear concave function over a polyhedral set is NP-hard

16 Clustering via Finite Concave Minimization
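Minimize the sum of 1-norm distances between each data point $A_i$ and the closest cluster center $C_\ell$:
  $\min_{C_\ell, D_{i\ell}} \; \sum_{i=1}^{m} \min_{\ell = 1, \dots, k} e'D_{i\ell}$  s.t.  $-D_{i\ell} \le A_i' - C_\ell \le D_{i\ell}$, $i = 1, \dots, m$, $\ell = 1, \dots, k$
Equivalent bilinear reformulation:
  $\min_{C_\ell, D_{i\ell}, T_{i\ell}} \; \sum_{i=1}^{m} \sum_{\ell=1}^{k} T_{i\ell}\, e'D_{i\ell}$  s.t.  $-D_{i\ell} \le A_i' - C_\ell \le D_{i\ell}$, $\sum_{\ell=1}^{k} T_{i\ell} = 1$, $T_{i\ell} \ge 0$, $i = 1, \dots, m$, $\ell = 1, \dots, k$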
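The idea behind a bilinear reformulation can be illustrated numerically. The sketch below assumes the reformulation weights each point-to-center 1-norm distance by a nonnegative coefficient T[i][l] whose row sums equal 1; for fixed centers, the best weights then put all mass on the nearest center, which recovers the sum of minima. The function names and toy data are hypothetical.

```python
def one_norm(p, c):
    """1-norm distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, c))

def sum_of_minima(points, centers):
    """Sum over points of the 1-norm distance to the closest center."""
    return sum(min(one_norm(p, c) for c in centers) for p in points)

def bilinear_value(points, centers, T):
    """sum_i sum_l T[i][l] * ||A_i - C_l||_1 for given weights T."""
    return sum(T[i][l] * one_norm(p, c)
               for i, p in enumerate(points)
               for l, c in enumerate(centers))

def nearest_center_weights(points, centers):
    """Weights that put all mass on each point's nearest center."""
    T = []
    for p in points:
        dists = [one_norm(p, c) for c in centers]
        best = dists.index(min(dists))
        T.append([1.0 if l == best else 0.0 for l in range(len(centers))])
    return T

points = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 4.0]]
centers = [[0.0, 0.0], [5.0, 5.0]]
T_vertex = nearest_center_weights(points, centers)
T_uniform = [[0.5, 0.5] for _ in points]
```

Any other feasible weighting, such as `T_uniform`, gives a value at least as large as the sum of minima, which is why the two problems share the same optimal value.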

17 K-Median Clustering Algorithm: Finite Termination at Local Solution
Step 0 (Initialization): Pick 2 initial cluster centers (L = 0 & T < 2) & (L ≥ 5 or T ≥ 4)
Step 1 (Cluster Assignment): Assign points to the cluster with the nearest cluster center in 1-norm
Step 2 (Center Update): Recompute location of center for each cluster as the cluster median (closest point to all cluster points in 1-norm)
Step 3 (Stopping Criterion): Stop if the cluster centers are unchanged, else go to Step 1
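The steps above can be sketched in a few lines of Python. This is an illustrative implementation, not the authors' code: it uses the coordinate-wise median as the center update (the point minimizing the sum of 1-norm distances to the cluster members), keeps the old center if a cluster becomes empty, and adds a `max_iter` safeguard that the slide does not mention. The toy points are made up.

```python
from statistics import median

def one_norm(p, c):
    return sum(abs(a - b) for a, b in zip(p, c))

def assign(points, centers):
    """Step 1: index of the nearest center (1-norm) for every point."""
    labels = []
    for p in points:
        dists = [one_norm(p, c) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels

def k_median(points, initial_centers, max_iter=100):
    centers = [list(c) for c in initial_centers]       # Step 0
    for _ in range(max_iter):
        labels = assign(points, centers)                # Step 1
        new_centers = []
        for l, old in enumerate(centers):               # Step 2
            members = [p for p, lab in zip(points, labels) if lab == l]
            if members:
                new_centers.append([median(col) for col in zip(*members)])
            else:
                new_centers.append(old)                 # empty cluster: keep center
        if new_centers == centers:                      # Step 3
            break
        centers = new_centers
    return centers, assign(points, centers)

points = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0],
          [9.0, 9.0], [10.0, 10.0], [9.0, 10.0]]
centers, labels = k_median(points, [[0.0, 0.0], [10.0, 10.0]])
```

As the slide title notes, the procedure stops at a local solution, so the result depends on the initial centers.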

18 Feature Selection & Initial Cluster Centers
6 out of 31 features selected by 1-norm SVM ( )
SVM separating lymph node positive (Lymph > 0) from lymph node negative (Lymph = 0)
Perform k-Median algorithm in 6-dimensional input space
Initial cluster centers used: Medians of Good1 & Poor1
  Good1: Patients with Lymph = 0 AND Tumor < 2
  Poor1: Patients with Lymph > 4 OR Tumor ≥ 4
  Typical indicator for chemotherapy
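A minimal sketch of how the two initial centers could be computed from the Good1 and Poor1 rules, using the thresholds stated on the overall clustering slide (Lymph = 0 AND Tumor < 2; Lymph >= 5 OR Tumor >= 4). The record layout, the function names and the two-feature toy patients are assumptions for illustration; the talk uses the 6 selected features.

```python
from statistics import median

def seed_group(lymph, tumor):
    """Return 'Good1', 'Poor1', or None for patients in neither seed group."""
    if lymph == 0 and tumor < 2:
        return "Good1"
    if lymph >= 5 or tumor >= 4:
        return "Poor1"
    return None

def seed_center(patients, group):
    """Coordinate-wise median of the feature vectors in one seed group."""
    rows = [p["features"] for p in patients
            if seed_group(p["lymph"], p["tumor"]) == group]
    return [median(col) for col in zip(*rows)]

# Hypothetical patients with 2 features each (not data from the talk).
patients = [
    {"lymph": 0, "tumor": 1.5, "features": [1.0, 2.0]},
    {"lymph": 0, "tumor": 1.0, "features": [3.0, 4.0]},
    {"lymph": 0, "tumor": 0.5, "features": [2.0, 9.0]},
    {"lymph": 7, "tumor": 2.0, "features": [8.0, 8.0]},
    {"lymph": 1, "tumor": 4.5, "features": [6.0, 7.0]},
    {"lymph": 2, "tumor": 3.0, "features": [5.0, 5.0]},
]

good1_center = seed_center(patients, "Good1")
poor1_center = seed_center(patients, "Poor1")
```

Patients who satisfy neither rule, like the last record here, do not influence the initial centers but are still clustered afterwards.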

19 Overall Clustering Process
253 Patients (113 NoChemo, 140 Chemo)
Compute Initial Cluster Centers (Compute Median Using 6 Features):
  Good1: Lymph=0 AND Tumor<2
  Poor1: Lymph>=5 OR Tumor>=4
Cluster 113 NoChemo Patients: Use k-Median Algorithm with Initial Centers: Medians of Good1 & Poor1
  69 NoChemo Good
  44 NoChemo Poor
Cluster 140 Chemo Patients: Use k-Median Algorithm with Initial Centers: Medians of Good1 & Poor1
  67 Chemo Good
  73 Chemo Poor
Merge into final classes:
  Good: 69 NoChemo Good
  Intermediate: 44 NoChemo Poor + 67 Chemo Good
  Poor: 73 Chemo Poor
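The merge step can be checked with a short sketch. It applies the rule from the outline (Good = no-chemo good cluster, Poor = chemo poor cluster, Intermediate = all remaining patients) to the four cluster sizes reported on this slide; the function and variable names are illustrative.

```python
def merge_groups(chemo, cluster):
    """Map (chemo flag, 'good'/'poor' cluster) to one of the three final classes."""
    if not chemo and cluster == "good":
        return "Good"
    if chemo and cluster == "poor":
        return "Poor"
    return "Intermediate"

# Cluster sizes from the slide: (received chemo?, cluster) -> patient count.
cluster_sizes = {
    (False, "good"): 69,
    (False, "poor"): 44,
    (True, "good"): 67,
    (True, "poor"): 73,
}

final_sizes = {"Good": 0, "Intermediate": 0, "Poor": 0}
for (chemo, cluster), count in cluster_sizes.items():
    final_sizes[merge_groups(chemo, cluster)] += count
```

The Intermediate class therefore mixes chemo and no-chemo patients, which is what allows survival within that class to be compared with and without chemotherapy.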

20 Survival Curves for Good, Intermediate & Poor Groups (Classified by Nonlinear SSVM)

21 Survival Curves for Intermediate Group: Split by Chemo & NoChemo

22 Survival Curves for Overall Patients: With & Without Chemotherapy

23 Survival Curves for Intermediate Group Split by Lymph Node & Chemotherapy

24 Survival Curves for Overall Patients Split by Lymph Node Positive & Negative

25 Conclusion
Used five cytological features & tumor size to cluster breast cancer patients into 3 groups:
  Good: No chemotherapy recommended
  Intermediate: Chemotherapy likely to prolong survival
  Poor: Chemotherapy may or may not enhance survival
3 groups have very distinct survival curves
First categorization of a breast cancer group for which chemotherapy enhances longevity
SVM-based procedure assigns new patients into one of above three survival groups

26 Talk & Paper Available on Web
Y.-J. Lee, O. L. Mangasarian & W. H. Wolberg: "Computational Optimization and Applications", Volume 25, 2003, pages

