
1 Classification and Feature Selection Algorithms for Multi-class CGH data. Jun Liu, Sanjay Ranka, Tamer Kahveci. http://www.cise.ufl.edu

2 Gene copy number
The number of copies of genes can vary from person to person.
– Roughly 0.4% of gene copy numbers differ between pairs of people.
Variations in copy numbers can alter resistance to disease.
– EGFR copy number can be higher than normal in non-small cell lung cancer.
[Figure: lung images, healthy vs. cancer (ALA)]

3 Comparative Genomic Hybridization (CGH)

4 Raw and smoothed CGH data

5 Example CGH dataset: 862 genomic intervals in the Progenetix database

6 Problem description
Given a new sample, which class does this sample belong to? Which features should we use to make this decision?

7 Outline
– Support Vector Machine (SVM)
– SVM for CGH data
– Maximum Influence Feature Selection algorithm
– Results

8 SVM in a nutshell

9 Classification with SVM
Consider a two-class, linearly separable classification problem. Many decision boundaries are possible!
The decision boundary should be as far away from the data of both classes as possible.
– We should maximize the margin, m.
[Figure: Class 1 and Class 2 separated by a decision boundary with margin m]

10 SVM formulation
Let {x_1, ..., x_n} be our data set and let y_i ∈ {1, -1} be the class label of x_i.
Maximize J over the α_i:
J(α) = Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i · x_j), subject to α_i ≥ 0 and Σ_i α_i y_i = 0.
Here x_i · x_j measures the similarity between x_i and x_j.
The decision boundary can be constructed as f(x) = Σ_i α_i y_i (x_i · x) + b.

11 SVM for CGH data

12 Pairwise similarity measures
Raw measure:
– Count the number of genomic intervals at which both samples have a gain (or both have a loss).
[Figure: example pair of samples with Raw = 3]
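
A minimal sketch of this count in Python (the function name and the 1 / 0 / -1 interval coding are assumptions for illustration, not code from the paper):

```python
def raw_similarity(x, y):
    """Raw measure: count intervals where both samples show the same aberration
    (both gain, coded 1, or both loss, coded -1); 0 marks no change."""
    return sum(1 for a, b in zip(x, y) if a == b and a != 0)

# The two samples agree on one gained interval and one lost interval, so Raw = 2.
x = [0, 1, 1, 0, 1, -1]
y = [0, 1, 0, -1, -1, -1]
print(raw_similarity(x, y))  # 2
```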

13 SVM based on the Raw kernel
Using SVM with the Raw kernel amounts to solving the same quadratic program, with the Raw kernel replacing the dot product x_i · x_j:
Maximize J over the α_i: J(α) = Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j Raw(x_i, x_j).
The resulting decision function is f(x) = Σ_i α_i y_i Raw(x_i, x) + b.
But is the Raw kernel a valid kernel?
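
For illustration, and assuming scikit-learn (which the slides do not mention), the Raw kernel can be supplied to a standard SVM as a precomputed kernel matrix; the sample data and names below are made up:

```python
import numpy as np
from sklearn.svm import SVC

def raw_kernel_matrix(A, B):
    """Kernel matrix whose (i, j) entry is Raw(A[i], B[j])."""
    K = np.zeros((len(A), len(B)))
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            K[i, j] = np.sum((a == b) & (a != 0))   # matching gains/losses only
    return K

# Toy samples over six genomic intervals, coded 1 (gain) / 0 (no change) / -1 (loss).
X_train = np.array([[0, 1, 1, 0, 1, -1],
                    [0, 1, 0, -1, -1, -1],
                    [1, 1, 0, 0, 0, 1]])
y_train = np.array([1, -1, 1])
X_test = np.array([[0, 1, 1, 0, 0, -1]])

clf = SVC(kernel="precomputed")
clf.fit(raw_kernel_matrix(X_train, X_train), y_train)
print(clf.predict(raw_kernel_matrix(X_test, X_train)))
```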

14 Is the Raw kernel valid?
Not every similarity function can serve as a kernel. A valid kernel requires the underlying kernel matrix M to be positive semi-definite.
M is positive semi-definite if v^T M v ≥ 0 for all vectors v.

15 Is the Raw kernel valid?
Proof: define a function Φ that maps a ∈ {1, 0, -1}^m to b ∈ {0, 1}^(2m), where
Φ(gain) = Φ(1) = 01, Φ(no-change) = Φ(0) = 00, Φ(loss) = Φ(-1) = 10.
Then Raw(X, Y) = Φ(X)^T Φ(Y).
Example: X = (0, 1, 1, 0, 1, -1) and Y = (0, 1, 0, -1, -1, -1), so Raw(X, Y) = 2.
Φ(X) = 00 01 01 00 01 10 and Φ(Y) = 00 01 00 10 10 10, so Φ(X)^T Φ(Y) = 2.
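
A small sketch (variable names are mine) that encodes this Φ mapping and checks that Φ(X)^T Φ(Y) reproduces Raw(X, Y) on the example above:

```python
import numpy as np

PHI = {1: (0, 1),   # gain      -> 01
       0: (0, 0),   # no change -> 00
       -1: (1, 0)}  # loss      -> 10

def phi(x):
    """Map a status vector in {1, 0, -1}^m to a binary vector in {0, 1}^(2m)."""
    return np.array([bit for v in x for bit in PHI[v]])

X = [0, 1, 1, 0, 1, -1]
Y = [0, 1, 0, -1, -1, -1]
raw = sum(1 for a, b in zip(X, Y) if a == b and a != 0)
print(raw, int(phi(X) @ phi(Y)))  # both are 2
```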

16 The Raw kernel is valid!
The Raw kernel can be written as Raw(X, Y) = Φ(X)^T Φ(Y).
Define the 2m-by-n matrix A = [Φ(x_1), ..., Φ(x_n)] and let M denote the kernel matrix of Raw, so M = A^T A.
Therefore, for any vector v, v^T M v = v^T A^T A v = ||Av||^2 ≥ 0, i.e. M is positive semi-definite.
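
The same argument can be checked numerically; in the sketch below (NumPy assumed, status vectors randomly generated), M = A^T A is built from the mapped samples and all of its eigenvalues come out non-negative:

```python
import numpy as np

PHI = {1: (0, 1), 0: (0, 0), -1: (1, 0)}     # gain -> 01, no change -> 00, loss -> 10

def phi(x):
    return np.array([bit for v in x for bit in PHI[int(v)]])

rng = np.random.default_rng(0)
X = rng.integers(-1, 2, size=(20, 50))       # 20 samples, 50 intervals coded in {-1, 0, 1}
A = np.stack([phi(x) for x in X], axis=1)    # 2m-by-n matrix with Phi(x_i) as its columns
M = A.T @ A                                  # kernel matrix of Raw: M[i, j] = Raw(x_i, x_j)

print(np.all(np.linalg.eigvalsh(M) >= -1e-9))  # True: all eigenvalues >= 0, so M is PSD
```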

17 MIFS algorithm

18 MIFS for multi-class data
Train a one-versus-all SVM; each classifier assigns a rank to every feature.
Example ranks of features (one rank per class): Feature 1: [8, 1, 3], Feature 2: [2, 31, 1], Feature 3: [12, 4, 3], Feature 4: [5, 15, 8].
Sort the ranks of each feature: Feature 1: [1, 3, 8], Feature 2: [1, 2, 31], Feature 3: [3, 4, 12], Feature 4: [5, 8, 15].
Sort the features and insert the most promising feature (here Feature 4) into the feature set.
Selected feature set, ordered by contribution from high to low: 1. Feature 8, 2. Feature 4, 3. Feature 9, 4. Feature 33, 5. Feature 2, 6. Feature 48, 7. Feature 27, 8. Feature 1, ...
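
The sketch below illustrates only the ranking-and-sorting step, using scikit-learn one-versus-all linear SVMs as a stand-in (an assumption; the paper's actual ranking and its "most promising feature" selection rule are not reproduced here):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy multi-class CGH-like data: 60 samples, 30 intervals coded 1 / 0 / -1, 4 classes.
X = np.random.default_rng(1).integers(-1, 2, size=(60, 30))
y = np.arange(60) % 4

clf = LinearSVC(dual=False).fit(X, y)     # one-versus-all: one weight vector per class
# Rank features within each classifier: rank 1 = largest |weight| for that class.
ranks = np.argsort(np.argsort(-np.abs(clf.coef_), axis=1), axis=1) + 1

# Sort each feature's per-class ranks, as on the slide; MIFS would then pick the
# most promising feature from these sorted rank lists (that rule is not shown here).
sorted_ranks = np.sort(ranks.T, axis=1)   # one sorted rank vector per feature
print(sorted_ranks[:4])
```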

19 Results

20 Dataset details: data taken from the Progenetix database

21 Datasets
Dataset size by number of cancers and similarity level:
#cancers | best | good | fair | poor
2        | 478  | 466  | 351  | 373
4        | 1160 | 790  | 800  |
6        | 1100 | 850  | 880  | 810
8        | 1000 | 830  | 750  | 760

22 Experimental results: comparison of the linear and Raw kernels
On average, the Raw kernel improves predictive accuracy by 6.4% over the linear kernel across the sixteen datasets.

23 Experimental results
[Plot: accuracy vs. number of features, compared against (Fu and Fu-Liu, 2005) and (Ding and Peng, 2005)]
Using 40 features results in accuracy comparable to using all features.
Using 80 features results in accuracy comparable to or better than using all features.

24 Using MIFS for feature selection
Results test the hypothesis that 40 features are enough and 80 features are better.

25 A Web Server for Mining CGH Data: http://cghmine.cise.ufl.edu:8007/CGH/Default.html

26 Thank you

27 Appendix

28 Minimum Redundancy and Maximum Relevance (MRMR)
Relevance V is defined as the average mutual information between the features and the class labels.
Redundancy W is defined as the average mutual information between all pairs of features.
Incrementally select features by maximizing (V / W) or (V - W).
[Figure: toy example with binary features x_1, ..., x_6 and class labels]
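
A minimal greedy sketch of the V - W criterion (assuming scikit-learn's mutual_info_score for the mutual information estimates; function and variable names are mine):

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mrmr(X, y, k):
    """Greedily pick k features maximizing relevance V minus redundancy W."""
    n_features = X.shape[1]
    selected, remaining = [], list(range(n_features))
    relevance = [mutual_info_score(y, X[:, j]) for j in range(n_features)]
    for _ in range(k):
        best, best_score = None, -np.inf
        for j in remaining:
            redundancy = (np.mean([mutual_info_score(X[:, j], X[:, s]) for s in selected])
                          if selected else 0.0)
            if relevance[j] - redundancy > best_score:      # the V - W criterion
                best, best_score = j, relevance[j] - redundancy
        selected.append(best)
        remaining.remove(best)
    return selected

X = np.random.default_rng(0).integers(-1, 2, size=(80, 12))   # discrete features
y = np.arange(80) % 3                                          # class labels
print(mrmr(X, y, 4))
```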

29 Support Vector Machine Recursive Feature Elimination (SVM-RFE)
1. Train a linear SVM based on the current feature set.
2. Compute the weight vector.
3. Compute the ranking coefficient w_i^2 for the ith feature.
4. Remove the feature with the smallest ranking coefficient.
5. If the feature set is not empty, go back to step 1.
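
A hedged sketch of this loop for a binary problem, using scikit-learn's LinearSVC as the linear SVM (an assumption; scikit-learn also ships a ready-made RFE utility, but the explicit loop mirrors the flowchart):

```python
import numpy as np
from sklearn.svm import LinearSVC

def svm_rfe(X, y):
    """Return feature indices ordered from least to most important."""
    features = list(range(X.shape[1]))
    eliminated = []
    while features:                                    # "is feature set empty?"
        w = LinearSVC(dual=False).fit(X[:, features], y).coef_.ravel()
        worst = int(np.argmin(w ** 2))                 # smallest ranking coefficient w_i^2
        eliminated.append(features.pop(worst))         # remove it and retrain
    return eliminated

X = np.random.default_rng(3).integers(-1, 2, size=(50, 8))
y = np.array([0, 1] * 25)
print(svm_rfe(X, y))   # last entries survived longest, i.e. are the most important
```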

30 Pairwise similarity measures
Sim measure:
– A segment is a contiguous block of aberrations of the same type.
– Count the number of overlapping segment pairs.
[Figure: example pair of samples with Sim = 2]
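
A sketch of one plausible reading of the Sim measure (requiring overlapping segments to have the same aberration type is my interpretation of the slide, not necessarily the paper's definition):

```python
def segments(vec):
    """Contiguous runs of the same non-zero aberration type, as (start, end, type)."""
    segs, start = [], None
    for i, v in enumerate(list(vec) + [0]):        # trailing 0 flushes the last run
        if start is not None and v != vec[start]:
            segs.append((start, i - 1, vec[start]))
            start = None
        if v != 0 and start is None:
            start = i
    return segs

def sim(x, y):
    """Count pairs of same-type segments from x and y that overlap."""
    return sum(1 for (s1, e1, t1) in segments(x)
                 for (s2, e2, t2) in segments(y)
                 if t1 == t2 and s1 <= e2 and s2 <= e1)

x = [1, 1, 0, -1, -1, 0, 1]
y = [1, 0, 0, -1, -1, -1, 0]
print(sim(x, y))  # 2: the leading gain segments overlap, and so do the loss segments
```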

31 Non-linear decision boundary
How do we generalize SVM when the two-class classification problem is not linearly separable?
Key idea: transform x_i to a higher-dimensional space to "make life easier".
– Input space: the space in which the points x_i are located.
– Feature space: the space of Φ(x_i) after the transformation.
[Figure: mapping Φ(.) from the input space to the feature space, where a linear decision boundary can be found]
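
A toy illustration of the idea (scikit-learn assumed): one-dimensional points that no single threshold can separate become linearly separable after mapping each point x to (x, x^2):

```python
import numpy as np
from sklearn.svm import LinearSVC

# One-dimensional points: class 1 sits between two clusters of class -1,
# so no single threshold separates the classes in the input space.
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0]).reshape(-1, 1)
y = np.array([-1, -1, 1, 1, 1, -1, -1])

phi = np.hstack([x, x ** 2])          # feature space: map each point x to (x, x^2)
clf = LinearSVC(dual=False).fit(phi, y)
print(clf.score(phi, y))              # 1.0: a linear boundary exists in the feature space
```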

