1 Classification and Feature Selection Algorithms for Multi-class CGH data Jun Liu, Sanjay Ranka, Tamer Kahveci

2 Gene copy number
The number of copies of a gene can vary from person to person.
– Roughly 0.4% of gene copy numbers differ between any two people.
Variations in copy number can alter resistance to disease.
– For example, EGFR copy number can be higher than normal in non-small cell lung cancer.
[Figure: healthy and cancerous lung images (ALA)]

3 Comparative Genomic Hybridization (CGH)

4 Raw and smoothed CGH data

5 Example CGH dataset: 862 genomic intervals, from the Progenetix database

6 Problem description Given a new sample, which class does this sample belong to? Which features should we use to make this decision?

7 Outline
– Support Vector Machine (SVM)
– SVM for CGH data
– Maximum Influence Feature Selection (MIFS) algorithm
– Results

8 SVM in a nutshell (outline recap; current section: Support Vector Machine (SVM))

9 Classification with SVM
Consider a two-class, linearly separable classification problem. Many decision boundaries can separate the two classes, but the decision boundary should be as far away from the data of both classes as possible.
– We should maximize the margin m.
[Figure: Class 1 and Class 2 separated by a boundary with margin m]

10 SVM formulation
Let {x_1, ..., x_n} be our data set and let y_i ∈ {1, -1} be the class label of x_i.
Maximize J over the α_i:
J(α) = Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j (x_i · x_j),
subject to α_i ≥ 0 and Σ_i α_i y_i = 0,
where the inner product x_i · x_j measures the similarity between x_i and x_j.
The decision boundary can be constructed as f(x) = Σ_i α_i y_i (x_i · x) + b.
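To make the decision function concrete, here is a minimal sketch (not the authors' code; the toy data and parameter values are made up) that fits a linear SVM with an off-the-shelf solver and evaluates f(x) = Σ_i α_i y_i (x_i · x) + b directly from the returned dual coefficients:

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class, linearly separable data (hypothetical values).
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],         # class +1
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.0]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e3).fit(X, y)

# dual_coef_ stores alpha_i * y_i for the support vectors only;
# the decision function is f(x) = sum_i alpha_i y_i (x_i . x) + b.
alpha_times_y = clf.dual_coef_.ravel()
support_vectors = clf.support_vectors_
b = clf.intercept_[0]

x_new = np.array([0.5, 0.8])
f = float(alpha_times_y @ (support_vectors @ x_new)) + b
print(f, clf.decision_function([x_new])[0])  # the two values agree
```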

11 SVM for CGH data (outline recap; current section: SVM for CGH data)

12 Pairwise similarity measures: Raw measure
– Count the number of genomic intervals at which both samples have a gain (or both have a loss).
[Figure: two aligned CGH profiles with three matching aberrations; Raw = 3]
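As an illustration, here is a small sketch (hypothetical vectors, not from the paper) of the Raw count over two CGH profiles encoded as gain = +1, no-change = 0, loss = -1:

```python
import numpy as np

def raw_similarity(x, y):
    """Raw measure: number of genomic intervals where both samples show a gain (+1)
    or both show a loss (-1). Inputs are vectors with entries in {1, 0, -1}."""
    x, y = np.asarray(x), np.asarray(y)
    both_gain = np.sum((x == 1) & (y == 1))
    both_loss = np.sum((x == -1) & (y == -1))
    return int(both_gain + both_loss)

# Hypothetical profiles: two shared gains and one shared loss give Raw = 3.
a = np.array([1, 1, 0, -1, 1, 0])
b = np.array([1, 0, 0, -1, 1, -1])
print(raw_similarity(a, b))  # 3
```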

13 SVM based on the Raw kernel
Using SVM with the Raw kernel amounts to solving the following quadratic program.
Maximize J over the α_i:
J(α) = Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j Raw(x_i, x_j),
i.e., the Raw kernel replaces the inner product x_i · x_j.
The resulting decision function is f(x) = Σ_i α_i y_i Raw(x_i, x) + b (a code sketch follows below).
But is Raw a valid kernel?
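One way to use the Raw kernel with a standard SVM solver is to pass a precomputed kernel matrix; the sketch below (hypothetical toy data, not the authors' implementation) shows the idea with scikit-learn:

```python
import numpy as np
from sklearn.svm import SVC

def raw_kernel_matrix(A, B):
    """Raw kernel between every row of A and every row of B (entries in {1, 0, -1})."""
    A, B = np.asarray(A), np.asarray(B)
    gains = (A == 1).astype(float) @ (B == 1).astype(float).T
    losses = (A == -1).astype(float) @ (B == -1).astype(float).T
    return gains + losses

# Hypothetical training data: rows are samples, columns are genomic intervals.
X_train = np.array([[1, 0, -1, 1], [1, 1, 0, 0], [-1, -1, 0, 1], [0, -1, -1, 0]])
y_train = np.array([0, 0, 1, 1])

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(raw_kernel_matrix(X_train, X_train), y_train)

# For prediction, the precomputed kernel must be K(test, train).
X_test = np.array([[1, 0, 0, 1]])
print(clf.predict(raw_kernel_matrix(X_test, X_train)))
```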

14 Is the Raw kernel valid?
Not every similarity function can serve as a kernel. A valid kernel requires the underlying kernel matrix M to be positive semi-definite: M is positive semi-definite if v^T M v ≥ 0 for all vectors v.

15 Is the Raw kernel valid?
Proof: define a function Φ that maps a ∈ {1, 0, -1}^m to b ∈ {0, 1}^{2m}, encoding each interval as two bits:
Φ(gain) = Φ(1) = 01
Φ(no-change) = Φ(0) = 00
Φ(loss) = Φ(-1) = 10
Then Raw(X, Y) = Φ(X)^T Φ(Y).
[Example from the slide: for the two profiles X and Y shown, Raw(X, Y) = Φ(X)^T Φ(Y) = 2]

16 The Raw kernel is valid!
The Raw kernel can be written as Raw(X, Y) = Φ(X)^T Φ(Y). Define the 2m-by-n matrix P whose columns are the mapped samples Φ(x_1), ..., Φ(x_n). The kernel matrix of Raw is then M = P^T P, so v^T M v = (P v)^T (P v) = ||P v||^2 ≥ 0 for every v, and M is positive semi-definite.
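The argument can also be checked numerically; the sketch below (made-up profiles) implements the mapping Φ, confirms Raw(X, Y) = Φ(X)^T Φ(Y), and verifies that the resulting Gram matrix has no negative eigenvalues:

```python
import numpy as np

def phi(x):
    """Map a CGH vector in {1, 0, -1}^m to {0, 1}^(2m):
    gain (+1) -> 01, no-change (0) -> 00, loss (-1) -> 10."""
    bits = {1: (0, 1), 0: (0, 0), -1: (1, 0)}
    return np.array([b for v in x for b in bits[int(v)]])

def raw_similarity(x, y):
    x, y = np.asarray(x), np.asarray(y)
    return int(np.sum((x == 1) & (y == 1)) + np.sum((x == -1) & (y == -1)))

# Raw(X, Y) equals the inner product of the mapped vectors...
X = np.array([1, 0, -1, 1, 0])
Y = np.array([1, -1, -1, 0, 0])
assert raw_similarity(X, Y) == int(phi(X) @ phi(Y))  # both are 2

# ...so the kernel matrix M = P^T P is positive semi-definite by construction.
samples = np.array([[1, 0, -1], [1, 1, 0], [-1, 0, 1]])
P = np.stack([phi(s) for s in samples], axis=1)   # 2m x n, columns are Phi(x_i)
M = P.T @ P                                       # Gram matrix of the Raw kernel
print(np.linalg.eigvalsh(M).min() >= -1e-9)       # True: no negative eigenvalues
```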

17 MIFS algorithm (outline recap; current section: Maximum Influence Feature Selection algorithm)

18 MIFS for multi-class data
– Train a one-versus-all SVM for each class, giving one ranking of the features per class.
– For each feature, collect its rank in every class and sort these ranks. For example: Feature 1: [8, 1, 3] → [1, 3, 8]; Feature 2: [2, 31, 1] → [1, 2, 31]; Feature 3: [12, 4, 3] → [3, 4, 12]; Feature 4: [5, 15, 8] → [5, 8, 15].
– Sort the features by their sorted rank vectors, pick the most promising feature (Feature 4 in the slide's example), and insert it into the feature set.
– Iterating this yields a list of features ordered by contribution, from high to low (e.g., Feature 8, Feature 4, Feature 9, Feature 33, Feature 2, Feature 48, Feature 27, Feature 1, ...).
A rough code sketch of this procedure is given below.
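The slide text does not fully pin down how the per-class ranks are aggregated, so the sketch below is only one plausible reading, not the authors' MIFS implementation: it ranks features with one one-versus-all linear SVM per class, sorts each feature's ranks, and orders features by those sorted rank vectors.

```python
import numpy as np
from sklearn.svm import LinearSVC

def mifs_like_ranking(X, y, n_select):
    """Illustrative rank aggregation in the spirit of the MIFS slide (assumptions:
    per-class rankings come from one-vs-all linear SVM weights, and features are
    ordered lexicographically by their sorted rank vectors)."""
    classes = np.unique(y)
    ranks_per_class = []
    for c in classes:
        clf = LinearSVC(C=1.0, max_iter=10000).fit(X, (y == c).astype(int))
        weight = np.abs(clf.coef_.ravel())
        order = np.argsort(-weight)                  # best feature first
        rank = np.empty_like(order)
        rank[order] = np.arange(1, len(weight) + 1)  # rank 1 = most influential
        ranks_per_class.append(rank)
    ranks = np.stack(ranks_per_class, axis=1)        # (n_features, n_classes)
    sorted_ranks = np.sort(ranks, axis=1)            # each feature's ranks, best first
    order = sorted(range(X.shape[1]), key=lambda j: tuple(sorted_ranks[j]))
    return order[:n_select]                          # high-contribution features first
```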

19 Results (outline recap; current section: Results)

20 Dataset details: data taken from the Progenetix database

21 Datasets
[Table: for each similarity level (best, good, fair, poor), the number of cancer classes and the dataset size]

22 Experimental results: comparison of the linear and Raw kernels
On average, the Raw kernel improves predictive accuracy by 6.4% over the linear kernel across sixteen datasets.

23 Experimental results
Using 40 features yields accuracy comparable to using all features; using 80 features yields accuracy comparable to or better than using all features.
[Plot: accuracy vs. number of features, compared against Fu and Fu-Liu (2005) and Ding and Peng (2005)]

24 Using MIFS for feature selection
Results testing the hypothesis that 40 features are enough and that 80 features are better.

25 A Web Server for Mining CGH Data

26 Thank you

27 Appendix

28 Minimum Redundancy and Maximum Relevance (MRMR)
[Figure: example feature matrix x_1, ..., x_6 with class labels]
– Relevance V is defined as the average mutual information between the features and the class labels.
– Redundancy W is defined as the average mutual information between all pairs of features.
– Incrementally select features by maximizing (V / W) or (V - W).
A small sketch of the (V - W) variant is given below.
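Here is a compact sketch (not the authors' code) of the incremental (V - W) variant on discrete gain/no-change/loss features, using empirical mutual information:

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mrmr_select(X, y, k):
    """Greedy MRMR: start from the most relevant feature, then repeatedly add the
    feature maximizing relevance minus average redundancy with the selected set."""
    n_features = X.shape[1]
    relevance = np.array([mutual_info_score(X[:, j], y) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info_score(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy        # the (V - W) criterion
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```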

29 Support Vector Machine Recursive Feature Elimination (SVM-RFE)
Repeat until the feature set is empty:
– Train a linear SVM on the current feature set and compute the weight vector.
– Compute the ranking coefficient w_i^2 for the i-th feature.
– Remove the feature with the smallest ranking coefficient.
A minimal sketch is given below.
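A minimal sketch of the loop above (scikit-learn also ships an equivalent built-in, sklearn.feature_selection.RFE):

```python
import numpy as np
from sklearn.svm import LinearSVC

def svm_rfe(X, y):
    """Return features ordered from most to least important by recursive elimination."""
    remaining = list(range(X.shape[1]))
    eliminated = []
    while remaining:
        clf = LinearSVC(C=1.0, max_iter=10000).fit(X[:, remaining], y)
        # Ranking coefficient w_i^2; for multi-class, sum over the one-vs-rest weights.
        scores = np.sum(clf.coef_ ** 2, axis=0)
        worst = int(np.argmin(scores))
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]
```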

30 Pairwise similarity measures: Sim measure
– A segment is a contiguous block of aberrations of the same type.
– Count the number of overlapping segment pairs between the two samples.
[Figure: two aligned CGH profiles with two overlapping segment pairs; Sim = 2]
A sketch of this count is given below.
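A small sketch (hypothetical profiles) of the Sim count: extract the contiguous gain/loss segments of each sample and count same-type overlapping pairs:

```python
from itertools import groupby

def segments(x):
    """Contiguous runs of the same aberration type, as (start, end, type) with type in {+1, -1}."""
    segs, pos = [], 0
    for val, run in groupby(x):
        length = len(list(run))
        if val != 0:
            segs.append((pos, pos + length - 1, val))
        pos += length
    return segs

def sim_similarity(x, y):
    """Count pairs of segments, one from each sample, of the same type that overlap."""
    count = 0
    for s1, e1, t1 in segments(x):
        for s2, e2, t2 in segments(y):
            if t1 == t2 and s1 <= e2 and s2 <= e1:   # same type, overlapping positions
                count += 1
    return count

# Hypothetical profiles: one overlapping gain pair and one overlapping loss pair -> Sim = 2.
a = [1, 1, 0, -1, -1, 0, 1]
b = [0, 1, 1, -1, 0, 0, 0]
print(sim_similarity(a, b))  # 2
```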

31 Non-linear decision boundary
How do we generalize SVM when the two-class classification problem is not linearly separable?
Key idea: transform x_i to a higher-dimensional space to "make life easier".
– Input space: the space in which the points x_i lie.
– Feature space: the space of Φ(x_i) after the transformation; there, a linear decision boundary can be found.
[Figure: a non-linear mapping Φ from input space to feature space]
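As a quick illustration of the idea (not from the paper), an RBF kernel handles a data set that no linear boundary can separate in the input space:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in the input space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf", gamma=2.0).fit(X, y).score(X, y)
print(f"linear: {linear_acc:.2f}  rbf: {rbf_acc:.2f}")  # the RBF boundary fits far better
```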