Evaluation of Techniques for Classifying Biological Sequences Authors: Mukund Deshpande and George Karypis Speaker: Sarah Chan CSIS DB Seminar May 31, 2002

Presentation Outline
- Introduction
- Traditional Approaches (kNN, Markov Models) to Sequence Classification
- Feature Based Sequence Classification
- Experimental Evaluation
- Conclusions

Introduction  The amount of biological sequences available in public databases is increasing exponentially GenBank: 16 billion DNA base-pairs PIR: over 230,000 protein sequences  Strong sequence similarity often translates to functional and structural relations  Classification algorithms applied on sequence data can be used to gain valuable insights on functions and relations of sequences E.g. to assign a protein sequence to a protein family

Introduction  K-nearest neighbor, Markov models and Hidden Markov models have been extensively used They have considered the sequential constraints present in datasets  Motivation: Few attempts to use traditional machine learning classification algorithms such as decision trees and support vector machines They were thought of not being able to model sequential nature of datasets

Focus of This Paper
- To evaluate some widely used sequence classification algorithms
  - K-nearest neighbor
  - Markov models
- To develop a framework for modeling sequences such that traditional machine learning algorithms can be easily applied
  - Represent each sequence as a vector in a derived feature space, and then use SVMs to build a sequence classifier

Problem Definition - Sequence Classification
- A sequence S_r = (x_1, x_2, x_3, ..., x_l) is an ordered list of symbols
- The alphabet Σ of symbols is known in advance and of fixed size N
- Each sequence S_r has a class label C_r
- Assumption: two class labels only (C+, C-)
- Goal: to correctly assign a class label to a test sequence

Approach 1: K Nearest Neighbor (KNN) Classifiers
- To classify a test sequence S_r:
  - Locate the K training sequences most similar to S_r
  - Assign to S_r the class label that occurs most often among those K sequences
- Key task: to compute the similarity between two sequences

Approach 1: K Nearest Neighbor (KNN) Classifiers
- Alignment score as similarity function
  - Compute an optimal alignment between two sequences (by dynamic programming, hence computationally expensive), and then
  - Score this alignment: the score is a function of the number of matched and unmatched symbols in the alignment

Approach 1: K Nearest Neighbor (KNN) Classifiers
- Two variations
  - Global alignment score
    - Aligns sequences across their entire length
    - Can capture position-specific patterns
    - Needs to be normalized due to varying sequence lengths
  - Local alignment score
    - Only portions of the two sequences are aligned
    - Can capture small substrings of symbols that are present in both sequences but not necessarily at the same position
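As a concrete illustration of the global-alignment variant (a minimal sketch of the idea, not the authors' code; the match/mismatch/gap scores are illustrative assumptions), the snippet below computes a length-normalized Needleman-Wunsch score by dynamic programming and uses it inside a KNN classifier:

```python
from collections import Counter

def global_alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score via dynamic programming."""
    n, m = len(s), len(t)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,   # match / mismatch
                           dp[i - 1][j] + gap,       # gap in t
                           dp[i][j - 1] + gap)       # gap in s
    # Normalize by length so long and short sequences are comparable
    return dp[n][m] / max(n, m)

def knn_classify(test_seq, train_seqs, train_labels, k=3):
    """Label a test sequence by majority vote among the K most similar training sequences."""
    ranked = sorted(zip(train_seqs, train_labels),
                    key=lambda pair: global_alignment_score(test_seq, pair[0]),
                    reverse=True)
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]
```

The quadratic dynamic program per sequence pair is exactly what makes alignment-based KNN computationally expensive.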

Approach 2.1: Simple Markov Chain Classifiers
- To build a simple Markov chain based classification model:
  - Partition the training sequences according to their class labels
  - Build a simple Markov chain (M) for each smaller dataset
- To classify a test sequence S_r:
  - Compute the likelihood of S_r being generated by each Markov chain M, i.e. P(S_r | M)
  - Assign to S_r the class label associated with the Markov chain that gives the highest likelihood

Approach 2.1: Simple Markov Chain Classifiers
- Log-likelihood ratio (for two-class problems):
  L(S_r) = log( P(S_r | M+) / P(S_r | M-) )
- If L(S_r) >= 0, then C_r = C+; else C_r = C-
- Markov principle (for a 1st order Markov chain): each symbol in a sequence depends only on its preceding symbol, so
  P(S_r | M) = P(x_1) * P(x_2 | x_1) * P(x_3 | x_2) * ... * P(x_l | x_{l-1})

Approach 2.1: Simple Markov Chain Classifiers
- Transition probability: the transition from x_{i-1} to x_i has probability P(x_i | x_{i-1})
- Each symbol is associated with a state
- A Transition Probability Matrix (TPM) is built for each class
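A minimal sketch (my own illustration under the definitions above, not the paper's implementation) of a first-order Markov chain classifier: one TPM is estimated per class from symbol-pair counts, and a test sequence is assigned C+ when the log-likelihood ratio is non-negative. The add-one smoothing constant is an assumption to avoid zero probabilities.

```python
import math

def estimate_tpm(sequences, alphabet, smoothing=1.0):
    """Estimate a first-order Transition Probability Matrix from training sequences."""
    counts = {a: {b: smoothing for b in alphabet} for a in alphabet}
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    tpm = {}
    for a in alphabet:
        total = sum(counts[a].values())
        tpm[a] = {b: counts[a][b] / total for b in alphabet}
    return tpm

def log_likelihood(seq, tpm):
    """log P(sequence | Markov chain), ignoring the initial-symbol term for simplicity."""
    return sum(math.log(tpm[prev][cur]) for prev, cur in zip(seq, seq[1:]))

def classify(seq, tpm_pos, tpm_neg):
    """Assign C+ if the log-likelihood ratio L(S) >= 0, else C-."""
    L = log_likelihood(seq, tpm_pos) - log_likelihood(seq, tpm_neg)
    return "+" if L >= 0 else "-"

# Toy usage with a DNA alphabet
alphabet = "ACGT"
tpm_pos = estimate_tpm(["ACGTACGT", "ACGGT"], alphabet)
tpm_neg = estimate_tpm(["TTTTAC", "TGTGTG"], alphabet)
print(classify("ACGTAC", tpm_pos, tpm_neg))
```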

Approach 2.1: Simple Markov Chain Classifiers
- Example (figure not reproduced in this transcript)

Approach 2.1: Simple Markov Chain Classifiers
- Higher (kth) order Markov chain
  - The transition probability for a symbol x_i is computed by looking at its k preceding symbols
  - Number of states = N^k, each associated with a sequence of k symbols
  - Size of TPM = N^(k+1) (N^k rows x N columns)
  - Pros: better classification accuracy, since longer ordering constraints are captured
  - Cons: the number of states grows exponentially with the order -> many infrequent states -> poor probability estimates

Approach 2.2: Interpolated Markov Models (IMM)
- Build a series of Markov chains starting from the 0th order up to the kth order
- The transition probability for a symbol, P(x_i | x_{i-1}, x_{i-2}, ..., x_1, IMM_k), is the weighted sum of the transition probabilities of the different order chains, from the 0th order up to the kth order
- Weights: often based on the distribution of the different states in the various order Markov models; the right weighting method appears to be dataset dependent
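A simplified sketch of the interpolation step, assuming fixed hand-chosen weights rather than the data-driven weighting schemes the slide alludes to; the per-order probability tables below are toy values:

```python
def interpolated_prob(context, symbol, order_probs, weights):
    """P(symbol | context, IMM_k) as a weighted sum over chains of order 0..k.

    order_probs[j] maps a length-j context (tuple of symbols) to a dict of
    next-symbol probabilities; weights[j] is the (assumed, hand-chosen) weight
    of the order-j chain, and the weights sum to 1. Assumes len(context) >= k.
    """
    total = 0.0
    for j, w in enumerate(weights):
        ctx = tuple(context[-j:]) if j else ()   # last j symbols of the context
        dist = order_probs[j].get(ctx)
        if dist and symbol in dist:
            total += w * dist[symbol]
    return total

# Toy usage: 0th- and 1st-order chains over a DNA alphabet
order_probs = [
    {(): {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2}},        # 0th order
    {("A",): {"A": 0.1, "C": 0.6, "G": 0.2, "T": 0.1}},    # 1st order (partial)
]
print(interpolated_prob(("A",), "C", order_probs, weights=[0.3, 0.7]))
```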

Approach 2.3: Selective Markov Models (SMM)
- Build Markov chains of various orders
- Prune non-discriminatory states from the higher order chains (explained below)
- The conditional probability P(x_i | x_{i-1}, x_{i-2}, ..., x_1, SMM_k) is the probability given by the highest order chain among the remaining (non-pruned) states

Approach 2.3: Selective Markov Models (SMM)
- Key task: to decide which states are non-discriminatory
- Simplest way: use a frequency threshold and prune all states that occur less often than it
- Method used in the experiments: specify the frequency threshold as a parameter σ
  - A state-transition pair is kept only if it occurs σ times more frequently than its expected frequency under a uniform distribution
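An illustrative reading of this pruning rule (my own sketch, not the paper's implementation; assumed details: counts are stored per (state, next-symbol) pair and σ defaults to 2):

```python
def prune_states(transition_counts, alphabet_size, sigma=2.0):
    """Keep a (state, next-symbol) pair only if its observed count is at least
    sigma times the count expected under a uniform distribution.

    transition_counts maps (state, symbol) -> observed count, where 'state' is
    the tuple of the k preceding symbols.
    """
    total = sum(transition_counts.values())
    kept = {}
    for (state, symbol), count in transition_counts.items():
        # Under a uniform distribution, every (state, symbol) pair of an
        # order-k chain is equally likely: expected count = total / N^(k+1).
        expected = total / (alphabet_size ** (len(state) + 1))
        if count >= sigma * expected:
            kept[(state, symbol)] = count
    return kept
```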

Approach 3: Feature Based Sequence Classification
- Sequences are transformed into a form that can be used by traditional machine learning algorithms
- Features are extracted that take the sequential nature of the data into account
- The feature spaces are motivated by Markov models; support vector machines (SVMs) are used as the classifier

Approach 3: Feature Based Sequence Classification
- SVM
  - A relatively new learning algorithm, introduced by Vapnik (1995)
  - Objective: given a training set in a vector space, find the best separating hyperplane (the one with maximum margin) between the two classes
  - Approach: formulate a constrained optimization problem, then solve it with constrained quadratic programming (QP)
  - Well suited to high-dimensional data
  - Requires a lot of memory and CPU time

Approach 3: Feature Based Sequence Classification
- SVM - Maximum margin (figure): (a) a separating hyperplane with a small margin; (b) a separating hyperplane with a larger margin. Better generalization is expected from (b).

Approach 3: Feature Based Sequence Classification
- SVM - Feature space mapping (figure): data are mapped into a higher dimensional feature space (via kernel functions) where they become linearly separable.

Approach 3: Feature Based Sequence Classification
- Vector space view (simple 1st order Markov chain): the log-likelihood ratio is equivalent to L(S_r) = u^T w
  - u and w are of length N^2; each dimension corresponds to a unique pair of symbols
  - Element of u: frequency of the corresponding symbol pair in the sequence
  - Element of w: log-ratio of the conditional probabilities of that pair for the + and - classes
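A small self-contained sketch (toy transition probabilities, my own illustration) that makes this equivalence concrete: the symbol-pair frequency vector u dotted with the log-ratio weight vector w reproduces the first-order log-likelihood ratio L(S_r).

```python
import math
import numpy as np

alphabet = "ACGT"
pairs = [(a, b) for a in alphabet for b in alphabet]   # the N^2 dimensions

# Toy per-class transition probabilities (assumed values, rows sum to 1)
tpm_pos = {p: 0.25 for p in pairs}
for nxt, prob in zip("ACGT", (0.05, 0.55, 0.20, 0.20)):
    tpm_pos[("A", nxt)] = prob
tpm_neg = {p: 0.25 for p in pairs}
for nxt, prob in zip("ACGT", (0.05, 0.20, 0.20, 0.55)):
    tpm_neg[("T", nxt)] = prob

seq = "ACGTTAC"

# u: frequency of each symbol pair in the sequence
u = np.array([sum(1 for x, y in zip(seq, seq[1:]) if (x, y) == p) for p in pairs])
# w: log-ratio of the + and - transition probabilities for each pair
w = np.array([math.log(tpm_pos[p] / tpm_neg[p]) for p in pairs])

# u . w equals the first-order log-likelihood ratio computed directly
L_direct = sum(math.log(tpm_pos[(x, y)] / tpm_neg[(x, y)]) for x, y in zip(seq, seq[1:]))
print(np.isclose(u @ w, L_direct))   # True
```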

Approach 3: Feature Based Sequence Classification
- Vector space view - Example (simple 1st order Markov chain; figure not reproduced in this transcript)

Approach 3: Feature Based Sequence Classification
- Vector space view
  - All the variants of Markov chains described previously can be transformed in a similar manner
  - Dimensionality of the new space:
    - Higher order Markov chains: N^(k+1)
    - IMM: N + N^2 + ... + N^(k+1)
    - SMM: the number of non-pruned states
  - Each sequence is viewed as a frequency vector
  - This allows the use of any traditional classifier that operates on objects represented as multi-dimensional vectors
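To make the overall framework concrete, here is a hedged end-to-end sketch: each sequence is mapped to the frequency vector of its length-(k+1) substrings (the feature space derived from an order-k Markov chain) and a linear SVM is trained on those vectors. The use of scikit-learn's LinearSVC, the toy sequences and the parameter C=1.0 are my own assumptions, not the paper's experimental setup.

```python
import numpy as np
from itertools import product
from sklearn.svm import LinearSVC

def transition_features(seq, alphabet, order=1):
    """Map a sequence to the frequency vector of its length-(order+1) substrings,
    i.e. the feature space derived from an order-k Markov chain."""
    index = {kmer: i for i, kmer in enumerate(product(alphabet, repeat=order + 1))}
    vec = np.zeros(len(index))
    for i in range(len(seq) - order):
        vec[index[tuple(seq[i:i + order + 1])]] += 1
    return vec

# Toy training data (assumed labels and sequences, for illustration only)
alphabet = "ACGT"
train_seqs = ["ACGTACGT", "ACGGTACG", "TTTTACGT", "TGTGTGTT"]
train_labels = [1, 1, -1, -1]

X = np.array([transition_features(s, alphabet) for s in train_seqs])
clf = LinearSVC(C=1.0).fit(X, train_labels)

test = transition_features("ACGTAC", alphabet).reshape(1, -1)
print(clf.predict(test))
```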

Experimental Evaluation
- 5 different datasets, each with 2-3 classes (Table 1)

Experimental Evaluation
- Methodology
  - Performance of the algorithms was measured using classification accuracy
  - Ten-way cross-validation was used
  - Experiments were restricted to two-class problems
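For reference, a minimal sketch of the 10-fold ("ten-way") cross-validation protocol; the random feature matrix below is a purely illustrative stand-in for the sequence feature vectors of the earlier sketch.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Toy stand-in data: in the paper's setting X would hold the sequences'
# Markov-chain feature vectors and y the two class labels.
rng = np.random.default_rng(0)
X = rng.random((100, 16))
y = np.where(X[:, 0] + X[:, 1] > 1.0, 1, -1)

scores = cross_val_score(LinearSVC(C=1.0), X, y, cv=10, scoring="accuracy")
print(scores.mean())   # average classification accuracy over the 10 folds
```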

KNN Classifiers (Table 2)
- "Cosine"
  - Each sequence is represented by the frequency vector of the different symbols in it
  - Similarity between two sequences: cosine of the two vectors
  - Does not take sequential constraints into account

KNN Classifiers (Table 2)
1. 'Global' outperforms the other two for all K
2. For PS-HT and PS-TS, the performance of 'Cosine' is comparable to that of 'Global', as only limited sequential information can be exploited

KNN Classifiers (Table 2)
3. 'Local' performs very poorly, especially on the protein sequences: it is not good to base classification on a single substring only
4. Accuracy decreases as K increases

Simple Markov Chains vs. Their Feature Spaces (Table 3)
1. Accuracy improves with the order of each model
   - Only exceptions: for PS-*, accuracy peaks at the 2nd/1st order, as the sequences are very short, so higher order models and their feature spaces contain very few examples for estimating transition probabilities

Simple Markov Chains vs. Their Feature Spaces (Table 3)
2. SVM achieves higher accuracies than the simple Markov chains (often a 5-10% improvement)

IMM vs. Their Feature Spaces (Table 4)
1. SVM achieves higher accuracies than IMM for most datasets
   - Exceptions: for P-*, the higher order IMM models do considerably better (no explanation provided)

IMM vs. Their Feature Spaces (Table 4)
2. Simple Markov chain based classifiers usually outperform IMM
   - Only exceptions: PS-*, since the sequences are comparatively short, so there is greater benefit in using Markov states of different orders

IMM Based Classifiers vs. Simple Markov Chain Based Classifiers (Table 4: IMM based; part of Table 3: simple Markov chain based)

SMM vs. Their Feature Spaces (Table 5a)
- σ: parameter (frequency threshold) used in pruning the states of the different order Markov chains

(Tables 5b and 5c)

SMM vs. Their Feature Spaces
1. SVM usually achieves higher accuracies than SMM
2. For many problems SMM achieves higher accuracy as σ increases, but the gains are rather small, perhaps because the pruning strategy is too simple

Conclusions
1. The SVM classifier used on the feature spaces of the different Markov chains (and their variants) achieves substantially better accuracies than the corresponding Markov chain classifier.
   - The linear classification models learnt by SVM are better than those learnt by the Markov chain based approaches

Conclusions
2. Proper feature selection can improve accuracy, but an increase in the amount of available information does not necessarily guarantee it.
   - (Except for PS-*) The maximum accuracy attained by SVM on IMM's feature spaces is always lower than that attained on the feature spaces of the simple Markov chains.
   - Even with the simple frequency based feature selection done in SMM, the overall accuracy is higher.

Conclusions
3. KNN with global alignments can take advantage of the relative positions of symbols in the aligned sequences
   - Simple experiment: an SVM incorporating information about the positions of symbols was able to achieve an accuracy > 97%
   - So position-specific information can be useful for building effective classifiers for biological sequences

   Dataset | Highest accuracy | Scheme achieving the highest accuracy
   S-EI    | 0.9390           | KNN (K=5, with global sequence alignment)
   P-MS    | 0.9719           | KNN (K=1, with global sequence alignment)

References
- Mukund Deshpande and George Karypis. Evaluation of Techniques for Classifying Biological Sequences. In Proceedings of the 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2002.
- Ming-Hsuan Yang. Presentation entitled "Gentle Guide to Support Vector Machines".
- Alexander Johannes Smola. Presentation entitled "Support Vector Learning: Concepts and Algorithms".