Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids Y. Wang, O. Zaiane, R. Goebel.

Presentation transcript:

Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids Y. Wang, O. Zaiane, R. Goebel

2 Introduction. Protein: a linear sequence of amino acids. Protein subcellular localization. Plant: nuclear, cytoplasmic, mitochondrial, extracellular, … Intracellular vs. extracellular. Sequence information alone. Class imbalance. Transparency.

3 Related Work. N-terminal sorting signals. Amino acid composition. Lexical analysis. Integrative approach. Subsequence methods.

4 Predicting Extracellular Proteins. Feature extraction. Support vector machine. Boosting. Frequent pattern method.

5 Feature Extraction. Frequent subsequences: subsequences that occur in more than a certain percentage of extracellular proteins. Strong discriminative power. Perform similar functions via related biochemical mechanisms. Capture local similarity.
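
As an illustration of the definition on this slide, here is a minimal Python sketch that finds contiguous subsequences occurring in more than a given fraction of a set of sequences. It enumerates substrings by brute force rather than using a generalized suffix tree, and the function name, the length bounds, and the toy sequences are assumptions made for the example, not taken from the presentation.

```python
from collections import Counter

def frequent_subsequences(sequences, min_len=3, max_len=5, min_sup=0.05):
    """Return {substring: support} for contiguous substrings that occur
    in more than min_sup (a fraction) of the given sequences."""
    counts = Counter()
    for seq in sequences:
        seen = set()  # count each substring at most once per sequence
        for k in range(min_len, max_len + 1):
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        counts.update(seen)
    threshold = min_sup * len(sequences)
    return {s: c / len(sequences) for s, c in counts.items() if c > threshold}

# Toy sequences, invented for illustration only.
toy = ["MKTLLVA", "AKTLLQA", "MKTAAVA", "GGKTLLG"]
frequent = frequent_subsequences(toy, min_len=3, max_len=4, min_sup=0.5)
print(sorted(frequent))
```

In practice the input would be the extracellular proteins only, so that support is measured within that class, as the slide describes.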

6 Generalized Suffix Tree

7 Support Vector Machine. Input data are represented as feature vectors. Find a linear separator that separates the data and maximizes the margin. Kernel function: nonlinear separator.

8 SVM for Extracellular Protein Prediction. Data transformation (sequence → vector): frequent subsequences as features; transform protein sequences into binary vectors. Kernel functions: linear kernel, polynomial kernel, RBF kernel.
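
The transformation and the three kernels named on this slide can be sketched in plain Python as follows. The feature list, the example sequences, and the kernel parameters (degree, coef0, gamma) are assumptions chosen for illustration; the slides do not state which values were used.

```python
import math

def to_binary_vector(sequence, features):
    """1 if the frequent subsequence occurs in the protein, else 0."""
    return [1 if f in sequence else 0 for f in features]

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Hypothetical frequent subsequences and toy protein sequences.
features = ["KTL", "TLL", "AVA"]
x = to_binary_vector("MKTLLVA", features)
y = to_binary_vector("MKTAAVA", features)
print(x, y, linear_kernel(x, y), polynomial_kernel(x, y), rbf_kernel(x, y))
```

An SVM library would take these vectors (or a kernel matrix built from them) as input; the kernel functions are written out here only to show what each option computes.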

9 Boosting. Iterative algorithms to improve a weak classifier. A different weighted distribution of examples in each iteration. Increase the weights of incorrectly classified examples and decrease the weights of correctly classified ones.
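
The reweighting idea can be sketched as a discrete AdaBoost loop. The weak learner used here (a one-feature decision stump over binary vectors), the number of rounds, and the toy data are assumptions for the example; the transcript does not say which weak classifier the authors boosted.

```python
import math

def stump_predict(x, j, polarity):
    # Predict +polarity when feature j is present, -polarity otherwise.
    return polarity if x[j] == 1 else -polarity

def reweight(w, y, preds, alpha):
    """Raise weights of misclassified examples, lower the rest, renormalize."""
    new_w = [wi * math.exp(-alpha * yi * pi) for wi, yi, pi in zip(w, y, preds)]
    total = sum(new_w)
    return [wi / total for wi in new_w]

def train_adaboost(X, y, rounds=3):
    """X: binary feature vectors; y: labels in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []  # (alpha, feature index, polarity)
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):
            for polarity in (1, -1):
                preds = [stump_predict(x, j, polarity) for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, polarity, preds)
        err, j, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, polarity))
        w = reweight(w, y, preds, alpha)
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(x, j, pol) for a, j, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy data, invented for illustration: feature 0 marks the positive class.
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = [1, 1, -1, -1]
model = train_adaboost(X, y)
print([predict(model, x) for x in X])
```

With one of four equally weighted examples misclassified, `reweight` gives that example half of the total weight in the next round, which is what forces the next weak classifier to focus on it.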

10 AdaBoost

11 Frequent Pattern Method. Frequent pattern: *X1*X2*…*Xn* → extracellular. X1, X2, …, Xn are frequent subsequences. “*” can be replaced by zero up to MaxGap amino acids when matching a protein sequence.
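
A minimal sketch of how such a pattern could be matched against a protein sequence is shown below. It reads the slide literally, so the leading and trailing "*" are also limited to MaxGap residues; that reading, along with the function name and the toy sequence, is an assumption of this example.

```python
def matches(pattern, sequence, max_gap):
    """pattern: list of subsequences [X1, ..., Xn], matched in order.
    Every gap (before X1, between parts, after Xn) may span 0..max_gap residues."""
    def search(idx, pos):
        # pos is the index just after the previously matched part.
        if idx == len(pattern):
            return len(sequence) - pos <= max_gap  # trailing gap
        part = pattern[idx]
        last_start = min(pos + max_gap, len(sequence) - len(part))
        for start in range(pos, last_start + 1):
            if sequence.startswith(part, start) and search(idx + 1, start + len(part)):
                return True
        return False
    return search(0, 0)

# Toy sequence, invented for illustration.
print(matches(["KT", "VA"], "MKTLLVA", max_gap=3))
print(matches(["KT", "VA"], "MKTLLVA", max_gap=1))
```

A rule of the form *X1*X2*…*Xn* → extracellular would then predict "extracellular" for any sequence on which `matches` returns True.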

12 FOIL algorithm
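
The body of this slide is not preserved in the transcript. For orientation, the gain measure commonly associated with FOIL-style rule learning can be sketched as below; this is the textbook formula, not necessarily the exact variant the authors used, and the counts in the example are invented.

```python
import math

def foil_gain(p0, n0, p1, n1):
    """Textbook FOIL gain for extending a rule.
    p0, n0: positives/negatives covered before the extension.
    p1, n1: positives/negatives covered after the extension."""
    if p0 == 0 or p1 == 0:
        return 0.0
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return p1 * (after - before)

# Invented counts: the extension keeps 5 of 10 positives and drops all negatives.
print(foil_gain(p0=10, n0=10, p1=5, n1=0))
```

An extension that leaves the positive fraction unchanged has zero gain, so a minimum-gain threshold filters out extensions that do not make the rule more precise.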

13 Z-number (formula not recoverable; its terms are the accuracy of rule R and the support of rule R)

14

15 Experiments. Dataset (PASub project at UofA). Plant: 3293 proteins, 171 extracellular. Five-fold cross-validation.
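
A five-fold split over a dataset of this size could be produced as in the sketch below. The shuffling, the seed, and the function name are assumptions; the transcript does not say how the folds were drawn or whether they were stratified by class.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

splits = list(k_fold_indices(3293, k=5))
print([len(test) for _, test in splits])
```

With only 171 positives among 3293 proteins, a stratified split that keeps the class ratio in every fold may be preferable, but that is a design choice beyond what the slide states.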

16 Evaluation Matrix. Overall accuracy is not good enough. F-measure.
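
To see why overall accuracy is not good enough with this class imbalance, consider a classifier that labels every protein as non-extracellular. The sketch below uses the dataset sizes from the Experiments slide (3293 proteins, 171 extracellular) and assumes the balanced F-measure (F1), since the transcript does not give a weighting.

```python
def evaluate(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, F1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# A classifier that never predicts "extracellular":
# all 171 positives are missed, all 3122 negatives are correct.
accuracy, precision, recall, f_measure = evaluate(tp=0, fp=0, fn=171, tn=3122)
print(round(accuracy, 3), f_measure)
```

The accuracy is close to 95% even though no extracellular protein is found, while the F-measure is zero.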

17 Result (SVM with subsequence)

18 Result (Boosting with subsequence)

19 Result (Frequent Pattern). MinLen=3, Min_gain=0.1, MinSup=5%, MinConf=80%, MaxGap=300.

20 Result (SVM with composition)

21 Result (Boosting with composition)

22 Cross Comparison

23 SVM with combined features

24 Boosting with combined features

25 Effects of MinLen on SVM

26 Effects of MinLen on boosting

27 Conclusion. Presented three methods for identifying extracellular proteins based on frequent subsequences of amino acids. SVM achieves the best result. The FSP method provides easily interpretable rules.

28 Future Work. Use more information about proteins (e.g., structure, function, …). Integrate amino acid composition into the FSP method. Incorporate more biological knowledge.