Training and applying hidden Markov models and support vector machines for prediction of T-cell epitopes Van Hai Van, Cao Thi Ngoc Phuong, Tran Linh Thuoc.

Faculty of Biology, University of Natural Sciences, VNU-HCMC, Vietnam. Sixth International Conference on Bioinformatics (InCoB2007).

Epitope prediction
"An epitope is the portion of an antigen that is recognized by the antigen receptor on lymphocytes." (Molecular Biology)
Epitope prediction: computational methods help develop epitope-based vaccines against human pathogens for which no vaccine currently exists.

T-cell epitope prediction
T-cell epitopes are a subset of MHC-binding peptides, so predicting which peptides bind to MHC molecules is essential for designing peptide-based vaccines.
Approaches for predicting MHC binding (e.g., to HLA-A0201) from sequence: binding motifs, quantitative matrices, decision trees, artificial neural networks, hidden Markov models, support vector machines.

HMMs and SVMs
HMMs (hidden Markov models): statistical models that can capture complex relationships in data sets.
SVMs (support vector machines): learning machines that find the optimal (maximum-margin) hyperplane separating two classes.
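To make the HMM description concrete, here is a minimal sketch of the forward algorithm, which computes the probability that an HMM generates a given sequence by summing over all hidden-state paths. It is a plain two-state HMM with hypothetical states and probabilities over a reduced two-letter alphabet, not the profile-HMM architecture that HMMER builds (which has match, insert, and delete states for each alignment column).

```python
from typing import Dict, Sequence


def forward_probability(
    obs: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> float:
    """Return P(obs | model) by summing over all hidden-state paths."""
    # alpha[s] = P(first t+1 symbols emitted, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        # Sum over every predecessor state r, then emit the next symbol from s
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


if __name__ == "__main__":
    # Hypothetical model: "H" = hydrophobic residue, "P" = polar residue
    states = ["anchor", "other"]
    start_p = {"anchor": 0.5, "other": 0.5}
    trans_p = {"anchor": {"anchor": 0.2, "other": 0.8},
               "other": {"anchor": 0.4, "other": 0.6}}
    emit_p = {"anchor": {"H": 0.9, "P": 0.1},
              "other": {"H": 0.3, "P": 0.7}}
    print(forward_probability("HPH", states, start_p, trans_p, emit_p))
```

Because the probabilities over all sequences of a fixed length sum to 1, the forward probability can be compared across candidate peptides; HMMER reports log-odds scores and E-values derived from such probabilities rather than the raw value.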

Epitope prediction for dengue virus
Dengue is a tropical disease with three clinical forms: dengue fever, dengue hemorrhagic fever, and dengue shock syndrome.
Hypotheses of pathogenesis: antibody-dependent enhancement; virus virulence.
No dengue vaccine is available.
In our research we aim to:
1. Develop a procedure for automatically building T-cell epitope prediction models.
2. Find in silico candidates for a multivalent vaccine covering all 4 types of dengue virus.

Building models for predicting T-cell epitopes and applying them to dengue virus

Building effective prediction models
The predictive ability of HMM and SVM models depends on (1) the experimentally determined peptides that bind to MHC molecules, (2) how those peptides are partitioned into training and testing sets, and (3) the encoding method.
Goal: a system that easily and quickly finds the best prediction model whenever the type of MHC molecule or the number of binding peptides changes.

Processing experimentally determined MHC-binding peptides

Create training and testing sets

Training and testing procedure: HMMs (HMMER); SVMs (SVM_light)
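SVM_light reads examples from plain-text files with one example per line: a target value (+1 or -1) followed by index:value pairs, with feature indices starting at 1 in increasing order. The sketch below, using hypothetical helper names, converts an already-encoded peptide into that format and omits zero-valued features, which the sparse format allows. It assumes binders are labeled +1 and non-binders -1.

```python
from typing import Iterable, List, Tuple


def to_svmlight_line(label: int, features: List[float]) -> str:
    """Format one example as '<target> <index>:<value> ...' for SVM_light."""
    if label not in (1, -1):
        raise ValueError("label must be +1 (binder) or -1 (non-binder)")
    parts = [f"{label:+d}"]  # renders as "+1" or "-1"
    # Indices start at 1 and increase; zero features are skipped (sparse format)
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append(f"{index}:{value:g}")
    return " ".join(parts)


def write_svmlight(path: str, examples: Iterable[Tuple[int, List[float]]]) -> None:
    """Write (label, feature_vector) pairs to an SVM_light example file."""
    with open(path, "w") as handle:
        for label, features in examples:
            handle.write(to_svmlight_line(label, features) + "\n")
```

Files written this way can be passed to SVM_light's svm_learn and svm_classify programs; the slides do not give the exact command lines used in the study.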

Experiment 1
Methods: HMMs and SVMs
Databases: MHCBN, MHCPEP
Homology criterion: 7 amino acids
No. of homologous groups: binding sequences 11, non-binding sequences 3
Kinds of peptide: binding and non-binding (both methods)
No. of peptides in training and testing sets: (counts not given)
Number of training runs: 200
Parameters: HMMs: E-value = 0 to 10; SVMs: linear kernel, c = 0
SVM encodings: binary, BLOSUM-62, physical-chemical
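The slides do not define how peptides were assigned to homologous groups. A minimal sketch, assuming two peptides are homologous when they share an identical stretch of k amino acids (k = 7 here; 6 or 5 in Experiment 2) and that grouping is transitive, is shown below. Whole groups are then assigned to either the training or the testing set, so near-identical peptides cannot appear on both sides. The function names, the random shuffle, and the 20% test fraction are hypothetical choices.

```python
import random
from typing import Dict, List, Sequence, Tuple


def homology_groups(peptides: Sequence[str], k: int) -> List[List[str]]:
    """Group peptides sharing at least one identical k-residue substring.

    Grouping is transitive (connected components), implemented with union-find.
    """
    parent = list(range(len(peptides)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    first_seen: Dict[str, int] = {}  # k-mer -> index of first peptide containing it
    for idx, pep in enumerate(peptides):
        for start in range(len(pep) - k + 1):
            kmer = pep[start:start + k]
            if kmer in first_seen:
                parent[find(idx)] = find(first_seen[kmer])  # merge the two groups
            else:
                first_seen[kmer] = idx

    groups: Dict[int, List[str]] = {}
    for idx, pep in enumerate(peptides):
        groups.setdefault(find(idx), []).append(pep)
    return list(groups.values())


def split_by_group(groups: List[List[str]], test_fraction: float = 0.2,
                   seed: int = 0) -> Tuple[List[str], List[str]]:
    """Assign whole groups to train or test so no homologous pair is split."""
    rng = random.Random(seed)
    shuffled = groups[:]
    rng.shuffle(shuffled)
    total = sum(len(g) for g in shuffled)
    train: List[str] = []
    test: List[str] = []
    for group in shuffled:
        # Fill the test set until it reaches the requested fraction
        (test if len(test) < test_fraction * total else train).extend(group)
    return train, test
```

Splitting by group rather than by peptide gives a more honest estimate of performance on unseen peptides, which is the motivation the conclusion gives for homologous grouping.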

Result of the training by HMMs
Profile HMM.7.136: AUC (area under the ROC curve) = 0.914
Parameters chosen from HMM.7.136 at the operating point E = 3.4, S = -8.5: SE (sensitivity) = 0.91, SP (specificity) = 0.86, AUC = 0.885
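The reported sensitivity, specificity, and AUC can be computed from model scores as sketched below. This assumes a higher score means "more likely to bind"; for HMMER E-values, where smaller is better, negate the E-values (or flip the comparison) first. AUC is computed with the pairwise (Mann-Whitney) formulation, counting ties as one half, which equals the area under the ROC curve.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(scores: Sequence[float], labels: Sequence[int],
                            threshold: float) -> Tuple[float, float]:
    """SE = TP/(TP+FN), SP = TN/(TN+FP); score >= threshold is called a binder."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(random binder outscores random non-binder), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Choosing an operating point, as the slides do with E = 3.4, amounts to picking the threshold whose (SE, SP) pair best suits the application; the AUC summarizes performance over all thresholds.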

Result of the training by SVMs
AUC by encoding: binary 0.42 to 0.77; BLOSUM-62 0.47 to 0.87; chemical-physical 0.41 to 0.71.
With BLOSUM-62 encoding, data set SVM.7.blo62.46: SE = 0.83, SP = 0.90, AUC = 0.87.

Experiment 2
Methods: HMMs and SVMs
Databases: MHCBN, MHCPEP, IEDB
Homology criteria: 7, 6, and 5 amino acids
Number of training runs: (not given)
Parameters: HMMs: E-value = 40 to 80; SVMs: linear kernel, c = 0
SVM encodings: binary, BLOSUM-62, combined binary-BLOSUM-62
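The three SVM encodings can be sketched as follows for peptides over the 20 standard amino acids. Binary encoding uses 20 indicator features per position; substitution-matrix encoding replaces each residue with its row of a matrix such as BLOSUM-62; the combined encoding here simply concatenates the two vectors. The exact feature layout used in the study is not given, so the residue order, the concatenation, and the function names are assumptions. The matrix is passed in as a nested dictionary that could be built from a published BLOSUM-62 table (for example, the one distributed with Biopython's Bio.Align.substitution_matrices module); the physical-chemical encoding from Experiment 1 is not sketched because the slides do not list the properties used.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues in a fixed order


def binary_encoding(peptide: str) -> List[float]:
    """One-hot encoding: 20 features per position, 1.0 for the observed residue.

    Non-standard residues (e.g. 'X') yield an all-zero block.
    """
    vec: List[float] = []
    for aa in peptide:
        vec.extend(1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS)
    return vec


def matrix_encoding(peptide: str, matrix: Dict[str, Dict[str, float]]) -> List[float]:
    """Replace each residue by its substitution-matrix row (20 features per position)."""
    vec: List[float] = []
    for aa in peptide:
        vec.extend(float(matrix[aa][ref]) for ref in AMINO_ACIDS)
    return vec


def combined_encoding(peptide: str, matrix: Dict[str, Dict[str, float]]) -> List[float]:
    """Binary block followed by the matrix block (2 x 20 features per position)."""
    return binary_encoding(peptide) + matrix_encoding(peptide, matrix)
```

For a 9-mer this gives 180 binary features, 180 matrix features, or 360 combined features. The matrix encoding lets the linear SVM treat similar residues (e.g. I and V) as partially interchangeable, which one-hot encoding cannot express.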

Result of the training by HMMs
Homology criteria compared: 5, 6, and 7 amino acids (binding peptides).
Rows: No. of homologous groups; No. of sequences in homologous groups; Total peptides; Training set; Testing set (values not recoverable).
AUC: from 0.832 (first column) to 0.876 (last column); intermediate values not recoverable.
Best HMM profile: HMM.6.78

Training in 6-amino-acid homologous groups
HMM.6.78: AUC = 0.883
Parameters of HMM.6.78 at the operating point E = 42, S = -9.2: SE = 0.91, SP = 0.84, AUC = 0.875

Result of the training by SVMs
Homology criteria compared: 5, 6, and 7 amino acids, each with binding and non-binding peptides.
Rows: Total homologous groups; Sequences in homologous groups; Total sequences; Training set; Testing set (values not recoverable).
AUC, first to last column (intermediate values not recoverable):
(1) Binary encoding: 0.847 to 0.882
(2) BLOSUM-62 encoding: 0.843 to 0.894
(3) Binary-BLOSUM-62 encoding: 0.849 to 0.891
Chosen set: SVM.blo

Training in 7-amino-acid homologous groups
Selected SVM: SE = 0.93, SP = 0.86, AUC = 0.894
[Figure: ROC curves for binary, BLOSUM-62, and binary-BLOSUM-62 encodings]

Epitope prediction procedure for dengue virus
1. Perform a multiple sequence alignment.
2. Extract consensus sequences at least 9 amino acids long.
3. Create overlapping 9-mer sequences.
4. Predict which peptides bind to MHC with the HMM profile or the SVM model.
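Steps 2 and 3 can be sketched as follows, assuming the alignment from step 1 is already available as equal-length gapped strings (one per dengue virus type) and that a consensus column is one where every sequence carries the same non-gap residue. That definition and the helper names are assumptions; the slides do not specify them.

```python
from typing import List, Optional, Sequence, Tuple


def conserved_segments(alignment: Sequence[str], min_len: int = 9) -> List[Tuple[int, str]]:
    """Return (column, segment) for runs of fully conserved, gap-free columns
    that are at least min_len residues long."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    segments: List[Tuple[int, str]] = []
    start: Optional[int] = None
    # Iterate one column past the end so a run reaching the end is closed
    for col in range(length + 1):
        column = {seq[col] for seq in alignment} if col < length else set()
        conserved = len(column) == 1 and "-" not in column
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_len:
                segments.append((start, alignment[0][start:col]))
            start = None
    return segments


def overlapping_kmers(segment: str, k: int = 9) -> List[Tuple[int, str]]:
    """All overlapping k-mers of a segment, with their offsets (step size 1)."""
    return [(i, segment[i:i + k]) for i in range(len(segment) - k + 1)]
```

Each 9-mer produced this way is then scored in step 4; a conserved segment of length n yields n - 8 candidate 9-mers.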

Result of epitope prediction (prediction of peptides binding to HLA-A0201): overlapping 9-amino-acid peptides predicted to bind HLA-A0201 molecules were joined.

Experiment 1
Proteins (1, 2, 3, 4) | Epitope sequence | Methods
537NS3, 536NS3, 2010DV3_gp1, 536NS3 | LMRRGDLPVWL | HMMs, SVMs
763NS5, 764NS5, 515NS5, 765NS5 | LMYFHRRDLRL | HMMs, SVMs
358NS3, 357NS3, 2HELICc, 357NS3 | KTVWFVPSI | SVMs
658NS5, 659NS5, 410NS5, 660NS5 | AISGDDCVV | SVMs
472NS5, 473NS5, 223NS5, 473NS5 | AIWYMWLGA | SVMs
101E, 99E, 99glycoprot, 99E | RGWGNGCGL | SVMs
194NS1, 194NS1, 193NS1, 194NS1 | VHADMGYWI | SVMs
352NS5, 353NS5, 103NS5, 353NS5 | RVFKEKVDT | SVMs
13NS1, 13NS1, 12NS1, 13NS1 | LKCGSGIFV | SVMs
26NS1, 26NS1, 25NS1, 26NS1 | HTWTEQYKF | SVMs
230NS1, 230NS1, 229NS1, 230NS1 | TLWSNGVLES | SVMs
327NS1, 327NS1, 326NS1, 327NS1 | DGCWYGMEIRP | SVMs
148NS3, 148NS3, 142Pep_S7, 148NS3 | GLYGNGVVT | SVMs
256NS3, 255NS3, 67DEXHc, 255NS3 | EIVDLMCHA | SVMs
297NS3, 296NS3, 108DEXHc, 296NS3 | ARGYISTRV | SVMs
410NS3, 409NS3, 54HELICc, 409NS3 | DISEMGANF | SVMs
36NS4B, 35NS4B, 35NS4B, 32NS4B | ASAWTLYAV | SVMs
118NS4B, 117NS4B, 117NS4B, 114NS4B | HYAIIGPGLQA | SVMs
142NS4B, 141NS4B, 141NS4B, 138NS4B | IMKNPTVDGI | SVMs
224NS4B, 223NS4B, 223NS4B, 220NS4B | NIFRGSYLAGA | SVMs
81NS5, 81NS5, 27FtsJ, 81NS5 | GCGRGGWSY | SVMs
529NS5, 530NS5, 280NS5, 530NS5 | MYADDTAGW | SVMs
602NS5, 603NS5, 353NS5, 603NS5 | QVGTYGLNT | SVMs
606NS5, 607NS5, 357NS5, 607NS5 | YGLNTFTNM | SVMs
682NS5, 683NS5, 434NS5, 684NS5 | DMGKVRKDI | SVMs
745NS5, 746NS5, 497NS5, 747NS5 | WSLRETACLG | SVMs
788NS5, 789NS5, 540NS5, 790NS5 | PTSRTTWSI | SVMs

Experiment 2
Proteins (1, 2, 3, 4) | Epitope sequence | Methods
537NS3, 536NS3, 2010DV3_gp1, 536NS5 | LMRRGDLPV | HMMs
763NS5, 764NS5, 515NS5, 765NS5 | LMYFHRRDLRL | HMMs
358NS3, 357NS3, 2HELICc, 357NS3 | KTVWFVPSI | HMMs
658NS5, 659NS5, 410NS5, 660NS5 | AISGDDCVV | HMMs
469NS5, 470NS5, 220NS5, 470NS5 | GSRAIWYMWLGAR | HMMs
103E, 101E, 101DV3_gp1, 101E | WGNGCGLFG | SVMs
193NS1, 193NS1, 192NS1, 193NS1 | AVHADMGYWIES | SVMs
348NS5, 349NS5, 99NS5, 349NS5 | FGQQRVFKE | SVMs
568NS5, 569NS5, 319NS5, 569NS5 | FKLTYQNKV | HMMs
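Joining overlapping 9-mers predicted to bind HLA-A0201 is why several listed epitopes are 10 to 13 residues long. A minimal sketch of that joining step follows, assuming the predicted binders are given as start offsets within one consensus segment and that windows are merged only when they actually overlap (adjacent, non-overlapping windows stay separate). The helper name and the example segment are hypothetical.

```python
from typing import List, Sequence, Tuple


def join_overlapping_hits(segment: str, hit_offsets: Sequence[int],
                          k: int = 9) -> List[Tuple[int, str]]:
    """Merge predicted-binding k-mers whose windows overlap into longer peptides."""
    merged: List[Tuple[int, int]] = []  # half-open (start, end) intervals
    for start in sorted(set(hit_offsets)):
        end = start + k
        if merged and start < merged[-1][1]:
            # This window overlaps the previous merged region: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return [(s, segment[s:e]) for s, e in merged]


if __name__ == "__main__":
    seg = "LMRRGDLPVWL" + "Q" * 10  # hypothetical consensus segment
    print(join_overlapping_hits(seg, [0, 1, 2, 12]))
```

Here three consecutive 9-mer hits at offsets 0, 1, and 2 collapse into one 11-residue peptide, while the hit at offset 12 does not overlap them and is reported separately.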

Result of prediction
The HMM profile is stable, and its predictive ability increases when additional data sets are added. The SVM model performs well, but its predictive ability decreases as the amount of training data increases.

Conclusion
We successfully built a system for training hidden Markov models and support vector machines.
Generating training and testing data by separating the data set into homologous groups gave good results.
We could predict consensus epitopes for all 4 types of dengue virus from data on peptides binding to HLA-A0201.

Future plans
Try other kernels for the SVM method.
Survey other encoding methods for sequences of variable length.
Survey other methods for classifying MHC data into homologous groups.
Automate the collection and updating of MHC-binding peptide data from databases.

Thank you very much!