Prediction of T cell epitopes using artificial neural networks Morten Nielsen, CBS, BioCentrum, DTU.

Slides:



Advertisements
Similar presentations
Making Peptides for Presentation A Pictorial Introduction SAMSI 3 March 2005.
Advertisements

Major Histocompatibility Complex. Principles of Immune Response Highly specific recognition of foreign antigens Mechanisms for elimination of microbes.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Project in Immunological Bioinformatics Morten Nielsen, CBS, BioCentrum, DTU.
NEURAL NETWORKS Backpropagation Algorithm
T-cell epitope prediction by molecular dynamics simulations Irini Doytchinova Medical University of Sofia School of Pharmacy Medical University of Sofia.
Bioinformatical design of a vaccine against influenza virus N1 subtype Bonaccorsi, Irene; Clausen, Martin Bau; Høj, Leif Howalt; Kjær, Jesper and Sayyad,
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Sequence information, logos and Hidden Markov Models Morten Nielsen, CBS, BioCentrum,
Understanding the immune system A Challenge or impossible dream.
HeleneAndersenMayaBondeAndersenSimonCarlsen MortenAhlgreenGronemann&MadsChristianHjortsø Figure 4, Alignment of the NS3 protein: Alignment of NS3 performed.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU T cell Epitope predictions using bioinformatics (Neural Networks and hidden.
Computer Aided Vaccine Design Dr G P S Raghava. Concept of Drug and Vaccine Concept of Drug Concept of Drug –Kill invaders of foreign pathogens –Inhibit.
MHC Polymorphism Ole Lund. Objectives What is HLA polymorphism? What is it good for? How does it make life difficult for vaccine design? Definition of.
Artificial Neural Networks 2 Morten Nielsen BioSys, DTU.
Bioinformatics Finding signals and motifs in DNA and proteins Expectation Maximization Algorithm MEME The Gibbs sampler Lecture 10.
Algorithms in Bioinformatics Morten Nielsen BioSys, DTU.
Biological sequence analysis and information processing by artificial neural networks.
Protein Fold recognition Morten Nielsen, CBS, BioSys, DTU.
Class I pathway Prediction of proteasomal cleavage and TAP binding.
Artificial Neural Networks 2 Morten Nielsen Depertment of Systems Biology, DTU.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Department of Systems Biology Technical University of Denmark Immunological Bioinformatics Processing, combined.
Biological sequence analysis and information processing by artificial neural networks Morten Nielsen CBS.
Protein structure and homology modeling Morten Nielsen, CBS, BioCentrum, DTU.
MHC Polymorphism. MHC Class I pathway Figure by Eric A.J. Reits.
Protein Fold recognition Morten Nielsen, CBS, Department of Systems Biology, DTU.
Class I pathway Prediction of proteasomal cleavage and TAP binidng Morten Nielsen, CBS, BioCentrum, DTU.
Neural Networks Marco Loog.
Class I pathway Prediction of proteasomal cleavage and TAP binidng Can Keşmir, TBB, Utrecht University, NL & CBS, BioCentrum, DTU.
Protein Fold recognition Morten Nielsen, Thomas Nordahl CBS, BioCentrum, DTU.
Artificial Neural Networks Thomas Nordahl Petersen & Morten Nielsen.
Selection of T Cell Epitopes Using an Integrative Approach Mette Voldby Larsen cand. scient. in biology ph.d. student.
CISC667, F05, Lec20, Liao1 CISC 467/667 Intro to Bioinformatics (Fall 2005) Protein Structure Prediction Protein Secondary Structure.
Biological sequence analysis and information processing by artificial neural networks.
Project list 1.Peptide MHC binding predictions using position specific scoring matrices including pseudo counts and sequences weighting clustering (Hobohm)
HLA MHCs are the gatekeepers of the immune system. 1.) LOCATE: Present peptides that may be viral. 2.) ACTIVATE: Activate immune defense mechanisms.
Epitope Selection Rational Vaccine design. Why? Therapeutic vaccines Therapeutic vaccines Treatment of viral infections (e.g., HIV, HCV), and resistant.
Blast heuristics Morten Nielsen Department of Systems Biology, DTU.
Prediction of CTL responses Mette Voldby Larsen cand. scient. in biology ph.d. student.
Protein Structures.
Bioiformatics I Fall Dynamic programming algorithm: pairwise comparisons.
BIOMETRICS Module Code: CA641 Week 11- Pairwise Sequence Alignment.
Lecture 11, CS5671 Secondary Structure Prediction Progressive improvement –Chou-Fasman rules –Qian-Sejnowski –Burkhard-Rost PHD –Riis-Krogh Chou-Fasman.
Algorithms in Bioinformatics Morten Nielsen Department of Systems Biology, DTU.
Methods MHC class-I T cell epitope prediction for Nef Consensus and ancestral sequences of the Nef protein for the different HIV-1 subtypes were obtained.
Evolving a Sigma-Pi Network as a Network Simulator by Justin Basilico.
Selection of T Cell Epitopes Using an Integrative Approach Mette Voldby Larsen cand. scient. in Biology PhD in Immunological Bioinformatics.
Artificial Neural Networks
The Major Histocompatibility Complex (MHC) In all vertebrates there is a genetic region that has a major influence on graft survival This region is referred.
Scoring Matrices Scoring matrices, PSSMs, and HMMs BIO520 BioinformaticsJim Lund Reading: Ch 6.1.
1 Computer-aided Subunit Vaccine Design G.P.S. Raghava, Institute of Microbial Technology, Chandigarh  Understanding immune system  Breaking complex.
Neural Networks for Protein Structure Prediction Brown, JMB 1999 CS 466 Saurabh Sinha.
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Artificial Neural Networks. The Brain How do brains work? How do human brains differ from that of other animals? Can we base models of artificial intelligence.
Motif finding with Gibbs sampling CS 466 Saurabh Sinha.
Project list 1.Peptide MHC binding predictions using position specific scoring matrices including pseudo counts and sequences weighting clustering (Hobohm)
Construction of Substitution Matrices
Telling self from non-self: Learning the language of the Immune System Rose Hoberman and Roni Rosenfeld BioLM Workshop May 2003.
Artificiel Neural Networks 2 Morten Nielsen Department of Systems Biology, DTU IIB-INTECH, UNSAM, Argentina.
What is a Project Purpose –Use a method introduced in the course to describe some biological problem How –Construct a data set describing the problem –Define.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Introduction H5N1 is an avian influenza. It was detected in humans for the first time in 1997 in Hong Kong. Since then the spread to humans has been limited.
Artificiel Neural Networks 2 Morten Nielsen Department of Systems Biology, DTU.
Construction of Substitution matrices
Bioinformatics in Vaccine Design
Lecture 13 Immunology and disease: parasite antigenic diversity.
Immunology and disease: parasite antigenic diversity
Ligand Docking to MHC Class I Molecules
MHCflurry: Open-Source Class I MHC Binding Affinity Prediction
Artificial Neural Networks Thomas Nordahl Petersen & Morten Nielsen
Artificial Neural Networks Thomas Nordahl Petersen & Morten Nielsen
Algorithms in Bioinformatics
Presentation transcript:

Prediction of T cell epitopes using artificial neural networks Morten Nielsen, CBS, BioCentrum, DTU

Objectives How to train a neural network to predict peptide MHC class I binding Understand why NN’s perform the best –Higher order sequence information The wisdom of the crowd! –Why enlightened despotism does not work even for Neural networks

Outline MHC class I epitopes –Why MHC binding? How to predict MHC binding? –Information content –Weight matrices –Neural networks Neural network theory –Sequence encoding Examples

Encounter with death

MHC binding Processing of intracellular proteins

Lauemøller et al., % processed0.5% bind MHC50% CTL response => 1/2000 peptide are immunogenic From proteins to immunogens

Anchor positions MHC class I with peptide

What makes a peptide a potential and effective epitope? Part of a pathogen protein Successful processing –Proteasome cleavage –TAP binding Binds to MHC molecule Protein function –Early in replication Sequence conservation in evolution

Prediction of HLA binding specificity Simple Motifs –Allowed/non allowed amino acids Extended motifs –Amino acid preferences (SYFPEITHI)SYFPEITHI) –Anchor/Preferred/other amino acids Hidden Markov models –Peptide statistics from sequence alignment (previous talk) Neural networks –Can take sequence correlations into account

SYFPEITHI predictions Extended motifs based on peptides from the literature and peptides eluted from cells expressing specific HLAs ( i.e., binding peptides) Scoring scheme is not readily accessible. Positions defined as anchor or auxiliary anchor positions are weighted differently (higher) The final score is the sum of the scores at each position Predictions can be made for several HLA-A, -B and -DRB1 alleles, as well as some mice K, D and L alleles.

BIMAS Matrix made from peptides with a measured T 1/2 for the MHC-peptide complex The matrices are available on the website The final score is the product of the scores of each position in the matrix multiplied with a constant, different for each MHC, to give a prediction of the T 1/2 Predictions can be obtained for several HLA-A, -B and -C alleles, mice K,D and L alleles, and a single cattle MHC.

How to predict The effect on the binding affinity of having a given amino acid at one position can be influenced by the amino acids at other positions in the peptide (sequence correlations). –Two adjacent amino acids may for example compete for the space in a pocket in the MHC molecule. Artificial neural networks (ANN) are ideally suited to take such correlations into account

Higher order sequence correlations Neural networks can learn higher order correlations! –What does this mean? S S => 0 L S => 1 S L => 1 L L => 0 No linear function can learn this (XOR) pattern Say that the peptide needs one and only one large amino acid in the positions P3 and P4 to fill the binding cleft How would you formulate this to test if a peptide can bind?

Learning higher order correlation 0 0 => 0; 1 0 => => 0; 0 1 => 1 Step function Threshold t Check it out. It does actually work!

Neural network learning higher order correlations

How is mutual information calculated? Information content was calculated as Gives information in a single position Similar relation for mutual information Gives mutual information between two positions Mutual information

Mutual information. Example ALWGFFPVA ILKEPVHGV ILGFVFTLT LLFGYPVYV GLSPTVWLS YMNGTMSQV GILGFVFTL WLSLLVPFV FLPSDFFPS P1 P6 P(G 1 ) = 2/9 = 0.22,.. P(V 6 ) = 4/9 = 0.44,.. P(G 1,V 6 ) = 2/9 = 0.22, P(G 1 )*P(V 6 ) = 8/81 = 0.10 log(0.22/0.10) > 0 Knowing that you have G at P 1 allows you to make an educated guess on what you will find at P 6. P(V 6 ) = 4/9. P(V 6 |G 1 ) = 1.0!

313 binding peptides313 random peptides Mutual information

Neural network training Sequence encoding –Sparse –Blosum –Hidden Markov model Network ensembles –Cross validated training –Benefit from ensembles

Sequence encoding How to represent a peptide amino acid sequence to the neural network? Sparse encoding (all amino acids are equally disalike) Blosum encoding (encodes similarities between the different amino acids) Hidden Markov model (encodes the position specific amino acid preference of the HLA binding motif)

Sequence encoding (continued) Sparse encoding V: L: V. L=0 (unrelated) Blosum encoding V: L: V. L = 0.88 (highly related) V. R = (close to unrelated)

Weight matrices A R N D C Q E G H I L K M F P S T W Y V NLTISDVSV ….. 4.5

Evaluation of prediction accuracy

A Network contains a very large set of parameters –A network with 5 hidden neurons predicting binding for 9meric peptides has 9x20x5=900 weights Over fitting is a problem Stop training when test performance is optimal Neural network training

Neural network training. Cross validation Cross validation Train on 4/5 of data Test on 1/5 => Produce 5 different neural networks each with a different prediction focus

Neural network training curve Maximum test set performance Most cable of generalizing

Network ensembles

The Wisdom of the Crowds The Wisdom of Crowds. Why the Many are Smarter than the Few. James Surowiecki One day in the fall of 1906, the British scientist Fracis Galton left his home and headed for a country fair… He believed that only a very few people had the characteristics necessary to keep societies healthy. He had devoted much of his career to measuring those characteristics, in fact, in order to prove that the vast majority of people did not have them. … Galton came across a weight-judging competition…Eight hundred people tried their luck. They were a diverse lot, butchers, farmers, clerks and many other no-experts…The crowd had guessed … pounds, the ox weighted 1.198

Network ensembles No one single network with a particular architecture and sequence encoding scheme, will constantly perform the best Also for Neural network predictions will enlightened despotism fail –For some peptides, BLOSUM encoding with a four neuron hidden layer can best predict the peptide/MHC binding, for other peptides a sparse encoded network with zero hidden neurons performs the best –Wisdom of the Crowd Never use just one neural network Use Network ensembles

Evaluation of prediction accuracy ENS : Ensemble of neural networks trained using sparse, Blosum, and hidden Markov model sequence encoding

T cell epitope identification Lauemøller et al., reviews in immunogenetics 2001

NetMHC Output

Examples. Hepatitis C virus. Epitope predictions Hotspots

SARS T cell epitope identification Peptides tested: 15/15 (100 %) Binders (K D < 500 nM): 14/15 (93%)

More SARS CTL epitopes 11/15 14/15 10/15 13/15 12/14 A0301A1101B0702 A0201B5801 B1501 A2 supertype: Molecule used: rA0201/ human  2 m 12/15

Vaccine design. Polytope optimization Successful immunization can be obtained only if the epitopes encoded by the polytope are correctly processed and presented. Cleavage by the proteasome in the cytosol, translocation into the ER by the TAP complex, as well as binding to MHC class I should be taken into account in an integrative manner. The design of a polytope can be done in an effective way by modifying the sequential order of the different epitopes, and by inserting specific amino acids that will favor optimal cleavage and transport by the TAP complex, as linkers between the epitopes.

Vaccine design. Polytope construction NH2 COOH Epitope Linker M C-terminal cleavage Cleavage within epitopes New epitopes cleavage

Polytope starting configuration Immunological Bioinformatics, The MIT press.

Polytope optimization Algorithm Optimization of four measures: 1.The number of poor C-terminal cleavage sites of epitopes (predicted cleavage < 0.9) 2.The number of internal cleavage sites (within epitope cleavages with a prediction larger than the predicted C- terminal cleavage) 3.The number of new epitopes (number of processed and presented epitopes in the fusing regions spanning the epitopes) 4.The length of the linker region inserted between epitopes. The optimization seeks to minimize the above four terms by use of Monte Carlo Metropolis simulations [Metropolis et al., 1953]

Polytope optimal configuration Immunological Bioinformatics, The MIT press.

Take home message Binding of peptides to MHC is guided by higher order sequence correlations Neural networks predict the binding Always use ensembles of prediction methods No method is consistently best Is does work For the SARS virus close to 80% of the predicted peptides did bind MHC.