Presentation on theme: "Prediction of T cell epitopes using artificial neural networks"— Presentation transcript:
1 Prediction of T cell epitopes using artificial neural networks
Morten Nielsen, CBS, BioCentrum, DTU
2 Objectives
How to train a neural network to predict peptide MHC class I binding
Understand why NNs perform the best
Higher order sequence information
The wisdom of the crowd!
Why enlightened despotism does not work, even for neural networks
3 Outline
MHC class I epitopes
Why MHC binding?
How to predict MHC binding?
Information content
Weight matrices
Neural networks
Neural network theory
Sequence encoding
Examples
4 Prediction of HLA binding specificity
Simple motifs: allowed/non-allowed amino acids
Extended motifs: amino acid preferences (SYFPEITHI); anchor/preferred/other amino acids
Hidden Markov models: peptide statistics from sequence alignment (previous talk)
Neural networks: can take sequence correlations into account
5 SYFPEITHI predictions
Extended motifs based on peptides from the literature and peptides eluted from cells expressing specific HLAs (i.e., binding peptides)
The scoring scheme is not readily accessible
Positions defined as anchor or auxiliary anchor positions are weighted more heavily
The final score is the sum of the scores at each position
Predictions can be made for several HLA-A, -B and -DRB1 alleles, as well as some mouse K, D and L alleles
6 BIMAS
Matrix made from peptides with a measured T1/2 for the MHC-peptide complex
The matrices are available on the website
The final score is the product of the scores at each position in the matrix, multiplied by an MHC-specific constant, to give a prediction of the T1/2
Predictions can be obtained for several HLA-A, -B and -C alleles, mouse K, D and L alleles, and a single cattle MHC
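The two matrix-based scoring schemes above can be sketched in a few lines. The matrix values below are invented for illustration over a tiny three-position "peptide"; real SYFPEITHI and BIMAS matrices are the ones published on their respective websites.

```python
# Hypothetical per-position scores over a tiny alphabet (illustration only).
matrix = [
    {"A": 1.0, "L": 2.5, "G": 0.5},   # position 1
    {"A": 0.5, "L": 1.0, "G": 3.0},   # position 2
    {"A": 2.0, "L": 0.5, "G": 1.0},   # position 3
]

def score_sum(peptide, matrix):
    """SYFPEITHI-style score: the sum of the per-position scores."""
    return sum(col[aa] for aa, col in zip(peptide, matrix))

def score_product(peptide, matrix, constant=1.0):
    """BIMAS-style score: the product of the per-position scores,
    multiplied by an MHC-specific constant, interpreted as a T1/2."""
    score = constant
    for aa, col in zip(peptide, matrix):
        score *= col[aa]
    return score

print(score_sum("LGA", matrix))      # 2.5 + 3.0 + 2.0 = 7.5
print(score_product("LGA", matrix))  # 2.5 * 3.0 * 2.0 = 15.0
```

Both schemes assume the positions contribute independently, which is exactly the limitation the following slides address.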
7 How to predict
The effect on binding affinity of having a given amino acid at one position can be influenced by the amino acids at other positions in the peptide (sequence correlations)
Two adjacent amino acids may, for example, compete for space in a pocket of the MHC molecule
Artificial neural networks (ANNs) are ideally suited to take such correlations into account
8 Higher order sequence correlations
Neural networks can learn higher order correlations! What does this mean?
Say that the peptide needs one and only one large amino acid at positions P3 and P4 to fill the binding cleft
How would you formulate this to test if a peptide can bind? (S = small, L = large)
S S => 0
L S => 1
S L => 1
L L => 0
No linear function can learn this (XOR) pattern
9 Neural network learning higher order correlations
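The S/L table above is the XOR function: no single linear threshold unit can compute it, but one hidden layer of two units can. A minimal sketch, with hand-picked (not trained) weights chosen purely to illustrate why a hidden layer suffices:

```python
def step(x):
    """Threshold activation: 1 if the weighted input is positive."""
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    """x1, x2 = 1 if the residue at P3/P4 is large, else 0."""
    h1 = step(x1 + x2 - 0.5)    # hidden unit 1: at least one large (OR)
    h2 = step(x1 + x2 - 1.5)    # hidden unit 2: both large (AND)
    return step(h1 - h2 - 0.5)  # OR but not AND: exactly one large residue

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, "->", xor_net(x1, x2))  # 0, 1, 1, 0
```

A trained network would discover equivalent weights by gradient descent; the point here is only that the hidden layer makes the XOR pattern representable at all.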
10 Mutual information
How is mutual information calculated?
Information content at a single position was calculated as I = sum_a p_a log(p_a/q_a), where p_a is the observed frequency of amino acid a at that position and q_a its background frequency
The similar relation MI = sum_ab p_ab log(p_ab/(p_a*p_b)) gives the mutual information between two positions
11 Mutual information. Example
Knowing that you have G at P1 allows you to make an educated guess on what you will find at P6.
P(V6) = 4/9. P(V6|G1) = 1.0!
Peptides (P1..P9):
ALWGFFPVA
ILKEPVHGV
ILGFVFTLT
LLFGYPVYV
GLSPTVWLS
YMNGTMSQV
GILGFVFTL
WLSLLVPFV
FLPSDFFPS
P(G1) = 2/9 = 0.22, P(V6) = 4/9 = 0.44, P(G1,V6) = 2/9 = 0.22
P(G1)*P(V6) = 8/81 = 0.10
log(0.22/0.10) > 0
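The example can be verified directly from the nine peptides on the slide; the G1/V6 term of the mutual information sum comes out positive:

```python
from math import log2

# The nine 9-mer peptides from the slide.
peptides = ["ALWGFFPVA", "ILKEPVHGV", "ILGFVFTLT", "LLFGYPVYV",
            "GLSPTVWLS", "YMNGTMSQV", "GILGFVFTL", "WLSLLVPFV",
            "FLPSDFFPS"]

n = len(peptides)
p_g1 = sum(p[0] == "G" for p in peptides) / n                    # P(G at P1)
p_v6 = sum(p[5] == "V" for p in peptides) / n                    # P(V at P6)
p_g1_v6 = sum(p[0] == "G" and p[5] == "V" for p in peptides) / n # joint

print(p_g1, p_v6, p_g1_v6)  # 2/9, 4/9, 2/9

# The joint probability exceeds the product of the marginals, so this
# term of the mutual information sum is positive: G at P1 is
# informative about V at P6.
print(log2(p_g1_v6 / (p_g1 * p_v6)) > 0)  # True
```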
12 Mutual information
313 binding peptides, 313 random peptides
14 Sequence encoding
How to represent a peptide amino acid sequence to the neural network?
Sparse encoding (all amino acids are equally dissimilar)
Blosum encoding (encodes similarities between the different amino acids)
Weight matrix (encodes the position-specific amino acid preference of the HLA binding motif)
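Sparse encoding is simple to sketch: each residue becomes a 20-dimensional one-hot vector, so every pair of distinct amino acids is equally dissimilar. (Blosum encoding would instead use the residue's row of a BLOSUM substitution matrix, and weight-matrix encoding the position-specific motif scores.)

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def sparse_encode(peptide):
    """One-hot encode a peptide: a flat list of len(peptide) * 20 inputs."""
    encoding = []
    for aa in peptide:
        vec = [0.0] * 20
        vec[AMINO_ACIDS.index(aa)] = 1.0  # exactly one bit set per residue
        encoding.extend(vec)
    return encoding

x = sparse_encode("GILGFVFTL")  # 9-mer -> 180 network inputs
print(len(x), sum(x))           # 180 9.0
```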
19 The Wisdom of the Crowds
The Wisdom of Crowds: Why the Many are Smarter than the Few. James Surowiecki
One day in the fall of 1906, the British scientist Francis Galton left his home and headed for a country fair… He believed that only a very few people had the characteristics necessary to keep societies healthy. He had devoted much of his career to measuring those characteristics, in fact, in order to prove that the vast majority of people did not have them. … Galton came across a weight-judging competition… Eight hundred people tried their luck. They were a diverse lot: butchers, farmers, clerks and many other non-experts… The crowd had guessed … pounds; the ox weighed 1,198.
20 Network ensembles
No single network, with a particular architecture and sequence encoding scheme, will consistently perform the best
Enlightened despotism fails for neural network predictions too
For some peptides, BLOSUM encoding with a four-neuron hidden layer best predicts peptide/MHC binding; for others, a sparse-encoded network with zero hidden neurons performs the best
Wisdom of the crowd: never use just one neural network; use network ensembles
21 Evaluation of prediction accuracy
ENS: ensemble of neural networks trained using sparse, Blosum, and weight matrix sequence encoding
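An ensemble prediction is just the average of the member networks' outputs. A minimal sketch, with stand-in predictor functions in place of trained ANNs (the real members would be networks differing in encoding and hidden-layer size, as on the slides):

```python
def ensemble_predict(peptide, predictors):
    """Average the scores of all predictors in the ensemble."""
    scores = [predict(peptide) for predict in predictors]
    return sum(scores) / len(scores)

# Hypothetical member predictors returning fixed scores for illustration.
predictors = [lambda p: 0.60, lambda p: 0.70, lambda p: 0.50]
print(ensemble_predict("GILGFVFTL", predictors))  # approximately 0.6
```

Averaging cancels the uncorrelated errors of the individual networks, which is why the ensemble (ENS) outperforms any single member.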
22 T cell epitope identification
Lauemøller et al., Reviews in Immunogenetics, 2001
23 NetMHC-3.0 update
IEDB + more proprietary data
Higher accuracy for existing ANNs
More human alleles
Non-human alleles (mouse + primates)
Prediction of 8-mer binding peptides for some alleles
Prediction of 10- and 11-mer peptides for all alleles
Output to spreadsheet
29 Prediction of 10- and 11-mers using 9-mer prediction tools
Approach: for each peptide of length L, create 6 pseudo peptides by deleting a sliding window of L-9 residues, always keeping positions 1, 2, 3, and 9
Example: MLPQWESNTL =>
MLPWESNTL
MLPQESNTL
MLPQWSNTL
MLPQWENTL
MLPQWESTL
MLPQWESNL
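The deletion scheme above can be written directly as code: the deletion window may start anywhere after position 3 and must leave the C-terminal residue intact, which gives exactly six pseudo 9-mers for any peptide length above 9.

```python
def pseudo_9mers(peptide):
    """Create the 6 pseudo 9-mers for an L-mer (L > 9) by deleting a
    sliding window of L-9 residues, keeping positions 1, 2, 3 and the
    C-terminal residue."""
    w = len(peptide) - 9             # width of the deletion window
    pseudo = []
    for start in range(3, 9):        # 0-based window starts: 6 placements
        pseudo.append(peptide[:start] + peptide[start + w:])
    return pseudo

for p in pseudo_9mers("MLPQWESNTL"):
    print(p)
# MLPWESNTL MLPQESNTL MLPQWSNTL MLPQWENTL MLPQWESTL MLPQWESNL
```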
30 Prediction of 10- and 11-mers using 9-mer prediction tools
31 Prediction of 10- and 11-mers using 9-mer prediction tools
Final prediction = average of the 6 log scores: (…)/6 = 0.505
Affinity: exp(log(50000)*(…)) = … nM
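The final step converts the averaged network output back to an IC50. NetMHC trains its networks on the transformed affinity 1 - log(aff)/log(50000), so the inverse transform is aff = 50000^(1 - score); the 0.505 below is the averaged log score from the slide.

```python
from math import exp, log

def score_to_affinity(score, max_affinity=50000.0):
    """Invert the 1 - log(aff)/log(max_affinity) transform:
    returns the predicted IC50 in nM."""
    return exp(log(max_affinity) * (1.0 - score))

print(score_to_affinity(0.505))  # roughly 212 nM
```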
37 Vaccine design. Polytope optimization
Successful immunization can be obtained only if the epitopes encoded by the polytope are correctly processed and presented
Cleavage by the proteasome in the cytosol, translocation into the ER by the TAP complex, and binding to MHC class I should be taken into account in an integrative manner
A polytope can be designed effectively by modifying the sequential order of the different epitopes, and by inserting, as linkers between the epitopes, specific amino acids that favor optimal cleavage and transport by the TAP complex
38 Vaccine design. Polytope construction
[Figure: polytope schematic (NH2 to COOH) showing epitopes and linkers, with C-terminal cleavage sites, cleavage within epitopes, and new epitopes arising at the junctions]
39 Polytope starting configuration
Immunological Bioinformatics, The MIT Press
40 Polytope optimization algorithm
Optimization of four measures:
The number of poor C-terminal cleavage sites of epitopes (predicted cleavage < 0.9)
The number of internal cleavage sites (within-epitope cleavages with a prediction larger than the predicted C-terminal cleavage)
The number of new epitopes (processed and presented epitopes in the fusion regions spanning the epitopes)
The length of the linker regions inserted between epitopes
The optimization seeks to minimize these four terms using Monte Carlo Metropolis simulations [Metropolis et al., 1953]
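A minimal Metropolis sketch for polytope ordering: propose swapping two epitopes, always accept downhill moves, and accept uphill moves with probability exp(-delta/T). The cost function here is a toy stand-in (it just penalizes a clashing junction letter) for the four real measures above; the epitopes are taken from the earlier slide.

```python
import math
import random

def toy_cost(order):
    """Hypothetical cost standing in for the real cleavage/TAP/MHC terms:
    count adjacent epitope pairs whose junction 'clashes' (the first
    epitope ends with the letter the next one starts with)."""
    return sum(1 for a, b in zip(order, order[1:]) if a[-1] == b[0])

def metropolis(epitopes, steps=1000, temperature=0.5, seed=1):
    """Monte Carlo Metropolis search over epitope orderings."""
    rng = random.Random(seed)
    order = list(epitopes)
    cost = toy_cost(order)
    best, best_cost = list(order), cost
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)   # propose a swap
        candidate = list(order)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        delta = toy_cost(candidate) - cost
        # Metropolis criterion: accept downhill always, uphill sometimes.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            order, cost = candidate, cost + delta
            if cost < best_cost:
                best, best_cost = list(order), cost
    return best, best_cost

epitopes = ["GILGFVFTL", "LLFGYPVYV", "GLSPTVWLS", "YMNGTMSQV"]
final_order, final_cost = metropolis(epitopes)
print(final_order, final_cost)
```

Accepting occasional uphill moves is what lets the simulation escape local minima, which a pure greedy reordering cannot do.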
41 Polytope optimal configuration
Immunological Bioinformatics, The MIT Press
42 Summary
MHC class I binding can be very accurately predicted using ANNs
Higher order sequence correlations are important for peptide:MHC-I binding
ANNs can be trained without overfitting, using multiple sequence encoding schemes
Wisdom of the crowd
Optimization can generate polytopes with a high likelihood of antigen presentation