Prediction of T cell epitopes using artificial neural networks


1 Prediction of T cell epitopes using artificial neural networks
Morten Nielsen, CBS, BioCentrum, DTU

2 Objectives How to train a neural network to predict peptide MHC class I binding Understand why NN’s perform the best Higher order sequence information The wisdom of the crowd! Why enlightened despotism does not work even for Neural networks

3 Outline MHC class I epitopes Why MHC binding?
How to predict MHC binding? Information content Weight matrices Neural networks Neural network theory Sequence encoding Examples

4 Prediction of HLA binding specificity
Simple Motifs Allowed/non allowed amino acids Extended motifs Amino acid preferences (SYFPEITHI) Anchor/Preferred/other amino acids Hidden Markov models Peptide statistics from sequence alignment (previous talk) Neural networks Can take sequence correlations into account

5 SYFPEITHI predictions
Extended motifs based on peptides from the literature and peptides eluted from cells expressing specific HLAs (i.e., binding peptides). Scoring scheme is not readily accessible. Positions defined as anchor or auxiliary anchor positions are weighted differently (higher). The final score is the sum of the scores at each position. Predictions can be made for several HLA-A, -B and -DRB1 alleles, as well as some mouse K, D and L alleles.

6 BIMAS Matrix made from peptides with a measured T1/2 for the MHC-peptide complex. The matrices are available on the website. The final score is the product of the scores of each position in the matrix, multiplied by a constant that differs for each MHC, to give a prediction of the T1/2. Predictions can be obtained for several HLA-A, -B and -C alleles, mouse K, D and L alleles, and a single cattle MHC.
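
The two scoring rules (a sum of position scores for SYFPEITHI, a product of position coefficients times an allele constant for BIMAS) can be sketched as below. This is only an illustration: the matrices, the constant, and the function names are made up for the example and are not the published SYFPEITHI or BIMAS values.

```python
import math

def additive_score(peptide, matrix):
    """SYFPEITHI-style rule: sum the per-position scores (unlisted residues score 0)."""
    return sum(matrix[i].get(aa, 0) for i, aa in enumerate(peptide))

def multiplicative_score(peptide, matrix, constant):
    """BIMAS-style rule: multiply per-position coefficients (unlisted residues
    have coefficient 1) and scale by an allele-specific constant."""
    return constant * math.prod(matrix[i].get(aa, 1.0) for i, aa in enumerate(peptide))

# Hypothetical toy matrices for a 9mer with anchors at P2 and P9 (illustrative numbers only).
toy_additive = [{}, {"L": 10, "M": 10}, {}, {}, {}, {}, {}, {}, {"V": 10, "L": 10}]
toy_coefficients = [{}, {"L": 5.0}, {}, {}, {}, {}, {}, {}, {"V": 2.0}]
toy_constant = 0.5  # hypothetical allele constant

sum_score = additive_score("ILKEPVHGV", toy_additive)
product_score = multiplicative_score("ILKEPVHGV", toy_coefficients, toy_constant)
```

Both rules treat each position independently, which is the limitation the following slides address.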

7 How to predict The effect on the binding affinity of having a given amino acid at one position can be influenced by the amino acids at other positions in the peptide (sequence correlations). Two adjacent amino acids may for example compete for the space in a pocket in the MHC molecule. Artificial neural networks (ANN) are ideally suited to take such correlations into account

8 Higher order sequence correlations
Neural networks can learn higher order correlations! What does this mean? Say that the peptide needs one and only one large amino acid in the positions P3 and P4 to fill the binding cleft. How would you formulate this to test if a peptide can bind? (S = small, L = large)
S S => 0
L S => 1
S L => 1
L L => 0
No linear function can learn this (XOR) pattern
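
The XOR claim can be checked with a small sketch. It assumes small = 0 and large = 1 as the input encoding, uses hand-picked (not trained) weights for a network with two hidden units, and runs a coarse grid search to show that no single linear threshold unit on that grid reproduces the table. All names are illustrative.

```python
from itertools import product

def step(x):
    """Threshold activation: 1 if the input is positive, else 0."""
    return 1 if x > 0 else 0

def linear_unit(p3, p4, w3, w4, bias):
    """A single linear threshold unit (no hidden layer)."""
    return step(w3 * p3 + w4 * p4 + bias)

def hidden_layer_net(p3, p4):
    """Two hidden units plus one output unit, weights chosen by hand."""
    at_least_one_large = step(p3 + p4 - 0.5)  # fires for L S, S L and L L
    both_large = step(p3 + p4 - 1.5)          # fires only for L L
    return step(at_least_one_large - both_large - 0.5)

# Assumed encoding: small amino acid = 0, large amino acid = 1.
xor_target = {(0, 0): 0, (1, 0): 1, (0, 1): 1, (1, 1): 0}

# Coarse grid search over weights and bias of a single linear unit.
grid = [i / 2 for i in range(-6, 7)]
linear_solutions = [
    (w3, w4, b)
    for w3, w4, b in product(grid, repeat=3)
    if all(linear_unit(p3, p4, w3, w4, b) == t for (p3, p4), t in xor_target.items())
]

network_outputs = {pair: hidden_layer_net(*pair) for pair in xor_target}
```

The grid search finds no linear solution, while the hidden-layer network matches all four rows.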

9 Neural network learning higher order correlations

10 Mutual information How is mutual information calculated?
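Information content was calculated as $I = \sum_a p_a \log \frac{p_a}{q_a}$, where $p_a$ is the frequency of amino acid $a$ at the position and $q_a$ is its background frequency. Gives information in a single position. Similar relation for mutual information: $I_{ij} = \sum_{a,b} p_{ab} \log \frac{p_{ab}}{p_a \, p_b}$, where $p_{ab}$ is the frequency of finding $a$ at position $i$ together with $b$ at position $j$. Gives mutual information between two positions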

11 Mutual information. Example
Knowing that you have G at P1 allows you to make an educated guess on what you will find at P6. P(V6) = 4/9. P(V6|G1) = 1.0!
Peptides (P1 is the first residue, P6 the sixth): ALWGFFPVA ILKEPVHGV ILGFVFTLT LLFGYPVYV GLSPTVWLS YMNGTMSQV GILGFVFTL WLSLLVPFV FLPSDFFPS
P(G1) = 2/9 = 0.22, .. P(V6) = 4/9 = 0.44, .. P(G1,V6) = 2/9 = 0.22, P(G1)*P(V6) = 8/81 = 0.10, log(0.22/0.10) > 0
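
The numbers in this example can be reproduced from the nine peptides with a short sketch. The helper names are illustrative, the natural logarithm is an assumption (the slide does not state a base), and `mutual_information` sums only over residue pairs that actually occur in the data.

```python
import math
from collections import Counter

peptides = [
    "ALWGFFPVA", "ILKEPVHGV", "ILGFVFTLT", "LLFGYPVYV", "GLSPTVWLS",
    "YMNGTMSQV", "GILGFVFTL", "WLSLLVPFV", "FLPSDFFPS",
]

def frequency(peptides, residues_at):
    """Fraction of peptides carrying the given residue at each 1-based position."""
    hits = sum(
        all(p[pos - 1] == aa for pos, aa in residues_at.items()) for p in peptides
    )
    return hits / len(peptides)

def mutual_information(peptides, pos_a, pos_b):
    """Sum of p_ab * log(p_ab / (p_a * p_b)) over the observed residue pairs."""
    n = len(peptides)
    count_a = Counter(p[pos_a - 1] for p in peptides)
    count_b = Counter(p[pos_b - 1] for p in peptides)
    count_ab = Counter((p[pos_a - 1], p[pos_b - 1]) for p in peptides)
    return sum(
        (c / n) * math.log((c / n) / ((count_a[a] / n) * (count_b[b] / n)))
        for (a, b), c in count_ab.items()
    )

p_g1 = frequency(peptides, {1: "G"})
p_v6 = frequency(peptides, {6: "V"})
p_g1_v6 = frequency(peptides, {1: "G", 6: "V"})
p_v6_given_g1 = p_g1_v6 / p_g1
log_ratio = math.log(p_g1_v6 / (p_g1 * p_v6))
mi_p1_p6 = mutual_information(peptides, 1, 6)
```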

12 Mutual information 313 binding peptides 313 random peptides

13 Neural network training
SLLPAIVEL YLLPAIVHI TLWVDPYEV GLVPFLVSV KLLEPVLLL LLDVPTAAV LLDVPTAAV LLDVPTAAV LLDVPTAAV VLFRGGPRG MVDGTLLLL YMNGTMSQV MLLSVPLLL SLLGLLVEV ALLPPINIL TLIKIQHTL HLIDYLVTS ILAPPVVKL ALFPQLVIL GILGFVFTL STNRQSGRQ GLDVLTAKV RILGAVAKV QVCERIPTI ILFGHENRV ILMEHIHKL ILDQKINEV SLAGGIIGV LLIENVASL FLLWATAEA SLPDFGISY KKREEAPSL LERPGGNEI ALSNLEVKL ALNELLQHV DLERKVESL FLGENISNF ALSDHHIYL GLSEFTEYL STAPPAHGV PLDGEYFTL GVLVGVALI RTLDKVLEV HLSTAFARV RLDSYVRSL YMNGTMSQV GILGFVFTL ILKEPVHGV ILGFVFTLT LLFGYPVYV GLSPTVWLS WLSLLVPFV FLPSDFFPS CLGGLLTMV FIAGNSAYE KLGEFYNQM KLVALGINA DLMGYIPLV RLVTLKDIV MLLAVLYCL AAGIGILTV YLEPGPVTA LLDGTATLR ITDQVPFSV KTWGQYWQV TITDQVPFS AFHHVAREL YLNKIQNSL MMRKLAILS AIMDKNIIL IMDKNIILK SMVGNWAKV SLLAPGAKQ KIFGSLAFL ELVSEFSRM KLTPLCVTL VLYRYGSFS YIGEVLVSV CINGVCWTV VMNILLQYV ILTVILGVL KVLEYVIKV FLWGPRALV GLSRYVARL FLLTRILTI HLGNVKYLV GIAGGLALL GLQDCTMLV TGAPVTYST VIYQYMDDL VLPDVFIRC VLPDVFIRC AVGIGIAVV LVVLGLLAV ALGLGLLPV GIGIGVLAA GAGIGVAVL IAGIGILAI LIVIGILIL LAGIGLIAA VDGIGILTI GAGIGVLTA AAGIGIIQI QAGIGILLA KARDPHSGH KACDPHSGH ACDPHSGHF SLYNTVATL RGPGRAFVT NLVPMVATV GLHCYEQLV PLKQHFQIV AVFDRKSDA LLDFVRFMG VLVKSPNHV GLAPPQHLI LLGRNSFEV PLTFGWCYK VLEWRFDSR TLNAWVKVV GLCTLVAML FIDSYICQV IISAVVGIL VMAGVGSPY LLWTLVVLL SVRDRLARL LLMDCSGSI CLTSTVQLV VLHDDLLEA LMWITQCFL SLLMWITQC QLSLLMWIT LLGATCMFV RLTRFLSRV YMDGTMSQV FLTPKKLQC ISNDVCAQV VKTDGNPPE SVYDFFVWL FLYGALLLA VLFSSDFRI LMWAKIGPV SLLLELEEV SLSRFSWGA YTAFTIPSI RLMKQDFSV RLPRIFCSC FLWGPRAYA RLLQETELV SLFEGIDFY SLDQSVVEL RLNMFTPYI NMFTPYIGV LMIIPLINV TLFIGSHVV SLVIVTTFV VLQWASLAV ILAKFLHWL STAPPHVNV LLLLTVLTV VVLGVVFGI ILHNGAYSL MIMVKCWMI MLGTHTMEV MLGTHTMEV SLADTNSLA LLWAARPRL GVALQTMKQ GLYDGMEHL KMVELVHFL YLQLVFGIE MLMAQEALA LMAQEALAF VYDGREHTV YLSGANLNL RMFPNAPYL EAAGIGILT TLDSQVMSL STPPPGTRV KVAELVHFL IMIGVLVGV ALCRWGLLL LLFAGVQCQ VLLCESTAV YLSTAFARV YLLEMLWRL SLDDYNHLV RTLDKVLEV GLPVEYLQV KLIANNTRV FIYAGSLSA KLVANNTRL FLDEFMEGV ALQPGTALL VLDGLDVLL SLYSFPEPE ALYVDSLFF SLLQHLIGL ELTLGEFLK MINAYLDKL 
AAGIGILTV FLPSDFFPS SVRDRLARL SLREWLLRI LLSAWILTA AAGIGILTV AVPDEIPPL FAYDGKDYI AAGIGILTV FLPSDFFPS AAGIGILTV FLPSDFFPS AAGIGILTV FLWGPRALV ETVSEQSNV ITLWQRPLV
Sequence encoding: Sparse, Blosum, Hidden Markov model
Network ensembles: Cross validated training, Benefit from ensembles

14 How to represent a peptide amino acid sequence to the neural network?
Sequence encoding. Sparse encoding (all amino acids are equally dissimilar). Blosum encoding (encodes similarities between the different amino acids). Weight matrix (encodes the position specific amino acid preference of the HLA binding motif)
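
Sparse encoding can be sketched as a one-hot vector with 20 inputs per peptide position. The alphabet order and the function name are assumptions made for the example. Blosum encoding would replace each one-hot row with the corresponding row of a BLOSUM substitution matrix, which is not reproduced here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 amino acids

def sparse_encode(peptide):
    """One-hot encode a peptide: 20 inputs per position, exactly one set to 1."""
    encoded = []
    for aa in peptide:
        row = [0.0] * len(AMINO_ACIDS)
        row[AMINO_ACIDS.index(aa)] = 1.0
        encoded.extend(row)
    return encoded

encoded = sparse_encode("ILKEPVHGV")  # 9 positions x 20 inputs
```

Every pair of different amino acids differs in exactly two inputs, which is why this encoding treats all amino acids as equally dissimilar.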

15 Evaluation of prediction accuracy
PSSM

16 Neural network training. Cross validation
Train on 4/5 of data Test on 1/5 => Produce 5 different neural networks each with a different prediction focus
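
A minimal sketch of the five-fold partition, assuming the data is simply a list of peptides and that folds are assigned round-robin (the slide does not say how the data was split). Each (train, test) pair would be used to train one network.

```python
def cross_validation_splits(items, n_folds=5):
    """Yield (train, test) lists; every item lands in the test set exactly once."""
    folds = [items[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Placeholder data standing in for measured peptide/MHC binding examples.
data = [f"peptide_{i}" for i in range(20)]
splits = list(cross_validation_splits(data))
```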

17 Neural network training curve
Maximum test set performance. Most capable of generalizing

18 Network ensembles

19 The Wisdom of the Crowds
The Wisdom of Crowds. Why the Many are Smarter than the Few. James Surowiecki. One day in the fall of 1906, the British scientist Francis Galton left his home and headed for a country fair… He believed that only a very few people had the characteristics necessary to keep societies healthy. He had devoted much of his career to measuring those characteristics, in fact, in order to prove that the vast majority of people did not have them. … Galton came across a weight-judging competition…Eight hundred people tried their luck. They were a diverse lot, butchers, farmers, clerks and many other non-experts…The crowd had guessed … pounds, the ox weighed 1,198

20 Network ensembles No single network with a particular architecture and sequence encoding scheme will consistently perform the best. Enlightened despotism also fails for neural network predictions. For some peptides, BLOSUM encoding with a four-neuron hidden layer can best predict the peptide/MHC binding; for other peptides, a sparse encoded network with zero hidden neurons performs the best. Wisdom of the Crowd: never use just one neural network. Use network ensembles
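
One simple way to combine networks is to average their predictions. The sketch below assumes plain averaging and uses stand-in functions with made-up scores in place of trained networks; the slides do not state how the ensemble members are actually weighted.

```python
def ensemble_predict(networks, peptide):
    """Average the scores of all networks for one peptide."""
    scores = [network(peptide) for network in networks]
    return sum(scores) / len(scores)

# Hypothetical stand-ins for networks trained with different encodings
# and architectures; each returns a fixed, made-up score.
def sparse_network(peptide):
    return 0.6

def blosum_network(peptide):
    return 0.4

def weight_matrix_network(peptide):
    return 0.8

ensemble_score = ensemble_predict(
    [sparse_network, blosum_network, weight_matrix_network], "ILKEPVHGV"
)
```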

21 Evaluation of prediction accuracy
ENS: Ensemble of neural networks trained using sparse, Blosum, and weight matrix sequence encoding

22 T cell epitope identification
Lauemøller et al., reviews in immunogenetics 2001

23 IEDB + more proprietary data
NetMHC-3.0 update. IEDB + more proprietary data. Higher accuracy for existing ANNs. More human alleles. Non-human alleles (mice + primates). Prediction of 8mer binding peptides for some alleles. Prediction of 10- and 11mer peptides for all alleles. Outputs to spreadsheet

24

25

26

27 NetMHC Output 53 49 94 289 529

28

29 Prediction of 10- and 11mers using 9mer prediction tools
Approach: For each peptide of length L, create 6 pseudo peptides by deleting a sliding window of L-9 residues, always keeping pos. 1, 2, 3, and the final position (position 9 of the resulting 9mer)
Example: MLPQWESNTL = MLPWESNTL MLPQESNTL MLPQWSNTL MLPQWENTL MLPQWESTL MLPQWESNL
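
The sliding-window deletion can be sketched as follows. The function name is illustrative, and the code assumes the deleted window may start anywhere from position 4 up to the point where only the C-terminal residue remains after it, which reproduces the six pseudo peptides of the example.

```python
def pseudo_9mers(peptide):
    """Delete a sliding window of len(peptide) - 9 residues, keeping
    positions 1-3 and the C-terminal residue untouched."""
    window = len(peptide) - 9
    if window < 1:
        raise ValueError("expected a peptide longer than 9 residues")
    # 0-based window starts: the first deletable residue is position 4 (index 3);
    # the last residue (index len - 1) must survive.
    last_start = len(peptide) - 1 - window
    return [
        peptide[:start] + peptide[start + window:]
        for start in range(3, last_start + 1)
    ]

ten_mer_pseudo = pseudo_9mers("MLPQWESNTL")
```

With this rule both 10mers (window of 1) and 11mers (window of 2) yield six pseudo peptides.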

30 Prediction of 10- and 11mers using 9mer prediction tools

31 Prediction of 10- and 11mers using 9mer prediction tools
Final prediction = average of the 6 log scores: ( )/6 = 0.505 Affinity: Exp(log(50000)*( )) = nM
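
The averaging and the conversion back to an affinity can be sketched as below. Two things are assumptions: the six individual scores are made up so that they average to 0.505 (the slide's own values did not survive), and the exponent is taken to be 1 minus the score, following the commonly used log-transform score = 1 - log(affinity)/log(50000).

```python
import math

MAX_AFFINITY_NM = 50000.0  # upper bound used in the log transform

def score_to_affinity(score):
    """Assumed inverse transform: affinity = exp(log(50000) * (1 - score))."""
    return math.exp(math.log(MAX_AFFINITY_NM) * (1.0 - score))

def affinity_to_score(affinity_nm):
    """Assumed forward transform: score = 1 - log(affinity) / log(50000)."""
    return 1.0 - math.log(affinity_nm) / math.log(MAX_AFFINITY_NM)

def predict_long_peptide(pseudo_scores):
    """Average the log scores of the pseudo peptides and convert to nM."""
    mean_score = sum(pseudo_scores) / len(pseudo_scores)
    return mean_score, score_to_affinity(mean_score)

# Made-up scores for six pseudo peptides, chosen only to average 0.505.
hypothetical_scores = [0.40, 0.45, 0.50, 0.55, 0.60, 0.53]
mean_score, affinity_nm = predict_long_peptide(hypothetical_scores)
```

Under these assumptions a log score of 0.505 corresponds to an affinity of roughly 200 nM.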

32 Prediction using ANN trained on 10mer peptides

33 Prediction of 10- and 11mers using 9mer prediction tools

34 Examples. Hepatitis C virus. Epitope predictions
Hotspots

35 SARS T cell epitope identification
Peptides tested: 15/15 (100 %) Binders (KD < 500 nM): 14/15 (93%)

36 More SARS CTL epitopes A0301 A1101 B0702 11/15 14/15 10/15
A2 supertype: Molecule used: rA0201/ human b2m 12/14 12/15 13/15 B1501 A0201 B5801

37 Vaccine design. Polytope optimization
Successful immunization can be obtained only if the epitopes encoded by the polytope are correctly processed and presented. Cleavage by the proteasome in the cytosol, translocation into the ER by the TAP complex, as well as binding to MHC class I should be taken into account in an integrative manner. The design of a polytope can be done in an effective way by modifying the sequential order of the different epitopes, and by inserting specific amino acids that will favor optimal cleavage and transport by the TAP complex, as linkers between the epitopes.

38 Vaccine design. Polytope construction
Linker NH2 M COOH Epitope cleavage C-terminal cleavage New epitopes Cleavage within epitopes

39 Polytope starting configuration
Immunological Bioinformatics, The MIT press.

40 Polytope optimization Algorithm
Optimization of four measures:
1. The number of poor C-terminal cleavage sites of epitopes (predicted cleavage < 0.9)
2. The number of internal cleavage sites (within-epitope cleavages with a prediction larger than the predicted C-terminal cleavage)
3. The number of new epitopes (number of processed and presented epitopes in the fusing regions spanning the epitopes)
4. The length of the linker region inserted between epitopes
The optimization seeks to minimize the above four terms by use of Monte Carlo Metropolis simulations [Metropolis et al., 1953]
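
A toy sketch of a Metropolis search over epitope order is shown below. It is not the published algorithm: `junction_penalty` is a made-up stand-in for the four terms above (real use would call cleavage, TAP and MHC binding predictors), linker insertion is not modeled, and the temperature, step count and seed are arbitrary choices.

```python
import math
import random

def junction_penalty(left, right):
    """Hypothetical cost of placing `right` directly after `left`
    (here: residues shared between the two flanking tripeptides)."""
    return len(set(left[-3:]) & set(right[:3]))

def total_cost(order):
    """Sum of junction penalties along the polytope."""
    return sum(junction_penalty(a, b) for a, b in zip(order, order[1:]))

def metropolis_optimize(epitopes, steps=2000, temperature=0.5, seed=0):
    """Swap two epitopes per step; accept uphill moves with probability
    exp(-delta / temperature). Returns the best order seen and its cost."""
    rng = random.Random(seed)
    current = list(epitopes)
    current_cost = total_cost(current)
    best, best_cost = list(current), current_cost
    for _ in range(steps):
        i, j = rng.sample(range(len(current)), 2)
        candidate = list(current)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        delta = total_cost(candidate) - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost

epitopes = ["GILGFVFTL", "SLYNTVATL", "NLVPMVATV", "ILKEPVHGV", "FLPSDFFPS", "YMNGTMSQV"]
best_order, best_cost = metropolis_optimize(epitopes)
```

Accepting some uphill moves lets the search escape orderings that are only locally optimal.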

41 Polytope optimal configuration
Immunological Bioinformatics, The MIT press.

42 MHC class I binding can be very accurately predicted using ANN
Summary. MHC class I binding can be very accurately predicted using ANN. Higher order sequence correlations are important for peptide:MHC-I binding. ANN can be trained without overfitting: using multiple sequence encoding schemes, wisdom of the crowd. Optimization can generate polytopes with high likelihood for antigen presentation

