Immunological Bioinformatics. The Immunological Bioinformatics group Immunological Bioinformatics group, CBS, Technical University of Denmark (www.cbs.dtu.dk)www.cbs.dtu.dk.

Slides:



Advertisements
Similar presentations
Artificial Neural Networks 1 Morten Nielsen Department of Systems Biology, DTU.
Advertisements

Artificial Neural Networks 1 Morten Nielsen Department of Systems Biology, DTU IIB-INTECH, UNSAM, Argentina.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU T cell Epitope predictions using bioinformatics (Hidden Markov models) Morten.
Immune system overview in 10 minutes The non-immunologist guide to the immune system Morten Nielsen Department of Systems Biology DTU.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Sequence motifs, information content, logos, and HMM’s Morten Nielsen, CBS,
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU T cell Epitope predictions using bioinformatics (Neural Networks and hidden.
Immune system overview in 10 minutes The non-immunologist guide to the immune system.
Hotspot Hunter: a computational system for large-scale screening and selection of candidate immunological hotspots in pathogen proteomes G.L. Zhang, A.M.
Sequence motifs, information content, logos, and Weight matrices
Prediction of T cell epitopes using artificial neural networks
MHC binding and MHC polymorphism Or Finding the needle in the haystack.
MHC binding and MHC polymorphism. MHC-I molecules present peptides on the surface of most cells.
Morten Nielsen, CBS, BioCentrum, DTU
Sequence motifs, information content, logos, and HMM’s Morten Nielsen, CBS, BioCentrum, DTU.
Immunological Bioinformatics Or Finding the needle in the haystack Morten Nielsen
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU T cell Epitope predictions using bioinformatics (Neural Networks and hidden.
Representing the Immune Epitope Database in OWL Jason A. Greenbaum 1, Randi Vita 1, Laura Zarebski 1, Hussein Emami 2, Alessandro Sette 1, Alan Ruttenberg.
Sequence motifs, information content, and sequence logos Morten Nielsen, CBS, Depart of Systems Biology, DTU.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Department of Systems Biology Technical University of Denmark Immunological Bioinformatics Introduction to the.
Hidden Markov Models, HMM’s Morten Nielsen, CBS, Department of Systems Biology, DTU.
Vaccine Design. Need for new vaccine technologies The classical way of making vaccines have in many cases been tried for the pathogens for which no vaccines.
MHC Polymorphism Ole Lund. Objectives What is HLA polymorphism? What is it good for? How does it make life difficult for vaccine design? Definition of.
Sequence motifs, information content, logos, and Weight matrices Morten Nielsen, CBS, BioCentrum, DTU.
Immunological feature predictions and databases on the web Ole Lund Center for Biological Sequence Analysis BioCentrum-DTU Technical University of Denmark.
Class I pathway Prediction of proteasomal cleavage and TAP binding.
Characterizing receptor ligand interactions Morten Nielsen, CBS, Depart of Systems Biology, DTU.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Department of Systems Biology Technical University of Denmark Immunological Bioinformatics Processing, combined.
Biological sequence analysis and information processing by artificial neural networks Morten Nielsen CBS.
Artificial Neural Networks 1 Morten Nielsen Department of Systems Biology, DTU.
MHC Polymorphism. MHC Class I pathway Figure by Eric A.J. Reits.
Sequence motifs, information content, logos, and HMM’s Morten Nielsen, CBS, BioCentrum, DTU.
Class I pathway Prediction of proteasomal cleavage and TAP binidng Morten Nielsen, CBS, BioCentrum, DTU.
Class I pathway Prediction of proteasomal cleavage and TAP binidng Can Keşmir, TBB, Utrecht University, NL & CBS, BioCentrum, DTU.
Ontology development for the Immune Epitope Database Bjoern Peters La Jolla Institute for Allergy and Immunology.
Immunological Bioinformatics Introduction to the immune system.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Technical University of Denmark - DTU Department of systems biology Biopeople Tutorial 2011 Immunological Bioinformatics.
Selection of T Cell Epitopes Using an Integrative Approach Mette Voldby Larsen cand. scient. in biology ph.d. student.
Sequence motifs, information content, logos, and HMM’s Morten Nielsen, CBS, BioSys, DTU.
Immunological Bioinformatics Ole Lund. Challenges of the immune system Time Creation of self Creation of an immune system/ Tolerance to self Infection.
Biological sequence analysis and information processing by artificial neural networks.
Psi-Blast Morten Nielsen, CBS, Department of Systems Biology, DTU.
Project list 1.Peptide MHC binding predictions using position specific scoring matrices including pseudo counts and sequences weighting clustering (Hobohm)
Immunological databases on the web Ole Lund Center for Biological Sequence Analysis BioCentrum-DTU Technical University of Denmark
Sequence motifs, information content, logos, and HMM’s Morten Nielsen, CBS, BioCentrum, DTU.
Hidden Markov Models, HMM’s Morten Nielsen, CBS, BioSys, DTU.
Epitope Selection Rational Vaccine design. Why? Therapeutic vaccines Therapeutic vaccines Treatment of viral infections (e.g., HIV, HCV), and resistant.
Sequence motifs, information content, logos, and HMM’s Morten Nielsen, CBS, BioSys, DTU.
Prediction of CTL responses Mette Voldby Larsen cand. scient. in biology ph.d. student.
Bioinformatics in transplantation immunology PhD defense Malene Erup Larsen October 27th 2010.
Methods MHC class-I T cell epitope prediction for Nef Consensus and ancestral sequences of the Nef protein for the different HIV-1 subtypes were obtained.
Selection of T Cell Epitopes Using an Integrative Approach Mette Voldby Larsen cand. scient. in Biology PhD in Immunological Bioinformatics.
1 Computer-aided Subunit Vaccine Design G.P.S. Raghava, Institute of Microbial Technology, Chandigarh  Understanding immune system  Breaking complex.
Project list 1.Peptide MHC binding predictions using position specific scoring matrices including pseudo counts and sequences weighting clustering (Hobohm)
Hidden Markov Models, HMM’s Morten Nielsen, CBS, BioSys, DTU.
What is a Project Purpose –Use a method introduced in the course to describe some biological problem How –Construct a data set describing the problem –Define.
Artificial Neural Networks 1 Morten Nielsen Department of Systems Biology, DTU.
Weight matrices, Sequence motifs, information content, and sequence logos Morten Nielsen, CBS, Department of Systems Biology, DTU and Instituto de Investigaciones.
Bioinformatics in Vaccine Design
Prediction of T cell epitopes using artificial neural networks Morten Nielsen, CBS, BioCentrum, DTU.
Lecture 13 Immunology and disease: parasite antigenic diversity.
Prediction of T cell epitopes using artificial neural networks Morten Nielsen, CBS, BioCentrum, DTU.
Immunological Bioinformatics
Motifs, logos, and Profile HMM’s
Sequence motifs, information content, and sequence logos
Telling self from non-self: Learning the language of the Immune System
Artificial Neural Networks Thomas Nordahl Petersen & Morten Nielsen
Immunological Bioinformatics
Sequence motifs, information content, logos, and HMM’s
Sequence motifs, information content, and sequence logos
Artificial Neural Networks Thomas Nordahl Petersen & Morten Nielsen
Presentation transcript:

Immunological Bioinformatics

The Immunological Bioinformatics group Immunological Bioinformatics group, CBS, Technical University of Denmark ( Ole Lund, Group Leader Morten Nielsen, Associate Professor Claus Lundegaard, Associate Professor Jean Vennestrøm, post doc. Thomas Blicher (50%), post doc. Mette Voldby Larsen, PhD student Pernille Haste Andersen, PhD student Sune Frankild, PhD student Sheila Tang, PhD student Thomas Rask (50%), PhD student Nicolas Rapin, PhD student Ilka Hoff, PhD student Jorid Sørli, PhD student Hao Zhang, PhD student MSc students Collaborators IMMI, University of Copenhagen Søren BuusMHC binding Mogens H ClaessonElispot Assay La Jolla Institute of Allergy and Infectious Diseases Allesandro SetteEpitope database Bjoern Peters Leiden University Medical Center Tom OttenhoffTuberculosis Michel Klein Ganymed Ugur SahinGenetic library University of Tubingen Stefan StevanovicMHC ligands INSERM Peter van EndertTap binding University of Mainz Hansjörg SchildProteasome Schafer-Nielsen Claus Schafer-NielsenPeptide synthesis ImmunoGrid Elda RossiSimulation of the Vladimir BrusicImmune system University of Utrectht Can KesmirIdeas

Figure 1-20

Effectiveness of vaccines 1958 start of small pox eradication program

The Immune System The innate immune system The adaptive immune system

The innate immune system Unspecific Antigen independent Immediate response No training/selection hence no memory Pathogen independent (but response might be pathogen type dependent)

The adaptive immune system Pathogen specific –Humoral –Cellular Bacteria Virus Parasite

Adaptive immune response Signal induced –Pathogens Antigens –Epitopes B Cell T Cell

Cartoon by Eric Reits Humoral immunity

Antibody - Antigen interaction Fab Antigen Epitope Paratope Antibody The antibody recognizes structural properties of the surface of the antigen

Cellular Immunity

Anchor positions MHC class I with peptide

HLA specificity clustering A0201 A0101 A6802 B0702

Prediction of HLA binding specificity Historical overview Simple Motifs –Allowed/non allowed amino acids Extended motifs –Amino acid preferences (SYFPEITHI)SYFPEITHI) –Anchor/Preferred/other amino acids Hidden Markov models –Peptide statistics from sequence alignment SVMs and neural networks –Can take sequence correlations into account

SLLPAIVEL YLLPAIVHI TLWVDPYEV GLVPFLVSV KLLEPVLLL LLDVPTAAV LLDVPTAAV LLDVPTAAV LLDVPTAAV VLFRGGPRG MVDGTLLLL YMNGTMSQV MLLSVPLLL SLLGLLVEV ALLPPINIL TLIKIQHTL HLIDYLVTS ILAPPVVKL ALFPQLVIL GILGFVFTL STNRQSGRQ GLDVLTAKV RILGAVAKV QVCERIPTI ILFGHENRV ILMEHIHKL ILDQKINEV SLAGGIIGV LLIENVASL FLLWATAEA SLPDFGISY KKREEAPSL LERPGGNEI ALSNLEVKL ALNELLQHV DLERKVESL FLGENISNF ALSDHHIYL GLSEFTEYL STAPPAHGV PLDGEYFTL GVLVGVALI RTLDKVLEV HLSTAFARV RLDSYVRSL YMNGTMSQV GILGFVFTL ILKEPVHGV ILGFVFTLT LLFGYPVYV GLSPTVWLS WLSLLVPFV FLPSDFFPS CLGGLLTMV FIAGNSAYE KLGEFYNQM KLVALGINA DLMGYIPLV RLVTLKDIV MLLAVLYCL AAGIGILTV YLEPGPVTA LLDGTATLR ITDQVPFSV KTWGQYWQV TITDQVPFS AFHHVAREL YLNKIQNSL MMRKLAILS AIMDKNIIL IMDKNIILK SMVGNWAKV SLLAPGAKQ KIFGSLAFL ELVSEFSRM KLTPLCVTL VLYRYGSFS YIGEVLVSV CINGVCWTV VMNILLQYV ILTVILGVL KVLEYVIKV FLWGPRALV GLSRYVARL FLLTRILTI HLGNVKYLV GIAGGLALL GLQDCTMLV TGAPVTYST VIYQYMDDL VLPDVFIRC VLPDVFIRC AVGIGIAVV LVVLGLLAV ALGLGLLPV GIGIGVLAA GAGIGVAVL IAGIGILAI LIVIGILIL LAGIGLIAA VDGIGILTI GAGIGVLTA AAGIGIIQI QAGIGILLA KARDPHSGH KACDPHSGH ACDPHSGHF SLYNTVATL RGPGRAFVT NLVPMVATV GLHCYEQLV PLKQHFQIV AVFDRKSDA LLDFVRFMG VLVKSPNHV GLAPPQHLI LLGRNSFEV PLTFGWCYK VLEWRFDSR TLNAWVKVV GLCTLVAML FIDSYICQV IISAVVGIL VMAGVGSPY LLWTLVVLL SVRDRLARL LLMDCSGSI CLTSTVQLV VLHDDLLEA LMWITQCFL SLLMWITQC QLSLLMWIT LLGATCMFV RLTRFLSRV YMDGTMSQV FLTPKKLQC ISNDVCAQV VKTDGNPPE SVYDFFVWL FLYGALLLA VLFSSDFRI LMWAKIGPV SLLLELEEV SLSRFSWGA YTAFTIPSI RLMKQDFSV RLPRIFCSC FLWGPRAYA RLLQETELV SLFEGIDFY SLDQSVVEL RLNMFTPYI NMFTPYIGV LMIIPLINV TLFIGSHVV SLVIVTTFV VLQWASLAV ILAKFLHWL STAPPHVNV LLLLTVLTV VVLGVVFGI ILHNGAYSL MIMVKCWMI MLGTHTMEV MLGTHTMEV SLADTNSLA LLWAARPRL GVALQTMKQ GLYDGMEHL KMVELVHFL YLQLVFGIE MLMAQEALA LMAQEALAF VYDGREHTV YLSGANLNL RMFPNAPYL EAAGIGILT TLDSQVMSL STPPPGTRV KVAELVHFL IMIGVLVGV ALCRWGLLL LLFAGVQCQ VLLCESTAV YLSTAFARV YLLEMLWRL SLDDYNHLV RTLDKVLEV GLPVEYLQV KLIANNTRV FIYAGSLSA KLVANNTRL FLDEFMEGV ALQPGTALL VLDGLDVLL SLYSFPEPE ALYVDSLFF SLLQHLIGL ELTLGEFLK MINAYLDKL AAGIGILTV FLPSDFFPS SVRDRLARL SLREWLLRI LLSAWILTA AAGIGILTV AVPDEIPPL FAYDGKDYI AAGIGILTV FLPSDFFPS AAGIGILTV FLPSDFFPS AAGIGILTV FLWGPRALV ETVSEQSNV ITLWQRPLV Sequence information

Score sequences to weight matrix by looking up and adding L values from the matrix A R N D C Q E G H I L K M F P S T W Y V Scoring a sequence to a weight matrix RLLDDTPEV GLLGNVSTV ALAKAAAAL Which peptide is most likely to bind? Which peptide second? nM 23nM 309nM

Example from real life 10 peptides from MHCpep database Bind to the MHC complex Relevant for immune system recognition Estimate sequence motif and weight matrix Evaluate motif “correctness” on 528 peptides lALAKAAAAM lALAKAAAAN lALAKAAAAR lALAKAAAAT lALAKAAAAV lGMNERPILT lGILGFVFTM lTLNAWVKVV lKLNEPVLLL lAVVPFIVSV

Prediction accuracy Pearson correlation 0.45 Prediction score Measured affinity

Predictive performance

Higher order sequence correlations Neural networks can learn higher order correlations! –What does this mean? S S => 0 L S => 1 S L => 1 L L => 0 No linear function can learn this (XOR) pattern Say that the peptide needs one and only one large amino acid in the positions P3 and P4 to fill the binding cleft How would you formulate this to test if a peptide can bind?

313 binding peptides313 random peptides Mutual information

Sequence encoding (continued) Sparse encoding V: L: V. L=0 (unrelated) Blosum encoding V: L: R: V. L = 0.88 (highly related) V. R = (close to unrelated)

Evaluation of prediction accuracy

Network ensembles No one single network with a particular architecture and sequence encoding scheme, will constantly perform the best Also for Neural network predictions will enlightened despotism fail –For some peptides, BLOSUM encoding with a four neuron hidden layer can best predict the peptide/MHC binding, for other peptides a sparse encoded network with zero hidden neurons performs the best –Wisdom of the Crowd Never use just one neural network Use Network ensembles

Evaluation of prediction accuracy ENS : Ensemble of neural networks trained using sparse, Blosum, and hidden Markov model sequence encoding

NetMHC-3.0 update IEDB + more proprietary data Higher accuracy for existing ANNs More Human alleles Non human alleles (Mice + Primates) Prediction of 8mer binding peptides for some alleles Prediction of 10- and 11mer peptides for all alleles Outputs to spread sheet

Prediction of 10- and 11mers using 9mer prediction tools Approach: For each peptide of length L create 6 pseudo peptides deleting a sliding window of L- 9 always keeping pos. 1,2,3, and 9 Example: MLPQWESNTL =MLPWESNTL MLPQESNTL MLPQWSNTL MLPQWENTL MLPQWESTL MLPQWESNL

Prediction of 10- and 11mers using 9mer prediction tools

Final prediction = average of the 6 log scores: ( )/6 = Affinity: Exp(log(50000)*( ))= nM

Prediction using ANN trained on 10mer peptides

Prediction of 10- and 11mers using 9mer prediction tools

Cellular Immunity

Proteasome specificity Low polymorphism –Constitutive & Immuno- proteasome Evolutionary conserved Stochastic and low specificity –Only 70-80% of the cleavage sites are reproduced in repeated experiments

Proteasome specificity NetChop is one of the best available cleavage method –

Predicting TAP affinity 9 meric peptides>9 meric Peters et el., JI, 171: ILRGTSFVYV = -0.74

Integrating all three steps (protesaomal cleavage, TAP transport and MHC binding) should lead to improved identification of peptides capable of eliciting CTL responses Integration?

Identifying CTL epitopes 1 EBN3_EBV YQAYSSWMY EBN3_EBV QSDETATSH EBN3_EBV PVSPAVNQY EBN3_EBV AYSSWMYSY EBN3_EBV LAAGWPMGY EBN3_EBV IVQSCNPRY EBN3_EBV FLQRTDLSY EBN3_EBV YTDHQTTPT EBN3_EBV GTDVVQHQL HLA affinityProteasomal cleavageTAP affinity

Large scale method validation HIV A3 epitope predictions

Case I: SARS Sylvester-Hvid et al, Tissue Antigens. 2004

Sars virus HLA ligands 75% of predicted peptides were binding with an IC 50 <500 nM

Case II: Discovery of conserved Class I epitopes in Human Influenza Virus H1N1 Wang et al., Vaccine 2007

Pox Strategy

Influenza We selected the Influenza peptides with the top 15 combined scores with conservation p9 > 70% for each pf the 12 supertypes. 180 peptides selected 167 tested for binding and CTL response 89 (53%) of the influenza peptides tested have an affinity better than 500nM

Donors 35 normal healthy blood donors years old Expected to have had influenza more than 3 times HLA typed by SBT for HLA A and B

ELISPOT assay Measure number of white blood cells that in vitro produce interferon-  in response to a peptide A positive result means that the immune system have earlier reacted to the peptide (during a response of a vaccine/natural infection) FLDVMESM Two spots FLDVMESM

Peptides positive in ELISPOT assay Protein Supertype Affinity PASACombCons_9 Polymerase PB1A1 nt Nucleoprotein NPA1 nt Polymerase PB1A Polymerase PB1A Polymerase PB1B Nucleoprotein NPB Matrix protein M1B39 nt Nucleoprotein NPB Polymerase PB1B Polymerase PB1B Polymerase PB1B Polymerase PAB8 nb Protein: Common name for protein Supertype: HLA supertype that the peptide is predicted to bind to Affinity:Measured affinity (kD in nM) PA:Predicted affinity SA:Scaled affinity Comb:Combined score calculated as in Larsen et al., 2005 cons_9:Fraction of clusters (with more than 98% sequence identity) that contain the 9mer nb: non binder nt: not tested

Peptides positive in ELISPOT assay

Conservation of epitopes Number of 9mers 100% conserved: 10/12 conserved in Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) 11/12 conserved in Influenza A virus (A/chicken/Jilin/9/2004(H5N1))

EpiSelect Genotype 1 Top Scoring Peptides Genotype 2 Genotype 3 Genotype 4 Genotype 5 Genotype 6 Select peptide with maximal coverage Select peptide with maximal coverage preferring uncovered strains Select peptide with maximal coverage preferring lowest covered strains Repeat until the desired number of peptides is selected

HCV Results - B7 Genotype 1 Genotype 2 Genotype 3 Genotype 4 Genotype 5 Genotype 6 QPRGRRQPI Peptide Predicted affinity (nM) 5 SPRGSRPSW43 GenomeCoverage 5 4 DPRRRSRNL * 366 RARAVRAKL Peptides 63 TPAETTVRL * * Verified B7 supertype restricted CD8+ epitope in the Los Alamos HCV epitope database

Ongoing work Selection of epitopes covering host (HLA) and pathogen variability Selection of diagnostic peptides in TB Predict cross reactivity (T and B cell) –Applications in epitope prediction, autoimmune diseases, transplantation Virulence factor discovery by comparative genomics Function-antigenecity studies Bioinformatics immune system simulation