Peptide Binding motif Height of a column equal to I Relative height of a letter is p High information positions HLA-A0201
NetMHC-3.2 www.cbs.dtu.dk/services/NetMHC-3.2 79 ANN’s Covering Human, Primate and Mouse MHC Predictions can be made for 8-11 mer petides
The IMGT/HLA Sequence Database currently encompass more than 1500 HLA class I proteins Source: http://www.anthonynolan.com/HIG/index.html HLA polymorphism
< 70 HLA alleles are characterized by binding data Reliable MHC class I binding predictions (NetMHC-3.2) for 57 HLA A and B molecules No methods for HLA-C, and HLA-E Long way to over 1500!
More MHC molecules: more diversity in the presented peptides 1% probability that MHC molecule presents a peptide Different hosts sample different peptides from same pathogen.
HLA polymorphism Few human beings will share the same set of HLA alleles –Different persons will react to a pathogen infection in a non-similar manner A CTL based vaccine must include epitopes specific for each HLA allele in a population –A CTL based vaccine must consist of ~1000 HLA class I epitopes
Data Alleles characterized with 5 or more data points 3% covered
Supertypes. What are they good for? Alleles with in supertypes present the same set of peptides! Is this really so? –Less that 50% of A6802 binders will bind to A0201! –Less than 33% of A0201 binders will bind to A6802!
Pan-specific method The contact residues are defined as being within 4.0 Å of the peptide in any of a representative set of HLA-A and -B structures with nonamer peptides. Only polymorphic residues from A, B, and C alleles are included Pseudo-sequence consisting of 34 amino acid residues. Include polymorphic residues in potential contact with the bound peptide
HLA class II polymorphism More than 2000 HLA class II allele combinations –HLA-DR –HLA-DQ –HLA-DP Only data for 14 of the more than 500 known HLA-DR allele (< 3%) No data for HLA-DQ and HLA-DP
Class II MHC binding MHC class II binds peptides in the class II antigen presentation pathway Binds peptides of length 9-18 (even whole proteins can bind!) Binding cleft is open Binding core is 9 aa
MHC pseudo sequence The contact residues are defined as being within 4.0 Å of the peptide in any of a representative set of HLA-DR, -DQ, and DP structures with peptides. Only polymorphic residues are included Pseudo-sequence consisting of 25 amino acid residues. Include polymorphic residues in potential contact with the bound peptide
Epitope based vaccines and diagnostics Challenges Identify epitopes in pathogen genome A small viral genome contains >> 1000 potential CTL epitopes HLA diversity No two humans will induce the same reaction to a pathogen infection Viral escape No two viral strains will “host” the same set of T cell epitopes
Rational epitope selection We have more than 2000 MHC molecules We have more than 500 different pathogenic strains How to design a method to select a small pool of peptides that will cover both the MHC polymorphism and the pathogen diversity? –No peptide will bind to all MHC molecules and few (maybe even no) peptides will be present in all pathogenic strains
Polyvalent vaccines The equivalent of this in epitope based vaccines is to select epitopes in a way so that they together cover all strains. Strain 1 Strain 2 Strain 1 Strain 2 Epitope Uneven coverage, Average coverage = 2 Even coverage, Average coverage = 2
All HIV responsive patients respond to at least one of nine peptides Perez et al., JI, 2008
PopCover - Searching in two dimensions. HIV class II case story Data –396 full length genomes with annotated tat, nef, gag and pol proteins covering A(50), B(104),C(156), D(40) and AE(46) strains HLA-DR frequencies taken from –43 (allele frequency in at least one population > 2.5%) HLA class II alleles 36 HLA-DRB1, HLA-DR3,4,5, and 4 HLA-DQ alleles Select predicted peptide binders –5608(tat), 20961(nef), 31848(gag),42748(pol) Select peptides from each protein with optimal genomic and HLA coverage –tat(4), nef(15), gag(15) and pol(15)
EpiSelect and PoPCover EpiSelect The sum is over all genomes i. P j i is 1 if epitope j is present in genome i. C i is the number of times genome i has been targeted in the already selected set of epitopes PopCover The sum is over all genomes i and HLA alleles k. R j ki is 1 if epitope j is present in genome i and is presented by allele k, and E ki is the number of times allele k has been targets by epitopes in genome i by the already selected set of epitopes, and g i is the genomes frequency
Benchmark Create 10,000 virtual patients with a given HIV genomic sequence and HLA alleles as defined by the HLA allele frequencies and HIV genomic data Test how many of these patients that are targets by at least on of the selected peptides
HIV patient coverage Selected peptide pools –tat(4), nef(15), gag(15) and pol(15)
So, we can find the needle in the haystack Given a protein sequence and an HLA molecule, we can accurately predict with peptides will bind (70-95%) 15-80% of these will in turn be epitopes
Conclusions Rational epitope discovery is feasible –Prediction methods are an important guide for epitope identification –Given a protein sequence and an HLA molecule, we can predict the peptide binders (find the needle in the haystack) Pan-specific MHC prediction method can deal with the immense MHC polymorphism Epitope selection strategies can deal with pathogen diversity For large pathogens, we still have no handle on how to select immunogenic proteins (we cannot find the haystack)
Acknowledgements Immunological Bioinformatics group, CBS, DTU –Ole Lund - Group leader –Claus Lundegaard - Data bases, HLA binding predictions Collaborators –IMMI, University of Copenhagen Søren Buus: MHC binding –La Jolla Institute of Allergy and Infectious Diseases A. Sette, B. Peters: Epitope database and many, many more www.cbs.dtu.dk/services