Training and applying hidden Markov models and support vector machines for prediction of T-cell epitopes Van Hai Van, Cao Thi Ngoc Phuong, Tran Linh Thuoc.


1 Training and applying hidden Markov models and support vector machines for prediction of T-cell epitopes
Van Hai Van, Cao Thi Ngoc Phuong, Tran Linh Thuoc
Faculty of Biology, University of Natural Sciences, VNU-HCMC, Vietnam
Sixth International Conference on Bioinformatics (InCoB2007)

2 Epitope prediction
“Epitope is the portion of an antigen that is recognized by the antigen receptor on lymphocytes” Molecular Biology
Epitope prediction: computational methods aid the development of epitope-based vaccines against human pathogens for which no vaccines currently exist.
http://www.scripps.edu/newsandviews/e_20050228/hiv.html

3 T-cell epitope prediction
T-cell epitopes are a subset of MHC-binding peptides, so predicting which peptides bind to MHC is essential for the design of peptide-based vaccines.
HLA-A0201 Sequence
Prediction methods: binding motifs, quantitative matrices, decision trees, artificial neural networks, hidden Markov models, support vector machines
Molecular Biology

4 HMMs & SVMs
HMMs (hidden Markov models): statistical models that can capture complex relationships in data sets.
SVMs (support vector machines): learning machines that find the optimal separating hyperplane.

5 Epitope prediction for dengue virus
Tropical disease: dengue fever, dengue hemorrhagic fever, dengue shock syndrome.
Hypotheses of pathogenesis: antibody-dependent enhancement, virus virulence.
No dengue vaccine is available.
In our research: develop a procedure for automatically building T-cell epitope prediction models; find in silico candidates for multivalent vaccines against the 4 types of dengue virus.

6 Building models for predicting T-cell epitopes & applying these models on dengue virus

7 Building effective prediction models?
The predictive ability of HMM and SVM models depends on: the experimentally determined peptides binding to MHC molecules; the partition of the peptides into a training set and a testing set; the encoding method.
Goal: a system that easily and quickly finds the best prediction model when the type of MHC molecule and the number of binding peptides change.

8 Processing MHC-binding experimental peptides

9 Create training and testing sets
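The slides do not spell out how homologous groups are formed or how they are divided between the two sets. The sketch below is one possible reading, under two loudly stated assumptions: (a) two peptides are homologous when they share an identical stretch of k residues (k = 5, 6 or 7), and (b) each group is split at random so that both sets sample every group. The function names `homologous_groups` and `split_by_group` and the `train_fraction` parameter are hypothetical.

```python
import random
from collections import defaultdict


def homologous_groups(peptides, k):
    """Group peptides that share an identical k-residue substring (assumed criterion)."""
    peptides = list(dict.fromkeys(peptides))  # drop duplicates, keep order
    parent = {p: p for p in peptides}

    def find(p):
        # Union-find root lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    first_owner = {}  # k-mer -> first peptide seen that contains it
    for p in peptides:
        for i in range(len(p) - k + 1):
            kmer = p[i:i + k]
            if kmer in first_owner:
                parent[find(p)] = find(first_owner[kmer])
            else:
                first_owner[kmer] = p

    groups = defaultdict(list)
    for p in peptides:
        groups[find(p)].append(p)
    return list(groups.values())


def split_by_group(groups, train_fraction, seed):
    """Randomly divide every group between training and testing sets (assumed scheme)."""
    rng = random.Random(seed)
    train, test = [], []
    for members in groups:
        members = members[:]
        rng.shuffle(members)
        cut = max(1, round(len(members) * train_fraction))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test
```

Calling `split_by_group` with different seeds yields different partitions, which is one way repeated training runs could be generated.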

10 Training & testing procedure
HMMs (HMMer), SVMs (SVM_light)

11 Experiment 1
Method: HMMs and SVMs
Databases: MHCBN, MHCPEP
Homology: 7-amino acid
No. homologous groups: binding seq.: 11, non-binding seq.: 3
Kind of peptide: Binding, Non-binding (under each method)
No. peptides, training set: 6232520
No. peptides, testing set: 803067830
Training times: 200
Parameters (HMMs): E-value = 0 to 10
Parameters (SVMs): Linear kernel, c = 0; encoding: binary, Blosum-62, physical-chemical method
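To make the binary encoding concrete, here is a minimal sketch. It assumes that "binary" means the usual one-hot scheme (20 bits per residue, so a 9-mer becomes a 180-dimensional vector) and writes the result as a sparse `label index:value` line of the kind SVM_light reads. The residue ordering in `AMINO_ACIDS` and both function names are illustrative choices, not taken from the slides.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues


def binary_encode(peptide):
    """One-hot encode a peptide: 20 bits per residue, exactly one bit set."""
    vector = []
    for residue in peptide:
        bits = [0] * len(AMINO_ACIDS)
        bits[AMINO_ACIDS.index(residue)] = 1
        vector.extend(bits)
    return vector


def to_svmlight_line(label, vector):
    """Format one example as '<+1|-1> index:value ...' with 1-based, increasing indices."""
    features = " ".join(
        f"{i}:{v}" for i, v in enumerate(vector, start=1) if v != 0
    )
    return f"{label:+d} {features}"


# A binder (+1) and a non-binder (-1) would each become one line of the training file.
line = to_svmlight_line(+1, binary_encode("KTVWFVPSI"))
```

A Blosum-62 encoding would follow the same pattern, replacing each one-hot block with the residue's row of the substitution matrix.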

12 Result of the training by HMMs
HMM.7.136: A_ROC = 0.914
Parameters chosen from HMM.7.136, at the point E-value E = 3.4, score S = -8.5: SE (sensitivity) = 0.91, SP (specificity) = 0.86, A_ROC = 0.885
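For readers unfamiliar with the reported measures, the following sketch shows how sensitivity (SE), specificity (SP) and the area under the ROC curve (A_ROC) can be computed from scored test peptides. It assumes a higher score means "more likely a binder" and that labels are 1 for binders and 0 for non-binders; the function names and the toy scores are hypothetical and are not the authors' data.

```python
def sensitivity_specificity(scores, labels, threshold):
    """SE = TP / (TP + FN) and SP = TN / (TN + FP) at a given score threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores, labels):
    """Area under the ROC curve: the probability that a random binder
    outscores a random non-binder (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: three binders and three non-binders.
scores = [-5.0, -7.0, -9.0, -8.0, -10.0, -12.0]
labels = [1, 1, 1, 0, 0, 0]
se, sp = sensitivity_specificity(scores, labels, threshold=-8.5)
auc = roc_auc(scores, labels)
```

Sweeping the threshold and picking the point with the preferred SE/SP balance is one way to arrive at a single operating point such as the one reported on the slide.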

13 Result of the training by SVMs
Binary encoding: A_ROC = 0.42 to 0.77
Blosum-62 encoding: A_ROC = 0.47 to 0.87
Physical-chemical encoding: A_ROC = 0.41 to 0.71
With Blosum-62 encoding, data set SVM.7.blo62.46: SE = 0.83, SP = 0.90, A_ROC = 0.87

14 Experiment 2
Method: HMMs and SVMs
Databases: MHCBN, MHCPEP, IEDB
Homology: 7-amino acid, 6-amino acid, 5-amino acid
Training times: 200 (HMMs), 100 (SVMs)
Parameters (HMMs): E-value = 40 to 80
Parameters (SVMs): Linear kernel, c = 0; encoding: binary, Blosum-62, Binary-Blosum-62 method

15 Result of the training by HMMs
Kind of peptide: Binding
Homology                              5-amino acid     6-amino acid     7-amino acid
No. homologous groups                 82               139              84
No. sequences in homologous groups    1232             551              374
Total peptides, training set          1189             1165             1188
Total peptides, testing set           632              656              633
A_ROC                                 0.832 to 0.877   0.835 to 0.883   0.828 to 0.876
The best HMM profile: HMM.6.78

16 Training in 6-amino acid homologous groups
HMM.6.78: A_ROC = 0.883
Parameters of HMM.6.78, at the point E = 42, S = -9.2: SE = 0.91, SP = 0.84, A_ROC = 0.875

17 Result of the training by SVMs
Homology                               5-amino acid           6-amino acid           7-amino acid
Kind of peptide                        Binding  Non-binding   Binding  Non-binding   Binding  Non-binding
Total homologous groups                82       176           139      45            84       21
Sequences in homologous groups         1232     540           551      116           374      60
Total sequences, training set          1189     1282          1165     1365          1188     1367
Total sequences, testing set           632      557           656      474           633      472
A_ROC, binary encoding (1)             0.847 to 0.884         0.845 to 0.880         0.838 to 0.882
A_ROC, Blosum-62 encoding (2)          0.843 to 0.884         0.846 to 0.883         0.838 to 0.894
A_ROC, Binary-Blosum-62 encoding (3)   0.849 to 0.879         0.847 to 0.889         0.850 to 0.891
Chosen set: SVM.blo62.7.85

18 Training in 7-amino acid homologous groups
At SVM.2.7.85: SE = 0.93, SP = 0.86, A_ROC = 0.894
Plot legend: Binary encoding, Blosum-62 encoding, Binary-Blosum-62 encoding

19 Epitope predicting procedure for dengue virus
1. Perform multiple sequence alignment.
2. Extract consensus sequences of 9 or more amino acids.
3. Create overlapping 9-mer sequences.
4. Predict peptides binding to MHC with the HMM profile or the SVM model.
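Steps 2 and 3 of the procedure can be sketched as follows. The sketch assumes that a "consensus sequence" is a gap-free run of alignment columns in which all aligned sequences carry the same residue; the slides do not define it, so this is only one plausible interpretation. The aligned strings in the example and the function names are made up for illustration.

```python
def conserved_segments(aligned, min_length=9):
    """Return (start_column, segment) for every run of identical, gap-free
    columns that is at least min_length residues long."""
    segments = []
    current = []
    start = None
    for col, column in enumerate(zip(*aligned)):
        residue = column[0]
        if residue != "-" and all(r == residue for r in column):
            if not current:
                start = col
            current.append(residue)
        else:
            if len(current) >= min_length:
                segments.append((start, "".join(current)))
            current = []
    if len(current) >= min_length:  # run that reaches the end of the alignment
        segments.append((start, "".join(current)))
    return segments


def overlapping_kmers(segment, k=9):
    """Slide a window of k residues along the segment, one residue at a time."""
    return [segment[i:i + k] for i in range(len(segment) - k + 1)]


# Toy alignment of three sequences (equal length, '-' marks a gap).
aligned = [
    "AALMRRGDLPVWLKK",
    "GTLMRRGDLPVWLSA",
    "C-LMRRGDLPVWLTT",
]
for start, segment in conserved_segments(aligned):
    candidates = overlapping_kmers(segment)  # 9-mers passed to the HMM or SVM
```

Each 9-mer in `candidates` would then be scored by the trained HMM profile or SVM model in step 4.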

20 Result of epitope prediction (peptide binding to HLA-A0201 prediction): join overlapping 9-amino acid peptides predicted to bind to HLA-A0201 molecules

Experiment 1
Proteins (1,2,3,4)                    Epitope sequences  Methods
537NS3, 536NS3, 2010DV3_gp1, 536NS3   LMRRGDLPVWL        HMMs, SVMs
763NS5, 764NS5, 515NS5, 765NS5        LMYFHRRDLRL        HMMs, SVMs
358NS3, 357NS3, 2HELICc, 357NS3       KTVWFVPSI          SVMs
658NS5, 659NS5, 410NS5, 660NS5        AISGDDCVV          SVMs
472NS5, 473NS5, 223NS5, 473NS5        AIWYMWLGA          SVMs
101E, 99E, 99glycoprot, 99E           RGWGNGCGL          SVMs
194NS1, 194NS1, 193NS1, 194NS1        VHADMGYWI          SVMs
352NS5, 353NS5, 103NS5, 353NS5        RVFKEKVDT          SVMs
13NS1, 13NS1, 12NS1, 13NS1            LKCGSGIFV          SVMs
26NS1, 26NS1, 25NS1, 26NS1            HTWTEQYKF          SVMs
230NS1, 230NS1, 229NS1, 230NS1        TLWSNGVLES         SVMs
327NS1, 327NS1, 326NS1, 327NS1        DGCWYGMEIRP        SVMs
148NS3, 148NS3, 142Pep_S7, 148NS3     GLYGNGVVT          SVMs
256NS3, 255NS3, 67DEXHc, 255NS3       EIVDLMCHA          SVMs
297NS3, 296NS3, 108DEXHc, 296NS3      ARGYISTRV          SVMs
410NS3, 409NS3, 54HELICc, 409NS3      DISEMGANF          SVMs
36NS4B, 35NS4B, 35NS4B, 32NS4B        ASAWTLYAV          SVMs
118NS4B, 117NS4B, 117NS4B, 114NS4B    HYAIIGPGLQA        SVMs
142NS4B, 141NS4B, 141NS4B, 138NS4B    IMKNPTVDGI         SVMs
224NS4B, 223NS4B, 223NS4B, 220NS4B    NIFRGSYLAGA        SVMs
81NS5, 81NS5, 27FtsJ, 81NS5           GCGRGGWSY          SVMs
529NS5, 530NS5, 280NS5, 530NS5        MYADDTAGW          SVMs
602NS5, 603NS5, 353NS5, 603NS5        QVGTYGLNT          SVMs
606NS5, 607NS5, 357NS5, 607NS5        YGLNTFTNM          SVMs
682NS5, 683NS5, 434NS5, 684NS5        DMGKVRKDI          SVMs
745NS5, 746NS5, 497NS5, 747NS5        WSLRETACLG         SVMs
788NS5, 789NS5, 540NS5, 790NS5        PTSRTTWSI          SVMs

Experiment 2
Proteins (1,2,3,4)                    Epitope sequences  Methods
537NS3, 536NS3, 2010DV3_gp1, 536NS5   LMRRGDLPV          HMMs
763NS5, 764NS5, 515NS5, 765NS5        LMYFHRRDLRL        HMMs
358NS3, 357NS3, 2HELICc, 357NS3       KTVWFVPSI          HMMs
658NS5, 659NS5, 410NS5, 660NS5        AISGDDCVV          HMMs
469NS5, 470NS5, 220NS5, 470NS5        GSRAIWYMWLGAR      HMMs
103E, 101E, 101DV3_gp1, 101E          WGNGCGLFG          SVMs
193NS1, 193NS1, 192NS1, 193NS1        AVHADMGYWIES       SVMs
348NS5, 349NS5, 99NS5, 349NS5         FGQQRVFKE          SVMs
568NS5, 569NS5, 319NS5, 569NS5        FKLTYQNKV          HMMs
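Epitopes longer than 9 residues in these tables come from joining overlapping 9-mers that were each predicted to bind. A minimal sketch of such a joining step is shown below. It assumes every hit carries its start position within the protein and that two hits are joined whenever they share at least one residue position; the function name `join_overlapping` and the start positions in the example are illustrative.

```python
def join_overlapping(hits):
    """Merge predicted binders that overlap in position.

    hits: iterable of (start, peptide) pairs from a single protein.
    Returns a list of (start, merged_sequence) pairs sorted by start.
    """
    merged = []
    for start, peptide in sorted(hits):
        if merged:
            prev_start, prev_seq = merged[-1]
            prev_end = prev_start + len(prev_seq)  # exclusive end position
            if start < prev_end:  # shares at least one position with the previous hit
                overhang = start + len(peptide) - prev_end
                if overhang > 0:
                    # Append only the residues that extend past the previous hit.
                    merged[-1] = (prev_start, prev_seq + peptide[-overhang:])
                continue
        merged.append((start, peptide))
    return merged


# Three consecutive 9-mers collapse into one 11-residue epitope;
# the distant hit stays separate.
hits = [
    (537, "LMRRGDLPV"),
    (538, "MRRGDLPVW"),
    (539, "RRGDLPVWL"),
    (763, "LMYFHRRDL"),
]
epitopes = join_overlapping(hits)
```

Requiring a larger minimum overlap than one residue would be a stricter variant of the same idea.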

21 Result of prediction
The HMM profile is stable, and its predictive ability increases when additional data sets are added.
The SVM model performs well, but its predictive ability decreases as the amount of training data increases.

22 http://www.biology.hcmuns.edu.vn/epitope

23 Conclusion
Successfully built a system for training hidden Markov models and support vector machines.
Generating training and testing data by separating the data set into homologous groups gives good results.
Consensus epitopes for the 4 types of dengue virus could be predicted from data on peptides binding to HLA-A0201.

24 Future plans
Try other kernels for the SVM method.
Survey other encoding methods for sequences of variable length.
Survey other methods for classifying MHC data into homologous groups.
Automate the collection and updating of MHC-binding peptide data from databases.

25 Thank you very much!

