Slide 1 of 38: T-cell EPITOPES PREDICTION OF HEMAGGLUTININ, NEURAMINIDASE AND MATRIX PROTEIN OF INFLUENZA A VIRUS USING SUPPORT VECTOR MACHINE AND HIDDEN MARKOV MODEL
Vo Cam Quy, Nguyen Thanh Khoi, Nguyen Thi Truc Minh, Tran Linh Thuoc
Department of Biotechnology, University of Natural Sciences, Vietnam National University, Ho Chi Minh City, Vietnam
Sixth International Conference on Bioinformatics (InCoB2007), Hong Kong

Slide 2 of 38: OUTLINE
- Introduction: epitope prediction methods; influenza A virus
- Materials and Methods
- Results and Discussion
- Conclusion and Future Work

Slide 3 of 38: IN SILICO EPITOPE ANALYSIS
Diagram: gene/protein sequence database; disease-related protein database; epitope prediction; candidate epitope database (the "vaccinome"); peptide and multi-epitope vaccines.

Slide 4 of 38: EPITOPE
An epitope is the part of a macromolecule that is recognized by the immune system, specifically by antibodies, B cells, or T cells. Epitopes are most often described as three-dimensional surface features of an antigen molecule, whereas linear epitopes are determined by the amino acid sequence.

Slide 5 of 38: EPITOPE PREDICTION STRATEGIES
- B-cell epitope prediction: structure, chemical features
- T-cell epitope prediction:
  - Structure-based
  - Sequence-based:
    - Binding motifs, quantitative matrices
    - Statistical methods: hidden Markov model (flexible model)
    - Machine learning methods: support vector machine, artificial neural network, … (high accuracy)
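Quantitative-matrix methods score a peptide additively: each residue contributes a weight that depends on its position, and the peptide is called a binder when the total reaches a threshold. A minimal sketch, assuming a hypothetical 9-position matrix whose weights are invented for illustration and do not come from any database or from this study:

```python
from typing import Dict, List

# A quantitative matrix: one dict per peptide position, mapping an amino acid
# to its weight. Residues missing from a position contribute 0.0.
QuantMatrix = List[Dict[str, float]]

# Hypothetical 9-position matrix; the weights are invented for illustration.
EXAMPLE_MATRIX: QuantMatrix = [{} for _ in range(9)]
EXAMPLE_MATRIX[1] = {"Y": 2.0, "F": 1.5}            # assumed anchor at P2
EXAMPLE_MATRIX[8] = {"L": 2.0, "I": 1.5, "V": 1.0}  # assumed anchor at P9


def score_peptide(peptide: str, matrix: QuantMatrix) -> float:
    """Additive score: sum of the position-specific weights of each residue."""
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must equal the matrix length")
    return sum(matrix[i].get(aa, 0.0) for i, aa in enumerate(peptide))


def is_predicted_binder(peptide: str, matrix: QuantMatrix, threshold: float) -> bool:
    """Predict a binder when the additive score reaches the threshold."""
    return score_peptide(peptide, matrix) >= threshold


print(score_peptide("SYIPSAEKI", EXAMPLE_MATRIX))  # 3.5
```

The additive form is what makes such matrices fast and easy to interpret; it also ignores interactions between positions, which is one motivation for the more flexible HMM and machine learning models listed on the slide.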

Slide 6 of 38: T-CELL EPITOPE PREDICTION APPROACHES
- Direct approach: positive set = putative epitopes; negative set = non-epitopes.
- Indirect approach: positive set = MHC-binding peptides (binders); negative set = MHC non-binding peptides (non-binders).
- The predictions of the two approaches are compared to obtain the epitope candidates.

Slide 7 of 38: INFLUENZA A VIRUS
- Influenza A viruses continue to emerge from the aquatic avian reservoir and cause pandemics.
- The many variants and mutations in the virus population make vaccine production difficult.
- Genome: single-stranded, negative-sense RNA in 8 segments.
- Hemagglutinin, neuraminidase and the matrix protein are three of the proteins of greatest interest.
Figure (source: http://www.roche.com/pages/facets/10/viruse.htm): red, M2 protein; green, hemagglutinin; blue, neuraminidase; inside, viral RNA.

Slide 8 of 38: OBJECTIVES
- Build HMM and SVM models for T-cell epitope prediction (MHC class I and II) with a direct approach (epitope prediction) and an indirect approach (MHC binder prediction), then combine the results to obtain epitope candidates.
- Predict epitopes of influenza A virus proteins for in silico vaccine design.

Slide 9 of 38: METHODS
1. Data collection and processing: raw data from AntiJen, MHCBN and IEDB are processed into training sets.
2. Building models: SVM method and HMM method.
3. Parameter optimization: the models are trained and evaluated to select the optimal model.
4. Applying: the optimal models predict epitopes in the query proteins. Peptides predicted by both methods and both approaches were considered epitopes.
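The rule that a peptide counts as an epitope only when both methods and both approaches predict it amounts to a set intersection. A minimal sketch, assuming each model's output is available as a set of (start position, peptide) pairs; the model labels and toy peptides below are hypothetical:

```python
from typing import Dict, Set, Tuple

# A positive call: (1-based start position, peptide sequence), so the same
# peptide found at two positions of a protein stays distinct.
Call = Tuple[int, str]


def consensus_calls(calls_by_model: Dict[str, Set[Call]]) -> Set[Call]:
    """Keep only the calls shared by every model/approach combination."""
    call_sets = list(calls_by_model.values())
    if not call_sets:
        return set()
    return set.intersection(*call_sets)


# Toy calls for one protein; model labels and peptides are hypothetical.
toy_calls = {
    "SVM-direct": {(10, "AAYAAAAAL"), (20, "GGFGGGGGI")},
    "SVM-indirect": {(10, "AAYAAAAAL"), (30, "CCYCCCCCV")},
    "HMM-direct": {(10, "AAYAAAAAL"), (20, "GGFGGGGGI")},
    "HMM-indirect": {(10, "AAYAAAAAL")},
}
print(consensus_calls(toy_calls))  # {(10, 'AAYAAAAAL')}
```

Requiring agreement trades sensitivity for specificity: every extra model in the intersection can only shrink the candidate list.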

Slide 10 of 38: RESULTS OF DATA COLLECTION AND PROCESSING
Peptide counts per allele, in the order: indirect approach, positive set (binders) | negative set (non-binders); direct approach, positive set (epitopes) | negative set (non-epitopes). Six alleles × four data sets = 24 data sets.
MHC class I:
- H-2-Db: 452 | 335 | 160 | 344
- H-2-Kb: 446 | 413 | 219 | 465
- H-2-Kd: 1707420891
MHC class II:
- H-2-IAd: 411 | 143 | 179 | 195
- H-2-IEd: 2744119985
- H-2-IEk: 3262816696

Slide 12 of 38: STEP 2: BUILDING MODELS (HMM METHOD)
Positive training set → multiple alignment with ClustalW → Perl script → modelfromalign → initial model.
Result: 11 substitution matrices × 6 alleles × 2 approaches = 132 initial models.
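The modelfromalign step turns a multiple alignment into an initial HMM. Its actual procedure (transition estimates, regularization) is specific to that program and is not reproduced here. As a simplified illustration only, the sketch below estimates per-column amino acid probabilities from an alignment with a uniform pseudocount, which is roughly what the match-state emissions of a profile model capture. The toy peptides are hypothetical:

```python
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_emissions(aligned: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-column amino acid probabilities from a (possibly gapped) alignment.

    Each column's counts of the 20 standard amino acids get a uniform
    pseudocount and are normalised to sum to 1. Gaps ('-') and
    non-standard letters are ignored. This is a simplified stand-in for
    match-state emission estimation, not modelfromalign's algorithm.
    """
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned)
        total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


# Toy alignment of three 9-mers (hypothetical peptides).
toy_alignment = ["AYAAAAAAL", "SYGAAAAAI", "AFAAAAA-L"]
profile = column_emissions(toy_alignment)
print(round(profile[1]["Y"], 3))  # 3/23 = 0.13
```

The pseudocount keeps unseen residues from receiving zero probability, which matters when positive training sets contain only a few hundred peptides.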

Slide 13 of 38: STEP 2: BUILDING MODELS (SVM METHOD)
- MHC class I binder/epitope data: processed with a Perl script → positive data.
- MHC class II binder/epitope data: each sequence is cut into overlapping 8-mers/9-mers, and the peptides that conform to the reported motif (9-mer binding core; motif information from the SYFPEITHI database) are chosen → positive data.
- Non-binder/non-epitope data processing → negative data.
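The processing on this slide (cutting sequences into overlapping peptides and keeping those that fit a motif) can be sketched as below. The anchor motif is a hypothetical example, not one copied from SYFPEITHI, and the one-hot (sparse) encoding used to turn peptides into SVM feature vectors is an assumption: the slides do not say how peptides were encoded.

```python
from typing import Dict, List, Set

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def overlapping_kmers(sequence: str, k: int) -> List[str]:
    """All overlapping k-mers of a sequence, shifting one residue at a time."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def matches_motif(peptide: str, anchors: Dict[int, Set[str]]) -> bool:
    """True if every anchor position (0-based) carries an allowed residue."""
    return all(pos < len(peptide) and peptide[pos] in allowed
               for pos, allowed in anchors.items())


def one_hot(peptide: str) -> List[int]:
    """Sparse encoding: 20 binary features per position (assumed SVM input)."""
    return [1 if aa == ref else 0 for aa in peptide for ref in AMINO_ACIDS]


# Hypothetical anchor motif: P2 in {Y, F} and P9 in {L, I} (0-based 1 and 8).
EXAMPLE_ANCHORS = {1: {"Y", "F"}, 8: {"L", "I"}}

candidates = [p for p in overlapping_kmers("MSYIPSAEKIGL", 9)
              if matches_motif(p, EXAMPLE_ANCHORS)]
print(candidates)                    # ['SYIPSAEKI']
print(len(one_hot(candidates[0])))   # 180
```

A 9-mer therefore becomes a 180-dimensional binary vector; other encodings (e.g. physicochemical properties) would plug in at the same point.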

Slide 15 of 38: STEP 3: PARAMETER OPTIMIZATION (HMM METHOD)

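Slide 16 of 38: TRAINING PRINCIPLE: COUPLES OF MODELS
- Positive model: the 132 initial models are trained on the 12 positive data sets with buildmodel (Baum-Welch or Viterbi).
- Negative model: trained on the 12 negative data sets with buildmodel (Baum-Welch or Viterbi).
- A positive model and its negative model form a couple.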

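Slide 17 of 38: 10-FOLD CROSS-VALIDATION
The positive and negative data sets are split into 10 parts. In each round, one part is the test set and the other nine form the training set: a couple is trained starting from the initial (positive) model and evaluated on the test set by ROC analysis, giving accuracies Acc. 1 to Acc. 10. The average of the ten accuracies is the accuracy of the model.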
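The 10-fold scheme can be sketched generically as follows. Two assumptions differ from the slide: each fold is scored by plain classification accuracy rather than by ROC analysis, and a hypothetical toy trainer stands in for training an HMM couple.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, int]            # (peptide, label): 1 positive, 0 negative
Predictor = Callable[[str], int]
Trainer = Callable[[List[Example]], Predictor]


def k_fold_accuracy(data: Sequence[Example], train: Trainer,
                    k: int = 10, seed: int = 0) -> float:
    """Average test-fold accuracy over k folds (k = 10 on the slide)."""
    if not data:
        raise ValueError("no data")
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k roughly equal parts
    accuracies = []
    for fold in folds:
        if not fold:
            continue
        held_out = set(fold)
        training = [data[i] for i in indices if i not in held_out]
        predict = train(training)               # train on the other k-1 parts
        correct = sum(predict(data[i][0]) == data[i][1] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)


def first_residue_rule(training: List[Example]) -> Predictor:
    """Hypothetical toy trainer: majority label of the training peptides that
    share the query's first residue, defaulting to 0."""
    def predict(peptide: str) -> int:
        labels = [label for seq, label in training if seq[0] == peptide[0]]
        return int(sum(labels) * 2 > len(labels)) if labels else 0
    return predict


toy_data = [("A" * 9, 1)] * 10 + [("G" * 9, 0)] * 10
print(k_fold_accuracy(toy_data, first_residue_rule))  # 1.0
```

Fixing the shuffle seed makes the fold assignment reproducible, so different matrices and training algorithms can be compared on identical splits.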

Slide 18 of 38: NLL CALCULATING PRINCIPLE
- A queried sequence (e.g. PPVPVSKVVSTDEYVAR) is scored with hmmscore (Viterbi) against the positive model, giving NLL 1, and against the negative model, giving NLL 2.
- final NLL = NLL 1 - NLL 2
- If final NLL ≤ threshold NLL, the sequence is predicted as an epitope; if final NLL > threshold NLL, it is predicted as a non-epitope.
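The couple-based decision rule can be illustrated with a tiny discrete HMM whose negative log-likelihood (NLL) is taken along the single best (Viterbi) state path. This is a sketch under stated assumptions: the one-state toy models are hypothetical, hmmscore's own scoring details are not reproduced, and the floor probability for unseen symbols is an assumption added to avoid log(0).

```python
import math
from typing import Dict, List

FLOOR = 1e-6  # assumed floor for unseen symbols/transitions (avoids log(0))


class DiscreteHMM:
    """Minimal discrete HMM; a hypothetical stand-in for a trained model."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[Dict[str, float]]) -> None:
        self.start = start   # start[s]: probability of starting in state s
        self.trans = trans   # trans[p][s]: probability of moving p -> s
        self.emit = emit     # emit[s][symbol]: emission probability


def _log(p: float) -> float:
    return math.log(max(p, FLOOR))


def viterbi_nll(hmm: DiscreteHMM, seq: str) -> float:
    """Negative natural log-probability of the single best state path."""
    if not seq:
        raise ValueError("empty sequence")
    states = range(len(hmm.start))
    best = [_log(hmm.start[s]) + _log(hmm.emit[s].get(seq[0], 0.0)) for s in states]
    for symbol in seq[1:]:
        best = [max(best[p] + _log(hmm.trans[p][s]) for p in states)
                + _log(hmm.emit[s].get(symbol, 0.0)) for s in states]
    return -max(best)


def classify(seq: str, positive: DiscreteHMM, negative: DiscreteHMM,
             threshold: float = 0.0) -> str:
    """final NLL = NLL 1 (positive model) - NLL 2 (negative model)."""
    final_nll = viterbi_nll(positive, seq) - viterbi_nll(negative, seq)
    return "epitope" if final_nll <= threshold else "non-epitope"


# Toy one-state models over a two-letter alphabet (illustration only).
pos_model = DiscreteHMM([1.0], [[1.0]], [{"A": 0.9, "G": 0.1}])
neg_model = DiscreteHMM([1.0], [[1.0]], [{"A": 0.1, "G": 0.9}])
print(classify("AAGA", pos_model, neg_model))  # epitope
```

Because a lower NLL means a better fit, a negative final NLL says the sequence resembles the positive training set more than the negative one; the threshold shifts that boundary.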

Slide 19 of 38: ROC (RECEIVER OPERATING CHARACTERISTIC) ANALYSIS
A_ROC is the area under the ROC curve.
- A_ROC > 90%: excellent prediction
- A_ROC > 80%: good prediction
- A_ROC < 80%: unacceptable prediction
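A_ROC can be computed without drawing the curve: it equals the probability that a randomly chosen positive scores higher than a randomly chosen negative. A minimal sketch with the slide's grading thresholds; the NLL values are invented, and treating exactly 80% as "not acceptable" is an assumption because the slide leaves that case open.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve: probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count one half)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def grade(a_roc: float) -> str:
    """Slide's rule of thumb; a_roc is a fraction (0.80 counts as not acceptable here)."""
    if a_roc > 0.90:
        return "excellent"
    if a_roc > 0.80:
        return "good"
    return "not acceptable"


# With NLL-based scores, lower means more epitope-like, so negate them first.
pos_final_nll = [-4.4, -3.1, 0.5]   # invented values
neg_final_nll = [2.0, 3.5, -3.5]    # invented values
auc = roc_auc([-x for x in pos_final_nll], [-x for x in neg_final_nll])
print(auc, grade(auc))  # 7 of 9 pairs ranked correctly -> about 0.78, not acceptable
```

The pairwise loop is quadratic; for large test sets a rank-based formula gives the same value faster.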

Slide 20 of 38: RESULTS OF VALIDATION
Validation results of the 22 couples of models (11 matrices × 2 training algorithms) trained with the Baum-Welch and Viterbi algorithms in the indirect approach for the H-2-Db allele.

Slide 21 of 38: OPTIMAL PARAMETERS (HMM METHOD)
Name | Approach | Algorithm | Matrix | Accuracy (%)
Db_GBA90 | Indirect | Baum-Welch | PAM 90 | 85.30
Db_TBL75 | Direct | Baum-Welch | BLOSUM 75 | 86.00
Kb_GBL70 | Indirect | Baum-Welch | BLOSUM 62 | 79.80
Kb_TBL70 | Direct | Baum-Welch | BLOSUM 70 | 84.54
Kd_GBA50 | Indirect | Baum-Welch | PAM 50 | 83.55
Kd_TBL85 | Direct | Baum-Welch | BLOSUM 85 | 84.72
IAd_GBP70 | Indirect | Baum-Welch | PAM 70 | 77.41
IAd_TBA90 | Direct | Baum-Welch | PAM 90 | 77.84
IEd_GVL75 | Indirect | Viterbi | BLOSUM 75 | 92.77
IEd_TBA70 | Direct | Baum-Welch | PAM 70 | 93.90
IEk_GVL70 | Indirect | Viterbi | BLOSUM 70 | 95.11
IEk_TVL75 | Direct | Viterbi | BLOSUM 75 | 69.52

Slide 22 of 38: STEP 3: PARAMETER OPTIMIZATION (SVM METHOD)

Slide 23 of 38: LOOCV (LEAVE-ONE-OUT CROSS-VALIDATION)
- One peptide is removed from the training set.
- The model is built from the remaining data.
- The model is tested on the removed peptide.
- This is repeated for every peptide in the training set.
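The leave-one-out loop described on this slide can be sketched as follows. A hypothetical nearest-neighbour (Hamming distance) trainer stands in for SVM training so that the example stays self-contained; any function that maps training examples to a predictor can be plugged in instead.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, int]            # (peptide, label): 1 positive, 0 negative
Predictor = Callable[[str], int]


def loocv_accuracy(data: Sequence[Example],
                   train: Callable[[List[Example]], Predictor]) -> float:
    """Leave-one-out: train on all peptides but one, test on the one left out."""
    if len(data) < 2:
        raise ValueError("LOOCV needs at least two examples")
    correct = 0
    for i, (peptide, label) in enumerate(data):
        training = [ex for j, ex in enumerate(data) if j != i]
        predict = train(training)          # model built from remaining data
        correct += int(predict(peptide) == label)
    return correct / len(data)


def nearest_neighbour(training: List[Example]) -> Predictor:
    """Hypothetical toy trainer standing in for SVM training: predict the
    label of the training peptide with the smallest Hamming distance."""
    def hamming(a: str, b: str) -> int:
        return sum(x != y for x, y in zip(a, b))

    def predict(peptide: str) -> int:
        return min(training, key=lambda ex: hamming(ex[0], peptide))[1]
    return predict


toy = [("AAAAAAAAA", 1), ("AAAAAAAAC", 1), ("GGGGGGGGG", 0), ("GGGGGGGGC", 0)]
print(loocv_accuracy(toy, nearest_neighbour))  # 1.0
```

LOOCV trains one model per peptide, so it costs n trainings; that is affordable for data sets of a few hundred peptides but grows quickly with larger ones.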

Slide 24 of 38: THE ACCURACY (MHC CLASS I MODELS)
Chart: accuracy of the predictive models for each MHC class I allele, direct vs indirect method, after the LOOCV procedure.

Slide 25 of 38: THE ACCURACY (MHC CLASS II MODELS)
Chart: accuracy of the predictive models for each MHC class II allele, direct vs indirect method.

Slide 26 of 38: OPTIMAL PARAMETERS (MHC CLASS I)
Selected kernel function and optimal parameters per allele:
- H-2-Db: direct method, linear function (-t 0 -c 0.1111); indirect method, RBF function (-t 2 -c 1 -g 0.145)
- H-2-Kd: direct method, polynomial function (-t 1 -c 0.1 -d 3 -s 0.2 -r 2); indirect method, polynomial function (-t 1 -c 0.001 -d 3 -s 2.5 -r 8)
- H-2-Kb: direct method, linear function (-c 1.4); indirect method, RBF function (-t 2 -c 1 -g 0.115)
Kernel functions:
- Linear function
- Polynomial function
- RBF function
- Sigmoid function

Slide 27 of 38: OPTIMAL PARAMETERS (MHC CLASS II)
Selected kernel function and optimal parameters per allele:
- H-2-Db: linear function; direct method -t 0 -c 0.15, indirect method -t 0 -c 0.53
- H-2-Kd: linear function; direct method -t 0 -c 0.19, indirect method -t 0 -c 0.27
- H-2-Kb: linear function; direct method -t 0 -c 1.4, indirect method -t 2 -c 1 -g 0.115
Kernel functions:
- Linear function
- Polynomial function
- RBF function
- Sigmoid function
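The option strings in these tables (-t, -c, -d, -g, -s, -r) match the command-line options of SVMlight: -t selects the kernel (0 linear, 1 polynomial, 2 RBF, 3 sigmoid), -d, -g, -s and -r set kernel parameters, and -c is the trade-off between training error and margin rather than a kernel parameter. The slides do not name the SVM package, so this mapping is an assumption. A sketch of the four kernels under that parameterization (default argument values are illustrative only):

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def linear_kernel(a: Vector, b: Vector) -> float:
    """-t 0: K(a, b) = a . b"""
    return dot(a, b)


def polynomial_kernel(a: Vector, b: Vector, d: int = 3,
                      s: float = 1.0, r: float = 1.0) -> float:
    """-t 1: K(a, b) = (s * a . b + r) ** d"""
    return (s * dot(a, b) + r) ** d


def rbf_kernel(a: Vector, b: Vector, g: float = 1.0) -> float:
    """-t 2: K(a, b) = exp(-g * ||a - b||^2)"""
    return math.exp(-g * sum((x - y) ** 2 for x, y in zip(a, b)))


def sigmoid_kernel(a: Vector, b: Vector, s: float = 1.0, r: float = 1.0) -> float:
    """-t 3: K(a, b) = tanh(s * a . b + r)"""
    return math.tanh(s * dot(a, b) + r)


# H-2-Kd direct-method setting from the class I table: -d 3 -s 0.2 -r 2.
print(polynomial_kernel([1, 0, 1], [1, 1, 0], d=3, s=0.2, r=2))
```

With one-hot peptide vectors, the dot product simply counts positions where two peptides share a residue, so the linear kernel is a positional identity score and the other kernels are nonlinear transforms of it (or, for RBF, of the number of mismatches).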

Slide 29 of 38: EPITOPE PREDICTION RESULTS (SVM METHOD)
Columns: MHC class I (H-2-Db, H-2-Kb, H-2-Kd) and MHC class II (H-2-IAd, H-2-IEd, H-2-IEk).
- HA: MHC binders 3341565101235571982458; putative epitopes 17565618129736757872285; epitope candidates 268984694938469225
- NA: MHC binders 26191169425951076236; putative epitopes 1109383977425363391555; epitope candidates 192560309791213123
- M: MHC binders 24954925613038; putative epitopes 10431810625865130; epitope candidates 136517794421

Slide 30 of 38: EPITOPE PREDICTION RESULTS (HMM METHOD)
Columns: protein; indirect method; direct method; compared results.
- HA: 1138667522960
- NA: 665856342171
- Matrix: 929361189

Slide 31 of 38: TOTAL NUMBER OF EPITOPES IN INFLUENZA A VIRUS
Table 7: The number of epitopes predicted by both the HMM and SVM methods (rows: allele; columns: proteins HA, NA, M).
- H-2-Db: 15120
- H-2-Kd: 56141

Slide 32 of 38: EPITOPE PREDICTION RESULTS (EXAMPLES)
- H-2-Kd, >Q67157|M1_IAAIC Matrix protein 1, Influenza A virus (strain A/Aichi/2/1968 H3N2); 3 epitopes (start-stop, sequence):
  99-107 YRKLKREIT
  129-137 LIYNRMGAV
  131-139 YNRMGAVTT
- H-2-Kb, >P03445|HEMA_IADM1 Hemagglutinin [Contains: Hemagglutinin HA1 chain] (Fragment), Influenza A virus (strain A/Duck/Memphis/546/1976 (H11N9)); 8 epitopes (start-stop, sequence):
  10-18 IICIRADE
  21-29 GYLSNNST
  44-52 SVELVENE
  58-66 SIDGKAPI
  69-77 DCSFAGWI
  74-82 GWILGNPM
  90-98 SWSYIVEN
  92-100 SYIVENQS

Slide 33 of 38: WEB PREDICTION TOOL FOR HMM METHOD

Slide 34 of 38: WEB PREDICTION TOOL FOR HMM METHOD (CONT.)
Screenshot of the results page: positive results, negative results, number of positive sequences, number of negative sequences.

Slide 35 of 38: CONCLUSIONS
- SVM method: the indirect approach gives the more accurate models.
  - MHC class I: H-2-Db (86.58%), H-2-Kb (80.25%), H-2-Kd (83.45%)
  - MHC class II: H-2-IEd (93.26%), H-2-IEk (95.19%), H-2-IAd (89.42%)
- HMM method: the direct approach gives the more accurate models for five of the six alleles.
  - MHC class I: H-2-Db (86%), H-2-Kb (84.54%), H-2-Kd (84.72%)
  - MHC class II: H-2-IEd (93.90%), H-2-IEk (95.11%, indirect approach; the direct model reached 69.52%), H-2-IAd (77.84%)

Slide 36 of 38: CONCLUSIONS
- Built HMM and SVM models with high accuracy for T-cell epitope prediction (MHC class I and II), using a direct approach (epitope prediction) and an indirect approach (MHC binder prediction).
- Successfully applied these models to predict epitopes of influenza A virus proteins for in silico vaccine design.

Slide 37 of 38: FUTURE WORK
- Apply this tool to other proteins.
- Make all programs available through the web.
- B-cell epitope prediction.
- Test the results by biological experiments.
- …

Slide 38 of 38: THANK YOU FOR YOUR ATTENTION

