Predicting Protein Solvent Accessibility with Sequence, Evolutionary Information and Context-based Features 12/05/2013 Ashraf Yaseen Department of Mathematics.


1 Predicting Protein Solvent Accessibility with Sequence, Evolutionary Information and Context-based Features
12/05/2013
Ashraf Yaseen, Department of Mathematics & Computer Science, Central State University, Wilberforce, Ohio
Yaohang Li, Computer Science Department, Old Dominion University, Norfolk, Virginia
BIOT 2013: Biotechnology and Bioinformatics Symposium

2 Contents
- Introduction
- Research Objective
- Background
- Method
  - Protein data sets
  - Context-based features
  - Neural Network model
- Results
- Summary

3 Introduction
- The solvent-accessible surface area, or accessibility, of a residue is the surface area of the residue that is exposed to solvent.
- Residue accessibility is a useful indicator of the residue's location: on the surface or in the core.
(Figure: surface area of a protein segment)

4 Introduction-cont.
- The DSSP program calculates the absolute solvent accessibility values of proteins.
- Relative values are calculated as the ratio between a residue's absolute solvent accessibility and its accessibility in an extended tripeptide (Ala-X-Ala) conformation.
  - This allows comparison between the accessibilities of different amino acids in proteins.
- A threshold of 0.25 on the relative value defines the 2-state labels (exposed if > 0.25, buried otherwise).
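The ratio and the 0.25 threshold described on this slide can be sketched in Python as follows. This is a minimal illustration, not DSSP's interface: the function names are hypothetical, the Ala-X-Ala reference area is supplied by the caller, and the numbers in the usage lines are invented placeholders rather than values from DSSP or from the slides.

```python
def relative_accessibility(absolute_asa, reference_asa):
    """Ratio of a residue's absolute ASA to its Ala-X-Ala reference ASA."""
    if reference_asa <= 0:
        raise ValueError("reference_asa must be positive")
    return absolute_asa / reference_asa


def two_state(rsa, threshold=0.25):
    """Return 'E' (exposed) if rsa > threshold, otherwise 'B' (buried)."""
    return "E" if rsa > threshold else "B"


# Placeholder numbers for illustration only (areas in square angstroms).
rsa = relative_accessibility(absolute_asa=30.0, reference_asa=200.0)
print(rsa, two_state(rsa))  # 0.15 B
```

Note that a residue sitting exactly at the threshold is labeled buried, matching the "exposed if > 0.25" wording.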

5 Prediction effectiveness
- Residue solvent accessibility plays an important role in folding and in enhancing proteins' thermodynamic and mechanical stability.
- The burial of hydrophobic residues in the core is a major driving force for folding.
- Active sites of proteins are located on their surfaces.
- Predicted accessibility reduces the conformational space, which aids modeling protein structures in three dimensions.
- It also helps predict important protein functions.

6 Predicting Structural Features in Protein Modeling
- Correctly predicting structural features is a critical stepping stone toward correct 3D models.
(Figure: protein modeling, from sequence through intermediate prediction steps to 3D structure)

7 Protein Structural Features
(Figure: protein 1BOO, chain A, with panels labeled as follows)
- Secondary structure: general 3D form of local segments of residues
- Disulfide bond in protein chain
- Surface area of a protein segment
- Properties of the residues in proteins

8 Background
- Many methods, using different protein datasets and different computational methods
  - Neural networks, support vector machines, nearest neighbor, information theory, and Bayesian statistics
- The prediction is in a discrete fashion
- Significant accuracy increase when using evolutionary information
  - 2-state prediction accuracy of ~75% with 0.25 threshold
- PSI-BLAST derived profiles
  - 2-state prediction accuracy of ~78%

9 Background-cont.
Structural feature prediction as classification: each residue is predicted to be in one of a few states.
- Secondary structure prediction
  - 3-state (helix, sheet, coil)
  - 8-state (α-helix, π-helix, 3₁₀-helix, β-strand, β-bridge, turn, bend and others)
- Residue solvent accessibility prediction
  - 2-state (buried or exposed)
- Disulfide bonding prediction
  - Stage 1: bonding state prediction (bonded/free)
  - Stage 2: connectivity prediction (connected, not connected)
(Figure: a machine learning predictor (ANN, SVM, HMM, ...) outputs the structural feature (state) of residue Ri)

10 Statement of the Problem
- The improvement of prediction methods benefits from the incorporation of effective features
  - MSA in machine learning
- The accuracy of current prediction methods has stagnated over the past few years
  - 2-state solvent accessibility: ~78%
  - 3-state secondary structure: ~76-80%
  - 8-state secondary structure: ~68%

11 Statement of the Problem-cont.
- How can we keep improving the accuracy of predicting protein structural features toward their theoretical upper bounds?
- Reducing the inaccuracy of structural feature prediction will be very useful for improving the efficiency of protein tertiary structure prediction
  - The search space for finding a tertiary structure grows super-linearly with the fraction of inaccuracy in structural feature prediction

12 Our Approach
- Extracting and selecting "good" features can significantly enhance the prediction performance
- Probably the most effective features, when predicting the structural state of a residue, are the structural states of the neighboring residues
  - With true states: >90%
(Figure: residue Ri with the structural states of its neighbors. Solvent accessibility: B = buried, E = exposed. Secondary structure: H = helix, E = sheet, C = coil.)

13 Our Approach-cont.
- Unfortunately, using the true structural states as features is not feasible
- However, this suggests that the favorability of a residue adopting a certain structural state can also be an effective feature
- Statistical scores measuring the favorability of a residue adopting a certain structural state within its amino acid environment can be evaluated from the experimentally determined protein structures in the Protein Data Bank (PDB)

14 Our Approach-cont.
(Figure: input encoding, consisting of sequence & evolutionary info (MSA) plus structure info (context-based scores), feeds a predictor that outputs the structural feature (state) of Ri)
- We expect our approach to improve the prediction of protein structural features, with the goal of achieving high accuracy levels

15 Method
Context-based features: potential scores
- calculated from the context-based statistics derived from the protein datasets
- estimate the favorability of residues adopting specific structural states within their amino acid environment
(Figure: context-based model)
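One way to turn context statistics into a favorability score is a log-odds ratio, sketched below. The slides do not give the formula for the potential, so the log-odds form, the single-neighbor context, the pseudocount, the function name `context_scores`, and the toy data are all assumptions made for illustration only.

```python
import math
from collections import Counter


def context_scores(dataset, offset=1, pseudocount=1.0):
    """Log-odds favorability of state c for residue r with neighbor x at `offset`.

    dataset: iterable of (sequence, states) string pairs of equal length.
    Returns {(r, x, c): log(P(c | r, x) / P(c))}. This log-odds form is an
    illustrative assumption, not the formula used in the presentation.
    """
    context_state = Counter()  # counts of (r, x, c)
    context_total = Counter()  # counts of (r, x)
    state_total = Counter()    # counts of c over all residues
    for seq, states in dataset:
        if len(seq) != len(states):
            raise ValueError("sequence and state strings must align")
        for i, (r, c) in enumerate(zip(seq, states)):
            state_total[c] += 1
            j = i + offset
            if 0 <= j < len(seq):
                context_state[(r, seq[j], c)] += 1
                context_total[(r, seq[j])] += 1
    n_states = len(state_total)
    n_residues = sum(state_total.values())
    scores = {}
    for (r, x), total in context_total.items():
        for c in state_total:
            # Smoothed conditional probability of state c in this context.
            p_context = (context_state[(r, x, c)] + pseudocount) / (
                total + pseudocount * n_states
            )
            p_background = state_total[c] / n_residues
            scores[(r, x, c)] = math.log(p_context / p_background)
    return scores


# Toy data (invented): two short sequences with buried/exposed labels.
toy = [("ALKAL", "BBEBB"), ("KALKA", "EBBEB")]
scores = context_scores(toy)
print(scores[("A", "L", "B")] > scores[("A", "L", "E")])  # True
```

A positive score means the state is seen more often in that context than in the data set overall; a real implementation would draw its counts from a large set of experimentally determined structures and likely use wider contexts.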

16 Context-based Statistics & Potentials
(Figure: diagrams relating residue Ri, its structural state Ci, and neighboring residues X and Y)

17 Encoding & Neural Network Model
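The transcript keeps only the title of this slide, so the encoding details are not recoverable. A common way to feed per-residue features (for example PSSM columns followed by context-based scores) to a neural network is a sliding window centered on the target residue. The sketch below shows that general technique under stated assumptions: the window size, the zero padding, and the name `window_encode` are hypothetical and not taken from the presentation.

```python
def window_encode(per_residue_features, half_window=2):
    """Concatenate the feature vectors of residues i-half_window..i+half_window.

    per_residue_features: list of equal-length lists, one per residue.
    Positions that fall off either end of the chain are zero-padded.
    """
    if not per_residue_features:
        return []
    width = len(per_residue_features[0])
    padding = [0.0] * width
    n = len(per_residue_features)
    encoded = []
    for i in range(n):
        row = []
        for j in range(i - half_window, i + half_window + 1):
            row.extend(per_residue_features[j] if 0 <= j < n else padding)
        encoded.append(row)
    return encoded


# Invented 2-feature vectors for a 3-residue chain.
features = [[1.0, 0.1], [2.0, 0.2], [3.0, 0.3]]
inputs = window_encode(features, half_window=1)
print(inputs[0])  # [0.0, 0.0, 1.0, 0.1, 2.0, 0.2]
```

Each encoded row would then be one input vector for the network, with the 2-state label of the central residue as the target.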

18 Results

Comparison of Q2 accuracy between our and other popularly used solvent accessibility prediction servers

Server        Measure  CASP9   Manesh215  Carugo338
NETASA        Q2       69.32   71.09      69.7
              QB       70.86   72.1       72.04
              QE       67.59   69.9       67.22
Sable t=0.2   Q2       78.47   79.83      78.68
              QB       78.27   80.2       78.48
              QE       78.69   79.4       78.91
Sable t=0.3   Q2       75.13   77.04      75.94
              QB       89.55   91.08      90.29
              QE       59.58   60.35      60.33
Netsurf       Q2       79.15   80.83      80.04
              QB       83.35, 81.27 (third value missing in the transcript; column assignment unclear)
              QE       78.19   78.49      78.13
SPINE         Q2       77.86   80.5       79.68
              QB       83.22   85.3       85.33
              QE       72.08   74.8       73.53
ACCpro        Q2       76.18   78.87      77.99
              QB       81.15   83.19      83.12
              QE       70.81   73.76      72.41
Casa          Q2       80.82   81.93      81.14
              QB       81.46   84.27      83.65
              QE       80.13   79.14      78.39

Comparison of prediction performance of solvent accessibility using PSSM only and PSSM with context-based scores on Cull using 7-fold cross validation

              QB       QE       Q2
PSSM Only     78.44%   80.61%   79.50%
PSSM+Score    79.21%   82.00%   80.76%

QB and QE measure the quality of predicting the buried state and the exposed state, respectively.
Q2 = total number of residues correctly predicted / total number of residues
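The accuracy measures can be computed from an observed and a predicted state string as in the sketch below. Q2 follows the definition on the slide. The slide does not spell out formulas for QB and QE, so treating them as per-state recall (correct predictions divided by observed residues in that state) is an assumption, as are the function name and the choice to skip positions without a B/E label.

```python
def accuracy_scores(observed, predicted):
    """Return (Q2, QB, QE) as percentages for aligned 'B'/'E' strings.

    Q2: correctly predicted residues / all residues (as on the slide).
    QB, QE: correct / observed for each state (assumed definition).
    Positions whose observed label is neither 'B' nor 'E' are skipped.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    correct = {"B": 0, "E": 0}
    total = {"B": 0, "E": 0}
    for obs, pred in zip(observed, predicted):
        if obs not in total:
            continue
        total[obs] += 1
        if obs == pred:
            correct[obs] += 1

    def percent(numerator, denominator):
        return 100.0 * numerator / denominator if denominator else float("nan")

    q2 = percent(correct["B"] + correct["E"], total["B"] + total["E"])
    return q2, percent(correct["B"], total["B"]), percent(correct["E"], total["E"])


# Invented toy strings: 4 of 6 positions agree.
q2, qb, qe = accuracy_scores("BBEEBE", "BEEEBB")
print(round(q2, 2), round(qb, 2), round(qe, 2))
```

Reporting QB and QE next to Q2 shows whether a predictor trades one state for the other, as the Sable t=0.3 rows in the table illustrate.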

19 Results-cont.
Solvent accessibility prediction on protein 3NRF(A)

3NRF-A      DAVMVFARQGDKGSVSVGDKHFRTQAFKVRLVNAAKSEISLKNSCLVAQSAAGQSFRLDTVDEELTADTLKPGASVEGDAIFASEDDAVYGASLVRLSDRCK
DSSP SA2    EEB.BEBEEEEEEEEEEEEEEEEBBBBEBEBBBEBEEEBEBEEEBBBBBBEEEEEBEEEEEEEEBEEEEBEEEEEBEBEBEBBBEEEBBEEBBBBBBBEEEE
PSSM Only   EEB.BBBBEEEEBBBBEEEEEEBBBEBEBBBBEBEEEEBEBEEBBBBBBBEEEEEBEBEEBEEEBEEEBBEEEEEBEBBBBBBBEEEEBBEBEBBEBBEEBE   Q2 = 73.58
PSSM+Score  EEB.BBBEEEEEEEBEEEEEEBBBBEBEBBBBEEEEEEBEBEEBBBBBBBEEEEEBEBEEBEEEBEEEEBEEEEEBEBBBBBBBEEEBBBEBBBBEBBEEBE   Q2 = 80.19

20 Working with Casa
- Input a title
- Input your sequence
- Input your e-mail
- Submit, then wait for the results...
"Casa" is available at: http://hpcr.cs.odu.edu/casa

21 Working with Casa
- Check your e-mail and click the link provided
- The results are displayed

22 Summary
- Our computational results, in N-fold cross validation and on benchmarks, demonstrate the effectiveness of context-based features: prediction accuracy improves for secondary structure, disulfide bonding and solvent accessibility.
- Web servers implementing our prediction methods are currently available.
  - Dinosolve, available at: http://hpcr.cs.odu.edu/dinosolve
  - C3-Scorpion, available at: http://hpcr.cs.odu.edu/c3scorpion
  - C8-Scorpion, available at: http://hpcr.cs.odu.edu/c8scorpion
  - Casa, available at: http://hpcr.cs.odu.edu/casa

23 Publications
1. Ashraf Yaseen and Yaohang Li, "Enhancing Protein Disulfide Bonding Prediction Accuracy with Context-based Features", Proceedings of Biotechnology and Bioinformatics Symposium (BIOT2012), Provo, 2012.
2. Ashraf Yaseen and Yaohang Li, "Dinosolve: A Protein Disulfide Bonding Prediction Server using Context-based Features to Enhance Prediction Accuracy". Accepted, BMC Bioinformatics, 2013.
3. Ashraf Yaseen and Yaohang Li, "Template-based Prediction of Protein 8-state Secondary structures". 3rd IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), New Orleans, April 2013. Accepted, BMC Bioinformatics.
4. Ashraf Yaseen and Yaohang Li, "Predicting Protein Solvent Accessibility with Sequence, Evolutionary Information and Context-based Features". Accepted at BIOT2013.
5. Ashraf Yaseen and Yaohang Li, "Context-based features can enhance protein secondary structure prediction accuracy". Submitted to Bioinformatics.
6. Ashraf Yaseen and Yaohang Li, "Accelerating Knowledge-based Energy Evaluation in Protein Structure Modeling with Graphics Processing Units", Journal of Parallel and Distributed Computing, 72(2): 297-307, 2012.

24 Acknowledgement
- This work is partially supported by NSF through grant 1066471 and by an ODU SEECR grant.

25 Questions? Thank You

