Matching Protein β-Sheet Partners by Feedforward and Recurrent Neural Networks. Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB2000).

Presentation transcript:

Matching Protein β-Sheet Partners by Feedforward and Recurrent Neural Networks
Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB2000), pp
P. Baldi, G. Pollastri, C. Anderson, and S. Brunak
Cho, Dong-Yeon

Introduction
Prediction of the Secondary Structure of Proteins
- Understanding their three-dimensional conformations
- α-helices are built up from one contiguous region of the polypeptide chain.
- β-sheets are built up from a combination of several disjoint regions.
Previous Studies
- The best existing methods for predicting protein secondary structure achieve prediction accuracy in the 75-77% range.
- The β-sheet is almost invariably the weakest category in terms of correct percentages.
Prediction of Amino Acid Partners in β-sheets

Data Preparation
Selecting the Data
- 826 protein chains from the PDB select list of June 1998
Assigning β-sheet Partners
- (Figure: partner pairs A2-B2, A3-B3, B2-C2, B3-C3, C2-D2, C3-D3)

Statistical Analysis
First Order Statistics
- The frequency of occurrence of each amino acid
- (Figure panels: general amino acid frequencies in the data; amino acid frequencies in β-sheets)

- The ratio of the frequencies in β-sheets to the frequencies in the data as a whole
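The first order statistics above can be illustrated with a small sketch. It assumes each chain is available as a string of one-letter residue codes plus a per-residue flag saying whether the residue lies in a β-sheet; the function names and the toy sequence are hypothetical and not taken from the paper.

```python
from collections import Counter

def frequencies(residues):
    """Relative frequency of each amino acid in an iterable of residues."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

def sheet_ratio(sequence, in_sheet):
    """Ratio of beta-sheet frequency to overall frequency per amino acid.

    sequence: string of one-letter codes (assumed input format).
    in_sheet: list of booleans, True where the residue is in a beta-sheet.
    """
    overall = frequencies(sequence)
    sheet = frequencies(aa for aa, flag in zip(sequence, in_sheet) if flag)
    return {aa: sheet.get(aa, 0.0) / overall[aa] for aa in overall}

# Toy example (made-up data, for illustration only).
toy_sequence = "AVVIAGVK"
toy_mask = [False, True, True, True, False, False, True, False]
ratios = sheet_ratio(toy_sequence, toy_mask)
```

A ratio above 1 means the amino acid is over-represented in β-sheets relative to the whole data set; a ratio below 1 means it is under-represented.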

Second Order Statistics
- The conditional probabilities P(X|Y) of observing an amino acid X given that its partner in a β-sheet is Y
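As a rough illustration of how P(X|Y) could be estimated, the sketch below counts partner pairs and normalizes by the partner residue. It assumes the pairs are given as tuples of one-letter codes and that partnership is symmetric, so each pair is counted in both directions; the function name and the toy pairs are hypothetical.

```python
from collections import Counter, defaultdict

def partner_conditionals(pairs):
    """Estimate P(X | Y) from a list of (x, y) partner residue pairs.

    Returns a nested dict where cond[y][x] is the estimated probability
    of observing x given that its partner is y.
    """
    joint = Counter()
    for x, y in pairs:
        # Partnership is treated as symmetric (an assumption of this sketch).
        joint[(x, y)] += 1
        joint[(y, x)] += 1
    marginal = Counter()
    for (x, y), n in joint.items():
        marginal[y] += n
    cond = defaultdict(dict)
    for (x, y), n in joint.items():
        cond[y][x] = n / marginal[y]
    return cond

# Toy example (made-up pairs, for illustration only).
toy_pairs = [("V", "I"), ("V", "V"), ("I", "T")]
p = partner_conditionals(toy_pairs)
```

For every partner residue Y, the values in `p[Y]` sum to 1, which is a quick sanity check on the counts.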

- Logo representation

Length Distribution
- Interval distances between paired β-strands, measured in residue positions along the chain

Artificial Neural Network Architecture
Feedforward Neural Network
- Large input windows
  - They tend to dilute the sparse information in the input that is really relevant for the prediction.
- Two-window approach
  - One can either provide the distance information as a third input to the system or train a different architecture for each distance type.

- The architecture
  - Two input windows of length W
  - The number D of amino acids separating the two windows is also given to the architecture as an input unit, with scaled activity D/100.
  - The goal is to output a probability reflecting whether or not the two amino acids located at the center of each window are partners.
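The input described above might be assembled as in the following sketch. Several details are assumptions made for illustration and are not stated on the slides: a 20-letter one-hot encoding per residue, zero vectors for window positions that fall outside the chain, an odd window length, and D taken as the separation between the two window centers.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)

def one_hot(residue):
    """One-hot vector for a residue; all zeros for padding or unknown letters."""
    vec = [0.0] * len(AMINO_ACIDS)
    idx = AMINO_ACIDS.find(residue)
    if idx >= 0:
        vec[idx] = 1.0
    return vec

def window(sequence, center, w):
    """Encode the window of odd length w centered on position `center`."""
    half = w // 2
    encoded = []
    for pos in range(center - half, center + half + 1):
        residue = sequence[pos] if 0 <= pos < len(sequence) else "-"
        encoded.extend(one_hot(residue))
    return encoded

def encode_pair(sequence, i, j, w=7):
    """Two windows plus one distance unit with scaled activity D/100."""
    d = abs(j - i)  # assumption: D is the separation of the window centers
    return window(sequence, i, w) + window(sequence, j, w) + [d / 100.0]

# Toy example: candidate partners at positions 4 and 20 of a short chain.
x = encode_pair("MTYKLILNGKTKGETTTEAVDAATAEK", 4, 20)
```

With W = 7 this gives 2 × 7 × 20 + 1 = 281 input units. Keeping the two windows short while passing D explicitly is what lets the network see both candidate residues without one very large window.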

Recurrent Neural Network
- Bi-directional recurrent neural network (BRNN)
  - Input layer
  - Forward and backward Markov chain
  - Output layer
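The three components listed above can be sketched as a generic BRNN forward pass: a forward chain of hidden states read left to right, a backward chain read right to left, and an output at each position computed from both states and the local input. This is a minimal, untrained sketch with random weights; the state size, the activations, and all function names are assumptions, and it produces one output per position rather than reproducing the partner-prediction architecture of the paper.

```python
import math
import random

def layer(n_out, n_in, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def run_layer(params, inputs, activation):
    """Apply a fully connected layer followed by an activation."""
    weights, biases = params
    return [activation(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def brnn_outputs(inputs, n_state=3, seed=0):
    """One output in (0, 1) per position from forward and backward chains."""
    rng = random.Random(seed)
    n_in = len(inputs[0])
    forward_net = layer(n_state, n_state + n_in, rng)
    backward_net = layer(n_state, n_state + n_in, rng)
    output_net = layer(1, 2 * n_state + n_in, rng)

    # Forward chain: state at t depends on state at t-1 and input at t.
    forward, state = [], [0.0] * n_state
    for x in inputs:
        state = run_layer(forward_net, state + x, math.tanh)
        forward.append(state)

    # Backward chain: state at t depends on state at t+1 and input at t.
    backward, state = [None] * len(inputs), [0.0] * n_state
    for t in reversed(range(len(inputs))):
        state = run_layer(backward_net, state + inputs[t], math.tanh)
        backward[t] = state

    # Output layer: combines both states with the local input.
    return [run_layer(output_net, f + b + x, logistic)[0]
            for f, b, x in zip(forward, backward, inputs)]

toy_inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
probs = brnn_outputs(toy_inputs)
```

Because each output sees a forward state summarizing everything to its left and a backward state summarizing everything to its right, context beyond a fixed window can influence the prediction.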

Experiments and Results
Data
- The data are randomly split: 2/3 for training and 1/3 for testing.
- The data are extremely unbalanced.
- At each epoch, all the positive examples are presented together with randomly selected negative examples.
- The total balanced percentage is the average of the two percentages obtained on the positive and negative examples.
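The sampling scheme and the balanced percentage described above can be sketched as follows. The slides do not say how many negatives are drawn per epoch; drawing as many negatives as there are positives is an assumption of this sketch, and the function names are hypothetical.

```python
import random

def epoch_examples(positives, negatives, rng):
    """All positives plus a fresh random subset of negatives for one epoch.

    Assumption: the subset has the same size as the positive set.
    """
    sampled = rng.sample(negatives, min(len(positives), len(negatives)))
    batch = [(x, 1) for x in positives] + [(x, 0) for x in sampled]
    rng.shuffle(batch)
    return batch

def balanced_percentage(labels, predictions):
    """Average of percent correct on positives and percent correct on negatives."""
    pos = [p for y, p in zip(labels, predictions) if y == 1]
    neg = [p for y, p in zip(labels, predictions) if y == 0]
    pos_correct = 100.0 * sum(1 for p in pos if p == 1) / len(pos)
    neg_correct = 100.0 * sum(1 for p in neg if p == 0) / len(neg)
    return (pos_correct + neg_correct) / 2.0

# Toy example: 1 of 2 positives and 3 of 4 negatives are classified correctly.
toy_labels = [1, 1, 0, 0, 0, 0]
toy_predictions = [1, 0, 0, 0, 0, 1]
score = balanced_percentage(toy_labels, toy_predictions)

batch = epoch_examples(["p1", "p2"], ["n1", "n2", "n3", "n4", "n5"], random.Random(0))
```

In the toy example plain accuracy would be 4/6, about 66.7%, while the balanced percentage is (50 + 75) / 2 = 62.5. On extremely unbalanced data the difference is much larger, since a predictor that always answers "not partners" scores near 100% plain accuracy but only 50% balanced.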

Results
- Feedforward neural network
  - The best architecture

  - The predicted second order statistics

  - Five-fold cross validation
- BRNN architecture
  - Three values (7, 9, and 11) are used as the size of the two input windows.
  - Length 7 again yields the best performance.
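For readers unfamiliar with five-fold cross validation, a minimal index-splitting sketch follows. Whether the folds in the study were formed over protein chains or over individual residue pairs is not stated on the slides; the sketch simply splits item indices, and the function name and seed are hypothetical.

```python
import random

def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train, test) index lists; each item is held out exactly once."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [idx for f, fold in enumerate(folds) if f != held_out for idx in fold]
        yield train, test

# Example with 826 items, matching the number of chains in the data set.
splits = list(k_fold_indices(826))
```

The model is trained five times, each time on four folds and evaluated on the fifth, and the five test scores are then averaged.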

  - Five-fold cross validation
- Ensemble architecture
  - The ensemble of 3 BRNNs
  - Five-fold cross validation
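The slides do not say how the outputs of the three BRNNs are combined. One common choice, shown here purely as an assumption, is to average the predicted probabilities of the individual models for each candidate pair.

```python
def ensemble_average(per_model_probs):
    """Average the predicted probabilities of several models, pair by pair.

    per_model_probs: one list of probabilities per model, all the same length.
    Averaging is an assumed combination rule, not one confirmed by the slides.
    """
    n_models = len(per_model_probs)
    return [sum(column) / n_models for column in zip(*per_model_probs)]

# Toy example: three models scoring two candidate pairs (made-up numbers).
avg = ensemble_average([[0.9, 0.2], [0.7, 0.4], [0.8, 0.3]])
```

Averaging tends to smooth out errors that the individual models do not share, which is the usual motivation for an ensemble.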

- Summary of all the five-fold cross validation results
- Profile approach
  - Profiles were used as input to the artificial neural network.
  - The overall performance is comparable, but not any better.
  - Profiles may provide more robust first order statistics, but weaker intrasequence correlation.

Discussion
We have developed a NN architecture that predicts β-sheet amino acid partners with a balanced performance close to 84% correct prediction.
- It is insufficient by itself to reliably predict strand pairing because of the large number of false positive predictions.
Some directions for future work
- Profiles on the BRNNs
- Reduce the number of false positive predictions
- Improve the quality of the match
- Use of raw sequence information in addition to profiles
- β-sheet predictor
- Various combinations of the present architectures