
Matching Protein β-Sheet Partners by Feedforward and Recurrent Neural Network

1 Matching Protein β-Sheet Partners by Feedforward and Recurrent Neural Network
Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB2000), pp. 25-36
P. Baldi, G. Pollastri, C. Anderson, and S. Brunak
Cho, Dong-Yeon

2 Introduction
- Prediction of the secondary structure of proteins
  - Important for understanding their three-dimensional conformations
  - α-helices are built up from one contiguous region of the polypeptide chain.
  - β-sheets are built up from a combination of several disjoint regions.
- Previous studies
  - The best existing methods for predicting protein secondary structure achieve prediction accuracy in the 75-77% range.
  - The β-sheet is almost invariably the weakest category in terms of percentage correct.
- Prediction of amino acid partners in β-sheets

3 Data Preparation
- Selecting the data
  - 826 protein chains from the PDB select list of June 1998
- Assigning β-sheet partners
  - Figure example (strands A to D): partner pairs A2-B2, A3-B3, B2-C2, B3-C3, C2-D2, C3-D3

4 Statistical Analysis
- First-order statistics
  - The frequency of occurrence of each amino acid
  - Tables: general amino acid frequencies in the data; amino acid frequencies in β-sheets

5 Ratio of each amino acid's frequency in β-sheets to its frequency in the whole data set
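The first-order statistics and the enrichment ratio can be computed as shown below. This is a minimal Python sketch. The input format is a hypothetical choice: a list of sequences plus a parallel per-residue True/False mask marking β-sheet residues. The function names are also illustrative; the slides do not describe how the counts were tabulated.

```python
from collections import Counter

def aa_frequencies(residues):
    """Relative frequency of each amino acid (one-letter codes) in an iterable of residues."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

def sheet_enrichment(chains, sheet_masks):
    """Ratio of each amino acid's frequency inside beta-sheets to its overall frequency.

    chains: list of sequences; sheet_masks: parallel list of per-residue booleans
    (True where the residue is in a beta-sheet). Hypothetical input format.
    """
    all_res = [aa for seq in chains for aa in seq]
    sheet_res = [aa for seq, mask in zip(chains, sheet_masks)
                 for aa, in_sheet in zip(seq, mask) if in_sheet]
    overall = aa_frequencies(all_res)
    in_sheet = aa_frequencies(sheet_res)
    # Amino acids never seen in a sheet get ratio 0.0.
    return {aa: in_sheet.get(aa, 0.0) / overall[aa] for aa in overall}
```

A ratio above 1 means the amino acid is over-represented in β-sheets relative to the data set as a whole.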

6 Second-Order Statistics
- The conditional probabilities P(X|Y) of observing amino acid X given that its partner in a β-sheet is Y
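The conditional probabilities P(X|Y) can be estimated by counting partner pairs. The sketch below assumes the pairs are already extracted as (amino acid, amino acid) tuples. Counting each pair in both directions is an assumption made here because the partner relation is symmetric.

```python
from collections import Counter

def partner_conditionals(pairs):
    """Estimate P(X | Y) from beta-sheet partner pairs.

    pairs: iterable of (a, b) one-letter codes for two residues paired in a sheet.
    Each pair is counted in both directions because partnership is symmetric.
    Returns cond with cond[Y][X] == P(X | Y).
    """
    joint = Counter()
    for a, b in pairs:
        joint[(a, b)] += 1
        joint[(b, a)] += 1
    # Total number of residues whose partner is Y.
    partner_totals = Counter()
    for (_, y), n in joint.items():
        partner_totals[y] += n
    cond = {}
    for (x, y), n in joint.items():
        cond.setdefault(y, {})[x] = n / partner_totals[y]
    return cond
```

For each partner Y, the values cond[Y][X] sum to 1 over X.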

7 Logo representation
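The slide does not state how the logo was built. The sketch below assumes the standard sequence-logo convention:

- each stack's total height is log2(20) minus the Shannon entropy of the distribution, in bits;
- each letter's height is proportional to its probability;
- no small-sample correction is applied.

Under these assumptions, one stack for a distribution such as P(·|Y) could be computed as follows.

```python
import math

def logo_heights(dist, alphabet_size=20):
    """Letter heights for one logo stack (standard sequence-logo convention, assumed).

    dist: dict mapping amino acid -> probability (summing to 1).
    Stack height = log2(alphabet_size) - entropy in bits; letters share it by probability.
    """
    entropy = -sum(p * math.log2(p) for p in dist.values() if p > 0)
    info = math.log2(alphabet_size) - entropy
    return {aa: p * info for aa, p in dist.items()}
```

A fully conserved distribution gives the maximum stack height of about 4.32 bits. A uniform distribution over all 20 amino acids gives a stack of height zero.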

8 Length Distribution
- Interval distances between paired β-strands, measured in residue positions along the chain
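A distance histogram of this kind can be sketched as below. Two assumptions are made: the separation is measured between partner residues as |i - j| in chain positions, and the bin width of 10 residues is arbitrary. The slide does not specify either.

```python
from collections import Counter

def distance_histogram(partner_index_pairs, bin_width=10):
    """Histogram of chain separations |i - j| between partner residues.

    partner_index_pairs: iterable of (i, j) residue positions on the same chain.
    Keys are the lower edges of bins of size bin_width.
    """
    hist = Counter()
    for i, j in partner_index_pairs:
        d = abs(j - i)
        hist[(d // bin_width) * bin_width] += 1
    return dict(sorted(hist.items()))
```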

9 Artificial Neural Network Architecture
- Feedforward neural network
  - Large input windows tend to dilute the sparse information in the input that is actually relevant for the prediction.
  - Two-window approach: one window centered on each of the two candidate partner residues.
  - The distance information can either be provided as a third input to the system, or a different architecture can be trained for each distance type.

10 The architecture
- Two input windows of length W
- The distance D between the two central amino acids, measured in residues along the chain, is also given as an input unit with scaled activity D/100.
- The goal is to output a probability reflecting whether the two amino acids located at the centers of the windows are partners.
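The two-window input and a single forward pass can be sketched in plain Python as below. The slide only fixes three things: two windows of length W, one extra input with activity D/100, and a probability output. Everything else here is a hypothetical illustration:

- the 21-unit one-hot code per position;
- taking D as |i - j|;
- the single tanh hidden layer with a sigmoid output;
- the random, untrained weights.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot_window(seq, center, w):
    """One-hot encode the w residues centred at `center` (w assumed odd).
    Assumption: 21 units per position, the 21st flagging positions past either chain end."""
    half = w // 2
    units = []
    for pos in range(center - half, center + half + 1):
        vec = [0.0] * 21
        if 0 <= pos < len(seq) and seq[pos] in AMINO_ACIDS:
            vec[AMINO_ACIDS.index(seq[pos])] = 1.0
        else:
            vec[20] = 1.0  # off the chain end, or a non-standard residue
        units.extend(vec)
    return units

def pair_input(seq, i, j, w=7):
    """Two windows centred on residues i and j, plus one unit with activity D/100."""
    d = abs(j - i)  # assumed definition of the chain distance D
    return one_hot_window(seq, i, w) + one_hot_window(seq, j, w) + [d / 100.0]

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One tanh hidden layer and a sigmoid output, read as P(residues i and j are partners)."""
    hidden = [math.tanh(sum(wt * xi for wt, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    z = sum(wt * h for wt, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))

# Usage with random, untrained weights (illustration only).
random.seed(0)
seq = "MKVIVAGTEVKLTVNG"
x = pair_input(seq, 3, 12, w=7)
n_hidden = 5
w_hidden = [[random.uniform(-0.1, 0.1) for _ in x] for _ in range(n_hidden)]
w_out = [random.uniform(-0.1, 0.1) for _ in range(n_hidden)]
p = forward(x, w_hidden, [0.0] * n_hidden, w_out, 0.0)
```

Scaling D by 100 keeps the distance unit in roughly the same range as the 0/1 one-hot units for typical chain separations.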

11 Recurrent Neural Network
- Bidirectional recurrent neural network (BRNN)
  - Input layer
  - Forward and backward Markov chains
  - Output layer
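A generic bidirectional recurrent pass with forward and backward chains over a sequence of input vectors is sketched below. This is a simplification with single-layer tanh transition functions, random untrained weights, and zero boundary states. It is not the paper's exact BRNN, whose transition functions and way of combining the two partner windows are not given on the slide.

```python
import math
import random

def matvec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def brnn(inputs, n_state, seed=0):
    """Minimal bidirectional RNN pass (untrained, illustrative weights).

    Forward chain:  F_t = tanh(Wf I_t + Uf F_{t-1}), with F_{-1} = 0
    Backward chain: B_t = tanh(Wb I_t + Ub B_{t+1}), with B_T = 0
    Output:         O_t = sigmoid(v . [F_t, B_t, I_t])
    """
    rng = random.Random(seed)
    n_in = len(inputs[0])

    def rand(r, c):
        return [[rng.uniform(-0.5, 0.5) for _ in range(c)] for _ in range(r)]

    wf, uf = rand(n_state, n_in), rand(n_state, n_state)
    wb, ub = rand(n_state, n_in), rand(n_state, n_state)
    v = [rng.uniform(-0.5, 0.5) for _ in range(2 * n_state + n_in)]
    T = len(inputs)
    fwd = [[0.0] * n_state for _ in range(T + 1)]  # fwd[t + 1] holds F_t
    bwd = [[0.0] * n_state for _ in range(T + 1)]  # bwd[t] holds B_t; bwd[T] stays 0
    for t in range(T):
        fwd[t + 1] = [math.tanh(a + b) for a, b in zip(matvec(wf, inputs[t]), matvec(uf, fwd[t]))]
    for t in reversed(range(T)):
        bwd[t] = [math.tanh(a + b) for a, b in zip(matvec(wb, inputs[t]), matvec(ub, bwd[t + 1]))]
    outputs = []
    for t in range(T):
        z = sum(w * h for w, h in zip(v, fwd[t + 1] + bwd[t] + inputs[t]))
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs

# Usage: one-hot encode a short sequence and run the pass.
onehot = lambda aa: [1.0 if a == aa else 0.0 for a in "ACDEFGHIKLMNPQRSTVWY"]
outs = brnn([onehot(aa) for aa in "VIVAG"], n_state=3)
```

Because of the backward chain, the output at position t depends on residues on both sides of t, not only on those before it.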

12 Experiments and Results
- Data
  - The data are randomly split: 2/3 for training and 1/3 for testing.
  - The classes are extremely unbalanced: non-partner pairs vastly outnumber partner pairs.
  - At each epoch, all 37008 positive examples are presented together with 37008 randomly selected negative examples.
  - The total balanced percentage is the average of the percentages correct on the positive examples and on the negative examples.
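The balanced training scheme and the balanced percentage can be sketched as follows. The function names and the list-of-examples format are hypothetical. Each epoch keeps every positive example and draws a fresh random subset of negatives of the same size. The score averages the accuracy on each class, so the large negative class cannot dominate it.

```python
import random

def balanced_epoch(positives, negatives, rng):
    """All positive examples plus an equal number of randomly drawn negatives, shuffled.
    negatives must be a list (random.sample needs a sequence)."""
    epoch = [(x, 1) for x in positives]
    epoch += [(x, 0) for x in rng.sample(negatives, len(positives))]
    rng.shuffle(epoch)
    return epoch

def balanced_percentage(y_true, y_pred):
    """Average of the percent correct on positives and the percent correct on negatives."""
    pos = [p == t for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p == t for t, p in zip(y_true, y_pred) if t == 0]
    return 100.0 * (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
```

For example, 1 of 2 positives and 3 of 4 negatives correct gives (50% + 75%) / 2 = 62.5%.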

13 Results
- Feedforward neural network
  - The best architecture

14 The predicted second-order statistics

15 Five-fold cross-validation
- BRNN architecture
  - Three sizes (7, 9, and 11) are tried for the two input windows.
  - Length 7 again yields the best performance.
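Five-fold cross-validation can be sketched as below. The slides do not say what unit was split. Splitting whole protein chains, so that pairs from one chain never appear in both training and test sets, is an assumption, and the function name is hypothetical.

```python
import random

def k_fold_splits(items, k=5, seed=0):
    """Yield (train, test) lists for k-fold cross-validation.
    Every item lands in exactly one test fold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]
```

The reported figure would then be the average of the five test-fold scores.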

16 Five-fold cross-validation
- Ensemble architecture
  - An ensemble of 3 BRNNs
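The slide does not say how the three BRNNs are combined. Averaging their output probabilities is a common choice and is assumed in this sketch. The models here are hypothetical stand-in callables, and the 0.5 threshold is also an assumption.

```python
def ensemble_probability(models, example):
    """Mean of the partner probabilities output by several models (assumed combination rule)."""
    probs = [model(example) for model in models]
    return sum(probs) / len(probs)

def predict_partner(models, example, threshold=0.5):
    """Call the pair a partner when the averaged probability reaches `threshold`."""
    return ensemble_probability(models, example) >= threshold

# Usage with three hypothetical constant predictors standing in for trained BRNNs.
models = [lambda e: 0.2, lambda e: 0.6, lambda e: 0.9]
print(ensemble_probability(models, None), predict_partner(models, None))
```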

17 Summary of all the five-fold cross-validation results
- Profile approach
  - Profiles were used as input to the artificial neural network.
  - The overall performance is comparable, but not better.
  - Profiles may provide more robust first-order statistics, but weaker intrasequence correlations.
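A profile replaces each one-hot position vector with a frequency vector derived from related sequences. The slides do not say how the profiles were built. The sketch below assumes plain column frequencies from a multiple alignment (gaps ignored), with no sequence weighting or pseudocounts. The function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def profile_column(aligned_residues):
    """Frequency vector over the 20 amino acids for one alignment column (gaps ignored)."""
    counts = [0] * 20
    for aa in aligned_residues:
        if aa in AMINO_ACIDS:
            counts[AMINO_ACIDS.index(aa)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 20

def profile(alignment):
    """alignment: list of equal-length aligned sequences (first row = query).
    Returns one 20-dimensional frequency vector per alignment column."""
    return [profile_column(col) for col in zip(*alignment)]
```

Each column vector can be fed to the network wherever a one-hot vector was used before. This is one reason profiles mainly sharpen the first-order (per-position) information.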

18 Discussion
- We have developed an NN architecture that predicts β-sheet amino acid partners with a balanced performance close to 84% correct prediction.
  - By itself, it is insufficient to predict strand pairing reliably because of the large number of false positive predictions.
- Some directions for future work
  - Applying profiles to the BRNNs
  - Reducing the number of false positive predictions
  - Improving the quality of the match
  - Using raw sequence information in addition to profiles
  - A β-sheet predictor
  - Various combinations of the present architectures
