Neural Networks for Protein Structure Prediction Brown, JMB 1999 CS 466 Saurabh Sinha.



Outline
Goal is to predict the “secondary structure” of a protein from its sequence
Artificial Neural Network used for this task
Evaluation of prediction accuracy

What is Protein Structure?

http://academic.brooklyn.cuny.edu/biology/bio4fv/page/3d_prot.htm

Protein Structure
An amino acid sequence “folds” into a complex 3-D structure
Finding out this 3-D structure is a crucial and challenging task
Experimental methods (e.g., X-ray crystallography) are very tedious
Computational predictions are a possibility, but very difficult

What is “secondary structure”?

Figure labels: “Strand”, “Helix”

Secondary structure prediction
Well, the whole 3-D “tertiary” protein structure may be hard to predict from sequence
But can we at least predict the secondary structural elements such as “strand”, “helix” or “coil”?
This is what this paper does, and so do many other papers (it is a hard problem!)

A survey of structure prediction
The most reliable technique is “comparative modeling”
–Find a protein P whose amino acid sequence is very similar to that of your “target” protein T
–Hope that this other protein P has a known structure
–Predict a structure for T similar to that of P, after carefully considering how the sequences of P and T differ

A survey of structure prediction
Comparative modeling fails if we don’t have a suitable homologous “template” protein P for our protein T
“Ab initio” tertiary methods attempt to predict the structure without using a template protein structure
–Incorporate basic physical and chemical principles into the structure calculation
–Gets very hairy, and highly computationally intensive
The other option is prediction of secondary structure only (i.e., making the goal more modest)
–These predictions may be used to provide constraints for tertiary structure prediction

Secondary structure prediction
Early methods were based on stereochemical principles
Later methods showed that we can do better if we use not only the one sequence T (our sequence), but also a family of “related sequences”
Search for sequences similar to T, build a multiple alignment of these, and predict secondary structure from the multiple alignment of sequences

What’s multiple alignment doing here?
The most conserved regions of a protein sequence are either functionally important or buried in the protein “core”
More variable regions are usually on the surface of the protein
–There are few constraints on what type of amino acids have to be here (apart from a bias towards hydrophilic residues)
Multiple alignment tells us which portions are conserved and which are not
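One way to make “conserved versus variable” concrete is to score each alignment column by its Shannon entropy (low entropy means conserved). The sketch below is only an illustration and is not the method of the paper; the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column.

    Gap characters ('-') are ignored. 0.0 means fully conserved.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def conservation_profile(alignment):
    """Entropy of every column, for a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*alignment)]

# Toy alignment, made up for illustration only.
alignment = [
    "MKVLA",
    "MKILS",
    "MRVLT",
    "MKVLG",
]
entropies = conservation_profile(alignment)
```

Columns 1 and 4 of the toy alignment are identical in every sequence and get entropy 0, while the last column has four different residues and gets the highest score.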

hydrophobic core

What’s multiple alignment doing here?
Therefore, by looking at the multiple alignment, we could predict which residues are in the core of the protein and which are on the surface (“solvent accessibility”)
Secondary structure is then predicted by comparing these accessibility patterns with those associated with helices, strands, etc.
This approach (Benner & Gerloff) is mostly manual
Today’s paper suggests an automated method

The PSIPRED algorithm
Given an amino-acid sequence, predict secondary structure elements in the protein
Three stages:
1. Generation of a sequence profile (the “multiple alignment” step)
2. Prediction of an initial secondary structure (the neural network step)
3. Filtering of the predicted structure (another neural network step)

Generation of sequence profile
A BLAST-like program called “PSI-BLAST” is used for this step
We saw BLAST earlier: it is a fast way to find high-scoring local alignments
PSI-BLAST is an iterative approach
–An initial scan of a protein database using the target sequence T
–Align all matching sequences to construct a “sequence profile”
–Scan the database again using this new profile
It can also pick out and align distantly related protein sequences for our target sequence T

The sequence profile looks like this
Has 20 x M numbers (20 amino acids at each of M positions)
The numbers are the log likelihood of each residue at each position
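To show where such a table of numbers could come from, here is a minimal sketch that turns a multiple alignment into per-position log scores. It is not PSI-BLAST’s actual scoring: the uniform background frequency, the flat pseudocount, and the name `build_profile` are all assumptions made for illustration. The result is stored as M rows of 20 numbers.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def build_profile(alignment, pseudocount=1.0):
    """Return M rows of 20 log scores, one row per alignment column.

    Each score is log(frequency / background), with a uniform background
    of 1/20 and a flat pseudocount (both are simplifying assumptions).
    """
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        residues = [c for c in column if c in AMINO_ACIDS]  # skips gaps
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        row = []
        for aa in AMINO_ACIDS:
            freq = (residues.count(aa) + pseudocount) / total
            row.append(math.log(freq / background))
        profile.append(row)
    return profile

# Toy alignment, made up for illustration only.
alignment = ["MKVLA", "MKILS", "MRVLT", "MKVLG"]
profile = build_profile(alignment)
```

In the first column every sequence has M, so M gets a positive score there and every other residue gets a negative one.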

Preparing for the second step
Feed the sequence profile to an artificial neural network
But before feeding, do a simple “scaling” to bring the numbers to a 0-1 scale
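The slide does not say which scaling is used. One common way to map arbitrary real scores into the 0-1 range is the logistic function, which is assumed here purely for illustration; the function names and the small example profile are hypothetical.

```python
import math

def logistic(x):
    """Squash a real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def scale_profile(profile):
    """Apply the logistic function to every number in the profile."""
    return [[logistic(value) for value in row] for row in profile]

# Made-up scores: two positions, three residues each, to keep it short.
profile = [[-3.0, 0.0, 2.5], [1.0, -1.0, 4.0]]
scaled = scale_profile(profile)
```

A score of 0 maps to 0.5, strongly negative scores approach 0, and strongly positive scores approach 1.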

Intro to Neural nets (the second and third steps of PSIPRED)

Artificial Neural Network
Supervised learning algorithm
Training examples. Each example has a label
–“class” of the example, e.g., “positive” or “negative”
–“helix”, “strand”, or “coil”
Learns how to predict the class of an example

Artificial Neural Network
Directed graph
Nodes or “units” or “neurons”
Edges between units
Each edge has a weight (not known a priori)

Layered Architecture
Input here is a four-dimensional vector. Each dimension goes into one input unit

Layered Architecture (units)

What a unit (neuron) does
Unit i receives a total input x_i from the units connected to it, and produces an output y_i = f_i(x_i), where f_i() is the “transfer function” of unit i
w_i is called the “bias” of the unit

Weights, bias and transfer function
Unit takes n inputs
Each input edge has weight w_i
Bias b
Output a
Transfer function f(): linear, sigmoidal, or other
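The pieces listed above can be put together in a few lines. The sketch assumes the usual convention that the unit’s total input is the weighted sum of its inputs plus the bias, a = f(sum_i w_i * p_i + b); the slide text does not spell this formula out, and the input name p and the function names are hypothetical.

```python
import math

def sigmoid(x):
    """Sigmoidal transfer function."""
    return 1.0 / (1.0 + math.exp(-x))

def linear(x):
    """Linear (identity) transfer function."""
    return x

def unit_output(inputs, weights, bias, transfer=sigmoid):
    """Output a of one unit: a = f(sum_i w_i * p_i + b)."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    total = sum(w * p for w, p in zip(weights, inputs)) + bias
    return transfer(total)

# Made-up numbers: the weighted sum is 0.5 + 0.8 = 1.3, cancelled by the bias.
a = unit_output([1.0, 0.0, 1.0], [0.5, -0.3, 0.8], bias=-1.3)
```

Swapping `transfer=linear` for the default turns the same unit into a plain weighted sum.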

Weights, bias and transfer function
Weights w_ij and bias w_i of each unit are “parameters” of the ANN
–Parameter values are learned from input data
Transfer function is usually the same for every unit in the same layer
Graphical architecture (connectivity) is decided by you
–Could use a fully connected architecture: all units in one layer connect to all units in the “next” layer
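A fully connected, layered network is just the single-unit computation repeated layer by layer. The sketch below assumes sigmoid units throughout and uses made-up weights for a 4-input, 3-hidden, 2-output network; here `weights[i][j]` is taken to be the weight on the edge from input j to unit i, which is a convention chosen for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Outputs of one fully connected layer of sigmoid units."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Feed the inputs through each (weights, biases) layer in turn."""
    activations = list(inputs)
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Made-up parameters: 4 inputs -> 3 hidden units -> 2 output units.
hidden_layer = ([[0.1, -0.2, 0.3, 0.0],
                 [0.5, 0.5, -0.5, -0.5],
                 [-0.3, 0.2, 0.1, 0.4]], [0.0, 0.1, -0.1])
output_layer = ([[0.7, -0.4, 0.2],
                 [-0.6, 0.3, 0.9]], [0.05, -0.05])
outputs = network_forward([0.2, 0.4, 0.6, 0.8], [hidden_layer, output_layer])
```

The parameters here are fixed by hand only to show the data flow; in a real network they would be learned.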

Where’s the algorithm?
It’s in the training of parameters!
Given several examples and their labels: the training data
Search for parameter values such that output units make correct predictions on the training examples
“Back-propagation” algorithm
–Read up more on neural nets if you are interested
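For readers who want to see the idea in code, here is a small back-propagation sketch for a network with one hidden layer and a single sigmoid output, trained by gradient descent on squared error. Everything specific in it is an assumption for illustration: the toy task (output 1 when at least two of three inputs are 1), the hidden-layer size, the learning rate, and the number of epochs. It is not the training setup of the paper.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, W1, b1, W2, b2):
    """Return hidden activations h and the output y for one example."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y

def total_error(data, W1, b1, W2, b2):
    """Sum of squared errors over all examples."""
    return sum((forward(x, W1, b1, W2, b2)[1] - t) ** 2 for x, t in data)

def train(data, n_hidden=3, epochs=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    error_before = total_error(data, W1, b1, W2, b2)
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x, W1, b1, W2, b2)
            # Error signal at the output unit (squared error, sigmoid output).
            delta_out = (y - t) * y * (1.0 - y)
            # Propagate it back through W2 to get each hidden unit's signal.
            delta_hid = [delta_out * W2[j] * h[j] * (1.0 - h[j])
                         for j in range(n_hidden)]
            for j in range(n_hidden):
                W2[j] -= lr * delta_out * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * delta_hid[j] * x[i]
                b1[j] -= lr * delta_hid[j]
            b2 -= lr * delta_out
    error_after = total_error(data, W1, b1, W2, b2)
    return (W1, b1, W2, b2), error_before, error_after

# Toy task: label is 1 when at least two of the three inputs are 1.
data = [((a, b, c), 1.0 if a + b + c >= 2 else 0.0)
        for a in (0, 1) for b in (0, 1) for c in (0, 1)]
params, error_before, error_after = train(data)
```

Comparing `error_before` with `error_after` shows how much the search over parameter values has reduced the error on the training examples.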

Back to PSIPRED …

Step 2
Feed the sequence profile to the input layer of an ANN
Not the whole profile, only a window of 15 consecutive positions
For each position, there are 20 numbers in the profile (one for each amino acid)
Therefore, ~15 x 20 = 300 numbers are fed
Therefore, ~300 “input units” in the ANN
3 output units, for “strand”, “helix”, “coil”
–Each number is the confidence in that secondary structure for the central position in the window of 15
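The window arithmetic above can be sketched as follows: for each central position, take the 15 surrounding profile rows and flatten them into one vector of 15 x 20 = 300 numbers. How positions beyond the ends of the sequence are handled is not stated on the slide; zero padding is assumed here, and the function name and the dummy profile are hypothetical.

```python
WINDOW = 15      # consecutive positions per window
N_AMINO = 20     # numbers per position in the profile

def window_inputs(profile, center, window=WINDOW):
    """Flatten the profile rows in a window centred on `center`.

    Positions that fall off either end of the sequence are filled with
    zeros (an assumption made for this sketch).
    """
    half = window // 2
    width = len(profile[0])
    flat = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(profile):
            flat.extend(profile[pos])
        else:
            flat.extend([0.0] * width)
    return flat

# Dummy scaled profile for a 40-residue sequence: every value is 0.5.
profile = [[0.5] * N_AMINO for _ in range(40)]
first = window_inputs(profile, 0)     # window hangs off the left end
middle = window_inputs(profile, 20)   # window fully inside the sequence
```

Both vectors have 300 entries; the one centred on position 0 starts with 7 x 20 = 140 padding zeros.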

[Figure: a window of 15 positions feeds the input layer, followed by a hidden layer and three outputs: helix, strand, coil]

Step 3
Feed the output of the 1st ANN to the 2nd ANN
Each window of 15 positions gave 3 numbers from the 1st ANN
Take the outputs of 15 successive windows and feed them to the 2nd ANN
Therefore, ~15 x 3 = 45 input units in the 2nd ANN
3 output units, for “strand”, “helix”, “coil”
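The chaining of the two networks can be sketched the same way: the three numbers produced for each position by the first network are collected over 15 successive positions, giving 15 x 3 = 45 inputs. The zero padding at the sequence ends and the rule of calling the structure with the highest output are assumptions for this sketch, and the example numbers are made up.

```python
LABELS = ("strand", "helix", "coil")  # output order as listed on the slide

def second_stage_inputs(first_stage_outputs, center, window=15):
    """Flatten the first network's outputs for 15 successive positions."""
    half = window // 2
    flat = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(first_stage_outputs):
            flat.extend(first_stage_outputs[pos])
        else:
            flat.extend([0.0] * len(LABELS))  # zero padding (assumption)
    return flat

def call_structure(scores):
    """Pick the label whose output number is largest (assumed decision rule)."""
    best = max(range(len(LABELS)), key=lambda k: scores[k])
    return LABELS[best]

# Made-up first-stage outputs for a 20-residue sequence.
first_stage_outputs = [(0.1, 0.8, 0.1)] * 10 + [(0.7, 0.1, 0.2)] * 10
inputs = second_stage_inputs(first_stage_outputs, 10)
```

The 45 numbers in `inputs` are what the second network would receive for position 10; its three outputs could then be turned into a label with `call_structure`.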

Test of performance

Cross-validation
Partition the available data into a “training set” (two thirds of the examples) and a “test set” (remaining one third)
Train PSIPRED on the training set, test predictions and compare with known answers on the test set
What is an answer?
–For each position of the sequence, a prediction of what secondary structure that position is involved in
–That is, a sequence over “H/S/C” (helix/strand/coil)
How to compare answer with known answer?
–Number of positions that match
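The evaluation described above can be sketched in a few lines: a two-thirds/one-third split of the examples, and a score that counts matching positions between a predicted and a known H/S/C string. The random shuffling, the function names, and the two example strings are assumptions made for illustration.

```python
import random

def split_two_thirds(examples, seed=0):
    """Shuffle, then put two thirds in the training set and the rest in the test set."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]

def matching_fraction(predicted, known):
    """Fraction of positions where the predicted label equals the known label."""
    if len(predicted) != len(known):
        raise ValueError("sequences must have the same length")
    matches = sum(p == k for p, k in zip(predicted, known))
    return matches / len(known)

# Made-up answer strings over H/S/C for a 28-residue protein.
known     = "CCCCCHHHHHCCCCSSSSCCHHHHHHCC"
predicted = "CCCCHHHHHHCCCSSSSSCCHHHHHCCC"
accuracy = matching_fraction(predicted, known)

train_set, test_set = split_two_thirds(range(9))
```

The two strings differ at three of the 28 positions, so 25 positions match.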