Secondary structure prediction


Secondary structure prediction Chou & Fasman (1974)

Protein Structure – Why do we care?
• Structure-function relation
  – The shape of a protein molecule directly determines its biological function.
• Proteins with similar function often have similar shapes, or similar regions or domains.
• Hence, if we find a new protein and know its shape, we can make a good guess about its biological function.

Why predict when we can get the real thing?
Swiss-Prot Release: 143790 protein sequences
TrEMBL Release: 1075779 protein sequences
PDB database: 24168 protein structures
• Secondary structure is derived from tertiary coordinates.
• To get to tertiary structure we need NMR or X-ray.
• We have an abundance of primary sequences, so why not use them?
Primary structure: no problems
Secondary structure: overall 77% accurate at predicting
Tertiary structure: overall 30% accurate at predicting
Quaternary structure: no reliable means of predicting yet
Function: do you feel like guessing?

Structure Prediction Methods

Homology modeling
  Knowledge: proteins of known structure
  Approach: identify a related structure with sequence methods, copy the 3D coordinates and modify as necessary
  Difficulty: relatively easy
  Usefulness: very, if sequence identity > 40% (drug design)

Fold recognition
  Knowledge: proteins of known structure
  Approach: same as above, but use more sophisticated methods to find a related structure
  Difficulty: medium
  Usefulness: limited due to poor models

Secondary structure prediction
  Knowledge: sequence-structure statistics
  Approach: forget the 3D arrangement and predict where the helices/strands are
  Usefulness: can improve alignments, fold recognition, ab initio

Ab initio prediction
  Knowledge: energy function statistics
  Approach: simulate folding, or generate lots of structures and try to pick the correct one
  Difficulty: very hard
  Usefulness: not really

History
1974. Chou and Fasman propose a statistical method based on the propensities of amino acids to adopt secondary structures, derived from observing their locations in 15 protein structures determined by X-ray diffraction. Clearly these statistics derive from the particular stereochemical and physicochemical properties of the amino acids. Rather than a position-by-position analysis, the propensity of a position is calculated as an average over the 5 or 6 residues surrounding it. On a larger set of 62 proteins the base method reports a success rate of 50%.
1978. Garnier improved the method by including statistically significant pair-wise interactions in the prediction. This improved the success rate to 62%.
1993. Levin improved the prediction level by using multiple sequence alignments. The reasoning is as follows. Conserved regions in a multiple sequence alignment provide a strong evolutionary indicator of a role in the function of the protein. Those regions are also likely to have conserved structure, including secondary structure, and strengthen the prediction by their joint propensities. This improved the success rate to 69%.
1994. Rost and Sander combined neural networks with multiple sequence alignments. The idea of a neural net is to create a complex network of interconnected nodes, where progress from one node to the next depends on satisfying a weighted function that has been derived by training the net with data of known results, in this case protein sequences with known secondary structures. The success rate is 72%.

Secondary Structure Prediction Algorithms
• These methods are 70-75% accurate at predicting secondary structure.
• A few examples are
  – Chou-Fasman algorithm
  – Garnier-Osguthorpe-Robson (GOR) method
  – Neural network models
  – Nearest-neighbor method

Chou-Fasman Algorithm
• Analyzed the frequency of the 20 amino acids in α helices, β sheets and turns.
• Ala (A), Glu (E), Leu (L), and Met (M) are strong predictors of α helices.
• Pro (P) and Gly (G) break α helices.
• When 4 of 6 amino acids have a high probability of being in an α helix, it predicts an α helix.
• When 3 of 5 amino acids have a high probability of being in a β strand, it predicts a β strand.
• 4 amino acids are used to predict turns.
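The "k out of n residues" window rule in the bullets above can be sketched in Python. This is only an illustration, not the authors' code: the function name find_nuclei, its parameters, and the toy propensity values are assumptions made for the example. The window size and the required number of favorable residues are left as arguments.

```python
def find_nuclei(sequence, propensity, window=6, min_hits=4, threshold=1.0):
    """Return start indices of windows holding at least min_hits favorable residues.

    A residue counts as favorable when its propensity exceeds threshold.
    Residues missing from the propensity table are treated as unfavorable.
    """
    starts = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        hits = sum(1 for res in segment if propensity.get(res, 0.0) > threshold)
        if hits >= min_hits:
            starts.append(start)
    return starts


# Toy propensities for illustration only (not published values).
toy_helix = {"A": 1.4, "G": 0.5}
print(find_nuclei("GGAAAAGAGG", toy_helix))  # prints [0, 1, 2, 3]
```

Calling the same function with window=5 and min_hits=3 and a strand propensity table would give the corresponding strand nucleation sites.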

Name     P(a)  P(b)  P(turn)  f(i)  f(i+1)  f(i+2)  f(i+3)
Alanine  1.42  0.83  0.66     0.06  0.076   0.035   0.058

Articles
Chou, P.Y. and Fasman, G.D. (1974). Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins. Biochemistry 13, 211-221.
Chou, P.Y. and Fasman, G.D. (1974). Prediction of protein conformation. Biochemistry 13, 222-245.

Method
• Assign a set of prediction values to each residue, based on statistical analysis of 15 proteins.
• Apply a simple algorithm to those numbers.

Calculation of Propensities
Pr[i | β-sheet] / Pr[i], Pr[i | α-helix] / Pr[i], Pr[i | other] / Pr[i]
Each ratio measures how strongly amino acid i favors a structure: the frequency of i within that structure, normalized by the background frequency with which i occurs at all.
Example: let's say that there are 20,000 amino acids in the database, of which 2000 are serine, and there are 5000 amino acids in helical conformation, of which 500 are serine. Then the helical propensity for serine is:
(500/5000) / (2000/20000) = 1.0
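The serine example can be reproduced with a few lines of Python. This is a minimal sketch; the function and argument names are chosen here for illustration and do not come from the original papers.

```python
def propensity(count_in_structure, total_in_structure, count_overall, total_overall):
    """Frequency of a residue within a structure divided by its background frequency."""
    freq_in_structure = count_in_structure / total_in_structure
    background_freq = count_overall / total_overall
    return freq_in_structure / background_freq


# Serine: 500 of 5000 helical residues, 2000 of 20,000 residues overall.
serine_helix = propensity(500, 5000, 2000, 20000)
print(serine_helix)  # prints 1.0
```

A value above 1.0 would mean serine is over-represented in helices relative to its background frequency, and a value below 1.0 that it is under-represented.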

Calculation of preference parameters
• Preference parameter > 1.0: the residue has a preference for the specific secondary structure.
• Preference parameter = 1.0: the residue neither prefers nor dislikes the specific secondary structure.
• Preference parameter < 1.0: the residue dislikes the specific secondary structure.

Preference parameters
Residue P(a) P(b) P(t) f(i) f(i+1) f(i+2) f(i+3)
Ala 1.45 0.97 0.57 0.049 0.034 0.029
Arg 0.79 0.90 1.00 0.051 0.127 0.025 0.101
Asn 0.73 0.65 1.68 0.086 0.216 0.065
Asp 0.98 0.80 1.26 0.137 0.088 0.069 0.059
Cys 0.77 1.30 1.17 0.089 0.022 0.111
Gln 1.23 0.56 0.050 0.030
Glu 1.53 0.26 0.44 0.011 0.032 0.053 0.021
Gly 0.53 0.81 0.104 0.090 0.158 0.113
His 1.24 0.71 0.69 0.083 0.033
Ile 1.60 0.58 0.068 0.017
Leu 1.34 1.22 0.038 0.019
Lys 1.07 0.74 1.01 0.060 0.080 0.067 0.073
Met 1.20 1.67 0.67 0.070 0.036
Phe 1.12 1.28 0.031 0.047 0.063
Pro 0.59 0.62 1.54 0.074 0.272 0.012 0.062
Ser 0.72 1.56 0.100 0.095
Thr 0.82 0.093 0.056
Trp 1.14 1.19 1.11 0.045 0.000 0.205
Tyr 0.61 1.29 1.25 0.136 0.110 0.102
Val 1.65 0.30 0.023

Applying algorithm
1. Assign parameters (propensities) to each residue.
2. Identify regions (nucleation sites) where 4 out of 6 residues have P(a) > 1.00: α-helix.
3. Extend the helix in both directions until four contiguous residues have an average P(a) < 1.00: end of α-helix.
4. If the segment is longer than 5 residues and its average P(a) > P(b): α-helix.
5. Repeat this procedure to locate all of the helical regions.
6. Identify regions where 3 out of 5 residues have P(b) > 1.00: β-sheet.
7. Extend the sheet in both directions until four contiguous residues have an average P(b) < 1.00: end of β-sheet.
8. If the average P(b) > 1.05 and P(b) > P(a): β-sheet.
9. Rest: P(a) > P(b): α-helix. P(b) > P(a): β-sheet.
10. To identify a bend at residue number i, calculate the following value:
    p(t) = f(i) × f(i+1) × f(i+2) × f(i+3)
    If (1) p(t) > 0.000075; (2) the average P(t) > 1.00 in the tetrapeptide; and (3) the averages for the tetrapeptide obey P(a) < P(t) > P(b): β-turn.
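The extension rule and the bend test can be sketched in Python as follows. This is one possible reading of the slide, not a reference implementation: extend_region and is_turn are hypothetical names, the thresholds are written on the 1.00 scale, and the residue "X" with its values is invented purely to exercise the turn test. The alanine values are the ones from the alanine row shown earlier.

```python
from math import prod


def extend_region(sequence, propensity, start, end, threshold=1.0):
    """Grow the half-open region [start, end) outward from a nucleation site.

    Assumes the nucleus holds at least four residues. Extension in a direction
    stops when the four contiguous residues ending at the next candidate
    position have an average propensity below threshold.
    """
    def mean(segment):
        return sum(propensity.get(res, 0.0) for res in segment) / len(segment)

    # Extend to the right, one residue at a time.
    while end < len(sequence) and mean(sequence[end - 3:end + 1]) >= threshold:
        end += 1
    # Extend to the left in the same way.
    while start > 0 and mean(sequence[start - 1:start + 3]) >= threshold:
        start -= 1
    return start, end


def is_turn(tetrapeptide, positional_freq, params, cutoff=0.000075):
    """Apply the three bend conditions to four consecutive residues.

    positional_freq maps residue -> (f(i), f(i+1), f(i+2), f(i+3)).
    params maps residue -> (P(a), P(b), P(t)).
    """
    p_t = prod(positional_freq[res][k] for k, res in enumerate(tetrapeptide))
    avg_a, avg_b, avg_t = (
        sum(params[res][j] for res in tetrapeptide) / 4 for j in range(3)
    )
    return p_t > cutoff and avg_t > 1.0 and avg_a < avg_t > avg_b


# Alanine values from the slide; "X" is a made-up turn-favoring residue.
freqs = {"A": (0.06, 0.076, 0.035, 0.058), "X": (0.1, 0.1, 0.1, 0.1)}
params = {"A": (1.42, 0.83, 0.66), "X": (0.5, 0.5, 1.5)}
print(is_turn("AAAA", freqs, params), is_turn("XXXX", freqs, params))
```

Other readings of "until four contiguous residues have an average below the threshold" are possible, for example scanning a window that lies entirely outside the current region; the sketch picks the window that ends at the candidate residue.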

Successful method?
• 15 proteins evaluated: helix = 46%, β-sheet = 35%, turn = 65%.
• Overall accuracy of predicting the three conformational states for all residues (helix, β, and coil) is 56%.
• Chou & Fasman: not so great?
• After 1974: improvement of preference parameters.
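An overall accuracy figure of this kind is commonly computed as the fraction of residues whose predicted state matches the observed state. The sketch below assumes that definition and a one-letter state string per residue (H, E, C). Both the definition and the function name are assumptions for illustration, since the slide does not say how its percentages were calculated.

```python
def per_residue_accuracy(predicted, observed):
    """Fraction of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical six-residue example: five of six states agree.
print(per_residue_accuracy("HHHEEC", "HHHECC"))
```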