Secondary structure prediction


1 Secondary structure prediction
Chou & Fasman (1974)

2 Protein Structure – Why do we care?
• Structure-function relation: the shape of a protein molecule directly determines its biological function.
• Proteins with similar function often have similar shape, or similar regions or domains.
• Hence, if we find a new protein and know its shape, we can make a good guess about its biological function.

3 Why predict when we can get the real thing?
Swiss-Prot Release : protein sequences
TrEMBL Release : protein sequences
PDB database : protein structures
Secondary structure is derived from tertiary coordinates.
To get to tertiary structure we need NMR or X-ray.
We have an abundance of primary structures, so why not use them?
Primary structure: no problems
Secondary structure: overall 77% accurate at predicting
Tertiary structure: overall 30% accurate at predicting
Quaternary structure: no reliable means of predicting yet
Function: do you feel like guessing?

4 Structure Prediction Methods
Homology modeling
Knowledge: proteins of known structure
Approach: identify a related structure with sequence methods, copy the 3D coordinates and modify as necessary
Difficulty: relatively easy
Usefulness: very, if sequence identity > 40% (drug design)

Fold recognition
Approach: same as above, but use more sophisticated methods to find a related structure
Difficulty: medium
Usefulness: limited due to poor models

Secondary structure prediction
Knowledge: sequence-structure statistics
Approach: forget the 3D arrangement and predict where the helices/strands are
Usefulness: can improve alignments, fold recognition, ab initio

Ab initio prediction
Knowledge: energy function statistics
Approach: simulate folding, or generate lots of structures and try to pick the correct one
Difficulty: very hard
Usefulness: not really

5 History
1974. Chou and Fasman propose a statistical method based on the propensities of amino acids to adopt secondary structures, derived from their observed locations in 15 protein structures determined by X-ray diffraction. These statistics derive from the particular stereochemical and physicochemical properties of the amino acids. Rather than a position-by-position analysis, the propensity of a position is calculated as an average over the 5 or 6 residues surrounding it. On a larger set of 62 proteins the base method reports a success rate of 50%.
1978. Garnier improved the method by using statistically significant pair-wise interactions as a determinant of the statistical significance. This improved the success rate to 62%.
1993. Levin improved the prediction level by using multiple sequence alignments. The reasoning is as follows. Conserved regions in a multiple sequence alignment provide a strong evolutionary indicator of a role in the function of the protein. Those regions are also likely to have conserved structure, including secondary structure, and they strengthen the prediction through their joint propensities. This improved the success rate to 69%.
1994. Rost and Sander combined neural networks with multiple sequence alignments. The idea of a neural net is to create a complex network of interconnected nodes, where progress from one node to the next depends on satisfying a weighted function that has been derived by training the net with data of known results, in this case protein sequences with known secondary structures. The success rate is 72%.

6 Secondary Structure Prediction Algorithms
• These methods are 70-75% accurate at predicting secondary structure.
• A few examples are:
– Chou-Fasman algorithm
– Garnier-Osguthorpe-Robson (GOR) method
– Neural network models
– Nearest-neighbor method

7 Chou-Fasman Algorithm
• Analyzed the frequency of the 20 amino acids in α helices, β sheets and turns.
• Ala (A), Glu (E), Leu (L), and Met (M) are strong predictors of α helices.
• Pro (P) and Gly (G) break α helices.
• When 4 of 6 amino acids have a high probability of being in an α helix, it predicts an α helix.
• When 3 of 5 amino acids have a high probability of being in a β strand, it predicts a β strand.
• 4 amino acids are used to predict turns.

8 [Table of conformational parameters. Columns: Name, P(a), P(b), P(turn), f(i), f(i+1), f(i+2), f(i+3). Only the first row label, Alanine, survives.]

9 Articles
Chou, P.Y. and Fasman, G.D. (1974). Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins. Biochemistry 13,
Chou, P.Y. and Fasman, G.D. (1974). Prediction of protein conformation. Biochemistry 13,

10 Method
• Assign a set of prediction values to each residue, based on a statistical analysis of 15 proteins.
• Apply a simple algorithm to those numbers.

11 Calculation of Propensities
Pr[i|β-sheet]/Pr[i], Pr[i|α-helix]/Pr[i] and Pr[i|other]/Pr[i] give the propensity of amino acid i for each structure: the probability of finding i within that structure, normalized by the background probability that i occurs at all.
Example: say there are 20,000 amino acids in the database, of which 2000 are serine, and there are 5000 amino acids in helical conformation, of which 500 are serine. Then the helical propensity for serine is: (500/5000) / (2000/20000) = 1.0
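The serine example can be reproduced in a few lines. This is a minimal sketch; the function name `propensity` and its argument names are made up for illustration and do not come from the slides.

```python
def propensity(count_in_structure, total_in_structure, count_overall, total_overall):
    """Return Pr[i | structure] / Pr[i] for one amino acid i.

    count_in_structure: residues of type i found in the structure (e.g. helix)
    total_in_structure: all residues found in that structure
    count_overall:      residues of type i in the whole database
    total_overall:      all residues in the whole database
    """
    if total_in_structure <= 0 or total_overall <= 0 or count_overall <= 0:
        raise ValueError("counts used as denominators must be positive")
    frequency_in_structure = count_in_structure / total_in_structure
    background_frequency = count_overall / total_overall
    return frequency_in_structure / background_frequency


# Numbers from the slide: 500 of 5000 helical residues are serine,
# and 2000 of 20,000 residues overall are serine.
serine_helix = propensity(500, 5000, 2000, 20000)
print(serine_helix)
```

A value of 1.0 means serine is exactly as common in helices as it is in the database as a whole.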

12 Calculation of preference parameters
Preference parameter > 1.0 → the residue has a preference for the specific secondary structure.
Preference parameter = 1.0 → the residue neither prefers nor dislikes the specific secondary structure.
Preference parameter < 1.0 → the residue dislikes the specific secondary structure.

13 Preference parameters
Residue P(a) P(b) P(t) f(i) f(i+1) f(i+2) f(i+3)
Ala 1.45 0.97 0.57 0.049 0.034 0.029
Arg 0.79 0.90 1.00 0.051 0.127 0.025 0.101
Asn 0.73 0.65 1.68 0.086 0.216 0.065
Asp 0.98 0.80 1.26 0.137 0.088 0.069 0.059
Cys 0.77 1.30 1.17 0.089 0.022 0.111
Gln 1.23 0.56 0.050 0.030
Glu 1.53 0.26 0.44 0.011 0.032 0.053 0.021
Gly 0.53 0.81 0.104 0.090 0.158 0.113
His 1.24 0.71 0.69 0.083 0.033
Ile 1.60 0.58 0.068 0.017
Leu 1.34 1.22 0.038 0.019
Lys 1.07 0.74 1.01 0.060 0.080 0.067 0.073
Met 1.20 1.67 0.67 0.070 0.036
Phe 1.12 1.28 0.031 0.047 0.063
Pro 0.59 0.62 1.54 0.074 0.272 0.012 0.062
Ser 0.72 1.56 0.100 0.095
Thr 0.82 0.093 0.056
Trp 1.14 1.19 1.11 0.045 0.000 0.205
Tyr 0.61 1.29 1.25 0.136 0.110 0.102
Val 1.65 0.30 0.023
(Rows with fewer than seven values are incomplete in this transcript; the surviving values are listed in their original order and are not aligned to the column headers.)

14 Applying algorithm
The helix and sheet thresholds below are on the ×100 scale, where 100 corresponds to a preference parameter of 1.00.
1. Assign parameters (propensities) to each residue.
2. Identify regions (nucleation sites) where 4 out of 6 residues have P(a) > 100: α-helix.
3. Extend the helix in both directions until four contiguous residues have an average P(a) < 100: end of α-helix.
4. If the segment is longer than 5 residues and P(a) > P(b): α-helix.
5. Repeat this procedure to locate all of the helical regions.
6. Identify regions where 3 out of 5 residues have P(b) > 100: β-sheet.
7. Extend the sheet in both directions until four contiguous residues have an average P(b) < 100: end of β-sheet.
8. If P(b) > 105 and P(b) > P(a): β-sheet.
9. Rest (regions assigned to both): P(a) > P(b) → α-helix; P(b) > P(a) → β-sheet.
10. To identify a bend at residue number i, calculate the following value: p(t) = f(i) × f(i+1) × f(i+2) × f(i+3). If (1) p(t) exceeds the cutoff (the value is missing from this transcript); (2) the average P(t) > 1.00 in the tetrapeptide; and (3) the averages for the tetrapeptide obey P(a) < P(t) > P(b): β-turn.
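The helix steps and the bend product can be sketched in code. This is an illustrative reading of the slide, not the published implementation: all function names are hypothetical, windows near the chain ends are simply truncated, overlapping extended nuclei are merged, and the example scores are invented. The bend cutoff is not given here, so only the product p(t) is computed; the four f values come from the Asp, Pro, Glu and Lys rows of the preference table, which kept all seven entries.

```python
def mean(values):
    return sum(values) / len(values)


def find_nuclei(scores, window=6, needed=4, threshold=100):
    """Start indices of windows where at least `needed` residues exceed `threshold`."""
    return [
        start
        for start in range(len(scores) - window + 1)
        if sum(s > threshold for s in scores[start:start + window]) >= needed
    ]


def extend(scores, start, end, threshold=100, probe=4):
    """Grow the half-open region [start, end) one residue at a time.

    Growth stops on a side when the (up to) four contiguous residues just
    outside the region average below `threshold`. Truncating the probe at
    the chain ends is an assumption of this sketch.
    """
    while start > 0 and mean(scores[max(0, start - probe):start]) >= threshold:
        start -= 1
    while end < len(scores) and mean(scores[end:end + probe]) >= threshold:
        end += 1
    return start, end


def predict_helices(pa, pb, threshold=100):
    """Return (start, end) helix regions from per-residue P(a) and P(b) scores."""
    regions = []
    for nucleus in find_nuclei(pa, window=6, needed=4, threshold=threshold):
        start, end = extend(pa, nucleus, nucleus + 6, threshold)
        if regions and start <= regions[-1][1]:
            # Overlaps the previous region: merge the two.
            regions[-1] = (min(regions[-1][0], start), max(regions[-1][1], end))
        else:
            regions.append((start, end))
    # Keep segments longer than 5 residues whose average P(a) beats P(b).
    return [(s, e) for s, e in regions
            if e - s > 5 and mean(pa[s:e]) > mean(pb[s:e])]


def bend_product(f_i, f_i1, f_i2, f_i3):
    """p(t) = f(i) * f(i+1) * f(i+2) * f(i+3) for one tetrapeptide."""
    return f_i * f_i1 * f_i2 * f_i3


# Invented per-residue scores on the x100 scale, for illustration only.
example_pa = [90, 90, 120, 130, 110, 125, 95, 140, 115, 80, 70, 60, 65, 70]
example_pb = [80] * len(example_pa)
print(predict_helices(example_pa, example_pb))

# Tetrapeptide Asp-Pro-Glu-Lys: f(i) of Asp, f(i+1) of Pro, f(i+2) of Glu, f(i+3) of Lys.
print(bend_product(0.137, 0.272, 0.053, 0.073))
```

The sheet steps follow the same pattern with a window of 5, 3 required residues, and P(b) in place of P(a).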

15 Successful method?
15 proteins evaluated: helix = 46%, β-sheet = 35%, turn = 65%.
Overall accuracy of predicting the three conformational states (helix, β, and coil) for all residues is 56%.
Chou & Fasman: not so great?
After 1974: improvement of preference parameters.
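Percentages like these are usually per-residue agreement rates between predicted and observed states. The sketch below shows one common way to compute them; the slides do not say exactly how their figures were obtained, so the function name, the one-letter state codes (H, E, C) and the example strings are all assumptions.

```python
def state_accuracy(observed, predicted):
    """Return (overall, per_state) per-residue accuracy.

    observed, predicted: equal-length strings of state codes, one per residue.
    overall:   fraction of residues whose predicted state matches the observed one.
    per_state: for each observed state, the fraction of its residues predicted correctly.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    overall = sum(o == p for o, p in zip(observed, predicted)) / len(observed)
    per_state = {}
    for state in sorted(set(observed)):
        positions = [k for k, o in enumerate(observed) if o == state]
        correct = sum(predicted[k] == state for k in positions)
        per_state[state] = correct / len(positions)
    return overall, per_state


# Invented 10-residue example: H = helix, E = strand, C = coil.
overall, per_state = state_accuracy("HHHHEEEECC", "HHHCEECCCC")
print(overall, per_state)
```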

