Protein structure prediction Haixu Tang School of Informatics.


1 Protein structure prediction Haixu Tang School of Informatics

2 Basic operations in a cell (Central Dogma)
A gene is expressed in two steps:
1) Transcription: RNA synthesis
2) Translation: protein synthesis


4 Proteins are the major functional biomolecules in cells
Examples of protein functions
–Catalysis: Almost all chemical reactions in a living cell are catalyzed by protein enzymes. Example: alcohol dehydrogenase oxidizes alcohols to aldehydes or ketones.
–Transport: Some proteins transport various substances, such as oxygen and ions. Example: haemoglobin carries oxygen.
–Information transfer: For example, hormones. Insulin controls the amount of sugar in the blood.

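5 Proteins are composed of amino acids
[Figure: general structure of an amino acid: a central carbon bonded to an amino group (NH3+), a carboxylic acid group (COO-), a hydrogen (H), and a side chain (R)]
Different side chains, R, determine the chemical properties of the 20 amino acids.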

6 The 20 amino acids
Glycine (G), Glutamic acid (E), Aspartic acid (D), Methionine (M), Threonine (T), Serine (S), Glutamine (Q), Asparagine (N), Tryptophan (W), Phenylalanine (F), Cysteine (C), Proline (P), Leucine (L), Isoleucine (I), Valine (V), Alanine (A), Histidine (H), Lysine (K), Tyrosine (Y), Arginine (R)
Color key in the original slide: White: Hydrophobic, Green: Hydrophilic, Red: Acidic, Blue: Basic

7 Proteins are linear polymers of amino acids
[Figure: two amino acids join through a peptide bond, releasing H2O; the resulting chain runs from the NH3+ end to the COO- end]
The amino acid sequence is called the primary structure.
Example: A A F N G G S T S D K

8 Each protein has a unique structure
Amino acid sequence: NLKTEWPELVGKSVEE AKKVILQDKPEAQIIVL PVGTIVTMEYRIDRVR LFVDKLDNIAEVPRVG
↓ folding

9 Protein Structure Determination
X-ray crystallography
–Most accurate
–In vitro
–Requires protein crystals
–~100K per structure
Nuclear Magnetic Resonance
–Fairly accurate
–In vitro, in solution
–No need for crystals
–Limited to small proteins

10 Protein data bank
http://www.rcsb.org/pdb/
PDB files: atom coordinates, etc. (1atn: actin/DNAse I complex)
ATOM      1  CA  ACE A   0     105.046  51.546  40.626  1.00 72.72      1ATN 263
ATOM      2  C   ACE A   0     105.314  50.822  41.951  1.00 72.72      1ATN 264
ATOM      3  O   ACE A   0     105.220  51.451  43.013  1.00 72.56      1ATN 265
ATOM      4  N   ASP A   1     105.665  49.507  41.867  1.00 71.64      1ATN 266
ATOM      5  CA  ASP A   1     105.992  48.589  42.982  1.00 70.20      1ATN 267
ATOM      6  C   ASP A   1     107.024  49.191  43.936  1.00 69.70      1ATN 268
ATOM      7  O   ASP A   1     106.927  49.088  45.163  1.00 69.14      1ATN 269
ATOM      8  CB  ASP A   1     106.533  47.248  42.410  1.00 70.66      1ATN 270
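To make the ATOM records above concrete, here is a minimal Python sketch that reads two of them and measures the distance between the backbone N and CA atoms of ASP 1. The `Atom` type and `parse_atom_line` helper are illustrative names, not part of any PDB library. The sketch splits on whitespace, which works for the records shown here; real PDB files use fixed columns, so a production parser should slice by column instead.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one whitespace-separated ATOM record (assumption: no fused fields)."""
    fields = line.split()
    if fields[0] != "ATOM":
        raise ValueError("not an ATOM record")
    return Atom(
        serial=int(fields[1]),
        name=fields[2],
        res_name=fields[3],
        chain=fields[4],
        res_seq=int(fields[5]),
        x=float(fields[6]),
        y=float(fields[7]),
        z=float(fields[8]),
    )


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in the units of the file (angstroms)."""
    return math.sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2)


records = [
    "ATOM 4 N ASP A 1 105.665 49.507 41.867 1.00 71.64 1ATN 266",
    "ATOM 5 CA ASP A 1 105.992 48.589 42.982 1.00 70.20 1ATN 267",
]
atoms = [parse_atom_line(r) for r in records]
n_ca = distance(atoms[0], atoms[1])
```

The resulting N to CA distance comes out close to 1.48, which is in the range expected for a covalent bond between these two backbone atoms.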

11 Visualizing protein structure (PDB files)

12 Basic structural units of proteins: Secondary structure α-helix β-sheet Secondary structures, α-helix and β-sheet, have regular hydrogen-bonding patterns.

13 Three-dimensional structure of proteins Tertiary structure Quaternary structure

14 Hierarchical nature of protein structure
Primary structure (amino acid sequence)
↓
Secondary structure (α-helix, β-sheet)
↓
Tertiary structure (three-dimensional structure formed by assembly of secondary structures)
↓
Quaternary structure (structure formed by more than one polypeptide chain)

15 Secondary Structure Prediction
Given a protein sequence, secondary structure prediction aims at predicting the state of each amino acid as being either H (helix), E (extended = strand), or O (other).
The quality of a secondary structure prediction is measured with a “3-state accuracy” score, Q3. Q3 is the percentage of residues whose predicted state matches “reality” (the state observed in the X-ray structure).
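The Q3 definition above can be written as a few lines of Python. This is a sketch under simple assumptions: both strings use the H/E/O alphabet of this slide, are already aligned residue by residue, and the two example strings are made up for illustration rather than taken from a real protein.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical example: 13 residues, 2 of them predicted incorrectly.
observed = "OHHHHHOOEEEEO"
predicted = "OHHHHOOOEEEOO"
score = q3(predicted, observed)
```

Here 11 of the 13 states agree, so the score is 100 × 11 / 13.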

16 Early methods for Secondary Structure Prediction
–Chou and Fasman (Chou and Fasman. Prediction of protein conformation. Biochemistry, 13:211-245, 1974)
–GOR (Garnier, Osguthorpe and Robson. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol., 120:97-120, 1978)

17 Chou and Fasman
Amino Acid   α-Helix   β-Sheet   Turn
Ala          1.29      0.90      0.78
Cys          1.11      0.74      0.80
Leu          1.30      1.02      0.59
Met          1.47      0.97      0.39
Glu          1.44      0.75      1.00
Gln          1.27      0.80      0.97
His          1.22      1.08      0.69
Lys          1.23      0.77      0.96
Val          0.91      1.49      0.47
Ile          0.97      1.45      0.51
Phe          1.07      1.32      0.58
Tyr          0.72      1.25      1.05
Trp          0.99      1.14      0.75
Thr          0.82      1.21      1.03
Gly          0.56      0.92      1.64
Ser          0.82      0.95      1.33
Asp          1.04      0.72      1.41
Asn          0.90      0.76      1.23
Pro          0.52      0.64      1.91
Arg          0.96      0.99      0.88
Legend in the original slide: Favors α-helix, Favors β-strand, Favors turn
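As a rough illustration of how propensity values like those above can drive a prediction, the Python sketch below averages the three propensities over a short window around each residue and picks the state with the highest average. This is a deliberate simplification, not the published Chou and Fasman procedure, which uses more elaborate nucleation and extension rules. The window size and the state labels H, E, T are assumptions made for this example.

```python
# (helix, sheet, turn) propensities copied from the table on this slide.
PROPENSITY = {
    "A": (1.29, 0.90, 0.78), "C": (1.11, 0.74, 0.80), "L": (1.30, 1.02, 0.59),
    "M": (1.47, 0.97, 0.39), "E": (1.44, 0.75, 1.00), "Q": (1.27, 0.80, 0.97),
    "H": (1.22, 1.08, 0.69), "K": (1.23, 0.77, 0.96), "V": (0.91, 1.49, 0.47),
    "I": (0.97, 1.45, 0.51), "F": (1.07, 1.32, 0.58), "Y": (0.72, 1.25, 1.05),
    "W": (0.99, 1.14, 0.75), "T": (0.82, 1.21, 1.03), "G": (0.56, 0.92, 1.64),
    "S": (0.82, 0.95, 1.33), "D": (1.04, 0.72, 1.41), "N": (0.90, 0.76, 1.23),
    "P": (0.52, 0.64, 1.91), "R": (0.96, 0.99, 0.88),
}
STATES = ("H", "E", "T")  # helix, strand, turn (labels chosen for this sketch)


def predict_by_average_propensity(sequence: str, half_window: int = 2) -> str:
    """Assign each residue the state with the highest windowed mean propensity."""
    prediction = []
    for i in range(len(sequence)):
        # The window is truncated at the sequence ends rather than padded.
        window = sequence[max(0, i - half_window): i + half_window + 1]
        averages = [
            sum(PROPENSITY[aa][k] for aa in window) / len(window)
            for k in range(3)
        ]
        prediction.append(STATES[averages.index(max(averages))])
    return "".join(prediction)


helix_like = predict_by_average_propensity("AAAAAA")
strand_like = predict_by_average_propensity("VVVVV")
turn_like = predict_by_average_propensity("GPGPG")
```

A run of alanines is labelled helix, a run of valines strand, and alternating glycine and proline turn, which matches the groupings in the table.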

18 The GOR method
For each position j in the sequence, eight residues on either side are considered, giving a window of 17 residues centered on j.
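The window described above can be extracted as follows. This Python sketch only shows the windowing step; how GOR scores the residues in the window is not covered here. Padding positions that fall off either end with "-" is an assumption for illustration, and `window_around` is a hypothetical helper name.

```python
def window_around(sequence: str, j: int, half_width: int = 8, pad: str = "-") -> str:
    """Return the 2 * half_width + 1 residues centered on position j (0-based)."""
    if not 0 <= j < len(sequence):
        raise IndexError("j is outside the sequence")
    left = j - half_width
    right = j + half_width + 1
    # Pad with a placeholder where the window extends past the sequence ends.
    prefix = pad * max(0, -left)
    suffix = pad * max(0, right - len(sequence))
    return prefix + sequence[max(0, left): min(len(sequence), right)] + suffix


sequence = "NLKTEWPELVGKSVEEAKKVILQDK"
first = window_around(sequence, 0)
middle = window_around(sequence, 12)
```

Every window has 17 characters, and the residue being predicted always sits at index 8 of the window.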

19 Accuracy Both Chou and Fasman and GOR have been assessed and their accuracy is estimated to be Q3=60-65%.

20 Neural networks The most successful methods for predicting secondary structure are based on neural networks. The overall idea is that neural networks can be trained to recognize amino acid patterns in known secondary structure units, and to use these patterns to distinguish between the different types of secondary structure. Neural networks classify “input vectors” or “examples” into categories (2 or more).
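The slide does not say how a stretch of sequence becomes an “input vector”. One common choice, assumed here purely for illustration, is one-hot encoding: each residue in a window becomes 20 numbers, with a single 1 marking its amino acid type. The sketch below uses plain Python lists; `one_hot_window` is a hypothetical helper name.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 one-letter codes, alphabetical


def one_hot_window(window: str) -> list:
    """Concatenate a 20-element one-hot vector for each residue in the window.

    Characters outside the 20 standard codes (for example a padding symbol)
    are encoded as all zeros.
    """
    vector = []
    for residue in window:
        column = [0.0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:
            column[index] = 1.0
        vector.extend(column)
    return vector


example = one_hot_window("ACD")
padded = one_hot_window("-A")
```

A 17-residue window encoded this way yields a vector of 340 numbers, which a classifier can then map to one of the secondary structure categories.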

21 Protein 3D Structure Prediction
In theory, a protein structure can be predicted computationally.
A protein folds into a 3D structure that minimizes its free energy.
The problem can be formulated as a search for the minimum-energy conformation
–the search space is enormous even for small proteins!
–the number of local minima increases exponentially with the size of the protein
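A back-of-the-envelope count shows why the search space is so large. The Python sketch below assumes, only for illustration, that each residue can adopt a small fixed number of discrete conformations independently of the others; real chains are continuous and constrained, so this is a rough order-of-magnitude argument and not a physical model.

```python
def conformation_count(n_residues: int, states_per_residue: int = 3) -> int:
    """Number of chain conformations if every residue independently
    picks one of states_per_residue discrete states (an assumption)."""
    return states_per_residue ** n_residues


small = conformation_count(2)
hundred = conformation_count(100)
```

Even with only three states per residue, a 100-residue protein has 3^100 conformations, which is more than 10^47 and far too many to enumerate.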

22 Computational Methods for Protein 3D Structure Prediction
Comparative modeling
–Protein threading: structure prediction through identification of a “good” sequence-structure fit
–Homology modeling: identification of homologous proteins through sequence alignment; structure prediction through placing residues into the “corresponding” positions of homologous structure models

23 Protein Threading
Find the “correct” sequence-structure alignment between a target sequence and its native-like fold in PDB
Energy function: knowledge (or statistics) based rather than physics based
–Should be able to distinguish correct structural folds from incorrect structural folds
–Should be able to distinguish correct sequence-fold alignments from incorrect sequence-fold alignments

24 Protein Threading Structure database Fitness function Sequence-structure alignment algorithm Prediction reliability assessment

25 Protein Threading – structure database Build a template database

26 Protein Threading – fitness function
Example sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
–E_s: how well a residue fits a structural environment
–E_p: how preferable it is to put two particular residues nearby
–E_g: alignment gap penalty
Find a sequence-structure alignment that minimizes the energy function

27 Protein Threading (sequence-structure alignment)
Unlike sequence-sequence alignment, where amino acids are aligned with amino acids, a sequence-structure alignment aligns amino acids with structural environments
A simple definition of structural environment
–secondary structure: alpha-helix, beta-strand, loop
–solvent accessibility: 0, 10, 20, …, 100% of accessibility
–each combination of secondary structure and solvent accessibility level defines a structural environment
E.g., (alpha-helix, 30%), (loop, 80%), …
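The structural-environment definition above maps naturally onto a small data structure. In this Python sketch a position is described by its secondary structure and a solvent accessibility rounded to the nearest 10%. Rounding to the nearest level is an assumption, since the slide lists only the levels themselves, and `structural_environment` is a hypothetical helper name.

```python
SECONDARY_STRUCTURES = ("alpha-helix", "beta-strand", "loop")
ACCESSIBILITY_LEVELS = tuple(range(0, 101, 10))  # 0, 10, ..., 100


def structural_environment(secondary_structure: str, accessibility_percent: float):
    """Return the (secondary structure, accessibility level) pair for a position."""
    if secondary_structure not in SECONDARY_STRUCTURES:
        raise ValueError("unknown secondary structure")
    if not 0 <= accessibility_percent <= 100:
        raise ValueError("accessibility must be between 0 and 100")
    # Round to the nearest multiple of 10 (halves round up).
    level = int((accessibility_percent + 5) // 10) * 10
    return (secondary_structure, level)


ALL_ENVIRONMENTS = [
    (ss, level) for ss in SECONDARY_STRUCTURES for level in ACCESSIBILITY_LEVELS
]
example = structural_environment("alpha-helix", 32.0)
```

With three secondary structure types and eleven accessibility levels, this definition yields 33 distinct environments.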

28 Protein Threading – algorithm
Threading algorithm: find the sequence-structure alignment with the minimum value of the fitness function
[Figure: a sequence aligned to a fold, with links between sequence residues and fold positions]
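A minimal Python sketch of such an alignment search is given below. It makes several simplifying assumptions that are not in the slides: the total energy is the sum of the per-residue term E_s and a linear gap penalty E_g, the pairwise term E_p is omitted because it couples distant positions and a plain dynamic program no longer applies, and the toy fitness function and the "buried"/"exposed" environments are invented for illustration.

```python
HYDROPHOBIC = set("AVLIMFWC")  # toy classification, an assumption


def toy_fitness(residue: str, environment: str) -> float:
    """Hypothetical E_s: reward hydrophobic residues in buried positions
    and other residues in exposed positions; penalize the opposite."""
    buried = environment == "buried"
    return -1.0 if (residue in HYDROPHOBIC) == buried else 1.0


def thread(sequence, environments, fitness=toy_fitness, gap_penalty=1.0):
    """Minimum energy of a global sequence-structure alignment.

    best[i][j] is the lowest energy for aligning the first i residues
    with the first j template positions.
    """
    n, m = len(sequence), len(environments)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap_penalty
    for j in range(1, m + 1):
        best[0][j] = j * gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            aligned = best[i - 1][j - 1] + fitness(sequence[i - 1], environments[j - 1])
            skip_residue = best[i - 1][j] + gap_penalty
            skip_position = best[i][j - 1] + gap_penalty
            best[i][j] = min(aligned, skip_residue, skip_position)
    return best[n][m]


template = ["buried", "exposed", "buried"]
full_fit = thread("LKV", template)
with_gap = thread("LV", template)
```

For the three-position template, "LKV" fits every position, while "LV" does best by leaving the exposed position unaligned and paying one gap penalty. Running the same search against every template in the structure database and ranking by energy is one way the pieces on the surrounding slides could fit together.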

29 CASP CASP = Critical Assessment of Structure Prediction First held in 1994, every 2 years afterwards Teams make structure predictions from sequences alone

30 CASP
Two categories of predictors
–Automated
  Automatic servers, must complete analysis within 48 hours
  Shows what is possible through computer analysis alone
–Non-automated
  Groups spend considerable time and effort on each target
  Utilize computer techniques and human analysis techniques

31 CASP
CASP6, 2004
–200 prediction teams from 24 countries
–Over 30,000 predictions for 64 protein targets collected and evaluated
–A conference was held afterwards to discuss the results, with many teams presenting their individual results and methodologies
–Helps to steer future work

