Protein Structure Prediction. Protein Sequence Analysis Molecular properties (pH, mol. wt. isoelectric point, hydrophobicity) Secondary Structure Super-secondary.


1 Protein Structure Prediction

2 Protein Sequence Analysis
- Molecular properties (pH, mol. wt., isoelectric point, hydrophobicity)
- Secondary structure
- Super-secondary structure (signal peptide, coiled coil, transmembrane, etc.)
- 3-D prediction, threading (tertiary structure)
- Domains, motifs, etc.
- Subunit (quaternary structure)

3 Self-assembly
- Proteins self-assemble in solution
- All of the information necessary to determine the complex 3-D structure is in the amino acid sequence
- Structure determines function
  - Lock-and-key model of enzyme function
- Know the sequence, know the function?
- Nearly infinite complexity

4 Structure of Peptide
- Peptide backbone: C’-N-Cα-C’
- Dihedral (torsional) angles: (Φ, Ψ)
- Instead of 9 variables, use 2 variables (Φ, Ψ) for each amino acid; ω = 180° (C’=O and N-H), kept planar by resonance stabilization of the peptide bond
- [Figure labels: N-terminal, C-terminal]
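
As a rough illustration of what a backbone torsion angle is, the sketch below computes the signed dihedral angle defined by four points in space. For Φ the four atoms would be C’(i-1), N, Cα, C’, and for Ψ they would be N, Cα, C’, N(i+1). The function name `dihedral_degrees` and the toy coordinates in the usage note are illustrative assumptions, not taken from the slides.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond (hypothetical helper)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

For example, four points arranged in a planar zigzag, such as `(1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)`, give a trans angle of magnitude 180°, which is the geometry the slide assigns to ω.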

5 Structure prediction
- Protein structure prediction is the “Holy Grail” of bioinformatics
- Since structure determines function, structure prediction should allow protein design, design of inhibitors, etc.
- Huge amounts of genome data: what are the functions of all of these proteins?

6 Chemical Properties of Proteins
- Proteins are linear polymers of 20 amino acids
- Chemical properties of the protein are determined by its amino acids
- Molecular weight, pH, and isoelectric point are simple calculations from amino acid composition
- Hydrophobicity is a property of groups of amino acids, best examined as a graph
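
To make "simple calculations from amino acid composition" concrete, here is a minimal sketch of a molecular weight calculation. It assumes average residue masses in daltons (standard textbook values, not given in the slides) and adds one water molecule for the free termini. The function name `molecular_weight` is illustrative.

```python
# Average residue masses in Da (amino acid minus one water); assumed standard values.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528

def molecular_weight(sequence):
    """Approximate average molecular weight of an unmodified peptide chain."""
    sequence = sequence.upper()
    # Sum the residue masses, then add one water for the N- and C-terminal groups.
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS
```

Calling `molecular_weight("GG")` gives roughly 132.1 Da, the mass of glycylglycine. An unknown residue letter raises `KeyError`, which is a deliberate choice here so that bad input is not silently ignored.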

7 [Figure labels: "Increase local flexibility", "Increase stability"]

8 Terminology
- Active site, Blocks, Core, Fold
- Domain, Motif
- Family, Superfamily
- Module, Class
- Primary, Secondary, Tertiary, Quaternary

9 Secondary Structure
- Protein secondary structure takes one of three forms:
  - α helix
  - β sheet
  - Turn, coil or loop
- Secondary structure elements are tightly packed in the protein core in a hydrophobic environment
- Secondary structure is predicted within a small window
- Many different algorithms, not highly accurate
- Better predictions from a multiple alignment
- Methods: neural networks, nearest-neighbor method, HMM

10 3-D Structure of Protein
- Alpha-helix: right-handed turn (most), 3.6 residues per turn, Φ = -60°, Ψ = -40° on average
- Beta-sheet: antiparallel and parallel
- Loop and turn: turn or coil; loop

11 Neural Networks for Secondary Structure

12 Protein Structure Classification
- Class α: a bundle of α helices connected by loops on the surface of the protein
- Class β: antiparallel β sheets
- Class α/β: mainly parallel β sheets with intervening α helices
- Class α+β: mainly segregated α helices and antiparallel β sheets
- Multidomain proteins: comprise domains representing more than one of the above 4 classes
- Membrane and cell-surface proteins: α helices (hydrophobic) with a particular length range, traversing a membrane

13 [Figure: example structures for Class α, Class β, Class α/β, Class α+β, and membrane proteins]

14 Structure Prediction on the Web
- Secondary Structural Content Prediction (SSCP): EMBL, Heidelberg
  http://www.bork.embl-heidelberg.de/SSCP/sscp_seq.html
- BCM Search Launcher: Protein Secondary Structure Prediction: Baylor College of Medicine
  http://dot.imgen.bcm.tmc.edu:9331/seq-search/struc-predict.html
- PREDATOR: EMBL, Heidelberg
  http://www.embl-heidelberg.de/cgi/predator_serv.pl
- UCLA-DOE Protein Fold Recognition Server
  http://www.doe-mbi.ucla.edu/people/fischer/TEST/getsequence.html

15 “Super-secondary” Structure
- Common structural motifs:
  - Membrane spanning
  - Signal peptide
  - Coiled coil
  - Helix-turn-helix

16 Hydrophobicity Profile for Secondary Structure (positions of turns between secondary structure elements, exposed and buried residues, membrane-spanning segments, antigenic sites)
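
A hydrophobicity profile of this kind is usually a sliding-window average of per-residue scores. The sketch below assumes the Kyte-Doolittle hydropathy scale and a default window of 7 residues; neither the scale nor the window size is stated in the slides, and `hydropathy_profile` is an illustrative name.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the slides do not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=7):
    """Mean hydropathy of every full window along the sequence."""
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    # One value per window start; plot these against position to get the graph.
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]
```

Long runs of strongly positive values hint at buried or membrane-spanning segments, while negative stretches suggest exposed regions. Wider windows (around 19 to 21 residues) are often used when looking specifically for transmembrane helices.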

17 3-D Structure
- Cannot be accurately predicted from sequence alone (prediction from sequence alone is known as ab initio)
- Levinthal’s paradox: a 100 aa protein has 3^200 possible backbone configurations, many orders of magnitude beyond the capacity of the fastest computers
- There are perhaps only a few hundred basic structures, but we don’t yet have this vocabulary or the ability to recognize variants on a theme
- Methods: HMM, structure profile method, contact potential method, threading method, conformational energy (Monte Carlo algorithm)
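
The arithmetic behind Levinthal's paradox can be checked directly. The sketch below assumes the usual reading of the 3^200 figure: three allowed states for each of two backbone dihedral angles in each of 100 residues. The sampling rate of 10^15 conformations per second is a purely hypothetical number chosen for illustration.

```python
def levinthal_count(residues=100, angles_per_residue=2, states_per_angle=3):
    """Number of backbone conformations under the simple counting assumption."""
    return states_per_angle ** (angles_per_residue * residues)

def years_to_enumerate(n_conformations, rate_per_second=1e15):
    """Years needed to visit every conformation at a hypothetical sampling rate."""
    seconds_per_year = 365.25 * 24 * 3600
    return n_conformations / rate_per_second / seconds_per_year

total = levinthal_count()
print(len(str(total)))            # number of decimal digits in 3^200
print(years_to_enumerate(total))  # exhaustive search time in years
```

The count has 96 decimal digits (about 2.7 × 10^95), so even at the assumed rate an exhaustive search would take on the order of 10^72 years.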

18 Procedure of Prediction
[Flowchart. Steps: sequence; database similarity search; family analysis; 3D comparative modeling; predict 3D structure; 3D structural analysis in lab. Yes/No decision points: align with known structure; relationship to known structure.]

19 Hidden Markov Models for 2D and 3D
Hidden Markov Models (HMMs) are a more sophisticated form of profile analysis. Rather than building a table of amino acid frequencies at each position, they model the transition from one amino acid to the next.
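
The contrast between a per-position frequency table and a transition model can be sketched as follows. This is only a first-order Markov chain over residues, used here to illustrate the idea of transitions; a real profile HMM has hidden match, insert, and delete states and is considerably more involved. Both function names are illustrative assumptions.

```python
from collections import Counter, defaultdict

def position_frequencies(aligned_sequences):
    """Profile-style table: residue frequencies in each alignment column."""
    table = []
    for column in zip(*aligned_sequences):
        counts = Counter(column)
        table.append({aa: n / len(column) for aa, n in counts.items()})
    return table

def transition_probabilities(sequences):
    """Estimate P(next residue | current residue) from adjacent residue pairs."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return {
        current: {aa: n / sum(nxt.values()) for aa, n in nxt.items()}
        for current, nxt in counts.items()
    }
```

For the toy input `["AAB"]`, the transition model records that A is followed by A half of the time and by B half of the time, information that a column-by-column frequency table does not capture.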

20

21 Homology Modeling
- If two proteins show sufficient sequence similarity, they almost certainly adopt the same structure.
- Safe thresholds:
  - >50% identity over 25 residues
  - >30% identity over 50 residues
  - >25% identity over 80 residues or more
- If one of the two similar proteins has a known structure, a rough model of the protein of unknown structure can be built.
- Quality of the model diminishes with lower sequence identity.
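
The safe thresholds can be written as a small lookup. This sketch assumes that each residue count on the slide is a minimum aligned length, so that a longer alignment may pass with a lower identity. That interpretation, and the name `passes_safe_threshold`, are assumptions rather than something the slide states.

```python
# (minimum aligned length, identity percentage that must be exceeded)
SAFE_THRESHOLDS = [(80, 25.0), (50, 30.0), (25, 50.0)]

def passes_safe_threshold(identity_percent, aligned_length):
    """True if the alignment clears any of the slide's identity/length pairs."""
    for min_length, min_identity in SAFE_THRESHOLDS:
        if aligned_length >= min_length and identity_percent > min_identity:
            return True
    return False
```

Under this reading, `passes_safe_threshold(26, 80)` is true, while the same 26% identity over only 60 residues is not considered safe.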

22 Steps in Homology Modeling
Known structure:   ksedemkase--------dlkkhgatvltalg
                   ||||:|: ||        ||:||| |||||||
Unknown structure: kseddmrrseafgctytcdlrkhgntvltalg
1. Do sequence alignment with protein of known structure
2. Replace any side chains that are different in the homolog (green side chains)
3. Rebuild loops where there are gaps in the alignment
4. Adjust side chains to accommodate the new residues and loops
5. Energy-minimize the model
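
Percent identity for a gapped pairwise alignment like the one on this slide can be computed column by column. In the sketch below the gap in the known sequence is written as eight dashes so that both rows have the same length, matching the eight unaligned residues (afgctytc) in the unknown sequence. That gap width is an assumption, since the slide's spacing did not survive extraction, and `percent_identity` is an illustrative name.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Return (identical, aligned, percent) over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    identical = 0
    aligned = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # gap columns are excluded from the comparison
        aligned += 1
        if a == b:
            identical += 1
    percent = 100.0 * identical / aligned if aligned else 0.0
    return identical, aligned, percent

known = "ksedemkase--------dlkkhgatvltalg"    # gap width of 8 is assumed
unknown = "kseddmrrseafgctytcdlrkhgntvltalg"
print(percent_identity(known, unknown))
```

With these rows, 19 of the 24 aligned columns are identical, about 79% identity. The eight gap columns correspond to the loop that has to be rebuilt in step 3.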

23 Structure 3D Profile Method (or 3D-1D method)
- Data from known library (AA residues)
- 3 × 6 environments

24 Threading Protein Structures
- Best bet is to compare with similar sequences that have known structures: threading
  - Only works for proteins with >25% sequence similarity to a protein with known structure
  - Current state of the art requires many days of computing on a dedicated workstation
  - Some websites offer quick approximations
  - Will improve as more 3-D structures are described
- Another aspect of the Genome Project

25 Monte Carlo Algorithm for 3D
- x: set of atomic coordinates or mainchain-sidechain torsion angles of a protein
- E(x): conformation energy; k is Boltzmann’s constant; T is an effective temperature
Metropolis Algorithm
1. Generate a random state x, calculate E(x)
2. Perturb x: x → x’, to generate a neighbouring conformation
3. Calculate E(x’)
4. If E(x) > E(x’), accept x’ as a new state (downhill). Otherwise accept x’ with a probability exp(-(E(x’)-E(x))/kT) (uphill)
5. Return to 2
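
The Metropolis steps can be sketched on a toy problem. Here the "conformation" is a single number and the energy is any function of it, whereas a real application would use torsion angles or coordinates and a molecular energy function. The caller supplies the starting state instead of drawing it at random, and the uniform perturbation, the fixed step count, and the parameter names are all assumptions made for illustration.

```python
import math
import random

def metropolis(energy, x0, kT=1.0, step=0.5, n_steps=1000, seed=0):
    """Minimal Metropolis search over one variable; returns the lowest-energy state seen."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)                        # step 1: initial state and its energy
    best_x, best_e = x, e
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)     # step 2: perturb x to a neighbour
        e_new = energy(x_new)                    # step 3: energy of the neighbour
        delta = e_new - e
        # step 4: always accept downhill; accept uphill with probability exp(-delta/kT)
        if delta < 0 or rng.random() < math.exp(-delta / kT):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e                        # step 5 is the loop itself

best_x, best_e = metropolis(lambda x: x * x, x0=5.0, kT=0.1, n_steps=5000)
```

Occasionally accepting uphill moves is what lets the search escape local minima. Lowering `kT` makes the walk greedier, while raising it makes the walk explore more widely.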

