Physics and structure of biomacromolecules Konstantin Zeldovich LRB 1004, x62354.

2 Protein structure
- PDB, the Protein Data Bank: ~63,000 structures
- Primary, secondary, tertiary, … structure; domains
- Experimental methods: X-ray crystallography and NMR; computational approaches
- Diverse structures, from globular to knotted and intrinsically disordered, but a limited repertoire of ~1000 folds
Branden & Tooze, Introduction to Protein Structure

3 Interactions within a protein
- Van der Waals
- Hydrophobic forces
- Electrostatic
- Hydrogen bonds
- Role of solvent
Hierarchy of energies (bond strength): many interactions have a similar energy scale (except chemical bonds). Overall, a 300-residue protein has ΔG ~ 5 kcal/mol. Per residue, this is a very small free-energy difference between the folded and unfolded states: a SUBTLE BALANCE. Hydrophobic interactions drive folding to the compact structure.
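A back-of-the-envelope check of the "subtle balance" uses the slide's numbers for a 300-residue protein and compares them with the thermal energy RT at room temperature. The assumed values are T = 298 K and R ≈ 1.987 × 10^-3 kcal/(mol K).

```latex
\frac{\Delta G}{N} \approx \frac{5\ \text{kcal/mol}}{300} \approx 0.017\ \text{kcal/mol per residue},
\qquad
RT \approx 1.987\times10^{-3} \times 298 \approx 0.59\ \text{kcal/mol}
```

The net stability per residue is therefore roughly 35 times smaller than RT.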

4 Thermodynamics of folding
Privalov, J Chem Thermodyn 29: 447 (1997)
Methods: calorimetry, thermal or chemical denaturation.
Small proteins fold in a two-state fashion, and folding is reversible.
[Figures: heat capacity of lysozyme; free energy G along the reaction coordinate, with the unfolded (U) and native (N) states separated by a transition state]
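The following is a minimal sketch of the two-state picture. It assumes a temperature-independent heat-capacity change ΔC_p and the standard Gibbs-Helmholtz form for the unfolding free energy ΔG_U(T) = G_U - G_N. The parameter values (ΔH_m, T_m, ΔC_p) are hypothetical and chosen only to give a small-protein-like melting curve; they are not lysozyme data.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol K)


def dg_unfold(T: float, dh_m: float, t_m: float, dcp: float) -> float:
    """Unfolding free energy G_U - G_N (kcal/mol) at temperature T (K).

    Gibbs-Helmholtz with a constant heat-capacity change dcp:
    dG(T) = dH_m (1 - T/T_m) + dCp (T - T_m - T ln(T/T_m)).
    """
    return dh_m * (1.0 - T / t_m) + dcp * (T - t_m - T * math.log(T / t_m))


def fraction_native(T: float, dh_m: float, t_m: float, dcp: float) -> float:
    """Two-state population of N: 1 / (1 + K), with K = [U]/[N] = exp(-dG/RT)."""
    k_unfold = math.exp(-dg_unfold(T, dh_m, t_m, dcp) / (R * T))
    return 1.0 / (1.0 + k_unfold)


# Hypothetical parameters: dH_m = 100 kcal/mol, T_m = 340 K, dCp = 1.5 kcal/(mol K)
params = dict(dh_m=100.0, t_m=340.0, dcp=1.5)
for T in (300.0, 330.0, 340.0, 350.0, 370.0):
    print(f"T = {T:.0f} K  fraction native = {fraction_native(T, **params):.3f}")
```

At T = T_m the two states are equally populated. The sharpness of the transition comes from the large ΔH_m, as expected for cooperative two-state folding.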

5 Kinetics of folding
Plaxco et al, JMB 277:985 (1998); Biochemistry 39:11177 (2000)
For many proteins, the folding rate is determined by their topology (contact order). However, newer research finds strong outliers (C.R. Matthews lab).
- Contact order (CO) = average sequence separation between contacting residue pairs
- Relative CO: CO normalized by chain length
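This is a minimal sketch of (relative) contact order as defined above: RCO = (1/(L·N)) Σ ΔS_ij over the N contacting pairs. Here ΔS_ij = |i - j| is the sequence separation and L is the chain length. The contacts are supplied directly. In practice they would be derived from a structure with an atom-atom distance cutoff, which this sketch does not do.

```python
from typing import Iterable, Tuple


def contact_order(contacts: Iterable[Tuple[int, int]], length: int,
                  relative: bool = True) -> float:
    """Average sequence separation of contacting residue pairs.

    With relative=True the result is divided by the chain length
    (relative contact order); otherwise the absolute CO is returned.
    """
    separations = [abs(j - i) for i, j in contacts]
    if not separations:
        raise ValueError("need at least one contact")
    mean_sep = sum(separations) / len(separations)
    return mean_sep / length if relative else mean_sep


# Hypothetical 20-residue chain with two contacts
print(contact_order([(0, 5), (2, 10)], length=20))                  # 0.325
print(contact_order([(0, 5), (2, 10)], length=20, relative=False))  # 6.5
```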

6 Most proteins are densely packed
[Figure: radius of gyration vs. chain length for all bacterial proteins in the PDB, June 2009]
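The radius of gyration is the root-mean-square distance of the residues (or atoms) from their center of mass. The minimal sketch below assumes equal masses for all points. Filled cubes of lattice points stand in, hypothetically, for a densely packed object, for which Rg grows as N^(1/3).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Point]) -> float:
    """RMS distance of equal-mass points from their centroid."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    msd = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords) / n
    return math.sqrt(msd)


def filled_cube(side: int) -> List[Point]:
    """All points of a side x side x side cubic lattice: a maximally compact object."""
    return [(float(x), float(y), float(z))
            for x in range(side) for y in range(side) for z in range(side)]


for side in (3, 5, 10, 20):
    pts = filled_cube(side)
    n = len(pts)
    rg = radius_of_gyration(pts)
    # For a compact object Rg / N^(1/3) tends to a constant (1/2 for these cubes)
    print(n, round(rg, 3), round(rg / n ** (1 / 3), 3))
```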

7 Anfinsen’s thermodynamic hypothesis
The native state is entirely defined by the sequence.
The native state is a minimum of free energy:
- Unique
- Stable
- Kinetically accessible
All computational efforts depend on these ideas.
Anfinsen, Science 181: 223 (1973)

8 How does sequence define structure?
A protein is a heteropolymer. How can a specific structure arise at all?
- Protein-like sequences and the energy gap
- Folding landscape and “funnels”
Review papers:
- Dill et al., Annu. Rev. Biophys :
- Shakhnovich, Chem. Rev :
- Onuchic, Luthey-Schulten, Wolynes, Annu. Rev. Phys. Chem :

9 Toy models address basic questions
- 27-residue compact chain on a 3x3x3 lattice
- Conformational space is discrete: a finite number of compact structures
- Pairwise contact potentials: only nearest neighbors interact
- Simulations are very quick
Lau & Dill, Macromolecules 22, 3986 (1989); Shakhnovich & Gutin, J Chem Phys 93, 5967 (1990)
Because the conformational space is discrete, we can calculate the energy of the toy protein in every possible configuration. The configuration with the lowest energy is the native state.
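The exhaustive-enumeration idea can be sketched in a few lines. To keep the search tiny, this sketch swaps the slide's 27-mer on a 3x3x3 cube for a short chain on a 2D square lattice. It uses the HP contact energy of Lau & Dill: -1 per non-bonded H-H nearest-neighbor contact, 0 otherwise. It also enumerates all self-avoiding conformations rather than only compact ones. All function names are hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

Coord = Tuple[int, int]
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # square-lattice neighbor steps


def enumerate_conformations(n: int) -> Iterator[List[Coord]]:
    """Yield every self-avoiding walk of n residues on the square lattice.

    The first bond is fixed along +x to remove rotated copies;
    mirror images are still counted separately.
    """
    if n < 2:
        raise ValueError("chain needs at least two residues")
    path: List[Coord] = [(0, 0), (1, 0)]
    occupied = set(path)

    def grow() -> Iterator[List[Coord]]:
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            site = (x + dx, y + dy)
            if site not in occupied:
                path.append(site)
                occupied.add(site)
                yield from grow()
                path.pop()
                occupied.remove(site)

    yield from grow()


def hp_energy(seq: str, path: List[Coord]) -> int:
    """-1 for every H-H pair that are lattice neighbors but not chain neighbors."""
    index = {site: i for i, site in enumerate(path)}
    energy = 0
    for i, (x, y) in enumerate(path):
        if seq[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[j] == "H":
                energy -= 1  # each contact counted once, from its lower index
    return energy


def native_state(seq: str) -> Tuple[int, List[Coord], int]:
    """Lowest energy, one conformation with that energy, and how many share it."""
    best_e: Optional[int] = None
    best_conf: List[Coord] = []
    n_best = 0
    for conf in enumerate_conformations(len(seq)):
        e = hp_energy(seq, conf)
        if best_e is None or e < best_e:
            best_e, best_conf, n_best = e, conf, 1
        elif e == best_e:
            n_best += 1
    assert best_e is not None
    return best_e, best_conf, n_best


print(native_state("HHPPHH")[0])  # -2: both H-H contacts form in a 2x3 block
```

The third return value counts degenerate ground states. For "HPPH" it is 2, because the two mirror-image hairpins are not factored out.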

10 Proteins have a large energy gap
[Figure: energy spectra (E) of compact lattice 27-mers with 10,000 possible conformations; sequence labels on the slide: WHPCECQLLRYGNNDFRNLDMLFISFRWEDNMIQAGWYCPLTRRHIFQFYCHFY]
Gap! Also, the spectrum is sparse at low E.
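Given the energies of all conformations of one sequence (for example, from an exhaustive lattice enumeration), the gap and the sparseness of the low-energy spectrum are simple to quantify. In this minimal sketch, "gap" is taken as the difference between the lowest and second-lowest conformational energies. It is zero if the ground state is degenerate. The two spectra are made-up numbers for illustration.

```python
from typing import Sequence


def energy_gap(energies: Sequence[float]) -> float:
    """Second-lowest minus lowest energy; 0.0 means the minimum is not unique."""
    if len(energies) < 2:
        raise ValueError("need at least two conformations")
    e_sorted = sorted(energies)
    return e_sorted[1] - e_sorted[0]


def low_energy_states(energies: Sequence[float], window: float) -> int:
    """Number of conformations within `window` of the minimum energy."""
    e_min = min(energies)
    return sum(1 for e in energies if e - e_min <= window)


protein_like = [-10.0, -6.0, -5.5, -5.0, -4.8, -4.5]  # hypothetical spectrum with a gap
random_like = [-7.0, -6.9, -6.8, -6.8, -6.5, -6.4]    # hypothetical spectrum without one
print(energy_gap(protein_like), low_energy_states(protein_like, window=2.0))
print(energy_gap(random_like), low_energy_states(random_like, window=2.0))
```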

11 Energy gap leads to stability
What is the probability of finding a protein in its native state? The larger the gap, the more populated the native state is compared with other states.
[Figure: native-state population P_N vs. temperature T for a protein and a random polypeptide]
P_N vs. T is roughly equivalent to the CD spectra of thermal denaturation.
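The native-state population follows from the Boltzmann distribution over the conformational spectrum: P_N = exp(-E_N/kT) / Σ_k exp(-E_k/kT). This minimal sketch works in reduced units (k = 1). It uses hypothetical spectra: one native state separated by either a large or a small gap from 50 degenerate non-native states.

```python
import math
from typing import Sequence


def native_population(energies: Sequence[float], native: int, kT: float) -> float:
    """Boltzmann probability of conformation `native` at temperature kT."""
    e_min = min(energies)
    # Shift by e_min so the largest weight is 1 (avoids overflow)
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    return weights[native] / sum(weights)


large_gap = [-10.0] + [-5.0] * 50  # hypothetical; native state at index 0
small_gap = [-10.0] + [-9.0] * 50
for kT in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(kT, round(native_population(large_gap, 0, kT), 3),
          round(native_population(small_gap, 0, kT), 3))
```

Plotting either column against kT gives a sigmoidal melting curve. The large-gap spectrum stays mostly native up to a much higher temperature.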

12 Kinetics of folding and “funnels”
How does the protein find its native state?
Levinthal paradox: a brute-force search of all possible configurations would take outrageously long. In reality, proteins fold in milliseconds.
Answer: the native state must be kinetically accessible.
Dill et al., Annu. Rev. Biophys :289
The lower the energy, the more similar the conformations are to one another, so folding converges to the single native state.
Empirically (from simulations), a large gap is necessary for fast folding.
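The Levinthal estimate is simple arithmetic. This sketch uses textbook-style assumptions, all hypothetical and chosen for illustration: 100 residues, 3 conformations per residue, and 10^13 conformations sampled per second.

```python
SECONDS_PER_YEAR = 3.156e7


def levinthal_time(n_residues: int, states_per_residue: int = 3,
                   conformations_per_second: float = 1e13):
    """Time for a brute-force search of all conformations, in seconds and years."""
    n_conformations = states_per_residue ** n_residues  # exact integer
    seconds = n_conformations / conformations_per_second
    return n_conformations, seconds, seconds / SECONDS_PER_YEAR


n_conf, seconds, years = levinthal_time(100)
print(f"{n_conf:.2e} conformations -> {seconds:.1e} s ~ {years:.1e} years")
```

The result, about 10^27 years, contrasts with the milliseconds observed in reality, so folding cannot proceed by random search.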

13 To crystallize or to simulate?
- Protein structure prediction
- Homology modeling vs. molecular simulations
- Structural genomics
- CASP competition
Crystallizing is hard; sequencing is cheap. Can we get structure from sequence?
In a perfect world: knowing all of the interactions, find the conformation corresponding to the minimum energy. Voilà, this is the native state.
Practical challenges:
- Interactions are not known exactly
- Interactions with solvent
- Very large parameter space (# bond angles ~ # of atoms ~ 10^5)
- Rugged energy landscape with deep local minima: search algorithms are inefficient

14 Threading using energies
Jones, Taylor, Thornton, Nature 1992
Given a set of structures, determine which one best matches the given sequence.
Rationale: the number of folds is limited.
Thread the sequence onto each structure (possibly with gaps), then evaluate the energy of the amino acid contacts. Select the threading that yields the lowest energy (cf. the energy gap).
Works well even at low sequence homology.
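This is a minimal sketch of energy-based threading, simplified to gapless threading. Each template is reduced to its length and a list of contacting position pairs. The sequence is slid along every template, and the (template, offset) with the lowest total contact energy wins. The templates and the two-letter contact potential are hypothetical. Real threading uses a 20×20 amino acid potential and allows gaps.

```python
from typing import Dict, List, Tuple

Contacts = List[Tuple[int, int]]
Potential = Dict[Tuple[str, str], float]


def pair_energy(a: str, b: str, potential: Potential) -> float:
    """Symmetric lookup in a contact potential; unlisted pairs score 0."""
    return potential.get((a, b), potential.get((b, a), 0.0))


def thread_energy(seq: str, contacts: Contacts, offset: int, potential: Potential) -> float:
    """Energy of placing residue k of `seq` at template position offset + k."""
    e = 0.0
    for i, j in contacts:
        a, b = i - offset, j - offset
        if 0 <= a < len(seq) and 0 <= b < len(seq):
            e += pair_energy(seq[a], seq[b], potential)
    return e


def best_threading(seq: str, templates: Dict[str, Tuple[int, Contacts]],
                   potential: Potential) -> Tuple[float, str, int]:
    """Return (energy, template name, offset) of the lowest-energy gapless threading."""
    candidates = []
    for name, (length, contacts) in templates.items():
        for offset in range(length - len(seq) + 1):
            candidates.append((thread_energy(seq, contacts, offset, potential), name, offset))
    if not candidates:
        raise ValueError("sequence is longer than every template")
    return min(candidates)


potential = {("H", "H"): -1.0}  # hypothetical: only hydrophobic contacts are favorable
templates = {
    "hairpin": (4, [(0, 3)]),  # hypothetical template contact maps
    "other": (4, [(1, 3)]),
}
print(best_threading("HPPH", templates, potential))  # (-1.0, 'hairpin', 0)
```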

15 Threading using profiles
Bowie, Luthy, Eisenberg, Science 1991
For each position, assess:
- secondary structure
- fraction polar
- buried area, …
[Figure: profile table of position vs. residue type (A, C, D, E, …), averaged over homologous sequences with known structures]
Create profiles for different folds (using known structures with homologous sequences).
For a given sequence with unknown structure, match it to all profiles (with gaps).
Select the profile with the best score.
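The profile-matching step amounts to a dynamic-programming alignment of the sequence against a position-specific score table. This minimal sketch assumes a global alignment with a linear gap penalty. The two-letter profiles and scores are hypothetical stand-ins for 3D profiles over the 20 residue types.

```python
from typing import Dict, List

Profile = List[Dict[str, float]]  # one {residue type: score} table per structural position


def profile_score(seq: str, profile: Profile, gap: float = -1.0) -> float:
    """Best global alignment score of `seq` against `profile` (Needleman-Wunsch style)."""
    n, m = len(profile), len(seq)
    # f[i][j]: best score aligning the first i profile positions with the first j residues
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * gap
    for j in range(1, m + 1):
        f[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(
                f[i - 1][j - 1] + profile[i - 1].get(seq[j - 1], 0.0),  # residue j at position i
                f[i - 1][j] + gap,  # profile position left empty
                f[i][j - 1] + gap,  # extra residue (insertion)
            )
    return f[n][m]


def best_fold(seq: str, profiles: Dict[str, Profile]) -> str:
    """Name of the fold whose profile gives the highest alignment score."""
    return max(profiles, key=lambda name: profile_score(seq, profiles[name]))


buried, exposed = {"H": 2.0, "P": -1.0}, {"H": -1.0, "P": 2.0}  # hypothetical scores
profiles = {"fold_A": [buried, exposed, buried], "fold_B": [exposed, exposed, exposed]}
print(profile_score("HPH", profiles["fold_A"]), best_fold("HPH", profiles))  # 6.0 fold_A
```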

16 Homology modeling
Marti-Renom, … Sali, Annu. Rev. Biophys. Biomol. Struct :291–325
- Pairwise sequence alignment with the PDB (BLAST)
- Matching to a multiple sequence alignment (PSI-BLAST)
- Threading, or 3D template matching to the PDB
Fold correctness? (by sequence similarity?): stereochemistry; solvent accessibility; positions of charged and hydrophobic groups; …
Model building: rigid-body assembly; segment matching (aligning conserved atoms); satisfaction of spatial restraints

17 ab initio structure prediction
Anfinsen’s hypothesis:
- the native structure is entirely determined by the sequence
- the native structure is a unique energy minimum
Assuming we know the interactions between the amino acids, can we simply search for this minimum?
Polymer modeling is used extensively in materials science. Is it applicable to proteins?
Two main methods (Karplus, Scheraga, …):
- Molecular dynamics: deterministic, reflects dynamics
- Monte Carlo: stochastic, no dynamics

18 Force fields and potentials
How do we know the strength of each interaction between atoms in a protein?
- Ab initio approach: quantum chemistry can calculate electron density profiles, and thus the energy (isn’t a protein just one big Schrödinger equation?)
- Empirical force fields (CHARMM, AMBER): potentials optimized to correctly predict known structures of small molecules
- Statistical approach (Miyazawa & Jernigan 1985, 1996): learn from the PDB by counting contacts. Boltzmann law: $N_{ij} = N x_i x_j \exp(-\varepsilon_{ij}/kT)$; inverting: $\varepsilon_{ij} = -kT \ln\left[ N_{ij} / (N x_i x_j) \right]$, where $N_{ij}$ is the number of contacts between residue types $i$ and $j$, $N$ is the total number of contacts, and $x_i$ are the molar fractions.
The training set must be carefully chosen: various folds, no homology, …
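This is a minimal sketch of the contact-counting (quasi-chemical) idea. It compares the observed fraction of each contact type with the fraction expected from the molar fractions alone, then takes -kT ln of the ratio. The contact counts and composition are hypothetical toy numbers, and kT is set to 1 (reduced units). The sketch omits the refinements of the actual Miyazawa-Jernigan procedure, such as its treatment of solvent.

```python
import math
from typing import Dict, FrozenSet

Pair = FrozenSet[str]  # unordered residue-type pair; frozenset("HH") == frozenset({"H"})


def contact_potential(counts: Dict[Pair, int], fractions: Dict[str, float],
                      kT: float = 1.0) -> Dict[Pair, float]:
    """e_ij = -kT ln(observed / expected) for each unordered residue-type pair.

    Expected fraction of i-j contacts from composition alone:
    x_i^2 for i == j, 2 x_i x_j for i != j (unordered pairs).
    """
    total = sum(counts.values())
    energies = {}
    for pair, n_ij in counts.items():
        if n_ij <= 0:
            raise ValueError("every contact type needs a positive count")
        types = sorted(pair)
        if len(types) == 1:
            expected = fractions[types[0]] ** 2
        else:
            expected = 2.0 * fractions[types[0]] * fractions[types[1]]
        energies[pair] = -kT * math.log((n_ij / total) / expected)
    return energies


# Hypothetical counts: H-H contacts are over-represented relative to composition
counts = {frozenset("HH"): 50, frozenset("HP"): 30, frozenset("PP"): 20}
fractions = {"H": 0.5, "P": 0.5}
for pair, e in contact_potential(counts, fractions).items():
    print("".join(sorted(pair)), round(e, 3))  # negative = favorable contact
```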

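19 Molecular dynamics
For the i-th atom, solve Newton's equation $m_i \ddot{\mathbf{x}}_i = \sum_{j \neq i} \mathbf{F}_{ij}$ and integrate step by step for a while, obtaining the trajectories of all atoms.
[Figure: force between atoms i and j; coordinate x vs. time]
Pros:
- Most detailed, most realistic
- True dynamics
Cons:
- Time-consuming
Main issue: the time step must be about a femtosecond to resolve bond vibrations, but folding occurs on microsecond-to-second timescales, so at least ~10^9 iterations are needed.
Tools: AMBER, CHARMM, GROMACS, NAMD, …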
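Newton's equations are usually integrated with a time-reversible scheme such as velocity Verlet. Below is a minimal one-dimensional sketch for a single harmonic "bond", with hypothetical unit mass and spring constant in reduced units. The usual sanity check is that total energy stays nearly constant over many steps. This holds only if the time step is small compared with the vibration period, which is why MD time steps must be tiny.

```python
from typing import Callable, List, Tuple


def velocity_verlet(x: float, v: float, force: Callable[[float], float],
                    mass: float, dt: float, n_steps: int) -> Tuple[float, float, List[float]]:
    """Integrate m x'' = F(x) with velocity Verlet; returns final x, v and the trajectory."""
    f = force(x)
    trajectory = [x]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass  # half kick
        x = x + dt * v_half               # drift
        f = force(x)                      # force at the new position
        v = v_half + 0.5 * dt * f / mass  # half kick
        trajectory.append(x)
    return x, v, trajectory


k_bond, m = 1.0, 1.0  # hypothetical reduced units; vibration period 2*pi


def harmonic_force(x: float) -> float:
    return -k_bond * x


def total_energy(x: float, v: float) -> float:
    return 0.5 * m * v ** 2 + 0.5 * k_bond * x ** 2


x_end, v_end, traj = velocity_verlet(1.0, 0.0, harmonic_force, m, dt=0.01, n_steps=10_000)
print(total_energy(1.0, 0.0), total_energy(x_end, v_end))  # nearly equal

# A time step beyond the stability limit (dt > 2 for this bond) blows up
x_bad, v_bad, _ = velocity_verlet(1.0, 0.0, harmonic_force, m, dt=2.5, n_steps=100)
print(total_energy(x_bad, v_bad))
```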

20 Applications of molecular dynamics
- Protein-ligand interactions
- Dynamics of protein folding
- Membrane proteins and ion channels
- Sidechain packing
D.E. Shaw Research has developed Anton, a special-purpose supercomputer whose custom-built chips (ASIC and FPGA) run MD simulations much faster than commodity clusters: millisecond timescales are becoming accessible!
D.E. Shaw et al. 2009, Proceedings of the ACM/IEEE Conference on Supercomputing (SC09)

21 Monte Carlo simulation
Sacrifices information about dynamics to better explore the full energy landscape.
Elementary step: make a trial move, compute the energy change ΔE, and accept or reject the new configuration:
- ΔE ≤ 0: always accept
- ΔE > 0: accept with probability exp(-ΔE/kT) (Metropolis sampling)
At equilibrium, different conformations are visited with the same (Boltzmann) frequencies as in molecular dynamics.
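The following is a minimal sketch of Metropolis sampling on a toy system of three states with hypothetical energies 0, 1 and 2 (reduced units, kT = 1). Trial moves pick one of the other states uniformly at random, which is a symmetric proposal. The visit frequencies should then approach the Boltzmann probabilities.

```python
import math
import random
from typing import List, Sequence


def metropolis_accept(delta_e: float, kT: float, rng: random.Random) -> bool:
    """Metropolis criterion: downhill always, uphill with probability exp(-dE/kT)."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / kT)


def metropolis_frequencies(energies: Sequence[float], kT: float,
                           n_steps: int, seed: int = 0) -> List[float]:
    """Fraction of steps spent in each state during a Metropolis run."""
    rng = random.Random(seed)
    n = len(energies)
    state = 0
    visits = [0] * n
    for _ in range(n_steps):
        trial = rng.randrange(n - 1)
        if trial >= state:  # pick uniformly among the other n - 1 states
            trial += 1
        if metropolis_accept(energies[trial] - energies[state], kT, rng):
            state = trial
        visits[state] += 1  # a rejected move counts the old state again
    return [c / n_steps for c in visits]


energies = [0.0, 1.0, 2.0]  # hypothetical toy spectrum
z = sum(math.exp(-e) for e in energies)
boltzmann = [math.exp(-e) / z for e in energies]
print(metropolis_frequencies(energies, kT=1.0, n_steps=200_000))
print(boltzmann)
```

Counting the old state again after a rejection is essential. Without it, the frequencies would not converge to the Boltzmann distribution.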

22 Monte Carlo simulation (cont’d)
Typical moves are rotations around bonds:
- local move: rotation of one atom relative to its two neighbors
- global move: pivoting the chain around a bond
Advantage over MD: no small/large timescale problem.
However:
- no direct information about dynamics
- calculating rotations is expensive (trigonometry!)
Often used in coarse-grained simulations to explore a large conformational space and find basins of attraction (energy valleys). If needed, these valleys can then be explored further by molecular dynamics.
Tools: ProFASi
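The trigonometry mentioned in the slide is a rotation about a bond axis. This minimal sketch of a pivot (global) move uses Rodrigues' rotation formula. Every atom beyond a chosen bond is rotated about that bond's axis, so bond lengths and bond angles are preserved and only one dihedral angle changes. The coordinates and function names are hypothetical, and this is not ProFASi's implementation.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def sub(u: Vec, v: Vec) -> Vec:
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def add(u: Vec, v: Vec) -> Vec:
    return (u[0] + v[0], u[1] + v[1], u[2] + v[2])


def scale(u: Vec, s: float) -> Vec:
    return (u[0] * s, u[1] * s, u[2] * s)


def dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def rotate_about_axis(p: Vec, a: Vec, b: Vec, theta: float) -> Vec:
    """Rotate point p by theta about the axis through a and b (Rodrigues' formula)."""
    k = sub(b, a)
    k = scale(k, 1.0 / math.sqrt(dot(k, k)))  # unit axis
    v = sub(p, a)
    c, s = math.cos(theta), math.sin(theta)
    # v_rot = v cos(theta) + (k x v) sin(theta) + k (k . v)(1 - cos(theta))
    v_rot = add(add(scale(v, c), scale(cross(k, v), s)), scale(k, dot(k, v) * (1.0 - c)))
    return add(a, v_rot)


def pivot_move(coords: List[Vec], bond: int, theta: float) -> List[Vec]:
    """Rotate atoms bond+2, bond+3, ... about the axis through atoms bond and bond+1."""
    a, b = coords[bond], coords[bond + 1]
    return coords[: bond + 2] + [rotate_about_axis(p, a, b, theta) for p in coords[bond + 2:]]


chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
new = pivot_move(chain, 0, math.pi / 2)  # 90-degree turn about the first bond (x axis)
print([tuple(round(c, 6) for c in p) for p in new])
```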

23 Hybrid techniques: I-TASSER
Wu, Skolnick, Zhang, BMC Biology 5:17 (2007)

24 Hybrid techniques: ROBETTA
Kim, Chivian, Baker, NAR 2004, vol. 32, W526–W531
- Sequences are parsed into putative domains
- If homology is found: comparative modeling
- If homology is low: ab initio folding, assembling 3- and 9-residue fragment libraries
- Selected decoys are clustered; cluster centroids are used as models
- Side chains are repacked by MC simulations using a rotamer library

25 Structural databases: SCOP, CATH
SCOP (Murzin et al, JMB 247:536 (1995)): hierarchical structural classification
- Class: all-alpha, all-beta, alpha/beta, alpha+beta, multidomain, membrane, small
- Fold
- Superfamily
- Family
CATH (Orengo et al, Structure 5:1093 (1997)): hierarchical domain classification
- Class: mainly-alpha, mainly-beta, alpha-beta
- Architecture
- Topology (fold family)
- Homologous superfamily

26 Tools & servers
- PDB
- Structure prediction servers and tools (just a few): I-TASSER, ROBETTA, MODELLER
- Molecular dynamics packages (general): AMBER, CHARMM, GROMACS, NAMD
- Monte Carlo protein modeling: ProFASi
- Structural biology software database
