Protein Folding and Modeling Carol K. Hall Chemical and Biomolecular Engineering North Carolina State University
Computational Methods for Modeling Protein Folding and Structure
1. Homology Modeling: assumes that proteins with similar sequences have similar structures; relies on sequence alignments
2. Threading: "threads" a sequence of unknown structure through a database of known structures and scores each match based on contact potentials
3. Ab initio or de novo approaches: deduce the 3-D structure for a given sequence by finding the minimum energy based on a force field
Types of Computer Simulations
1. Molecular Dynamics
   a. Decide on model intermolecular forces
   b. Distribute 1,000 molecules in the simulation cell, assigning random positions and velocities to each molecule
   c. Monitor each molecule's motion as a function of time by solving Newton's equation of motion (F = m*a) at each time step to predict its new position and velocity
   d. Take time averages of properties of interest
2. Monte Carlo
   a. Decide on model intermolecular forces
   b. Distribute 1,000 molecules at random locations in the cell
   c. Generate configurations of these molecules randomly (in proportion to their probability of occurring)
   d. Take averages over all configurations generated to calculate properties of interest
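The molecular dynamics steps above can be sketched in a few lines. This is a minimal illustration, not the method used in the lecture: the velocity Verlet integrator, the harmonic force standing in for the "model intermolecular forces", the reduced units, and all function names are assumptions made for the example.

```python
import numpy as np

def velocity_verlet(pos, vel, force_fn, mass, dt, n_steps):
    """Advance Newton's equation F = m*a with velocity Verlet (assumed integrator).

    Returns final positions, final velocities, and the time-averaged
    kinetic energy (an example of a time-averaged property).
    """
    pos = np.array(pos, dtype=float)  # np.array copies, so inputs stay untouched
    vel = np.array(vel, dtype=float)
    force = force_fn(pos)
    kinetic_sum = 0.0
    for _ in range(n_steps):
        vel += 0.5 * dt * force / mass   # half-step velocity update
        pos += dt * vel                  # full-step position update
        force = force_fn(pos)            # forces at the new positions
        vel += 0.5 * dt * force / mass   # second half-step velocity update
        kinetic_sum += 0.5 * mass * np.sum(vel ** 2)
    return pos, vel, kinetic_sum / n_steps

def harmonic_force(pos, k=1.0):
    """Placeholder force model (hypothetical): each particle tied to the origin."""
    return -k * pos

# Random initial positions and velocities, as in step (b)
rng = np.random.default_rng(0)
pos0 = rng.uniform(-1.0, 1.0, size=(10, 3))
vel0 = rng.normal(0.0, 1.0, size=(10, 3))

pos, vel, mean_ke = velocity_verlet(pos0, vel0, harmonic_force,
                                    mass=1.0, dt=0.01, n_steps=1000)
print("time-averaged kinetic energy:", mean_ke)
```

A real simulation would replace `harmonic_force` with pair forces derived from the chosen intermolecular potential; the integration loop itself would stay the same.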
Types of Computer Simulations (cont.)
3. Periodic Boundary Conditions: make 1000 molecules look like a much larger, effectively infinite system
4. Computer simulation gives exact results for the molecular model studied
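One common way to implement periodic boundary conditions is sketched below, assuming a cubic cell of side `box_length`. The function names and the use of the minimum-image convention are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def wrap_positions(pos, box_length):
    """Map coordinates back into the primary cell [0, box_length)."""
    return np.mod(pos, box_length)

def minimum_image(displacement, box_length):
    """Return the displacement to the nearest periodic image of a neighbor."""
    return displacement - box_length * np.round(displacement / box_length)

box = 10.0
a = np.array([0.5, 5.0, 9.5])
b = np.array([9.5, 5.0, 0.5])

# The two particles sit near opposite faces of the cell, yet their
# nearest periodic images are only one unit apart along x and z.
d = minimum_image(b - a, box)
print(d)

# A particle that drifts out of the cell re-enters through the opposite face.
print(wrap_positions(np.array([10.5, -0.5, 3.0]), box))
```

With these two helpers, each molecule interacts with the nearest image of every other molecule, so the small cell has no free surfaces.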
Simulation of a System of Hard Spheres
Folding Kinetics New View: Energy Bias Dill & Chan (1997)
Representing Protein Geometry
Atomic Resolution Models
   Each atom on the protein and on the solvent molecules is represented as a sphere interacting via a realistic set of potentials based on the Lennard-Jones potential and the electrostatic Coulomb potential.
   Includes correct bond lengths, bond angles, and planar trans peptide bonds, which leads to a faithful representation of protein geometry.
Low Resolution Models (Coarse-Grained or Simplified Folding Models)
   Solvent molecules are not included in the simulation.
   Lattice models: the protein is a chain of single-site amino acid residues arranged on the sites of a square or cubic lattice.
   Off-lattice models: the protein is a flexible chain of single-sphere amino acid residues interacting via Lennard-Jones or other potentials.
Intermediate Resolution Models: in between
All-Atom Simulations
Folding of Villin Headpiece Subdomain
Well-studied, fast-folding 36-residue protein
Folding time is ~10 microseconds
Duan and Kollman (1998) conducted a 1-microsecond simulation of "folding" using 256 dedicated CPUs for 2 months
Stages shown: unfolded state, hydrophobic collapse, helix formation, conformational readjustment, partially folded intermediate
All-atom simulations
Folding of Polyalanine 30-mer
Villin Headpiece Folds at Home
Intermolecular Potentials for Spherical Molecules
One example: the Lennard-Jones potential
Lennard-Jones potential in dimensionless form, with r* = r/σ, where σ is the molecular diameter of the system under study
(Taken from Dr. D. A. Kofke's lectures on Molecular Simulation, SUNY Buffalo)
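For reference, the standard Lennard-Jones form is u(r) = 4ε[(σ/r)^12 - (σ/r)^6]. Dividing by the well depth ε gives the dimensionless form u*(r*) = 4[(1/r*)^12 - (1/r*)^6]. The symbol ε and the function name below are assumptions introduced for this sketch; the slide itself defines only r* = r/σ.

```python
import numpy as np

def lj_reduced(r_star):
    """Dimensionless Lennard-Jones energy u* = u/epsilon at reduced distance r* = r/sigma."""
    inv6 = 1.0 / np.asarray(r_star, dtype=float) ** 6
    return 4.0 * (inv6 ** 2 - inv6)

# The potential crosses zero at r* = 1 and has its minimum (u* = -1)
# at r* = 2**(1/6), about 1.12 molecular diameters.
r_min = 2.0 ** (1.0 / 6.0)
print(lj_reduced(1.0), lj_reduced(r_min))

# Evaluate on a grid, e.g. for plotting.
r_grid = np.linspace(0.9, 3.0, 50)
u_grid = lj_reduced(r_grid)
```

Working in reduced units means one curve describes every Lennard-Jones fluid; physical values are recovered by multiplying distances by σ and energies by ε.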
Why use simplified (coarse-grained) protein models?
All-atom simulations take too long, can depend sensitively on the details, and sample only very early folding events.
Simplified models:
   allow us to learn general physical principles of protein folding
   contain few parameters and few implicit biases
   allow complete exploration of conformational and sequence space
Lattice Models for Folding: Monte Carlo Simulations
Amino acid residues are sites (beads) on a cubic lattice
Generate random moves of "beads" on the lattice protein
Accept moves based on their probability of occurring: exp(-(E_new - E_old)/kT)
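The acceptance step can be sketched with the Metropolis criterion, which is one standard way to accept moves in proportion to exp(-(E_new - E_old)/kT). The slide does not name the rule, so treating it as Metropolis is an assumption, as are the function name and the reduced units.

```python
import math
import random

def metropolis_accept(e_old, e_new, kT, rng=random):
    """Accept a trial move with probability min(1, exp(-(e_new - e_old)/kT))."""
    delta = e_new - e_old
    if delta <= 0.0:
        return True                      # downhill moves are always accepted
    return rng.random() < math.exp(-delta / kT)

# Uphill moves with delta = kT should be accepted about exp(-1) of the time.
rng = random.Random(0)
n_trials = 20000
n_accepted = sum(metropolis_accept(0.0, 1.0, 1.0, rng) for _ in range(n_trials))
print("acceptance fraction:", n_accepted / n_trials)
```

In a lattice simulation, `e_old` and `e_new` would be the chain energies before and after a trial bead move; rejected moves leave the chain unchanged but still count as a configuration in the averages.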
Lattice Models: The HP Model
Energy function: amino acids are either hydrophobic (H) or polar (P)
Hydrophobic beads (H) attract each other with strength ε when they are on neighboring lattice sites
U = ε × [number of H-H contacts] (with ε < 0, so that each H-H contact lowers the energy)
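A minimal sketch of the HP energy function follows. It assumes that beads adjacent along the chain do not count as contacts (the usual convention, though the slide does not say so) and uses ε = -1 in reduced units; the function name and the example conformation are hypothetical.

```python
def hp_energy(sequence, coords, epsilon=-1.0):
    """Return (U, n_contacts) for an HP chain on a square or cubic lattice.

    sequence: string of 'H' and 'P', one character per bead
    coords:   list of integer lattice coordinates, one tuple per bead
    epsilon:  energy per H-H contact (negative means attraction; assumed value)
    """
    contacts = 0
    n = len(sequence)
    for i in range(n):
        for j in range(i + 2, n):        # skip beads bonded along the chain
            if sequence[i] == "H" and sequence[j] == "H":
                # Lattice neighbors differ by exactly one step along one axis.
                dist = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
                if dist == 1:
                    contacts += 1
    return epsilon * contacts, contacts

# Four beads folded around a unit square: the two end H beads touch.
sequence = "HPPH"
coords = [(0, 0), (1, 0), (1, 1), (0, 1)]
energy, n_contacts = hp_energy(sequence, coords)
print(energy, n_contacts)
```

Because the distance test works on coordinate tuples of any length, the same function handles both the square (2-D) and cubic (3-D) lattices.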
Lattice Model of Folding
Q0 = # of native contacts
C = total # of contacts
F = free energy
The possible starting conformations rapidly fold to one of the disordered globules and then slowly search for one of 10^3 compact transition states that rapidly fold to the unique native structure.
[Figure: free energy F plotted against C and Q0, showing the possible starting configurations, disordered globules, 10^3 transition states, and 1 native configuration]