Protein Folding Bioinformatics Ch 7 (with a little of Ch 8)

The Protein Folding Problem Central question of molecular biology: Given a particular sequence of amino acid residues (primary structure), what will the tertiary/quaternary structure of the resulting protein be? Input: AAVIKYGCAL… Output: (φ1, ψ1), (φ2, ψ2), … = backbone conformation (no side chains yet)

Disulfide Bonds Two cysteines in close proximity will form a covalent bond, called a disulfide bond, disulfide bridge, or dicysteine bond. This significantly stabilizes tertiary structure.

Protein Folding – Biological perspective Central dogma: Sequence specifies structure Denature – to unfold a protein back to random coil configuration –β-mercaptoethanol – breaks disulfide bonds –Urea or guanidine hydrochloride – denaturant –Also heat or pH Anfinsen's experiments –Denatured ribonuclease –Spontaneously regained enzymatic activity –Evidence that it re-folded to native conformation

Folding intermediates Levinthal's paradox – Consider a 100 residue protein. If each residue can take only 3 positions, there are 3^100 ≈ 5 × 10^47 possible conformations. –If it takes 10^-13 s to convert from one structure to another, exhaustive search would take about 1.6 × 10^27 years! Folding must proceed by progressive stabilization of intermediates –Molten globules – most secondary structure formed, but much less compact than native conformation.
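The arithmetic behind Levinthal's paradox can be checked with a few lines of Python. This is only a sketch of the slide's back-of-the-envelope estimate; the function name `levinthal_estimate` and the choice of a 365.25-day year are assumptions made here for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed length of a year


def levinthal_estimate(n_residues=100, states_per_residue=3, seconds_per_step=1e-13):
    """Return (number of conformations, years needed to visit each one once)."""
    conformations = states_per_residue ** n_residues
    years = conformations * seconds_per_step / SECONDS_PER_YEAR
    return conformations, years


conformations, years = levinthal_estimate()
print(f"{conformations:.1e} conformations, {years:.1e} years")
```

With the slide's numbers this gives roughly 5 × 10^47 conformations and about 1.6 × 10^27 years.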

Forces driving protein folding It is believed that hydrophobic collapse is a key driving force for protein folding –Hydrophobic core –Polar surface interacting with solvent Minimum volume (no cavities) Disulfide bond formation stabilizes Hydrogen bonds Polar and electrostatic interactions

Folding help Proteins are, in fact, only marginally stable –Native state is typically only 5 to 10 kcal/mole more stable than the unfolded form Many proteins help in folding –Protein disulfide isomerase – catalyzes shuffling of disulfide bonds –Chaperones – break up aggregates and (in theory) unfold misfolded proteins

The Hydrophobic Core Hemoglobin A is the protein in red blood cells (erythrocytes) responsible for binding oxygen. The mutation E6→V in the β chain places a hydrophobic Val on the surface of hemoglobin. The resulting sticky patch causes hemoglobin S to agglutinate (stick together) and form fibers, which deform the red blood cell and do not carry oxygen efficiently. Sickle cell anemia was the first identified molecular disease.

Sickle Cell Anemia Sequestering hydrophobic residues in the protein core protects proteins from hydrophobic agglutination.

Computational Problems in Protein Folding Two key questions: –Evaluation – how can we tell a correctly folded protein from an incorrectly folded protein? H-bonds, electrostatics, hydrophobic effect, etc. Derive a function, see how well it does on real proteins –Optimization – once we get an evaluation function, can we optimize it? Simulated annealing/Monte Carlo, EC (evolutionary computing), heuristics

Fold Optimization Simple lattice models (HP-models) –Two types of residues: hydrophobic and polar –2-D or 3-D lattice –The only force is hydrophobic collapse –Score = number of H-H contacts

Scoring Lattice Models H/P model scoring: count noncovalent hydrophobic interactions. Sometimes: –Penalize for buried polar or surface hydrophobic residues
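A minimal sketch of H/P scoring on a 2-D square lattice follows. It counts pairs of H residues that sit on neighboring lattice sites but are not neighbors along the chain (the noncovalent contacts). The function name `hp_score` and the coordinate-list input are assumptions for illustration, and the optional penalty terms are not included.

```python
def hp_score(sequence, coords):
    """Count noncovalent H-H contacts for a chain placed on a 2-D square lattice.

    sequence: string of 'H' and 'P', one letter per residue
    coords:   list of (x, y) lattice positions, one per residue
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    index_at = {xy: i for i, xy in enumerate(coords)}
    if len(index_at) != len(coords):
        raise ValueError("two residues occupy the same site (a bump)")

    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so each neighboring pair is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbor)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts


# HPPH folded into a square: the two H residues end up side by side.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_score("HPPH", square))
```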

What can we do with lattice models? For smaller polypeptides, exhaustive search can be used –Looking at the best fold, even in such a simple model, can teach us interesting things about the protein folding process For larger chains, other optimization and search methods must be used –Greedy, branch and bound –Evolutionary computing, simulated annealing –Graph theoretical methods
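For short chains, exhaustive search can be sketched as a depth-first enumeration of every self-avoiding walk on the 2-D lattice, keeping the fold with the most H-H contacts. The names `count_hh` and `exhaustive_search` are hypothetical, and no symmetry pruning is attempted, so this is only practical for chains of roughly a dozen residues or fewer.

```python
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def count_hh(sequence, path):
    """Number of H-H lattice contacts between residues not adjacent in the chain."""
    index_at = {xy: i for i, xy in enumerate(path)}
    total = 0
    for i, (x, y) in enumerate(path):
        for neighbor in ((x + 1, y), (x, y + 1)):  # each pair counted once
            j = index_at.get(neighbor)
            if j is not None and abs(i - j) > 1 and sequence[i] == sequence[j] == "H":
                total += 1
    return total


def exhaustive_search(sequence):
    """Try every self-avoiding walk; return (best score, one best path)."""
    if not sequence:
        return 0, []
    best_score, best_path = -1, None

    def extend(path, occupied):
        nonlocal best_score, best_path
        if len(path) == len(sequence):
            score = count_hh(sequence, path)
            if score > best_score:
                best_score, best_path = score, list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:  # never step onto an occupied site
                path.append(nxt)
                occupied.add(nxt)
                extend(path, occupied)
                path.pop()
                occupied.remove(nxt)

    extend([(0, 0)], {(0, 0)})
    return best_score, best_path


score, fold = exhaustive_search("HHPPHH")
print(score, fold)
```

The number of walks grows exponentially with chain length, which is why the larger chains mentioned above need greedy, branch-and-bound, or stochastic methods instead.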

Learning from Lattice Models The hydrophobic zipper effect (Ken Dill, ~1997)

Representing a lattice model Absolute directions –UURRDLDRRU Relative directions –LFRFRRLLFL –Advantage: immediate reversals (UD or RL in absolute directions) cannot occur –Only three directions: LRF What about bumps? LFRRR –Bad score –Use a better representation
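The relative encoding can be decoded by tracking a heading and turning it before each step. The sketch below assumes the chain starts at (0, 0) facing "up" and that every letter means "turn (or not), then step"; the slide does not fix these conventions, and the names `decode_relative` and `count_bumps` are hypothetical.

```python
def decode_relative(moves, start=(0, 0), heading=(0, 1)):
    """Turn a string of L/F/R moves into a list of lattice coordinates."""
    x, y = start
    dx, dy = heading
    path = [(x, y)]
    for move in moves:
        if move == "L":
            dx, dy = -dy, dx      # rotate heading 90 degrees counterclockwise
        elif move == "R":
            dx, dy = dy, -dx      # rotate heading 90 degrees clockwise
        elif move != "F":
            raise ValueError(f"unknown move {move!r}")
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def count_bumps(path):
    """Count steps that land on a site the chain already occupies."""
    seen, bumps = set(), 0
    for site in path:
        if site in seen:
            bumps += 1
        seen.add(site)
    return bumps


print(count_bumps(decode_relative("LFRFRRLLFL")))  # the slide's example walk
print(count_bumps(decode_relative("LFRRR")))       # three right turns close a square
```

Three consecutive right turns after a forward step always return to an occupied site, whatever the starting heading, which is why LFRRR bumps.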

Preference-order representation Each position has two preferences –If it can't have either of the two, it will take the least favorite path if possible Example: {LR},{FL},{RL}, {FR},{RL},{RL},{FR},{RF} Can still cause bumps: {LF},{FR},{RL},{FL}, {RL},{FL},{RF},{RL}, {FL}
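One way to read the preference-order scheme is sketched below: at each position try the first preference, then the second, then the remaining third direction, and record a bump only when all three sites are occupied. The start at (0, 0) facing "up", the forced step onto the first preference at a dead end, and the name `decode_preferences` are assumptions for illustration, not details given on the slide.

```python
TURN = {
    "L": lambda h: (-h[1], h[0]),   # 90 degrees counterclockwise
    "F": lambda h: h,               # keep heading
    "R": lambda h: (h[1], -h[0]),   # 90 degrees clockwise
}


def decode_preferences(genes, start=(0, 0), heading=(0, 1)):
    """Decode two-letter preference genes (e.g. 'LR') into (path, bumps)."""
    pos = start
    path, occupied, bumps = [pos], {pos}, 0
    for gene in genes:
        # Two stated preferences first, then the least favorite direction.
        order = list(gene) + [m for m in "LFR" if m not in gene]
        for move in order:
            new_heading = TURN[move](heading)
            nxt = (pos[0] + new_heading[0], pos[1] + new_heading[1])
            if nxt not in occupied:
                break
        else:
            # Dead end: all three sites are taken, so force the first
            # preference and count the collision (assumed policy).
            new_heading = TURN[gene[0]](heading)
            nxt = (pos[0] + new_heading[0], pos[1] + new_heading[1])
            bumps += 1
        heading, pos = new_heading, nxt
        path.append(pos)
        occupied.add(pos)
    return path, bumps


first = ["LR", "FL", "RL", "FR", "RL", "RL", "FR", "RF"]
second = ["LF", "FR", "RL", "FL", "RL", "FL", "RF", "RL", "FL"]
print(decode_preferences(first)[1], decode_preferences(second)[1])
```

Under these assumptions the first example decodes without collisions (position 7 falls back to its least favorite direction), while the second walks into a dead end at its last position, consistent with the note that bumps can still occur.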

Decoding the representation The optimizer works on the representation, but to score, we have to decode into a structure that lets us check for bumps and score. Example: How many bumps in: URDDLLDRURU? We can do it on graph paper –Start at 0,0 –Fill in the graph
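The graph-paper exercise can also be done in code. This sketch decodes an absolute-direction string starting at (0, 0) and counts a bump each time the chain lands on an already occupied site; the names `decode_absolute` and `count_bumps` are hypothetical, and counting every revisit as one bump is an assumption about how bumps are tallied.

```python
STEP = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def decode_absolute(moves, start=(0, 0)):
    """Turn a string of U/D/L/R moves into a list of lattice coordinates."""
    x, y = start
    path = [(x, y)]
    for move in moves:
        dx, dy = STEP[move]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def count_bumps(path):
    """Each visit to an already occupied site counts as one bump."""
    return len(path) - len(set(path))


path = decode_absolute("URDDLLDRURU")
print(path)
print(count_bumps(path))
```

For URDDLLDRURU the last three moves revisit (0, -1), (1, -1), and (1, 0), so this way of counting gives 3 bumps.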

More realistic models Higher resolution lattices (45° lattice, etc.) Off-lattice models –Local moves –Optimization/search methods and φ/ψ representations Greedy search Branch and bound EC, Monte Carlo, simulated annealing, etc.

Threading: Fold recognition Given: –Sequence: IVACIVSTEYDVMKAAR… –A database of molecular coordinates Map the sequence onto each fold Evaluate –Objective 1: improve scoring function –Objective 2: folding

X-Ray Crystallography ~0.5mm The crystal is a mosaic of millions of copies of the protein. As much as 70% is solvent (water)! May take months (and a green thumb) to grow.

X-Ray diffraction Image is averaged over: –Space (many copies) –Time (of the diffraction experiment)

The Protein Data Bank
ATOM      1  N   ALA E   1      22.382  47.782 112.975  1.00 24.09      3APR 213
ATOM      2  CA  ALA E   1      22.957  47.648 111.613  1.00 22.40      3APR 214
ATOM      3  C   ALA E   1      23.572  46.251 111.545  1.00 21.32      3APR 215
ATOM      4  O   ALA E   1      23.948  45.688 112.603  1.00 21.54      3APR 216
ATOM      5  CB  ALA E   1      23.932  48.787 111.380  1.00 22.79      3APR 217
ATOM      6  N   GLY E   2      23.656  45.723 110.336  1.00 19.17      3APR 218
ATOM      7  CA  GLY E   2      24.216  44.393 110.087  1.00 17.35      3APR 219
ATOM      8  C   GLY E   2      25.653  44.308 110.579  1.00 16.49      3APR 220
ATOM      9  O   GLY E   2      26.258  45.296 110.994  1.00 15.35      3APR 221
ATOM     10  N   VAL E   3      26.213  43.110 110.521  1.00 16.21      3APR 222
ATOM     11  CA  VAL E   3      27.594  42.879 110.975  1.00 16.02      3APR 223
ATOM     12  C   VAL E   3      28.569  43.613 110.055  1.00 15.69      3APR 224
ATOM     13  O   VAL E   3      28.429  43.444 108.822  1.00 16.43      3APR 225
ATOM     14  CB  VAL E   3      27.834  41.363 110.979  1.00 16.66      3APR 226
ATOM     15  CG1 VAL E   3      29.259  41.013 111.404  1.00 17.35      3APR 227
ATOM     16  CG2 VAL E   3      26.811  40.649 111.850  1.00 17.03      3APR 228
http://www.rcsb.org/pdb/
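A short sketch of reading ATOM records like these is given below. It splits each line on whitespace, which works for the records shown here; real PDB files are fixed-column, so a production parser should slice by column instead. The `Atom` tuple, `parse_atom`, and `distance` are hypothetical names introduced for illustration.

```python
import math
from typing import NamedTuple, Optional


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom(line: str) -> Optional[Atom]:
    """Parse one whitespace-separated ATOM record; return None for other lines."""
    fields = line.split()
    if len(fields) < 9 or fields[0] != "ATOM":
        return None
    return Atom(int(fields[1]), fields[2], fields[3], fields[4], int(fields[5]),
                float(fields[6]), float(fields[7]), float(fields[8]))


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in the file's units (angstroms)."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


records = [
    "ATOM 2 CA ALA E 1 22.957 47.648 111.613 1.00 22.40 3APR 214",
    "ATOM 7 CA GLY E 2 24.216 44.393 110.087 1.00 17.35 3APR 219",
]
ca1, ca2 = (parse_atom(r) for r in records)
print(f"CA(1)-CA(2) distance: {distance(ca1, ca2):.2f}")
```

The two alpha carbons of residues 1 and 2 come out about 3.8 angstroms apart, the usual spacing between consecutive alpha carbons.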
