Protein structure prediction.

Protein domains can be defined based on:
- Geometry: a group of residues with high contact density; the number of contacts within a domain is higher than the number of contacts between domains. Domains may be chain-continuous or chain-discontinuous.
- Kinetics: a domain is an independently folding unit.
- Physics: a domain is a rigid body linked to other domains by flexible linkers.
- Genetics: a domain is the minimal fragment of a gene that is capable of performing a specific function.
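The geometric criterion can be sketched in Python. This is a minimal illustration, not a published domain-parsing algorithm: it assumes a precomputed list of residue contacts (pairs i < j), considers only chain-continuous two-domain splits, and uses a hypothetical intra/(inter + 1) ratio to choose the boundary.

```python
def split_contacts(contacts, boundary):
    """Count intra- and inter-domain contacts for a chain-continuous split:
    residues [0, boundary) form domain 1, residues [boundary, n) form domain 2."""
    intra = inter = 0
    for i, j in contacts:
        if (i < boundary) == (j < boundary):
            intra += 1
        else:
            inter += 1
    return intra, inter


def best_boundary(contacts, n_residues, min_size=3):
    """Return the boundary whose split maximises intra / (inter + 1).
    The ratio and min_size are illustrative assumptions; +1 avoids division by zero."""
    best_b, best_ratio = None, -1.0
    for b in range(min_size, n_residues - min_size + 1):
        intra, inter = split_contacts(contacts, b)
        ratio = intra / (inter + 1)
        if ratio > best_ratio:
            best_b, best_ratio = b, ratio
    return best_b
```

For two compact five-residue segments with no contacts between them, the split falls between residues 4 and 5.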

Domains as recurrent units of proteins. The same or similar domains are found in different proteins. Each domain has a well-defined compact structure and performs a specific function. Proteins evolve through duplication and domain shuffling. Protein domains are classified by comparing their recurrent sequence, structural and functional features, e.g. in the Conserved Domain Database (CDD).

Protein folds. Fold definition: two folds are similar if they have a similar arrangement of secondary structure elements (SSEs), i.e. architecture, and similar connectivity between them, i.e. topology. A few SSEs may be missing from one of the two. Fold classification: structural similarity between folds is detected with structure-structure comparison algorithms.
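Structure-structure comparison programs use their own, more elaborate scores. As a minimal, superposition-free illustration (not the score of any particular program), two equal-length C-alpha traces can be compared by the RMS difference of their intra-molecular distance matrices. The function name and the assumption that residues are already paired one-to-one are ours.

```python
import math
from itertools import combinations


def distance_matrix_rmsd(a, b):
    """RMS difference between the intra-molecular distance matrices of two
    equal-length coordinate lists (residue k of a is paired with residue k of b).
    Invariant to rotation and translation; 0.0 means identical internal geometry."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired structures with at least 2 residues")
    diffs = [(math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
             for i, j in combinations(range(len(a)), 2)]
    return math.sqrt(sum(diffs) / len(diffs))
```

Because only internal distances are compared, no superposition step is needed; the price is that mirror-image structures score as identical.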

Definition of protein folds. Protein fold: the arrangement of secondary structures into a unique topology/tertiary structure. Examples of alpha/beta proteins:
- TIM beta/alpha-barrel: a closed barrel of parallel beta-strands (n = 8 strands, shear number S = 8), surrounded by alpha-helices.
- NAD(P)-binding Rossmann-fold domains: a core of 3 layers, a/b/a, with a parallel beta-sheet of 6 strands.

Fold recognition. Unsolved problem: direct prediction of protein structure from physico-chemical principles. Solved problem: recognizing which of the known folds are similar to the fold of an unknown protein. Fold recognition is based on the following observations/assumptions:
- The overall number of different protein folds is limited (… folds).
- The native protein structure is in its ground state (minimum energy).

Protein structure prediction flowchart.
[Figure: flowchart from D.W. Mount. Steps: protein sequence; database similarity search; "Does the sequence align with a protein of known structure?"; protein family analysis; "Relationship to known structure?"; three-dimensional comparative modeling; predicted three-dimensional structural model; structural analysis; "Is there a predicted structure?"; three-dimensional structural analysis in the laboratory.]

Protein structure prediction: prediction of the three-dimensional structure of a protein from its sequence. Different approaches:
- Homology modeling (the predicted structure has a very close homolog in the structure database).
- Fold recognition (the predicted structure has an existing fold).
- Ab initio prediction (the predicted structure has a new fold).

Homology modeling. Aims to produce protein models with accuracy close to that of experimental structures, and is used for:
- Protein structure prediction
- Drug design
- Prediction of functionally important sites (active or binding sites)

Steps of homology modeling:
1. Template recognition and initial alignment.
2. Backbone generation.
3. Loop modeling.
4. Side-chain modeling.
5. Model optimization.

1. Template recognition. Recognize the similarity between the target and a template. Target: the protein with unknown structure. Template: a protein with known structure. Main difficulty: deciding which template to pick when several template structures are available. Template structures can be found by searching the PDB with sequence-sequence alignment methods.

Two zones of sequence alignment. Two sequences are expected to fold into the same structure if their alignment length and sequence identity fall into the "safe" homology modeling zone; below it lies the twilight zone.
[Figure: sequence identity versus alignment length, divided into the homology modeling zone and the twilight zone.]

2. Backbone generation. Once the target-template alignment is ready, copy the backbone coordinates of the template residues that are aligned to target residues. If two aligned residues are identical, copy their side-chain coordinates as well.
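Backbone copying can be sketched as a walk along the alignment. The per-residue data layout below (dicts with 'backbone' and 'side_chain' atom maps) is a hypothetical simplification, not the data model of any structure library.

```python
def copy_backbone(target_aln, template_aln, template_residues, gap="-"):
    """Build a partial target model from a target-template alignment.

    template_residues: hypothetical list, in template sequence order, of dicts
    {'backbone': {atom: xyz}, 'side_chain': {atom: xyz}}.
    Returns {target_index: {'backbone': ..., 'side_chain': ... or None}}.
    Target residues aligned to gaps are left out (they need loop modeling)."""
    model = {}
    t_idx = p_idx = 0  # positions in the ungapped target / template sequences
    for t_res, p_res in zip(target_aln, template_aln):
        if t_res != gap and p_res != gap:
            src = template_residues[p_idx]
            model[t_idx] = {
                "backbone": dict(src["backbone"]),
                # Side-chain coordinates are copied only for identical residues.
                "side_chain": dict(src["side_chain"]) if t_res == p_res else None,
            }
        if t_res != gap:
            t_idx += 1
        if p_res != gap:
            p_idx += 1
    return model
```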

3. Insertions and deletions.
AHYATPTTT   (insertion)
AH---TPSS   (deletion)
Insertions and deletions occur mostly between secondary structures, in the loop regions. Loop conformations are difficult to predict. Approaches to loop modeling:
- Knowledge-based: search the PDB for loops with known structure.
- Energy-based: an energy function is used to evaluate the quality of a loop, which is optimized by energy minimization or Monte Carlo sampling.
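Before loop modeling, the gap runs in the alignment have to be located. A minimal sketch, assuming a pairwise alignment with '-' as the gap character and the first sequence as the target; for the example alignment, the YAT segment is reported as an insertion in the target.

```python
def gap_segments(target_aln, template_aln, gap="-"):
    """Return (kind, start, end) for each run of gap columns, with 0-based,
    end-exclusive alignment column indices.
    'insertion': target residues with no template counterpart (loops to model);
    'deletion': template residues with no target counterpart."""
    segments = []
    kind_prev, start = None, 0
    for col, (t, p) in enumerate(zip(target_aln, template_aln)):
        if p == gap and t != gap:
            kind = "insertion"
        elif t == gap and p != gap:
            kind = "deletion"
        else:
            kind = None
        if kind != kind_prev:  # a run ends and/or a new run starts here
            if kind_prev is not None:
                segments.append((kind_prev, start, col))
            kind_prev, start = kind, col
    if kind_prev is not None:
        segments.append((kind_prev, start, len(target_aln)))
    return segments


gap_segments("AHYATPTTT", "AH---TPSS")  # [('insertion', 2, 5)]
```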

4. Side-chain modeling. Side-chain conformations are described by rotamers. In similar proteins, side chains have similar conformations. If the percent identity is high, side-chain conformations can be copied from the template to the target. If the percent identity is not very high, side chains are modeled using rotamer libraries: the candidate rotamers are scored with energy functions and the lowest-energy one is chosen, e.g. for three rotamers with energies $E_1$, $E_2$, $E_3$, $E = \min(E_1, E_2, E_3)$. Problem: side-chain conformations depend on the backbone conformation, which is predicted, not real.
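Rotamer selection by minimum energy can be illustrated with a toy scoring function. Everything here is an assumption for illustration: the quadratic clash penalty and its 3.0 Å radius stand in for a real energy function, and the rotamer coordinates in the test are made up.

```python
import math


def clash_energy(atoms, environment, r0=3.0):
    """Toy steric energy: quadratic penalty for every atom pair closer than r0 angstroms.
    Not a real force field; r0 and the functional form are illustrative."""
    return sum(max(0.0, r0 - math.dist(a, e)) ** 2 for a in atoms for e in environment)


def pick_rotamer(rotamers, environment):
    """rotamers: {name: [side-chain atom xyz, ...]}.
    Returns (name, energy) of the lowest-energy rotamer, i.e. E = min(E1, E2, ...)."""
    energies = {name: clash_energy(atoms, environment) for name, atoms in rotamers.items()}
    best = min(energies, key=energies.get)
    return best, energies[best]
```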

5. Model optimization. Energy optimization of the entire structure. Since the conformation of the backbone depends on the conformations of the side chains and vice versa, an iterative approach is used: predict rotamers, shift the backbone, and repeat.

Classwork I: Homology modeling.
- Go to NCBI Entrez and search for gi …
- Do a BLAST search against PDB.
- Repeat the same for gi …
- Compare the results.

Fold recognition. Goal: find the protein of known structure that best matches a given sequence. Since the similarity between the target and its closest template is low, sequence-sequence alignment methods fail. Solution: threading, a sequence-structure alignment method.

Threading: a method for structure prediction by sequence-structure alignment, in which the target sequence is compared to all structural templates in the database. Requires:
- An alignment method (dynamic programming, Monte Carlo, …)
- A scoring function, which yields a relative score for each alternative alignment

Scoring function for threading. A contact-based scoring function depends on the amino acid types of two residues and on the distance between them, whereas a sequence-sequence alignment scoring function does not depend on the distance between residues. If the distance between two non-adjacent residues in the template is less than 8 Å, these residues make a contact.
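Contact detection can be sketched as follows, assuming one representative point per residue (here the C-alpha atom; some methods use C-beta atoms or side-chain centroids) and treating residues i and j as adjacent when |i - j| = 1.

```python
import math
from itertools import combinations


def contact_map(ca_coords, cutoff=8.0):
    """Return pairs (i, j), i < j, of non-adjacent residues (j - i > 1)
    whose representative atoms are closer than `cutoff` angstroms."""
    return [(i, j) for i, j in combinations(range(len(ca_coords)), 2)
            if j - i > 1 and math.dist(ca_coords[i], ca_coords[j]) < cutoff]
```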

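Scoring function for threading. The score of a target sequence threaded onto a template sums the pair weights over all template contacts:
$$S = \sum_{k=1}^{N} w(a_{i_k}, a_{j_k})$$
where $(i_k, j_k)$ is the $k$-th contact of the template, $a_i$ is the amino acid type of the target residue aligned with template position $i$, and $N$ is the number of contacts. The weights $w$ (a table over amino acid pairs such as Ala, Ile, Tyr, Trp) are calculated from the frequency of amino acid contacts in the PDB.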
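The sum can be computed directly for an ungapped threading. A minimal sketch: the weight table is a hypothetical dictionary keyed by amino acid pairs, and the values in the test are made up rather than derived from PDB contact frequencies.

```python
def threading_score(target_seq, contacts, w):
    """S = sum over template contacts (i, j) of w(a_i, a_j), where a_i is the target
    residue placed at template position i (ungapped threading assumed).
    w: {(aa1, aa2): weight}; looked up in either order, missing pairs score 0."""
    total = 0.0
    for i, j in contacts:
        a, b = target_seq[i], target_seq[j]
        total += w.get((a, b), w.get((b, a), 0.0))
    return total
```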

Classwork I: calculate the score for the target sequence "ATPIIGGLPY" aligned to a template structure defined by a contact matrix.
[Contact matrix and weight table not recoverable from the slide.]

Alignment algorithms. Dynamic programming: with pairwise contact scores, a traceback in the alignment matrix is not possible, because the score of a position depends on where its contact partners are aligned. The "frozen approximation" solves this: the partner residue b is taken from the template, not from the target, so the score of every position no longer depends on the alignment elsewhere in the sequence. Monte Carlo methods are an alternative.
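The frozen approximation can be sketched on top of a standard global dynamic-programming recursion (Needleman-Wunsch style). Assumptions: a linear gap penalty of -1.0, a hypothetical pair-weight dictionary, and only the optimal score is returned (no traceback).

```python
def frozen_profile(template_seq, contacts, w):
    """Frozen approximation: the contact partner of template position i is taken to be
    the template's own residue b_j, so score(i, a) can be precomputed for any residue a."""
    neighbours = {i: [] for i in range(len(template_seq))}
    for i, j in contacts:
        neighbours[i].append(j)
        neighbours[j].append(i)

    def pair(a, b):
        return w.get((a, b), w.get((b, a), 0.0))

    return lambda i, a: sum(pair(a, template_seq[j]) for j in neighbours[i])


def thread(target_seq, template_seq, contacts, w, gap=-1.0):
    """Best global alignment score of target residues onto template positions."""
    score = frozen_profile(template_seq, contacts, w)
    n, m = len(target_seq), len(template_seq)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        F[k][0] = k * gap
    for i in range(1, m + 1):
        F[0][i] = i * gap
    for k in range(1, n + 1):
        for i in range(1, m + 1):
            F[k][i] = max(F[k - 1][i - 1] + score(i - 1, target_seq[k - 1]),
                          F[k - 1][i] + gap,   # target residue left unaligned
                          F[k][i - 1] + gap)   # template position left empty
    return F[n][m]
```

Because each position's score is fixed in advance, the usual recursion and traceback apply unchanged.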

CASP prediction competitions.

Threading model validation:
- Correct bond lengths and bond angles
- Correct placement of functionally important sites
- Prediction of the global topology, not a partial alignment (minimum number of gaps); a gap shows up as consecutive Cα atoms much farther apart than 3.8 Å
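The 3.8 Å figure (the usual distance between consecutive C-alpha atoms in a trans peptide) gives a simple check for chain breaks in a model. A sketch, assuming a list of consecutive C-alpha coordinates; the 4.2 Å tolerance is an assumption, not a standard value.

```python
import math


def chain_breaks(ca_coords, max_dist=4.2):
    """Return indices i where the C-alpha(i) to C-alpha(i + 1) distance exceeds
    max_dist angstroms, i.e. likely gaps in the modeled chain."""
    return [i for i in range(len(ca_coords) - 1)
            if math.dist(ca_coords[i], ca_coords[i + 1]) > max_dist]
```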

Placement of functionally important sites in threading. Prediction of the structure of methylglyoxal synthase based on the template of carbamoyl phosphate synthase.

Classwork II: Homology modeling.
- Go to NCBI Entrez and search for gi …
- Do a BLAST search against PDB.
- Repeat the same for gi …
- Predict functionally important sites.

GenThreader:
1. Predicts secondary structure for the target sequence.
2. Makes sequence profiles (PSSMs) for each template sequence.
3. Uses a threading scoring function to find the best matching profile.

Classwork III.
- Go to
- Go over the options of the protein structure prediction program.
- Predict the structure for the protein sequence ("gwu_thread_seq.txt").
ad0cf784.gen.html

Protein engineering and protein design.
- Protein engineering: altering a protein sequence to change the protein's function or structure.
- Protein design: designing a de novo protein that satisfies a given requirement.

Protein engineering strategies. Goals:
- Design proteins with a certain function
- Increase the activity of enzymes
- Increase the binding affinity and specificity of proteins
- Increase protein stability
- Design proteins that bind novel ligands

Protein engineering uses combinatorial libraries. Random mutagenesis introduces different mutations into many copies of the gene of interest. Active proteins are then separated from inactive ones:
- in vivo (measuring the effect on the whole cell)
- in vitro (phage display: the gene is inserted into phage DNA and expressed, and the phage is selected if the displayed protein binds an immobilized target protein)

Specificity of Kunitz inhibitors can be optimized by protein engineering. Kunitz domains are specific inhibitors of trypsin-like proteinases; they share a highly conserved structure despite only 33% sequence identity. Each Kunitz domain recognizes one or more proteinases through its binding loop (yellow). Phage display found mutant Kunitz inhibitors with higher specificity than the native ones. Modeling of the mutant proteins showed that the enhanced specificity is caused by increased complementarity between the binding loop and the active site.

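Native state can be stabilized by reducing the difference in entropy between folded and unfolded conformations.
[Figure: free energy G versus reaction coordinate for the unfolded (U) and folded (F) states, with the free energy difference ΔG between them.]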
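The argument can be written out in standard thermodynamic notation (the symbols are assumed here, not taken from the slide). Stability is the free energy of unfolding:

```latex
\Delta G_{\mathrm{unf}} = G_U - G_F = \Delta H_{\mathrm{unf}} - T\,\Delta S_{\mathrm{unf}},
\qquad \Delta S_{\mathrm{unf}} = S_U - S_F
```

If a mutation lowers the entropy of the unfolded state $S_U$ (for example by restricting the unfolded chain) while leaving $\Delta H_{\mathrm{unf}}$ and $S_F$ unchanged, then $\Delta S_{\mathrm{unf}}$ decreases, $\Delta G_{\mathrm{unf}}$ increases, and the native state becomes more stable.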

Proteins can be made more stable by protein engineering. Three approaches to increase stability:
- Reduce the difference in entropy between folded and unfolded conformations
- Stabilize secondary structures
- Increase the number of hydrophobic interactions in the interior core

Model system: lysozyme from bacteriophage T4. Lysozyme can lyse certain bacteria by hydrolyzing the β-1,4-glycosidic bond between N-acetylmuramic acid (NAM) and N-acetylglucosamine (NAG) in the peptidoglycan layer of the bacterial cell wall. The conformational transition in lysozyme involves a cooperative relative movement of its two lobes.

Disulfide bridges increase protein stability. They increase stability by reducing the number of unfolded conformations, since the enthalpic contribution is the same for the folded and unfolded states. Task: find positions on the backbone where cysteines can be introduced to form disulfide bonds.

Strategy for introducing a new disulfide bond (B. Matthews, 1989):
- Analysis of disulfide bond geometries in existing structures.
- Analysis of all pairs of amino acids that are close in space.
- Energy optimization of candidate disulfide bonds.
- Analysis of the destabilizing effect of exchanging native amino acids for Cys.
As a result, three disulfide bonds were introduced into lysozyme through mutagenesis experiments.
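The "pairs close in space" step can be sketched as a distance screen. The C-alpha distance window (4.4 to 6.8 Å) and the minimum sequence separation of 3 are illustrative assumptions; the geometry, energy and destabilization analyses listed above are not modeled.

```python
import math
from itertools import combinations


def disulfide_candidates(ca_coords, d_min=4.4, d_max=6.8, min_separation=3):
    """First-pass screen for residue pairs that might be replaced by a Cys pair:
    C-alpha distance within [d_min, d_max] angstroms and at least min_separation
    residues apart in sequence. Thresholds are assumptions, not published criteria."""
    return [(i, j) for i, j in combinations(range(len(ca_coords)), 2)
            if j - i >= min_separation
            and d_min <= math.dist(ca_coords[i], ca_coords[j]) <= d_max]
```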

Stability of mutants compared to the wild-type protein. Measure of stability: the melting temperature Tm, at which 50% of the enzyme is inactivated during reversible heat denaturation. For the wild type, Tm = 42 °C. All mutants were more stable than the wild type; the longer the loop between the cysteines, the larger the effect (the more restricted the unfolded state); and the more disulfide bonds were introduced, the more stable the mutant. From B. Matthews et al.
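Tm can be estimated from an inactivation curve by linear interpolation. A minimal sketch, assuming temperatures in increasing order and a fraction inactivated that rises with temperature; the readings in the test are made up and do not come from the lysozyme experiments.

```python
def melting_temperature(temps, fraction_inactivated):
    """Temperature at which 50% of the enzyme is inactivated, by linear interpolation
    between the two measurements that bracket a fraction of 0.5."""
    points = list(zip(temps, fraction_inactivated))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 <= 0.5 <= f1 and f1 != f0:
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("the data do not bracket 50% inactivation")
```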

Attempts to fill cavities to stabilize lysozyme failed. Creating a cavity the size of a -CH3 group destabilizes a protein by ~1 kcal/mol. T4 lysozyme has two cavities; the cavity-filling mutations Leu → Phe and Ala → Val destabilize the protein by ~… kcal/mol, because the new side chains (Val and Phe) adopt unfavorable conformations in the cavities.

Classwork IV: analyzing lysozyme mutants. Retrieve the structure neighbors (1PQM and 1KNI) of 2LZM. Which mutant might have increased stability, and why?

Can structural scaffolds be reduced in size while maintaining function? A. Braisted & J.A. Wells used the Z-domain (58 residues) of bacterial protein A: they removed the third helix (truncated protein: 38 residues), mutated residues in the first and second helices, and used phage display to select active forms, which restored the binding of the truncated protein.

Designing an amino acid sequence that will fold into a given structure. Inverse protein folding problem: designing a sequence that will fold into a given structure, which is much easier than the folding problem! B. Dahiyat & S. Mayo designed a sequence for a zinc finger domain that does not require stabilization by Zn. The wild-type domain is stabilized by Zn (bound to two Cys and two His); the designed mutant is stabilized by hydrophobic interactions.

Paracelsus challenge: convert one fold into another by changing 50% of the residues. This is a challenge because all proteins with > 30% identity seem to have the same fold. L. Regan et al.: protein G (mainly β-sheet) was converted into Rop protein (α-helical) by changing only 50% of the residues.