Protein structure prediction.

Protein folds. Fold definition: two folds are similar if they have a similar arrangement of secondary structure elements (SSEs), that is, the same architecture, and the same connectivity between them (topology). A few SSEs may be missing from one of the folds. Fold classification: structural similarity between folds is detected using structure-structure comparison algorithms.

Protein structure prediction flowchart (from D.W. Mount):
Protein sequence → database similarity search → Does the sequence align with a protein of known structure?
- Yes: three-dimensional comparative modeling → predicted three-dimensional structural model.
- No: protein family analysis → Is there a relationship to a known structure?
  - Yes: three-dimensional comparative modeling.
  - No: structural analysis → Is there a predicted structure?
    - Yes: predicted three-dimensional structural model.
    - No: three-dimensional structural analysis in the laboratory.

Protein structure prediction: prediction of the three-dimensional structure of a protein from its sequence. Different approaches:
- Homology modeling (the query protein has a very close homolog in the structure database).
- Fold recognition (the query protein can be mapped to a template protein with an existing fold).
- Ab initio prediction (the query protein has a new fold).
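The choice among the three approaches can be sketched as a small decision function. This is a minimal illustration, not a published rule: the 30% identity cutoff, the function name `choose_approach` and its parameters are assumptions chosen for the example.

```python
from typing import Optional

# Hypothetical cutoff: identity (0..1) to the closest PDB homolog above which
# homology modeling is attempted. Real pipelines also weigh alignment length.
HOMOLOGY_IDENTITY_CUTOFF = 0.30


def choose_approach(best_identity: Optional[float], fold_match_found: bool) -> str:
    """Pick a prediction approach for a query sequence.

    best_identity: identity to the closest PDB homolog, or None if the
        sequence search found no hit at all.
    fold_match_found: True if fold recognition (threading) maps the query
        onto an existing fold.
    """
    if best_identity is not None and best_identity >= HOMOLOGY_IDENTITY_CUTOFF:
        return "homology modeling"   # very close homolog of known structure
    if fold_match_found:
        return "fold recognition"    # remote match to an existing fold
    return "ab initio"               # apparently a new fold


print(choose_approach(0.45, False))  # homology modeling
print(choose_approach(0.15, True))   # fold recognition
print(choose_approach(None, False))  # ab initio
```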

Homology modeling. It aims to produce protein models with accuracy close to that of experimental structures, and is used for:
- Protein structure prediction
- Drug design
- Prediction of functionally important sites (active or binding sites)

Steps of homology modeling:
1. Template recognition and initial alignment.
2. Backbone generation.
3. Loop modeling.
4. Side-chain modeling.
5. Model optimization.

1. Template recognition. Recognition of the similarity between the target and the template.
- Target: the protein with unknown structure.
- Template: a protein with known structure.
The main difficulty is deciding which template to pick when there are multiple candidate template structures. Template structures can be found by searching the PDB with pairwise sequence alignment methods.
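A minimal sketch of template selection, assuming candidate templates are ranked by percent identity from a plain Needleman-Wunsch global alignment with illustrative scores (match +1, mismatch -1, gap -2). Real PDB searches use BLAST-style local alignment with substitution matrices; the function names and sequences below are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns the two aligned strings, with '-' marking gaps."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical aligned residues divided by the length of the shorter ungapped sequence."""
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    shorter = min(len(aln_a.replace("-", "")), len(aln_b.replace("-", "")))
    return same / shorter


def pick_template(target, templates):
    """Return (name, identity) of the template sequence most identical to the target."""
    scored = {name: percent_identity(*global_align(target, seq))
              for name, seq in templates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


print(pick_template("ACDEFGHIKL", {"t1": "ACDQQGHIKV", "t2": "ACDEFGHIKL"}))  # ('t2', 1.0)
```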

Figure: two zones of protein structure prediction, the homology modeling zone and the fold recognition zone, plotted as sequence identity versus alignment length.

2. Backbone generation. Once the alignment between target and template is ready, copy the backbone coordinates of the template residues that are aligned to target residues. If two aligned residues are identical, copy their side-chain coordinates as well.
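The copying rule can be sketched as follows. This is a minimal illustration under stated assumptions: residues are plain dicts of atom name to (x, y, z), the backbone is taken to be N, CA, C, O, and the template coordinates in the usage lines are made up. `copy_coordinates` is a hypothetical helper, not part of any modeling package.

```python
BACKBONE_ATOMS = ("N", "CA", "C", "O")


def copy_coordinates(target_aln, template_aln, template_atoms):
    """Build a partial model of the target from a target-template alignment.

    template_atoms[k] maps atom names to (x, y, z) for the k-th template residue.
    Returns {target residue index: {atom name: (x, y, z)}}. Target residues
    aligned to gaps are left out and must be built later (loop modeling).
    """
    model = {}
    t_idx = p_idx = 0  # ungapped residue counters for target and template
    for t_res, p_res in zip(target_aln, template_aln):
        if t_res != "-" and p_res != "-":
            atoms = template_atoms[p_idx]
            if t_res == p_res:
                # Identical residues: copy side-chain atoms as well.
                model[t_idx] = dict(atoms)
            else:
                # Different residues: copy the backbone only.
                model[t_idx] = {name: xyz for name, xyz in atoms.items()
                                if name in BACKBONE_ATOMS}
        if t_res != "-":
            t_idx += 1
        if p_res != "-":
            p_idx += 1
    return model


# Toy template (ARVS) with four residues; coordinates are made up.
template = [{name: (float(k), float(n), 0.0)
             for n, name in enumerate(BACKBONE_ATOMS + ("CB",))}
            for k in range(4)]
model = copy_coordinates("AK-S", "ARVS", template)
print(sorted(model))     # [0, 1, 2]
print(sorted(model[1]))  # ['C', 'CA', 'N', 'O']  (K aligned to R: backbone only)
```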

3. Insertions and deletions. In the alignment below, the segment YAT is an insertion in the first sequence or, equivalently, a deletion in the second:
AHYATPTTT
AH---TPSS
Insertions and deletions occur mostly between secondary structure elements, in loop regions, and loop conformations are difficult to predict. Approaches to loop modeling:
- Knowledge-based: search the PDB for loops with known structures.
- Energy-based: an energy function is used to evaluate the quality of a loop, which is optimized by energy minimization or Monte Carlo sampling.
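Before loops are modeled, the indel regions have to be located in the alignment. A minimal sketch, assuming the first string is the target and the second is the template (the slide does not say which is which); `find_indels` is a hypothetical helper.

```python
from itertools import groupby


def column_kind(t_res, p_res):
    """Classify one alignment column from the target's point of view."""
    if t_res != "-" and p_res == "-":
        return "insertion"   # target residue with no template counterpart
    if t_res == "-" and p_res != "-":
        return "deletion"    # template residue missing from the target
    return None              # aligned pair (or a double gap)


def find_indels(target_aln, template_aln):
    """Return (first column, last column, kind) for each run of indel columns."""
    regions, col = [], 0
    for kind, run in groupby(zip(target_aln, template_aln),
                             key=lambda pair: column_kind(*pair)):
        length = sum(1 for _ in run)
        if kind is not None:
            regions.append((col, col + length - 1, kind))
        col += length
    return regions


print(find_indels("AHYATPTTT", "AH---TPSS"))  # [(2, 4, 'insertion')]
```

Each reported region is a place where template coordinates cannot simply be copied, so it is handed to one of the loop modeling approaches listed above.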

4. Side-chain modeling. Side-chain conformations are described by rotamers. In similar proteins, side chains have similar conformations. If the percent identity is high, side-chain conformations can be copied from the template to the target. If it is not very high, side chains are modeled using rotamer libraries, and the candidate rotamers are scored with energy functions; the lowest-energy rotamer is selected (for three candidate rotamers, E = min(E1, E2, E3)). Problem: side-chain conformations depend on the backbone conformation, which is itself predicted rather than real.
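Rotamer selection can be sketched as an exhaustive search for the combination with the lowest total energy. This assumes a toy energy made of a per-rotamer term plus pairwise terms between neighbouring residues; all energy values are hypothetical, and exhaustive enumeration is only feasible for a handful of residues (real programs use smarter search strategies).

```python
from itertools import product


def best_rotamers(self_energy, pair_energy):
    """Return (rotamer index per residue, total energy) with minimal energy.

    self_energy: one list per residue with the energy of each candidate rotamer
    pair_energy: {(i, j): {(rot_i, rot_j): energy}} for interacting residues i, j;
                 missing rotamer pairs contribute 0.
    """
    best, best_e = None, float("inf")
    for combo in product(*(range(len(e)) for e in self_energy)):
        e = sum(self_energy[i][r] for i, r in enumerate(combo))
        for (i, j), table in pair_energy.items():
            e += table.get((combo[i], combo[j]), 0.0)
        if e < best_e:
            best, best_e = combo, e
    return best, best_e


# One residue, three rotamers: the choice reduces to E = min(E1, E2, E3).
print(best_rotamers([[1.5, 0.2, 0.9]], {}))  # ((1,), 0.2)
```

With two or more residues the pairwise terms couple the choices, which is why each residue cannot simply take its individually best rotamer.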

5. Model optimization. Energy optimization of the entire structure. Since the backbone conformation depends on the side-chain conformations and vice versa, an iterative approach is used: predict rotamers, shift the backbone, and repeat.
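The iteration can be illustrated with a toy alternating minimization. Here the "backbone" b and the "side chain" s are single numbers and the coupled energy E(b, s) = (b - 1)² + (s - 2)² + 0.5(b - s)² is invented for the example; each half-step uses the exact minimizer of this quadratic while the other variable is held fixed, and the loop repeats until neither variable moves.

```python
def energy(b, s):
    """Toy coupled energy of a 'backbone' b and a 'side chain' s."""
    return (b - 1.0) ** 2 + (s - 2.0) ** 2 + 0.5 * (b - s) ** 2


def optimize(b=0.0, s=0.0, tol=1e-12, max_iter=100):
    """Alternate side-chain and backbone updates until both stop changing."""
    for _ in range(max_iter):
        s_new = (4.0 + b) / 3.0      # best side chain for the current backbone (dE/ds = 0)
        b_new = (2.0 + s_new) / 3.0  # best backbone for the new side chain (dE/db = 0)
        converged = abs(b_new - b) + abs(s_new - s) < tol
        b, s = b_new, s_new
        if converged:
            break
    return b, s, energy(b, s)


b, s, e = optimize()
print(round(b, 6), round(s, 6), round(e, 6))  # 1.25 1.75 0.25
```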

Classwork: Homology modeling.
- Go to NCBI Entrez, search for gi … and do a BLAST search against PDB.
- Repeat the same for gi … .
- Compare the results.

Fold recognition. Unsolved problem: direct prediction of protein structure from physico-chemical principles. Solved problem: recognizing which of the known folds is similar to the fold of an unknown protein. Fold recognition is based on the following observations/assumptions:
- The overall number of different protein folds is limited (… folds).
- The native protein structure is in its ground state (minimum energy).

Fold recognition. Goal: to find the protein of known structure that best matches a given sequence. Since the similarity between the target and the closest template is not high, pairwise sequence alignment methods fail. Solution: threading, a sequence-structure alignment method.

Threading: a method for structure prediction by sequence-structure alignment, in which the target sequence is compared to all structural templates in the database. It requires:
- An alignment method (dynamic programming, Monte Carlo, ...)
- A scoring function that yields a relative score for each alternative alignment

Figure: the target sequence is compared to each structural template using sequence-structure alignment, yielding Score1, Score2 and Score3. Concept of threading: D. Jones et al., 1993.

Figure: the target sequence is compared to the structural templates using sequence-structure alignment; since Score3 > Score2 > Score1, the third template is used to build the structural model of the target.
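The ranking step can be sketched with the simplest possible threading: sliding the target along each template without gaps, scoring each placement with pairwise contact weights, and sorting. This gapless variant, the template descriptions (length plus a contact list) and the weight values are assumptions for illustration; real threading also allows gaps.

```python
def pair_weight(w, a, b):
    """Symmetric lookup in a contact-potential table; unknown pairs score 0."""
    return w.get((a, b), w.get((b, a), 0.0))


def gapless_thread(target, templates, w):
    """Slide the target along each template without gaps.

    templates: {name: (template length, [(i, j), ...] contacts)}
    Templates shorter than the target yield no placements.
    Returns a list of (score, name, offset), best first.
    """
    results = []
    n = len(target)
    for name, (length, contacts) in templates.items():
        for offset in range(length - n + 1):
            score = 0.0
            for i, j in contacts:
                # Only contacts with both ends inside the threaded window count.
                if offset <= i < offset + n and offset <= j < offset + n:
                    score += pair_weight(w, target[i - offset], target[j - offset])
            results.append((score, name, offset))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


w = {("I", "L"): 1.0}  # hypothetical contact weight
templates = {"t1": (3, [(0, 2)]), "t2": (3, [(0, 1)]), "t3": (4, [(0, 3)])}
print(gapless_thread("ILG", templates, w)[0])  # (1.0, 't2', 0)
```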

Scoring function for threading. A contact-based scoring function depends on the amino acid types of two residues and on the distance between them, whereas a sequence-sequence alignment scoring function does not depend on the distance between residues. Two non-adjacent residues in the template make a contact if the distance between them is less than 8 Å.
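The contact definition translates directly into code. This sketch assumes one representative atom per residue (here C-alpha coordinates; the slide does not say which atom is used) and treats residues at least two positions apart in sequence as non-adjacent.

```python
import math


def contact_map(ca_coords, cutoff=8.0, min_separation=2):
    """List residue pairs (i, j), i < j, closer than `cutoff` angstroms.

    Pairs fewer than `min_separation` positions apart in sequence
    (chain neighbours) are skipped, following the 'non-adjacent' rule.
    """
    contacts = []
    for i in range(len(ca_coords)):
        for j in range(i + min_separation, len(ca_coords)):
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                contacts.append((i, j))
    return contacts


# Four residues on a straight line, 3.8 A apart (made-up coordinates).
line = [(3.8 * k, 0.0, 0.0) for k in range(4)]
print(contact_map(line))  # [(0, 2), (1, 3)]
```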

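Scoring function for threading. The score of a target sequence threaded onto a template is a sum of contact potentials over all template contacts:
$$S = \sum_{k=1}^{N} w(a_{i_k}, a_{j_k})$$
where (i_k, j_k) is the k-th contact (a pair of template positions), a_i is the amino acid type of the target residue aligned with template position i, N is the number of contacts, and w is a pairwise contact potential (a table over amino acid pairs such as Ala, Ile, Tyr, Trp) calculated from the frequency of amino acid contacts in the PDB.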

Classwork: calculate the score for the target sequence "ATPIIGGLPY" aligned to the template structure defined by the contact matrix over template positions 1-9 and the contact-potential table w for amino acids A, T, P, Y, I, G, L shown on the slide.
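The score S = Σ w(a_i, a_j) can be computed mechanically once the alignment and the contact list are known. This sketch does not reproduce the classwork's contact matrix or w table; the alignment, contacts and weights in the usage lines are hypothetical. Contacts are given as 0-based ungapped template positions, and contacts involving a template position aligned to a gap are skipped.

```python
def threading_score(target_aln, template_aln, contacts, w):
    """Sum w(a_i, a_j) over template contacts (i, j) whose both positions
    are aligned to target residues."""
    aligned = {}  # template position -> amino acid of the aligned target residue
    pos = 0
    for t_res, p_res in zip(target_aln, template_aln):
        if p_res != "-":
            if t_res != "-":
                aligned[pos] = t_res
            pos += 1
    score = 0.0
    for i, j in contacts:
        if i in aligned and j in aligned:
            a, b = aligned[i], aligned[j]
            score += w.get((a, b), w.get((b, a), 0.0))  # w is symmetric
    return score


w = {("A", "I"): 0.2, ("I", "L"): 0.3}  # hypothetical weights
print(threading_score("A-IL", "XXXX", [(0, 2), (1, 3), (2, 3)], w))
```

Contact (1, 3) is ignored because template position 1 faces a gap in the target, so only the A-I and I-L contacts contribute.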

Evaluation of the quality of a structural model:
- Correct bond lengths and bond angles.
- Correct placement of functionally important sites.
- Prediction of the global topology rather than a partial alignment (minimum number of gaps); figure: a gap in the model, where consecutive residues are >> 3.8 Å apart.
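One of these checks is easy to automate: consecutive C-alpha atoms in a trans peptide chain are about 3.8 Å apart, so a much larger distance marks a break left by a gap. A minimal sketch; the 0.5 Å tolerance and the coordinates are illustrative assumptions.

```python
import math

CA_CA_DISTANCE = 3.8  # typical distance between consecutive C-alpha atoms (angstroms)


def chain_breaks(ca_coords, tolerance=0.5):
    """Indices i where C-alpha i and i+1 are much farther apart than 3.8 A."""
    return [i for i in range(len(ca_coords) - 1)
            if math.dist(ca_coords[i], ca_coords[i + 1]) > CA_CA_DISTANCE + tolerance]


model_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (12.0, 0.0, 0.0), (15.8, 0.0, 0.0)]
print(chain_breaks(model_ca))  # [1]
```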

Success and limitations of structure prediction.
Success:
- Accuracy scores almost doubled from CASP1 to CASP6, possibly because of the growth of the structure databases.
- Models of small targets are very accurate.
(Adapted from Kryshtafovych et al. 2005)
Limitations:
- Models of large and remotely related proteins are not very accurate.
- Domain boundaries are difficult to define.
- Models often do not provide enough detail for functional annotation.

GenThreader:
1. Predicts the secondary structure of the target sequence.
2. Builds sequence profiles (PSSMs) for each template sequence.
3. Uses a threading scoring function to find the best-matching profile.
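The idea behind a PSSM can be sketched as per-column log-odds scores computed from aligned sequences. This is a simplified illustration, not GenThreader's procedure: it assumes equal-length aligned sequences, one pseudocount per amino acid and a uniform background of 1/20, whereas real profile tools use sequence weighting and observed background frequencies. The function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Log-odds PSSM (in bits) from equal-length aligned sequences.

    Assumes a uniform background of 1/20 per amino acid; gap characters
    are ignored when counting a column.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(aligned_seqs[0])):
        residues = [s[col] for s in aligned_seqs if s[col] in AMINO_ACIDS]
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        pssm.append({aa: math.log2((residues.count(aa) + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def profile_score(pssm, seq):
    """Score an ungapped sequence of the same length against the PSSM."""
    return sum(row[aa] for row, aa in zip(pssm, seq))


pssm = build_pssm(["AC", "AD", "AC"])
print(profile_score(pssm, "AC") > profile_score(pssm, "WW"))  # True
```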

Protein-protein interactions.

Common properties of protein-protein interactions. The majority of protein complexes have a buried surface area of about 1600 ± 400 Å² (a "standard-size" patch). Complexes of standard size do not involve large conformational changes, while large complexes do. A protein recognition site consists of a completely buried core and a partially accessible rim (figure: top and bottom molecules, showing the rim and core of the interface). Trp and Tyr are abundant in the core, whereas Ser, Thr, Lys and Glu are particularly disfavored.
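The buried surface area is usually obtained from solvent-accessible surface areas (ASA) computed for each partner alone and for the complex: BSA = ASA(A) + ASA(B) - ASA(AB), counting the area lost on both sides. A minimal sketch that applies the 1600 ± 400 Å² range; the ASA values in the example are hypothetical and would in practice come from a surface-area program.

```python
STANDARD_BSA = 1600.0     # "standard size" buried area, square angstroms
STANDARD_SPREAD = 400.0


def buried_surface_area(asa_a, asa_b, asa_complex):
    """Area (A^2) buried on complex formation, summed over both partners."""
    return asa_a + asa_b - asa_complex


def interface_size(bsa):
    """Classify a buried area relative to the 1600 +/- 400 A^2 standard patch."""
    if bsa < STANDARD_BSA - STANDARD_SPREAD:
        return "small"
    if bsa <= STANDARD_BSA + STANDARD_SPREAD:
        return "standard"
    return "large"


bsa = buried_surface_area(10000.0, 8000.0, 16400.0)
print(bsa, interface_size(bsa))  # 1600.0 standard
```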

Different types of protein-protein interactions:
- Permanent and transient.
- External (between different chains) and internal (within the same chain).
- Homo- and hetero-oligomers, depending on the similarity between the interacting subunits.
The interface type can be predicted from amino acid composition (Ofran and Rost 2003).

Experimental methods

Verification of experimental protein-protein interactions:
- Protein localization method.
- Expression profile reliability method.
- Paralogous verification method.

Protein localization method (Sprinzak, Sattath, Margalit, J Mol Biol, 2003). Figure legend: A-A3, yeast two-hybrid (Y2H); B, physical methods; C, genetic methods; E, immunological methods. True positives are taken to be:
- Proteins that are localized in the same cellular compartment.
- Proteins with a common cellular role.
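The localization criterion can be applied to an interaction dataset as a simple filter: count how many interacting pairs share at least one annotated compartment. A minimal sketch; the protein names and annotations are hypothetical, and pairs with an unannotated protein are skipped rather than counted as negatives.

```python
def localization_support(interactions, compartments):
    """Fraction of interacting pairs whose partners share a cellular compartment.

    interactions: iterable of (protein_a, protein_b)
    compartments: {protein: set of compartment names}
    """
    annotated = [(a, b) for a, b in interactions
                 if a in compartments and b in compartments]
    if not annotated:
        return 0.0
    shared = sum(1 for a, b in annotated if compartments[a] & compartments[b])
    return shared / len(annotated)


pairs = [("p1", "p2"), ("p1", "p3"), ("p2", "p4")]
loc = {"p1": {"nucleus"}, "p2": {"nucleus", "cytoplasm"}, "p3": {"mitochondrion"}}
print(localization_support(pairs, loc))  # 0.5
```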

Expression profile reliability method (Deane, C. M. (2002) Mol. Cell. Proteomics 1).

Paralogous verification method (Deane, C. M. (2002) Mol. Cell. Proteomics 1). The PVM is based on the observation that if two proteins interact, their paralogs would also interact. It calculates the number of interactions between the two families of paralogous proteins.
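The counting step of the PVM can be sketched as follows, treating interactions as undirected and counting each distinct pair once. How the paralog families are defined and what count is required to accept an interaction are not given here, so the families and interactions below are hypothetical.

```python
def interactions_between(family_a, family_b, interactions):
    """Count distinct interactions linking a member of family_a to a member of family_b."""
    pairs = {frozenset(p) for p in interactions}  # undirected, duplicates removed
    count = 0
    for pair in pairs:
        if len(pair) != 2:
            continue  # skip self-interactions
        x, y = tuple(pair)
        if (x in family_a and y in family_b) or (y in family_a and x in family_b):
            count += 1
    return count


fam_a, fam_b = {"a1", "a2"}, {"b1", "b2"}
edges = [("a1", "b1"), ("b2", "a2"), ("a1", "c1"), ("b1", "a1")]
print(interactions_between(fam_a, fam_b, edges))  # 2
```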

Interaction databases are labeled by data type: experiment (E); structure detail (S); predicted, either physical (P) or functional (F); curated (C); homology modeling (H). * marks members of the IMEx consortium.

Protein interaction databases:
- Protein-protein interaction databases
- Domain-domain interaction databases

DIP database. Documents protein-protein interactions from experiments: Y2H, protein microarrays, TAP/MS, PDB. It contains 55,733 interactions between 19,053 proteins from 110 organisms.

Organism      # proteins   # interactions
Fruit fly     7,052        20,988
H. pylori
Human
E. coli
C. elegans
Yeast         4,921        18,225
Others        985          401

DIP database (Duan et al., Mol Cell Proteomics, 2002):
- Assesses data quality via proteins (PVM, EPR) and via domains (DPV).
- Can be searched by BLAST or by identifiers/text.

BIND database (Alfarano et al., Nucleic Acids Res, 2005). Records experimental interaction data: 83,517 protein-protein interactions and 204,468 interactions in total. Also includes small molecules, nucleic acids (NAs) and complexes.

Classwork. Go to the DIP webpage (…mbi.ucla.edu). Retrieve all interactions for cytochrome C, tubulin and RNA polymerase from yeast. How many of them are confirmed by several experimental methods?


InterDom database (Ng et al., Nucleic Acids Res, 2003). Predicts domain interactions (~30,000) from PPIs and scores them. Data sources:
- Domain fusions
- PPIs from DIP
- Protein complexes
- Literature

Pibase database. Records domain interactions from the PDB and PQS, with domains defined by SCOP and CATH. Domain pairs with inter-domain or inter-chain distances within 6 Å are considered interacting. From the interacting domain pairs, a list of interfaces with buried solvent-accessible area > 300 Å² is created.
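The two Pibase thresholds can be sketched as a two-stage filter over precomputed domain pairs. The record layout (keys `pair`, `min_distance`, `buried_area`) and the example values are assumptions for illustration, not Pibase's actual data format.

```python
CONTACT_CUTOFF = 6.0      # angstroms: pairs this close are "interacting"
MIN_BURIED_AREA = 300.0   # square angstroms: minimum buried area for an interface


def pibase_like_interfaces(domain_pairs):
    """domain_pairs: dicts with 'pair', 'min_distance' (A) and 'buried_area' (A^2)."""
    interacting = [d for d in domain_pairs if d["min_distance"] <= CONTACT_CUTOFF]
    return [d["pair"] for d in interacting if d["buried_area"] > MIN_BURIED_AREA]


pairs = [
    {"pair": ("d1", "d2"), "min_distance": 3.5, "buried_area": 850.0},
    {"pair": ("d1", "d3"), "min_distance": 5.8, "buried_area": 120.0},
    {"pair": ("d2", "d4"), "min_distance": 9.0, "buried_area": 400.0},
]
print(pibase_like_interfaces(pairs))  # [('d1', 'd2')]
```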

Classwork. Go to the Pibase website. Select the largest structural complexes, 1k73 and 1i6h. Compare the two complexes in terms of the number of interacting domains and the number of interactions per node.

NCBI CBM database (Shoemaker et al., Protein Sci, 2006). CBM is a database of interacting structural domains exhibiting Conserved Binding Modes. To retrieve interactions, it:
- Records interactions.
- Uses VAST structural alignments to compare binding surfaces.
- Studies recurring domain-domain interactions.

Definition of CBM:
- A pair of domains is interacting if there are at least 5 residue-residue contacts between them (a contact is a distance of less than 8 Å).
- Structure-structure alignments are computed between all proteins corresponding to a given pair of interacting domains.
- Interfaces are clustered by similarity: those with >50% equivalently aligned positions are clustered together.
- Clusters with more than 2 entries define a conserved binding mode.
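The CBM rules can be sketched as filter, link and cluster steps. The similarity values (fraction of equivalently aligned interface positions) are taken as given here, and the clustering is assumed to be single linkage via union-find; the source only says that similar interfaces are "clustered together", and the interface identifiers are hypothetical.

```python
def find_cbms(contact_counts, similarity, min_contacts=5, min_similarity=0.5, min_size=3):
    """Group interfaces into conserved binding modes.

    contact_counts: {interface id: number of residue-residue contacts (< 8 A)}
    similarity: {(id1, id2): fraction of equivalently aligned interface positions}
    Interfaces are linked when similarity > min_similarity (single linkage);
    clusters with at least min_size members (more than 2 entries) are returned.
    """
    nodes = [k for k, n in contact_counts.items() if n >= min_contacts]
    parent = {k: k for k in nodes}

    def root(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving keeps trees shallow
            k = parent[k]
        return k

    for (a, b), frac in similarity.items():
        if a in parent and b in parent and frac > min_similarity:
            parent[root(a)] = root(b)

    clusters = {}
    for k in nodes:
        clusters.setdefault(root(k), []).append(k)
    return sorted(sorted(c) for c in clusters.values() if len(c) >= min_size)


counts = {"i1": 10, "i2": 8, "i3": 7, "i4": 3, "i5": 6}
sims = {("i1", "i2"): 0.8, ("i2", "i3"): 0.6, ("i1", "i5"): 0.3, ("i3", "i4"): 0.9}
print(find_cbms(counts, sims))  # [['i1', 'i2', 'i3']]
```

Interface i4 is dropped for having fewer than 5 contacts, and i5 forms a singleton cluster, so only one conserved binding mode remains.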

Number of interacting pairs and binding modes:
- 833 conserved interaction types out of 1,798 total domain interaction types.
- Up to 24 CBMs per interaction type.
- Complicated domain pairs are classified by CBMs.
Globin example: 630 pairs; 2 CBMs account for the majority.

CBM   Structures   Species
1     154          Jawed vertebrates
2     112          Jawed vertebrates
3     17           Clam, earthworm
4     4            Lamprey
5     4            V. stercoraria
6     2            Rice, soybeans
7     2            Human
8     2            Lamprey

Classwork. Retrieve structures 1GY3, 1E9H and 1OL2. Examine all interactions within and between chains/domains. How many CBMs do you find?