Functional Site Prediction Selects Correct Protein Models Vijayalakshmi Chelliah Division of Mathematical Biology National Institute.

Presentation transcript:

Functional Site Prediction Selects Correct Protein Models
Vijayalakshmi Chelliah
Division of Mathematical Biology, National Institute for Medical Research, Mill Hill, London
Sixth International Conference on Bioinformatics (InCoB2007), HKUST, Hong Kong, 27th–30th August 2007

Functional site prediction: applications
• To predict the function of a protein (Pazos & Sternberg, 2004; PNAS 101).
• In protein-protein docking: to select the near-native docked solution (Chelliah et al., 2006; JMB 357).
• In sequence-structure homology recognition, and to improve alignment accuracy (Chelliah et al., 2005; Proteins 61:722-31).

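[Figure: workflow diagram. Gene sequence, protein sequence, protein structure. The structure is either determined experimentally (X-ray/NMR) or predicted (de novo/ab initio); functional site prediction is used to select the correct models among the predicted structures.]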

Overview
• De-novo protein structure prediction method (decoy generation)
• Functional site prediction method
• Evaluating models
• Conclusions

De-novo protein structure prediction method*
[Figure: pipeline diagram. Inputs: ideal forms; sequence alignment; predicted secondary structure; predicted residue burial; structure patterns. Stages: fold generation and scoring, refinement, threading. Representation levels: secondary structure 'stick' level, residue level, main-chain level. Selection labels: top 100+N, top 100+N, top 1/3 Cα models, top 200 models.]
* Taylor (2002). Nature 416.

Functional site prediction method
• Biochemically important residues are typically found in close proximity and are also highly conserved.
• Functional site prediction is done using CRESCENDO*, which gives a score for each residue position.
* Chelliah, V., L. Chen, et al. (2004). J Mol Biol 342(5).

CRESCENDO: Functional site prediction method
• Input: multiple sequence alignment of the homologous sequences (structure-based sequence alignment), evaluated at each alignment position.
• Observed substitution pattern (p): for each amino acid at the t-th position.
• Expected substitution pattern (q): for each amino acid at the t-th position, taken from the environment-specific substitution table* and averaged over the N entries, q = (sp1 + sp2 + sp3 + ... + spN)/N.
• Score: the divergence between the observed (p) and expected (q) substitution patterns.
* Overington et al. (1992). Protein Science 1:216-26
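The averaging and comparison step on this slide can be sketched in a few lines of Python. This is only an illustration: the slide does not say which divergence measure CRESCENDO uses, so the Kullback-Leibler divergence below is an assumption, and the function names `expected_pattern` and `divergence` are hypothetical.

```python
import math

def expected_pattern(tables):
    """Average N substitution distributions: q = (sp1 + ... + spN) / N.

    tables: list of dicts mapping amino acid -> probability,
    all sharing the same keys.
    """
    n = len(tables)
    return {aa: sum(t[aa] for t in tables) / n for aa in tables[0]}

def divergence(p, q, eps=1e-9):
    """Kullback-Leibler divergence D(p || q).

    ASSUMPTION: used here as a stand-in for the 'divergence score';
    the slide does not name the actual measure.
    eps guards against division by zero when q has empty entries.
    """
    return sum(
        p[aa] * math.log((p[aa] + eps) / (q[aa] + eps))
        for aa in p
        if p[aa] > 0
    )

# Toy two-letter alphabet, purely for illustration.
q = expected_pattern([{"A": 0.5, "G": 0.5}, {"A": 1.0, "G": 0.0}])
p = {"A": 0.1, "G": 0.9}
score = divergence(p, q)
```

A position whose observed pattern matches the expectation scores close to zero; the larger the score, the more the position departs from what its structural environment alone would predict.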

Assumptions
• In correct or near-native models, the critical residues important for binding (identified by CRESCENDO) will be in close proximity to each other, i.e. functional residues in the correct models form clusters.
• Functional residues in the incorrect models might be scattered.
• Question: can correct and incorrect models be distinguished by looking at how the functional residues are packed in the models?

Clustering of models
• The 200 decoy models are classified by fold type (F1, F2, F3, F4, ..., Fn) using SAP*.
• Within each fold type, models are clustered with cut-offs of rmsd ≤ 2 Å and PID ≥ 60%.
• The average Cα coordinates of the models in each cluster are used to find the pair-wise distances between residues.
* Taylor (1999). Prot. Sci. 8.
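The last step above, averaging Cα coordinates over a cluster and computing pair-wise residue distances, might look like the following minimal sketch. It assumes the models in a cluster have already been superposed into a common frame and have the same number of residues; the function names are hypothetical.

```python
import math

def average_coords(models):
    """Average Cα coordinates over the models of one cluster.

    models: list of models, each a list of (x, y, z) tuples, one per residue.
    ASSUMPTION: models are already superposed and of equal length.
    """
    n = len(models)
    n_res = len(models[0])
    return [
        tuple(sum(m[i][k] for m in models) / n for k in range(3))
        for i in range(n_res)
    ]

def pairwise_distances(coords):
    """Full residue-by-residue Euclidean distance matrix (in Å)."""
    return [[math.dist(a, b) for b in coords] for a in coords]

# Two toy models with two residues each.
cluster = [
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 8.0, 0.0)],
]
dist = pairwise_distances(average_coords(cluster))
```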

Model score
• For each pair of residues that are at least 8 residues apart in the linear sequence, the pair-wise distance and the product of the two CRESCENDO scores are calculated.
• Among the top 40 pairs (ranked by the product of CRESCENDO scores), the percentage of residue pairs that lie within a spatial distance of 12 Å is calculated.
• The percentage is computed in steps of 5 pairs, and the percentages from each step are added to give the final score of the model.
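One way to read the scoring procedure above is sketched below. The interpretation of "steps of 5 pairs" as cumulative subsets (top 5, top 10, ..., top 40) is an assumption, as are the function name `model_score` and its parameter names; the thresholds come from the slide.

```python
from itertools import combinations

def model_score(dist, cres, min_sep=8, cutoff=12.0, top=40, step=5):
    """Sum of percentages of close pairs among the top-ranked residue pairs.

    dist: residue-by-residue distance matrix (Å).
    cres: per-residue CRESCENDO scores.
    ASSUMPTION: each step evaluates the cumulative top-k pairs
    (k = 5, 10, ..., 40) and the percentages are summed.
    """
    # Pairs at least min_sep residues apart, with their score product.
    pairs = [
        (cres[i] * cres[j], dist[i][j])
        for i, j in combinations(range(len(cres)), 2)
        if j - i >= min_sep
    ]
    # Rank by product of CRESCENDO scores, highest first.
    pairs.sort(key=lambda t: t[0], reverse=True)
    pairs = pairs[:top]

    score = 0.0
    for k in range(step, top + 1, step):
        subset = pairs[:k]
        if not subset:
            break
        close = sum(1 for _, d in subset if d <= cutoff)
        score += 100.0 * close / len(subset)
    return score

# Toy model: 20 residues, every pair 5 Å apart, uniform scores.
n = 20
compact = [[5.0] * n for _ in range(n)]
compact_score = model_score(compact, [1.0] * n)
```

Under this reading the score ranges from 0 (no top-ranked pair within 12 Å) to 800 (every top-ranked pair within 12 Å at all eight steps), so models whose predicted functional residues cluster in space score higher.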

2trxA: 34 clusters (with ≤ 2 Å rmsd and ≥ 60% PID) were obtained from 81 correct models.
[Figure: good and poor models of the same fold type.]
Why is clustering between models of the same fold type needed? Functional site prediction differs between models of the same fold type because of (a) differences in loop conformation and (b) a beta-strand or helix shift, even by a single residue. So even correct folds might have poor models (based on site prediction).

[Figure: topology diagram of 3chy from N-term to C-term, with strands S1-S5 and helices H1-H5.]
Helix and strand order: H1(1,5);S2(2,1,3,4,5);H3(2,3,4)

Proximity plot: 3chy
[Figure: best model in each fold type, compared with the native structure; the correct model is marked.]

Decoy fold distribution for 3chy
Columns: fold type; strand and helix order; no. of models in each fold type (out of 200 models); no. of clusters (≤ 2 Å rmsd, ≥ 60% PID cut-off); score of the best model.

Fold type   Strand and helix order
native      H1(1,5);S2(2,1,3,4,5);H3(2,3,4)
F1          H1(1,5);S2(2,1,3,4,5);H3(2,3,4)
F2          H1(1*,5);S2(2,1*,3,4,5);H3(2,3,4)
F3          H1(1,5);S2(2,3,1,4,5);H3(2,3,4)
F4          H1(1,3*,4*);S2(2,1,3*,4*,5*);H3(2*,5*)
F5          H1(1,4);S2(2,3,1,4,5*);H3(2,3,5*)
F6          H1(1,3,5);S2(2,1,4,3,5);H3(2,4)
F7          H1(1,5);S2(2,1,3,4,5);H3(2,3,4)
F8          H1(1,5*);S2(2,1,3,4,5);H3(2,3,4)

Summary plot: 3chy

Top 4 ranking models: rmsd in Å (PID %)

PDB (length)   Rank-1         Rank-2         Rank-3          Rank-4
3chy (128)     3.6 (70.6)     3.8 (63.0)     4.8 (65.6)      8.83 (21.8)
1cozA (126)    6.5 (63.1)     9.0 (77.3)     14.2 (61.3)     7.9 (80.3)
2trxA (108)    4.7 (100.0)    12.9 (77.6)    14.1 (100.0)    5.6 (100.0)
1f4pA (148)    5.9 (80.1)     5.3 (82.9)     5.8 (100.0)     14.6 (100.0)
1di0A (147)    4.6 (82.5)     16.2 (71.4)    16.1 (96.5)     5.8 (53.6)

Top 4 ranking models: rmsd in Å (PID %)

PDB (length)   Rank-1          Rank-2          Rank-3          Rank-4
1v9w (130)     13.4 (100.0)    11.3 (76.2)     6.9 (77.7)      6.3 (100.0)
1rlj (135)     13.7 (94.8)     4.9 (88.0)      11.2 (94.3)     13.7 (100.0)
1kjnA (159)    3.4 (26.3)      5.0 (59.7)      5.0 (62.6)      9.4 (5.8)
1vq1A (178)    8.5 (80.3)      9.5 (100.0)     7.1 (89.5)      7.9 (94.3)
1uxoA (186)    13.7 (90.7)     11.4 (100.0)    8.9 (100.0)     11.8 (94.6)
1t57A (186)    14.8 (32.0)     9.8 (77.2)      14.9 (96.2)     9.9 (92.2)
1vk2A (187)    16.4 (100.0)    14.7 (100.0)    14.5 (97.3)     15.9 (94.2)

Thioredoxin: 2trxA
[Figure: models at Rank 1, Rank 4 and Rank 10 (last), labelled as correct or incorrect; helix H5 is marked.]

Conclusions
• The requirement for proteins to form functional sites can be used to select the correct protein fold.
• In larger proteins this is difficult because of the conformation of the longer loops.
• The competing incorrect folds are mostly strand-swapped models.
• The method discriminates efficiently between incorrect and correct folds when the direction of a secondary structure element that contains functional residues is altered, and when the fold is messy.

Thanks to
• Dr Willie Taylor, National Institute for Medical Research, Mill Hill, London, UK.
• Prof Sir Tom Blundell, Department of Biochemistry, University of Cambridge, Cambridge, UK.