CRB Journal Club February 13, 2006 Jenny Gu

Selected for a Reason
Residues are selected by evolution for a reason, but conservation alone does not distinguish between selection for function and selection for stability. To disentangle functional from structural constraints, sequence profiles predicted (designed) for structural stability are compared with naturally occurring sequence profiles. In addition to residue conservation, two further measures, free energy and sequence-profile difference, are used to identify functional residues.

Datasets
Enzyme Active Site Set (Suspicious: what about cross-validation?)

Conservation Score Distribution
Sequence conservation scores were calculated with SCORECONS from multiple sequence alignments built with MUSCLE.
A) All residue sites
B) Enzyme active site set
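SCORECONS has its own scoring scheme, which is not reproduced here. As a rough illustration of what a per-column conservation score measures, the Python sketch below swaps in a simpler method (normalized Shannon entropy, gaps ignored). The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def entropy_conservation(column: str) -> float:
    """Conservation score in [0, 1] for one alignment column.

    1.0 = a single residue type; 0.0 = all 20 residues equally frequent.
    Gaps and non-standard letters are ignored in this simplified sketch
    (this is NOT the SCORECONS scoring method).
    """
    residues = [aa for aa in column.upper() if aa in AMINO_ACIDS]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(len(AMINO_ACIDS))

def column_scores(alignment: list[str]) -> list[float]:
    """Score every column of an alignment given as equal-length rows."""
    length = len(alignment[0])
    return [entropy_conservation("".join(row[i] for row in alignment))
            for i in range(length)]

# Hypothetical four-sequence alignment.
msa = ["MKHLA", "MKHIA", "MRHLG", "MKHVS"]
print([round(s, 3) for s in column_scores(msa)])
```

Fully conserved columns (here columns 0 and 2) score 1.0; mixed columns score lower.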

Calculating the Difference between Profiles
Designed sequence profiles: the Rosetta design program generates 40 protein sequences that are stable for the structure; these are aligned with PSI-BLAST to generate a position-specific scoring matrix (PSSM).
Natural sequence profiles: PSSM from PSI-BLAST.
The difference is the Euclidean distance between the two profiles, rescaled between 0 (high similarity) and 1 (low similarity).
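The profile comparison can be sketched as a per-position Euclidean distance followed by rescaling. The slide does not say how the rescaling is done; min-max rescaling across positions is an assumption here, and the toy profiles use 4 residue types instead of 20 to stay short.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def profile_difference(natural, designed):
    """Per-position distance between two profiles, rescaled to [0, 1].

    Each profile is a list of rows, one row of residue scores per position.
    0 = most similar position, 1 = least similar position.
    Min-max rescaling across positions is an assumption of this sketch.
    """
    if len(natural) != len(designed):
        raise ValueError("profiles must cover the same positions")
    dists = [euclidean(n, d) for n, d in zip(natural, designed)]
    lo, hi = min(dists), max(dists)
    if hi == lo:
        return [0.0] * len(dists)
    return [(d - lo) / (hi - lo) for d in dists]

# Hypothetical 3-position profiles over 4 residue types
# (real PSSM rows would have 20 columns).
natural = [[0.9, 0.1, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25], [0.0, 0.0, 1.0, 0.0]]
designed = [[0.9, 0.1, 0.0, 0.0], [0.7, 0.1, 0.1, 0.1], [0.0, 1.0, 0.0, 0.0]]
print([round(x, 3) for x in profile_difference(natural, designed)])
```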

Difference between Natural and Designed Sequence Profiles
A) All residues in active sites
B) Functional residues in active sites
Differences between profiles are rescaled so that 0 = high similarity and 1 = low similarity. In other words, for selection for function vs. stability: 0 = low selection, 1 = high selection.

Calculating the Native/Optimal Residue Energy Difference
1. Use the Rosetta ΔG module to calculate the free energy change for each of the 20 amino acid substitutions at each position.
2. Compare with the native ΔG. If functional constraints are imposed, there should be a large gap between the native and the optimal ΔG.
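Once per-residue energies are available, step 2 reduces to a subtraction. The sketch below assumes the energies for each position have already been computed (here they are invented numbers for only a few of the 20 residues) and is not Rosetta code.

```python
def energy_gap(energies, native):
    """Return (native energy - best energy, best residue) for one position.

    `energies` maps each candidate residue to its computed free energy
    (kcal/mol, lower = more favorable). A large gap suggests the native
    residue is kept for a reason other than stability, e.g. function.
    """
    best = min(energies, key=energies.get)
    return energies[native] - energies[best], best

# Hypothetical energies for a few of the 20 substitutions at two positions.
positions = {
    "pos_57": ("D", {"D": 0.5, "A": -1.0, "L": -2.0, "N": -0.4}),
    "pos_102": ("L", {"L": -2.2, "I": -2.0, "V": -1.5, "A": -0.8}),
}
for name, (native, energies) in positions.items():
    gap, best = energy_gap(energies, native)
    print(f"{name}: native {native}, best {best}, gap {gap:.2f} kcal/mol")
```

A gap of zero means the native residue is already the most stable choice at that position.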

Rosetta  G Originally developed to identify binding interface hot spots. Model based on all-atom rotamer description of side chains with energy function dominated by Lennard Jones interactions, solvation interactions, and hydrogen bonding.

Distribution of Free Energy Difference
Difference (kcal/mol) between the free energy of the naturally occurring residue and that of the energetically most favorable residue.
A) All residues in active sites
B) Functional residues in active sites
In other words, positions with smaller differences have been selected for stability.

Residue Classification
Combine:
1. Sequence conservation
2. Profile difference (natural vs. designed)
3. Residue free energy change (natural vs. optimal)
to classify functional vs. nonfunctional residues. Logistic regression, fit with a linear-model module, is used to determine the weights of the input features.
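The slide does not say which software or settings were used. As a minimal sketch, assuming three features per residue (conservation, profile difference, free-energy gap) and a 0/1 functional label, the weights can be fit by plain batch gradient descent on the logistic loss. The training data, learning rate, and epoch count below are made up for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """Probability that a residue with features x is functional."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights w and bias b by batch gradient descent on the mean logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of the loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical training data: [conservation, profile difference, energy gap (kcal/mol)]
X = [[0.9, 0.8, 2.5], [0.95, 0.7, 3.0], [0.85, 0.9, 2.0], [0.9, 0.6, 1.8],
     [0.3, 0.1, 0.2], [0.5, 0.2, 0.1], [0.8, 0.2, 0.3], [0.2, 0.3, 0.0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_logistic(X, y)
print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
```

The fitted weights show how much each measure contributes; a held-out set (or cross-validation) would be needed to judge real performance.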

Classification Performance
The largest improvement is observed with the free energy measures. Adding the profile difference to the free energy measures gives only minor further improvement. The combined measures reduce false positives.
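To make "reduce false positives" concrete, the sketch below computes true- and false-positive rates at a fixed score cutoff for two predictors. The scores, labels, and 0.5 cutoff are invented for illustration and are not the paper's results.

```python
def true_positive_rate(scores, labels, threshold=0.5):
    """Fraction of functional residues (label 1) scored at or above threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        return 0.0
    return sum(1 for s in positives if s >= threshold) / len(positives)

def false_positive_rate(scores, labels, threshold=0.5):
    """Fraction of nonfunctional residues (label 0) scored at or above threshold."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not negatives:
        return 0.0
    return sum(1 for s in negatives if s >= threshold) / len(negatives)

# Hypothetical scores for the same six residues.
labels = [1, 1, 0, 0, 0, 0]
conservation_only = [0.9, 0.8, 0.85, 0.7, 0.2, 0.1]
combined = [0.9, 0.85, 0.4, 0.3, 0.2, 0.1]
for name, scores in [("conservation only", conservation_only), ("combined", combined)]:
    print(name, true_positive_rate(scores, labels), false_positive_rate(scores, labels))
```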

Chymosin B
Panels: sequence conservation only; combined measures

Arginine Kinase
Panels: sequence conservation only; combined measures

Testing Generality
Dataset 2 includes ligand-binding sites.

Comparison to another predictor

Sources of Error
Sensitivity to multiple alignment quality; loop regions are difficult to align.
Functionally important residues can also contribute to stability.
Suggested Improvements
Better multiple sequence alignments.
Spatial clustering of high-scoring residues.
Introducing backbone flexibility into energy calculations.
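One possible way to implement "spatial clustering of high-scoring residues" is single-linkage grouping by distance: a residue joins a cluster if it lies within a cutoff of any member. The per-residue coordinates and the 8 Å default cutoff are assumptions for illustration.

```python
import math

def spatial_clusters(coords, cutoff=8.0):
    """Single-linkage clusters of residues whose coordinates lie within `cutoff`.

    `coords` maps a residue number to an (x, y, z) point, e.g. one atom per
    residue (a hypothetical choice for this sketch).
    """
    ids = list(coords)
    seen, clusters = set(), []
    for start in ids:
        if start in seen:
            continue
        seen.add(start)
        stack, cluster = [start], []
        while stack:  # depth-first walk over neighbors within the cutoff
            cur = stack.pop()
            cluster.append(cur)
            for other in ids:
                if other not in seen and math.dist(coords[cur], coords[other]) <= cutoff:
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(cluster))
    return clusters

# Hypothetical coordinates (Å) of four high-scoring residues.
coords = {10: (0.0, 0.0, 0.0), 12: (3.0, 0.0, 0.0), 45: (6.0, 4.0, 0.0), 80: (30.0, 0.0, 0.0)}
print(spatial_clusters(coords))
```

Isolated high scorers (residue 80 here) would be candidates for down-weighting if functional sites are expected to cluster.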

Other Approaches: Extracting from Sequence Design
1) A design procedure based on Monte Carlo simulation of the amino acid substitution process.
2) Fixed substitutions based on a scoring function derived from a template structure and a multiple alignment of homologs.
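The Monte Carlo idea in 1) can be illustrated with a Metropolis sampler over sequences. This is a toy sketch: the energy function (one penalty per position that differs from a hypothetical preferred residue), temperature, and step count are assumptions, not the published procedure.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def toy_energy(seq, preferred):
    """Hypothetical energy: +1 for every position that differs from `preferred`."""
    return sum(0.0 if aa == pref else 1.0 for aa, pref in zip(seq, preferred))

def metropolis_design(start, energy_fn, steps=5000, temperature=0.3, seed=0):
    """Propose single substitutions; accept with the Metropolis criterion.

    Returns the lowest-energy sequence visited and its energy.
    """
    rng = random.Random(seed)
    seq = list(start)
    energy = energy_fn(seq)
    best_seq, best_energy = seq[:], energy
    for _ in range(steps):
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(ALPHABET)
        new_energy = energy_fn(seq)
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy  # accept the substitution
            if energy < best_energy:
                best_seq, best_energy = seq[:], energy
        else:
            seq[pos] = old  # reject: undo the substitution
    return "".join(best_seq), best_energy

preferred = "MKLVA"
start = "AAAAA"
best_seq, best_energy = metropolis_design(start, lambda s: toy_energy(s, preferred))
print(best_seq, best_energy)
```

A designed-sequence profile would come from collecting many such low-energy sequences rather than a single best one.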

Other Approaches: Using Protein Homology Information
1) Identify positions with a high degree of conservation between homologous proteins.
2) Use information theory to identify positions where environment-specific substitution tables poorly predict the overall amino acid substitution pattern.
3) Identify highly conserved residue positions when the structures of a homologous family are superposed.
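For approach 2), one information-theoretic measure of a "poor prediction" is the Kullback-Leibler divergence between the observed residue distribution at a position and the distribution predicted from an environment-specific table. KL divergence, the 1-bit threshold, and the toy distributions are assumptions here; the cited work may use a different measure.

```python
import math

def kl_divergence(observed, predicted, eps=1e-9):
    """D_KL(observed || predicted) in bits.

    Both arguments map residue letters to probabilities. `eps` keeps the
    log finite when the prediction gives zero probability to an observed
    residue (a choice made for this sketch).
    """
    total = 0.0
    for aa, p in observed.items():
        if p > 0:
            q = predicted.get(aa, 0.0) + eps
            total += p * math.log2(p / q)
    return total

def poorly_predicted(observed_cols, predicted_cols, threshold=1.0):
    """Indices of positions whose observed distribution diverges by more than `threshold` bits."""
    return [i for i, (o, q) in enumerate(zip(observed_cols, predicted_cols))
            if kl_divergence(o, q) > threshold]

# Hypothetical observed vs. table-predicted distributions at two positions.
observed = [{"C": 1.0}, {"A": 0.5, "S": 0.5}]
predicted = [{"C": 0.1, "S": 0.5, "A": 0.4}, {"A": 0.45, "S": 0.45, "G": 0.1}]
print(poorly_predicted(observed, predicted))
```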

Interest in this Paper
Distinguishing between functional and structural constraints.
Designing sequences and the resulting profiles allows us to explore an enlarged sequence space that is not captured by natural sequences.
Questions, from an evolutionary perspective:
1) How does structure limit the exploration of sequence space?
2) How is sequence space expanded when structure changes?
3) How do selective pressures for molten globules, flexible regions, and disordered structures affect sequence space?

Current Domain Coverage of Genome
Current perspective: ab initio evolution of new structures is now difficult, since a system of checks and balances is in place. The evolution of the current protein repertoire is largely attributed to recombination of existing folds.

Reaching beyond structural genomics?
With known structures: use Hidden Markov Models (HMMs) or profiles of domains to identify them in genomes. Evolutionary plasticity is greater for loop regions than for the core. Work has been done in this area.
With unknown structures: can we design a structure not currently in the PDB and identify it in nature? What about structures that nature “hasn’t seen before”? A de novo structure was designed in … Maybe it already exists in nature and we just don’t know about it yet. And if it doesn’t exist, is it just a proof of principle, or can we actually do something with it?
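As a simplified illustration of profile-based searching, the sketch below slides a position-specific log-odds profile along a sequence and reports the best-scoring window. Unlike an HMM, it allows no insertions or deletions; the profile values and the -4.0 default score for unlisted residues are hypothetical.

```python
def best_window(sequence, pssm, default=-4.0):
    """Return (start, score) of the best ungapped match of `pssm` in `sequence`.

    `pssm` is a list with one dict per profile position mapping residues to
    log-odds scores; residues not listed get `default` (an assumption).
    Returns (None, -inf) if the sequence is shorter than the profile.
    """
    width = len(pssm)
    best = (None, float("-inf"))
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[i].get(sequence[start + i], default) for i in range(width))
        if score > best[1]:
            best = (start, score)
    return best

# Hypothetical 3-position motif profile.
pssm = [{"G": 3.0}, {"K": 2.0, "R": 1.0}, {"S": 2.5, "T": 2.0}]
print(best_window("MAGKSLLGRT", pssm))
```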