Summary
Protein design seeks to find amino acid sequences which stably fold into specific 3-D structures. Modeling the inherent flexibility of the protein backbone in the design algorithm is necessary to capture the behaviour of real proteins and is a prerequisite for the accurate exploration of sequence space. We present a broad exploration of protein sequence space, with backbone flexibility, through a novel approach: large-scale protein design to structural ensembles. An application is demonstrated, wherein designed sequences are used to increase the utility of comparative modeling, in place of natural sequence homologues.

Results
We designed hundreds of thousands of diverse sequences for 264 naturally-occurring proteins, in 55 fold classes. Protein folds show distinct variation in “designability”. Our novel “reverse BLAST” approach uses designed sequences to identify up to 5-fold more high-quality structural templates for comparative modeling than standard PSI-BLAST. Reverse BLAST identifies at least one new modeling target in 41 of 49 genomes tested.

Protein design
Challenges in computational protein design:
- choosing sufficiently accurate energy functions
- finding intelligent ways to efficiently search the large (O(10^n)) space of protein sequences
- modeling peptide backbone flexibility
Some highlights of the design algorithm (SPA):
- initial rotamer filtering step
- Amber/OPLS parameter set; implicit solvation
- amino acid baseline corrections to maintain reasonable sequence compositions
- genetic algorithm to search for low energy sequences to match the target structure
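
The genetic-algorithm search mentioned above can be illustrated with a toy sketch. This is not the SPA implementation: the energy function here is a hypothetical stand-in that only checks whether hydrophobic residues sit at buried positions, and the names `toy_energy`, `genetic_search`, and the `HYDROPHOBIC` set are assumptions made for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical hydrophobic class, chosen only for this illustration.
HYDROPHOBIC = set("AVILMFWC")


def toy_energy(seq, buried):
    """Toy score: count positions whose hydrophobic/polar class
    disagrees with the burial pattern (lower is better)."""
    return sum((aa in HYDROPHOBIC) != is_buried
               for aa, is_buried in zip(seq, buried))


def genetic_search(buried, pop_size=40, generations=100,
                   mutation_rate=0.1, seed=0):
    """Elitist genetic algorithm over sequences of len(buried) residues."""
    n = len(buried)
    if n < 2:
        raise ValueError("need at least two positions for crossover")
    rng = random.Random(seed)
    pop = ["".join(rng.choice(AMINO_ACIDS) for _ in range(n))
           for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the lower-energy half unchanged (elitism).
        pop.sort(key=lambda s: toy_energy(s, buried))
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.choice(survivors), rng.choice(survivors)
            cut = rng.randrange(1, n)            # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = "".join(                     # per-residue point mutation
                rng.choice(AMINO_ACIDS) if rng.random() < mutation_rate else aa
                for aa in child)
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda s: toy_energy(s, buried))


if __name__ == "__main__":
    pattern = [True, False, True, True, False, False, True, False]
    best = genetic_search(pattern)
    print(best, toy_energy(best, pattern))
```

Because the survivors are carried over unchanged, the best energy in the population can never get worse from one generation to the next. A real design run would replace `toy_energy` with a physical energy evaluated on rotamers placed onto the target backbone.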

Peptide backbone flexibility through structural ensembles
Ten representative backbone traces from the structural ensemble used in designing sequences for 1abo, the SH3 domain from Abl tyrosine kinase. The structural variants appear in yellow, with the original crystal structure backbone traced in purple. All structures are within 1 Å rmsd of each other. Designing to a structural ensemble generates more diverse sequences than fixed-backbone methods.
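
The "within 1 Å rmsd of each other" criterion can be checked with a short sketch like the one below. It assumes the backbone traces are already superimposed and given as equal-length lists of (x, y, z) coordinates in Å; the function names are hypothetical, and no superposition step (for example a Kabsch fit) is performed.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-superimposed traces."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("traces must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def ensemble_within(ensemble, cutoff=1.0):
    """True if every pair of traces in the ensemble is within the cutoff."""
    return all(rmsd(a, b) <= cutoff
               for i, a in enumerate(ensemble)
               for b in ensemble[i + 1:])
```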

More non-native-like sequences are designed
Distribution of identity to the native parent sequence for 253 proteins. Identity to the native sequence was calculated for the set of sequences designed using only the fixed parent backbone as a target template (all residues: black dashed line; buried residues: grey dashed line) and for the set of sequences designed using a structural ensemble target (all residues: black solid line; buried residues: grey solid line). Using structural ensembles of 100 structural variants as target templates narrows and lowers the distribution of identity to the parent native sequence, indicating broader exploration of sequence space.
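
Identity to the native sequence, over all residues or over buried residues only, can be computed as in this minimal sketch. It assumes the designed and native sequences have the same length and need no alignment (plausible for sequences designed onto a fixed-length backbone, but an assumption here); `percent_identity` and its `positions` parameter are hypothetical names.

```python
def percent_identity(designed, native, positions=None):
    """Percent of positions where the designed residue equals the native one.

    positions: optional iterable of 0-based indices (e.g. buried residues);
    if omitted, all positions are compared.
    """
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    idx = list(range(len(native))) if positions is None else list(positions)
    if not idx:
        raise ValueError("no positions to compare")
    matches = sum(designed[i] == native[i] for i in idx)
    return 100.0 * matches / len(idx)
```

For example, `percent_identity("ACDE", "ACDF")` compares all four positions, while passing `positions=[0, 3]` restricts the comparison to a chosen subset such as the buried core.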

Overall sequence diversity is determined by the protein fold
Sequence entropy distributions of designed sequences, grouped by structure into folds. The six folds are identified by their PFAM families. The relatively tight clustering of sequence entropies within a fold and the separation of sequence entropy distributions for different folds suggest a) that the diversity of the designed sequence set for a structure is primarily determined by its overall fold and b) that the designability principle postulated from studies of simple models may hold in real proteins.
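
The slides do not say how sequence entropy was defined. One common choice, assumed here, is the Shannon entropy of the residue distribution at each position, averaged over positions; the function names below are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues observed at one position."""
    total = len(column)
    if total == 0:
        raise ValueError("empty column")
    counts = Counter(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_sequence_entropy(sequences):
    """Average per-position entropy over a set of equal-length sequences."""
    if not sequences:
        raise ValueError("no sequences given")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must have the same length")
    entropies = [column_entropy(col) for col in zip(*sequences)]
    return sum(entropies) / len(entropies)
```

Under this definition a fully conserved position contributes 0 bits, and a position with all 20 amino acids equally represented contributes log2(20), about 4.32 bits.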

Designed sequences identify structural homologues accurately
The E-value of the most significant hit from each of 264 “reverse BLAST” searches is plotted. Dark grey columns represent predictions that are true structural homologues; light grey columns represent false positives. Our novel “reverse BLAST searching” uses alignments of designed sequences as PSI-BLAST queries against a genome to identify structural templates for structure prediction of gene sequences. 251 of the 264 designed sequence alignments produced hits (against PDB as a test set) with E-values below 10. At a significance level of E<0.01, a commonly used threshold in comparative modeling, all hits were against true structural homologues, with 47% (124/264) coverage.
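
The coverage and accuracy figures quoted above follow from simple bookkeeping over the best hit of each search. The sketch below assumes each search is summarized as a pair (best E-value or None if there was no hit, whether that hit is a true structural homologue); this data layout and the name `summarize_hits` are assumptions, not the authors' pipeline.

```python
def summarize_hits(best_hits, threshold=0.01):
    """Return (coverage, precision) at an E-value threshold.

    coverage  = true homologues found below threshold / all searches
    precision = true homologues / all hits below threshold (None if no hits)
    """
    if not best_hits:
        raise ValueError("no searches given")
    significant = [is_true for e_value, is_true in best_hits
                   if e_value is not None and e_value < threshold]
    true_positives = sum(significant)
    coverage = true_positives / len(best_hits)
    precision = true_positives / len(significant) if significant else None
    return coverage, precision
```

With 124 true homologues below E<0.01 out of 264 searches, coverage is 124/264, which rounds to the 47% reported, and a precision of 1.0 corresponds to "all hits were against true structural homologues".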

“Reverse BLAST” identifies more templates for homology modeling
Light grey: the number of genes for which structural templates were identified by PSI-BLAST searching against the set of 264 structures in the test set. Dark grey: the number of novel genes for which structural templates were identified by “reverse BLAST” searching using 264 alignments of computationally designed sequences. Reverse BLAST searching identified at least one additional structural template for use in homology modeling (not identified by standard PSI-BLAST) for 41 of 49 genomes. In ten cases, the reverse BLAST method more than doubled the number of structural templates identified.
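
The per-genome comparison can be expressed as a set difference. In this sketch, each genome maps to two sets of gene identifiers: those given a template by standard PSI-BLAST and those given one by reverse BLAST. The data layout, the function names, and the reading of "more than doubled" as "more novel genes than PSI-BLAST genes" are all assumptions.

```python
def novel_genes(psi_blast_genes, reverse_blast_genes):
    """Genes with a template from reverse BLAST but not from PSI-BLAST."""
    return set(reverse_blast_genes) - set(psi_blast_genes)


def summarize_genomes(results):
    """results: {genome: (psi_blast_genes, reverse_blast_genes)}.

    Returns (genomes with at least one novel gene,
             genomes where novel genes outnumber PSI-BLAST genes).
    """
    with_new = 0
    more_than_doubled = 0
    for psi, rev in results.values():
        novel = novel_genes(psi, rev)
        if novel:
            with_new += 1
        if psi and len(novel) > len(psi):
            more_than_doubled += 1
    return with_new, more_than_doubled
```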

Conclusions
- The task of large-scale protein sequence design has been efficiently parallelized on a massive scale.
- Design to a structural ensemble greatly increases the diversity of sequences generated, without loss of sequence quality.
- Similar structures produce sequence sets of similar diversity, and the distributions of sequence entropies for different folds segregate, supporting the designability postulate seen in simple models.
- “Reverse BLAST searching” uses designed sequences to accurately identify structural homologues.
- Reverse BLAST searching allows increased identification of structural templates for homology modeling without the need for natural sequence homologues.

Future Directions
- Use sequence profiles for specific proteins to generate biased combinatorial libraries for protein synthesis. This will experimentally test the ability of the design algorithm to produce viable sequences.
- Introduce functional constraints into the design process to produce new sequences which are both stable and functional.
- Refine methods for generating high sequence diversity for a given structure, allowing more extensive sampling of sequence space.
- Use computational design to redesign peptide ligands for applications in drug discovery and understanding protein-protein interactions.