Comparative Protein Modeling
Jason Wiscarson, Lloyd Spaine


Introduction

Comparative modeling, also called homology modeling, is a computational tool used to predict the three-dimensional (3-D) structure of proteins whose structures are unknown. If the target sequence shares sequence similarity with proteins of known 3-D structure, those proteins may serve as templates to predict the unknown structure. The term “homology” refers to an evolutionary relationship between two or more proteins that share a common ancestor in an evolutionary tree, regardless of their sequence similarity. Proteins from similar families often have similar functions, yet there are many instances in which proteins have similar structures but different functions. The process used to construct 3-D models of proteins, shown in Figure 1, is therefore paramount.

References

[1] Esposito, E. X.; Tobi, D.; Madura, J. D. “Comparative Protein Modeling”, Reviews in Computational Chemistry, Volume 22, 2006, Wiley-VCH, John Wiley & Sons, Inc. (to be published).
[2] Ramachandran plot and alanine structure.

Finding related sequences and structures

In comparative protein modeling, several databases are used to find genomic, amino acid, and protein data. The Expert Protein Analysis System (ExPASy) is the starting point when searching for proteins and their related sequences. Swiss-Prot contains data that have been refined by removing unnecessary information, while TrEMBL receives and stores the initial genomics data. PROSITE describes protein families and domains through biologically significant patterns of key amino acid residues. ENZYME retrieves an enzyme’s recommended name, alternative names, catalytic activity, cofactors, associated human genetic diseases, and cross-references. SWISS-MODEL holds comparative models of proteins that do not have an experimentally determined 3-D structure. The Basic Local Alignment Search Tool (BLAST) takes a protein sequence of interest as a query and locates similar protein sequences by means of sequence alignments.
The Protein Data Bank (PDB) is a repository for experimentally determined 3-D protein structures.

The Sequence Alignment and Modeling System with Hidden Markov Models (SAM-T02) aligns the target sequence to all templates in the following steps:
1. Find sequences similar to the target sequence.
2. Predict the secondary structure.
3. Find probable templates for threading.
4. Align the target with the templates.
5. Construct a fragment library for the target.
6. Build a 3-D model of the target.

Threading, which relates different proteins that have similar structures, proceeds as follows:
1. Create pseudo-protein models based on solved proteins.
2. Calculate an energy value for each pseudo-protein model.
3. Rank the alignments based on that energy value.

Constructing Protein Models

Satisfaction of Spatial Restraints (SSR) constructs a 3-D protein model using spatial restraints based on distances, bond angles, dihedral angles, dihedral pairs, etc.

Segment Match Modeling (SMM) constructs a protein model by:
1. Choosing a protein template.
2. Building a list of possible template matches.
3. Sorting the templates by best fit to the target’s structure.
4. Using probabilities to select the “best segment” from a low pseudo-energy subset group.
5. Transferring the coordinates of the best segment from its template protein.

The Multiple Template Method (MTM) uses several solved X-ray structures to build the protein model of the target sequence.

3D-JIGSAW creates a homology model in four steps:
1. Select and align templates, based on sequence.
2. Select template segments.
3. Create the backbone (framework, scaffold).
4. Add side chains, then refine and evaluate the target protein model.

Protein Model Refinement

Side-Chains with Rotamer Library (SCWRL) determines the most likely side-chain conformations by:
1. Reading the initial structure and determining possible low-energy side-chain conformations (rotamers).
2. Defining disulfide bridges and performing dead-end elimination to discard rotamers that cannot belong to the lowest-energy solution.
3. Constructing a residue graph, determining the rotamer clusters, and outputting the final structure.

Molecular Mechanics (MM) is a method that removes repulsive contacts between side chains by allowing the side chains to relax to low-energy rotamers.

A Molecular Dynamics (MD) simulation involves:
1. Warm-up, equilibration, and cool-down.
2. Sampling the trajectory during a “production” run and analyzing the results.

Molecular Dynamics with Simulated Annealing (MD-SA) is an optimization method that heats a system, samples many energy states, and then slowly cools the system to ensure that the low-energy structures are found.

Sequence Alignment

Sequence alignment matches the amino acid residues of the target protein to those of related proteins on the basis of evolutionary history. The types of alignment are:
a) Global alignment, which aligns the sequences over their entire length, including regions that lack similarity.
b) Local alignment, which first finds the regions with significant similarity and then extends the alignment around these optimally aligned residues.

To prepare sequences, the Sequence to Coordinates (S2C) database is used to examine differences between a protein’s sequence and its deposited coordinates, such as those that originate from mutagenesis studies. Alignment programs differ in the methods they use, but they all score or evaluate the final alignment using gap penalties, similarity matrices, and alignment scores.

Similarity matrices describe the probability of a specific amino acid residue mutating to a different residue type. Common similarity matrices include:
1. Point Accepted Mutation (PAM) matrices, expressed per 100 amino acid residues, which are based on the probability of an amino acid residue mutating to another amino acid residue.
2. BLOcks SUbstitution Matrix (BLOSUM) matrices, which are similar to PAM but are derived from a more diverse set of sequences.
3. Gonnet similarity matrices, which are built from a sequence database whose entries were indexed and reorganized using a tree on a small cluster of computers.

Clustal is an alignment program that aligns large sequences of varying similarity quickly. Sequences are progressively aligned based on the branching order in the phylogenetic tree.
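The scoring ingredients named above (a similarity matrix, a gap penalty, and an alignment score) and the global/local distinction can be illustrated with a short dynamic-programming sketch. This is not the method of any particular program mentioned here: `toy_similarity` is a hypothetical stand-in for a PAM or BLOSUM lookup, the linear gap penalty of -2 is an arbitrary choice, and the global and local variants follow the textbook Needleman-Wunsch and Smith-Waterman recurrences.

```python
def toy_similarity(a: str, b: str) -> int:
    """Hypothetical stand-in for a PAM/BLOSUM lookup: +2 if identical, -1 otherwise."""
    return 2 if a == b else -1


def alignment_score(seq1, seq2, gap=-2, local=False, similarity=toy_similarity):
    """Best alignment score of two sequences by dynamic programming.

    local=False: global alignment (Needleman-Wunsch); both sequences are
                 aligned end to end, so unmatched ends pay gap penalties.
    local=True:  local alignment (Smith-Waterman); only the best-scoring
                 region counts, because cell scores are floored at zero.
    """
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised along the borders.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            cell = max(
                score[i - 1][j - 1] + similarity(seq1[i - 1], seq2[j - 1]),  # align pair
                score[i - 1][j] + gap,  # gap in seq2
                score[i][j - 1] + gap,  # gap in seq1
            )
            if local:
                cell = max(cell, 0)     # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]
```

With these toy parameters, aligning `"XXHEAGYY"` against `"HEAG"` gives a local score that reflects only the shared `HEAG` region, whereas the global score is pulled down by the four gaps needed to span the flanking residues. Swapping in a real similarity matrix only requires passing a different `similarity` function.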
Tree-Based Consistency Objective Function for Alignment Evaluation (T-Coffee) is a method designed to remedy the main weakness of progressive (heuristic) alignment methods. Progressive alignment suffers from greediness: errors made in the first alignments, such as the addition or extension of a gap, cannot be corrected as other sequences are added to the alignment.

The Divide-and-Conquer Alignment (DCA) method aligns sequences simultaneously, using the simultaneous multiple sequence alignment (MSA) methodology.

Selecting Templates and Improving Alignments

The first step is to improve the alignment and select the template. This is where the sequence of interest (the target) and the other sequences and structures (the templates) are aligned. Afterwards, the best templates are chosen based on evolutionary distance as determined by a phylogenetic tree.

Selecting Templates: a template structure for a protein model is selected by considering the R-factor (residual index), a value that indicates how well the refined structure agrees with the experimental X-ray data (electron density maps).

Improving Sequence Alignment With Primary and Secondary Structure Analysis: this analysis is used to reveal regions rich in proline, glutamic acid, serine, and threonine (PEST regions); locate sequence repeats; predict the percentage of buried versus accessible residues; and provide information about the protein’s isoelectric point.

Pattern and Motif-Based Secondary Structure Prediction (amino acid sequence → 3-D structure): well-known pattern and motif-based secondary structure prediction methods include PSIPRED, GenTHREADER, PREDATOR, PROF, MEMSAT, and PHD.

Evaluating Protein Models

Several methods exist to check for imperfections in the models, including:

PROCHECK performs statistical checks and indicates regions of a protein structure that might require modification because of nonoptimal stereochemistry.

Verify3D scores 3-D models against a probability table, assessing the probability that each amino acid residue would occupy its specific position in the 3-D structure.

ERRAT examines the nonbonded distances of C-C, C-N, C-O, N-N, N-O, and O-O atom pairs.
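As a rough illustration of the bookkeeping behind a check on nonbonded atom pairs, the sketch below counts how many C-C, C-N, C-O, N-N, N-O, and O-O pairs lie within a distance cutoff. It is not the ERRAT statistic itself: the function name, the input layout, the 3.5 Å cutoff, and the rule of skipping atoms in the same residue as a crude filter for bonded neighbours are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations


def nonbonded_pair_counts(atoms, cutoff=3.5, min_residue_separation=1):
    """Count close atom pairs by element type.

    atoms: list of (element, residue_index, (x, y, z)) tuples, with
           coordinates in angstroms (hypothetical input layout).
    cutoff: assumed distance threshold in angstroms.
    min_residue_separation: pairs closer than this in residue index are
           skipped, a crude way to ignore covalently bonded neighbours.
    """
    counts = Counter()
    for (e1, r1, xyz1), (e2, r2, xyz2) in combinations(atoms, 2):
        if abs(r1 - r2) < min_residue_separation:
            continue  # same residue: treated as bonded, not counted
        if math.dist(xyz1, xyz2) <= cutoff:
            # Sort the element symbols so N-C and C-N share one label.
            counts["-".join(sorted((e1, e2)))] += 1
    return counts
```

A model whose pair counts differ strongly from those seen in well-refined structures would then be flagged for closer inspection; the comparison against reference statistics is the part this sketch leaves out.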
Protein Structure Analysis (ProSa) uses a potential of mean force, which is the change in the free energy of a system caused by the variation of a specific coordinate, to locate the regions of the protein structure that may contain improper or unsuitable geometries.

Protein Volume Evaluation (PROVE) uses the computed volumes of individual atoms as a means of evaluating the viability of a protein model.

Model Clustering Analysis uses NMRCLUST, NMRCORE, and OLDERADO, which are programs that aid in the superposition and clustering of protein structures.

Figure 1. Flow chart showing the construction of comparative protein models. The solid lines represent comparative modeling steps, and the dotted lines represent parameters (template, alignment, construction environment, or refinement method) that can improve the quality of the protein model. The steps in the chart are:
1. Find known sequences and 3-D structures related to the target protein.
2. Align the target and template amino acid residues.
3. Select templates and adjust/improve the alignments.
4. Construct model.
5. Refine model.
6. Evaluate model.
7. Final model.

Figure 2. Peptide bonds create rigid planes, which rotate about the phi and psi dihedral angles.

Figure 3. A Ramachandran plot for the tripeptide in Figure 2.
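The phi and psi angles of Figures 2 and 3 are dihedral angles defined by four consecutive backbone atoms: phi by C(i-1), N(i), CA(i), C(i), and psi by N(i), CA(i), C(i), N(i+1). The sketch below shows one common way to compute them from Cartesian coordinates, which yields the points plotted in a Ramachandran plot. The function names and the plain-tuple coordinate format are assumptions made for this example, not part of any program mentioned above.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond (0 = cis, 180 = trans)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone (phi, psi) of one residue from five backbone atom positions."""
    phi = dihedral(c_prev, n, ca, c)
    psi = dihedral(n, ca, c, n_next)
    return phi, psi
```

Calling `phi_psi` for every residue that has both neighbours and plotting phi against psi gives the scatter that a stereochemistry check such as PROCHECK compares with the allowed regions.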