Computational Biology, Part 10 Protein Structure Prediction and Display Robert F. Murphy Copyright  1996, 1999, 2001. All rights reserved.

Slides:



Advertisements
Similar presentations
Protein Structure.
Advertisements

Secondary structure prediction from amino acid sequence.
Protein – Protein Interactions Lisa Chargualaf Simon Kanaan Keefe Roedersheimer Others: Dr. Izaguirre, Dr. Chen, Dr. Wuchty, ChengBang Huang.
Protein Structure Prediction
Prediction to Protein Structure Fall 2005 CSC 487/687 Computing for Bioinformatics.
1 Protein Structure, Structure Classification and Prediction Bioinformatics X3 January 2005 P. Johansson, D. Madsen Dept.of Cell & Molecular Biology, Uppsala.
Structure Prediction. Tertiary protein structure: protein folding Three main approaches: [1] experimental determination (X-ray crystallography, NMR) [2]
Tertiary protein structure viewing and prediction July 1, 2009 Learning objectives- Learn how to manipulate protein structures with Deep View software.
An Introduction to Bioinformatics Protein Structure Prediction.
Garnier-Osguthorpe-Robson
Structure Prediction. Tertiary protein structure: protein folding Three main approaches: [1] experimental determination (X-ray crystallography, NMR) [2]
Tertiary protein structure viewing and prediction July 5, 2006 Learning objectives- Learn how to manipulate protein structures with Deep View software.
The Protein Data Bank (PDB)
CISC667, F05, Lec20, Liao1 CISC 467/667 Intro to Bioinformatics (Fall 2005) Protein Structure Prediction Protein Secondary Structure.
Structure Prediction in 1D
Protein structure prediction May 30, 2002 Quiz#4 on June 4 Learning objectives-Understand difference between primary secondary and tertiary structure.
Protein structure determination & prediction. Tertiary protein structure: protein folding Three main approaches: [1] experimental determination (X-ray.
Computing for Bioinformatics Lecture 8: protein folding.
Retrieving and Viewing Protein Structures from the Protein Data Base 7.88J Protein Folding Prof. David Gossard Room 3-336, x September.
Protein Structure Analysis - I
1 Computational Biology, Part 13 Retrieving and Displaying Macromolecular Structures Robert F. Murphy Copyright  1996, 1999, All rights reserved.
1 Computational Biology, Part 11 Retrieving and Displaying Macromolecular Structures Robert F. Murphy Copyright  1996, 1999, All rights reserved.
Detecting the Domain Structure of Proteins from Sequence Information Niranjan Nagarajan and Golan Yona Department of Computer Science Cornell University.
Protein Structures.
Comparing protein structure and sequence similarities Sumi Singh Sp 2015.
Using structure alignment tools. Structure alignment View a structural alignment of the P53 1T4F protein with Catalytic And Tetramerization Domains From.
Artificial Neural Networks for Secondary Structure Prediction CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (slides by J. Burg)
Protein structure prediction
Protein Tertiary Structure Prediction
Protein domains. Protein domains are structural units (average 160 aa) that share: Function Folding Evolution Proteins normally are multidomain (average.
Tutorial 1: Getting Started with Adobe Dreamweaver CS4.
Prediction to Protein Structure Fall 2005 CSC 487/687 Computing for Bioinformatics.
SMART Teams: Students Modeling A Research Topic Jmol Training 101!
1 P9 Extra Discussion Slides. Sequence-Structure-Function Relationships Proteins of similar sequences fold into similar structures and perform similar.
© Wiley Publishing All Rights Reserved. Protein 3D Structures.
Neural Networks for Protein Structure Prediction Brown, JMB 1999 CS 466 Saurabh Sinha.
Molecular visualization
Secondary structure prediction
1 Enter the following Micro-RNA sequence into the box Run MFold and look at the results MFold Using MFold to predict RNA secondary structure
Protein Classification II CISC889: Bioinformatics Gang Situ 04/11/2002 Parts of this lecture borrowed from lecture given by Dr. Altman.
Protein structure prediction May 26, 2011 HW #8 due today Quiz #3 on Tuesday, May 31 Learning objectives-Understand the biochemical basis of secondary.
Protein Structure & Modeling Biology 224 Instructor: Tom Peavy Nov 18 & 23, 2009
Protein secondary structure Prediction Why 2 nd Structure prediction? The problem Seq: RPLQGLVLDTQLYGFPGAFDDWERFMRE Pred:CCCCCHHHHHCCCCEEEECCHHHHHHCC.
Protein Secondary Structure Prediction G P S Raghava.
1 Protein Structure Prediction (Lecture for CS397-CXZ Algorithms in Bioinformatics) April 23, 2004 ChengXiang Zhai Department of Computer Science University.
Module 3 Protein Structure Database/Structure Analysis Learning objectives Understand how information is stored in PDB Learn how to read a PDB flat file.
Protein Structure Prediction ● Why ? ● Type of protein structure predictions – Sec Str. Pred – Homology Modelling – Fold Recognition – Ab Initio ● Secondary.
1 Improve Protein Disorder Prediction Using Homology Instructor: Dr. Slobodan Vucetic Student: Kang Peng.
EBI is an Outstation of the European Molecular Biology Laboratory. Sanchayita Sen, Ph.D. PDB Depositions Validation & Structure Quality.
Sequence Based Analysis Tutorial March 26, 2004 NIH Proteomics Workshop Lai-Su L. Yeh, Ph.D. Protein Science Team Lead Protein Information Resource at.
Primary vs. Secondary Databases Primary databases are repositories of “raw” data. These are also referred to as archival databases. -This is one of the.
Protein Structure and Bioinformatics. Chapter 2 What is protein structure? What are proteins made of? What forces determines protein structure? What is.
Structural classification of Proteins SCOP Classification: consists of a database Family Evolutionarily related with a significant sequence identity Superfamily.
Machine Learning Methods of Protein Secondary Structure Prediction Presented by Chao Wang.
HANDS-ON ConSurf! Web-Server: The ConSurf webserver.
Protein Tertiary Structure Prediction Structural Bioinformatics.
Summer Bioinformatics Workshop 2008 BLAST Chi-Cheng Lin, Ph.D., Professor Department of Computer Science Winona State University – Rochester Center
Protein motif /domain Structural unit Functional unit Signature of protein family How are they defined?
Proteins Structure Predictions Structural Bioinformatics.
Protein Sequence, Structure, and Function Lab Gustavo Caetano - Anolles Protein Sequence, Structure, and Function Lab v1 | Gustavo Caetano - Anolles 1.
Structural organization of proteins
Sequence: PFAM Used example: Database of protein domain families. It is based on manually curated alignments.
Protein domains Miguel Andrade Mainz, Germany Faculty of Biology,
Introduction to Bioinformatics II
Computational Analysis
Protein Structure Prediction
Sequence Based Analysis Tutorial
Protein Structures.
Protein structure prediction.
Lesson 3 Bioinformatics Laboratory
Presentation transcript:

Computational Biology, Part 10 Protein Structure Prediction and Display Robert F. Murphy Copyright  1996, 1999, All rights reserved.

Goal Take primary structure (sequence) and, using rules derived from known structures, predict the secondary structure that is most likely to be adopted by each residue Take primary structure (sequence) and, using rules derived from known structures, predict the secondary structure that is most likely to be adopted by each residue

Structural Propensities Due to the size, shape and charge of its side chain, each amino acid may “fit” better in one type of secondary structure than another Due to the size, shape and charge of its side chain, each amino acid may “fit” better in one type of secondary structure than another Classic example: The rigidity and side chain angle of proline cannot be accomodated in an  -helical structure Classic example: The rigidity and side chain angle of proline cannot be accomodated in an  -helical structure

Structural Propensities Two ways to view the significance of this preference (or propensity) Two ways to view the significance of this preference (or propensity)  It may control or affect the folding of the protein in its immediate vicinity (amino acid determines structure)  It may constitute selective pressure to use particular amino acids in regions that must have a particular structure (structure determines amino acid)

Secondary structure prediction In either case, amino acid propensities should be useful for predicting secondary structure In either case, amino acid propensities should be useful for predicting secondary structure Two classical methods that use previously determined propensities: Two classical methods that use previously determined propensities:  Chou-Fasman  Garnier-Osguthorpe-Robson

Chou-Fasman method Uses table of conformational parameters (propensities) determined primarily from measurements of secondary structure by CD spectroscopy Uses table of conformational parameters (propensities) determined primarily from measurements of secondary structure by CD spectroscopy Table consists of one “likelihood” for each structure for each amino acid Table consists of one “likelihood” for each structure for each amino acid

Chou-Fasman propensities (partial table)

Chou-Fasman method A prediction is made for each type of structure for each amino acid A prediction is made for each type of structure for each amino acid  Can result in ambiguity if a region has high propensities for both helix and sheet (higher value usually chosen, with exceptions)

Chou-Fasman method Calculation rules are somewhat ad hoc Calculation rules are somewhat ad hoc Example: Method for helix Example: Method for helix  Search for nucleating region where 4 out of 6 a.a. have P  > 1.03  Extend until 4 consecutive a.a. have an average P  < 1.00  If region is at least 6 a.a. long, has an average P  > 1.03, and average P  > average P   consider region to be helix

Garnier-Osguthorpe-Robson Uses table of propensities calculated primarily from structures determined by X- ray crystallography Uses table of propensities calculated primarily from structures determined by X- ray crystallography Table consists of one “likelihood” for each structure for each amino acid for each position in a 17 amino acid window Table consists of one “likelihood” for each structure for each amino acid for each position in a 17 amino acid window

Garnier-Osguthorpe-Robson Analogous to searching for “features” with a 17 amino acid wide frequency matrix Analogous to searching for “features” with a 17 amino acid wide frequency matrix One matrix for each “feature” One matrix for each “feature”   -helix   -sheet  turn  coil Highest scoring “feature” is found at each location Highest scoring “feature” is found at each location

Accuracy of predictions Both methods are only about 55-65% accurate Both methods are only about 55-65% accurate A major reason is that while they consider the local context of each sequence element, they do not consider the global context of the sequence - the type of protein A major reason is that while they consider the local context of each sequence element, they do not consider the global context of the sequence - the type of protein  The same amino acids may adopt a different configuration in a cytoplasmic protein than in a membrane protein

“Adaptive” methods Neural network methods - train network using sets of known proteins then use to predict for query sequence Neural network methods - train network using sets of known proteins then use to predict for query sequence  nnpredict Homology-based methods - predict structure using rules derived only from proteins homologous to query sequence Homology-based methods - predict structure using rules derived only from proteins homologous to query sequence  SOPM  PHD

Neural Network methods A neural network with multiple layers is presented with known sequences and structures - network is trained until it can predict those structures given those sequences A neural network with multiple layers is presented with known sequences and structures - network is trained until it can predict those structures given those sequences Allows network to adapt as needed (it can consider neighboring residues like GOR) Allows network to adapt as needed (it can consider neighboring residues like GOR)

Neural Network methods Different networks can be created for different types of proteins Different networks can be created for different types of proteins

Homology-based modeling Principle: From the sequences of proteins whose structures are known, choose a subset that is similar to the query sequence Principle: From the sequences of proteins whose structures are known, choose a subset that is similar to the query sequence Develop rules (e.g., train a network) for just this subset Develop rules (e.g., train a network) for just this subset Use these rules to make prediction for the query sequence Use these rules to make prediction for the query sequence

Retrieving 3D structures Protein Data Bank (PDB) Protein Data Bank (PDB)  using web browser  home page =  using anonymous FTP Entrez Entrez  using web browser BLAST BLAST  using web browser

Displaying Structures with RasMol The GIF image of Ribonuclease A is static - we cannot rotate the molecule or recolor portions of it to aid visualization The GIF image of Ribonuclease A is static - we cannot rotate the molecule or recolor portions of it to aid visualization For this we can use RasMol, a public domain program available for wide range of computers, including MacOS, Windows and Unix For this we can use RasMol, a public domain program available for wide range of computers, including MacOS, Windows and Unix

Displaying Structures with RasMol Drs. David Hackney and Will McClure have developed an online tutorial for RasMol - a link may be found on the , and web pages Drs. David Hackney and Will McClure have developed an online tutorial for RasMol - a link may be found on the , and web pages

PDB files In order to optimally display, rotate and color the 3D structure, we need to download a copy of the coordinates for each atom in the molecule to our local computer In order to optimally display, rotate and color the 3D structure, we need to download a copy of the coordinates for each atom in the molecule to our local computer The most common format for storage and exchange of atomic coordinates for biological molecules is PDB file format The most common format for storage and exchange of atomic coordinates for biological molecules is PDB file format

PDB files PDB file format is a text (ASCII) format, with an extensive header that can be read and interpreted either by programs or by people PDB file format is a text (ASCII) format, with an extensive header that can be read and interpreted either by programs or by people We can request either the header only or the entire file; the next screen requests the header only We can request either the header only or the entire file; the next screen requests the header only

RasMol has a graphics window and a command window

PDB Retrieval & Display Can download PDB files from Entrez Can download PDB files from Entrez Second example: Display structures of MHC proteins containing  2 -microglobulin Second example: Display structures of MHC proteins containing  2 -microglobulin

Useful RasMol commands show sequence lists all amino acids in each chain show sequence lists all amino acids in each chain select *a selects all residues in chain A select *a selects all residues in chain A colour red displays the selected residues in red colour red displays the selected residues in red

Alternatives to RasMol NCBI (providers of Entrez service) have developed a public domain 3D viewer for molecules, Cn3D (“See in 3D”) NCBI (providers of Entrez service) have developed a public domain 3D viewer for molecules, Cn3D (“See in 3D”) Integrated into Network Entrez Client Integrated into Network Entrez Client Available as a stand-alone helper application Available as a stand-alone helper application

Alternatives to RasMol It is often useful for an investigator or teacher to be able to save a series of views of one or more molecules so that they can be replayed again (creating a script for a “movie” with preprogrammed changes in rotation, color, etc.) It is often useful for an investigator or teacher to be able to save a series of views of one or more molecules so that they can be replayed again (creating a script for a “movie” with preprogrammed changes in rotation, color, etc.) Two programs that do this are CHIME and MAGE Two programs that do this are CHIME and MAGE

Alternatives to RasMol CHIME (derived from RasMol source) is available as a Browser Plugin CHIME (derived from RasMol source) is available as a Browser Plugin MAGE is available as a stand-alone helper application MAGE is available as a stand-alone helper application Information on both is available through links on a HELP page at the PDB Information on both is available through links on a HELP page at the PDB

Structural homology It is useful for new proteins whose 3D structure is not known to be able to find proteins whose 3D structure is known that are expected to have a similar structure to the unknown It is useful for new proteins whose 3D structure is not known to be able to find proteins whose 3D structure is known that are expected to have a similar structure to the unknown It is also useful for proteins whose 3D structure is known to be able to find other proteins with similar structures It is also useful for proteins whose 3D structure is known to be able to find other proteins with similar structures

Finding proteins with known structures based on sequence homology If you want to find known 3D structures of proteins that are similar in primary amino acid sequence to a particular sequence, can use BLAST web page and choose the PDB database If you want to find known 3D structures of proteins that are similar in primary amino acid sequence to a particular sequence, can use BLAST web page and choose the PDB database This is not the PDB database of structures, rather a database of amino acid sequences for those proteins in the structure database This is not the PDB database of structures, rather a database of amino acid sequences for those proteins in the structure database Links are available to retrieve PDB files Links are available to retrieve PDB files

Finding proteins with similar structures to a known protein For literature and sequence databases, Entrez allows neighbors to be found for a selected entry based on “homology” in terms (MEDline database) or sequence (protein and nucleic acid sequence databases) For literature and sequence databases, Entrez allows neighbors to be found for a selected entry based on “homology” in terms (MEDline database) or sequence (protein and nucleic acid sequence databases) An experimental feature allows neighbors to be chosen for entries in the structure database An experimental feature allows neighbors to be chosen for entries in the structure database

Finding proteins with similar structures to a known protein Proteins with similar structures are termed “VAST Neighbors” by Entrez (VAST refers to the method used to evaluate similarity of structure) Proteins with similar structures are termed “VAST Neighbors” by Entrez (VAST refers to the method used to evaluate similarity of structure) VAST or structure neighbors may or may not have sequence homology to each other VAST or structure neighbors may or may not have sequence homology to each other