Protein Tertiary Structure Prediction Structural Bioinformatics.

The Different Levels of Protein Structure Primary: amino acid linear sequence. Secondary: α-helices, β-sheets and loops. Tertiary: the 3D shape of the fully folded polypeptide chain.

PDB: Protein Data Bank Database of molecular structures: proteins and nucleic acids (DNA and RNA). Structures solved by X-ray crystallography, NMR, and electron microscopy.

RCSB PDB – Protein Data Bank

How can we view the protein structure? Download the coordinates of the structure from the PDB. Launch a 3D viewer program; for example, we will use the program PyMOL, which can be downloaded freely from the PyMOL homepage. Load the coordinates into the viewer.
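The downloaded coordinates can also be read directly, without a viewer. The following is a minimal sketch, assuming the standard fixed-column layout of PDB ATOM records; the function names `format_atom_line` and `read_ca_coordinates` are hypothetical, and the sample atoms are made up for illustration (they are not taken from 1aqb).

```python
def format_atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build one ATOM record in the PDB fixed-column layout (illustrative data only)."""
    # Columns: 1-6 record, 7-11 serial, 13-16 atom name, 18-20 residue name,
    # 22 chain, 23-26 residue number, 31-38 x, 39-46 y, 47-54 z.
    return "ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}".format(
        serial, name, res_name, chain, res_seq, x, y, z
    )


def read_ca_coordinates(pdb_lines):
    """Return (residue name, residue number, (x, y, z)) for every C-alpha atom."""
    ca_atoms = []
    for line in pdb_lines:
        # Only ATOM records are considered; HETATM records (ligands, ions) are skipped.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_name = line[17:20]
            res_seq = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            ca_atoms.append((res_name, res_seq, xyz))
    return ca_atoms


# Usage with two made-up atoms of one residue.
example_lines = [
    format_atom_line(1, " N  ", "GLU", "A", 1, 10.0, 20.0, 30.0),
    format_atom_line(2, " CA ", "GLU", "A", 1, 11.5, 21.25, 29.0),
]
for atom in read_ca_coordinates(example_lines):
    print(atom)
```

For a real file, the lines would come from `open("1aqb.pdb")`, assuming the file was saved under that name.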

PyMOL example: Launch PyMOL. Open file “1aqb” (PDB coordinate file). Display sequence. Hide everything. Show main chain / hide main chain. Show cartoon. Color by ss (secondary structure). Color red. Color green, resi 1:40. Help.

Predicting 3D Structure: an outstanding difficult problem –Comparative modeling (homology): based on sequence homology –Fold recognition (threading): based on structural homology

Comparative Modeling Similar sequences suggest similar structure

Sequence and structure alignments of two Retinol Binding Proteins

Structure Alignments The outputs of a structural alignment are a superposition of the atomic coordinates and a minimal Root Mean Square Deviation (RMSD) between the structures. The RMSD of two aligned structures indicates their divergence from one another: low RMSD values mean similar structures. There are many different algorithms for structural alignment.
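To make the RMSD concrete, here is a minimal sketch that computes it for two equally long lists of paired atoms. It assumes the structures have already been superposed; finding the optimal superposition is the job of the alignment algorithm and is not shown. The function name `rmsd` is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between paired (x, y, z) coordinates of two superposed structures."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    # Square root of the mean squared distance over all atom pairs.
    return math.sqrt(squared_sum / len(coords_a))


# Usage: the second structure is the first one shifted by 1 unit along x.
structure_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)]
structure_b = [(x + 1.0, y, z) for x, y, z in structure_a]
print(rmsd(structure_a, structure_b))
```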

Dali (Distance mAtrix aLIgnment) DALI offers pairwise alignments of protein structures. The algorithm uses the three-dimensional coordinates of each protein to calculate distance matrices comparing residues. See Holm L and Sander C (1993) J. Mol. Biol. 233. Also: SALIGN
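A residue-residue distance matrix of the kind described here can be built from one protein's coordinates as in the sketch below. This only constructs the matrix for a single structure; the comparison of two such matrices, which is the actual DALI algorithm, is not implemented. The function name `distance_matrix` and the toy coordinates are hypothetical.

```python
import math


def distance_matrix(coords):
    """Pairwise distances between residues, given one (x, y, z) point per residue."""
    n = len(coords)
    # math.dist (Python 3.8+) returns the Euclidean distance between two points.
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


# Usage with three made-up C-alpha positions.
toy_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
for row in distance_matrix(toy_coords):
    print(row)
```

Because the matrix depends only on internal distances, it does not change when the whole structure is rotated or translated, which is why two proteins can be compared this way without superposing them first.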

Comparative Modeling Builds a protein structure model based on its alignment to one or more related protein structures in the database. Similar sequence suggests similar structure.

Comparative Modeling Accuracy of the comparative model is related to the sequence identity on which it is based: >50% sequence identity = high accuracy; 30%-50% sequence identity = 90% modeled; <30% sequence identity = low accuracy (many errors)
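The identity ranges above can be applied to a target-template alignment as in the following sketch. It assumes identity is counted over aligned columns where neither sequence has a gap (other definitions exist), and the function names and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def expected_accuracy(identity):
    """Map sequence identity (percent) to the accuracy classes listed on the slide."""
    if identity > 50:
        return "high accuracy"
    if identity >= 30:
        return "90% modeled"
    return "low accuracy (many errors)"


# Usage with a made-up alignment of target and template.
identity = percent_identity("ACDE-F", "ACDQGF")
print(identity, expected_accuracy(identity))
```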

Homology Threshold for Different Alignment Lengths [Plot: homology threshold (t) versus alignment length (L)] A sequence alignment between two proteins is considered to imply structural homology if the sequence identity is equal to or above the homology threshold t in a sequence region of a given length L. The threshold values t(L) are derived from PDB.

Comparative Modeling Similarity particularly high in core –Alpha helices and beta sheets preserved –Even near-identical sequences vary in loops

Comparative Modeling Methods: MODELLER (Sali, Rockefeller/UCSF), SCWRL (Dunbrack, UCSF), SWISS-MODEL

Comparative Modeling Modeling of a sequence based on known structures. Consists of four major steps: 1. Finding a known structure(s) related to the sequence to be modeled (template), using sequence comparison methods such as PSI-BLAST 2. Aligning the sequence with the templates 3. Building a model 4. Assessing the model
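Step 2, aligning the target sequence with a template, can be illustrated with a basic global alignment by dynamic programming (Needleman-Wunsch). This is only a sketch under simple assumptions: a flat match/mismatch score and a linear gap penalty, with values chosen for illustration. Real modeling pipelines typically use substitution matrices and profile-based alignments instead. The function name `global_align` is hypothetical.

```python
def global_align(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal (pair), up (gap in b), left (gap in a).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            aligned_a.append(seq_a[i - 1])
            aligned_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(seq_a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


# Usage with two short made-up sequences.
print(global_align("ACDEF", "ACEF"))
```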

Fold Recognition

Protein Folds: sequential and spatial arrangement of secondary structures. Examples: Hemoglobin, TIM

Similar folds usually mean similar function Homeodomain Transcription factors

The same fold can have multiple functions Rossmann TIM barrel 12 functions 31 functions

Fold Recognition Methods of protein fold recognition attempt to detect similarities between protein 3D structures that have no significant sequence similarity. They search for folds that are compatible with a particular sequence. These methods "turn the protein folding problem on its head": rather than predicting how a sequence will fold, they predict how well a fold will fit a sequence.

Basic steps in Fold Recognition: Compare the sequence against a library of all known protein folds (finite number). Query sequence: MTYGFRIPLNCERWGHKLSTVILKRP... Goal: find the folding template that the sequence fits best. There are different ways to evaluate sequence-structure fit.

[Figure: query sequence MAHFPGFGQSLLFGYPVYVFGD... compared against potential folds 1) ... 56) ... n)] There are different ways to evaluate sequence-structure fit.
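One simple way to picture a sequence-structure fit is a toy environment score: each template position is labeled buried or exposed, and a residue scores well when hydrophobic residues sit in buried positions and polar residues in exposed ones. The sketch below is an illustration of the idea only and is not the scoring function of any fold recognition program. The hydrophobic residue set, the "B"/"E" labels, the template names, and the function names are all assumptions made for this example.

```python
# Assumed hydrophobic residue set for this toy example.
HYDROPHOBIC = set("AVLIMFWC")


def fit_score(sequence, environments):
    """Score a gapless fit of a sequence onto a template of 'B' (buried) / 'E' (exposed)."""
    if len(sequence) != len(environments):
        raise ValueError("sequence and template must have the same length")
    score = 0
    for residue, env in zip(sequence, environments):
        is_hydrophobic = residue in HYDROPHOBIC
        is_buried = env == "B"
        # +1 when residue type and environment agree, -1 when they clash.
        score += 1 if is_hydrophobic == is_buried else -1
    return score


def best_fold(sequence, fold_library):
    """Return the name of the template in the library with the highest fit score."""
    return max(fold_library, key=lambda name: fit_score(sequence, fold_library[name]))


# Usage with a made-up query and a made-up two-template library.
library = {"fold_a": "BEBE", "fold_b": "EBEB"}
print(best_fold("LKVE", library))
```

Real threading methods also allow gaps and use statistically derived potentials, so this sketch only conveys the direction of the comparison: many folds are scored against one sequence.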

Programs for fold recognition: TOPITS (Rost 1995), GenTHREADER (Jones 1999), SAM-T02 (UCSC HMM), 3D-PSSM

Ab Initio Modeling Compute molecular structure from the laws of physics and chemistry alone. Theoretically: ideal solution. Practically: nearly impossible. Why? –Exceptionally complex calculations –Biophysics understanding incomplete

Ab Initio Methods: Rosetta (Baker lab, Seattle), Undertaker (Karplus, UCSC)

CASP - Critical Assessment of Structure Prediction Competition among different groups for predicting the 3D structure of proteins that are about to be solved experimentally. Current state: –Ab initio: the worst, but greatly improved in recent years. –Comparative modeling: performs very well when homologous sequences with known structures exist. –Fold recognition: performs well.

What can you do? FOLDIT Solve Puzzles for Science A computer game to fold proteins

What’s Next Predicting function from structure

Structural Genomics: a large-scale structure determination project designed to cover all representative protein structures. Example: ATP binding domain of protein MJ0577 (Zarembinski et al., Proc. Natl. Acad. Sci. USA 95:15189 (1998))

As a result of the Structural Genomics initiative, many structures of proteins with unknown function will be solved. Wanted! Automated methods to predict function from the protein structures resulting from the structural genomics project.

Approaches for predicting function from structure ConSurf - Mapping the evolutionary conservation on the protein structure

Approaches for predicting function from structure PFPlus – Identifying positive electrostatic patches on the protein structure

Approaches for predicting function from structure SHARP2 – Identifying potential protein-protein interaction sites on the protein structure

Machine learning approach for predicting function from structure Find the common properties of a protein family (or any group of proteins of interest) which are unique to the group and different from all the other proteins. Generate a model for the group and predict new members of the family which have similar properties.

Knowledge Based Approach Basic Steps 1. Building a Model: Generate a dataset of proteins with a common function (e.g. DNA-binding proteins). Generate a control dataset. Calculate the properties that are characteristic of the protein family of interest for all the proteins in the data (the DNA-binding proteins and the non-DNA-binding proteins). Represent each protein in a set by a vector of calculated features and build a statistical model to split the groups.

Basic Steps 2. Predicting the function of a new protein: Calculate the properties for the new protein and represent them in a vector. Predict whether the tested protein belongs to the family.
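Representing a protein as a vector of calculated features might look like the sketch below. The slides calculate properties from the structure; as a stand-in, this example uses three simple sequence-derived fractions, which are assumptions chosen only to show the vector representation. The function name `feature_vector` is hypothetical.

```python
def feature_vector(sequence):
    """Return [fraction positive (K, R), fraction negative (D, E), fraction aromatic (F, W, Y)]."""
    n = len(sequence)
    if n == 0:
        raise ValueError("sequence must not be empty")
    positive = sum(sequence.count(residue) for residue in "KR")
    negative = sum(sequence.count(residue) for residue in "DE")
    aromatic = sum(sequence.count(residue) for residue in "FWY")
    return [positive / n, negative / n, aromatic / n]


# Usage with a short made-up sequence.
print(feature_vector("KRDEFWYA"))
```

The same function is applied to every protein in the family dataset, to every protein in the control dataset, and later to the new protein, so that all vectors are comparable.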

TEST CASE Y14 – A protein sequence translated from an ORF (Open Reading Frame) obtained from the complete Drosophila genome >Y14 PQRSVGWILFVTSIHEEAQEDEIQEKFCDYGEIKNIHLNLDRRTGFSKGYALVEYETHKQALAAKEALNGAEIMGQTIQVDWCFVKGG

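Support Vector Machine (SVM): find a hyperplane that maximally separates the RNA-binding proteins from the non-RNA-binding proteins, giving two classes. A kernel function maps the input space to the feature space. [Figure: each protein is represented by a feature vector, [x1, x2, x3…] or [y1, y2, y3…]; a new protein structure (?) is classified as RNA binding or non-RNA binding]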
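The idea of a separating hyperplane can be sketched with a linear SVM trained by sub-gradient descent on the hinge loss. This is a simplified stand-in: it uses no kernel function, the two-dimensional feature vectors are made up, and the function names and parameter values are assumptions for illustration. Labels are +1 for the family of interest and -1 for the control set.

```python
def train_linear_svm(samples, labels, epochs=100, learning_rate=0.1, reg=0.01):
    """Fit weights w and bias b so that sign(w.x + b) separates the two classes."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: shrinking w favors a wide margin.
            w = [wi - learning_rate * reg * wi for wi in w]
            if margin < 1:
                # Hinge loss is active: move the hyperplane toward classifying x correctly.
                w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                b += learning_rate * y
    return w, b


def predict(w, b, x):
    """Return +1 (member of the family) or -1 (not a member) for feature vector x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Usage with made-up feature vectors: three family members (+1), three controls (-1).
train_x = [(2.0, 2.0), (3.0, 3.0), (2.5, 3.5), (-2.0, -2.0), (-3.0, -1.0), (-1.0, -3.0)]
train_y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(train_x, train_y)
print(predict(w, b, (4.0, 4.0)), predict(w, b, (-4.0, -4.0)))
```

In practice an established SVM library with a suitable kernel would be used, and the feature vectors would hold the structural properties calculated for each protein.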

>Y14 PQRSVGWILFVTSIHEEAQEDEIQEKFCDYGEIKNIHLNLDRRTGFSKGYALVEYETHKQALAAKEALNGAEIMGQTIQVDWCFVKGG Result: Y14 DOES NOT BIND RNA