Lecture 10 – protein structure prediction. A protein sequence.


Lecture 10 – protein structure prediction

A protein sequence

>gi| |ref|NP_ | unknown protein; protein id: At1g [Arabidopsis thaliana]
MPSESSYKVHRPAKSGGSRRDSSPDSIIFTPESNLSLFSSASVSVDRCSSTSDAHDRDDSLISAWKEEFEVKKDDESQNL
DSARSSFSVALRECQERRSRSEALAKKLDYQRTVSLDLSNVTSTSPRVVNVKRASVSTNKSSVFPSPGTPTYLHSMQKGW
SSERVPLRSNGGRSPPNAGFLPLYSGRTVPSKWEDAERWIVSPLAKEGAARTSFGASHERRPKAKSGPLGPPGFAYYSLY
SPAVPMVHGGNMGGLTASSPFSAGVLPETVSSRGSTTAAFPQRIDPSMARSVSIHGCSETLASSSQDDIHESMKDAATDA
QAVSRRDMATQMSPEGSIRFSPERQCSFSPSSPSPLPISELLNAHSNRAEVKDLQVDEKVTVTRWSKKHRGLYHGNGSKM
RDHVHGKATNHEDLTCATEEARIISWENLQKAKAEAAIRKLEKYFPQMKLEKKRSSSMEKIMRKVKSAEKRAEEMRRSVL
DNRVSTASHGKASSFKRSGKKKIPSLSGCFTCHVF

Protein Structure
[Figure: heparin docking. Red: heparin; blue: central domain; yellow: C-terminal domain]

A Protein Structure
[Figure labels: alpha-helix, beta-sheet, loop, core]

Domains and Folds
A domain is a discrete portion of a protein that is assumed to fold independently of the rest of the protein and to possess its own function. Most proteins have multiple domains. The core 3D structure of a domain is called a fold. There are only a few thousand possible folds.

Protein Similarity Levels
– Family: the proteins in the same family are homologous at the sequence level.
– Superfamily: all members of a superfamily should have the same overall domain architecture, i.e., the same domains in the same order.
– Fold: the folds of two domains are similar.

Protein Folding Problem
A protein folds into a unique 3D structure under physiological conditions.
Lysozyme sequence:
KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS DGNGMNAWVA WRNRCKGTDV QAWIRGCRL

Relevance of Protein Structure in the Post-Genome Era
sequence → structure → function → medicine

Structure-Function Relationship
Some level of function can be determined without a structure, but a structure is key to understanding the detailed mechanism. A predicted structure is a powerful tool for function inference.
Example: Trp repressor as a function switch.

Structure-Based Drug Design
Structure-based rational drug design is still a major method for drug discovery.
[Figure: HIV protease inhibitor]

Protein Structure Prediction
– Traditional experimental methods (X-ray crystallography or NMR) solve only a few structures per day worldwide and cannot keep pace with new protein sequences.
– Strong demand for structure prediction: more than 30,000 human genes; 10,000 genomes will be sequenced in the next 10 years.
– Still an unsolved problem after two decades of effort.

Ab initio Structure Prediction
– An energy function describes the protein:
  o bond energy
  o bond angle energy
  o dihedral angle energy
  o van der Waals energy
  o electrostatic energy
– Minimize the function to obtain the structure.
– Not practical in general:
  o computationally too expensive
  o accuracy is poor
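To make the "write an energy function, then minimize it" idea concrete, here is a minimal sketch on a toy chain of beads. It keeps only two of the terms listed on the slide (a harmonic bond term and a Lennard-Jones van der Waals term) and minimizes with plain gradient descent on a numerical gradient. All parameter values and the function names `energy` and `minimize` are illustrative assumptions, not a real force field or the method used in the lecture.

```python
import math

# Toy parameters (illustrative assumptions, not a real force field).
BOND_LENGTH = 1.0   # preferred distance between consecutive beads
BOND_K = 100.0      # harmonic bond stiffness
LJ_EPS = 1.0        # depth of the van der Waals well
LJ_SIGMA = 1.0      # distance scale of the van der Waals term


def energy(coords):
    """Bond energy plus van der Waals energy of a bead chain of (x, y, z) points."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            if j == i + 1:
                # Consecutive beads: harmonic bond term.
                total += 0.5 * BOND_K * (r - BOND_LENGTH) ** 2
            else:
                # Non-bonded beads: Lennard-Jones 12-6 term.
                s6 = (LJ_SIGMA / max(r, 1e-6)) ** 6
                total += 4.0 * LJ_EPS * (s6 * s6 - s6)
    return total


def minimize(coords, steps=200, h=1e-5, step_size=1e-3):
    """Gradient descent with a central-difference gradient and step halving."""
    coords = [list(p) for p in coords]
    current = energy(coords)
    for _ in range(steps):
        grad = []
        for p in coords:
            g = []
            for k in range(3):
                old = p[k]
                p[k] = old + h
                e_plus = energy(coords)
                p[k] = old - h
                e_minus = energy(coords)
                p[k] = old
                g.append((e_plus - e_minus) / (2 * h))
            grad.append(g)
        # Halve the step until the energy actually goes down.
        step = step_size
        while step > 1e-12:
            trial = [[p[k] - step * g[k] for k in range(3)]
                     for p, g in zip(coords, grad)]
            e_trial = energy(trial)
            if e_trial < current:
                coords, current = trial, e_trial
                break
            step *= 0.5
        else:
            break  # no downhill step found: treat as converged
    return coords, current


# Start from a stretched chain and relax it.
start = [(1.5 * i, 0.1 * (i % 2), 0.0) for i in range(5)]
final, e_final = minimize(start)
```

Even in this toy, every energy evaluation loops over all bead pairs, and a real protein has thousands of atoms and a rugged energy landscape, which gives a feel for why the slide calls the approach computationally expensive.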

Template-Based Prediction
Structure is better conserved than sequence:
– A structure can accommodate a wide range of mutations.
– Physical forces favor certain structures.
The number of folds is limited: ~700 currently known; estimated total 1,000 to ~10,000.
[Figure: TIM barrel]

Scope of the Problem
– ~90% of new globular proteins share similar folds with known structures, implying the general applicability of comparative modeling methods for structure prediction.
– Template-based modeling methods are generally applicable for structure prediction (currently to 60-70% of new proteins, and this number is growing as more structures are solved).
– The NIH Structural Genomics Initiative plans to experimentally solve ~10,000 "unique" structures and predict the rest using computational methods.

Homology Modeling
– The query sequence is aligned with the sequence of a known structure (the template), usually one sharing 30% or more sequence identity.
– The query sequence is superimposed onto the template, replacing equivalent side-chain atoms where necessary.
– The model is refined by minimizing an energy function.
– Applicable to ~20% of all proteins.
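The 30% identity rule of thumb can be checked directly on a pairwise alignment. The sketch below assumes the two sequences are already aligned with `-` as the gap character, and it counts identity over columns where both sequences have a residue. Other denominators (alignment length, shorter sequence length) are also in common use, so treat this as one possible convention. The names `percent_identity` and `usable_template` are hypothetical.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # skip gap columns
        compared += 1
        if q == t:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def usable_template(aligned_query, aligned_template, threshold=30.0):
    """Apply the slide's rule of thumb: identity of 30% or more."""
    return percent_identity(aligned_query, aligned_template) >= threshold


# Six compared columns, five identical.
identity = percent_identity("ACDE-FG", "ACDQWFG")
```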

Concept of Threading
– Thread (align or place) a query protein sequence onto a template structure in an "optimal" way.
– A good alignment gives an approximate backbone structure.
Query sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
[Figure labels: query sequence, template set]
Prediction accuracy: fold recognition / alignment

4 Components of Threading
– Template library
– Scoring function
– Alignment
– Confidence assessment

Core of a Template
Core secondary structures: α-helices and β-strands

Definition of Template
– Residue type / profile
– Secondary structure type
– Solvent accessibility
– Coordinates for Cα / Cβ
Example records:
RES 1 G 156 S
RES 5 P 157 H
RES 5 G 158 H
RES 5 Y 159 H
RES 5 C 160 H
RES 1 G 161 S
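One way to hold the per-position template information in code is a small record type, sketched below. The field names, the `H`/`E` codes for helix and strand, and the helper `core_segments` are assumptions for illustration. Reading the third and fifth columns of the example records as residue type and secondary structure code is also an assumption, and the accessibility and coordinate values are placeholders, not data from the slide.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple

CORE_CODES = ("H", "E")  # assumed codes: H = helix, E = strand


@dataclass
class TemplatePosition:
    residue: str                      # one-letter residue type (or a profile key)
    sec_struct: str                   # secondary structure code
    accessibility: float              # solvent accessibility (placeholder scale)
    ca: Tuple[float, float, float]    # C-alpha coordinates


def core_segments(template: List[TemplatePosition]) -> List[Tuple[int, int, str]]:
    """Return (start, end, code) for each run of helix or strand positions.

    Indices are 0-based and `end` is exclusive.
    """
    segments = []
    index = 0
    for code, group in groupby(template, key=lambda p: p.sec_struct):
        length = len(list(group))
        if code in CORE_CODES:
            segments.append((index, index + length, code))
        index += length
    return segments


# Residue and secondary structure columns as read from the example records;
# accessibility and coordinates are placeholders.
records = [("G", "S"), ("P", "H"), ("G", "H"), ("Y", "H"), ("C", "H"), ("G", "S")]
template = [TemplatePosition(r, s, 0.0, (0.0, 0.0, 0.0)) for r, s in records]
segments = core_segments(template)
```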

Energy (Score) Function
…YKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEW…
– Singleton energy (E_s): how well a residue fits a template position (sequence and structural environment).
– Pairwise energy (E_p): how favorable it is to place two particular residues near each other.
– Alignment gap penalty (E_g).
Total energy: E_p + E_s + E_g
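The three terms can be added up for a fixed alignment as in the sketch below. The scoring rules are deliberately crude stand-ins (hydrophobic residues prefer buried positions, hydrophobic pairs prefer contact, a constant penalty per gap) chosen only to show how E_s, E_p, and E_g combine. The residue grouping, the energy values, and all function names are assumptions, not the potentials used in any real threading program.

```python
HYDROPHOBIC = set("AVLIMFWC")  # crude grouping, assumed for illustration


def singleton_energy(residue, buried):
    """Toy E_s: hydrophobic residues fit buried positions, polar ones exposed."""
    return -1.0 if (residue in HYDROPHOBIC) == buried else 1.0


def pairwise_energy(res_a, res_b):
    """Toy E_p: reward a contact between two hydrophobic residues."""
    return -1.0 if res_a in HYDROPHOBIC and res_b in HYDROPHOBIC else 0.0


def threading_energy(query, alignment, buried, contacts, gap_penalty=2.0):
    """Total energy E_p + E_s + E_g for one alignment.

    alignment[t] is the query index placed at template position t, or None.
    buried[t] says whether template position t is buried.
    contacts lists pairs (t1, t2) of template positions that are close in space.
    """
    e_s = sum(singleton_energy(query[q], buried[t])
              for t, q in enumerate(alignment) if q is not None)
    e_p = sum(pairwise_energy(query[alignment[a]], query[alignment[b]])
              for a, b in contacts
              if alignment[a] is not None and alignment[b] is not None)
    placed = sum(1 for q in alignment if q is not None)
    # Gaps: empty template positions plus query residues left unplaced.
    n_gaps = (len(alignment) - placed) + (len(query) - placed)
    e_g = gap_penalty * n_gaps
    return e_p + e_s + e_g


query = "LKVA"
buried = [True, False, True, True]
contacts = [(0, 2), (2, 3)]
good = threading_energy(query, [0, 1, 2, 3], buried, contacts)
shifted = threading_energy(query, [1, 2, 3, None], buried, contacts)
```

Here the in-register alignment scores lower (better) than the shifted one, which is the behavior a scoring function needs for the alignment step to find the right placement.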

Threading Problem
Given a sequence and a fold (template), compute the optimal alignment score between the sequence and the fold. If we can solve this problem, then:
– Given a sequence, we can try each known fold and find the fold that best fits the sequence.
– Because there are only a few thousand folds, we can find the correct fold for the given sequence.
Threading is NP-hard.
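The "try each known fold and keep the best" loop can be sketched as follows. To keep the example short and exactly solvable, the score here uses only a singleton term and a linear gap penalty, which dynamic programming can optimize; it is the pairwise term that makes the general problem hard. The template library, the `B`/`E` burial codes, and the function names are hypothetical.

```python
HYDROPHOBIC = set("AVLIMFWC")  # crude grouping, assumed for illustration


def singleton(residue, code):
    """Toy fit of a residue to a template position ('B' buried, 'E' exposed)."""
    return -1.0 if (residue in HYDROPHOBIC) == (code == "B") else 1.0


def best_alignment_energy(query, pattern, gap=1.0):
    """Minimum energy of a global alignment, by dynamic programming.

    dp[i][j] is the best energy for query[:i] against pattern[:j].
    """
    n, m = len(query), len(pattern)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + singleton(query[i - 1], pattern[j - 1]),
                dp[i - 1][j] + gap,   # query residue left unaligned
                dp[i][j - 1] + gap,   # template position left empty
            )
    return dp[n][m]


def recognize_fold(query, library):
    """Score the query against every template and return the best one."""
    scores = {name: best_alignment_energy(query, pattern)
              for name, pattern in library.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical two-template library described by burial patterns.
library = {"alternating": "BEBEBE", "all_buried": "BBBBBB"}
best, scores = recognize_fold("LKLKLK", library)
```

With a few thousand templates this outer loop is cheap; the cost of real fold recognition sits in solving each sequence-to-template alignment under the full energy function.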

Computational Methods
– Branch and bound.
– Integer programming: use linear programming plus branch and bound.

[Figure labels: ab initio, threading, homology]

Blue Gene
On December 6, 1999, IBM announced a $100 million research initiative to build the world's fastest supercomputer, "Blue Gene", to tackle fundamental problems in computational biology.
Target performance: more than one petaflop/s (1,000,000,000,000,000 floating-point operations per second).