Protein Structure Prediction Graham Wood Charlotte Deane.

The problem - in brief
Sequence (MVLSEGEWQL VLHVWAKVEA DVAGHGQDIL … AKYKELCYOG) + databases, algorithms, software = structure

Why is protein structure prediction needed?
Essential functioning of cells is mediated by proteins
It is protein structure that leads to protein function
3D structure determination (by X-ray crystallography or NMR) is expensive, slow and difficult
Prediction assists in the engineering of new proteins

Terminology
Target: the unknown structure you are trying to model
Parent: a known structure which provides a basis for modelling

The problem - more detail
Biologist: EKGPDLYLIPLT, protein databases
Physicist: EKGPDLYLIPLT, energy over configuration space

CASP: Critical Assessment of Structure Prediction
Timeline: Jan-Apr, May, Jun, Jul, Aug, Sept, Oct, Nov, Dec
Participants: Biologists, Caspers, Organisers
Call for structures
Publish seqs on web
Give sequences to organisers
Structure determination
Give structures to organisers
Predict structure from sequence
Expert assessment
4 day mtg

Degree of evolutionary conservation
DNA seq, Protein seq, Structure, Function
From less conserved, information poor (DNA seq) to more conserved, information rich (Function)
DNA seq: ACAGTTACAC CGGCTATGTA CTATACTTTG
Protein seq: HDSFKLPVMS KFDWEMFKPC GKFLDSGKLG

Three main approaches (in order of current success)
1. Comparative modelling
2. Fold recognition
3. De novo

Comparative modelling Conserved backbone Energy EKGPDLYLIPLT Target Close homologues Variable backbone Side chains

Comparative modelling (protein building)
1. Prepare the raw materials
2. Build the model (two methods)
3. Check the model
4. Accept or reject the model

C1: Preparing the raw materials
Given target AA sequence (EKGPDLYLIPLT)
Identify parents (homologues)
Structurally align parents
Align target to parents
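
One simple way to judge whether a candidate parent is a close homologue is percent sequence identity over an existing alignment. The sketch below is illustrative only: the function name, the gap character "-" and the candidate sequence are assumptions, and the slides do not state which identity measure or cutoff is used.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


target = "EKGPDLYLIPLT"       # target fragment shown on the slides
candidate = "EKGPELYL-PLT"    # hypothetical parent, already aligned to the target
print(round(percent_identity(target, candidate), 1))
```

A higher value suggests a closer homologue; any threshold for accepting a parent would be a separate modelling choice.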

Structurally conserved regions and structurally variable regions
SCR: secondary structure region
SVR: loop region

C2: Building (choice of two methods)
Method 1, assemble fragments:
Determine SCRs and build associated backbone
Determine SVRs and build rest of backbone
Attach and orient side-chains
Refine model
Method 2, use spatial restraints:
Optimally satisfy spatial restraints

D T N V A Y C N K D

C3: Test model (C4: then accept or reject)
Examine the model in the light of all experimental data
Tools: PROCHECK, VERIFY3D, PROSA II, JOY, visual inspection using 3D software

Problems in comparative modelling
Aligning the target to the parents
The packing of secondary structure elements in the core
The long insertions and deletions in the structurally variable regions

Fold Recognition ? Target

Fold recognition Energy EKGPDLYLIPLT Target Structurally similar proteins

Fold recognition (protein finding)
1. Obtain library of non-duplicate folds
2. Perform sequence-structure alignment
3. Assess success of alignment
– Biologist: use substitution matrix
– Physicist: use potentials
4. Accept or reject the model

Sequence-structure alignment
1. Construct sequence profile
2. Use profile to score the sequence
Diagram labels: Target, Parent; BLASTP, OWL, MULTAL; dynamic programming algorithm; Score
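
The profile-scoring step can be sketched as a global dynamic programming alignment of a sequence against a profile. This is a minimal illustration, not the method used by any particular server. The profile representation (one dictionary of residue scores per position), the linear gap penalty and the default mismatch score are all assumptions.

```python
from typing import Dict, List


def align_to_profile(seq: str,
                     profile: List[Dict[str, float]],
                     gap: float = -2.0,
                     default: float = -1.0) -> float:
    """Best global alignment score of seq against a position-specific profile."""
    n, m = len(seq), len(profile)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # leading gaps in the profile
    for j in range(1, m + 1):
        score[0][j] = j * gap          # leading gaps in the sequence
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # residue i placed at profile position j
            match = score[i - 1][j - 1] + profile[j - 1].get(seq[i - 1], default)
            score[i][j] = max(match,
                              score[i - 1][j] + gap,   # residue left unaligned
                              score[i][j - 1] + gap)   # profile position skipped
    return score[n][m]


# Toy three-position profile (hypothetical scores).
toy_profile = [{"A": 2.0}, {"C": 2.0}, {"D": 2.0}]
print(align_to_profile("ACD", toy_profile))
print(align_to_profile("AD", toy_profile))
```

The function returns only the score; recovering the alignment itself would need a traceback through the same table.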

Amino acid substitutions are constrained by local environments
Different substitution patterns
Environment-specific substitution tables

Definition of local environments
Main-chain conformation and secondary structure (α-helix, β-strand, coil and positive φ)
Solvent accessibility (accessible and inaccessible)
Hydrogen bonds (side-chain to main-chain NH, side-chain to main-chain CO and side-chain to side-chain)
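
To see how these three features combine into discrete environments, the sketch below enumerates every combination. It assumes each of the three hydrogen-bond types is a simple present/absent flag, which the slide does not state explicitly. Under that assumption there are 4 × 2 × 2³ = 64 environments. All names are illustrative.

```python
from itertools import product

MAIN_CHAIN = ("alpha-helix", "beta-strand", "coil", "positive-phi")
ACCESSIBILITY = ("accessible", "inaccessible")
HBOND_TYPES = ("sidechain-to-mainchain-NH",
               "sidechain-to-mainchain-CO",
               "sidechain-to-sidechain")


def enumerate_environments():
    """All (main-chain class, accessibility, H-bond flags) combinations."""
    # Assumption: each hydrogen-bond type is either present (True) or absent (False).
    hbond_flags = product((False, True), repeat=len(HBOND_TYPES))
    return list(product(MAIN_CHAIN, ACCESSIBILITY, hbond_flags))


environments = enumerate_environments()
print(len(environments))
```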

Substitution scores
Probability that amino acid a in environment E is replaced by amino acid b, from the frequency of observing amino acid a in environment E replaced by b
Background probability of observing amino acid b (match occurring by chance)
Log odds score, scaled to the nearest integer
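
The quantities named on this slide fit the usual log-odds recipe, which can be sketched as follows. The slide's own formulas are not preserved, so the log base, the scale factor and the requirement that every count is positive are assumptions, and the numbers in the example are invented for illustration.

```python
import math
from typing import Dict


def substitution_scores(counts: Dict[str, int],
                        background: Dict[str, float],
                        scale: float = 1.0) -> Dict[str, int]:
    """Integer log-odds scores for one amino acid a in one environment E.

    counts[b] is how often a (in E) was observed replaced by b;
    background[b] is the chance probability of observing b.
    Assumes every count is positive (no pseudocounts are applied).
    """
    total = sum(counts.values())
    scores = {}
    for b, n in counts.items():
        p_replaced = n / total                      # P(b | a, E)
        log_odds = math.log2(p_replaced / background[b])
        scores[b] = round(scale * log_odds)         # scaled, nearest integer
    return scores


# Invented counts for one residue type in one environment.
print(substitution_scores({"A": 6, "S": 2}, {"A": 0.25, "S": 0.25}, scale=2.0))
```

Positive scores mark replacements seen more often than chance in that environment, negative scores mark those seen less often.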

Scoring with potentials
Energy potential
Solvation potential

The Novel Fold Problem ? asdghklprtwecvmnasetyasdghklprtwecvmnasety

De novo – new fold methods Energy EKGPDLYLIPLT Segment configurations Sets of local configurations

Defining a “New Fold”
CATH
– Somewhat objective
SCOP
– No objective definition
– Tends towards evolutionary relationships
Ask A. Murzin

New fold approach
All structure information is in the AA sequence (Anfinsen, Science, 1973)
Seek “lowest free energy conformation”
Tactic is to simplify the problem, for example:
– Simplified model of protein (one atom per residue)
– Simple or knowledge-based potential function
Assist in detecting distant homologues
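
As a rough illustration of a simplified model with a simple potential, the sketch below represents each residue by a single point and rewards contacts between hydrophobic residues. The hydrophobic set, the 7.0 Å cutoff, the minimum sequence separation of three and the unit reward are all assumptions chosen for the example, not values from the slides.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
HYDROPHOBIC = set("AVLIMFWC")  # assumed classification, for illustration only


def contact_energy(sequence: str,
                   coords: Sequence[Point],
                   cutoff: float = 7.0) -> float:
    """Toy energy: -1 for each hydrophobic pair (>= 3 apart in sequence) within cutoff."""
    if len(sequence) != len(coords):
        raise ValueError("need one coordinate per residue")
    energy = 0.0
    for i in range(len(sequence)):
        for j in range(i + 3, len(sequence)):
            if sequence[i] in HYDROPHOBIC and sequence[j] in HYDROPHOBIC:
                if math.dist(coords[i], coords[j]) <= cutoff:
                    energy -= 1.0
    return energy


extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
compact = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
print(contact_energy("LGGL", extended), contact_energy("LGGL", compact))
```

A search for the "lowest free energy conformation" would then try to minimise a function of this kind over candidate coordinates.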

New fold recognition (structure discovery)
1. Set up domain and objective function
2. Perform optimisation
3. Check the model
4. Accept or reject the model

De Novo (biologist) ROSETTA (Baker et al.)
Domain of objective function
Sequence: 9 residues...
Set of local structures consistent with local sequence

De Novo (biologist) ROSETTA
Objective function to be maximised
Equation labels: constant; function of energy
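
The formula on this slide is not preserved in the transcript. Since the deck refers to P(structure|sequence) elsewhere, one plausible reading is Bayes' rule, written out below as an assumption rather than as the slide's actual equation.

```latex
P(\text{structure} \mid \text{sequence})
  = \frac{P(\text{sequence} \mid \text{structure})\, P(\text{structure})}
         {P(\text{sequence})}
```

For a fixed target sequence, P(sequence) does not depend on the candidate structure, so it would act as the constant when maximising over structures. Which remaining term the slide labelled as a function of energy cannot be recovered from the transcript.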

De Novo (biologist) ROSETTA
Maximising the probability of the sequence
1. Choose each local conformation and start with a fully extended chain
2. Generate a neighbouring conformation
3. Accept in simulated annealing style, using P(structure|sequence)
4. Do this many times and cluster results – use centre of largest cluster as prediction
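
Steps 1 to 3 can be sketched as a generic simulated annealing loop. This is not ROSETTA's implementation: a conformation is reduced to one integer per segment (an index into a hypothetical list of local conformations, with index 0 standing in for the extended state), the scoring function is supplied by the caller as a log-probability, and the geometric cooling schedule is an assumption.

```python
import math
import random
from typing import Callable, List, Tuple


def anneal(n_segments: int,
           n_choices: int,
           log_prob: Callable[[List[int]], float],
           steps: int = 2000,
           t_start: float = 2.0,
           t_end: float = 0.05,
           seed: int = 0) -> Tuple[List[int], float]:
    """Maximise log_prob over per-segment choices with Metropolis acceptance."""
    rng = random.Random(seed)
    current = [0] * n_segments                 # stand-in for the extended chain
    current_score = log_prob(current)
    best, best_score = list(current), current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Neighbouring conformation: re-pick the local choice for one segment.
        candidate = list(current)
        candidate[rng.randrange(n_segments)] = rng.randrange(n_choices)
        candidate_score = log_prob(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta/T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = list(current), current_score
    return best, best_score


def toy_log_prob(conformation: List[int]) -> float:
    """Hypothetical score that peaks when every segment picks choice 3."""
    return -float(sum((c - 3) ** 2 for c in conformation))


print(anneal(n_segments=6, n_choices=5, log_prob=toy_log_prob))
```

Step 4 would repeat such runs with different seeds and cluster the resulting conformations, which is left out of this sketch.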

De Novo (physicist) ASTROFOLD (Floudas et al.)
1. Predict α-helices and β-strands
2. Predict β-sheets and disulphide bridges using ILP (integer linear programming)
3. Use deterministic global optimisation, with energy function and constraints, to predict tertiary structure

Testing of prediction servers - LiveBench
Columns: Server, Type, Sensitivity (Easy, Hard), Specificity (All, Hard), Added Value (Easy, Hard)
Server            Type
Pcons2            Consensus
ShotGun on 5      Consensus
ShotGun on 3      Consensus
Shotgun-INBGU     Threading
INBGU             Threading
Fugue3            Threading
Fugue2            Threading
Fugue1            Threading
mGenTHREADER      Threading
GenTHREADER       Threading
3D-PSSM           Threading
ORFeus            Sequence
FFAS              Sequence
Sam-T99           Sequence
Superfamily       Sequence
ORF-BLAST         BLAST
PDB-BLAST         BLAST
BLAST 18

Review - comparative modelling Conserved backbone Energy EKGPDLYLIPLT Target Close homologues Variable backbone Side chains

Review - fold recognition Energy EKGPDLYLIPLT Target Structurally similar proteins

Review - new fold methods Energy EKGPDLYLIPLT Segment configurations Sets of local configurations

Summary: Prediction Methods
Comparative modelling
– There exists a protein with clear homology
– PSI-BLAST
Fold recognition
– There exists a protein of similar fold (analogy)
– DALI (CATH & SCOP)
Novel fold methods
– The sequence has a new fold
Better methods needed yet for it all to be useful!