Homology Modeling: Principles, Workflow, and How to Use the Tool to Predict Tertiary Structure
Lu Chih-Hao


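Why study protein structure?
Proteins play crucial functional roles in all biological processes: enzymatic catalysis, signaling messengers … Function depends on 3D structure. Protein sequences are easy to obtain; structures are difficult to determine.
Notes: 1. The ultimate purpose of knowing a structure is to understand the protein's function. 2. A protein's function is determined by its structure. 3. Obtaining a correct protein structure is therefore very important. 4. A protein's 3D structure can be used to guide biological experiments. This approach can be applied to mutation studies, active-site studies, and drug design, and it reduces the time needed for experiments and drug development. From sequence information alone, it is sometimes hard to infer a protein's function.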

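Where to find the data?
Protein Data Bank (PDB): http://www.rcsb.org/pdb/, with more than ~100,000 protein structures. Each entry is a text file containing the X, Y, Z coordinates of every heavy atom, from the first residue to the last.
Notes: 1. Protein structures are stored mainly in the Protein Data Bank database, abbreviated PDB. 2. The storage format. 3. The 3D coordinates of each atom are recorded. Structure prediction often has to refer to existing known structures.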
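PDB coordinate files are fixed-column text, so atom coordinates can be extracted with plain string slicing. The following is a minimal sketch that assumes the standard ATOM record columns of the PDB format (x in columns 31-38, y in 39-46, z in 47-54). It ignores HETATM records, alternate locations and multi-model files. `Atom` and `parse_atoms` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    name: str      # atom name, e.g. "CA"
    res_name: str  # three-letter residue name, e.g. "MET"
    chain: str     # chain identifier
    res_seq: int   # residue sequence number
    x: float
    y: float
    z: float


def parse_atoms(pdb_text):
    """Return the ATOM records of a PDB-format string as Atom objects."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HEADER, HETATM, TER, END and other records
        atoms.append(Atom(
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


if __name__ == "__main__":
    record = "ATOM      1  N   MET A   1      11.104   6.134  -6.504"
    atom = parse_atoms(record)[0]
    print(atom.name, atom.res_name, atom.x, atom.y, atom.z)
```

Slicing by column rather than splitting on whitespace matters because neighboring PDB fields can run together, for example a negative coordinate directly after another one.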

PDB Statistics (figure)

TIM barrel (figure)

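How to determine the protein structure?
By experiment: X-ray crystallography; NMR (nuclear magnetic resonance spectroscopy). Sequence-structure gap (figure).
Notes: 1. Experimentally, there are two ways to determine a protein structure. 2. Much current research therefore tries to fill the gap with prediction methods. 3. So does No structure => No GO? X-ray crystallography: diffraction experiments are performed on a crystallized protein, and analysis of the resulting data yields the structure. NMR: the structure is obtained from the nuclear magnetic resonance phenomenon; not all atomic nuclei give rise to nuclear magnetic resonance.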

Protein Structure Prediction
The primary sequence already contains all the information necessary to define the 3D structure. Methods for predicting 3D protein structure fall into three main categories (Rost & O'Donoghue, 1997): (1) homology modeling; (2) fold recognition (threading); (3) ab initio techniques. Homology modeling is currently the most accurate method for predicting protein 3D structure (Tramontano, 1998).
Notes: 1. So when should each method be used? The following simple chart explains.

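Protein Structure Prediction (decision chart)
Sequence -> homology to a known fold? If sequence identity is >30%: homology modeling -> model. If it is <30%: threading -> match found? Yes: model. No: ab initio.
Notes: 1. Sequence identity is the basic criterion for choosing a method. 2. Comparative (homology) modeling and threading both require a template. 3. Ab initio methods predict the structure directly from the physical and chemical properties of the amino acids (thermodynamic principles) and are suited to smaller proteins.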
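The decision chart can be written as a small function. This is only a sketch. The 30% cutoff is the rough threshold from the chart, and its real value depends on alignment length. The function name and its arguments are hypothetical.

```python
def choose_method(identity_to_known_fold, threading_match_found=False, cutoff=30.0):
    """Pick a prediction strategy following the decision chart.

    identity_to_known_fold: best sequence identity (percent) to any protein
        of known structure, or None if no sequence homolog was found.
    threading_match_found: whether fold recognition found a compatible fold.
    cutoff: identity threshold in percent (the chart's rough 30%).
    """
    if identity_to_known_fold is not None and identity_to_known_fold > cutoff:
        return "homology modeling"
    if threading_match_found:
        return "threading"
    return "ab initio"


if __name__ == "__main__":
    print(choose_method(45.0))         # clear homolog available
    print(choose_method(20.0, True))   # remote match found by threading
    print(choose_method(None, False))  # nothing found: fall back to ab initio
```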

Sequence similarity implies structural similarity?
[Figure: percentage sequence identity/similarity vs. number of residues aligned, showing the "safe zone" in which sequence identity implies structural similarity (B. Rost, Columbia, New York).]
Notes: 1. 30% is only a rough threshold. 2. It varies somewhat with the number of aligned residues; this is a statistical result. 3. When the aligned region is long, the identity need not reach 30%.
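The "safe zone" boundary on plots like this is usually drawn as a length-dependent identity threshold. The sketch below uses the form commonly quoted from Rost (1999):

threshold(L) = 480 · L^(-0.32(1 + e^(-L/1000))) for 11 < L ≤ 450

The threshold is capped at 100% for very short alignments and held flat at 19.5% beyond 450 residues. Treat these constants as an assumption to check against the original paper, not as a reading of the curve on this slide. `identity_threshold` and `in_safe_zone` are hypothetical names.

```python
import math


def identity_threshold(aligned_length):
    """Length-dependent identity threshold in percent.

    Assumed form (Rost, 1999): 100% for L <= 11, 19.5% for L > 450,
    otherwise 480 * L ** (-0.32 * (1 + exp(-L / 1000))).
    """
    L = aligned_length
    if L <= 11:
        return 100.0
    if L > 450:
        return 19.5
    return 480.0 * L ** (-0.32 * (1.0 + math.exp(-L / 1000.0)))


def in_safe_zone(identity_percent, aligned_length):
    """True if the identity lies on or above the threshold curve."""
    return identity_percent >= identity_threshold(aligned_length)


if __name__ == "__main__":
    for length in (50, 100, 300, 450):
        print(length, round(identity_threshold(length), 1))
```

With these constants, 25% identity falls inside the safe zone for a 300-residue alignment (threshold about 20%) but not for a 100-residue alignment (threshold about 29%). This matches note 3: long alignments need less than 30% identity.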

Homology Modeling
Basis: structure is much more conserved than sequence during evolution.
Limited applicability: a large number of proteins and ORFs have no similarity to proteins of known structure.
Notes: Homology modeling rests on the principle that the tertiary structure of homologous proteins is better conserved than their sequence. A slight change in a protein's sequence usually changes its tertiary structure only slightly.

What is Homology Modeling?
Target and template: two proteins share similar sequences. Are they homologous? If so, one can be used as the template for the other.
8lyz: KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
1alc: KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSRNICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE
Notes: 1. Next comes today's main topic, homology modeling. 2. Information provided by the template is used to predict the likely tertiary structure of the target (the template must be a homolog of the target). 3. A diagram illustrates this. 4. The higher the identity, the higher the accuracy.
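Since accuracy rises with identity, it helps to see how percent identity is computed from an alignment. This minimal sketch counts identical residues over columns where neither sequence has a gap. That denominator is an assumption: other tools divide by the shorter sequence length or the full alignment length, which gives different numbers. `percent_identity` is a hypothetical name.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two equal-length aligned sequences.

    Only columns where neither sequence has a gap are counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    # First eight residues of 8lyz and 1alc shown above (no gaps needed here).
    print(percent_identity("KVFGRCEL", "KQFTKCEL"))
```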

Structure prediction by homology modeling
[Figure: workflow, Steps 1 to 4]
Notes: 1. Having covered the concept, we now turn to the workflow. 2. The two sequences must be aligned to find their corresponding positions.
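Aligning the target to the template to find corresponding positions is classically done by dynamic programming. Below is a minimal Needleman-Wunsch global alignment with a toy scoring scheme: +1 for a match, -1 for a mismatch and -2 per gap. These scores are assumptions for illustration. Real modeling pipelines use substitution matrices such as BLOSUM62, affine gap penalties, or profiles. `needleman_wunsch` is a hypothetical name.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (aligned_a, aligned_b, score).
    """
    n, m = len(a), len(b)

    def pair_score(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```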

Homology detection and template selection
Goal: detect the fold of a probe sequence from a library of known folds. There are three types of sequence-based methods:
pairwise sequence-sequence comparison: FASTA, BLAST;
sequence-profile comparison: PSI-BLAST, IMPALA, HMMER, SAM;
profile-profile comparison: prof_sim, COMPASS.
Notes: One or more structures are searched for in the databases to serve as templates; naturally, the more similar they are to the target's structure, the better.

Sequence-sequence comparison (figure): BLAST, FASTA, SSEARCH

Profile-sequence comparison (figure): PSI-BLAST

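PSI-BLAST overview (figure)
Notes: 1. This explains how PSI-BLAST builds a profile and uses it to search for homologs of the target. 2. The database is searched iteratively. 3. This reveals more distant homology relationships.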
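The profile PSI-BLAST builds between iterations is a position-specific scoring matrix (PSSM). The sketch below shows only the core idea: turn an alignment of hits into per-column log-odds scores, then reuse them to score a sequence. It is not PSI-BLAST's actual procedure, which adds sequence weighting, substitution-matrix-based pseudocounts, gapped alignment and E-value inclusion thresholds. The uniform background, the flat pseudocount and the function names are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(msa, pseudocount=1.0):
    """Log-odds (bits) position-specific scoring matrix from aligned hits.

    Simplifications (assumptions): uniform background frequency 1/20,
    a flat pseudocount, no sequence weighting, gap characters ignored.
    """
    length = len(msa[0])
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        column = [seq[col] for seq in msa if seq[col] in AMINO_ACIDS]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        row = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            row[aa] = math.log2(freq / background)
        pssm.append(row)
    return pssm


def score_sequence(pssm, seq):
    """Score an equal-length sequence against the profile (no gaps)."""
    return sum(row[aa] for row, aa in zip(pssm, seq))


if __name__ == "__main__":
    hits = ["ACDE", "ACDE", "ACDF"]
    profile = build_pssm(hits)
    print(score_sequence(profile, "ACDE"), score_sequence(profile, "WWWW"))
```

In the real search loop, this profile replaces the single query sequence in the next database scan. That is why each iteration can reach more distant homologs.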

Sequence-profile comparison (figure): RPS-BLAST, IMPALA, HMMER, SAM

Profile-profile comparison (figure): prof_sim, COMPASS

The importance of the sequence alignment
Method_1: 1lmb3 <-> 1pou, shift = 9.34, σ = 39.62
LEDARRLKAIYEKKKNELGLSQESVADKMGMGQSGVGALFNGINALNAYNAALLAKILKVSVEEFSPSIAREIYEMYEA
HHHHHHHHHHHHHHHHHCCCChhhhhhhhccchhhhhhhhccccccchhhhhhhhhhhccchhhcchhhhhhhhhhhhh
||||||||||||||||||||| ++++++++ + ++++++++++++ ++++++++
000000000000000000000 99999999 X XXXXXXXXXXXX XXXXXXXX
HHHHHHHHHHHHHHHHHHCCC---------cchhhhhhhhhcccccc---chhhhhhhcccccccchhhhhhhhhhhhh
LEELEQFAKTFKQRRIKLGFT---------QGDVGLAMGKLYGNDFS---QTTISRFEALNLSFKNMCKLKPLLEKWLN
Method_2: 1lmb3 <-> 1pou, shift = 0.67, σ = 60.78
LEDARRLKAIYEKKKNELGLS----QESVADKMG--MGQSGVGALFN-GINALNAYNAALLAKILKVSVEEFS
HHHHHHHHHHHHHHHHHCCCC----hhhhhhhhc--cCHHHHHHHHC-cccccchhhhhhhhhhhccchhhcc
||||||||||||||||||||| ---- |||||||||| -- ++++++++ ++
000000000000000000000 4444 0000000000 11 11111111 44
HHHHHHHHHHHHHHHHHHCCCcchhhhhhhhhcccccCCHHHHHHHCccccccchhhhhhhhhhh---hhhcc
LEELEQFAKTFKQRRIKLGFTQGDVGLAMGKLYGNDFSQTTISRFEALNLSFKNMCKLKPLLEKW---LNDAE
The quality of the sequence alignment is of crucial importance. Misplaced gaps, representing insertions or deletions, will cause residues to be misplaced in space. Careful inspection and adjustment of the alignment may improve the quality of the model.
Notes: Structure modeling is carried out according to the alignment. Where there is a shift, residues are assigned to two different positions in the structure. SCR: structurally conserved region; SVR: structurally variable region.
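A misplaced gap matters because model building uses the residue correspondences implied by the alignment. Converting an alignment into those correspondences makes the effect visible. This is a minimal sketch with 0-based indices; `residue_mapping` is a hypothetical name.

```python
def residue_mapping(aligned_target, aligned_template, gap="-"):
    """Map 0-based target residue indices to 0-based template residue indices.

    Columns where either sequence has a gap produce no pair; those target
    residues must be built some other way (for example as loops).
    """
    mapping = {}
    t_idx = s_idx = 0
    for t, s in zip(aligned_target, aligned_template):
        if t != gap and s != gap:
            mapping[t_idx] = s_idx
        if t != gap:
            t_idx += 1
        if s != gap:
            s_idx += 1
    return mapping


if __name__ == "__main__":
    # Same template sequence (ABCD), gap placed at two different positions.
    print(residue_mapping("ABCDE", "AB-CD"))
    print(residue_mapping("ABCDE", "ABC-D"))
```

Moving the gap by one column changes which target residue inherits the coordinates of template residue 2. That residue is then placed in space at another residue's position.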

Backbone generation
Rigid-body assembly: building the model core (figure). The backbone coordinates for the aligned regions are generated from the template.
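Core building can be sketched as two steps. First, copy template coordinates onto every aligned target residue. Then list the remaining gaps, which are left for loop modeling. This is a simplification: it assumes one coordinate per residue (for example Cα) and a single template. Real tools copy full backbone atoms and may combine several templates. Both function names are hypothetical.

```python
def build_core(template_coords, mapping, target_length):
    """Copy template coordinates onto aligned target residues.

    template_coords: per-residue (x, y, z) tuples of the template (e.g. CA atoms).
    mapping: target index -> template index, taken from the alignment.
    Unaligned target residues are left as None.
    """
    model = [None] * target_length
    for t_idx, s_idx in mapping.items():
        model[t_idx] = template_coords[s_idx]
    return model


def unbuilt_segments(model):
    """Return inclusive (start, end) index ranges of residues still missing."""
    segments, start = [], None
    for i, xyz in enumerate(model):
        if xyz is None and start is None:
            start = i
        elif xyz is not None and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(model) - 1))
    return segments


if __name__ == "__main__":
    template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    core = build_core(template, {0: 0, 3: 2}, 5)
    print(core, unbuilt_segments(core))
```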

SVRs are usually located in loop regions.

Loops can be constructed with ab initio methods, without any prior knowledge. These use empirical scoring functions that check a large number of conformations and evaluate each of them.
Notes: 1. The most likely conformation is selected by computing energies; this is suitable for shorter loops.
Wedemeyer, Scheraga, J. Comput. Chem. 20, 819-844 (1999)

Loops can also be constructed from a database of loops that appear in known structures. The loops can be categorized by their length or sequence (figure: data -> clustered library).

Scan the database for protein fragments with the correct number of residues and the correct end-to-end distance.
Notes: Pick a suitable loop to fill this region.
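The database scan described above can be sketched as a filter on fragment length and end-to-end distance, with candidates sorted by how well they fit. This is a sketch under stated assumptions. Fragments are given as lists of Cα coordinates, and the 0.5 Å tolerance is an arbitrary choice. Real loop searches also check stem geometry and clashes with the rest of the model. `find_loop_candidates` is a hypothetical name.

```python
import math


def find_loop_candidates(fragments, n_residues, anchor_distance, tolerance=0.5):
    """Return fragments with the required length and end-to-end distance.

    fragments: lists of (x, y, z) CA coordinates taken from known structures.
    n_residues: required fragment length.
    anchor_distance: distance (in angstroms) between the two residues in the
        model that the loop must connect.
    tolerance: allowed deviation in angstroms (arbitrary illustrative value).
    Candidates are returned best-fitting first.
    """
    hits = []
    for frag in fragments:
        if len(frag) != n_residues:
            continue
        end_to_end = math.dist(frag[0], frag[-1])
        deviation = abs(end_to_end - anchor_distance)
        if deviation <= tolerance:
            hits.append((deviation, frag))
    hits.sort(key=lambda item: item[0])
    return [frag for _, frag in hits]


if __name__ == "__main__":
    library = [
        [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)],  # end-to-end 3.0
        [(0, 0, 0), (0, 1, 0), (0, 2, 0), (0, 5, 0)],  # end-to-end 5.0
    ]
    print(find_loop_candidates(library, 4, 3.0))
```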


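Loop modeling: a database approach
The method breaks down for loops longer than 9 residues. [Figure: cRMS (Å) vs. loop length, with the limit at 9 residues.]
Notes: In general, once a loop is longer than 9 amino acids, it becomes harder to model and accuracy drops.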

Predicted model with a long loop
Target: 2bj7A. Predicted model with the long loop: GDT_TS = 45.96. Without the loop: GDT_TS = 60.48.

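Notes: 1. This database stores the possible side-chain conformations (rotamers) observed in known structures, including the observed or expected frequency of each rotamer. Each side-chain conformation is correlated with the backbone dihedral angles or with the neighboring amino acid sequence; this significant correlation lies between the side-chain dihedral angle probabilities (χ1, χ2) and the backbone dihedral angle values (φ, ψ). 2. Energy minimization methods: these rotamer sets are not correlated with the local environment.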
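A backbone-dependent rotamer library exploits the correlation described in note 1. The residue's (φ, ψ) bin is looked up and the most probable rotamer in that bin is taken. The sketch below assumes a hypothetical in-memory table format, not that of any published library. It ignores steric clashes, which real side-chain packers resolve by energy minimization or combinatorial search. `pick_rotamer` is a hypothetical name.

```python
def pick_rotamer(library, residue, phi, psi, bin_size=10):
    """Return the most probable rotamer for a residue given its backbone angles.

    library: hypothetical table mapping (residue, phi_bin, psi_bin) to a
        list of (probability, chi_angles) entries.
    Angles (degrees) are binned down to a multiple of bin_size.
    Returns the chi-angle tuple, or None if the bin is not in the table.
    """
    key = (residue, int(phi // bin_size) * bin_size, int(psi // bin_size) * bin_size)
    candidates = library.get(key)
    if not candidates:
        return None
    return max(candidates, key=lambda entry: entry[0])[1]


if __name__ == "__main__":
    # Illustrative entries only; not real rotamer statistics.
    toy_library = {("VAL", -70, -40): [(0.7, (175.0,)), (0.2, (-60.0,)), (0.1, (60.0,))]}
    print(pick_rotamer(toy_library, "VAL", -65.0, -38.0))
```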

Errors in homology modeling (legend: - target, --- template; panels show true structure, template, model)
a) Side-chain packing; b) distortions and shifts; c) no template.

Errors in homology modeling (continued; panels show true structure, template, model)
d) Misalignments; e) incorrect template. (Marti-Renom et al., 2000)

PROCHECK, Verify3D, Prosa, Anolea, Bala …
Verify3D analyzes the compatibility of an atomic model (3D) with its own amino acid sequence (1D).

PROCHECK: http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html (figure: plot with β and α regions)

Verify3D
Verify3D analyzes the compatibility of an atomic model (3D) with its own amino acid sequence (1D). "We show that an effective test of the accuracy of a 3D protein model is a comparison of the model to its own amino-acid sequence, using a 3D profile, computed from the atomic coordinates of the structure." 3D profiles of correct protein structures match their own sequences with high scores.
The Verify3D method (Eisenberg et al.) assesses protein structures using three-dimensional profiles. Each residue is assigned a structural class based on its location and environment (alpha, beta, loop, polar, apolar, etc.). A database generated from good structures is then used to obtain a score for each of the 20 amino acids in this structural class. The vertical axis of the plot shows the average 3D-1D profile score for each residue over a 21-residue sliding window. Scores range from -1 (bad) to +1 (good). (Luethy et al., 1992)
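The 21-residue sliding-window average plotted by Verify3D can be sketched as below. Near the chain ends, the window is truncated to the residues available. That end handling is one possible convention and an assumption here, not necessarily what Verify3D does. `window_average` is a hypothetical name.

```python
def window_average(scores, window=21):
    """Average per-residue 3D-1D scores over a centered sliding window.

    At the chain ends the window is truncated to the residues that exist
    (an assumed convention).
    """
    half = window // 2
    averaged = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        segment = scores[lo:hi]
        averaged.append(sum(segment) / len(segment))
    return averaged


if __name__ == "__main__":
    raw = [0.4] * 30 + [-0.5] * 10 + [0.4] * 30
    smoothed = window_average(raw)
    # Residues whose smoothed score drops below zero flag a suspect region.
    print([i for i, s in enumerate(smoothed) if s < 0])
```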

ProQ Server
ProQ is a neural-network-based predictor: structural features -> quality of a protein model.
Correct: LGscore > 1.5, MaxSub > 0.1
Good: LGscore > 3, MaxSub > 0.5
Very good: LGscore > 5, MaxSub > 0.8
ProQ is optimized to find correct models, NOT (necessarily) native structures.
Arne Elofsson's group: http://www.sbc.su.se/~bjorn/ProQ/
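The ProQ quality classes can be applied to predicted scores with a simple threshold lookup. This sketch uses the cutoffs listed on the slide and treats them as strict "greater than" tests. Scores below the "correct" cutoff return None, because the slide names no class for them. The constant and function names are hypothetical.

```python
# Cutoffs from the slide, checked from the strictest class downwards.
LG_THRESHOLDS = [(5.0, "very good"), (3.0, "good"), (1.5, "correct")]
MAXSUB_THRESHOLDS = [(0.8, "very good"), (0.5, "good"), (0.1, "correct")]


def quality_class(value, thresholds):
    """Return the first class whose cutoff the value exceeds, else None."""
    for cutoff, label in thresholds:
        if value > cutoff:
            return label
    return None


if __name__ == "__main__":
    print(quality_class(4.2, LG_THRESHOLDS), quality_class(0.85, MAXSUB_THRESHOLDS))
```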

Modeling accuracy (Marti-Renom et al., 2000) (figure)
Notes: 1. Whether a homology-modeling prediction succeeds depends strongly on sequence identity. 2. This plot shows sequence identity against prediction success. 3. Explain the ○ and ● symbols. 4. The figure shows that as sequence identity decreases, both the template-target difference and the alignment errors increase.

Utility of Structural Information (figure)
Notes: Results range from good to poor, and models at different accuracy levels have different uses, such as docking studies.


(PS)2: protein structure prediction server
Principles, workflow, and how to use the tool to predict tertiary structure.

Consensus strategy
The idea of consensus analysis is to gather predictions from a set of different methods. Consensus methods perform significantly better than individual methods. Examples: 3d-shotgun (Fischer D., 2003), 3d-jury (Ginalski K. et al., 2003), Pmodeller (Bjorn W. et al., 2003).
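The core idea behind consensus selection can be sketched as follows: pick the candidate model that agrees most, on average, with all the others. This is a simplification. It assumes a precomputed pairwise structural-similarity matrix, for example GDT- or MaxSub-like scores. The cited methods each define their own pair scoring and weighting. `consensus_pick` is a hypothetical name.

```python
def consensus_pick(similarity):
    """Return the index of the model with the highest mean similarity to the others.

    similarity: square matrix; similarity[i][j] is a structural similarity
        score between candidate models i and j, computed beforehand.
    """
    n = len(similarity)
    if n == 0:
        raise ValueError("no models to choose from")
    if n == 1:
        return 0

    def mean_to_others(i):
        return sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)

    return max(range(n), key=mean_to_others)


if __name__ == "__main__":
    sim = [
        [1.0, 0.8, 0.7, 0.1],
        [0.8, 1.0, 0.9, 0.2],
        [0.7, 0.9, 1.0, 0.1],
        [0.1, 0.2, 0.1, 1.0],  # an outlier model that agrees with nobody
    ]
    print(consensus_pick(sim))
```

Agreement among independent methods is used as a proxy for correctness. An outlier such as model 3 above is never chosen, even if a single method ranked it first.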

Structure prediction by homology modeling
[Figure: workflow, Steps 1 to 4]

Overview of the (PS)2 method
Figure 1. Overview of the protein structure prediction server, (PS)2.
Notes: 1. A brief description of the method we use; the web page is introduced in detail later. 2. Current automated methods modify any one of these steps. 3. Consensus.

Alignment method (figure; labels 4, 2, 3)

http://predictioncenter.org/

CASP3 servers registered:
1. 3D-PSSM (Sternberg) sternber@icrf.icnet.uk
2. Karplus karplus@cse.ucsc.edu
3. frsvr (Fischer) dfischer@cs.bgu.ac.il
4. pscan (Eloffson) arne@bimbo.biokemi.su.se
5. BASIC (Godzik) adam@scripps.edu
6. GenTHREADER jones@globin.bio.warwick.ac.uk
7. Valentina di Francesco valedf@tigr.org
8. TOPITS (Rost) Burkhard.Rost@EMBL-Heidelberg.de
9. Bork

CASP8 servers registered: (figure)

Model Evaluation
Performance evaluation: the 47 CM targets were used to compare performance with the other groups in CASP6, using the GDT_TS score:
$$\mathrm{GDT\_TS} = 100 \times \frac{\mathrm{GDT}_1 + \mathrm{GDT}_2 + \mathrm{GDT}_4 + \mathrm{GDT}_8}{4N}$$
where N is the total number of residues of the target (native structure), and GDT_d is the number of aligned residues whose Cα-atom distance between the target and the predicted model is less than d, with d = 1, 2, 4, or 8 Å.
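The GDT_TS formula translates directly into code once per-residue Cα distances are available. This sketch assumes those distances were already measured under a single, fixed superposition of model and native. The standard GDT calculation (as in LGA) searches over many superpositions to maximize each GDT_d, so this simplified version may underestimate the official score. `gdt_ts` is a hypothetical name.

```python
def gdt_ts(distances, n_target, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS (0-100) from CA distances under one fixed superposition.

    distances: CA-CA distances (angstroms) between model and native for the
        aligned residues; unaligned residues are simply absent.
    n_target: N, the total number of residues in the native structure.
    """
    # GDT_d = number of residues closer than d angstroms.
    counts = [sum(1 for d in distances if d < cutoff) for cutoff in cutoffs]
    return 100.0 * sum(counts) / (len(cutoffs) * n_target)


if __name__ == "__main__":
    print(gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0], n_target=5))
```

Dividing by N, the full native length, rather than by the number of aligned residues is what penalizes models that leave parts of the target unbuilt.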

Figure 3. Comparison of (PS)2 with PSI-BLAST, IMPALA, and T-Coffee in prediction accuracy (global/local GDT_TS scores) on target T0264.

Figure 4. Comparison of (PS)2 models with all automated servers in CASP6 (Top2 = 21).

Table 1. Comparison with the other groups in CASP6
Group     Average GDT_TS
RBTA      65.89
ESYP      64.92
3DJR      63.14
MGTH      62.54
3DJS      61.27
PROS      61.08
PMO5      58.11
PRCM      57.93
PCO5      57.62
PCOB      56.37
Average   37.57
Cases: T0269 with template 1prxA, (PS)2 model: GDT_TS 85.76. T0269 with template 1qq2A, ESYP model: GDT_TS 78.48.

http://ps2.life.nctu.edu.tw
