T-COFFEE, a novel method for combining biological information Cédric Notredame.

1 T-COFFEE, a novel method for combining biological information Cédric Notredame

2 Potential Uses of A Multiple Sequence Alignment? Extrapolation, Motifs/Patterns, Phylogeny, Profiles, Struc. Prediction. Multiple Alignments Are CENTRAL to MOST Bioinformatics Techniques.
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
      ***. :::.:... :.. *. *: *
chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE
      * :.*. :

3 Why Is It Difficult To Compute A Multiple Sequence Alignment? A CROSSROAD PROBLEM. BIOLOGY: What is A Good Alignment? COMPUTATION: What is THE Good Alignment?
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
      ***. :::.:... :.. *. *: *

4 Why Is It Difficult To Compute A Multiple Sequence Alignment? A CIRCULAR PROBLEM: Good Sequences, Good Alignment. BIOLOGY, COMPUTATION.

5 Progressive Alignment: Dynamic Programming Using A Substitution Matrix

6 The T-Coffee Algorithm

7 Progressive Alignment Principle and its Limitations…

8 The Extended Library Principle…

10 The Triplet Assumption SEQ A SEQ B

11 Weighting And Extension. Extension = Using Information from Other Sequences. Weighting = Using The Surrounding Information (Coffee).
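
The extension idea can be sketched in Python. This is a minimal illustration, assuming the weighting scheme described in Notredame, Higgins and Heringa (2000): the extended weight of a residue pair is its direct library weight plus, for every third residue linked to both, the smaller of the two connecting weights. The name `extend_library` and the residue encoding `(sequence_name, position)` are hypothetical choices made for this example, not part of the T-Coffee code.

```python
from collections import defaultdict
from itertools import combinations


def extend_library(primary):
    """Return extended weights for residue pairs.

    primary maps frozenset({res_a, res_b}) -> weight, where each residue
    is a (sequence_name, position) tuple (an assumed encoding).
    """
    # For each residue, record the residues it is paired with and the weight.
    neighbours = defaultdict(dict)
    for pair, weight in primary.items():
        a, b = tuple(pair)
        neighbours[a][b] = weight
        neighbours[b][a] = weight

    extended = defaultdict(float)
    # Direct evidence: the pair's own library weight.
    for pair, weight in primary.items():
        extended[pair] += weight
    # Indirect evidence: a and b are both linked to an intermediate residue c.
    for c, linked in neighbours.items():
        for a, b in combinations(linked, 2):
            if a[0] == b[0]:
                continue  # two residues of the same sequence cannot be aligned
            extended[frozenset((a, b))] += min(linked[a], linked[b])
    return dict(extended)


A, B, C = ("SeqA", 0), ("SeqB", 0), ("SeqC", 0)
library = {
    frozenset((A, B)): 88,
    frozenset((A, C)): 77,
    frozenset((B, C)): 100,
}
extended = extend_library(library)
print(extended[frozenset((A, B))])
```

With these made-up weights, the pair (SeqA, SeqB) gains the support routed through SeqC, which is the "information from other sequences" named on the slide.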

12 T-Coffee Progressive Alignment (Notredame, Higgins, Heringa, 2000): Dynamic Programming Using The Extended Library
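
A rough Python sketch of pairwise dynamic programming driven by library weights instead of a substitution matrix. It assumes gap penalties of zero, so the score is simply the sum of the weights of the aligned residue pairs; the function `align_pair` and the `weight(i, j)` callback are hypothetical names for this illustration and do not reproduce the actual T-Coffee implementation.

```python
def align_pair(len_a, len_b, weight):
    """Align two sequences of the given lengths.

    weight(i, j) returns the library weight for residue i of the first
    sequence and residue j of the second (0 when the library has no entry).
    Gaps cost nothing in this sketch.
    """
    score = [[0.0] * (len_b + 1) for _ in range(len_a + 1)]
    for i in range(1, len_a + 1):
        for j in range(1, len_b + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + weight(i - 1, j - 1),  # align i with j
                score[i - 1][j],                             # gap in second sequence
                score[i][j - 1],                             # gap in first sequence
            )

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = len_a, len_b
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + weight(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[len_a][len_b], pairs


# Made-up library weights for two sequences of three residues each.
weights = {(0, 0): 10, (1, 2): 5, (2, 1): 4}
total, pairs = align_pair(3, 3, lambda i, j: weights.get((i, j), 0))
print(total, pairs)
```

In a progressive scheme the same recurrence would be applied to groups of already aligned sequences; the sketch covers only the two-sequence case.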

13 Mixing Local and Global Alignments: Local Alignment, Global Alignment, Extension, Multiple Sequence Alignment

14 What is a library? Extension + T-Coffee: Library Based Multiple Sequence Alignment
Example library with two sequences:
    2
    Seq1 MySeq
    Seq2 MyotherSeq
    #1 2
    1 1 25
    3 8 70
    ….
Example library with three sequences:
    3
    Seq1 anotherseq
    Seq2 atsecondone
    Seq3 athirdone
    #1 2
    1 1 25
    #1 3
    3 8 70
    ….

15 Validating T-Coffee

16 What Is BaliBase? BaliBase is a collection of reference Multiple Alignments. The Structures of the Sequences are known and were used to assemble the MALN. Evaluation is carried out by Comparing the Structure Based Reference Alignment With its Sequence Based Counterpart.

17 BaliBase DALI, Sap …  Method X Comparison

18 Validation Using BaliBase T-Coffee Results

19 Validation Using BaliBase

24 Choosing The Right Method (MAFFT evaluation)

26 Taking T-Coffee Further: Using Structures

27 Mixing Heterogeneous Information With T-Coffee: Local Alignment, Global Alignment, Multiple Alignment, Structural, Specialist; Multiple Sequence Alignment

28 Why Do We Want To Mix Sequences and Structures? Sequences are Cheap and Common. Structures are Expensive and Rare. STRUCTURE → FUNCTION. We WANT to use Structural information in multiple alignments: to help the alignment, and to extrapolate from Structures to Sequences.

29 Helping an Alignment With Structures? Better gap penalties (ClustalW): low gap penalties, high gap penalties.

30 Helping an Alignment With Structures? Better gap penalties (ClustalW). Revealing Very Distant Relationships: 1hstA, 1tc3c.

31 Is It Possible to Use Structural Information? Any_pair, THE new T-Coffee method: Struct Vs Struct (SAP), Seq Vs Struct (FUGUE), Seq Vs Seq (Local, Global). Evaluation on HOMSTRAD.

32 Is It Possible to Use Structural Information? Validation of Any_pair on the HOMSTRAD Database (Orla O’Sullivan, Des Higgins and C. Notredame)
    Data        Method      Result
    Seq         CW          35.2 %
    Seq         TC          38.4 %
    1 Struc     TC+FU       41.9 %
    2 Struc     TC+SA       41.8 %
    2 Struc     TC+SA+FU    51.7 %
    ALL Struc   TC+SA       66.7 %
CW: Clustal W. TC: T-Coffee default. SA: T-Coffee using SAP. FU: T-Coffee using FUGUE.
Result: % of columns correctly aligned as judged from the HOMSTRAD reference alignment.
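
The measure "% of columns correctly aligned" can be illustrated with a short Python sketch. It is one plausible reading of that measure, not the evaluation script used for the figures on the slide: a reference column counts as correct when the test alignment contains a column pairing exactly the same residues. The helper names `columns` and `column_score` are hypothetical, and both alignments are assumed to list the same sequences in the same order with `-` as the gap character.

```python
def columns(alignment):
    """Turn gapped rows into columns of residue indices (None for a gap)."""
    counters = [0] * len(alignment)
    cols = []
    for k in range(len(alignment[0])):
        col = []
        for s, row in enumerate(alignment):
            if row[k] == "-":
                col.append(None)
            else:
                col.append(counters[s])  # index of this residue in the ungapped sequence
                counters[s] += 1
        cols.append(tuple(col))
    return cols


def column_score(reference, test):
    """Fraction of reference columns reproduced exactly in the test alignment."""
    ref_cols = columns(reference)
    test_cols = set(columns(test))
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)


reference = ["AC-G", "ACTG"]
test = ["ACG-", "ACTG"]
print(column_score(reference, test))
```

Benchmark suites usually restrict the count to trusted regions of the reference; that refinement is left out here.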

33 Of the Importance of being Trustworthy… Identifying Good Bits in an Alignment

34 How Good Is my Alignment?
cah2_human NGPEHWHK-DFPIAKGERQSPVDIDTHTAKYDP------------SLKPLSVS--YDQAT
cahp_mouse --GVEWGL-VFPDANGEYQSPINLNSREARYDP------------SLLDVRLSPNYVVCR
cah4_rat   SGPEQWTG----DCKKNQQSPINIVTSKTKLNP------------SLTPFTFVG-YDQKK
ptpg_mouse YGPEHWVT-SSVSCGGSHQSPIDILDHHARVGD------------EYQELQLDG-FDNES
cah6_human LDEAHWPQ-HYPACGGQRQSPINLQRTKVRYNP------------SLKGLNMTGYETQAG
cah_dunsa  -VGFDWTGGVCVNTGTSKQSPINIETDSLAEESERLGTADDTSRLALKGLLSS--SYQLT
cahh_varv  --------------MSQQLSPINIETKKAISNA------------RLKPLNIH--YNESK
cah2_chlre EGKDGAG-NPWVCKTGRKQSPINVPQYHVLDGK------------GSK--IATGLQTQWS
           **:::
cah2_human ---------SLRILNNGHAFNVEFDD-SQDKAVLK--------------------GGPLD
cahp_mouse ---------DCEVTNDGHTIQVILKS----KSVLS--------------------GGPLP
cah4_rat   ---------KWEVKNNQHSVEMSLGE----DIYIF--------------------GGDLP
ptpg_mouse SN-------KTWMKNTGKTVAILLKD----DYFVS--------------------GAGLP
cah6_human ---------EFPMVNNGHTVQIGLPS----TMRMT--------------------VAD-G
cah_dunsa  ---------SEVAINLEQDMQFSFNAPDEDLPQLT--------------------IGGVV
cahh_varv  ---------PTTIQNTGKLVRINFKG-----GYLS--------------------GGFLP
cah2_chlre YPDLMSNGSSVQVINNGHTIQVQWTY----DYAGHATIAIPAMRNQSNRIVDVLEMRPND
           * :..
cah2_human G----TYRLIQFHFHWGSLD--GQGSEHTVDKKKYAAELHLVHWNTK-YGDFGKAVQQPD
cahp_mouse Q--GQEFELYEVRFHWGREN--QRGSEHTVNFKAFPMELHLIHWNSTLFGSIDEAVGKPH
cah4_rat   T----QYKAIQLHLHWSEES--NKGSEHSIDGKHFAMEMHVVHKKMTTGDKVQDSDSKD-
ptpg_mouse G----RFKAEKVEFHWGHSNG-SAGSEHSVNGRRFPVEMQIFFYNPDDFDSFQTAISENR
cah6_human I----VYIAQQMHFHWGGASSEISGSEHTVDGIRHVIEIHIVHYNS-KYKTYDIAQDAPD
cah_dunsa  H----TFKPVQIHFH-------HFASEHAIDGQLYPLEAHMVMASQN-DGS--------D
cahh_varv  N----EYVLSSLHIYWGKED--DYGSNHLIDVYKYSGEINLVHWNKKKYSSYEEAKKHDD
cah2_chlre ASDRVTAVPTQFHFH--------STSEHLLAGKIFPLELHIVHKVTD---KLEACKG--G
           ...:: *:* :. * ::.

35 Measuring The Local Reliability: CORE Measure of Reliability
cah2_human NGPEHWHK-DFPIAKGERQSPVDIDTHTAKYDPSLKPLSVS
cahp_mouse --GVEWGL-VFPDANGEYQSPINLNSREARYDPSLLDVRLS
cah4_rat   SGPEQWTG----DCKKNQQSPINIVTSKTKLNPSLTPFTFV
ptpg_mouse YGPEHWVT-SSVSCGGSHQSPIDILDHHARVGDEYQELQLD
cah6_human LDEAHWPQ-HYPACGGQRQSPINLQRTKVRYNPSLKGLNMT
$\mathrm{Core}(Q) = \dfrac{\sum_{x} \mathrm{Escore}(Q, x)}{N \cdot \mathrm{Max\ Escore}}$

36 CORE index Specificity (  ) and Sensitivity (  ) 0.48

37 What is the Local Quality of my Alignment II I

38 Using Consistency For Automatic Annotation?
T-COFFEE, Version_1.24(Wed Nov 15 18:31:29 PST 2000)
Notredame, Higgins, Heringa, JMB(302)pp205-217,2000
CPU TIME:11 sec.
SCORE=39
*
 BAD AVG GOOD
*
cah2_human : 42
cah4_rat   : 41
cah6_human : 40
cahp_mouse : 43
cah_dunsa  : 33
cah2_human 77664444-454555557666665554444444------------33322222-
cah4_rat   54553332----233445655555554444444------------443323221
cah6_human 44333443-333344445555444444444444------------444433331
cahp_mouse --633453-333345565554444334444455------------555444331
cah_dunsa  -34334320212223456555555543333333ERLGTADDTSRL22222111-
cah2_chlre 7663333-0333334566666555444343322------------222--1110
ptpg_mouse 67763343-333334445444433333333333------------332222221
cahh_varv  --------------5555555555554444433------------33322211-
Cons       655433430333334455555554444444443------------333322221
cah2_human -11121---------22223334333322321-00011222-------------
cah4_rat   -22222---------23333344443344442----22222-------------
cah6_human 001122---------22233344333333433----22222-------------
cahp_mouse 022333---------34344455554444543----33334-------------
cah_dunsa  -11111---------11111111111111110P00000111-------------
cah2_chlre 00000000DLMSNGS11223333333433332----22111ATIAIPAMRNQSN
ptpg_mouse -1111100-------12234445444544433----33333-------------
cahh_varv  -11222---------22233333333333322-----1122-------------
Cons       01112100-------22233334333333332-00022222-------------

39 Evaluating An Alignment Not Generated With T-Coffee: t_coffee -infile CLUSTALW_ALN -in Library -do_score

40 Running T-Coffee ONLINE

41 WHERE ? Cedric.notredame@europe.com igs-server.cnrs-mrs.fr/~cnotred igs-server.cnrs-mrs.fr/Tcoffee

42 The T-Coffee Server

44 ES45, 4Proc 1 Gb RAM

45 T-Coffee Server HP/Compaq-ES45/4-2G

46 The T-Coffee Server

47 Data Input

48 The Right Parameters

49 The T-Coffee Server

50 Evaluating An Alignment

52 The T-Coffee Server

59 Future…

60 Large Scale…

61 Tailor Made…

63 WHO? WHO Uses T-Coffee? Dali Domain Dictionary, Pfam, SwissProt. WHO Makes T-Coffee? Cédric Notredame, Des Higgins, Chantal Abergel, Olivier Poirot, Orla O’Sullivan.

