3D-COFFEE Mixing Sequences and Structures Cédric Notredame.


1 3D-COFFEE Mixing Sequences and Structures Cédric Notredame

2 Potential Uses of a Multiple Sequence Alignment?
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
      ***. :::.:... :.. *. *: *
chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE
      * :.*. :
Extrapolation, Motifs/Patterns, Phylogeny, Profiles, Structure Prediction
Multiple Alignments Are CENTRAL to MOST Bioinformatics Techniques.

3 Why Is It Difficult To Compute A Multiple Sequence Alignment?
A CROSSROAD PROBLEM
BIOLOGY: What is A good alignment?
COMPUTATION: What is THE good alignment?
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
      ***. :::.:... :.. *. *: *

4 Why Is It Difficult To Compute A Multiple Sequence Alignment?
A CIRCULAR PROBLEM: Good Sequences (BIOLOGY) ↔ Good Alignment (COMPUTATION)

5 The T-Coffee Algorithm

6 Mixing Local and Global Alignments
Local Alignment + Global Alignment → Extension → Multiple Sequence Alignment

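7 What Is a Library?
Extension + T-Coffee: Library-Based Multiple Sequence Alignment
Each line under a #i j header gives a residue of sequence i, a residue of sequence j, and the weight of that residue pair.
Example with two sequences:
2
Seq1 MySeq
Seq2 MyotherSeq
#1 2
1 1 25
3 8 70
…
Example with three sequences:
3
Seq1 anotherseq
Seq2 atsecondone
Seq3 athirdone
#1 2
1 1 25
#1 3
3 8 70
…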
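A library like the one above can be read mechanically. Below is a minimal parsing sketch in Python, assuming exactly the simplified layout shown on the slide: a sequence count, one "name sequence" line per sequence, then #i j headers followed by "residue_i residue_j weight" lines. The real T-Coffee library format carries more header information, and parse_library is a hypothetical helper, not part of T-Coffee.

```python
from collections import defaultdict


def parse_library(text):
    """Parse the simplified library layout shown on slide 7 (hypothetical helper).

    Returns (names, weights): weights maps ((seq_i, res_i), (seq_j, res_j))
    to the summed weight of that residue pair, with 1-based indices.
    """
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    n_seq = int(rows[0][0])
    names = [fields[0] for fields in rows[1:1 + n_seq]]
    weights = defaultdict(int)
    pair = None
    for fields in rows[1 + n_seq:]:
        if fields[0].startswith("#"):
            # Header such as "#1 2": the lines below pair sequence 1 with sequence 2.
            pair = (int(fields[0][1:]), int(fields[1]))
        else:
            res_i, res_j, weight = map(int, fields)
            weights[((pair[0], res_i), (pair[1], res_j))] += weight
    return names, dict(weights)


library = """
3
Seq1 anotherseq
Seq2 atsecondone
Seq3 athirdone
#1 2
1 1 25
#1 3
3 8 70
"""
names, weights = parse_library(library)
print(names)    # ['Seq1', 'Seq2', 'Seq3']
print(weights)  # {((1, 1), (2, 1)): 25, ((1, 3), (3, 8)): 70}
```

The "…" elisions from the slide are left out of the example input, since they stand for lines not shown.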

8 The Triplet Assumption
If X aligns with Z and Z aligns with Y, then X should align with Y (X in SEQ A, Y in SEQ B).
Consistency / Consensus
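The triplet assumption is what T-Coffee's library extension exploits. A simplified sketch in Python, following the rule described in the original T-Coffee paper: the extended weight of a pair (x, y) is its direct weight plus, for every residue z of a third sequence aligned to both, min(W(x, z), W(z, y)). Naming residues as (sequence, position) tuples and the function name extend_library are illustrative assumptions; the actual T-Coffee implementation may differ in details.

```python
from collections import defaultdict


def extend_library(weights):
    """Consistency-based extension of a pairwise residue library (sketch).

    weights maps (res_a, res_b) -> weight, where a residue is a
    (sequence, position) tuple. Returns a new dict keyed by sorted pairs.
    """
    # Symmetric neighbour view: residue -> {aligned residue: weight}.
    neighbours = defaultdict(dict)
    for (a, b), w in weights.items():
        neighbours[a][b] = neighbours[a].get(b, 0) + w
        neighbours[b][a] = neighbours[b].get(a, 0) + w

    extended = defaultdict(int)
    # Direct support: every pair keeps its primary weight.
    for (a, b), w in weights.items():
        extended[tuple(sorted((a, b)))] += w

    # Indirect support: x-z and z-y imply x-y (the triplet assumption).
    for z, partners in neighbours.items():
        items = list(partners.items())
        for k, (x, w_xz) in enumerate(items):
            for y, w_zy in items[k + 1:]:
                if len({x[0], y[0], z[0]}) == 3:  # three different sequences
                    extended[tuple(sorted((x, y)))] += min(w_xz, w_zy)
    return dict(extended)


# X = residue 5 of SEQ A, Y = residue 4 of SEQ B, Z = residue 7 of a third sequence C.
primary = {(("A", 5), ("C", 7)): 80, (("C", 7), ("B", 4)): 60}
extended = extend_library(primary)
print(extended[(("A", 5), ("B", 4))])  # 60: X-Y is supported through Z
```

The pair X-Y was absent from the primary library; it enters the extended library only because both residues agree with Z.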

9 ClustalW / T-Coffee

10 Progressive Alignment: Dynamic Programming Using an Extended Library
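In the progressive step, the extended library takes the place of the substitution matrix. A minimal pairwise sketch in Python: Needleman-Wunsch where the score for aligning residue i with residue j is read from the extended library. The zero gap cost and the name align_with_library are simplifying assumptions, and real progressive alignment aligns profiles (groups of sequences), not only two sequences.

```python
def align_with_library(len_a, len_b, score):
    """Needleman-Wunsch driven by library weights instead of a substitution matrix.

    score(i, j) returns the extended-library weight for residue i of A and
    residue j of B (1-based). Gaps cost nothing in this sketch.
    Returns the aligned residue pairs in order.
    """
    F = [[0] * (len_b + 1) for _ in range(len_a + 1)]
    for i in range(1, len_a + 1):
        for j in range(1, len_b + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(i, j), F[i - 1][j], F[i][j - 1])

    # Trace back from the bottom-right cell, keeping only supported pairs.
    pairs, i, j = [], len_a, len_b
    while i > 0 and j > 0:
        if score(i, j) > 0 and F[i][j] == F[i - 1][j - 1] + score(i, j):
            pairs.append((i, j))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


extended = {(1, 1): 25, (2, 2): 10, (3, 2): 70}  # hypothetical weights
print(align_with_library(3, 2, lambda i, j: extended.get((i, j), 0)))
# [(1, 1), (3, 2)]
```

The weak pair (2, 2) loses to the strongly supported (3, 2), since both cannot be kept in one alignment.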

11 What Is BaliBase?
How Good Is T-Coffee???
Best-performing method on MSA benchmark datasets:
-BaliBase: Notredame; Sonnhammer
-Ribosomal RNA: Katoh (MAFFT)
-HOMSTRAD: Notredame
-OxBench: Barton

12 Mixing Heterogeneous Data With T-Coffee
Local Alignment, Global Alignment, Multiple Sequence Alignment, Multiple Alignment, Structural, Specialist

13 Mixing Sequences and Structures

14 Why Do We Want To Mix Sequences and Structures?
1- Predicting Sequence Structures: STRUCTURE → FUNCTION

15 Why Do We Want To Mix Sequences and Structures? Sequences are Cheap and Common. Structures are Expensive and Rare.

16 Why Do We Want To Mix Sequences and Structures?
Cheapest structure determination: sequence-structure alignment (THREAD or ALIGN)
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN

17 Why Do We Want To Mix Sequences and Structures?
THREAD or ALIGN
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Convincing alignment → same fold

18 Why Do We Want To Mix Sequences and Structures?
Convincing alignment → same fold. But distant sequences are hard to align.

19 Why Do We Want To Mix Sequences and Structures?
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
      ***. :::.:... :.. *. *: *
Multiple sequence alignments help explore the twilight zone.

20 Why Do We Want To Mix Sequences and Structures?
1- Predicting Sequence Structures
2- Producing Better Alignments

21 Why Do We Want To Mix Sequences and Structures?
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
ALIGN: unreliable alignment if %ID < 30%
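The 30% threshold refers to percent identity (%ID) between the two aligned sequences. A small Python sketch of how it can be measured on the pair above, assuming %ID is taken over gap-free columns; other definitions, for example dividing by the length of the shorter sequence, give different numbers.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity over aligned columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


pid = percent_identity("ADKPRRP---LS-YMLWLN", "ADKPKRPKPRLSAYMLWLN")
print(f"{pid:.1f}%")  # 93.3% (14 identities over 15 gap-free columns)
```

This toy pair is far above the 30% threshold; the same measure applied to distant homologues would fall into the unreliable range.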

22 Why Do We Want To Mix Sequences and Structures?
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Structure superposition: alignment insensitive to %ID. Folds evolve more slowly than sequences.

23 Why Do We Want To Mix Sequences and Structures?

24 Structure Superposition

25 Why Do We Want To Mix Sequences and Structures?
1- Predicting Sequence Structures
2- Producing Better Alignments

26 How To Mix Sequences and Structures

27 Mixing Heterogeneous Data With T-Coffee
Local Alignment, Global Alignment, Multiple Sequence Alignment, Multiple Alignment, Structural, Specialist

28 Mixing Sequences and Structures with T-Coffee
Seq vs Seq: Local, Global
Seq vs Struct: Thread
Struct vs Struct: Superpose
Evaluation on HOMSTRAD

29 The 3D-Coffee Libraries
Methods:
Global: Needleman and Wunsch
Local: Sim (lalign)
Threading: Fugue
Superposition: SAP
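Each method above contributes its own library of weighted residue pairs, and these libraries are then combined into one. A sketch in Python, assuming the combination rule described in the original T-Coffee paper, where a residue pair proposed by several sources receives the sum of its weights; merge_libraries and the weights in the example are hypothetical.

```python
from collections import Counter


def merge_libraries(*libraries):
    """Combine pairwise libraries produced by different methods.

    Each library maps a residue pair to a weight; a pair proposed by several
    methods receives the sum of its weights, so agreement is rewarded.
    """
    merged = Counter()
    for library in libraries:
        for pair, weight in library.items():
            merged[pair] += weight
    return dict(merged)


# Hypothetical weights for the same sequence pair from two methods.
global_lib = {((1, 1), (2, 1)): 40, ((1, 2), (2, 3)): 30}
sap_lib = {((1, 1), (2, 1)): 90}
merged = merge_libraries(global_lib, sap_lib)
print(merged)  # {((1, 1), (2, 1)): 130, ((1, 2), (2, 3)): 30}
```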

30 Threading: Fugue

31 Threading: Fugue

32 Threading: Fugue
1- Turn the sequence of known structure into a profile:
-lower gap penalties in loops
-structure-specific substitution matrix
2- Align the profile with the query sequence
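The two ingredients above, structure-specific scores and cheaper gaps in loops, can be illustrated with a toy profile alignment. This Python sketch is not FUGUE's scoring scheme: the match function stands in for a structure-specific substitution table, gap costs are linear and depend only on the secondary-structure label of the structure position, and align_to_structure, the labels and all values are hypothetical.

```python
def align_to_structure(query, states, match, gap_cost):
    """Global alignment score of a query sequence against a structure profile.

    states:   one label per structure position ("H" helix, "E" strand, "C" loop).
    match:    match(pos, aa) -> score for residue aa at structure position pos;
              stands in for a structure-specific substitution table.
    gap_cost: label -> linear cost of a gap at a position with that label,
              so loops can be made cheaper to gap than helices or strands.
    """
    n, m = len(states), len(query)
    NEG = float("-inf")
    F = [[NEG] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # query residue j-1 placed on structure position i-1
                F[i][j] = max(F[i][j], F[i - 1][j - 1] + match(i - 1, query[j - 1]))
            if i > 0:  # structure position i-1 has no query residue
                F[i][j] = max(F[i][j], F[i - 1][j] - gap_cost[states[i - 1]])
            if j > 0:  # query residue j-1 inserted after structure position i-1
                state = states[max(i - 1, 0)]
                F[i][j] = max(F[i][j], F[i][j - 1] - gap_cost[state])
    return F[n][m]


template = "LLGGLL"  # hypothetical residues of the structure
states = "HHCCHH"    # helix, helix, loop, loop, helix, helix


def match(pos, aa):
    return 2.0 if aa == template[pos] else -1.0


cheap_loops = {"H": 3.0, "E": 3.0, "C": 0.5}
flat = {"H": 3.0, "E": 3.0, "C": 3.0}
print(align_to_structure("LLLL", states, match, cheap_loops))  # 7.0
print(align_to_structure("LLLL", states, match, flat))         # 2.0
```

In both cases the best alignment deletes the two loop positions; with loop-specific penalties that deletion costs much less, which is the behaviour the slide describes.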

33 Evaluating Fugue
1- Select 967 pairs of sequences in HOMSTRAD.
2- Align each pair with T-Coffee and Fugue.
3- Compare the two alignments.
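The slides do not spell out how the alignments are compared. One common choice, assumed here, is the percentage of residue pairs aligned in the reference (HOMSTRAD) alignment that the test alignment reproduces. A Python sketch with hypothetical helper names, using the toy pair from the earlier slides as a stand-in reference:

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Ungapped residue-index pairs (0-based) aligned by a pairwise alignment."""
    pairs, i, j = set(), 0, 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def pair_accuracy(test, reference):
    """Percentage of the reference's aligned pairs reproduced by the test alignment."""
    ref = aligned_pairs(*reference)
    return 100.0 * len(ref & aligned_pairs(*test)) / len(ref)


reference = ("ADKPRRP---LS-YMLWLN", "ADKPKRPKPRLSAYMLWLN")
test = ("ADKPRRPLSYMLWLN----", "ADKPKRPKPRLSAYMLWLN")
print(round(pair_accuracy(test, reference), 1))  # 46.7
```

Only the first seven pairs agree: the test alignment omits the insertion, so every later residue is shifted relative to the reference.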

34 Threading: Fugue
1- Select 967 pairs of sequences in HOMSTRAD.
2- Align each pair with T-Coffee and Fugue.
3- Compare the two alignments.
[Plot: TCdef wins / Fugue wins]
TCdef: 58.81%  Fugue: 61.81%

35 Superposition: SAP

36 Superposition: SAP

37 1- High-Level Dynamic Programming (substitution matrix, as when doing regular alignments)
2- Low-Level DP: forcing the alignment of two residues

38 Superposition: SAP
1- High-Level Dynamic Programming
2- Low-Level DP: forcing the alignment of two residues
3- Rigid-Body Superposition (RMSD)


40 Superposition: SAP
1- High-Level Dynamic Programming
2- Low-Level DP: evaluate every pair
3- Rigid-Body Superposition

41 Superposition: SAP
1- High-Level Dynamic Programming: make a DP on the accumulated traces, using the traces like a substitution matrix → structure-based sequence alignment
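The high-level step, accumulating the low-level traces and using the sum like a substitution matrix, can be sketched in Python. The low-level paths and their scores are hypothetical inputs here; in SAP they come from comparing the structural environments of residues, which this sketch does not model, and the zero gap cost is a simplification.

```python
def accumulate_traces(n, m, low_level_paths):
    """Sum low-level alignment traces into an n x m matrix (0-based residues).

    low_level_paths: iterable of (path, score); each path is the list of (i, j)
    pairs of one low-level alignment forced through a given residue pair.
    """
    acc = [[0.0] * m for _ in range(n)]
    for path, score in low_level_paths:
        for i, j in path:
            acc[i][j] += score
    return acc


def high_level_dp(acc):
    """Needleman-Wunsch using the accumulated traces as a substitution matrix
    (zero gap cost for brevity). Returns the aligned (i, j) pairs."""
    n, m = len(acc), len(acc[0])
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + acc[i - 1][j - 1], F[i - 1][j], F[i][j - 1])
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if acc[i - 1][j - 1] > 0 and F[i][j] == F[i - 1][j - 1] + acc[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


# Hypothetical low-level traces for two 3-residue structures.
paths = [
    ([(0, 0), (1, 1), (2, 2)], 5.0),
    ([(0, 0), (1, 2)], 2.0),
    ([(0, 1), (1, 1), (2, 2)], 1.0),
]
acc = accumulate_traces(3, 3, paths)
print(high_level_dp(acc))  # [(0, 0), (1, 1), (2, 2)]
```

Pairs that many low-level alignments agree on accumulate the largest values, so the final DP follows the consensus of the traces.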

42 Superposition: SAP
1- Select 967 pairs of sequences in HOMSTRAD.
2- Align each pair with T-Coffee and SAP.
3- Compare the two alignments.

43 Superposition: SAP
1- Select 967 pairs of sequences in HOMSTRAD.
2- Align each pair with T-Coffee and SAP.
3- Compare the two alignments.
TCdef: 58.81%  SAP: 86.31%

44 SAP / Fugue
TCdef: 58.81%  Fugue: 61.81%
TCdef: 58.81%  SAP: 86.31%

45 Sequences and Structures: How Good Is the Mixture???

46 Our Benchmark: HOM39
-HOMSTRAD: structure-based MSAs that can be used as references.
-COMPACT and DEMANDING.
-HOM39: the 39 most difficult datasets (percent identity lower than 25%).
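The HOM39 selection rule can be sketched in Python, assuming each HOMSTRAD family is summarised by a single percent-identity value (for instance, its average pairwise identity). The function name, the family names and the values in the example are hypothetical.

```python
def select_hardest(identity_by_family, max_identity=25.0, k=39):
    """Return up to k families below max_identity, lowest identity first."""
    hard = sorted(
        (identity, name)
        for name, identity in identity_by_family.items()
        if identity < max_identity
    )
    return [name for _, name in hard[:k]]


toy = {"family_a": 18.2, "family_b": 31.0, "family_c": 22.5, "family_d": 24.9}
print(select_hardest(toy, k=2))  # ['family_a', 'family_c']
```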

47 Our Benchmark: Using HOM39
BENCHMARKING strategy:
-Re-align HOM39 without using ALL the structures.
-Compare the result with the reference.

48 Evaluating 3D-Coffee
1- Can a SINGLE structure help?

49 Using ONE Structure with 3D-Coffee
Seq vs Seq: Local, Global
Seq vs Struct: Thread
Evaluation on HOM39, with ONE structure per MSA

50

51

52

53 Evaluating 3D-Coffee
1- Can a SINGLE structure help?
2- Does it benefit ALL the sequences? Is EVERYONE happier if there is a STAR in the team…

54 BaliBase, HOM39 → TC-Fugue + remove provided structure(s) → comparison
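One way to check whether a structure helps the other sequences is to drop the sequences whose structure was provided before scoring against the reference, so that any gain is measured on the sequence-only members of the alignment. A Python sketch, assuming alignments stored as name-to-gapped-row dictionaries and the pair-based accuracy measure; all names and sequences are hypothetical.

```python
from itertools import combinations


def msa_pairs(msa, exclude=(), gap="-"):
    """Aligned residue pairs ((name_a, i), (name_b, j)) of an MSA, skipping excluded sequences."""
    names = sorted(name for name in msa if name not in exclude)
    pairs = set()
    for a, b in combinations(names, 2):
        i = j = 0
        for x, y in zip(msa[a], msa[b]):
            if x != gap and y != gap:
                pairs.add(((a, i), (b, j)))
            i += x != gap  # advance residue counters only on non-gap characters
            j += y != gap
    return pairs


def accuracy_without(test, reference, with_structure):
    """Percentage of reference pairs recovered, ignoring sequences that had a structure."""
    ref = msa_pairs(reference, with_structure)
    return 100.0 * len(ref & msa_pairs(test, with_structure)) / len(ref)


reference = {"s1": "AC-D", "s2": "ACED", "s3": "A-ED"}
test = {"s1": "ACD-", "s2": "ACED", "s3": "AE-D"}
print(round(accuracy_without(test, reference, {"s1"}), 1))  # 66.7
```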

55

56 Evaluating 3D-Coffee
1- Can a SINGLE structure help?
2- Does it benefit ALL the sequences?
3- Can we use two or more structures?

57 Mixing Sequences and Structures with 3D-Coffee
Seq vs Seq: Local, Global
Seq vs Struct: Fugue
Struct vs Struct: SAP, LSQ
Evaluation on HOMSTRAD: HOM39 with TWO structures per MSA

58 Indirect Improvement Direct Improvement

59

60 Evaluating 3D-Coffee
1- Can a SINGLE structure help?
2- Does it benefit ALL the sequences?
3- Can we use TWO structures?
4- What is the relation between accuracy and the number of structures???

61 Mixing Sequences and Structures with T-Coffee
Seq vs Seq: Local, Global
Seq vs Struct: Fugue
Struct vs Struct: SAP
Evaluation on HOMSTRAD: HOM39 with 1 to N structures per MSA

62

63 Induced Improvement

64 Conclusion

65 -Structures Help BUT NOT SO MUCH

66 The More Structures The Merrier

67

68 Credits
Orla O’Sullivan: University College, Cork, Ireland
Des Higgins: University College, Cork, Ireland
Karsten Suhre: IGS-CNRS, Marseille, France

69 Conclusion The program is available on request from: cedric.notredame@europe.com

70

