1 Centre for Integrative Bioinformatics VU. Lecture 6 – 07/01/08. Multiple sequence alignment 2. Sequence analysis 2007. Optimizing progressive multiple alignment methods

2 Pair-wise alignment quality versus sequence identity (Vogt et al., JMB 249, 816-831, 1995)

3 Do you remember the steps of the progressive alignment algorithm?

4 Remember the greediness of the progressive alignment algorithm? [Figure: guide tree with leaves 1, 3, 2, 5 and a root; the sequences WCVVAYW, WCVVAYF, WCMVLYW and WCMVLYF are aligned step by step along the tree, followed by the question of how WCIVALYW should be added.]

5 Strategies for progressive alignment optimization: heuristics; profile pre-processing; secondary structure-guided alignment; globalised local alignment; matrix extension. Objective: try to avoid (early) errors.

6 Clustal, ClustalW, ClustalX. Probably the best-known progressive alignment program. Uses the Neighbour Joining (NJ) algorithm (Saitou and Nei, 1987) to construct the guide tree; NJ does not require that all lineages have diverged by equal amounts. Sequence blocks are represented by profiles. Individual sequences are additionally weighted according to the branch lengths in the NJ tree. (Higgins et al. 1994)
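
The branch-length weighting can be sketched in a few lines of Python. This is a minimal sketch, assuming the scheme usually described for ClustalW: each branch length is shared equally among the sequences below that branch, and a sequence's weight is the sum of its shares from the root down to its leaf. The tree encoding (a leaf is a name, an internal node is a list of (subtree, branch length) pairs) and the numbers are hypothetical, not taken from the lecture.

```python
def leaf_names(node):
    """Return the names of all leaves below a node."""
    if isinstance(node, str):
        return [node]
    names = []
    for child, _length in node:
        names.extend(leaf_names(child))
    return names


def sequence_weights(tree):
    """Weight each leaf by its share of every branch on the path from the root."""
    weights = {}

    def walk(node, inherited):
        if isinstance(node, str):
            weights[node] = inherited
            return
        for child, length in node:
            # A branch shared by n sequences contributes length / n to each.
            share = length / len(leaf_names(child))
            walk(child, inherited + share)

    walk(tree, 0.0)
    return weights


# Hypothetical rooted guide tree: A and B are close relatives, C is distant.
toy_tree = [
    ([("A", 0.1), ("B", 0.1)], 0.2),
    ("C", 0.5),
]
toy_weights = sequence_weights(toy_tree)
```

In this toy tree the two similar sequences A and B end up with lower weights than the divergent sequence C, which is the intended effect: near-duplicate sequences should not dominate a profile.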

7 Clustal, ClustalW, ClustalX. Further carefully crafted heuristics include: local gap penalties; automatic selection of the amino acid substitution matrix; automatic gap penalty adjustment; a mechanism to delay alignment of sequences that appear to be distant at the time they are considered. Clustal (W/X) does not allow iteration (Hogeweg and Hesper, 1984; Corpet, 1988; Gotoh, 1996; Heringa, 1999, 2002). Still limited by the “greedy” nature of the progressive algorithm.

8 ClustalW web-interface

9 Protein structure hierarchical levels: (a) primary structure (—Ala—Glu—Val—Thr—Asp—Pro—Gly—); (b) secondary structure (α helix, β sheet); (c) tertiary structure (fold); (d) quaternary structure (oligomers).

10 Flavodoxin fold: helix-beta-helix. Mainly bacterial electron-transfer proteins (transport systems). Alpha-helical layers sandwiching a 5-stranded parallel beta-sheet.

11 Flavodoxin cheY NJ tree

12 ClustalW Flavodoxin-cheY

13 T-COFFEE. Performs progressive alignment as in Clustal, but there are many differences: integrating different pair-wise alignment techniques (NW, SW, ...); combining different multiple alignment methods (consensus multiple alignment); combining sequence alignment methods with structural alignment techniques; plugging in user knowledge. (Notredame, Higgins and Heringa 2000)

14 T-COFFEE. Step 1: global alignment with Clustal; local alignment with Lalign (pick top ten). Step 2: put the collection of local and global alignments in a library. Step 3: extend the library by aligning each pairwise alignment with a possible third sequence (search matrix or library extension). (Notredame, Higgins and Heringa 2000)
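
Step 3 can be illustrated with a small Python sketch. It assumes the library is a dictionary from residue pairs to weights, where a residue is a hypothetical (sequence name, position) tuple, and it uses the triplet rule commonly described for T-COFFEE: when residues x and y are both aligned to a residue z in a third sequence, the pair (x, y) gains the smaller of the two weights. The data layout and the example weights are illustrative assumptions, not the program's actual file format.

```python
from collections import defaultdict


def extend_library(library):
    """Add to each residue pair the support it receives through third sequences."""
    # Symmetric lookup: for every residue, the residues it is aligned to.
    neighbours = defaultdict(dict)
    for (x, y), weight in library.items():
        neighbours[x][y] = weight
        neighbours[y][x] = weight

    # Start from the direct (primary library) weights.
    extended = defaultdict(float)
    for (x, y), weight in library.items():
        extended[frozenset((x, y))] += weight

    # Every residue z acts as an intermediate for each pair of its partners.
    for z, linked in neighbours.items():
        partners = list(linked.items())
        for a in range(len(partners)):
            for b in range(a + 1, len(partners)):
                (x, w_xz), (y, w_zy) = partners[a], partners[b]
                if x[0] == y[0]:
                    continue  # residues of one sequence are never aligned to each other
                extended[frozenset((x, y))] += min(w_xz, w_zy)
    return dict(extended)


# Hypothetical primary library for the first residues of sequences A, B and C.
primary = {
    (("A", 1), ("B", 1)): 88,
    (("A", 1), ("C", 1)): 77,
    (("C", 1), ("B", 1)): 100,
}
extended = extend_library(primary)
```

Here the direct weight of 88 between A1 and B1 is raised by min(77, 100) because both residues are also aligned to C1, so pairs that are consistent with the rest of the library score higher than isolated ones.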

15 Search matrix extension

16 Search matrix extension

17 Search matrix extension

18 So for T-COFFEE … Original pairwise alignments are refined based on their consistency with other “intermediate” sequences. The distance matrix and guide tree are built from the refined alignments.

19 The T-COFFEE effect. Although a direct alignment might choose one path, using information from the other sequences (T-COFFEE) finds an alternative and usually better one. This minimizes errors in the early stages of alignment assembly. The cost is extra computation time for the calculation of the consistency scores.

20 but … T-COFFEE (v1.23) Flavodoxin-cheY

21 T-COFFEE web-interface

22 3D-COFFEE. Computes structure-based alignments. Structures related to the sequences are retrieved. More accurate … but for many (many) proteins we do not have the structure!

23 PRALINE. Again a progressive alignment method. As in T-COFFEE, the idea is to minimise error propagation by including prior knowledge about the other sequences when making pairwise alignments. PRALINE does this by making pre-profiles: the sequences to be aligned are represented by pre-profiles instead of single sequences. (Heringa 1999)
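
A pre-profile can be pictured as a per-position frequency table built around one "master" sequence. The sketch below is a simplification under stated assumptions: the other sequences have already been aligned pairwise to the master and projected onto its coordinates (same length, "-" for gaps), each comes with a pairwise score, and only those scoring at or above a cut-off are stacked. The function name, the input layout and the scores are hypothetical; the real PRALINE procedure is described in Heringa (1999).

```python
from collections import Counter


def pre_profile(master, aligned_others, cutoff):
    """Build a per-column residue frequency profile for one master sequence."""
    # Keep the master plus every sequence whose pairwise score passes the cut-off.
    rows = [master] + [seq for seq, score in aligned_others if score >= cutoff]
    profile = []
    for col in range(len(master)):
        # Gaps do not count towards the residue frequencies of a column.
        counts = Counter(row[col] for row in rows if row[col] != "-")
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()})
    return profile


# Toy input: two candidate sequences with made-up pairwise scores.
toy_profile = pre_profile(
    "WCVVAY",
    [("WCMVLY", 1600), ("W-IVAL", 1200)],
    cutoff=1500,
)
```

With the cut-off at 1500 only the first candidate joins the master, so the third column becomes half V and half M while fully conserved columns stay at frequency 1.0. Lowering the cut-off admits more distant sequences into each pre-profile.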

24 Pre-profile generation

25 PRALINE Flavodoxin-cheY. Pre-processing (cut-off 1500)

26 Using structural information. Structure is more conserved than sequence. CASP (Critical Assessment of Techniques for Protein Structure Prediction). Tertiary structure prediction: de novo prediction is not accurate at all! (especially for larger folds)

27 Using secondary structure. 10 years of SS prediction method development: accuracy ± 5%. 10 years of MSA method development: accuracy can be ± 40%. Amino acid patterns; secondary structure patterns; super-secondary structure patterns. Alternate matrices with associated gap penalties according to region.

28 How to combine ss and aa info. [Figure: dynamic programming search matrix comparing sequence MDAGSTVILCFV (secondary structure HHHCCCEEEEEE) with sequence MDAASTILCGS (secondary structure HHHHCCEEECC); amino acid substitution matrices labelled H, E, C and Default.]

29 In terms of scoring … So how would you score a profile using this extra information? Same formula as in lecture 6, but you can use secondary structure-specific substitution scores in various combinations. Where does it fit in? Very important: structure is always more conserved than sequence, so it can help with the insertion (or not) of gaps.
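
One way to read "secondary structure-specific substitution scores" is as a matrix lookup that depends on the secondary structure states of the two positions being compared. The sketch below assumes the simplest rule: if both positions share a state (H, E or C) the matrix for that state is used, otherwise a default matrix. The rule, the function name and all numbers in the toy tables are illustrative assumptions, not values from any published matrix.

```python
# Toy substitution tables keyed by secondary structure state.
# All scores are made up for illustration only.
MATRICES = {
    "H": {("A", "A"): 4, ("A", "L"): 2, ("L", "L"): 4},
    "E": {("A", "A"): 5, ("A", "L"): -1, ("L", "L"): 5},
    "C": {("A", "A"): 3, ("A", "L"): -2, ("L", "L"): 3},
    "default": {("A", "A"): 4, ("A", "L"): 0, ("L", "L"): 4},
}


def residue_score(aa_x, ss_x, aa_y, ss_y, matrices=MATRICES):
    """Score a residue pair with the matrix that matches its structural context."""
    # Matching states select the state-specific matrix; mismatches fall back.
    key = ss_x if ss_x == ss_y else "default"
    matrix = matrices[key]
    # The tables store each unordered pair once, so try both orders.
    pair = (aa_x, aa_y) if (aa_x, aa_y) in matrix else (aa_y, aa_x)
    return matrix[pair]


helix_score = residue_score("A", "H", "L", "H")   # both positions helical
mixed_score = residue_score("L", "H", "A", "E")   # states disagree
```

The same residue pair scores differently depending on context, and in a full implementation the gap penalties could be switched per region in the same way, which is how structure can steer where gaps are placed.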

30 SS-PRALINE. Flow: take the sequences to be aligned; predict their secondary structure (or use DSSP), giving strings such as HHHHCCEEECCCEEECCHH, HHHCCCCEECCCEEHHH, HHHHHHHHHHHHHCCCEEEE, CCCCCCEECCCEEEECCHH and HHHHHCCEEEECCCEECCC; align the sequences using the secondary structure; obtain the multiple alignment.

31 Using predicted secondary structure

32 PRALINE web-interface

33 Multiple alignment methods: multi-dimensional dynamic programming; progressive alignment; iterative alignment (corrects for problems with progressive alignment by repeatedly realigning subgroups of sequences). From here on refer to the PRALINE paper for help (further reading section online).

34 Iterative alignment. Idea: an optimal solution might be found by repeatedly modifying existing suboptimal solutions. Start by producing a low-quality alignment and gradually improve it by iterative refinement. Continue until no more improvements in the alignment scores can be achieved.

35 Iteration fates: convergence; limit cycle; divergence.
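
The three fates can be told apart by remembering every state an iteration has visited. The sketch below is generic rather than alignment-specific: `step` stands for one hypothetical refinement round that maps an alignment (any hashable value, for example a tuple of gapped strings) to the next one. Treating "no repeat within the iteration budget" as divergence is an assumption made to keep the example finite.

```python
def iteration_fate(step, start, max_iterations=100):
    """Classify an iteration as convergence, limit cycle or divergence."""
    seen = {start: 0}  # state -> iteration at which it was first produced
    current = start
    for i in range(1, max_iterations + 1):
        current = step(current)
        if current in seen:
            period = i - seen[current]
            # Period 1 means the state maps to itself: a fixed point.
            fate = "convergence" if period == 1 else "limit cycle"
            return fate, period
        seen[current] = i
    # No state repeated within the budget.
    return "divergence", None


# Toy step functions on integers standing in for refinement rounds.
converging = iteration_fate(lambda x: min(x + 1, 3), 0)
cycling = iteration_fate(lambda x: (x + 1) % 3, 0)
diverging = iteration_fate(lambda x: x + 1, 0)
```

For alignments, a limit cycle means the method keeps switching between a small set of alternative alignments, so an implementation needs a stopping rule other than "the alignment no longer changes".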

36 Consistency scoring

37 Consistency iteration. [Figure: cycle linking pre-profiles, multiple alignment and positional consistency scores.]

38 Pre-profile update iteration. [Figure: cycle linking pre-profiles and multiple alignment.]

39 Is the initial SS prediction good enough?

40 SS iteration

41 MUSCLE (Edgar 2004)

42 PRALINE and MUSCLE method. PRALINE and MUSCLE use almost the same formalism to compare two profiles. [Equations for MUSCLE and PRALINE not preserved in this transcript.] The difference is the position of the log in the equations. Edgar calls the MUSCLE scoring scheme the “log-expectation (LE) score”.
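
The slide's equations did not survive extraction. As a hedged reconstruction from the published definitions rather than from the slide itself, the two scores for a pair of profile columns x and y are usually written as below. The notation is an assumption: f_i^x is the frequency of amino acid i in column x, f_G^x is the gap frequency in column x, p_i is the background probability of amino acid i, and p_ij is the joint probability of i and j being aligned.

```latex
% MUSCLE log-expectation score (Edgar 2004): the log is taken outside the double sum
\mathrm{LE}^{xy} = (1 - f^{x}_{G})\,(1 - f^{y}_{G})\,
  \log \sum_{i} \sum_{j} f^{x}_{i}\, f^{y}_{j}\, \frac{p_{ij}}{p_{i}\, p_{j}}

% PRALINE-style profile score: the log sits inside the double sum
S^{xy} = \sum_{i} \sum_{j} f^{x}_{i}\, f^{y}_{j}\,
  \log \frac{p_{ij}}{p_{i}\, p_{j}}
```

In the second form each log term is a log-odds substitution score, so the sum is a frequency-weighted average of substitution matrix entries. The first form averages the odds ratios first and takes the logarithm afterwards, and it also carries the gap-frequency factor from Edgar's definition.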

43 So what do we do? For a single shot at a good alignment without thinking: MUSCLE, T-COFFEE (maybe POA). If you want to experiment with making alignments for a given sequence set: PRALINE (profile pre-processing; iteration; secondary structure-induced alignment; globalised local alignment). There is no single method that always generates the best alignment, so it is best to use more than one method, e.g. include Dialign2 (local).

44 Summary. Weighting schemes simulating simultaneous multiple alignment: profile pre-processing (global/local); matrix extension (well balanced scheme). Using additional information: secondary structure driven alignment. Schemes strike a balance between speed and sensitivity.

