Multiple sequence comparison (MSC) Reading: Setubal/Meidanis, 3.4 Gusfield, Algorithms on Strings, Trees and Sequences, chapter 14.


1 Multiple sequence comparison (MSC) Reading: Setubal/Meidanis, 3.4 Gusfield, Algorithms on Strings, Trees and Sequences, chapter 14

2 Why care about similarity? Similar sequences have similar structure

3 Similar structure -> similar sequence? No, the converse is not true! Convergent evolution. Outwardly similar solutions to similar problems may be internally different. Tiger and ‘Tasmanian tiger’. Fish and dolphin. Bat and bird. Same is true of molecular ‘species’ and ‘anatomies’!

4 Sequence --> function Similar sequences have similar function ‘[T]he same genes that work in flies are the ones that work in humans.’ -- Eric Wieschaus, 1995 Nobel Prize for Drosophila work

5 Common origins Similar sequences have common origins ‘Descent with modification’ is Nature’s design mechanism Strong similarity may imply recent common origin (what do we mean by ‘strong’ and ‘recent’?) Strong similarity may imply strong conservation of sequence or motif

6 Is multiple sequence comparison a generalization? From a CS point of view, we’re going from two strings to many strings: a generalization. Yes, in that it helps detect faint similarities. No, in that we go from known biological similarity to suspected sequence similarity.

7 ‘Big’ uses for MSC Represent protein families Identify conserved sequence features Deduce evolutionary history

8 Profile representation Definition Given a multiple alignment of a set of strings, a profile specifies for each column the frequency of each character

9 Profile example
Alignment:
a b c - a
a b a b a
a c c b -
c b - b c
Profile:
    C1   C2   C3   C4   C5
a   .75       .25       .50
b        .75       .75
c   .25  .25  .50       .25
-             .25  .25  .25
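The column frequencies in a profile can be computed mechanically. The following is a minimal Python sketch, not part of the original slides; the function name `build_profile` is made up for illustration, and the gap character `-` is counted like any other character, as in the example above.

```python
from collections import Counter

def build_profile(rows):
    """Return one {character: frequency} dict per alignment column."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    k = len(rows)
    profile = []
    for column in zip(*rows):            # iterate over columns, not rows
        counts = Counter(column)         # the gap '-' is counted as a character
        profile.append({ch: n / k for ch, n in counts.items()})
    return profile

# The four-row alignment from the profile example slide.
alignment = ["abc-a", "ababa", "accb-", "cb-bc"]
profile = build_profile(alignment)
for j, column in enumerate(profile, start=1):
    print(f"C{j}", dict(sorted(column.items())))
```

Characters that never occur in a column are simply absent from that column's dict, which corresponds to the blank cells of the profile table.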

10 Fit string S to profile P Given a profile P and a string S, what is the best alignment (fit) of S to P? Example:
S: a a b - b c
P: 1 - 2 3 4 5

11 Two key issues How to score an alignment of a string to a profile How to compute an optimal alignment, given a scoring system

12 Scoring and alignment of profile Scoring: assuming letter-to-letter scores are given, use the weighted sum for each column. Optimal alignment: by DP, similar to string-to-string (S-S) optimal alignment. Q: How would you do profile-to-profile scoring and alignment?
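As an illustration of the two points on this slide, here is a hedged Python sketch of string-to-profile alignment. The letter-to-letter scores in `pair_score` (match +2, mismatch -1, character against a space -1, space against space 0) are assumed toy values, not taken from the slides, and the function names are made up. A profile is a list of `{character: frequency}` dicts, one per column.

```python
def pair_score(x, y):
    """Assumed toy letter-to-letter scores; '-' denotes a space."""
    if x == "-" and y == "-":
        return 0.0
    if x == "-" or y == "-":
        return -1.0
    return 2.0 if x == y else -1.0

def column_score(x, column):
    """Weighted sum: score of x against each character, weighted by frequency."""
    return sum(freq * pair_score(x, y) for y, freq in column.items())

def fit_string_to_profile(s, profile):
    """Best global alignment score of string s against the profile columns."""
    n, m = len(s), len(profile)
    V = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):            # characters of s opposite no column
        V[i][0] = V[i - 1][0] + pair_score(s[i - 1], "-")
    for j in range(1, m + 1):            # columns opposite a space in s
        V[0][j] = V[0][j - 1] + column_score("-", profile[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(
                V[i - 1][j - 1] + column_score(s[i - 1], profile[j - 1]),
                V[i - 1][j] + pair_score(s[i - 1], "-"),
                V[i][j - 1] + column_score("-", profile[j - 1]),
            )
    return V[n][m]

print(fit_string_to_profile("ab", [{"a": 1.0}, {"b": 1.0}]))
```

When every column has a single character with frequency 1, the recurrence is the ordinary two-string one, which is why the slide calls the DP "similar to S-S optimal alignment".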

13 Signature (motif) representation A motif is a regular expression (re) Example: a helicase motif [&H][&A]D[DE]x_n[TSN]x_4[QK]Gx_7[&A], where
–[abc] = any of a,b,c
–& = [ILVMFYW]
–x = any amino acid
–a_3 = up to 3 a’s
–a_n = any number of a’s
Find a motif by grep-ing
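"Grep-ing" for a motif can be tried with any regular-expression engine. The sketch below translates the helicase motif into Python's `re` syntax under two assumptions that are not in the slides: `x` is restricted to the 20 standard one-letter amino acid codes, and the subscripts follow the slide's own legend (x_4 means up to 4, x_n means any number). If the subscripts are meant as exact counts, `{0,4}` and `{0,7}` would become `{4}` and `{7}`. The name `find_motif` and the test sequence are made up.

```python
import re

AMINO = "ACDEFGHIKLMNPQRSTVWY"   # assumed alphabet for 'x'
HYDROPHOBIC = "ILVMFYW"          # the slide's '&' class
X = f"[{AMINO}]"

HELICASE = re.compile(
    f"[{HYDROPHOBIC}H]"          # [&H]
    f"[{HYDROPHOBIC}A]"          # [&A]
    "D"
    "[DE]"
    f"{X}*"                      # x_n: any number of residues
    "[TSN]"
    f"{X}{{0,4}}"                # x_4: up to 4 residues (assumed reading)
    "[QK]"
    "G"
    f"{X}{{0,7}}"                # x_7: up to 7 residues (assumed reading)
    f"[{HYDROPHOBIC}A]"          # [&A]
)

def find_motif(seq):
    """Return (start, matched_text) for each non-overlapping match in seq."""
    return [(m.start(), m.group()) for m in HELICASE.finditer(seq)]

# Invented test sequence, not a real protein.
print(find_motif("MKHADDTAAAAQGAAAAAAAA"))
```

Because the quantifiers are greedy, the reported match is the longest one from the leftmost starting position, which may not be the most biologically meaningful one.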

14 Finding optimal MS alignment Need a scoring system Given a scoring system, an (efficient) method of calculation If no efficient method of getting the right answer, an efficient way of getting a plausible answer

15 Need MSC measure Desirable characteristics:
–variable number of sequences
–column-wise calculation
–order independence
Example alignment:
MQPILLL
MLR-LL-
MK-ILLL
MPPVLIL

16 Sum-of-pairs (SP) measure Column score = sum of the pairwise scores over all (k choose 2) pairs of rows. Reduces to pairwise alignment when k = 2. Need to assign a value to (-,-). May compute in either row or column order.
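A small Python sketch of the SP measure follows. The pairwise scores are assumed toy values (match +2, mismatch -1, character against a space -1), and (-,-) is assigned 0, which is one common choice rather than something the slides specify. The sketch computes the score both column by column and row pair by row pair to show that the two orders agree.

```python
from itertools import combinations

def pair_score(x, y):
    """Assumed toy scores; the (-,-) value is set to 0 here."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -1

def sp_by_columns(rows):
    """Sum, over columns, of the scores of all (k choose 2) pairs in the column."""
    return sum(
        pair_score(x, y)
        for column in zip(*rows)
        for x, y in combinations(column, 2)
    )

def sp_by_rows(rows):
    """Sum, over pairs of rows, of the induced pairwise alignment scores."""
    return sum(
        sum(pair_score(x, y) for x, y in zip(r1, r2))
        for r1, r2 in combinations(rows, 2)
    )

# The four-sequence example alignment from the "Need MSC measure" slide.
alignment = ["MQPILLL", "MLR-LL-", "MK-ILLL", "MPPVLIL"]
print(sp_by_columns(alignment), sp_by_rows(alignment))
```

With two rows there is exactly one pair per column, so `sp_by_columns` is just the ordinary pairwise alignment score.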

17 DP approach Generalization of two-sequence comparison. k-dimensional array; space complexity is O(n^k). MSC with SP measure is NP-complete.
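To make the k-dimensional generalization concrete, here is a hedged Python sketch of the exact DP for the SP score, usable only for a few very short sequences. It reuses the same assumed toy scores as elsewhere (match +2, mismatch -1, space -1, (-,-) = 0); the name `msa_sp_score` is made up. Each cell looks at up to 2^k - 1 predecessors, one for every non-empty choice of which sequences advance.

```python
from itertools import combinations, product

def pair_score(x, y):
    """Assumed toy scores; '-' is a space and (-,-) scores 0."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -1

def sp_column(column):
    return sum(pair_score(x, y) for x, y in combinations(column, 2))

def msa_sp_score(seqs):
    """Optimal SP score over all global multiple alignments of seqs."""
    k = len(seqs)
    lengths = [len(s) for s in seqs]
    origin = (0,) * k
    # A move says which sequences contribute a character to the next column.
    moves = [m for m in product((0, 1), repeat=k) if any(m)]
    best = {origin: 0}
    # Lexicographic order guarantees predecessors are filled in first.
    for cell in product(*(range(n + 1) for n in lengths)):
        if cell == origin:
            continue
        candidates = []
        for move in moves:
            prev = tuple(c - d for c, d in zip(cell, move))
            if min(prev) < 0:
                continue
            column = [seqs[i][cell[i] - 1] if move[i] else "-" for i in range(k)]
            candidates.append(best[prev] + sp_column(column))
        best[cell] = max(candidates)
    return best[tuple(lengths)]

print(msa_sp_score(["ab", "ab", "b"]))
```

The `best` table holds one entry per cell of the k-dimensional array, which is where the O(n^k) space on the slide comes from.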

18 MSA speedup heuristic This ‘heuristic’ guarantees the right answer, but it doesn’t guarantee the speedup. General idea:
–find a lower bound on L
–if the value for a cell exceeds L, it cannot enter into the optimal solution

19 Common method: iterative Simplest implementation. Begin with the two strings S_i and S_j that are pairwise closest. Iteratively merge in the additional string with the smallest edit distance from any string already in the multiple alignment. Equivalent to finding MSP on edit tree.
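The order in which strings get merged under this scheme can be sketched in Python as follows. This only computes the merge order, not the merged alignment itself, and uses unit-cost edit distance as an assumption; the function names are made up. The greedy step (always add the outside string closest to any string already inside) is the same one Prim's minimum-spanning-tree algorithm uses.

```python
from itertools import combinations

def edit_distance(a, b):
    """Unit-cost edit (Levenshtein) distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]

def merge_order(strings):
    """Return (inside_index, added_index, distance) for each merge step."""
    n = len(strings)
    if n < 2:
        raise ValueError("need at least two strings")
    dist = {(i, j): edit_distance(strings[i], strings[j])
            for i, j in combinations(range(n), 2)}

    def d(i, j):
        return dist[(min(i, j), max(i, j))]

    i0, j0 = min(dist, key=dist.get)     # the pairwise closest strings
    merged = [i0, j0]
    steps = [(i0, j0, dist[(i0, j0)])]
    remaining = set(range(n)) - {i0, j0}
    while remaining:
        cost, inside, outside = min(
            (d(i, j), i, j) for i in merged for j in remaining
        )
        steps.append((inside, outside, cost))
        merged.append(outside)
        remaining.remove(outside)
    return steps

print(merge_order(["aaaa", "aaab", "bbbb", "abbb"]))
```

In a full implementation each step would also align the new string against the growing multiple alignment, for example via a profile.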

20 Clustering method Almost any clustering algorithm can be adapted to MSC. Usually start with small clusters and build big ones. Also possible to start with one big cluster and divide and conquer. Not clear which method is best.

