Presentation is loading. Please wait.

Presentation is loading. Please wait.

Methods to CHAIN Local Alignments Sparse Dynamic Programming O(N log N)

Similar presentations


Presentation on theme: "Methods to CHAIN Local Alignments Sparse Dynamic Programming O(N log N)"— Presentation transcript:

1 Methods to CHAIN Local Alignments Sparse Dynamic Programming O(N log N)

2

3 Saving cells in DP 1.Find local alignments 2.Chain -O(NlogN) L.I.S. 3.Restricted DP

4 The Problem: Find a Chain of Local Alignments (x,y)  (x’,y’) requires x < x’ y < y’ Each local alignment has a weight FIND the chain with highest total weight

5 A similar problem: LCS 15324162042431118 4 20 24 3 11 15 11 4 18 20 Given two strings x and y, find the longest common subsequence Imagine a sparse scenario, where x and y have few matches

6 Sparse Dynamic Programming – L.I.S. Longest Increasing Subsequence Given a sequence over an ordered alphabet  w = w 1, …, w m Find the longest increasing subsequence  s = s 1, …, s k  s 1 < s 2 < … < s k

7 Sparse Dynamic Programming – L.I.S. Let input be w: w 1,…, w n INITIALIZATION: L: 1-indexed array, L[1]  w 1 B: 0-indexed array of backpointers; B[0] = 0 P: array used for traceback // L[j]: smallest last element w i of j-long LIS seen so far ALGORITHM for i = 2 to n { Find j such that L[j] < w[i] ≤ L[j+1] L[j+1]  w[i] B[j+1]  i P[i]  B[j] } That’s it!!! Running time?

8 Sparse LCS expressed as LIS Create a sequence w Every matching point (i, j), is inserted into w as follows: For each column j, from smallest to largest, insert in w the points (i, j), in decreasing row i order The 11 example points are inserted in the order given a = (y, x), b = (y’, x’) can be chained iff  a is before b in w, and  y < y’ 15324162042431118 6 4 27 18 10 9 5 11 3 4 20 24 3 11 15 11 4 18 20 x y

9 Sparse LCS expressed as LIS Create a sequence w w = (4,2) (3,3) (10,5) (2,5) (8,6) (1,6) (3,7) (4,8) (7,9) (5,9) (9,10) Consider now w’s elements as ordered lexicographically, where (y, x) < (y’, x’) if y < y’ Claim: An increasing subsequence of w is a common subsequence of x and y 15324162042431118 6 4 27 18 10 9 5 11 3 4 20 24 3 11 15 11 4 18 20 x y

10 Sparse Dynamic Programming for LIS Example: w = (4,2) (3,3) (10,5) (2,5) (8,6) (1,6) (3,7) (4,8) (7,9) (5,9) (9,10) L = 1.(4,2) 2.(3,3) 3.(3,3) (10,5) 4.(2,5) (10,5) 5.(2,5) (8,6) 6.(1,6) (8,6) 7.(1,6) (3,7) 8.(1,6) (3,7) (4,8) 9.(1,6) (3,7) (4,8) (7,9) 10.(1,6) (3,7) (4,8) (5,9) 11.(1,6) (3,7) (4,8) (5,9) (9,10) Longest common subsequence: s = 4, 24, 3, 11, 18 15324162042431118 6 4 27 18 10 9 5 11 3 4 20 24 3 11 15 11 4 18 20 x y

11 Sparse DP for rectangle chaining 1,…, N: rectangles (h j, l j ): y-coordinates of rectangle j w(j):weight of rectangle j V(j): optimal score of chain ending in j L: list of triplets (l j, V(j), j)  L is sorted by l j : top to bottom  L is implemented as a balanced binary tree y h l

12 Sparse DP for rectangle chaining Main idea: Sweep through x- coordinates To the right of b, anything chainable to a is chainable to b Therefore, if V(b) > V(a), rectangle a is “useless” – remove it In L, keep rectangles j sorted with increasing l j - coordinates  sorted with increasing V(j) V(b) V(a)

13 Sparse DP for rectangle chaining Go through rectangle x-coordinates, from lowest to highest: 1.When on the leftmost end of i: a.j: rectangle in L, with largest l j < h i b.V(i) = w(i) + V(j) 2.When on the rightmost end of i: a.k: rectangle in L, with largest l k  l i b.If V(i)  V(k): i.INSERT (l i, V(i), i) in L ii.REMOVE all (l j, V(j), j) with V(j)  V(i) & l j  l i i j k

14 Example x y 1: 5 3: 3 2: 6 4: 4 5: 2 2 5 6 9 10 11 12 14 15 16

15 Time Analysis 1.Sorting the x-coords takes O(N log N) 2.Going through x-coords: N steps 3.Each of N steps requires O(log N) time: Searching L takes log N Inserting to L takes log N All deletions are consecutive, so log N per deletion Each element is deleted at most once: N log N for all deletions Recall that INSERT, DELETE, SUCCESSOR, take O(log N) time in a balanced binary search tree

16 Multiple Sequence Alignments

17 The Global Alignment problem AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA AGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTC x y z

18

19

20 Definition Given N sequences x 1, x 2,…, x N :  Insert gaps (-) in each sequence x i, such that All sequences have the same length L Score of the global map is maximum A faint similarity between two sequences becomes significant if present in many Multiple alignments can help improve the pairwise alignments

21 Scoring Function Ideally:  Find alignment that maximizes probability that sequences evolved from common ancestor, according to some phylogenetic model More on phylogenetic models later x y z w v ?

22 Scoring Function A comprehensive model would have too many parameters, too inefficient to optimize Possible simplifications  Ignore phylogenetic tree  Statistically independent columns: S(m) =  i S(m i ) m: multiple alignment, m i are columns

23 Scoring Function: Sum Of Pairs Definition: Induced pairwise alignment A pairwise alignment induced by the multiple alignment Example: x:AC-GCGG-C y:AC-GC-GAG z:GCCGC-GAG Induces: x: ACGCGG-C; x: AC-GCGG-C; y: AC-GCGAG y: ACGC-GAC; z: GCCGC-GAG; z: GCCGCGAG

24 Sum Of Pairs (cont’d) The sum-of-pairs score of an alignment is the sum of the scores of all induced pairwise alignments S(m) =  k<l s(m k, m l ) s(m k, m l ):score of induced alignment (k,l)

25 Sum Of Pairs (cont’d) Heuristic way to incorporate evolution tree: Human Mouse Chicken Weighted SOP: S(m) =  k<l w kl s(m k, m l ) w kl : weight decreasing with distance Duck

26 Consensus -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC CAG-CTATCAC--GACCGC----TCGATTTGCTCGAC CAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC Find optimal consensus string m * to maximize S(m) =  i s(m *, m i ) s(m k, m l ):score of pairwise alignment (k,l)

27 A Profile Representation Given a multiple alignment M = m 1 …m n  Replace each column m i with profile entry p i Frequency of each letter in  # gaps Optional: # gap openings, extensions, closings - A G G C T A T C A C C T G T A G – C T A C C A - - - G C A G – C T A C C A - - - G C A G – C T A T C A C – G G C A G – C T A T C G C – G G A 1 1.8 C.6 1.4 1.6.2 G 1.2.2.4 1 T.2 1.6.2 O.2.8.4.4 E.4 C.2.8.4.2


Download ppt "Methods to CHAIN Local Alignments Sparse Dynamic Programming O(N log N)"

Similar presentations


Ads by Google