Swaps + Mismatches Based on Estrella Eizenberg M.Sc. Thesis Supervised by Ely Porat.


1 Swaps + Mismatches Based on Estrella Eizenberg M.Sc. Thesis Supervised by Ely Porat

2 Swaps + Mismatches A paper on this subject by Amihood Amir, Estrella Eizenberg, Ohad Lipsky and Ely Porat was submitted to ESA 2004.

3 Problem definition T: a d b d a c b d a b c a b d P: a b b a a b c Mismatches: Abrahamson 87. K-mismatches: Landau Vishkin 86; Amir Lewenstein Porat 00.

4 Problem definition T: a d b d a c b d a b c a b d P: c a b d b a c Swaps: Amir Aumann Landau M.Lewenstein N.Lewenstein 97; Cole Hariharan 00; Amir Cole Hariharan Lewenstein Porat 2001; Amir Lewenstein Porat 2000.

5 Problem definition T: a d b d a c b d a b c a b d P: c a b b b a c Minimum distance: counting all errors as mismatches gives 5 errors; the minimum distance (swaps + mismatches) is 3 errors.
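
To make the definition on slide 5 concrete, here is a minimal brute-force sketch in Python. It is not the algorithm of the talk: it runs in O(nm) time and only spells out what "minimum distance" means. It assumes the usual swap model (a swap exchanges two adjacent, different characters, and each position takes part in at most one swap) and it reads the first 14 symbols of the slide's sequence as the text and the last 7 as the pattern, which is an interpretation of the flattened slide. The function name is hypothetical.

```python
def swap_mismatch_distance(text, pattern):
    """For every alignment, return the minimum number of swaps plus mismatches."""
    n, m = len(text), len(pattern)
    result = []
    for i in range(n - m + 1):
        window = text[i:i + m]
        # d[j] = cheapest explanation of the first j pattern characters.
        d = [0] * (m + 1)
        for j in range(1, m + 1):
            # Option 1: position j-1 is a match (cost 0) or a mismatch (cost 1).
            d[j] = d[j - 1] + (pattern[j - 1] != window[j - 1])
            # Option 2: positions j-2 and j-1 are explained by one swap.
            if (j >= 2 and pattern[j - 2] != pattern[j - 1]
                    and pattern[j - 2] == window[j - 1]
                    and pattern[j - 1] == window[j - 2]):
                d[j] = min(d[j], d[j - 2] + 1)
        result.append(d[m])
    return result


distances = swap_mismatch_distance("adbdacbdabcabd", "cabbbac")
print(distances[4])  # 0-based index 4 is text position 5 in 1-based counting
```

At that alignment the window is "a c b d a b c": two swaps and one plain mismatch, which gives the 3 errors quoted on the slide, against 5 when every differing position is counted as a mismatch.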

6 Starting with simpler problem Σ = {0,1} T: 0 1 0 1 0 1 1 0 1 1 0 1 0 0 1 P: 1 0 0 1 0 1 0 We wish to count only the mismatches (we will leave the swaps for later). We call them non-swap mismatches (NSM).

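7 Starting with simpler problem Σ = {0,1} T: 0 1 0 1 0 1 1 0 1 1 0 1 0 0 1 P: 1 0 0 1 0 1 0 NSM[6] = 2, Mismatches[6] = 4, Minimum-distance[6] = (Mismatches[6] + NSM[6])/2 = 3 errors. Mismatches can be counted in O(n log m); the time to count NSM is still unknown (????), so the total is O(???? + n log m).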
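
The identity on slide 7 can be checked directly on the slide's example. The sketch below assumes a binary alphabet and assumes that a maximal run of consecutive mismatches in which the pattern alternates can be paired up into swaps, leaving one non-swap mismatch exactly when the run has odd length. That reading is inferred from the slides, and the function name is hypothetical.

```python
def mismatches_and_nsm(text, pattern, start):
    """Count mismatches and non-swap mismatches (NSM) at one alignment.

    `start` is 0-based; the alphabet is assumed to be {0, 1}.
    """
    window = text[start:start + len(pattern)]
    mismatches = 0
    nsm = 0
    run = 0  # length of the current chain of swappable mismatches
    for j, (p, t) in enumerate(zip(pattern, window)):
        if p != t:
            mismatches += 1
            # The chain continues only if the previous position also
            # mismatched and the pattern alternates there.
            if run > 0 and pattern[j - 1] != p:
                run += 1
            else:
                nsm += run % 2  # close the previous chain
                run = 1
        else:
            nsm += run % 2
            run = 0
    nsm += run % 2  # close the last chain
    return mismatches, nsm


mism, nsm = mismatches_and_nsm("010101101101001", "1001010", 5)
min_distance = (mism + nsm) // 2
print(mism, nsm, min_distance)
```

Position 6 on the slide is 0-based index 5 here. A chain of length L contributes L to the mismatch count and ceil(L/2) to the distance, which is where the formula (Mismatches + NSM)/2 comes from.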

8 Starting with simpler problem T: 0 1 0 1 0 1 1 0 1 1 0 1 0 0 1 T1: 0 1 0 1 0 1 * * * 1 0 1 0 * * T2: * * * * * * 1 0 1 * * * * 0 1

9 Starting with simpler problem We do the same for the pattern. P: 1 0 0 1 0 1 0 P1: * * 0 1 0 1 0 P2: 1 0 * * * * * We will give a solution only for the odd places (NSM[i] where i is odd).

10 Starting with simpler problem P2: 1 0 * * * * * P1: * * 0 1 0 1 0 T1: 0 1 0 1 0 1 * * * 1 0 1 0 * * Comparing T1 with P1 gives no errors, neither swaps nor mismatches (the same holds for T2 against P2). Without loss of generality we look only at T1 against P2.
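
The slides show the split into T1/T2 and P1/P2 only by example. A rule that reproduces both examples, offered here as an inference rather than as the authors' stated definition, is: a symbol goes to the first string if it agrees with the global phase 0 1 0 1 ... (0 at odd 1-based positions, 1 at even ones) and to the second string otherwise. A minimal sketch with a hypothetical function name:

```python
def split_by_phase(bits):
    """Split a binary string into two '*'-masked strings by alternation phase."""
    first, second = [], []
    for i, b in enumerate(bits):
        # 0-based index i: phase 1 holds '0' at even i and '1' at odd i.
        if int(b) == i % 2:
            first.append(b)
            second.append("*")
        else:
            first.append("*")
            second.append(b)
    return "".join(first), "".join(second)


t1, t2 = split_by_phase("010101101101001")
p1, p2 = split_by_phase("1001010")
print(t1, t2)
print(p1, p2)
```

When the pattern starts at an odd text position, aligned symbols have the same position parity, so under this rule a T1 symbol always equals the P1 symbol it meets and always differs from a P2 symbol. This is consistent with the slide's remark that only T1 against P2 (and T2 against P1) needs attention.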

11 Starting with simpler problem T1: 0 1 0 1 0 1 * * * 1 0 1 0 * * P2: 1 0 * * * * * [Figure: P2 placed against T1 at two alignments, labeled "Even overlap" and "Odd overlap"] An odd overlap gives one NSM error, so we need to count how many odd overlaps we have.
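
Counting odd overlaps can be written down naively before any convolution is involved. The sketch below extracts the segments (maximal runs of non-'*' symbols) from the masked strings and, for one shift of the pattern, counts the text/pattern segment pairs whose overlap has odd length. It is a quadratic illustration with hypothetical helper names, not the method of the talk.

```python
def segments(masked):
    """Return (start, end) pairs, 0-based inclusive, of maximal non-'*' runs."""
    out, start = [], None
    for i, c in enumerate(masked + "*"):  # sentinel closes a trailing run
        if c != "*" and start is None:
            start = i
        elif c == "*" and start is not None:
            out.append((start, i - 1))
            start = None
    return out


def odd_overlaps(text_segs, pattern_segs, shift):
    """Count segment pairs whose overlap has odd length when P is shifted."""
    count = 0
    for ts, te in text_segs:
        for ps, pe in pattern_segs:
            lo = max(ts, ps + shift)
            hi = min(te, pe + shift)
            if hi >= lo and (hi - lo + 1) % 2 == 1:
                count += 1
    return count


t1_segs = segments("010101***1010**")
p2_segs = segments("10*****")
print(odd_overlaps(t1_segs, p2_segs, 4), odd_overlaps(t1_segs, p2_segs, 8))
```

Shifts 4 and 8 are 0-based and correspond to the odd 1-based positions 5 and 9. At shift 4 the segment "1 0" of P2 lies fully inside a T1 segment (even overlap, one swap, no NSM); at shift 8 only one of its symbols overlaps T1 (odd overlap, one NSM).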

12 Simpler problem We separate the segments into 4 categories: 1. Starting at an odd position and ending at an odd position (called OO). 2. Starting at an odd position and ending at an even position (called OE). 3. Starting at an even position and ending at an odd position (called EO). 4. Starting at an even position and ending at an even position (called EE).
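
The four categories can be assigned mechanically from the endpoints of each segment. The sketch below assumes 1-based positions, as in NSM[6] on the earlier slide, and takes a '*'-masked string such as T1 as input; the function name is hypothetical.

```python
def classify_segments(masked):
    """Label each maximal non-'*' run as OO, OE, EO or EE (1-based positions)."""
    labels, start = [], None
    for i, c in enumerate(masked + "*"):  # sentinel closes a trailing run
        if c != "*" and start is None:
            start = i
        elif c == "*" and start is not None:
            first, last = start + 1, i  # convert to 1-based, inclusive
            kind = ("O" if first % 2 else "E") + ("O" if last % 2 else "E")
            labels.append((first, last, kind))
            start = None
    return labels


print(classify_segments("010101***1010**"))
```

For T1 from the slides this labels the segment at positions 1 to 6 as OE and the segment at positions 10 to 13 as EO.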

13 Simpler problem [Figure: an OO segment of T and a segment of P, drawn as a run of 1s against alternating 1, -1 values at several shifts] The overlap must start with 1. One convolution: O(n log m).

14 Simpler problem [Figure: the same encoding for an OO segment against a segment starting at an odd position, drawn as a run of 1s against alternating 1, -1 values at several shifts] The overlap must start with 1. One convolution: O(n log m).
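
The remark "the overlap must start with 1" can be illustrated with a tiny correlation. The sketch assumes one segment is encoded as a run of 1s and the other as alternating +1, -1 values. This is a reading of the figure, not a confirmed description of the encoding, and the data below is made up. The correlation is computed naively; computing it with an FFT is what would give the O(n log m) bound quoted on the slides.

```python
def correlate(a, b):
    """Naive correlation: out[s] = sum over j of a[s + j] * b[j]."""
    n, m = len(a), len(b)
    return [sum(a[s + j] * b[j] for j in range(m)) for s in range(n - m + 1)]


# Hypothetical data: one text segment covering indices 2..6, marked with 1s.
text_mask = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
# One pattern segment encoded as alternating +1, -1, starting with +1.
pattern_weights = [1, -1, 1, -1]

result = correlate(text_mask, pattern_weights)
print(result)
```

The alternating sum over an overlap of length L that begins on a +1 is L mod 2, so shifts 4 and 6 (overlaps of length 3 and 1) give 1 and the even overlaps give 0. At shift 1 the overlap has odd length but begins on a -1, so it contributes -1 instead of 1. Splitting segments by the parity of their endpoints appears to be how the slides keep the sign under control.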

15 Simpler problem We have dealt with O? against O? and with ?O against ?O. The same method works for E? against E? and ?E against ?E. We are left to deal with: OE against EO; EO against OE; OO against EE; EE against OO.

16 OO against EE [Figure: OO and EE segments of T and P encoded as 1s and alternating 1, -1 values, with even and odd overlaps marked] We need to recognize when one segment contains the other.

17 Simpler problem We can easily tell whether a segment is contained in another segment, or contains one, if we know the segment sizes. Smaller segments can't contain larger segments.

18 Simpler problem Then, for each segment, we split the computation into the part against bigger segments and the part against smaller segments. We do this by computing, each time, the answer for all segments of one size x.

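19 Simpler problem The number of different segment sizes is O(\sqrt{m}): k different sizes add up to at least 1 + 2 + ... + k = k(k+1)/2, which cannot exceed the pattern length m.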
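
The counting argument behind the bound on distinct sizes is easy to check on a concrete pattern. The sketch assumes that "segments" means maximal alternating runs of a binary string, as in the earlier slides; the function name is hypothetical.

```python
def alternating_segment_lengths(bits):
    """Lengths of the maximal alternating runs of a binary string."""
    if not bits:
        return []
    lengths, run = [], 1
    for prev, cur in zip(bits, bits[1:]):
        if cur != prev:
            run += 1  # still alternating
        else:
            lengths.append(run)  # two equal neighbours end the run
            run = 1
    lengths.append(run)
    return lengths


pattern = "1001010"
lengths = alternating_segment_lengths(pattern)
k = len(set(lengths))
print(lengths, k)
```

For the pattern of the slides the runs have lengths 2 and 5, so k = 2 and k(k+1)/2 = 3 is indeed at most m = 7. Since each distinct size is handled in one pass costing O(n log m), O(\sqrt{m}) sizes are consistent with the O(n\sqrt{m}\log m) total stated on the next slide.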

20 What we have We have an algorithm for the simpler problem that runs in time O(n\sqrt{m}\log m). We have an algorithm for the binary alphabet that runs in O(n\sqrt{m}\log m). With several more techniques we develop an algorithm solving the original problem in O(n\sqrt{m}\log m).

21 Open problem It is easy to see that our algorithm is at most a factor of O(\sqrt{\log m}) away from the optimal algorithm (due to the reduction to counting mismatches). But one can try to improve the small-alphabet case.

