Published by Abbie Blong. Modified about 1 year ago.

1
Fixed-Parameter Algorithms for CLOSEST STRING and Related Problems. Algorithmica (2003). Jens Gramm, Rolf Niedermeier, Peter Rossmanith.

2
Outline: Introduction; Preliminaries; Linear-time solution for constant d; Related problems; Linear-time solution for fixed k; Conclusion.

3
Intro: Problem Definition. Input: strings s_1, s_2, …, s_k over alphabet Σ, each of length L, and a nonnegative integer d. Question: is there a string s of length L such that d_H(s, s_i) ≤ d for all i = 1, …, k? Here d_H is the Hamming distance: d_H(s_1, s_2) = |{ i | s_1[i] ≠ s_2[i] }| for |s_1| = |s_2|.
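The definition above can be checked mechanically. The following is a minimal sketch, not taken from the slides: the function names `hamming` and `is_center` and the toy strings are illustrative assumptions.

```python
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Hamming distance d_H(a, b); only defined for strings of equal length."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def is_center(s: str, strings: Sequence[str], d: int) -> bool:
    """True iff d_H(s, s_i) <= d for every input string s_i."""
    return all(hamming(s, t) <= d for t in strings)


# Hypothetical toy instance (not from the slides): k = 3, L = 3, d = 1.
toy = ["aab", "abb", "bab"]
print(is_center("aab", toy, 1))
print(is_center("bbb", toy, 1))
```

Verifying a given candidate takes O(kL) time; the hard part of CLOSEST STRING is finding the candidate.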

4
NP-completeness: CLOSEST STRING is NP-complete. In biological applications, d is usually small. Result of this paper: an O(kL + kd·d^d) algorithm. A PTAS was given by Li et al.

5
Extended problems: d-MISMATCH; DISTINGUISHING STRING SELECTION; DISTINGUISHING SUBSTRING SELECTION.

6
Preliminaries. Given a set of strings S = {s_1, …, s_k}, each of length L, a string s is an optimal center string iff there is no string s' such that max_{i=1,…,k} d_H(s', s_i) < max_{i=1,…,k} d_H(s, s_i).

7
Given a set of k strings of length L, think of these strings as a k × L matrix. Optimal median string: a c c a
s1  a b c d
s2  a a d b
s3  b c d a
s4  a c c c

8
Main idea: Search! Fixed-parameter tractability. Reduction to a problem kernel.

9
LEMMA 1. Let S = {s_1, …, s_k} be a set of strings, each of length L, and let σ: {1, …, L} → {1, …, L} be a permutation. Then s is an optimal center string for {s_1, …, s_k} iff σ(s) is an optimal center string for {σ(s_1), σ(s_2), …, σ(s_k)}.
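The reason behind Lemma 1 is that permuting the positions of two strings in the same way does not change their Hamming distance. A small sketch of that observation follows; the helper names and the 0-based representation of σ are assumptions made for illustration.

```python
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions where a and b differ (equal lengths assumed)."""
    return sum(x != y for x, y in zip(a, b))


def permute(s: str, sigma: Sequence[int]) -> str:
    """Apply a position permutation: output position q takes s[sigma[q]] (0-based)."""
    return "".join(s[p] for p in sigma)


# Hypothetical permutation of L = 4 positions, given as a 0-based list.
sigma = [2, 0, 3, 1]
a, b = "abcd", "aadb"
print(hamming(a, b), hamming(permute(a, sigma), permute(b, sigma)))
```

Because every pairwise distance is preserved, the maximum distance of any candidate to the input strings is preserved as well.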

10
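LEMMA 2. To compute an optimal center string, it is sufficient to solve a normalized and reordered instance. From its solution, the solution of the original instance can be derived in linear time.
Example (original instance, normalized instance, normalized and reordered instance):
s1  a b c d      s1  a b a a      s1  b a a a
s2  a a d b      s2  a c b b      s2  c a b b
s3  b c d a      s3  b a b c      s3  a b b c
s4  a c c c      s4  a a a d      s4  a a a d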
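The slides do not spell out the normalization rule. The sketch below assumes one rule that reproduces the normalized matrix in the example: within each column, the most frequent symbol is renamed to a, the next to b, and so on, with ties broken by first occurrence. Both the rule and the name `normalize` are assumptions.

```python
import string
from typing import List, Sequence


def normalize(strings: Sequence[str]) -> List[str]:
    """Rename symbols column by column (assumed rule: by frequency, ties by first occurrence)."""
    new_columns = []
    for col in zip(*strings):  # each col is a tuple of k symbols
        ranked = sorted(set(col), key=lambda ch: (-col.count(ch), col.index(ch)))
        # Assumes at most 26 distinct symbols per column.
        rename = {ch: string.ascii_lowercase[r] for r, ch in enumerate(ranked)}
        new_columns.append([rename[ch] for ch in col])
    # Transpose back from columns to rows.
    return ["".join(row) for row in zip(*new_columns)]


print(normalize(["abcd", "aadb", "bcda", "accc"]))
```

Renaming within a column keeps equal symbols equal and different symbols different, so the set of dirty columns and all pairwise Hamming distances are unchanged.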

11
LEMMA 3. A CLOSEST STRING instance with arbitrary alphabet Σ, |Σ| > k, is isomorphic to a CLOSEST STRING instance with alphabet Σ', |Σ'| = k. Proof idea: normalization.

12
LEMMA 4. Given a CLOSEST STRING instance s_1, …, s_k of length L and d. If the resulting k × L matrix has more than kd dirty columns, then there is no string s with max_{i=1,…,k} d_H(s, s_i) ≤ d. A column is dirty iff it contains at least two different symbols from the alphabet Σ. Proof idea: pigeonhole principle.
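Lemma 4 gives a direct kernelization step: find the dirty columns, reject if there are more than kd of them, and otherwise keep only those columns. A minimal sketch with the hypothetical helper names `dirty_columns` and `kernel_or_reject`:

```python
from typing import List, Optional, Sequence


def dirty_columns(strings: Sequence[str]) -> List[int]:
    """0-based indices of columns that contain at least two different symbols."""
    return [p for p, col in enumerate(zip(*strings)) if len(set(col)) > 1]


def kernel_or_reject(strings: Sequence[str], d: int) -> Optional[List[str]]:
    """Return the instance restricted to its dirty columns, or None if Lemma 4 rules it out."""
    dirty = dirty_columns(strings)
    if len(dirty) > len(strings) * d:
        return None  # more than k*d dirty columns: no center string within distance d
    return ["".join(s[p] for p in dirty) for s in strings]


# Hypothetical toy instance: only the last column is dirty.
print(kernel_or_reject(["aaab", "aaac", "aaad"], 1))
print(kernel_or_reject(["aaab", "aaac", "aaad"], 0))
```

In a clean column every string agrees, so a center string can copy that symbol at no cost; this is why dropping clean columns does not change the answer.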

13
A Linear-Time Solution for Constant d: bounded search tree algorithm. LEMMA 5. Given a set of strings S = {s_1, …, s_k} and a positive integer d. If there are i, j ∈ {1, …, k} with d_H(s_i, s_j) > 2d, then there is no string s with max_{i=1,…,k} d_H(s, s_i) ≤ d.
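Lemma 5 follows from the triangle inequality for Hamming distance: d_H(s_i, s_j) ≤ d_H(s_i, s) + d_H(s, s_j) ≤ 2d for any valid center s. As a quick pre-check it could be sketched as follows; the function name `violates_lemma5` is an assumption.

```python
from itertools import combinations
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions where a and b differ (equal lengths assumed)."""
    return sum(x != y for x, y in zip(a, b))


def violates_lemma5(strings: Sequence[str], d: int) -> bool:
    """True iff some pair of input strings is more than 2d apart, so no center exists."""
    return any(hamming(a, b) > 2 * d for a, b in combinations(strings, 2))


# Hypothetical toy pair at distance 4.
print(violates_lemma5(["aaaa", "bbbb"], 1))
print(violates_lemma5(["aaaa", "bbbb"], 2))
```

This test examines all pairs and so costs O(k²L) if run in full; it is shown here only to illustrate the lemma.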

15
Theorem 1. Given a set of strings S = {s_1, …, s_k} and d, Algorithm D determines in O(kL + kd·d^d) time whether there is a string s with max_{i=1,…,k} d_H(s, s_i) ≤ d. By Lemma 4, the input instance is reduced to O(kd) columns in O(kL) time. The search tree has depth d. Time(D0 + D1 + D2 + D3) = O(kd), achieved by building a table containing the distances of the candidate s_1 to all other given strings.

16
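Correctness: we show only the correctness of the first step. Suppose s_1 is not a solution but there exists a center string s. Let s_i be a string with d_H(s_1, s_i) > d, and let P be a set of d+1 positions from {p | s_1[p] ≠ s_i[p]}, so |P| = d+1. Define P_{s1≠s=si} := {p ∈ P | s_1[p] ≠ s[p] = s_i[p]} and P_{s≠si} := {p ∈ P | s[p] ≠ s_i[p]}. Goal: P_{s1≠s=si} is nonempty. We have P = P_{s≠si} ∪ P_{s1≠s=si} (disjoint) and |P_{s≠si}| ≤ d_H(s, s_i) ≤ d, so at least one position of P lies in P_{s1≠s=si}. Hence d+1 subcases are sufficient.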
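The slide that lists Algorithm D itself is not part of this transcript. The following is a reconstruction sketched from the step names D0 to D3 and the d+1 subcases argued above; it is not a verbatim copy of the authors' pseudocode. The function names are assumptions, and the distance table from Theorem 1 is replaced by recomputing distances for brevity.

```python
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions where a and b differ (equal lengths assumed)."""
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings: Sequence[str], d: int) -> Optional[str]:
    """Bounded search tree sketch: return a string within distance d of all inputs, or None."""
    if not strings:
        raise ValueError("need at least one input string")

    def search(candidate: str, budget: int) -> Optional[str]:
        if budget < 0:                      # D0: no moves left
            return None
        dists = [hamming(candidate, t) for t in strings]
        if max(dists) > d + budget:         # D1: too far to repair with the remaining moves
            return None
        if max(dists) <= d:                 # D2: candidate is a solution
            return candidate
        # D3: pick a string that is too far away and branch on d+1 differing positions.
        target = strings[next(i for i, dist in enumerate(dists) if dist > d)]
        differing = [p for p in range(len(candidate)) if candidate[p] != target[p]]
        for p in differing[: d + 1]:
            moved = candidate[:p] + target[p] + candidate[p + 1:]
            found = search(moved, budget - 1)
            if found is not None:
                return found
        return None

    return search(strings[0], d)


# The four example strings from the matrix slide; prints a center for d = 2, then None for d = 1.
example = ["abcd", "aadb", "bcda", "accc"]
print(closest_string(example, 2))
print(closest_string(example, 1))
```

Each node branches into at most d+1 children and the depth is at most d, which gives the (d+1)^d bound on the search tree size behind the d^d factor in Theorem 1.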

17
Related Problems: the d-MISMATCH problem. Let s_{i,p,L} denote the length-L substring of a given string s_i starting at position p. Question: is there a string s of length L and a position p with 1 ≤ p ≤ n−L+1 such that d_H(s, s_{i,p,L}) ≤ d for all i? Stojanovic et al. give a linear-time algorithm for 1-MISMATCH. Theorem 2. d-MISMATCH is solvable in O(kL + (n−L)·kd·d^d) time, which is O(nk) for fixed d. Naively: O(n·(kL + kd·d^d)). Idea: maintain a queue of dirty columns. Considering only the first L columns, we can build a FIFO queue in O(kL) time, then update it at each position in O(k) time.
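The queue idea can be sketched as a sliding window over the columns of the k × n matrix. The sketch below only performs the Lemma 4 filter for each window; a surviving window would still have to be handed to a CLOSEST STRING solver on its dirty columns. The function name `candidate_windows` and the 0-based start positions are assumptions.

```python
from collections import deque
from typing import List, Sequence


def candidate_windows(strings: Sequence[str], L: int, d: int) -> List[int]:
    """0-based window starts whose length-L window has at most k*d dirty columns."""
    k, n = len(strings), len(strings[0])
    queue = deque()  # indices of dirty columns inside the current window, oldest first
    survivors = []
    for col in range(n):
        if len({s[col] for s in strings}) > 1:  # O(k) work for the newly entering column
            queue.append(col)
        start = col - L + 1
        if start < 0:
            continue  # the first window is not complete yet
        while queue and queue[0] < start:  # drop dirty columns that left the window
            queue.popleft()
        if len(queue) <= k * d:
            survivors.append(start)
    return survivors


# Hypothetical toy instance: only column 2 is dirty.
print(candidate_windows(["aaxa", "aaya"], 2, 0))
```

Each column enters and leaves the queue at most once, so moving the window by one position costs O(k) amortized time, in line with the update cost stated above.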

18
DSS problem: DISTINGUISHING STRING SELECTION. Given S = {s_1, …, s_k1} and S' = {s'_1, …, s'_k2}, all of the same length L, and d_1, d_2 ≥ 0, is there a string s such that … LEMMA 6. Given two sets of strings S_1 = {s_1, …, s_k1} and S_2 = {s'_1, …, s'_k2} and positive d_1, d_2. If there are i ∈ {1, …, k_1} and j ∈ {1, …, k_2} with d_H(s_i, s'_j) …

19
A Linear-Time Solution for Fixed k. Is CLOSEST STRING fixed-parameter tractable in k? Use integer linear programming (ILP). Lenstra: an ILP with a fixed number of variables can be solved in linear time (exponential space).

20
CLOSEST STRING in ILP. Column types for k. For k = 3: (a,a,a)^t, (a,a,b)^t, (a,b,a)^t, (b,a,a)^t, (a,b,c)^t. |column types| = B(k) ≤ k!. Variables x_{t,φ}, with t a column type and φ ∈ Σ: the number of columns of type t whose corresponding character in the desired solution string of CLOSEST STRING is set to φ. B(k)·k variables are needed. Minimize … where φ_{t,i} denotes the alphabet symbol at the i-th entry of column type t.
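A column type records only which entries of a column are equal to each other, so the number of types for k strings is the number of set partitions of k elements, B(k). The sketch below computes a canonical label per column and counts the types by brute force. The canonical form (first symbol becomes 0, the next new symbol 1, and so on) is an assumption; under it the slide's type (b,a,a)^t appears as (0,1,1).

```python
from itertools import product
from typing import Sequence, Tuple


def column_type(column: Sequence[str]) -> Tuple[int, ...]:
    """Canonical label of a column: symbols renamed 0, 1, 2, ... in order of first appearance."""
    labels = {}
    return tuple(labels.setdefault(ch, len(labels)) for ch in column)


def count_column_types(k: int) -> int:
    """Number of distinct column types for k strings, by enumerating all columns over k symbols."""
    alphabet = [chr(ord("a") + i) for i in range(k)]  # Lemma 3: k symbols suffice
    return len({column_type(col) for col in product(alphabet, repeat=k)})


print(count_column_types(3))  # the five types listed for k = 3
```

Since the number of types depends only on k, grouping the L columns by type yields an ILP whose number of variables is bounded by a function of k alone, which is what Lenstra's result requires.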

21
Conclusion: Fixed-parameter tractability for CLOSEST STRING in d and in k. Improves previous work on d-MISMATCH. DSS. CLOSEST SUBSTRING?
