Efficient algorithms for the scaled indexing problem
Biing-Feng Wang, Jyh-Jye Lin, and Shan-Chyun Ku
Journal of Algorithms 52 (2004) 82–100
Presenter: Yung-Hsing Peng
Date: 2005.03.22

Example for the problem
Let T = a^5 c^6 a^1 c^1 a^4 b^3 (run-length coding), P1 = a^2 c, and P2 = c^1 a^1 b^1.
δ_k is the scaling function with parameter k: it multiplies every run length by k. If k = 2, we have δ_2(P1) = a^4 c^2 and δ_2(P2) = c^2 a^2 b^2.
δ_2(P1) occurs in T, so P1 is a valid pattern.
In this example, P2 is not a valid pattern, since δ_k(P2) does not occur in T for any k.
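To make the definition concrete, here is a brute-force check in Python. It is a sketch for small examples only, not the paper's method, and the helper names (expand, scale, scales_found) are hypothetical. It assumes δ_k multiplies every run length by the integer k and simply tests whether δ_k(P) is a substring of the expanded text for k = 1, …, m, where m is the longest run of T.

```python
# Brute-force check of the "valid pattern" definition for discrete scaling.
# Strings are given in run-length form as lists of (char, run_length) pairs.

def expand(runs):
    """Turn [(char, length), ...] into the plain string it encodes."""
    return "".join(ch * length for ch, length in runs)

def scale(pattern, k):
    """delta_k: multiply every run length of the pattern by the integer k."""
    return [(ch, length * k) for ch, length in pattern]

def scales_found(text, pattern):
    """Return every integer k >= 1 such that delta_k(pattern) occurs in text.

    k never needs to exceed the longest run m of the text, because every run
    of delta_k(pattern) has length at least k and must fit inside a run of T.
    """
    plain = expand(text)
    m = max(length for _, length in text)
    return [k for k in range(1, m + 1) if expand(scale(pattern, k)) in plain]

T = [("a", 5), ("c", 6), ("a", 1), ("c", 1), ("a", 4), ("b", 3)]
P1 = [("a", 2), ("c", 1)]
P2 = [("c", 1), ("a", 1), ("b", 1)]
print(scales_found(T, P1))  # [1, 2]
print(scales_found(T, P2))  # []
```

For this T the check reports that P1 is valid (for k = 1 and k = 2), while P2 yields no k at all.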

Algorithm for Discrete Scaling
For every positive integer k, construct a new string T_k from T. Each run x^y of T is replaced by x^(y/k) if y is divisible by k, and by x^⌊y/k⌋ $ x^⌊y/k⌋ otherwise (a lone $ when ⌊y/k⌋ = 0). The $ keeps a run whose length is not a multiple of k from acting as an inner run of a match; such a run can only begin or end one.
Example: T = a^5 c^6 a^1 c^1 a^4 b^3 (maximum run length m = 6)
T_1 = a^5 c^6 a^1 c^1 a^4 b^3
T_2 = a^2 $ a^2 c^3 $ $ a^2 b^1 $ b^1
T_3 = a^1 $ a^1 c^2 $ $ a^1 $ a^1 b^1
T_4 = a^1 $ a^1 c^1 $ c^1 $ $ a^1 $
T_5 = a^1 c^1 $ c^1 $ $ $ $
T_6 = $ c^1 $ $ $ $
Theorem: If P is a valid pattern, then P occurs in T_1 $ T_2 $ T_3 $ … $ T_m.
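The replacement rule and the theorem can be sketched in Python as follows. This is an illustrative sketch with hypothetical helper names, not the authors' implementation; runs are (char, length) pairs, and the separator placed between consecutive T_k strings is also written as $.

```python
# Build T_k from a run-length encoded text T, following the rule on this slide:
#   x^y -> x^(y/k)                      if k divides y
#   x^y -> x^floor(y/k) $ x^floor(y/k)  otherwise (just "$" when floor(y/k) == 0)

def build_tk(runs, k):
    """Return T_k as a list of tokens: (char, length) pairs and the separator "$"."""
    out = []
    for ch, y in runs:
        q = y // k
        if y % k == 0:
            out.append((ch, q))
        else:
            if q > 0:
                out.append((ch, q))
            out.append("$")
            if q > 0:
                out.append((ch, q))
    return out

def show(tokens):
    """Render tokens in the slide's notation, e.g. a2$a2c3."""
    return "".join(t if t == "$" else f"{t[0]}{t[1]}" for t in tokens)

def expand(tokens):
    """Expand tokens into a plain string ("$" stays a single separator)."""
    return "".join(t if t == "$" else t[0] * t[1] for t in tokens)

def discrete_index_string(runs):
    """Plain-string form of T_1 $ T_2 $ ... $ T_m, with m the longest run of T."""
    m = max(y for _, y in runs)
    return "$".join(expand(build_tk(runs, k)) for k in range(1, m + 1))

T = [("a", 5), ("c", 6), ("a", 1), ("c", 1), ("a", 4), ("b", 3)]
for k in range(1, 7):
    print(f"T_{k} =", show(build_tk(T, k)))

# P1 = a^2 c^1 is valid (delta_2(P1) = a^4 c^2 occurs in T), so "aac" must occur:
print("aac" in discrete_index_string(T))  # True
```

Running it prints T_1 to T_6 of the example in compact form (for instance `T_2 = a2$a2c3$$a2b1$b1`). Searching the expanded concatenation for "aac" succeeds, consistent with P1 being valid, while "cab" (the expansion of P2) does not occur anywhere in it.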

An Efficient Method to Build T_k
Compute T_k from T_{k-1} using the index set I_k, the indices of the runs of T whose length is at least k; runs outside I_k become $ in T_k, and I_k ⊆ I_{k-1}.
Example: T = a^5 c^6 a^1 c^1 a^4 b^3 (m = 6), with the runs indexed 1 to 6:
I_1 = {1, 2, 3, 4, 5, 6}, T_1 = a^5 c^6 a^1 c^1 a^4 b^3
I_2 = {1, 2, 5, 6}, T_2 = a^2 $ a^2 c^3 $ $ a^2 b^1 $ b^1
I_3 = {1, 2, 5, 6}, T_3 = a^1 $ a^1 c^2 $ $ a^1 $ a^1 b^1
I_4 = {1, 2, 5}, T_4 = a^1 $ a^1 c^1 $ c^1 $ $ a^1 $
I_5 = {1, 2}, T_5 = a^1 c^1 $ c^1 $ $ $ $
I_6 = {2}, T_6 = $ c^1 $ $ $ $
I_7 = {}
Every run in I_k has length at least k, so I_k has at most n/k elements. Hence |I_1| + |I_2| + … + |I_m| ≤ n(1 + 1/2 + … + 1/m) = O(n log m), and T_1 $ T_2 $ T_3 $ … $ T_m can be built in O(n log m) time.
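The incremental step can be sketched as follows, assuming (as the example indicates) that I_k is the set of 1-based indices of the runs of length at least k. Because I_k ⊆ I_{k-1}, each round only filters the previous set, so the total work is |I_1| + … + |I_m|. The function name index_sets is hypothetical.

```python
# Compute the index sets I_k incrementally: I_k keeps the runs of I_{k-1}
# whose length is at least k, so each step only looks at the previous set.

def index_sets(runs):
    """Return [I_1, I_2, ..., I_{m+1}] as lists of 1-based run indices."""
    current = list(range(1, len(runs) + 1))  # I_1: every run
    sets = [current]
    k = 1
    while current:
        k += 1
        current = [i for i in current if runs[i - 1][1] >= k]
        sets.append(current)
    return sets

T = [("a", 5), ("c", 6), ("a", 1), ("c", 1), ("a", 4), ("b", 3)]
n = sum(y for _, y in T)  # length of T, here 20
sets = index_sets(T)
for k, I in enumerate(sets, start=1):
    print(f"I_{k} = {I}")

# Total size of all index sets versus the harmonic bound n * (1 + 1/2 + ... + 1/m).
total = sum(len(I) for I in sets)
harmonic_bound = sum(n / k for k in range(1, len(sets)))
print(total, "<=", harmonic_bound)
```

The printed sets match I_1 to I_7 of the example, and their total size (20 here) stays below the harmonic bound.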

Time Complexity of Discrete Scaling
Lemma: T_1 $ T_2 $ T_3 $ … $ T_m can be built in O(n log m) time.
Lemma: For each k, the length of T_k is O(n/k).
Hence the length of T_1 $ T_2 $ T_3 $ … $ T_m is O(n/1 + n/2 + n/3 + … + n/m) = O(n log m), and its suffix tree can be built in O(n log m) time, where n is the length of T and m is the maximum run length in T.
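Written out, the harmonic-sum step relies on the standard bound H_m ≤ 1 + ln m for the m-th harmonic number:

```latex
\sum_{k=1}^{m} \frac{n}{k} \;=\; n H_m,
\qquad
H_m \;=\; \sum_{k=1}^{m} \frac{1}{k} \;\le\; 1 + \ln m,
\qquad\text{so}\qquad
\sum_{k=1}^{m} \frac{n}{k} \;=\; O(n \log m).
```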

Algorithm for the Decision Version of the Real Scaling (1/2)
For every critical real number k, construct a new string T_k from T.
Since the run lengths of the input pattern P are integers, all critical values of k can be found by dividing the run lengths of T by 1, 2, …, m. Example for T = a^5 c^6 a^1 c^1 a^4 b^3, keeping only quotients ≥ 1:
(1) divided by 1: {5, 6, 1, 4, 3}
(2) divided by 2: {5/2, 3, 2, 3/2}
(3) divided by 3: {5/3, 2, 4/3, 1}
(4) divided by 4: {5/4, 3/2, 1}
(5) divided by 5: {1, 6/5}
(6) divided by 6: {1}
If m is the maximum run length in T, the set Γ(T) of critical values of k is the union of (1) to (m).
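The division procedure can be sketched in Python, using fractions.Fraction to keep the critical values exact (so 5/3 is not rounded to 1.66). critical_scales is a hypothetical name; the sketch assumes, as the example suggests, that only quotients y/r ≥ 1 are kept and that r runs from 1 to the maximum run length of T.

```python
from fractions import Fraction

def critical_scales(runs):
    """Gamma(T): every quotient y / r >= 1, for a run length y of T and integer r >= 1."""
    lengths = {y for _, y in runs}  # distinct run lengths of T
    m = max(lengths)                # maximum run length in T
    gamma = set()
    for r in range(1, m + 1):       # (1) divided by 1, ..., (m) divided by m
        for y in lengths:
            if y >= r:              # keep only quotients >= 1
                gamma.add(Fraction(y, r))
    return sorted(gamma)

T = [("a", 5), ("c", 6), ("a", 1), ("c", 1), ("a", 4), ("b", 3)]
print([str(k) for k in critical_scales(T)])
# ['1', '6/5', '5/4', '4/3', '3/2', '5/3', '2', '5/2', '3', '4', '5', '6']
```

The twelve values are exactly the union of lines (1) to (6) of the example.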

Algorithm for the Decision Version of the Real Scaling (2/2)
For every critical k in Γ(T), construct a new string T_k from T. Each run x^y is replaced by x^Ф(y, k) if y is k-invertible, and by x^Ф(y, k) $ x^Ф(y, k) otherwise, where Ф(y, k) is the largest integer r such that ⌊k·r⌋ ≤ y.
Example: T = a^5 c^6 a^1 c^1 a^4 b^3 (m = 6). If k = 1.5, then T_k = a^3 $ a^3 c^4 $ $ a^3 b^2.
Theorem: If P is a valid pattern, then P occurs in T_{k1} $ T_{k2} $ T_{k3} $ … $ T_{kz}, where z is the number of critical values of k.
In the example above, if k = 1.7, then T_k = a^3 c^4 $ $ a^2 $ a^2 b^2. The pattern a^3 c^4 occurs in both strings, but its scaled occurrences lie at different positions of T: δ_1.5(a^3 c^4) = a^4 c^6 starts at the second character of T, whereas δ_1.7(a^3 c^4) = a^5 c^6 starts at the first. An occurrence in the concatenated string therefore does not locate P in T, so this algorithm solves only the decision version of real scaling.
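A sketch of the real-scale construction in Python, using fractions.Fraction so that ⌊k·r⌋ is computed exactly. It assumes δ_k maps a run x^r to x^⌊k·r⌋ and treats y as k-invertible when ⌊k·Ф(y, k)⌋ = y. The slide states no rule for runs shorter than k, but its worked examples write them as a single $, so the sketch does the same; that handling is an assumption. phi, build_real_tk and scale_real are hypothetical names.

```python
import math
from fractions import Fraction

def phi(y, k):
    """Phi(y, k): the largest integer r with floor(k * r) <= y (assumes k >= 1)."""
    r = math.floor(y / k)              # floor(k * r) <= k * r <= y, so r is feasible
    while math.floor(k * (r + 1)) <= y:
        r += 1
    return r

def build_real_tk(runs, k):
    """Render T_k for a real scale k in the slide's notation (e.g. a3$a3c4)."""
    parts = []
    for ch, y in runs:
        if y < k:                      # assumption: mirrors the slide's worked examples
            parts.append("$")
            continue
        r = phi(y, k)
        if math.floor(k * r) == y:     # y is k-invertible
            parts.append(f"{ch}{r}")
        else:                          # not invertible: split around a separator
            parts.append(f"{ch}{r}${ch}{r}")
    return "".join(parts)

def scale_real(pattern, k):
    """delta_k for real k (assumed definition): a run x^r becomes x^floor(k * r)."""
    return "".join(ch * math.floor(k * r) for ch, r in pattern)

T = [("a", 5), ("c", 6), ("a", 1), ("c", 1), ("a", 4), ("b", 3)]
plain_T = "".join(ch * y for ch, y in T)
print(build_real_tk(T, Fraction(3, 2)))    # a3$a3c4$$a3b2
print(build_real_tk(T, Fraction(17, 10)))  # a3c4$$a2$a2b2

# The same pattern a^3 c^4 lands at different 0-based positions of T:
P = [("a", 3), ("c", 4)]
print(plain_T.find(scale_real(P, Fraction(3, 2))))    # 1  (a^4 c^6)
print(plain_T.find(scale_real(P, Fraction(17, 10))))  # 0  (a^5 c^6)
```

The two printed strings reproduce T_k for k = 1.5 and k = 1.7 from the example, and the last two lines show the differing start positions that restrict this approach to the decision version. For an integer k the sketch coincides with the discrete rule, since Ф(y, k) = ⌊y/k⌋ and y is k-invertible exactly when k divides y.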

Time Complexity of the Decision Version of Real Scaling
Lemma: In the worst case, the total number of critical values of k is O(n).
Lemma: Each T_{ki} can be computed in O(n) time.
Lemma: T_{k1} $ T_{k2} $ T_{k3} $ … $ T_{kz} can be built in O(n^2) time.
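One way to see both bounds, assuming (as in the division example) that the critical values are the quotients y/r ≥ 1 of the run lengths y of T: a run length y yields at most y such quotients, and the set D of distinct run lengths of T sums to at most n, because each distinct length occupies at least one run of its own.

```latex
|\Gamma(T)| \;\le\; \sum_{y \in D} \bigl|\{\, y/r \;:\; r \in \mathbb{Z},\ 1 \le r \le y \,\}\bigr|
  \;\le\; \sum_{y \in D} y \;\le\; n,
\qquad
\text{total time} \;=\; \sum_{i=1}^{z} O(n) \;=\; O(z\,n) \;=\; O(n^2).
```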

Algorithm for the Real Scaling (1/4)
Core idea: generate all valid patterns and insert them into a Real Scale Indexing Tree (RSIT), which speeds up searching.

Algorithm for the Real Scaling (2/4)
Upper bound on the number of valid patterns: there are O(n^3) valid patterns, each of length at most n, so a straightforward implementation that inserts them one by one into the RSIT takes O(n^4) time. This paper gives an O(n^3)-time algorithm for doing so.

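Algorithm for the Real Scaling (3/4)
P*(g, l): the string obtained by shrinking by a factor of g the longest substring of T that starts at position l and can be shrunk by g.
Example: T = a a a a b b b c c c a a a a, P = b c, l = 4.
P*(3, 4) = b c a (the region of T starting at l, shrunk by a factor of 3).
P is a prefix of P*(3, 4), that is, δ_3(P) = b^3 c^3 occurs in T starting at position l.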
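A minimal sketch of one reading of P*(g, l) follows. It assumes l is a 0-based character position (consistent with l = 4 giving b c a in the example) and that the shrunk region stops at the first run whose length is not a multiple of g, which is shrunk to ⌊length/g⌋. The name p_star is hypothetical and this is not necessarily the paper's exact definition.

```python
from itertools import groupby

def p_star(text, g, l):
    """Hypothetical helper for P*(g, l) on a plain string, with l a 0-based position.

    Scans the runs of text starting at l. A run whose length is a multiple of g
    is shrunk exactly and the scan continues; the first run that is not a
    multiple of g is shrunk to floor(length / g) and ends the scan.
    """
    out = []
    for ch, group in groupby(text[l:]):
        y = len(list(group))
        out.append(ch * (y // g))
        if y % g != 0:
            break
    return "".join(out)

T = "aaaabbbcccaaaa"
print(p_star(T, 3, 4))  # bca
print(p_star(T, 2, 0))  # aab
```

Under this reading, any pattern Q whose g-scaling starts at position l is a prefix of P*(g, l); in the example, b c is a prefix of b c a because b^3 c^3 starts at position 4.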

Algorithm for the Real Scaling (4/4)

Conclusion of Real Scaled Indexing Problem
