Published by Esteban Barkus. Modified over 2 years ago.

1
Linear-time construction of CSA using o(n log n)-bit working space for large alphabets
Joong Chae Na
School of Computer Sci. & Eng., Seoul National University, Korea

2
Overview
- Background
  - Suffix arrays (SA)
  - Compressed suffix arrays (CSA)
- Problem definition
- Previous works
- Our contributions
- Description of our algorithm
- Conclusions

3
Background (1)
Given a string T of length n over an alphabet Σ, the suffix array (SA) of T [Manber & Myers ’93] is the lexicographically sorted list of the suffixes of T. It occupies O(n log n) bits.

T : b a b a a b b a $

i  SA  suffix
1  9   $
2  8   a$
3  4   aabba$
4  2   abaabba$
5  5   abba$
6  7   ba$
7  3   baabba$
8  1   babaabba$
9  6   bba$
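The table above can be reproduced with a minimal sketch that sorts all suffixes directly. This is an illustration only, not one of the linear-time constructions cited in this talk; the function name `naive_suffix_array` is hypothetical, and the code assumes that `$` compares smaller than every letter (true for ASCII lowercase letters).

```python
def naive_suffix_array(text):
    """Return the 1-based suffix array of `text` by sorting its suffixes.

    Illustration only: materializing and sorting every suffix is far
    slower than the linear-time algorithms discussed in the slides.
    """
    n = len(text)
    # '$' (0x24) sorts before 'a'..'z', so the terminator is smallest.
    return sorted(range(1, n + 1), key=lambda k: text[k - 1:])


T = "babaabba$"
SA = naive_suffix_array(T)
for i, k in enumerate(SA, start=1):
    print(i, k, T[k - 1:])  # columns: i, SA[i], suffix T[SA[i]..n]
```

Positions are 1-based to match the slide; only the slicing `text[k - 1:]` converts to Python's 0-based indexing.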

4
Background (2)
Compressed suffix array (CSA) [Grossi & Vitter ’00]
- Compressed version of SA
- Space requirement of O(n log|Σ|) bits
FM-index [Ferragina & Manzini 2000]

T : b a b a a b b a $

i  SA  Ψ_T  suffix
1  9   8    $
2  8   1    a$
3  4   5    aabba$
4  2   7    abaabba$
5  5   9    abba$
6  7   2    ba$
7  3   3    baabba$
8  1   4    babaabba$
9  6   6    bba$

5
Problem definition
Constructing SA, CSA, and FM-index using o(n log n) time and o(n log n)-bit working space
Working space
- Temporary space required for executing an algorithm
- Not including the space for the input and output

6
Related works
Constructing SA and CSA
※ O(n log n)-bit working space
- Manber & Myers [1993]: O(n log n) time
- Kim et al. [2003]: O(n) time
- Kärkkäinen & Sanders [2003]: O(n) time
- Ko & Aluru [2003]: O(n) time
※ O(n log|Σ|)-bit working space
- Lam et al. [COCOON 2002]: O(|Σ| n log n) time
- Hon et al. [ISAAC 2003]: O(n log n) time
None of these algorithms satisfies both the time and the space requirements of our problem.

7
Previous results
Hon et al. [FOCS 2003]
- An algorithm using O(n log log|Σ|) time and O(n log|Σ|)-bit working space
- The first algorithm using o(n log n) time and o(n log n)-bit working space
- Follows ½-recursion (the odd-even scheme)

8
Our contributions
- Another algorithm using o(n log n) time and o(n log n)-bit working space
  - O(n) time and $O(n \log|\Sigma| \cdot \log_{|\Sigma|}^{\alpha} n)$-bit working space, where $\alpha = \log_3 2 \approx 0.63$
- The first alphabet-independent linear-time algorithm for constructing SA, CSA, and FM-index using o(n log n)-bit working space
- Follows ⅔-recursion (the skew scheme)

9
Hon et al. vs. our results

              Hon et al.          Our results
Time          O(n log log|Σ|)     O(n)
Space (bits)  O(n log|Σ|)         $O(n \log|\Sigma| \cdot \log_{|\Sigma|}^{\alpha} n)$
Scheme        ½-recursion         ⅔-recursion
(merging)     complex             simple
(encoding)*   implicit (both algorithms)

* The encoding step is the most complex and time-consuming step in ⅔-recursion. However, neither algorithm needs an explicit encoding step.

10
Description of our algorithm

11
Overview
- Preliminaries
- Basic definitions and notations
- Main technique
- Outline of our algorithm

12
Preliminaries - Ψ function
Let T[k..n] be lexicographically the i-th smallest suffix of T. Then
- SA[i] = k
- Ψ_T[i] is the position in SA where T[k+1..n] is stored

T : b a b a a b b a $
    1 2 3 4 5 6 7 8 9

i  SA  Ψ_T  suffix
1  9   8    $
2  8   1    a$
3  4   5    aabba$
4  2   7    abaabba$
5  5   9    abba$
6  7   2    ba$
7  3   3    baabba$
8  1   4    babaabba$
9  6   6    bba$
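As a minimal sketch of this definition, Ψ can be read off the inverse suffix array. The helper names `suffix_array` and `psi_from_sa` are hypothetical. The wrap-around for the last suffix (the suffix after T[n..n] is taken to be T[1..n]) is an assumption inferred from the table entry Ψ_T[1] = 8.

```python
def suffix_array(text):
    """Naive 1-based suffix array (illustration only)."""
    n = len(text)
    return sorted(range(1, n + 1), key=lambda k: text[k - 1:])


def psi_from_sa(sa):
    """Return Psi as a list where psi[i-1] is Psi[i] in 1-based terms."""
    n = len(sa)
    # rank[k] = i such that SA[i] = k (the inverse suffix array).
    rank = [0] * (n + 2)
    for i, k in enumerate(sa, start=1):
        rank[k] = i
    # Assumed convention: the successor of T[n..n] wraps to T[1..n].
    rank[n + 1] = rank[1]
    return [rank[k + 1] for k in sa]


T = "babaabba$"
SA = suffix_array(T)
PSI = psi_from_sa(SA)
print(PSI)
```

This version stores the full inverse array, so it uses O(n log n) bits and is not meant to reflect the space bounds discussed in the talk.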

13
Preliminaries - Lemmas
- Text, Ψ → SA, CSA: O(n) time, O(n log|Σ|)-bit working space
- Text, Ψ → C array (BWT) → FM-index: O(n) time, O(n log|Σ|)-bit working space
Hon et al. [FOCS 2003]
Note: the goal is therefore Text → Ψ
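To see why Ψ is enough to recover SA, here is a minimal sketch that walks Ψ once, starting from the position of T[1..n]. It is not the procedure of Hon et al., and it ignores their space bound. The function name `sa_from_psi` is hypothetical. It assumes `$` is the unique smallest character (so SA[1] = n) and that Ψ[1] wraps around to the position of T[1..n], as in the example table.

```python
def sa_from_psi(psi):
    """Rebuild the 1-based suffix array from Psi in one pass.

    `psi[i-1]` holds Psi[i]. Assumes SA[1] = n (unique smallest
    terminator) and that Psi[1] is the SA position of T[1..n].
    """
    n = len(psi)
    sa = [0] * n
    i = psi[0]              # SA position of the suffix T[1..n]
    for k in range(1, n + 1):
        sa[i - 1] = k       # the suffix at SA position i starts at k
        i = psi[i - 1]      # move to the position of T[k+1..n]
    return sa


PSI = [8, 1, 5, 7, 9, 2, 3, 4, 6]   # Psi of T = babaabba$ from the table
print(sa_from_psi(PSI))
```

Each step follows one Ψ pointer, so the walk takes O(n) time given constant-time access to Ψ.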

14
Basic def. and not. (1)
- Residue-1 suffixes of T: T[3i-2..n] for 1 ≤ i ≤ n/3, i.e. T[1..n], T[4..n], T[7..n], …
- Residue-2 suffixes of T: T[3i-1..n] for 1 ≤ i ≤ n/3, i.e. T[2..n], T[5..n], T[8..n], …
- Residue-3 suffixes of T: T[3i..n] for 1 ≤ i ≤ n/3, i.e. T[3..n], T[6..n], T[9..n], …

T[1..n] = b a b a a b b a $
          1 2 3 4 5 6 7 8 9

Residue-1: babaabba$, aabba$, ba$
Residue-2: abaabba$, abba$, a$
Residue-3: baabba$, bba$, $

15
Basic def. and not. (2)
- T12: length ⅔n, alphabet Σ^3; SA12: suffix array of T12
- T3: length ⅓n, alphabet Σ^3; SA3: suffix array of T3

Over the alphabet Σ:
T12[1..⅔n] = T[1..n] T[2..n] T[1]
T3[1..⅓n] = T[3..n] T[1] T[2]

T   = b a b a a b b a $            (positions 1..9)
T12 = bab aab ba$ aba abb a$b      (T positions 123 456 789 234 567 891)
T3  = baa bba $ba                  (T positions 345 678 912)
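The construction of T12 and T3 can be sketched as follows: read T cyclically in triples starting at residue-1, residue-2, and residue-3 positions. The function name `build_t12_t3` is hypothetical, and the sketch assumes that n is a multiple of 3, as in the example.

```python
def build_t12_t3(text):
    """Return (T12, T3) as lists of 3-character symbols over Sigma^3.

    Assumes len(text) is a multiple of 3; positions are read cyclically,
    which reproduces the padding by T[1] (and T[2]) shown on the slide.
    """
    n = len(text)
    assert n % 3 == 0, "sketch assumes n is a multiple of 3"

    def triple(k):
        # Three consecutive characters starting at 1-based position k.
        return "".join(text[(k - 1 + d) % n] for d in range(3))

    residue1 = [triple(k) for k in range(1, n + 1, 3)]
    residue2 = [triple(k) for k in range(2, n + 1, 3)]
    residue3 = [triple(k) for k in range(3, n + 1, 3)]
    return residue1 + residue2, residue3


T12, T3 = build_t12_t3("babaabba$")
print(T12)
print(T3)
```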

16
Main technique - Ψ’ function
- Ψ’ is just like Ψ, but it is defined on SA12 and SA3.
- Ψ’ points to the position in SA12 or SA3 where T[k+1..n] (the suffix that follows the current suffix T[k..n]) is stored.
※ Note that Ψ’ is not the Ψ function of T12 or T3.
- The Ψ’ function consists of Ψ’_T12 and Ψ’_T3.

17
Ψ’ function (residue-1)
- Ψ’_T12 (residue-1 suffixes of T): let T[3k-2..n] be the suffix stored in SA12[i]. Then Ψ’_T12[i] is the position in SA12 where the next suffix T[3k-1..n] is stored.
- Ψ’_T12 (residue-2 suffixes of T): let T[3k-1..n] be the suffix stored in SA12[i]. Then Ψ’_T12[i] is the position in SA3 where the next suffix T[3k..n] is stored.
- Ψ’_T3 (residue-3 suffixes of T): let T[3k..n] be the suffix stored in SA3[i]. Then Ψ’_T3[i] is the position in SA12 where the next suffix T[3k+1..n] is stored.

18
Ψ’ function (residue-1)
T   = b a b a a b b a $            (positions 1..9)
T12 = bab aab ba$ aba abb a$b      (T positions 123 456 789 234 567 891)
T3  = baa bba $ba                  (T positions 345 678 912)

i  SA12  Ψ’_T12  suffix of T12
1  6     1       a$b
2  2     4       aab ba$
3  4     2       aba abb a$b
4  5     3       abb a$b
5  3     1       ba$
6  1     3       bab aab ba$

i  SA3  Ψ’_T3  suffix of T3
1  3    6      $ba
2  1    2      baa bba $ba
3  2    5      bba $ba
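The Ψ’ values in these tables can be checked with a minimal brute-force sketch. It sorts the suffixes of T directly instead of the suffixes of T12 and T3; the two orders coincide when `$` is a unique smallest terminator, which is assumed here. The function name `psi_prime` and all helper names are hypothetical, and the code is not the construction used in the talk.

```python
def psi_prime(text):
    """Return (SA12, Psi'_T12, SA3, Psi'_T3) with 1-based values.

    Brute force for illustration: assumes len(text) % 3 == 0 and a
    unique smallest terminator '$' at the end of `text`.
    """
    n = len(text)
    assert n % 3 == 0
    suffix = lambda k: text[k - 1:]
    # Text positions of residue-1/2 and residue-3 suffixes, sorted.
    pos12 = sorted((k for k in range(1, n + 1) if k % 3 != 0), key=suffix)
    pos3 = sorted((k for k in range(1, n + 1) if k % 3 == 0), key=suffix)
    rank12 = {k: i for i, k in enumerate(pos12, start=1)}
    rank3 = {k: i for i, k in enumerate(pos3, start=1)}
    nxt = lambda k: k % n + 1          # next text position, wrapping n -> 1

    # Residue-1 suffixes point into SA12, residue-2 suffixes into SA3.
    psi12 = [rank12[nxt(k)] if k % 3 == 1 else rank3[nxt(k)] for k in pos12]
    # Residue-3 suffixes point into SA12.
    psi3 = [rank12[nxt(k)] for k in pos3]

    # Convert text positions to indices within T12 and T3.
    third = n // 3
    sa12 = [(k + 2) // 3 if k % 3 == 1 else third + (k + 1) // 3 for k in pos12]
    sa3 = [k // 3 for k in pos3]
    return sa12, psi12, sa3, psi3


SA12, PSI12, SA3, PSI3 = psi_prime("babaabba$")
print(SA12, PSI12)
print(SA3, PSI3)
```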

23
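Framework - outline
How to construct the Ψ function of T: bottom-up approach
- step 0: T → Ψ_T
- step 1: T12 → Ψ_T12
- …
- step h: use any linear-time construction algorithm
$h = \log_3 \log_{|\Sigma|} n$
[Figure: table giving the string length and alphabet at step i; entries not recoverable from the transcript.]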
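The choice of h can be related to the space bound claimed earlier by a short calculation. It assumes that step i works on a string of length (2/3)^i n over the alphabet Σ^{3^i}; this is extrapolated from the definition of T12 and is not stated on this slide. The symbols n_i and Σ_i are introduced here for illustration only.

```latex
\begin{align*}
n_i &= (2/3)^i\, n, \qquad |\Sigma_i| = |\Sigma|^{3^i} \\
n_i \log|\Sigma_i| &= (2/3)^i\, n \cdot 3^i \log|\Sigma| = 2^i\, n \log|\Sigma| \\
h = \log_3 \log_{|\Sigma|} n \;&\Rightarrow\; |\Sigma_h| = |\Sigma|^{\log_{|\Sigma|} n} = n \\
2^h &= 2^{\log_3 \log_{|\Sigma|} n} = \left(\log_{|\Sigma|} n\right)^{\log_3 2} = \log_{|\Sigma|}^{\alpha} n \\
2^h\, n \log|\Sigma| &= n \log|\Sigma| \cdot \log_{|\Sigma|}^{\alpha} n
\end{align*}
```

Under this assumption, the string at the deepest step occupies $O(n \log|\Sigma| \cdot \log_{|\Sigma|}^{\alpha} n)$ bits with $\alpha = \log_3 2$, which matches the working-space bound stated in the contributions.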

24
Step i - outline
[Figure: S is split into S12 and S3. From Ψ_S12 (obtained in step i+1), Ψ’_S12 and Ψ’_S3 are computed. Ψ’_S12 and Ψ’_S3 are then merged into Ψ_S.]

25
Merging step

i  SA12  Ψ’_T12  suffix of T12
1  6     1       a$b
2  2     4       aab ba$
3  4     2       aba abb a$b
4  5     3       abb a$b
5  3     1       ba$
6  1     3       bab aab ba$

i  SA3  Ψ’_T3  suffix of T3
1  3    6      $ba
2  1    2      baa bba $ba
3  2    5      bba $ba

i  SA  Ψ_T  suffix
1  9   8    $
2  8   1    a$
3  4   5    aabba$
4  2   7    abaabba$
5  5   9    abba$
6  7   2    ba$
7  3   3    baabba$
8  1   4    babaabba$
9  6   6    bba$

* Compare the entries of SA12 with the entries of SA3 in order.
- Two suffixes are compared by following the Ψ’ function at most twice.
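The comparison rule can be illustrated with a minimal sketch of the merge in the style of the skew scheme. Rank lookups in SA12 stand in for following Ψ’: a residue-1 suffix needs one step before both successors lie in SA12, and a residue-2 suffix needs two. The function name `merge_sa12_sa3` and its helpers are hypothetical. The sorted inputs are produced by brute force, and the code assumes n is a multiple of 3 with a unique smallest terminator `$`. It is not the space-efficient procedure of the talk.

```python
def merge_sa12_sa3(text):
    """Merge sorted residue-1/2 and residue-3 suffixes into the full SA."""
    n = len(text)
    assert n % 3 == 0
    suffix = lambda k: text[k - 1:]
    # Brute-force stand-ins for SA12 and SA3, kept as text positions.
    pos12 = sorted((k for k in range(1, n + 1) if k % 3 != 0), key=suffix)
    pos3 = sorted((k for k in range(1, n + 1) if k % 3 == 0), key=suffix)
    rank12 = {k: i for i, k in enumerate(pos12, start=1)}
    ch = lambda k: text[(k - 1) % n]            # character at position k
    nxt = lambda k, d=1: (k - 1 + d) % n + 1    # position k + d, cyclic

    def less(a, b):
        """True if the SA12 suffix at a precedes the SA3 suffix at b."""
        if a % 3 == 1:
            # One step: T[a+1..] and T[b+1..] are both ranked in SA12.
            return (ch(a), rank12[nxt(a)]) < (ch(b), rank12[nxt(b)])
        # Residue-2: two steps until both successors are ranked in SA12.
        return ((ch(a), ch(a + 1), rank12[nxt(a, 2)])
                < (ch(b), ch(b + 1), rank12[nxt(b, 2)]))

    merged, i, j = [], 0, 0
    while i < len(pos12) and j < len(pos3):
        if less(pos12[i], pos3[j]):
            merged.append(pos12[i])
            i += 1
        else:
            merged.append(pos3[j])
            j += 1
    merged.extend(pos12[i:])
    merged.extend(pos3[j:])
    return merged


print(merge_sa12_sa3("babaabba$"))
```

Every comparison inspects at most two characters and one pair of ranks, so the merge itself is linear in n once the two sorted lists are available.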

26
Conclusions & future works
- We presented an alphabet-independent linear-time algorithm to construct SA, CSA, and FM-index using o(n log n)-bit working space.
Future works
- To construct SA, CSA, and FM-index optimally, i.e., using O(n) time and O(n log|Σ|)-bit working space
