Succinct Representations of Dynamic Strings Meng He and J. Ian Munro University of Waterloo.

1 Succinct Representations of Dynamic Strings Meng He and J. Ian Munro University of Waterloo

2 Background: Succinct Data Structures
- What are succinct data structures? (Jacobson 1989)
  - Representing data structures using, ideally, the information-theoretic minimum space
  - Supporting efficient navigational operations
- Why succinct data structures?
  - Large data sets in modern applications: textual, genomic, spatial or geometric

3 Strings: Definitions
- Notation
  - Alphabet: [σ] = {1, 2, …, σ}
  - String: S[1..n]
- Operations
  - access(i): S[i]
  - rank(α, i): number of occurrences of α in S[1..i]
  - select(α, i): position of the i-th occurrence of α in S

4 Strings: An Example
S = a a b a c c c d a d d a b b b c
- string_access(8) = d
- string_rank(a, 8) = 3
- string_select(b, 3) = 14
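The three queries on this slide can be checked with a naive linear-time sketch. This is only an illustration of the definitions, not the succinct structure from the talk; the function names `access`, `rank` and `select` follow the slide on definitions, and the 1-based positions are converted to Python's 0-based indexing inside each function.

```python
def access(s, i):
    """Return S[i], using the slides' 1-based positions."""
    return s[i - 1]


def rank(s, alpha, i):
    """Count the occurrences of alpha in S[1..i]."""
    return s[:i].count(alpha)


def select(s, alpha, i):
    """Return the 1-based position of the i-th occurrence of alpha, or None."""
    seen = 0
    for pos, ch in enumerate(s, start=1):
        if ch == alpha:
            seen += 1
            if seen == i:
                return pos
    return None


# The example string from the slide, written without spaces.
S = "aabacccdaddabbbc"
print(access(S, 8), rank(S, "a", 8), select(S, "b", 3))
```

Each call scans the string, so the cost is O(n) per query; the rest of the talk is about doing far better in compressed space.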

5 Succinct Representations of Strings
- Information-theoretic minimum: n lg σ bits
- Succinct representation (Grossi et al. 2003)
  - Space: nH_0 + o(n)∙lg σ bits
  - Time: O(lg σ)
  - There are many more results.
- The case in which σ = 2 (bit vector) is even more fundamental! (Jacobson 1989)
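To make the two space terms concrete, the sketch below computes the zeroth-order empirical entropy H_0 of a string and compares n lg σ with nH_0. It uses the standard definition of H_0 (the sum over characters of (n_c / n) lg(n / n_c)); the helper name `h0` and the choice of the example string from the earlier slide are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def h0(s):
    """Zeroth-order empirical entropy of s, in bits per character."""
    n = len(s)
    return sum((c / n) * log2(n / c) for c in Counter(s).values())


S = "aabacccdaddabbbc"
sigma = len(set(S))                 # alphabet size actually used by S
plain_bits = len(S) * log2(sigma)   # n lg sigma
entropy_bits = len(S) * h0(S)       # n H_0
print(plain_bits, round(entropy_bits, 2))
```

nH_0 never exceeds n lg σ, and the gap grows as the character frequencies become more skewed.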

6 Applications of Strings and Bit Vectors
- Ordinal trees on n nodes
  - Standard approach: 3n lg n bits
  - Succinct data structures: 2n + o(n) bits (Jacobson 1989, Munro & Raman 1997, Benoit et al. 1999, …)
- Full-text indexes for a text string from [σ]^n
  - Suffix trees can use as much as 4n lg n to 6n lg n bits!
  - Succinct data structures: n lg σ + o(n lg σ) bits (Grossi et al. 2003, González and Navarro 2009, …)
- Labeled trees, planar graphs, binary relations, permutations, functions, …

7 Our Problem: Dynamic Strings
- Motivation: in many applications, data are also updated frequently
- For strings, we also consider the following update operations:
  - insert(α, i), which inserts character α between S[i-1] and S[i]
  - delete(i), which deletes S[i] from S
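The update semantics can be pinned down with a small list-backed sketch. The class name `NaiveDynamicString` is hypothetical and the structure is deliberately naive (each update costs O(n)); it only shows where an inserted character lands and how positions shift.

```python
class NaiveDynamicString:
    """List-backed dynamic string using the slides' 1-based positions."""

    def __init__(self, text=""):
        self.chars = list(text)

    def insert(self, alpha, i):
        # Place alpha between S[i-1] and S[i]; it becomes the new S[i].
        if not 1 <= i <= len(self.chars) + 1:
            raise IndexError("insert position out of range")
        self.chars.insert(i - 1, alpha)

    def delete(self, i):
        # Remove S[i]; later characters shift one position to the left.
        if not 1 <= i <= len(self.chars):
            raise IndexError("delete position out of range")
        del self.chars[i - 1]

    def rank(self, alpha, i):
        return self.chars[:i].count(alpha)

    def __str__(self):
        return "".join(self.chars)


s = NaiveDynamicString("aabac")
s.insert("d", 3)   # the string becomes "aadbac"
s.delete(1)        # the string becomes "adbac"
print(s, s.rank("a", 4))
```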

8 Comparisons

| Result | Space (bits) | Access, rank and select | Insert and delete |
|---|---|---|---|
| Gupta et al. 2007 | n lg σ + lg σ∙(o(n) + O(1)) | O(lg lg n) | O(n^ε) amortized |
| Mäkinen & Navarro 2008 | nH_0 + o(n)∙lg σ | O(lg n lg σ) | O(lg n lg σ) |
| Lee & Park 2009 | n lg σ + o(n)∙lg σ | O(lg n (lg σ / lg lg n + 1)) | O(lg n (lg σ / lg lg n + 1)) amortized |
| González and Navarro 2009 | nH_0 + o(n)∙lg σ | O(lg n (lg σ / lg lg n + 1)) | O(lg n (lg σ / lg lg n + 1)) |
| This paper | nH_0 + o(n)∙lg σ | O((lg n / lg lg n)(lg σ / lg lg n + 1)) | O((lg n / lg lg n)(lg σ / lg lg n + 1)) |

For the special cases in which σ = polylog(n) or σ = 2 (bit vector!), our results also improve previous results.

9 Searchable Partial Sums
- Data
  - A sequence Q of n nonnegative integers
- Operations
  - sum(i): Q[1] + Q[2] + … + Q[i]
  - search(x): the smallest i such that sum(i) ≥ x
  - update(i, δ): Q[i] ← Q[i] + δ
- Raman et al. 2001
  - Assumptions: |Q| = O(lg^ε n), |δ| ≤ lg n
  - Space: O(lg^{1+ε} n) bits, with a universal table of size O(n^{ε'}) bits
  - Operations: O(1) time
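The three operations can be sketched with a Fenwick tree (binary indexed tree). This is a swapped-in textbook technique with O(lg n) time per operation, not the constant-time table-based structure of Raman et al.; the class name `FenwickPartialSums` is an assumption, and `search` relies on all values staying nonnegative.

```python
class FenwickPartialSums:
    """Searchable partial sums over nonnegative integers, 1-based positions."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(values, start=1):
            self.update(i, v)

    def update(self, i, delta):
        """Q[i] <- Q[i] + delta."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def sum(self, i):
        """Q[1] + ... + Q[i]."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def search(self, x):
        """Smallest i with sum(i) >= x, or None if the total is below x."""
        step = 1
        while step * 2 <= self.n:
            step *= 2
        pos, remaining = 0, x
        # Descend by powers of two to the largest pos with sum(pos) < x.
        while step > 0:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] < remaining:
                pos = nxt
                remaining -= self.tree[nxt]
            step //= 2
        return pos + 1 if pos < self.n else None


Q = FenwickPartialSums([3, 0, 4, 1, 2])
print(Q.sum(3), Q.search(4))
```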

10 Collections of Searchable Partial Sums
- Data
  - d sequences of k-bit nonnegative integers, of length n each
- Operations
  - sum, search, update: supported on each sequence
  - insert, delete: operated simultaneously on the same positions of all the sequences, but only 0's can be inserted or deleted
- González and Navarro 2009 (CSPSI)

Example (six sequences of length 5, one per row):
  8  2  9  5 11
  9  0  7  3  6
  1  5  3 12  4
  5 12  0  3  1
 19  0  4  2  8
  3  5  4  1  0
- sum(2, 5) = 25
- insert(6) adds a 0 at position 6 of every sequence; delete(6) removes it again
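The interface can be sketched naively as a list of Python lists. The class name `NaiveCSPSI` is hypothetical, every operation here is linear rather than succinct, and the example data assume that the figure's numbers form six sequences of five values read row by row (an assumption that is consistent with sum(2, 5) = 25 and with the six zeros shown for insert).

```python
class NaiveCSPSI:
    """d equal-length sequences; insert/delete act on all of them at once."""

    def __init__(self, sequences):
        self.seqs = [list(q) for q in sequences]

    def sum(self, j, i):
        """Sum of the first i values of sequence j (both 1-based)."""
        return sum(self.seqs[j - 1][:i])

    def search(self, j, x):
        """Smallest i with sum(j, i) >= x, or None."""
        total = 0
        for i, v in enumerate(self.seqs[j - 1], start=1):
            total += v
            if total >= x:
                return i
        return None

    def update(self, j, i, delta):
        self.seqs[j - 1][i - 1] += delta

    def insert(self, i):
        # Only zeros may be inserted, at the same position of every sequence.
        for q in self.seqs:
            q.insert(i - 1, 0)

    def delete(self, i):
        # Only zeros may be deleted, so refuse if any sequence holds a non-zero.
        if any(q[i - 1] != 0 for q in self.seqs):
            raise ValueError("only 0's can be deleted")
        for q in self.seqs:
            del q[i - 1]


E = NaiveCSPSI([
    [8, 2, 9, 5, 11],
    [9, 0, 7, 3, 6],
    [1, 5, 3, 12, 4],
    [5, 12, 0, 3, 1],
    [19, 0, 4, 2, 8],
    [3, 5, 4, 1, 0],
])
print(E.sum(2, 5))
```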

11 Our results on CSPSI
- Assumptions
  - d = O(lg^η n)
  - |δ| ≤ lg n
- Space
  - O(kdn + w) bits, where w is the word size
  - Buffer: O(n lg n) bits
- Time
  - All operations: O(lg n / lg lg n)

12 Data Structures for Dynamic Strings Over a Small Alphabet of Size O(lg^{1/2} n)
- Main data structure: a B-tree constructed over S
- Leaf
  - Each leaf stores a superblock of at most 2L bits, which encodes a substring of S (L = lg^2 n / lg lg n)
  - The numbers of occurrences of each character in all the superblocks form an integer sequence
  - Maintain the above sequences for all the characters in the alphabet in a CSPSI structure E
- Internal node v (lg^{1/2} n ≤ degree(v) ≤ 2 lg^{1/2} n)
  - U(v): U(v)[i] = number of leaves of the subtree rooted at the i-th child of v
  - I(v): I(v)[i] = number of characters stored in the subtree rooted at the i-th child of v

13 Supporting Queries
- rank(α, i)
  - Perform a top-down traversal with the help of I(v)'s
  - Locate the superblock, j, containing S[i] with the help of U(v)'s
  - Perform sum(α, j) operation on E to count the number of occurrences of α in superblocks 1, 2, …, j-1
  - Read superblock j in blocks of size (lg n) / 2 bits
- The support for access and select is similar
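The shape of the rank query can be sketched in a flattened, static form. In this sketch the B-tree traversal is replaced by a binary search over cumulative superblock lengths, the CSPSI structure E is replaced by plain prefix-count arrays, and the final table-driven read of the superblock is replaced by a direct scan; the class name `SuperblockString` and the parameter `block_len` are assumptions made for illustration.

```python
from bisect import bisect_left
from itertools import accumulate


class SuperblockString:
    """Static string split into superblocks with per-character prefix counts."""

    def __init__(self, text, block_len):
        self.blocks = [text[k:k + block_len]
                       for k in range(0, len(text), block_len)]
        # ends[j] = number of characters in superblocks 0..j (stands in for I(v)).
        self.ends = list(accumulate(len(b) for b in self.blocks))
        # counts_before[a][j] = occurrences of a in superblocks 0..j-1 (stands in for E).
        self.counts_before = {
            a: [0] + list(accumulate(b.count(a) for b in self.blocks))
            for a in set(text)
        }

    def rank(self, alpha, i):
        """Occurrences of alpha in S[1..i], with 1-based i."""
        if not self.blocks or i <= 0 or alpha not in self.counts_before:
            return 0
        i = min(i, self.ends[-1])
        j = bisect_left(self.ends, i)             # superblock holding S[i]
        start = self.ends[j - 1] if j > 0 else 0  # characters before superblock j
        before = self.counts_before[alpha][j]     # count in earlier superblocks
        return before + self.blocks[j][:i - start].count(alpha)


T = SuperblockString("aabacccdaddabbbc", block_len=4)
print(T.rank("a", 8))
```

The query touches one prefix-count entry and one superblock, which is the division of labour the slide describes between E and the leaf.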

14 Insert, delete and deamortization
- Supporting insert and delete requires traversing and updating the B-tree and updating E
- It is, however, much more complicated
  - Merging and splitting B-tree nodes
  - Deamortization

15 Succinct Global Rebuilding
- A key technique for deamortizing operations on B-trees is global rebuilding (Overmars and van Leeuwen 1981)
- Global rebuilding
  - Rebuild the B-tree after the number of update operations performed exceeds half the initial length of the string
  - A new copy and an old copy of the B-tree: more space
  - A buffer of O(n lg n) bits is required
- Succinct global rebuilding
  - Only one copy of the data: no duplication
  - During rebuilding, queries and updates are performed on either the new part or the old part
  - No buffer required

16 Putting Everything Together
- Dynamic strings over an alphabet of size O(lg^{1/2} n)
  - Space: nH_0 + o(n)∙lg σ bits
  - Time: O(lg n / lg lg n)
- This can be extended to general alphabets using wavelet trees
  - Space: nH_0 + o(n)∙lg σ bits
  - Time: O((lg n / lg lg n)(lg σ / lg lg n + 1))
- When σ = polylog(n) or σ = 2 (bit vectors)
  - Space: nH_0 + o(n)∙lg σ bits
  - Time: O(lg n / lg lg n)
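How a wavelet tree reduces rank over a large alphabet to rank over smaller ones can be sketched with a static binary wavelet tree. The binary branching, the class name `WaveletTree`, and the linear-time bit counting are all simplifications assumed here; the slides do not specify the branching factor, and in the dynamic setting each node would hold a dynamic sequence supporting rank rather than a Python list.

```python
class WaveletTree:
    """Static binary wavelet tree supporting rank, built recursively."""

    def __init__(self, text, alphabet=None):
        self.alphabet = sorted(set(text)) if alphabet is None else alphabet
        self.size = len(text)
        self.bits = []
        self.left = self.right = None
        if len(self.alphabet) > 1:
            mid = len(self.alphabet) // 2
            pivot = self.alphabet[mid]
            # Bit 0 sends a character to the left half of the alphabet, 1 to the right.
            self.bits = [0 if c < pivot else 1 for c in text]
            self.left = WaveletTree([c for c in text if c < pivot],
                                    self.alphabet[:mid])
            self.right = WaveletTree([c for c in text if c >= pivot],
                                     self.alphabet[mid:])

    def rank(self, alpha, i):
        """Occurrences of alpha among the first i characters."""
        i = max(0, min(i, self.size))
        if len(self.alphabet) <= 1:
            # Leaf: every stored character equals the single alphabet symbol.
            return i if self.alphabet and alpha == self.alphabet[0] else 0
        ones = sum(self.bits[:i])          # bit-vector rank, done naively here
        pivot = self.alphabet[len(self.alphabet) // 2]
        if alpha < pivot:
            return self.left.rank(alpha, i - ones)
        return self.right.rank(alpha, ones)


W = WaveletTree("aabacccdaddabbbc")
print(W.rank("a", 8), W.rank("d", 11))
```

Each level of the tree answers one bit-vector rank, so a query visits about lg σ nodes in this binary version.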

17 Applications
- Dynamic text collections
  - Data: a collection of text strings
  - Operations
    - Pattern search
    - Display a substring
    - Insert/delete a text string
- Compressed construction of full-text indexes
  - Working space: nH_k + o(n)∙lg σ bits
  - Time: O((n lg n / lg lg n)(lg σ / lg lg n + 1))

18 Conclusions
- We designed a succinct representation of dynamic strings that provides more efficient operations than previous results
- This structure can be directly applied to improve previous results on text indexing
- We expect our results to play an important role in the design of dynamic succinct data structures
- We expect succinct global rebuilding to be useful for the deamortization of algorithms on dynamic succinct data structures

19 Thank you!
