Succinct Data Structures


1 Succinct Data Structures
Kunihiko Sadakane, National Institute of Informatics

2 Range Min-Max Trees [5]
In existing succinct data structures for trees, a new index is added for each operation to be supported.
The o(n) term cannot be ignored.
The recursive method [6] uses 3.73n bits to support only findopen, findclose, and enclose.
It is preferable to support many operations with a single index.

3 Definitions
For a vector P[0..2n−1] and a function g:
[Formula image: definitions of the range operations on P and g.]
RMQ and RMQi are defined similarly (range maximum).
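The formulas on this slide did not survive extraction. As a reference point, the sketch below spells out the range operations as they are formulated in [5]. That the slide used exactly these definitions is an assumption, and the function names and linear-time loops are purely illustrative.

```python
def range_sum(P, g, i, j):
    """sum(P, g, i, j): sum of g(P[k]) for i <= k <= j."""
    return sum(g(P[k]) for k in range(i, j + 1))

def fwd_search(P, g, i, d):
    """Smallest j >= i with sum(P, g, i, j) == d, or None."""
    total = 0
    for j in range(i, len(P)):
        total += g(P[j])
        if total == d:
            return j
    return None

def bwd_search(P, g, i, d):
    """Largest j <= i with sum(P, g, j, i) == d, or None."""
    total = 0
    for j in range(i, -1, -1):
        total += g(P[j])
        if total == d:
            return j
    return None

def prefix_sums(P, g, i, j):
    """List of sum(P, g, i, k) for k = i..j."""
    out, total = [], 0
    for k in range(i, j + 1):
        total += g(P[k])
        out.append(total)
    return out

def rmq(P, g, i, j):
    """Minimum of sum(P, g, i, k) over i <= k <= j."""
    return min(prefix_sums(P, g, i, j))

def rmqi(P, g, i, j):
    """Leftmost position k attaining rmq(P, g, i, j)."""
    sums = prefix_sums(P, g, i, j)
    return i + sums.index(min(sums))

def RMQ(P, g, i, j):
    """Range maximum counterpart of rmq."""
    return max(prefix_sums(P, g, i, j))

def RMQi(P, g, i, j):
    """Leftmost position k attaining RMQ(P, g, i, j)."""
    sums = prefix_sums(P, g, i, j)
    return i + sums.index(max(sums))
```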

4 How to support operations on a balanced parentheses sequence
Lemma: Let π be a function s.t. π(() = 1, π()) = −1.
[Figure: the sequence P = (()((()())())(()())()) and its excess array E, with findclose and enclose marked.]
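The statement of the lemma is missing from the transcript. The sketch below shows how the parentheses operations reduce to forward/backward searches with π, following the formulation in [5]; treating this as the slide's lemma is an assumption. Positions are 0-based and the scans are naive linear loops.

```python
P = "(()((()())())(()())())"

def pi(ch):
    # +1 for an open parenthesis, -1 for a close parenthesis
    return 1 if ch == "(" else -1

def fwd_search(P, i, d):
    """Smallest j >= i whose pi-sum over P[i..j] equals d, or None."""
    total = 0
    for j in range(i, len(P)):
        total += pi(P[j])
        if total == d:
            return j
    return None

def bwd_search(P, i, d):
    """Largest j <= i whose pi-sum over P[j..i] equals d, or None."""
    total = 0
    for j in range(i, -1, -1):
        total += pi(P[j])
        if total == d:
            return j
    return None

def findclose(P, i):
    return fwd_search(P, i, 0)   # i is an open parenthesis

def findopen(P, i):
    return bwd_search(P, i, 0)   # i is a close parenthesis

def enclose(P, i):
    return bwd_search(P, i, 2)   # i is an open parenthesis
```

For example, the node opening at position 3 closes at position 12, and the node opening at position 4 is enclosed by the one at position 3.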

5 Implementing rank/select
Let φ, ψ be functions s.t. φ(0)=0, φ(1)=1, ψ(0)=1, ψ(1)=0.
rank/select and parentheses operations can then be handled in a unified manner.
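A minimal sketch of this reduction, assuming the sum and fwd_search primitives as formulated in [5] (0-based positions, select arguments counted from 1). The names phi and psi stand for the two functions on the slide; everything else is illustrative.

```python
def phi(b):
    return b        # counts 1s

def psi(b):
    return 1 - b    # counts 0s

def range_sum(B, g, i, j):
    return sum(g(B[k]) for k in range(i, j + 1))

def fwd_search(B, g, i, d):
    total = 0
    for j in range(i, len(B)):
        total += g(B[j])
        if total == d:
            return j
    return None

def rank1(B, i):
    """Number of 1s in B[0..i]."""
    return range_sum(B, phi, 0, i)

def rank0(B, i):
    """Number of 0s in B[0..i]."""
    return range_sum(B, psi, 0, i)

def select1(B, i):
    """Position of the i-th 1 (i >= 1), or None."""
    return fwd_search(B, phi, 0, i)

def select0(B, i):
    """Position of the i-th 0 (i >= 1), or None."""
    return fwd_search(B, psi, 0, i)
```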

6 Range Min-Max Tree
Divide the excess array E into blocks of length s.
Each leaf of the range min-max tree corresponds to a block and stores the min/max values in the block.
Each internal node has l children and stores the min/max values of its children.
[Figure: range min-max tree over the excess array E of (()((()())())(()())()) with s = l = 3; each node is labeled m/M, with root 0/4.]
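The construction can be sketched as follows for the slide's example with s = l = 3. The list-of-levels layout and the function names are assumptions made for illustration; a real implementation would pack the values into bit fields.

```python
def excess(P):
    """E[i] = (#open - #close) in P[0..i]."""
    E, depth = [], 0
    for ch in P:
        depth += 1 if ch == "(" else -1
        E.append(depth)
    return E

def build_rmm_tree(E, s, l):
    """Return levels of (min, max) pairs: levels[0] are leaves, levels[-1] is the root."""
    level = [(min(E[b:b + s]), max(E[b:b + s])) for b in range(0, len(E), s)]
    levels = [level]
    while len(level) > 1:
        level = [
            (min(m for m, _ in level[c:c + l]), max(M for _, M in level[c:c + l]))
            for c in range(0, len(level), l)
        ]
        levels.append(level)
    return levels

P = "(()((()())())(()())())"
levels = build_rmm_tree(excess(P), s=3, l=3)
```

Here `levels[0]` holds the eight leaf pairs, `levels[1]` the three internal nodes, and `levels[2]` the root.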

7 Properties of Range Min-Max Trees
Each node corresponds to a range of the array.
Any range of the array is represented by a disjoint union of O(lh) ranges corresponding to internal nodes and at most two ranges corresponding to leaves (h: tree height).
[Figure: the range min-max tree for (()((()())())(()())()) with s = l = 3, nodes labeled m/M.]

8 Properties of Excess Array
For each i, E[i+1] = E[i]−1 or E[i]+1.
Let the min/max of E[u,v] be a and b. Then every integer e with a ≤ e ≤ b occurs in the range, and no other value does.
In a range of length l, the difference between min and max is at most l−1 ⇒ values can be stored in fewer bits.
[Figure: the sequence P = (()((()())())(()())()), its excess array E, and the corresponding tree with nodes numbered 1 to 11.]

9 Computation of fwd_search(E,i,d)
Divide the range E[i+1,N−1] (N: array length) into ranges corresponding to tree nodes.
Scan the divided ranges from left to right to find the first range containing the value E[i]+d.
O(lh+s) time.
[Figure: the range min-max tree for (()((()())())(()())()), nodes labeled m/M.]
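A simplified sketch of this search, assuming only the leaf level of the tree (per-block min/max) is used, so it skips blocks one by one instead of climbing through internal nodes and does not achieve the O(lh+s) bound. It relies on the ±1 property of E: a block contains the target value exactly when min ≤ target ≤ max. All names are illustrative.

```python
def excess(P):
    E, depth = [], 0
    for ch in P:
        depth += 1 if ch == "(" else -1
        E.append(depth)
    return E

def block_min_max(E, s):
    """(min, max) of each length-s block; would be precomputed once."""
    return [(min(E[b:b + s]), max(E[b:b + s])) for b in range(0, len(E), s)]

def fwd_search(E, blocks, s, i, d):
    """Smallest j > i with E[j] == E[i] + d, or None."""
    target, n = E[i] + d, len(E)
    # 1) scan the rest of the block containing i
    for j in range(i + 1, min(n, (i // s + 1) * s)):
        if E[j] == target:
            return j
    # 2) skip whole blocks whose [min, max] excludes the target
    for b in range(i // s + 1, len(blocks)):
        m, M = blocks[b]
        if m <= target <= M:
            # 3) scan the first block that must contain the target
            for j in range(b * s, min(n, (b + 1) * s)):
                if E[j] == target:
                    return j
    return None

P = "(()((()())())(()())())"
E = excess(P)
blocks = block_min_max(E, 3)
```

With this convention, findclose of the open parenthesis at position 3 is `fwd_search(E, blocks, 3, 3, -1)`.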

10 The case where the array is short (polylog)
Let w be the word length (bits) of the CPU.
Lemma: If N < w^c, fwd_search is done in O(c^2) time, and the data structure size is N + O(Nc/w) + exp(w) bits.
Proof:
Excess values are between −w^c and w^c ⇒ each fits in O(c log w) bits, so w/(c log w) values can be read simultaneously.
If the branching factor l of the range min-max tree is w/log w ⇒ the height of the tree is O(c).
Searching a child takes O(c) time, so the O(c) levels cost O(c^2) in total.

11 Computation of LCA
lca(v,w) = parent(RMQ(v,w)+1)
Here RMQ(v,w) denotes the position of the minimum value in E[v,w].
Constant time using the range min-max tree.
The maximum-depth node is found similarly.
[Figure: the range min-max tree for (()((()())())(()())()), nodes labeled m/M.]
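A naive sketch of the formula on the slide's example, using a linear scan for the minimum instead of the constant-time structure. It assumes 0-based positions of open parentheses and takes the leftmost minimum; the function names are illustrative.

```python
def excess(P):
    E, depth = [], 0
    for ch in P:
        depth += 1 if ch == "(" else -1
        E.append(depth)
    return E

def enclose(P, i):
    """Open parenthesis of the parent of the node opening at i, or None."""
    depth = 0
    for j in range(i - 1, -1, -1):
        depth += 1 if P[j] == "(" else -1
        if depth == 1:
            return j
    return None

def lca(P, v, w):
    """LCA of the nodes whose open parentheses are at v and w."""
    if v == w:
        return v
    if v > w:
        v, w = w, v
    E = excess(P)
    # leftmost position of the minimum of E[v..w] (min() returns the first minimum)
    k = min(range(v, w + 1), key=lambda j: E[j])
    return enclose(P, k + 1)

P = "(()((()())())(()())())"
```

For instance, the nodes opening at positions 5 and 10 have the node at position 3 as their lowest common ancestor.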

12 Computation of Degree
Let [v,w] be the range of E corresponding to a node v.
deg(v) = (# of minimum values in E[v+1,w−1])
In each node of the range min-max tree, store the number of minimum values in its range.
The i-th child can be found in the same way.
[Figure: the sequence (()((()())())(()())()) and its excess array E.]
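The degree formula can be checked directly on the example sequence. This is a linear-time sketch with 0-based positions; the stored per-node minimum counts that make the operation fast are not modeled here.

```python
def excess(P):
    E, depth = [], 0
    for ch in P:
        depth += 1 if ch == "(" else -1
        E.append(depth)
    return E

def findclose(P, i):
    depth = 0
    for j in range(i, len(P)):
        depth += 1 if P[j] == "(" else -1
        if depth == 0:
            return j
    return None

def degree(P, v):
    """Number of children of the node whose open parenthesis is at v."""
    E = excess(P)
    w = findclose(P, v)
    inner = E[v + 1:w]          # E[v+1 .. w-1]
    return inner.count(min(inner)) if inner else 0

P = "(()((()())())(()())())"
```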

13 The case where the array is long
Divide the sequence into blocks of length w^c.
Let M_1,…,M_t and m_1,…,m_t be the max/min values of the blocks.
To compute fwd_search(E,i,d): if E[i]+d < (the minimum value of the block containing i), the block containing the answer is the first subsequent block j with m_j ≤ E[i]+d.

14 Other Queries
RMQ is done by the sparse table algorithm.
Because the number of blocks is small (n/w^c), the space can be ignored.
Theorem: There exists a data structure supporting all known operations on ordered trees in O(1) time using 2n + O(n/log n) bits.
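A minimal sketch of the standard sparse table technique applied to an array of block minima. The sample array is the list of leaf minima from the earlier s = 3 example, used here only as illustrative input; function names are hypothetical.

```python
def build_sparse_table(A):
    """table[k][i] = index of the leftmost minimum of A[i .. i + 2**k - 1]."""
    n = len(A)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev, half = table[-1], 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            a, b = prev[i], prev[i + half]
            row.append(a if A[a] <= A[b] else b)
        table.append(row)
        k += 1
    return table

def range_min_index(A, table, i, j):
    """Index of the leftmost minimum of A[i..j], using two overlapping windows."""
    k = (j - i + 1).bit_length() - 1
    a, b = table[k][i], table[k][j - (1 << k) + 1]
    return a if A[a] <= A[b] else b

block_minima = [1, 2, 3, 2, 1, 2, 1, 0]
table = build_sparse_table(block_minima)
```

The table has O(t log t) entries for t blocks, which is why it is only affordable when the number of blocks is small.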

15 Further Reducing the Space
Use the augmented B-tree of “Succincter” [7].
B-tree for an array A[1..n].
For each node, a value is added.
Values are computed from those of child nodes and subtree size.
The range min-max tree is an augmented B-tree.
Theorem: 2n + O(n/log^c n) bits (c > 0 is an arbitrary constant).

16 Applications of Ordered Trees
An abstract data type for “dictionary”: a data structure D is called a dictionary if, for a set S and a key k, it supports
Search(D, k): returns Yes iff k ∈ S
Insert(D, x): adds x to S
Delete(D, x): removes x from S
A binary search tree can be used as a dictionary.
Any balanced binary search tree supports the above operations in O(log n) time for a set of n elements.
We assume that element comparison is done in O(1) time.
[Figure: a binary search tree with keys 1, 5, 7, 8.]

17 Tries
A trie is a data structure for storing a set of strings.
A node has at most σ children (σ: alphabet size).
Each edge is labeled with a character c.
The concatenation of characters on edges from the root to a node coincides with the string represented by the node.
[Figure: a trie for S = {cab, cat, do, doc, dog}.]
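As a pointer-based baseline before the succinct encodings, here is a small sketch of a trie for the slide's set S. The dictionary-of-dictionaries layout and the `is_key` flag are illustrative assumptions, not part of the slides.

```python
def new_node():
    return {"children": {}, "is_key": False}

def build_trie(words):
    root = new_node()
    for word in words:
        node = root
        for ch in word:
            # follow the edge labeled ch, creating it if needed
            node = node["children"].setdefault(ch, new_node())
        node["is_key"] = True
    return root

def contains(root, word):
    node = root
    for ch in word:
        node = node["children"].get(ch)
        if node is None:
            return False
    return node["is_key"]

def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node["children"].values())

trie = build_trie(["cab", "cat", "do", "doc", "dog"])
```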

18 Compressed Tries
Compressed tries are obtained from standard tries by compressing chains of redundant nodes.
Redundant node = a node with only one child.
#internal nodes ≤ #leaves − 1
An edge represents a string.
[Figure: the compressed trie for S = {cab, cat, do, doc, dog}.]

19 Operations on Trees
Binary search trees
left(v), right(v): returns the left/right child node of v
key(v): returns the key stored in node v
Tries
child(v, c): returns a child w of v with edge label c
Compressed tries
child(v, c): returns a child w of v with edge label c…
edge(w, d): returns the d-th character on the edge pointing to w
[Figure: a node v with key(v) and a child w, where child(v, a) = w and edge(w, 2) = b.]

20 LOUDS Representation
Degrees of nodes are encoded by unary codes in breadth-first order: degree d → 1^d 0.
2n+1 bits for n nodes (matches the lower bound).
The i-th node is represented by the i-th 1 (one-based numbering).
[Figure: an 8-node tree and its LOUDS bit sequence L.]
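The encoding can be sketched as below. The leading "10" (a virtual super-root whose only child is the root) is what brings the length to 2n+1. The example tree is the 8-node shape implied by the bit string 10110111011000000 shown on the later "Tries using LOUDS" slide; the dictionary representation is an assumption for illustration.

```python
from collections import deque

def louds(children, root):
    """LOUDS bit string: '10' prefix, then 1^d 0 per node in breadth-first order."""
    bits = ["10"]
    queue = deque([root])
    while queue:
        v = queue.popleft()
        kids = children.get(v, [])
        bits.append("1" * len(kids) + "0")
        queue.extend(kids)
    return "".join(bits)

# nodes numbered in breadth-first order; leaves have no entry
tree = {1: [2, 3], 2: [4, 5, 6], 3: [7, 8]}
L = louds(tree, 1)
```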

21 Tree Navigational Operations (1)
i-th node: select1(L, i) (i ≥ 1)
firstchild(x): y := select0(rank1(L,x))+1; if L[y] = 0 then −1 else y
lastchild(x): y := select0(rank1(L,x)+1)−1
[Figure: the 8-node tree and its LOUDS bit sequence L.]

22 Tree Navigational Operations (2)
sibling(x): if L[x+1] = 0 then −1 else x+1
parent(x) = select1(rank0(L,x))
degree(x) = lastchild(x) − firstchild(x) + 1
Merits: implemented by only rank/select
Demerits: cannot compute subtree sizes
[Figure: the 8-node tree and its LOUDS bit sequence L.]
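The LOUDS navigation formulas can be exercised with naive rank/select on the bit string 10110111011000000 from the "Tries using LOUDS" slide. Positions are 1-based as on the slides. The −1 check in lastchild and the leaf case in degree are additions assumed here so that leaves are handled; they are not spelled out in the transcript.

```python
L = "10110111011000000"

def rank(L, bit, x):
    """Number of occurrences of bit in L[1..x] (1-based)."""
    return L[:x].count(bit)

def select(L, bit, i):
    """1-based position of the i-th occurrence of bit, or None."""
    count = 0
    for pos, b in enumerate(L, start=1):
        if b == bit:
            count += 1
            if count == i:
                return pos
    return None

def is_one(L, y):
    return 1 <= y <= len(L) and L[y - 1] == "1"

def firstchild(L, x):
    y = select(L, "0", rank(L, "1", x)) + 1
    return y if is_one(L, y) else -1

def lastchild(L, x):
    y = select(L, "0", rank(L, "1", x) + 1) - 1
    return y if is_one(L, y) else -1   # assumed leaf check

def sibling(L, x):
    return x + 1 if is_one(L, x + 1) else -1

def parent(L, x):
    return select(L, "1", rank(L, "0", x))

def degree(L, x):
    first = firstchild(L, x)
    return 0 if first == -1 else lastchild(L, x) - first + 1
```

For example, the root sits at position 1, its children at positions 3 and 4, and the node at position 3 has children at positions 6, 7, and 8.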

23 Tries using LOUDS
child(v, c):
w = firstchild(v), r = rank1(L, w), k = 0
while (L[w+k] != 0) {
  if (C[r+k] == c) return w+k
  k = k+1
}
Binary search can also be used.
key(v) is stored in an array indexed with rank1(L, v).
[Figure: an 8-node trie with LOUDS sequence L = 10110111011000000 and label array C = _ a c x y z a d.]
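A runnable sketch of the child(v, c) loop on the slide's L and C, with naive rank/select and 1-based positions. Returning −1 when v is a leaf or no edge matches is an assumption; the slide does not state the failure value.

```python
L = "10110111011000000"
C = "_acxyzad"   # C[i-1] = label of the edge entering node i; "_" for the root

def rank1(L, x):
    return L[:x].count("1")

def select0(L, i):
    count = 0
    for pos, bit in enumerate(L, start=1):
        if bit == "0":
            count += 1
            if count == i:
                return pos
    return None

def firstchild(L, x):
    y = select0(L, rank1(L, x)) + 1
    return y if y <= len(L) and L[y - 1] == "1" else -1

def child(L, C, v, c):
    """Position of the child of v reached by edge label c, or -1."""
    w = firstchild(L, v)
    if w == -1:
        return -1
    r = rank1(L, w)              # node number of the first child
    k = 0
    while L[w + k - 1] == "1":   # iterate over the run of siblings
        if C[r + k - 1] == c:
            return w + k
        k += 1
    return -1
```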

24 By using an auxiliary array, key(v) is performed in O(1) time.
A trie with n nodes and a size-σ label set supporting child(v, c) in O(log σ) time is represented in n(2+log σ) + o(n) bits.
O(σ)-time sequential search is faster for small σ.

25 BP Representation
Each node is represented by a pair of matching open and close parentheses.
2n bits for n nodes; the size matches the lower bound.
[Figure: an 8-node tree and its BP sequence P = ((()()())(()())).]
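A short sketch of the encoding: a depth-first traversal writes "(" on entering a node and ")" on leaving it. The dictionary tree below is the 8-node shape that yields the slide's sequence, with nodes numbered in preorder as an assumption.

```python
def bp(children, v):
    """Balanced-parentheses encoding of the subtree rooted at v."""
    return "(" + "".join(bp(children, c) for c in children.get(v, [])) + ")"

# nodes numbered in preorder; leaves have no entry
tree = {1: [2, 6], 2: [3, 4, 5], 6: [7, 8]}
P = bp(tree, 1)
```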

26 Tree Navigational Operations
parent(v) = enclose(P,v)
firstchild(v) = v + 1
sibling(v) = findclose(P,v) + 1
lastchild(v) = findopen(P, findclose(P,v)−1)
[Figure: the sequence (()((()())())(()())()) with its 11 nodes numbered 1 to 11, illustrating enclose and findclose.]
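The four formulas can be tried on the slide's sequence with naive linear scans. Positions are 0-based here, and returning None when the child or sibling does not exist is an added assumption; the transcript gives only the formulas.

```python
P = "(()((()())())(()())())"

def findclose(P, i):
    depth = 0
    for j in range(i, len(P)):
        depth += 1 if P[j] == "(" else -1
        if depth == 0:
            return j
    return None

def findopen(P, i):
    depth = 0
    for j in range(i, -1, -1):
        depth += 1 if P[j] == ")" else -1
        if depth == 0:
            return j
    return None

def enclose(P, i):
    depth = 0
    for j in range(i - 1, -1, -1):
        depth += 1 if P[j] == "(" else -1
        if depth == 1:
            return j
    return None

def parent(P, v):
    return enclose(P, v)

def firstchild(P, v):
    return v + 1 if P[v + 1] == "(" else None

def sibling(P, v):
    s = findclose(P, v) + 1
    return s if s < len(P) and P[s] == "(" else None

def lastchild(P, v):
    c = findclose(P, v) - 1
    return findopen(P, c) if P[c] == ")" else None
```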

27 Tries using BP
child(v, c):
w = firstchild(v)
while (w != NIL) {
  if (C[rank_((P, w)] == c) return w
  w = sibling(w)
}
key(v) is stored in an array indexed with rank_((P, v).
[Figure: an 8-node trie with BP sequence P = ((()()())(()())) and label array C = _ a x y z c a d.]

28 DFUDS Representation
It encodes the degrees of nodes in unary codes in depth-first order (DFUDS = Depth First Unary Degree Sequence).
Degree d ⇒ d (’s followed by a ).
Add a dummy ( at the beginning.
2n bits.
DFUDS is balanced.
[Figure: an 8-node tree and its DFUDS sequence U = ((()((())))(())).]
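A minimal sketch of the encoding, using the 8-node tree that produces the slide's sequence. Nodes are numbered in preorder and the dictionary representation is an assumption for illustration.

```python
def dfuds(children, root):
    """Dummy '(' followed by '('*degree + ')' for each node in preorder."""
    out = ["("]
    stack = [root]
    while stack:
        v = stack.pop()
        kids = children.get(v, [])
        out.append("(" * len(kids) + ")")
        stack.extend(reversed(kids))   # visit children left to right
    return "".join(out)

def is_balanced(U):
    depth = 0
    for ch in U:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0

tree = {1: [2, 6], 2: [3, 4, 5], 6: [7, 8]}
U = dfuds(tree, 1)
```

The dummy opening parenthesis is what makes the sequence balanced: without it there would be n−1 open and n close parentheses.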

29 i-th child
[Figure: a node v in the DFUDS sequence (((()(())))((()))) of a 9-node tree, with the subsequences U1, U2, U3 of its children marked.]

30 Tries using DFUDS
child(v, c):
r = rank_((U, v), k = 0
while (U[v+k] != ‘)’) {
  if (C[r+k] == c) return child(v, k+1)
  k = k+1
}
O(σ) time; O(log σ) is possible.
[Figure: an 8-node trie with DFUDS sequence U = ((()((())))(())) and label array C = _ a c x y z a d.]

31 References
[1] Kunihiko Sadakane, Daisuke Watanabe. Practical data structures for the document listing problem (in Japanese). DBSJ Letters Vol.2, No.1, pp
[2] Michael A. Bender, Martin Farach-Colton. The LCA Problem Revisited. LATIN 2000: 88-94
[3] Kunihiko Sadakane. Succinct data structures for flexible text retrieval systems. J. Discrete Algorithms 5(1): (2007)
[4] Johannes Fischer. Optimal Succinctness for Range Minimum Queries. LATIN 2010:
[5] Kunihiko Sadakane, Gonzalo Navarro. Fully-Functional Succinct Trees. SODA 2010:
[6] R. F. Geary, N. Rahman, R. Raman, and V. Raman. A simple optimal representation for balanced parentheses. Theoretical Computer Science, 368:231–246, December 2006.
[7] Mihai Pătraşcu. Succincter. Proc. FOCS, 2008.

