Succinct Data Structures


1 Succinct Data Structures
Kunihiko Sadakane, National Institute of Informatics

2 BP Representation [3]
Each node is represented by a pair of matching open and close parentheses.
The sequence uses 2n bits for n nodes.
This size matches the information-theoretic lower bound of 2n − Θ(log n) bits up to lower-order terms.
Example (8-node tree in the figure): P = ((()()())(()()))
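The encoding can be sketched in a few lines of Python. This is a minimal illustration, assuming a tree is given as nested lists of children; the name `bp_encode` and the list representation are hypothetical and do not come from the slides.

```python
def bp_encode(children):
    """Return the BP string of a tree given as a nested list of child subtrees."""
    # One "(" when the node is entered, one ")" when it is left.
    return "(" + "".join(bp_encode(c) for c in children) + ")"


# The 8-node tree of the slide: a root with two children,
# the first with three leaf children and the second with two.
tree = [[[], [], []], [[], []]]
P = bp_encode(tree)
print(P)            # ((()()())(()()))
print(len(P))       # 16 = 2n for n = 8 nodes
```

Each node contributes exactly two characters, which is where the 2n bits come from.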

3 Data Structure for findclose [4]
Divide the parentheses sequence into blocks of length B = ½ log n.
b(p): the number of the block containing p.
μ(p): the position of the parenthesis matching p.
A parenthesis p is far ⇔ b(p) ≠ b(μ(p)).
A far open parenthesis p is an opening pioneer ⇔ b(μ(p)) ≠ b(μ(q)), where q is the far open parenthesis that immediately precedes p.
The positions of the parentheses that match opening pioneers are represented by a 0,1 vector.
[Figure: far open parentheses q, p, r and their matches μ(r), μ(p), μ(q)]

4 Lemma: Number of Opening Pioneers
Lemma: Let τ denote the number of blocks. Then the number of opening pioneers is at most 2τ − 3.
Proof: The graph whose nodes correspond to the blocks and whose edges are (b(p), b(μ(p))) is outerplanar, and an outerplanar graph with τ nodes has at most 2τ − 3 edges.
The opening and closing pioneers again form a BP sequence.
τ = n/B = 2n/log n ⇒ the length of this BP is O(n/log n).

5 Representing Recursive Structure
Opening pioneers and their matching parentheses are marked in a 0,1 vector B.
B is a sparse vector of length 2n with O(n/log n) 1's.
It can be represented in O(n log log n/log n) bits.
The marked parentheses form a shorter BP sequence P1, on which the same structure is built recursively.
Example from the figure: B = 0100 0101 0000 0000 0010 1001, P1 = ((()))
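The definitions of far parentheses, opening pioneers, and the marking vector can be checked on a small input. The sketch below is illustrative only: it uses 0-based positions, a stack-based matching table instead of any succinct index, and a toy block length of 4. The function names are hypothetical.

```python
def match_table(P):
    """mu[i] = position of the parenthesis matching P[i] (0-based), found with a stack."""
    mu, stack = [0] * len(P), []
    for i, c in enumerate(P):
        if c == "(":
            stack.append(i)
        else:
            j = stack.pop()
            mu[i], mu[j] = j, i
    return mu


def opening_pioneers(P, block_len):
    """Far open parentheses whose match lies in a different block than the previous far one's."""
    mu = match_table(P)
    pioneers, prev_far = [], None
    for p, c in enumerate(P):
        if c == "(" and p // block_len != mu[p] // block_len:          # p is far
            if prev_far is None or mu[p] // block_len != mu[prev_far] // block_len:
                pioneers.append(p)
            prev_far = p
    return pioneers


def pioneer_vector(P, block_len):
    """0,1 vector marking opening pioneers and their matches, plus the extracted sequence P1."""
    mu = match_table(P)
    marked = set()
    for p in opening_pioneers(P, block_len):
        marked.update((p, mu[p]))
    bits = "".join("1" if i in marked else "0" for i in range(len(P)))
    P1 = "".join(P[i] for i in sorted(marked))
    return bits, P1


P = "(()((()())())(()())())"
print(opening_pioneers(P, 4))   # [0, 3, 4, 13, 19]
print(pioneer_vector(P, 4))     # ('1001100001001100001111', '((())()())')
```

With 6 blocks the example has 5 opening pioneers, and the extracted sequence P1 is again balanced, so the same construction can be applied to it.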

6 Let S(n) denote the size of the BP representation for an n-node tree
S(n) = 2n + O(n log log n/log n) + S(O(n/log n))
Once the number of nodes drops to O(n/log² n), a naïve data structure that stores all the answers uses only O(n/log n) bits.
Therefore S(n) = 2n + O(n log log n/log n).
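Unrolling the recurrence twice makes the bound explicit. The derivation below is a sketch that assumes each answer stored by the naïve structure takes O(log n) bits; the symbols n_1 and n_2 for the sizes at the two recursive levels are introduced here for illustration.

```latex
\begin{align*}
S(n)   &= 2n + O\!\left(\frac{n \log\log n}{\log n}\right) + S(n_1),
        \qquad n_1 = O\!\left(\frac{n}{\log n}\right) \\
S(n_1) &= 2n_1 + O\!\left(\frac{n_1 \log\log n_1}{\log n_1}\right) + S(n_2)
        = O\!\left(\frac{n}{\log n}\right) + S(n_2),
        \qquad n_2 = O\!\left(\frac{n}{\log^2 n}\right) \\
S(n_2) &= O(n_2 \log n) = O\!\left(\frac{n}{\log n}\right) \\
S(n)   &= 2n + O\!\left(\frac{n \log\log n}{\log n}\right)
\end{align*}
```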

7 Algorithm for findclose
To compute μ(p) = findclose(P, p):
If p is not far, μ(p) is computed by a table lookup.
Otherwise, find the pioneer p* that immediately precedes p.
Find μ(p*) using the BP for pioneers.
If p is not a pioneer, b(μ(p)) = b(μ(p*)).
The position of μ(p) within that block is determined from the difference between the depths of p and p*.
[Figure: p* and p are open parentheses; μ(p) and μ(p*) lie in the same block]
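The steps above can be traced in code. The following is a sketch of the logic only, not the constant-time structure: linear scans stand in for the lookup tables, the rank index, and the recursive pioneer BP, positions are 0-based, and the name `findclose_sketch` is hypothetical.

```python
def findclose_sketch(P, p, block_len):
    """Return the position matching the open parenthesis P[p], following the slide's steps."""
    n = len(P)
    # Excess array: E[i] = number of "(" minus number of ")" in P[0..i].
    E, e = [], 0
    for c in P:
        e += 1 if c == "(" else -1
        E.append(e)
    # Naive matching of open parentheses; stands in for the pioneer structure.
    mu, stack = {}, []
    for i, c in enumerate(P):
        if c == "(":
            stack.append(i)
        else:
            mu[stack.pop()] = i

    # Step 1: if p is not far, the answer is inside p's block (a table lookup in practice).
    for i in range(p + 1, min(n, (p // block_len + 1) * block_len)):
        if E[i] == E[p] - 1:
            return i

    # Step 2: p is far; find the last pioneer p* at or before p.
    p_star, prev_far = None, None
    for q in range(p + 1):
        if P[q] == "(" and q // block_len != mu[q] // block_len:
            if prev_far is None or mu[q] // block_len != mu[prev_far] // block_len:
                p_star = q
            prev_far = q

    # Step 3: mu(p*) would come from the BP for pioneers; mu(p) is in the same block.
    close_star = mu[p_star]
    start = (close_star // block_len) * block_len

    # Step 4: use the depth difference between p and p* to locate mu(p) in that block.
    target = E[close_star] + (E[p] - E[p_star])
    for i in range(start, min(n, start + block_len)):
        if E[i] == target:
            return i
    return None


P = "(()((()())())(()())())"
print(findclose_sketch(P, 7, 4))   # 8: p = 7 is far but not a pioneer, p* = 4
print(findclose_sketch(P, 0, 4))   # 21: the root is itself a pioneer
```

The first position in the target block whose excess equals the target value is μ(p), because every position strictly between p and μ(p) has a larger excess.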

8 enclose
enclose(P, p) is the position of the open parenthesis of the closest pair that encloses p.
If b(enclose(P, p)) = b(p), enclose(P, p) is found from a table.
If b(enclose(P, p)) ≠ b(p), store those positions, together with the positions of their matching parentheses.
If there is more than one such pair of parentheses, store only the outermost one.
Recur on the extracted parentheses.

9 Additional Basic Operations on BP
rank_p(P, i): the number of occurrences of pattern p in P[1..i].
select_p(P, i): the position of the i-th occurrence of p in P.
If the length of p is constant, rank and select are done in O(1) time.
Example (11-node tree in the figure): for P = (()((()())())(()())()), rank_()(P, 10) = 3.
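The semantics of the two operations can be pinned down with a naive linear-time version. This sketch assumes 1-based positions as on the slide and counts only occurrences that lie entirely inside P[1..i]; the slide does not state that convention, so it is an assumption here.

```python
def rank(P, pattern, i):
    """Number of occurrences of pattern lying entirely inside P[1..i] (1-based)."""
    k = len(pattern)
    return sum(1 for s in range(0, i - k + 1) if P[s:s + k] == pattern)


def select(P, pattern, i):
    """1-based start position of the i-th occurrence of pattern, or None if there is none."""
    k, seen = len(pattern), 0
    for s in range(len(P) - k + 1):
        if P[s:s + k] == pattern:
            seen += 1
            if seen == i:
                return s + 1
    return None


P = "(()((()())())(()())())"
print(rank(P, "()", 10))    # 3, as on the slide
print(select(P, "()", 4))   # 11
```

A succinct implementation answers the same queries in O(1) time with a small auxiliary index instead of scanning.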

10 Operations on Leaves [5]
Each leaf is represented by () in BP.
Position of the i-th leaf = select_()(P, i).
The number of leaves in a subtree and the leftmost and rightmost leaves of a subtree can also be found.
[Figure: P = (()((()())())(()())()) for the 11-node tree, with the subtree rooted at node 3 marked]
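The leaf operations can be illustrated on the slide's example. The sketch below uses plain scans where the real structure would use rank and select on the pattern (); positions are 1-based, and the function names are hypothetical.

```python
def findclose(P, v):
    """1-based position of the ")" matching the "(" at 1-based position v (linear scan)."""
    depth = 0
    for i in range(v - 1, len(P)):
        depth += 1 if P[i] == "(" else -1
        if depth == 0:
            return i + 1
    return None


def leaf_positions(P):
    """1-based positions of all leaves, i.e. of every occurrence of "()"."""
    return [s + 1 for s in range(len(P) - 1) if P[s:s + 2] == "()"]


def subtree_leaves(P, v):
    """(count, leftmost, rightmost) over the leaves in the subtree rooted at position v."""
    close = findclose(P, v)
    inside = [x for x in leaf_positions(P) if v <= x < close]
    return len(inside), inside[0], inside[-1]


P = "(()((()())())(()())())"
print(leaf_positions(P))      # [2, 6, 8, 11, 15, 17, 20]
print(subtree_leaves(P, 4))   # (3, 6, 11): node 3 starts at position 4
```

The subtree of node 3 occupies P[4..13], so its leaves are exactly the occurrences of () that start inside that range.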

11 Node Depths
Define the excess array E[i] = rank_((P, i) − rank_)(P, i).
depth(v) = E[v]
E is not stored explicitly; it can be computed from the rank index on P.
[Figure: P = (()((()())())(()())()) with its excess array E for the 11-node tree]
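A short sketch shows the excess array for the example sequence. It materializes E as a Python list purely for illustration; as the slide notes, a real structure computes E[i] from the rank index instead of storing it.

```python
def excess(P):
    """E[i] = number of "(" minus number of ")" in the prefix ending at index i (0-based)."""
    E, e = [], 0
    for c in P:
        e += 1 if c == "(" else -1
        E.append(e)
    return E


P = "(()((()())())(()())())"
E = excess(P)
print("".join(map(str, E)))   # 1212343432321232321210
print(E[5])                   # 4: depth of the node whose "(" is at 1-based position 6
```

With this definition the root has depth 1, and the excess returns to 0 at the final close parenthesis.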

12 Lowest Common Ancestor (lca)
lca = lowest common ancestor.
u = lca(v, w): the common ancestor of v and w that is furthest from the root.
It is found in O(1) time.

13 lca via RMQ
u = parent(RMQ_E(v, w) + 1)
E is the excess array, which represents node depths.
m = RMQ_E(v, w): the index of a minimum value in E[v..w] (RMQ = Range Minimum Query).
Example from the figure: P = (()((()())())(()())()), E = 1212343432321232321210
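The formula can be tried on the example sequence. This is a sketch under stated assumptions: nodes are identified by the 0-based position of their open parenthesis, the range minimum and parent are found by linear scans rather than O(1) structures, the leftmost minimum is used, and v = w is handled as a special case that the slide does not discuss.

```python
def excess(P):
    """E[i] = number of "(" minus number of ")" in P[0..i]."""
    E, e = [], 0
    for c in P:
        e += 1 if c == "(" else -1
        E.append(e)
    return E


def parent(P, v):
    """Open parenthesis of the closest pair enclosing position v, or None for the root."""
    depth = 0
    for i in range(v - 1, -1, -1):
        depth += 1 if P[i] == "(" else -1
        if depth == 1:
            return i
    return None


def lca(P, v, w):
    """Lowest common ancestor of the nodes opened at positions v and w."""
    if v == w:
        return v
    if v > w:
        v, w = w, v
    E = excess(P)
    m = min(range(v, w + 1), key=E.__getitem__)   # leftmost minimum of E[v..w]
    return parent(P, m + 1)


P = "(()((()())())(()())())"
print(lca(P, 5, 10))   # 3: nodes 5 and 7 meet at node 3
print(lca(P, 4, 7))    # 4: node 4 is an ancestor of node 6
print(lca(P, 1, 19))   # 0: nodes 2 and 11 meet at the root
```

Position m + 1 always holds the open parenthesis of a child of the answer, which is why a single parent step finishes the query.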

14 DFUDS Representation [6]
DFUDS (Depth First Unary Degree Sequence) encodes the degrees of the nodes in unary codes, in depth-first order.
A node of degree d is written as d ('s followed by a ).
A dummy ( is added at the beginning.
The sequence uses 2n bits.
Example (8-node tree in the figure): U = ((()((())))(()))
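The encoding rule can be sketched as follows, assuming a tree is given as nested lists of children. The names `dfuds_encode` and `is_balanced` are hypothetical helpers, not part of the slides.

```python
def dfuds_encode(children):
    """Return the DFUDS string of a tree given as a nested list of child subtrees."""
    out = ["("]                                  # dummy parenthesis at the head

    def visit(node):
        out.append("(" * len(node) + ")")        # degree d in unary: d "(" then one ")"
        for child in node:                       # continue in depth-first order
            visit(child)

    visit(children)
    return "".join(out)


def is_balanced(seq):
    """True if the excess never drops below 0 and ends at 0."""
    depth = 0
    for c in seq:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0


# Root with two children: the first has three leaf children, the second has two.
tree = [[[], [], []], [[], []]]
U = dfuds_encode(tree)
print(U)                # ((()((())))(()))
print(is_balanced(U))   # True
```

The result has length 16 for 8 nodes and is balanced.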

15 Lemma: The DFUDS is balanced
Lemma: The DFUDS of an n-node ordered tree forms a balanced parentheses sequence of length 2n.
Proof: By induction on n. For n = 1, the root has no children (degree 0), so its DFUDS is ().
Assume that the lemma holds for any tree with at most n − 1 nodes.
Let U1, U2, ..., Up denote the DFUDS of p trees whose numbers of nodes sum to n − 1; the total length of their DFUDS is 2n − 2.
Consider the tree whose root has those trees as its children.
The DFUDS U of this tree consists of the head dummy parenthesis, then p ('s followed by a ) for the root (degree p), then U1, ..., Up, each with its own head dummy parenthesis removed.

16 Proof (continued)
From the induction hypothesis, each Ui is balanced.
Once its head open parenthesis is removed, each Ui lacks one open parenthesis to be balanced, so together the p sequences lack p open parentheses.
The head dummy open parenthesis of U and the sequence for the root node, p ('s followed by a ), leave exactly p open parentheses unmatched.
Therefore U is balanced.
The number of nodes is n and the length of the sequence is 1 + (p + 1) + (2n − 2 − p) = 2n.
This proves the lemma.

