Succinct Data Structures

Succinct Data Structures
Kunihiko Sadakane, National Institute of Informatics

Succinct Data Structures for Trees
Only ordered trees are considered here.
Ordered/unordered trees: the children of each node are ordered or not ordered. Two unordered trees are regarded as the same if reordering children gives them the same shape.
Labeled/unlabeled edges: a node label can be represented by a label on the edge toward the node's parent.

Ordered Trees
An ordered tree (ordinal tree) is a rooted tree in which the children of each node are ordered and there are no labels.
With edge labels → cardinal tree
[Figure: an ordered tree with 8 nodes, numbered 1 to 8]

Succinct Representations of Ordered Trees
LOUDS (level order unary degree sequence)
BP (balanced parentheses)
DFUDS (depth first unary degree sequence)

LOUDS Representation [1,2]
The degrees of the nodes are encoded by unary codes in breadth-first order: degree d → 1^d 0
2n+1 bits for n nodes (matches the lower bound)
The i-th node is represented by the i-th 1
L = 10110111011000000
[Figure: an 8-node tree numbered 1 to 8 in breadth-first order, shown with its LOUDS sequence L]
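The encoding rule above can be sketched in a few lines of Python. This is a minimal illustration, not the engineered representation of [2]: the tree is assumed to be given as a dictionary from a node to its ordered list of children, and the name `louds_encode` is hypothetical. The leading `10` is the usual super-root convention, which is what makes the total 2n+1 bits.

```python
from collections import deque

def louds_encode(children, root):
    """Return the LOUDS bit string of an ordered tree.

    `children` maps a node to its ordered list of children (assumed input
    format); nodes missing from the map are treated as leaves.
    """
    bits = ["10"]                 # super-root whose only child is the real root
    queue = deque([root])
    while queue:                  # breadth-first (level) order
        v = queue.popleft()
        kids = children.get(v, [])
        bits.append("1" * len(kids) + "0")   # degree d is written as 1^d 0
        queue.extend(kids)
    return "".join(bits)

# The 8-node tree of the slide, numbered in breadth-first order.
tree = {1: [2, 3], 2: [4, 5, 6], 3: [7, 8]}
L = louds_encode(tree, 1)
print(L, len(L))
```

For the slide's tree this reproduces L = 10110111011000000, which has 2·8+1 = 17 bits.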

Tree Navigational Operations (1)
i-th node: select1(L, i) (i ≥ 1)
firstchild(x): y := select0(rank1(L,x)) + 1; if L[y] = 0 then −1 else y
lastchild(x): y := select0(rank1(L,x)+1) − 1
L = 10110111011000000
[Figure: the 8-node tree in breadth-first numbering with its LOUDS sequence L]

Tree Navigational Operations (2)
sibling(x): if L[x+1] = 0 then −1 else x+1
parent(x) = select1(rank0(L,x))
degree(x) = lastchild(x) − firstchild(x) + 1
Merits: every operation is supported by rank/select alone
Demerits: subtree sizes cannot be computed
L = 10110111011000000
[Figure: the 8-node tree in breadth-first numbering with its LOUDS sequence L]
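The LOUDS operations of the two slides above can be tried out with a naive sketch. Positions are 1-indexed as on the slides. The linear-time `rank` and `select` below are stand-ins for the constant-time structures a real implementation would use. Two details are assumptions that the slides do not state: the leaf check in `lastchild`, and computing `degree` from the two raw select0 positions so that a leaf yields 0.

```python
L = "10110111011000000"   # LOUDS sequence from the slides

def rank(L, bit, x):
    """Number of occurrences of `bit` in L[1..x] (1-indexed, inclusive)."""
    return L[:x].count(bit)

def select(L, bit, k):
    """1-indexed position of the k-th `bit` in L, or -1 if there is none."""
    seen = 0
    for pos, c in enumerate(L, start=1):
        if c == bit:
            seen += 1
            if seen == k:
                return pos
    return -1

def node(L, i):
    return select(L, "1", i)

def firstchild(L, x):
    y = select(L, "0", rank(L, "1", x)) + 1
    return -1 if L[y - 1] == "0" else y

def lastchild(L, x):
    y = select(L, "0", rank(L, "1", x) + 1) - 1
    return -1 if L[y - 1] == "0" else y      # leaf check: an assumption

def sibling(L, x):
    return -1 if L[x] == "0" else x + 1      # L[x] (0-indexed) is L[x+1] on the slide

def parent(L, x):
    return select(L, "1", rank(L, "0", x))   # -1 for the root

def degree(L, x):
    i = rank(L, "1", x)
    # Same value as lastchild - firstchild + 1 for internal nodes, 0 for leaves.
    return select(L, "0", i + 1) - select(L, "0", i) - 1

x = node(L, 2)            # position of node 2
print(x, firstchild(L, x), lastchild(L, x), sibling(L, x), parent(L, x), degree(L, x))
```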

BP Representation [3]
Each node is represented by a pair of matching open and close parentheses
2n bits for n nodes (the size matches the lower bound)
P = ((()()())(()()))
[Figure: an 8-node tree numbered 1 to 8 in preorder, shown with its BP sequence P]
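A depth-first traversal produces the BP sequence directly. The sketch below assumes the same dictionary-of-children input as a convenience; `bp_encode` is a hypothetical name, and the recursion is only suitable for small trees.

```python
def bp_encode(children, root):
    """Return the balanced-parentheses string of an ordered tree."""
    out = []

    def visit(v):
        out.append("(")                  # open parenthesis when entering v
        for c in children.get(v, []):
            visit(c)
        out.append(")")                  # close parenthesis when leaving v

    visit(root)
    return "".join(out)

# The 8-node tree of the slide, numbered in preorder.
tree = {1: [2, 6], 2: [3, 4, 5], 6: [7, 8]}
P = bp_encode(tree, 1)
print(P, len(P))
```

Each node contributes exactly one open and one close parenthesis, which gives the 2n bits stated above.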

Basic Operations on BP
A node is represented by the position of its open parenthesis
findclose(P,i): returns the position of the ) matching the ( at P[i]
enclose(P,i): returns the position of the ( which encloses the ( at P[i]
P = (()((()())())(()())())
[Figure: an 11-node tree numbered 1 to 11 in preorder with its BP sequence P; arrows mark enclose and findclose]

Tree Navigational Operations
parent(v) = enclose(P,v)
firstchild(v) = v + 1
sibling(v) = findclose(P,v) + 1
lastchild(v) = findopen(P, findclose(P,v) − 1)
P = (()((()())())(()())())
[Figure: the 11-node tree in preorder numbering with its BP sequence P; arrows mark enclose and findclose]
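The BP navigation formulas can be checked against the 11-node example with a naive sketch. Positions are 1-indexed as on the slides, and `findclose`, `findopen` and `enclose` are implemented here by plain scanning, which is only a stand-in for the succinct structure. The checks that return −1 when a child or sibling does not exist are assumptions added for the illustration.

```python
P = "(()((()())())(()())())"   # BP sequence of the 11-node example

def findclose(P, i):
    """Position of the ')' matching the '(' at position i."""
    depth = 0
    for j in range(i, len(P) + 1):
        depth += 1 if P[j - 1] == "(" else -1
        if depth == 0:
            return j
    return -1

def findopen(P, j):
    """Position of the '(' matching the ')' at position j."""
    depth = 0
    for i in range(j, 0, -1):
        depth += 1 if P[i - 1] == ")" else -1
        if depth == 0:
            return i
    return -1

def enclose(P, i):
    """Position of the nearest '(' that encloses the '(' at position i."""
    depth = 0
    for j in range(i - 1, 0, -1):
        depth += 1 if P[j - 1] == "(" else -1
        if depth == 1:
            return j
    return -1                                   # the root has no parent

def parent(P, v):
    return enclose(P, v)

def firstchild(P, v):
    return v + 1 if P[v] == "(" else -1         # P[v] (0-indexed) is position v+1

def sibling(P, v):
    s = findclose(P, v) + 1
    return s if s <= len(P) and P[s - 1] == "(" else -1

def lastchild(P, v):
    c = findclose(P, v) - 1
    return findopen(P, c) if P[c - 1] == ")" else -1

v = 4   # node 3 of the figure starts at position 4
print(parent(P, v), firstchild(P, v), sibling(P, v), lastchild(P, v))
```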

Number of Descendants (Subtree Size)
The size of the subtree rooted at v is subtreesize(v) = (findclose(P,v) − v + 1)/2
The degree (number of children) can be computed by repeatedly applying findclose, but this takes time proportional to the number of children
P = (()((()())())(()())())
[Figure: the 11-node tree in preorder numbering with its BP sequence P]
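Both quantities can be worked through on the example sequence. The sketch below uses 1-indexed positions and a scanning `findclose` as a stand-in for the real structure; `degree` walks the children one by one, which shows why its cost is proportional to the number of children.

```python
P = "(()((()())())(()())())"

def findclose(P, i):
    """Naive scan for the ')' matching the '(' at 1-indexed position i."""
    depth = 0
    for j in range(i, len(P) + 1):
        depth += 1 if P[j - 1] == "(" else -1
        if depth == 0:
            return j
    return -1

def subtreesize(P, v):
    # The subtree of v occupies P[v..findclose(v)], two parentheses per node.
    return (findclose(P, v) - v + 1) // 2

def degree(P, v):
    count, child, end = 0, v + 1, findclose(P, v)
    while child < end:                       # one findclose call per child
        count += 1
        child = findclose(P, child) + 1      # jump to the next sibling
    return count

print(subtreesize(P, 4), degree(P, 4))
```

For v = 4 the matching parenthesis is at position 13, so the subtree has (13 − 4 + 1)/2 = 5 nodes, and two of them are children of v.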

Data Structure for findclose [4]
Divide the parentheses sequence into blocks of length B = ½ log n
b(p): number of the block containing p
μ(p): position of the parenthesis matching p
A parenthesis p is said to be far ⇔ b(p) ≠ b(μ(p))
A far open parenthesis p is said to be an opening pioneer ⇔ for the far open parenthesis q which immediately precedes p, b(μ(p)) ≠ b(μ(q))
The positions of the opening pioneers and of the parentheses that match them are marked in a 0,1 vector
[Figure: open parentheses q, p, r and their matches μ(r), μ(p), μ(q) spread over several blocks]

Lemma: Let β denote the number of blocks. Then the number of opening pioneers is at most 2β − 3.
Proof: The graph whose nodes correspond to the blocks and whose edges are (b(p), b(μ(p))) is an outerplanar graph.
The opening/closing pioneers form a BP again.
β = n/B = 2n/log n ⇒ the length of this BP is O(n/log n)
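The definitions of far parentheses and opening pioneers, and the bound of the lemma, can be checked on a small input. The sketch below uses 0-indexed positions, takes the block length B as a free parameter rather than ½ log n, and treats the first far open parenthesis as a pioneer because it has no predecessor. These conventions and the name `opening_pioneers` are assumptions made for the illustration.

```python
def opening_pioneers(P, B):
    """Return the 0-indexed positions of the opening pioneers of P."""
    match, stack = {}, []
    for i, c in enumerate(P):
        if c == "(":
            stack.append(i)
        else:
            match[stack.pop()] = i          # open position -> close position
    pioneers, prev_block = [], None
    for p in sorted(match):                 # open parentheses, left to right
        if p // B != match[p] // B:         # p is far: its match is in another block
            if match[p] // B != prev_block: # match block differs from previous far paren
                pioneers.append(p)
            prev_block = match[p] // B
    return pioneers

P = "(()((()())())(()())())"
B = 4
beta = -(-len(P) // B)                      # number of blocks, rounded up
pioneers = opening_pioneers(P, B)
print(pioneers, len(pioneers), 2 * beta - 3)
```

With B = 4 the 22 parentheses form β = 6 blocks, so the lemma allows at most 9 opening pioneers; the example has 5.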

Representing Recursive Structure
The opening pioneers and their matching parentheses are marked in a 0,1 vector B
B is a sparse vector of length 2n with O(n/log n) 1's
It can be represented in O(n log log n/log n) bits
[Figure: sequence P with q, p, r and their matches; B = 0100 0101 0000 0000 0010 1001; P1 = ((()))]
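The space bound for the sparse vector follows from a standard counting argument, sketched here under the assumption that B has m = cn/log n ones for some constant c. Only the cost of marking the positions is counted; supporting rank and select on B adds lower-order terms that depend on the structure chosen.

```latex
\begin{align*}
\log \binom{2n}{m} &\le m \log \frac{2en}{m}
  && \text{(standard bound on binomial coefficients)} \\
&= \frac{cn}{\log n} \, \log \frac{2e \log n}{c}
  && \text{(substituting } m = cn/\log n\text{)} \\
&= O\!\left(\frac{n \log\log n}{\log n}\right).
\end{align*}
```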

Let S(n) denote the size of the BP representation for an n-node tree
S(n) = 2n + O(n log log n/log n) + S(O(n/log n))
Once the number of nodes becomes O(n/log² n), a naive data structure which stores all the answers uses only O(n/log n) bits
Therefore S(n) = 2n + O(n log log n/log n)
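Unrolling the recurrence twice makes the final bound explicit. The symbols n_1 and n_2 are introduced here only for the derivation; they denote the numbers of nodes after one and two levels of recursion.

```latex
\begin{align*}
S(n)   &= 2n + O\!\left(\frac{n \log\log n}{\log n}\right) + S(n_1),
       & n_1 &= O\!\left(\frac{n}{\log n}\right) \\
S(n_1) &= 2n_1 + O\!\left(\frac{n_1 \log\log n_1}{\log n_1}\right) + S(n_2),
       & n_2 &= O\!\left(\frac{n_1}{\log n_1}\right) = O\!\left(\frac{n}{\log^2 n}\right) \\
S(n_2) &= O(n_2 \log n) = O\!\left(\frac{n}{\log n}\right)
       & &\text{(naive table, } O(\log n) \text{ bits per answer)} \\
S(n)   &= 2n + O\!\left(\frac{n \log\log n}{\log n}\right)
          + O\!\left(\frac{n}{\log n}\right)
          + O\!\left(\frac{n \log\log n}{\log^2 n}\right)
          + O\!\left(\frac{n}{\log n}\right)
        = 2n + O\!\left(\frac{n \log\log n}{\log n}\right)
\end{align*}
```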

Algorithm for findclose
To compute μ(p) = findclose(P,p):
If p is not far, μ(p) is computed by a table
Otherwise, find the pioneer p* that immediately precedes p
Find μ(p*) using the BP for pioneers
If p is not a pioneer, b(μ(p)) = b(μ(p*))
The position of μ(p) is determined from the difference between the depths of p and p*
[Figure: p* ( ... p ( ... ) μ(p) ... ) μ(p*)]
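The steps above can be sketched as follows. This is an illustration of the control flow only, with several stand-ins that are assumptions and not the structure of [4]: the in-block table lookup is replaced by a scan of the block, the recursive BP for pioneers is replaced by a dictionary of precomputed matches, depths come from a plain prefix array, and positions are 0-indexed. The names `preprocess` and `findclose` and the parameter `B` are chosen for the example.

```python
from bisect import bisect_right

def preprocess(P, B):
    """Stand-in preprocessing: depths, opening pioneers, and their matches."""
    match, stack, depth, d = [0] * len(P), [], [], 0
    for i, c in enumerate(P):
        if c == "(":
            stack.append(i)
            d += 1
        else:
            j = stack.pop()
            match[i], match[j] = j, i
            d -= 1
        depth.append(d)                     # excess of '(' over ')' in P[0..i]
    pioneers, prev_block = [], None
    for p, c in enumerate(P):
        if c == "(" and p // B != match[p] // B:    # p is far
            if match[p] // B != prev_block:
                pioneers.append(p)                  # p is an opening pioneer
            prev_block = match[p] // B
    return depth, pioneers, {p: match[p] for p in pioneers}

def findclose(P, B, depth, pioneers, pioneer_match, p):
    target = depth[p] - 1                   # depth right after the matching ')'
    # Case 1: p is not far. A scan of p's block stands in for the table lookup.
    block_end = min(len(P), (p // B + 1) * B)
    for j in range(p + 1, block_end):
        if depth[j] == target:
            return j
    # Case 2: p is far. Take the last pioneer p* at or before p and its match.
    p_star = pioneers[bisect_right(pioneers, p) - 1]
    m = pioneer_match[p_star]
    # mu(p) lies in the block of mu(p*); the depth of p pins down its position.
    for j in range((m // B) * B, m + 1):
        if depth[j] == target:
            return j
    raise ValueError("P is not a balanced sequence")

P = "(()((()())())(()())())"
B = 4
tables = preprocess(P, B)
print(findclose(P, B, *tables, 7))          # far, non-pioneer open parenthesis
```

Position 7 is far but not a pioneer when B = 4; its preceding pioneer is at position 4, whose match lies in the block that also contains the answer.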

References
[1] G. Jacobson. Space-efficient Static Trees and Graphs. In Proc. IEEE FOCS, pages 549–554, 1989.
[2] O. Delpratt, N. Rahman, and R. Raman. Engineering the LOUDS Succinct Tree Representation. In Proc. WEA, pages 134–145, 2006.
[3] J. I. Munro and V. Raman. Succinct Representation of Balanced Parentheses and Static Trees. SIAM Journal on Computing, 31(3):762–776, 2001.
[4] R. F. Geary, N. Rahman, R. Raman, and V. Raman. A simple optimal representation for balanced parentheses. Theoretical Computer Science, 368:231–246, December 2006.
[5] J. I. Munro, V. Raman, and S. S. Rao. Space efficient suffix trees. Journal of Algorithms, 39:205–222, 2001.
[6] D. Benoit, E. D. Demaine, J. I. Munro, R. Raman, V. Raman, and S. S. Rao. Representing Trees of Higher Degree. Algorithmica, 43(4):275–292, 2005.
[7] J. Jansson, K. Sadakane, and W.-K. Sung. Ultra-succinct Representation of Ordered Trees. In Proc. ACM-SIAM SODA, pages 575–584, 2007.
[8] A. Farzan and J. I. Munro. A Uniform Approach Towards Succinct Representation of Trees. In Proc. SWAT, LNCS 5124, pages 173–184, 2008.
[9] A. Farzan, R. Raman, and S. S. Rao. Universal Succinct Representations of Trees? In Proc. ICALP, LNCS 5555, pages 451–462, 2009.
[10] P. Ferragina, F. Luccio, G. Manzini, and S. Muthukrishnan. Compressing and indexing labeled trees, with applications. Journal of the ACM, 57(1):4:1–4:33, 2009.
[11] R. F. Geary, R. Raman, and V. Raman. Succinct ordinal trees with level-ancestor queries. ACM Transactions on Algorithms, 2:510–534, 2006.
[12] H.-I. Lu and C.-C. Yeh. Balanced parentheses strike back. ACM Transactions on Algorithms (TALG), 4(3):No. 28, 2008.