Succinct Indexes for Strings, Binary Relations and Multi-labeled Trees Jérémy Barbay, Meng He, J. Ian Munro, University of Waterloo S. Srinivasa Rao, IT.

Succinct Indexes for Strings, Binary Relations and Multi-labeled Trees Jérémy Barbay, Meng He, J. Ian Munro, University of Waterloo S. Srinivasa Rao, IT University of Copenhagen

Background: Succinct Data Structures
  What are succinct data structures
    Jacobson 1989
  Why succinct data structures
    Large data sets in modern applications: textual, genomic, spatial or geometric
    An implementation: Delpratt et al
  Succinct integrated encodings
    Main data and auxiliary data structures

Our Problem: Succinct Indexes
  Use of the concept in previous work
    Compact PAT trees: Clark & Munro 1996
    Lower bounds: Demaine & López-Ortiz 2001; Miltersen 2005
    Upper bounds: Sadakane & Grossi 2006
  Definition of succinct indexes in data structure design
    ADT: primitive access operators
    Succinct index: more powerful operators

Succinct Integrated Encodings
  [Diagram labels: Main Data, X, Auxiliary Data Structures, + Navigational Operations]

Succinct Indexes
  [Diagram labels: Main Data, Succinct Index, + Navigational Operations]

Succinct Indexes vs. Integrated Encodings
  Maximizing the freedom of the encoding of the main data
  Allowing incremental design
  Supporting implicit data

Strings: Definitions
  Notation
    Alphabet: [σ] = {1, 2, …, σ}
    String: S[1..n]
  Operations
    string_access(x): S[x]
    string_rank(α, x): number of occurrences of α in S[1..x]
    string_select(α, r): position of the r-th occurrence of α in S

Strings: An Example
  S = a a b a c c c d a d d a b b b c
  string_access(8) = d
  string_rank(a, 8) = 3
  string_select(b, 3) = 14
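
The three operations can be checked against the example with a deliberately naive sketch. It scans the string directly, so it takes linear time and is nothing like a succinct structure; its only purpose is to pin down the 1-based semantics used on the slides. The function signatures (taking the string as a first argument) are an assumption made for illustration.

```python
def string_access(s, x):
    """Return S[x], with positions numbered from 1 as on the slides."""
    return s[x - 1]


def string_rank(s, alpha, x):
    """Number of occurrences of alpha in S[1..x]."""
    return s[:x].count(alpha)


def string_select(s, alpha, r):
    """Position of the r-th occurrence of alpha in S, or None if there are fewer than r."""
    seen = 0
    for pos, ch in enumerate(s, start=1):
        if ch == alpha:
            seen += 1
            if seen == r:
                return pos
    return None


# The example string from the slide, written without spaces.
S = "aabacccdaddabbbc"
print(string_access(S, 8))       # d
print(string_rank(S, "a", 8))    # 3
print(string_select(S, "b", 3))  # 14
```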

Strings: Previous Results
  Succinct Integrated Encodings
    Wavelet trees: Grossi et al
      Space: nH_0 + o(n)∙lg σ bits
      Time: O(lg σ) for all three operations
    Golynski et al
      Space: n (lg σ + o(lg σ)) bits
      Time: O(lglg σ) for string_access and string_rank, O(1) for string_select
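
To illustrate where the O(lg σ) bound for wavelet trees comes from, here is a minimal pointer-based sketch of string_rank. It is not a succinct implementation: each node keeps its bit vector as a plain Python list of prefix counts rather than an o(n)-bit rank structure, and the class name `WaveletTree` and its layout are assumptions made for illustration. The point is only that a query does constant work per level of a tree of depth about lg σ.

```python
class WaveletTree:
    """Balanced wavelet tree over the sorted alphabet of `seq` (illustrative, not succinct)."""

    def __init__(self, seq, alphabet=None):
        if alphabet is None:
            alphabet = sorted(set(seq))
        self.alphabet = alphabet
        self.left = None
        self.right = None
        # ones[i] = number of symbols among the first i that belong to the right half
        self.ones = [0]
        if len(alphabet) > 1:
            mid = len(alphabet) // 2
            right_syms = set(alphabet[mid:])
            for ch in seq:
                self.ones.append(self.ones[-1] + int(ch in right_syms))
            self.left = WaveletTree([c for c in seq if c not in right_syms], alphabet[:mid])
            self.right = WaveletTree([c for c in seq if c in right_syms], alphabet[mid:])

    def rank(self, alpha, x):
        """Number of occurrences of alpha among the first x symbols."""
        node = self
        while len(node.alphabet) > 1:
            mid = len(node.alphabet) // 2
            if alpha >= node.alphabet[mid]:
                x = node.ones[x]        # keep only symbols routed to the right child
                node = node.right
            else:
                x = x - node.ones[x]    # keep only symbols routed to the left child
                node = node.left
        # At a leaf every remaining symbol equals the leaf's single symbol.
        if node.alphabet and node.alphabet[0] == alpha:
            return x
        return 0


S = "aabacccdaddabbbc"
wt = WaveletTree(S)
print(wt.rank("a", 8))  # 3
```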

Strings: Our Results
  Succinct Indexes
    ADT
      string_access: f(n, σ) time
    Space: n∙o(lg σ) bits
    Operations
      string_rank: O(lglg σ lglglg σ (f(n, σ) + lglg σ))
      string_select: O(lglglg σ (f(n, σ) + lglg σ))
    Other operations: negations
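
The separation between the ADT and the index can be sketched as follows. This is not the construction from the talk and does not achieve its n∙o(lg σ) space or its time bounds; it is a simple sampled-counter scheme, chosen only to show an index that never looks at the string's encoding and reaches the data solely through a string_access callback. The class name `SampledRankIndex` and the `block` parameter are hypothetical.

```python
class SampledRankIndex:
    """Auxiliary counters only; the string itself is reached through `access`."""

    def __init__(self, access, n, alphabet, block=4):
        self.access = access    # the ADT: access(x) returns S[x], 1-based
        self.n = n
        self.block = block
        # samples[a][j] = number of occurrences of a in S[1..j*block]
        self.samples = {a: [0] for a in alphabet}
        counts = dict.fromkeys(alphabet, 0)
        for x in range(1, n + 1):
            counts[access(x)] += 1
            if x % block == 0:
                for a in alphabet:
                    self.samples[a].append(counts[a])

    def rank(self, alpha, x):
        """Occurrences of alpha in S[1..x], using at most block - 1 calls to access."""
        if alpha not in self.samples:
            return 0
        j = x // self.block
        total = self.samples[alpha][j]
        for pos in range(j * self.block + 1, x + 1):
            if self.access(pos) == alpha:
                total += 1
        return total


# Any encoding of S works as long as it can answer access(x).
S = "aabacccdaddabbbc"
idx = SampledRankIndex(lambda x: S[x - 1], len(S), "abcd", block=4)
print(idx.rank("a", 8))   # 3
print(idx.rank("b", 14))  # 3
```

Because the index only calls `access`, the main data could equally be stored compressed or computed implicitly, which is the freedom the slides attribute to succinct indexes.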

Binary Relations: Definitions
  Notation
    Binary relation: R ⊆ [n] × [σ]
    Number of objects: n; number of labels: σ
    Number of object-label pairs: t
  Operations
    object_access(x, r): r-th label associated with x
    label_access(x, α): whether x is associated with α
    label_rank(α, x): number of objects labeled α up to object x
    label_select(α, r): r-th object labeled α
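
A naive sketch of the four operations, under stated assumptions: the relation is held as a plain Python set of (object, label) pairs, labels of an object are enumerated in increasing order, and the relation `R` below is a made-up example, not the one from the talk. Every query scans all t pairs, so this only fixes the semantics.

```python
def object_access(rel, x, r):
    """r-th label associated with object x (labels taken in increasing order), or None."""
    labels = sorted(a for (o, a) in rel if o == x)
    return labels[r - 1] if 1 <= r <= len(labels) else None


def label_access(rel, x, alpha):
    """Whether object x is associated with label alpha."""
    return (x, alpha) in rel


def label_rank(rel, alpha, x):
    """Number of objects labeled alpha among objects 1..x."""
    return sum(1 for (o, a) in rel if a == alpha and o <= x)


def label_select(rel, alpha, r):
    """r-th object labeled alpha (objects taken in increasing order), or None."""
    objects = sorted(o for (o, a) in rel if a == alpha)
    return objects[r - 1] if 1 <= r <= len(objects) else None


# Hypothetical relation with n = 4 objects, sigma = 4 labels, t = 7 pairs.
R = {(1, 1), (1, 3), (2, 2), (3, 1), (3, 3), (3, 4), (4, 3)}
print(object_access(R, 3, 2))  # 3
print(label_access(R, 2, 3))   # False
print(label_rank(R, 3, 3))     # 2
print(label_select(R, 3, 3))   # 4
```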

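Binary Relations: An Example
  [Figure with axes σ and n; the example relation is not recoverable from the transcript]
  Queries shown: object_access(1, 2), label_access(2, 3), label_rank(3, 4), label_select(4, 3)
  Answers that survive in the transcript, not matched to queries: 4, false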

Binary Relations: Previous Results
  Succinct Integrated Encodings
    Barbay et al., 2006
      Space: t (lg σ + o(lg σ)) bits
      Time: O(lglg σ) for object_access, label_rank and label_access, O(1) for label_select

Binary Relations: Our Results
  Succinct Indexes
    ADT
      object_access: f(n, σ, t) time
    Space: t∙o(lg σ) bits
    Time
      label_rank and label_access: O(lglg σ lglglg σ (f(n, σ, t) + lglg σ))
      label_select: O(lglglg σ (f(n, σ, t) + lglg σ))

Multi-labeled Trees: Definitions
  Notation
    Number of nodes: n
    Number of labels: σ
    Number of node-label pairs: t
  Operations
    α-descendant
    α-child
    α-ancestor

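Multi-labeled Trees: An Example
  [Figure: tree whose nodes carry the label sets {a, c, d}, {c, d}, {a}, {a, c}, {a, b}, {b, d}, {a, b}, {b}, {c}, {c, d}, {b, c, d}; the tree shape is not recoverable from the transcript]
  Node 2 is a c-ancestor of node 6
  Node 6 is a b-descendant of node 2
  Node 10 is a d-child of node 8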

Multi-labeled Trees: Previous Results
  Labeled trees
    Geary et al
    Ferragina et al
    Barbay et al
  Multi-labeled trees
    Barbay et al. 2006

Multi-labeled Trees: Our Approach
  Traversal Orders
    Preorder
    DFUDS order
  Ordinal Trees: DFUDS
    Benoit et al & 2005
    Jansson et al
  Binary Relations
    Nodes in preorder & labels
    Nodes in DFUDS order & labels
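
One reason pairing nodes in preorder with their labels is useful: the descendants of a node occupy a contiguous range of preorder numbers, so counting α-descendants reduces to two label_rank queries on the resulting binary relation. The sketch below shows only this reduction, with a naive label_rank. The tree, the labels, the helper names, and the choice to count proper descendants (excluding the node itself) are all assumptions made for illustration.

```python
def preorder_layout(children, root):
    """Return (order, size): nodes listed in preorder, and each node's subtree size."""
    order, size = [], {}

    def visit(v):
        order.append(v)
        size[v] = 1
        for c in children.get(v, []):
            visit(c)
            size[v] += size[c]

    visit(root)
    return order, size


def label_rank(labels_in_preorder, alpha, x):
    """Number of preorder positions 1..x whose node carries label alpha (naive scan)."""
    return sum(1 for labels in labels_in_preorder[:x] if alpha in labels)


def count_alpha_descendants(children, labels, root, x, alpha):
    """Proper descendants of x labeled alpha, via two rank queries on a preorder range."""
    order, size = preorder_layout(children, root)
    p = order.index(x) + 1                  # 1-based preorder number of x
    last = p + size[x] - 1                  # preorder number of x's last descendant
    seq = [labels[v] for v in order]        # the relation: preorder position -> label set
    return label_rank(seq, alpha, last) - label_rank(seq, alpha, p)


# Hypothetical multi-labeled tree: node 1 is the root.
children = {1: [2, 5], 2: [3, 4], 5: [6]}
labels = {1: {"a"}, 2: {"b", "c"}, 3: {"a", "c"}, 4: {"b"}, 5: {"c"}, 6: {"a", "b"}}
print(count_alpha_descendants(children, labels, 1, 2, "c"))  # 1
print(count_alpha_descendants(children, labels, 1, 1, "a"))  # 2
```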

Multi-labeled Trees: Our Results
  Succinct Indexes
    ADT: node_label(x, r)
    Supporting α-child/descendant queries: t∙o(lg σ) bits
    Supporting α-child/descendant/ancestor queries: t∙(lg ρ + o(lg ρ) + o(lg σ)) bits (ρ: recursivity)
    Supporting α-child/descendant/ancestor queries of node x after another node y

Applications
  Compressed Succinct Encodings
    Strings
      Space: nH_k + o(n lg σ) bits
      Operations
        string_access: O(1)
        string_rank: O((lglg σ)^2 lglglg σ)
        string_select: O(lglg σ lglglg σ)
      First high-order entropy-compressed encoding supporting rank/select efficiently
    Other Data Structures
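
The times listed here agree with the succinct-index bounds from the "Strings: Our Results" slide once a constant-time string_access, f(n, σ) = O(1), is substituted. The slides do not spell this step out, so the following is a consistency check rather than a quoted derivation:

```latex
\begin{align*}
\texttt{string\_rank}:&\quad
  O\big(\lg\lg\sigma \cdot \lg\lg\lg\sigma \cdot (f(n,\sigma) + \lg\lg\sigma)\big)
  \;\xrightarrow{\,f = O(1)\,}\;
  O\big((\lg\lg\sigma)^2 \lg\lg\lg\sigma\big) \\
\texttt{string\_select}:&\quad
  O\big(\lg\lg\lg\sigma \cdot (f(n,\sigma) + \lg\lg\sigma)\big)
  \;\xrightarrow{\,f = O(1)\,}\;
  O\big(\lg\lg\sigma \cdot \lg\lg\lg\sigma\big)
\end{align*}
```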

Applications (Continued)
  High-order entropy-compressed text indexes for large alphabets
    Notation: n: text size, σ: alphabet size, m: pattern length, occ: number of occurrences
    Our results
      Space: nH_k + o(n lg σ) bits
      Pattern searching: O(m lglg σ + occ lg^{1+ε} n lglg σ)
    Previous results: a lg σ factor instead of lglg σ, or incompressible

Conclusions
  We showed the importance of succinct indexes in the design of succinct data structures by designing:
    Succinct representation of multi-labeled trees that supports efficient retrieval of ancestors / children / descendants by label
    First high-order entropy compressed representation of strings supporting rank/select
    High-order entropy compressed text indexes for large alphabets

Conclusions (Continued)
  The concept of succinct indexes is useful in designing succinct data structures
  … it maximizes the freedom of the encoding of the main data and leads to a rich choice of design tradeoffs.

Thank you!