Succinct Orthogonal Range Search Structures on a Grid with Applications to Text Indexing Prosenjit Bose, Carleton University Meng He, University of Waterloo.


Succinct Orthogonal Range Search Structures on a Grid with Applications to Text Indexing
Prosenjit Bose, Carleton University
Meng He, University of Waterloo
Anil Maheshwari and Pat Morin, Carleton University

2D Orthogonal Range Search
• A fundamental geometric query problem
• Data set: a set, N, of n points in the plane
• Query: given an orthogonal query rectangle R, return information about the points in N∩R
  - Orthogonal range counting queries
  - Orthogonal range reporting queries
• k: size of the output
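The two query types defined on this slide can be pinned down with a brute-force reference implementation. This is only a sketch of the problem statement, not of any structure from the talk; the names `range_report` and `range_count` and the `(x1, x2, y1, y2)` rectangle encoding are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # hypothetical encoding: (x1, x2, y1, y2), bounds inclusive


def range_report(points: List[Point], rect: Rect) -> List[Point]:
    """Reporting query: list every point of the set that lies inside the rectangle."""
    x1, x2, y1, y2 = rect
    return [(x, y) for (x, y) in points if x1 <= x <= x2 and y1 <= y <= y2]


def range_count(points: List[Point], rect: Rect) -> int:
    """Counting query: only the number of points inside the rectangle (k in the slide)."""
    return len(range_report(points, rect))


if __name__ == "__main__":
    pts = [(1, 3), (2, 1), (3, 4), (4, 2)]
    print(range_count(pts, (2, 4, 1, 3)))
    print(range_report(pts, (2, 4, 1, 3)))
```

Both functions scan all n points, which is the baseline that the structures in the following slides improve on.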

Example
[Figure: two panels, "Range counting query" and "Range reporting query"; the value 5 appears as a query answer.]

Classic Solutions

Data structure   Space (words)   Time (counting)   Time (reporting)
R-trees          O(n)
kd-trees         O(n)                              O(n^(1/2) + k)
Chazelle 1988    O(n)            O(lg n)           O(lg n + k lg^ε n)
Range trees      O(n lg n)                         O(lg n + k)
Chazelle 1988    O(n lg^ε n)                       O(lg n + k)

Range Search on an n×n Grid
• A special case: point coordinates are from [1..n]×[1..n] (rank space)
• The general problem can be reduced to this special case using a standard approach (Alstrup et al.)
• Orthogonal range search structures in the rank space and succinct data structures
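The reduction to rank space can be sketched as follows: each coordinate is replaced by its rank in sorted order, and a query rectangle is translated with binary searches over the sorted coordinates. This is a minimal illustration of the general idea, not necessarily the exact construction cited on the slide; `to_rank_space` and `query_to_rank_space` are hypothetical names, and ties are broken by sort order.

```python
from bisect import bisect_left, bisect_right


def to_rank_space(points):
    """Map n points with arbitrary coordinates to points in [1..n] x [1..n].

    Returns the ranked points (in input order) plus the sorted x and y
    coordinate lists needed to translate queries later.
    """
    n = len(points)
    by_x = sorted(range(n), key=lambda i: points[i][0])
    by_y = sorted(range(n), key=lambda i: points[i][1])
    rx, ry = [0] * n, [0] * n
    for rank, i in enumerate(by_x, start=1):
        rx[i] = rank
    for rank, i in enumerate(by_y, start=1):
        ry[i] = rank
    xs = [points[i][0] for i in by_x]
    ys = [points[i][1] for i in by_y]
    return list(zip(rx, ry)), xs, ys


def query_to_rank_space(rect, xs, ys):
    """Translate an inclusive rectangle (x1, x2, y1, y2) into rank coordinates."""
    x1, x2, y1, y2 = rect
    return (bisect_left(xs, x1) + 1, bisect_right(xs, x2),
            bisect_left(ys, y1) + 1, bisect_right(ys, y2))


if __name__ == "__main__":
    pts = [(10.5, 7.0), (3.2, 9.1), (8.0, 1.5), (3.2, 4.4)]
    ranked, xs, ys = to_rank_space(pts)
    print(ranked)
    print(query_to_rank_space((3.0, 9.0, 2.0, 9.5), xs, ys))
```

A point lies in the original rectangle exactly when its ranked image lies in the translated rectangle, so any rank-space structure can answer the original query after two binary searches per axis.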

Background: Succinct Data Structures
• What are succinct data structures (Jacobson 1989)
  - Representing data structures using, ideally, the information-theoretic minimum space
  - Supporting efficient navigational operations
• Why succinct data structures
  - Large data sets in modern applications: textual, genomic, spatial or geometric

Succinct Orthogonal Range Search Structures in Rank Space
• Wavelet trees (Grossi et al. 2003)
  - Space: n lg n + o(n lg n) bits
  - Query time for orthogonal range search (Mäkinen and Navarro 2006):
    - Restriction: no two points have the same x- or y-coordinate
    - Counting: O(lg n)
    - Reporting: O(k lg n)
• Applications
  - Space-efficient text indexes: Mäkinen and Navarro 2006, Chien et al. 2008
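To make the wavelet-tree counting bound concrete, here is a pointer-based sketch. It assumes the points are in rank space with distinct x-coordinates, so the point set is a sequence whose entry at position x is the y-coordinate. Unlike the succinct structure on the slide, it stores plain Python lists of prefix counts instead of rank-enabled bit vectors, so it illustrates the query path only, not the n lg n + o(n lg n) bit space bound. The class name and method signature are assumptions.

```python
class WaveletTree:
    """Pointer-based wavelet tree over a sequence of integers in [lo, hi]."""

    def __init__(self, seq, lo, hi):
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        self.prefix = None
        if lo == hi or not seq:
            return  # leaf, or empty subtree: nothing to split
        mid = (lo + hi) // 2
        # prefix[i] = how many of seq[0..i-1] have value <= mid (go to the left child)
        self.prefix = [0]
        for v in seq:
            self.prefix.append(self.prefix[-1] + (1 if v <= mid else 0))
        self.left = WaveletTree([v for v in seq if v <= mid], lo, mid)
        self.right = WaveletTree([v for v in seq if v > mid], mid + 1, hi)

    def count(self, i, j, y1, y2):
        """Number of positions p with i <= p < j (0-based) and y1 <= seq[p] <= y2."""
        if i >= j or y2 < self.lo or y1 > self.hi:
            return 0
        if y1 <= self.lo and self.hi <= y2:
            return j - i  # every value in this node is inside the y-range
        li, lj = self.prefix[i], self.prefix[j]
        return (self.left.count(li, lj, y1, y2)
                + self.right.count(i - li, j - lj, y1, y2))


if __name__ == "__main__":
    # Points (x, y) = (1,4), (2,2), (3,1), (4,3) stored as seq[x-1] = y.
    tree = WaveletTree([4, 2, 1, 3], 1, 4)
    # Query rectangle [1..3] x [2..4] becomes positions [0, 3) and values [2, 4].
    print(tree.count(0, 3, 2, 4))
```

At each level the x-interval is mapped into the children with two prefix-count lookups, and only O(1) nodes per level are partially covered by the y-range, which is where the O(lg n) counting time for a binary tree of height lg n comes from.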

Supporting Counting: An Overview
• Reduce orthogonal range counting to dominance counting
• Design a succinct data structure supporting dominance counting on a narrow grid, i.e. an n×t grid where t = O(lg^ε n) (0 < ε < 1). We also assume that each point has a distinct x-coordinate
• Recursively divide the n×n grid into narrow grids and use the above structure at each level
• Remove the restriction that each point has a distinct x-coordinate
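The first step of the overview, reducing range counting to dominance counting, is the usual inclusion-exclusion over the four corners of the rectangle. The sketch below assumes integer coordinates and takes "dominance counting at (x, y)" to mean counting points (a, b) with a ≤ x and b ≤ y; the brute-force `dominance_count` is a stand-in for whatever structure answers that query.

```python
def dominance_count(points, x, y):
    """Number of points (a, b) with a <= x and b <= y (brute-force stand-in)."""
    return sum(1 for (a, b) in points if a <= x and b <= y)


def range_count_via_dominance(points, x1, x2, y1, y2):
    """Count points in [x1..x2] x [y1..y2] using four dominance queries."""
    return (dominance_count(points, x2, y2)
            - dominance_count(points, x1 - 1, y2)
            - dominance_count(points, x2, y1 - 1)
            + dominance_count(points, x1 - 1, y1 - 1))


if __name__ == "__main__":
    pts = [(1, 3), (2, 1), (3, 4), (4, 2)]
    print(range_count_via_dominance(pts, 2, 4, 1, 3))
```

Since a range count costs four dominance counts, any time bound for dominance counting carries over to range counting up to a constant factor.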

Range Counting on a Narrow Grid
S = …
• Divide the grid into blocks of size lg^2 n × t
• A 2D array A: A[i,j] stores the result of dominance counting when (i lg^2 n + 1, j) is given as the query point
• Divide each block into subblocks of size lg^λ n × t (0 < λ < ε)
• A 2D array B: B[i,j] stores, when (i lg^λ n + 1, j) is given as a query point, the result of dominance counting inside the block containing this point
• A table C that stores, for each possible set of lg^λ n points on a lg^λ n × t grid and each query point in the grid, the result of dominance counting
• Space: n lg t + o(n) bits
• Time: O(1)
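The blocking idea can be illustrated with a simplified, one-level version: a table of precomputed dominance counts at block boundaries (playing the role of A), plus a scan inside the final block. The scan replaces the subblock array B and the lookup table C from the slide, so this sketch takes time proportional to the block size rather than O(1), and it makes no attempt at the n lg t + o(n) bit space bound. The class name, the `block` parameter and the exact indexing convention are assumptions.

```python
class NarrowGridDominance:
    """Dominance counting on an n x t grid with one point per x-coordinate.

    s[x-1] is the y-coordinate (in 1..t) of the point whose x-coordinate is x.
    """

    def __init__(self, s, t, block):
        self.s, self.t, self.block = list(s), t, block
        # a[i][j] = number of points with x <= i * block and y <= j
        self.a = [[0] * (t + 1)]
        per_value = [0] * (t + 1)  # per_value[j] = points seen so far with y == j
        for x, y in enumerate(self.s, start=1):
            per_value[y] += 1
            if x % block == 0:
                row, running = [0] * (t + 1), 0
                for j in range(1, t + 1):
                    running += per_value[j]
                    row[j] = running
                self.a.append(row)

    def count(self, x, y):
        """Number of points (a, b) with a <= x and b <= y."""
        if x <= 0 or y <= 0:
            return 0
        x, y = min(x, len(self.s)), min(y, self.t)
        i = x // self.block
        # Points in the partially covered block are counted by a direct scan.
        inside = sum(1 for v in self.s[i * self.block:x] if v <= y)
        return self.a[i][y] + inside


if __name__ == "__main__":
    grid = NarrowGridDominance([2, 1, 3, 1, 2, 3, 3, 1], t=3, block=3)
    print(grid.count(5, 2))
```

In the structure on the slide, the scan is replaced by one lookup in B for the subblock boundary and one lookup in C for the remaining lg^λ n points, which is what brings the query time down to O(1).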

Range Counting on an n×n Grid
• Transform the original grid into a narrow grid by grouping y-coordinates into ranges of size n/t
• Construct orthogonal range search structures for this narrow grid and recurse
• Number of levels: log_t n
• Space: n lg n + o(n lg n) bits
• Time: O(log_t n)
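The bounds on this slide follow from multiplying the per-level costs of the narrow-grid structure by the number of levels. The worked calculation below assumes t = lg^ε n exactly (the slides only state t = O(lg^ε n)) and uses the per-level figures from the narrow-grid slide: n lg t + o(n) bits and O(1) time.

```latex
\begin{align*}
\text{levels} &= \log_t n = \frac{\lg n}{\lg t}
               = \frac{\lg n}{\varepsilon \lg\lg n}
               = O\!\left(\frac{\lg n}{\lg\lg n}\right),
               \qquad t = \lg^{\varepsilon} n, \\
\text{space}  &= \log_t n \cdot \bigl(n \lg t + o(n)\bigr)
               = n \lg n + o(n \lg n) \ \text{bits}, \\
\text{time}   &= \log_t n \cdot O(1)
               = O\!\left(\frac{\lg n}{\lg\lg n}\right).
\end{align*}
```

The leading space term is exact because lg t · log_t n = lg n, and the lower-order term is o(n) · O(lg n / lg lg n), which is o(n lg n).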

More Results
• The restriction that each point has a distinct x-coordinate can be removed using 2n + o(n) extra bits
• The support for range reporting is based on similar ideas but is more complicated
• Our main result
  - Space: n lg n + o(n lg n) bits
  - Query time for orthogonal range search:
    - Counting: O(lg n / lg lg n)
    - Reporting: O(k lg n / lg lg n)

Applications: Substring Search
• Notation: T: text, n: text size, σ: alphabet size, P: pattern, m: pattern length, occ: number of occurrences
• Query: report the occurrences of P in T
• Chien et al. 2008: O(n lg σ) bits, O(m + lg n × (log_σ n + occ lg n)) time
• Our results: O(n lg σ) bits, O(m + lg n × (log_σ n + occ lg n) / lg lg n) time

Applications: Position-Restricted Substring Search
• Query: given a pattern P and a range [i, j], how many times does P occur in T[i..j]?
• Mäkinen and Navarro 2006
  - Space: 3n lg n + o(n lg n) bits
  - Time: O(m + occ lg n)
• Our results
  - Space: 3n lg n + o(n lg n) bits
  - Time: O(m + occ lg n / lg lg n)
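One common way to see why this problem is a range search problem: if each suffix-array rank r is paired with its text position SA[r], the occurrences of P in T[i..j] are the points whose rank falls in P's suffix-array interval and whose position falls in [i, j − m + 1]. The sketch below assumes this reduction is the one intended by the slide. It uses a naively built suffix array and a linear scan in place of the range search structure, with 0-based positions; all function names are hypothetical.

```python
def suffix_array(text):
    """Naive suffix array: start positions sorted by the suffix they begin."""
    return sorted(range(len(text)), key=lambda p: text[p:])


def sa_interval(text, sa, pattern):
    """Half-open range [sp, ep) of ranks whose suffixes start with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first rank whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    sp, hi = lo, len(sa)
    while lo < hi:  # first rank whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sp, lo


def restricted_occurrences(text, sa, pattern, i, j):
    """Start positions of pattern that lie entirely inside text[i..j] (inclusive)."""
    sp, ep = sa_interval(text, sa, pattern)
    last_start = j - len(pattern) + 1
    # 2D range query over points (r, sa[r]): r in [sp, ep), sa[r] in [i, last_start].
    # A range search structure would replace this scan.
    return sorted(p for p in sa[sp:ep] if i <= p <= last_start)


if __name__ == "__main__":
    t = "abracadabra"
    sa = suffix_array(t)
    print(restricted_occurrences(t, sa, "abra", 0, 10))
    print(restricted_occurrences(t, sa, "abra", 1, 10))
```

With a rank-space range reporting structure in place of the scan, each reported occurrence costs one unit of reporting time, which matches the occ-dependent term in the bounds above.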

Applications: Representing Small Integers
• Data: a sequence S of n numbers in [1..s], where s = polylog(n)
• Ferragina et al.
  - Space: nH_0(S) + o(n) bits
  - Operations: rank/select in O(1) time
• Our result
  - New operation: given a range of positions [p_1..p_2] and a range of values [v_1..v_2], retrieve the entries in S[p_1..p_2] whose values are in [v_1..v_2]
  - Time: O(1) for counting, O(1) per entry for reporting

Applications: A Restricted Version of Range Search
• Restriction: the query rectangle is defined by two points in the given point set
• Notation: c is the number of bits required to encode the coordinates of a point
• Space: cn + n lg n + o(n lg n) bits
• Time:
  - Counting: O(lg n / lg lg n)
  - Reporting: O(k lg n / lg lg n)

Conclusions
• We designed a succinct data structure for orthogonal range search on an n×n grid that provides more efficient support for both counting and reporting queries
• This structure can be used to improve and extend previous results on succinct data structures, such as succinct text indexes and sequence representations

Thank you!