I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries Second Year Project Presentation Ke Yi Advisor: Lars Arge Committee: Pankaj K. Agarwal and Jun Yang

Presentation transcript:

I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries Second Year Project Presentation Ke Yi Advisor: Lars Arge Committee: Pankaj K. Agarwal and Jun Yang

2 Problem Definition: Range Max Queries
Range-aggregate queries: range-count, range-sum, range-max
N points in $\mathbb{R}^d$
Each point $p$ is associated with a weight $w(p)$
Query rectangle $Q$
Compute $\max\{w(p) \mid p \in Q\}$
Static and dynamic
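As a point of reference for the definition above, here is a minimal brute-force sketch of a range max query. It is not one of the structures from the talk; it scans all N points and is only meant to pin down what the query returns. The function name `range_max` and the `(coordinates, weight)` representation are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

# A point is (coordinates, weight); this layout is an illustrative assumption.
Point = Tuple[Sequence[float], float]


def range_max(points: Sequence[Point],
              lo: Sequence[float],
              hi: Sequence[float]) -> Optional[float]:
    """Return max{w(p) | p in Q} for the closed box Q = [lo, hi], or None if Q is empty."""
    best = None
    for coords, weight in points:
        # p lies in Q when every coordinate falls inside the matching side of the box.
        inside = all(l <= c <= h for l, c, h in zip(lo, coords, hi))
        if inside and (best is None or weight > best):
            best = weight
    return best


pts = [((1, 1), 5), ((2, 3), 9), ((4, 4), 7)]
print(range_max(pts, (0, 0), (3, 3)))  # considers (1,1) and (2,3)
```

A linear scan like this costs on the order of N/B I/Os per query in the I/O model, which is the baseline the indexed structures aim to beat.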

3 Problem Definition: Stabbing Max Queries
N hyper-rectangles in $\mathbb{R}^d$
Each rectangle $\gamma$ is associated with a weight $w(\gamma)$
Query point $q$
Compute $\max\{w(\gamma) \mid q \in \gamma\}$
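The dual problem can be sketched the same way. The following brute-force version is only a specification of the query, not a structure from the talk; the name `stabbing_max` and the `(lo, hi, weight)` encoding of a hyper-rectangle are illustrative assumptions.

```python
from typing import Optional, Sequence, Tuple

# A hyper-rectangle is (lo corner, hi corner, weight); an illustrative encoding.
Rect = Tuple[Sequence[float], Sequence[float], float]


def stabbing_max(rects: Sequence[Rect], q: Sequence[float]) -> Optional[float]:
    """Return max{w(g) | q in g} over closed rectangles, or None if q stabs none."""
    best = None
    for lo, hi, weight in rects:
        # q stabs the rectangle when it lies within the extent on every axis.
        stabbed = all(l <= x <= h for l, x, h in zip(lo, q, hi))
        if stabbed and (best is None or weight > best):
            best = weight
    return best


rects = [((0, 0), (4, 4), 3), ((2, 2), (6, 6), 8), ((5, 5), (9, 9), 1)]
print(stabbing_max(rects, (3, 3)))  # stabs the first two rectangles
```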

4 Model
I/O Model
– N: Elements in structure
– B: Elements per block
– M: Elements in main memory
– n = N/B
Assumptions
– $M > B^2$
– Each word holds $\log_2 N$ bits
– Any coordinate or weight can be stored in one word
[Figure: I/O model diagram with labels D, P, M and "Block I/O"]
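To make the parameters concrete, the sketch below computes n = N/B, checks the $M > B^2$ assumption, and counts the levels of an idealized B-tree, which is where the recurring $O(\log_B n)$ term comes from. The function name `io_model_summary` and the assumption that every node has fan-out exactly B are illustrative simplifications, not part of the talk.

```python
def io_model_summary(N: int, B: int, M: int) -> dict:
    """Summarize I/O-model quantities for N elements, block size B, memory size M."""
    if B < 2:
        raise ValueError("block size B must be at least 2")
    n = -(-N // B)  # number of blocks, ceil(N / B)
    # Levels of an idealized tree with fan-out B over n leaf blocks: ceil(log_B n),
    # computed with integers to avoid floating-point rounding.
    levels, reach = 0, 1
    while reach < n:
        reach *= B
        levels += 1
    return {"n": n, "tall_cache": M > B * B, "btree_levels": levels}


print(io_model_summary(N=10**9, B=10**3, M=10**7))
```

With a billion elements and thousand-element blocks the idealized tree has only two levels, which is why bounds in terms of $\log_B n$ are so much smaller than bounds in terms of $\log_2 N$.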

5 Related Work & Our Results: Range Queries
1D range queries are easy: B-tree
  * $O(n)$ space, $O(\log_B n)$ query & update
2D range queries:
– Poly-logarithmic query: CRB-tree [AAG03]
  * $O(n \log_B n)$ space, $O(\log_B^2 n)$ query
– Linear space: kdB-tree, cross-tree, O-tree
  * [bound missing in source] query, $O(\log_B n)$ update
Our results:

6 Related Work & Our Results: Stabbing Queries
1D stabbing queries
– SB-tree [YW01]
  * $O(n)$ space, $O(\log_B n)$ query & insert
  * Does not allow deletions!
2D stabbing queries
– No structures with worst-case guarantee
Our results:

7 2D Range Max Queries
The external version of Chazelle’s structure [C88]
– Linear space
– Static: $O(\log^{1+\varepsilon} N)$ query
– Dynamic: $O(\log^3 N \log\log N)$ query & update
Overall structure
– A normal B-tree Φ on y-coordinates of all the points
– A fan-out base B-tree T on x-coordinates
  * $P_v$: all points stored in the subtree of $v$
  * Each internal node $v$ stores two secondary structures $C_v$, $M_v$ storing information about $P_v$ in a compressed manner
  * $C_v$ and $M_v$ of size $O(|P_v| / \log_B n)$ → linear size in total
  * Weights of points stored at leaves explicitly

8 2D Range Max Queries
$C_v$ borrowed from CRB-tree
– Compute the ranks of the points one level down in $O(1)$ I/Os
– Identify the weight of a point explicitly in $O(\log_B n)$ I/Os
$M_v$ computes the maximum weight in a multislab in $O(\log_B n)$ I/Os
Answering a query:
– Use Φ to compute the ranks in the root of T
– Use $M_v$ to compute maximum at each level
– For a total of $O(\log_B^2 n)$ I/Os
[Figure: node $v$ with children $v_1, \dots, v_6$]

9 2D Range Max Queries: $M_v$
Divide $P_v$ into chunks of size $B \log_B N$
Divide each chunk into minichunks of size $B$
Three-level structures
– $M_v = (\Psi_1, \Psi_2, \Psi_3)$
– each of size $O(|P_v| / \log_B n)$
[Figure: node $v$]

10 2D Range Max Queries: $M_v$
Basic idea: encode the range max information in a compressed manner, identify the maximum point using $C_v$ once its rank is found
$\Psi_3[l]$: for each minichunk, stores a (slab index, weight rank) pair for each point inside the minichunk
– Find the rank of the maximum-weight point in $O(1)$ I/Os;
– Identify it in $O(\log_B N)$ I/Os.
$\Psi_2[k]$: for each chunk, encode a Cartesian tree on the $O(\log_B N)$ minichunks for each of the $O(B)$ multislabs
– Find the minichunk containing the maximum-weight point in $O(1)$ I/Os;
– Use $\Psi_3$ to find the exact point in $O(\log_B N)$ I/Os;
$\Psi_1$: A fanout B-tree on the $O(|P_v| / (B \log_B n))$ chunks
– Find the maximum-weight point in $O(\log_B N)$ I/Os.
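The role of the Cartesian tree in $\Psi_2$ can be illustrated with a small in-memory sketch: a Cartesian tree over a sequence of weights keeps the sequence order as its in-order traversal and the maximum at the root, so the maximum of any contiguous range is the first node on the way down from the root whose index falls inside the range. The code below is an assumption-laden illustration of that property only. It does not reproduce the compressed encoding or the per-multislab layout from the talk, and the names `build_cartesian_tree` and `range_argmax` are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_cartesian_tree(w: Sequence[float]) -> Tuple[int, List[int], List[int]]:
    """Build a max-Cartesian tree over w; return (root, left child, right child) arrays."""
    n = len(w)
    left, right = [-1] * n, [-1] * n
    stack: List[int] = []  # indices on the rightmost path, weights non-increasing
    for i in range(n):
        last = -1
        # Everything smaller than w[i] on the rightmost path becomes i's left subtree.
        while stack and w[stack[-1]] < w[i]:
            last = stack.pop()
        left[i] = last
        if stack:
            right[stack[-1]] = i
        stack.append(i)
    root = stack[0] if stack else -1
    return root, left, right


def range_argmax(root: int, left: List[int], right: List[int], i: int, j: int) -> int:
    """Return the index of a maximum of w[i..j] (inclusive) by descending from the root."""
    v = root
    while v != -1:
        if v < i:
            v = right[v]   # whole range lies to the right of v
        elif v > j:
            v = left[v]    # whole range lies to the left of v
        else:
            return v       # first node inside [i, j] dominates the range
    raise ValueError("empty tree or invalid range")


weights = [3, 1, 4, 1, 5, 9, 2, 6]
root, lc, rc = build_cartesian_tree(weights)
print(range_argmax(root, lc, rc, 0, 3))  # index of the max of weights[0..3]
```

In this sketch the tree is stored with explicit child arrays; the point of a compressed encoding is that only the tree shape matters, not the weights themselves.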

11 2D Range Max Queries
Static structures
– $O(n)$ size, $O(\log_B^2 N)$ query, $O(n \log_B N)$ construction
– $O(n)$ size, $O(\log_B^{1+\varepsilon} N)$ query, $O(N \log_B N)$ construction
Dynamization:
– Throw away $\Psi_2$ and expand $\Psi_3$
– $O(n \log_B \log_B N)$ size
– $O(\log_B^3 N)$ query, worst case
– $O(\log_B^2 N \cdot \log_{M/B} \log_B N)$ insert, amortized
– $O(\log_B^2 N)$ delete, amortized
Extending to d dimensions
– Standard technique
– Pay an extra $O(\log_B^{d-2} N)$ factor to all these bounds

12 1D Stabbing Max Queries
Modify the external interval tree [AV96] to support max
Fan-out base B-tree on x-coordinates
– Interval stored in highest node $v$ where it contains slab boundary
– In one left (right) slab structure and the multislab structure
Answering a query
– Search down tree and visit $O(\log_B N)$ nodes
– Compute the maximum weight in left (right) slab structure and the multislab structure
[Figure: node $v$]

13 1D Stabbing Max Queries
Slab structures are implemented using B-trees
– Query and update: $O(\log_B N)$ I/Os
Multislab structure: Fan-out B-tree
– At each internal node, we store the maximum weight for each of the slabs and for each of the children
– Query: $O(1)$ I/Os (only look at the root)
– Update: $O(\log_B N)$ I/Os
Rebalancing the base tree: $O(\log_B N)$ I/Os
– Weight-balanced B-trees
Overall cost: size $O(n)$, query $O(\log_B^2 N)$, update $O(\log_B N)$.
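The query pattern described above (walk one root-to-leaf path and take the maximum of what is stored at each visited node) can be illustrated in internal memory with a plain binary segment tree. This is a swapped-in technique chosen for brevity, not the external interval tree from the talk: it is static, has fan-out 2, and ignores blocking. The class name `StabbingMax1D` and the slot numbering are assumptions made for this sketch.

```python
import bisect
from typing import List, Optional, Tuple

NEG_INF = float("-inf")


class StabbingMax1D:
    """Static in-memory stabbing-max over closed intervals (lo, hi, weight), lo <= hi."""

    def __init__(self, intervals: List[Tuple[float, float, float]]):
        self.xs = sorted({x for lo, hi, _ in intervals for x in (lo, hi)})
        # Elementary slots: endpoint xs[k] is slot 2k, the open gap
        # (xs[k], xs[k+1]) is slot 2k+1.
        self.size = max(1, 2 * len(self.xs) - 1)
        self.best = [NEG_INF] * (2 * self.size)
        for lo, hi, w in intervals:
            # Store w at the O(log N) tree nodes that exactly tile [lo, hi].
            l = 2 * bisect.bisect_left(self.xs, lo) + self.size
            r = 2 * bisect.bisect_left(self.xs, hi) + self.size + 1
            while l < r:
                if l & 1:
                    self.best[l] = max(self.best[l], w)
                    l += 1
                if r & 1:
                    r -= 1
                    self.best[r] = max(self.best[r], w)
                l >>= 1
                r >>= 1

    def query(self, q: float) -> Optional[float]:
        """Return the maximum weight among intervals containing q, or None."""
        k = bisect.bisect_left(self.xs, q)
        if k < len(self.xs) and self.xs[k] == q:
            slot = 2 * k          # q is an endpoint
        elif k == 0 or k == len(self.xs):
            return None           # q lies outside every interval
        else:
            slot = 2 * k - 1      # q lies in the gap (xs[k-1], xs[k])
        node, ans = slot + self.size, NEG_INF
        while node >= 1:          # leaf-to-root path
            ans = max(ans, self.best[node])
            node >>= 1
        return None if ans == NEG_INF else ans


index = StabbingMax1D([(1, 5, 10), (3, 8, 20), (6, 9, 5)])
print(index.query(4), index.query(8.5), index.query(0))
```

Because a maximum cannot be undone by subtraction, deleting an interval would require recomputing the stored maxima, which is one reason deletions are the hard case for stabbing-max structures.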

14 1D Stabbing Max Queries
Space-time tradeoff:
– $O(n \log_B^{\varepsilon} N)$ size
– $O(\log_B^{2-\varepsilon} N)$ query
Can handle the general semigroup queries
– A semigroup $(S, +)$
– Each weight $w(\gamma) \in S$
– Want to compute $\sum_{q \in \gamma} w(\gamma)$
Ideas can also be used to improve the internal memory algorithm
– Linear size, $O(\log^2 N / \log\log N)$ query and update
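The semigroup generalization only needs an associative operation, with no identity element and no inverse. A brute-force sketch, assuming 1D closed intervals and a hypothetical helper named `stabbing_semigroup`, shows how max and sum are both instances of the same query:

```python
import operator
from functools import reduce
from typing import Callable, Optional, Sequence, Tuple, TypeVar

S = TypeVar("S")


def stabbing_semigroup(intervals: Sequence[Tuple[float, float, S]],
                       q: float,
                       op: Callable[[S, S], S]) -> Optional[S]:
    """Combine, with the associative operation op, the weights of all intervals containing q."""
    hits = [w for lo, hi, w in intervals if lo <= q <= hi]
    # A semigroup has no identity, so an empty result is reported as None.
    return reduce(op, hits) if hits else None


ivals = [(1, 5, 10), (3, 8, 20), (6, 9, 5)]
print(stabbing_semigroup(ivals, 4, max))           # stabbing max
print(stabbing_semigroup(ivals, 4, operator.add))  # stabbing sum
```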

15 2D Stabbing Max Queries
Extend our 1D stabbing query structure
Use our 2D range query structure as a building block
Extending to d dimensions
– Standard technique
– Pay an extra $O(\log_B^{d-2} N)$ factor to all these bounds

16 Conclusions and Open Problems
In this project, we developed I/O-efficient
– linear space structures with poly-logarithmic query cost for the static 2D range max queries
– near linear space structures with poly-logarithmic query & update cost for the dynamic 2D range max queries
– linear space structures with poly-logarithmic query cost for the dynamic 1D stabbing max queries
– near linear space structures with poly-logarithmic query & update cost for the dynamic 2D stabbing max queries
Open problems
– Linear size dynamic structures for the 2D range & stabbing max queries?
– General semigroup queries?

THE END Thank you!