I/O-Algorithms Lars Arge Spring 2009 April 28, 2009.

I/O-Model
Parameters:
  N = # elements in problem instance
  B = # elements that fit in a disk block
  M = # elements that fit in main memory
  T = # output size in searching problem
We often assume that M > B^2
I/O: Movement of a block between memory and disk
[Figure: processor P, main memory M, and disk D, with block I/O between memory and disk]

Until now: Data Structures
B-trees
  Trees with fanout B, balanced using split/fuse
  O(log_B N + T/B) query, O(N/B) space, O(log_B N) update
Weight-balanced B-trees
  Weight balancing constraint rather than degree constraint
  Ω(w(v)) updates below v between consecutive operations on v
Persistent B-trees
  Update current version (getting new version)
  Query all versions
Buffer trees
  O((1/B) log_{M/B} (N/B)) amortized bounds using buffers and laziness

Until now: Data Structures
Special cases of two-dimensional range search:
  Diagonal corner queries: External interval tree
  Three-sided queries: External priority search tree
  O(log_B N + T/B) query, O(N/B) space, O(log_B N) update
Same bounds cannot be obtained for general planar range searching

Until now: Data Structures
General planar range searching:
  External range tree: O(log_B N + T/B) query, O((N/B) log(N/B) / log log_B N) space, O(log_B N log(N/B) / log log_B N) update
  O-tree: O(sqrt(N/B) + T/B) query, O(N/B) space, O(log_B N) update

Techniques
Tools:
  B-trees
  Persistent B-trees
  Buffer trees
  Logarithmic method
  Weight-balanced B-trees
  Global rebuilding
Techniques:
  Bootstrapping
  Filtering

Other results
Many other results for e.g.
  Higher dimensional range searching
  Range counting, range/stabbing max, and stabbing queries
  Halfspace range searching (and other special cases of range searching)
  Queries on moving objects
  Proximity queries (closest pair, nearest neighbor, point location)
  Structures for objects other than points (bounding rectangles)
Many heuristic structures in database community

Point Enclosure Queries
Dual of planar range searching problem
  Report all rectangles containing query point (x,y)
Internal memory:
  Can be solved in O(N) space and O(log N + T) time

Point Enclosure Queries
Similarity between internal and external results (space, query)
  In general, tradeoff between space and query I/O

                          Internal                            External
1d range search           (N, log N + T)                      (N/B, log_B N + T/B)
3-sided 2d range search   (N, log N + T)                      (N/B, log_B N + T/B)
2d range search           (N log N / log log N, log N + T)    ((N/B) log(N/B) / log log_B N, log_B N + T/B)
                          (N, sqrt(N) + T)                    (N/B, sqrt(N/B) + T/B)
2d point enclosure        (N, log N + T)                      (N/B, log_B N + T/B)?
                                                              (N/B, log_2 N + T/B)
                                                              (N/B^{1-ε}, log_B N + T/B)

Rectangle Range Searching
Report all rectangles intersecting query rectangle Q
Often used in practice when handling complex geometric objects
  Store minimal bounding rectangles (MBR)
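A minimal Python sketch of the two primitives this slide relies on: computing an MBR and testing rectangle intersection. The tuple layout (xmin, ymin, xmax, ymax) and all function names are assumptions made for illustration, and the search shown is only the brute-force baseline that an index such as the R-tree is meant to beat.

```python
from typing import Iterable, List, Tuple

# Assumed layout for illustration: (xmin, ymin, xmax, ymax)
Rect = Tuple[float, float, float, float]


def mbr(rects: Iterable[Rect]) -> Rect:
    """Smallest axis-parallel rectangle enclosing all given rectangles."""
    rects = list(rects)
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def intersects(a: Rect, b: Rect) -> bool:
    """True if the closed rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def rectangle_range_search(rects: List[Rect], q: Rect) -> List[Rect]:
    """Brute-force baseline: scan everything and keep what intersects q."""
    return [r for r in rects if intersects(r, q)]
```

Scanning all N rectangles costs about N/B I/Os per query in the I/O model, which is the cost the structures on the following slides try to avoid.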

Rectangle Data Structures: R-Tree
Most common practically used rectangle range searching structure
Similar to B-tree
  Rectangles in leaves (on same level)
  Internal nodes contain MBR of rectangles below each child
Note: Arbitrary order in leaves/grouping order

Example
Note: Basic R-tree slides (up to bulk loading) are modified versions of slides by Marc van Kreveld

(Point) Query
Recursively visit relevant nodes
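The recursive visit can be sketched as follows. This is an illustrative in-memory model, not the lecture's implementation: the `Node` class, its `leaf` flag, and the entry layout are assumed names, and rectangles are (xmin, ymin, xmax, ymax) tuples.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    leaf: bool
    # Leaf: list of data rectangles.
    # Internal node: list of (child MBR, child Node) pairs.
    entries: List


def contains(r, p) -> bool:
    """True if point p = (x, y) lies in the closed rectangle r."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def point_query(node: Node, p) -> list:
    """Report all data rectangles containing p, descending only into
    children whose MBR contains p."""
    if node.leaf:
        return [r for r in node.entries if contains(r, p)]
    found = []
    for box, child in node.entries:
        if contains(box, p):
            found.extend(point_query(child, p))
    return found
```

Because sibling MBRs may overlap, a query can descend into several children of the same node, which is why there is no good worst-case bound for a heuristic R-tree.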

Query Efficiency
The fewer rectangles intersected the better

Rectangle Order
Intuitively:
  Objects close together in same leaves → small rectangles → queries descend in few subtrees
Grouping in internal nodes?
  Small area of MBRs
  Small perimeter of MBRs
  Little overlap among MBRs

R-tree Insertion Algorithm
When not yet at a leaf (choose subtree):
  Determine rectangle whose area increment after insertion is smallest (small area heuristic)
  Increase this rectangle if necessary and recurse
At a leaf:
  Insert if room, otherwise Split Node (while trying to minimize area)
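The choose-subtree step can be sketched in Python as below. The function names are hypothetical, and breaking ties by the smaller current area is an assumption borrowed from Guttman's original description rather than something stated on the slide.

```python
def area(r):
    """Area of a rectangle given as (xmin, ymin, xmax, ymax)."""
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    """MBR of two rectangles."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(box, r):
    """Area increment of box if it has to grow to cover r."""
    return area(union(box, r)) - area(box)


def choose_subtree(child_boxes, r):
    """Index of the child MBR with the smallest area increment.
    Ties go to the child with the smaller area (assumed tie-break)."""
    return min(range(len(child_boxes)),
               key=lambda i: (enlargement(child_boxes[i], r), area(child_boxes[i])))
```

After the choice is made, the selected MBR is replaced by `union(box, r)` and the same step is repeated one level down until a leaf is reached.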

Node Split
New MBRs

Linear Split Heuristic
Determine R1 and R2 with largest MBR: the seeds for sets S1 and S2
While not all MBRs distributed
  Add next MBR to the set whose MBR increases the least

Quadratic Split Heuristic
Determine R1 and R2 with largest area(MBR) - area(R1) - area(R2): the seeds for sets S1 and S2
While not all MBRs distributed
  Determine for every not yet distributed rectangle Rj:
    d1 = area increment of S1 ∪ Rj
    d2 = area increment of S2 ∪ Rj
  Choose Ri with maximal |d1 - d2| and add to the set with smallest area increment
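A short Python sketch of the quadratic split as listed above. Names are illustrative, rectangles are (xmin, ymin, xmax, ymax) tuples, and the sketch deliberately omits the minimum-fill rule that a full implementation would also enforce, since the slide does not describe it.

```python
from itertools import combinations


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(box, r):
    return area(union(box, r)) - area(box)


def quadratic_split(rects):
    """Split a list of at least two rectangles into two groups."""
    # Seeds: the pair wasting the most area, area(MBR) - area(R1) - area(R2).
    def waste(pair):
        a, b = rects[pair[0]], rects[pair[1]]
        return area(union(a, b)) - area(a) - area(b)

    i, j = max(combinations(range(len(rects)), 2), key=waste)
    s1, s2 = [rects[i]], [rects[j]]
    box1, box2 = rects[i], rects[j]
    rest = [r for k, r in enumerate(rects) if k not in (i, j)]

    while rest:
        # Pick the rectangle with the strongest preference, maximal |d1 - d2|.
        r = max(rest, key=lambda c: abs(enlargement(box1, c) - enlargement(box2, c)))
        rest.remove(r)
        # Add it to the set whose MBR grows the least.
        if enlargement(box1, r) <= enlargement(box2, r):
            s1.append(r)
            box1 = union(box1, r)
        else:
            s2.append(r)
            box2 = union(box2, r)
    return s1, s2
```

Seed selection examines all pairs and each round rescans the remaining rectangles, which is where the quadratic cost in the node capacity comes from.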

R-tree Deletion Algorithm
Find the leaf (node) and delete object; determine new (possibly smaller) MBR
If the node is too empty:
  Delete the node recursively at its parent
  Insert all entries of the deleted node into the R-tree
Note: Insertion of entries/subtrees always occurs at the level where they came from
Would be natural to merge underfull nodes

Insert as rectangle on middle level

Insert object in a leaf

R*-trees
Why try to minimize area?
  Why not overlap, perimeter, ...
R*-tree:
  Experimentally determined algorithms for Choose Subtree and Split Node

R*-trees; Choose Subtree
At nodes directly above leaves:
  Choose entry (rectangle) with smallest overlap-increase
At higher nodes:
  Smallest area-increase
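To make the overlap-increase criterion concrete, here is a hedged Python sketch with assumed helper names; rectangles are (xmin, ymin, xmax, ymax) tuples. It contrasts the R*-tree rule for nodes directly above the leaves with the plain area-increase rule.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def overlap(a, b):
    """Area of the intersection of a and b (0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def overlap_increase(boxes, i, r):
    """How much more boxes[i] overlaps its siblings after growing to cover r."""
    grown = union(boxes[i], r)
    return sum(overlap(grown, b) - overlap(boxes[i], b)
               for j, b in enumerate(boxes) if j != i)


def choose_by_overlap(boxes, r):
    """Rule for nodes directly above the leaves."""
    return min(range(len(boxes)), key=lambda i: overlap_increase(boxes, i, r))


def choose_by_area(boxes, r):
    """Rule for higher nodes."""
    return min(range(len(boxes)), key=lambda i: area(union(boxes[i], r)) - area(boxes[i]))
```

The two rules can disagree: with sibling boxes (0,0,4,4) and (4,1,12,2) and new rectangle (3,1,4.5,2), the area rule prefers the second box while the overlap rule prefers the first.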

R*-trees; Split Node
Determine split axis:
  For both the x- and the y-axis:
    Sort the rectangles by smallest and largest coordinate
    Determine the M - 2m + 2 allowed distributions into two groups
    Determine for each the perimeter of the two MBRs
    Add up all perimeters
  Choose the axis with smallest sum of perimeters
Determine split index (given the split axis):
  Choose the allowed distribution among the M - 2m + 2 with the smallest area of intersection of the MBRs
Here M is the maximum and m the minimum number of entries in a node
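The two phases of the split can be sketched as follows, assuming an overflowing node with M + 1 rectangles given as (xmin, ymin, xmax, ymax) tuples. Function names are illustrative, and the further tie-breaking rules of the published R*-tree are left out.

```python
def mbr(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def perimeter(r):
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def distributions(rects, axis, m):
    """Allowed splits along one axis (0 = x, 1 = y): for each of the two
    sort orders, the first group gets k entries with m <= k <= len - m."""
    for key in (axis, axis + 2):  # lower coordinate, then upper coordinate
        order = sorted(rects, key=lambda r: r[key])
        for k in range(m, len(order) - m + 1):
            yield order[:k], order[k:]


def rstar_split(rects, m):
    # Phase 1: split axis = axis with the smallest sum of perimeters.
    axis = min((0, 1), key=lambda a: sum(
        perimeter(mbr(g1)) + perimeter(mbr(g2))
        for g1, g2 in distributions(rects, a, m)))
    # Phase 2: on that axis, the distribution whose MBRs overlap least.
    return min(distributions(rects, axis, m),
               key=lambda d: overlap(mbr(d[0]), mbr(d[1])))
```

With M + 1 input rectangles, `range(m, len(order) - m + 1)` yields exactly M - 2m + 2 values of k per sort order, matching the count on the slide.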

R*-trees; Forced reinsert
Intuition:
  When building an R-tree by repeated insertion, the first inserted rectangles are possibly badly placed
Experiment:
  Make R-tree by inserting 20,000 rectangles
  Delete the first inserted 10,000 and insert them again
  Search time improvement of 20-50%

R-Tree Variants
Many, many R-tree variants (heuristics) have been proposed
Often bulk-loaded R-trees are used (forced reinsertion intuition)
  Much faster than repeated insertions
  Better space utilization
  Can optimize more "globally"
  Can be updated using previous update algorithms
Recently first worst-case efficient structure: PR-tree

Pseudo-PR-Tree
Place B extreme rectangles from each direction in priority leaves
Split remaining rectangles by xmin coordinates (round-robin using xmin, ymin, xmax, ymax, like a 4d kd-tree)
Recursively build subtrees
Query in O(sqrt(N/B) + T/B) I/Os
  O(T/B) nodes with priority leaf completely reported
  O(sqrt(N/B)) nodes with no priority leaf completely reported
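The construction can be sketched in memory as below. This is an illustration under assumptions, not the authors' code: `PNode`, `build`, and `query` are hypothetical names, rectangles are (xmin, ymin, xmax, ymax) tuples, the input is assumed non-empty, and repeated sorting stands in for the I/O-efficient construction.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]


def mbr(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class PNode:
    box: Rect                         # MBR of everything stored below
    priority_leaves: List[List[Rect]]
    children: List["PNode"]


def build(rects, B, depth=0):
    box = mbr(rects)
    if len(rects) <= B:
        return PNode(box, [list(rects)], [])
    rest, leaves = list(rects), []
    # Up to four priority leaves: smallest xmin, smallest ymin,
    # largest xmax, largest ymax, each taken from what is still left.
    for coord, largest in ((0, False), (1, False), (2, True), (3, True)):
        rest.sort(key=lambda r: r[coord], reverse=largest)
        if rest:
            leaves.append(rest[:B])
            rest = rest[B:]
    children = []
    if rest:
        # kd-tree style median split, cycling through the four coordinates.
        coord = depth % 4
        rest.sort(key=lambda r: r[coord])
        mid = len(rest) // 2
        children = [build(half, B, depth + 1)
                    for half in (rest[:mid], rest[mid:]) if half]
    return PNode(box, leaves, children)


def query(node, q):
    """Report all stored rectangles intersecting q."""
    if not intersects(node.box, q):
        return []
    found = [r for leaf in node.priority_leaves for r in leaf if intersects(r, q)]
    for child in node.children:
        found.extend(query(child, q))
    return found
```

The priority leaves are what drives the bound: if a subtree is visited although none of its parent's priority leaves was fully reported, the query must cut the parent's kd-tree cell in a restricted way, which limits how many such nodes exist.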

Pseudo-PR-Tree: Query Complexity
Nodes v visited where all rectangles in at least one of the priority leaves of v's parent are reported: O(T/B)
Let v be a node that is visited although none of the priority leaves at its parent are reported completely; consider v's parent u
[Figure: query Q in 2d and the corresponding 4d kd-tree cell, with hyperplanes ymin = ymax(Q) and xmax = xmin(Q)]

Pseudo-PR-Tree: Query Complexity
The cell in the 4d kd-tree region of u is intersected by two different 3-dimensional hyperplanes defined by sides of query Q
The intersection of each pair of such 3-dimensional hyperplanes is a 2-dimensional hyperplane
Lemma: # of cells in a d-dimensional kd-tree that intersect an axis-parallel f-dimensional hyperplane is O((N/B)^{f/d})
So, # such cells in a 4d kd-tree: O((N/B)^{2/4}) = O(sqrt(N/B))
Total # nodes visited: O(sqrt(N/B) + T/B)

PR-tree from Pseudo-PR-Tree

Query Complexity Remains Unchanged
# nodes visited on leaf level
Next level:

PR-Tree
PR-tree construction in O((N/B) log_{M/B} (N/B)) I/Os
  Pseudo-PR-tree in O((N/B) log_{M/B} (N/B)) I/Os
  Cost dominated by leaf level
Updates
  O(log_B N) I/Os using known heuristics
    Loss of worst-case query guarantee
  Using logarithmic method
    Worst-case query efficiency maintained
Extending to d dimensions
  Optimal O((N/B)^{1-1/d} + T/B) query

Geometric Algorithms
We will now (quickly) look at geometric algorithms
  Solve a problem on a set of objects
Example: Orthogonal line segment intersection
  Given a set of axis-parallel line segments, report all intersections
In internal memory many problems are solved using sweeping

Plane Sweeping
Sweep plane top-down while maintaining search tree T on vertical segments crossing sweep line (by x-coordinates)
  Top endpoint of vertical segment: Insert in T
  Bottom endpoint of vertical segment: Delete from T
  Horizontal segment: Perform range query with x-interval on T
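An internal-memory sketch of this sweep in Python. Assumptions for illustration: vertical segments are (x, ylow, yhigh) tuples, horizontal segments are (y, xleft, xright) tuples, segments are closed so touching counts as intersecting, and a sorted list with `bisect` stands in for the balanced search tree T (so insertions here are not O(log N)).

```python
import bisect


def orthogonal_intersections(verticals, horizontals):
    """Return the intersection points (x, y) between vertical and
    horizontal segments, found by a top-down sweep."""
    INSERT, QUERY, DELETE = 0, 1, 2  # processing order at equal y
    events = []
    for x, ylow, yhigh in verticals:
        events.append((-yhigh, INSERT, x, None))
        events.append((-ylow, DELETE, x, None))
    for y, xleft, xright in horizontals:
        events.append((-y, QUERY, xleft, xright))
    # Negated y gives a top-down order; inserts before queries before deletes.
    events.sort(key=lambda e: (e[0], e[1]))

    active = []  # x-coordinates of vertical segments crossing the sweep line
    found = []
    for neg_y, kind, a, b in events:
        if kind == INSERT:
            bisect.insort(active, a)
        elif kind == DELETE:
            active.pop(bisect.bisect_left(active, a))
        else:
            lo = bisect.bisect_left(active, a)
            hi = bisect.bisect_right(active, b)
            found.extend((x, -neg_y) for x in active[lo:hi])
    return found
```

Every endpoint triggers one operation on T, so a direct external-memory translation pays at least one search per segment; that is the cost the next slide discusses.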

Plane Sweeping
In internal memory the algorithm runs in optimal O(N log N + T) time
In external memory the algorithm performs badly (> N I/Os) if |T| > M
  Even if we implement T as a B-tree → O(N log_B N + T/B) I/Os
Solution: Distribution sweeping

Distribution Sweeping
Divide plane into M/B - 1 slabs with O(N/(M/B)) endpoints each
Sweep plane top-down while reporting intersections between part of horizontal segment spanning slab(s) and vertical segments
Distribute data to M/B - 1 slabs
  Vertical segments and non-spanning parts of horizontal segments
Recurse in each slab

Distribution Sweeping
Sweep performed in O(N/B + T'/B) I/Os → O((N/B) log_{M/B} (N/B) + T/B) I/Os in total
Maintain active list of vertical segments for each slab (< B in memory)
  Top endpoint of vertical segment: Insert in active list
  Horizontal segment: Scan through all relevant active lists
    Removing "expired" vertical segments
    Reporting intersections with "non-expired" vertical segments
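The recursion and the active lists can be simulated in memory as in the sketch below. It does not model blocks or count I/Os, and several choices are assumptions for illustration: vertical segments are (x, ylow, yhigh), horizontal segments are (y, xleft, xright) with xleft <= xright, slab boundaries are taken from the x-coordinates of the vertical segments only, and `fanout` plays the role of M/B.

```python
import bisect


def brute(verticals, horizontals):
    """Base case: compare every pair."""
    return [(x, y) for (y, xl, xr) in horizontals for (x, ylo, yhi) in verticals
            if xl <= x <= xr and ylo <= y <= yhi]


def distribution_sweep(verticals, horizontals, fanout=4, base=4):
    xs = sorted({v[0] for v in verticals})
    if len(verticals) <= base or len(xs) < 2:
        return brute(verticals, horizontals)

    k = min(fanout, len(xs))
    # k - 1 boundaries; slab s holds x with bounds[s-1] <= x < bounds[s].
    bounds = [xs[(i * len(xs)) // k] for i in range(1, k)]
    active = [[] for _ in range(k)]  # per-slab active lists
    sub_v = [[] for _ in range(k)]   # data distributed to each slab
    sub_h = [[] for _ in range(k)]

    events = [(-v[2], 0, v) for v in verticals] + [(-h[0], 1, h) for h in horizontals]
    events.sort(key=lambda e: (e[0], e[1]))  # top-down, inserts first

    found = []
    for _, kind, seg in events:
        if kind == 0:  # top endpoint of a vertical segment
            s = bisect.bisect_right(bounds, seg[0])
            active[s].append(seg)
            sub_v[s].append(seg)
            continue
        y, xl, xr = seg
        sl, sr = bisect.bisect_right(bounds, xl), bisect.bisect_right(bounds, xr)
        if sl == sr:  # lies within one slab: handled recursively
            sub_h[sl].append(seg)
            continue
        sub_h[sl].append((y, xl, bounds[sl]))      # non-spanning left part
        sub_h[sr].append((y, bounds[sr - 1], xr))  # non-spanning right part
        for s in range(sl + 1, sr):                # completely spanned slabs
            alive = [v for v in active[s] if v[1] <= y]  # drop expired
            active[s] = alive
            found.extend((v[0], y) for v in alive)

    for s in range(k):
        found.extend(distribution_sweep(sub_v[s], sub_h[s], fanout, base))
    return found
```

Each removal of an expired segment is paid for by its insertion and each surviving segment scanned is reported, which is the charging argument behind the O(N/B + T'/B) cost of one sweep.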

Distribution Sweeping
Other example: Rectangle intersection
  Given a set of axis-parallel rectangles, report all intersections

Distribution Sweeping
Divide plane into M/B - 1 slabs with O(N/(M/B)) endpoints each
Sweep plane top-down while reporting intersections between part of rectangles spanning slab(s) and other rectangles
Distribute data to M/B - 1 slabs
  Non-spanning parts of rectangles
Recurse in each slab

Distribution Sweeping
Seems hard to perform sweep in O(N/B + T'/B) I/Os
Solution: Multislabs
  Reduce fanout of distribution to Θ(sqrt(M/B))
  Recursion height still O(log_{M/B} (N/B))
  Room for block from each multislab (active list) in memory

Distribution Sweeping
Sweep while maintaining rectangle active list for each multislab
  Top side of spanning rectangle: Insert in active multislab list
  Each rectangle: Scan through all relevant multislab lists
    Removing "expired" rectangles
    Reporting intersections with "non-expired" rectangles
→ O((N/B) log_{M/B} (N/B) + T/B) I/Os

Distribution Sweeping
Distribution sweeping can relatively easily be used to solve a number of other problems in the plane I/O-efficiently
By decreasing distribution fanout to (M/B)^{1/c} for c ≥ 1 a number of higher-dimensional problems can also be solved I/O-efficiently

Other Results
Other geometric algorithms results include:
  Red-blue line segment intersection (using distribution sweep, buffer trees/batched filtering, external fractional cascading)
  General planar line segment intersection (as above and external priority queue)
  2d and 3d convex hull (complicated deterministic 3d and simpler randomized)
  2d Delaunay triangulation

References
External Memory Geometric Data Structures. Lecture notes by Lars Arge. Section 8-9.
The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree. L. Arge, M. de Berg, H. Haverkort, and K. Yi. Proc. SIGMOD'04. Section 1-2.
External-Memory Computational Geometry. M.T. Goodrich, J-J. Tsay, D.E. Vengroff, and J.S. Vitter. Proc. FOCS'93. Section 2.0-2.1.