I/O-Algorithms Lars Arge Spring 2009 March 3, 2009.

Slides:



Advertisements
Similar presentations
Temporal Databases S. Srinivasa Rao April 12, 2007
Advertisements

An Optimal Dynamic Interval Stabbing-Max Data Structure? Pankaj K. Agarwal, Lars Arge and Ke Yi Department of Computer Science Duke University.
Augmenting Data Structures Advanced Algorithms & Data Structures Lecture Theme 07 – Part I Prof. Dr. Th. Ottmann Summer Semester 2006.
Dynamic Planar Convex Hull Operations in Near- Logarithmic Amortized Time TIMOTHY M. CHAN.
I/O-Algorithms Lars Arge Fall 2014 September 25, 2014.
Lars Arge 1/43 Big Terrain Data Analysis Algorithms in the Field Workshop SoCG June 19, 2012 Lars Arge.
B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree
External Memory Geometric Data Structures
2-dimensional indexing structure
I/O-Algorithms Lars Arge University of Aarhus February 21, 2005.
I/O-Algorithms Lars Arge Aarhus University February 27, 2007.
I/O-Algorithms Lars Arge Spring 2011 March 8, 2011.
Optimal Planar Point Enclosure Indexing Lars Arge, Vasilis Samoladas and Ke Yi Department of Computer Science Duke University Technical University of Crete.
I/O-Algorithms Lars Arge Aarhus University February 13, 2007.
I/O-Algorithms Lars Arge Aarhus University March 16, 2006.
I/O-Algorithms Lars Arge Spring 2009 February 2, 2009.
I/O-Algorithms Lars Arge Aarhus University February 16, 2006.
I/O-Algorithms Lars Arge Aarhus University February 7, 2005.
I/O-Algorithms Lars Arge University of Aarhus February 13, 2005.
Persistent Data Structures Computational Geometry, WS 2007/08 Lecture 12 Prof. Dr. Thomas Ottmann Khaireel A. Mohamed Algorithmen & Datenstrukturen, Institut.
I/O-Algorithms Lars Arge Spring 2009 April 28, 2009.
I/O-Algorithms Lars Arge University of Aarhus March 1, 2005.
I/O-Algorithms Lars Arge Aarhus University February 6, 2007.
Lars Arge1, Mark de Berg2, Herman Haverkort3 and Ke Yi1
I/O-Algorithms Lars Arge Aarhus University March 5, 2008.
I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries Second Year Project Presentation Ke Yi Advisor: Lars Arge Committee: Pankaj.
I/O-Algorithms Lars Arge Aarhus University February 9, 2006.
I/O-Algorithms Lars Arge Aarhus University March 9, 2006.
I/O-Algorithms Lars Arge Aarhus University February 14, 2008.
I/O-Algorithms Lars Arge Aarhus University March 6, 2007.
I/O-Algorithms Lars Arge University of Aarhus March 7, 2005.
1 Geometric index structures April 15, 2004 Based on GUW Chapter , [Arge01] Sections 1, 2.1 (persistent B- trees), 3-4 (static versions.
Augmenting Data Structures Advanced Algorithms & Data Structures Lecture Theme 07 – Part II Prof. Dr. Th. Ottmann Summer Semester 2006.
R-Trees 2-dimensional indexing structure. R-trees 2-dimensional version of the B-tree: B-tree of maximum degree 8; degree between 3 and 8 Internal nodes.
1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
AALG, lecture 11, © Simonas Šaltenis, Range Searching in 2D Main goals of the lecture: to understand and to be able to analyze the kd-trees and.
Orthogonal Range Searching I Range Trees. Range Searching S = set of geometric objects Q = query object Report/Count objects in S that intersect Q Query.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
1 Geometric Intersection Determining if there are intersections between graphical objects Finding all intersecting pairs Brute Force Algorithm Plane Sweep.
Heavily based on slides by Lars Arge I/O-Algorithms Thomas Mølhave Spring 2012 February 9, 2012.
UNC Chapel Hill M. C. Lin Orthogonal Range Searching Reading: Chapter 5 of the Textbook Driving Applications –Querying a Database Related Application –Crystal.
External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)
14/13/15 CMPS 3130/6130 Computational Geometry Spring 2015 Windowing Carola Wenk CMPS 3130/6130 Computational Geometry.
B-trees and kd-trees Piotr Indyk (slides partially by Lars Arge from Duke U)
Bin Yao Spring 2014 (Slides were made available by Feifei Li) Advanced Topics in Data Management.
Trevor Brown – University of Toronto B-slack trees: Space efficient B-trees.
Mehdi Mohammadi March Western Michigan University Department of Computer Science CS Advanced Data Structure.
Lars Arge Presented by Or Ozery. I/O Model Previously defined: N = # of elements in input M = # of elements that fit into memory B = # of elements per.
2IL50 Data Structures Fall 2015 Lecture 9: Range Searching.
Lecture 2: External Memory Indexing Structures CS6931 Database Seminar.
Bin Yao (Slides made available by Feifei Li) R-tree: Indexing Structure for Data in Multi- dimensional Space.
Lecture 3: External Memory Indexing Structures (Contd) CS6931 Database Seminar.
External Memory Geometric Data Structures Lars Arge Duke University June 27, 2002 Summer School on Massive Datasets.
CMPS 3130/6130 Computational Geometry Spring 2015
February 17, 2005Lecture 6: Point Location Point Location (most slides by Sergi Elizalde and David Pritchard)
UNC Chapel Hill M. C. Lin Geometric Data Structures Reading: Chapter 10 of the Textbook Driving Applications –Windowing Queries Related Application –Query.
X1x1 x2x2 top-k y 3-sided x1x1 x2x2 External Memory Three-Sided Range Reporting and Top-k Queries with Sublogarithmic Updates Gerth Stølting Brodal Aarhus.
Michal Balas1 I/O-efficient Point Location using Persistent B-Trees Lars Arge, Andrew Danner, and Sha-Mayn Teh Department of Computer Science, Duke University.
arxiv.org/abs/ y 3-sided x1 x2 x1 x2 top-k
Computational Geometry
CMPS 3130/6130 Computational Geometry Spring 2017
B+ Tree.
Segment tree and Interval Tree
Advanced Topics in Data Management
Orthogonal Range Searching and Kd-Trees
STACS arxiv.org/abs/ y 3-sided x1 x2 x1 x2 top-k
8th Workshop on Massive Data Algorithms, August 23, 2016
Presentation transcript:

I/O-Algorithms Lars Arge Spring 2009 March 3, 2009

I/O-Model D M P Parameters N = # elements in problem instance I/O-algorithms I/O-Model D Parameters N = # elements in problem instance B = # elements that fits in disk block M = # elements that fits in main memory T = # output size in searching problem We often assume that M>B2 I/O: Movement of block between memory and disk Block I/O M P Lars Arge

Fundamental Bounds Internal External Scanning: N Sorting: N log N I/O-algorithms Fundamental Bounds Internal External Scanning: N Sorting: N log N Permuting Searching: Lars Arge

Fundamental Data Structures I/O-algorithms Fundamental Data Structures B-trees: Node degree (B)  queries in Rebalancing using split/fuse  updates in Weight-balanced B-tress: Weight rather than degree constraint  Ω(w(v)) updates below v between rebalancing operations on v Persistent B-trees: Update in current version in Search in all previous versions in Buffer trees Batching of operations to obtain bounds  construction algorithms Lars Arge

Last time: Interval management I/O-algorithms Last time: Interval management Maintain N intervals with unique endpoints dynamically such that stabbing query with point x can be answered efficiently Static solution: Persistent B-tree Linear space and query Dynamic solution: External interval tree update x Lars Arge

Internal Interval Tree I/O-algorithms Internal Interval Tree Base tree on endpoints – “slab” Xv associated with each node v Interval stored in highest node v where it contains midpoint of Xv Intervals Iv associated with v stored in Left slab list sorted by left endpoint (search tree) Right slab list sorted by right endpoint (search tree)  Linear space and O(log N) update Lars Arge

Internal Interval Tree I/O-algorithms Internal Interval Tree x Query with x on left side of midpoint of Xroot Search left slab list left-right until finding non-stabbed interval Recurse in left child  O(log N+T) query bound Lars Arge

Externalizing Interval Tree I/O-algorithms Externalizing Interval Tree Natural idea: Block tree Use B-tree for slab lists Number of stabbed intervals in large slab list may be small (or zero) We can be forced to do I/O in each of O(log N) nodes Lars Arge

Externalizing Interval Tree I/O-algorithms Externalizing Interval Tree multislab Idea: Decrease fan-out to  height remains slabs define multislabs Interval stored in two slab lists (as before) and one multislab list Intervals in small multislab lists collected in underflow structure Query answered in v by looking at 2 slab lists and not O(log N) Lars Arge

External Interval Tree I/O-algorithms External Interval Tree Linear space, query, update General solution techniques: Filtering: Charge part of query cost to output Bootstrapping: Use O(B2) size structure in each internal node Constructed using persistence Dynamic using global rebuilding Weight-balanced B-tree: Split/fuse in amortized O(1) Lars Arge

Last time: Three-Sided Range Queries I/O-algorithms Last time: Three-Sided Range Queries Interval management: “1.5 dimensional” search More general 2d problem: Dynamic 3-sidede range searching Maintain set of points in plane such that given query (q1, q2, q3), all points (x,y) with q1  x  q2 and y  q3 can be found efficiently Linear space, O(logB N+T/B) query static solution using persistence Dynamic: External priority search tree x x1 x2 (x,x) (x1,x2) q3 q2 q1 Lars Arge

Internal Priority Search Tree I/O-algorithms Internal Priority Search Tree 9 16.20 16 19,9 13 13,3 19 20,3 4 5,6 5 9,4 1 1,2 20 4,1 Base tree on x-coordinates with nodes augmented with points Heap on y-coordinates Decreasing y values on root-leaf path (x,y) on path from root to leaf holding x If v holds point then parent(v) holds point  Linear space and O(log N) update Lars Arge

Internal Priority Search Tree I/O-algorithms Internal Priority Search Tree 9 16.20 4 4 16 5,6 19,9 4 19 1 5 13 19 1,2 9,4 13,3 20,3 1 4 5 9 13 16 19 20 4,1 Query with (q1, q2, q3) starting at root v: Report point in v if satisfying query Visit both children of v if point reported Always visit child(s) of v on path(s) to q1 and q2  O(log N+T) query Lars Arge

Externalizing Priority Search Tree I/O-algorithms Externalizing Priority Search Tree 9 16.20 4 16 5,6 19,9 1 5 13 19 1,2 9,4 13,3 20,3 1 4 5 9 13 16 19 20 4,1 Natural idea: Block tree Problem: I/Os to follow paths to to q1 and q2 But O(T) I/Os may be used to visit other nodes (“overshooting”)  query Lars Arge

Externalizing Priority Search Tree I/O-algorithms Externalizing Priority Search Tree 9 16.20 4 16 5,6 19,9 1 5 13 19 1,2 9,4 13,3 20,3 1 4 5 9 13 16 19 20 4,1 Solution idea: Store B points in each node  O(B2) points stored in each supernode B output points can pay for “overshooting” Bootstrapping: Store O(B2) points in each supernode in static structure Lars Arge

Summary/Conclusion: Priority Search Tree I/O-algorithms Summary/Conclusion: Priority Search Tree We have now discussed structures for special cases of two-dimensional range searching Space: O(N/B) Query: Updates: Cannot be obtained for general (4-sided) 2d range searching: query requires space space requires query q q3 q q1 q2 q3 q2 q1 q4 Lars Arge

I/O-algorithms External Range Tree Base tree: Weight balanced tree with branching parameter and leaf parameter B on x-coordinates  height Points below each node stored in 4 linear space secondary structures: “Right” priority search tree “Left” priority search tree B-tree on y-coordinates Interval (priority search) tree space Lars Arge

External Range Tree Secondary interval tree: I/O-algorithms External Range Tree Secondary interval tree: Connect points in each slab in y-order Project obtained segments in y-axis Intervals stored in interval tree Interval augmented with pointer to corresponding points in y-coordinate B-tree in corresponding child node Lars Arge

I/O-algorithms External Range Tree Query with (q1, q2, q3 , q4) answered in top node with q1 and q2 in different slabs v1 and v2 Points in slab v1 Found with 3-sided query in v1 using right priority search tree Points in slab v2 Found with 3-sided query in v2 using left priority search tree Points in slabs between v1 and v2 Answer stabbing query with q3 using interval tree  first point above q3 in each of the slabs Find points using y-coordinate B-tree in slabs v1 v2 Lars Arge

External Range Tree Query analysis: I/Os to find relevant node I/O-algorithms External Range Tree Query analysis: I/Os to find relevant node I/Os to answer two 3-sided queries I/Os to query interval tree I/Os to traverse B-trees  I/Os v1 v2 Lars Arge

External Range Tree Insert: I/O-algorithms External Range Tree Insert: Insert x-coordinate in weight-balanced B-tree Split of v can be performed in I/Os  I/Os Update secondary structures in all nodes on one root-leaf path Update priority search trees Update interval tree Update B-tree  I/Os Delete: Similar and using global rebuilding v1 v2 Lars Arge

Summary: External Range Tree I/O-algorithms Summary: External Range Tree 2d range searching in space I/O query I/O update Optimal among query structures q4 q3 q1 q2 Lars Arge

I/O-algorithms kdB-tree kd-tree: Recursive subdivision of point-set into two half using vertical/horizontal line Horizontal line on even levels, vertical on uneven levels One point in each leaf  Linear space and logarithmic height Lars Arge

I/O-algorithms kd-Tree: Query Query Recursively visit nodes corresponding to regions intersecting query Report point in trees/nodes completely contained in query Query analysis Horizontal line intersect Q(N) = 2+2Q(N/4) = regions Query covers T regions I/Os worst-case Lars Arge

I/O-algorithms kdB-tree kdB-tree: Stop subdivision when leaf contains between B/2 and B points BFS-blocking of internal nodes Query as before Analysis as before but each region now contains Θ(B) points  I/O query Lars Arge

Construction of kdB-tree I/O-algorithms Construction of kdB-tree Simple algorithm Find median of y-coordinates (construct root) Distribute point based on median Recursively build subtrees Construct BFS-blocking top-down Idea in improved algorithm Construct levels at a time using O(N/B) I/Os Lars Arge

Construction of kdB-tree I/O-algorithms Construction of kdB-tree • Sort N points by x- and by y-coordinates using I/Os • Building levels ( nodes) in O(N/B) I/Os: 1. Construct by grid with points in each slab 2. Count number of points in each grid cell and store in memory 3. Find slab s with median x-coordinate 4. Scan slab s to find median x-coordinate and construct node 5. Split slab containing median x-coordinate and update counts 6. Recurse on each side of median x-coordinate using grid (step 3) Grid grows to during algorithm Each node constructed in I/Os TURN Y-X!! Lars Arge

kdB-tree kdB-tree: Linear space Query in I/Os Construction in I/Os I/O-algorithms kdB-tree kdB-tree: Linear space Query in I/Os Construction in I/Os Point search in I/Os Dynamic? Deletions relatively easily in I/Os (partial rebuilding) Describe point search Discuss dynamic Deletions: Just delete, only problem is to maintain Θ(B) in each leaf (spare you details) Insertion: Problem - even problem in internal memory (ADD DELETE ALGORITHM) Lars Arge

kdB-tree Insertion using Logarithmic Method I/O-algorithms kdB-tree Insertion using Logarithmic Method Partition pointset S into subsets S0, S1, … Slog N, |Si| = 2i or |Si| = 0 Build kdB-tree Di on Si Query: Query each Di  Insert: Find first empty Di and construct Di out of elements in S0,S1, … Si-1 I/Os  per moved point Point moved O(log N) times  I/Os amortized Lars Arge

kdB-tree Insertion and Deletion I/O-algorithms kdB-tree Insertion and Deletion Insert: Use logarithmic method ignoring deletes Delete: Simply delete point p from relevant Di i can be calculated based on # insertions since p was inserted # insertions calculated by storing insertion number of each point in separate B-tree  extra update cost To maintain O(log N) structures Di Perform global rebuild after every Θ(N) updates Lars Arge

Summary: kdB-tree 2d range searching in O(N/B) space Query in I/Os I/O-algorithms Summary: kdB-tree 2d range searching in O(N/B) space Query in I/Os Construction in I/Os Updates in I/Os Optimal query among linear space structures q3 q2 q1 q4 Lars Arge

O-Tree Structure O-tree: B-tree on vertical slabs I/O-algorithms O-Tree Structure O-tree: B-tree on vertical slabs B-tree on horizontal slabs in each vertical slab kdB-tree on points in each leaf Linear space (DETAILS INSTEAD OF Θ) (CONSTRUCTION) Lars Arge

O-Tree Query Perform rangesearch with q1 and q2 in vertical B-tree I/O-algorithms O-Tree Query Perform rangesearch with q1 and q2 in vertical B-tree Query all kdB-trees in leaves of two horizontal B-trees with x-interval intersected but not spanned by query Perform rangesearch with q3 and q4 horizontal B-trees with x-interval spanned by query Query all kdB-trees with range intersected by query Lars Arge

O-Tree Query Analysis Vertical B-tree query: I/O-algorithms O-Tree Query Analysis Vertical B-tree query: Query of all kdB-trees in leaves of two horizontal B-trees: Query horizontal B-trees: Query kdB-trees not completely in query Query in kdB-trees completely contained in query:  I/Os Lars Arge

O-Tree Update Insert: Search in vertical B-tree: I/Os I/O-algorithms O-Tree Update Insert: Search in vertical B-tree: I/Os Search in horizontal B-tree: I/Os Insert in kdB-tree: I/Os Use global rebuilding when structures grow too big/small B-trees not contain elements kdB-trees not contain elements  I/Os Deletes can be handled in I/Os similarly (GLOBAL REBUILD DETAIL? – NOT CORRECT AS IS) Lars Arge

Summary: O-Tree 2d range searching in linear space I/O query I/O-algorithms Summary: O-Tree 2d range searching in linear space I/O query I/O update Optimal among structures using linear space Can be extended to work in d-dimensions with optimal query bound q4 q3 q1 q2 Lars Arge

Summary/Conclusion: 3 and 4-sided Queries I/O-algorithms Summary/Conclusion: 3 and 4-sided Queries 3-sided 2d range searching: External priority search tree query, space, update General (4-sided) 2d range searching: External range tree: query, space, update O-tree: query, space, update q3 q2 q1 q3 q2 q1 q4 Lars Arge

Summary/Conclusion: Tools and Techniques I/O-algorithms Summary/Conclusion: Tools and Techniques Tools: B-trees Persistent B-trees Buffer trees Logarithmic method Weight-balanced B-trees Global rebuilding Techniques: Bootstrapping Filtering (x,x) q3 q2 q1 q3 q2 q1 q4 Lars Arge

References External Memory Geometric Data Structures I/O-algorithms References External Memory Geometric Data Structures Lecture notes by Lars Arge. Section 8-9 Lars Arge