I/O-Algorithms Lars Arge Aarhus University February 27, 2007.



Lars Arge I/O-algorithms 2 I/O-Model
Parameters:
–N = # elements in problem instance
–B = # elements that fit in a disk block
–M = # elements that fit in main memory
–T = output size in searching problem
We often assume that $M > B^2$
I/O: Movement of a block between memory and disk
[Figure: disk D, processor P and memory M connected by block I/O]

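Lars Arge I/O-algorithms 3 Fundamental Bounds
(Internal / External)
–Scanning: $N$ / $N/B$
–Sorting: $N \log N$ / $\frac{N}{B}\log_{M/B}\frac{N}{B}$
–Permuting: $N$ / $\min\{N, \frac{N}{B}\log_{M/B}\frac{N}{B}\}$
–Searching: $\log_2 N$ / $\log_B N$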
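To get a feel for the external bounds, the standard expressions $N/B$ (scanning), $\frac{N}{B}\log_{M/B}\frac{N}{B}$ (sorting) and $\log_B N$ (searching) can be evaluated numerically. The sketch below is only illustrative: the function names are hypothetical, and constant factors hidden by the O-notation are ignored.

```python
import math


def scan_ios(n, b):
    """Blocks touched when scanning n elements with block size b."""
    return math.ceil(n / b)


def sort_ios(n, m, b):
    """Order of magnitude of the sorting bound (N/B) * log_{M/B}(N/B)."""
    blocks = n / b
    # At least one pass over the data is always needed.
    return blocks * max(1.0, math.log(blocks, m / b))


def search_ios(n, b):
    """Order of magnitude of the searching bound log_B N."""
    return max(1.0, math.log(n, b))
```

For example, with N = 10^6, M = 10^5 and B = 10^3, a scan touches 1000 blocks, while a search needs on the order of two I/Os.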

Lars Arge I/O-algorithms 4 Fundamental Data Structures
B-trees: Node degree $\Theta(B)$ ⇒ queries in $O(\log_B N + T/B)$
–Rebalancing using split/fuse ⇒ updates in $O(\log_B N)$
Weight-balanced B-trees: Weight rather than degree constraint ⇒ $\Omega(w(v))$ updates below v between rebalancing operations on v
Persistent B-trees:
–Update in current version in $O(\log_B N)$
–Search in all previous versions in $O(\log_B N + T/B)$
Buffer trees
–Batching of operations to obtain $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ amortized bounds ⇒ construction algorithms

Lars Arge I/O-algorithms 5 Last time: Interval management
Maintain N intervals with unique endpoints dynamically such that a stabbing query with point x can be answered efficiently
Static solution: Persistent B-tree
–Linear space and $O(\log_B N + T/B)$ query
Dynamic solution: External interval tree
–$O(\log_B N)$ update

Lars Arge I/O-algorithms 6 Internal Interval Tree
Base tree on endpoints – “slab” $X_v$ associated with each node v
Interval stored in highest node v where it contains midpoint of $X_v$
Intervals $I_v$ associated with v stored in
–Left slab list sorted by left endpoint (search tree)
–Right slab list sorted by right endpoint (search tree)
⇒ Linear space and O(log N) update

Lars Arge I/O-algorithms 7 Internal Interval Tree
Query with x on left side of midpoint of $X_{root}$
–Search left slab list left-right until finding non-stabbed interval
–Recurse in left child
⇒ O(log N+T) query bound
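The stabbing query can be sketched in a few lines of Python. This is a simplified static version, not the structure from the lecture: the tree is built once from a list of closed intervals, the split point is taken to be the median endpoint, and the slab lists are plain sorted lists rather than search trees. The class and method names are hypothetical.

```python
class IntervalTreeNode:
    def __init__(self, intervals):
        # intervals: non-empty list of closed intervals (lo, hi)
        endpoints = sorted(e for iv in intervals for e in iv)
        self.mid = endpoints[len(endpoints) // 2]
        here = [iv for iv in intervals if iv[0] <= self.mid <= iv[1]]
        left = [iv for iv in intervals if iv[1] < self.mid]
        right = [iv for iv in intervals if iv[0] > self.mid]
        # Left slab list: sorted by left endpoint (increasing).
        self.by_left = sorted(here)
        # Right slab list: sorted by right endpoint (decreasing).
        self.by_right = sorted(here, key=lambda iv: iv[1], reverse=True)
        self.left = IntervalTreeNode(left) if left else None
        self.right = IntervalTreeNode(right) if right else None

    def stab(self, x, out):
        """Append every stored interval containing x to out."""
        if x <= self.mid:
            # Scan the left slab list until the first non-stabbed interval.
            for iv in self.by_left:
                if iv[0] > x:
                    break
                out.append(iv)
            if x < self.mid and self.left is not None:
                self.left.stab(x, out)
        else:
            for iv in self.by_right:
                if iv[1] < x:
                    break
                out.append(iv)
            if self.right is not None:
                self.right.stab(x, out)
```

Each node scans only stabbed intervals plus one extra, which is where the O(log N + T) bound comes from: O(1) overhead per node on one root-leaf path, and the rest charged to the output.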

Lars Arge I/O-algorithms 8 Externalizing Interval Tree
Natural idea:
–Block tree
–Use B-tree for slab lists
Number of stabbed intervals in large slab list may be small (or zero)
–We can be forced to do I/O in each of O(log N) nodes

Lars Arge I/O-algorithms 9 Externalizing Interval Tree
Idea:
–Decrease fan-out to $\Theta(\sqrt{B})$ ⇒ height remains $O(\log_B N)$
–$\Theta(\sqrt{B})$ slabs define $\Theta(B)$ multislabs
–Interval stored in two slab lists (as before) and one multislab list
–Intervals in small multislab lists collected in underflow structure
–Query answered in v by looking at 2 slab lists and not O(log N) multislab

Lars Arge I/O-algorithms 10 External Interval Tree
Linear space, $O(\log_B N + T/B)$ query, $O(\log_B N)$ update
General solution techniques:
–Filtering: Charge part of query cost to output
–Bootstrapping:
*Use $O(B^2)$ size structure in each internal node
*Constructed using persistence
*Dynamic using global rebuilding
–Weight-balanced B-tree: Split/fuse in amortized O(1)

Lars Arge I/O-algorithms 11 Three-Sided Range Queries
Interval management: “1.5 dimensional” search
More general 2d problem: Dynamic 3-sided range searching
–Maintain set of points in plane such that given query $(q_1, q_2, q_3)$, all points (x,y) with $q_1 \le x \le q_2$ and $y \ge q_3$ can be found efficiently
[Figure: points (x,x) and $(x_1,x_2)$; 3-sided query bounded by $q_1$, $q_2$, $q_3$]

Lars Arge I/O-algorithms 12 Three-Sided Range Queries
Report all points (x,y) with $q_1 \le x \le q_2$ and $y \ge q_3$
Static solution:
–Sweep top-down inserting x in persistent B-tree at (x,y)
–Answer query by performing range query with $[q_1,q_2]$ in B-tree at $q_3$
Optimal:
–O(N/B) space
–$O(\log_B N + T/B)$ query
–$O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ construction
Dynamic? … in internal memory: priority search tree

Lars Arge I/O-algorithms 13 Internal Priority Search Tree
Base tree on x-coordinates with nodes augmented with points
Heap on y-coordinates
–Decreasing y values on root-leaf path
–(x,y) on path from root to leaf holding x
–If v holds point then parent(v) holds point
[Figure: example priority search tree]

Lars Arge I/O-algorithms 14 Internal Priority Search Tree
Linear space
Insert of (x,y) (assuming fixed x-coordinate set):
–Compare y with y-coordinate in root
–Smaller: Recursively insert (x,y) in subtree on path to x
–Bigger: Insert in root and recursively insert old point in subtree
⇒ O(log N) update
[Figure: example priority search tree, insert (10,21)]

Lars Arge I/O-algorithms 15 Internal Priority Search Tree
Query with $(q_1, q_2, q_3)$ starting at root v:
–Report point in v if satisfying query
–Visit both children of v if point reported
–Always visit child(ren) of v on path(s) to $q_1$ and $q_2$
⇒ O(log N+T) query
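The insert and query rules for the internal priority search tree can be put together in a small Python sketch. It assumes a fixed, sorted set of distinct x-coordinates known up front, and that each point is inserted at most once; the class name `PSTNode` and the pruning style of the query are choices made here for brevity, not taken from the lecture.

```python
class PSTNode:
    def __init__(self, xs):
        # xs: sorted, non-empty list of the distinct x-coordinates
        # that may appear in this subtree (the fixed base tree).
        self.lo, self.hi = xs[0], xs[-1]
        self.point = None
        self.left = self.right = None
        if len(xs) > 1:
            mid = len(xs) // 2
            self.left, self.right = PSTNode(xs[:mid]), PSTNode(xs[mid:])

    def insert(self, p):
        """Insert point p = (x, y); x must be one of the base x-coordinates."""
        if self.point is None:
            self.point = p
            return
        if p[1] > self.point[1]:
            # New point is higher: it takes this node, old point moves down.
            p, self.point = self.point, p
        child = self.left if p[0] <= self.left.hi else self.right
        child.insert(p)

    def query(self, q1, q2, q3, out):
        """Append all points with q1 <= x <= q2 and y >= q3 to out."""
        if self.point is None or self.point[1] < q3:
            return  # heap order: nothing below can have y >= q3
        if self.hi < q1 or self.lo > q2:
            return  # slab of this node is disjoint from [q1, q2]
        if q1 <= self.point[0] <= q2:
            out.append(self.point)
        for child in (self.left, self.right):
            if child is not None:
                child.query(q1, q2, q3, out)
```

A node is explored further only if its own point passes the y-test, so every visited node off the two search paths is paid for by a reported point, matching the O(log N + T) argument.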

Lars Arge I/O-algorithms 16 Externalizing Priority Search Tree
Natural idea: Block tree
Problem:
–$O(\log_B N)$ I/Os to follow paths to $q_1$ and $q_2$
–But O(T) I/Os may be used to visit other nodes (“overshooting”)
⇒ $O(\log_B N + T)$ query

Lars Arge I/O-algorithms 17 Externalizing Priority Search Tree
Solution idea:
–Store B points in each node ⇒
*$O(B^2)$ points stored in each supernode
*B output points can pay for “overshooting”
–Bootstrapping:
*Store $O(B^2)$ points in each supernode in static structure

Lars Arge I/O-algorithms 18 External Priority Search Tree
Base tree: Weight-balanced B-tree with branching parameter B/4 and leaf parameter B on x-coordinates
Points in “heap order”:
–Root stores B top points for each of the child slabs
–Remaining points stored recursively
Points in each node stored in “$B^2$-structure”
–Persistent B-tree structure for static problem
⇒ Linear space

Lars Arge I/O-algorithms 19 External Priority Search Tree
Query with $(q_1, q_2, q_3)$ starting at root v:
–Query $B^2$-structure and report points satisfying query
–Visit child u of v if
*u is on path to $q_1$ or $q_2$
*All points corresponding to u satisfy query

Lars Arge I/O-algorithms 20 External Priority Search Tree
Analysis (with $T_v$ the number of points reported in v):
–$O(1 + T_v/B)$ I/Os used to visit node v
–$O(\log_B N)$ nodes on path to $q_1$ or $q_2$
–For each node v not on path to $q_1$ or $q_2$ visited, B points reported in parent(v)
⇒ $O(\log_B N + T/B)$ query

Lars Arge I/O-algorithms 21 External Priority Search Tree
Insert (x,y) (ignoring insert in base tree - rebalancing):
–Find relevant node v:
*Query $B^2$-structure to find B points in root corresponding to node u on path to x
*If y smaller than y-coordinates of all B points then recursively search in u
–Insert (x,y) in $B^2$-structure of v
–If $B^2$-structure contains >B points for child u, remove lowest point and insert recursively in u
Delete: Similarly

Lars Arge I/O-algorithms 22 External Priority Search Tree
Analysis:
–Update visits $O(\log_B N)$ nodes
–$B^2$-structure queried/updated in each node
*One query
*One insert and one delete
$B^2$-structure analysis:
–Query: $O(\log_B B^2 + T_v/B) = O(1 + T_v/B)$
–Update: O(1) using global rebuilding
*Store updates in update block
*Rebuild after B updates using $O(\frac{B^2}{B}\log_{M/B}\frac{B^2}{B}) = O(B)$ I/Os
⇒ $O(\log_B N)$ I/O updates

Lars Arge I/O-algorithms 23 Dynamic Base Tree
Deletion:
–Delete point as previously
–Delete x-coordinate from base tree using global rebuilding
⇒ $O(\log_B N)$ I/Os amortized
Insertion:
–Insert x-coordinate in base tree and rebalance (using splits)
–Insert point as previously
Split: Boundary in v becomes boundary in parent(v)
[Figure: node v split into v’ and v’’]

Lars Arge I/O-algorithms 24 Dynamic Base Tree
Split: When v splits B new points needed in parent(v)
One point obtained from v’ (v’’) using “bubble-up” operation:
–Find top point p in v’
–Insert p in $B^2$-structure
–Remove p from $B^2$-structure of v’
–Recursively bubble-up point to v’
Bubble-up in $O(\log_B w(v))$ I/Os
–Follow one path from v to leaf
–Uses O(1) I/O in each node
⇒ Split in $O(B \log_B w(v)) = O(w(v))$ I/Os

Lars Arge I/O-algorithms 25 Dynamic Base Tree
O(1) amortized split cost:
–Cost: O(w(v))
–Weight balanced base tree: $\Omega(w(v))$ inserts below v between splits
⇒ External Priority Search Tree
–Space: O(N/B)
–Query: $O(\log_B N + T/B)$
–Updates: $O(\log_B N)$ I/Os amortized
Amortization can be removed from update bound in several ways
–Utilizing lazy rebuilding
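The amortization argument can be written out as a short calculation. The constants $c_1$ and $c_2$ are introduced here only for illustration; they stand for the constants hidden in the O- and Ω-bounds.

```latex
\begin{align*}
\text{cost of one split of } v &\le c_1\, w(v), \\
\text{inserts below } v \text{ between two splits of } v &\ge c_2\, w(v), \\
\text{split cost charged to each such insert} &\le \frac{c_1\, w(v)}{c_2\, w(v)} = O(1), \\
\text{total charge over the } O(\log_B N) \text{ nodes on the insert path} &= O(\log_B N).
\end{align*}
```

Each insert is charged once per ancestor, and the number of ancestors is bounded by the height of the weight-balanced B-tree.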

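Lars Arge I/O-algorithms 26 Summary/Conclusion: Priority Search Tree
We have now discussed structures for special cases of two-dimensional range searching
–Space: O(N/B)
–Query: $O(\log_B N + T/B)$
–Updates: $O(\log_B N)$
Cannot be obtained for general (4-sided) 2d range searching:
–$O(\log_B^c N)$ query requires $\Omega(\frac{N}{B}\frac{\log_B N}{\log_B \log_B N})$ space
–$O(N/B)$ space requires $\Omega(\sqrt{N/B})$ query
[Figure: 3-sided query $(q_1, q_2, q_3)$ and 4-sided query $(q_1, q_2, q_3, q_4)$]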

Lars Arge I/O-algorithms 27 External Range Tree
Base tree: Weight balanced tree with branching parameter $\log_B N$ and leaf parameter B on x-coordinates ⇒ height $O(\log_B N / \log_B \log_B N)$
Points below each node stored in 4 linear space secondary structures:
–“Right” priority search tree
–“Left” priority search tree
–B-tree on y-coordinates
–Interval (priority search) tree
⇒ $O(\frac{N}{B}\frac{\log_B N}{\log_B \log_B N})$ space

Lars Arge I/O-algorithms 28 External Range Tree
Secondary interval tree:
–Connect points in each slab in y-order
–Project obtained segments onto the y-axis
–Intervals stored in priority search tree
*Interval augmented with pointer to corresponding points in y-coordinate B-tree in corresponding child node

Lars Arge I/O-algorithms 29 External Range Tree
Query with $(q_1, q_2, q_3, q_4)$ answered in top node with $q_1$ and $q_2$ in different slabs $v_1$ and $v_2$
Points in slab $v_1$
–Found with 3-sided query in $v_1$ using right priority search tree
Points in slab $v_2$
–Found with 3-sided query in $v_2$ using left priority search tree
Points in slabs between $v_1$ and $v_2$
–Answer stabbing query with $q_3$ using interval tree ⇒ first point above $q_3$ in each of the $O(\log_B N)$ slabs
–Find points using y-coordinate B-tree in $O(\log_B N)$ slabs

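Lars Arge I/O-algorithms 30 External Range Tree
Query analysis:
–$O(\log_B N)$ I/Os to find relevant node
–$O(\log_B N + T/B)$ I/Os to answer two 3-sided queries
–$O(\log_B N + \frac{\log_B N}{B}) = O(\log_B N)$ I/Os to query interval tree
–$O(\log_B N + T/B)$ I/Os to traverse B-trees
⇒ $O(\log_B N + T/B)$ I/Os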

Lars Arge I/O-algorithms 31 External Range Tree
Insert:
–Insert x-coordinate in weight-balanced B-tree
*Split of v can be performed in $O(w(v)\log_B w(v))$ I/Os ⇒ $O(\frac{\log_B^2 N}{\log_B \log_B N})$ I/Os
–Update secondary structures in all nodes on one root-leaf path
*Update priority search trees
*Update interval tree
*Update B-tree
⇒ $O(\frac{\log_B^2 N}{\log_B \log_B N})$ I/Os
Delete:
–Similar and using global rebuilding

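Lars Arge I/O-algorithms 32 Summary: External Range Tree
2d range searching in $O(\frac{N}{B}\frac{\log_B N}{\log_B \log_B N})$ space
–$O(\log_B N + T/B)$ I/O query
–$O(\frac{\log_B^2 N}{\log_B \log_B N})$ I/O update
Optimal among $O(\log_B N + T/B)$ query structures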

Lars Arge I/O-algorithms 33 kdB-tree
kd-tree:
–Recursive subdivision of point-set into two halves using vertical/horizontal line
–Horizontal line on even levels, vertical on odd levels
–One point in each leaf
⇒ Linear space and logarithmic height

Lars Arge I/O-algorithms 34 kd-Tree: Query
Query
–Recursively visit nodes corresponding to regions intersecting query
–Report points in subtrees/nodes completely contained in query
Query analysis
–Horizontal line intersects $Q(N) = 2 + 2Q(N/4) = O(\sqrt{N})$ regions
–Query covers T regions
⇒ $O(\sqrt{N} + T)$ I/Os worst-case
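A minimal in-memory kd-tree with a range query might look as follows. This is a sketch under simplifying assumptions: nodes are plain tuples, the split value is the median coordinate, and the query descends to individual leaves instead of reporting whole subtrees whose region is contained in the query. The function names are hypothetical.

```python
def build_kd(points, depth=0):
    """Build a kd-tree over a non-empty list of (x, y) points."""
    if len(points) == 1:
        return ("leaf", points[0])
    # Horizontal line (split on y) on even levels, vertical (split on x) on odd.
    axis = 1 if depth % 2 == 0 else 0
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    split = pts[mid][axis]
    left = build_kd(pts[:mid], depth + 1)    # coordinates <= split
    right = build_kd(pts[mid:], depth + 1)   # coordinates >= split
    return ("node", axis, split, left, right)


def range_query(node, x1, x2, y1, y2, out):
    """Append all points with x1 <= x <= x2 and y1 <= y <= y2 to out."""
    if node[0] == "leaf":
        x, y = node[1]
        if x1 <= x <= x2 and y1 <= y <= y2:
            out.append(node[1])
        return
    _, axis, split, left, right = node
    lo, hi = (x1, x2) if axis == 0 else (y1, y2)
    # Visit only the children whose region can intersect the query.
    if lo <= split:
        range_query(left, x1, x2, y1, y2, out)
    if hi >= split:
        range_query(right, x1, x2, y1, y2, out)
```

For example, building the tree on six points and querying the rectangle [2,7] x [2,5] visits only the subtrees whose regions the rectangle touches.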

Lars Arge I/O-algorithms 35 kdB-tree
kdB-tree:
–Stop subdivision when leaf contains between B/2 and B points
–BFS-blocking of internal nodes
Query as before
–Analysis as before but each region now contains Θ(B) points
⇒ $O(\sqrt{N/B} + T/B)$ I/O query
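The recurrence for the number of regions crossed by a horizontal line can be unrolled explicitly. The base case $Q(B) = O(1)$ is the assumption that a leaf holding Θ(B) points counts as a single region; for the plain kd-tree the same steps with base case $Q(1) = O(1)$ give $O(\sqrt{N})$.

```latex
\begin{align*}
Q(N) &= 2 + 2\,Q(N/4), \qquad Q(B) = O(1) \\
     &= 2\sum_{i=0}^{k-1} 2^{i} + 2^{k}\,Q\!\left(N/4^{k}\right) \\
     &= 2\,(2^{k} - 1) + 2^{k}\,O(1), \qquad k = \log_4 (N/B) \\
     &= O\!\left(2^{\log_4 (N/B)}\right) = O\!\left(\sqrt{N/B}\right).
\end{align*}
```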

Lars Arge I/O-algorithms 36 Construction of kdB-tree
Simple algorithm
–Find median of y-coordinates (construct root)
–Distribute points based on median
–Recursively build subtrees
–Construct BFS-blocking top-down
Idea in improved algorithm
–Construct $\log\sqrt{M/B}$ levels at a time using O(N/B) I/Os

Lars Arge I/O-algorithms 37 Construction of kdB-tree
Sort N points by x- and by y-coordinates using $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
Building $\log\sqrt{M/B}$ levels ($\sqrt{M/B}$ nodes) in O(N/B) I/Os:
1. Construct $\sqrt{M/B}$ by $\sqrt{M/B}$ grid with $N/\sqrt{M/B}$ points in each slab
2. Count number of points in each grid cell and store in memory
3. Find slab s with median x-coordinate
4. Scan slab s to find median x-coordinate and construct node
5. Split slab containing median x-coordinate and update counts
6. Recurse on each side of median x-coordinate using grid (step 3)
⇒ Grid grows to $\Theta(M/B)$ cells during algorithm
⇒ Each node constructed in $O(N/(\sqrt{M/B}\cdot B))$ I/Os

Lars Arge I/O-algorithms 38 kdB-tree
kdB-tree:
–Linear space
–Query in $O(\sqrt{N/B} + T/B)$ I/Os
–Construction in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
–Point search in $O(\log_B N)$ I/Os
Dynamic?
–Deletions relatively easy using partial rebuilding

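Lars Arge I/O-algorithms 39 kdB-tree Insertion using Logarithmic Method
Partition pointset S into subsets $S_0, S_1, \ldots, S_{\log N}$, $|S_i| = 2^i$ or $|S_i| = 0$
Build kdB-tree $D_i$ on $S_i$
Query: Query each $D_i$ ⇒ $\sum_i O(\sqrt{2^i/B} + T_i/B) = O(\sqrt{N/B} + T/B)$ I/Os
Insert: Find first empty $D_i$ and construct $D_i$ out of the new point and the elements in $S_0, S_1, \ldots, S_{i-1}$
–$O(\frac{2^i}{B}\log_{M/B}\frac{2^i}{B})$ I/Os ⇒ $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ per moved point
–Point moved O(log N) times
⇒ $O(\frac{1}{B}\log_{M/B}\frac{N}{B}\log N) = O(\log_B^2 N)$ I/Os amortized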
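The logarithmic method itself is independent of the static structure it wraps. The sketch below illustrates it with a sorted Python list standing in for each kdB-tree $D_i$ and a one-dimensional range query standing in for the 2d query; both substitutions, as well as the class name `LogMethod`, are assumptions made to keep the example short.

```python
import bisect


class LogMethod:
    def __init__(self):
        # levels[i] is either empty or a sorted list of exactly 2**i items,
        # playing the role of the static structure D_i.
        self.levels = []

    def insert(self, item):
        carry = [item]
        for i, level in enumerate(self.levels):
            if not level:
                # First empty level: rebuild D_i from the new item and
                # everything in the lower levels (1 + 2^i - 1 = 2^i items).
                self.levels[i] = sorted(carry)
                return
            carry += level
            self.levels[i] = []
        self.levels.append(sorted(carry))

    def range_query(self, lo, hi):
        """Report all items in [lo, hi] by querying every level."""
        out = []
        for level in self.levels:
            a = bisect.bisect_left(level, lo)
            b = bisect.bisect_right(level, hi)
            out.extend(level[a:b])
        return out
```

After 11 insertions the level sizes are 1, 2, 0 and 8, the binary representation of 11 read from the least significant bit, which is why each item is rebuilt into a larger level at most O(log N) times.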

Lars Arge I/O-algorithms 40 kdB-tree Insertion and Deletion
Insert: Use logarithmic method ignoring deletes
Delete: Simply delete point p from relevant $D_i$
–i can be calculated based on # insertions since p was inserted
–# insertions calculated by storing insertion number of each point in separate B-tree
⇒ $O(\log_B N)$ extra update cost
To maintain O(log N) structures $D_i$
–Perform global rebuild after every Θ(N) updates ⇒ extra update cost

Lars Arge I/O-algorithms 41 Summary: kdB-tree
2d range searching in O(N/B) space
–Query in $O(\sqrt{N/B} + T/B)$ I/Os
–Construction in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
–Updates in $O(\log_B^2 N)$ I/Os
Optimal query among linear space structures

Lars Arge I/O-algorithms 42 O-Tree Structure
O-tree:
–B-tree on vertical slabs
–B-tree on horizontal slabs in each vertical slab
–kdB-tree on points in each leaf

Lars Arge I/O-algorithms 43 O-Tree Query
Perform rangesearch with $q_1$ and $q_2$ in vertical B-tree
–Query all kdB-trees in leaves of two horizontal B-trees with x-interval intersected but not spanned by query
–Perform rangesearch with $q_3$ and $q_4$ in horizontal B-trees with x-interval spanned by query
*Query all kdB-trees with range intersected by query

Lars Arge I/O-algorithms 44 O-Tree Query Analysis
Cost components:
–Vertical B-tree query
–Query of all kdB-trees in leaves of two horizontal B-trees
–Query horizontal B-trees
–Query kdB-trees not completely in query
–Query in kdB-trees completely contained in query
⇒ $O(\sqrt{N/B} + T/B)$ I/Os

Lars Arge I/O-algorithms 45 O-Tree Update
Insert:
–Search in vertical B-tree
–Search in horizontal B-tree
–Insert in kdB-tree
Use global rebuilding when structures grow too big/small
–B-trees no longer contain the required number of elements
–kdB-trees no longer contain the required number of elements
⇒ $O(\log_B N)$ I/Os
Deletes can be handled in $O(\log_B N)$ I/Os similarly

Lars Arge I/O-algorithms 46 Summary: O-Tree
2d range searching in linear space
–$O(\sqrt{N/B} + T/B)$ I/O query
–$O(\log_B N)$ I/O update
Optimal among structures using linear space
Can be extended to work in d-dimensions with optimal query bound $O((N/B)^{1-1/d} + T/B)$

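Lars Arge I/O-algorithms 47 Summary/Conclusion: 3 and 4-sided Queries
3-sided 2d range searching: External priority search tree
–$O(\log_B N + T/B)$ query, $O(N/B)$ space, $O(\log_B N)$ update
General (4-sided) 2d range searching:
–External range tree: $O(\log_B N + T/B)$ query, $O(\frac{N}{B}\frac{\log_B N}{\log_B \log_B N})$ space, $O(\frac{\log_B^2 N}{\log_B \log_B N})$ update
–O-tree: $O(\sqrt{N/B} + T/B)$ query, $O(N/B)$ space, $O(\log_B N)$ update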

Lars Arge I/O-algorithms 48 Summary/Conclusion: Tools and Techniques
Tools:
–B-trees
–Persistent B-trees
–Buffer trees
–Logarithmic method
–Weight-balanced B-trees
–Global rebuilding
Techniques:
–Bootstrapping
–Filtering

Lars Arge I/O-algorithms 49 References
External Memory Geometric Data Structures. Lecture notes by Lars Arge.
–Sections 7-9