B-trees and kd-trees Piotr Indyk (slides partially by Lars Arge from Duke U)


Lars Arge External memory data structures

Before we start
If you are considering taking this class (or attending just a few lectures), send me an email
Web page up and running:
Reading list updated – if you want to present, send me an email

Today
1D data structure for searching in external memory
– $O(\log_2 N)$ I/Os using standard data structures
– Will show how to reduce it to $O(\log_B N)$
We already know how to sort using $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
Therefore, we will move on to 2D
We will start from a main memory data structure for the range search problem:
– Input: a set of points in 2D
– Goal: a data structure which, given a query rectangle, reports all points within the rectangle
Then we will continue with approximate nearest neighbor (also in main memory)

Searching in External Memory
Dictionary (or successor) data structure for 1D data:
– Maintains elements (e.g., numbers) under insertions and deletions
– Given a key K, reports the successor of K, i.e., the smallest element that is greater than or equal to K
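To make the successor semantics concrete, here is a minimal in-memory sketch. It uses a sorted Python list rather than an external-memory structure, and the class name `SortedDictionary` and its method names are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left


class SortedDictionary:
    """In-memory stand-in for the 1D dictionary (illustrative only)."""

    def __init__(self):
        self._keys = []  # kept in sorted order, no duplicates

    def insert(self, key):
        i = bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def delete(self, key):
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._keys.pop(i)

    def successor(self, key):
        """Smallest stored element >= key, or None if there is none."""
        i = bisect_left(self._keys, key)
        return self._keys[i] if i < len(self._keys) else None
```

Note that a key that is itself stored is its own successor, matching the "greater than or equal" wording above.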

Internal Search Trees
Binary search tree:
– Standard method for search among N elements
– We assume elements in leaves
– Search traces at least one root-leaf path
Search in $O(\log_2 N)$ time

Model
Model as previously
– N: Elements in structure
– B: Elements per block
– M: Elements in main memory
– T: Output size in searching problems
[Figure: I/O model with labels D, P, M, and Block I/O]

(a,b)-tree (or B-tree)
T is an (a,b)-tree (a ≥ 2 and b ≥ 2a-1):
– All leaves on the same level (contain between a and b elements)
– Except for the root, all nodes have degree between a and b
– Root has degree between 2 and b
(a,b)-tree uses linear space and has height $O(\log_a N)$
Choosing a,b = $\Theta(B)$, each node/leaf is stored in one disk block, giving $O(N/B)$ space and $O(\log_B N)$ query
[Figure: example tree]
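A rough way to see what a large fanout buys is to count the levels a search must descend, assuming one block read per level. The helper name `search_levels` and the sample values of N and B below are illustrative assumptions, not figures from the slides.

```python
def search_levels(n, fanout):
    """Levels needed so that a tree with the given fanout can reach n leaves.

    Uses integer arithmetic only, so there are no floating-point log issues.
    """
    levels, reach = 0, 1
    while reach < n:
        reach *= fanout
        levels += 1
    return levels


# Illustrative numbers: one billion elements, fanout 2 versus fanout 1000.
binary_levels = search_levels(10**9, 2)
btree_levels = search_levels(10**9, 1000)
```

With these sample values the binary tree needs 30 levels and the fanout-1000 tree needs 3, which is the gap between $\log_2 N$ and $\log_B N$.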

(a,b)-Tree Insert
Insert:
Search and insert element in leaf v
While v has b+1 elements:
– Split v: make nodes v’ and v’’ with $\lceil (b+1)/2 \rceil \le b$ and $\lfloor (b+1)/2 \rfloor \ge a$ elements
– Insert element (ref) in parent(v) (make new root if necessary)
– v=parent(v)
Insert touches $O(\log_a N)$ nodes
[Figure: node v split into v’ and v’’]
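The insertion procedure can be sketched in memory as follows. This is a simplified illustration, not the slides' implementation: elements live in the leaves, internal nodes hold separator keys, and the names `Node`, `insert`, and `leaves` are hypothetical. Disk blocks are not modeled.

```python
from bisect import bisect_right, insort


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # leaf: stored elements; internal: separators
        self.children = children  # None for a leaf


def _insert(node, key, b):
    """Insert key below node; return (separator, new_sibling) if node split."""
    if node.children is None:
        insort(node.keys, key)
        if len(node.keys) <= b:
            return None
        mid = (len(node.keys) + 1) // 2      # ceil((b+1)/2) elements stay in v'
        right = Node(node.keys[mid:])        # floor((b+1)/2) elements go to v''
        node.keys = node.keys[:mid]
        return right.keys[0], right
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key, b)
    if split is None:
        return None
    sep, new_child = split
    node.keys.insert(i, sep)                 # insert the reference in parent(v)
    node.children.insert(i + 1, new_child)
    if len(node.children) <= b:
        return None
    mid = (len(node.children) + 1) // 2      # split an internal node with b+1 children
    right = Node(node.keys[mid:], node.children[mid:])
    up = node.keys[mid - 1]                  # separator that moves up one level
    node.keys = node.keys[:mid - 1]
    node.children = node.children[:mid]
    return up, right


def insert(root, key, b):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, b)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])        # make a new root


def leaves(node, depth=0):
    """Yield (depth, elements) for every leaf, left to right."""
    if node.children is None:
        yield depth, node.keys
    else:
        for child in node.children:
            yield from leaves(child, depth + 1)
```

Starting from `root = Node([])` and calling `root = insert(root, key, b)` repeatedly keeps all leaves on one level, because the tree only grows in height when the root itself splits.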

(a,b)-Tree Delete
Delete:
Search and delete element from leaf v
While v has a-1 children:
– Fuse v with sibling v’: move children of v’ to v
– Delete element (ref) from parent(v) (delete root if necessary)
– If v has >b (and ≤ a+b-1) children, split v
– v=parent(v)
Delete touches $O(\log_a N)$ nodes

Range Searching in 2D
Recall the definition: given a set of n points, build a data structure that, for any query rectangle R, reports all points in R
Updates are also possible, but:
– Fairly complex in theory
– Straightforward approach works well in practice

Kd-trees
Not the most efficient solution in theory
Everyone uses it in practice
Algorithm:
– Choose x or y coordinate (alternate)
– Choose the median of the coordinate; this defines a horizontal or vertical line
– Recurse on both sides
We get a binary tree:
– Size: O(N)
– Depth: O(log N)
– Construction time: O(N log N)
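The construction can be sketched as below, with points stored in the leaves. The names `KdNode`, `build_kdtree`, and `height` are hypothetical. For brevity this sketch re-sorts at every level, which costs O(N log² N) rather than the O(N log N) stated above; reaching the stated bound would need presorted lists or linear-time median selection.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass
class KdNode:
    point: Optional[Point] = None   # set only at leaves
    axis: int = 0                   # 0: split on x, 1: split on y
    split: float = 0.0              # coordinate of the splitting line
    left: Optional["KdNode"] = None
    right: Optional["KdNode"] = None


def build_kdtree(points, depth=0):
    """Build a kd-tree over a non-empty list of 2D points."""
    if not points:
        raise ValueError("need at least one point")
    if len(points) == 1:
        return KdNode(point=points[0])
    axis = depth % 2                               # alternate x and y
    pts = sorted(points, key=lambda p: p[axis])    # simplification: sort per level
    mid = len(pts) // 2                            # median position
    return KdNode(axis=axis, split=pts[mid][axis],
                  left=build_kdtree(pts[:mid], depth + 1),
                  right=build_kdtree(pts[mid:], depth + 1))


def height(node):
    """Number of edges on the longest root-leaf path."""
    if node.point is not None:
        return 0
    return 1 + max(height(node.left), height(node.right))
```

Because each split halves the point set, the height stays logarithmic in the number of points.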

Kd-tree: Example
Each tree node v corresponds to a region Reg(v).

Kd-tree: Range Queries
Recursive procedure, starting from v=root
Search(v,R):
– If v is a leaf, then report the point stored in v if it lies in R
– Otherwise, if Reg(v) is contained in R, report all points in the subtree of v (*)
– Otherwise:
  * If Reg(left(v)) intersects R, then Search(left(v),R)
  * If Reg(right(v)) intersects R, then Search(right(v),R)
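The Search procedure can be sketched as follows. The sketch is self-contained, so it includes a small tree builder; the tuple layout and the function names are assumptions made for illustration. Regions and query rectangles are written as `(xmin, ymin, xmax, ymax)` and treated as closed.

```python
import math


def build(points, depth=0):
    """Leaf: ("leaf", point). Internal: ("node", axis, split, left, right)."""
    if len(points) == 1:
        return ("leaf", points[0])
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return ("node", axis, pts[mid][axis],
            build(pts[:mid], depth + 1), build(pts[mid:], depth + 1))


def report_all(v, out):
    """Step (*): report every point stored in the subtree of v."""
    if v[0] == "leaf":
        out.append(v[1])
    else:
        report_all(v[3], out)
        report_all(v[4], out)


def contained(reg, R):
    return R[0] <= reg[0] and R[1] <= reg[1] and reg[2] <= R[2] and reg[3] <= R[3]


def intersects(reg, R):
    return reg[0] <= R[2] and R[0] <= reg[2] and reg[1] <= R[3] and R[1] <= reg[3]


def search(v, R, out, reg=(-math.inf, -math.inf, math.inf, math.inf)):
    """Append to out all points of the subtree of v that lie in R."""
    if v[0] == "leaf":
        x, y = v[1]
        if R[0] <= x <= R[2] and R[1] <= y <= R[3]:
            out.append(v[1])
    elif contained(reg, R):
        report_all(v, out)
    else:
        _, axis, split, left, right = v
        left_reg = list(reg)
        left_reg[axis + 2] = split      # left child: coordinate <= split
        right_reg = list(reg)
        right_reg[axis] = split         # right child: coordinate >= split
        if intersects(left_reg, R):
            search(left, R, out, left_reg)
        if intersects(right_reg, R):
            search(right, R, out, right_reg)
    return out
```

Reg(v) is not stored in the nodes here. It is recomputed on the way down by clipping the parent's region at the splitting line, which keeps the tree itself at O(N) size.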

Query Time Analysis
We will show that Search takes at most $O(\sqrt{n}+k)$ time, where k is the number of reported points
– The total time needed to report all points in all subtrees (i.e., taken by step (*)) is O(k)
– We just need to bound the number of nodes v such that Reg(v) intersects R but is not contained in R. In other words, the boundary of R intersects Reg(v)
– We will make a gross overestimate: we will bound the number of regions Reg(v) that cross any horizontal or vertical line

Query Time Continued
What is the maximum number Q(n) of regions in an n-point kd-tree intersecting a (say, vertical) line?
– If we split on x, Q(n)=1+Q(n/2)
– If we split on y, Q(n)=2*Q(n/2)+2
– Since we alternate, we can write Q(n)=3+2Q(n/4)
This solves to $O(\sqrt{n})$
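A quick numerical check of the recurrence, assuming the base case Q(1)=1 (the slides do not state one). Unrolling Q(n)=3+2Q(n/4) for n=4^k gives Q(n)=3(2^k-1)+2^k=4*sqrt(n)-3, since 2^k=sqrt(4^k). The function name `crossings` is hypothetical.

```python
def crossings(n):
    """Evaluate Q(n) = 3 + 2*Q(n/4) with the assumed base case Q(1) = 1."""
    if n <= 1:
        return 1
    return 3 + 2 * crossings(n // 4)


# For n = 4**k the closed form is 4 * 2**k - 3, i.e. 4*sqrt(n) - 3.
closed_form_holds = all(crossings(4**k) == 4 * 2**k - 3 for k in range(12))
```

The number of levels in the recursion is log_4 n and the work doubles at each level, which is where the sqrt(n) comes from.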

Approximate Nearest Neighbor (ANN)
Definition:
– Given: a set of points P in 2D
– Goal: given a query point q and eps>0, find a point p’ whose distance to q is at most (1+eps) times the distance from q to its nearest neighbor
We will “solve” the problem using kd-trees…
…under the assumption that all leaf cells of the kd-tree for P have bounded aspect ratio
The assumption is somewhat strict, but satisfied in practice for most of the leaf cells
We will show:
– O(log n / eps^2) query time
– O(n) space (inherited from kd-tree)

ANN Query Procedure
Locate the leaf cell containing q
Enumerate all leaf cells C in increasing order of distance from q (denote the distance by r)
– Let p(C) be the point in C
– Update p’ so that it is the closest point seen so far
Stop when dist(q,p’)<(1+eps)*r
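One possible way to realize this procedure is a best-first traversal with a priority queue keyed by the distance from q to each cell. That choice is an assumption of this sketch, not something the slides specify. The names `build`, `box_dist`, and `ann_query` are hypothetical, and the sketch does not enforce the bounded aspect ratio assumption. That assumption matters only for the running time, not for the (1+eps) guarantee.

```python
import heapq
import math

INF = math.inf


def build(points, depth=0, box=(-INF, -INF, INF, INF)):
    """Return (box, point, left, right); point is None at internal nodes."""
    if len(points) == 1:
        return (box, points[0], None, None)
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    split = pts[mid][axis]
    lbox, rbox = list(box), list(box)
    lbox[axis + 2] = split          # left cell: coordinate <= split
    rbox[axis] = split              # right cell: coordinate >= split
    return (box, None,
            build(pts[:mid], depth + 1, tuple(lbox)),
            build(pts[mid:], depth + 1, tuple(rbox)))


def box_dist(q, box):
    """Euclidean distance from q to the cell (xmin, ymin, xmax, ymax)."""
    dx = max(box[0] - q[0], 0.0, q[0] - box[2])
    dy = max(box[1] - q[1], 0.0, q[1] - box[3])
    return math.hypot(dx, dy)


def ann_query(root, q, eps):
    """Return a point whose distance to q is within (1+eps) of the nearest."""
    best, best_d = None, INF
    heap = [(0.0, 0, root)]         # entries: (distance r to cell, tie-breaker, node)
    tie = 1
    while heap:
        r, _, node = heapq.heappop(heap)
        if best_d < (1 + eps) * r:  # stopping rule from the slide
            break
        _, point, left, right = node
        if point is not None:       # leaf cell: update p'
            d = math.dist(q, point)
            if d < best_d:
                best, best_d = point, d
        else:
            for child in (left, right):
                heapq.heappush(heap, (box_dist(q, child[0]), tie, child))
                tie += 1
    return best
```

A child cell is never closer to q than its parent, so leaf cells come off the queue in nondecreasing order of distance, as the procedure requires. With eps=0 the stopping rule only fires once every unvisited cell is farther than the best point, so the result is an exact nearest neighbor.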

Analysis
Correctness:
– We have touched all cells within distance r from q. Thus, if there is a point within distance r from q, we have already found it
– If there is no such point, then p’ provides a (1+eps)-approximate solution
Running time:
– All cells C seen so far (except maybe for the last one) have diameter > eps*r
– …because if not, then p(C) would have been a (1+eps)-approximate nearest neighbor, and we would have stopped
– The number of cells with diameter > eps*r, bounded aspect ratio, and touching a ball of radius r is at most O(1/eps^2)

References
B-trees: “Introduction to Algorithms”, Cormen, Leiserson, Rivest, Stein, 2nd edition.
Kd-trees: “Computational Geometry”, M. de Berg, M. van Kreveld, M. Overmars, O. Schwarzkopf. Chapter 5
Approximate Nearest Neighbor (general algorithm without the bounded ratio assumption): Arya et al., “An optimal algorithm for approximate nearest neighbor searching,” Journal of the ACM, 45 (1998),
For implementation, see