School of Computing Clemson University Fall, 2012


Lecture 12. Range Queries and Multi-Dimensional Search Structures
CpSc 212: Algorithms and Data Structures
Brian C. Dean

Range Queries: Examples
In a dictionary:
  “Tell me all elements with keys in the range [a, b].”
  “How many elements are there with keys in the range [a, b]?”
  “What is the min / max / sum of all elements in the range [a, b]?”
In a sequence A_1…A_n:
  “What is the min / max / sum of all elements in A_i…A_j?”
  “What are the k largest values in the range A_i…A_j?”
In more than one dimension:
  “Tell me all the points in this region.”
  “Tell me some aggregate statistic about all points in this region (e.g., count, min, max, sum, etc.).”
[Figure: plot with axes Age and Household income.]

Range Updates: Examples
In a dictionary:
  “Delete all elements in the range [a, b].”
  “Apply some operation to all elements in the range [a, b].”
In a sequence A_1…A_n:
  “Delete all elements in the range A_i…A_j.”
  “Increase all elements in A_i…A_j by a common value v.”
In more than one dimension:
  “Apply some operation to all points in this region (e.g., delete, change some attribute by a common value).”
[Figure: plot with axes Age and Household income.]

Finding All Elements in [a, b] in a Dictionary
First find a (or the successor of a, if a is not present).
Then call successor repeatedly until we’ve stepped through all elements in [a, b].
Total time: O(k + log n) on a balanced BST, where k is the number of elements written as output.
This is called an “output-sensitive” running time, and we’ll see such running times often in the study of data structures.
[Figure: BST with a (or succ(a)), b (or pred(b)), and LCA(a, b), the “lowest common ancestor” of a and b, labeled; marked nodes are those in the range [a, b].]

Computing Aggregate Statistics Over a Range
We can count or find the min/max/sum of elements in a range in O(log n) time on a balanced BST.
This works for a dictionary or a sequence encoded within a BST.
On a sequence, we can use this to implement the operations range-sum(i, j), range-min(i, j), and range-max(i, j).
Aggregate all node information at yellow nodes and augmented subtree information at red nodes.
[Figure: BST with colored nodes, labeled LCA(a, b), a (possibly succ(a), or alternatively select(i) if encoding a sequence), and b (possibly pred(b), or alternatively select(j) if encoding a sequence).]
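The augmentation idea can be sketched in Python as follows. This is a minimal sketch, assuming a static balanced BST built from (key, value) pairs already sorted by key; the names Node, build, and range_sum are illustrative and do not come from the lecture. Each node stores the key range and the sum of its subtree, so any subtree lying entirely inside [a, b] contributes its stored total without being traversed.

```python
class Node:
    """BST node augmented with its subtree's key range and value total."""
    def __init__(self, key, value, left=None, right=None):
        self.key, self.value = key, value
        self.left, self.right = left, right
        self.lo = left.lo if left else key      # smallest key in subtree
        self.hi = right.hi if right else key    # largest key in subtree
        self.total = (value
                      + (left.total if left else 0)
                      + (right.total if right else 0))

def build(items):
    """Build a balanced BST from (key, value) pairs sorted by key."""
    if not items:
        return None
    mid = len(items) // 2
    key, value = items[mid]
    return Node(key, value, build(items[:mid]), build(items[mid + 1:]))

def range_sum(node, a, b):
    """Sum of values whose keys lie in [a, b]."""
    if node is None or node.hi < a or node.lo > b:
        return 0                      # subtree is entirely outside [a, b]
    if a <= node.lo and node.hi <= b:
        return node.total             # subtree is entirely inside [a, b]
    own = node.value if a <= node.key <= b else 0
    return own + range_sum(node.left, a, b) + range_sum(node.right, a, b)

items = [(1, 10), (3, 20), (7, 30), (9, 40), (12, 50), (15, 60)]
root = build(items)
print(range_sum(root, 3, 12))  # 140
```

Swapping the stored total for a count, min, or max gives the other aggregates in the same way. A dynamic tree would also need to refresh lo, hi, and total after each rotation, which this sketch leaves out.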

Range Queries in Splay Trees
Range queries (and updates) are particularly nice on splay trees.
Given a range query over [a, b] in a dictionary:
  Splay(b)
  Splay(a), making sure we perform a single rotation at the root.
This effectively isolates all the elements in (a, b) in a single subtree!
And this of course works for a sequence too…
[Figure: splay tree after both splays, with a, b, and the subtree holding (a, b) labeled. Here a is possibly succ(a), or alternatively select(i) if encoding a sequence; b is possibly pred(b), or alternatively select(j) if encoding a sequence.]

Static Range Queries in Dictionaries
Input: Set of n numbers (points in 1 dimension).
Common problems:
  1. Tell me all the points in the range [a, b].
  2. Count the number of points in the range [a, b].
What is the best data structure for (1) and (2) in the static case?
A sorted array! This solves (1) in O(k + log n) time and (2) in O(log n) time.
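A short Python sketch of the sorted-array approach, using binary search from the standard bisect module. The function names range_report and range_count are illustrative, not from the lecture.

```python
from bisect import bisect_left, bisect_right

def range_report(sorted_keys, a, b):
    """Return all keys in [a, b]: O(log n) to locate, O(k) to copy."""
    lo = bisect_left(sorted_keys, a)    # first index with key >= a
    hi = bisect_right(sorted_keys, b)   # first index with key > b
    return sorted_keys[lo:hi]

def range_count(sorted_keys, a, b):
    """Count keys in [a, b] in O(log n) without touching them."""
    return max(0, bisect_right(sorted_keys, b) - bisect_left(sorted_keys, a))

keys = sorted([15, 3, 9, 1, 12, 7])
print(range_report(keys, 4, 12))  # [7, 9, 12]
print(range_count(keys, 4, 12))   # 3
```

The count needs only the two boundary indices, which is why it avoids the O(k) term.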

Databases
Many computing professionals work with large databases.
Structured Query Language (SQL) is a common way to interact with databases. For example:
  SELECT title, author
  FROM books_in_library
  WHERE price <= 100
    AND publication_date >= 1990
    AND page_count BETWEEN 500 AND 750;
Range queries like this can be sped up by telling the database to build indexes (usually B-trees) on particular fields (e.g., page_count, price).

Multidimensional Range Queries
We can think of the records in a database as points in a high-dimensional space.
Example of a multidimensional range query: “Tell me all records with age in the range [18, 24] and household income in the range [$50,000, $80,000]”.
Today, we’ll focus on static multidimensional range queries (usually 2D); no range updates…
[Figure: plot with axes Age and Household income.]

The Quadtree
Root node splits plane into 4 quadrants at some point (usually (x_mid, y_mid), although random is usually fine).
Divide until ≤ 1 point in region.
Preprocessing time: O(n log n)
Space: O(n)
Height: O(log n)
Generalizes naturally to d = 3 (octrees) and higher dimensions.
[Figure: plane split into quadrants at (x_mid, y_mid), with the corresponding quadtree.]

The Quadtree: Range Queries
To perform a range query, recursively traverse the parts of the quadtree intersecting a query region.
In practice, this usually runs reasonably quickly.
In theory, however, worst-case performance is quite bad…
[Figure: example range query on a quadtree.]

The Quadtree: Range Queries
Bad example of a quadtree query: the query essentially traverses the entire quadtree, but returns no points.
Running time: O(n)
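A point quadtree with range queries can be sketched in Python as below. The sketch makes several assumptions that are not stated in the lecture. It reads (x_mid, y_mid) as the median x and median y of the points in the node. It keeps a small multi-point leaf when a split fails to separate the points (for example, with duplicates). It prunes using the bounding box of the stored points rather than the quadrant boundaries. The names QuadNode and range_query are illustrative.

```python
class QuadNode:
    """Quadtree node over a non-empty list of (x, y) points."""
    def __init__(self, points):
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        # Bounding box of all points below this node: (x1, y1, x2, y2).
        self.box = (min(xs), min(ys), max(xs), max(ys))
        self.points = points      # non-empty only at leaves
        self.children = []
        if len(points) <= 1:
            return
        xmid = sorted(xs)[len(xs) // 2]
        ymid = sorted(ys)[len(ys) // 2]
        quads = [[], [], [], []]
        for p in points:
            quads[(p[0] >= xmid) + 2 * (p[1] >= ymid)].append(p)
        if any(len(q) == len(points) for q in quads):
            return                # split separates nothing: stay a leaf
        self.points = []
        self.children = [QuadNode(q) for q in quads if q]

def range_query(node, rect, out=None):
    """Collect points inside rect = (x1, y1, x2, y2), borders included."""
    out = [] if out is None else out
    x1, y1, x2, y2 = rect
    bx1, by1, bx2, by2 = node.box
    if bx2 < x1 or bx1 > x2 or by2 < y1 or by1 > y2:
        return out                # node cannot intersect the query
    for p in node.points:
        if x1 <= p[0] <= x2 and y1 <= p[1] <= y2:
            out.append(p)
    for child in node.children:
        range_query(child, rect, out)
    return out

points = [(1, 7), (3, 2), (4, 9), (6, 1), (7, 0), (10, 4), (12, 5), (15, 6)]
root = QuadNode(points)
print(sorted(range_query(root, (3, 0, 10, 4))))
# [(3, 2), (6, 1), (7, 0), (10, 4)]
```

Re-sorting the coordinates at every node is done for brevity and costs more than the O(n log n) preprocessing quoted in the lecture.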

The kd-Tree
First split (at root) is in the x direction, next level splits on y, and so on.
In d > 2 dimensions, we cycle through splits along each dimension as we move down the tree.
O(n log n) build time, O(n) space, O(log n) height.
Worst-case query time for d = 2: O(k + √n).
In general: O(k + n^(1-1/d)).

The kd-Tree: Range Queries
Same as with quadtrees: recursively visit all parts of the tree intersecting the query region R.
RangeQuery(T, R):
  If T->boundingbox does not intersect R, return.
  If T is a leaf, return the single point (if any) in this leaf if it is contained in R.
  Otherwise, recursively query the children of T.
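The RangeQuery procedure can be sketched in runnable Python as follows. This is one possible layout, assumed for illustration: points are stored only at the leaves, every node keeps the bounding box of the points below it, and the query region R is given as two corner tuples rlo and rhi. The names KDNode and range_query are not from the lecture.

```python
class KDNode:
    """kd-tree node over a non-empty list of equal-length point tuples."""
    def __init__(self, points, depth=0):
        dims = range(len(points[0]))
        # Bounding box of every point stored below this node.
        self.lo = tuple(min(p[i] for p in points) for i in dims)
        self.hi = tuple(max(p[i] for p in points) for i in dims)
        self.point = self.left = self.right = None
        if len(points) == 1:
            self.point = points[0]            # leaf holds a single point
            return
        axis = depth % len(points[0])         # cycle x, y, ... by depth
        points = sorted(points, key=lambda p: p[axis])
        mid = len(points) // 2                # split at the median
        self.left = KDNode(points[:mid], depth + 1)
        self.right = KDNode(points[mid:], depth + 1)

def range_query(node, rlo, rhi, out=None):
    """Collect all points p with rlo[i] <= p[i] <= rhi[i] for every i."""
    out = [] if out is None else out
    dims = range(len(rlo))
    if any(node.hi[i] < rlo[i] or node.lo[i] > rhi[i] for i in dims):
        return out                            # box does not intersect R
    if node.point is not None:                # leaf: test its one point
        if all(rlo[i] <= node.point[i] <= rhi[i] for i in dims):
            out.append(node.point)
        return out
    range_query(node.left, rlo, rhi, out)
    range_query(node.right, rlo, rhi, out)
    return out

points = [(1, 7), (3, 2), (4, 9), (6, 1), (7, 0), (10, 4), (12, 5), (15, 6)]
root = KDNode(points)
print(sorted(range_query(root, (3, 0), (10, 4))))
# [(3, 2), (6, 1), (7, 0), (10, 4)]
```

Sorting at every level keeps the sketch short but costs O(n log^2 n) to build. Reaching O(n log n) would need presorted coordinate lists or linear-time median selection.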

The kd-Tree: Nearest Neighbor Search
Recursively traverse entire tree, always branching first in the direction that contains P (as if we were searching for P).
Keep track of closest point found so far.
Prune search if we ever find that our bounding box can’t contain a closer point than best so far.

k-Nearest Neighbor Search
Recursively traverse entire tree, always branching first in the direction that contains P (as if we were searching for P).
Keep track of closest k points found so far.
Prune search if we ever find that our bounding box can’t contain a closer point than best k so far.
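A Python sketch of this pruned search, under stated assumptions: points sit at the leaves of a kd-tree whose nodes carry bounding boxes, distances are Euclidean, and the "best k so far" are kept in a bounded max-heap built on the standard heapq module. The names KDNode, box_dist2, and k_nearest are illustrative. With k = 1 this reduces to plain nearest-neighbor search.

```python
import heapq

class KDNode:
    """kd-tree node; points at leaves, bounding box (lo, hi) at every node."""
    def __init__(self, points, depth=0):
        dims = range(len(points[0]))
        self.lo = tuple(min(p[i] for p in points) for i in dims)
        self.hi = tuple(max(p[i] for p in points) for i in dims)
        self.point = self.left = self.right = None
        if len(points) == 1:
            self.point = points[0]
            return
        axis = depth % len(points[0])
        points = sorted(points, key=lambda p: p[axis])
        mid = len(points) // 2
        self.left = KDNode(points[:mid], depth + 1)
        self.right = KDNode(points[mid:], depth + 1)

def box_dist2(node, p):
    """Squared distance from p to the node's bounding box (0 if inside)."""
    return sum(max(node.lo[i] - p[i], 0, p[i] - node.hi[i]) ** 2
               for i in range(len(p)))

def k_nearest(root, p, k):
    """Return the k points closest to p, nearest first."""
    if k <= 0:
        return []
    best = []   # max-heap of (-squared distance, point), at most k entries

    def visit(node):
        # Prune: this box cannot hold anything closer than the current kth best.
        if len(best) == k and box_dist2(node, p) >= -best[0][0]:
            return
        if node.point is not None:
            d2 = sum((a - b) ** 2 for a, b in zip(node.point, p))
            if len(best) < k:
                heapq.heappush(best, (-d2, node.point))
            else:   # not pruned above, so this point beats the kth best
                heapq.heapreplace(best, (-d2, node.point))
            return
        # Branch first toward the child whose box is closer to p.
        for child in sorted((node.left, node.right),
                            key=lambda c: box_dist2(c, p)):
            visit(child)

    visit(root)
    return [pt for _, pt in sorted(best, reverse=True)]

points = [(1, 7), (3, 2), (4, 9), (6, 1), (7, 0), (10, 4), (12, 5), (15, 6)]
root = KDNode(points)
print(k_nearest(root, (8, 3), 3))  # [(10, 4), (6, 1), (7, 0)]
```

Visiting the closer child first tends to tighten the kth-best distance early, so more of the far subtree can be pruned.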

Applications of Quad-trees & kd-Trees: Nearest-Neighbor Classification
[Figure: classified points with an unclassified object.]

Related Topic: Mesh Refinement (i.e., Spatial Decomposition at Varying Resolution)

Applications of Quad-trees & kd-Trees: Speeding up Geometric Algorithms

Binary Space Partition Trees
Another way to recursively decompose 2D / 3D space.
Often used to pre-process a static scene, allowing fast back-to-front rendering from any vantage point.

The Range Tree (2D)
Step 1: Sort all n points (x_1, y_1) … (x_n, y_n) by x coordinate and build a complete balanced binary tree “on top of” this ordering.
Interior nodes augmented with x ranges of their subtrees.
Height = log_2 n.
[Figure: tree over the leaves (1, 7), (3, 2), (4, 9), (6, 1), (7, 0), (10, 4), (12, 5), (15, 6). Interior x ranges by level: 1..15; then 1..6 and 7..15; then 1..3, 4..6, 7..10, and 12..15.]

The Range Tree (2D)
Recall that we can answer a range query in x with a collection of ≤ 2 log_2 n subtrees.
[Figure: tree of height log_2 n with an x range query marked.]

The Range Tree (2D)
Step 2: Augment each internal node with an array of all points in its subtree, sorted by y.
Total preprocessing time and space: O(n log n) (since each point appears in only log_2 n arrays).
[Figure: the same tree, with leaves (1, 7), (3, 2), (4, 9), (6, 1), (7, 0), (10, 4), (12, 5), (15, 6) and interior x ranges 1..3, 4..6, 7..10, 12..15, 1..6, 7..15, and 1..15.]

The Range Tree: Answering Queries
To find all points in [x_1, x_2] × [y_1, y_2], first do a range query in the top-level tree based on x.
At the root of each of the ≤ 2 log_2 n resulting subtrees, query augmented y array over the y range [y_1, y_2].
[Figure: subtrees holding all points with x coordinate in [x_1, x_2].]

The Range Tree: Query Performance
Each 2D range query results in our performing up to 2 log_2 n individual 1D range queries, each with O(log n) overhead (for a binary search).
Total query time: O(k + log^2 n).
In d dimensions: O(k + log^d n).
(In a d-dimensional range tree, our top-level tree is sorted by 1st coordinate, and each internal node is augmented with a (d-1)-dimensional range tree built using the other coordinates.)
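The 2D range tree and its query can be sketched in Python as follows. This is a minimal sketch under stated assumptions: points are (x, y) tuples stored at the leaves, every node (leaves included) keeps its points sorted by y, and the class and function names are illustrative rather than taken from the lecture.

```python
from bisect import bisect_left, bisect_right

class RangeNode:
    """Range tree node built over a non-empty list of points sorted by x."""
    def __init__(self, pts):
        self.xlo, self.xhi = pts[0][0], pts[-1][0]   # x range of the subtree
        # Augmentation: all points of the subtree, sorted by y.
        self.by_y = sorted(pts, key=lambda p: p[1])
        self.ykeys = [p[1] for p in self.by_y]
        self.left = self.right = None
        if len(pts) > 1:
            mid = len(pts) // 2
            self.left = RangeNode(pts[:mid])
            self.right = RangeNode(pts[mid:])

def build(points):
    return RangeNode(sorted(points))

def query(node, x1, x2, y1, y2, out=None):
    """Collect all points in [x1, x2] x [y1, y2]."""
    out = [] if out is None else out
    if node.xhi < x1 or node.xlo > x2:
        return out                        # subtree misses the x range
    if x1 <= node.xlo and node.xhi <= x2:
        # Whole subtree lies in the x range: binary search its y array.
        i = bisect_left(node.ykeys, y1)
        j = bisect_right(node.ykeys, y2)
        out.extend(node.by_y[i:j])
        return out
    # Partial overlap. A leaf never gets here, since its x range is a
    # single value and so is either disjoint from or inside [x1, x2].
    query(node.left, x1, x2, y1, y2, out)
    query(node.right, x1, x2, y1, y2, out)
    return out

points = [(1, 7), (3, 2), (4, 9), (6, 1), (7, 0), (10, 4), (12, 5), (15, 6)]
root = build(points)
print(sorted(query(root, 3, 10, 0, 4)))
# [(3, 2), (6, 1), (7, 0), (10, 4)]
```

Sorting by y separately at each node costs O(n log^2 n) in total. Merging the children's y arrays bottom-up instead would match the O(n log n) preprocessing bound.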