School of Computing Clemson University Fall, 2012


1 Lecture 12. Range Queries and Multi-Dimensional Search Structures
CpSc 212: Algorithms and Data Structures. Brian C. Dean, School of Computing, Clemson University, Fall 2012.

2 Range Queries : Examples
In a dictionary:
  “Tell me all elements with keys in the range [a, b].”
  “How many elements are there with keys in the range [a, b]?”
  “What is the min / max / sum of all elements in the range [a, b]?”
In a sequence A1…An:
  “What is the min / max / sum of all elements in Ai…Aj?”
  “What are the k largest values in the range Ai…Aj?”
In more than one dimension:
  “Tell me all the points in this region.”
  “Tell me some aggregate statistic about all points in this region (e.g., count, min, max, sum, etc.).”
[Figure: axes Age and Household income.]

3 Range Updates : Examples
In a dictionary:
  “Delete all elements in the range [a, b].”
  “Apply some operation to all elements in the range [a, b].”
In a sequence A1…An:
  “Delete all elements in the range Ai…Aj.”
  “Increase all elements in Ai…Aj by a common value v.”
In more than one dimension:
  “Apply some operation to all points in this region (e.g., delete, change some attribute by a common value).”
[Figure: axes Age and Household income.]

4 Finding all Elements in [a, b] in a Dictionary
First find a (or the successor of a, if a is not present). Then call successor repeatedly until we’ve stepped through all elements in [a, b]. Total time: O(k + log n) on a balanced BST, where k is the number of elements written as output. This is called an “output-sensitive” running time, and we’ll see such running times often in the study of data structures.
[Figure: BST showing LCA(a,b), the “lowest common ancestor” of a and b, with a (or succ(a)) and b (or pred(b)) below it; the nodes in the range [a, b] are highlighted.]
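The successor-stepping idea can be sketched as follows. This is a minimal illustration on a plain, unbalanced BST with parent pointers, so the O(k + log n) bound only holds if the tree happens to be balanced. The names `Node`, `insert`, `ceiling`, `successor`, and `range_report` are hypothetical and not taken from the lecture.

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.left = None
        self.right = None
        self.parent = parent


def insert(root, key):
    """Plain (unbalanced) BST insert; returns the root of the tree."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key, cur)
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = Node(key, cur)
                return root
            cur = cur.right


def ceiling(root, a):
    """Node holding a, or the successor of a if a is absent (None if neither exists)."""
    best, cur = None, root
    while cur is not None:
        if cur.key >= a:
            best, cur = cur, cur.left   # candidate found; look for a smaller one
        else:
            cur = cur.right
    return best


def successor(node):
    """Next node in sorted order, or None if node holds the maximum."""
    if node.right is not None:
        node = node.right
        while node.left is not None:
            node = node.left
        return node
    # Climb until we leave a left subtree.
    while node.parent is not None and node is node.parent.right:
        node = node.parent
    return node.parent


def range_report(root, a, b):
    """All keys in [a, b], in sorted order."""
    out = []
    node = ceiling(root, a)
    while node is not None and node.key <= b:
        out.append(node.key)
        node = successor(node)
    return out


root = None
for key in [8, 3, 10, 1, 6, 14, 4, 7, 13]:
    root = insert(root, key)
print(range_report(root, 4, 10))
```

The loop does one `ceiling` search and then one `successor` call per reported element, which is where the output-sensitive k term comes from.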

5 Computing Aggregate Statistics Over a Range
We can count or find the min/max/sum of elements in a range in O(log n) time on a balanced BST. This works for a dictionary or a sequence encoded within a BST. On a sequence, we can use this to implement the operations range-sum(i, j), range-min(i, j), and range-max(i, j).
Aggregate all node information at yellow nodes and augmented subtree information at red nodes.
[Figure: BST with LCA(a,b) at the top; below it a (possibly succ(a), or alternatively select(i) if encoding a sequence) and b (possibly pred(b), or alternatively select(j) if encoding a sequence), with yellow and red nodes marked.]
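One way to sketch this is to augment each node with its subtree's count, sum, and smallest and largest key, then recurse only into subtrees that straddle a boundary of [a, b]. The code below is a hedged illustration, not the lecture's implementation: it builds a static balanced BST from a sorted list of distinct keys, and the names `Node`, `build`, and `range_stats` are hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        kids = [c for c in (left, right) if c is not None]
        # Augmented subtree information.
        self.lo = left.lo if left else key        # smallest key in this subtree
        self.hi = right.hi if right else key      # largest key in this subtree
        self.count = 1 + sum(c.count for c in kids)
        self.total = key + sum(c.total for c in kids)


def build(sorted_keys):
    """Balanced BST from a sorted list of distinct keys."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return Node(sorted_keys[mid],
                build(sorted_keys[:mid]),
                build(sorted_keys[mid + 1:]))


def range_stats(node, a, b):
    """Return (count, sum) of the keys in [a, b]."""
    if node is None or node.hi < a or node.lo > b:
        return (0, 0)                       # subtree is disjoint from [a, b]
    if a <= node.lo and node.hi <= b:
        return (node.count, node.total)     # whole subtree inside: use the augmentation
    # Subtree straddles a boundary: combine this node with both children.
    lc, ls = range_stats(node.left, a, b)
    rc, rs = range_stats(node.right, a, b)
    if a <= node.key <= b:
        lc, ls = lc + 1, ls + node.key
    return (lc + rc, ls + rs)


root = build([1, 3, 4, 6, 7, 8, 10, 13, 14])
print(range_stats(root, 4, 10))
```

Only nodes on the two boundary search paths fall into the straddling case, so the work is proportional to the tree height. Min and max over stored values could be handled the same way by keeping a subtree minimum and maximum of those values.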

6 Range Queries in Splay Trees
Range queries (and updates) are particularly nice on splay trees. Given a range query over [a, b] in a dictionary:
  1. Splay(b).
  2. Splay(a), making sure we perform a single rotation at the root.
The final single rotation leaves a at the root with b as its right child, so every element strictly between a and b sits in b's left subtree. This effectively isolates all the elements in (a, b) in a single subtree! And this of course works for a sequence too…
[Figure: a at the root, b below it, and the subtree holding (a, b). Here a may be succ(a), or select(i) if encoding a sequence; b may be pred(b), or select(j) if encoding a sequence.]

8 Static Range Queries in Dictionaries
Input: Set of n numbers (points in 1 dimension).
Common Problems:
  1. Tell me all the points in the range [a, b].
  2. Count the number of points in the range [a, b].
What is the best data structure for (1) and (2) in the static case? A sorted array! This solves (1) in O(k + log n) time and (2) in O(log n) time.
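As a small sketch of the sorted-array answer, Python's standard `bisect` module provides the two binary searches needed. The function names `range_count` and `range_report` are hypothetical, and the sample numbers are illustrative only.

```python
from bisect import bisect_left, bisect_right


def range_count(sorted_pts, a, b):
    """Number of points in [a, b]: two binary searches, O(log n)."""
    return max(0, bisect_right(sorted_pts, b) - bisect_left(sorted_pts, a))


def range_report(sorted_pts, a, b):
    """All points in [a, b]: O(k + log n), where k is the output size."""
    return sorted_pts[bisect_left(sorted_pts, a):bisect_right(sorted_pts, b)]


pts = sorted([7, 1, 15, 4, 10, 3, 12, 6])   # preprocessing: sort once
print(range_count(pts, 4, 10), range_report(pts, 4, 10))
```

`bisect_left` finds the first position whose value is at least a, and `bisect_right` finds the first position whose value exceeds b, so the slice between them is exactly the closed range.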

9 Databases
Many computing professionals work with large databases. Structured Query Language (SQL) is a common way to interact with databases. For example:
  SELECT title, author
  FROM books_in_library
  WHERE price <= 100
    AND publication_date >= 1990
    AND page_count BETWEEN 500 AND 750;
Range queries like this can be sped up by telling the database to build indexes (usually B-trees) on particular fields (e.g., page_count, price).
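To make the indexing step concrete, here is a hedged sketch using Python's built-in `sqlite3` module with an in-memory database. The table contents, the column types, and the index names `idx_page_count` and `idx_price` are made up for illustration. SQLite stores its indexes as B-trees, but whether a given query actually uses an index is up to the query planner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE books_in_library (
        title TEXT, author TEXT, price REAL,
        publication_date INTEGER, page_count INTEGER)
""")
# Illustrative rows only; not real data.
conn.executemany(
    "INSERT INTO books_in_library VALUES (?, ?, ?, ?, ?)",
    [("Book A", "Author 1", 80.0, 1995, 600),
     ("Book B", "Author 2", 120.0, 2001, 700),
     ("Book C", "Author 3", 45.0, 1985, 550),
     ("Book D", "Author 4", 60.0, 2010, 300)])

# Ask the database to build indexes on the fields used in range predicates.
conn.execute("CREATE INDEX idx_page_count ON books_in_library (page_count)")
conn.execute("CREATE INDEX idx_price ON books_in_library (price)")

rows = conn.execute("""
    SELECT title, author FROM books_in_library
    WHERE price <= 100
      AND publication_date >= 1990
      AND page_count BETWEEN 500 AND 750
""").fetchall()
print(rows)
```

Each index here covers a single column, so it can only speed up the range test on that one column; the remaining predicates are still checked row by row.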

10 Multidimensional Range Queries
We can think of the records in a database as points in a high-dimensional space. Example of a multidimensional range query: “Tell me all records with age in the range [18, 24] and household income in the range [$50,000, $80,000]”. Today, we’ll focus on static multidimensional range queries (usually 2D); no range updates…
[Figure: axes Age and Household income.]

11 The Quadtree
Root node splits plane into 4 quadrants at some point (usually (xmid, ymid), although random is usually fine). Divide until ≤ 1 point in region.
Preprocessing time: O(n log n)
Space: O(n)
Height: O(log n)
Generalizes naturally to d = 3 (octrees) and higher dimensions.
[Figure: plane split at (xmid, ymid) into quadrants labeled A, B, C, D, alongside the corresponding tree.]

12 The Quadtree : Range Queries
To perform a range query, recursively traverse the parts of the quadtree intersecting the query region. In practice, this usually runs reasonably quickly. In theory, however, worst-case performance is quite bad…
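A compact sketch of building a point quadtree and running this traversal is shown below. It makes several assumptions that are not from the slides: the input is a non-empty list of distinct 2D points, each node splits at the center of its points' bounding box, and each node stores that tight bounding box for pruning. The names `QuadNode` and `range_query` are hypothetical.

```python
class QuadNode:
    def __init__(self, points):
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        # Bounding box (xmin, ymin, xmax, ymax) of the points in this subtree.
        self.box = (min(xs), min(ys), max(xs), max(ys))
        self.point = points[0] if len(points) == 1 else None
        self.children = []
        if len(points) > 1:
            xmid = (self.box[0] + self.box[2]) / 2
            ymid = (self.box[1] + self.box[3]) / 2
            quads = {}
            for p in points:
                # Key identifies the quadrant: (east of xmid?, north of ymid?).
                quads.setdefault((p[0] >= xmid, p[1] >= ymid), []).append(p)
            # Only non-empty quadrants become children.
            self.children = [QuadNode(q) for q in quads.values()]


def range_query(node, rect, out=None):
    """Report points inside rect = (x1, y1, x2, y2), boundaries included."""
    if out is None:
        out = []
    x1, y1, x2, y2 = rect
    bx1, by1, bx2, by2 = node.box
    if bx2 < x1 or bx1 > x2 or by2 < y1 or by1 > y2:
        return out                       # this part of the tree misses the query
    if node.point is not None:
        out.append(node.point)           # leaf box is the point itself, so it lies in rect
        return out
    for child in node.children:
        range_query(child, rect, out)
    return out


pts = [(1, 7), (3, 2), (4, 9), (6, 1), (7, 0), (10, 4), (12, 5), (15, 6)]
root = QuadNode(pts)
print(sorted(range_query(root, (3, 1, 10, 5))))
```

Splitting at the bounding-box center keeps the recursion finite for distinct points but does not guarantee a balanced tree; splitting at median coordinates would be the variant to try when height matters.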

13 The Quadtree : Range Queries
Bad example of a quadtree query: Query essentially traverses the entire quadtree, but returns no points. Running time: O(n)

14 The kd-Tree
First split (at root) is in the x direction, next level splits on y, and so on. In d > 2 dimensions, we cycle through splits along each dimension as we move down the tree.
O(n log n) build time, O(n) space, O(log n) height.
Worst-case query time for d = 2: O(k + √n). In general: O(k + n^(1-1/d)).

15 The kd-Tree : Range Queries
Same as with quadtrees: recursively visit all parts of the tree intersecting the query region R:
RangeQuery(T, R):
  If T->boundingbox does not intersect R, return.
  If T is a leaf, return the single point (if any) in this leaf if it is contained in R.
  Otherwise, recursively query the children of T.
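The RangeQuery pseudocode might look like this in Python. This is a sketch under stated assumptions: points are stored only at leaves, the input is a non-empty list of 2D tuples, and each node keeps the bounding box of its points. The names `KDNode` and `range_query` are hypothetical. For brevity the build re-sorts at every level, which costs O(n log^2 n) rather than the O(n log n) quoted for kd-trees.

```python
class KDNode:
    def __init__(self, points, depth=0):
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        # Bounding box (xmin, ymin, xmax, ymax) of the points in this subtree.
        self.box = (min(xs), min(ys), max(xs), max(ys))
        self.point = None
        self.left = self.right = None
        if len(points) == 1:
            self.point = points[0]                 # leaf
        else:
            axis = depth % 2                       # alternate x, y, x, y, ...
            pts = sorted(points, key=lambda p: p[axis])
            mid = len(pts) // 2                    # median split keeps height O(log n)
            self.left = KDNode(pts[:mid], depth + 1)
            self.right = KDNode(pts[mid:], depth + 1)


def range_query(node, rect, out=None):
    """Report points inside rect = (x1, y1, x2, y2), boundaries included."""
    if out is None:
        out = []
    x1, y1, x2, y2 = rect
    bx1, by1, bx2, by2 = node.box
    if bx2 < x1 or bx1 > x2 or by2 < y1 or by1 > y2:
        return out                                 # bounding box does not intersect R
    if node.point is not None:
        out.append(node.point)                     # leaf box is the point, so it lies in R
        return out
    range_query(node.left, rect, out)
    range_query(node.right, rect, out)
    return out


pts = [(1, 7), (3, 2), (4, 9), (6, 1), (7, 0), (10, 4), (12, 5), (15, 6)]
root = KDNode(pts)
print(sorted(range_query(root, (3, 1, 10, 5))))
```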

16 The kd-Tree : Nearest Neighbor Search
Recursively traverse the tree, always branching first in the direction that contains the query point P (as if we were searching for P). Keep track of the closest point found so far. Prune the search whenever the current bounding box can’t contain a point closer than the best so far.

17 k-Nearest Neighbor Search
Recursively traverse the tree, always branching first in the direction that contains the query point P (as if we were searching for P). Keep track of the closest k points found so far. Prune the search whenever the current bounding box can’t contain a point closer than the k-th closest found so far.
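The pruned search can be sketched with a bounded max-heap holding the best k candidates; setting k = 1 gives plain nearest-neighbor search. The code assumes a leaf-storing 2D kd-tree with per-node bounding boxes and k ≥ 1, and it visits the child whose box is closer to P first, which is one reasonable reading of "the direction that contains P". The names `KDNode`, `box_dist2`, `knn`, and `k_nearest` are hypothetical.

```python
import heapq


class KDNode:
    def __init__(self, points, depth=0):
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        self.box = (min(xs), min(ys), max(xs), max(ys))
        self.point = None
        self.left = self.right = None
        if len(points) == 1:
            self.point = points[0]
        else:
            pts = sorted(points, key=lambda p: p[depth % 2])
            mid = len(pts) // 2
            self.left = KDNode(pts[:mid], depth + 1)
            self.right = KDNode(pts[mid:], depth + 1)


def box_dist2(box, p):
    """Squared distance from p to the nearest point of box (0 if p is inside)."""
    dx = max(box[0] - p[0], 0, p[0] - box[2])
    dy = max(box[1] - p[1], 0, p[1] - box[3])
    return dx * dx + dy * dy


def knn(node, p, k, heap):
    """heap holds (-squared distance, point) for the best k so far; requires k >= 1."""
    if len(heap) == k and box_dist2(node.box, p) >= -heap[0][0]:
        return                                    # box can't beat the k-th best: prune
    if node.point is not None:
        d2 = (node.point[0] - p[0]) ** 2 + (node.point[1] - p[1]) ** 2
        heapq.heappush(heap, (-d2, node.point))
        if len(heap) > k:
            heapq.heappop(heap)                   # drop the farthest candidate
        return
    # Visit the more promising child first so pruning kicks in sooner.
    for child in sorted((node.left, node.right), key=lambda c: box_dist2(c.box, p)):
        knn(child, p, k, heap)


def k_nearest(root, p, k):
    heap = []
    knn(root, p, k, heap)
    return [pt for _, pt in sorted(heap, reverse=True)]   # nearest first


pts = [(1, 7), (3, 2), (4, 9), (6, 1), (7, 0), (10, 4), (12, 5), (15, 6)]
root = KDNode(pts)
print(k_nearest(root, (6, 2), 3))
```

Negating the distances turns Python's min-heap into a max-heap, so `heap[0]` is always the current k-th best and serves as the pruning threshold.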

18 Applications of Quad-trees & kd-Trees: Nearest-Neighbor Classification
[Figure: an unclassified object.]

19 Related Topic: Mesh Refinement
(i.e., Spatial Decomposition at Varying Resolution)

20 Applications of Quad-trees & kd-Trees:
Speeding up Geometric Algorithms

21 Binary Space Partition Trees
Another way to recursively decompose 2D / 3D space. Often used to pre-process a static scene, allowing fast back-to-front rendering from any vantage point.

22 The Range Tree (2D)
Step 1: Sort all n points (x1, y1) … (xn, yn) by x coordinate and build a complete balanced binary tree “on top of” this ordering. Interior nodes are augmented with the x ranges of their subtrees. Height = log_2 n.
[Figure: tree over the x-sorted points, with the x range stored at each node:
  root: 1..15
  next level: 1..6, 7..15
  next level: 1..3, 4..6, 7..10, 12..15
  leaves: (1, 7) (3, 2) (4, 9) (6, 1) (7, 0) (10, 4) (12, 5) (15, 6)]

23 The Range Tree (2D)
Recall that we can answer a range query in x with a collection of ≤ 2 log_2 n subtrees.
[Figure: an x range query covered by subtrees of a tree of height log_2 n.]

24 The Range Tree (2D)
Step 2: Augment each internal node with an array of all points in its subtree, sorted by y.
Total preprocessing time and space: O(n log n) (since each point appears in only log_2 n arrays).
[Figure: the tree from Step 1 (leaves (1, 7) (3, 2) (4, 9) (6, 1) (7, 0) (10, 4) (12, 5) (15, 6); x ranges 1..3, 4..6, 7..10, 12..15, then 1..6, 7..15, then 1..15), with a y-sorted array attached to each internal node.]

25 The Range Tree : Answering Queries
To find all points in [x1, x2] × [y1, y2], first do a range query in the top-level tree based on x. At the root of each of the ≤ 2 log_2 n resulting subtrees, query the augmented y array over the y range [y1, y2].
[Figure: the subtrees that together hold all points with x coordinate in [x1, x2].]
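Both steps of the construction and the query can be sketched as follows. This is an illustrative version, not the lecture's code: it assumes a non-empty list of points with distinct x coordinates, keeps a y-sorted array at every node (leaves included), and re-sorts at each node for simplicity, which costs O(n log^2 n) to build. Merging the children's arrays instead would give the O(n log n) bound. The names `RangeNode`, `build`, and `query` are hypothetical.

```python
from bisect import bisect_left, bisect_right


class RangeNode:
    def __init__(self, pts_by_x):
        # pts_by_x: non-empty list of (x, y) points, already sorted by x.
        self.xlo, self.xhi = pts_by_x[0][0], pts_by_x[-1][0]   # x range of this subtree
        self.by_y = sorted(pts_by_x, key=lambda p: p[1])       # augmented y-sorted array
        self.ys = [p[1] for p in self.by_y]                    # keys for binary search
        self.left = self.right = None
        if len(pts_by_x) > 1:
            mid = len(pts_by_x) // 2
            self.left = RangeNode(pts_by_x[:mid])
            self.right = RangeNode(pts_by_x[mid:])


def build(points):
    return RangeNode(sorted(points))


def query(node, x1, x2, y1, y2, out=None):
    """Report all points in [x1, x2] x [y1, y2]."""
    if out is None:
        out = []
    if node is None or node.xhi < x1 or node.xlo > x2:
        return out                                   # subtree misses the x range
    if x1 <= node.xlo and node.xhi <= x2:
        # Whole subtree lies in the x range: one 1D query on its y array.
        i = bisect_left(node.ys, y1)
        j = bisect_right(node.ys, y2)
        out.extend(node.by_y[i:j])
        return out
    query(node.left, x1, x2, y1, y2, out)
    query(node.right, x1, x2, y1, y2, out)
    return out


pts = [(1, 7), (3, 2), (4, 9), (6, 1), (7, 0), (10, 4), (12, 5), (15, 6)]
root = build(pts)
print(sorted(query(root, 3, 10, 1, 5)))
```

The recursion stops descending as soon as a subtree is fully inside the x range, so the binary searches run only at the roots of the subtrees that cover [x1, x2].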

26 The Range Tree : Query Performance
Each 2D range query results in our performing ≤ 2 log_2 n individual 1D range queries, each with O(log n) overhead (for a binary search). Total query time: O(k + log^2 n). In d dimensions: O(k + log^d n). (In a d-dimensional range tree, our top-level tree is sorted by the 1st coordinate, and each internal node is augmented with a (d-1)-dimensional range tree built using the other coordinates.)

