Published by Timothy Creech. Modified about 1 year ago.

1
Multidimensional Indexing

2
More Dimensions
Some applications require us to view data in two or more dimensions, e.g. a Geographic Information System. Roughly, every attribute of the data can be seen as a dimension.

3
Example Queries
Partial Match: looking for a set of data items with specific values for one or more dimensions
Range: looking for a set of data items within a specific range in every dimension
Nearest-Neighbor: looking for the closest point to a given point, e.g. the city above a given population that is closest to a given city
Where-am-I: finding out where a specific point is located, e.g. locating the mouse pointer on the screen
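As a reference point for the first three query types, here is a minimal brute-force sketch that scans every point. The sample points, the function names, and the use of `None` to mark an unspecified dimension are assumptions made for illustration; they do not come from the slides. An index structure aims to return the same answers without scanning everything.

```python
import math

# Hypothetical sample data: (age, salary) points, not taken from the slides.
points = [(25, 60), (45, 60), (45, 350), (50, 120), (30, 260)]

def partial_match(points, pattern):
    """pattern has one entry per dimension; None means 'any value'."""
    return [p for p in points
            if all(want is None or want == got for want, got in zip(pattern, p))]

def range_query(points, lows, highs):
    """Return points with lows[i] <= p[i] <= highs[i] in every dimension."""
    return [p for p in points
            if all(lo <= v <= hi for lo, v, hi in zip(lows, p, highs))]

def nearest_neighbor(points, target):
    """Return the point with the smallest Euclidean distance to target."""
    return min(points, key=lambda p: math.dist(p, target))

print(partial_match(points, (45, None)))
print(range_query(points, (40, 100), (55, 400)))
print(nearest_neighbor(points, (48, 110)))
```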

4
Using Conventional Indexes Suppose points are distributed randomly in a 2D space with x and y ranging from 0 to If we are looking for points with 450

5
Using Conventional Indexes
Almost any data structure allows us to execute a nearest-neighbor query by specifying a range in each dimension, but which point in that range is closest?

6
Multidimensional Indexes
Hash-table-like structures
Tree-like structures

7
Multidimensional Indexes
Hash-table-like structures:
– Grid File: does not hash; it partitions the dimensions by sorting the values along those dimensions
– Partitioned Hashing: does hash the various dimensions, with each dimension contributing to the bucket number

8
Grid File (Hash Table)
Each of the regions can be thought of as a bucket of a hash table. Each point in that region has its record placed in a block belonging to that bucket. For example, the central rectangle represents data items with 40 ≤ age < 55 and 90 ≤ salary < 225.

9
Grid File
Instead of a one-dimensional array of buckets, a grid file uses an array with the same number of dimensions as the data file. "Hashing" here is different from applying a hash function: the positions of the data item in each of the dimensions together determine the bucket.
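The bucket lookup described here can be sketched as a binary search over the grid lines of each dimension. This is only an illustration: the grid lines 40/55 (age) and 90/225 (salary) come from the central-rectangle example, while the 3 x 3 layout, the function names, and the in-memory dictionary of buckets are assumptions.

```python
from bisect import bisect_right

# Grid lines per dimension. Only 40/55 (age) and 90/225 (salary) come from
# the central-rectangle example; the rest of the layout is assumed.
age_lines = [40, 55]
salary_lines = [90, 225]

def cell_of(age, salary):
    """Return the (age stripe, salary stripe) of the cell holding the point."""
    return (bisect_right(age_lines, age), bisect_right(salary_lines, salary))

# The bucket array has one dimension per attribute: 3 x 3 cells here.
buckets = {(i, j): [] for i in range(3) for j in range(3)}

def insert(age, salary):
    """Store the point in the bucket of its grid cell."""
    buckets[cell_of(age, salary)].append((age, salary))

def lookup(age, salary):
    """Only one bucket has to be examined for an exact-match lookup."""
    return (age, salary) in buckets[cell_of(age, salary)]

insert(45, 100)
print(cell_of(45, 100), lookup(45, 100))
```

Using `bisect_right` makes each stripe half-open, so a point with age 40 falls in the stripe 40 ≤ age < 55 and a point with age 55 falls in the next one.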

10
Grid File

11
Inserting:
– If there is room in the block of the proper bucket, we insert the record there
– If there is no room, we either add overflow blocks to the bucket or reorganize the structure by adding or moving grid lines
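The overflow-block option can be sketched as follows. The block capacity of 2 and the class name `Bucket` are assumptions for illustration, and the sketch deliberately covers only overflow chaining, not reorganization of grid lines.

```python
BLOCK_CAPACITY = 2  # assumed number of records that fit in one block

class Bucket:
    """A bucket is a primary block followed by zero or more overflow blocks."""

    def __init__(self):
        self.blocks = [[]]

    def insert(self, record):
        """Place the record in the first block with room, else chain a new overflow block."""
        for block in self.blocks:
            if len(block) < BLOCK_CAPACITY:
                block.append(record)
                return
        self.blocks.append([record])

bucket = Bucket()
for record in [(45, 100), (50, 120), (52, 200)]:
    bucket.insert(record)
print(bucket.blocks)
```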

12
Grid File
Reorganizing the structure:
– Adding a grid line splits all the buckets along that line
– It may not be possible to select a new line that does the best for all buckets
– This may, for example, create too many empty buckets or leave several buckets very full

13
Grid File
Example: inserting point (52, 200)
A vertical line at age = 51 doesn't help, since it doesn't split any other bucket; it only creates 3 empty buckets.

14
Partitioned Hashing
Example:
1. Three bits are used for the bucket number
2. The leftmost bit is determined by the first attribute
3. The two rightmost bits are determined by the second attribute
4. h(25) = 25 % 2 = 1 (decimal) = 1 (binary); h(60) = 60 % 4 = 0 (decimal) = 00 (binary). Therefore h(25,60) = 100.
h(45) = 45 % 2 = 1 (decimal) = 1 (binary); h(350) = 350 % 4 = 2 (decimal) = 10 (binary). Therefore h(45,350) = 110.
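The arithmetic in this example can be checked with a short sketch. The function name `partitioned_hash` is an assumption; the moduli 2 and 4 and the bit layout are the ones given in the example.

```python
def partitioned_hash(a, b):
    """3-bit bucket number: 1 bit from the first attribute, 2 bits from the second."""
    high = a % 2  # leftmost bit
    low = b % 4   # two rightmost bits
    return (high << 2) | low

# Show the bucket numbers as 3-bit binary strings.
print(format(partitioned_hash(25, 60), "03b"))
print(format(partitioned_hash(45, 350), "03b"))
```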

15
Grid File vs. Partitioned Hash
Partial Match Query -> Partitioned Hash
Nearest Neighbor -> Grid File
Range Query -> Grid File
However, with these methods we no longer have the advantage that the answer is in exactly one bucket; still, they limit our search to a subset of the buckets.
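To illustrate how a partial match query is limited to a subset of the buckets under partitioned hashing, the sketch below enumerates the buckets that could hold matching records when only some attributes are known. It assumes the 3-bit layout from the partitioned hashing example (first attribute mod 2, second attribute mod 4); the function name `candidate_buckets` is hypothetical.

```python
def candidate_buckets(a=None, b=None):
    """Bucket numbers that can hold records matching the known attributes."""
    # A known attribute fixes its bits; an unknown one leaves all patterns open.
    highs = [a % 2] if a is not None else [0, 1]
    lows = [b % 4] if b is not None else [0, 1, 2, 3]
    return [(h << 2) | l for h in highs for l in lows]

print(candidate_buckets(a=45))          # first attribute known: 4 of 8 buckets
print(candidate_buckets(b=350))         # second attribute known: 2 of 8 buckets
print(candidate_buckets(a=45, b=350))   # both known: exactly one bucket
```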

16
Tree-Like Structures
Multiple-key Indexes: a tree in which the nodes at each level are indexes for one attribute
kd-trees (k-dimensional search trees): a binary tree
Note: in these structures we lose the advantage of having balanced trees

17
Multiple-key Indexes
Very efficient for partial match queries
Works quite well for range queries
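A minimal two-level sketch of a multiple-key index follows, assuming two attributes named age and salary. The class name, the attribute names, and the use of dictionaries whose keys are sorted at query time are all simplifications for illustration; a real index would keep the keys of each level ordered, for example in a B-tree.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

class MultiKeyIndex:
    """Two-level index: age -> (salary -> records). Attribute names are assumed."""

    def __init__(self):
        self.root = defaultdict(lambda: defaultdict(list))

    def insert(self, age, salary, record):
        self.root[age][salary].append(record)

    def partial_match(self, age):
        """All records with the given first attribute, in salary order."""
        sub = self.root.get(age, {})
        return [r for s in sorted(sub) for r in sub[s]]

    def range_query(self, age_lo, age_hi, sal_lo, sal_hi):
        """Records with age_lo <= age <= age_hi and sal_lo <= salary <= sal_hi."""
        out = []
        ages = sorted(self.root)
        for age in ages[bisect_left(ages, age_lo):bisect_right(ages, age_hi)]:
            sub = self.root[age]
            sals = sorted(sub)
            for s in sals[bisect_left(sals, sal_lo):bisect_right(sals, sal_hi)]:
                out.extend(sub[s])
        return out

index = MultiKeyIndex()
for age, salary, name in [(25, 60, "a"), (45, 60, "b"), (45, 350, "c"), (50, 120, "d")]:
    index.insert(age, salary, name)
print(index.partial_match(45))
print(index.range_query(40, 55, 100, 400))
```

A partial match on the first attribute follows a single pointer from the top level, which is why this structure handles that case well; a partial match on only the second attribute would have to visit every second-level index.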

18
kd-tree Index
A binary tree
Interior nodes have an attribute, a dividing value for that attribute, and pointers to the left and right children.
Leaves are blocks, with space for as many records as a block can hold.
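The node layout described here can be sketched with two small classes and a lookup that walks from the root to a leaf. The class and field names, the convention that values below the dividing value go left, and the sample tree are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

@dataclass
class Leaf:
    # A leaf stands for one block of records.
    records: List[Tuple[int, int]] = field(default_factory=list)

@dataclass
class Interior:
    attr: int       # index of the attribute tested at this node
    value: int      # dividing value for that attribute
    left: "Node"    # assumed convention: points with point[attr] < value
    right: "Node"   # assumed convention: points with point[attr] >= value

Node = Union[Leaf, Interior]

def find_leaf(node, point):
    """Follow dividing values from the root down to the leaf block for point."""
    while isinstance(node, Interior):
        node = node.left if point[node.attr] < node.value else node.right
    return node

# Hypothetical tree over (age, salary): split on salary first, then on age.
tree = Interior(attr=1, value=150,
                left=Leaf([(25, 60)]),
                right=Interior(attr=0, value=47,
                               left=Leaf([(45, 350)]),
                               right=Leaf([(50, 275)])))
print(find_leaf(tree, (45, 350)).records)
```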

19
kd-tree Index

20
Inserting data item (35,500)
If there is no room in the proper block, we split the leaf node and create a new internal node.
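A hedged sketch of this insertion follows. The slide does not say how the splitting attribute or dividing value is chosen, so the sketch assumes the attribute alternates with depth and the dividing value is the median; the block capacity of 2 and the two points already in the leaf are also assumptions. Only the inserted point (35, 500) comes from the slide.

```python
BLOCK_CAPACITY = 2  # assumed number of records per leaf block

class Leaf:
    def __init__(self, records=None):
        self.records = list(records or [])

class Interior:
    def __init__(self, attr, value, left, right):
        self.attr, self.value, self.left, self.right = attr, value, left, right

def insert(node, point, depth=0):
    """Insert point and return the (possibly new) root of this subtree."""
    if isinstance(node, Interior):
        if point[node.attr] < node.value:
            node.left = insert(node.left, point, depth + 1)
        else:
            node.right = insert(node.right, point, depth + 1)
        return node
    if len(node.records) < BLOCK_CAPACITY:
        node.records.append(point)
        return node
    # The leaf is full: split on an attribute chosen by depth, at the median.
    records = sorted(node.records + [point], key=lambda p: p[depth % len(point)])
    attr = depth % len(point)
    value = records[len(records) // 2][attr]
    left = [p for p in records if p[attr] < value]
    right = [p for p in records if p[attr] >= value]
    if not left:
        # No record falls strictly below the dividing value, so this split
        # would not help; keep an oversized leaf as a stand-in for an
        # overflow block (a simplification of this sketch).
        node.records = records
        return node
    return Interior(attr, value, Leaf(left), Leaf(right))

root = Leaf()
for p in [(25, 400), (45, 350), (35, 500)]:  # first two points are hypothetical
    root = insert(root, p)
print(type(root).__name__, root.attr, root.value)
print(root.left.records, root.right.records)
```

Because splits happen wherever a leaf fills up, nothing in this procedure keeps the tree balanced, which matches the earlier note about losing balance in tree-like structures.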
