# Multidimensional Indexing

## Presentation on theme: "Multidimensional Indexing"— Presentation transcript:

More Dimensions
Some applications require us to view data in two or more dimensions, e.g. a Geographical Information System. Roughly, every attribute of the data can be seen as a dimension.

Example Queries
Partial Match: looking for a set of data items with specific values for one or more dimensions.
Range: looking for a set of data items within a specific range in one or more dimensions.
Nearest-Neighbor: looking for the closest point to a given point, e.g. the city above some population threshold that is closest to a given city.
Where-am-I: finding out where a specific point is located, e.g. locating the mouse pointer on the screen.

Using Conventional Indexes
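Suppose points are distributed randomly in a 2D space with x and y ranging from 0 to 1000, and we are looking for points with 450<x<550 and 450<y<550, i.e. an area of 100x100. Using a B-Tree on x, we find pointers to the points whose x is within the range. One way to finish is to retrieve all those points and check their y values, in order to find the points at the intersection. With a uniform distribution, about 10% of all points fall in the x range but only about 1% lie in the square, so roughly nine out of ten retrieved points are discarded.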
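The cost of this approach can be made concrete with a minimal sketch. It assumes a sorted Python list stands in for the leaf level of a B-Tree on x; the function name `range_query_with_x_index` and the 10,000 random points are illustrative, not part of the original slides.

```python
import bisect
import random

def range_query_with_x_index(points, x_lo, x_hi, y_lo, y_hi):
    """Answer a 2D range query using only a sorted index on x.

    Returns (matches, candidates_examined).
    """
    # Sorted (x, y) pairs play the role of the B-Tree's leaf entries.
    x_index = sorted(points)
    # Locate the slice with x_lo < x < x_hi, as a B-Tree range scan would.
    start = bisect.bisect_right(x_index, (x_lo, float("inf")))
    end = bisect.bisect_left(x_index, (x_hi, float("-inf")))
    candidates = x_index[start:end]
    # Every candidate has to be fetched before its y value can be checked.
    matches = [(x, y) for (x, y) in candidates if y_lo < y < y_hi]
    return matches, len(candidates)

random.seed(0)
points = [(random.uniform(0, 1000), random.uniform(0, 1000))
          for _ in range(10_000)]
matches, examined = range_query_with_x_index(points, 450, 550, 450, 550)
print(f"examined {examined} candidates, kept {len(matches)}")
```

The gap between the number of candidates examined and the number kept is the wasted work that a multidimensional index tries to avoid.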

Using Conventional Indexes
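Almost any data structure allows us to execute a nearest-neighbor query by specifying a range in each dimension around the target. But a range query only returns candidates: we still have to determine which point is closest, and if the range contains no point, or the true nearest point lies just outside it, the range must be widened and the query repeated.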
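One way to turn range queries into a nearest-neighbor search is to keep widening the range until the best candidate is provably the closest point. The sketch below assumes a linear scan in place of a real index lookup and a positive starting `radius`; both the function name and the doubling strategy are illustrative choices.

```python
import math

def nearest_neighbor_by_range(points, target, radius=10.0):
    """Find the point closest to `target` by repeated square range queries."""
    if not points:
        return None
    tx, ty = target
    while True:
        # Range query: a square of half-width `radius` centred on the target.
        candidates = [
            (x, y) for (x, y) in points
            if abs(x - tx) <= radius and abs(y - ty) <= radius
        ]
        if candidates:
            best = min(candidates, key=lambda p: math.dist(p, target))
            # Any point outside the square is farther than `radius`, so
            # `best` is the true answer only if it lies within `radius`.
            if math.dist(best, target) <= radius:
                return best
        radius *= 2  # widen the range and repeat the query
```

The distance check matters: a point in a corner of the square can be farther from the target than a point just outside one of its sides.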

Multidimensional Indexes
Hash-table-like structures
Tree-like structures

Multidimensional Indexes
Hash-table-like structures:
Grid File: does not hash; it partitions each dimension by sorting the values along that dimension.
Partitioned Hashing: does hash the various dimensions; each dimension contributes to the bucket number.

Grid File (Hash Table)
Each of the regions can be thought of as a bucket of a hash table. Each point in a region has its record placed in a block belonging to that bucket. For example, the central rectangle represents data items with 40 ≤ age < 55 and 90 ≤ salary < 225.

Grid File
Instead of a one-dimensional array of buckets, a grid file uses an array with the same number of dimensions as the data file. "Hashing" here is different from applying a hash function: the positions of the data item in each of the dimensions together determine the bucket.
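As a rough illustration of how position determines the bucket, the sketch below locates a point's grid cell with a binary search per dimension. The grid lines at age 40 and 55 and salary 90 and 225 are taken from the central-rectangle example; the function name `grid_cell` is an assumption of this sketch.

```python
import bisect

# Grid lines from the age/salary example.
AGE_LINES = [40, 55]
SALARY_LINES = [90, 225]

def grid_cell(age, salary):
    """Return the (age_stripe, salary_stripe) indices of a point's bucket."""
    # bisect_right counts the grid lines <= value, so a value equal to a
    # line falls in the stripe that starts at that line (lo <= v < hi).
    return (bisect.bisect_right(AGE_LINES, age),
            bisect.bisect_right(SALARY_LINES, salary))

print(grid_cell(45, 100))  # the central rectangle
```

With two grid lines per dimension there are three stripes per dimension, so the indices range over a 3x3 array of buckets and the central rectangle is cell (1, 1).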

Grid File
Inserting: if there is room in the block of the proper bucket, we insert the record there. If there is no room, we either add overflow blocks to the bucket or reorganize the structure by adding or moving grid lines.
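The overflow-block option can be sketched as follows. This is a minimal illustration, not a full grid file: the class name `GridFile`, the `BLOCK_CAPACITY` of two records, and the choice to model a bucket as a list of blocks are all assumptions, and reorganizing by adding or moving grid lines is not shown.

```python
import bisect
from collections import defaultdict

BLOCK_CAPACITY = 2  # hypothetical: records that fit in one block

class GridFile:
    def __init__(self, age_lines, salary_lines):
        self.age_lines = age_lines
        self.salary_lines = salary_lines
        # Each bucket is a chain of blocks; every block after the
        # first one is an overflow block.
        self.buckets = defaultdict(lambda: [[]])

    def _cell(self, age, salary):
        return (bisect.bisect_right(self.age_lines, age),
                bisect.bisect_right(self.salary_lines, salary))

    def insert(self, age, salary):
        """Insert a record; return the bucket's overflow-block count."""
        blocks = self.buckets[self._cell(age, salary)]
        if len(blocks[-1]) >= BLOCK_CAPACITY:
            blocks.append([])  # no room: chain an overflow block
        blocks[-1].append((age, salary))
        return len(blocks) - 1
```

A growing overflow count for one bucket is the signal that reorganizing the grid lines may be worth its cost.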

Grid File
Reorganizing the structure: adding a grid line splits all the buckets along that line. It may not be possible to select a new line that works well for all buckets; a poorly placed line may, for example, create too many empty buckets or leave several very full buckets.

Grid File
Example: inserting point (52, 200). A vertical line at age = 51 doesn't help: it doesn't split any other bucket, so it only creates 3 empty buckets.

Partitioned Hashing
Example: three bits are used for the bucket number. The leftmost bit is determined by the first attribute; the two rightmost bits are determined by the second attribute.
$h(25) = 25 \bmod 2 = 1_{10} = 1_2$ and $h(60) = 60 \bmod 4 = 0_{10} = 0_2 = 00_2$, therefore $h(25, 60) = 100_2$.
$h(45) = 45 \bmod 2 = 1_{10} = 1_2$ and $h(350) = 350 \bmod 4 = 2_{10} = 10_2$, therefore $h(45, 350) = 110_2$.
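The arithmetic above can be checked with a short sketch. It assumes the two hash functions are exactly the modulo operations used in the example; the function name `partitioned_hash` is illustrative.

```python
def partitioned_hash(a, b):
    """Three-bit bucket number: 1 bit from `a`, 2 bits from `b`."""
    left_bit = a % 2    # first attribute -> leftmost bit
    right_bits = b % 4  # second attribute -> two rightmost bits
    return (left_bit << 2) | right_bits

print(format(partitioned_hash(25, 60), "03b"))   # 100
print(format(partitioned_hash(45, 350), "03b"))  # 110
```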

Grid File <-> Partitioned Hash
Partial Match Query -> Partitioned Hash
Nearest Neighbor -> Grid File
Range Query -> Grid File
However, with these methods we no longer have the advantage that the answer is in exactly one bucket; still, they limit our search to a subset of the buckets.
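To see how a partial match query is limited to a subset of the buckets, consider a three-bit partitioned hash in which the first attribute supplies the leftmost bit (value mod 2) and the second supplies the two rightmost bits (value mod 4). The sketch below, with the hypothetical helper `candidate_buckets`, lists the buckets that must be searched when only some attributes are known.

```python
def candidate_buckets(a=None, b=None):
    """Bucket numbers that can hold records matching the known attributes."""
    # A known attribute fixes its bits; an unknown one leaves them free.
    left_choices = [a % 2] if a is not None else [0, 1]
    right_choices = [b % 4] if b is not None else [0, 1, 2, 3]
    return [(left << 2) | right
            for left in left_choices
            for right in right_choices]

# Only the first attribute is known: 4 of the 8 buckets remain.
print([format(n, "03b") for n in candidate_buckets(a=25)])
# Only the second attribute is known: 2 of the 8 buckets remain.
print([format(n, "03b") for n in candidate_buckets(b=350)])
```

Each known attribute removes its share of the bits from the search, which is why partitioned hashing suits partial match queries. It gives no help with ranges, because nearby values hash to unrelated buckets.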

Tree-like Structures
Multiple-key Indexes: a tree in which the nodes at each level are indexes for one attribute.
kd-trees (k-dimensional search trees): a binary tree.
Note: in these structures we lose the advantage of having balanced trees.

Multiple-key Indexes
Very efficient for partial match queries, particularly when the attribute indexed at the first level is specified.
Works quite well for range queries.
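A two-level multiple-key index can be sketched with sorted lists standing in for the per-level indexes. The attribute names (age at the first level, salary at the second), the function names, and the sample records are assumptions made for illustration.

```python
import bisect

def build_multikey_index(records):
    """Build a two-level index: sorted ages, each with a sorted salary list."""
    by_age = {}
    for age, salary in records:
        by_age.setdefault(age, []).append(salary)
    for salaries in by_age.values():
        salaries.sort()
    return sorted(by_age), by_age

def range_query(index, age_lo, age_hi, sal_lo, sal_hi):
    """Return (age, salary) pairs with both values in the closed ranges."""
    ages, by_age = index
    result = []
    # First level: visit only the ages inside the range.
    first = bisect.bisect_left(ages, age_lo)
    last = bisect.bisect_right(ages, age_hi)
    for age in ages[first:last]:
        # Second level: search the salary index belonging to that age.
        salaries = by_age[age]
        lo = bisect.bisect_left(salaries, sal_lo)
        hi = bisect.bisect_right(salaries, sal_hi)
        result.extend((age, s) for s in salaries[lo:hi])
    return result

# Hypothetical sample data.
records = [(25, 60), (25, 400), (45, 60), (45, 350), (50, 75), (50, 100)]
index = build_multikey_index(records)
print(range_query(index, 40, 50, 50, 100))
```

A partial match on age alone touches a single second-level index, whereas a partial match on salary alone has to search the second-level index of every age.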

kd-tree Index
A binary tree. Interior nodes have an attribute, a dividing value for that attribute, and pointers to left and right children. Leaves are blocks, with space for as many records as a block can hold.

kd-tree Index
Inserting data item (35,500): if there is no room in the proper block, we split the leaf node and create a new internal node.