More Dimensions
Some applications need to view data in two or more dimensions, e.g. Geographic Information Systems (GIS).
Roughly, every attribute of the data can be seen as a dimension.
Example Queries
Partial match: find all data items that have specified values in one or more of the dimensions.
Range: find all data items whose value in each dimension lies within a given range.
Nearest neighbor: find the point closest to a given point, e.g. the city above a given population that is closest to a given city.
Where-am-I: find which region or shape, if any, contains a given point, e.g. locating the screen element under the mouse pointer.
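To pin down what each query type asks for, here is a brute-force sketch over an in-memory list of 2D points. It scans every point, so it shows only the meaning of each query, not an index. All function names are hypothetical, and the Where-am-I regions are assumed to be axis-aligned rectangles given as (x1, y1, x2, y2).

```python
import math


def partial_match(points, x=None, y=None):
    """Points equal to every coordinate that is specified (None = unspecified)."""
    return [p for p in points
            if (x is None or p[0] == x) and (y is None or p[1] == y)]


def range_query(points, x_lo, x_hi, y_lo, y_hi):
    """Points with x_lo <= x <= x_hi and y_lo <= y <= y_hi."""
    return [p for p in points
            if x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi]


def nearest_neighbor(points, q):
    """Point closest to q by Euclidean distance (None if there are no points)."""
    return min(points, key=lambda p: math.dist(p, q), default=None)


def where_am_i(rects, q):
    """Names of the rectangles (x1, y1, x2, y2) that contain the point q."""
    return [name for name, (x1, y1, x2, y2) in rects.items()
            if x1 <= q[0] <= x2 and y1 <= q[1] <= y2]


pts = [(1, 2), (1, 5), (4, 4), (7, 1)]
print(partial_match(pts, x=1))           # [(1, 2), (1, 5)]
print(range_query(pts, 0, 5, 3, 6))      # [(1, 5), (4, 4)]
print(nearest_neighbor(pts, (5, 5)))     # (4, 4)
print(where_am_i({"button": (0, 0, 2, 1), "window": (0, 0, 10, 10)}, (1, 0.5)))
# ['button', 'window']
```

The multidimensional indexes that follow try to answer these same queries without scanning every point.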
Using Conventional Indexes
Suppose points are distributed randomly in a 2D space, with x and y each ranging from 0 to 1000.
We are looking for the points with 450 < x < 550 and 450 < y < 550, i.e. a 100 x 100 square covering 1/100 of the space.
Using a B-tree on x, we find pointers to all records with x in that range, which is about 1/10 of the points.
One way to answer the query is to retrieve all of those records and check their y values, keeping only the points in the intersection; only about 1 in 10 of the retrieved records is actually in the answer.
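The single-index strategy above can be simulated in a few lines. As an assumption for illustration, a list sorted on x plus `bisect` stands in for the B-tree on x: it locates the x-range, then every candidate is "retrieved" and its y value checked. The function names are hypothetical.

```python
import bisect
import random


def make_points(n, seed=0):
    """n points distributed uniformly at random in [0, 1000] x [0, 1000]."""
    rng = random.Random(seed)
    return [(rng.uniform(0, 1000), rng.uniform(0, 1000)) for _ in range(n)]


def query_with_x_index(points, x_lo, x_hi, y_lo, y_hi):
    """Answer x_lo < x < x_hi and y_lo < y < y_hi using only an index on x.

    Returns (candidates fetched through the x index, final answer).
    """
    by_x = sorted(points)                    # stand-in for the B-tree on x
    xs = [p[0] for p in by_x]
    start = bisect.bisect_right(xs, x_lo)    # first entry with x > x_lo
    end = bisect.bisect_left(xs, x_hi)       # first entry with x >= x_hi
    candidates = by_x[start:end]             # every one of these is retrieved
    answer = [p for p in candidates if y_lo < p[1] < y_hi]  # check y afterwards
    return candidates, answer


pts = make_points(100_000)
cands, ans = query_with_x_index(pts, 450, 550, 450, 550)
print(len(cands), len(ans))  # roughly 10,000 candidates for roughly 1,000 answers
```

The gap between the two counts is the wasted work: about nine out of ten retrieved records fail the y test.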
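Using Conventional Indexes
Almost any data structure that supports range queries can be used for a nearest-neighbor query: specify a range in each dimension around the given point and take the closest point found. But which point is really the closest?
Two things can go wrong: the range may contain no points at all, so a larger range must be tried; and the closest point inside the range may not be the closest point overall, because a point just outside the range can be nearer than one in a corner of it.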
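One way to handle both problems with nothing but range queries is sketched below; the function names and the starting half-width `d` are hypothetical. If the square is empty, double it. If the closest point found lies at distance r > d, repeat once with half-width r: the square then contains the whole circle of radius r, so its closest point is the true nearest neighbor.

```python
import math


def range_query(points, q, d):
    """Points inside the square of half-width d centred on q."""
    return [p for p in points
            if abs(p[0] - q[0]) <= d and abs(p[1] - q[1]) <= d]


def nearest_via_ranges(points, q, d=1.0):
    """Nearest neighbour of q found using only square range queries."""
    if d <= 0:
        raise ValueError("initial half-width d must be positive")
    if not points:
        return None
    while True:
        box = range_query(points, q, d)
        if not box:
            d *= 2          # nothing in range: widen the square and retry
            continue
        best = min(box, key=lambda p: math.dist(p, q))
        r = math.dist(best, q)
        if r <= d:
            return best     # every point outside the square is farther than d >= r
        # A point just outside the square may be closer than best, so query
        # again with a square that contains the whole circle of radius r.
        return min(range_query(points, q, r), key=lambda p: math.dist(p, q))


pts = [(0.9, 0.9), (1.2, 0.0)]
print(range_query(pts, (0, 0), 1.0))         # [(0.9, 0.9)]
print(nearest_via_ranges(pts, (0, 0), 1.0))  # (1.2, 0.0)
```

In the usage lines, the only point inside the first square, (0.9, 0.9), is at distance about 1.27, while (1.2, 0.0) lies outside the square but only 1.2 away. This is exactly the second pitfall described above.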
Multidimensional Indexes
Hash-table-like structures
Tree-like structures
Multidimensional Indexes: Hash-Table-Like Structures
Grid file: does not hash; it partitions each dimension into stripes, using grid lines placed according to the sorted values along that dimension.
Partitioned hashing: does hash the various dimensions; each dimension contributes some of the bits of the bucket number.
Grid File (Hash Table)
Each region of the grid can be thought of as a bucket of a hash table.
Every point in a region has its record placed in a block belonging to that region's bucket.
For example, the central rectangle of the grid represents the data items with 40 ≤ age < 55 and 90 ≤ salary < 225.
Grid File
Instead of a one-dimensional array of buckets, a grid file uses an array with as many dimensions as the data file has.
"Hashing" here is different from applying a hash function: the positions of the data item in each of the dimensions (the stripe its value falls into), taken together, determine the bucket.
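Here is a minimal in-memory sketch of how a bucket is located. The class name `GridFile` is hypothetical, and keeping buckets in a Python dict is an assumption (a real grid file maps each cell to disk blocks). Each dimension keeps its sorted grid lines. `bisect_right` gives the stripe a value falls in, and the tuple of stripe numbers names the bucket. The grid lines used below are the ones from the age/salary example.

```python
import bisect


class GridFile:
    """Minimal grid file sketch (hypothetical class, not a library API).

    grid_lines[i] holds the grid-line values of dimension i; a value v lies in
    stripe j when lines[j-1] <= v < lines[j], matching 40 <= age < 55.
    """

    def __init__(self, grid_lines):
        self.grid_lines = [sorted(lines) for lines in grid_lines]
        self.buckets = {}  # tuple of stripe numbers -> list of records

    def bucket_of(self, point):
        # One stripe number per dimension; together they identify the bucket.
        return tuple(bisect.bisect_right(lines, v)
                     for lines, v in zip(self.grid_lines, point))

    def insert(self, point):
        self.buckets.setdefault(self.bucket_of(point), []).append(point)

    def lookup(self, point):
        # Only one bucket has to be examined for an exact-match lookup.
        return [p for p in self.buckets.get(self.bucket_of(point), []) if p == point]


gf = GridFile([[40, 55], [90, 225]])  # age lines, salary lines
print(gf.bucket_of((52, 200)))        # (1, 1): the central rectangle
print(gf.bucket_of((55, 90)))         # (2, 1): age 55 starts the next stripe
```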
Grid File
Inserting:
If there is room in a block of the proper bucket, insert the record there.
If there is no room, either add overflow blocks to the bucket or reorganize the structure by adding or moving grid lines.
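The first option, overflow blocks, can be sketched by modelling each bucket as a chain of fixed-size blocks. The class name `GridFileWithOverflow`, the block capacity of 2, and the sample points are all hypothetical and chosen only to make the overflow visible. Reorganizing with new grid lines is not shown here.

```python
import bisect

BLOCK_CAPACITY = 2  # hypothetical: records per block, kept tiny for illustration


class GridFileWithOverflow:
    """Grid file sketch in which each bucket is a chain of fixed-size blocks."""

    def __init__(self, grid_lines):
        self.grid_lines = [sorted(lines) for lines in grid_lines]
        self.buckets = {}  # bucket coordinates -> list of blocks (lists of records)

    def bucket_of(self, point):
        return tuple(bisect.bisect_right(lines, v)
                     for lines, v in zip(self.grid_lines, point))

    def insert(self, point):
        blocks = self.buckets.setdefault(self.bucket_of(point), [[]])
        if len(blocks[-1]) < BLOCK_CAPACITY:
            blocks[-1].append(point)   # room in the bucket's last block
        else:
            blocks.append([point])     # no room: chain a new overflow block


gf = GridFileWithOverflow([[40, 55], [90, 225]])
for p in [(45, 100), (50, 150), (52, 200)]:   # hypothetical records
    gf.insert(p)
print(gf.buckets[(1, 1)])  # [[(45, 100), (50, 150)], [(52, 200)]]
```

Overflow chains keep insertion cheap, but every lookup in that bucket now reads more blocks. That cost is what motivates reorganizing the grid instead.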
Grid File
Reorganizing the structure: adding a grid line splits every bucket along that line, not just the overflowing one.
It may not be possible to choose a new line that does best for all of those buckets.
A new line may, for example, create too many empty buckets or leave several buckets very full.
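The side effect of adding a grid line shows up when you count the points in every bucket, including empty ones, before and after the change. The helper names and the five sample points below are hypothetical. The points are chosen so that three of them share the central bucket of the age/salary grid.

```python
import bisect
from itertools import product


def occupancy(grid_lines, points):
    """Number of points in every bucket of the grid, including empty buckets."""
    lines = [sorted(l) for l in grid_lines]
    counts = {cell: 0 for cell in product(*(range(len(l) + 1) for l in lines))}
    for p in points:
        cell = tuple(bisect.bisect_right(l, v) for l, v in zip(lines, p))
        counts[cell] += 1
    return counts


def add_grid_line(grid_lines, dim, value):
    """New grid lines with one extra line in dimension dim.

    Every bucket in the stripe the line crosses is split in two.
    """
    new = [sorted(l) for l in grid_lines]
    bisect.insort(new[dim], value)
    return new


pts = [(45, 100), (50, 150), (52, 200), (30, 50), (60, 300)]  # hypothetical
before = occupancy([[40, 55], [90, 225]], pts)
after = occupancy(add_grid_line([[40, 55], [90, 225]], 0, 51), pts)
print(len(before), sum(c == 0 for c in before.values()), max(before.values()))  # 9 6 3
print(len(after), sum(c == 0 for c in after.values()), max(after.values()))     # 12 8 2
```

The line at age 51 relieves the crowded bucket (from 3 points down to at most 2), but it also splits the two empty buckets in the same column, so the grid ends up with more empty buckets.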
Grid File
Example: inserting the point (52, 200) into the central bucket (40 ≤ age < 55, 90 ≤ salary < 225).
Adding a vertical grid line at age = 51 doesn't help: it does not usefully split any other bucket and only creates 3 empty buckets.
Partitioned Hashing
Example: three bits are used for the bucket number.
The leftmost bit is determined by the first attribute, using $h_1(v) = v \bmod 2$; the two rightmost bits are determined by the second attribute, using $h_2(v) = v \bmod 4$.
$h_1(25) = 25 \bmod 2 = 1_{10} = 1_2$
$h_2(60) = 60 \bmod 4 = 0_{10} = 00_2$
Therefore $h(25, 60) = 100$.
$h_1(45) = 45 \bmod 2 = 1_{10} = 1_2$
$h_2(350) = 350 \bmod 4 = 2_{10} = 10_2$
Therefore $h(45, 350) = 110$.
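The same computation in code: the first attribute's 1-bit hash is shifted left past the second attribute's 2 bits, and the two are combined with bitwise OR. The mod-2 and mod-4 hashes come from the example. The function names are hypothetical.

```python
def h1(value):
    """Hash for the first attribute: 1 bit (value mod 2), as in the example."""
    return value % 2


def h2(value):
    """Hash for the second attribute: 2 bits (value mod 4), as in the example."""
    return value % 4


def bucket(a, b):
    """Three-bit bucket number: h1(a) is the leftmost bit, h2(b) the two rightmost."""
    return (h1(a) << 2) | h2(b)


print(format(bucket(25, 60), "03b"))   # 100
print(format(bucket(45, 350), "03b"))  # 110
```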
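Grid File vs. Partitioned Hashing
Partial-match queries: partitioned hashing, since each specified attribute fixes its bits of the bucket number.
Nearest-neighbor queries: grid file, since hashing scatters nearby points into unrelated buckets.
Range queries: grid file, for the same reason.
However, with either method we no longer have the advantage that the answer is in exactly one bucket, but both still limit the search to a subset of the buckets.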
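The "subset of the buckets" each method has to search can be computed directly, as in the sketch below. It uses the 1 + 2 bit split from the partitioned-hashing example and the age/salary grid lines from the grid-file example. The function names are hypothetical. For partitioned hashing, an unspecified attribute may contribute any of its bit patterns. For the grid file, a range query touches only the buckets whose stripes overlap the range in every dimension.

```python
import bisect
from itertools import product

FIRST_BITS, SECOND_BITS = 1, 2  # bit split from the partitioned-hashing example


def hash_buckets_for_partial_match(a=None, b=None):
    """Partitioned hashing: bucket numbers that may hold matching (a, b) records.

    An unspecified attribute (None) may contribute any of its bit patterns.
    """
    firsts = [a % 2 ** FIRST_BITS] if a is not None else range(2 ** FIRST_BITS)
    seconds = [b % 2 ** SECOND_BITS] if b is not None else range(2 ** SECOND_BITS)
    return sorted((f << SECOND_BITS) | s for f, s in product(firsts, seconds))


def grid_buckets_for_range(grid_lines, lows, highs):
    """Grid file: every bucket overlapping lows <= point <= highs.

    grid_lines[i] must be the sorted grid-line values of dimension i.
    """
    stripe_ranges = [
        range(bisect.bisect_right(lines, lo), bisect.bisect_right(lines, hi) + 1)
        for lines, lo, hi in zip(grid_lines, lows, highs)
    ]
    return list(product(*stripe_ranges))


print(hash_buckets_for_partial_match(a=25))  # [4, 5, 6, 7]: half of the 8 buckets
print(hash_buckets_for_partial_match(b=350)) # [2, 6]: a quarter of the buckets
print(grid_buckets_for_range([[40, 55], [90, 225]], (35, 100), (45, 250)))
# [(0, 1), (0, 2), (1, 1), (1, 2)]
```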
Tree-Like Structures
Multiple-key indexes: a tree in which the nodes at each level are indexes on one attribute.
kd-trees (k-dimensional search trees): a binary tree.
Note: with these structures we lose the advantage B-trees give us of always being balanced.
Multiple-key Indexes
Very efficient for partial-match queries when the first attribute is specified; if it is not, every second-level index must be searched.
Works quite well for range queries.
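Here is a two-level multiple-key index in miniature: the first level indexes attribute 0, and each of its entries points to a second-level index on attribute 1. As an assumption for illustration, sorted Python lists with `bisect` stand in for the index at each level, where a real system would more likely use B-trees. The class name and sample records are hypothetical.

```python
import bisect


class MultipleKeyIndex:
    """Two-level index sketch: level 1 on attribute 0, level 2 on attribute 1."""

    def __init__(self, records):
        groups = {}
        for r in records:
            groups.setdefault(r[0], []).append(r)
        self.keys = sorted(groups)  # first-level index on attribute 0
        # One second-level index per first-level key, sorted on attribute 1.
        self.sub = {k: sorted(groups[k], key=lambda r: r[1]) for k in self.keys}
        self.sub_keys = {k: [r[1] for r in self.sub[k]] for k in self.keys}

    def _second_level(self, k, lo, hi):
        ks = self.sub_keys[k]
        return self.sub[k][bisect.bisect_left(ks, lo):bisect.bisect_right(ks, hi)]

    def range_query(self, lo0, hi0, lo1, hi1):
        """Records with lo0 <= attr0 <= hi0 and lo1 <= attr1 <= hi1."""
        i, j = bisect.bisect_left(self.keys, lo0), bisect.bisect_right(self.keys, hi0)
        return [r for k in self.keys[i:j] for r in self._second_level(k, lo1, hi1)]

    def partial_match(self, a0=None, a1=None):
        """Records matching the attribute values given (None = unspecified)."""
        if a0 is not None:
            keys = [a0] if a0 in self.sub else []  # one path from the root
        else:
            keys = self.keys  # first attribute unknown: search every sub-index
        if a1 is None:
            return [r for k in keys for r in self.sub[k]]
        return [r for k in keys for r in self._second_level(k, a1, a1)]


idx = MultipleKeyIndex([(25, 60), (45, 60), (50, 75), (50, 100), (50, 120), (70, 110)])
print(idx.range_query(40, 55, 70, 110))  # [(50, 75), (50, 100)]
print(idx.partial_match(a1=60))          # [(25, 60), (45, 60)]
```

The `partial_match(a1=60)` call shows the weak case: with the first attribute unspecified, every second-level index is probed.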
kd-tree Index
A binary tree.
Interior nodes hold an attribute, a dividing value for that attribute, and pointers to the left and right children.
Leaves are blocks, with space for as many records as a block can hold.
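Below is a sketch of a kd-tree with the node layout just described. Several details are assumptions for illustration and are not taken from the slides: the tiny block capacity, sending records with value < dividing value left and all others right, choosing the split attribute by cycling through the attributes by depth, and using the median as the dividing value. When every record in an over-full block has the same value on the preferred attribute, the code tries the next attribute. If all records are identical, the block is simply left over-full. All class and method names are hypothetical.

```python
BLOCK_CAPACITY = 2  # hypothetical: records per leaf block, tiny for illustration


class Leaf:
    """A leaf is a block of records."""
    def __init__(self, records):
        self.records = records


class Interior:
    """Interior node: an attribute, a dividing value, and two children."""
    def __init__(self, attr, value, left, right):
        self.attr, self.value = attr, value  # record[attr] < value goes left
        self.left, self.right = left, right  # record[attr] >= value goes right


class KdTree:
    def __init__(self, k):
        self.k = k  # number of attributes per record
        self.root = Leaf([])

    def insert(self, record):
        self.root = self._insert(self.root, record, 0)

    def _insert(self, node, record, depth):
        if isinstance(node, Interior):
            if record[node.attr] < node.value:
                node.left = self._insert(node.left, record, depth + 1)
            else:
                node.right = self._insert(node.right, record, depth + 1)
            return node
        node.records.append(record)
        if len(node.records) <= BLOCK_CAPACITY:
            return node
        return self._split(node, depth)

    def _split(self, leaf, depth):
        # Prefer attribute depth mod k; try the others if it cannot separate records.
        for offset in range(self.k):
            attr = (depth + offset) % self.k
            values = sorted(r[attr] for r in leaf.records)
            value = values[len(values) // 2]  # median as the dividing value
            left = [r for r in leaf.records if r[attr] < value]
            right = [r for r in leaf.records if r[attr] >= value]
            if left:  # right is never empty: the median itself goes right
                return Interior(attr, value, Leaf(left), Leaf(right))
        return leaf  # all records identical: leave the block over-full

    def lookup(self, record):
        node = self.root
        while isinstance(node, Interior):
            node = node.left if record[node.attr] < node.value else node.right
        return record in node.records

    def blocks(self, node=None):
        """Record lists of all leaf blocks, left to right."""
        node = self.root if node is None else node
        if isinstance(node, Leaf):
            return [node.records]
        return self.blocks(node.left) + self.blocks(node.right)


tree = KdTree(2)
pts = [(25, 60), (45, 60), (50, 75), (50, 100), (50, 120), (70, 110), (85, 140)]
for p in pts:
    tree.insert(p)
print(tree.lookup((50, 100)), tree.lookup((99, 99)))  # True False
```

Inserting these points in this order produces a long right spine, which illustrates the earlier note that such trees are not kept balanced.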