2 More Dimensions
- There are applications that require us to view data in two or more dimensions, e.g. Geographic Information Systems (GIS).
- Roughly, every attribute of the data can be seen as a dimension.
3 Example Queries
- Partial match: find all data items that have specified values in one or more of the dimensions.
- Range: find all data items that fall within a specified range in one or more of the dimensions.
- Nearest neighbor: find the closest point to a given point, e.g. the closest city above a given population to a given city.
- Where-am-I: find out where a specific point is located, e.g. locating the mouse pointer on the screen.
4 Using Conventional Indexes
- Suppose points are distributed randomly in a 2D space with x and y ranging from 0 to 1000.
- We are looking for points with 450 < x < 550 and 450 < y < 550, i.e. a 100 x 100 area.
- Using a B-tree on x, we find pointers to all points whose x is within the range.
- One way to find the points in the intersection is to retrieve all of those points and check their y values. With uniformly distributed points, this retrieves about 10% of all points to find the roughly 1% that qualify.
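The strategy above can be sketched in a few lines. This is a minimal illustration, not a real B-tree: a sorted Python list searched with `bisect` stands in for the index on x, and the function names `build_x_index` and `range_query_via_x` are made up for this example.

```python
import bisect
import random

def build_x_index(points):
    """Sort (x, y) pairs by x; the sorted list stands in for a B-tree on x."""
    return sorted(points)

def range_query_via_x(index, x_lo, x_hi, y_lo, y_hi):
    """Return (matches, candidates) for x_lo < x < x_hi and y_lo < y < y_hi."""
    # Binary search for the run of entries whose x lies strictly inside the range.
    start = bisect.bisect_right(index, (x_lo, float("inf")))
    stop = bisect.bisect_left(index, (x_hi, float("-inf")))
    candidates = index[start:stop]
    # Every candidate must be fetched and its y value checked.
    matches = [(x, y) for x, y in candidates if y_lo < y < y_hi]
    return matches, candidates

if __name__ == "__main__":
    random.seed(0)
    points = [(random.uniform(0, 1000), random.uniform(0, 1000))
              for _ in range(100_000)]
    matches, candidates = range_query_via_x(build_x_index(points), 450, 550, 450, 550)
    # Number of points fetched through the x index vs. number that qualify.
    print(len(candidates), len(matches))
```

With uniformly random points, the first printed number should be roughly ten times the second, which is the cost this slide points out.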
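5 Using Conventional Indexes
- Almost any data structure lets us approximate a nearest-neighbor query by specifying a range in each dimension around the target point, but the range query alone does not tell us which point is closest: the range may be empty, and the closest point inside it is not necessarily the closest point overall.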
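One common way to work around this is to repeat the range query with a growing range until the answer is provably correct. The sketch below assumes Euclidean distance and uses a linear scan as a stand-in for whatever index answers the range query; `range_query` and `nearest_neighbor` are hypothetical names for this example.

```python
import math

def range_query(points, cx, cy, d):
    """Points inside the square of half-width d centred on (cx, cy).
    A linear scan stands in for an index-based range query."""
    return [(x, y) for x, y in points if abs(x - cx) <= d and abs(y - cy) <= d]

def nearest_neighbor(points, cx, cy, d=1.0):
    """Nearest point to (cx, cy), found by repeated range queries (d must be > 0)."""
    if not points:
        return None
    while True:
        found = range_query(points, cx, cy, d)
        if not found:
            d *= 2  # empty range: try a larger square
            continue
        best = min(found, key=lambda p: math.dist(p, (cx, cy)))
        r = math.dist(best, (cx, cy))
        if r <= d:
            # Every point outside the square is farther than d, hence farther than r.
            return best
        # A closer point could sit just outside the square (near the middle of a side),
        # so widen the square to cover the whole circle of radius r and retry.
        d = r
```

For example, with points (0, 9) and (8, 8), target (0, 0) and an initial half-width of 8.5, the first range query finds only (8, 8); the second, wider query reveals that (0, 9) is in fact closer.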
6 Multidimensional Indexes
- Hash-table-like structures
- Tree-like structures
7 Multidimensional Indexes
Hash-table-like structures:
- Grid file: does not hash; it partitions each dimension with grid lines, based on the sorted order of the values along that dimension.
- Partitioned hashing: does hash the various dimensions; each dimension contributes part of the bucket number.
8 Grid File (Hash Table)
- Each region can be thought of as a bucket of a hash table.
- Each point in a region has its record placed in a block belonging to that bucket.
- For example, the central rectangle represents data items with 40 ≤ age < 55 and 90 ≤ salary < 225.
9 Grid File
- Instead of a one-dimensional array of buckets, a grid file uses an array with as many dimensions as the data file.
- "Hashing" a point here is different from applying a hash function.
- The positions of the data item in each of the dimensions together determine the bucket.
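As a rough illustration of how the position in each dimension selects the bucket, the sketch below locates a point's grid cell by binary search over the grid lines. The grid lines 40 and 55 for age and 90 and 225 for salary are taken from the central-rectangle example; the function name `bucket_of` and the list-of-lists layout are assumptions for this example.

```python
import bisect

# One sorted list of grid lines per dimension: (age, salary).
GRID_LINES = [[40, 55], [90, 225]]

def bucket_of(point, grid_lines=GRID_LINES):
    """Return the bucket index of a point as one stripe number per dimension.
    Stripe i covers values v with line[i-1] <= v < line[i]."""
    return tuple(bisect.bisect_right(lines, v)
                 for lines, v in zip(grid_lines, point))

# (45, 100) falls in the central rectangle: 40 <= age < 55 and 90 <= salary < 225.
print(bucket_of((45, 100)))  # (1, 1)
```

No hash function is applied: the bucket index comes directly from where the values fall between the grid lines.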
11 Grid File
Inserting:
- If there is room in the block of the proper bucket, insert the record there.
- If there is no room, either:
  - add overflow blocks to the bucket, or
  - reorganize the structure by adding or moving grid lines.
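The overflow-block option can be sketched as follows. This is a simplified model in which a bucket is a Python list of blocks and each block holds at most `block_capacity` records; the capacity of 2 and the name `insert_record` are assumptions for illustration, and reorganizing grid lines is not shown.

```python
def insert_record(bucket, record, block_capacity=2):
    """Insert a record into a bucket (a list of blocks).
    Returns True if an overflow block had to be added."""
    for block in bucket:
        if len(block) < block_capacity:
            block.append(record)  # room in an existing block
            return False
    # No room anywhere: start a new block. It counts as an overflow block
    # only if the bucket already had at least one block.
    is_overflow = len(bucket) > 0
    bucket.append([record])
    return is_overflow

bucket = []
for rec in [(45, 60), (50, 75), (52, 200)]:
    insert_record(bucket, rec)
print(bucket)  # [[(45, 60), (50, 75)], [(52, 200)]]
```

Overflow blocks keep insertion cheap, at the price of longer chains to scan on lookup.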
12 Grid File
Reorganizing the structure:
- Adding a grid line splits all the buckets along that line.
- It may not be possible to select a new line that works well for all buckets.
- A poor choice may, for example, create too many empty buckets or leave several buckets very full.
13 Grid File
Example: inserting point (52, 200)
- A vertical line at age = 51 doesn't help: it doesn't split any other bucket, and it only creates 3 empty buckets.
14 Partitioned Hashing
Example: three bits are used for the bucket number.
- The leftmost bit is determined by the first attribute.
- The two rightmost bits are determined by the second attribute.
- $h(25) = 25 \bmod 2 = 1_{10} = 1_2$
- $h(60) = 60 \bmod 4 = 0_{10} = 0_2 = 00_2$
- Therefore $h(25, 60) = 100_2$.
- $h(45) = 45 \bmod 2 = 1_{10} = 1_2$
- $h(350) = 350 \bmod 4 = 2_{10} = 10_2$
- Therefore $h(45, 350) = 110_2$.
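The same computation can be written as a short function. This sketch hard-codes the bit allocation of the example (one bit from the first attribute, two bits from the second); the name `partitioned_hash` is made up for this illustration.

```python
def partitioned_hash(first, second):
    """Three-bit bucket number: 1 bit from `first`, 2 bits from `second`."""
    left_bit = first % 2     # leftmost bit
    right_bits = second % 4  # two rightmost bits
    return (left_bit << 2) | right_bits

print(format(partitioned_hash(25, 60), "03b"))   # 100
print(format(partitioned_hash(45, 350), "03b"))  # 110
```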
15 Grid File <-> Partitioned Hash
- Partial-match query -> partitioned hashing
- Nearest-neighbor query -> grid file
- Range query -> grid file
- With these methods we no longer have the advantage that the answer lies in exactly one bucket, but they still limit the search to a subset of the buckets.
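To see why a partial-match query narrows the search under partitioned hashing, the sketch below lists the buckets that must be examined when only some attributes are known. It assumes a three-bit scheme with one bit from the first attribute (value mod 2) and two bits from the second (value mod 4); `candidate_buckets` is a hypothetical helper.

```python
def candidate_buckets(first=None, second=None):
    """Bucket numbers that could hold matches; None means 'attribute not specified'."""
    left_options = [first % 2] if first is not None else [0, 1]
    right_options = [second % 4] if second is not None else [0, 1, 2, 3]
    return [(left << 2) | right
            for left in left_options
            for right in right_options]

print(candidate_buckets(first=25))    # [4, 5, 6, 7]  -> 4 of the 8 buckets
print(candidate_buckets(second=350))  # [2, 6]        -> 2 of the 8 buckets
print(candidate_buckets(25, 60))      # [4]           -> exactly one bucket
```

Each specified attribute fixes its share of the bits, so the more attributes are given, the fewer buckets remain to be searched.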
16 Tree-Like Structures
- Multiple-key indexes: a tree in which the nodes at each level are indexes for one attribute.
- kd-trees (k-dimensional search trees): a binary tree in which different levels split on different attributes.
- Note: with these structures we lose the advantage of having balanced trees.
17 Multiple-key Indexes
- Very efficient for partial-match queries, especially when the first attribute is specified.
- Work quite well for range queries.
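A two-level multiple-key index can be approximated with nested dictionaries plus sorted key lists. This is only a sketch under that assumption (a real implementation would use disk-based indexes at each level), and the class name `MultiKeyIndex` and its methods are made up for this example.

```python
import bisect

class MultiKeyIndex:
    """Index on the first attribute whose entries lead to indexes on the second."""

    def __init__(self, records):
        self.root = {}  # first value -> {second value -> [records]}
        for a, b in records:
            self.root.setdefault(a, {}).setdefault(b, []).append((a, b))
        self.a_keys = sorted(self.root)
        self.b_keys = {a: sorted(sub) for a, sub in self.root.items()}

    def partial_match(self, a):
        """All records whose first attribute equals a."""
        return [rec for b in self.b_keys.get(a, []) for rec in self.root[a][b]]

    def range_query(self, a_lo, a_hi, b_lo, b_hi):
        """All records with a_lo <= a <= a_hi and b_lo <= b <= b_hi."""
        out = []
        i = bisect.bisect_left(self.a_keys, a_lo)
        j = bisect.bisect_right(self.a_keys, a_hi)
        for a in self.a_keys[i:j]:  # only first-level entries in range
            keys = self.b_keys[a]
            k = bisect.bisect_left(keys, b_lo)
            m = bisect.bisect_right(keys, b_hi)
            for b in keys[k:m]:     # only second-level entries in range
                out.extend(self.root[a][b])
        return out

index = MultiKeyIndex([(25, 60), (25, 400), (45, 60), (45, 350), (50, 75)])
print(index.partial_match(25))             # [(25, 60), (25, 400)]
print(index.range_query(40, 50, 60, 100))  # [(45, 60), (50, 75)]
```

A partial match on the second attribute alone would have to visit every second-level index, which is why the order of attributes matters.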
18 kd-tree Index
- A binary tree.
- Interior nodes have an attribute, a dividing value for that attribute, and pointers to left and right children.
- Leaves are blocks, with space for as many records as a block can hold.
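The node layout described above can be sketched as two small classes plus a builder and a range search. Several choices here are assumptions for illustration only: attributes alternate by depth, the dividing value is the median, a block holds 2 records, and values below the dividing value go left.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    records: list    # a block of records

@dataclass
class Interior:
    attr: int        # index of the attribute this node splits on
    value: float     # dividing value: left holds < value, right holds >= value
    left: object
    right: object

def build(points, depth=0, block_size=2, dims=2):
    """Build a kd-tree whose leaves hold at most block_size records."""
    if len(points) <= block_size:
        return Leaf(list(points))
    attr = depth % dims
    pts = sorted(points, key=lambda p: p[attr])
    value = pts[len(pts) // 2][attr]
    left = [p for p in pts if p[attr] < value]
    right = [p for p in pts if p[attr] >= value]
    if not left:
        # Too many equal values to split here; keep one oversized leaf
        # (a real structure would use overflow blocks or another attribute).
        return Leaf(pts)
    return Interior(attr, value,
                    build(left, depth + 1, block_size, dims),
                    build(right, depth + 1, block_size, dims))

def range_search(node, lo, hi):
    """Records p with lo[i] <= p[i] <= hi[i] for every attribute i."""
    if isinstance(node, Leaf):
        return [p for p in node.records
                if all(l <= v <= h for l, v, h in zip(lo, p, hi))]
    found = []
    if lo[node.attr] < node.value:    # range reaches below the dividing value
        found += range_search(node.left, lo, hi)
    if hi[node.attr] >= node.value:   # range reaches the dividing value or above
        found += range_search(node.right, lo, hi)
    return found
```

A range search only descends into the subtrees whose side of the dividing value overlaps the query range, so large parts of the tree can often be skipped.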