Multidimensional Indexing

More Dimensions
Some applications require us to view data in two or more dimensions, e.g. a Geographical Information System.
Roughly, every attribute of the data can be seen as a dimension.

Example Queries
Partial match: looking for the data items with specified values in one or more of the dimensions.
Range: looking for the data items that fall within a specified range in each dimension.
Nearest-neighbor: looking for the closest point to a given point, e.g. the city of over 100,000 population closest to a given city.
Where-am-I: finding out where a specific point is located, e.g. locating the mouse pointer on the screen.

Using Conventional Indexes
Suppose 1,000,000 points are distributed randomly in a 2D space with x and y each ranging from 0 to 1000, and we are looking for the points with 450 < x < 550 and 450 < y < 550, i.e. an area of 100 x 100.
Using a B-tree on x, we find about 100,000 pointers to points whose x value is within the range.
One way to proceed is to retrieve all those points and check their y values in order to find the roughly 10,000 points in the intersection.
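The counts on this slide follow from the uniform-distribution assumption. A minimal sketch of the arithmetic is given here; the variable names are illustrative, and the results are expected counts rather than exact ones.

```python
# Expected number of points retrieved, assuming points are spread
# uniformly over a 1000 x 1000 square (as the slide supposes).
total_points = 1_000_000
side = 1000                      # x and y both range over 0..1000
x_lo, x_hi = 450, 550
y_lo, y_hi = 450, 550

# A one-dimensional index on x can only narrow the search to the x-stripe.
matches_x = total_points * (x_hi - x_lo) // side

# Only the fraction of the stripe that also satisfies the y-range is wanted.
matches_both = matches_x * (y_hi - y_lo) // side

# Records fetched through the x index and then thrown away.
wasted = matches_x - matches_both

print(matches_x, matches_both, wasted)
```

Under these assumptions, nine out of every ten records fetched through the x index are discarded after their y value is checked, which is the motivation for a genuinely multidimensional index.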

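Using Conventional Indexes
Almost any data structure lets us approximate a nearest-neighbor query by specifying a range in each dimension around the target point.
The range query alone, however, does not tell us which of the returned points is closest, and the range may contain no point at all.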
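One way to turn range queries into a nearest-neighbor search is to widen the range until the answer is certain. The sketch below is an illustration under stated assumptions, not a method taken from the slides: `range_query` is a stand-in linear scan for whatever index answers range queries, and the starting half-width `d` and the doubling step are arbitrary choices.

```python
import math

def range_query(points, x_lo, x_hi, y_lo, y_hi):
    # Stand-in for an index lookup: return the points inside the rectangle.
    return [(x, y) for (x, y) in points
            if x_lo <= x <= x_hi and y_lo <= y <= y_hi]

def nearest_neighbor(points, q, d=1.0):
    """Find the point closest to q by repeated square range queries."""
    if not points:
        return None
    qx, qy = q
    while True:
        candidates = range_query(points, qx - d, qx + d, qy - d, qy + d)
        if candidates:
            best = min(candidates, key=lambda p: math.dist(p, q))
            # Every point outside the square is farther than d from q,
            # so a candidate within distance d is the true nearest point.
            if math.dist(best, q) <= d:
                return best
        d *= 2  # nothing found, or the winner might lie outside: widen
```

For example, with points `[(5, 5), (0, 6), (10, 0)]`, target `(0, 0)` and `d=5.5`, the first square contains only `(5, 5)`, which is farther than 5.5 from the target, so the range is widened and `(0, 6)` is returned.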

Multidimensional Indexes
Hash-table-like structures
Tree-like structures

Multidimensional Indexes: Hash-Table-Like Structures
Grid file: does not hash; it partitions each dimension by sorting the values along that dimension.
Partitioned hashing: does hash the dimensions; each dimension contributes part of the bucket number.

Grid File (Hash Table)
Each region can be thought of as a bucket of a hash table.
Each point in a region has its record placed in a block belonging to that bucket.
For example, the central rectangle represents data items with 40 ≤ age < 55 and 90 ≤ salary < 225.

Grid File
Instead of a one-dimensional array of buckets, a grid file uses an array with as many dimensions as the data file.
Locating a bucket is different from applying a hash function: the positions of the data item in each dimension together determine the bucket.
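The bucket lookup described above can be sketched as a binary search over the grid lines of each dimension. The grid lines below (age 40 and 55, salary 90 and 225) are an assumption inferred from the central rectangle mentioned on the earlier slide; the function and variable names are illustrative.

```python
from bisect import bisect_right
from collections import defaultdict

# Assumed grid lines: two per dimension give a 3 x 3 grid of buckets.
AGE_LINES = [40, 55]
SALARY_LINES = [90, 225]

def grid_cell(age, salary):
    """Return the (age stripe, salary stripe) pair that identifies the bucket.

    Stripe i of a dimension holds the values v with lines[i-1] <= v < lines[i],
    so no hash function is applied: the position of the value among the
    sorted grid lines decides the stripe.
    """
    return (bisect_right(AGE_LINES, age), bisect_right(SALARY_LINES, salary))

buckets = defaultdict(list)  # cell -> records stored in that bucket

def insert(age, salary):
    buckets[grid_cell(age, salary)].append((age, salary))

insert(45, 100)   # lands in the central rectangle, cell (1, 1)
insert(25, 60)    # lands in the lower-left corner, cell (0, 0)
```

A range query would compute the cell of each corner of the query rectangle and visit every bucket between them, which is why nearby points in nearby buckets make grid files a reasonable fit for range searches.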

Grid File
Inserting: if there is room in the block of the proper bucket, we insert the record there.
If there is no room, we either add overflow blocks to the bucket or reorganize the structure by adding or moving grid lines.

Grid File
Reorganizing the structure: adding a grid line splits all the buckets along that line.
It may not be possible to select a new line that works well for all buckets.
For example, the new line may create too many empty buckets or leave several buckets very full.

Grid File
Example: inserting the point (52, 200).
The vertical line age = 51 doesn't help, since it doesn't split any other bucket; it only creates 3 empty buckets.

Partitioned Hashing
Example: three bits are used for the bucket number.
The leftmost bit is determined by the first attribute.
The two rightmost bits are determined by the second attribute.
h(25) = 25 % 2 = 1₁₀ = 1₂
h(60) = 60 % 4 = 0₁₀ = 0₂ = 00₂
Therefore h(25, 60) = 100
h(45) = 45 % 2 = 1₁₀ = 1₂
h(350) = 350 % 4 = 2₁₀ = 10₂
Therefore h(45, 350) = 110
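The worked example above can be reproduced in a few lines. This is a minimal sketch; the function names and the parameter names `a` and `b` (first and second attribute) are illustrative only.

```python
def partitioned_hash(a, b):
    """Three-bit bucket number: one bit from a, two bits from b."""
    first_bit = a % 2        # leftmost bit, from the first attribute
    last_two_bits = b % 4    # two rightmost bits, from the second attribute
    return (first_bit << 2) | last_two_bits

def buckets_for_first_attribute(a):
    """Buckets to search in a partial match query that fixes only a."""
    first_bit = a % 2
    return [(first_bit << 2) | low for low in range(4)]

print(format(partitioned_hash(25, 60), "03b"))   # 100
print(format(partitioned_hash(45, 350), "03b"))  # 110
```

When only the first attribute is known, `buckets_for_first_attribute` shows that the search is confined to four of the eight buckets, since the leftmost bit is fixed.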

Grid File vs. Partitioned Hashing
Partial match query: partitioned hashing
Nearest neighbor: grid file
Range query: grid file
With these methods, however, we no longer have the advantage that the answer lies in exactly one bucket; they still limit the search to a subset of the buckets.

Tree-Like Structures
Multiple-key indexes: a tree in which the nodes at each level are indexes for one attribute.
kd-trees (k-dimensional search trees): binary trees.
Note: with these structures we lose the advantage of having balanced trees.

Multiple-key Indexes
Very efficient for partial match queries that specify the first attribute.
Work quite well for range queries.
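A two-level multiple-key index can be sketched as an index on the first attribute whose entries each lead to an index on the second. The class below is a simplified in-memory illustration, with sorted lists standing in for the real index structures; the class name, method names, and sample points are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right, insort

class MultipleKeyIndex:
    def __init__(self):
        self.first_keys = []  # sorted distinct values of the first attribute
        self.sub = {}         # first value -> sorted second-attribute values

    def insert(self, a, b):
        if a not in self.sub:
            insort(self.first_keys, a)
            self.sub[a] = []
        insort(self.sub[a], b)

    def partial_match(self, a):
        # One lookup in the top-level index reaches the whole answer.
        return [(a, b) for b in self.sub.get(a, [])]

    def range_query(self, a_lo, a_hi, b_lo, b_hi):
        result = []
        i = bisect_left(self.first_keys, a_lo)
        j = bisect_right(self.first_keys, a_hi)
        for a in self.first_keys[i:j]:          # range on the first level
            bs = self.sub[a]
            lo, hi = bisect_left(bs, b_lo), bisect_right(bs, b_hi)
            result.extend((a, b) for b in bs[lo:hi])  # range on the second
        return result

index = MultipleKeyIndex()
for point in [(25, 60), (25, 400), (45, 60), (45, 350),
              (50, 75), (50, 100), (50, 120), (70, 110)]:
    index.insert(*point)
```

A partial match on the second attribute alone gets no help from the top level and has to visit every sub-index, which is the weak spot of this layout.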

kd-tree Index
A binary tree.
Interior nodes have an attribute, a dividing value for that attribute, and pointers to left and right children.
Leaves are blocks, with space for as many records as a block can hold.

kd-tree Index
Inserting the data item (35, 500): if there is no room in the proper block, we split the leaf node and create a new interior node.