Multidimensional Access Structures COMP3017 Advanced Databases Dr Nicholas Gibbins – 2012-2013.

Slides:



Advertisements
Similar presentations
Multidimensional Index Structures One dimensional index structures assume a single search key, and retrieve records that match a given search-key value.
Advertisements

CpSc 3220 File and Database Processing Lecture 17 Indexed Files.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part C Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
Multidimensional Indexing
Hashing and Indexing John Ortiz.
Chapter 11 Indexing and Hashing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
CS144: Spatial Index. Example Dataset Grid File (2 points per bucket)
Searching on Multi-Dimensional Data
Multidimensional Data. Many applications of databases are "geographic" = 2­dimensional data. Others involve large numbers of dimensions. Example: data.
Multidimensional Data Rtrees Bitmap indexes. R-Trees For “regions” (typically rectangles) but can represent points. Supports NN, “where­am­I” queries.
Multidimensional Data
1 Introduction to Database Systems CSE 444 Lectures 19: Data Storage and Indexes November 14, 2007.
Multidimensional Data. Many applications of databases are "geographic" = 2­dimensional data. Others involve large numbers of dimensions. Example: data.
COMP 451/651 B-Trees Size and Lookup Chapter 1.
Spatial Indexing I Point Access Methods. PAMs Point Access Methods Multidimensional Hashing: Grid File Exponential growth of the directory Hierarchical.
Spatial indexing PAMs (II).
BTrees & Bitmap Indexes
Multiple-key indexes Index on one attribute provides pointer to an index on the other. If V is a value of the first attribute, then the index we reach.
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
Efficient Storage and Retrieval of Data
Quick Review of Apr 15 material Overflow –definition, why it happens –solutions: chaining, double hashing Hash file performance –loading factor –search.
Spatial Indexing I Point Access Methods.
1 Lecture 20: Indexes Friday, February 25, Outline Representing data elements (12) Index structures (13.1, 13.2) B-trees (13.3)
1 Geometric index structures April 15, 2004 Based on GUW Chapter , [Arge01] Sections 1, 2.1 (persistent B- trees), 3-4 (static versions.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
COMP 451/651 Multiple-key indexes
1 Indexing Structures for Files. 2 Basic Concepts  Indexing mechanisms used to speed up access to desired data without having to scan entire.
B + -Trees and Trees for Multidimensional Data Jan. 2012Yangjun Chen ACS Database Index Techniques B + - tree kd – tree Quad - tree R – tree Bitmap.
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #9.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
Multidimensional Data Many applications of databases are ``geographic'' = 2­dimensional data. Others involve large numbers of dimensions. Example: data.
1 CS 728 Advanced Database Systems Chapter 17 Database File Indexing Techniques, B- Trees, and B + -Trees.
Indexing and Hashing.
Chapter 61 Chapter 6 Index Structures for Files. Chapter 62 Indexes Indexes are additional auxiliary access structures with typically provide either faster.
Indexing structures for files D ƯƠ NG ANH KHOA-QLU13082.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
Spatial Data Management Chapter 28. Types of Spatial Data Point Data –Points in a multidimensional space E.g., Raster data such as satellite imagery,
Mutlidimensional Indices Instructor: Randal Burns Lecture for 29 November 2005 Computer Science Johns Hopkins University.
1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.
Chapter 11 Indexing & Hashing. 2 n Sophisticated database access methods n Basic concerns: access/insertion/deletion time, space overhead n Indexing 
METU Department of Computer Eng Ceng 302 Introduction to DBMS Indexing Structures for Files by Pinar Senkul resources: mostly froom Elmasri, Navathe and.
Multidimensional Indexes Applications: geographical databases, data cubes. Types of queries: –partial match (give only a subset of the dimensions) –range.
Multi-dimensional Search Trees
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
Indexing and hashing Azita Keshmiri CS 157B. Basic concept An index for a file in a database system works the same way as the index in text book. For.
Chapter 5 Multidimensional Indexes. One dimensional index can be used to support multidimensional query. F1=‘abcd’ F2= 123‘abcd#123’
Spring 2003 ECE569 Lecture 05.1 ECE 569 Database System Engineering Spring 2003 Yanyong Zhang
1 CPS216: Advanced Database Systems Notes 05: Operators for Data Access (contd.) Shivnath Babu.
Multi-dimensional Search Trees CS302 Data Structures Modified from Dr George Bebis.
Spatial Indexing Techniques Introduction to Spatial Computing CSE 5ISC Some slides adapted from Spatial Databases: A Tour by Shashi Shekhar Prentice Hall.
The Present. Outline Index structures for in-memory Quad trees kd trees Index structures for databases kdB trees Grid files II. Index Structure.
File Organizations and Indexing
Indexing OLAP Data Sunita Sarawagi Monowar Hossain York University.
Spring 2004 ECE569 Lecture 05.1 ECE 569 Database System Engineering Spring 2004 Yanyong Zhang
Chapter 11 Indexing And Hashing (1) Yonsei University 1 st Semester, 2016 Sanghyun Park.
Chapter 5. Multidimensional Indexes
COMP3017 Advanced Databases
CPS216: Data-intensive Computing Systems
Azita Keshmiri CS 157B Ch 12 indexing and hashing
Multidimensional Access Structures
Spatial Indexing I Point Access Methods.
COMP 430 Intro. to Database Systems
Database Management Systems (CS 564)
Multidimensional Indexes
Indexing and Hashing B.Ramamurthy Chapter 11 2/5/2019 B.Ramamurthy.
Database Design and Programming
CPS216: Advanced Database Systems
Lecture 20: Indexes Monday, February 27, 2006.
Presentation transcript:

Multidimensional Access Structures COMP3017 Advanced Databases Dr Nicholas Gibbins –

Overview Conventional indexes Hash-like –grid files, partitioned hashing Hierarchical indexes –multiple key, kd-trees, quad trees, r-trees, ub-trees Bitmap indexes

Multidimensional Access Structures Indexes discussed so far are one-dimensional –assume a single search key –require a single linear order for keys (B-trees) –require that the key be completely known for any lookup (hash tables)

Applications Geographic information systems –partial match queries –range queries –nearest-neighbour queries

Conventional Indexes

Scenario Personnel database EMPLOYEE table with attributes –dept –salary How can we find employees who work in the sales department and have salaries greater than £40,000?

1.Get all matching records using an index on one attribute 2.Check values of other attribute on those records Approach #1 I dept... scan for salary>40000 dept=sales

1.Use secondary indexes on each attribute to get two sets of record pointers 2.Take intersection of sets Approach #2 I dept... compare for intersection dept=sales I salary... salary>40000

1.Use secondary index on one attribute to select suitable index on other attribute 2.Get all matching records using selected index Approach #3 I dept dept=sales... I salary sales research production

For which queries is this index good? dept=sales  salary=40000 dept=sales  salary>40000 dept=sales salary = 40000

Grid Files

Partition multi-dimensional space with a grid Grid lines partition space into stripes Intersections of stripes from different dimensions define regions Grid File salary age k 40k 100k

Each region associated with a pointer to a bucket of record pointers Attribute values for record determine region and therefore bucket Fixed number of regions – overflow blocks used to increase bucket size as necessary Can index grid on value ranges Grid File salary age k 40k 100k

Grid files Pro –Good for multiple-key search –Supports partial-match, range and nearest-neighbour queries Con -Space, management overhead (nothing is free) -Need partitioning ranges that evenly split keys

Partitioned Hash

Hash function takes a list of attribute values as arguments Bits of hash value divided between attributes -Effectively, a hash function per attribute Partitioned Hash hash1hash2 attribute 1attribute 2

Example hash1(sales) = 0 hash1(research) = 1 hash2(10000) = 00 hash2(20000) = 01 hash2(40000)=10 hash2(100000)=11

Insertion hash1(sales) = 0 hash1(research) = 1 hash2(10000) = 00 hash2(20000) = 01 hash2(40000)=10 hash2(100000)=11 Fred works in sales Fred’s salary is £40,000

Retrieval hash1(sales) = 0 hash1(research) = 1 hash2(10000) = 00 hash2(20000) = 01 hash2(40000)=10 hash2(100000)=11 dept=sales  salary=40000

Retrieval hash1(sales) = 0 hash1(research) = 1 hash2(10000) = 00 hash2(20000) = 01 hash2(40000)=10 hash2(100000)=11 salary=20000

Retrieval hash1(sales) = 0 hash1(research) = 1 hash2(10000) = 00 hash2(20000) = 01 hash2(40000)=10 hash2(100000)=11 dept=sales

Partitioned hash Pro –Good hash function will evenly distribute records between buckets –Supports partial-match queries Con –No good for nearest-neighbour or range queries

kd-Tree

Multidimensional binary search tree Each node splits the k-dimensional space along a hyperplane Nodes contain –an attribute-value pair –a pair of pointers All nodes at the same level discriminate for the same attribute Levels rotate between attributes of all dimensions

Example, k=2 salary age k salary=45k 20,20k age=40 salary=55k age=70 25,80k 35,45k 40,35k 55,45k 70,20k50,55k 80,60k

Partial-Match Queries If we know value of attribute, we can choose which branch to explore If we don’t know value of attribute, must explore both branches

Adapting kd-Trees to Secondary Storage Average path length from root to leaf: log 2 n Disk accesses should be kept as few as possible Two approaches: 1.Multiway nodes (split values into n ranges) 2.Group nodes in blocks (node plus descendants to a given ply)

Quad-Tree

Quad-Trees Two main types: –Region quad-tree –Point quad-tree

Each partition divides the space into four equal sub-regions -ne, nw, se, sw Split regions if they contain more records than will fit into a block Operations similar to those for kd-trees Region Quad-tree salary age k 40k 100k

Region Quad-tree salary age k 25,80k 50,55k 50,50k 25,25k 55,45k 70,20k 80,60k nwswnese 20,20k35,45k 40,35k nwswnese

Partitions are not equal area -Split lines centred on data points -ne/nw/se/sw sub-regions Otherwise, equivalent to region quad-tree Point Quad-Tree salary age k

Point Quad-Tree salary age k salary 25,80k 50,55k 35,45k 55,45k 70,20k 50,55k 80,60k nwswnese 20,20k35,45k40,35k nwswnese

R-Tree

Used to represent data that consists of k-dimensional data regions Internal nodes of tree represent regions that contain data regions Regions typically defined as top- right, bottom-left coordinates R-Trees r1 r2 r3 d1 d2 d3 d4 d5 d6

R-Trees r1 r2 r3 d1 d2 d3 d4 d5 d6 root r1r2r3 d1 d2 d3 d4 d5 d6

UB-Tree

Basic approach: -Map n-dimensional space onto a 1-dimensional line using a fractal space-filling curve -Partition ranges and index using a B+tree -When querying, identify regions of n-d space (= segments of 1-d line) that intersect with query rectangle UB-Tree

Map domain of each attribute onto n-bit integer Order of points on Z-curve given by bit-interleaving the positions on the axes x = x 1 x 2 y = y 1 y 2 z-index = y 1 x 1 y 2 x 2 Z-Index

Z-curve partitioned into contiguous ranges (z-regions) -Note that these may not be contiguous regions in the multidimensional space Z-regions mapped to leaf nodes of a B+tree -A leaf node contain pointers to records whose attribute value locate them within the associated Z-region Z-Region Partition 0d.2 n -1 Z-index

Multidimensional range query can be considered as a k- dimensional rectangle Algorithm identifies z-regions that intersect with the query rectangle Querying UB-Trees

Bitmap Indexes

Bitmap indexes Collection of bit-vectors used to index an attribute –One bit-vector for each unique attribute value –One bit for each record Querying index involves combining bit-vectors with bitwise operators (&, |) –A 1 in the ith position indicates that record i is a match

Example An online homeware vendor sells products p1...p10 –Products p3 and p5 cost £100 –Product p1 costs £200 –Products p2, p7 and p10 cost £300 –Products p4, p6, p8 and p9 cost £400 –Products p1, p4, p5 and p9 are designed for lounges –Products p5 and p7 are designed for dining rooms –Products p3, p5, p6 and p10 are designed for kitchens

Example bitmap index p1p2p3p4p5p6p7p8p9p10 £ £ £ £ Lounge Dining Kitchen

Example bitmap index p1p2p3p4p5p6p7p8p9p10 £ £ £ £ Lounge Dining Kitchen price=£300  room=kitchen & = p10 is matching product

Compression Bit-vectors are typically sparse, with few 1 bits –Large amount of wasted space –Run-length encoding of bit-vectors to reduce stored size Bitwise operators must be applied to original bit-vectors –Can decode RLE bit-vectors one run at a time

Bitmap indexes Pro –Efficient answering of partial-match queries Con –Requires fixed record numbers –Changes to data file require changes to bitmap index