

Multiple-key indexes
An index on one attribute provides pointers to indexes on the other attribute. If V is a value of the first attribute, then the index we reach by following key V and its pointer is an index into the set of points that have V in the first attribute and any value in the second attribute.

Example
"Who buys gold jewelry" (age and salary only). Raw data as (age; salary) pairs, with salary in $K:
(25; 60) (45; 60) (50; 75) (50; 100) (50; 120) (70; 110) (85; 140) (30; 260) (25; 400) (45; 350) (50; 275) (60; 260)
Question: For what kinds of queries will a multiple-key index (age first) significantly reduce the number of disk I/O's?
The indexes can be organized as B-Trees.

Operations
Partial match queries
- If the first attribute is specified, then the access is quite efficient.
- If the first attribute isn't specified, then we have to search every sub-index.
Range queries
- These work quite well, provided the individual indexes themselves support range queries on their attribute (e.g. they are B-Trees).
- Example: the range query is 35 ≤ age ≤ 55 AND 100 ≤ sal ≤ 200.
- The sub-indexes should also be "primary" ones if we want to support range queries efficiently.
NN queries
- Similar to range queries.
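The operations above can be sketched in a few lines of Python. This is only an in-memory illustration, assuming sorted lists searched with `bisect` stand in for the B-Trees; the names `build_index`, `range_query` and `salary_only` are made up for this sketch.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# (age, salary) pairs from the gold-jewelry example; salary in $K.
POINTS = [(25, 60), (45, 60), (50, 75), (50, 100), (50, 120), (70, 110),
          (85, 140), (30, 260), (25, 400), (45, 350), (50, 275), (60, 260)]

def build_index(points):
    """Outer index on age; each age points to a sorted sub-index of salaries."""
    sub = defaultdict(list)
    for age, sal in points:
        sub[age].append(sal)
    ages = sorted(sub)  # sorted keys of the outer index
    return ages, {age: sorted(sals) for age, sals in sub.items()}

def range_query(index, age_lo, age_hi, sal_lo, sal_hi):
    """Range search on the outer index, then on each sub-index it reaches."""
    ages, sub = index
    out = []
    for age in ages[bisect_left(ages, age_lo):bisect_right(ages, age_hi)]:
        sals = sub[age]
        for sal in sals[bisect_left(sals, sal_lo):bisect_right(sals, sal_hi)]:
            out.append((age, sal))
    return out

def salary_only(index, sal):
    """Partial match without the first attribute: every sub-index is searched."""
    ages, sub = index
    out = []
    for age in ages:
        sals = sub[age]
        hits = bisect_right(sals, sal) - bisect_left(sals, sal)
        out.extend([(age, sal)] * hits)
    return out

index = build_index(POINTS)
print(range_query(index, 35, 55, 100, 200))  # [(50, 100), (50, 120)]
print(salary_only(index, 260))               # [(30, 260), (60, 260)]
```

The range query touches only the sub-indexes for ages 45 and 50, while the salary-only query has to visit all seven sub-indexes.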

KD-Trees
- Generalize binary search trees, but the search attribute rotates among the dimensions from level to level.
- Each level partitions the points by comparison with a value for that level's dimension.
- Leaves are blocks.

Geometrically… Remember we didn’t want the stripes in grid files to continue all along the vertical or horizontal direction? Here they don’t.

Operations
Lookup in KD-Trees
- Find the appropriate leaf by binary search. Is the record there?
Insert into KD-Trees
- Look up the record to be inserted, reaching the appropriate leaf.
- If there is room, put the record in that block.
- If not, find a suitable value for the appropriate dimension and split the leaf block.
Example
- Someone 35 years old with a salary of $500K buys gold jewelry.
- The record belongs in the leaf with (25; 400) and (45; 350).
- Too full: split on age. See figure next.
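A minimal Python sketch of lookup and insert follows. It makes several assumptions that are not in the slides: a leaf block holds at most two records (`BLOCK_CAPACITY`), points with a coordinate below the split value go left and the rest go right, and the split dimension is chosen as depth modulo 2 with age first. The class and function names are hypothetical.

```python
BLOCK_CAPACITY = 2  # assumed number of records per leaf block

class Leaf:
    def __init__(self, points):
        self.points = list(points)

class Node:
    def __init__(self, dim, value, left, right):
        self.dim, self.value = dim, value      # split dimension and split value
        self.left, self.right = left, right    # left: coord < value, right: coord >= value

def lookup(node, p):
    """Binary search down to the leaf, then check whether the record is there."""
    while isinstance(node, Node):
        node = node.left if p[node.dim] < node.value else node.right
    return p in node.points

def insert(node, p, depth=0, k=2):
    """Insert p and return the (possibly new) root of this subtree."""
    if isinstance(node, Node):
        if p[node.dim] < node.value:
            node.left = insert(node.left, p, depth + 1, k)
        else:
            node.right = insert(node.right, p, depth + 1, k)
        return node
    node.points.append(p)
    if len(node.points) <= BLOCK_CAPACITY:
        return node                            # there was room in the block
    dim = depth % k                            # dimension for this level
    pts = sorted(node.points, key=lambda q: q[dim])
    value = pts[len(pts) // 2][dim]            # median as the split value
    left = [q for q in pts if q[dim] < value]
    right = [q for q in pts if q[dim] >= value]
    if not left:                               # all points tie on this dimension:
        return node                            # this sketch leaves the block overfull
    return Node(dim, value, Leaf(left), Leaf(right))

# The example above: the leaf holding (25; 400) and (45; 350) receives (35; 500).
root = Leaf([(25, 400), (45, 350)])
root = insert(root, (35, 500))
print(root.dim, root.value)   # 0 35
print(root.left.points)       # [(25, 400)]
print(root.right.points)      # [(35, 500), (45, 350)]
```

In this sketch the single leaf sits at depth 0, so the split dimension is age, which matches the example; in a larger tree the dimension would follow from the leaf's level.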

The split is at age 35 because 35 is the median age of the three records.

Queries
Partial match queries
- When we don't know the value of the attribute tested at a node, we must explore both of its children.
- E.g. find points with age=50: at every node that splits on salary, both children must be searched.
Range queries
- Sometimes a range will allow us to move to only one child of a node.
- But if the range straddles the splitting value, then we must explore both children.
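Both query types can be handled by one routine if an unspecified attribute is treated as an unbounded range. The sketch below is an illustration under assumptions of its own: the tree is built statically by median splits with age first, leaves hold up to two points, and `None` marks an unspecified bound. The names `build` and `range_query` are hypothetical.

```python
# (age, salary) pairs from the gold-jewelry example; salary in $K.
POINTS = [(25, 60), (45, 60), (50, 75), (50, 100), (50, 120), (70, 110),
          (85, 140), (30, 260), (25, 400), (45, 350), (50, 275), (60, 260)]

def build(points, depth=0, cap=2):
    """Return ('leaf', pts) or ('node', dim, value, left, right)."""
    if len(points) <= cap:
        return ("leaf", list(points))
    dim = depth % 2                             # rotate among the two dimensions
    pts = sorted(points, key=lambda p: p[dim])
    value = pts[len(pts) // 2][dim]             # median split value
    left = [p for p in pts if p[dim] < value]
    right = [p for p in pts if p[dim] >= value]
    if not left:                                # ties on this dimension: stop splitting
        return ("leaf", pts)
    return ("node", dim, value,
            build(left, depth + 1, cap), build(right, depth + 1, cap))

def range_query(tree, lo, hi):
    """lo and hi give inclusive bounds per dimension; None means unspecified."""
    if tree[0] == "leaf":
        return [p for p in tree[1]
                if all((l is None or l <= c) and (h is None or c <= h)
                       for c, l, h in zip(p, lo, hi))]
    _, dim, value, left, right = tree
    out = []
    if lo[dim] is None or lo[dim] < value:      # range reaches below the split value
        out += range_query(left, lo, hi)
    if hi[dim] is None or hi[dim] >= value:     # range reaches the split value or above
        out += range_query(right, lo, hi)
    return out

tree = build(POINTS)
# Range query: 35 <= age <= 55 and 100 <= salary <= 200.
print(sorted(range_query(tree, (35, 100), (55, 200))))    # [(50, 100), (50, 120)]
# Partial match: age = 50, salary unspecified.
print(sorted(range_query(tree, (50, None), (50, None))))
```

At a node that splits on salary, the partial match query has no salary bounds, so both tests succeed and both children are explored, as described above.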

KD-trees in secondary storage
- 1000 leaves gives about log2(1000) ≈ 10 levels.
- If each internal node is stored in its own block, then a lookup costs too many disk I/O's.
- Solution: group nodes into blocks.

Quad trees
- Nodes split on all dimensions at once.
- For a quad tree of k dimensions, each interior node has 2^k children.
- The division is fixed; the tree can't be balanced.
[Figure: the points a through l plotted on Age and Sal axes, and the corresponding quad tree with interior nodes at (Age 25, Sal 300), (Age 50, Sal 200) and (Age 75, Sal 100).]

Why quad trees?
With k dimensions, each node has 2^k children, e.g. k=7 gives 128 children. We can pack all children of a node in 1 block.

Quad-Tree Insert and Queries
Insert
- Find the leaf node in which the new point belongs.
- If there is room, put it there.
- If not, make the leaf an interior node and give it leaves for each quadrant. Split the points among the new leaves.
- Problem: this may create lots of null pointers, especially in high dimensions.
Quad-Tree Queries
- Single point queries: easy; just go down the tree to the proper leaf.
- Range queries: the cost varies with the position of the range.
  - Example: a range like 45<age<55; 180<salary<220 requires a search of four leaves, none of which is guaranteed to produce any answers.
  - But if the range covers a large subtree of the quad tree, then even if we have to search a large number of leaves, we know that everything we find is an answer.
- Nearest neighbor: problems and strategies similar to grid files.
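The insert and range-query steps can be sketched for the two-dimensional case as follows. The sketch assumes a fixed root region of age 0 to 100 and salary 0 to 400 (so the root divides at age 50, salary 200), leaves that hold at most two points, and distinct points; the class name `QuadTree` and its methods are hypothetical.

```python
# (age, salary) pairs from the gold-jewelry example; salary in $K.
POINTS = [(25, 60), (45, 60), (50, 75), (50, 100), (50, 120), (70, 110),
          (85, 140), (30, 260), (25, 400), (45, 350), (50, 275), (60, 260)]

class QuadTree:
    """Quad tree over a fixed rectangle; a leaf holds up to `cap` points."""
    def __init__(self, x0, y0, x1, y1, cap=2):
        self.box = (x0, y0, x1, y1)
        self.cap = cap
        self.points = []        # used while this node is a leaf
        self.children = None    # four quadrants once the node is interior

    def _mid(self):
        x0, y0, x1, y1 = self.box
        return (x0 + x1) / 2, (y0 + y1) / 2

    def _child(self, p):
        mx, my = self._mid()
        # Quadrant number: bit 0 set if x >= mx, bit 1 set if y >= my.
        return self.children[(p[0] >= mx) + 2 * (p[1] >= my)]

    def insert(self, p):
        if self.children is not None:
            self._child(p).insert(p)
            return
        self.points.append(p)
        if len(self.points) > self.cap:     # no room: become an interior node
            x0, y0, x1, y1 = self.box
            mx, my = self._mid()
            self.children = [QuadTree(x0, y0, mx, my, self.cap),
                             QuadTree(mx, y0, x1, my, self.cap),
                             QuadTree(x0, my, mx, y1, self.cap),
                             QuadTree(mx, my, x1, y1, self.cap)]
            pts, self.points = self.points, []
            for q in pts:                   # split the points among the new leaves
                self._child(q).insert(q)

    def range_query(self, qx0, qy0, qx1, qy1):
        """Points with qx0 <= x <= qx1 and qy0 <= y <= qy1."""
        x0, y0, x1, y1 = self.box
        if qx1 < x0 or qx0 > x1 or qy1 < y0 or qy0 > y1:
            return []                       # the range misses this quadrant
        if self.children is None:
            return [p for p in self.points
                    if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1]
        out = []
        for child in self.children:
            out += child.range_query(qx0, qy0, qx1, qy1)
        return out

tree = QuadTree(0, 0, 100, 400)
for point in POINTS:
    tree.insert(point)

print(tree.range_query(45, 180, 55, 220))           # []
print(sorted(tree.range_query(0, 200, 49, 400)))    # [(25, 400), (30, 260), (45, 350)]
```

The first query straddles the root's dividing point, so all four quadrants are entered and nothing is found, which is the situation described in the range-query example above.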

R-Trees
- For "regions" (typically rectangles) but can represent points.
- Supports NN and "where-am-I" queries.
- Generalizes the B-tree to the multidimensional case.
  - In place of the B-tree's key-pointer pairs, an R-tree has region-pointer pairs.
  - Problem: there is no ideal way to partition children without overlap.

Lookup
We start at the root, with which the entire region is associated. We examine the subregions at the root and determine which children correspond to interior regions that contain the point P. If there are none, we are done: P is not in any data region. Otherwise, we must recursively search each of those children, until we reach the leaves of the tree.

Insertion
We start at the root and try to find a subregion into which the new region R fits. If there is more than one, we pick just one and repeat the process there. If there is none, we must expand a subregion, so we pick the child whose region needs to be expanded as little as possible. Eventually we reach a leaf, where we insert R. If there is no room, we have to split the leaf, and we split it so that the resulting subregions are as small as possible.

Example
Suppose that the leaves have room for six regions. Further suppose that six regions are together on one leaf, whose region is represented by the outer solid rectangle. Now suppose that another region, Pop, is added.

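Example (cont'd)
[Figure: after the split, the parent holds two region-pointer pairs. ((0,0),(60,50)) points to the leaf with Road1, Road2 and House1; ((20,20),(100,80)) points to the leaf with School, House2, Pipeline and Pop.]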

Suppose now that House3 ((70,5),(80,15)) gets added. We do have space in the leaves, but we need to expand one of the regions at the parent. We choose to expand the one that needs to be expanded the least.

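Which one should we expand?
Option 1, expand the first region: ((0,0),(80,50)) with Road1, Road2, House1 and House3; ((20,20),(100,80)) with School, House2, Pipeline and Pop.
Option 2, expand the second region: ((0,0),(60,50)) with Road1, Road2 and House1; ((20,5),(100,80)) with School, House2, Pipeline, Pop and House3.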
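The comparison can be worked out numerically. The sketch below assumes that "expanded the least" is measured by the increase in area of the bounding rectangle, which is one common criterion; the helper names `area`, `mbr`, `enlargement` and `choose_child` are hypothetical.

```python
def area(rect):
    """Area of a rectangle given as ((x_low, y_low), (x_high, y_high))."""
    (x0, y0), (x1, y1) = rect
    return (x1 - x0) * (y1 - y0)

def mbr(a, b):
    """Smallest rectangle that covers both a and b."""
    (ax0, ay0), (ax1, ay1) = a
    (bx0, by0), (bx1, by1) = b
    return ((min(ax0, bx0), min(ay0, by0)), (max(ax1, bx1), max(ay1, by1)))

def enlargement(region, new):
    """Extra area needed for `region` to cover `new` as well."""
    return area(mbr(region, new)) - area(region)

def choose_child(regions, new):
    """Index of the region that needs the least enlargement."""
    return min(range(len(regions)), key=lambda i: enlargement(regions[i], new))

first = ((0, 0), (60, 50))
second = ((20, 20), (100, 80))
house3 = ((70, 5), (80, 15))

print(mbr(first, house3), enlargement(first, house3))     # ((0, 0), (80, 50)) 1000
print(mbr(second, house3), enlargement(second, house3))   # ((20, 5), (100, 80)) 1200
print(choose_child([first, second], house3))              # 0
```

Under this area criterion the first region is the one to expand, since it grows by 1000 square units against 1200 for the second.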

Bitmap Indexes
Suppose we have n tuples. A bitmap index for a field F is a collection of bit vectors of length n, one for each possible value that may appear in the field F. The vector for value v has 1 in position i if the i-th record has v in field F, and it has 0 there if not.
Example: for the six records (30, foo) (30, bar) (40, baz) (50, foo) (40, bar) (30, baz), the bitmap index on the second field has these vectors:
foo 100100
bar 010010
baz 001001

Motivation for Bitmap Indexes
They allow very fast evaluation of partial match queries.
    SELECT title FROM Movie WHERE studioName = 'Disney' AND year = 1995;
If there are bitmap indexes on both studioName and year, we can intersect (bitwise AND) the vectors for the Disney value and the 1995 value. We also need another index to retrieve the tuples by number.
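A small Python sketch of the intersection step follows. The rows of `movies` are invented purely for illustration, Python integers stand in for the bit vectors (bit i corresponds to record i), and a plain list plays the role of the index that retrieves tuples by number; the function names are hypothetical.

```python
def build_bitmap_index(records, field):
    """Map each value of `field` to a bit vector stored as an int (bit i = record i)."""
    index = {}
    for i, rec in enumerate(records):
        index[rec[field]] = index.get(rec[field], 0) | (1 << i)
    return index

def positions(bits):
    """Record numbers whose bit is set."""
    out, i = [], 0
    while bits:
        if bits & 1:
            out.append(i)
        bits >>= 1
        i += 1
    return out

# Hypothetical Movie rows, invented for this sketch.
movies = [
    {"title": "Movie A", "studioName": "Disney", "year": 1995},
    {"title": "Movie B", "studioName": "Fox", "year": 1995},
    {"title": "Movie C", "studioName": "Disney", "year": 1994},
    {"title": "Movie D", "studioName": "Disney", "year": 1995},
]

by_studio = build_bitmap_index(movies, "studioName")
by_year = build_bitmap_index(movies, "year")

# Intersect the two vectors with a bitwise AND, then fetch tuples by number.
matches = by_studio.get("Disney", 0) & by_year.get(1995, 0)
titles = [movies[i]["title"] for i in positions(matches)]
print(titles)   # ['Movie A', 'Movie D']
```

The AND touches only the two bit vectors; the tuples themselves are read only for the record numbers that survive the intersection.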

Compressed Bitmaps
i-1 in unary, followed by the binary representation.
i=4. Encoding: 1110