Spatial Indexing I: Point Access Methods

Spatial Indexing: Point Access Methods (PAMs) vs. Spatial Access Methods (SAMs). PAMs index only point data, using either hierarchical (tree-based) structures or multidimensional hashing. SAMs index both points and regions, using either transformations or overlapping regions.

The problem: given a point set and a rectangular query, find the points enclosed in the query.

Tree-based PAMs. Most tree-based PAMs are based on the kd-tree. The kd-tree is a main-memory binary tree for indexing k-dimensional points, so it needs to be adapted for the disk model. Levels rotate among the dimensions, and each node partitions the space based on a value for its level's dimension. The kd-tree is not necessarily balanced.

KD-tree. At each level we use a different dimension. [Figure: example kd-tree with splits x=5, y=6, and y=3.]

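Kd-tree properties (for a balanced kd-tree on n points in two dimensions). Height of the tree: O(log n). Search time for an exact match: O(log n). Search time for a range query: O(n^(1/2) + k), where k is the number of points reported.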
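
The construction and range search described above can be sketched as follows. This is a minimal in-memory illustration, not the slides' own code: it assumes the tree is built once from a known point set by splitting at the median (which keeps it balanced), and the names KDNode, build_kdtree, and range_query are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point                      # the point stored at this node
    axis: int                         # dimension this node splits on
    left: Optional["KDNode"] = None   # points with coordinate <= point[axis]
    right: Optional["KDNode"] = None  # points with coordinate >= point[axis]


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a balanced kd-tree by splitting at the median of the current axis."""
    if not points:
        return None
    axis = depth % len(points[0])     # levels rotate among the dimensions
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(
        pts[mid],
        axis,
        build_kdtree(pts[:mid], depth + 1),
        build_kdtree(pts[mid + 1:], depth + 1),
    )


def range_query(node: Optional[KDNode], lo: Point, hi: Point,
                out: Optional[List[Point]] = None) -> List[Point]:
    """Collect all points p with lo[d] <= p[d] <= hi[d] in every dimension d."""
    if out is None:
        out = []
    if node is None:
        return out
    if all(l <= c <= h for l, c, h in zip(lo, node.point, hi)):
        out.append(node.point)
    a = node.axis
    # Descend only into subtrees whose half-space intersects the query box.
    if lo[a] <= node.point[a]:
        range_query(node.left, lo, hi, out)
    if node.point[a] <= hi[a]:
        range_query(node.right, lo, hi, out)
    return out
```

For example, build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]) followed by range_query(tree, (3, 1), (8, 5)) reports the three points (5, 4), (7, 2), and (8, 1), in tree-traversal order.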

kd-tree example. [Figure: a kd-tree and the corresponding partition of the plane, with split lines at x=3, x=5, x=7, x=8 and y=2, y=5, y=6.]

KD-TREE NEAREST NEIGHBOR

External memory kd-trees. One approach is similar to the B-tree: tree nodes split many ways instead of two ways. Insertion then becomes quite complex and expensive, and there is no storage utilization guarantee, since when a higher-level node splits, the split has to be propagated all the way to the leaf level, resulting in many empty blocks. Another approach is to pack many interior nodes (forming a subtree) into a block. However, it may not be feasible to group nodes at the lower levels into a block productively.

Grid File. Idea: use a grid to partition the space; each cell is associated with one page. The grid file is a data access method that divides the address space along each dimension. The file is so named because the divisions occur in a grid-like fashion. The G-tree is a balanced index structure that divides the data space into a set of non-overlapping, rectangular regions.

Grid File. The split is performed along a single dimension, and the dimension used alternates in round-robin fashion. Because a region is always formed by dividing another in half, the value used for splitting does not have to be stored, so the G-tree has the advantage of requiring less storage. Grid files are hashing methods for multidimensional points (an extension of extendible hashing).

Grid File. Select dividers along each dimension to partition the space into cells. Unlike in the kd-tree, dividers cut all the way across the space. Each cell points to one disk page, and many cells can point to the same page. The cell directory is potentially exponential in the number of dimensions.

Grid File Implementation. A dynamic structure using a grid directory. Grid array: a 2-dimensional array with pointers to buckets (this array can be large and disk resident), G(0, …, nx-1, 0, …, ny-1). Linear scales: two 1-dimensional arrays that are used to access the grid array (kept in main memory), X(0, …, nx-1) and Y(0, …, ny-1).

Example. [Figure: a grid directory with linear scales X and Y, and the buckets (disk blocks) its cells point to.]

Grid File Search. Exact-match search: at most 2 I/Os, assuming the linear scales fit in memory. First use the linear scales to determine the index into the cell directory; then access the cell directory to retrieve the bucket address (this may cost 1 I/O if the cell directory does not fit in memory); finally access the appropriate bucket (1 I/O). Range queries: use the linear scales to determine the indices into the cell directory, access the cell directory to retrieve the addresses of the buckets to visit, then access those buckets.
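
The two search procedures above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: each linear scale is a sorted list of interior divider values, a point lying exactly on a divider belongs to the upper cell, and the class name GridFile and its methods are hypothetical. Disk pages are simulated with a dictionary, so no I/O is actually counted.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Point = Tuple[float, float]


class GridFile:
    def __init__(self, x_scale: List[float], y_scale: List[float],
                 directory: List[List[int]], buckets: Dict[int, List[Point]]):
        self.x_scale = x_scale        # linear scale X: sorted divider values
        self.y_scale = y_scale        # linear scale Y: sorted divider values
        self.directory = directory    # directory[i][j] -> bucket id
        self.buckets = buckets        # bucket id -> points (stands in for a disk page)

    def cell(self, x: float, y: float) -> Tuple[int, int]:
        """Use the linear scales to find the directory index of (x, y)."""
        return bisect_right(self.x_scale, x), bisect_right(self.y_scale, y)

    def exact_match(self, x: float, y: float) -> bool:
        i, j = self.cell(x, y)                # in memory
        bucket_id = self.directory[i][j]      # at most 1 I/O for the directory
        return (x, y) in self.buckets[bucket_id]  # 1 I/O for the bucket

    def range_query(self, x_lo: float, x_hi: float,
                    y_lo: float, y_hi: float) -> List[Point]:
        i_lo, j_lo = self.cell(x_lo, y_lo)
        i_hi, j_hi = self.cell(x_hi, y_hi)
        # Several cells may share one bucket, so collect distinct bucket ids.
        bucket_ids = {self.directory[i][j]
                      for i in range(i_lo, i_hi + 1)
                      for j in range(j_lo, j_hi + 1)}
        return sorted(p for b in bucket_ids for p in self.buckets[b]
                      if x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi)


# A 2x2 directory in which the two left-hand cells share bucket 0.
gf = GridFile(
    x_scale=[5], y_scale=[4],
    directory=[[0, 0], [1, 2]],
    buckets={0: [(1, 1), (2, 6)], 1: [(7, 2)], 2: [(6, 5), (9, 9)]},
)
```

With this directory, gf.exact_match(7, 2) is true, gf.exact_match(3, 3) is false, and gf.range_query(0, 6, 0, 6) visits all three buckets once each.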

Grid File Insert. Determine the bucket into which the insertion must occur. If there is space in the bucket, insert. Otherwise, split the bucket; this raises the question of how to choose a good dimension to split. If the bucket split causes the cell directory to split, do so and adjust the linear scales. Inserting the new directory entries potentially requires a complete reorganization of the cell directory, which is expensive.
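
One possible way to realize the insertion steps above is sketched below. It is an illustration only and makes several assumptions that are not in the slides: points are distinct and lie inside a fixed domain [lo, hi), the split dimension is the one in which the overflowing bucket's points are most spread out, and a new divider is placed at the midpoint of the cell. The class name GridFile and its methods are hypothetical.

```python
from bisect import bisect_right


class GridFile:
    def __init__(self, capacity=2, lo=(0.0, 0.0), hi=(100.0, 100.0)):
        self.capacity = capacity
        self.lo, self.hi = lo, hi     # assumed domain of the data
        self.scales = [[], []]        # linear scales: interior dividers per dimension
        self.directory = [[0]]        # directory[i][j] -> bucket id
        self.buckets = {0: []}
        self.next_id = 1

    def cell(self, p):
        return tuple(bisect_right(self.scales[d], p[d]) for d in (0, 1))

    def insert(self, p):
        i, j = self.cell(p)
        b = self.directory[i][j]
        self.buckets[b].append(p)
        self._fix_overflow(b)

    def _fix_overflow(self, b):
        pts = self.buckets[b]
        # Identical points cannot be separated, so such a bucket is left as is.
        if len(pts) <= self.capacity or len(set(pts)) == 1:
            return
        new = self._split(b)
        self._fix_overflow(b)
        self._fix_overflow(new)

    def _split(self, b):
        pts = self.buckets[b]
        # Heuristic (an assumption): split the dimension with the largest spread.
        d = max((0, 1), key=lambda k: max(p[k] for p in pts) - min(p[k] for p in pts))
        cells = [(i, j) for i, row in enumerate(self.directory)
                 for j, v in enumerate(row) if v == b]
        a0, a1 = min(c[d] for c in cells), max(c[d] for c in cells)
        if a0 == a1:
            # The bucket covers a single cell in dimension d: the directory must
            # be refined by adding a divider to the linear scale.
            s = self.scales[d]
            low = s[a0 - 1] if a0 > 0 else self.lo[d]
            high = s[a0] if a0 < len(s) else self.hi[d]
            s.insert(a0, (low + high) / 2)
            if d == 0:
                self.directory.insert(a0 + 1, list(self.directory[a0]))
            else:
                for row in self.directory:
                    row.insert(a0 + 1, row[a0])
            a1 = a0 + 1
        # Otherwise the bucket is shared by several cells and an existing
        # divider is reused, so the directory does not grow.
        mid = (a0 + a1) // 2
        divider = self.scales[d][mid]
        new = self.next_id
        self.next_id += 1
        for i, row in enumerate(self.directory):
            for j, v in enumerate(row):
                if v == b and (i, j)[d] > mid:
                    row[j] = new
        self.buckets[new] = [p for p in pts if p[d] >= divider]
        self.buckets[b] = [p for p in pts if p[d] < divider]
        return new
```

Note that refining a scale duplicates a whole row or column of the directory, which is the source of the reorganization cost mentioned above; when the overflowing bucket is already shared by several cells, only the affected directory entries change.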

Grid File Deletions. Deletions may decrease the space utilization, so buckets are merged. We need to decide which cells to merge and choose a merging threshold. Two approaches are the buddy system and the neighbor system. Buddy system: a bucket can merge with only one buddy in each dimension. Neighbor system: merge adjacent regions if the result is a rectangle.

LSD-tree. The Local Split Decision tree is so named because the split decision is made independently for each rectangular partition. The split is not restricted to any specific dimension, nor does it have to divide the data space in half: we may split along any dimension and at any value we choose. Since any dimension can be used for partitioning at any time, the splitting information must be stored in each node. If the directory is large, we store subtrees on disk.
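
The idea of a directory whose nodes each store their own split dimension and value can be sketched as follows. This is a minimal in-memory illustration, assuming distinct 2-dimensional points; the particular split rule in choose_split (largest spread, middle distinct coordinate) is only one possible local decision, and all names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Point = Tuple[float, float]


@dataclass
class Bucket:
    points: List[Point] = field(default_factory=list)


@dataclass
class Split:
    dim: int                        # split dimension, stored per node
    value: float                    # split position, stored per node
    left: Union["Split", Bucket]    # points with p[dim] < value
    right: Union["Split", Bucket]   # points with p[dim] >= value


def choose_split(points: List[Point]) -> Tuple[int, float]:
    """Local split decision; any dimension and any value would be allowed."""
    dim = max(range(2), key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    distinct = sorted({p[dim] for p in points})
    return dim, distinct[len(distinct) // 2]   # leaves both sides non-empty


class LSDTree:
    def __init__(self, capacity: int = 2):
        self.capacity = capacity
        self.root: Union[Split, Bucket] = Bucket()

    def insert(self, p: Point) -> None:
        parent, side, node = None, None, self.root
        while isinstance(node, Split):
            parent = node
            side = "left" if p[node.dim] < node.value else "right"
            node = getattr(node, side)
        node.points.append(p)
        if len(node.points) > self.capacity and len(set(node.points)) > 1:
            dim, value = choose_split(node.points)
            new = Split(dim, value,
                        Bucket([q for q in node.points if q[dim] < value]),
                        Bucket([q for q in node.points if q[dim] >= value]))
            if parent is None:
                self.root = new
            else:
                setattr(parent, side, new)

    def range_query(self, lo: Point, hi: Point) -> List[Point]:
        out, stack = [], [self.root]
        while stack:
            node = stack.pop()
            if isinstance(node, Bucket):
                out.extend(p for p in node.points
                           if all(lo[d] <= p[d] <= hi[d] for d in range(2)))
            else:
                if lo[node.dim] < node.value:
                    stack.append(node.left)
                if hi[node.dim] >= node.value:
                    stack.append(node.right)
        return out
```

Because only the overflowing bucket is replaced by a new Split node, a split never propagates to other parts of the directory, in contrast to adding a divider to a grid file.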

DATA SPACE PARTITION FOR THE LSD-TREE

Example: LSD-tree

1. Draw the kd-tree, LSD-tree, and grid file for the data shown above.