Multidimensional Indices. Instructor: Randal Burns. Lecture for 29 November 2005. Computer Science 600.416, Johns Hopkins University.



1-D and 2-D Indexing
Index structures we know so far are one-dimensional
  – Even when indexing on multiple attributes
  – The attributes are either:
      ordered: a 1-dimensional binary relation
      hashed: placed into a 1-d hash space
Not all data are one-dimensional
  – Substructure within data items
  – Need to be looked up on several fields

Dimensionality in Std. DBs
What is the dimensionality of data in the relational model?
The arity of a relation; data is inherently multi-dimensional in DBs
Do relational DBs support multi-dimensional queries using only 1-d indices?
Several techniques
  – Multiple indices: can look up on different attributes, but still only one at a time
  – General queries: can conduct any query; this is the power of the relational model!
What is the outstanding problem?
Indices optimize queries. While there is support for multi-d data, the indices are not “tuned” for these queries

Overview of Techniques
Multidimensional hash tables
  – Grid files
  – Partitioned hash functions
Hierarchical indices
  – Multiple-key indices
Multidimensional trees
  – Kd-trees
  – Quad trees
  – R-trees

Applications: Geographic Data
Geographic information systems: maps
Circuit design: placement of components
Queries
  – Partial match: equality on some dimensions; find all objects that match there, with any value in the others
  – Range: find objects within given ranges in each dimension
  – Nearest neighbor: find objects close to a point or specified object
  – Where-am-I: reverse mapping of a point to an object, e.g. mouse click to button

Applications: Data Cubes
View all data as high-dimensional
  – Consider a sale: day and time, store, item, cost
  – Creates a 4-d grid
Information in this grid can be clustered
  – Decision support
  – Data mining
Look for trends in data
  – Example: determine what products sell in what stores and bind it to demographic/political/cultural data

Multidimensional Queries in SQL
SQL support for a nearest-neighbor query
  – Relation POINTS { float x, float y }
Find the nearest point to point (10.0, 20.0)
    SELECT * FROM POINTS p
    WHERE NOT EXISTS (
        SELECT * FROM POINTS q
        WHERE (q.x-10.0)*(q.x-10.0) + (q.y-20.0)*(q.y-20.0)
            < (p.x-10.0)*(p.x-10.0) + (p.y-20.0)*(p.y-20.0)
    )
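Without a suitable index, the nested query above compares every point against every other point. A minimal Python sketch of the same semantics, assuming the relation fits in memory as a list of (x, y) tuples; the name `nearest_points` is illustrative and not from the lecture:

```python
def nearest_points(points, qx, qy):
    """Return every point for which no other point is strictly closer to (qx, qy)."""
    def dist2(p):
        # Squared Euclidean distance, as in the SQL (no square root needed).
        return (p[0] - qx) ** 2 + (p[1] - qy) ** 2

    if not points:
        return []
    best = min(dist2(p) for p in points)
    # Like NOT EXISTS, this keeps all points tied for the minimum distance.
    return [p for p in points if dist2(p) == best]


pts = [(0, 0), (9, 19), (11, 21), (30, 30)]
print(nearest_points(pts, 10, 20))
```

As with the SQL, ties are all returned rather than a single arbitrary winner.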

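Multidimensional Queries in SQL
SQL support for a point-in-rectangle query
  – Relation RECT { id, xll, yll, xur, yur }, with (xll, yll) the lower-left and (xur, yur) the upper-right corner
Find the rectangles that contain point (10.0, 20.0)
    SELECT id FROM RECT
    WHERE xll <= 10.0 AND yll <= 20.0
      AND xur >= 10.0 AND yur >= 20.0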
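A point-in-rectangle test compares the point with the lower-left and upper-right corners in each dimension. The sketch below assumes closed rectangles (edges count as inside) and an in-memory list; `Rect` and `rects_containing` are illustrative names:

```python
from typing import NamedTuple


class Rect(NamedTuple):
    id: int
    xll: float  # lower-left x
    yll: float  # lower-left y
    xur: float  # upper-right x
    yur: float  # upper-right y


def rects_containing(rects, x, y):
    """Ids of all rectangles whose closed extent contains (x, y)."""
    return [r.id for r in rects
            if r.xll <= x <= r.xur and r.yll <= y <= r.yur]


rects = [Rect(1, 0, 0, 15, 25), Rect(2, 12, 0, 40, 40), Rect(3, 5, 20, 10, 30)]
print(rects_containing(rects, 10.0, 20.0))
```

This is a full scan; the index structures in the rest of the lecture aim to avoid touching rectangles that cannot match.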

Grid Files
Partition each dimension into ranges
  – Create a bucket (block) for each combination of ranges across the dimensions
  – Buckets are n-dimensional now
Lookup: index in both dimensions
Insert: reverse lookup and insert
  – Complexities come if out of space
      Chain blocks in a grid bucket
      Reorganize grid lines/add new grid lines
  – Same skew problems as with range partitioning, just in multiple dimensions now
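The lookup and insert steps can be sketched in a few lines of Python. This is a 2-d, in-memory illustration under stated assumptions: grid lines are fixed up front, a Python list stands in for a chain of blocks, and a value equal to a grid line falls in the upper range. The class name `GridFile` and its methods are hypothetical:

```python
from bisect import bisect_right
from collections import defaultdict


class GridFile:
    def __init__(self, x_lines, y_lines):
        # k sorted grid lines split a dimension into k + 1 ranges.
        self.x_lines = sorted(x_lines)
        self.y_lines = sorted(y_lines)
        # (i, j) -> points; the list plays the role of chained blocks.
        self.buckets = defaultdict(list)

    def _cell(self, x, y):
        # Locate the range in each dimension independently.
        return (bisect_right(self.x_lines, x), bisect_right(self.y_lines, y))

    def insert(self, x, y):
        self.buckets[self._cell(x, y)].append((x, y))

    def lookup(self, x, y):
        # Only one bucket has to be read for an exact-match lookup.
        return (x, y) in self.buckets.get(self._cell(x, y), [])


g = GridFile(x_lines=[40, 55], y_lines=[90, 225])
g.insert(25, 60)
g.insert(45, 350)
print(g.lookup(25, 60), g.lookup(25, 61))
```

Reorganizing or adding grid lines when a bucket overflows is deliberately left out of the sketch.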

Grid File Support for Queries
Partial match: scoped to buckets in the specified dimensions
Range: scoped to buckets in ranges
Nearest neighbor: need to consider grid boundaries (draw), but scoped to feasible buckets
Where-am-I: no, this represents data points, not data objects
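The scoping for partial-match and range queries amounts to computing which bucket coordinates a query can touch. A small sketch, assuming sorted grid lines per dimension and the convention that a value equal to a grid line belongs to the upper range; `candidate_cells` is an illustrative name:

```python
from bisect import bisect_right
from itertools import product


def candidate_cells(grid_lines, ranges):
    """Bucket coordinates a query must read.

    grid_lines: one sorted list of split points per dimension.
    ranges: one (lo, hi) pair per dimension, or None if unspecified.
    """
    per_dim = []
    for lines, rng in zip(grid_lines, ranges):
        if rng is None:
            # Unspecified dimension: every stripe is a candidate.
            idx = range(len(lines) + 1)
        else:
            lo, hi = rng
            idx = range(bisect_right(lines, lo), bisect_right(lines, hi) + 1)
        per_dim.append(idx)
    return list(product(*per_dim))


lines = [[10, 20], [50]]
print(candidate_cells(lines, [(5, 15), (60, 70)]))   # range query
print(candidate_cells(lines, [(12, 12), None]))      # partial match on x only
```

A partial match is treated here as a range query whose specified dimensions have lo equal to hi.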

Partitioned Hashing
For a series of hash attributes A1, A2, A3, …, An compute a function h = h1(A1), h2(A2), …, hn(An), the concatenation of the per-attribute hash bits
Queries that specify only some dimensions are scoped to suitable buckets
Example: with 3 bits per attribute, a query that specifies every dimension except A2 and A4 looks in buckets 101XXX010XXX1100…..
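A rough Python sketch of the bit concatenation and of partial-match scoping. The per-attribute hash (`value % 8`, for integer attributes) and the 3-bit width are assumptions chosen for illustration, and the function names are hypothetical:

```python
from itertools import product

BITS = 3  # bits contributed by each attribute (assumed)


def h(value):
    # Illustrative per-attribute hash for integer values.
    return value % (1 << BITS)


def concat(parts):
    # Concatenate BITS-wide pieces, first attribute in the high-order bits.
    b = 0
    for part in parts:
        b = (b << BITS) | part
    return b


def bucket(values):
    """Bucket number for a fully specified tuple."""
    return concat(h(v) for v in values)


def candidate_buckets(partial):
    """Buckets to read when some attributes are None (unspecified)."""
    choices = [[h(v)] if v is not None else range(1 << BITS) for v in partial]
    return [concat(combo) for combo in product(*choices)]


print(bucket([5, 2]))
print(candidate_buckets([5, None]))
```

Each unspecified attribute multiplies the number of candidate buckets by 2^BITS, which is why the scheme only helps partial-match queries.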

Part. Hash. Query Support
Partial match: scoped to buckets in the specified dimensions
Range: useless
Nearest neighbor: useless
Where-am-I: useless
Relation between partitioned hashing and grid files is similar to that between hash and range partitioning
  – Skew and generality

Multiple Key Indexes
Tree-like multi-dimensional structure: Figure 5.11
Partial match: scoped when the higher dimensions are specified, otherwise bad news
Range: very effective
Nearest neighbor: reasonably efficient when built on top of a range query
  – E.g. find all neighbors less than distance d and compute their distance
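A minimal two-attribute sketch of the idea, assuming an in-memory index in which sorted Python lists stand in for the outer and inner indexes. The class and method names are illustrative only:

```python
from bisect import bisect_left, bisect_right


class MultiKeyIndex:
    def __init__(self, records):
        # records: iterable of (a, b) pairs; index on a, then on b within each a.
        groups = {}
        for a, b in records:
            groups.setdefault(a, []).append(b)
        self.keys = sorted(groups)
        self.inner = {a: sorted(bs) for a, bs in groups.items()}

    def range_query(self, a_lo, a_hi, b_lo, b_hi):
        out = []
        # The outer index narrows a; each surviving inner index narrows b.
        for a in self.keys[bisect_left(self.keys, a_lo):bisect_right(self.keys, a_hi)]:
            bs = self.inner[a]
            out += [(a, b) for b in bs[bisect_left(bs, b_lo):bisect_right(bs, b_hi)]]
        return out

    def match_second_only(self, b):
        # First attribute unspecified: every inner index must be probed.
        out = []
        for a in self.keys:
            bs = self.inner[a]
            out += [(a, v) for v in bs[bisect_left(bs, b):bisect_right(bs, b)]]
        return out


idx = MultiKeyIndex([(25, 60), (25, 400), (45, 60), (50, 75), (50, 100)])
print(idx.range_query(20, 45, 50, 100))
print(idx.match_second_only(60))
```

The second method shows the "bad news" case: without a value for the first attribute, the outer index gives no scoping at all.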

Kd-trees
Binary search tree in which each level splits on a different attribute, cycling through the dimensions
  – Leaves occur when only a block’s worth of tuples remain
Figure 5.13 shows an example with a block size of 2

Kd-trees Query Support
Partial match: only on specified attributes
Range queries: when a range straddles a branch, must explore both sides
  – But this is what they are good for
Nearest neighbor: same approach as multiple-key indexes
Compared with multiple-key indexes
  – Kd-trees might (depending on data) provide better scoping by alternating between dimensions
  – Gains may be specious given the increased complexity
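A compact Python sketch of a 2-d kd-tree with a range query that explores both subtrees only when the range straddles the split. It assumes in-memory points, median splits, and leaf blocks of 2 tuples; the tuple-based node layout and function names are illustrative choices, not the lecture's:

```python
BLOCK = 2  # tuples per leaf block (assumed)


def build(points, depth=0):
    if len(points) <= BLOCK:
        return ("leaf", list(points))
    # Prefer the attribute for this level; fall back to the other one
    # if all values on the preferred attribute would land on one side.
    for offset in range(2):
        axis = (depth + offset) % 2
        pts = sorted(points, key=lambda p: p[axis])
        split = pts[len(pts) // 2][axis]
        left = [p for p in pts if p[axis] < split]
        right = [p for p in pts if p[axis] >= split]
        if left:
            return ("node", axis, split,
                    build(left, depth + 1), build(right, depth + 1))
    return ("leaf", list(points))  # cannot split further: overflow leaf


def range_query(node, lo, hi):
    """Points p with lo[d] <= p[d] <= hi[d] in both dimensions."""
    if node[0] == "leaf":
        return [p for p in node[1]
                if all(lo[d] <= p[d] <= hi[d] for d in range(2))]
    _, axis, split, left, right = node
    out = []
    if lo[axis] < split:    # range reaches below the split value
        out += range_query(left, lo, hi)
    if hi[axis] >= split:   # range reaches the split value or above
        out += range_query(right, lo, hi)
    return out


points = [(25, 60), (45, 60), (50, 75), (50, 100), (50, 120), (70, 110),
          (85, 140), (30, 260), (25, 400), (45, 350), (50, 275), (60, 260)]
tree = build(points)
print(sorted(range_query(tree, (40, 50), (60, 130))))
```

The sample points are made up for the demonstration.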

Quad Trees
Each interior node covers a square region and divides it into four equal quadrants, one per child
Figure 5.17

Quad Trees Query Support
Partial match: on all attributes
Range queries: yes, but must visit all overlapping quadrants
Nearest neighbor: only insofar as range queries support it
Has more in common with grid files
Problems
  – Domains must be known a priori
  – Skew in data leads to regions of very different sizes and many empty regions
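A short Python sketch of a region quad tree with insertion and a range query. It assumes the domain is a known square given up front, that all points fall inside the half-open region [x0, x0 + size) in each dimension, and a leaf capacity of 2; the class and its names are illustrative:

```python
CAPACITY = 2  # points per leaf before it splits (assumed)


class QuadTree:
    def __init__(self, x0, y0, size):
        self.x0, self.y0, self.size = x0, y0, size
        self.points = []
        self.children = None  # four sub-quadrants once split

    def _child_for(self, p):
        half = self.size / 2
        col = 1 if p[0] >= self.x0 + half else 0
        row = 1 if p[1] >= self.y0 + half else 0
        return self.children[row * 2 + col]

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        # The size guard stops endless splitting on duplicate points.
        if len(self.points) > CAPACITY and self.size > 1:
            half = self.size / 2
            self.children = [QuadTree(self.x0 + dx * half, self.y0 + dy * half, half)
                             for dy in (0, 1) for dx in (0, 1)]
            pts, self.points = self.points, []
            for q in pts:
                self._child_for(q).insert(q)

    def range_query(self, x_lo, x_hi, y_lo, y_hi):
        # Skip quadrants that do not overlap the query rectangle.
        if (x_hi < self.x0 or x_lo >= self.x0 + self.size or
                y_hi < self.y0 or y_lo >= self.y0 + self.size):
            return []
        if self.children is None:
            return [p for p in self.points
                    if x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi]
        out = []
        for c in self.children:
            out += c.range_query(x_lo, x_hi, y_lo, y_hi)
        return out


qt = QuadTree(0, 0, 100)
for p in [(10, 10), (20, 20), (30, 30), (80, 80), (60, 10)]:
    qt.insert(p)
print(sorted(qt.range_query(0, 35, 0, 35)))
```

Clustered points force deep subdivision in one corner while other quadrants stay empty, which is the skew problem noted above.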

Region Trees (R-Trees)
Represent objects, not just points
Good for spatial data
Capture the spirit of B-trees for multi-dimensional data
  – B-tree divides a line (1-d space) into intervals
  – R-tree divides a space (n-d space) into regions; these generally use simple shapes, like rectangles
  – Regions may overlap, but should do so minimally
  – Each object should be contained entirely within a single region
Develop Figures 5.20 and 5.21
  – Add another house: Figure 5.22

R-Trees Query Support
Partial match: on all attributes, but complicated by overlap
Range queries: yes, but complicated by overlap
Nearest neighbor: only insofar as range queries support it
Where-am-I: yes, can represent objects in R-tree regions
Complexities
  – Managing shapes, limiting overlap, preserving the containment property
Overlap is required for the containment property to serve the where-am-I query
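A where-am-I search over an R-tree can be sketched as follows. This is a hand-built, static two-level tree with rectangular regions; insertion and node splitting are omitted, and the object names and coordinates are invented for illustration:

```python
from typing import NamedTuple


class Rect(NamedTuple):
    xll: float
    yll: float
    xur: float
    yur: float

    def contains(self, x, y):
        return self.xll <= x <= self.xur and self.yll <= y <= self.yur


def bounding(rects):
    # Smallest rectangle enclosing all the given rectangles.
    return Rect(min(r.xll for r in rects), min(r.yll for r in rects),
                max(r.xur for r in rects), max(r.yur for r in rects))


class Node:
    def __init__(self, children=None, objects=None):
        self.children = children or []  # interior node: child nodes
        self.objects = objects or []    # leaf node: (name, Rect) pairs
        self.region = bounding([c.region for c in self.children] +
                               [r for _, r in self.objects])


def where_am_i(node, x, y):
    """Names of all objects whose rectangle contains (x, y)."""
    if not node.region.contains(x, y):
        return []
    hits = [name for name, r in node.objects if r.contains(x, y)]
    # Regions may overlap, so more than one child can qualify.
    for child in node.children:
        hits += where_am_i(child, x, y)
    return hits


leaf1 = Node(objects=[("house1", Rect(0, 0, 10, 10)), ("road", Rect(0, 12, 50, 15))])
leaf2 = Node(objects=[("school", Rect(30, 0, 45, 20))])
root = Node(children=[leaf1, leaf2])
print(where_am_i(root, 35, 13))
```

Because each object lies entirely inside one leaf region, a point query only has to descend into regions that contain the point, although overlap can make that more than one path.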