Spatial Data Management Chapter 28. Types of Spatial Data Point Data –Points in a multidimensional space E.g., Raster data such as satellite imagery,

Slides:



Advertisements
Similar presentations
1 DATA STRUCTURES USED IN SPATIAL DATA MINING. 2 What is Spatial data ? broadly be defined as data which covers multidimensional points, lines, rectangles,
Advertisements

Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.
Spatial Join Queries. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.
Access Methods for Advanced Database Applications.
B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree
2-dimensional indexing structure
Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.
Multiple-key indexes Index on one attribute provides pointer to an index on the other. If V is a value of the first attribute, then the index we reach.
Spatial Access Methods Chapter 26 of book Read only 26.1, 26.2, 26.6 Dr Eamonn Keogh Computer Science & Engineering Department University of California.
Spatial Indexing SAMs. Spatial Access Methods PAMs Grid File kd-tree based (LSD-, hB- trees) Z-ordering + B+-tree R-tree Variations: R*-tree, Hilbert.
Project Proposals Simonas Šaltenis Aalborg University Nykredit Center for Database Research Department of Computer Science, Aalborg University.
Spatial Indexing SAMs.
Multi-dimensional Indexes
Retrieving Data from Disk – Access Methods Zachary G. Ives University of Pennsylvania CIS 650 – Implementing Data Management Systems January 19, 2005 Slides.
Spatial Information Systems (SIS) COMP Spatial access methods: Indexing.
1 R-Trees for Spatial Indexing Yanlei Diao UMass Amherst Feb 27, 2007 Some Slide Content Courtesy of J.M. Hellerstein.
Chapter 3: Data Storage and Access Methods
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
Spatio-Temporal Databases. Introduction Spatiotemporal Databases: manage spatial data whose geometry changes over time Geometry: position and/or extent.
1 Indexing Structures for Files. 2 Basic Concepts  Indexing mechanisms used to speed up access to desired data without having to scan entire.
R-Trees 2-dimensional indexing structure. R-trees 2-dimensional version of the B-tree: B-tree of maximum degree 8; degree between 3 and 8 Internal nodes.
Spatial Indexing SAMs. Spatial Access Methods PAMs Grid File kd-tree based (LSD-, hB- trees) Z-ordering + B+-tree R-tree Variations: R*-tree, Hilbert.
 Spatial data requires special data structures, similar to B-trees.
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
Spatio-Temporal Databases. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases …..
R-TREES: A Dynamic Index Structure for Spatial Searching by A. Guttman, SIGMOD Shahram Ghandeharizadeh Computer Science Department University of.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 9.
Tree-Structured Indexes. Range Searches ``Find all students with gpa > 3.0’’ –If data is in sorted file, do binary search to find first such student,
Indexing Spatial Data (Parts of Chapter 25+R-tree paper)
Chapter 61 Chapter 6 Index Structures for Files. Chapter 62 Indexes Indexes are additional auxiliary access structures with typically provide either faster.
INDEXING SPATIAL DATABASES Atinder Singh Department of Computer Science University of California Riverside, CA
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
Database Management 8. course. Query types Equality query – Each field has to be equal to a constant Range query – Not all the fields have to be equal.
Database management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Spatial Data Management Chapter 28.
1 B Trees - Motivation Recall our discussion on AVL-trees –The maximum height of an AVL-tree with n-nodes is log 2 (n) since the branching factor (degree,
1 SD-Rtree: A Scalable Distributed Rtree Witold Litwin & Cédric du Mouza & Philippe Rigaux.
Multidimensional Indexes Applications: geographical databases, data cubes. Types of queries: –partial match (give only a subset of the dimensions) –range.
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
R-Tree. 2 Spatial Database (Ia) Consider: Given a city map, ‘index’ all university buildings in an efficient structure for quick topological search.
Adapted from Mike Franklin
Antonin Guttman In Proceedings of the 1984 ACM SIGMOD international conference on Management of data (SIGMOD '84). ACM, New York, NY, USA.
1 Tree Indexing (1) Linear index is poor for insertion/deletion. Tree index can efficiently support all desired operations: –Insert/delete –Multiple search.
Tree-Structured Indexes Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY courtesy of Joe Hellerstein for some slides.
Nearest Neighbor Queries Chris Buzzerd, Dave Boerner, and Kevin Stewart.
Spatial and Geographic Databases ADVANCED DATABASES Khawaja Mohiuddin Assistant Professor Department of Computer Sciences Bahria University (Karachi Campus)
Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.
Bin Yao (Slides made available by Feifei Li) R-tree: Indexing Structure for Data in Multi- dimensional Space.
Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need to use Indexing? We used indexing to speed up access.
Spatial and Geographic Databases. Spatial databases store information related to spatial locations, and support efficient storage, indexing and querying.
Spatial Indexing Techniques Introduction to Spatial Computing CSE 5ISC Some slides adapted from Spatial Databases: A Tour by Shashi Shekhar Prentice Hall.
R-Trees: A Dynamic Index Structure For Spatial Searching Antonin Guttman.
Indexing Database Management Systems. Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files File Organization 2.
1 CSIS 7101: CSIS 7101: Spatial Data (Part 1) The R*-tree : An Efficient and Robust Access Method for Points and Rectangles Rollo Chan Chu Chung Man Mak.
Indexing and B+-Trees By Kenneth Cheung CS 157B TR 07:30-08:45 Professor Lee.
File Processing : Multi-dimensional Index 2015, Spring Pusan National University Ki-Joune Li.
1 Tree-Structured Indexes Chapter Introduction  As for any index, 3 alternatives for data entries k* :  Data record with key value k   Choice.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Content based on Chapter 10 Database Management Systems, (3 rd.
Spatio-Temporal Databases. Term Project Groups of 2 students You can take a look on some project ideas from here:
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 10.
Jeremy Iverson & Zhang Yun 1.  Chapter 6 Key Concepts ◦ Structures and access methods ◦ R-Tree  R*-Tree  Mobile Object Indexing  Questions 2.
1 R-Trees Guttman. 2 Introduction Range queries in multiple dimensions: Computer Aided Design (CAD) Geo-data applications Support special data objects.
Spatial Data Management
Spatio-Temporal Databases
Chapter 25: Advanced Data Types and New Applications
Spatio-Temporal Databases
B+Trees The slides for this text are organized into chapters. This lecture covers Chapter 9. Chapter 1: Introduction to Database Systems Chapter 2: The.
Spatial Indexing I R-trees
Spatial Data Management
Multidimensional Search Structures
Tree-Structured Indexes
Presentation transcript:

Spatial Data Management Chapter 28

Types of Spatial Data Point Data –Points in a multidimensional space E.g., Raster data such as satellite imagery, where each pixel stores a measured value E.g., Feature vectors extracted from text Region Data –Objects have spatial extent with location and boundary DB typically uses geometric approximations constructed using line segments, polygons, etc., called vector data.

Types of Spatial Queries Spatial Range Queries –Find all cities within 50 miles of Ames –Query has associated region (location, boundary) –Answer includes overlapping or contained data regions Nearest-Neighbor Queries –Find the 10 cities nearest to Ames –Results must be ordered by proximity Spatial Join Queries –Find pairs of cities within 200 miles of each other –Expensive, join condition involves regions and proximity

Spatial Indexing Data items “spatially” close to each other should be clustered and stored “nearby’ for efficient retrieval –One-dimensional indexes don’t support multidimensional searching efficiently –Hash indexes only support point queries; we want to support range queries as well –Must support inserts and deletes gracefully

The R-Tree The R-tree is a tree-structured index that remains balanced on inserts and deletes. Each key stored in a leaf entry is intuitively a box, or collection of intervals, with one interval per dimension. Example in 2-D: X Y Root of R Tree Leaf level

R1R2 R3R4R5R6R7 R8R9R10R11R12R13R14R15 R16 R17R18R19 R8 R9 R10 R11 R12 R17 R18 R19 R13 R14 R15 R16 R1 R2 R3 R4 R5 R6 R7 Leaf entry Index entry Spatial object approximated by bounding box R8

R-Tree Properties Leaf entry = –This is Alternative (2), with key value being a box. –Box is the tightest bounding box for a data object. Non-leaf entry = –Box covers all boxes in child node (in fact, subtree). All leaves at same distance from root. Nodes can be kept 50% full (except root). –Can choose a parameter m that is <= 50%, and ensure that every node is at least m% full. The box at a parent entry contains all boxes in its children node R1R2 R3R4R5R6R7 R8R9R10R11R12R13R14R15 R16 R17R18R19

Search for Objects Overlapping Box Q Start at root. 1. If current node is non-leaf, for each entry, if box E overlaps Q, search subtree identified by ptr. 2. If current node is leaf, for each entry, if E overlaps Q, rid identifies an object that might overlap Q. Note: May have to search several subtrees at each node! (In contrast, a B-tree equality search goes to just one leaf.)

Insert Entry Start at root and go down to “best-fit” leaf L. –Go to child whose box needs least enlargement to cover B; resolve ties by going to smallest area child. If best-fit leaf L has space, insert entry and stop. Otherwise, split L into L1 and L2. –Adjust entry for L in its parent so that the box now covers (only) L1. –Add an entry (in the parent node of L) for L2. This could cause the parent node to recursively split.

Splitting a Node During Insertion The entries in node L plus the newly inserted entry must be distributed between L1 and L2. Goal is to reduce likelihood of both L1 and L2 being searched on subsequent queries. Idea: Redistribute so as to minimize area of L1 plus area of L2. Exhaustive algorithm is too slow; quadratic and linear heuristics are described in the paper. GOOD SPLIT! BAD!

R-Tree Variants The R*-tree uses the concept of forced reinserts to reduce overlap in tree nodes. When a node overflows, instead of splitting: –Remove some (say, 30% of the) entries and reinsert them into the tree. –Could result in all reinserted entries fitting on some existing pages, avoiding a split. R*-tree also uses a different heuristic, minimizing box perimeters rather than box areas during insertion. Another variant, the R+ tree, avoids overlap by inserting an object into multiple leaves if necessary. –Searches now take a single path to a leaf, at cost of redundancy.

Summary Spatial data management has many applications, including GIS, CAD/CAM, multimedia indexing. –Point and region data –Overlap/containment and nearest-neighbor queries Many approaches to indexing spatial data –R-tree approach is widely used in GIS systems –Other approaches include Grid Files, Quad trees, and techniques based on “space-filling” curves. –For high-dimensional datasets, unless data has good “contrast”, nearest-neighbor may not be well-separated