Optimizing Multidimensional Index Trees for Main Memory Access Author: Kihong Kim, Sang K. Cha, Keunjoo Kwon Members: Iris Zhang, Grace Yung, Kara Kwon,

Presentation transcript:

Optimizing Multidimensional Index Trees for Main Memory Access Author: Kihong Kim, Sang K. Cha, Keunjoo Kwon Members: Iris Zhang, Grace Yung, Kara Kwon, Jessica Wong

Outline 1. Abstract 2. Introduction 3. Motivation 4. MBR Compression 5. CR-tree 6. Analysis 7. Conclusion

Abstract CR-tree: a cache-conscious version of the R-tree. Compress MBR keys by removing leading and trailing less significant bits. The CR-tree becomes wider and smaller, so it searches faster and consumes less memory than the ordinary R-tree.

Introduction DB tables and indexes reside in main memory. How can R-tree search be made faster? By compressing MBR keys.

Motivation 1. Memory Hierarchy -> Reduce cache misses

Motivation (Cont’d) 2. Basic Idea - Compression scheme

Motivation (Cont’d) 2. Basic Idea (Cont’d) Quantize the coordinates into 16 levels (4 bits) by cutting off trailing insignificant bits. The resulting MBR is the QRMBR. The CR-tree uses QRMBRs as index keys.

Motivation (Cont’d) 2. Basic Idea (Cont’d) Power of quantization: the original coordinates (43166,27102),(43178,27190) become (8,11),(14,15) after quantization. The 8-byte MBR of an entry shrinks to 2 bytes, so each node can pack about 4 times more entries.
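The quantization step on this slide can be sketched in Python as follows. This is only an illustration, not the authors' implementation: the name `quantize_mbr`, the reference MBR used in the example, and the rounding convention (lower bounds rounded down into 0..15, upper bounds rounded up into 1..16, so the quantized rectangle always encloses the original) are assumptions. Because the slide does not state its reference MBR, the example does not reproduce the slide's (8,11),(14,15).

```python
def quantize_mbr(mbr, ref, levels=16):
    """Quantize mbr = (xl, yl, xh, yh) relative to the reference MBR `ref`.

    Assumed convention: lower bounds are rounded down and clamped to
    0..levels-1, upper bounds are rounded up and clamped to 1..levels.
    `ref` must have non-zero width and height.
    """
    def lower(v, a, b):
        if v <= a:
            return 0
        if v >= b:
            return levels - 1
        return (levels * (v - a)) // (b - a)        # floor

    def upper(v, a, b):
        if v <= a:
            return 1
        if v >= b:
            return levels
        return -((-levels * (v - a)) // (b - a))    # ceiling

    xl, yl, xh, yh = mbr
    rxl, ryl, rxh, ryh = ref
    return (lower(xl, rxl, rxh), lower(yl, ryl, ryh),
            upper(xh, rxl, rxh), upper(yh, ryl, ryh))


# Hypothetical reference MBR: a 320 x 320 square, so each cell is 20 units wide.
ref = (43000, 27000, 43320, 27320)
print(quantize_mbr((43166, 27102, 43178, 27190), ref))  # (8, 5, 9, 10)
```

Four 16-bit coordinates (8 bytes) are replaced by four 4-bit values (2 bytes), which is where the fourfold saving per key comes from.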

Motivation (Cont’d) 2. Basic Idea (Cont’d) -Structure of CR-tree node
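To make the 2-byte key concrete, the sketch below packs one QRMBR into a 16-bit word. The bit layout and the names `pack_qrmbr` and `unpack_qrmbr` are assumptions for illustration, not the node format from the slides. It assumes 16 quantization levels, lower bounds in 0..15 and upper bounds in 1..16 (stored minus one so that each value fits in 4 bits).

```python
def pack_qrmbr(q):
    """Pack (xl, yl, xh, yh) into one 16-bit integer, 4 bits per value."""
    xl, yl, xh, yh = q
    return (xl << 12) | (yl << 8) | ((xh - 1) << 4) | (yh - 1)


def unpack_qrmbr(word):
    """Inverse of pack_qrmbr."""
    return ((word >> 12) & 0xF,
            (word >> 8) & 0xF,
            ((word >> 4) & 0xF) + 1,
            (word & 0xF) + 1)


key = pack_qrmbr((8, 5, 9, 10))
print(key < 2 ** 16)        # True: the key fits in 2 bytes
print(unpack_qrmbr(key))    # (8, 5, 9, 10)
```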

Motivation (Cont’d) 3. Problem Formulation (reduce index search time) $c$: node size (in cache blocks); $N_{\text{node access}}$: number of nodes accessed. $T_{\text{index search}} = c \cdot N_{\text{node access}} \cdot (C_{\text{key compare}} + C_{\text{cache miss}} + C_{\text{TLB miss}} / c)$. Index search time therefore depends mostly on $c \cdot N_{\text{node access}}$.
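The cost model can be evaluated directly. The sketch below simply encodes the formula from this slide; the function name and all numeric costs are hypothetical values chosen for illustration, not measurements.

```python
def index_search_time(c, n_node_access, c_key_compare, c_cache_miss, c_tlb_miss):
    """T = c * N_node_access * (C_key_compare + C_cache_miss + C_TLB_miss / c)."""
    return c * n_node_access * (c_key_compare + c_cache_miss + c_tlb_miss / c)


# Hypothetical unit costs: key compare = 1, cache miss = 4, TLB miss = 6.
narrow = index_search_time(c=1, n_node_access=12, c_key_compare=1, c_cache_miss=4, c_tlb_miss=6)
wide = index_search_time(c=2, n_node_access=5, c_key_compare=1, c_cache_miss=4, c_tlb_miss=6)
print(narrow, wide)  # 132.0 80.0
```

With these made-up numbers the wider node wins because the product of node size and node accesses drops from 12 to 10 and the TLB cost is paid less often.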

Motivation (Cont’d) 3. Problem Formulation (reduce index search time) How can $c \cdot N_{\text{node access}}$ be minimized? Choose the node size $c$ so that the product becomes minimal. Pack more entries into a fixed-size node. Cluster index entries.

MBR Compression Two desirable properties of MBR compression: (1) overlap check without decompression; (2) simplicity.

RMBR (Relative Representation of MBR) - Represent the coordinates of an MBR relative to the lower-left corner of its parent MBR - Cut off leading non-discriminating bits - Can save only 32 bits per MBR
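A minimal sketch of the relative representation, assuming integer coordinates; the function name and the parent MBR below are hypothetical. It shows why the leading bits become redundant once coordinates are expressed as offsets from the parent's lower-left corner.

```python
def relative_mbr(mbr, parent):
    """Express mbr = (xl, yl, xh, yh) as offsets from the parent's lower-left corner."""
    xl, yl, xh, yh = mbr
    pxl, pyl, _, _ = parent
    return (xl - pxl, yl - pyl, xh - pxl, yh - pyl)


mbr = (43166, 27102, 43178, 27190)
parent = (43000, 27000, 43320, 27320)      # hypothetical parent MBR
rmbr = relative_mbr(mbr, parent)

print(rmbr)                                  # (166, 102, 178, 190)
print(max(v.bit_length() for v in mbr))      # 16 bits per absolute coordinate
print(max(v.bit_length() for v in rmbr))     # 8 bits per relative coordinate
```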

QRMBR (Quantized Relative Representation of MBR) The RMBR alone cannot achieve a sufficient compression ratio. Cut off trailing insignificant bits from an RMBR by quantization. Quantization adds overhead, but it is paid off by the significant savings in cache misses.

QRMBR (Cont’d) Correctness If two MBRs overlap, their QRMBRs must also overlap. The QRMBRs of two non-overlapping MBRs may still overlap, so quantization can cause false hits but never misses.
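The correctness property can be checked on a one-dimensional example. The sketch below is an illustration under assumed conventions (16 levels, lower bound rounded down, upper bound rounded up, closed-interval overlap test); the function names and the reference interval [0, 160] are hypothetical.

```python
def quantize_interval(lo, hi, a, b, levels=16):
    """Quantize [lo, hi] relative to the reference interval [a, b] (a < b)."""
    q_lo = min(levels - 1, max(0, (levels * (lo - a)) // (b - a)))      # round down
    q_hi = max(1, min(levels, -((-levels * (hi - a)) // (b - a))))      # round up
    return q_lo, q_hi


def overlaps(p, q):
    """Closed-interval overlap test; works on raw and quantized intervals alike."""
    return p[0] <= q[1] and q[0] <= p[1]


A, B, C, D = (12, 14), (16, 18), (13, 17), (100, 120)
qa, qb, qc, qd = (quantize_interval(lo, hi, 0, 160) for lo, hi in (A, B, C, D))

print(overlaps(A, C), overlaps(qa, qc))   # True True   : a real overlap is preserved
print(overlaps(A, B), overlaps(qa, qb))   # False True  : false hit inside one cell
print(overlaps(A, D), overlaps(qa, qd))   # False False : distant intervals stay apart
```

Rounding outward is what guarantees that an overlap is never lost; the price is the occasional false hit when two disjoint intervals fall into the same cell.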

CR-tree Operations-Search The query rectangle must be converted into a QRMBR, using the MBR of each visited node as the reference MBR. The query QRMBR is then compared with each entry's QRMBR to check whether they overlap.
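The per-node search step can be sketched as below. This is a simplified illustration, not the authors' code: a node is modelled as a reference MBR plus a list of `(qrmbr, child)` pairs, and the helper names, the rounding convention (lower bounds down, upper bounds up) and the sample rectangles are assumptions.

```python
def quantize(mbr, ref, levels=16):
    """Quantize (xl, yl, xh, yh) relative to ref; lower bounds down, upper bounds up."""
    def lo(v, a, b):
        return min(levels - 1, max(0, (levels * (v - a)) // (b - a)))

    def hi(v, a, b):
        return max(1, min(levels, -((-levels * (v - a)) // (b - a))))

    xl, yl, xh, yh = mbr
    rxl, ryl, rxh, ryh = ref
    return (lo(xl, rxl, rxh), lo(yl, ryl, ryh), hi(xh, rxl, rxh), hi(yh, ryl, ryh))


def overlaps(p, q):
    """Rectangle overlap test on (xl, yl, xh, yh) tuples."""
    return p[0] <= q[2] and q[0] <= p[2] and p[1] <= q[3] and q[1] <= p[3]


def search_node(ref_mbr, entries, query):
    """Return the children whose QRMBR overlaps the quantized query rectangle."""
    q = quantize(query, ref_mbr)          # quantize the query once per node
    return [child for qrmbr, child in entries if overlaps(q, qrmbr)]


ref = (0, 0, 160, 160)                    # hypothetical node MBR
entries = [(quantize((10, 10, 30, 30), ref), "R1"),
           (quantize((100, 100, 150, 150), ref), "R2")]
print(search_node(ref, entries, (20, 20, 40, 40)))   # ['R1']
```

The stored keys are never decompressed; only the query is quantized, once for each node visited.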

CR-tree Operations-Search [Figure: the node at the R0 level holds the MBR of R0 and the entries of R1, R2 and R3; the node at the R3 level holds the MBR of R3 and the entries of R4 and R5. Example rectangle: {(43166,27102),(43178,27112)}]

CR-tree Operations-Insertion Choose the child node that needs the least enlargement to enclose the object MBR. In an internal node, the object MBR is first transformed into a QRMBR using the node's reference MBR, and the enlargement is calculated between pairs of QRMBRs. In a leaf node, the node MBR is first adjusted so that it encloses the object MBR, and an index entry for the object is created in the node. The QRMBRs in the node are recalculated because their reference MBR has changed.
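The child-selection step in an internal node can be sketched as follows. The sketch works purely on quantized rectangles, as the slide describes; the function names and sample values are hypothetical, and tie-breaking rules are omitted.

```python
def area(q):
    """Area of a quantized rectangle (xl, yl, xh, yh)."""
    return (q[2] - q[0]) * (q[3] - q[1])


def union(p, q):
    """Smallest rectangle enclosing both p and q."""
    return (min(p[0], q[0]), min(p[1], q[1]), max(p[2], q[2]), max(p[3], q[3]))


def enlargement(entry_q, obj_q):
    """Extra area the entry's QRMBR needs in order to enclose the object QRMBR."""
    return area(union(entry_q, obj_q)) - area(entry_q)


def choose_child(entries, obj_q):
    """Pick the child whose QRMBR needs the least enlargement."""
    return min(entries, key=lambda e: enlargement(e[0], obj_q))[1]


entries = [((1, 1, 3, 3), "R4"), ((10, 10, 15, 15), "R5")]
obj_q = (2, 2, 4, 4)                      # object MBR, already quantized
print(enlargement(entries[0][0], obj_q))  # 5
print(enlargement(entries[1][0], obj_q))  # 144
print(choose_child(entries, obj_q))       # R4
```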

CR-tree Operations-Insertion [Figure: inserting R6 into the node at the R3 level, which holds the MBR of R3 and the entries of R4, R5 and R6. Steps: 1. Change, 2. Insert, 3. Recalculate QRMBR]

CR-tree Operations- Insertion(Cont’d) Algorithm SplitNode.. Algorithm AdjustTree. Ascend from a leaf node L up to the root, adjusting MBRs of nodes and propagating node splits as necessary. When a node MBR has been adjusted, recalculate the QRMBRs in the node.

CR-tree Operations-Deletion Algorithm Delete. Remove index record E from a CR-tree Algorithm Condense Tree. -Eliminate the node if it has too few entries and relocate its entries. -Adjust all MBRs, making them smaller if possible. -Recalculate the QRMBR in the node.

CR-tree Operations-Bulkload Similar to other R-tree variants

Conclusion How can the cache behavior of indexes be optimized in a main-memory DB environment? QRMBR: pack more entries into each node. The CR-tree, based on the QRMBR, increases the fanout of the R-tree and decreases the index size, which improves cache behavior. Search is up to 2.5 times faster and memory use is about 60% lower than with the R-tree.

References Jun Rao, Kenneth A. Ross: Making B+-Trees Cache Conscious in Main Memory. ACM SIGMOD 2000; Shimin Chen, Phillip B. Gibbons, Todd C. Mowry, Gary Valentin: Fractal Prefetching B+-Trees: Optimizing Both Cache and Disk Performance. ACM SIGMOD 2002; Kihong Kim, Sang K. Cha, Keunjoo Kwon: Optimizing Multidimensional Index Trees for Main Memory Access. ACM SIGMOD 2001

Q&A