ICDE 2002, San Jose, CA Efficient Temporal Join Processing using Indices Donghui Zhang University of California, Riverside Vassilis J. Tsotras University.

Presentation transcript:

ICDE 2002, San Jose, CA Efficient Temporal Join Processing using Indices Donghui Zhang University of California, Riverside Vassilis J. Tsotras University of California, Riverside Bernhard Seeger University of Marburg, Germany

Contents
- Problem definition: GTE-Join
- Straightforward approaches
- Temporal indexing
- Proposed join algorithms
- Performance study
- Conclusions

Problem Definition
- Temporal record: (key, start, end, attributes)
- TE-Join: two records qualify for join if
  - their time intervals intersect; and
  - their keys are equal.
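The TE-Join condition can be sketched as a naive nested loop. This is only an illustration of the definition, not one of the algorithms from the talk. It assumes closed intervals [start, end]; the `Rec` type, the function names, and the sample data are hypothetical.

```python
from typing import NamedTuple

class Rec(NamedTuple):
    """A temporal record: (key, start, end, attributes)."""
    key: str
    start: int
    end: int
    attrs: str

def intervals_intersect(a: Rec, b: Rec) -> bool:
    # Assumption: intervals are closed, so touching endpoints count.
    return a.start <= b.end and b.start <= a.end

def te_join(r, s):
    """Return all pairs with equal keys and intersecting time intervals."""
    return [(a, b) for a in r for b in s
            if a.key == b.key and intervals_intersect(a, b)]

# Hypothetical data: department locations and managers over time.
locations = [Rec("D1", 0, 10, "Riverside"),
             Rec("D1", 11, 20, "San Jose"),
             Rec("D2", 0, 20, "Marburg")]
managers = [Rec("D1", 5, 15, "Ann"),
            Rec("D2", 21, 30, "Bob")]

pairs = te_join(locations, managers)
```

Here both D1 location records join with Ann, while D2 does not join with Bob because their intervals are disjoint.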

TE-Join: “find the locations and managers of all departments over time”.

Problem Definition
- GTE-Join: general TE-Join. Record keys should be in a certain range r and time intervals should intersect a given interval i.
- Interesting because:
  - temporal relations are large;
  - TE-Join is a special case, when r and i are (-∞, +∞).

GTE-Join: “find the locations and managers of departments in range [D1, D2] during time [5, 10]”.
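A reference-semantics sketch of the GTE-Join for a query like the one above. It assumes closed intervals and requires that each record individually falls in the key range and intersects the query interval; whether the common part of the two intervals must also intersect the query interval is not stated on the slides. Records are plain `(key, start, end, attrs)` tuples, with department numbers as integer keys; all names and data are hypothetical.

```python
import math

def gte_join(r, s, key_range, interval):
    """Naive GTE-Join: select by query rectangle, then apply the TE-Join condition."""
    k_lo, k_hi = key_range
    t_lo, t_hi = interval

    def selected(rec):
        key, start, end, _ = rec
        return k_lo <= key <= k_hi and start <= t_hi and t_lo <= end

    return [(a, b)
            for a in r if selected(a)
            for b in s if selected(b)
            and a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]]

# Hypothetical data: (department, start, end, attribute).
locations = [(1, 0, 4, "A"), (1, 5, 20, "B"), (2, 0, 7, "C"), (3, 0, 20, "D")]
managers = [(1, 0, 20, "Ann"), (2, 8, 9, "Bob"), (2, 3, 6, "Cy"), (3, 0, 20, "Dee")]

# Departments 1..2 during time [5, 10].
gte = gte_join(locations, managers, (1, 2), (5, 10))

# TE-Join as the special case where r and i are unbounded.
te = gte_join(locations, managers, (-math.inf, math.inf), (-math.inf, math.inf))
```

The unbounded call returns every TE-Join pair, which matches the statement that TE-Join is a special case of GTE-Join.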

Straightforward Solutions
- Non-indexed join;
- Unsynchronized join;
- Synchronized join using B+-trees;
- Synchronized join using R-trees.

Straightforward Solutions
1. Non-indexed join: existing TE-Join research [Zur97] focuses on non-indexed join; not efficient for GTE-Join due to full scan.
2. Unsynchronized join: separate the selection and join phases; not efficient because:
   - the intermediate result has to be stored;
   - selection in one relation ignores the data distribution of the other relation.
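The unsynchronized approach can be sketched as two separate phases: a selection on each relation that materializes an intermediate result, followed by a sort-merge join on key. This is an in-memory illustration under assumed closed intervals and `(key, start, end)` tuples; in the setting of the talk the selection would be answered by an index and the intermediate results would live on disk. The function name is hypothetical.

```python
def unsynchronized_join(r, s, key_range, interval):
    """Phase 1: select from each relation. Phase 2: sort-merge join on key."""
    k_lo, k_hi = key_range
    t_lo, t_hi = interval

    def select(rel):
        # Materialized intermediate result, sorted by key for the merge phase.
        return sorted(rec for rec in rel
                      if k_lo <= rec[0] <= k_hi and rec[1] <= t_hi and t_lo <= rec[2])

    sel_r, sel_s = select(r), select(s)
    out, i, j = [], 0, 0
    while i < len(sel_r) and j < len(sel_s):
        kr, ks = sel_r[i][0], sel_s[j][0]
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(sel_r) and sel_r[i_end][0] == kr:
                i_end += 1
            j_end = j
            while j_end < len(sel_s) and sel_s[j_end][0] == kr:
                j_end += 1
            # Within a run, keep the pairs whose intervals intersect.
            for a in sel_r[i:i_end]:
                for b in sel_s[j:j_end]:
                    if a[1] <= b[2] and b[1] <= a[2]:
                        out.append((a, b))
            i, j = i_end, j_end
    return out
```

Note that each selection runs without any knowledge of the other relation, which is the second drawback listed above.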

Straightforward Solutions
3. Synchronized using B+-trees:
   - If cluster on start: not efficient.
   - Cluster on end is similar.

ICDE 2002, San Jose, CA  records with keys in r are stored together and are sorted;  focus on these records in each relation and sort-merge join, while skipping those whose intervals not in i.  However, not efficient since records in the query rectangle are scattered. 3. Synchronized using B+-trees; Straightforward Solutions  If cluster on key:

ICDE 2002, San Jose, CA  Store each record as a two-dimensional interval in the R-tree;  Use existing R-tree join algorithms [BKS93, HJR97];  Modification: integrate the selection regarding query rectangle.  However, not efficient since R-trees do not handle long intervals well. 4. Synchronized using R-trees; Straightforward Solutions

Our Solutions
- Synchronized join using temporal indices.
- Multi-version B+-tree (MVBT) [BGO+96]: asymptotically optimal space, update, query.
- We propose: two categories of synchronized, MVBT-based join algorithms (apply to other temporal indices as well).

Review of MVBT
- Suppose a page holds up to 3 records.

Review of MVBT
- A “forest”: different trees may overlap;
- Root nodes correspond to contiguous, non-intersecting time intervals;
- A record may be stored in multiple pages; end time of all but the last copy is +∞.
- Range-Interval selection algorithms [BS96]: avoid duplicates by reporting the first copy.

The Incorrect End Time Problem
- [BS96] reports first copy of x (whose end is +∞); would lead GTE-Join algorithms to join x with y.
- Solution: report the rightmost copy!
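A toy illustration of why the first copy gives a wrong join result. The model is an assumption made for this sketch, not the MVBT page layout: each copy carries the record's key and start, an `end` that is +∞ on every copy except the last, and a hypothetical `page_start` field giving the start of the lifespan of the page that holds the copy. Intervals are treated as half-open [start, end).

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Copy:
    key: int
    start: float
    end: float         # math.inf on all copies except the last one
    page_start: float  # assumed field: when the holding page became alive

def intersects(s1, e1, s2, e2):
    # Half-open intervals [s, e).
    return s1 < e2 and s2 < e1

def first_copy(copies):
    return min(copies, key=lambda c: c.page_start)

def rightmost_copy(copies):
    return max(copies, key=lambda c: c.page_start)

# Record x really lives during [1, 4) but was copied once.
x_copies = [Copy(key=5, start=1, end=math.inf, page_start=1),
            Copy(key=5, start=1, end=4, page_start=3)]
# Record y from the other relation lives during [6, 9).
y = Copy(key=5, start=6, end=9, page_start=6)

fc, rc = first_copy(x_copies), rightmost_copy(x_copies)
wrong = fc.key == y.key and intersects(fc.start, fc.end, y.start, y.end)
right = rc.key == y.key and intersects(rc.start, rc.end, y.start, y.end)
```

With the first copy, x appears to be alive forever and is joined with y; with the rightmost copy, the true end time 4 is used and the pair is rejected.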

Top-down Approaches
- Idea: for each pair of trees, one from each MVBT forest, perform a synchronized tree traversal (STT).
- STT for two trees:
  - initially, join root nodes;
  - to join two nodes, join their children;
  - eventually, join elements in leaf pages.
- Join condition?
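The three STT steps can be sketched on a generic tree whose nodes carry a bounding rectangle in key-time space. This is a depth-first sketch on a simplified, hypothetical `Node` type, not an MVBT: the join condition used here is plain rectangle intersection (with each other and with the query rectangle), and the copy-related subtleties of MVBT pages are ignored. Rectangles are `(key_lo, key_hi, t_lo, t_hi)` with closed bounds, records are `(key, start, end)`.

```python
from dataclasses import dataclass, field

def overlap(a, b):
    """True if two closed rectangles (key_lo, key_hi, t_lo, t_hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]

def in_query(rec, q):
    """True if record (key, start, end) falls in the query rectangle q."""
    return q[0] <= rec[0] <= q[1] and rec[1] <= q[3] and q[2] <= rec[2]

@dataclass
class Node:
    rect: tuple
    children: list = field(default_factory=list)  # non-empty for internal nodes
    records: list = field(default_factory=list)   # non-empty for leaf pages

def stt_join(a, b, query, out):
    """Join node a (first tree) with node b (second tree), depth-first."""
    if not (overlap(a.rect, b.rect) and overlap(a.rect, query)
            and overlap(b.rect, query)):
        return  # prune: this pair cannot contribute to the result
    if not a.children and not b.children:
        # Leaf level: join the elements of the two pages.
        for ra in a.records:
            for rb in b.records:
                if (ra[0] == rb[0] and ra[1] <= rb[2] and rb[1] <= ra[2]
                        and in_query(ra, query) and in_query(rb, query)):
                    out.append((ra, rb))
    elif not a.children:
        for cb in b.children:
            stt_join(a, cb, query, out)
    elif not b.children:
        for ca in a.children:
            stt_join(ca, b, query, out)
    else:
        # To join two internal nodes, join their children.
        for ca in a.children:
            for cb in b.children:
                stt_join(ca, cb, query, out)
```

Usage starts from the two roots, for example `stt_join(root_a, root_b, (0, 5, 0, 10), results)` with an empty list `results`.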

Balancing Condition Optimization (BCO)
- To find [pair not preserved in the transcript], page 3 and page 0 have to join;
- In general, join two pages even though they do not intersect. Inefficient!
- BCO: balancing two conditions. (1) only intersecting pages join; (2) examine records even if not last copy. E.g. join [pair not preserved in the transcript] when joining page 2 with page 0.

Virtual Height Optimization (VHO)
- At the middle level, STT joins: [list of node pairs not preserved in the transcript]
- With VHO: [list of node pairs not preserved in the transcript]

Sideways Approach 1: Link-based
- In each leaf page, store a pointer to its predecessor;
- For GTE-Join:
  - find pairs of data pages that intersect with the right border of the query rectangle and with each other;
  - keep such pairs in priority queue;
  - sweep left synchronously.
- Special techniques to avoid duplicates.

Sideways Approach 2: Plane Sweep
- Similar to link-based;
- Maintain two priority queues, one for each MVBT;
- At each step, access the leaf page with the largest end time and add records to buffer;
- To add records to buffer, join with existing records from the other MVBT;
- Throw away useless records.
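The sweep idea above can be sketched in memory on records instead of leaf pages. This is a simplification under stated assumptions: the two priority queues hold individual `(key, start, end)` records ordered by descending end time, intervals are closed, and no query rectangle or MVBT page access is modeled. The function name is hypothetical.

```python
import heapq

def plane_sweep_join(r, s):
    """Sweep from the largest end time to the left, joining as records arrive."""
    # Max-heaps on end time, built by negating the end.
    heap_r = [(-end, key, start) for key, start, end in r]
    heap_s = [(-end, key, start) for key, start, end in s]
    heapq.heapify(heap_r)
    heapq.heapify(heap_s)
    buf_r, buf_s, out = [], [], []

    while heap_r or heap_s:
        # Take the record with the largest end time across both queues.
        take_r = bool(heap_r) and (not heap_s or heap_r[0][0] <= heap_s[0][0])
        neg_end, key, start = heapq.heappop(heap_r if take_r else heap_s)
        cur = (key, start, -neg_end)
        other = buf_s if take_r else buf_r

        # Throw away useless records: anything starting after the sweep
        # position cannot intersect cur or any record still in the queues.
        other[:] = [b for b in other if b[1] <= cur[2]]

        # Every remaining buffered record has end >= cur's end and
        # start <= cur's end, so it intersects cur; only keys need checking.
        for b in other:
            if b[0] == key:
                out.append((cur, b) if take_r else (b, cur))

        (buf_r if take_r else buf_s).append(cur)
    return out
```

Each qualifying pair is reported exactly once, at the moment the record with the smaller end time is taken from its queue.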

Performance Study

Notation     Meaning
mvbt_df      Synchronized MVBT, depth-first
mvbt_bf      Synchronized MVBT, breadth-first
mvbt_link    Synchronized MVBT, link-based
mvbt_ps      Synchronized MVBT, plane-sweep
mvbt_sm      Unsynchronized, sort-merge after selection
b+           Synchronized B+-tree, index on key
r*_df        Synchronized R*-tree, depth-first
r*_bf        Synchronized R*-tree, breadth-first

Experimental Setup
- Implemented in GNU C++;
- Sun Enterprise 250 Server machine with two UltraSPARC-II processors using Solaris 2.8;
- Page size = 8KB; Buffer size = 10MB; LRU buffer;
- Each data set: 10 million records;
- QRS: size ratio between the query rectangle and the whole space.
- Long intervals: 1/100 of time space; Short intervals: 1/10,000 of time space.

GTE-Join Performance: joining mainly long intervals.

GTE-Join Performance: joining mainly short intervals.

GTE-Join Performance: varying QRS (log scale).

Conclusions
- We addressed the GTE-Join;
- Unsynchronized approach not efficient;
- Synchronized approaches based on traditional indices (B+-tree, R-tree) also not efficient;
- We proposed synchronized approaches based on temporal indices (MVBT);
- We also proposed BCO and VHO optimizations;
- Experiments: link-based is the best.
