Index Based Processing of Semi-Restrictive Temporal Joins
Donghui Zhang, Vassilis J. Tsotras
University of California, Riverside
TIME 2002, Manchester, UK



Contents
- Background
- Join problem definition
- Straightforward approaches
- Proposed join algorithms
- Performance study
- Conclusions

Background
- Temporal record: (key, time interval) plus some other attributes.
- TE-Join: two records qualify for the join if
  - their time intervals intersect; and
  - their keys are equal.
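The TE-Join condition above can be written as a small predicate. This is only an illustrative sketch: the `Record` type, its field names, and the half-open `[start, end)` interval convention are assumptions made for the example and are not stated on the slides.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical temporal record: a key plus a time interval."""
    key: int
    start: int  # inclusive
    end: int    # exclusive (half-open intervals are an assumption)


def intervals_intersect(a: Record, b: Record) -> bool:
    # Two half-open intervals intersect when each starts before the other ends.
    return a.start < b.end and b.start < a.end


def te_join_qualifies(x: Record, y: Record) -> bool:
    # TE-Join: keys are equal AND time intervals intersect.
    return x.key == y.key and intervals_intersect(x, y)
```

With closed intervals the comparisons would use `<=` instead of `<`; the rest of the predicate stays the same.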

Background
- Our earlier work [ICDE02] solved a general TE-Join (GTE-Join), where portions from each relation are joined:
  - each portion is selected via a range-interval selection: record keys should be in range r and time intervals should intersect interval i;
  - this is interesting because (1) temporal relations are large; (2) TE-Join is a special case, when r and i are both (-∞, +∞).

Problem Definition
- Semi-restrictive joins: records join if their keys are equal (GE-Join) or if their intervals intersect (GT-Join); each join applies only one of the two conditions, not both.
- GE-Join: select a subset from X and a subset from Y, and join records from the subsets if their keys are equal.
- GT-Join: select a subset from X and a subset from Y, and join records from the subsets if their intervals intersect.
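As a reference for what the two joins compute, here is a brute-force sketch: apply a range-interval selection to each relation, then test every pair. It is not one of the proposed algorithms. The names `Record`, `Selection`, `select`, `gt_join`, and `ge_join` are hypothetical, and the inclusive key range and half-open time intervals are assumptions.

```python
from typing import List, NamedTuple, Tuple


class Record(NamedTuple):
    key: int
    start: int  # inclusive
    end: int    # exclusive (assumed half-open interval)


class Selection(NamedTuple):
    key_lo: int   # key range r, assumed inclusive on both ends
    key_hi: int
    t_start: int  # query interval i, assumed half-open
    t_end: int


def intersects(s1: int, e1: int, s2: int, e2: int) -> bool:
    return s1 < e2 and s2 < e1


def select(records: List[Record], sel: Selection) -> List[Record]:
    # Range-interval selection: key in r and interval intersecting i.
    return [r for r in records
            if sel.key_lo <= r.key <= sel.key_hi
            and intersects(r.start, r.end, sel.t_start, sel.t_end)]


def gt_join(xs: List[Record], ys: List[Record],
            sel_x: Selection, sel_y: Selection) -> List[Tuple[Record, Record]]:
    # GT-Join: selected records join if their intervals intersect.
    return [(x, y) for x in select(xs, sel_x) for y in select(ys, sel_y)
            if intersects(x.start, x.end, y.start, y.end)]


def ge_join(xs: List[Record], ys: List[Record],
            sel_x: Selection, sel_y: Selection) -> List[Tuple[Record, Record]]:
    # GE-Join: selected records join if their keys are equal.
    return [(x, y) for x in select(xs, sel_x) for y in select(ys, sel_y)
            if x.key == y.key]
```

Each relation takes its own selection here because the examples in the talk select different key ranges or different years on the two sides.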

Problem Definition
- GT-Join example: find employees whose last names start with ‘B’ and who co-worked during 1995 with employees whose last names start with ‘S’.
- GE-Join example: find the 1998 IBM employees who were UC Riverside students in 1995.

GT-Join Solutions...

Straightforward Solutions for GT-Join
1. Unsynchronized join.
2. Synchronized join using B+-trees.
3. Synchronized join using R-trees.

Straightforward Solutions for GT-Join
1. Unsynchronized join: separate the selection and join phases. Not efficient because:
- the intermediate result, which must be stored, can be large;
- selection in one relation ignores the data distribution of the other relation.

Straightforward Solutions for GT-Join
2. Synchronized join using B+-trees.
- If the records are clustered on start time: not efficient, since each record y needs to be checked against every record whose start is before the end of y.
- Clustering on end time is similar.
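The cost of clustering on start time can be seen in a small sketch. A sorted Python list stands in for the B+-tree leaf level here, which is a simplification, and the function name and the `(start, end)` tuple layout with half-open intervals are assumptions for illustration.

```python
import bisect
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), assumed half-open


def join_clustered_on_start(xs: List[Interval], ys: List[Interval]):
    """Intersection join where xs is ordered by start time.

    Returns the joined pairs and the number of x records examined.
    """
    xs_sorted = sorted(xs)                    # stand-in for a B+-tree on start
    starts = [s for s, _ in xs_sorted]
    pairs, examined = [], 0
    for y in ys:
        # Every x with start < y.end is a candidate, however early it ended.
        hi = bisect.bisect_left(starts, y[1])
        for x in xs_sorted[:hi]:
            examined += 1
            if x[1] > y[0]:                   # x must also end after y starts
                pairs.append((x, y))
    return pairs, examined
```

The index bounds only one side of the condition (`x.start < y.end`), so short intervals that ended long before y starts are still scanned and rejected one by one.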

Straightforward Solutions for GT-Join
3. Synchronized join using R-trees.
- Store each record as a two-dimensional interval in the R-tree;
- Use existing R-tree join algorithms [BKS93, HJR97];
- Modifications: (1) integrate the selection condition; (2) join index records as long as they intersect in the time dimension, ignoring the key dimension.
- However, this is not efficient, since R-trees do not handle long intervals well.

Our Solutions
- Synchronized join using temporal indices.
- Multi-version B+-tree (MVBT) [BGO+96]: asymptotically optimal space, update, and query.
- We propose three synchronized, MVBT-based join algorithms (they apply to other temporal indices as well).

Review of MVBT
- A “forest” of trees: different trees may overlap.
- Root nodes correspond to contiguous, non-intersecting time intervals.
- A record may be stored in multiple pages.
- Efficient range-interval selection algorithms.

Top-down GT-Join
- Idea: for each pair of trees, one from each MVBT forest, perform a synchronized tree traversal (STT).
- STT for two trees:
  - initially, join the root nodes;
  - to join two nodes, join their children;
  - eventually, join the elements in leaf pages.
- Note that special care is needed to avoid duplicates, since a record may have multiple copies.
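The traversal can be sketched on a toy in-memory tree. This is not the MVBT itself: the `Node` class, its `span` field, and the use of a result set to suppress duplicate pairs are simplifying assumptions made for illustration (the talk only says that special care is needed, not how duplicates are avoided). Intervals are assumed half-open.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # (start, end), assumed half-open


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


@dataclass
class Node:
    span: Interval                                   # time span covered by the node
    children: List["Node"] = field(default_factory=list)
    # Leaf entries: (record id, record interval); empty for index nodes.
    records: List[Tuple[int, Interval]] = field(default_factory=list)


def stt(a: Node, b: Node, out: Set[Tuple[int, int]]) -> None:
    """Synchronized traversal: descend only into node pairs that overlap in time."""
    if not overlaps(a.span, b.span):
        return
    if not a.children and not b.children:
        # Both are leaves: join their elements.
        for id_a, iv_a in a.records:
            for id_b, iv_b in b.records:
                if overlaps(iv_a, iv_b):
                    out.add((id_a, id_b))            # the set absorbs repeated copies
        return
    # To join two nodes, join their children (a leaf stands in for itself).
    for ca in (a.children or [a]):
        for cb in (b.children or [b]):
            stt(ca, cb, out)
```

Calling `stt(root_x, root_y, result)` once per pair of roots, one from each forest, mirrors the top-down idea. A real implementation would also prune on the key range of the selection.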

Link-based GT-Join
- In each leaf page, store a pointer to its predecessor.
- For GT-Join:
  - find pairs of data pages that (1) intersect with the right border of the query rectangle; and (2) intersect with each other in the time dimension;
  - keep such pairs in a priority queue;
  - sweep left synchronously.

Plane Sweep GT-Join
- Similar to link-based.
- Maintain two priority queues, one for each MVBT.
- At each step, access the leaf page with the largest end time and add its records to the buffer.
- When adding records to the buffer, join them with the buffered records from the other MVBT.
- Throw away records that can no longer produce join results.
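The sweep can be illustrated at the level of individual records rather than leaf pages. This is a simplified sketch under stated assumptions: records are `(start, end)` tuples with non-empty half-open intervals, the two priority queues hold records instead of MVBT pages, and the function name is hypothetical.

```python
import heapq
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), assumed half-open and non-empty


def plane_sweep_gt_join(xs: List[Interval], ys: List[Interval]):
    # One max-heap per input, ordered by end time (negated for heapq's min-heap).
    qx = [(-e, s) for s, e in xs]
    qy = [(-e, s) for s, e in ys]
    heapq.heapify(qx)
    heapq.heapify(qy)
    buf_x: List[Interval] = []
    buf_y: List[Interval] = []
    out = []
    while qx or qy:
        # Take the record with the largest end time across both queues.
        take_x = bool(qx) and (not qy or qx[0][0] <= qy[0][0])
        neg_end, start = heapq.heappop(qx if take_x else qy)
        rec = (start, -neg_end)
        own, other = (buf_x, buf_y) if take_x else (buf_y, buf_x)
        # Buffered records starting at or after rec's end cannot join rec or
        # anything processed later (later records end no later than rec).
        other[:] = [o for o in other if o[0] < rec[1]]
        # Every survivor ends at or after rec's end and starts before it,
        # so it intersects rec.
        for o in other:
            out.append((rec, o) if take_x else (o, rec))
        own.append(rec)
    return out
```

Because records arrive in decreasing end-time order, a single comparison on the start time decides both whether a buffered record joins and whether it can be discarded.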

GE-Join Solutions...

GE-Join Solutions...
Similarly, we have:
- unsynchronized
- synchronized using B+-trees
- synchronized using R-trees
- top-down using MVBT
- link-based using MVBT
Note: some of them, especially the link-based algorithm, are quite different because of the different join condition.

Implemented Algorithms
Common to both GT-Join and GE-Join:

Notation     Meaning
mvbt_df      Synchronized MVBT, depth-first
mvbt_bf      Synchronized MVBT, breadth-first
mvbt_link    Synchronized MVBT, link-based
r*_df        Synchronized R*-tree, depth-first
r*_bf        Synchronized R*-tree, breadth-first

Implemented Algorithms
Specific to GT-Join:

Notation     Meaning
mvbt_ps      Synchronized MVBT, plane-sweep
spj          Spatially partitioned join [LOT94]

Specific to GE-Join:

Notation     Meaning
b+           Synchronized B+-tree, index on key
mvbt_sm      Unsynchronized, sort-merge after selection

Experimental Setup
- Implemented in GNU C++.
- Sun Enterprise 250 Server with two UltraSPARC-II processors, running Solaris 2.8.
- Page size = 8KB.
- Buffer size = 10MB; LRU buffer.
- Each data set: 10 million records.
- R/I ratio: length of the query key range divided by length of the query time interval. It describes the shape of the query rectangle.
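The R/I ratio is a plain quotient, shown here as a tiny helper. The function name and parameters are hypothetical, and the sketch takes the lengths in whatever units the key space and time axis use. Any normalization that may have been applied in the experiments is not described in the slides.

```python
def ri_ratio(key_lo: float, key_hi: float, t_start: float, t_end: float) -> float:
    """Length of the query key range divided by length of the query time interval."""
    interval_len = t_end - t_start
    if interval_len <= 0:
        raise ValueError("query time interval must have positive length")
    return (key_hi - key_lo) / interval_len
```

A ratio above 1 corresponds to a query rectangle that is longer in the key dimension, and a ratio below 1 to one that is longer in time.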

GT-Join Performance
R/I ratio = 10.

GT-Join Performance
R/I ratio = 0.1.

GE-Join Performance
R/I ratio = 10.

GE-Join Performance
R/I ratio = 0.1.

Conclusions
- We addressed index-based GT-Join and GE-Join.
- Joins using traditional indices (B+-tree, R-tree) are not efficient.
- We proposed various synchronized approaches based on temporal indices (MVBT).
- Experiments:
  - for GT-Join, link-based and plane-sweep are the best;
  - for GE-Join, link-based and sort-merge are the best;
  - overall, link-based is the best: multi-fold improvement over B+-tree/R-tree joins.
