TIME 2002, Manchester, UK. Index Based Processing of Semi-Restrictive Temporal Joins. Donghui Zhang, Vassilis J. Tsotras, University of California, Riverside.

1 TIME 2002, Manchester, UK. Index Based Processing of Semi-Restrictive Temporal Joins. Donghui Zhang, Vassilis J. Tsotras, University of California, Riverside

2 Contents
- Background
- Join problem definition
- Straightforward approaches
- Proposed join algorithms
- Performance study
- Conclusions

3 Background
- Temporal record: (key, time interval) and some attributes.
- TE-Join: two records qualify for join if
  - their time intervals intersect; and
  - their keys are equal.
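The TE-Join condition above can be written as a small predicate. The following is a minimal sketch, not the authors' implementation: the `Rec` type, the function names, and the choice of half-open intervals `[start, end)` are all assumptions made for illustration.

```python
from typing import NamedTuple


class Rec(NamedTuple):
    """Hypothetical temporal record: a key plus a half-open interval [start, end)."""
    key: int
    start: int
    end: int


def intersects(a: Rec, b: Rec) -> bool:
    # Two half-open intervals intersect when each starts before the other ends.
    return a.start < b.end and b.start < a.end


def te_join(xs, ys):
    # Nested-loop reference version: both TE-Join conditions must hold.
    return [(x, y) for x in xs for y in ys
            if x.key == y.key and intersects(x, y)]
```

A nested loop is used only to make the join condition explicit; it says nothing about how an index-based algorithm would evaluate it.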

4 Background
- Our earlier work [ICDE02] solved a general TE-Join (GTE-Join), where portions from each relation are joined:
  - the portion is selected via a range-interval selection: record keys should be in range r and time intervals should intersect interval i.
  - interesting because (1) temporal relations are large; (2) TE-Join is a special case, when r and i are (-∞, +∞).
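A range-interval selection can be sketched as a filter over records. This is an illustrative sketch only: `Rec`, `range_interval_select`, and the half-open treatment of both the key range and the time interval are assumptions, not details taken from the slides.

```python
import math
from typing import NamedTuple


class Rec(NamedTuple):
    """Hypothetical temporal record with a half-open interval [start, end)."""
    key: int
    start: int
    end: int


def range_interval_select(records, key_range, interval):
    # Keep records whose key lies in key_range and whose time interval
    # intersects the query interval (both treated as half-open here).
    k_lo, k_hi = key_range
    t_lo, t_hi = interval
    return [r for r in records
            if k_lo <= r.key < k_hi and r.start < t_hi and t_lo < r.end]


# With r and i both set to (-inf, +inf) the selection keeps every record,
# which is why the plain TE-Join is a special case of the GTE-Join.
EVERYTHING = (-math.inf, math.inf)
```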

5 Problem Definition
- Semi-restrictive joins enforce only one of the two TE-Join conditions: either the keys are equal (GE-Join) or the intervals intersect (GT-Join).
- GE-Join: select a subset from X, a subset from Y, and join records from the subsets if their keys are equal.
- GT-Join: select a subset from X, a subset from Y, and join records from the subsets if their intervals intersect.
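The two definitions differ only in the join predicate applied to the selected subsets. A brute-force reference sketch follows; `Rec`, `select`, `ge_join`, `gt_join`, and the half-open interval convention are assumptions for illustration, and each selection is passed as a hypothetical `(key_range, interval)` pair.

```python
from typing import NamedTuple


class Rec(NamedTuple):
    """Hypothetical temporal record with a half-open interval [start, end)."""
    key: int
    start: int
    end: int


def select(records, key_range, interval):
    # Range-interval selection: key in range, interval intersects the query interval.
    (k_lo, k_hi), (t_lo, t_hi) = key_range, interval
    return [r for r in records
            if k_lo <= r.key < k_hi and r.start < t_hi and t_lo < r.end]


def ge_join(xs, ys, sel_x, sel_y):
    # GE-Join: selected records join when their keys are equal; time is not compared.
    return [(x, y) for x in select(xs, *sel_x) for y in select(ys, *sel_y)
            if x.key == y.key]


def gt_join(xs, ys, sel_x, sel_y):
    # GT-Join: selected records join when their intervals intersect; keys are not compared.
    return [(x, y) for x in select(xs, *sel_x) for y in select(ys, *sel_y)
            if x.start < y.end and y.start < x.end]
```

Note that a pair satisfying both conditions is still returned by either join; the other condition is simply not checked.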

6 Problem Definition
- GT-Join example: find employees whose last names start with ‘B’ and who co-worked during 1995 with the employees whose last names start with ‘S’.
- GE-Join example: find the 1998 IBM employees who were UC Riverside students in 1995.

7 GT-Join Solutions...

8 Straightforward Solutions for GT-Join
1. Unsynchronized join.
2. Synchronized join using B+-trees.
3. Synchronized join using R-trees.

9 Straightforward Solutions for GT-Join
1. Unsynchronized join: separate the selection and join phases; not efficient because:
- the intermediate result that must be stored can be large;
- selection in one relation ignores the data distribution of the other relation.
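The cost of separating the two phases can be seen in a small sketch that materializes both selections before joining. Everything here is hypothetical (`Rec`, `select`, `unsynchronized_gt_join`, half-open intervals), and the in-memory lists stand in for whatever intermediate storage a real system would use.

```python
from typing import NamedTuple


class Rec(NamedTuple):
    """Hypothetical temporal record with a half-open interval [start, end)."""
    key: int
    start: int
    end: int


def select(records, key_range, interval):
    (k_lo, k_hi), (t_lo, t_hi) = key_range, interval
    return [r for r in records
            if k_lo <= r.key < k_hi and r.start < t_hi and t_lo < r.end]


def unsynchronized_gt_join(xs, ys, sel_x, sel_y):
    # Phase 1: run each selection independently and store its full result.
    x_sel = select(xs, *sel_x)
    y_sel = select(ys, *sel_y)
    # Phase 2: join the stored intermediate results on interval intersection.
    pairs = [(x, y) for x in x_sel for y in y_sel
             if x.start < y.end and y.start < x.end]
    # Also report how many intermediate records had to be stored.
    return pairs, len(x_sel) + len(y_sel)
```

For instance, if every selected X record lives in [0, 10) and every selected Y record lives in [50, 60), the join output is empty, yet both selections are stored in full because neither phase looks at the other relation.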

10 Straightforward Solutions for GT-Join
2. Synchronized join using B+-trees.
- If clustered on start: not efficient, because a record y needs to be checked against every record whose start is before the end of y.
- Clustering on end is similar.
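The inefficiency can be illustrated with a sorted list standing in for a B+-tree clustered on start time. This is only a sketch under assumptions: `Rec`, `probe_by_start`, and half-open intervals are hypothetical, and `bisect` on a Python list merely mimics the ordered scan an index would provide.

```python
import bisect
from typing import NamedTuple


class Rec(NamedTuple):
    """Hypothetical temporal record with a half-open interval [start, end)."""
    key: int
    start: int
    end: int


def probe_by_start(recs_by_start, starts, y):
    # The ordering on start only bounds one side: every record with
    # start < y.end is a candidate, no matter how early it ended.
    hi = bisect.bisect_left(starts, y.end)
    candidates = recs_by_start[:hi]
    # Each candidate still has to be checked on its end time.
    matches = [x for x in candidates if x.end > y.start]
    return matches, len(candidates)
```

With 100 short records starting at 0, 10, ..., 990 and a probe interval [990, 1000), all 100 records are candidates but only the last one actually intersects the probe.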

11 Straightforward Solutions for GT-Join
3. Synchronized join using R-trees.
- Store each record as a two-dimensional interval in the R-tree;
- Use existing R-tree join algorithms [BKS93, HJR97];
- Modifications: (1) integrate the selection condition; (2) join index records as long as they intersect in the time dimension, ignoring the key dimension.
- However, not efficient since R-trees do not handle long intervals well.

12 Our Solutions
- Synchronized join using temporal indices.
- Multi-version B+-tree (MVBT) [BGO+96]: asymptotically optimal space, update, and query.
- We propose three synchronized, MVBT-based join algorithms (they apply to other temporal indices as well).

13 Review of MVBT
- A “forest” of trees: different trees may overlap.
- Root nodes correspond to contiguous, non-intersecting time intervals.
- A record may be stored in multiple pages.
- Efficient range-interval selection algorithms.

14 Top-down GT-Join
- Idea: for each pair of trees, one from each MVBT forest, perform a synchronized tree traversal (STT).
- STT for two trees:
  - initially, join root nodes;
  - to join two nodes, join their children;
  - eventually, join elements in leaf pages.
- Note that special care is needed to avoid duplicates, since a record has multiple copies.
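The traversal steps above can be sketched on a toy tree in which every node carries the time span of its subtree. This is not an MVBT and not the authors' code: `Rec`, `Node`, `overlaps`, and `stt` are hypothetical, intervals are assumed half-open, and duplicates are suppressed with a plain set, which is a simplification swapped in for whatever duplicate-avoidance scheme the actual algorithm uses.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class Rec(NamedTuple):
    """Hypothetical temporal record with a half-open interval [start, end)."""
    key: int
    start: int
    end: int


@dataclass
class Node:
    """Toy index node: time span of its subtree, plus children or leaf records."""
    start: int
    end: int
    children: list = field(default_factory=list)
    records: list = field(default_factory=list)


def overlaps(a, b):
    # Works for both Node and Rec, since each exposes start and end.
    return a.start < b.end and b.start < a.end


def stt(a, b, out):
    """Synchronized traversal: collect intersecting record pairs into the set `out`."""
    if not overlaps(a, b):
        return  # prune: nothing below these two nodes can intersect in time
    if not a.children and not b.children:
        # Both are leaf pages: join their records. The set drops pairs that
        # are reached more than once through duplicate record copies.
        out.update((x, y) for x in a.records for y in b.records if overlaps(x, y))
    elif not a.children:
        for cb in b.children:
            stt(a, cb, out)
    elif not b.children:
        for ca in a.children:
            stt(ca, b, out)
    else:
        for ca in a.children:
            for cb in b.children:
                stt(ca, cb, out)
```

Calling `stt(root_x, root_y, out)` with an empty set `out` starts at the two roots and descends only into pairs of nodes whose time spans intersect.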

15 Link-based GT-Join
- In each leaf page, store a pointer to its predecessor.
- For GT-Join:
  - find pairs of data pages that (1) intersect the right border of the query rectangle; and (2) intersect each other in the time dimension;
  - keep such pairs in a priority queue;
  - sweep left synchronously.

16 Plane Sweep GT-Join
- Similar to link-based.
- Maintain two priority queues, one for each MVBT.
- At each step, access the leaf page with the largest end time and add records to buffer.
- To add records to buffer, join with existing records from the other MVBT.
- Throw away useless records.
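The sweep can be sketched at the level of individual records rather than MVBT leaf pages. This is a simplified illustration under stated assumptions: `Rec` and `plane_sweep_gt_join` are hypothetical names, intervals are half-open and non-empty (start < end), each record appears once, and the queues hold records directly instead of pages.

```python
import heapq
from typing import NamedTuple


class Rec(NamedTuple):
    """Hypothetical temporal record with a half-open interval [start, end)."""
    key: int
    start: int
    end: int


def plane_sweep_gt_join(xs, ys):
    # One max-priority queue per input, ordered by end time (negated for heapq).
    # The index i breaks ties so that records themselves are never compared.
    qx = [(-r.end, i, r) for i, r in enumerate(xs)]
    qy = [(-r.end, i, r) for i, r in enumerate(ys)]
    heapq.heapify(qx)
    heapq.heapify(qy)
    buf_x, buf_y, out = [], [], []
    while qx or qy:
        # Next record is the one with the largest end time across both queues.
        take_x = bool(qx) and (not qy or qx[0][0] <= qy[0][0])
        if take_x:
            _, _, r = heapq.heappop(qx)
            # Buffered records starting at or after r.end cannot intersect r
            # or anything that follows it in the sweep, so they are dropped.
            buf_y[:] = [y for y in buf_y if y.start < r.end]
            out.extend((r, y) for y in buf_y)  # all survivors intersect r
            buf_x.append(r)
        else:
            _, _, r = heapq.heappop(qy)
            buf_x[:] = [x for x in buf_x if x.start < r.end]
            out.extend((x, r) for x in buf_x)
            buf_y.append(r)
    return out
```

Because records arrive in decreasing end time, every buffered record ends no earlier than the incoming one, so a single comparison on start time is enough both to discard useless records and to confirm a match.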

17 GE-Join Solutions...

18 GE-Join Solutions...
Similarly, we have:
- unsynchronized
- synchronized using B+-trees
- synchronized using R-trees
- top-down using MVBT
- link-based using MVBT
Note: some of them, especially the link-based algorithm, are quite different due to the different join condition.

19 Implemented Algorithms
Common to both GT-Join and GE-Join:

Notation     Meaning
mvbt_df      Synchronized MVBT, depth-first
mvbt_bf      Synchronized MVBT, breadth-first
mvbt_link    Synchronized MVBT, link-based
r*_df        Synchronized R*-tree, depth-first
r*_bf        Synchronized R*-tree, breadth-first

20 Implemented Algorithms
Specific to GT-Join:

Notation     Meaning
mvbt_ps      Synchronized MVBT, plane-sweep
spj          Spatially partitioned join [LOT94]

Specific to GE-Join:

Notation     Meaning
b+           Synchronized B+-tree, index on key
mvbt_sm      Unsynchronized, sort-merge after selection
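As an illustration of the "sort-merge after selection" idea behind mvbt_sm, the sketch below merges two already-selected record lists on key. It is an assumption-laden sketch, not the implemented algorithm: `Rec` and `sort_merge_ge_join` are hypothetical, and the inputs are taken to be the outputs of the selection phase.

```python
from typing import NamedTuple


class Rec(NamedTuple):
    """Hypothetical temporal record with a half-open interval [start, end)."""
    key: int
    start: int
    end: int


def sort_merge_ge_join(x_sel, y_sel):
    # Sort both selected subsets on key, then merge them in one pass.
    xs = sorted(x_sel, key=lambda r: r.key)
    ys = sorted(y_sel, key=lambda r: r.key)
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if xs[i].key < ys[j].key:
            i += 1
        elif xs[i].key > ys[j].key:
            j += 1
        else:
            # Find the run of records sharing this key on each side and
            # emit their cross product; time intervals are not compared.
            k, i_end, j_end = xs[i].key, i, j
            while i_end < len(xs) and xs[i_end].key == k:
                i_end += 1
            while j_end < len(ys) and ys[j_end].key == k:
                j_end += 1
            out.extend((x, y) for x in xs[i:i_end] for y in ys[j:j_end])
            i, j = i_end, j_end
    return out
```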

21 Experimental Setup
- Implemented in GNU C++.
- Sun Enterprise 250 Server with two UltraSPARC-II processors, running Solaris 2.8.
- Page size = 8KB.
- Buffer size = 10MB; LRU buffer.
- Each data set: 10 million records.
- R/I ratio: length of the query key range divided by length of the query time interval. It describes the shape of the query rectangle.
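The R/I ratio is a simple quotient, sketched below. The function name `ri_ratio` and the example numbers are hypothetical, and the slides do not say whether the two lengths are normalized to the key and time domains, so raw lengths are assumed here.

```python
def ri_ratio(key_range, interval):
    # Length of the query key range divided by length of the query time interval.
    (k_lo, k_hi), (t_lo, t_hi) = key_range, interval
    return (k_hi - k_lo) / (t_hi - t_lo)


# A query rectangle that is long in the key dimension and short in time:
wide_in_key = ri_ratio((0, 100), (0, 10))    # 10.0
# The opposite shape, short in key and long in time:
wide_in_time = ri_ratio((0, 10), (0, 100))   # 0.1
```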

22 GT-Join Performance (R/I ratio = 10)

23 GT-Join Performance (R/I ratio = 0.1)

24 GE-Join Performance (R/I ratio = 10)

25 GE-Join Performance (R/I ratio = 0.1)

26 Conclusions
- We addressed index-based GT-Join and GE-Join.
- Joins using traditional indices (B+-tree, R-tree) are not efficient.
- We proposed various synchronized approaches based on temporal indices (MVBT).
- Experiments:
  - for GT-Join, link-based and plane-sweep are the best;
  - for GE-Join, link-based and sort-merge are the best;
  - overall, link-based is the best: multi-fold improvement over B+-tree/R-tree joins.
