
ICDE 2002, San Jose, CA Efficient Temporal Join Processing using Indices Donghui Zhang University of California, Riverside Vassilis J. Tsotras University.


1 Efficient Temporal Join Processing using Indices
ICDE 2002, San Jose, CA
Donghui Zhang, University of California, Riverside
Vassilis J. Tsotras, University of California, Riverside
Bernhard Seeger, University of Marburg, Germany

2 Contents
- Problem definition: GTE-Join
- Straightforward approaches
- Temporal indexing
- Proposed join algorithms
- Performance study
- Conclusions

3 Problem Definition
- Temporal record: (key, start, end, attributes)
- TE-Join: two records qualify for the join if
  - their time intervals intersect; and
  - their keys are equal.
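The TE-Join condition can be written as a predicate. The Python sketch below is a minimal illustration, assuming half-open intervals [start, end) (the slides do not specify boundary semantics); `TemporalRecord`, `te_join_qualifies`, and the department data are hypothetical. A plain nested loop stands in for a real join algorithm.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class TemporalRecord:
    # Temporal record (key, start, end, attributes).
    # Assumption: the interval is half-open, [start, end).
    key: Any
    start: float
    end: float
    attrs: Tuple = ()


def intervals_intersect(s1: float, e1: float, s2: float, e2: float) -> bool:
    # Two half-open intervals share an instant iff each starts before the other ends.
    return s1 < e2 and s2 < e1


def te_join_qualifies(x: TemporalRecord, y: TemporalRecord) -> bool:
    # TE-Join: equal keys AND intersecting time intervals.
    return x.key == y.key and intervals_intersect(x.start, x.end, y.start, y.end)


def nested_loop_te_join(r: List[TemporalRecord], s: List[TemporalRecord]):
    # Baseline only: compares every pair of records.
    return [(x, y) for x in r for y in s if te_join_qualifies(x, y)]


# Hypothetical data: department locations and managers over time.
locations = [TemporalRecord("D1", 0, 10, ("Riverside",))]
managers = [
    TemporalRecord("D1", 5, 20, ("Alice",)),  # overlaps [0, 10) during [5, 10)
    TemporalRecord("D1", 10, 15, ("Bob",)),   # starts exactly when [0, 10) ends
]
pairs = nested_loop_te_join(locations, managers)
print([(x.attrs[0], y.attrs[0]) for x, y in pairs])  # [('Riverside', 'Alice')]
```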

4 TE-Join: “find the locations and managers of all departments over time”.

5 Problem Definition
- GTE-Join: a general TE-Join in which record keys must lie in a given range r and time intervals must intersect a given interval i.
- Interesting because:
  - temporal relations are large;
  - TE-Join is a special case, when r and i are (-∞, +∞).
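A sketch of the GTE-Join condition, extending the TE-Join predicate with a key range r and a query interval i. It assumes numeric keys, an inclusive key range, and half-open time intervals; `gte_join_qualifies` and the sample values are hypothetical. Setting both r and i to (-∞, +∞) reduces it to the plain TE-Join test.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rec:
    key: float
    start: float  # assumption: half-open interval [start, end)
    end: float


def overlaps(s1: float, e1: float, s2: float, e2: float) -> bool:
    return s1 < e2 and s2 < e1


def te_join_qualifies(x: Rec, y: Rec) -> bool:
    return x.key == y.key and overlaps(x.start, x.end, y.start, y.end)


def gte_join_qualifies(x: Rec, y: Rec, r: Tuple[float, float], i: Tuple[float, float]) -> bool:
    key_lo, key_hi = r  # inclusive key range
    t_lo, t_hi = i      # query interval, half-open
    return (
        te_join_qualifies(x, y)
        and key_lo <= x.key <= key_hi  # y.key == x.key, so one check suffices
        and overlaps(x.start, x.end, t_lo, t_hi)
        and overlaps(y.start, y.end, t_lo, t_hi)
    )


EVERYTHING = (-math.inf, math.inf)
x, y = Rec(7, 0, 6), Rec(7, 4, 12)
print(gte_join_qualifies(x, y, (5, 10), (5, 10)))  # True: both intervals reach into [5, 10)
print(gte_join_qualifies(x, y, (5, 10), (6, 10)))  # False: x ends at 6
print(gte_join_qualifies(x, y, EVERYTHING, EVERYTHING) == te_join_qualifies(x, y))  # True
```

Checking each record against i separately is enough: one-dimensional intervals that pairwise intersect share a common point, so the overlap of x and y also meets i.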

6 GTE-Join: “find the locations and managers of departments in key range [D1, D2] during time [5, 10]”.

7 Straightforward Solutions
- Non-indexed join;
- Unsynchronized join;
- Synchronized join using B+-trees;
- Synchronized join using R-trees.

8 Straightforward Solutions
1. Non-indexed join: existing TE-Join research [Zur97] focuses on non-indexed joins, which are not efficient for GTE-Join because they require a full scan.
2. Unsynchronized join: separates the selection and join phases; not efficient because:
  - the intermediate results must be stored;
  - the selection in one relation ignores the data distribution of the other relation.
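To make the two phases of the unsynchronized join concrete, here is a minimal Python sketch: each relation is first filtered by the query rectangle into a materialized intermediate list, and only then are the two lists sort-merge joined on key. Plain list scans stand in for index-based selections; all names (`select`, `sort_merge_join`, `unsynchronized_gte_join`) are hypothetical, and records are assumed to be (key, start, end) tuples with half-open intervals.

```python
from typing import List, Tuple

Record = Tuple[int, int, int]  # (key, start, end), interval [start, end)


def select(rel: List[Record], r: Tuple[int, int], i: Tuple[int, int]) -> List[Record]:
    # Phase 1: selection. It looks at one relation only, ignoring the
    # data distribution of the other relation.
    (k_lo, k_hi), (t_lo, t_hi) = r, i
    return [rec for rec in rel if k_lo <= rec[0] <= k_hi and rec[1] < t_hi and t_lo < rec[2]]


def sort_merge_join(a: List[Record], b: List[Record]) -> List[Tuple[Record, Record]]:
    # Phase 2: sort both intermediate results on key, then merge equal-key groups.
    a, b = sorted(a), sorted(b)
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][0] < b[j][0]:
            i += 1
        elif a[i][0] > b[j][0]:
            j += 1
        else:
            key = a[i][0]
            i_end, j_end = i, j
            while i_end < len(a) and a[i_end][0] == key:
                i_end += 1
            while j_end < len(b) and b[j_end][0] == key:
                j_end += 1
            for x in a[i:i_end]:
                for y in b[j:j_end]:
                    if x[1] < y[2] and y[1] < x[2]:  # intervals intersect
                        out.append((x, y))
            i, j = i_end, j_end
    return out


def unsynchronized_gte_join(rel_a, rel_b, r, i):
    sel_a = select(rel_a, r, i)  # intermediate results that must be stored
    sel_b = select(rel_b, r, i)
    return sort_merge_join(sel_a, sel_b)


rel_a = [(1, 0, 10), (2, 3, 8), (5, 0, 100)]
rel_b = [(1, 5, 15), (2, 8, 9), (5, 50, 60)]
print(unsynchronized_gte_join(rel_a, rel_b, (1, 4), (0, 20)))  # [((1, 0, 10), (1, 5, 15))]
```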

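9 Straightforward Solutions
3. Synchronized join using B+-trees: not efficient.
- If clustered on start: a record whose interval intersects i may start long before i does, so the scan cannot be limited to start times inside i.
- Clustering on end is similar.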
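The cost of clustering on start time shows up even in a toy setting. This Python sketch models a start-clustered index as a list sorted by start and uses `bisect` to find where the scan may stop; because any record starting before the end of i might still be alive inside i, the scan must begin at the very first entry. The function name and data are hypothetical.

```python
import bisect
from typing import List, Tuple

Record = Tuple[int, int, int]  # (key, start, end), interval [start, end)


def start_clustered_scan(by_start: List[Record], t_lo: int, t_hi: int):
    starts = [rec[1] for rec in by_start]
    # Only records with start >= t_hi can be skipped; everything before that
    # point must be read, because an early start says nothing about the end.
    stop = bisect.bisect_left(starts, t_hi)
    scanned = by_start[:stop]
    hits = [rec for rec in scanned if rec[2] > t_lo]  # still alive after t_lo
    return len(scanned), hits


# 1,000 short records plus one long record that starts at time 0.
records = [(t % 10, t, t + 1) for t in range(1000)] + [(0, 0, 1000)]
records.sort(key=lambda rec: rec[1])
scanned, hits = start_clustered_scan(records, 900, 905)
print(scanned, len(hits))  # 906 6
```

Only 6 of the 906 records read qualify; the single long record is what prevents skipping the prefix.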

10 Straightforward Solutions
3. Synchronized join using B+-trees (continued)
- If clustered on key:
  - records with keys in r are stored together and are sorted;
  - focus on these records in each relation and sort-merge join them, skipping those whose intervals do not intersect i;
  - however, this is not efficient, since the records inside the query rectangle are scattered among those whose intervals miss i.
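With clustering on key, the records for the key range r are contiguous, so each relation can be range-scanned and the two scans merged. This Python sketch assumes relations are lists sorted by (key, start) with half-open intervals, and counts how many records are read versus how many survive the time filter; all names are hypothetical.

```python
import bisect
from typing import List, Tuple

Record = Tuple[int, int, int]  # (key, start, end), interval [start, end)


def key_range_slice(rel: List[Record], k_lo: int, k_hi: int) -> List[Record]:
    # Clustered on key: all records with k_lo <= key <= k_hi are contiguous.
    keys = [rec[0] for rec in rel]
    return rel[bisect.bisect_left(keys, k_lo):bisect.bisect_right(keys, k_hi)]


def key_clustered_gte_join(rel_a, rel_b, r, i):
    (k_lo, k_hi), (t_lo, t_hi) = r, i
    a = key_range_slice(rel_a, k_lo, k_hi)
    b = key_range_slice(rel_b, k_lo, k_hi)
    read = len(a) + len(b)  # every record in the key range is read...
    # ...but those whose intervals miss i are skipped.
    a = [x for x in a if x[1] < t_hi and t_lo < x[2]]
    b = [y for y in b if y[1] < t_hi and t_lo < y[2]]
    out, j0 = [], 0
    for x in a:  # merge: both lists are still sorted by key
        while j0 < len(b) and b[j0][0] < x[0]:
            j0 += 1
        j = j0
        while j < len(b) and b[j][0] == x[0]:
            if x[1] < b[j][2] and b[j][1] < x[2]:
                out.append((x, b[j]))
            j += 1
    return read, out


rel_a = [(k, t, t + 1) for k in range(5) for t in range(100)]
rel_b = [(k, t, t + 2) for k in range(5) for t in range(0, 100, 10)]
read, pairs = key_clustered_gte_join(rel_a, rel_b, (1, 2), (50, 52))
print(read, len(pairs))  # 220 4
```

Here 220 records are read to produce 4 result pairs, because the qualifying records are spread across the whole key range.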

11 Straightforward Solutions
4. Synchronized join using R-trees
- Store each record as a two-dimensional interval in the R-tree;
- Use existing R-tree join algorithms [BKS93, HJR97];
- Modification: integrate the selection with respect to the query rectangle.
- However, this is not efficient, since R-trees do not handle long intervals well.

12 Our Solutions
- Synchronized join using temporal indices.
- Multi-version B+-tree (MVBT) [BGO+96]: asymptotically optimal space, update, and query cost.
- We propose two categories of synchronized, MVBT-based join algorithms (they apply to other temporal indices as well).

13–20 Review of MVBT
- Suppose a page holds up to 3 records. (Slides 13–20 consist mainly of figures, which are not reproduced in this transcript.)

21 Review of MVBT
- A “forest”: different trees may overlap;
- Root nodes correspond to contiguous, non-intersecting time intervals;
- A record may be stored in multiple pages; the end time of all but the last copy is +∞.
- Range-interval selection algorithms [BS96] avoid duplicates by reporting the first copy.

22 The Incorrect End Time Problem
- [BS96] reports the first copy of x, whose end time is +∞; this would lead GTE-Join algorithms to (incorrectly) join x with y.
- Solution: report the rightmost copy!
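The incorrect end time problem can be reproduced with a toy model of record copies. In this Python sketch (hypothetical names; half-open intervals assumed), record x = [2, 7) lives in three consecutive pages, and only the copy in the last page carries the true end time, while the others carry +∞, as in the MVBT review. Joining through the first copy yields a false match with y = [8, 9); the rightmost copy does not.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Copy:
    key: str
    start: float
    end: float         # +inf in every copy except the last one
    page_start: float  # start of the time span of the page holding this copy


def first_copy(copies: List[Copy]) -> Copy:
    return min(copies, key=lambda c: c.page_start)


def rightmost_copy(copies: List[Copy]) -> Copy:
    # The copy in the latest page is the only one with a correct end time.
    return max(copies, key=lambda c: c.page_start)


def joins(c: Copy, y: Tuple[str, float, float]) -> bool:
    return c.key == y[0] and c.start < y[2] and y[1] < c.end


# Hypothetical: x = ("k", [2, 7)) copied into pages starting at times 0, 4 and 6.
x_copies = [
    Copy("k", 2, math.inf, 0),
    Copy("k", 2, math.inf, 4),
    Copy("k", 2, 7, 6),
]
y = ("k", 8, 9)
print(joins(first_copy(x_copies), y))      # True  (wrong: x ends at 7)
print(joins(rightmost_copy(x_copies), y))  # False (correct)
```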

23 Top-down Approaches
- Idea: for each pair of trees, one from each MVBT forest, perform a synchronized tree traversal (STT).
- STT for two trees:
  - initially, join the root nodes;
  - to join two nodes, join their children;
  - eventually, join the elements in the leaf pages.
- What is the join condition?
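The synchronized traversal can be sketched recursively. This Python version is a minimal illustration in the style of R-tree joins, not the MVBT algorithm itself: it assumes every node has a bounding rectangle in (key, time) space, that each record is stored in exactly one leaf, and it uses rectangle intersection (with each other and with the query rectangle) as the join condition, which is exactly the choice the open question about the join condition leaves unresolved for MVBT. `Node`, `stt`, and the sample trees are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (key_lo, key_hi, t_lo, t_hi); time half-open
Record = Tuple[float, float, float]       # (key, start, end)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[1] and b[0] <= a[1] and a[2] < b[3] and b[2] < a[3]


def as_rect(rec: Record) -> Rect:
    return (rec[0], rec[0], rec[1], rec[2])


@dataclass
class Node:
    rect: Rect
    children: List["Node"] = field(default_factory=list)  # empty for leaves
    records: List[Record] = field(default_factory=list)   # used by leaves only


def stt(a: Node, b: Node, q: Rect, out: list) -> None:
    # Assumed join condition: both regions meet each other and the query rectangle.
    if not (intersects(a.rect, b.rect) and intersects(a.rect, q) and intersects(b.rect, q)):
        return
    if not a.children and not b.children:
        # Eventually, join the elements of two leaf pages.
        for x in a.records:
            for y in b.records:
                rx, ry = as_rect(x), as_rect(y)
                if x[0] == y[0] and intersects(rx, ry) and intersects(rx, q) and intersects(ry, q):
                    out.append((x, y))
        return
    # To join two nodes, join their children; a leaf is paired as-is if heights differ.
    for ca in a.children or [a]:
        for cb in b.children or [b]:
            stt(ca, cb, q, out)


leaf_a1 = Node((0, 5, 0, 50), records=[(1, 0, 10), (3, 20, 30)])
leaf_a2 = Node((6, 10, 50, 100), records=[(7, 60, 70)])
root_a = Node((0, 10, 0, 100), children=[leaf_a1, leaf_a2])
root_b = Node((0, 10, 0, 100), children=[
    Node((0, 10, 0, 100), records=[(1, 5, 15), (7, 65, 90), (3, 40, 45)])])
out = []
stt(root_a, root_b, (0, 10, 0, 100), out)
print(out)  # [((1, 0, 10), (1, 5, 15)), ((7, 60, 70), (7, 65, 90))]
```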

24 Balancing Condition Optimization (BCO)
- To find the result pair shown in the figure, page 3 and page 0 have to be joined;
- In general, this means joining two pages even though they do not intersect. Inefficient!
- BCO balances two conditions: (1) only intersecting pages are joined; (2) records are examined even if they are not the last copy. E.g., the pair is found when joining page 2 with page 0.

25 Virtual Height Optimization (VHO)
- At the middle level, STT joins the node pairs shown in the figure (…, A1’);
- With VHO: the node pairs shown in the figure.

26–27 Sideways Approach 1: Link-based
- In each leaf page, store a pointer to its predecessor;
- For GTE-Join:
  - find pairs of data pages that intersect the right border of the query rectangle and each other;
  - keep such pairs in a priority queue;
  - sweep left synchronously.
- Special techniques are used to avoid duplicates.
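A simplified Python sketch of the leftward sweep along predecessor links, under strong assumptions not taken from the slides: each relation's leaf pages for the queried key range form a single chain whose time spans partition the time axis; every record is copied into each page during whose span it is alive, with its correct end time; and duplicates are suppressed by a reference-point test (a pair is reported only in the page pair containing max(x.start, y.start, t_lo)), which is one possible technique and not necessarily the authors'. With one chain per relation, the priority queue degenerates to comparing the start times of the two current pages. `Page`, `build_chain`, and `link_based_join` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Record = Tuple[int, int, int]  # (key, start, end), interval [start, end)


@dataclass
class Page:
    start: int
    end: int                # the page covers the time span [start, end)
    records: List[Record]
    pred: Optional["Page"]  # pointer to the predecessor leaf page


def build_chain(bounds: List[int], records: List[Record]) -> Page:
    # Hypothetical layout: page n covers [bounds[n], bounds[n + 1]) and holds a
    # copy of every record alive during that span. Returns the rightmost page.
    page = None
    for lo, hi in zip(bounds, bounds[1:]):
        page = Page(lo, hi, [rec for rec in records if rec[1] < hi and lo < rec[2]], page)
    return page


def page_at_border(page: Optional[Page], t: int) -> Optional[Page]:
    # Follow predecessor links to the page holding the instant just before t.
    while page is not None and page.start >= t:
        page = page.pred
    return page


def in_query(rec: Record, k_lo: int, k_hi: int, t_lo: int, t_hi: int) -> bool:
    return k_lo <= rec[0] <= k_hi and rec[1] < t_hi and t_lo < rec[2]


def link_based_join(tail_a: Page, tail_b: Page, r, i):
    (k_lo, k_hi), (t_lo, t_hi) = r, i
    # Start from the pages that intersect the right border of the query rectangle.
    pa, pb = page_at_border(tail_a, t_hi), page_at_border(tail_b, t_hi)
    out = []
    while pa is not None and pb is not None and min(pa.end, pb.end) > t_lo:
        for x in pa.records:
            for y in pb.records:
                if not (x[0] == y[0] and x[1] < y[2] and y[1] < x[2]):
                    continue
                if not (in_query(x, k_lo, k_hi, t_lo, t_hi) and in_query(y, k_lo, k_hi, t_lo, t_hi)):
                    continue
                ref = max(x[1], y[1], t_lo)  # reference point: report each pair once
                if pa.start <= ref < pa.end and pb.start <= ref < pb.end:
                    out.append((x, y))
        # Sweep left synchronously: step back the page that starts later (both on a tie).
        sa, sb = pa.start, pb.start
        if sa >= sb:
            pa = pa.pred
        if sb >= sa:
            pb = pb.pred
    return out


recs_a = [(1, 0, 10), (1, 12, 30), (2, 5, 25)]
recs_b = [(1, 5, 15), (2, 20, 22), (1, 35, 38)]
tail_a = build_chain([0, 8, 16, 40], recs_a)
tail_b = build_chain([0, 20, 40], recs_b)
print(sorted(link_based_join(tail_a, tail_b, (0, 5), (0, 40))))
# [((1, 0, 10), (1, 5, 15)), ((1, 12, 30), (1, 5, 15)), ((2, 5, 25), (2, 20, 22))]
```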

28 Sideways Approach 2: Plane Sweep
- Similar to the link-based approach;
- Maintain two priority queues, one for each MVBT;
- At each step, access the leaf page with the largest end time and add its records to the buffer;
- When adding records to the buffer, join them with the buffered records from the other MVBT;
- Throw away useless records.
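A Python sketch of the plane sweep over two queues of leaf pages, under simplifying assumptions not stated on the slide: each record is stored in exactly one page, no record ends after its page's end time, and pages are given as (end_time, records) pairs rather than read from an MVBT. Records are (key, start, end) tuples with half-open intervals; `plane_sweep_join` is a hypothetical name. Since pages are visited in decreasing end time, a buffered record that starts at or after the current page's end can never meet a record from a later page, which is the sense in which it is useless.

```python
import heapq
from typing import List, Tuple

Record = Tuple[int, int, int]        # (key, start, end), interval [start, end)
LeafPage = Tuple[int, List[Record]]  # (page end time, records)


def plane_sweep_join(pages_a: List[LeafPage], pages_b: List[LeafPage], r, i):
    (k_lo, k_hi), (t_lo, t_hi) = r, i
    # One max-priority queue (by page end time) per relation; heapq is a min-heap,
    # so end times are negated. The index n breaks ties without comparing lists.
    queues = [[(-end, n, recs) for n, (end, recs) in enumerate(pages)]
              for pages in (pages_a, pages_b)]
    for q in queues:
        heapq.heapify(q)
    buffers: Tuple[List[Record], List[Record]] = ([], [])
    out = []
    while queues[0] or queues[1]:
        # Access the leaf page with the largest end time across both queues.
        side = 0 if queues[0] and (not queues[1] or queues[0][0][0] <= queues[1][0][0]) else 1
        neg_end, _, recs = heapq.heappop(queues[side])
        sweep = -neg_end
        if sweep <= t_lo:
            break  # every remaining page ends before the query interval starts
        # Throw away useless records: all later records end at or before `sweep`.
        for buf in buffers:
            buf[:] = [rec for rec in buf if rec[1] < sweep]
        for cur in recs:
            if not (k_lo <= cur[0] <= k_hi and cur[1] < t_hi and t_lo < cur[2]):
                continue  # outside the query rectangle
            # Join with the buffered records of the other relation.
            for other in buffers[1 - side]:
                if other[0] == cur[0] and other[1] < cur[2] and cur[1] < other[2]:
                    out.append((cur, other) if side == 0 else (other, cur))
            buffers[side].append(cur)
    return out


pages_a = [(10, [(1, 0, 10), (2, 5, 8)]), (30, [(1, 12, 30)])]
pages_b = [(15, [(1, 5, 15)]), (22, [(2, 20, 22)]), (40, [(1, 35, 38)])]
print(sorted(plane_sweep_join(pages_a, pages_b, (0, 5), (0, 40))))
# [((1, 0, 10), (1, 5, 15)), ((1, 12, 30), (1, 5, 15))]
```

Each pair is reported exactly once, when the second of its two pages is accessed; the eviction rule never drops a record that could still intersect a later one.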

29 Performance Study
Notation:
- mvbt_df: Synchronized MVBT, depth-first
- mvbt_bf: Synchronized MVBT, breadth-first
- mvbt_link: Synchronized MVBT, link-based
- mvbt_ps: Synchronized MVBT, plane-sweep
- mvbt_sm: Unsynchronized, sort-merge after selection
- b+: Synchronized B+-tree, index on key
- r*_df: Synchronized R*-tree, depth-first
- r*_bf: Synchronized R*-tree, breadth-first

30 Experimental Setup
- Implemented in GNU C++;
- Sun Enterprise 250 Server with two UltraSPARC-II processors, running Solaris 2.8;
- Page size = 8KB; buffer size = 10MB; LRU buffer;
- Each data set: 10 million records;
- QRS: size ratio between the query rectangle and the whole space;
- Long intervals: 1/100 of the time space; short intervals: 1/10,000 of the time space.

31 GTE-Join Performance
- Joining mainly long intervals.

32 GTE-Join Performance
- Joining mainly short intervals.

33 GTE-Join Performance
- Varying QRS (log scale).

34 Conclusions
- We addressed the GTE-Join;
- The unsynchronized approach is not efficient;
- Synchronized approaches based on traditional indices (B+-tree, R-tree) are also not efficient;
- We proposed synchronized approaches based on temporal indices (MVBT);
- We also proposed the BCO and VHO optimizations;
- Experiments: the link-based approach performs best.
