ICDE-2006 Subramanian Arumugam Christopher Jermaine Department of Computer Science University of Florida 22nd International Conference on Data Engineering.


1 Closest-Point-of-Approach Join for Moving Object Histories
Subramanian Arumugam, Christopher Jermaine
Department of Computer Science, University of Florida
22nd International Conference on Data Engineering (ICDE-2006)

2 CPA-Join Is Useful For Analysis Of Spatiotemporal Data
Commercial airliners R, single engine planes S
"Find all commercial airliners that approached within 1000 vertical feet and 0.5 miles of a single engine plane in the BOS/JFK/EWR/LGA corridor C in the first three months of last year"
SELECT DISTINCT (r, s)
FROM R AS r, S AS s, TIME t
WHERE dist(r, s, t) < 0.5
  AND (r(t).altd - s(t).altd) ≥ -1000
  AND (r(t).altd - s(t).altd) ≤ 1000
  AND s(t) ∈ C AND r(t) ∈ C
  AND t ≥ 'JAN-1-2005' AND t ≤ 'MAR-31-2005'

3 Challenges
- 3-dimensional space + time
- Large # of objects
- Massive amount of data

4 CPA Illustration for Straight Line Trajectories
CPA: the position at which two dynamically moving objects attain their closest possible distance
[Figure: Object p and Object q]
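The definition above can be made concrete for two objects that each move in a straight line at constant velocity. The following is a minimal sketch under that assumption; the function name `cpa` and its parameters are illustrative and do not come from the slides.

```python
import math

def cpa(p0, vp, q0, vq, t0, t1):
    """Return (time, distance) of closest approach on the interval [t0, t1].

    p0, q0 are positions at time t0; vp, vq are constant velocities.
    All four are coordinate tuples of equal length (2-D or 3-D).
    """
    w = [a - b for a, b in zip(p0, q0)]    # relative position at t0
    dv = [a - b for a, b in zip(vp, vq)]   # relative velocity
    dv2 = sum(c * c for c in dv)
    if dv2 == 0.0:
        t = t0                             # same velocity: the distance never changes
    else:
        # Unconstrained minimiser of |w + (t - t0) * dv|, clamped to [t0, t1].
        t = t0 - sum(a * b for a, b in zip(w, dv)) / dv2
        t = min(max(t, t0), t1)
    dist = math.sqrt(sum((a + (t - t0) * b) ** 2 for a, b in zip(w, dv)))
    return t, dist
```

Clamping matters: if the unconstrained minimum falls outside the interval, the closest approach within the interval is at one of its endpoints.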

5 Moving Object Trajectories
[Figure, two panels plotted in x, y and time: "Sampled Positions" of Object P and Object Q at times 0 to 5, and their "Polyline approximation" with the CPA distance marked]
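The polyline approximation can be sketched as follows: consecutive sampled positions of one object are connected into line segments, each with its own time interval. The tuple layout and the name `to_segments` are assumptions made for illustration.

```python
def to_segments(obj_id, samples):
    """Turn time-ordered samples (t, x, y, ...) into polyline segments.

    Each segment is (obj_id, t_start, t_end, start_position, end_position).
    """
    return [
        (obj_id, a[0], b[0], tuple(a[1:]), tuple(b[1:]))
        for a, b in zip(samples, samples[1:])
    ]
```

An object sampled n times yields n - 1 segments, and the object is assumed to move linearly between samples.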

6 Simple CPA-Join Procedure
Find all object pairs (p ∈ P, q ∈ Q) from relations P and Q such that CPA-distance(p, q) ≤ d
CPA (Object P, Object Q, distance d)
  1. List result = {};
  2. for each pair of segments (p ∈ P, q ∈ Q)
  3.   if CPA_distance(p, q) ≤ d
  4.     result += (p, q);
  5. return result;
- Need to compare only those segments whose time intervals overlap
- Plane sweep
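The procedure above might look like this in runnable form. This is a sketch, not the authors' implementation: the `Segment` layout, the helper names, and the choice to return object-id pairs are assumptions, and each segment is assumed to have `t1 > t0` with linear motion in between.

```python
import math
from typing import NamedTuple, Tuple

class Segment(NamedTuple):
    obj: str                   # id of the object the segment belongs to
    t0: float                  # start time
    t1: float                  # end time (assumed > t0)
    p0: Tuple[float, ...]      # position at t0
    p1: Tuple[float, ...]      # position at t1

def pos(s, t):
    """Linearly interpolated position of segment s at time t."""
    f = (t - s.t0) / (s.t1 - s.t0)
    return tuple(a + f * (b - a) for a, b in zip(s.p0, s.p1))

def cpa_distance(p, q):
    """Smallest distance between p and q over their common time interval."""
    lo, hi = max(p.t0, q.t0), min(p.t1, q.t1)
    if lo > hi:
        return math.inf        # the time intervals do not overlap
    w0 = [a - b for a, b in zip(pos(p, lo), pos(q, lo))]   # offset at lo
    w1 = [a - b for a, b in zip(pos(p, hi), pos(q, hi))]   # offset at hi
    d = [b - a for a, b in zip(w0, w1)]
    dd = sum(c * c for c in d)
    f = 0.0 if dd == 0 else min(1.0, max(0.0, -sum(a * b for a, b in zip(w0, d)) / dd))
    return math.sqrt(sum((a + f * b) ** 2 for a, b in zip(w0, d)))

def cpa_join(P, Q, d):
    """All-pairs comparison, following the numbered steps on the slide."""
    result = set()
    for p in P:
        for q in Q:
            if cpa_distance(p, q) <= d:
                result.add((p.obj, q.obj))
    return result
```

The nested loop performs |P| x |Q| distance computations, which is what the time-overlap observation and the plane sweep are meant to reduce.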

7 CPA-Join using Simple Plane Sweep
- First sort the segments in P and Q along the time dimension (external sort)
- While there is still some unprocessed data:
  - Read in enough segments from P and Q to fill the main memory buffer
  - Next, sweep a vertical line along the time dimension
  - Maintain a sweepline data structure which keeps track of all active segments that intersect the sweep line
  - As the sweep line progresses, the sweepline data structure is updated with insertions (new segments that became active) and deletions (segments whose time period has expired)
  - During updates to the sweepline structure, an all-pairs comparison returns valid results
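The in-memory part of the sweep can be sketched as below. It is an illustration under several assumptions: segments are `(t_start, t_end, payload)` tuples already in the buffer, the active sets are plain lists, and the distance function `dist_fn` is supplied by the caller. None of these names come from the slides.

```python
def sweep_join(P, Q, d, dist_fn):
    """Plane sweep along time over in-memory segments (t_start, t_end, payload).

    Only segments whose time intervals overlap are passed to dist_fn.
    """
    # One event per segment, ordered by start time; side 0 = P, side 1 = Q.
    events = [(s[0], 0, s) for s in P] + [(s[0], 1, s) for s in Q]
    events.sort(key=lambda e: (e[0], e[1]))
    active = ([], [])          # segments currently cut by the sweep line
    out = []
    for t, side, seg in events:
        other = 1 - side
        # Deletion: drop segments on the other side that ended before t.
        active[other][:] = [s for s in active[other] if s[1] >= t]
        # Insertion: compare the new segment with every active opposite segment.
        for s in active[other]:
            pair = (seg, s) if side == 0 else (s, seg)
            if dist_fn(*pair) <= d:
                out.append(pair)
        active[side].append(seg)
    return out
```

Each overlapping pair is compared exactly once, at the moment the later-starting segment of the pair is inserted.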

8 CPA-Join using Plane Sweep
- Sweep line has to pause at every new sample point encountered
- Processing a multi-gigabyte dataset can take a long time
[Figure: memory and disk]

9 Group segments using a bounding box approximation
- In the best case, just 1 comparison is needed
[Figure: memory and disk]

10 Algorithm: Layered Plane Sweep
While there is still some unprocessed data on disk:
  - Read in data from relations P and Q to fill the buffer
  - Construct an MBR for the trajectory of every object in the buffer
  - Sort the MBRs along one of the spatial dimensions and do a plane sweep along it to identify qualifying MBR pairs
  - Expand the MBRs to obtain the individual segments
  - Sort the segments along the time dimension and do a plane sweep along time to obtain the actual results
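The MBR layer of the algorithm above can be sketched as a filter that runs before any segment-level work. The sketch assumes each relation is a dict from object id to a list of sampled coordinate tuples and that the sweep runs along the first coordinate; `mbr`, `within`, and `mbr_pairs` are illustrative names.

```python
def mbr(points):
    """Minimum bounding rectangle of a trajectory: (lower corner, upper corner)."""
    lo = tuple(min(c) for c in zip(*points))
    hi = tuple(max(c) for c in zip(*points))
    return lo, hi

def within(a, b, d):
    """True if boxes a and b are at most d apart along every axis.

    This is a necessary condition for any two points inside them
    to be within Euclidean distance d.
    """
    return all(al - d <= bh and bl - d <= ah
               for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1]))

def mbr_pairs(P, Q, d):
    """Plane sweep along the first axis; returns candidate (p_id, q_id) pairs."""
    events = [(mbr(pts), 0, k) for k, pts in P.items()]
    events += [(mbr(pts), 1, k) for k, pts in Q.items()]
    events.sort(key=lambda e: e[0][0][0])          # order by lower bound on axis 0
    active = ([], [])
    out = []
    for box, side, key in events:
        x = box[0][0]
        other = 1 - side
        # Drop boxes that end more than d before the sweep position.
        active[other][:] = [(b, k) for b, k in active[other] if b[1][0] + d >= x]
        for b, k in active[other]:
            if within(box, b, d):
                out.append((key, k) if side == 0 else (k, key))
        active[side].append((box, key))
    return out
```

Only the object pairs returned here would be expanded into segments and handed to the time-dimension sweep; pairs whose boxes are farther than d apart are pruned with a single box comparison.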

11 Layered Plane-Sweep Example
But one size doesn't fit all!

12 A Note on Indexing
- Indexes can be used to do CPA-Join
- But (almost) all indexes use MBR approximation
- And MBRs impose predefined granularities
[Figure: objects p and q in x, y, z space]

13 Layered Plane Sweep: what is the problem?
- Layered Plane Sweep always processes all of the data held in the memory buffer
- When objects interact heavily, such an approach may lead to no pruning at all
- In the best case, just one comparison is needed
- Though less of the buffer is processed initially, overall efficiency can be better
- Efficiency of the layered technique is not tied to the amount of data processed, but to choosing a granularity that minimizes the # of distance computations

14 Cost to Process Data in Memory Buffer
- Cost can be approximated as a function of distance computations (which dominate execution time):
    cost = (n_seg + n_MBR)
  where n_seg is the # of segment-level comparisons and n_MBR is the # of bounding-box comparisons
- In general, the cost for a fraction α (0 ≤ α ≤ 1) of the buffer is:
    cost_α = (n_seg + n_MBR) * (1/α)
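A small worked example of the formula: the 1/α factor scales the comparisons spent on a fraction of the buffer up to a whole buffer, so that different fractions can be compared on equal terms. The function name is illustrative, and the sketch excludes α = 0 because the formula is undefined there.

```python
def normalized_cost(n_seg, n_mbr, alpha):
    """cost_alpha = (n_seg + n_MBR) * (1 / alpha), for 0 < alpha <= 1."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    return (n_seg + n_mbr) / alpha

# 800 segment comparisons and 200 box comparisons on a quarter of the buffer
# count the same as 4000 comparisons on the full buffer.
print(normalized_cost(800, 200, 0.25))   # 4000.0
```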

15 What we have
- Layered Plane Sweep processes a large fraction (α is large)
  - good when there is light interaction
  - bad when there is heavy interaction
- Simple Plane Sweep processes a tiny fraction (α is small)
  - good when there is heavy interaction
  - bad when there is light interaction
What we want
- An Adaptive Algorithm that processes a fraction that maximizes performance (α varies)
  - Tunes to the characteristics of the underlying data
  - Provides superior performance under all scenarios

16 Algorithm: Adaptive Plane Sweep
While there is still some unprocessed data on disk:
  - Read in data from relations P and Q to fill the buffer
  - Choose a fraction α of the data that maximizes performance
  - Process the chosen fraction of data using Layered Plane Sweep
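The control loop above can be sketched as follows. This is a simplification under stated assumptions: the data is an in-memory list standing in for the disk, the fraction is applied to the number of buffered items rather than to a time range, and `estimate_cost` and `process` are caller-supplied placeholders for the cost estimator and the layered plane sweep.

```python
def adaptive_sweep(segments, buffer_size, fractions, estimate_cost, process):
    """Repeatedly fill a buffer, pick the cheapest fraction, and process it.

    Returns the list of fractions chosen, one per iteration.
    """
    start = 0
    chosen = []
    while start < len(segments):                     # unprocessed data remains
        buf = segments[start:start + buffer_size]    # fill the buffer
        # Choose the candidate fraction with the lowest estimated cost.
        alpha = min(fractions, key=lambda a: estimate_cost(buf, a))
        n = max(1, int(alpha * len(buf)))            # always make progress
        process(buf[:n])                             # e.g. a layered plane sweep
        chosen.append(alpha)
        start += n                                   # the rest is re-read next round
    return chosen
```

The part of the buffer beyond the chosen fraction is not discarded; it is simply considered again when the buffer is refilled.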

17 Goal: Choose a fraction α of data that maximizes performance
"Evaluate increasing buffer fractions from 0 to 1 and choose the fraction with the minimum cost"
- How many fractions should we consider?
- How to estimate the cost for a given fraction α?

18 How to estimate cost_α for a given fraction α
- Exact cost is known only after the fact!
- To know the cost associated with a given α, we need to actually execute the join (layered plane sweep) at that granularity
- Estimate the cost using a simple online sampling algorithm [HH97]

19 How to estimate cost_α for a given fraction α (Contd.)
Cost estimation through sampling
- Given: relations P and Q, and α
- Consider segments within α
- Construct MBRs for the objects in P
- Until the estimate of cost_α is accurate to within +/- 10%:
  - Randomly pick an object q_1 from Q and construct an MBR for its trajectory
  - Join q_1 with all objects in P
  - Compute n_MBR,q1 and n_seg,q1
  - Estimate cost_α
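The sampling loop above can be sketched as follows. The stopping rule used here, a normal-approximation confidence interval whose half-width is at most 10% of the running mean, is an assumption chosen for illustration and is not necessarily the rule of [HH97]. `count_fn(P, q)` is a hypothetical callback that joins one sampled object q with all of P and returns n_MBR,q + n_seg,q.

```python
import math
import random

def estimate_cost(P, Q, count_fn, rel_err=0.10, z=1.96, min_samples=5, seed=0):
    """Estimate the total comparison count of joining P with Q by sampling Q."""
    order = list(Q)
    if not order:
        return 0.0
    random.Random(seed).shuffle(order)       # random sampling without replacement
    n = total = total_sq = 0
    for q in order:
        c = count_fn(P, q)                   # comparisons caused by this one object
        n += 1
        total += c
        total_sq += c * c
        if n >= max(2, min_samples):
            mean = total / n
            var = max(0.0, (total_sq - n * mean * mean) / (n - 1))
            half_width = z * math.sqrt(var / n)
            if half_width <= rel_err * mean:
                break                        # estimate is tight enough
    return len(order) * total / n            # scale the sample mean up to all of Q
```

Dividing the returned value by α gives an estimate of cost_α. If the interval never becomes tight enough, the loop ends up visiting every object in Q and the estimate equals the exact count.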

20 How many fractions to consider?
"Evaluate increasing buffer fractions from 0 to 1 and choose the fraction with the minimum cost"
- Computing the cost for all α is not practical
  - It will offset any benefit that we gain from the adaptive technique
  - We need a strategy to limit the # of fractions that we process

21 How many fractions to consider?
- The α vs. cost graph is not linear; it exhibits convexity
- The convex region represents the candidate region with the minimum cost
- We can get away with evaluating the cost for a small number k of fractions α
[Figure: cost (millions) vs. fraction considered]

22 How to choose the k fractions?
k = 10; t_start = 32; t_end = 53
r = t_end - t_start
α_1 = r^(1/k) / r
α_i = (r * α_1)^i / r
Fraction       Time range      Cost
α_1  = 0.11    [32-33.27]      90
α_2  = 0.14    [32-33.61]      71
α_3  = 0.18    [32-34.05]      52
α_4  = 0.23    [32-34.60]      37
α_5  = 0.30    [32-35.31]      31
α_6  = 0.38    [32-36.21]      35
α_7  = 0.48    [32-37.35]      41
α_8  = 0.61    [32-38.80]      52
α_9  = 0.78    [32-40.65]      59
α_10 = 1.0     [32.0-53.0]     71
Annotation on the table: Acceptable candidates
- The fraction chosen can be fine-tuned through recursive calls
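The spacing rule can be sketched directly from the two formulas as written: substituting α_1 = r^(1/k) / r into α_i = (r * α_1)^i / r gives α_i = r^(i/k) / r, a geometric sequence that ends at 1. The function name is illustrative, and r > 1 is assumed so that the fractions increase. The sketch is not meant to reproduce the rounded table entries, which depend on how r was measured for that example.

```python
def candidate_fractions(t_start, t_end, k):
    """Return alpha_1..alpha_k with alpha_i = r**(i/k) / r, r = t_end - t_start."""
    r = t_end - t_start
    if r <= 1:
        raise ValueError("this spacing assumes r > 1")
    return [r ** (i / k) / r for i in range(1, k + 1)]

fractions = candidate_fractions(32, 53, 10)
# Consecutive fractions differ by the constant factor r**(1/k),
# so small fractions are sampled more densely than large ones.
```

Geometric spacing places more candidates near the small fractions, where simple plane sweep behaviour lives, while still including the full buffer as the last candidate.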

23 Putting it all together
[Flowchart]
- Input: relations R, S; distance d; parameter k
- Fill Buffer: read from relations R and S
- Optimizer: evaluate k fractions, choose best
- Layered Plane Sweep: process join on best fraction
- More data?

24 Benchmarking
- Code: implemented and tested the various alternatives in C/C++
  - R-Trees, Simple Sweep, Layered Sweep, Adaptive Sweep with various parameter settings
- Workload: 2 relations, 100,000 objects (50 GB)
  - Physics-based simulation data set
  - Synthetic data set
- Hardware: Linux, 2.4 GHz Pentium Xeon, 1 GB main memory, 2 IDE drives 15,000 rpm
- Setup: 64 KB page size, buffer size 10,000 pages

25 Collision Data Set
- 100,000 objects; collision occurs during time range [1500 - 2500]
[Figure: snapshot at timetick 1500]

26 Results - Execution Time for different Strategies
[Figure: execution time (seconds) vs. % of join completed, for R-tree, simple sweep, layered sweep, and adaptive sweep with K=20, K=10, K=5]

27 Buffer Choices made by the optimizer
[Figure: fraction of buffer chosen vs. virtual time line in the data set]

28 Discussion
- R-trees couldn't do enough pruning to make a difference
- Simple plane-sweep works well when there is heavy interaction among objects
- Layered plane-sweep works well when there is light interaction
- Adaptive version transitions smoothly between these extremes
- Recursive call to fine-tune candidate region doesn't seem to help much

29 Conclusion
- CPA-Join for spatiotemporal relations
- Proposed a novel adaptive join algorithm for moving object histories, based on an extension of the plane sweep
- Many practical applications

30 Questions?
Thank You!
Subramanian (subi@ufl.edu)

