Computer Science Spatio-Temporal Aggregation Using Sketches Yufei Tao, George Kollios, Jeffrey Considine, Feifei Li, Dimitris Papadias Department of Computer.

1 Computer Science Spatio-Temporal Aggregation Using Sketches Yufei Tao, George Kollios, Jeffrey Considine, Feifei Li, Dimitris Papadias Department of Computer Science City University of Hong Kong, Boston University, Hong Kong University of Science and Technology 18 March 2004

2 Outline
Applications and motivation
Preliminaries
–Aggregate trees and sketch techniques
Distinct spatio-temporal aggregation
Performance study
Extensions
Conclusion

3 Spatio-Temporal Aggregate Query -- Applications
Traffic Supervision Systems
–Monitoring the number of vehicles in a district; the information can be used to identify traffic-jam areas, etc.
Mobile Computing Applications
–Allocating bandwidth depending on the usage of each region
Example: A wireless company would like to know the number of cell phone users in a particular region during a specified period. It is also interesting to know the total number of phone calls made by all users who qualify for the first query.

4 Spatio-Temporal Aggregate Query
Spatio-temporal applications require the retrieval of summarized information about moving objects.
Given an aggregate query region (a rectangle qr) and a query interval qt, a spatio-temporal aggregate query retrieves information about the objects that appeared in qr during qt.
–Spatio-Temporal Count: returns the total number of qualifying objects.
–Spatio-Temporal Sum: each object is associated with a measure; the query outputs the sum of the measures of the qualifying objects.
Existing approach: multi-tree structures based on R-trees and B-trees
–Problem: If an object remains in the query region for several timestamps during the query interval, it is counted (or summed) multiple times in the result.
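The double-counting problem can be made concrete with a toy example. The sketch below is illustrative only: the object ids and the `observations` layout are assumptions, not data from the talk.

```python
# Toy data (hypothetical): which object ids were seen inside the query
# region qr at each timestamp of the query interval qt.
observations = {
    1: {"a", "b"},
    2: {"a", "c"},
    3: {"a"},
}

# Summing per-timestamp counts, as a plain aggregate index would,
# counts object "a" once for every timestamp it stays in the region.
summed_count = sum(len(ids) for ids in observations.values())

# The distinct count enumerates the ids and counts each object once.
distinct_count = len(set().union(*observations.values()))

print(summed_count, distinct_count)  # 5 3
```

The gap between the two numbers grows with the time objects spend inside the region, which is why a distinct aggregate needs more than pre-summed counts.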

5 Spatio-Temporal Aggregate Query (cont.)
How to answer a “Distinct Aggregate Query”? E.g., how many cars are present in a district?
Motivation: distinct spatio-temporal aggregate queries enable a much richer range of decision-making queries.
But: there is no way to exactly summarize distinct objects substantially better than by simply enumerating all of them.
Solution: spatio-temporal aggregation index trees combined with sketch techniques.

6 Example
The query qr retrieves the aggregate sum (during time T1-T3) of all rectangles that intersect it.
[Figure: regions r1 to r4, entries R1 and R2, the query region qr, and the per-timestamp aggregate values of each region]

7 Preliminaries -- Aggregate RB-tree
In the aRB-tree, the extents of all regions (in this case r1, r2, …, r4) are stored in an R-tree.
Each (leaf/non-leaf) entry of the R-tree is associated with a pointer to a B-tree that stores historical aggregate data about the entry.

8 Preliminaries – Flajolet-Martin sketches
Goal: small-space representation of a set of items.
Prerequisite: let h be a random, binary hash function.
Sketch of an item: for each unique item with ID x, for each integer 1 ≤ i ≤ k in turn, compute h(x, i). Stop when h(x, i) = 1, and set bit i.
The sketch of a union of item sets is the OR of their bitmaps.
Example: X = 00100, Z = 10000, X ∪ Z = 10100.
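The insertion rule and the OR property can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the hash `h` is a hypothetical stand-in built on SHA-256, `K = 32` is an assumed bitmap length, and bit i of the slide is stored at integer bit i-1.

```python
import hashlib

K = 32  # bitmap length k (assumed value; the slide leaves k unspecified)


def h(x, i):
    """Hypothetical binary hash h(x, i): a deterministic fair coin per (x, i)."""
    return hashlib.sha256(f"{x}|{i}".encode()).digest()[0] & 1


def sketch_item(x, k=K):
    """Bitmap of a single item, stored as an int (slide bit i -> int bit i-1)."""
    for i in range(1, k + 1):
        if h(x, i) == 1:
            return 1 << (i - 1)
    return 0  # all k coins came up 0; happens with probability 2**-k


def sketch_set(items, k=K):
    """Bitmap of a set: the OR of the item bitmaps."""
    bitmap = 0
    for x in items:
        bitmap |= sketch_item(x, k)
    return bitmap


X = {"car1", "car2", "car3"}
Z = {"car3", "car4"}
# The sketch of the union equals the OR of the two sketches,
# and the shared item "car3" contributes only once.
assert sketch_set(X | Z) == sketch_set(X) | sketch_set(Z)
```

Because an item always sets the same bit, inserting it twice leaves the bitmap unchanged, which is the property that makes the sketch usable for distinct aggregates.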

9 Preliminaries – Flajolet-Martin sketches (cont.)
Estimating COUNT
Take the bitmap of a set of N items.
Let j be the position of the leftmost zero in the bitmap.
j is an estimator of log2(0.77 N).
Fixable drawback: the variance of the estimate is large.
Example (figure): bitmap S with j = 3; best guess COUNT ~ 11.
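Inverting the relation j ≈ log2(0.77 N) gives N ≈ 2^j / 0.77. The sketch below is a rough illustration of that inversion; it assumes a 0-based bit position with integer bit 0 as the "leftmost" bit, and uses the more precise constant 0.77351. The slide's own indexing and rounding are not recoverable from the transcript, so the numbers need not match its figure exactly.

```python
PHI = 0.77351  # Flajolet-Martin constant; the slide rounds it to 0.77


def leftmost_zero(bitmap):
    """0-based index of the first zero bit, scanning from int bit 0 upward."""
    j = 0
    while bitmap & (1 << j):
        j += 1
    return j


def estimate_count(bitmap):
    """Invert j ~ log2(PHI * N) to get N ~ 2**j / PHI."""
    return 2 ** leftmost_zero(bitmap) / PHI


# Bits 0..2 set, bit 3 clear, bit 4 set: the first zero is at index 3.
bitmap = 0b10111
print(leftmost_zero(bitmap))             # 3
print(round(estimate_count(bitmap), 1))  # 10.3
```

Note that bits above the first zero (bit 4 here) do not affect the estimate, which is one reason a single bitmap is so noisy.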

10 Preliminaries – Flajolet-Martin sketches (cont.) Standard variance reduction methods apply. Compute m independent bitmaps in parallel. Generate m independent estimates of N. Take the mean of the estimates. Provable tradeoffs between m and variance of the estimator.
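A minimal sketch of the m-bitmap variant follows, assuming the m hash functions are obtained by salting SHA-256 with the bitmap number (an assumption, as are all function names here). It takes the mean of the m estimates, as the slide states. Averaging 2^j directly tends to skew upward compared with averaging the positions j first, so the result should be read as a rough figure.

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin constant


def item_bit(x, b):
    """Bit set by item x in bitmap number b (hash salted with b; an assumption)."""
    v = int.from_bytes(hashlib.sha256(f"{b}|{x}".encode()).digest(), "big")
    return v & -v  # lowest set bit: position i is chosen with probability 2**-(i+1)


def build_sketch(items, m):
    """A sketch is a list of m bitmaps, one per salted hash function."""
    bitmaps = [0] * m
    for x in items:
        for b in range(m):
            bitmaps[b] |= item_bit(x, b)
    return bitmaps


def leftmost_zero(bitmap):
    j = 0
    while bitmap & (1 << j):
        j += 1
    return j


def estimate(bitmaps):
    """Mean of the m per-bitmap estimates 2**j / PHI."""
    return sum(2 ** leftmost_zero(bm) / PHI for bm in bitmaps) / len(bitmaps)


items = [f"bus{n}" for n in range(1000)]
sketch = build_sketch(items, m=64)
print(round(estimate(sketch)))  # a rough estimate of the 1000 distinct ids
```

Raising m costs m times the space and insertion work of a single bitmap, which is the overhead-versus-accuracy tradeoff the slide refers to.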

11 Distinct Spatio-Temporal Aggregation
Exact solution: if n is the number of distinct objects and T is the total number of timestamps in history, the exact solution requires space on the order of n∙T.
Existing aggregation approach: the aRB-tree stores only summarized data; information about individual objects is lost, so the problem cannot be solved.
Our solution: combine the aRB-tree with the FM sketch technique. For each region r_i and every timestamp t we maintain a sketch s_i(t) that captures the (ids of) objects in r_i at t.
This requires space on the order of m∙R∙T∙log n, where R is the number of regions and m is an adjustable constant specifying the number of bitmaps used by one sketch (it determines the tradeoff between overhead and approximation accuracy).

12 System Architecture The sketches can be stored in a two dimensional array
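As a rough illustration of the two-dimensional layout, the sketch below keeps one bitmap per (region, timestamp) pair and answers a distinct query by OR-ing the relevant cells. The `reports` data, the SHA-256 based hash, and the use of a single bitmap per sketch (m = 1) are all assumptions made to keep the example short.

```python
import hashlib


def item_bit(x):
    """Single-bitmap FM bit for object id x (hypothetical hash choice)."""
    v = int.from_bytes(hashlib.sha256(str(x).encode()).digest(), "big")
    return v & -v


def sketch_of(ids):
    bitmap = 0
    for x in ids:
        bitmap |= item_bit(x)
    return bitmap


# Hypothetical input: reports[i][t] = ids of objects in region i at timestamp t.
reports = [
    [{"a", "b"}, {"a"}, {"a", "c"}],   # region 0
    [{"d"}, {"b", "d"}, set()],        # region 1
]

# The two-dimensional array of sketches: sketches[i][t] plays the role of s_i(t).
sketches = [[sketch_of(ids) for ids in row] for row in reports]


def distinct_sketch(regions, t_lo, t_hi):
    """OR together s_i(t) for every queried region i and timestamp t."""
    result = 0
    for i in regions:
        for t in range(t_lo, t_hi + 1):
            result |= sketches[i][t]
    return result


# Object "a" stays in region 0 for all three timestamps but is folded in
# only once, because OR-ing the same bit again changes nothing.
combined = distinct_sketch(regions=[0, 1], t_lo=0, t_hi=2)
assert combined == sketch_of({"a", "b", "c", "d"})
```

Scanning the array cell by cell is the baseline; the index structures on the following slides exist to avoid touching every cell.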

13 Sketch Indexing Structures
The sketch of a non-leaf entry in the B-tree equals the OR of all the sketches in its sub-tree.
[Figure: regions r1 to r4 and entries R1 and R2, with query region qr and query interval qt = (1,4)]
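The OR rule for non-leaf entries can be illustrated with a tiny tree. The `Entry` class, `make_parent`, and the leaf bitmaps are hypothetical; a real B-tree would also carry keys and fan-out limits, which are omitted here.

```python
from dataclasses import dataclass, field
from functools import reduce
from operator import or_


@dataclass
class Entry:
    sketch: int = 0
    children: list = field(default_factory=list)


def make_parent(children):
    """A non-leaf entry's sketch is the OR of the sketches below it."""
    return Entry(sketch=reduce(or_, (c.sketch for c in children), 0),
                 children=list(children))


# Hypothetical leaf sketches for four consecutive timestamps.
leaves = [Entry(0b0001), Entry(0b0100), Entry(0b0101), Entry(0b1000)]
left = make_parent(leaves[:2])
right = make_parent(leaves[2:])
root = make_parent([left, right])

print(bin(left.sketch), bin(right.sketch), bin(root.sketch))
# 0b101 0b1101 0b1101
```

Since OR is associative, building parents from already-aggregated children gives the same result as OR-ing all leaves directly.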

14 Query Processing
Similar to the query processing technique of the aRB-tree.
Basic idea: the spatial and temporal search conditions are applied alternately. The result sketch is updated incrementally.
Processing can be improved by applying pruning techniques.
Heuristic 1: Let RS be the current result sketch, and e a non-leaf B-tree entry whose associated sketch is s_e. Then the sub-tree of e can be pruned if (s_e OR RS) = RS.
Heuristic 2: Given a set of entries that cannot be pruned by Heuristic 1, we visit their child nodes in descending order of the number of 1’s in their sketches.
And more heuristics!
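The two heuristics can be sketched on the temporal (B-tree) side alone. This is one possible reading, not the authors' algorithm: the `Entry` layout, the `trace` argument, and the choice to sort the children of a single entry by popcount are assumptions, and the spatial R-tree traversal is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    lo: int          # first timestamp covered by this entry
    hi: int          # last timestamp covered by this entry
    sketch: int      # OR of all sketches in the sub-tree
    children: list = field(default_factory=list)


def popcount(bitmap):
    return bin(bitmap).count("1")


def query(e, q_lo, q_hi, rs=0, trace=None):
    """Fold into rs the sketches of all timestamps of e inside [q_lo, q_hi]."""
    if e.hi < q_lo or e.lo > q_hi:
        return rs                      # no temporal overlap
    if trace is not None:
        trace.append((e.lo, e.hi))     # record which entries were examined
    if q_lo <= e.lo and e.hi <= q_hi:
        return rs | e.sketch           # fully covered: use the stored aggregate
    if (e.sketch | rs) == rs:
        return rs                      # Heuristic 1: sub-tree adds no new bits
    # Heuristic 2: descend into the children with the most 1 bits first.
    for child in sorted(e.children, key=lambda c: popcount(c.sketch), reverse=True):
        rs = query(child, q_lo, q_hi, rs, trace)
    return rs


leaves = [Entry(1, 1, 0b0001), Entry(2, 2, 0b0010),
          Entry(3, 3, 0b0100), Entry(4, 4, 0b0001)]
left = Entry(1, 2, 0b0011, leaves[:2])
right = Entry(3, 4, 0b0101, leaves[2:])
root = Entry(1, 4, 0b0111, [left, right])

print(bin(query(root, 2, 3)))  # 0b110
```

Visiting bit-rich children first tends to fill the result sketch early, so later siblings are more likely to satisfy the Heuristic 1 test and be skipped.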

15 Query Processing – Supporting Distinct Sum Query
Extending FM sketches
FM sketches can already handle this: to insert a value of 500, perform 500 distinct item insertions.
Our observation: we can simulate a large number of insertions into an FM sketch more efficiently.
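The naive baseline mentioned on this slide can be sketched as follows. Only the value-as-repeated-insertions idea is shown; the more efficient simulation is not reproduced because the transcript does not describe it. The key format `obj_id#j` and the SHA-256 based hash are assumptions.

```python
import hashlib


def item_bit(key):
    """Single-bitmap FM bit for a key (hypothetical hash choice)."""
    v = int.from_bytes(hashlib.sha256(key.encode()).digest(), "big")
    return v & -v


def insert_value(bitmap, obj_id, value):
    """Insert `value` distinct sub-items (obj_id, 1) ... (obj_id, value)."""
    for j in range(1, value + 1):
        bitmap |= item_bit(f"{obj_id}#{j}")
    return bitmap


s1 = insert_value(0, "bus7", 500)
s2 = insert_value(s1, "bus7", 500)   # the same object reported again
assert s1 == s2                      # duplicate-insensitive: nothing changes
```

The COUNT estimate of such a bitmap then approximates the sum of the distinct objects' values, at a cost of one hash per unit of value, which is what motivates a faster simulation.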

16 Performance
Dataset settings
–Number of cities = 10,000
–Number of buses = 100,000
–History length = 1,00 timestamps
–Number of passengers for each bus = [200,300]
–At each timestamp, each bus reports to its nearest city
Each query has 2 parameters: spatial extent and interval length.
A count query retrieves the number of distinct buses that report to cities in qr during qt, while a sum query returns the sum of these buses’ passengers.
Compare the sketch index to the relational approach: index the 4-tuple table using a B-tree on the time t column.

17 Results (Space Consumption)
[Figure: index size (megabytes) vs. number of bitmaps per sketch (8, 16, 32), with the database size for comparison]
The size of the sketch index could be further reduced by applying simple compression techniques!

18 Results (Sketch Pruning in Query) (a) Cost vs. qrlen (qtlen=10)

19 Results (Sketch Pruning in Query) (b) Cost vs. qtlen (qrlen=0.15)

20 Results (Accuracy of Approximate Results) (a) Error vs. qrlen (qtlen=10, count); curves: 8-bitmap, 16-bitmap, 32-bitmap

21 Results (Accuracy of Approximate Results) (b) Error vs. qrlen (qtlen=10, sum); curves: 8-bitmap, 16-bitmap, 32-bitmap

22 Results (Costs of Indexes) (a) Cost vs. qrlen (qtlen=10); curves: 8-bitmap, 16-bitmap, 32-bitmap

23 Results (Costs of Indexes) (b) Cost vs. qtlen (qrlen=0.15); curves: 8-bitmap, 16-bitmap, 32-bitmap

24 Extensions
Approximating general moving data
Problem: each object o reports its location at each timestamp t, so the size of the database grows continuously, on the order of n∙T.
Solution: impose a res × res regular grid over the data space; the sketch index is applied by treating the grid cells as the finest aggregate granularity. The space becomes O(res²∙T∙log n) [or O(T∙log n) when res is a constant].
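Mapping a reported location to a grid cell is the only new step in this extension. The helper below is a minimal sketch that assumes the data space is the unit square and numbers cells row by row; both choices, and the name `cell_of`, are assumptions.

```python
def cell_of(x, y, res):
    """Map a point of the unit square [0, 1] x [0, 1] to a cell id in 0 .. res*res - 1."""
    col = min(int(x * res), res - 1)   # clamp so that x == 1.0 falls in the last column
    row = min(int(y * res), res - 1)
    return row * res + col


print(cell_of(0.3, 0.6, res=4))  # 9
print(cell_of(1.0, 1.0, res=4))  # 15
```

Each cell id then plays the role of a region r_i, so the number of sketches depends on res and T rather than on the number of objects.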

25 Conclusion
We propose a sketch index that integrates traditional approximate counting techniques with spatio-temporal indexes for efficient distinct aggregation query processing in spatio-temporal databases.
The sketch index consumes less space and gives order-of-magnitude faster query processing, with less aggregate error, than a conventional database.
Extensions and future work
–Other possible sketches
–More sophisticated algorithms for mining association rules

