HKU CSIS DB Seminar Skyline Queries HKU CSIS DB Seminar 9 April 2003 Speaker: Eric Lo.


1 HKU CSIS DB Seminar Skyline Queries HKU CSIS DB Seminar 9 April 2003 Speaker: Eric Lo

2 HKU CSIS DB Seminar Skyline
- A new operator in database systems
- Filters out a set of interesting points from a potentially large set of data points
- A data point is interesting if it is not dominated by any other point

3 HKU CSIS DB Seminar Example
Find some good places for us to hold the next DB Seminar
Dataset (Table Homes):
  Home      Distance from HKU   Area (m²)
  Kevin     1 km                10
  Ben       9 km                100
  Felix     5 km                2
  K.K Loo   8 km                250
Good -> Close to HKU (Min.)
Good -> Max. Area (Max.)
Return those homes that are not dominated: no other home is at least as good in ALL DIMENSIONS and strictly better in at least one
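The example can be checked with a few lines of Python. This is only a minimal sketch: the names `HOMES`, `dominates` and `skyline` are illustrative and not from the slides, and the dominance test hard-codes the two preferences of the example (distance minimized, area maximized).

```python
# Homes from the slide: name -> (distance from HKU in km, area in m^2)
HOMES = {
    "Kevin": (1, 10),
    "Ben": (9, 100),
    "Felix": (5, 2),
    "K.K Loo": (8, 250),
}

def dominates(a, b):
    """True if home a is no farther and no smaller than home b,
    and strictly better on at least one of the two."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better

def skyline(homes):
    """Names of the homes that no other home dominates, in sorted order."""
    return sorted(
        name for name, h in homes.items()
        if not any(dominates(other, h) for other in homes.values())
    )

print(skyline(HOMES))
```

Felix is dominated by Kevin (closer and larger) and Ben by K.K Loo, so only Kevin and K.K Loo remain.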

4 HKU CSIS DB Seminar Outline
- Introduction to Skyline Queries
- Skyline Operator in SQL
- Implementation Algorithms
- Progressive Algorithms
- Variations of Skyline Queries
- Experimental Result
- Conclusion

5 HKU CSIS DB Seminar The Skyline Operator
ICDE 2001, S. Börzsönyi, D. Kossmann, K. Stocker
1. Define the skyline operator in databases
2. Extension of SQL for skyline
3. Block-nested-loop Algorithm
4. Divide-and-conquer Algorithm

6 HKU CSIS DB Seminar Problem Definition
Related to:
- maximum vector problem
- contour problem
- convex hull of a data set
Assume the whole dataset fits in memory

7 HKU CSIS DB Seminar SQL Extensions
Skyline clause:
  SKYLINE OF [DISTINCT] d1 [MIN | MAX], …, dm [MIN | MAX]
Position in a query:
  SELECT … FROM … WHERE … GROUP BY … HAVING …
  SKYLINE OF [DISTINCT] d1 [MIN | MAX], …, dm [MIN | MAX]
  ORDER BY …
d1 … dm denote the dimensions that participate in the skyline
Example:
  SELECT * FROM HOMES WHERE CITY='HK' SKYLINE OF DIST MIN, AREA MAX;
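Since SKYLINE OF is not part of standard SQL, the example query can be emulated with a NOT EXISTS subquery. The sketch below runs that rewrite in SQLite through Python's `sqlite3` module. The table layout, the `name` column and the assumption that all four homes have CITY = 'HK' are illustrative; the slides do not give them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE homes (name TEXT, city TEXT, dist REAL, area REAL)")
conn.executemany(
    "INSERT INTO homes VALUES (?, ?, ?, ?)",
    [("Kevin", "HK", 1, 10), ("Ben", "HK", 9, 100),
     ("Felix", "HK", 5, 2), ("K.K Loo", "HK", 8, 250)],  # city values assumed
)

# Emulates: SELECT * FROM HOMES WHERE CITY='HK' SKYLINE OF DIST MIN, AREA MAX;
# A home is kept when no other HK home is at least as good on both
# dimensions and strictly better on one.
SKYLINE_SQL = """
SELECT h.name FROM homes h
WHERE h.city = 'HK'
  AND NOT EXISTS (
    SELECT 1 FROM homes o
    WHERE o.city = 'HK'
      AND o.dist <= h.dist AND o.area >= h.area
      AND (o.dist < h.dist OR o.area > h.area)
  )
ORDER BY h.name
"""
rows = [r[0] for r in conn.execute(SKYLINE_SQL)]
print(rows)
```

The rewrite is a nested comparison of every tuple with every other tuple, which is the cost a dedicated skyline operator is meant to avoid.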

8 HKU CSIS DB Seminar Naïve Approach for Skyline
- A 1D skyline is equivalent to computing min or max in SQL
- Naïve 2D skyline:
  - Sort the data according to the 2 dimensions
  - Compare every tuple with its predecessor, i.e. the last earlier tuple that is part of the skyline
- Sorting may need 2 or more passes if the data do not fit into memory -> use existing external sorting techniques

9 HKU CSIS DB Seminar Naïve 2D
  Home    Distance from HKU   Area
  Kevin   1 km                10
  Felix   5 km                2
  KK      8 km                250
  Ben     9 km                100
1. Sort by "Distance"
2. Compare "Felix" with "Kevin" -> eliminate "Felix"
3. Compare "KK" with "Kevin" -> incomparable -> part of skyline
4. Compare "Ben" with "KK" -> eliminate "Ben"
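The sort-and-compare steps above can be sketched as follows. The function name `skyline_2d` and the tuple layout are illustrative assumptions. Ties on distance are broken by larger area first, a detail the slide does not spell out, and exact duplicates are dropped.

```python
def skyline_2d(homes):
    """homes: list of (name, distance, area); distance is minimized, area maximized."""
    # Sort by distance ascending; on equal distance, larger area comes first.
    ordered = sorted(homes, key=lambda h: (h[1], -h[2]))
    result = []
    for home in ordered:
        # Every home already kept is at least as close, so the new home
        # survives only if it is strictly larger than the last one kept.
        if not result or home[2] > result[-1][2]:
            result.append(home)
    return result

HOMES = [("Kevin", 1, 10), ("Ben", 9, 100), ("Felix", 5, 2), ("KK", 8, 250)]
print([name for name, _, _ in skyline_2d(HOMES)])
```

After sorting, one comparison per tuple is enough because the area of the last skyline tuple is the largest area seen so far.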

10 HKU CSIS DB Seminar Naïve 2D does not work for > 2D
If the skyline involves more than 2 dimensions, sorting does not work
  Home    Distance from HKU   Area   Rent
  Kevin   1 km                10     $9
  Felix   5 km                2      $5
  KK      8 km                250    $10
  Ben     9 km                100    $9
2D:
  Cmp Felix, Kevin -> eliminated
  Cmp KK, Kevin -> part of skyline
  Cmp Ben, KK -> eliminated
3D:
  Cmp Felix, Kevin -> part of skyline
  Cmp KK, Felix -> part of skyline
  Cmp Ben, KK -> part of skyline -> not safe! Ben was only compared with KK; in 3D a tuple can be dominated by an earlier skyline tuple such as Kevin -> comparing with the predecessor does not work!

11 HKU CSIS DB Seminar Block-nested-loops Algorithm
- A straightforward approach:
  - Compare each point p with every other point
  - If p is not dominated -> part of skyline
- Scan the data file and keep a list of candidate skyline points in main memory

12 HKU CSIS DB Seminar BNL cont.
1. Insert the 1st data point into the list
2. For each subsequent point p:
   1. If p is dominated by any point in the list, it is discarded
   2. If p dominates any point in the list, insert it into the list and remove all points dominated by p
   3. If p is neither dominated by nor dominates any point in the list, insert it into the list as part of the skyline
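A minimal in-memory sketch of these steps, assuming every dimension is minimized and the candidate list never overflows. The names `dominates` and `bnl_skyline` are illustrative, and the sample points are borrowed from the bitmap example later in the talk.

```python
def dominates(a, b):
    """a dominates b: a <= b in every dimension and a < b in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def bnl_skyline(points):
    window = []  # candidate skyline points kept in memory
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # step 2.1: p is dominated, discard it
        # steps 2.2 and 2.3: drop the candidates p dominates, then keep p
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window

POINTS = [(1, 9), (2, 10), (4, 8), (6, 7), (9, 10), (7, 5), (5, 6),
          (4, 3), (3, 2), (9, 1), (10, 4), (6, 2), (8, 3)]
print(bnl_skyline(POINTS))
```

Steps 2.2 and 2.3 collapse into one branch here, because removing the points that p dominates is a no-op when p dominates nothing.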

13 HKU CSIS DB Seminar BNL cont.
- The candidate list is self-organizing:
  - Points that have dominated other points are moved to the top of the list
  - Reduces the number of comparisons
- E.g. the self-organizing list holding the partial skyline looks like:
  Home    Distance from HKU   Area   Rent
  Kevin   1 km                249    $1
  K.K     8 km                250    $100
  ...     …
  Other skyline points, which are not as strong as Kevin except in a few dimensions

14 HKU CSIS DB Seminar More on BNL
- Point 3 in BNL: if p is neither dominated by nor dominates any point in the list, insert it into the list as part of the skyline
- If there is no more space in the list, write p to a temporary file on disk
- Tuples in the temporary file are processed further in the next iteration of the algorithm

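15 HKU CSIS DB Seminar More on BNL (cont.)
Input: A B C D E F G H I J; the list holds 4 points (A, E, F, H; G later replaces F)
  A: inserted into the list
  B, C, D: dominated by A
  E: incomparable with A
  F: incomparable with A, E
  G: dominates F, replaces F
  H: incomparable with A, E, G
  I: incomparable with A, E, G, H, but the list is full! -> temporary file
  J: incomparable with A, E, G, H, but the list is full! -> temporary file (J has not been compared with I)
After the 1st iteration, A, E, G, H are output as skyline points; then clear the list, treat I, J, … as a new data set and perform BNL again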
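The multi-iteration behaviour can be sketched with a bounded candidate list. This is a simplified variant, not the exact bookkeeping of the original paper: Python lists stand in for the data file and the temporary file, and a candidate that enters the list after something was spilled is re-queued for the next iteration instead of being tracked with timestamps. All names are illustrative and every dimension is minimized.

```python
def dominates(a, b):
    """a dominates b: a <= b in every dimension and a < b in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def bnl_bounded(points, window_size):
    """BNL skyline with at most window_size candidates in memory."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    skyline = []
    pending = list(points)      # stands in for the input file
    while pending:
        window = []             # entries: (point, inserted_after_a_spill)
        overflow = []           # stands in for the temporary file on disk
        for p in pending:
            if any(dominates(w, p) for w, _ in window):
                continue        # dominated, discard
            window = [(w, late) for w, late in window if not dominates(p, w)]
            if len(window) < window_size:
                window.append((p, bool(overflow)))
            else:
                overflow.append(p)  # list is full: spill p
        # Candidates inserted before the first spill were compared with
        # every other point of this iteration, so they can be output.
        skyline.extend(w for w, late in window if not late)
        # Later candidates never met the earlier spilled points: retry them.
        pending = [w for w, late in window if late] + overflow
    return skyline

POINTS = [(1, 9), (2, 10), (4, 8), (6, 7), (9, 10), (7, 5), (5, 6),
          (4, 3), (3, 2), (9, 1), (10, 4), (6, 2), (8, 3)]
print(bnl_bounded(POINTS, window_size=2))
```

In the slide's trace no candidate enters the list after I and J are spilled, so all of A, E, G, H can be output after the first iteration; the `late` flag covers the case where that does not hold.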

16 HKU CSIS DB Seminar Short summary on BNL
- Easy to implement
- Works for any number of dimensions without using an index or sorting
- Relies on main memory -> may need many iterations
- Not adequate for on-line processing -> it has to read the entire data file before it returns the first skyline point (not progressive)

17 HKU CSIS DB Seminar Divide-and-Conquer Algorithm
  Price   Dist
  0.2     4
  0.8     4
  0.4     3
  0.3     2
  0.1     2
  0.6     1
  0.8     3
  0.2     3
  0.3     3
  0.5     2
1) Find the median of some dimension, say price: Price(med) = 0.3
2) Split the input into 2 partitions according to Price(med)
3) Compute skyline S1 in P1 (< 0.3) and S2 in P2 (>= 0.3) respectively by recursive partitioning [Note: S1 is better than S2 on price]
4) Partition recursively until a partition contains very few (or 1) tuples
5) If a partition has only a few tuples, finding its skyline is very easy
6) Merge the skylines of the partitions by eliminating those tuples in S2 which are dominated by tuples in S1 [Note: none of the tuples in S1 can be dominated by S2, as all tuples in S1 are better than S2 on price, i.e. tuples in UPPER are never eliminated]
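The six steps can be sketched as a short recursion. Several choices here are assumptions rather than the paper's method: the slide's Price/Dist column is read as ten (price, dist) pairs, both dimensions are minimized, the split is at the median position of a lexicographically sorted list (so ties on price are safe), and the merge is a plain pairwise check.

```python
def dominates(a, b):
    """a dominates b: a <= b in every dimension and a < b in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def dc_skyline(points):
    """Divide-and-conquer skyline; every dimension is minimized."""
    # Lexicographic order guarantees that after any split no point of the
    # upper half can dominate a point of the lower half.
    return _dc(sorted(points))

def _dc(pts):
    if len(pts) <= 1:
        return list(pts)           # steps 4 and 5: trivial partition
    mid = len(pts) // 2            # steps 1 and 2: split at the median position
    s1 = _dc(pts[:mid])            # step 3: skyline of the cheaper partition
    s2 = _dc(pts[mid:])            #         skyline of the other partition
    # Step 6: S1 survives whole; drop the S2 points that S1 dominates.
    return s1 + [q for q in s2 if not any(dominates(p, q) for p in s1)]

TUPLES = [(0.2, 4), (0.8, 4), (0.4, 3), (0.3, 2), (0.1, 2),
          (0.6, 1), (0.8, 3), (0.2, 3), (0.3, 3), (0.5, 2)]
print(dc_skyline(TUPLES))
```

Only the S2 side is filtered during the merge, which mirrors the slide's note that tuples of the better-priced partition are never eliminated.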

18 HKU CSIS DB Seminar Divide-and-Conquer Algorithm
1) Find the median of some dimension, say price: Price(med) = 0.3
2) Split the input into 2 partitions according to Price(med)
[Figure: the Price/Dist table of slide 17 split into two partitions at Price(med) = 0.3]

19 HKU CSIS DB Seminar Divide-and-Conquer Algorithm
[Figure: the Price/Dist table of slide 17 partitioned recursively, with labels S1, S2, S3, S4, S5, S6, S7]

20 HKU CSIS DB Seminar Divide-and-Conquer Algorithm
[Figure: merging step on the partitions of slide 19; label groups shown: S1, S2 / S4, S5, S7 / S1, S2, S7]
i.e. tuples in UPPER are never eliminated

21 HKU CSIS DB Seminar Efficient Progressive Skyline Computation
VLDB 2001, K.L. Tan, P.K. Eng, B.C. Ooi
Previous approaches require at least one pass over the dataset before returning the first interesting point. The paper proposes:
1. Bitmap-based Algorithm
2. B+-tree-based Algorithm
Both can return an interesting point as soon as it is identified.

22 HKU CSIS DB Seminar Progressive?
- Both the bitmap and the tree-based algorithms return the first skyline points very quickly
- May be useful if you are not willing to wait long for the first few interesting homes out of a large dataset
- They also outperform BNL and D&C in overall response time

23 HKU CSIS DB Seminar Skyline by Bitmap
Main idea: given a point p, if "something" can tell you:
- p is not dominated by any other point in the DB -> skyline!
- p is dominated by some point in the DB -> throw away
Non-blocking! Can return the skyline points immediately

24 HKU CSIS DB Seminar Bitmap
- All the information required to decide whether a point is in the skyline is encoded in bitmaps
- A data point p = (p1, p2, …, pd), where d is the number of dimensions, is mapped to an m-bit vector, where m is the total number of distinct values over all dimensions

25 HKU CSIS DB Seminar Bitmap
The numbers of distinct values on price and distance are 7 and 4, so m = 11
  Price   Dist
  0.2     4
  0.8     4
  0.4     3
  0.3     2
  0.1     2
  0.6     1
  0.8     3
  0.2     3
  0.3     3
  0.5     2

26 HKU CSIS DB Seminar Bitmap representation
Distinct values on x: 10; distinct values on y: 10; m = 20 -> 20-bit vector
E.g. (4,8): 4 is the 4th smallest value on dimension x, so set the 4th bit (counting from the right) through the leftmost bit to 1; 8 is the 8th smallest on y, so set the 8th bit through the leftmost bit to 1
  Point    Bitmap representation
  (1,9)    (1111111111, 1100000000)
  (2,10)   (1111111110, 1000000000)
  (4,8)    (1111111000, 1110000000)
  (6,7)    (1111100000, 1111000000)
  (9,10)   (1100000000, 1000000000)
  (7,5)    (1111000000, 1111110000)
  (5,6)    (1111110000, 1111100000)
  (4,3)    (1111111000, 1111111100)
  (3,2)    (1111111100, 1111111110)
  (9,1)    (1100000000, 1111111111)
  (10,4)   (1000000000, 1111111000)
  (6,2)    (1111100000, 1111111110)
  (8,3)    (1110000000, 1111111100)

27 HKU CSIS DB Seminar Bitmap representation
Is (4,8) a skyline point? (min x, y)
Create bit-strings Cx and Cy (Not CY Ng!)
Cx takes, from every point in table order, the 4th bit from the right of its x bitmap (1 if its x <= 4); Cy takes the 8th bit from the right of its y bitmap (1 if its y <= 8)
  Cx      = 1110000110000
  Cy      = 0011011111111
  Cx & Cy = 0010000110000
If Cx & Cy has more than one '1', the point is dominated by some other point
(Point / bitmap table as on slide 26)

28 HKU CSIS DB Seminar Bitmap representation
Is (3,2) a skyline point? (min x, y)
Create bit-strings Cx and Cy
Cx takes, from every point in table order, the 3rd bit from the right of its x bitmap (1 if its x <= 3); Cy takes the 2nd bit from the right of its y bitmap (1 if its y <= 2)
  Cx      = 1100000010000
  Cy      = 0000000011010
  Cx & Cy = 0000000010000
If Cx & Cy has only one '1', the point is a skyline point
(Point / bitmap table as on slide 26)
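The two worked checks can be reproduced with plain Python strings. This is a sketch under assumptions: bit slices are derived directly from the point values instead of being read from stored bitmaps, the helper names are illustrative, and the "exactly one 1" rule is the slide's simplification, which assumes no duplicate points.

```python
POINTS = [(1, 9), (2, 10), (4, 8), (6, 7), (9, 10), (7, 5), (5, 6),
          (4, 3), (3, 2), (9, 1), (10, 4), (6, 2), (8, 3)]

def encode(rank, k):
    """Bitmap of the rank-th smallest of k distinct values: the rank-th bit
    from the right and every bit to its left are 1."""
    return "1" * (k - rank + 1) + "0" * (rank - 1)

def bit_slice(points, dim, value):
    """One bit per point, in table order: 1 if the point's value on dim is <= value."""
    return "".join("1" if p[dim] <= value else "0" for p in points)

def bit_and(a, b):
    return "".join("1" if x == "1" and y == "1" else "0" for x, y in zip(a, b))

def is_skyline(points, p):
    cx = bit_slice(points, 0, p[0])
    cy = bit_slice(points, 1, p[1])
    # With distinct points, a single 1 means only p itself is at least
    # as good as p on both dimensions.
    return bit_and(cx, cy).count("1") == 1

print(encode(4, 10), encode(8, 10))
print(bit_slice(POINTS, 0, 4), bit_slice(POINTS, 1, 8))
print([p for p in POINTS if is_skyline(POINTS, p)])
```

Each check needs only bitwise operations on precomputed slices, which is why a point can be reported as soon as it is examined.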

29 HKU CSIS DB Seminar Short summary on Bitmap
- Need to pre-compute the bitmap representation of every point
- Checking each point requires retrieving all bitmaps in order to get the juxtapositions (Cx and Cy)
- Large storage if the domains of the attributes are large

30 HKU CSIS DB Seminar Some other progressive algorithms
B+-tree index (also proposed by BOC)
- Organizes the points into d lists (d is the number of dimensions in the data)
- Builds a B+-tree on the lists for retrieving skyline points
- Suffers from a similar problem to the bitmap approach

31 HKU CSIS DB Seminar Some other progressive algorithms (cont.) NN algorithm (by Donald Kossmann again) [VLDB 02]

32 HKU CSIS DB Seminar NN skyline

33 HKU CSIS DB Seminar Outline
- Introduction to Skyline Queries
- Skyline Operator in SQL
- Implementation Algorithms
- Progressive Algorithms
- Variations of Skyline Queries
- Experimental Result
- Conclusion

34 HKU CSIS DB Seminar An Optimal and Progressive Algorithm for Skyline Queries
SIGMOD 2003, D. Papadias, Y. Tao, G. Fu, B. Seeger
We propose:
1. An NN algorithm which is more efficient and I/O optimal
2. Ranked skyline queries
3. Constrained skyline queries
4. Dynamic skyline queries
5. K-dominating queries

35 HKU CSIS DB Seminar Ranked Skyline
- A ranked skyline returns the K skyline points that have the minimum/maximum score according to a function f
- In our example, f = 3*Dist + 7*Area; return the top K homes
- Although a skyline returns interesting points, we may want the most interesting points according to our own preferences, especially when the data set is large (say, hotels) and the skyline is also large!
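A ranked skyline over the Homes table can be sketched by computing the skyline first and then sorting it by f. This is the naive route, not the index-based method of the paper; the helper names and the `largest` switch for choosing between minimum and maximum score are illustrative assumptions.

```python
# Homes: name -> (distance from HKU in km, area in m^2)
HOMES = {"Kevin": (1, 10), "Ben": (9, 100), "Felix": (5, 2), "K.K Loo": (8, 250)}

def dominates(a, b):
    """Distance is minimized, area is maximized."""
    return (a[0] <= b[0] and a[1] >= b[1]) and (a[0] < b[0] or a[1] > b[1])

def f(home):
    dist, area = home
    return 3 * dist + 7 * area  # scoring function from the slide

def ranked_skyline(homes, k, score, largest=False):
    """The k skyline homes with the smallest (or largest) score."""
    sky = [name for name, h in homes.items()
           if not any(dominates(o, h) for o in homes.values())]
    sky.sort(key=lambda name: score(homes[name]), reverse=largest)
    return sky[:k]

print(ranked_skyline(HOMES, 1, f))
print(ranked_skyline(HOMES, 1, f, largest=True))
```

Kevin scores 3*1 + 7*10 = 73 and K.K Loo scores 3*8 + 7*250 = 1774, so the top-1 home depends on whether the score is minimized or maximized.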

36 HKU CSIS DB Seminar Constrained Skyline
Returns the most interesting points within a specific region of the data space
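As a sketch, a constrained skyline filters the data to the region first and computes the skyline of what remains. The distance range of 4 to 9 km below is a made-up constraint for illustration, applied to the Homes table from the earlier example; the function names are also assumptions.

```python
# Homes: name -> (distance from HKU in km, area in m^2)
HOMES = {"Kevin": (1, 10), "Ben": (9, 100), "Felix": (5, 2), "K.K Loo": (8, 250)}

def dominates(a, b):
    """Distance is minimized, area is maximized."""
    return (a[0] <= b[0] and a[1] >= b[1]) and (a[0] < b[0] or a[1] > b[1])

def constrained_skyline(homes, min_dist, max_dist):
    # Keep only the homes inside the constraint region ...
    inside = {n: h for n, h in homes.items() if min_dist <= h[0] <= max_dist}
    # ... and compute the skyline among those homes alone.
    return sorted(n for n, h in inside.items()
                  if not any(dominates(o, h) for o in inside.values()))

print(constrained_skyline(HOMES, 4, 9))
```

Felix is not in the unconstrained skyline because Kevin dominates it, but it enters the constrained skyline once Kevin falls outside the region. The constrained result is therefore not a subset of the full skyline.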

37 HKU CSIS DB Seminar Dynamic Skyline
- Returns an updated skyline dynamically
- E.g. ask for hotels with minimum distance and price (again?)
- The minimum distance now depends on my current location
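The location-dependent case can be sketched by deriving the distance dimension from the query location before running the skyline test. The hotel names, coordinates and prices below are entirely hypothetical, and Euclidean distance is an assumption.

```python
import math

# Hypothetical hotels: name -> (x, y, price); all values are made up.
HOTELS = {"A": (0, 0, 100), "B": (3, 4, 80), "C": (6, 8, 60), "D": (3, 4, 120)}

def dominates(a, b):
    """a dominates b: a <= b in every dimension and a < b in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def dynamic_skyline(hotels, location):
    # Derive (distance to the current location, price) for every hotel.
    derived = {
        name: (math.hypot(x - location[0], y - location[1]), price)
        for name, (x, y, price) in hotels.items()
    }
    return sorted(name for name, d in derived.items()
                  if not any(dominates(o, d) for o in derived.values()))

print(dynamic_skyline(HOTELS, (0, 0)))
print(dynamic_skyline(HOTELS, (6, 8)))
```

The same data yield different skylines for different query locations, so the result cannot be precomputed once and reused.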

38 HKU CSIS DB Seminar Enumerating Skyline
- Enumerating queries return, for each skyline point p, the number of points dominated by p
- Sometimes useful: e.g. skyline hotel C dominates 1000 hotels, while another skyline hotel Y dominates only 1
- Maybe C is better than Y in many properties (e.g. price, dist, etc.), but Y has only 1 property better than C, e.g. with PS2
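A brute-force sketch of an enumerating query on the Homes table from the earlier example; the function names are illustrative, and a real implementation would avoid the quadratic scan.

```python
# Homes: name -> (distance from HKU in km, area in m^2)
HOMES = {"Kevin": (1, 10), "Ben": (9, 100), "Felix": (5, 2), "K.K Loo": (8, 250)}

def dominates(a, b):
    """Distance is minimized, area is maximized."""
    return (a[0] <= b[0] and a[1] >= b[1]) and (a[0] < b[0] or a[1] > b[1])

def enumerate_skyline(homes):
    """Map each skyline home to the number of homes it dominates."""
    counts = {}
    for name, h in homes.items():
        if any(dominates(o, h) for o in homes.values()):
            continue  # not a skyline home
        counts[name] = sum(dominates(h, o) for o in homes.values())
    return counts

print(enumerate_skyline(HOMES))
```

Here Kevin dominates only Felix and K.K Loo dominates only Ben, so both skyline homes get a count of 1.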

39 HKU CSIS DB Seminar Experimental Evaluation
Running time comparison of the progressive algorithms, without the NN approaches
[Chart legend: Index, Bitmap, D&C, BNL]

40 HKU CSIS DB Seminar Conclusion
- Introduction to skyline queries
- How to implement (support) the skyline operator in a DBMS?
- Variations of skyline queries
- If the information is stored in different places, how do we answer skyline queries on a mobile device?

41 HKU CSIS DB Seminar References
- S. Börzsönyi, D. Kossmann, K. Stocker. The Skyline Operator. ICDE 2001.
- K.L. Tan, P.K. Eng, B.C. Ooi. Efficient Progressive Skyline Computation. VLDB 2001.
- D. Kossmann, F. Ramsak, S. Rost. Shooting Stars in the Sky: An Online Algorithm for Skyline Queries. VLDB 2002.
- D. Papadias, Y. Tao, G. Fu, B. Seeger. An Optimal and Progressive Algorithm for Skyline Queries. SIGMOD 2003.

