Top-k algorithms: Finding k objects that have the highest overall grades (6/15/2015)


1 Top-k algorithms
Finding k objects that have the highest overall grades

2 Top-k query
Given
– a relation R (id, x_1, x_2, x_3) and
– a query Q: sum(x_1, x_2, x_3)
Find k tuples with highest grades according to Q.

R:
id   x_1   x_2   x_3   sum
a    0.3   0.6   0.7   1.6
b    0.2   0.3   0.4   0.9
c    0.4   0.5   0.9   1.8
d    0.7   0.6   0.1   1.4

Top-2 tuples: c (1.8) and a (1.6).
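
As a baseline for the algorithms that follow, the query on this slide can be answered by grading every tuple and keeping the k best. The sketch below is illustrative only: the function name `naive_top_k` is hypothetical, and c's x_1 value of 0.4 is inferred from its listed sum of 1.8 rather than read from the slide.

```python
import heapq

# Relation R from the slide as (id, x1, x2, x3).
# Assumption: c's x1 = 0.4, inferred from its listed sum of 1.8.
R = [
    ("a", 0.3, 0.6, 0.7),
    ("b", 0.2, 0.3, 0.4),
    ("c", 0.4, 0.5, 0.9),
    ("d", 0.7, 0.6, 0.1),
]

def naive_top_k(relation, k, query=sum):
    """Grade every tuple with the query and keep the k best (full scan)."""
    return heapq.nlargest(k, relation, key=lambda t: query(t[1:]))

top2 = naive_top_k(R, 2)
print([t[0] for t in top2])  # ids of the two highest-graded tuples
```

This scan reads every grade of every tuple, which is the cost that FA, TA, and NRA try to avoid.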

3 Problem formulation 1
Given
– a relational table R (id, x_1, x_2, …, x_m)
– a query Q (monotone function)
Find the top-k tuples according to Q.

4 Problem formulation 2
Given
– a relational table R (id, x_1, x_2, …, x_m)
– a materialized view V (id, score_v) over R
– a query Q (monotone function)
Find the top-k tuples according to Q.

5 Topics of Discussion
Fagin’s algorithm (FA)
Threshold algorithm (TA)
– No Random Accesses algorithm (NRA)
PREFER

7 Finding top-k with FA
Do sorted access (in parallel) to each of the lists X_i until at least k objects have been seen in every one of the lists.
For each object t seen, do random accesses to the remaining lists to retrieve its missing grades.
Compute Q(t) for each object seen.
Y is the set of the k seen objects with the highest grades.

8 FA example
Find top-2 with Q: min(x_1, x_2)

R:
ID   X_1    X_2
a    0.9    0.85
b    0.8    0.7
c    0.72   0.2
…    …      …
d    0.6    0.9

Sorted X_1: (a, 0.9) (b, 0.8) (c, 0.72) (d, 0.6) …
Sorted X_2: (d, 0.9) (a, 0.85) (b, 0.7) (c, 0.2) …

9 FA example
STEP 1
– Read attributes from every sorted list
– Stop when k objects have been seen in all lists

Objects seen so far (? = grade not yet known):
ID   X_1    X_2    min(x_1, x_2)
a    0.9    0.85
d    ?      0.9
b    0.8    0.7
c    0.72   ?

10 FA example
STEP 2
– Random access to find missing grades

ID   X_1    X_2    min(x_1, x_2)
a    0.9    0.85
d    0.6    0.9
b    0.8    0.7
c    0.72   0.2

11 FA example
STEP 3
– Compute the grades of the seen objects.
– Return the k highest graded objects.

ID   X_1    X_2    min(x_1, x_2)
a    0.9    0.85   0.85
d    0.6    0.9    0.6
b    0.8    0.7    0.7
c    0.72   0.2    0.2

Top-2: a (0.85) and b (0.7).
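
The three FA steps can be traced in a few lines of Python. This is a minimal sketch rather than a reference implementation: the name `fagin_top_k` is hypothetical, random access is simulated with dictionaries, and the sorted lists are assumed to hold only the four objects shown on the slides (the elided entries are left out).

```python
def fagin_top_k(lists, k, query):
    """FA sketch: sorted access until k objects are seen in every list, then random access."""
    m = len(lists)
    lookup = [dict(lst) for lst in lists]   # stands in for random access
    seen_in = {}                            # object id -> indices of the lists where it was seen
    depth = 0
    while depth < max(len(lst) for lst in lists):
        # Step 1: one round of parallel sorted access.
        for i, lst in enumerate(lists):
            if depth < len(lst):
                seen_in.setdefault(lst[depth][0], set()).add(i)
        depth += 1
        if sum(len(s) == m for s in seen_in.values()) >= k:
            break
    # Steps 2 and 3: fetch the missing grades and grade every seen object.
    graded = {t: query([lookup[i][t] for i in range(m)]) for t in seen_in}
    best = sorted(graded.items(), key=lambda p: p[1], reverse=True)[:k]
    return best, depth

# Sorted lists from the example; only the four listed objects are included (assumption).
x1 = [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)]
x2 = [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)]

fa_best, fa_depth = fagin_top_k([x1, x2], k=2, query=min)
print(fa_best, fa_depth)
```

On this data the sorted phase stops at depth 3, once a and b have each been seen in both lists; c and d are graded too, even though neither makes the top 2.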

13 Finding top-k with TA
Do sorted access (in parallel) to each of the lists X_i; for each object seen, do random accesses to the other lists.
Compute Q(t) for every object t seen. Remember the k objects with the highest grades.
For each list X_i, let x_i be the last grade seen under sorted access. Compute the threshold value τ = Q(x_1, …, x_m).
Halt when at least k objects have grade ≥ τ.
Y is the set of the k seen objects with the highest grades.

14 TA example
Find top-2 with Q: min(x_1, x_2)

R:
ID   X_1    X_2
a    0.9    0.85
b    0.8    0.7
c    0.72   0.2
…    …      …
d    0.6    0.9

Sorted X_1: (a, 0.9) (b, 0.8) (c, 0.72) (d, 0.6) …
Sorted X_2: (d, 0.9) (a, 0.85) (b, 0.7) (c, 0.2) …

15 TA example
Step 1:
– parallel sorted access to each list
For each object seen:
– get all grades by random access
– determine min(x_1, x_2)
– amongst the 2 highest seen? Keep in buffer

ID   X_1   X_2    min(x_1, x_2)
a    0.9   0.85   0.85
d    0.6   0.9    0.6

16 TA example
Step 2:
– Determine the threshold value based on the objects currently seen under sorted access: τ = min(x_1, x_2)
– 2 objects with overall grade ≥ threshold value? Stop. Otherwise go to the next entry position in the sorted lists and repeat step 1.

ID   X_1   X_2    min(x_1, x_2)
a    0.9   0.85   0.85
d    0.6   0.9    0.6

τ = min(0.9, 0.9) = 0.9

17 TA example
Step 1 (again):
– parallel sorted access to each list
For each object seen:
– get all grades by random access
– determine min(x_1, x_2)
– amongst the 2 highest seen? Keep in buffer

ID   X_1   X_2    min(x_1, x_2)
a    0.9   0.85   0.85
d    0.6   0.9    0.6
b    0.8   0.7    0.7

18 TA example
Step 2 (again):
– Determine the threshold value based on the objects currently seen: τ = min(x_1, x_2)
– 2 objects with overall grade ≥ threshold value? Stop. Otherwise go to the next entry position in the sorted lists and repeat step 1.

ID   X_1   X_2    min(x_1, x_2)
a    0.9   0.85   0.85
b    0.8   0.7    0.7

τ = min(0.8, 0.85) = 0.8

19 TA example
Situation at stopping condition

ID   X_1   X_2    min(x_1, x_2)
a    0.9   0.85   0.85
b    0.8   0.7    0.7

τ = min(0.72, 0.7) = 0.7
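
The TA walkthrough above can be reproduced with the following sketch. It is written under the same assumptions as the slides' example (two lists, Q = min, only the four listed objects); the name `threshold_top_k` is hypothetical and random access is again simulated with dictionaries.

```python
import heapq

def threshold_top_k(lists, k, query):
    """TA sketch: grade each object as soon as it is seen, stop once k grades reach tau."""
    m = len(lists)
    lookup = [dict(lst) for lst in lists]   # stands in for random access
    grades = {}                             # object id -> overall grade Q(t)
    best, tau, depth = [], None, 0
    for depth in range(1, min(len(lst) for lst in lists) + 1):
        row = [lst[depth - 1] for lst in lists]   # step 1: parallel sorted access
        for obj, _ in row:
            if obj not in grades:
                grades[obj] = query([lookup[i][obj] for i in range(m)])
        tau = query([grade for _, grade in row])  # step 2: threshold from the last grades seen
        best = heapq.nlargest(k, grades.items(), key=lambda p: p[1])
        if len(best) == k and all(g >= tau for _, g in best):
            break
    return best, tau, depth

# Sorted lists from the example; only the four listed objects are included (assumption).
x1 = [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)]
x2 = [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)]

ta_best, ta_tau, ta_depth = threshold_top_k([x1, x2], k=2, query=min)
print(ta_best, ta_tau, ta_depth)
```

The thresholds computed along the way are 0.9, 0.8, and 0.7, matching the slides; the loop stops at depth 3 with a and b in the buffer.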

21 Finding top-k with NRA
Do sorted access (in parallel) to each of the lists.
– Maintain the last grades seen x_i
– For every object t seen, compute the lower bound W_t and the upper bound B_t
– Topk = {k objects with highest W} and M_k = k-th highest W
– An object t in R is viable when B_t > M_k
Halt when B_t ≤ M_k for all objects not in Topk.

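22 Define W and B
Lower bound: W = f(x_1, x_2, …, x_l, 0, …, 0), where every grade not yet seen is replaced by 0.
Upper bound: B = f(x_1, x_2, …, x_l, x_{l+1}, …, x_m), where every grade not yet seen is replaced by the last grade seen in its list.

E.g. f(x_1, x_2, x_3) = x_1 + x_2 + x_3, with the lists starting as:
X_1: a:0.7 …
X_2: a:0.8 …
X_3: d:0.9 …

W_a = f(0.7, 0.8, 0) = 1.5
B_a = f(0.7, 0.8, 0.9) = 2.4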
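
The two bounds are easy to compute once the known grades and the last grades seen are kept side by side. The helper below is a small sketch for f = sum only; the name `bounds` and the dictionary layout are assumptions made for illustration.

```python
def bounds(known, last_seen):
    """Return (W, B) for one object under f = sum.

    known:     {list index: grade} for the lists where the object was already seen
    last_seen: last grade read under sorted access in each list
    """
    w = sum(known.values())                                           # unseen grades count as 0
    b = w + sum(g for i, g in enumerate(last_seen) if i not in known)  # unseen grades capped by last seen
    return w, b

# Object a from the slide: seen in X_1 (0.7) and X_2 (0.8), not yet in X_3 (last grade seen 0.9).
w_a, b_a = bounds({0: 0.7, 1: 0.8}, [0.7, 0.8, 0.9])
print(w_a, b_a)
```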

23 NRA example
Find top-2 with Q: sum(x_1, x_2, x_3)
Lists sorted by score:

X_1     X_2     X_3
f:0.6   d:0.6   q:0.9
n:0.5   g:0.6   d:0.7
q:0.4   c:0.6   j:0.3
d:0.3   a:0.6   p:0.2
e:0.2   q:0.5   m:0.1
r:0.1   e:0.3   b:0.1
h:0.1

24 NRA example
After reading row 1 of each list:

ID   W     B
q    0.9   2.1   (Topk)
d    0.6   2.1   (Topk)
f    0.6   2.1

N_k = 2.1 ≤ M_k = 0.6? No, continue. (N_k is the highest B among objects outside Topk.)

25 NRA example
After reading row 2 of each list:

ID   W     B
d    1.3   1.8   (Topk)
q    0.9   2.0   (Topk)
f    0.6   1.9
g    0.6   1.8
n    0.5   1.8

N_k = 1.9 ≤ M_k = 0.9? No, continue.

26 NRA example
After reading row 3 of each list:

ID   W     B
q    1.3   1.9   (Topk)
d    1.3   1.7   (Topk)
f    0.6   1.5
n    0.5   1.4
g    0.6   1.3
c    0.6   1.3
j    0.3   1.3

N_k = 1.5 ≤ M_k = 1.3? No, continue.

27 NRA example
After reading row 4 of each list:

ID   W     B
d    1.6   1.6   (Topk)
q    1.3   1.9   (Topk)
f    0.6   1.4
n    0.5   1.3
j    0.3   1.2
a    0.6   1.1
g    0.6   1.1
c    0.6   1.1
p    0.2   1.1

N_k = 1.4 ≤ M_k = 1.3? No, continue.

28 NRA example
After reading row 5 of each list:

ID   W     B
q    1.8   1.8   (Topk)
d    1.6   1.6   (Topk)
f    0.6   1.2
n    0.5   1.1
j    0.3   1.0
a    0.6   0.9
g    0.6   0.9
c    0.6   0.9
p    0.2   0.9
e    0.2   0.8
m    0.1   0.8

N_k = 1.2 ≤ M_k = 1.6? Yes, halt. Top-2: q (1.8) and d (1.6).
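
The NRA run above can be replayed with the sketch below. Several choices are assumptions made for illustration: the name `nra_top_k` is hypothetical, the aggregation is fixed to sum, bounds are rounded to absorb floating-point noise, ties in W are broken by B, an exhausted list is assumed to contribute 0, and the trailing h:0.1 entry is left out because the run halts before reaching it.

```python
def nra_top_k(lists, k):
    """NRA sketch for Q = sum, using sorted access only."""
    m = len(lists)
    seen = {}            # object id -> {list index: grade seen so far}
    last = [0.0] * m     # last grade seen in each list
    top, w, depth = [], {}, 0
    for depth in range(1, max(len(lst) for lst in lists) + 1):
        for i, lst in enumerate(lists):
            if depth <= len(lst):
                obj, grade = lst[depth - 1]
                seen.setdefault(obj, {})[i] = grade
                last[i] = grade
            else:
                last[i] = 0.0  # assumption: an exhausted list contributes nothing more
        # W treats unseen grades as 0; B treats them as the last grade seen in that list.
        w = {t: round(sum(g.values()), 9) for t, g in seen.items()}
        b = {t: round(w[t] + sum(last[i] for i in range(m) if i not in g), 9)
             for t, g in seen.items()}
        ranked = sorted(seen, key=lambda t: (w[t], b[t]), reverse=True)
        top, rest = ranked[:k], ranked[k:]
        if len(top) < k:
            continue
        m_k = w[top[-1]]
        # Highest upper bound outside Topk, counting objects not seen at all.
        n_k = max([b[t] for t in rest] + [round(sum(last), 9)])
        if n_k <= m_k:
            break
    return [(t, w[t]) for t in top], depth

x1 = [("f", 0.6), ("n", 0.5), ("q", 0.4), ("d", 0.3), ("e", 0.2), ("r", 0.1)]
x2 = [("d", 0.6), ("g", 0.6), ("c", 0.6), ("a", 0.6), ("q", 0.5), ("e", 0.3)]
x3 = [("q", 0.9), ("d", 0.7), ("j", 0.3), ("p", 0.2), ("m", 0.1), ("b", 0.1)]

nra_top, nra_depth = nra_top_k([x1, x2, x3], k=2)
print(nra_top, nra_depth)
```

Note that the returned grades are lower bounds W; in this run they happen to be exact because q and d were seen in all three lists before the halt.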

30 Finding top-k with PREFER
Step 1: View selection algorithm
– Materializes a number of ranked views V of the relation R and uses them to efficiently answer preference queries Q.
Step 2: Pipelined algorithm
– Define the 1st watermark
– Output the first tuples according to the 1st watermark
– Define the 2nd watermark
– Output the second tuples according to the 2nd watermark
– …

31 Finding top-k with PREFER
Determine watermark
– How deep in V must we go to output the top result tuple t_q^1?
– The watermark is chosen such that if a tuple t in V has f_v(t) below it, then t cannot be t_q^1, since t_v^1 has a higher score under Q.
[Figure: view V with columns t and f_v(t), with the watermark position marked]

32 Finding top-k with PREFER
Determine t_q^1 according to the watermark
– Scan V from the top and retrieve the prefix [t_v^1, t_v^2, …, t_v^w), where t_v^w is the first tuple in V with score less than the watermark
– Order the prefix according to Q: [t_q^1, …, t_q^{w-1}]. Let t_q^s be the position of t_v^1 according to Q.

V (Watermark = 0.65):
t    f_v(t)
a    0.9    (t_v^1)
b    0.8    (t_v^2)
c    0.7    (t_v^3)
d    0.5

Ordering the prefix a, b, c according to Q gives t_q^1, t_q^2, t_q^3; the position where t_v^1 = a lands is t_q^s.

33 PREFER example
Find top-4 with:
f_v(t) = 0.2*X_1 + 0.4*X_2 + 0.4*X_3
f_q(t) = 0.1*X_1 + 0.6*X_2 + 0.3*X_3

View V (sorted by f_v):
ID   X_1   X_2   X_3   f_q(t)   f_v(t)
a    10    17    20    17.2     16.8
b    20    20    11    17.3     16.4
c    17    18    12    16.1     15.4
d    15    10    8     9.9      10.2
e    5     10    12    10.1     9.8
f    15    10    5     9        9
g    12    5     5     5.7      6.4

t_1 = a, Watermark = 14.26
1. Calculate the watermark for t_1, which is 14.26
2. Find the prefix of the view with f_v greater than the watermark value and sort it by f_q
3. Output tuples up to t_1

Output so far: b, a

34 PREFER example
Find top-4 with:
f_v(t) = 0.2*X_1 + 0.4*X_2 + 0.4*X_3
f_q(t) = 0.1*X_1 + 0.6*X_2 + 0.3*X_3

t_1 = c (first unprocessed tuple in V)
1. Calculate the watermark for t_1, which is 13.1
2. Find the prefix of the view with f_v greater than the watermark value and sort it by f_q
3. Output tuples up to t_1
4. Repeat using the first unprocessed tuple as t_1

Output so far: b, a

35 PREFER example
t_1 = c, Watermark = 13.1 (same steps 1 to 4 as on the previous slide)

Output so far: b, a, c

36 PREFER example
t_1 = d (first unprocessed tuple in V)
1. Calculate the watermark for t_1, which is 8.3
2. Find the prefix of the view with f_v greater than the watermark value and sort it by f_q
3. Output tuples up to t_1
4. Repeat using the first unprocessed tuple as t_1

Output so far: b, a, c

37 PREFER example
t_1 = d, Watermark = 8.3 (same steps 1 to 4 as on the previous slide)

The prefix with f_v greater than 8.3 is d, e, f; sorted by f_q it becomes e (10.1), d (9.9), f (9), and tuples up to t_1 = d are output.

Output so far: b, a, c, e, d. The top-4 according to f_q are b, a, c, e.
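
The pipelined loop of the example can be sketched as follows. This is not PREFER's implementation: the slides do not give the watermark formula, so the three watermark values are simply looked up from the slides, and the names `prefer_pipeline` and `WATERMARK` are hypothetical. The attribute values are the ones consistent with the f_q and f_v columns of the example table.

```python
# id -> (X1, X2, X3), consistent with the f_q and f_v columns of the example.
R = {
    "a": (10, 17, 20), "b": (20, 20, 11), "c": (17, 18, 12), "d": (15, 10, 8),
    "e": (5, 10, 12), "f": (15, 10, 5), "g": (12, 5, 5),
}

def f_v(x):
    return 0.2 * x[0] + 0.4 * x[1] + 0.4 * x[2]   # ranking function of the view

def f_q(x):
    return 0.1 * x[0] + 0.6 * x[1] + 0.3 * x[2]   # ranking function of the query

# Watermarks quoted on the slides, keyed by the tuple acting as t_1.
# Assumption: looked up, not computed, because the formula is not on the slides.
WATERMARK = {"a": 14.26, "c": 13.1, "d": 8.3}

def prefer_pipeline(relation, k, watermark):
    """Output tuples in f_q order by repeatedly sorting a watermark-bounded prefix of V."""
    view = sorted(relation, key=lambda t: f_v(relation[t]), reverse=True)
    output = []
    while len(output) < k:
        pending = [t for t in view if t not in output]
        if not pending:
            break
        t1 = pending[0]                                   # first unprocessed tuple in V
        prefix = [t for t in pending if f_v(relation[t]) > watermark[t1]]
        prefix.sort(key=lambda t: f_q(relation[t]), reverse=True)
        output.extend(prefix[: prefix.index(t1) + 1])     # output tuples up to t_1
    return output

prefer_out = prefer_pipeline(R, k=4, watermark=WATERMARK)
print(prefer_out, prefer_out[:4])
```

Each round emits a batch of tuples, so the loop can overshoot k; the first k entries of the output are the answer.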

38 Citations
Ronald Fagin, Amnon Lotem, Moni Naor. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4), pp. 614-656, 2003.
Ronald Fagin. Combining fuzzy information from multiple systems. In Proc. of the 15th ACM Symposium on Principles of Database Systems, pp. 216-226, Montreal, Canada, 1996.
Ronald Fagin. Fuzzy queries in multimedia database systems. In Proc. of the 17th ACM Symposium on Principles of Database Systems, pp. 1-10, Seattle, USA, 1998.
Ulrich Güntzer, Wolf-Tilo Balke, Werner Kießling. Optimizing Multi-Feature Queries for Image Databases. In Proc. of the 26th VLDB Conference, pp. 419-428, Cairo, Egypt, 2000.
Vagelis Hristidis, Nick Koudas, Yannis Papakonstantinou. PREFER: a system for the efficient execution of multi-parametric ranked queries. In Proc. of the ACM Special Interest Group on Management of Data Conference (SIGMOD), pp. 259-270, Santa Barbara, USA, 2001.
Vagelis Hristidis, Yannis Papakonstantinou. Algorithms and applications for answering ranked queries using ranked views. VLDB Journal, 13(1), pp. 49-70, 2004.
Surya Nepal, M. V. Ramakrishna. Query Processing Issues in Image (Multimedia) Databases. In Proc. of the 15th International Conference on Data Engineering (ICDE), pp. 22-29, Sydney, Australia, March 1999.

39 Questions

