Ranking in DB. Laks V.S. Lakshmanan, Dept. of CS, UBC.


1 Ranking in DB. Laks V.S. Lakshmanan, Dept. of CS, UBC

2 Why ranking in query answering? 1/3 (10/16/2015)
Multimedia data – fuzzy querying: e.g., “find top 2 red objects with a soft texture”.
Two ranked (Obj, Score) lists:
Obj  Score      Obj  Score
D    0.85       A    0.9
B    0.80       D    0.8
A    0.75       C    0.4
E    0.65       B    0.3
C    0.60       E    0.1
Combine scores → overall score.

3 Why ranking? 2/3
IR: “find top 5 documents relevant to ‘computational’, ‘neuroscience’ and ‘brain theory’”.
–IR systems maintain full-text indexes: inverted lists of docs w.r.t. each keyword.
–Same Q/A paradigm as before.
Buying a home: several criteria – price, location, area, #BRs, school district. ORDER BY query in SQL.
Finding hotels while traveling.

4 Why ranking? 3/3
Data stream, e.g., of network flow data: “find 10 users with the max. BW consumption and max. #packets communicated”.
–Score may be a complex aggregation of these two measures.
In a social net, find 5 items tagged as most relevant to “lawn mowing” and belonging to users socially close to the seeker.
And now, find top-k recs (recommender systems), etc.
Fagin et al. – pioneering papers: PODS ’96, PODS ’01, JCSS 2003. Has burgeoned into a field now.
Focus on the middleware algorithm which, given a score combination function, computes top-k answers by probing different subsystems (or ranked lists).

5 Computational model
Naïve method. How to compute top-K efficiently?
Access methods:
–Sorted access (sequential access) [SA].
–Random access [RA].
Different optimization metrics:
–Overall running time of algorithm.
–cost(SA) < cost(RA): minimize RAs.
–RA not possible (#): avoid RAs.
–Combined optimization.
Has led to a variety of algorithms. Memory vs. disk model.
For the most part, assume score aggregation is a monotone function; use SUM in examples.
(#) Typical in IR systems.
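The slide names the naïve method without spelling it out. A minimal sketch of one plausible reading follows: scan every list completely under SA, aggregate with SUM, and sort. The helper `parse`, the names `LISTS` and `naive_top_k`, and the use of integer hundredths for scores are assumptions made for illustration; the list data is taken from the FA example slides.

```python
from collections import defaultdict

def parse(row):
    """Turn 'H95 B90' into [('H', 95), ('B', 90)]; scores are in hundredths."""
    return [(tok[0], int(tok[1:])) for tok in row.split()]

# The four ranked lists from the FA example slides (descending scores).
LISTS = [
    parse("H95 B90 E85 C80 G75 I70 D65 A60 J55 F50"),
    parse("J100 C95 G85 H80 E75 B75 F60 A50 D40 I30"),
    parse("C95 J80 D70 H65 G60 B55 I50 E45 F40 A30"),
    parse("E100 G95 H90 B85 D80 C70 A65 I55 F45 J30"),
]

def naive_top_k(lists, k):
    """Scan every list completely, sum the scores per object, return the K best."""
    totals = defaultdict(int)
    sorted_accesses = 0
    for ranked_list in lists:
        for obj, score in ranked_list:
            totals[obj] += score
            sorted_accesses += 1
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k], sorted_accesses

top, cost = naive_top_k(LISTS, 4)
print(top, cost)
```

The baseline always pays m × n sorted accesses (40 here), which is the cost the later algorithms try to avoid by stopping early.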

6 Fagin’s Algorithm (FA)
m lists sorted by descending scores.
Access (SA) all lists in parallel.
–For each new object x seen, fetch its scores from the other lists by RA. Overall score t(x) = t(x1, …, xm). Store (obj, score) in set Y.
–Remember each object seen (under SA) in all m lists in set H.
Repeat until |H| >= K.
Sort Y in descending order of scores, breaking ties arbitrarily, and output top K.
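The steps of FA can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes t = SUM, equal-length lists that each contain every object, a dictionary per list standing in for random access, and scores stored as integer hundredths. The names `parse`, `LISTS`, and `fagin_top_k` are hypothetical.

```python
def parse(row):
    """Turn 'H95 B90' into [('H', 95), ('B', 90)]; scores are in hundredths."""
    return [(tok[0], int(tok[1:])) for tok in row.split()]

# The four ranked lists from the FA example slides.
LISTS = [
    parse("H95 B90 E85 C80 G75 I70 D65 A60 J55 F50"),
    parse("J100 C95 G85 H80 E75 B75 F60 A50 D40 I30"),
    parse("C95 J80 D70 H65 G60 B55 I50 E45 F40 A30"),
    parse("E100 G95 H90 B85 D80 C70 A65 I55 F45 J30"),
]

def fagin_top_k(lists, k):
    m = len(lists)
    index = [dict(ranked_list) for ranked_list in lists]  # stands in for RA
    seen_in = {}   # object -> set of lists in which SA has reached it
    Y = {}         # object -> overall score, filled by RA on first sighting
    H = []         # objects seen under SA in all m lists
    depth = 0
    while depth < len(lists[0]):
        for i, ranked_list in enumerate(lists):
            obj, _ = ranked_list[depth]
            if obj not in Y:
                Y[obj] = sum(index[j][obj] for j in range(m))
            seen_in.setdefault(obj, set()).add(i)
        depth += 1
        H = [obj for obj, where in seen_in.items() if len(where) == m]
        if len(H) >= k:
            break
    ranked = sorted(Y.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k], depth, H

top, depth, H = fagin_top_k(LISTS, 4)
print(top, depth, H)
```

With K = 4 the loop stops once four objects have been seen in every list; B and E tie on overall score, so which of them takes fourth place depends on the tie-breaking rule.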

7–16 Example of FA (animation frames; K = 4, t = SUM)

Rank  L1       L2       L3       L4
1     H(0.95)  J(1.00)  C(0.95)  E(1.00)
2     B(0.90)  C(0.95)  J(0.80)  G(0.95)
3     E(0.85)  G(0.85)  D(0.70)  H(0.90)
4     C(0.80)  H(0.80)  H(0.65)  B(0.85)
5     G(0.75)  E(0.75)  G(0.60)  D(0.80)
6     I(0.70)  B(0.75)  B(0.55)  C(0.70)
7     D(0.65)  F(0.60)  I(0.50)  A(0.65)
8     A(0.60)  A(0.50)  E(0.45)  I(0.55)
9     J(0.55)  D(0.40)  F(0.40)  F(0.45)
10    F(0.50)  I(0.30)  A(0.30)  J(0.30)

Y = answers seen in >= 1 list (unsorted). Set H = answers seen (under SA) in all 4 lists.
Overall scores entered in Y as the cursors advance: H 3.30, J 2.65, C 3.40, E 3.05, B 3.05, G 3.15, D 2.55, I 2.05.
Set H grows from {H} to {H, G} to {H, G, B, C}, at which point |H| = 4.

17 FA Example concluded
A, F – not seen in any list. Yet, we are sure they can’t make it to the top-4. Why?
Based on where the cursors are now, what’s the max. possible score for A, F?
What assumptions are being made about t()?
FA is shown to be optimal with very high probability [Fagin: PODS 1996], but it can be beaten by other algorithms on specific inputs.
What about buffer size?
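One way to answer the question about A and F, assuming t = SUM and reading the cursor positions off the example lists at depth 6 (scores 0.70, 0.75, 0.55, 0.70): an object not yet seen under SA lies below every cursor, so monotonicity of t bounds its overall score by the aggregate of the cursor scores.

```latex
t(A),\ t(F) \;\le\; t(0.70,\ 0.75,\ 0.55,\ 0.70) \;=\; 0.70 + 0.75 + 0.55 + 0.70 \;=\; 2.70 \;<\; 3.05
```

Here 3.05 is the fourth-best overall score already in Y, so neither A nor F can enter the top-4.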

18 Threshold Algorithm (TA)
Do parallel SA on all m lists.
For each object x seen under SA in a list, fetch its scores from the other lists by RA and compute its overall score.
–If |Buffer| < K, add x to Buffer;
–else if score(x) <= k-th score in Buffer, toss x;
–else replace the bottom of Buffer with (x, score(x)) and re-sort.
Threshold := t(worst score seen on L1, …, worst score seen on Lm).
Stop when threshold <= k-th score in Buffer.
Output the top-K objects and scores (in Buffer).
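A minimal sketch of TA under the same assumptions as the examples (t = SUM, every object present in every list, scores as integer hundredths). The buffer is kept as a min-heap of size K so that its root is the k-th score; that choice, and the names `parse`, `LISTS`, and `threshold_top_k`, are illustrative assumptions rather than part of the slides.

```python
import heapq

def parse(row):
    """Turn 'H95 B90' into [('H', 95), ('B', 90)]; scores are in hundredths."""
    return [(tok[0], int(tok[1:])) for tok in row.split()]

# The four ranked lists from the example slides.
LISTS = [
    parse("H95 B90 E85 C80 G75 I70 D65 A60 J55 F50"),
    parse("J100 C95 G85 H80 E75 B75 F60 A50 D40 I30"),
    parse("C95 J80 D70 H65 G60 B55 I50 E45 F40 A30"),
    parse("E100 G95 H90 B85 D80 C70 A65 I55 F45 J30"),
]

def threshold_top_k(lists, k):
    m = len(lists)
    index = [dict(ranked_list) for ranked_list in lists]  # stands in for RA
    buffer = []       # min-heap of (overall score, object); root = k-th score
    thresholds = []   # threshold value after each round of parallel SA
    for depth in range(len(lists[0])):
        for ranked_list in lists:
            obj, _ = ranked_list[depth]
            if any(obj == kept for _, kept in buffer):
                continue                      # already in the buffer
            total = sum(index[j][obj] for j in range(m))
            if len(buffer) < k:
                heapq.heappush(buffer, (total, obj))
            elif total > buffer[0][0]:
                heapq.heapreplace(buffer, (total, obj))
            # otherwise total <= k-th score: toss the object
        threshold = sum(ranked_list[depth][1] for ranked_list in lists)
        thresholds.append(threshold)
        if len(buffer) == k and threshold <= buffer[0][0]:
            break
    return sorted(buffer, reverse=True), thresholds

answers, thresholds = threshold_top_k(LISTS, 4)
print(answers, thresholds)
```

Unlike FA, the sketch never holds more than K entries: an object tossed earlier is simply recomputed and tossed again if it reappears, because the k-th score never decreases.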

19–26 TA Example (animation frames; same lists L1–L4 as in the FA example, K = 4)

Threshold bar, T = t(x1, x2, x3, x4), after each round of SA:
Depth  x1    x2    x3    x4    T
1      0.95  1.00  0.95  1.00  3.90
2      0.90  0.95  0.80  0.95  3.60
3      0.85  0.85  0.70  0.90  3.30
4      0.80  0.80  0.65  0.85  3.10
5      0.75  0.75  0.60  0.80  2.90  ==> can stop!

Overall scores computed along the way: 3.30, 3.40, 3.05, 2.65 (X), 3.05 (X), 3.15, 2.55 (X). X marks a score that is tossed or pushed out of the buffer.

27 TA Remarks

28 TA is Instance Optimal

29 TA IO Proof (contd.)

30 Proof (contd.)

31 Proof (contd.)

32 Proof (contd.)

33 Proof (concluded)

34 No Random Access Algorithm (NRA)
What if cost(RA) > cost(SA), or RA wasn’t allowed?
Do SA on all lists in parallel. At depth d:
–Maintain the worst scores seen at depth d: x1, …, xm.
–For any object x seen in lists {1, …, i}: Best(x) = t(x’s scores in lists 1, …, i, then xi+1, …, xm); Worst(x) = t(x’s scores in lists 1, …, i, then 0, …, 0).
–TopK contains the K objects with max Worst scores at depth d; break ties using Best. M = k-th Worst score in TopK.
–Object y is viable if Best(y) > M.
Stop when TopK contains >= K distinct objects and no object outside TopK is viable. Return TopK.
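The NRA bookkeeping can be sketched as below, again assuming t = SUM, non-negative scores stored as integer hundredths, and the example lists. The sketch also treats objects not yet seen anywhere as candidates whose Best is the sum of the current bottom scores; that reading of "no object outside TopK is viable", and the names `parse`, `LISTS`, `nra_top_k`, and `trace`, are assumptions made for illustration.

```python
def parse(row):
    """Turn 'H95 B90' into [('H', 95), ('B', 90)]; scores are in hundredths."""
    return [(tok[0], int(tok[1:])) for tok in row.split()]

# The four ranked lists from the example slides.
LISTS = [
    parse("H95 B90 E85 C80 G75 I70 D65 A60 J55 F50"),
    parse("J100 C95 G85 H80 E75 B75 F60 A50 D40 I30"),
    parse("C95 J80 D70 H65 G60 B55 I50 E45 F40 A30"),
    parse("E100 G95 H90 B85 D80 C70 A65 I55 F45 J30"),
]

def nra_top_k(lists, k):
    m = len(lists)
    known = {}    # object -> {list index: score seen under SA}
    trace = []    # one dict per depth: object -> (Worst, Best)
    top, bounds = [], {}
    for depth in range(len(lists[0])):
        bottoms = [ranked_list[depth][1] for ranked_list in lists]
        for i, ranked_list in enumerate(lists):
            obj, score = ranked_list[depth]
            known.setdefault(obj, {})[i] = score
        bounds = {}
        for obj, scores in known.items():
            worst = sum(scores.values())            # unknown scores count as 0
            best = worst + sum(bottoms[i] for i in range(m) if i not in scores)
            bounds[obj] = (worst, best)
        trace.append(bounds)
        # Rank by Worst, breaking ties by Best.
        ranked = sorted(bounds, key=lambda o: bounds[o], reverse=True)
        top, rest = ranked[:k], ranked[k:]
        if len(top) == k:
            M = bounds[top[-1]][0]
            unseen_best = sum(bottoms)              # bound for unseen objects
            if unseen_best <= M and all(bounds[o][1] <= M for o in rest):
                break
    return [(o, bounds[o]) for o in top], trace

answers, trace = nra_top_k(LISTS, 4)
print(answers, len(trace))
```

On these lists the sketch stops at depth 8. At depth 6, where the example frames end, J and E still have Best above the fourth Worst score of 3.05, so they remain viable.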

35–40 NRA Example (animation frames; same lists L1–L4 as in the FA example)

[Worst, Best] for each object seen so far, by depth:
Depth 1: H [0.95, 3.90], J [1.00, 3.90], C [0.95, 3.90], E [1.00, 3.90]
Depth 2: H [0.95, 3.65], J [1.80, 3.65], C [1.90, 3.75], E [1.00, 3.65], B [0.90, 3.60], G [0.95, 3.60]
Depth 3: H [1.85, 3.40], J [1.80, 3.55], C [1.90, 3.65], E [1.85, 3.40], B [0.90, 3.35], G [1.80, 3.35], D [0.70, 3.30]
Depth 4: H [3.30, 3.30], J [1.80, 3.45], C [2.70, 3.55], E [1.85, 3.30], B [1.75, 3.20], G [1.80, 3.25], D [0.70, 3.15]
Depth 5: H [3.30, 3.30], J [1.80, 3.35], C [2.70, 3.50], E [2.60, 3.20], B [1.75, 3.10], G [3.15, 3.15], D [1.50, 3.00]
Depth 6: H [3.30, 3.30], J [1.80, 3.20], C [3.40, 3.40], E [2.60, 3.15], B [3.05, 3.05], G [3.15, 3.15], D [1.50, 2.95], I [0.70, 2.70]

41 NRA Features
What sort of t() do we need to assume for NRA to work correctly?
How large can the buffers get?
How does the amount of bookkeeping compare with TA?
NRA is instance optimal over algorithms that make no RA (and, of course, no wild guesses).

42 Combined optimization
What if we are told that cost(RA) is a given multiple of cost(SA)?
Can we find algorithms better than NRA and TA in this case?
Combined Algorithm = CA. (See Fagin et al.’s paper for details.)

43 Worrying about I/O cost
Based on Bast et al., VLDB 2006.
Inverted lists of (itemID, score) entries in desc. score order, as usual, but on disk.
Within each block, entries are sorted by itemID; across blocks, still in desc. score order. → Inverted Block Index (IBI) Algorithm.
What is an IBI?

44–47 A Motivating Example (animation frames)

List 1        List 2        List 3
Doc17 : 0.8   Doc25 : 0.7   Doc83 : 0.9
Doc78 : 0.2   Doc38 : 0.5   Doc17 : 0.7
·             Doc14 : 0.5   Doc61 : 0.3
·             Doc83 : 0.5   ·
·             ·             ·
·             Doc17 : 0.2   ·
·             ·             ·

Round 1 (SA on 1,2,3): Doc17 : [0.8, 2.4], Doc25 : [0.7, 2.4], Doc83 : [0.9, 2.4], unseen: ≤ 2.4
Round 2 (SA on 1,2,3): Doc17 : [1.5, 2.0], Doc25 : [0.7, 1.6], Doc83 : [0.9, 1.6], unseen: ≤ 1.4
Round 3 (SA on 2,2,3!): Doc17 : [1.5, 2.0], Doc83 : [1.4, 1.6], unseen: ≤ 1.0
Round 4 (RA for Doc17): Doc17 : 1.7, all others < 1.7, done!
Note the deviation from round-robin.

48 IBI Algorithm
Same setting as NRA/CA, except use IBI.
Maintain two lists: Top-K items (T = d1, …, dk) and StillHaveAShot (SHASH) items (S = dk+1, …, dk+q).
pos_i = current cursor position on list Li.
high_i = score in Li at the current cursor position (upper-bounds the score of unseen items).
For items d in S:
–Which attr scores are known: E(d).
–Which attr scores are unknown: E~(d).
–Worst(d) = total score from E(d).
–Best(d) = Worst(d) + Σ {high_i | i ∈ E~(d)}. (Exactly as Fagin.)

49 IBI Algorithm (contd.)
In each round, compute:
–min-k = min{Worst(d) | d ∈ T}.
–bestscore that any unseen doc can have = sum of all high_i’s.
–For dj ∈ S: def_j = min-k – Worst(dj). [Denotes the deficit below the qualification level for top-k.]
T is sorted in desc. Worst(); S is sorted in desc. Best(). [Sorting on (score, itemID) for fast processing.]
Invariant: min-k >= max{Worst(d) | d ∈ S}.
Termination: when min-k >= max{Best(d) | d ∈ S}.
Can remove an object from S whenever its Best <= min-k. ⇒ Stop when S = {}.
Early termination AND minimal bookkeeping are BOTH important for performance.
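The per-round bookkeeping (Worst, Best, min-k, deficits, pruning of S) can be sketched on the state of the motivating example after Round 3, with K = 1. This is only a sketch of the bookkeeping, not of the block index or the scheduling. The function name `ibi_round`, the dictionaries `known` and `high`, the integer tenths used for scores, and the extra check that unseen docs cannot beat min-k are all assumptions made for illustration.

```python
def ibi_round(known, high, k):
    """One round of bookkeeping; scores are integers in tenths.

    known: item -> {list id: score already seen}   (E(d) with its scores)
    high:  list id -> score at the current cursor  (high_i)
    """
    worst = {d: sum(scores.values()) for d, scores in known.items()}
    best = {d: worst[d] + sum(h for i, h in high.items() if i not in known[d])
            for d in known}
    # T: top-k by descending Worst, ties broken by item ID.
    by_worst = sorted(known, key=lambda d: (-worst[d], d))
    T = by_worst[:k]
    min_k = min(worst[d] for d in T)
    # S: items outside T that still have a shot, by descending Best.
    S = sorted((d for d in by_worst[k:] if best[d] > min_k),
               key=lambda d: (-best[d], d))
    deficit = {d: min_k - worst[d] for d in S}
    unseen_best = sum(high.values())   # best score any unseen doc can have
    done = not S and unseen_best <= min_k
    return T, S, min_k, deficit, done

# State after Round 3 of the motivating example (lists numbered 1..3).
known = {
    "Doc17": {1: 8, 3: 7}, "Doc25": {2: 7}, "Doc83": {3: 9, 2: 5},
    "Doc78": {1: 2}, "Doc38": {2: 5}, "Doc14": {2: 5}, "Doc61": {3: 3},
}
high = {1: 2, 2: 5, 3: 3}
T, S, min_k, deficit, done = ibi_round(known, high, 1)

# Round 4: a random access fetches Doc17's score in list 2.
known["Doc17"][2] = 2
T2, S2, min_k2, deficit2, done2 = ibi_round(known, high, 1)
print(T, S, min_k, deficit, done, "|", T2, S2, min_k2, done2)
```

After Round 3 only Doc83 remains in S, with a deficit of 0.1; once the RA raises Doc17's Worst to 1.7, S empties and the round reports termination.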

50 More on IBI Framework
Instead of scheduling SAs round-robin (RR), use a differential approach for different lists, based on expected score reductions at future cursor positions (Knapsack).
Do SA*RA* (a run of SAs followed by a run of RAs).
Order RAs based on estimated Prob[dj can get into top-k answers].

