Laks V.S. Lakshmanan, Dept. of CS, UBC


1 Ranking in DB
Laks V.S. Lakshmanan, Dept. of CS, UBC

2 Why ranking in query answering? 1/3
Multimedia data, fuzzy querying: e.g., “find top 2 red objects with a soft texture”. Each predicate yields its own ranked list of (object, score) pairs:

List 1: D 0.85, B 0.80, A 0.75, E 0.65, C 0.60
List 2: A 0.9, D 0.8, C 0.4, B 0.3, E 0.1

Combine scores -> overall score per object. 11/29/2018
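The “combine scores” step can be sketched in a few lines of Python. The slide does not say which combining function is used, so this sketch assumes `min` (a common choice for a fuzzy “and”); the names `list1`, `list2` and `naive_top_k` are made up for illustration. It is the naïve approach: every entry of every list is read before anything is ranked.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# The two ranked lists from the slide, as object -> score maps.
list1: Dict[str, float] = {"D": 0.85, "B": 0.80, "A": 0.75, "E": 0.65, "C": 0.60}
list2: Dict[str, float] = {"A": 0.9, "D": 0.8, "C": 0.4, "B": 0.3, "E": 0.1}


def naive_top_k(
    lists: Sequence[Dict[str, float]],
    k: int,
    t: Callable[[List[float]], float] = min,  # assumed combining function
) -> List[Tuple[str, float]]:
    """Score every object in every list, then keep the k best."""
    objects = set.intersection(*(set(lst) for lst in lists))
    overall = {obj: t([lst[obj] for lst in lists]) for obj in objects}
    # Sort by descending overall score; break ties by object name.
    return sorted(overall.items(), key=lambda p: (-p[1], p[0]))[:k]


print(naive_top_k([list1, list2], k=2))  # [('D', 0.8), ('A', 0.75)]
```

With a different combining function (e.g., `sum`) the ranking can change, which is why the middleware algorithms treat t() as a parameter.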

3 Why ranking? 2/3
IR: “find top 5 documents relevant to ‘computational’, ‘neuroscience’ and ‘brain theory’”. IR systems maintain full-text indexes: an inverted list of docs for each keyword. Same Q/A paradigm as before.

4 Why ranking? 3/3
Data streams, e.g., network flow data: “find 10 users with the max. BW consumption and max. #packets communicated”. In a social network: find 5 items tagged as most relevant to “lawn mowing” by the user’s friends. Etc.
Fagin et al.: pioneering papers (PODS’96, 01, TODS). Has since burgeoned into a field.
Focus: the middleware algorithm which, given a score combination function, computes the top-K answers by probing different subsystems (or ranked lists).

5 Computational model
Naïve method. How to compute top-K efficiently?
Access methods: sorted access (sequential access) [SA]; random access [RA].
Different optimization metrics:
  Overall running time of the algorithm.
  cost(SA) < cost(RA): minimize RAs.
  RA not possible (#): avoid RAs.
  Combined optimization.
These have led to a variety of algorithms. Memory vs. disk model.
#: typical in IR systems.

6 Fagin’s Algorithm (FA)
m lists sorted by descending scores.
1. Access (SA) all lists in parallel.
2. For each new object x seen, fetch its “missing” scores from the other lists by RA, compute its overall score t(x) = t(x1, …, xm), and store (x, t(x)) in set Y.
3. Remember each object that has been seen (under SA) in all m lists in set H. Repeat until |H| >= K.
4. Sort Y in descending order of scores, breaking ties arbitrarily, and output the top K.
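A minimal Python sketch of FA, run on the four lists of the example that follows. Several things here are assumptions rather than part of the slides: the assignment of entries to L1..L4 is reconstructed from the scores shown in the example, t() is taken to be the sum of the four scores (rounded to two decimals to avoid floating-point noise), random access is simulated by a dictionary lookup, and the RAs are deferred until the SA phase ends, which yields the same set Y. The helper names `ranked`, `total` and `fagin` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Ranked = List[Tuple[str, float]]


def ranked(spec: str) -> Ranked:
    """'H95 B90' -> [('H', 0.95), ('B', 0.9)] (scores given in hundredths)."""
    return [(tok[0], int(tok[1:]) / 100) for tok in spec.split()]


# Assumed layout of the example lists L1..L4, each in descending score order.
LISTS: List[Ranked] = [
    ranked("H95 B90 E85 C80 G75 I70 D65 A60 J55 F50"),
    ranked("J100 C95 G85 H80 E75 B75 F60 A50 D40 I30"),
    ranked("C95 J80 D70 H65 G60 B55 I50 E45 F40 A30"),
    ranked("E100 G95 H90 B85 D80 C70 A65 I55 F45 J30"),
]


def total(scores: List[float]) -> float:
    """Assumed combining function t(): the sum, rounded to 2 decimals."""
    return round(sum(scores), 2)


def fagin(lists: List[Ranked], k: int, t=total):
    m = len(lists)
    index = [dict(lst) for lst in lists]  # dict lookup stands in for RA
    seen_in: Dict[str, Set[int]] = {}     # object -> lists where SA has seen it
    depth = 0
    # SA phase: advance all cursors in parallel until |H| >= k.
    while depth < len(lists[0]):
        for j in range(m):
            obj, _ = lists[j][depth]
            seen_in.setdefault(obj, set()).add(j)
        depth += 1
        if sum(1 for s in seen_in.values() if len(s) == m) >= k:
            break
    # RA phase: complete the score of every object seen in at least one list.
    y = {obj: t([index[j][obj] for j in range(m)]) for obj in seen_in}
    top = sorted(y.items(), key=lambda p: (-p[1], p[0]))[:k]
    return top, depth


top, depth = fagin(LISTS, k=4)
print(depth, top)  # 6 [('C', 3.4), ('H', 3.3), ('G', 3.15), ('B', 3.05)]
```

B and E tie at 3.05; the sketch breaks the tie by object name, which is one of the arbitrary choices the algorithm allows.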

7 Example of FA
Four lists L1..L4 over objects A..J, each sorted by descending score. K = 4; overall score = sum of the four scores.

Depth  L1       L2       L3       L4
1      H(0.95)  J(1.00)  C(0.95)  E(1.00)
2      B(0.90)  C(0.95)  J(0.80)  G(0.95)
3      E(0.85)  G(0.85)  D(0.70)  H(0.90)
4      C(0.80)  H(0.80)  H(0.65)  B(0.85)
5      G(0.75)  E(0.75)  G(0.60)  D(0.80)
6      I(0.70)  B(0.75)  B(0.55)  C(0.70)
7      D(0.65)  F(0.60)  I(0.50)  A(0.65)
8      A(0.60)  A(0.50)  E(0.45)  I(0.55)
9      J(0.55)  D(0.40)  F(0.40)  F(0.45)
10     F(0.50)  I(0.30)  A(0.30)  J(0.30)

Y = answers seen in >= 1 list (unsorted), with their overall scores. H = answers seen (under SA) in all 4 lists.

8-16 Example of FA (step by step)

Depth  Added to Y (overall score)          H
1      H 3.30, J 2.65, C 3.40, E 3.05      {}
2      B 3.05, G 3.15                      {}
3      D 2.55                              {}
4      (none)                              {H}
5      (none)                              {H, G}
6      I 2.05                              {H, G, B, C}

|H| = 4: stop.

17 FA Example concluded
A, F: not seen in any list. Yet we are sure they can’t make it into the top 4. Why? Based on where the cursors are now, what is the max. possible score for A, F? What assumptions are being made about t()?
FA is shown to be optimal with very high probability [Fagin: PODS 1996], but it can be beaten by other algorithms on specific inputs. What about buffer size?

18 Threshold Algorithm
Do parallel SA on all m lists.
For each new object x, fetch its scores from the other lists and compute its overall score.
If |Buffer| < K, add x to Buffer; else if score(x) <= K-th score in Buffer, toss x; else replace the bottom of Buffer with (x, score(x)).
Threshold := t(worst score seen on L1, …, worst score seen on Lm).
Stop when Threshold <= K-th score in Buffer.
Output the top-K objects & scores (in Buffer).
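The steps above can be sketched in Python as follows. As before, the list layout is reconstructed from the example slides and t() is assumed to be a rounded sum; both are assumptions, as are the helper names. The buffer is a min-heap of at most K entries, so its root is the K-th score.

```python
import heapq
from typing import List, Tuple

Ranked = List[Tuple[str, float]]


def ranked(spec: str) -> Ranked:
    """'H95 B90' -> [('H', 0.95), ('B', 0.9)] (scores given in hundredths)."""
    return [(tok[0], int(tok[1:]) / 100) for tok in spec.split()]


# Assumed layout of the example lists L1..L4, each in descending score order.
LISTS: List[Ranked] = [
    ranked("H95 B90 E85 C80 G75 I70 D65 A60 J55 F50"),
    ranked("J100 C95 G85 H80 E75 B75 F60 A50 D40 I30"),
    ranked("C95 J80 D70 H65 G60 B55 I50 E45 F40 A30"),
    ranked("E100 G95 H90 B85 D80 C70 A65 I55 F45 J30"),
]


def total(scores: List[float]) -> float:
    """Assumed combining function t(): the sum, rounded to 2 decimals."""
    return round(sum(scores), 2)


def threshold_algorithm(lists: List[Ranked], k: int, t=total):
    m = len(lists)
    index = [dict(lst) for lst in lists]      # dict lookup stands in for RA
    buffer: List[Tuple[float, str]] = []      # min-heap, never more than k entries
    threshold = float("inf")
    for depth in range(len(lists[0])):
        for j in range(m):
            obj, _ = lists[j][depth]
            if any(o == obj for _, o in buffer):
                continue                      # already in the buffer
            score = t([index[i][obj] for i in range(m)])
            if len(buffer) < k:
                heapq.heappush(buffer, (score, obj))
            elif score > buffer[0][0]:        # beats the k-th score: replace bottom
                heapq.heapreplace(buffer, (score, obj))
            # otherwise: toss the object
        # Threshold = t() of the last (worst) score seen on each list.
        threshold = t([lists[j][depth][1] for j in range(m)])
        if len(buffer) == k and threshold <= buffer[0][0]:
            return sorted(buffer, reverse=True), depth + 1, threshold
    return sorted(buffer, reverse=True), len(lists[0]), threshold


result, depth, threshold = threshold_algorithm(LISTS, k=4)
print(depth, threshold, result)
# 5 2.9 [(3.4, 'C'), (3.3, 'H'), (3.15, 'G'), (3.05, 'E')]
```

Only objects currently in the buffer are remembered, so memory stays bounded by K; the price is that a tossed object is re-scored (and tossed again) if it shows up in another list. Here E rather than B ends up in fourth place; both score 3.05, and ties may be broken either way.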

19-26 TA Example
Same lists L1..L4 as in the FA example; K = 4. Scores marked X on the slides have been tossed from the buffer.

Depth  Newly scored                      Threshold T  Buffer after this depth
1      H 3.30, J 2.65, C 3.40, E 3.05    3.90         C 3.40, H 3.30, E 3.05, J 2.65
2      B 3.05, G 3.15                    3.60         C 3.40, H 3.30, G 3.15, one of B/E (3.05); J 2.65 tossed
3      D 2.55 (tossed)                   3.30         unchanged
4      (none)                            3.10         unchanged
5      (none)                            2.90         unchanged

T = 2.90 <= K-th score in buffer (3.05) ==> can stop!

27 TA Remarks
What properties do we require of t() for TA to be correct?
How large does the buffer ever get with TA? What happened with FA?
Performance guarantee of TA (instance optimality): 𝒟 = class of DBs; 𝒜 = class of algorithms. A ∈ 𝒜 is instance optimal provided ∀B ∈ 𝒜, ∀D ∈ 𝒟: cost(A, D) <= c·cost(B, D) + c’, for some fixed constants c, c’. c = optimality ratio.
TA is instance optimal over algorithms that do not make wild guesses.

28 No Random Access Algorithm
What if cost(RA) > cost(SA), or RA isn’t allowed at all?
Do SA on all lists in parallel. At depth d:
Maintain the worst (lowest) score seen so far in each list: b1, …, bm.
For any object x seen in lists {1, …, i}, with known scores x1, …, xi:
  Best(x) = t(x1, …, xi, b(i+1), …, bm).
  Worst(x) = t(x1, …, xi, 0, …, 0).
TopK contains the K objects with max Worst scores at depth d. Break ties using Best.
M = K-th Worst score in TopK. An object y is viable if Best(y) > M.
Stop when TopK contains >= K distinct objects and no object outside TopK is viable. Return TopK.
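A Python sketch of the bookkeeping described above, run on the same four lists. Assumptions: the list layout is reconstructed from the example slides, t() is a rounded sum, scores are non-negative (so 0 is a valid lower bound), and unseen objects are treated as non-viable once t(b1, …, bm) <= M. For brevity the bounds are recomputed from scratch at every depth; a real implementation would update them incrementally. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def ranked(spec: str) -> Ranked:
    """'H95 B90' -> [('H', 0.95), ('B', 0.9)] (scores given in hundredths)."""
    return [(tok[0], int(tok[1:]) / 100) for tok in spec.split()]


# Assumed layout of the example lists L1..L4, each in descending score order.
LISTS: List[Ranked] = [
    ranked("H95 B90 E85 C80 G75 I70 D65 A60 J55 F50"),
    ranked("J100 C95 G85 H80 E75 B75 F60 A50 D40 I30"),
    ranked("C95 J80 D70 H65 G60 B55 I50 E45 F40 A30"),
    ranked("E100 G95 H90 B85 D80 C70 A65 I55 F45 J30"),
]


def total(scores: List[float]) -> float:
    """Assumed combining function t(): the sum, rounded to 2 decimals."""
    return round(sum(scores), 2)


def bounds(lists: List[Ranked], depth: int, t=total):
    """Worst and Best for every object seen in the first `depth` rows."""
    m = len(lists)
    bottom = [lists[j][depth - 1][1] for j in range(m)]  # lowest score seen per list
    known: Dict[str, Dict[int, float]] = {}
    for j in range(m):
        for obj, score in lists[j][:depth]:
            known.setdefault(obj, {})[j] = score
    worst = {o: t([s.get(j, 0.0) for j in range(m)]) for o, s in known.items()}
    best = {o: t([s.get(j, bottom[j]) for j in range(m)]) for o, s in known.items()}
    return worst, best, t(bottom)  # t(bottom) bounds any unseen object


def nra(lists: List[Ranked], k: int, t=total):
    top_k: List[str] = []
    for depth in range(1, len(lists[0]) + 1):
        worst, best, unseen_best = bounds(lists, depth, t)
        # Rank by Worst, break ties by Best, then by name.
        order = sorted(worst, key=lambda o: (-worst[o], -best[o], o))
        top_k, rest = order[:k], order[k:]
        if len(top_k) < k:
            continue
        m_k = worst[top_k[-1]]
        if unseen_best <= m_k and all(best[o] <= m_k for o in rest):
            return top_k, depth
    return top_k, len(lists[0])


worst2, best2, _ = bounds(LISTS, depth=2)
print(worst2["C"], best2["C"])  # 1.9 3.75
print(nra(LISTS, k=4))          # (['C', 'H', 'G', 'B'], 8)
```

Under these assumptions NRA has to read to depth 8 before E and J stop being viable, compared with depth 5 for TA on the same input; that is the price of avoiding random accesses.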

29-30 NRA Example
Same lists L1..L4 as in the FA example. [Worst, Best] for each object seen so far:

Depth 1: C [0.95, 3.90], E [1.00, 3.90], H [0.95, 3.90], J [1.00, 3.90]
Depth 2: B [0.90, 3.60], C [1.90, 3.75], E [1.00, 3.65], G [0.95, 3.60], H [0.95, 3.65], J [1.80, 3.65]

31 NRA Features
What sort of t() do we need to assume for NRA to work correctly? How large can the buffers get? How does the amount of bookkeeping compare with TA? NRA is instance optimal over algorithms that make no RAs.

32 Combined optimization
What if we are told that cost(RA) is a given multiple of cost(SA)? Can we find algorithms better than NRA and TA in this case? Combined algorithm = CA. (See Fagin et al.’s paper for details.)

33 Worrying about I/O cost
Based on Bast et al. VLDB 2006. Inverted lists of (itemID, score) entries in desc. score order, as usual, but on disk. Entries within each block are sorted by itemID; across blocks, still in desc. score order. => Inverted Block Index (IBI) Algorithm. What is an IBI?

34-37 A Motivating Example
List 1: Doc17, Doc78, ...
List 2: Doc25, Doc38, Doc14, Doc83, ..., Doc17, ...
List 3: Doc83 : 0.9, Doc17 : 0.7, Doc61 : 0.3, ...

Round 1 (SA on 1,2,3): Doc17 : [0.8 , 2.4]; Doc25 : [0.7 , 2.4]; Doc83 : [0.9 , 2.4]; unseen: ≤ 2.4
Round 2 (SA on 1,2,3): Doc17 : [1.5 , 2.0]; Doc25 : [0.7 , 1.6]; Doc83 : [0.9 , 1.6]; unseen: ≤ 1.4
Round 3 (SA on 2,2,3!): Doc17 : [1.5 , 2.0]; Doc83 : [1.4 , 1.6]; unseen: ≤ 1.0. Note the deviation from round-robin.
Round 4 (RA for Doc17): Doc17 : 1.7; all others < 1.7; done!

38 IBI Algorithm
Same setting as NRA/CA, except use IBI.
Maintain two lists: Top-K items (T = d1, …, dk) and StillHaveAShot (SHASH) items (S = dk+1, …, dk+q).
pos_i = current cursor position on list Li.
high_i = score in Li at the current cursor position (upper-bounds the score of unseen items).
For items d in S: E(d) = attributes whose scores are known; E~(d) = attributes whose scores are unknown.
Worst(d) = total score from E(d).
Best(d) = Worst(d) + ∑ {high_i | i ∈ E~(d)}. (Exactly as in Fagin.)

39 IBI Algorithm (contd.)
In each round, compute:
min-k = min{Worst(d) | d ∈ T}.
bestscore that any unseen doc can have = sum of all high_i’s.
For dj ∈ S: def_j = min-k - Worst(dj). [Denotes the deficit below the qualification level for top-k.]
T is sorted in desc. Worst(); S is sorted in desc. Best(). [Sorting on (score, itemID) for fast processing.]
Invariant: min-k >= max{Worst(d) | d ∈ S}.
Termination: when min-k >= max{Best(d) | d ∈ S}. Can remove an object from S whenever its Best <= min-k. => Stop when S = {}.
Early termination AND minimal bookkeeping are BOTH important for performance.
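The per-round bookkeeping can be sketched as a single Python function. This is an illustration under assumptions, not the authors’ implementation: scores are combined by summation, K = 1 as in the motivating example, and the individual scores and high_i values fed in below are inferred from the [Worst, Best] bounds shown in that example rather than stated on the slides. The name `ibi_round` and its return shape are hypothetical.

```python
from typing import Dict, List, Tuple


def ibi_round(
    known: Dict[str, Dict[int, float]],  # item -> {list index in E(d): score}
    high: List[float],                   # high_i for each list
    k: int,
) -> Tuple[List[str], List[str], float, Dict[str, float], bool]:
    """Recompute T, S, min-k and the deficits from the current state."""
    worst = {d: round(sum(sc.values()), 2) for d, sc in known.items()}
    best = {
        d: round(worst[d] + sum(h for i, h in enumerate(high) if i not in known[d]), 2)
        for d in known
    }
    by_worst = sorted(known, key=lambda d: (-worst[d], d))
    top = by_worst[:k]                                   # T, in desc. Worst order
    min_k = worst[top[-1]] if len(top) == k else 0.0
    # S keeps only items whose Best still exceeds min-k, in desc. Best order.
    still = sorted((d for d in by_worst[k:] if best[d] > min_k),
                   key=lambda d: (-best[d], d))
    deficit = {d: round(min_k - worst[d], 2) for d in still}
    unseen_best = round(sum(high), 2)                    # bound for unseen docs
    done = len(top) == k and not still and unseen_best <= min_k
    return top, still, min_k, deficit, done


# State after Round 2 of the motivating example (scores inferred from the bounds).
after_round2 = {
    "Doc17": {0: 0.8, 2: 0.7},
    "Doc25": {1: 0.7},
    "Doc83": {2: 0.9},
    "Doc78": {0: 0.2},
    "Doc38": {1: 0.5},
}
print(ibi_round(after_round2, high=[0.2, 0.5, 0.7], k=1))
# (['Doc17'], ['Doc25', 'Doc83'], 1.5, {'Doc25': 0.8, 'Doc83': 0.6}, False)

# State after the RA for Doc17 in Round 4 (again with inferred scores).
after_round4 = {"Doc17": {0: 0.8, 1: 0.2, 2: 0.7}, "Doc83": {1: 0.5, 2: 0.9}}
print(ibi_round(after_round4, high=[0.2, 0.5, 0.3], k=1))
# (['Doc17'], [], 1.7, {}, True)
```

Doc78 and Doc38 never enter S after Round 2 because their Best (1.4) is already below min-k (1.5), which illustrates how pruning keeps the bookkeeping small.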

40 More on IBI Framework
Instead of scheduling SAs round-robin (RR), use a differential approach for the different lists, based on expected score reductions at future cursor positions (Knapsack). Do SA*RA*, i.e., a run of SAs followed by a run of RAs. Order the RAs by estimated Prob[dj can get into top-k answers].

