NRA: Top-k query processing using No Random Access (only sequential access)


1 NRA: Top-k query processing using No Random Access (only sequential access)

Algorithm:
1) scan index lists in parallel;
2) consider d_j at position pos_i in L_i;
3) E(d_j) := E(d_j) ∪ {i}; high_i := s_i(q,d_j);
4) bestscore(d_j) := aggr{x_1, ..., x_m} with x_i := s_i(q,d_j) for i ∈ E(d_j), high_i for i ∉ E(d_j);
5) worstscore(d_j) := aggr{x_1, ..., x_m} with x_i := s_i(q,d_j) for i ∈ E(d_j), 0 for i ∉ E(d_j);
6) top-k := k docs with largest worstscore;
7) threshold := max bestscore{d | d not in top-k};
8) if min worstscore in top-k ≥ threshold then exit;
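The eight steps above can be sketched in Python as follows. This is a minimal illustration, not the presenters' implementation: it assumes aggr is a plain sum, reads one entry per list per round, and treats entirely unseen documents as having a bestscore equal to the sum of the current high_i values. The names `nra_top_k`, `IndexList` and `LISTS` are hypothetical; the data is the three-list example used on the following slides.

```python
from typing import Dict, List, Tuple

IndexList = List[Tuple[str, float]]  # (doc, score) pairs, sorted by descending score


def nra_top_k(lists: List[IndexList], k: int):
    """Sketch of NRA with aggr = sum. Returns (top-k as (doc, worst, best), scan depth)."""
    m = len(lists)
    seen: Dict[str, Dict[int, float]] = {}  # doc -> {i: s_i(q, doc)}; the keys form E(doc)
    high = [0.0] * m                         # high_i: last score read from list i
    top: List[str] = []
    worst: Dict[str, float] = {}
    best: Dict[str, float] = {}
    depth = 0

    for pos in range(max(len(lst) for lst in lists)):
        depth = pos + 1
        # Steps 1-3: one sorted access per list, update E(doc) and high_i.
        for i, lst in enumerate(lists):
            if pos < len(lst):
                doc, score = lst[pos]
                seen.setdefault(doc, {})[i] = score
                high[i] = score
            else:
                high[i] = 0.0  # exhausted list: unseen docs contribute 0 here

        # Steps 4-5: worstscore uses 0, bestscore uses high_i for the missing lists.
        worst = {d: sum(s.values()) for d, s in seen.items()}
        best = {d: worst[d] + sum(high[i] for i in range(m) if i not in s)
                for d, s in seen.items()}

        # Step 6: current top-k by worstscore.
        ranked = sorted(seen, key=lambda d: worst[d], reverse=True)
        top, rest = ranked[:k], ranked[k:]

        # Step 7: best possible score of anything outside the top-k
        # (remaining candidates and, as an assumption, unseen documents).
        threshold = max([best[d] for d in rest] + [sum(high)])

        # Step 8: stop once nothing outside the top-k can overtake it.
        if len(top) == k and min(worst[d] for d in top) >= threshold:
            break

    # If the lists run out first, the current top-k is returned as is.
    return [(d, worst[d], best[d]) for d in top], depth


# Example index lists from the following slides.
LISTS = [
    [("item 25", 0.6), ("item 78", 0.5), ("item 83", 0.4), ("item 17", 0.3),
     ("item 21", 0.2), ("item 91", 0.1), ("item 44", 0.1)],
    [("item 17", 0.6), ("item 38", 0.6), ("item 14", 0.6), ("item 5", 0.6),
     ("item 83", 0.5), ("item 21", 0.3)],
    [("item 83", 0.9), ("item 17", 0.7), ("item 61", 0.3), ("item 81", 0.2),
     ("item 65", 0.1), ("item 10", 0.1)],
]

top2, depth = nra_top_k(LISTS, k=2)
```

On this data the sketch stops after reading five entries per list, with item 83 (about 1.8) and item 17 (about 1.6) as the top-2. Its threshold can be larger than the "max of unseen tuples" value shown on the slides, because step 7 also counts candidates that are already in the queue.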

2 NRA: worked example (top-2, scores summed), state after scan depth 1

Index lists (sorted by descending score):

List 1         List 2         List 3
item 25  0.6   item 17  0.6   item 83  0.9
item 78  0.5   item 38  0.6   item 17  0.7
item 83  0.4   item 14  0.6   item 61  0.3
item 17  0.3   item 5   0.6   item 81  0.2
item 21  0.2   item 83  0.5   item 65  0.1
item 91  0.1   item 21  0.3   item 10  0.1
item 44  0.1

Candidates [worst score, best score]:
item 83 [0.9, 2.1]
item 17 [0.6, 2.1]
item 25 [0.6, 2.1]

Min top-2 score: 0.6
Threshold (max of unseen tuples): 0.6 + 0.6 + 0.9 = 2.1
Pruning candidates: min top-2 < best score of candidate
Stopping condition: threshold < min top-2?

3 NRA: worked example, state after scan depth 2 (index lists as on slide 2)

Candidates [worst score, best score]:
item 17 [1.3, 1.8]
item 83 [0.9, 2.0]
item 25 [0.6, 1.9]
item 38 [0.6, 1.8]
item 78 [0.5, 1.8]

Min top-2 score: 0.9
Threshold (max of unseen tuples): 1.8
Pruning candidates: min top-2 < best score of candidate
Stopping condition: threshold < min top-2?
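The intervals on this slide can be checked with a few lines of Python. The sketch below assumes scores are summed and that every list has been read to the same depth; `bounds_at_depth` and `LISTS` are hypothetical names, and results are rounded to one decimal because the example scores have one decimal.

```python
from typing import Dict, List, Tuple

# Example index lists from the slides: (item, score), sorted by descending score.
LISTS: List[List[Tuple[str, float]]] = [
    [("item 25", 0.6), ("item 78", 0.5), ("item 83", 0.4), ("item 17", 0.3),
     ("item 21", 0.2), ("item 91", 0.1), ("item 44", 0.1)],
    [("item 17", 0.6), ("item 38", 0.6), ("item 14", 0.6), ("item 5", 0.6),
     ("item 83", 0.5), ("item 21", 0.3)],
    [("item 83", 0.9), ("item 17", 0.7), ("item 61", 0.3), ("item 81", 0.2),
     ("item 65", 0.1), ("item 10", 0.1)],
]


def bounds_at_depth(lists, depth: int):
    """Return ({item: (worstscore, bestscore)}, unseen threshold) after `depth` >= 1 rows."""
    seen: Dict[str, Dict[int, float]] = {}  # item -> {list index: score seen there}
    high: List[float] = []                   # last score read from each list
    for i, lst in enumerate(lists):
        prefix = lst[:depth]
        for item, score in prefix:
            seen.setdefault(item, {})[i] = score
        # A list shorter than `depth` is exhausted, so it can add nothing more.
        high.append(prefix[-1][1] if prefix and len(lst) >= depth else 0.0)

    bounds = {}
    for item, scores in seen.items():
        worst = sum(scores.values())  # missing lists count as 0
        best = worst + sum(h for i, h in enumerate(high) if i not in scores)
        bounds[item] = (round(worst, 1), round(best, 1))
    # Best score an entirely unseen item could still reach.
    return bounds, round(sum(high), 1)


bounds2, threshold2 = bounds_at_depth(LISTS, depth=2)
```

With depth 2 this gives [1.3, 1.8] for item 17, [0.9, 2.0] for item 83, and a threshold of 1.8 for unseen items, in line with the values on the slide.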

4 NRA: worked example, state after scan depth 3 (index lists as on slide 2)

Candidates [worst score, best score]:
item 83 [1.3, 1.9]
item 17 [1.3, 1.9]
item 25 [0.6, 1.5]
item 78 [0.5, 1.4]

Min top-2 score: 1.3
Threshold (max of unseen tuples): 1.3
Pruning candidates: min top-2 < best score of candidate
Stopping condition: threshold < min top-2?
No more new items can get into the top-2, but extra candidates are left in the queue.

5 NRA: worked example, state after scan depth 4 (index lists as on slide 2)

Candidates [worst score, best score]:
item 17 1.6
item 83 [1.3, 1.9]
item 25 [0.6, 1.4]

Min top-2 score: 1.3
Threshold (max of unseen tuples): 1.1
Pruning candidates: min top-2 < best score of candidate
Stopping condition: threshold < min top-2?
No more new items can get into the top-2, but extra candidates are left in the queue.

6 NRA: worked example, state after scan depth 5 (index lists as on slide 2)

Candidates (final scores):
item 83 1.8
item 17 1.6

Min top-2 score: 1.6
Threshold (max of unseen tuples): 0.8
Pruning candidates: min top-2 < best score of candidate

7 NRA: access cost and alternatives
- NRA performs only sorted accesses (SA), i.e. No Random Access
- Random access (RA)
  - looks up the actual (final) score of an item
  - costlier than SA (100 to 100,000 times), cR/cS := (cost of RA)/(cost of SA)
  - often very useful
- CA (Combined Algorithm) (Fagin et al., 2001)
  - one RA after every cR/cS SAs
  - total cost of SA ~ total cost of RA
- Measure of effectiveness (access cost): #SA + cR/cS × #RA
- Full-merge: compute scores for all items, followed by a partial sort
  - simple and efficient
  - important baseline for any top-k algorithm
- Problems with NRA, CA
  - high bookkeeping overhead
  - for high values of k, even the gain in access cost is not significant
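The access-cost measure and the full-merge baseline from this slide can be illustrated with a short Python sketch. It assumes scores are summed; `access_cost`, `full_merge_top_k` and `LISTS` are hypothetical names, the ratio cR/cS = 1000 is an arbitrary value inside the range quoted above, and the access counts are illustrative inputs only. The data is the three-list example from the earlier slides.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple


def access_cost(num_sa: int, num_ra: int, cr_over_cs: float) -> float:
    """Access cost as defined on the slide: #SA + cR/cS * #RA."""
    return num_sa + cr_over_cs * num_ra


def full_merge_top_k(lists: List[List[Tuple[str, float]]], k: int):
    """Full-merge baseline: score every item, then partially sort for the top k."""
    totals: Dict[str, float] = defaultdict(float)
    for lst in lists:
        for item, score in lst:      # reads every entry of every list
            totals[item] += score
    # heapq.nlargest performs the partial sort.
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])


# Example index lists from the earlier slides.
LISTS = [
    [("item 25", 0.6), ("item 78", 0.5), ("item 83", 0.4), ("item 17", 0.3),
     ("item 21", 0.2), ("item 91", 0.1), ("item 44", 0.1)],
    [("item 17", 0.6), ("item 38", 0.6), ("item 14", 0.6), ("item 5", 0.6),
     ("item 83", 0.5), ("item 21", 0.3)],
    [("item 83", 0.9), ("item 17", 0.7), ("item 61", 0.3), ("item 81", 0.2),
     ("item 65", 0.1), ("item 10", 0.1)],
]

baseline = full_merge_top_k(LISTS, k=2)
full_merge_sa = sum(len(lst) for lst in LISTS)   # every entry is one sorted access

# Illustrative counts: sorted accesses only, versus a CA-style run that
# spends one RA per cR/cS sorted accesses.
cost_sa_only = access_cost(num_sa=15, num_ra=0, cr_over_cs=1000)
cost_balanced = access_cost(num_sa=1000, num_ra=1, cr_over_cs=1000)
```

In the second cost example the sorted accesses and the single random access each contribute 1000, which is the balance CA aims for when it issues one RA after every cR/cS SAs.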

8 References
- IO-Top-k: Index-access Optimized Top-k Query Processing. Debapriyo Majumdar, Max-Planck-Institut für Informatik, Saarbrücken, Germany. Joint work with Holger Bast, Ralf Schenkel, Martin Theobald, Gerhard Weikum.
- Top-k Query Evaluation with Probabilistic Guarantees. Martin Theobald, Gerhard Weikum, Ralf Schenkel.

