NRA: top-k query processing with No Random Access (only sequential access)

Algorithm:
1) scan index lists in parallel;
2) consider d_j at position pos_i in L_i;
3) E(d_j) := E(d_j) ∪ {i}; high_i := s_i(q, d_j);
4) bestscore(d_j) := aggr{x_1, ..., x_m} with x_i := s_i(q, d_j) for i ∈ E(d_j), high_i for i ∉ E(d_j);
5) worstscore(d_j) := aggr{x_1, ..., x_m} with x_i := s_i(q, d_j) for i ∈ E(d_j), 0 for i ∉ E(d_j);
6) top-k := k docs with largest worstscore;
7) threshold := max{bestscore(d) | d not in top-k};
8) if min worstscore among top-k ≥ threshold then exit;
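The eight steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that aggr is a plain sum, that each index list is an in-memory list of (doc, score) pairs sorted by descending score, and that "in parallel" means one entry per list per round. The name nra_top_k and the data layout are hypothetical.

```python
def nra_top_k(lists, k):
    """Return the top-k docs as (doc, worstscore, bestscore) tuples.

    Assumptions (not from the slides): aggr = sum, and `lists` holds m lists
    of (doc, score) pairs, each sorted by descending score.
    """
    m = len(lists)
    seen = {}                      # doc -> {list index i: s_i(q, doc)}, i.e. E(doc)
    high = [0.0] * m               # high_i: last score read from list i
    worst, best, top = {}, {}, []
    depth = max((len(lst) for lst in lists), default=0)

    for pos in range(depth):
        # Steps 1-3: one sorted access per list, update E(d) and high_i.
        for i, lst in enumerate(lists):
            if pos < len(lst):
                doc, score = lst[pos]
                seen.setdefault(doc, {})[i] = score
                high[i] = score
            else:
                high[i] = 0.0      # list exhausted: nothing left to read

        # Steps 4-5: score bounds for every doc seen so far.
        worst = {d: sum(s.values()) for d, s in seen.items()}
        best = {
            d: worst[d] + sum(high[i] for i in range(m) if i not in s)
            for d, s in seen.items()
        }

        # Step 6: current top-k by worstscore.
        ranked = sorted(seen, key=lambda d: worst[d], reverse=True)
        top = ranked[:k]

        # Steps 7-8: stop once no other doc (seen or unseen) can overtake.
        if len(top) == k:
            min_k = worst[top[-1]]
            threshold = max([sum(high)] + [best[d] for d in ranked[k:]])
            if min_k >= threshold:
                break

    return [(d, worst[d], best[d]) for d in top]


if __name__ == "__main__":
    example = [
        [("a", 0.9), ("b", 0.8), ("c", 0.1)],
        [("b", 0.7), ("a", 0.6), ("c", 0.2)],
        [("a", 0.5), ("c", 0.4), ("b", 0.3)],
    ]
    print(nra_top_k(example, 2))
```

The sketch recomputes all bounds after every round for clarity. A practical implementation would maintain them incrementally, which is the bookkeeping cost of NRA.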
NRA example (top-2 over List 1, List 2, List 3); candidates shown as item [worst-score, best-score]
Candidates: item 83 [0.9, 2.1]; item 17 [0.6, 2.1]; item 25 [0.6, 2.1]
Min top-2 score: 0.6
Threshold (max of unseen tuples): 2.1
Pruning candidates: a candidate stays only while min top-2 < best-score of candidate
Stopping condition: threshold < min top-2?
NRA example, next step (top-2 over List 1, List 2, List 3); candidates shown as item [worst-score, best-score]
Candidates: item 17 [1.3, 1.8]; item 83 [0.9, 2.0]; item 25 [0.6, 1.9]; item 38 [0.6, 1.8]; item 78 [0.5, 1.8]
Min top-2 score: 0.9
Threshold (max of unseen tuples): 1.8
Pruning candidates: a candidate stays only while min top-2 < best-score of candidate
Stopping condition: threshold < min top-2?
NRA example, next step (top-2 over List 1, List 2, List 3); candidates shown as item [worst-score, best-score]
Candidates: item 83 [1.3, 1.9]; item 17 [1.3, 1.9]; item 25 [0.6, 1.5]; item 78 [0.5, 1.4]
Min top-2 score: 1.3
Threshold (max of unseen tuples): 1.3
Pruning candidates: a candidate stays only while min top-2 < best-score of candidate
Stopping condition: threshold < min top-2?
No new items can get into the top-2 any more, but extra candidates are left in the queue.
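The pruning and stopping checks can be replayed on the candidate intervals listed for this step. The following is a small sketch under stated assumptions: the helper name nra_state is hypothetical, and the comparison uses "threshold ≤ min top-k" so that it agrees with step 8 of the algorithm (min worstscore ≥ threshold).

```python
def nra_state(candidates, unseen_threshold, k):
    """Evaluate pruning and stopping for one NRA step.

    candidates: dict mapping item -> (worstscore, bestscore).
    unseen_threshold: best possible score of any item not yet seen.
    Returns (top_k, min_top_k, alive, no_new_items, can_stop).
    """
    # Rank by worstscore; Python's sort is stable, so ties keep input order.
    ranked = sorted(candidates, key=lambda d: candidates[d][0], reverse=True)
    top_k = ranked[:k]
    min_top_k = candidates[top_k[-1]][0]

    # A non-top-k candidate survives only if its bestscore can still exceed
    # the weakest top-k worstscore.
    alive = [d for d in ranked[k:] if candidates[d][1] > min_top_k]

    # Unseen items are ruled out once their bound cannot beat min top-k.
    no_new_items = unseen_threshold <= min_top_k

    # Scanning can stop only when unseen items and queued candidates are both out.
    can_stop = no_new_items and not alive
    return top_k, min_top_k, alive, no_new_items, can_stop


if __name__ == "__main__":
    step = {83: (1.3, 1.9), 17: (1.3, 1.9), 25: (0.6, 1.5), 78: (0.5, 1.4)}
    print(nra_state(step, unseen_threshold=1.3, k=2))
```

On these values the check reports that no unseen item can enter the top-2, while items 25 and 78 remain in the queue, so scanning has to continue.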
NRA example, next step (top-2 over List 1, List 2, List 3); candidates shown as item [worst-score, best-score]
Candidates: item 83 [1.3, 1.9]; item 25 [0.6, 1.4]
Min top-2 score: 1.3
Threshold (max of unseen tuples): 1.1
Pruning candidates: a candidate stays only while min top-2 < best-score of candidate
Stopping condition: threshold < min top-2?
No new items can get into the top-2 any more, but extra candidates are left in the queue.
NRA example, final step (top-2 over List 1, List 2, List 3)
Min top-2 score: 1.6
Threshold (max of unseen tuples): 0.8
Pruning candidates: a candidate stays only while min top-2 < best-score of candidate
NRA performs only sorted accesses (SA), no random access (RA).

Random access (RA)
    looks up the actual (final) score of an item
    costlier than SA (100 – 100,000 times); cR/cS := (cost of RA)/(cost of SA)
    often very useful

CA (Combined Algorithm) (Fagin et al., 2001)
    one RA after every cR/cS SAs
    total cost of SA ~ total cost of RA

Measure of effectiveness (access cost): #SA + cR/cS × #RA

Full-merge
    compute scores for all items, followed by a partial sort
    simple and efficient
    important baseline for any top-k algorithm

Problems with NRA, CA
    high bookkeeping overhead
    for "high" values of k, even the gain in access cost is not significant
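The access-cost measure and the full-merge baseline can be illustrated with a short sketch. It assumes sum aggregation and in-memory lists of (doc, score) pairs; the function names access_cost and full_merge_top_k are hypothetical and not taken from the cited work.

```python
import heapq
from collections import defaultdict


def access_cost(num_sa, num_ra, cr_over_cs):
    """Access cost in units of one sorted access: #SA + cR/cS * #RA."""
    return num_sa + cr_over_cs * num_ra


def full_merge_top_k(lists, k):
    """Full-merge baseline: score every item, then partially sort.

    Assumes aggr = sum. Returns (top-k as (doc, score) pairs, number of SAs).
    """
    totals = defaultdict(float)
    num_sa = 0
    for lst in lists:
        for doc, score in lst:     # every entry is read once, sequentially
            totals[doc] += score
            num_sa += 1
    # heapq.nlargest performs the partial sort: only the k best are ordered.
    top = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
    return top, num_sa


if __name__ == "__main__":
    # CA schedule: one RA after every cR/cS SAs, so both cost shares match.
    cr_over_cs = 100
    num_sa = 1000
    num_ra = num_sa // cr_over_cs
    print(access_cost(num_sa, num_ra, cr_over_cs))   # SA share 1000 + RA share 1000
```

With cR/cS = 100, the CA schedule issues 10 RAs for 1000 SAs, so the RA share of the cost equals the SA share, which is the balance stated above. Full-merge uses no RA at all, so its cost is simply the total length of the lists.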
References
IO-Top-k: Index-access Optimized Top-k Query Processing. Debapriyo Majumdar, Max-Planck-Institut für Informatik, Saarbrücken, Germany. Joint work with Holger Bast, Ralf Schenkel, Martin Theobald, Gerhard Weikum.
Top-k Query Evaluation with Probabilistic Guarantees. Martin Theobald, Gerhard Weikum, Ralf Schenkel.