Published by Crystal Tankard. Modified about 1 year ago.

1 Best-Effort Top-k Query Processing Under Budgetary Constraints
Michal Shmueli-Scheuer (IBM Haifa Research Lab and UCI), Yosi Mass, Haggai Roitman, Chen Li, Ralf Schenkel, Gerhard Weikum

2 Motivating Example
[Diagram labels: queries, Engine, Top-k results.]
Top-k applications:
– Mobile applications: highly impatient users who need fast results.
– Mediation systems: must achieve high query throughput.
– Online analytics (e.g. logs): must achieve high query throughput.

3 Traditional top-k query
– Pre-computed lists over multiple attributes.
– Scores are combined by some monotonic aggregation function.
– Two access modes: sorted access (cost Cs) and random access (cost Cr).
– Objective: compute the k objects with the highest scores.
Example: m sorted lists over n objects
R1: a 0.9, b 0.6, c 0.5, ….., d 0.4
R2: d 0.87, a 0.85, f 0.5, ….., c 0.2
Rm: c 0.9, b 0.6, g 0.5, ….., a 0.4

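4 NRA algorithm (Fagin et al.)
f = SUM
R1: a 0.9, b 0.6, c 0.5, ….., d 0.4
R2: d 0.87, a 0.85, f 0.5, …..., c 0.2
Each seen object carries an interval [worst score, best score]. high_i is the last score seen in list i, and mink is the lowest worst score in the current top-k.
Top-2: a [0.9, 1.77], d [0.87, 1.77]
Candidates: (none)
Stop when mink > best-score of candidates.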

5 NRA algorithm (Fagin et al.)
R1: a 0.9, b 0.6, c 0.5, ….., d 0.4
R2: d 0.87, a 0.85, f 0.25, …..., c 0.2
Top-2: a [1.75, 1.75], d [0.87, 1.47]
Candidates: b [0.6, 1.45]
Stop when mink > best-score of candidates.

6 NRA algorithm (Fagin et al.)
R1: a 0.9, b 0.6, c 0.5, ….., d 0.4
R2: d 0.87, a 0.85, f 0.25, …..., c 0.2
Top-2: a [1.75, 1.75], d [0.87, 1.37]
Candidates: b [0.6, 0.85], c [0.5, 0.75], f [0.25, 0.75]
Stop when mink > best-score of candidates.
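
The three NRA snapshots above can be traced with a short sketch. This is a minimal Python illustration under stated assumptions, not the authors' implementation: it assumes round-robin sorted access, f = SUM, and lists cut down to the entries visible on the slides (the elided middle entries are omitted, and f 0.25 is taken from slides 5 and 6). The names `nra`, `high` and `bounds` are illustrative.

```python
from typing import Dict, List, Tuple

ScoredList = List[Tuple[str, float]]


def nra(lists: List[ScoredList], k: int):
    """Round-robin sorted access with [worst, best] bounds; f = SUM (assumed)."""
    m = len(lists)
    seen: Dict[str, Dict[int, float]] = {}  # object -> {list index: score}
    high = [lst[0][1] for lst in lists]     # last score seen in each list
    depth = 0
    max_depth = max(len(lst) for lst in lists)
    topk: List[str] = []
    bounds: Dict[str, Tuple[float, float]] = {}

    while depth < max_depth:
        # One sorted access on every list that still has entries.
        for i, lst in enumerate(lists):
            if depth < len(lst):
                obj, score = lst[depth]
                seen.setdefault(obj, {})[i] = score
                high[i] = score
            else:
                high[i] = 0.0  # exhausted list: nothing more to gain
        depth += 1

        # worst = sum of known scores; best = worst + high_i of unseen lists.
        bounds = {}
        for obj, known in seen.items():
            worst = sum(known.values())
            best = worst + sum(high[i] for i in range(m) if i not in known)
            bounds[obj] = (worst, best)

        ranked = sorted(bounds, key=lambda o: bounds[o][0], reverse=True)
        topk, candidates = ranked[:k], ranked[k:]
        if len(topk) < k:
            continue

        mink = bounds[topk[-1]][0]
        # Best score any candidate, or any still-unseen object, could reach.
        threshold = max([bounds[o][1] for o in candidates] + [sum(high)])
        if mink > threshold:
            break  # stopping condition from the slides

    return topk, bounds, depth


# Lists truncated to the entries visible on the slides (assumption).
R1 = [("a", 0.9), ("b", 0.6), ("c", 0.5), ("d", 0.4)]
R2 = [("d", 0.87), ("a", 0.85), ("f", 0.25), ("c", 0.2)]
top2, intervals, rounds = nra([R1, R2], k=2)
print(top2, rounds)
```

On these lists the sketch stops after three rounds with a and d in the top-2, and the intervals after each round agree with the ones shown on slides 4 to 6.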

7 Top-k with Budget Constraints
Access costs: sorted access cost Cs, random access cost Cr. Here Cs = 1, Cr = 3, f = SUM.
R1: s 0.95, u 0.93, t 0.92, d 0.9, x 0.5, y 0.4, z 0.2, …
R2: a 1.0, b 0.9, c 0.85, d 0.8, e 0.7, t 0.6, f 0.4, ..
Top-2: d 1.7, t 1.52
Budget = 10?
– NRA: 12 Cs = 12, precision = 0.5
– TA: 7 Cs + 7 Cr = 28, precision = 0
Given budget B, maximize result quality.
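
The cost figures on this slide are plain arithmetic over access counts, and the precision figure can be reproduced with a small sketch. The code below is an illustration under assumptions: precision is taken to be the fraction of the exact top-k that is returned, and the budget-limited run is a simple round-robin over sorted accesses that ranks objects by partial SUM score. One plausible reading of the slide is that 12 and 28 are the costs NRA and TA need to finish, while the precisions are what each delivers within B = 10. All function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

ScoredList = List[Tuple[str, float]]


def access_cost(n_sorted: int, n_random: int, cs: float, cr: float) -> float:
    """Total cost of a schedule with the given access counts."""
    return n_sorted * cs + n_random * cr


def precision(result: Set[str], exact: Set[str]) -> float:
    """Fraction of the exact top-k contained in the result (assumed definition)."""
    return len(result & exact) / len(exact)


def sorted_only_under_budget(lists: List[ScoredList], k: int,
                             budget: float, cs: float):
    """Round-robin sorted accesses until the budget runs out, then return
    the k objects with the highest partial SUM score and the cost spent."""
    partial: Dict[str, float] = {}
    spent, depth = 0.0, 0
    while True:
        progressed = False
        for lst in lists:
            if depth < len(lst) and spent + cs <= budget:
                obj, score = lst[depth]
                partial[obj] = partial.get(obj, 0.0) + score
                spent += cs
                progressed = True
        if not progressed:
            break
        depth += 1
    ranked = sorted(partial, key=lambda o: partial[o], reverse=True)
    return set(ranked[:k]), spent


# Entries visible on the slide; the elided tails are omitted.
R1 = [("s", 0.95), ("u", 0.93), ("t", 0.92), ("d", 0.9),
      ("x", 0.5), ("y", 0.4), ("z", 0.2)]
R2 = [("a", 1.0), ("b", 0.9), ("c", 0.85), ("d", 0.8),
      ("e", 0.7), ("t", 0.6), ("f", 0.4)]

nra_cost = access_cost(12, 0, cs=1, cr=3)  # 12 Cs
ta_cost = access_cost(7, 7, cs=1, cr=3)    # 7 Cs + 7 Cr
result, spent = sorted_only_under_budget([R1, R2], k=2, budget=10, cs=1)
print(nra_cost, ta_cost, sorted(result), precision(result, {"d", "t"}))
```

With B = 10 the sorted-only run reads five entries per list, finds d but not yet both scores of t, and so returns d together with a.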

8 Contributions
Sorted Accesses
– Efficient Plan
– Solution with Adaptive
Sorted and Random Accesses
– Efficient Plan
– Solution with Adaptive
Experiments

9 Results Under Limited Budget
[Figure labels: k results for unlimited budget; results for limited budget.]

10 Efficient Plan - Sorted Accesses
Assume that we know the k results for unlimited budget (R_EXACT).
Interesting positions: the positions where the k objects appear in the lists (P1, P2 in L1; Q1, Q2 in L2).
Top-2: o5, o1
Example plan: {L1,4} {L2,2}
L1: (o1, S_L1), (o5, S_L1), (o8, S_L1), (o6, S_L1)
L2: (o1, S_L2), (o2, S_L2), (o5, S_L2), (o4, S_L2), (o3, S_L2)

11 Efficient Plan - Sorted Accesses
Goal: find a plan t such that: [formula shown on slide]. The result is denoted R_OPT.
Interesting positions: P1, P2 in L1; Q1, Q2 in L2.
L1: (o1, S_L1), (o5, S_L1), (o8, S_L1), (o6, S_L1)
L2: (o1, S_L2), (o2, S_L2), (o5, S_L2), (o4, S_L2), (o3, S_L2)
Plans for B = 5
Plan: {L1,2} {L2,3}
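
The plan search over interesting positions can be sketched as a brute-force enumeration. This is an illustration under several assumptions: the object order in L1 and L2 follows the order recovered from the slide, all scores are made up (the slide gives only symbols such as S_L1), depth 0 (skip the list) is added as an option, Cs = 1, and a plan's result is taken to be the k objects with the highest partial SUM score. The function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

ScoredList = List[Tuple[str, float]]


def interesting_positions(lst: ScoredList, exact: Set[str]) -> List[int]:
    """Depths at which an exact top-k object appears, plus 0 (skip the list)."""
    return [0] + [pos for pos, (obj, _) in enumerate(lst, start=1)
                  if obj in exact]


def run_plan(lists: List[ScoredList], depths: Tuple[int, ...], k: int) -> Set[str]:
    """Read each list to its planned depth; return top-k by partial SUM."""
    partial: Dict[str, float] = {}
    for lst, depth in zip(lists, depths):
        for obj, score in lst[:depth]:
            partial[obj] = partial.get(obj, 0.0) + score
    ranked = sorted(partial, key=lambda o: partial[o], reverse=True)
    return set(ranked[:k])


def best_plans(lists: List[ScoredList], exact: Set[str],
               budget: float, cs: float = 1.0):
    """Enumerate plans over interesting positions that fit the budget and
    return the best precision together with every plan that reaches it."""
    k = len(exact)
    options = [interesting_positions(lst, exact) for lst in lists]
    best_prec, plans = -1.0, []
    for depths in product(*options):
        if sum(depths) * cs > budget:
            continue
        prec = len(run_plan(lists, depths, k) & exact) / k
        if prec > best_prec:
            best_prec, plans = prec, [depths]
        elif prec == best_prec:
            plans.append(depths)
    return best_prec, plans


# Hypothetical scores; only the object order is taken from the slide.
L1 = [("o1", 0.9), ("o5", 0.8), ("o8", 0.3), ("o6", 0.2)]
L2 = [("o1", 0.9), ("o2", 0.5), ("o5", 0.45), ("o4", 0.4), ("o3", 0.1)]
best_precision, plans = best_plans([L1, L2], exact={"o1", "o5"}, budget=5)
print(best_precision, plans)
```

With these made-up scores several plans reach full precision within B = 5, among them (2, 3), which corresponds to the slide's {L1,2} {L2,3}.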

12 Sorted Accesses
Observation: prefer high scores.
[Figure: lists L1, L2, L3; O2 appears in L1, L2 and L3; O1 appears in L1 and L2.]

13 Observations (contd.)
Prefer large score reductions.
Example: title=“war”, description=“weapon”

14 Score Utilities
Score gain: [formula shown on slide]
Score reduction: [formula shown on slide]
Example entries: (o1, 0.6), (o2, 1), (o5, 0.8), (o4, 0.9), (o3, 0.7); y = 3

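15 Optimization Problem
[Formula shown on slide], where m is the number of lists.
Bi-objective optimization problem:
$\mathrm{util}(L_i, x) = \alpha \cdot \mathrm{gain} + (1 - \alpha) \cdot \mathrm{reduction}$, where $\alpha$ denotes the weight between the two objectives.
Heuristics:
– Fair Heuristic
– Rank Heuristic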
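
The weighted utility can be illustrated with a small sketch. The formulas for gain and reduction are not recoverable from this transcript, so the definitions below are stand-ins chosen only for illustration: gain is the mean score of the next x entries of a list, and reduction is the drop from the current high score to the last entry of that window. The weight is called `alpha` here, and all function names are hypothetical.

```python
from typing import List, Sequence


def gain(scores: Sequence[float], pos: int, x: int) -> float:
    """Stand-in gain: mean score of the next x entries (assumption)."""
    window = scores[pos:pos + x]
    return sum(window) / len(window) if window else 0.0


def reduction(scores: Sequence[float], pos: int, x: int) -> float:
    """Stand-in reduction: drop from the current high score to the last
    entry of the next window (assumption)."""
    window = scores[pos:pos + x]
    if not window:
        return 0.0
    current_high = scores[pos - 1] if pos > 0 else scores[0]
    return current_high - window[-1]


def util(scores: Sequence[float], pos: int, x: int, alpha: float) -> float:
    """util = alpha * gain + (1 - alpha) * reduction."""
    return alpha * gain(scores, pos, x) + (1 - alpha) * reduction(scores, pos, x)


def pick_list(lists: List[Sequence[float]], positions: List[int],
              x: int, alpha: float) -> int:
    """Index of the list whose next x sorted accesses have the highest utility."""
    utils = [util(s, p, x, alpha) for s, p in zip(lists, positions)]
    return max(range(len(lists)), key=lambda i: utils[i])


# Hypothetical lists: one with high, flat scores and one that drops quickly.
flat = [0.9, 0.88, 0.86, 0.84]
steep = [0.8, 0.5, 0.2, 0.1]
prefer_gain = pick_list([flat, steep], [0, 0], x=3, alpha=1.0)
prefer_reduction = pick_list([flat, steep], [0, 0], x=3, alpha=0.0)
print(prefer_gain, prefer_reduction)
```

Setting the weight to 1 favours the list with high scores, while setting it to 0 favours the list whose scores fall fastest, which mirrors the two observations "prefer high scores" and "prefer large score reductions".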

16 Adaptive
[Figure: weights of gain and reduction over time.]

17 Adaptive
[Figure: top-k and candidates queues over lists L1, L2, L3, with high shown at times t1 and t2. Intervals: o4 [0.6, bs], o1 [ws, bs], o2 [ws, bs], o3 [0.8, bs], o6 [ws, bs]. O1 appears in L1, L2 and L3.]
(o4) = 0.8 - 0.6 = 0.2 (Theobald et al. VLDB04)

18 Adaptive
[Figure: TREC query, k=100.]

19 Efficient Plan - Random Accesses
Observation:
– Random accesses can always be deferred until the sorted accesses have finished, without changing precision.
schedule 1: {SA……RA……SA….}
schedule 2: {SA……SA……RA….}
precision(schedule 1) = precision(schedule 2)

20 Observations (contd.)
Random accesses are only useful for objects in R_EXACT.
[Figure: list L2 with entries o1, o2, o5; top-k and candidates queues holding o1, o2, o3, o4, o5 with [ws, bs] intervals. For an object not in R_EXACT, two outcomes are shown: precision remains the same, or precision is reduced.]

21 Random Accesses
Two phases: gathering with sorted accesses, then probing with random accesses.
When to switch from SA to RA?
– Too late: not enough RAs to prune the candidates.
– Too early: not enough good candidates, so RAs are wasted.

22 Random Accesses
Switch from sorted to random: $R = (1 - \alpha) \cdot S$, $S + R > B$
– S: total cost of sorted accesses.
– R: total cost of random accesses.
Which items to access? Those that maximize the expected score.

23 Experimental Data
TREC Terabyte
– 25M webpages
– 50 queries with an average length of 3 words
IMDB
– 375,000 movies
– 20 queries, each with 4 attributes: {Title, Genre, Actors, Description}
Synthetic data
– Zipf, #lists = [2, 6], #objects = [10000, 1000000]
Aggregate function: Sum

24 Evaluation Methods
– Percentage of optimal precision (defined over R_alg, R_opt and R_exact)
– SME
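
The metric's formula is not in this transcript, so the sketch below shows only one natural reading of "percentage of optimal precision": the precision of the algorithm's result R_alg against R_exact, divided by the precision of the optimal budgeted result R_opt against R_exact. Both this definition and the function names are assumptions.

```python
from typing import Set


def precision(result: Set[str], exact: Set[str]) -> float:
    """Fraction of the exact top-k contained in the result (assumed definition)."""
    return len(result & exact) / len(exact) if exact else 0.0


def percentage_of_optimal(r_alg: Set[str], r_opt: Set[str],
                          r_exact: Set[str]) -> float:
    """Precision of R_alg as a percentage of the precision of R_opt."""
    optimal = precision(r_opt, r_exact)
    if optimal == 0:
        return 0.0  # ratio undefined; 0.0 returned by convention (assumption)
    return 100.0 * precision(r_alg, r_exact) / optimal


# Hypothetical result sets for k = 4.
r_exact = {"a", "b", "c", "d"}
r_opt = {"a", "b", "c", "x"}   # best achievable within the budget
r_alg = {"a", "b", "y", "z"}   # what a heuristic returned
score = percentage_of_optimal(r_alg, r_opt, r_exact)
print(round(score, 2))
```

Normalising by R_opt rather than by R_exact means a heuristic is judged against what the budget allows, not against the unlimited-budget answer.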

25 Results - Sorted Accesses
[Figure: TREC, k=100.]
The smaller the budget, the larger the improvement.

26 Varied k
[Figure: IMDB, B=400.]
The lower the k, the larger the improvement.

27 Number of Lists
[Figure: Zipf, k=100, B=4000.]
The more lists, the larger the improvement.

28 Results - Random Accesses
[Figures: TREC, k=100, Cr=10; TREC, k=100, Cr=100.]

29 Related Work
Minimize budget for optimal results:
– The algorithm computes the exact results with minimum cost (Bast et al. VLDB06, Bruno et al. ICDE02, Chang et al. SIGMOD02).
– Dual problem.
Anytime top-k:
– The algorithm collects statistics during processing, which can be used to provide probabilistic guarantees at any time during processing (Arai et al. VLDB07).
– Does not do any optimizations.
Approximate top-k:
– Approximate results with probabilistic guarantees (Theobald et al. VLDB04, Fagin et al. 2001).

30 Conclusions
– First attempt to deal with budget constraints.
– For SA only, average precision is around 70%.
– Tradeoff between RAs and SAs: for a relatively low RA cost, schedules with RAs give improved results.

31 Thank You!

33 Top-k query
Given a set of n objects and m scoring lists sorted in decreasing order, find the top-k objects according to a scoring function f.
Top-k: a set T of k objects such that $f(r_{j1},\dots,r_{jm}) \le f(r_{i1},\dots,r_{im})$ for every object $X_i$ in T and every object $X_j$ not in T.
Assumption: the scoring function f is monotone:
– $f(r_1,\dots,r_m) \le f(r_1',\dots,r_m')$ if $r_i \le r_i'$ for all i
Two access modes: sorted access (cost Cs) and random access (cost Cr).
Objective: compute the top-k with the minimum cost.
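
The top-k condition can be checked directly when all scores are known. The sketch below is a minimal illustration: the score vectors reuse the R1 and R2 values from the NRA slides, with a missing entry treated as 0, which is an assumption. The helper `is_topk` is a hypothetical name.

```python
from typing import Callable, Dict, Sequence, Set


def is_topk(T: Set[str], scores: Dict[str, Sequence[float]],
            f: Callable[[Sequence[float]], float], k: int) -> bool:
    """True if T has k objects and no object outside T scores higher
    under f than any object inside T."""
    if len(T) != k or not T <= set(scores):
        return False
    inside = [f(scores[obj]) for obj in T]
    outside = [f(vec) for obj, vec in scores.items() if obj not in T]
    return not outside or max(outside) <= min(inside)


# One score per list (R1, R2); a missing entry is assumed to be 0.
scores = {
    "a": (0.9, 0.85),
    "b": (0.6, 0.0),
    "c": (0.5, 0.2),
    "d": (0.4, 0.87),
    "f": (0.0, 0.25),
}

valid = is_topk({"a", "d"}, scores, sum, k=2)
invalid = is_topk({"a", "b"}, scores, sum, k=2)
print(valid, invalid)
```

SUM is monotone in the sense stated above, since raising any single list score can never lower the total.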

34 Sorted Accesses
Observations:
– An object with high scores has a higher potential to be part of the top-k.
– An object with “mediocre” scores does not help.
Prefer high scores.
[Figure: O1 appears in lists L1, L2 and L3.]

35 Example
[Figure labels: useless, Q, Wireless zone.]

36 Applications
Mobile applications
– Highly impatient users who need fast results.
Mediation systems
– Must achieve high query throughput.
Online analytics (e.g. logs)
– Must achieve high query throughput.

37 Motivating Example: Query throughput
[Diagram labels: User query, Mediator, Engine, Servers.]
Given the number of queries per time unit, allocate time for each query.

38 Terminology
1. Sorted Access
2. Random Access
3. high_i
4. Top-k queue
5. Candidates queue
6. mink
7. worstScore(d)
8. bestScore(d)

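39 Efficient Offline Solution - Sorted
Goal: find a trace t such that: [formula shown on slide]. The result is denoted R_OPT.
[Figure: lists L1 and L2 with interesting positions P1, P2 marked in each.]
L1: (o1, S_L1), (o5, S_L1), (o8, S_L1), (o6, S_L1)
L2: (o1, S_L2), (o2, S_L2), (o5, S_L2), (o4, S_L2), (o3, S_L2)
Traces for B = 5 (sorted accesses per list):
trace  L1  L2
t1     0   5
t2     1   4
t3     2   3
t4     3   2
t5     4   1
t6     5   0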

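40 Efficient Offline Solution - Sorted
Goal: find a trace t such that: [formula shown on slide].
[Figure: lists L1 and L2 with interesting positions P1, P2 marked in each.]
L1: (o1, S_L1), (o5, S_L1), (o8, S_L1), (o6, S_L1)
L2: (o1, S_L2), (o2, S_L2), (o5, S_L2), (o4, S_L2), (o3, S_L2)
Traces for B = 5 (sorted accesses per list):
trace  L1  L2
t1     0   5
t2     1   4
t3     2   3
t4     3   2
t5     4   1
t6     5   0
Feasible for k up to 100 and m up to 10.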

41 Efficient Offline Solution - Sorted
Proof (by contradiction):
– Assume that t does not exist, and choose a trace s that is within the budget and has optimal precision. Let s` be the trace whose positions s`_i are the largest positions in P_i that are less than or equal to s_i.
– By construction, the score of any object in S is the same as in S`.

42 Fair Heuristic
– Assume budget = b.
– Runs in batches.

43 Efficient Offline Solution - Random
Budget for RAs = (B - |t| * Cs)
[Figure: top-k queue with o1, o4, o2, o3; further objects o9, o5, o7, o8, …. and o10, o14, ….; labels d and R_exact.]
best(o) - mink, where best(o) = worst(o) + RA
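
The budget split on this slide is simple arithmetic, sketched below. Two things here are assumptions rather than content from the slide: that the leftover budget is spent on whole random accesses of cost Cr each, and that candidates are ordered for probing by best(o) - mink, largest first, dropping any candidate whose best score cannot exceed mink. All names and numbers are hypothetical.

```python
import math
from typing import Dict, List


def affordable_random_accesses(budget: float, n_sorted: int,
                               cs: float, cr: float) -> int:
    """Number of random accesses that fit into B - |t| * Cs."""
    remaining = budget - n_sorted * cs
    return max(0, math.floor(remaining / cr))


def rank_for_probing(best: Dict[str, float], mink: float) -> List[str]:
    """Candidates ordered by best(o) - mink, largest margin first.
    Candidates that cannot beat mink are dropped (assumed policy)."""
    margins = {obj: b - mink for obj, b in best.items()}
    useful = [obj for obj in margins if margins[obj] > 0]
    return sorted(useful, key=lambda o: margins[o], reverse=True)


# Hypothetical numbers: B = 10, a trace of 4 sorted accesses, Cs = 1, Cr = 3.
n_ra = affordable_random_accesses(budget=10, n_sorted=4, cs=1, cr=3)
order = rank_for_probing({"o9": 1.5, "o5": 1.2, "o7": 0.7}, mink=0.9)
print(n_ra, order[:n_ra])
```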

44 Motivation
Many applications work in budget-constrained environments. Still, they wish to perform top-k queries.
[Diagram labels: User query, Mediator, Engine, Servers.]
Budget-aware query processing

45 Future work
– Different access costs for different lists
– Time-aware top-k
– Top-k with budget constraints for P2P
