Best-Effort Top-k Query Processing Under Budgetary Constraints


Presentation on theme: "Best-Effort Top-k Query Processing Under Budgetary Constraints"— Presentation transcript:

1 Best-Effort Top-k Query Processing Under Budgetary Constraints
Michal Shmueli-Scheuer (IBM Haifa Research Lab and UCI), Yosi Mass, Haggai Roitman, Chen Li, Ralf Schenkel, Gerhard Weikum

2 Motivating Example
Figure: top-k queries are sent to an engine, which returns top-k results.
Mobile Applications: highly impatient users, need fast results.
Mediation Systems: achieve high query throughput.
Online Analytics (e.g. logs): achieve high query throughput.

3 Traditional top-k query
Pre-computed lists over multiple attributes (m sorted lists over n objects):
R1: a 0.9, b 0.6, c 0.5, .., d 0.4
R2: d 0.87, a 0.85, f 0.5, .., c 0.2
Rm: c 0.9, b 0.6, g 0.5, .., a 0.4
Combine scores by some monotonic aggregation function.
Two access modes: sorted access (cost Cs) and random access (cost Cr).
Objective: compute the k objects with the highest scores.

4 NRA algorithm (Fagin et al.)
f = SUM
R1: a 0.9, b 0.6, c 0.5, .., d 0.4
R2: d 0.87, a 0.85, f 0.5, .., c 0.2
highi: the score at the current scan position of list i.
Top-2 [worst score, best score]: a [0.9,1.77], d [0.87,1.77]
Candidates: none yet
Stop when mink > best-score of candidates.

5 NRA algorithm (Fagin et al.)
R1: a 0.9, b 0.6, c 0.5, .., d 0.4
R2: d 0.87, a 0.85, f 0.25, .., c 0.2
Top-2 [worst score, best score]: a [1.75,1.75], d [0.87,1.47]
Candidates: b [0.6,1.45]
Stop when mink > best-score of candidates.

6 NRA algorithm (Fagin et al.)
R1: a 0.9, b 0.6, c 0.5, .., d 0.4
R2: d 0.87, a 0.85, f 0.25, .., c 0.2
Top-2 [worst score, best score]: a [1.75,1.75], d [0.87,1.37]
Candidates: b [0.6,0.85], c [0.5,0.75], f [0.25,0.75]
Stop when mink > best-score of candidates. Here mink = 0.87 > 0.85, so the scan stops after three rows per list.
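
The three NRA steps above can be replayed with a short sketch. This is a minimal illustration, not the authors' implementation: it assumes SUM aggregation, reads one row from every list per round, and uses only the list prefixes visible on the slides (the elided ".." rows are left out). The function name `nra` and the tie-breaking by object name are assumptions made for the example.

```python
def nra(lists, k):
    """No-random-access top-k over score-sorted lists with SUM aggregation.

    lists: one list per attribute, each a list of (object, score) pairs
           sorted by descending score.
    Returns (top_k_objects, depth): depth is the number of rows read from
    every list when the stopping test first succeeds.
    """
    m = len(lists)
    seen = {}      # object -> {list index: score seen in that list}
    top_k = []
    depth = 0
    max_depth = max(len(lst) for lst in lists)
    while depth < max_depth:
        high = []  # score at the current scan position of each list
        for i, lst in enumerate(lists):
            if depth < len(lst):
                obj, score = lst[depth]
                seen.setdefault(obj, {})[i] = score
                high.append(score)
            else:
                high.append(0.0)  # exhausted list contributes nothing more
        depth += 1
        # Worst score: sum of seen scores. Best score: worst + high_i of unseen lists.
        worst = {o: sum(s.values()) for o, s in seen.items()}
        best = {o: worst[o] + sum(high[i] for i in range(m) if i not in s)
                for o, s in seen.items()}
        ranked = sorted(seen, key=lambda o: (-worst[o], o))
        top_k, candidates = ranked[:k], ranked[k:]
        if len(top_k) == k:
            min_k = worst[top_k[-1]]
            unseen_best = sum(high)  # bound for objects not seen anywhere yet
            if min_k > max([best[o] for o in candidates] + [unseen_best]):
                break
    return top_k, depth


# Visible prefixes of the slide lists (f = 0.25 as on slides 5 and 6).
R1 = [("a", 0.9), ("b", 0.6), ("c", 0.5), ("d", 0.4)]
R2 = [("d", 0.87), ("a", 0.85), ("f", 0.25), ("c", 0.2)]
nra_top, nra_depth = nra([R1, R2], k=2)
print(nra_top, nra_depth)
```

With this data the sketch stops at the same point as slide 6: the top-2 queue holds a and d after three rows per list.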

7 Top-k with Budget Constraints
Access costs: sorted access cost Cs, random access cost Cr. Here Cs=1, Cr=3, f = SUM.
R1: s 0.95, u 0.93, t 0.92, d 0.9, x 0.5, y 0.4, z 0.2
R2: a 1.0, b 0.9, c 0.85, d 0.8, e 0.7, t 0.6, f 0.4, ..
Top-2: d 1.7, t 1.52
NRA: 12Cs = 12, precision = 0.5
TA: 7Cs + 7Cr = 28, precision = 0
Budget = 10?
Given budget B, maximize result quality.
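
A small sketch can make the budget question concrete. It assumes that a budgeted NRA-style run simply alternates sorted accesses over the lists until the budget is spent and then reports the k objects with the highest partial (worst) scores, and that precision is the fraction of the exact top-k that was found. Both function names are hypothetical, and the elided tail of R2 is omitted.

```python
def budgeted_sorted_topk(lists, k, budget, cs=1):
    """Round-robin sorted accesses until the budget is spent; rank by partial score."""
    worst = {}                    # object -> sum of scores seen so far
    positions = [0] * len(lists)  # next unread row in each list
    spent = 0
    progressed = True
    while progressed:
        progressed = False
        for i, lst in enumerate(lists):
            if positions[i] < len(lst) and spent + cs <= budget:
                obj, score = lst[positions[i]]
                worst[obj] = worst.get(obj, 0.0) + score
                positions[i] += 1
                spent += cs
                progressed = True
    ranked = sorted(worst, key=lambda o: (-worst[o], o))
    return ranked[:k]


def precision(result, exact):
    """Fraction of the exact top-k that appears in the reported result."""
    return len(set(result) & set(exact)) / len(exact)


R1 = [("s", 0.95), ("u", 0.93), ("t", 0.92), ("d", 0.9),
      ("x", 0.5), ("y", 0.4), ("z", 0.2)]
R2 = [("a", 1.0), ("b", 0.9), ("c", 0.85), ("d", 0.8),
      ("e", 0.7), ("t", 0.6), ("f", 0.4)]
exact_top2 = ["d", "t"]

result_b10 = budgeted_sorted_topk([R1, R2], k=2, budget=10)
result_b12 = budgeted_sorted_topk([R1, R2], k=2, budget=12)
print(result_b10, precision(result_b10, exact_top2))
print(result_b12, precision(result_b12, exact_top2))
```

Under these assumptions a budget of 10 reads five rows per list, sees t only in R1, and reports d and a, which gives precision 0.5. With 12 sorted accesses both d and t are fully scored.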

8 Contributions
Sorted Accesses: Efficient Plan; Solution with Adaptive α
Sorted and Random Accesses
Experiments

9 Results Under Limited Budget
Results for limited budget
k results for unlimited budget

10 Efficient Plan- Sorted Accesses
Assume that we know the k results for unlimited budget (REXACT).
L1: (o1, SL1), (o5, SL1), (o8, SL1), (o6, SL1)
L2: (o1, SL2), (o2, SL2), (o5, SL2), (o4, SL2), (o3, SL2)
Top-2: o5, o1
A plan is an allocation of the resource (sorted accesses) to the lists, e.g. Plan – {L1,4} {L2,2}.
Interesting positions (P1, P2, Q1, Q2): where the k objects appear in the lists.

11 Efficient Plan- Sorted Accesses
Goal: find plan t, such that :
Plans for B=5, with interesting positions P1, P2, Q1, Q2:
L1: (o1, SL1), (o5, SL1), (o8, SL1), (o6, SL1)
L2: (o1, SL2), (o2, SL2), (o5, SL2), (o4, SL2), (o3, SL2)
Plan: {L1,2} {L2,3}
Denoted as ROPT
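
One way to picture the search for a good plan is brute-force enumeration. The sketch below assumes that REXACT is known, that a plan assigns a read depth to each list, and that the result of a plan is the k objects with the highest partial sums. The scores are hypothetical (the slide only names them SL1 and SL2), and `plan_result` and `best_plans` are illustrative names, not functions from the talk.

```python
from itertools import product


def plan_result(lists, plan, k):
    """Top-k by partial (worst) score after reading plan[i] rows from list i."""
    worst = {}
    for lst, depth in zip(lists, plan):
        for obj, score in lst[:depth]:
            worst[obj] = worst.get(obj, 0.0) + score
    return sorted(worst, key=lambda o: (-worst[o], o))[:k]


def best_plans(lists, k, budget, exact, cs=1):
    """Enumerate every plan within the budget; return best precision and its plans."""
    depth_ranges = [range(len(lst) + 1) for lst in lists]
    scored = {}
    for plan in product(*depth_ranges):
        if sum(plan) * cs <= budget:
            hits = len(set(plan_result(lists, plan, k)) & set(exact))
            scored[plan] = hits / k
    top = max(scored.values())
    return top, sorted(p for p, v in scored.items() if v == top)


# Hypothetical scores, chosen so that the exact top-2 is {o1, o5}.
L1 = [("o1", 0.9), ("o5", 0.85), ("o8", 0.4), ("o6", 0.3)]
L2 = [("o1", 0.9), ("o2", 0.88), ("o5", 0.8), ("o4", 0.3), ("o3", 0.2)]
rexact = plan_result([L1, L2], (len(L1), len(L2)), k=2)

best_precision, plans = best_plans([L1, L2], k=2, budget=5, exact=rexact)
print(rexact, best_precision, plans)
```

With these made-up scores several plans tie at precision 1.0, and the slide's plan {L1,2} {L2,3} is one of them. Exhaustive enumeration grows quickly with the number of lists, which is why restricting attention to the interesting positions matters.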

12 Sorted Accesses
Observations: prefer high scores.
Figure: lists L1, L2, L3 with entry (O1, SL1).

13 Observations – contd.
Example query: title=“war” description=“weapon”
Observation: prefer large score reductions.

14 Score Utilities
Score gain:
Score reduction:
Example entries: o2, 1; o4, 0.9; y =3

15 Optimization Problem
Bi-objective optimization problem:
util(Li,x) = α * gain + (1-α) * reduction
Heuristics: Fair Heuristic, Rank Heuristic
Where m is the number of lists
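
The weighted utility can be sketched in a few lines. The gain and reduction formulas are not preserved in this transcript, so the numbers below are placeholder estimates, and `pick_list` is a hypothetical helper that chooses the list with the highest utility.

```python
def util(gain, reduction, alpha):
    """Weighted utility: alpha * gain + (1 - alpha) * reduction."""
    return alpha * gain + (1.0 - alpha) * reduction


def pick_list(estimates, alpha):
    """Return the list name with the highest utility.

    estimates: {list_name: (gain, reduction)}; both values are placeholders
    for whatever estimators the scheduler uses.
    """
    return max(sorted(estimates), key=lambda name: util(*estimates[name], alpha))


# Hypothetical estimates: L1 offers a high score, L2 a large score reduction.
estimates = {"L1": (0.9, 0.05), "L2": (0.5, 0.4)}
choice_gain_only = pick_list(estimates, alpha=1.0)
choice_reduction_only = pick_list(estimates, alpha=0.0)
print(choice_gain_only, choice_reduction_only)
```

Setting α to 1 ranks lists purely by gain and setting it to 0 ranks them purely by reduction, so α moves the scheduler between the two observations on the preceding slides.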

16 Adaptive α
Figure: weight α on gain and weight (1-α) on reduction, shown over time.

17 Adaptive α
Top-k: o1 [ws,bs], o3 [0.8,bs]
Candidates: o4 [0.6,bs], o6 [ws,bs]
d(o4) = 0.8-0.6 = 0.2
Figure: lists L1, L2, L3 with entry (O1, SL1) and scan positions hight1, hight2.
Theobald et al. VLDB04

18 Adaptive α
TREC query, k=100

19 Efficient Plan- Random Accesses
Observations: random accesses always occur after the sorted accesses have finished.
schedule 1: {SA……RA……SA….}
schedule 2: {SA……SA……RA….}
precision(schedule1) = precision(schedule2)

20 Observations- contd.
Random accesses are only useful to objects in REXACT.
Figure: top-k and candidates queues holding o1 to o5, each with [ws,bs], next to list L2 with entries (o2, SL2), (o5, SL2), (o1, SL2).
Figure labels: "o5, Not in REXACT"; "Precision reduced"; "Precision remains the same".

21 Random Accesses
When to switch from SA to RA?
Timeline: gathering with sorted accesses (α), then probing with random accesses (1-α).
Not enough good candidates: RA is wasted.
Not enough RAs to prune the candidates.
RA is much more expensive than SA.

22 Random Accesses
Switch from Sorted to Random: R = (1-α)*S
S – total cost of sorted accesses.
R – total cost for random accesses.
S+R > B
Which items to access? Do one RA on each candidate; maximize expected score.
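
The switch rule can be turned into a budget split under one possible reading, which is an assumption rather than something the slide states: sorted accesses continue until S plus the reserved random-access cost R = (1-α)*S would use up B, so S is capped at B / (2 - α). The function name `split_budget` and the example costs are hypothetical.

```python
def split_budget(budget, alpha, cs=1.0, cr=10.0):
    """Split a budget into counts of sorted and random accesses.

    Assumes the reading S + R = B with R = (1 - alpha) * S, which caps the
    sorted-access cost at S = B / (2 - alpha). Whatever is left after the
    sorted phase is spent on random accesses, one per candidate.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    sorted_cost_cap = budget / (2.0 - alpha)
    n_sorted = int(sorted_cost_cap // cs)
    remaining = budget - n_sorted * cs
    n_random = int(remaining // cr)
    return n_sorted, n_random


split_half = split_budget(100, alpha=0.5, cs=1, cr=10)
split_sorted_only = split_budget(100, alpha=1.0, cs=1, cr=10)
print(split_half, split_sorted_only)
```

With α = 1 the whole budget goes to sorted accesses; smaller values of α reserve more of the budget for probing the gathered candidates.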

23 Experimental Data
TREC Terabyte: 25M webpages; 50 queries with average length of 3 words.
IMDB: 375,000 movies; 20 queries, each with 4 attributes: {Title, Genre, Actors, Description}.
Synthetic data: Zipf, #lists =[2,6], #objects =[10000,1000000].
Aggregate function: Sum

24 Evaluation Methods
Measures: percentage of optimal, precision, SME
Result sets compared: Ralg, Ropt, Rexact

25 Results- Sorted Accesses
TREC, k=100
Less budget, more improvement.

26 Varied k
IMDB, B=400
Lower k, more improvement.

27 Number of Lists
Zipf, k=100, B=4000
More lists, more improvement.

28 Results- Random Accesses
TREC, k=100, Cr=10
TREC, k=100, Cr=100

29 Related Works
Minimize budget for optimal results: the algorithm computes the exact results with minimum cost (Bast et al. VLDB06, Bruno et al. ICDE02, Chang et al. SIGMOD02). Dual problem.
Anytime top-k: the algorithm collects statistics during processing, which can be used to provide probabilistic guarantees at any time during processing (Arai et al. VLDB07). Do not do any optimizations.
Approximate top-k: approximate results with probabilistic guarantees (Theobald et al. VLDB04, Fagin et al. 2001).

30 Conclusions
First attempt to deal with budget constraints.
For SA only, average precision around 70%.
Tradeoff between RAs and SAs: for relatively low cost of RA, RA schedules are improved.

31 Thank You !

32

33 Top-k query
Given a set of n objects and m scoring lists sorted in decreasing order, find the top-k objects according to a scoring function f.
top-k: a set T of k objects such that f(rj1,…,rjm) ≤ f(ri1,…,rim) for every object Xi in T and every object Xj not in T.
Assumption: the scoring function f is monotone: f(r1,…,rm) ≤ f(r1’,…,rm’) if ri ≤ ri’ for all i.
Two access modes: sorted access (cost Cs) and random access (cost Cr).
Objective: compute top-k with the minimum cost.

34 Sorted Accesses
Observations:
An object with high scores has higher potential to be part of the top-k.
An object with “mediocre” scores does not help.
Figure: lists L1, L2, L3 with entries (O1, SL1), (O1, SL2), (O1, SL3).
Prefer high scores

35 Example Wireless zone Q useless

36 Applications
Mobile Applications: highly impatient users, need fast results.
Mediation Systems: achieve high query throughput.
Online analytics (e.g. logs)

37 Motivating Example
Figure: user query, mediator, engine, servers.
Query throughput: given #queries per time unit, allocate time for each query.

38 Terminology
Sorted Access, Random Access, highi, Top-k queue, Candidates queue, mink, worstScore(d), bestScore(d)

39 Efficient Offline Solution- Sorted
Goal: find trace t, such that :
L1: (o1, SL1), (o5, SL1), (o8, SL1), (o6, SL1)
L2: (o1, SL2), (o2, SL2), (o5, SL2), (o4, SL2), (o3, SL2)
Interesting positions: P1, P2
Traces for B=5 (accesses per list, columns L1 and L2): t1: 5; t2: 1, 4; t3: 2, 3; t4, t5, t6 not filled in.
Denoted as ROPT

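40 Efficient Offline Solution- Sorted
Goal: find trace t, such that :
L1: (o1, SL1), (o5, SL1), (o8, SL1), (o6, SL1)
L2: (o1, SL2), (o2, SL2), (o5, SL2), (o4, SL2), (o3, SL2)
Interesting positions: P1, P2
Traces for B=5 (accesses per list, columns L1 and L2): t1: 5; t2: 1, 4; t3: 2, 3; t4, t5, t6 not filled in.
Feasible for k up to 100, and m up to 10.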

41 Efficient Offline Solution- Sorted
Proof (by contradiction): Assume that t does not exist, and choose a trace s that is within the budget and has optimal precision. Define s` whose entries s`i are the largest positions of Pi less than or equal to si. By construction, the score of any object in S is the same as in S`.

42 Fair Heuristic
Assume budget = b
Runs in batches

43 Efficient Offline Solution- Random
Budget for RAs = (B-|t|*Cs)
Figure labels: Top-k, d, Rexact
Entries: o9, S; o5, S; o7, S; o8, S; ….
best(o)-mink (best(o) = worst(o)+RA)
Entries: o1, S; o2, S; o3, S; o4, S; o10, S; o14, S; ….

44 Motivation
Many applications work in budget-constrained environments. Still, they wish to perform top-k queries.
Budget-aware query processing
Figure: user query, mediator, engine, servers.

45 Future work
Different access costs for different lists
Time-aware top-k
Top-k with budget constraints for P2P

