Published by Crystal Tankard. Modified over 2 years ago.

1
**Best-Effort Top-k Query Processing Under Budgetary Constraints**

Michal Shmueli-Scheuer (IBM Haifa Research Lab and UCI), Yosi Mass, Haggai Roitman, Chen Li, Ralf Schenkel, Gerhard Weikum

2
**Motivating Example**

[Diagram: top-k queries are sent to an engine, which returns top-k results.]

- Mobile applications: highly impatient users, need fast results.
- Mediation systems: achieve high query throughput.
- Online analytics (e.g. logs): achieve high query throughput.

3
**Traditional top-k query**

Pre-computed lists over multiple attributes (m lists over n objects), each sorted by score:

- R1: a 0.9, b 0.6, c 0.5, …, d 0.4
- R2: d 0.87, a 0.85, f 0.5, …, c 0.2
- Rm: c 0.9, b 0.6, g 0.5, …, a 0.4

Combine scores by some monotonic aggregation function.

Two access modes: sorted access (Cs) and random access (Cr).

Objective: compute the k objects with the highest scores.

4
**NRA algorithm (Fagin et al.)**

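Lists (f = SUM):

- R1: a 0.9, b 0.6, c 0.5, …, d 0.4
- R2: d 0.87, a 0.85, f 0.25, …, c 0.2

After the first round of sorted accesses, every seen object has a [worst score, best score] interval:

- Top-2: a [0.9, 1.77], d [0.87, 1.77]
- Candidates: none yet

Here highi is the last score read from list i, and mink is the lowest worst score in the top-k. Stop when mink > best score of the candidates.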

5
**NRA algorithm (Fagin et al.)**

Lists (f = SUM):

- R1: a 0.9, b 0.6, c 0.5, …, d 0.4
- R2: d 0.87, a 0.85, f 0.25, …, c 0.2

After the second round of sorted accesses:

- Top-2: a [1.75, 1.75], d [0.87, 1.47]
- Candidates: b [0.6, 1.45]

Stop when mink > best score of the candidates.

6
**NRA algorithm (Fagin et al.)**

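Lists (f = SUM):

- R1: a 0.9, b 0.6, c 0.5, …, d 0.4
- R2: d 0.87, a 0.85, f 0.25, …, c 0.2

After the third round of sorted accesses:

- Top-2: a [1.75, 1.75], d [0.87, 1.37]
- Candidates: b [0.6, 0.85], c [0.5, 0.75], f [0.25, 0.75]

Stop when mink > best score of the candidates. Here mink = 0.87 exceeds the highest candidate best score (0.85), so the algorithm stops.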
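The three NRA rounds on the preceding slides can be replayed with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the name `nra`, the round-robin access order, and the truncation of both lists to the entries visible on the slides (the elided "…" entries are simply left out) are all assumptions.

```python
def nra(lists, k):
    """Round-robin sorted access with [worst, best] intervals, f = SUM (sketch)."""
    m = len(lists)
    seen = {}    # object -> {list index: score read under sorted access}
    ranked = []
    max_depth = max(len(lst) for lst in lists)
    for depth in range(max_depth):
        high = []  # high[i]: last score read from list i
        for i, lst in enumerate(lists):
            if depth < len(lst):
                obj, score = lst[depth]
                seen.setdefault(obj, {})[i] = score
                high.append(score)
            else:
                high.append(0.0)  # list exhausted, nothing more to gain

        def worst(o):
            return sum(seen[o].values())

        def best(o):
            # Unknown scores are bounded by the current high[i].
            return sum(seen[o].get(i, high[i]) for i in range(m))

        ranked = sorted(seen, key=lambda o: (-worst(o), o))
        topk, candidates = ranked[:k], ranked[k:]
        if len(topk) == k:
            mink = worst(topk[-1])
            # Objects not seen at all can score at most sum(high).
            bound = max([best(o) for o in candidates] + [sum(high)])
            if mink > bound:
                return topk, depth + 1
    return ranked[:k], max_depth


# Slide data, truncated at the "…" (assumption: elided entries omitted).
R1 = [("a", 0.9), ("b", 0.6), ("c", 0.5), ("d", 0.4)]
R2 = [("d", 0.87), ("a", 0.85), ("f", 0.25), ("c", 0.2)]
top2, rounds = nra([R1, R2], k=2)
print(top2, rounds)
```

On this data the sketch stops after three rounds with a and d as the top-2, matching the intervals shown on the slides.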

7
**Top-k with Budget Constraints**

Access costs: sorted access cost Cs, random access cost Cr. In this example Cs = 1, Cr = 3, f = SUM.

- R1: s 0.95, u 0.93, t 0.92, d 0.9, x 0.5, y 0.4, z 0.2, …
- R2: a 1.0, b 0.9, c 0.85, d 0.8, e 0.7, t 0.6, f 0.4, …
- Exact top-2: d 1.7, t 1.52

Cost of the exact algorithms:

- NRA: 12Cs = 12
- TA: 7Cs + 7Cr = 28

What happens with budget = 10?

- NRA: precision = 0.5
- TA: precision = 0

Given budget B, maximize result quality.
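The NRA numbers in this example can be checked with a small sketch that spends a budget on round-robin sorted accesses only and ranks objects by their worst score. The function names, the truncation of the lists at the "…", and the definition of precision as the fraction of returned objects that belong to the exact top-k are assumptions made for illustration.

```python
def budgeted_sorted_access(lists, k, budget, cs=1):
    """Read lists round-robin until the next sorted access would exceed the budget."""
    worst = {}  # object -> sum of the scores seen so far (worst score under SUM)
    spent, depth = 0, 0
    progressed = True
    while progressed:
        progressed = False
        for lst in lists:
            if depth < len(lst) and spent + cs <= budget:
                obj, score = lst[depth]
                worst[obj] = worst.get(obj, 0.0) + score
                spent += cs
                progressed = True
        depth += 1
    ranked = sorted(worst, key=lambda o: (-worst[o], o))
    return ranked[:k], spent


def precision(result, exact):
    """Assumed definition: overlap with the exact top-k, divided by k."""
    return len(set(result) & set(exact)) / len(exact)


# Slide data, truncated at the "…" (assumption: elided entries omitted).
R1 = [("s", 0.95), ("u", 0.93), ("t", 0.92), ("d", 0.9),
      ("x", 0.5), ("y", 0.4), ("z", 0.2)]
R2 = [("a", 1.0), ("b", 0.9), ("c", 0.85), ("d", 0.8),
      ("e", 0.7), ("t", 0.6), ("f", 0.4)]
EXACT = ["d", "t"]

top_b10, cost_b10 = budgeted_sorted_access([R1, R2], k=2, budget=10)
top_b12, cost_b12 = budgeted_sorted_access([R1, R2], k=2, budget=12)
print(top_b10, precision(top_b10, EXACT))
print(top_b12, precision(top_b12, EXACT))
```

With budget 10 only d is found (precision 0.5), while 12 sorted accesses are enough to see t in both lists, which agrees with the 12Cs figure on the slide.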

8
**Contributions**

- Sorted accesses: efficient plan; solution with adaptive a.
- Sorted and random accesses.
- Experiments.

9
**Results Under Limited Budget**

Result quality compares the results for a limited budget with the k results for an unlimited budget.

10
**Efficient Plan- Sorted Accesses**

Assume that we know the k results for unlimited budget (REXACT).

- L1: (o1, SL1), (o5, SL1), (o8, SL1), (o6, SL1)
- L2: (o1, SL2), (o2, SL2), (o5, SL2), (o4, SL2), (o3, SL2)
- Top-2: o5, o1

Interesting positions (P1, P2 in L1; Q1, Q2 in L2): where the k objects appear in the lists.

A plan is an allocation of resources among the lists, for example: {L1,4} {L2,2}.

11
**Efficient Plan- Sorted Accesses**

Goal: find a plan t that is within the budget and has optimal precision.

Plans for B=5 use the interesting positions P1, P2 (in L1) and Q1, Q2 (in L2):

- L1: (o1, SL1), (o5, SL1), (o8, SL1), (o6, SL1)
- L2: (o1, SL2), (o2, SL2), (o5, SL2), (o4, SL2), (o3, SL2)

Plan: {L1,2} {L2,3}. Denoted as ROPT.
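A plan search restricted to interesting positions can be sketched as follows. Several things here are assumptions rather than the authors' method: the order of objects within L1 and L2 (read off the flattened slide), Cs = 1, the function names, and the use of "number of REXACT objects read in every list" as a stand-in for precision, since the slide's scores are symbolic (SL1, SL2).

```python
from itertools import product


def interesting_positions(lst, exact):
    """1-based positions at which objects of the exact top-k appear in a list."""
    return [pos for pos, obj in enumerate(lst, start=1) if obj in exact]


def candidate_plans(lists, exact, budget, cs=1):
    """Plans (one depth per list) over interesting positions that fit the budget."""
    options = [[0] + interesting_positions(lst, exact) for lst in lists]
    return [plan for plan in product(*options) if sum(plan) * cs <= budget]


def fully_seen(plan, lists, exact):
    """Stand-in for precision: exact top-k objects read in every list."""
    return sum(
        all(obj in lst[:depth] for lst, depth in zip(lists, plan))
        for obj in exact
    )


# Assumed list order, reconstructed from the slide.
L1 = ["o1", "o5", "o8", "o6"]
L2 = ["o1", "o2", "o5", "o4", "o3"]
REXACT = ["o5", "o1"]

plans = candidate_plans([L1, L2], REXACT, budget=5)
# Prefer more fully seen objects, then lower cost.
best_plan = max(plans, key=lambda p: (fully_seen(p, [L1, L2], REXACT), -sum(p)))
print(best_plan)
```

Under these assumptions the selected depths are 2 for L1 and 3 for L2, the same allocation as the plan {L1,2} {L2,3} on the slide. Restricting depths to interesting positions keeps the search space small: only positions where a REXACT object appears need to be considered.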

12
**Sorted Accesses: Observations**

Prefer high scores.

[Figure: lists L1, L2, L3 with entry (O1, SL1).]

13
**Observations (contd.)**

Example: title=“war” description=“weapon”.

Observation: prefer large score reductions.

14
**Score Utilities**

Two utilities: score gain and score reduction.

[Figure: list entries (o2, 1) and (o4, 0.9); y = 3.]

15
**Optimization Problem**

Bi-objective optimization problem:

util(Li, x) = a * gain + (1 - a) * reduction

Heuristics (where m is the number of lists):

- Fair heuristic
- Rank heuristic
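The weighted utility can be illustrated with a few lines of Python. The gain and reduction values below are made-up numbers, and the names `util` and `pick_list` are hypothetical; the slide's own definitions of gain and reduction are not reproduced here, so they are taken as given inputs.

```python
def util(gain, reduction, a):
    """Weighted combination from the slide: a * gain + (1 - a) * reduction."""
    return a * gain + (1 - a) * reduction


def pick_list(stats, a):
    """Return the list whose next batch has the highest utility.

    stats maps a list name to a (gain, reduction) pair (hypothetical inputs).
    """
    return max(stats, key=lambda name: util(*stats[name], a))


# Hypothetical values: L1 offers high scores, L2 a large score reduction.
stats = {"L1": (0.9, 0.1), "L2": (0.4, 0.7)}
choice_gain_heavy = pick_list(stats, a=0.9)
choice_reduction_heavy = pick_list(stats, a=0.1)
print(choice_gain_heavy, choice_reduction_heavy)
```

A large a favors the list with high scores, while a small a favors the list with the larger score reduction, which is the trade-off the two observations describe.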

16
**Adaptive**

[Figure: the weights of gain and reduction plotted over time.]

17
**Adaptive**

- Top-k: o1 [ws, bs], o3 [0.8, bs]
- Candidates: o4 [0.6, bs], o6 [ws, bs]
- d(o4) = 0.8 - 0.6 = 0.2

[Figure: lists L1, L2, L3 with entry (O1, SL1) and markers high_t1, high_t2.]

Theobald et al. VLDB04

18
**Adaptive**

[Plot: TREC query, k=100.]

19
**Efficient Plan- Random Accesses**

Observation: random accesses can always be performed after all sorted accesses have finished.

- schedule 1: {SA……RA……SA….}
- schedule 2: {SA……SA……RA….}

precision(schedule 1) = precision(schedule 2)

20
**Observations (contd.)**

Random accesses are only useful for objects in REXACT.

[Figure: top-k and candidate queues with [ws, bs] intervals for o1 to o5, with random accesses into L2. o5 is not in REXACT. Depending on which object receives the random access, precision is reduced or remains the same.]

21
**Random Accesses**

When to switch from SA to RA? First gather with sorted accesses, then probe with random accesses. RA is much more expensive than SA.

- Switching too early: not enough good candidates, RA is wasted.
- Switching too late: not enough RAs to prune the candidates.

22
**Random Accesses**

Switch from sorted to random: R= (1- )*S

- S: total cost of sorted accesses.
- R: total cost of random accesses.
- S+R > B

Which items to access? Do one RA on each candidate; maximize the expected score.

23
**Experimental Data**

- TREC Terabyte: 25M web pages; 50 queries with an average length of 3 words.
- IMDB: 375,000 movies; 20 queries, each with 4 attributes: {Title, Genre, Actors, Description}.
- Synthetic data: Zipf, #lists = [2,6], #objects = [10000,1000000].
- Aggregate function: Sum.

24
**Evaluation Methods**

Measures: percentage of optimal precision and SME, over the result sets Ralg, Ropt, and Rexact.

25
**Results- Sorted Accesses**

TREC, k=100. Less budget, more improvement.

26
Varied k: IMDB, B=400. Lower k, more improvement.

27
**Number of Lists**

Zipf, k=100, B=4000. More lists, more improvement.

28
**Results- Random Accesses**

[Plots: TREC, k=100, Cr=10; TREC, k=100, Cr=100.]

29
**Related Works**

- Minimize budget for optimal results: the algorithm computes the exact results with minimum cost (Bast et al. VLDB06, Bruno et al. ICDE02, Chang et al. SIGMOD02). This is the dual problem.
- Anytime top-k: the algorithm collects statistics during processing, which can be used to provide probabilistic guarantees at any time during processing (Arai et al. VLDB07). It does not do any optimizations.
- Approximate top-k: approximate results with probabilistic guarantees (Theobald et al. VLDB04, Fagin et al. 2001).

30
**Conclusions**

- First attempt to deal with budget constraints.
- For SA only, average precision is around 70%.
- There is a tradeoff between RAs and SAs: when the cost of an RA is relatively low, schedules that use RAs improve the results.

31
Thank You !

33
**Top-k query**

Given a set of n objects and m scoring lists sorted in decreasing order, find the top-k objects according to a scoring function f.

- Top-k: a set T of k objects such that f(rj1,…,rjm) ≤ f(ri1,…,rim) for every object Xi in T and every object Xj not in T.
- Assumption: the scoring function f is monotone, i.e. f(r1,…,rm) ≤ f(r1’,…,rm’) if ri ≤ ri’ for all i.
- Two access modes: sorted access (Cs) and random access (Cr).
- Objective: compute the top-k with the minimum cost.

34
**Sorted Accesses: Observations**

- An object with high scores has a higher potential to be part of the top-k.
- An object with “mediocre” scores does not help.

[Figure: lists L1, L2, L3 with entries (O1, SL1), (O1, SL2), (O1, SL3).]

Prefer high scores.

35
Example Wireless zone Q useless

36
**Applications**

- Mobile applications: highly impatient users, need fast results.
- Mediation systems: achieve high query throughput.
- Online analytics (e.g. logs).

37
**Motivating Example**

[Diagram: a user query reaches a mediator engine, which contacts servers.]

Query throughput: given #queries per time unit, allocate time for each query.

38
**Terminology**

- Sorted access
- Random access
- highi
- Top-k queue
- Candidates queue
- mink
- worstScore(d)
- bestScore(d)

39
**Efficient Offline Solution- Sorted**

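Goal: find a trace t that is within the budget and has optimal precision.

- L1: (o1, SL1), (o5, SL1), (o8, SL1), (o6, SL1)
- L2: (o1, SL2), (o2, SL2), (o5, SL2), (o4, SL2), (o3, SL2)

[Figure: interesting positions P1, P2 marked on L1 and L2.]

Traces for B=5: t1: 5; t2: 1, 4; t3: 2, 3; t4; t5; t6.

Denoted as ROPT.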

40
**Efficient Offline Solution- Sorted**

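Goal: find a trace t that is within the budget and has optimal precision.

- L1: (o1, SL1), (o5, SL1), (o8, SL1), (o6, SL1)
- L2: (o1, SL2), (o2, SL2), (o5, SL2), (o4, SL2), (o3, SL2)

[Figure: interesting positions P1, P2 marked on L1 and L2.]

Traces for B=5: t1: 5; t2: 1, 4; t3: 2, 3; t4; t5; t6.

Feasible for k up to 100 and m up to 10.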

41
**Efficient Offline Solution- Sorted**

Proof (by contradiction): Assume that t does not exist, and choose a trace s that is within the budget and has optimal precision. Construct s' with positions s'i, where each s'i is the largest position in Pi that is less than or equal to si. By construction, the score of any object under s is the same as under s'.

42
**Fair Heuristic**

- Assume budget = b.
- Runs in batches.

43
**Efficient Offline Solution- Random**

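Budget for RAs = B - |t| * Cs

Quantity used for each object o: best(o) - mink, where best(o) = worst(o) + RA.

[Figure: top-k queue and candidate entries (o9, S), (o5, S), (o7, S), (o8, S), … and (o1, S), (o2, S), (o3, S), (o4, S), (o10, S), (o14, S), …; d and Rexact are marked.]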

44
**Motivation**

Many applications work in budget-constrained environments. Still, they wish to perform top-k queries.

[Diagram: a user query reaches a mediator engine, which performs budget-aware query processing over servers.]

45
**Future work**

- Different access costs for different lists
- Time-aware top-k
- Top-k with budget constraints for P2P
