Minimal Probing: Supporting Expensive Predicates for Top-k Queries
Kevin C. Chang, Seung-won Hwang
Univ. of Illinois at Urbana-Champaign

Context: Top-k Queries
Ranked queries return top-k results, unlike Boolean queries
Crucial for retrieving data by “soft” conditions
– relevance: e.g., text search engines
– similarity: e.g., multimedia databases
– preference: e.g., e-commerce product search
Example scenario: preference query for finding a house:
– select h.id from house h where new(age), cheap(price, size), large(size) order by min(new,cheap,large) stop after 5
– annotations on the query: predicate; scoring function; k: retrieval size
Observation: Crucial to support expensive predicates

Problem: Expensive Predicates
Expensive predicates
– no pre-computed indexes for zero-time sorted access
– need a probe to evaluate each object (similar to sequential scan)
Unified abstraction for:
– user-defined functions: functional extensibility
  query conditions can be arbitrary, user-specific
  e.g., cheap(price,size)
– external predicates: data extensibility
  source interface may require one probe per object
  e.g., safe(zip): access crime rate from apbnews.com
– fuzzy joins
  associations of relations can be arbitrary
  e.g., close(house.zip, park.zip)

Current Limitations: “Sort-Merge” Framework
Require sorted access of search predicates.
To “simulate” sorted access, require complete probing
– are these probes necessary?
Goal: Minimize probe cost
Figure: sort step, merge step, top-k output, with F = min(new,cheap,large), k = 1
– predicates: new (search predicate), cheap (expensive predicate), large (expensive predicate)
– sorted lists after the sort step:
  d:0.90, a:0.85, b:0.78, c:0.75, e:0.70
  b:0.90, d:0.90, e:0.80, a:0.75, c:0.20
  a:0.90, b:0.80, c:0.70, d:0.60, e:0.50
– merge algorithm output (top 1): b:0.78
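The complete-probing baseline that this slide questions can be sketched as follows. This is only an illustration using the three score lists from the figure; the function name `complete_probing_top_k` is hypothetical, and because F = min is symmetric the sketch does not need to know which list belongs to which predicate.

```python
# Score lists copied from the slide's figure, keyed by object id.
lists = [
    {"d": 0.90, "a": 0.85, "b": 0.78, "c": 0.75, "e": 0.70},
    {"b": 0.90, "d": 0.90, "e": 0.80, "a": 0.75, "c": 0.20},
    {"a": 0.90, "b": 0.80, "c": 0.70, "d": 0.60, "e": 0.50},
]


def complete_probing_top_k(lists, k, F=min):
    """Evaluate every predicate on every object, then rank by F (hypothetical helper)."""
    objects = lists[0].keys()
    scored = [(F([scores[o] for scores in lists]), o) for o in objects]
    scored.sort(reverse=True)  # highest overall score first
    return [(o, s) for s, o in scored[:k]]


top1 = complete_probing_top_k(lists, k=1)

# With two expensive predicates and five objects, complete probing
# spends 2 * 5 = 10 probes before the merge step can even start.
probes_spent = 2 * len(lists[0])
```

Here `top1` is the single best object with its overall score, matching the merge output shown in the figure.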

Motivation: Solution Space
Assume sequential probing.
Algorithm skeleton:
  do:
    schedule next obj o, pred p
    probe pr(o,p)
  until (top-k identified)
Figure: grid of objects (a, b, c) by predicates (p1, p2, p3)

Our framework: Separate, Global Predicate Scheduling
Two important decisions on framework:
Separate predicate scheduling
– scheduling as separate “optimization” phase before probing
– avoid run-time scheduling overhead
Global predicate scheduling
– scheduling based on global info (predicate selectivities)
– lack of per-object information to justify per-object scheduling
– avoid per-object scheduling overhead
Simple framework and algorithm – and efficient!
– allow essentially A* framework, for given predicate schedule
– enable formal analysis: optimality, scalability

Simple Framework
Separate, global predicate scheduling
Algorithm skeleton:
  find global schedule H
  do:
    schedule next obj o
    probe pr(o, next(o,H))
  until (top-k identified)
Figure: grid of objects (a, b, c) by predicates (p1, p2, p3), with H=(p1,p2,p3)

Challenges for Minimizing Probing
Predicate scheduling before probing
– how to identify the best H?
Object scheduling during probing
– how to find next object to probe, for achieving “minimal probing” with respect to H?
Algorithm skeleton (open steps marked “?”):
  find global schedule H        ?
  do:
    schedule next obj o         ?
    probe pr(o, next(o,H))
  until (top-k identified)

Challenge 1: Object Scheduling
Goal: Perform only necessary probes
Necessary probes:
– A probe is necessary if top-k answers cannot be determined by any algorithm without it, regardless of the outcomes of other probes.
Question 1: Given a probe pr(o, next(o,H)), how to determine if it is necessary?
Probe-optimal algorithm
– An algorithm is probe-optimal if it performs only the necessary probes.
Question 2: How to identify necessary probes in order to design such an algorithm?

Question 1: Is this Probe Necessary?
k=1, F=min(x,p1,p2); suppose H=(p1,p2)
OID  x    p1  p2  F=min(x,p1,p2)
a    0.9  1   1   0.9   (top 1)
b    0.8  ?       0.8
c    0.7  1   1   0.7
d    0.6  1   1   0.6
e    0.5  1   1   0.5
Maybe Not!

Question 1: Is this Probe Necessary?
k=1, F=min(x,p1,p2); suppose H=(p1,p2)
OID  x    p1  p2  F=min(x,p1,p2)
a    0.9  ?       0.9   (top 1?)
b    0.8  1   1   0.8
c    0.7  1   1   0.7
d    0.6  1   1   0.6
e    0.5  1   1   0.5
Necessary!
Theorem: Probe pr(o,p) is absolutely necessary, if o is among the current top-k in terms of ceiling scores.
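The theorem's test can be sketched in a few lines. This is a minimal illustration, assuming that a ceiling score is obtained by substituting the maximum score 1.0 for every unprobed predicate and that F is monotonic; the names `ceiling_score`, `is_necessary`, and the `state` layout are hypothetical, and ties are broken arbitrarily.

```python
def ceiling_score(x, probed, n_predicates, F=min):
    """Upper bound on an object's final score: unprobed predicates count as 1.0."""
    unknown = n_predicates - len(probed)
    return F([x] + list(probed) + [1.0] * unknown)


def is_necessary(oid, state, k, n_predicates, F=min):
    """True if oid still has an unprobed predicate and ranks in the top k by ceiling."""
    if len(state[oid][1]) == n_predicates:
        return False  # fully probed, nothing left to ask
    ranked = sorted(
        state,
        key=lambda o: ceiling_score(state[o][0], state[o][1], n_predicates, F),
        reverse=True,
    )
    return oid in ranked[:k]


# state maps object id -> (x score, list of expensive-predicate scores probed so far).
# The x scores are the ones shown in the slide's table; nothing is probed yet.
state = {"a": (0.9, []), "b": (0.8, []), "c": (0.7, []), "d": (0.6, []), "e": (0.5, [])}

probe_a_needed = is_necessary("a", state, k=1, n_predicates=2)  # a has the top ceiling
probe_b_needed = is_necessary("b", state, k=1, n_predicates=2)  # b does not (yet)
```

Whether a probe on b becomes necessary later depends on what the probes on a reveal, which is why the check is repeated against the current state.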

Question 2: Probe-optimal object scheduling
Objects in current top-k must be further probed
Probe-optimal object scheduling: Algorithm MPro
– use a priority queue with ceiling scores as priorities
Queue trace (k=1):
  initial:                a:0.9   b:0.8   c:0.7  d:0.6  e:0.5
  after pr(a,p1) = 0.85:  a:0.85  b:0.8   c:0.7  d:0.6  e:0.5
  after pr(a,p2) = 0.75:  b:0.8   a:0.75  c:0.7  d:0.6  e:0.5
  after pr(b,p1) = 0.78:  b:0.78  a:0.75  c:0.7  d:0.6  e:0.5
  after pr(b,p2) = 0.90:  b:0.78  a:0.75  c:0.7  d:0.6  e:0.5
  top 1: b:0.78
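The queue trace can be reproduced with a short sketch of the priority-queue loop. This is not the authors' implementation, only a minimal version under stated assumptions: F is monotonic, unprobed predicates are bounded by 1.0, and ties on ceiling score are broken by object id. The function name `mpro` and its signature are hypothetical; the scores for c, d, and e are taken from the sorted lists on the earlier Sort-Merge slide.

```python
import heapq


def mpro(x_scores, schedule, k, F=min):
    """Probe only objects at the head of a ceiling-ordered queue (sketch).

    x_scores: {oid: score from the sorted-access predicate}
    schedule: list of probe functions oid -> score, in global schedule order H
    Returns (top-k list of (oid, score), list of probes performed).
    """
    m = len(schedule)
    # heapq is a min-heap, so ceilings are stored negated.
    heap = [(-F([x] + [1.0] * m), oid, [x]) for oid, x in x_scores.items()]
    heapq.heapify(heap)
    results, probes = [], []
    while heap and len(results) < k:
        neg_ceiling, oid, known = heapq.heappop(heap)
        done = len(known) - 1  # number of expensive predicates already probed
        if done == m:
            # Fully probed and still on top: its score is final, so output it.
            results.append((oid, -neg_ceiling))
            continue
        known = known + [schedule[done](oid)]  # next probe per schedule H
        probes.append((oid, done))
        ceiling = F(known + [1.0] * (m - done - 1))
        heapq.heappush(heap, (-ceiling, oid, known))
    return results, probes


x = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.5}
p1 = {"a": 0.85, "b": 0.78, "c": 0.75, "d": 0.90, "e": 0.70}
p2 = {"a": 0.75, "b": 0.90, "c": 0.20, "d": 0.90, "e": 0.80}

results, probes = mpro(x, [p1.get, p2.get], k=1)
```

On this data the loop performs the four probes listed in the trace (two on a, two on b) and never touches c, d, or e, compared with ten probes under complete probing.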

Challenge 2: Predicate Scheduling
Scheduling problem
– find minimal cost schedule from permutations
Challenges
– selectivity estimation: dynamic predicates; aggregate selectivities (context-dependent)
– scheduling computation: NP-hard
Our approach:
– on-line sampling to estimate selectivities
– greedy selection to schedule predicates
0.1% sampling achieves almost the best schedule
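One way to picture sampling plus greedy selection is sketched below. The slide does not spell out the procedure, so everything here is an assumption made for illustration: the "selectivity" of a predicate is approximated as the fraction of still-surviving sample objects it pushes below a score threshold (a stand-in for the k-th best score, which suits F = min), and the greedy rule picks the predicate with the highest filtering rate per unit cost. The name `greedy_schedule` and all parameters are hypothetical.

```python
import random


def greedy_schedule(objects, predicates, costs, threshold, sample_size, seed=0):
    """Order predicates greedily from a random sample (illustrative assumption).

    predicates: {name: function obj -> score in [0, 1]}
    costs:      {name: cost of one probe}
    """
    rng = random.Random(seed)
    sample = rng.sample(objects, min(sample_size, len(objects)))
    # Probe every predicate on the (small) sample only.
    scores = {name: [p(o) for o in sample] for name, p in predicates.items()}
    alive = list(range(len(sample)))  # sample objects not yet filtered out
    remaining = set(predicates)
    schedule = []
    while remaining:
        def rank(name):
            # Filtering rate among surviving sample objects, per unit cost.
            filtered = sum(1 for i in alive if scores[name][i] < threshold)
            rate = filtered / len(alive) if alive else 0.0
            return rate / costs[name]

        best = max(sorted(remaining), key=rank)  # sorted() makes ties deterministic
        schedule.append(best)
        remaining.remove(best)
        alive = [i for i in alive if scores[best][i] >= threshold]
    return schedule


objs = list(range(10))
preds = {"selective": lambda o: 0.1 if o < 8 else 0.9, "weak": lambda o: 0.9}
order = greedy_schedule(objs, preds, {"selective": 1.0, "weak": 1.0},
                        threshold=0.5, sample_size=10)
```

Re-estimating the filtering rate on the surviving objects after each pick is what makes the estimate context-dependent: a predicate that looks selective in isolation may add little once an earlier predicate has already filtered the same objects.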

Experiment Results
Practical performance of MPro
– cost proportional to the retrieval size k
– significant speedup for small k
Impact of performance factors
– database size: sublinear cost scalability
– score distribution and scoring function: see paper
Figure labels: 6 hour, 2 min

Demo: House Search
Data: All houses on sale in Illinois (N=20990)
– from www.realtor.com
– objects: house(id, price, size, bed, bath, zip, city)
Query: F = Average(n, c, r)
– n (nearcity): close to Chicago
– c (cheap): “reasonable” price for its size
– r (roomy): prefer 4-6 rooms

Summary of Contributions (more in the paper)
Abstraction:
– for user-defined, external, and fuzzy join predicates
Framework and algorithm:
– sampling-based global scheduling
– probe-optimal algorithm MPro
– extensions of MPro: fuzzy joins, parallel MPro, approximation
Principles/Theorems:
– necessary-probe principle
– probe-optimality of MPro
– analytical scalability of MPro
Extensive experiments

Thank You!

Parallel MPro: Overview
Probe-parallel MPro
– Probe k necessary probes concurrently
– Up to k-fold speedup
Data-parallel MPro
– Partition data into s chunks
– Up to s-fold speedup
Figure labels: MPro, Merge, top-k
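The data-parallel idea (partition, find a local top-k per chunk, merge) can be sketched as below. The per-chunk step uses a brute-force `heapq.nlargest` as a stand-in for running MPro on each chunk, which is an assumption made to keep the sketch short; the function names and the thread pool are likewise illustrative choices, not details from the slides.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def local_top_k(chunk, score, k):
    """Top-k of one chunk as (score, object) pairs; stand-in for per-chunk MPro."""
    return heapq.nlargest(k, ((score(o), o) for o in chunk))


def data_parallel_top_k(objects, score, k, s):
    """Partition into s chunks, compute local top-k concurrently, then merge."""
    chunks = [objects[i::s] for i in range(s)]  # round-robin partition
    with ThreadPoolExecutor(max_workers=s) as pool:
        partials = list(pool.map(lambda c: local_top_k(c, score, k), chunks))
    # Any global top-k object is also in its own chunk's top-k,
    # so merging the s local lists is sufficient.
    merged = heapq.nlargest(k, (pair for part in partials for pair in part))
    return [(o, sc) for sc, o in merged]


objects = list(range(100))
score = lambda o: (o * 37 % 101) / 101  # arbitrary distinct scores for illustration
top5 = data_parallel_top_k(objects, score, k=5, s=4)
```

The merge only has to look at at most s * k candidates, which is why the expensive work parallelizes across chunks.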

Scalability
Figure panels: k=100, N=1000; k=1000, N=10000; k=10000, N=100000

Comparison
