Factorizing Complex Predicates in Queries to Exploit Indexes Prasanna Ganesan* Stanford University Surajit Chaudhuri Sunita Sarawagi* Microsoft Research.

1 Factorizing Complex Predicates in Queries to Exploit Indexes Prasanna Ganesan* Stanford University Surajit Chaudhuri Sunita Sarawagi* Microsoft Research IIT Bombay *Work done at Microsoft Research

2 Motivation Complex, redundant WHERE clauses –Application-generated decision-support queries May result in unsatisfactory plans –So many candidate plans, so little time –Conversion to normal form doesn’t work –Often end up with a table scan Goal: Techniques for efficiently identifying access paths for such complex WHERE clauses

3 Outline Why is the problem challenging? –CNF or DNF does not avoid redundancy –Plan space is large –Formal problem statement –Challenges Factorization –Basic factorization: “largest common conjunctive factor” –Factorization involving union Approximate Factorization Experiments

4 Basic Primitives (Index Intersection) SELECT addr FROM consumers WHERE (income>100000) AND (zipcode=94305) Plan: Seek(income>100000) and Seek(zipcode=94305) feed an Index intersect (∩), followed by Lookup(addr)

5 Basic Primitives (Index Union) SELECT addr FROM consumers WHERE (income>100000) OR (zipcode=94301) Plan: Seek(income>100000) and Seek(zipcode=94301) feed an Index union (∪), followed by Lookup(addr)
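
The two primitives on slides 4 and 5 can be sketched as set operations on record IDs (RIDs). This is a minimal illustration, not the optimizer's implementation: the `consumers` rows, the `seek` and `lookup` helpers, and the in-memory dictionary standing in for indexes are all assumptions made for the example.

```python
# Hypothetical in-memory stand-in for the consumers table: RID -> row.
consumers = {
    1: {"addr": "12 Oak St", "income": 150000, "zipcode": 94305},
    2: {"addr": "34 Elm St", "income": 80000, "zipcode": 94305},
    3: {"addr": "56 Pine St", "income": 120000, "zipcode": 94301},
    4: {"addr": "78 Fir St", "income": 50000, "zipcode": 94301},
}


def seek(predicate):
    """Stand-in for an index seek: the set of RIDs satisfying one predicate."""
    return {rid for rid, row in consumers.items() if predicate(row)}


def lookup(rids, column):
    """Stand-in for a data lookup: fetch one column for each RID."""
    return [consumers[rid][column] for rid in sorted(rids)]


high_income = seek(lambda row: row["income"] > 100000)
in_94305 = seek(lambda row: row["zipcode"] == 94305)
in_94301 = seek(lambda row: row["zipcode"] == 94301)

# Index intersection: RIDs present in both seek results (AND).
intersect_plan = lookup(high_income & in_94305, "addr")

# Index union: RIDs present in either seek result (OR).
union_plan = lookup(high_income | in_94301, "addr")
```

In both plans the base rows are touched only once, after the RID lists have been combined.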

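6 Index Intersection and Union (A AND B) OR (A AND C) = AB+AC = A(B+C) [Plan diagram: Seek(A), Seek(B), Seek(A), Seek(C) combined by index intersections and a union, then Data Lookup; shown next to the factored form A(B+C)]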

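7 Index Intersection and Union (A OR B) AND (A OR C) = (A+B)(A+C) = A+BC [Plan diagram: Seek(A), Seek(B), Seek(A), Seek(C) combined by index unions and an intersection, then Data Lookup; shown next to the factored form A+BC]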
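
The two rewritings on slides 6 and 7 are the Boolean distributive laws. A quick sanity check, written as a small sketch (the helper name `equivalent` is an assumption for this example), enumerates all truth assignments to A, B, and C:

```python
from itertools import product


def equivalent(f, g, nvars=3):
    """True if f and g agree on every assignment of nvars Boolean inputs."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=nvars))


# AB + AC == A(B + C)
and_over_or = equivalent(
    lambda a, b, c: (a and b) or (a and c),
    lambda a, b, c: a and (b or c),
)

# (A + B)(A + C) == A + BC
or_over_and = equivalent(
    lambda a, b, c: (a or b) and (a or c),
    lambda a, b, c: a or (b and c),
)
```

Each factored form mentions A once, so a plan built from it needs one Seek(A) instead of two.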

8 The Problem Given a relation R, find a plan to retrieve π_S(σ_P(R)) using one or more of –Table scan –Index seeks/scans –Index intersections –Index unions –Data lookup from RID lists Focus on single-table selection –Naturally extends to arbitrary queries

9 Challenges Understanding the set of all feasible plans –Many equivalent rewritings –Can we rewrite to retrieve a superset? Identifying the “best” plan –Different index characteristics (impacts access cost) –Different selectivities (impacts intersection/union costs as well)

10 Roadmap [Diagram with axes Plan Complexity, Expn. Format, and Exact vs Approximate; labels: DNF, CNF, Intersection, +One Union, Arbitrary]

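11 Basic Factorization [AND-OR tree for (ABC + AC) AND (DE + DF), annotated bottom-up with common conjunctive factors: AND leaves {A,B,C}, {A,C}, {D,E}, {D,F}; OR nodes {A,C} and {D}; root AND {A,C,D}]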

12 Basic Factorization(Contd.) We now have a conjunctive factor (A AND C AND D) Use standard optimizer module to find plan for this factor –Table scan, index seek or index intersection –Typically a greedy algorithm based on index costs and selectivity Evaluate remaining conditions as a filter
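
A minimal sketch of how the factor sets on slide 11 could be computed bottom-up. The rule used here (an AND node takes the union of its children's factor sets, an OR node takes their intersection) is inferred from the slide's annotations rather than quoted from the paper, and the tuple-based tree encoding is an assumption made for the example.

```python
def common_factor(node):
    """Set of predicates that must hold whenever the expression holds.

    A node is either a predicate name (str) or a tuple (op, children)
    with op in {"AND", "OR"}.
    """
    if isinstance(node, str):
        return {node}
    op, children = node
    factors = [common_factor(child) for child in children]
    if op == "AND":
        # Every child must hold, so every child's factors hold.
        return set().union(*factors)
    if op == "OR":
        # Only predicates common to all alternatives are guaranteed.
        return set.intersection(*factors)
    raise ValueError(f"unknown operator: {op}")


# (ABC + AC) AND (DE + DF), the tree from slide 11.
tree = ("AND", [
    ("OR", [("AND", ["A", "B", "C"]), ("AND", ["A", "C"])]),
    ("OR", [("AND", ["D", "E"]), ("AND", ["D", "F"])]),
])

factor = common_factor(tree)
```

The resulting set is the conjunctive factor (A AND C AND D) that slide 12 hands to the standard optimizer module; the rest of the expression is applied as a filter.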

13 Introducing the Union If query has conjunctive factors, simple factorization usually suffices Many queries don’t have such factors –Need to explore index unions Consider plans with at most one union –No index intersection above it –Sufficient for large set of practical queries –Limited space allows optimal algorithms

14 Single-Union Plans (1) Assume expression in Disjunctive Normal Form (DNF) E.g. E = ABC+ACD+ADG+DGH Consider factorizing E as f.Q+R –Find intersection plan for f –Recursively find single-union plan for R –Merge the two plans (re-use R’s union if it exists) [Diagram labels: AC, DG, A]

15 Single-Union Plans (2) E = ABC+ACD+ADG+DGH = f.Q+R Say f=AC. Then E = AC(B+D)+ADG+DGH, with Q = B+D and R = ADG+DGH. Recursively factorize R into DG(A+H) [Plan diagram: Seek(A) with Lookup(Filter C(B+D)); Seek(G) and Seek(D) with Lookup(Filter A+H); Lookup(Filter E)]
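
The split E = f.Q + R from slide 15 can be reproduced mechanically. This sketch only performs the algebraic split for a factor f that is supplied by hand; it does not choose f or cost any plan. The representation (each DNF term as a frozenset of predicate names) and the helpers `factor_out` and `show` are assumptions for illustration.

```python
def factor_out(terms, f):
    """Split DNF terms into (Q, R) such that E = f.Q + R.

    Q holds the terms that contain f, with f removed; R holds the rest.
    A term equal to f itself would leave an empty term in Q, meaning TRUE.
    """
    f = frozenset(f)
    q = [t - f for t in terms if f <= t]
    r = [t for t in terms if not f <= t]
    return q, r


def show(terms):
    """Render DNF terms in the slide's notation, e.g. 'ADG+DGH'."""
    return "+".join("".join(sorted(t)) for t in terms)


# E = ABC + ACD + ADG + DGH
E = [frozenset(t) for t in ("ABC", "ACD", "ADG", "DGH")]

q1, r1 = factor_out(E, "AC")   # E = AC(B+D) + ADG + DGH
q2, r2 = factor_out(r1, "DG")  # R = DG(A+H), nothing left over
```

Here `show(q1)` gives `B+D`, `show(r1)` gives `ADG+DGH`, and `show(q2)` gives `A+H`, matching the slide.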

16 Single-Union Plans (3) Cost(E) ~ min over all factorizations E = f.Q+R of (cost(f.Q) + cost(R)) –Natural dynamic-programming formulation –Real equation slightly more complex –Cost is exponential Use a greedy alternative –Choose the f that provides greatest cost reduction without further factorizing R

17 Other Expression Forms Conjunctive Normal Form(CNF) –Can just use one term –Multiply terms E.g. (A+B)(A+C) => A+BC –Recursive algorithm in paper General AND-OR trees –Bottom-up algorithm –Applies DNF algorithm to OR nodes and CNF algorithm to AND nodes

18 Approximate Factoring Often, predicates are similar but not identical –A(X BETWEEN 1 AND 100) + B(X BETWEEN 10 AND 110) –Like to exploit similarity of X predicates Relax both X predicates to (X BETWEEN 1 AND 110) –Resulting query is more general (assuming no NOTty problems)
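
A small sketch of the relaxation on slide 18, assuming closed integer ranges and treating A and B as opaque Boolean predicates. The function names and the brute-force check are illustrative assumptions; the point is only that the relaxed query returns a superset of the original, so the original expression still has to be applied as a filter.

```python
def relax(r1, r2):
    """Smallest single range covering both (low, high) ranges."""
    return (min(r1[0], r2[0]), max(r1[1], r2[1]))


def between(x, rng):
    return rng[0] <= x <= rng[1]


x_a, x_b = (1, 100), (10, 110)
relaxed = relax(x_a, x_b)


def original(a, b, x):
    # A(X BETWEEN 1 AND 100) + B(X BETWEEN 10 AND 110)
    return (a and between(x, x_a)) or (b and between(x, x_b))


def relaxed_query(a, b, x):
    # After relaxation the X predicate is a common factor: X'(A + B)
    return between(x, relaxed) and (a or b)


cases = [(a, b, x) for a in (False, True) for b in (False, True)
         for x in range(0, 121)]

# Every row accepted by the original is accepted by the relaxed query.
is_superset = all(relaxed_query(*c) for c in cases if original(*c))

# Rows the relaxed query accepts but the original rejects.
false_positives = [c for c in cases if relaxed_query(*c) and not original(*c)]
```

For instance, a row with A true, B false, and X = 105 passes the relaxed query but not the original, which is the false-positive cost that slide 19 weighs against the factoring benefit.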

19 Challenge What predicates do we relax? –Trade-off between factoring benefit and cost of false positives Rule 1 of relaxation is: –We do not talk about Fight Club. Find “best” set of range predicates to relax for each attribute –Then select the “best” attribute Don’t relax irrelevant predicates

20 Finding predicates to relax Given expression with range predicates involving attribute X. –Find which predicates to relax for greatest plan improvement. Turns out a greedy algorithm is optimal for many cost functions –Proof in paper appendix –Useful as a heuristic even otherwise

21 Key Idea Relax a pair of predicates if computed to be beneficial Repeat treating the relaxed query as the original query Trick is in figuring out when a relaxation is beneficial –Original predicates are treated slightly differently from relaxed predicates –Details in paper

22 Experiments Experiments on SQL Server 2000 Factorizing done in stand-alone module –Did I hear someone say SQL is declarative? Queries on UCI Machine Learning and UCI KDD data. –Table sizes ~ 1 million rows 15 workloads –Mostly DNF queries (#terms:1 to >100)

23 Reduction in Running Time

24 Impact of Factorization

25 Related Work Optimization of complex WHERE clause –Convert to CNF/DNF [Selinger79, Dayal87] –Using multiple indexes [Mohan90] No factorization –Using smarter indexes [Leslie95] Factorization a popular idea in other domains –Compilers [Reinwald66], VLSI Design [Brayton87]

26 Conclusion Our contributions –Using factorization to optimize queries Efficient algorithms requiring no normalization Staged to reduce compile-time overhead –Introduced approximate factoring Algorithm for optimal relaxation –Integration into overall optimization framework

