Primitives for Workload Summarization and Implications for SQL. Prasanna Ganesan* (Stanford University), Surajit Chaudhuri and Vivek Narasayya (Microsoft Research).


1 Primitives for Workload Summarization and Implications for SQL
Prasanna Ganesan* (Stanford University)
Surajit Chaudhuri, Vivek Narasayya (Microsoft Research)
*Work done at Microsoft Research

2 Motivation
Workload: a set of SQL statements
Many tasks exploit workload information
–DB administration, index tuning, statistics building, approximate query processing
DBMS profilers produce large workloads (plus additional info)
Most tasks need small workloads
Goal: Summarization. Find a “representative” subset of a given, large workload
–Sometimes a weighted subset

3 Why Not Random Sampling?
One size does not fit all
–Different definitions of “representative subset”
–Random sampling may lose valuable info
Ignores additional info associated with statements
Shown to work poorly, e.g., for index selection [chaudhuri02]
–May oversample queries on some tables, while ignoring less frequent queries on other tables

4 Our Solution
1. Treat input as a relation
–Each SQL statement (+associated info) is a tuple
2. Extend SQL with new language primitives
–Allow declarative specification of desired subset
–Usable on arbitrary relations, not just workloads
3. Implement extensions inside query engine
–Why? Primitives appear widely applicable
–Other implementation options available

5 The Architecture
SELECT *, DOMSUM(Count)
FROM WkldTbl
DOMINATE WITH PARTITIONING BY FromTables, JoinConds, WhereCols
(SLAVE.GroupByCols ⊆ MASTER.GroupByCols) AND (SLAVE.OrderByCols PREFIX MASTER.OrderByCols)
REPRESENT WITH PARTITIONING BY FromTables, JoinConds, WhereCols
MAXIMIZING SUM(DOM_Count)
GLOBAL CONSTRAINT Count(*) ≤ 200
LOCAL CONSTRAINT Count(*) ≥ int(200*LOCAL.Count(*)/GLOBAL.Count(*))
[Figure: architecture diagram with the labels Execution Engine, Summary, Application]

6 Outline
New Primitives for Summarization (Subsetting)
–Dominance
–Representation
Implementing Summarization Primitives in SQL
Experiments

7 Dominance
Idea: Filter and aggregate using a partial order on tuples
Specify condition for one tuple to dominate another
–Transitive condition
–Encapsulates application knowledge
Output: Keep throwing away tuples that are dominated
–Retain aggregate info about dominated tuples
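The dominance idea above can be sketched in Python as a naive pairwise filter. This is a minimal illustration, not the authors' implementation: the function name `dominate`, the `dom_` output prefix, the vendor rows, and the rule that a survivor's aggregate covers itself plus every tuple it dominates are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

def dominate(rows: List[Row],
             dominates: Callable[[Row, Row], bool],
             agg_col: str = "count") -> List[Row]:
    """Naive O(n^2) dominance filter (hypothetical helper).

    Drops every tuple dominated by some other tuple and attaches to each
    survivor the sum of agg_col over itself and the tuples it dominates.
    The dominates predicate is assumed to be transitive.
    """
    n = len(rows)

    def beats(i: int, j: int) -> bool:
        if i == j or not dominates(rows[i], rows[j]):
            return False
        # If two tuples dominate each other, keep only the earlier one.
        return (not dominates(rows[j], rows[i])) or i < j

    survivors = [j for j in range(n) if not any(beats(i, j) for i in range(n))]
    out = []
    for s in survivors:
        total = sum(rows[t][agg_col] for t in range(n)
                    if t == s or dominates(rows[s], rows[t]))
        out.append({**rows[s], "dom_" + agg_col: total})
    return out

# Hypothetical data: a vendor dominates another if it offers
# at least the same quality at no higher price.
vendors = [
    {"vendor": "V1", "quality": 7, "price": 20, "count": 1},
    {"vendor": "V2", "quality": 5, "price": 25, "count": 1},
    {"vendor": "V3", "quality": 9, "price": 40, "count": 1},
    {"vendor": "V4", "quality": 4, "price": 10, "count": 1},
]

def better(m: Row, s: Row) -> bool:
    return m["quality"] >= s["quality"] and m["price"] <= s["price"]

result = dominate(vendors, better)
```

Here V1 dominates V2, so V2 is dropped and V1 carries a `dom_count` of 2; V3 and V4 are incomparable with the rest and survive with a `dom_count` of 1.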

8 A Graphical Representation
[Figure: example table with columns Vendor, Quality, Price (rows include Buono and Cattivo); cell values not recoverable]

9 Applying Dominance to Workloads
Example: Index Selection
–An index useful for Q1 is likely to be useful for Q2
Q1: SELECT ... FROM R GROUP BY A, B, C
Q2: SELECT ... FROM R GROUP BY A, B
Q1 dominates Q2
MASTER.FromTables = SLAVE.FromTables AND MASTER.GroupByCols ⊇ SLAVE.GroupByCols AND MASTER.OrderByCols PREFIX SLAVE.OrderByCols
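The workload dominance condition can be written as an ordinary predicate. The sketch below is illustrative only: the `Stmt` class and its field names are hypothetical, and the ORDER BY test assumes the intended meaning is that the dominated statement's ORDER BY columns form a prefix of the dominating statement's.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class Stmt:
    """Hypothetical parsed view of one workload statement."""
    from_tables: FrozenSet[str]
    group_by: FrozenSet[str] = frozenset()
    order_by: Tuple[str, ...] = ()

def is_prefix(short: Tuple[str, ...], long: Tuple[str, ...]) -> bool:
    # True when `short` equals the leading elements of `long`.
    return long[:len(short)] == short

def dominates(master: Stmt, slave: Stmt) -> bool:
    # Same tables, slave's GROUP BY columns contained in master's,
    # and slave's ORDER BY columns a prefix of master's (assumed reading).
    return (master.from_tables == slave.from_tables
            and slave.group_by <= master.group_by
            and is_prefix(slave.order_by, master.order_by))

q1 = Stmt(from_tables=frozenset({"R"}), group_by=frozenset({"A", "B", "C"}))
q2 = Stmt(from_tables=frozenset({"R"}), group_by=frozenset({"A", "B"}))
q3 = Stmt(from_tables=frozenset({"R"}), order_by=("A", "B"))
q4 = Stmt(from_tables=frozenset({"R"}), order_by=("A",))
```

With these definitions `dominates(q1, q2)` holds and `dominates(q2, q1)` does not, matching the Q1/Q2 example on the slide.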

10 Outline
New Primitives for Summarization (Subsetting)
–Dominance
–Representation
Implementing Summarization Primitives in SQL
Experiments

11 Representation
Dominance only gets us so far
–Need a “lossier” way to select a subset
Idea: Pick a subset that solves a Linear Program
–Optimize some criterion
–Satisfy lots of constraints
–Support concept of partitioning

12 Details
Partition tuples by a set of attributes
Criterion: Maximize/Minimize Aggregate
–E.g., Minimize Count(*)
Global Constraints
–E.g., Sum(B) in chosen subset > 60% of Sum(B) in input
Local Constraints: apply to each partition
–E.g., Sum(B) in chosen subset > 40% of Sum(B) in that partition

13 An Index Selection Example
Partition by Tables, Join Conditions and attributes in WHERE clause
Criterion: Maximize Sum(ExecutionCost)
–Need best “coverage”
Global Constraint: Count(*) ≤ 200
Local Constraint: Proportionate representation
–A partition with 20% of input should have 20% of output
–Count(*) ≥ int(200*LOCAL.Count(*)/GLOBAL.Count(*))
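As a worked example of the proportionate local constraint, the per-partition lower bound int(200*LOCAL.Count(*)/GLOBAL.Count(*)) can be computed as below. The function name `local_quotas` and the partition labels P1 to P3 are hypothetical, and integer division stands in for `int(...)`, which gives the same result for non-negative counts.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable

def local_quotas(partition_keys: Iterable[Hashable], limit: int) -> Dict[Hashable, int]:
    """Lower bound on output tuples per partition (hypothetical helper).

    Mirrors int(limit * LOCAL.Count(*) / GLOBAL.Count(*)) for each partition.
    """
    sizes = Counter(partition_keys)
    total = sum(sizes.values())
    return {key: (limit * size) // total for key, size in sizes.items()}

# A workload of 1000 statements split 20% / 50% / 30% across three partitions.
keys = ["P1"] * 200 + ["P2"] * 500 + ["P3"] * 300
quotas = local_quotas(keys, limit=200)
```

Because each quota is rounded down, the quotas never add up to more than the global limit, so the local and global constraints can be satisfied together.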

14 Putting it all together
1. Apply dominance criterion (as earlier).
2. Apply representation (as earlier, but maximize SUM(DOM_Count)).
3. Weight each tuple by the number of tuples it dominates.
SELECT SqlString, DOMSUM(Count)
FROM WkldTbl
DOMINATE WITH PARTITIONING BY FromTables, JoinConds, WhereCols
(SLAVE.GroupByCols ⊆ MASTER.GroupByCols) AND (SLAVE.OrderByCols PREFIX MASTER.OrderByCols)
REPRESENT WITH PARTITIONING BY FromTables, JoinConds, WhereCols
MAXIMIZING SUM(DOM_Count)
GLOBAL CONSTRAINT Count(*) ≤ 200
LOCAL CONSTRAINT Count(*) ≥ int(200*LOCAL.Count(*)/GLOBAL.Count(*))

15 Outline
New Primitives for Summarization (Subsetting)
–Dominance
–Representation
Implementing Summarization Primitives in SQL
Experiments

16 Implementing Summarization Primitives in SQL
Assume set and sequence support in SQL
–The mills of the standards bodies…
Partitioning useful for both primitives
–Hashing, Sort-based, Index-based…
Implementing Dominance
–Naïve O(n^2) algorithm
–Techniques from group-wise processing
–Leverage Skyline optimizations

17 Representation
Implementing directly is LP-hard
Many queries are much simpler
–Fall into one of two special cases
Other queries are handled by a simple heuristic
–User-guided search
Implement as multiple operators

18 User-Guided Search
Scan tuples in a specific order
–User-specified, or heuristically chosen
Will always minimize/maximize Count(*)
–Use ordering to transform other objectives
–Slightly different algorithms for the two cases
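One plausible reading of the minimization case is a single ordered scan that keeps a tuple only while it helps a constraint that is still violated. The slides do not spell out the algorithm, so the sketch below is an assumption throughout: the function `guided_min_subset`, the restriction to lower-bound SUM constraints, and the sample rows are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]

def guided_min_subset(rows: List[Row],
                      order_key: Callable[[Row], object],
                      lower_bounds: Dict[str, float]) -> Tuple[List[Row], bool]:
    """Greedy scan aiming for a small subset (hypothetical sketch).

    Scans rows in the given order and keeps a row only if it contributes
    to some constraint SUM(col) >= bound that is not yet satisfied.
    Returns the chosen rows and whether every bound was met.
    """
    totals = {col: 0 for col in lower_bounds}
    chosen: List[Row] = []
    for row in sorted(rows, key=order_key):
        violated = [c for c, bound in lower_bounds.items() if totals[c] < bound]
        if not violated:
            break  # every constraint is satisfied, stop scanning
        if any(row[c] > 0 for c in violated):
            chosen.append(row)
            for c in totals:
                totals[c] += row[c]
    satisfied = all(totals[c] >= bound for c, bound in lower_bounds.items())
    return chosen, satisfied

rows = [
    {"id": "A", "b": 50, "c": 0},
    {"id": "B", "b": 30, "c": 0},
    {"id": "C", "b": 10, "c": 40},
    {"id": "D", "b": 15, "c": 60},
    {"id": "E", "b": 20, "c": 0},
]
# User-specified order: largest b first. Constraints: SUM(b) >= 75, SUM(c) >= 50.
chosen, satisfied = guided_min_subset(rows, lambda r: -r["b"], {"b": 75, "c": 50})
```

In this run A and B are kept for the constraint on `b`, E is skipped because it adds nothing to the still-violated constraint on `c`, D is kept, and the scan stops before C. The result depends on the scan order, which is why the choice of ordering matters in this style of search.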

19 A Minimization Example
[Figure: tuples A to F with the legend Satisfied, Violated, Output]

20 Two Special Cases
Maximize SUM(Attr)
–All constraints are on Count(*)
–Use partitioning and sort-order access
Minimize Count(*)
–Single constraint: Again easily solved
–More special cases also solvable
–Multiple constraints: Approximation algorithm
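The first special case, maximizing SUM(Attr) when every constraint is on Count(*), can be sketched with partitioning and sorted access. This is an illustration under stated assumptions rather than the operator described in the talk: the function name `max_sum_subset` is hypothetical, attribute values are assumed non-negative, each quota is assumed to fit in its partition, and the quotas are assumed to sum to at most the global limit.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List

Row = Dict[str, object]

def max_sum_subset(rows: List[Row],
                   part_key: Callable[[Row], Hashable],
                   attr: str,
                   limit: int,
                   quotas: Dict[Hashable, int]) -> List[Row]:
    """Pick at most `limit` rows maximizing SUM(attr) (hypothetical sketch).

    Each partition must contribute at least quotas[partition] rows.
    """
    parts: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        parts[part_key(row)].append(row)

    chosen: List[Row] = []
    rest: List[Row] = []
    for key, members in parts.items():
        # Sorted access within the partition: best rows first.
        members.sort(key=lambda r: r[attr], reverse=True)
        q = quotas.get(key, 0)
        chosen.extend(members[:q])   # satisfy the local Count(*) lower bound
        rest.extend(members[q:])

    # Fill any remaining global slots with the best leftover rows.
    rest.sort(key=lambda r: r[attr], reverse=True)
    chosen.extend(rest[:max(0, limit - len(chosen))])
    return chosen

rows = [{"part": "X", "cost": c} for c in (9, 8, 7, 6)] + \
       [{"part": "Y", "cost": c} for c in (3, 2)]
picked = max_sum_subset(rows, lambda r: r["part"], "cost",
                        limit=4, quotas={"X": 2, "Y": 1})
```

Partition Y gets its one guaranteed slot (cost 3) even though every row in X is more expensive; the remaining slot goes to the best leftover row (cost 7).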

21 Experiments
Evaluate utility for index selection
Compare to sophisticated Workload Compression [chaudhuri02]
–Clusters using a complex distance function
Simple query as described earlier
–Constrained to output same number of statements as Workload Compression
–Orders of magnitude faster
TPC-H 1GB database
–Multiple synthetic workloads introduced in [chaudhuri02]

22 Experiments (Contd.)
[Figure: evaluation pipeline with the labels Workload, Compress, Tuning Wizard, Evaluate Total Estimated Cost]

23 Comparing Estimated Costs

24 Conclusion
Our contributions
–Summarization can be expressed declaratively
–Introduction of new operators for summarization
–Discussion of SQL implementation
The Future
–An automatic monitoring and tuning infrastructure?
–More workload-sensitive tasks?

