Automatic Physical Design Tuning: Workload as a Sequence


1 Automatic Physical Design Tuning: Workload as a Sequence
Sanjay Agrawal (Microsoft Research), Eric Chu (University of Wisconsin-Madison), Vivek Narasayya (Microsoft Research)

2 Automatic Physical Design Tuning
DB applications are becoming more complex and varied, and considerable time is spent on tuning. Goal: reduce the cost of ownership of the RDBMS by automatically recommending a physical design. Supported by DB vendors: Database Engine Tuning Advisor (Microsoft), Design Advisor (IBM), SQL Access Advisor (Oracle). 11/21/2018 SIGMOD 2006

3 Microsoft Database Engine Tuning Advisor
[Diagram: applications produce a workload (a set of queries and updates) that is given to the Database Engine Tuning Advisor in Microsoft SQL Server 2005. The advisor makes "what-if" calls to the extended query optimizer and outputs a recommendation: a set of indexes, materialized views, and horizontal partitions.]

4 Workload as a Sequence: Motivation
Data warehousing: query by day, update at night.
Set: no index is recommended when update costs outweigh benefits.
Sequence: may exploit the benefits of indexes without incurring update costs. Insert "create" and "drop" index statements into the workload and exploit the order of statements.
[Diagram: Create Indexes, Queries (day), Drop Indexes, Updates (night).]

5 Set VS Sequence
Set-based: the recommendation is robust to changes in the order of statement arrival, but it can miss good recommendations compared to the sequence-based approach.
Outputs are different. Set: what indexes to create or drop? Sequence: what indexes to create or drop, and where?
[Diagram labels: Create Indexes, Drop Indexes, Queries, Updates, Queries.]

6 Model Workload as a Sequence
Motivation, Problem Definition, Optimal Algorithm, Disjoint Sequences, Greedy-SEQ, Experiments

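7 Problem Setting
Workload: $S = [S_1, S_2, \ldots, S_N]$, with $S_i \in \{\text{Select}, \text{Insert}, \text{Delete}, \text{Update}\}$.
[Diagram: configurations $C_0, C_1, C_2, C_3, \ldots, C_N, C_{N+1}$ placed around statements $S_1, S_2, S_3, \ldots, S_N$.]
$\mathrm{Cost}(S_i, C_i)$: cost of executing $S_i$ with $C_i$.
$\mathrm{TC}(C_1, C_2)$: transition cost.
Sequence execution cost: $\sum_{k=1}^{N} \bigl(\mathrm{Cost}(S_k, C_k) + \mathrm{TC}(C_{k-1}, C_k)\bigr) + \mathrm{TC}(C_N, C_{N+1})$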
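The sequence execution cost above can be evaluated directly once the two cost functions are available. The sketch below is only an illustration: `toy_cost`, `toy_tc`, and all numbers in them are made-up stand-ins (in the talk these costs would come from the optimizer), and a configuration is assumed to be a set of index names.

```python
from typing import Callable, FrozenSet, Sequence

Config = FrozenSet[str]  # assumption: a configuration is a set of index names


def sequence_execution_cost(
    statements: Sequence[str],
    configs: Sequence[Config],
    cost: Callable[[str, Config], float],
    tc: Callable[[Config, Config], float],
) -> float:
    """statements = [S1, ..., SN]; configs = [C0, C1, ..., CN, CN+1]."""
    n = len(statements)
    if len(configs) != n + 2:
        raise ValueError("need N + 2 configurations: C0 .. CN+1")
    total = 0.0
    for k in range(1, n + 1):
        # S_k runs under C_k, after paying the transition from C_{k-1}.
        total += cost(statements[k - 1], configs[k]) + tc(configs[k - 1], configs[k])
    # Final transition into C_{N+1}.
    return total + tc(configs[n], configs[n + 1])


# Hypothetical toy cost model; the numbers are invented for illustration.
def toy_cost(stmt: str, config: Config) -> float:
    if stmt == "query":
        return 1.0 if "I" in config else 10.0
    return 5.0 if "I" in config else 1.0  # updates pay for index maintenance


def toy_tc(old: Config, new: Config) -> float:
    return 3.0 * len(new - old) + 1.0 * len(old - new)  # create = 3, drop = 1


NONE, IDX = frozenset(), frozenset({"I"})
workload = ["query", "query", "update"]
with_index = sequence_execution_cost(workload, [NONE, IDX, IDX, NONE, NONE], toy_cost, toy_tc)
no_index = sequence_execution_cost(workload, [NONE] * 5, toy_cost, toy_tc)
```

In this toy model, creating the index before the queries and dropping it before the update is cheaper than never creating it, which is the day/night effect the motivation slide describes.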

8 Problem Definition
Given: database $D$, workload $W = [S_1, \ldots, S_N]$, initial configuration $C_0$, and storage bound $M$.
Find configurations $C_1, C_2, \ldots, C_{N+1}$ that minimize the sequence execution cost
$\sum_{k=1}^{N} \bigl(\mathrm{Cost}(S_k, C_k) + \mathrm{TC}(C_{k-1}, C_k)\bigr) + \mathrm{TC}(C_N, C_{N+1})$
subject to: storage of $C_i \le M$, for all $i$.

9 Search Space
Given $N$ statements and $M$ indexes.
Sequence-based tuning: $2^M$ distinct configurations for each statement; $2^{M(N+1)}$ possible execution sequences.
Set-based tuning: $2^M$ configurations.
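To give a feel for the gap between the two search spaces, here is the arithmetic for one small, made-up instance with $N = 10$ statements and $M = 5$ candidate indexes (these sizes are illustrative and not from the talk):

```latex
\begin{align*}
\text{configurations per statement} &= 2^{M} = 2^{5} = 32 \\
\text{sequence-based search space} &= \left(2^{M}\right)^{N+1} = 2^{55} \approx 3.6 \times 10^{16} \\
\text{set-based search space} &= 2^{M} = 32
\end{align*}
```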

10 Model Workload as a Sequence
Motivation, Problem Definition, Optimal Algorithm, Disjoint Sequences, Greedy Heuristic, Experiments

11 Optimal Algorithm for Single-Index Case
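[Diagram: DAG for a single index I and N statements. Each statement S1, S2, ..., SN has two nodes, { } and {I}, placed between a SOURCE node and a DESTINATION node. Edges that create the index are labeled Ic and edges that drop it are labeled Id.]
Node costs: Cost(Si, { }) and Cost(Si, {I}).
Edge costs: 0, Ic, and Id.
The cost of the shortest path includes node and edge costs.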
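Because each stage of the single-index DAG has only two nodes, the shortest path can be found with a two-state dynamic program. The following is a minimal sketch under stated assumptions: the function name and parameters are hypothetical, the per-statement costs are passed in as plain lists, and both the source and the destination are assumed to be the empty configuration.

```python
from typing import List, Sequence, Tuple


def single_index_shortest_path(
    cost_without: Sequence[float],
    cost_with: Sequence[float],
    create_cost: float,
    drop_cost: float,
) -> Tuple[float, List[bool]]:
    """Return (total cost, per-statement flag: is the index present?)."""

    def edge(prev: int, cur: int) -> float:
        if prev == cur:
            return 0.0
        return create_cost if cur == 1 else drop_cost

    # State 0 is { } and state 1 is {I}; the source is assumed to be { }.
    best = [0.0, float("inf")]
    back: List[Tuple[int, ...]] = []  # back[k][s] = best predecessor of state s
    for node_costs in zip(cost_without, cost_with):
        preds = tuple(min((0, 1), key=lambda p: best[p] + edge(p, s)) for s in (0, 1))
        best = [best[preds[s]] + edge(preds[s], s) + node_costs[s] for s in (0, 1)]
        back.append(preds)

    # The destination is assumed to be { } too, so a final drop may be needed.
    state = min((0, 1), key=lambda p: best[p] + edge(p, 0))
    total = best[state] + edge(state, 0)

    # Walk the predecessor links backwards to recover the chosen nodes.
    plan: List[bool] = []
    for preds in reversed(back):
        plan.append(state == 1)
        state = preds[state]
    plan.reverse()
    return total, plan


# Made-up costs: two queries that benefit from the index, then one update.
total, plan = single_index_shortest_path([10, 10, 1], [1, 1, 5], create_cost=3, drop_cost=1)
```

With these invented numbers the path creates the index before the first query and drops it before the update.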

12 General Case – Multiple Indexes
[Diagram: EXHAUSTIVE DAG from C0 to CN+1 with one stage per statement up to SN; stage j holds one node Cij for each configuration i.]
At each stage, enumerate all possible configurations from the set of indexes.
The algorithm is linear in the number of nodes and edges of the DAG. However, the number of nodes in the DAG is exponential in the number of indexes.
$M$ indexes => $O(N \cdot 2^M)$ nodes and $O(N \cdot 2^M)$ edges.
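The stage-by-stage enumeration can be sketched as a dynamic program over every feasible subset of the candidate indexes. This is an illustrative sketch, not the talk's implementation: `feasible_configs`, `exhaustive`, the statement encoding, and the toy cost model are all hypothetical, and the start and end configurations are assumed to be empty.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Config = FrozenSet[str]
Stmt = Tuple[str, str]  # hypothetical encoding: (kind, index the statement can use)


def feasible_configs(sizes: Dict[str, float], bound: float) -> List[Config]:
    """Every subset of the candidate indexes whose total size fits the bound."""
    names = sorted(sizes)
    return [
        frozenset(combo)
        for r in range(len(names) + 1)
        for combo in combinations(names, r)
        if sum(sizes[i] for i in combo) <= bound
    ]


def exhaustive(
    statements: Sequence[Stmt],
    configs: Sequence[Config],
    cost: Callable[[Stmt, Config], float],
    tc: Callable[[Config, Config], float],
    c0: Config = frozenset(),
    c_end: Config = frozenset(),
) -> Tuple[float, List[Config]]:
    """Shortest path over all stages; returns (cost, [C1, ..., CN])."""
    best = {c0: (0.0, [])}  # config -> (cheapest cost to reach it, path so far)
    for stmt in statements:
        nxt = {}
        for c in configs:
            prev = min(best, key=lambda p: best[p][0] + tc(p, c))
            prev_cost, prev_path = best[prev]
            nxt[c] = (prev_cost + tc(prev, c) + cost(stmt, c), prev_path + [c])
        best = nxt
    last = min(best, key=lambda p: best[p][0] + tc(p, c_end))
    return best[last][0] + tc(last, c_end), best[last][1]


# Hypothetical toy cost model; the numbers are invented for illustration.
def toy_cost(stmt: Stmt, config: Config) -> float:
    kind, index = stmt
    if kind == "query":
        return 1.0 if index in config else 10.0
    return 1.0 + 4.0 * len(config)  # updates maintain every index present


def toy_tc(old: Config, new: Config) -> float:
    return 3.0 * len(new - old) + 1.0 * len(old - new)  # create = 3, drop = 1


workload = [("query", "I1"), ("query", "I1"), ("query", "I2"), ("query", "I2"), ("update", "")]
configs = feasible_configs({"I1": 1.0, "I2": 1.0}, bound=1.0)
total, plan = exhaustive(workload, configs, toy_cost, toy_tc)
```

With a storage bound that admits only one index at a time, the plan in this toy instance swaps I1 for I2 between the two groups of queries and drops everything before the update.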

13 Solve sequence using EXHAUSTIVE
[Diagram: the sequence, the constraints, and a candidate set of structures are the inputs to "Solve sequence using EXHAUSTIVE", which outputs the optimal solution as the recommendation.]

14 Search-Space Pruning
Techniques to reduce the number of nodes:
Cost-based pruning: leverages shortest-path solutions of individual indexes. Prunes configurations at each stage without loss of optimality.
Disjoint sequences: divide-and-conquer approach. Splits the input sequence and the candidate index set.
Greedy-SEQ: guarantees a polynomial number of nodes.

15 Model Workload as a Sequence
Motivation, Problem Definition, Optimal Algorithm, Disjoint Sequences, Greedy Heuristic, Experiments

16 Exploiting Disjoint Sequences
Two sequences X and Y are disjoint if they share neither statements nor indexes.
Disjoint sequences are common, e.g., when a server hosts multiple applications that touch different databases.
Approach: split the workload into disjoint sequences, solve each sequence independently, and merge to get the final solution.
Idea: the DAG for each disjoint sequence has fewer nodes.
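One way to sketch the split step is to group statements by connected components over the indexes they can use. The talk does not show how the Split operator is implemented, so the union-find approach, the function name `split_disjoint`, and the input encoding (each statement paired with the set of candidate indexes relevant to it) are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple


def split_disjoint(
    workload: Sequence[Tuple[str, Set[str]]],
) -> List[Tuple[List[str], Set[str]]]:
    """Group statements so that no two groups share a statement or an index."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Indexes relevant to the same statement must end up in the same group.
    for _, indexes in workload:
        ordered = sorted(indexes)
        for i in ordered:
            parent.setdefault(i, i)
        for other in ordered[1:]:
            parent[find(other)] = find(ordered[0])

    groups: Dict[str, Tuple[List[str], Set[str]]] = {}
    for name, indexes in workload:
        if not indexes:
            continue  # assumption: statements that use no candidate index are set aside
        root = find(min(indexes))
        stmts, idxs = groups.setdefault(root, ([], set()))
        stmts.append(name)  # statement order within a group is preserved
        idxs.update(indexes)
    return list(groups.values())


# A seven-statement workload over three indexes, one index per statement.
example = [
    ("S1", {"I1"}), ("S2", {"I2"}), ("S3", {"I1"}), ("S4", {"I1"}),
    ("S5", {"I2"}), ("S6", {"I2"}), ("S7", {"I3"}),
]
parts = split_disjoint(example)
```

Each resulting group can then be solved on its own DAG, whose stages enumerate only the subsets of that group's indexes.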

17 Efficiency Gain with Disjoint Sequences
[Diagram: workload W with candidate indexes {I1, I2, I3} has 8 nodes at each stage. It is split into W1 (S1, S3, S4 with {I1}), W2 (S2, S5, S6 with {I2}), and W3 (S7 with {I3}), giving 2 nodes at each stage for each sequence.]

18 Merge solutions of W1, W2, and W3: No storage violations
[Diagram: shortest-path solutions from SRC to DEST for W1 = [S1, S3, S4] (create edge I1c, drop edge I1d), W2 = [S2, S5, S6] (I2c, I2d), and W3 = [S7] (I3c), and the merged path Pu over S1 to S7 that uses the configurations {I1}, {I1,I2}, {I2}, { }, and {I3}.]
Pu is optimal when there are no storage violations.

19 Merge in the presence of storage violation
Suppose the storage bound allows only 1 index. Pu is not a valid solution, as it has configurations that violate the storage bound.
Pu': merge P1, P2, and P3 to get a valid solution.
Note that the cost of Pu is a lower bound on the cost of any valid solution.
[Diagram: Pu, which uses {I1,I2} at S2, shown next to Pu', which uses only the configurations { }, {I1}, {I2}, and {I3} over S1 to S7.]

20 Solution with Split and Merge
Inputs: sequence, constraints, and candidate set of structures.
1. Apply the Split operator to get disjoint sequences.
2. Solve each sequence independently using EXHAUSTIVE or GREEDY-SEQ.
3. Merge the results of the disjoint sequences.
Output: recommendation.

21 Model Workload as a Sequence
Motivation, Problem Definition, Optimal Algorithm, Disjoint Sequences, Greedy Heuristic, Experiments

22 Greedy Approach
Goal: explore a polynomial number of good configurations. Run shortest path over the DAG constructed with these configurations. The solution is close to optimal.
Greedy-SEQ: adaptation of an existing greedy technique for the sequence model.

23 Greedy-SEQ
Steps of Greedy-SEQ:
1. Get the optimal solution for each index. Record configurations.
2. Initialize current best to be the lowest-cost solution seen so far.
3. Improve current best by combining it with other solutions and resetting current best. Record new configurations of current best.
4. Repeat until no more improvement.
5. Run shortest-path over the configurations collected.

24 Combining Two Single-Index Solutions
[Diagram: the single-index solutions for I1 and I2 over statements S0, ..., SK, ..., SL, ..., SN, SN+1, with configurations {}, {I1}, {I2}, and {I1,I2}.]

25 Combining Two Single-Index Solutions
[Diagram: the same two single-index solutions for I1 and I2 with the per-stage configurations filled in: {}, {I1}, {I2}, and the combined configuration {I1,I2}.]

26 Greedy-SEQ: Greedy Approach
1. Get the optimal solution for each index. Record configurations.
2. Initialize current best to be the lowest-cost solution seen so far.
3. Improve current best by combining it with other solutions and resetting current best. Record new configurations of current best.
4. Repeat Step 3 until no more improvement.
5. Run shortest-path over the configurations collected.
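The steps can be sketched in code as follows. The slides do not spell out how two solutions are combined, so this sketch assumes that at each stage the candidates are the two solutions' configurations plus their union (when it fits in storage), followed by a shortest-path run over those candidates. All names, the statement encoding, and the toy cost model are hypothetical, and every single index is assumed to fit within the storage bound.

```python
from typing import FrozenSet, Tuple

Config = FrozenSet[str]
Stmt = Tuple[str, str]  # hypothetical encoding: (kind, index the statement can use)
EMPTY: Config = frozenset()


def shortest_path(statements, stage_configs, cost, tc):
    """Shortest path when statement k may only use stage_configs[k]; C0 = CN+1 = {}."""
    best = {EMPTY: (0.0, [])}  # config -> (cheapest cost to reach it, path so far)
    for stmt, allowed in zip(statements, stage_configs):
        nxt = {}
        for c in allowed:
            prev = min(best, key=lambda p: best[p][0] + tc(p, c))
            nxt[c] = (best[prev][0] + tc(prev, c) + cost(stmt, c), best[prev][1] + [c])
        best = nxt
    last = min(best, key=lambda p: best[p][0] + tc(p, EMPTY))
    return best[last][0] + tc(last, EMPTY), best[last][1]


def greedy_seq(statements, indexes, cost, tc, fits):
    n = len(statements)
    # Step 1: optimal solution for each index alone; record its configurations.
    singles = [
        shortest_path(statements, [{EMPTY, frozenset({i})}] * n, cost, tc)
        for i in indexes
    ]
    recorded = [set() for _ in range(n)]
    for _, path in singles:
        for k, c in enumerate(path):
            recorded[k].add(c)
    # Step 2: current best is the lowest-cost solution seen so far.
    best_cost, best_path = min(singles, key=lambda s: s[0])
    # Steps 3 and 4: combine current best with other solutions until nothing improves.
    improved = True
    while improved:
        improved = False
        for _, other in singles:
            # Assumed combination rule: either configuration, or their union if it fits.
            stages = [{c for c in (a, b, a | b) if fits(c)} for a, b in zip(best_path, other)]
            cand_cost, cand_path = shortest_path(statements, stages, cost, tc)
            if cand_cost < best_cost:
                best_cost, best_path, improved = cand_cost, cand_path, True
                for k, c in enumerate(cand_path):
                    recorded[k].add(c)
    # Step 5: shortest path over all configurations collected.
    return shortest_path(statements, recorded, cost, tc)


# Hypothetical toy cost model; the numbers are invented for illustration.
def toy_cost(stmt: Stmt, config: Config) -> float:
    kind, index = stmt
    if kind == "query":
        return 1.0 if index in config else 10.0
    return 1.0 + 4.0 * len(config)  # updates maintain every index present


def toy_tc(old: Config, new: Config) -> float:
    return 3.0 * len(new - old) + 1.0 * len(old - new)  # create = 3, drop = 1


workload = [("query", "I1"), ("query", "I1"), ("query", "I2"), ("query", "I2"), ("update", "")]
total, plan = greedy_seq(workload, ["I1", "I2"], toy_cost, toy_tc, fits=lambda c: len(c) <= 1)
```

Each shortest-path run only sees a handful of configurations per stage, which is what keeps the number of nodes polynomial instead of enumerating every subset of indexes.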

27 End-to-End Solution
Inputs: sequence, constraints, and candidate set of structures.
1. Apply the split operator to get disjoint sequences.
2. Apply cost-based pruning on each sequence.
3. Solve each sequence independently using EXHAUSTIVE or GREEDY-SEQ.
4. Merge the results of the disjoint sequences.
Output: recommendation.

28 Model Workload as a Sequence
Motivation, Problem Definition, Optimal Algorithm, Disjoint Sequences, Greedy Heuristic, Experiments

29 Sequence VS Set-based approaches
% improvement relative to the optimal set-based solution. The sequence-based approach is better when updates are present and/or the storage bound is low.

| Workload | M = 1.2 GB | M = 3 GB |
| TPCH-22 | 19% | 0% |
| TPCH-22-I-10-MID | 22% | 16% |
| TPCH-22-I-10-END | 25% | 28% |

30 Greedy-SEQ VS Exhaustive
Greedy-SEQ is much faster, with minimal degradation in quality.

| Workload | % reduction in running time | % reduction in quality |
| TPCH-3 | 50% | <1% |
| TPCH-5-M-5 | 98.4% | 2.3% |
| TPCH-22 | Exhaustive was terminated after 24 hours | Not available |

31 Effectiveness of Split and Merge
With split and merge (SPMR) VS without (WO-SPMR)

| Workload | % reduction in running time compared to WO-SPMR | % reduction in quality compared to WO-SPMR |
| TPCH-22 | <0.1% | 0% |
| WKLD1 | 89.9% | |
| WKLD1-LOW | 71.4% | 3.0% |

32 Conclusion
The sequence model allows more optimization opportunities than the set model.
The problem is modeled as finding the shortest path over a DAG.
Heuristics give nearly optimal solutions with much better performance.

