
1 TUNING DATABASE CONFIGURATION PARAMETERS WITH ITUNED
Vamsidhar Thummala
Collaborators: Songyun Duan, Shivnath Babu
Duke University

2 Performance Tuning of Database Systems
- Physical design tuning: indexes [SIGMOD98, VLDB04], materialized views [SIGMOD00, VLDB04], partitioning [SIGMOD82, SIGMOD88, SIGMOD89]
- Statistics tuning [ICDE00, ICDE07]
- SQL query tuning [VLDB04]
- Configuration parameter (server parameter) tuning [this talk]

3 Database Configuration Parameters
Parameters that control:
- Memory distribution: shared_buffers, work_mem
- I/O optimization: wal_buffers, checkpoint_segments, checkpoint_timeout, fsync
- Parallelism: max_connections
- The optimizer's cost model: effective_cache_size, random_page_cost, default_statistics_target, enable_indexscan
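A configuration under test is simply one concrete value per parameter. A minimal Python sketch of such a setting (the parameter names are the PostgreSQL settings listed above; the values are invented purely for illustration, not tuning recommendations):

```python
# One candidate configuration vector X: a value for each tunable parameter.
# Values below are illustrative placeholders, not tuning advice.
candidate_config = {
    "shared_buffers": "2GB",        # memory distribution
    "work_mem": "64MB",
    "wal_buffers": "8MB",           # I/O optimization
    "checkpoint_segments": 16,
    "fsync": "on",
    "max_connections": 80,          # parallelism
    "effective_cache_size": "6GB",  # optimizer's cost model
    "random_page_cost": 4.0,
}
```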

4 Need for Automated Configuration Parameter Tuning (1/2)

5 Need for Automated Configuration Parameter Tuning (2/2)
[Chart: number of threads related to configuration parameter tuning vs. other topics on the PostgreSQL performance mailing list.]
Recently, there has been some effort from the community to summarize the important parameters [PgCon08]

6 Typical Approach: Trial and Error
User*: Hi, list. I've just upgraded pgsql from 8.3 to 8.4. I've used pgtune before and everything worked fine for me. And now i have ~93% cpu load. Here's changed values of config:
default_statistics_target = 50
maintenance_work_mem = 1GB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 22GB
work_mem = 192MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 7680MB
max_connections = 80
My box is Nehalem 2xQuad 2.8 with RAM 32Gb, and there's only postgresql working on it. What parameters I should give more attention on?
Response: All the values seem quite reasonable to me. What about the _cost variables? I guess one or more queries are evaluated using a different execution plan, probably sequential scan instead of index scan, hash join instead of merge join, or something like that. Try to log the "slow" statements - see "log_min_statement_duration". That might give you slow queries (although not necessarily the ones causing problems), and you can analyze them. What is the general I/O activity? Is there a lot of data read/written to the disks, is there a lot of I/O wait? PS: Was the database analyzed recently?
* /msg00323.php, 30th Jul 2009

7 Doing Experiments to Understand the Underlying Response Surface
[Figure: response surface for TPC-H Query 18 on a 4 GB database with 1 GB of memory.]

8 Challenges
- Large number of configuration parameters; which ones are important depends on OLTP vs. OLAP
- Brute force will not work: it results in an exponential number of experiments
- Parameters can have complex interactions, sometimes non-monotonic and counterintuitive, which limits the one-parameter-at-a-time approach
- No holistic configuration tuning tools: existing techniques focus on specific memory-related parameters or recommend default settings

9 Our Solution: iTuned
A practical tool that uses planned experiments to tune configuration parameters:
- An adaptive sampling algorithm to plan the sequence of experiments (Planner)
- A novel workbench for conducting experiments in enterprise systems (Executor)
- Features for scalability, like sensitivity analysis and parallel experiments, that reduce the total number of experiments and the per-experiment running time/cost

10 Outline of the talk
- iTuned Planner
- iTuned Executor
- Evaluation
- Conclusion

11 Problem Abstraction
Given:
- A database D and workload W
- A configuration parameter vector X = <x1, x2, ..., xn>
- A cost budget R
Goal: find a high-performance setting X* subject to the budget constraint
Problem: the response surface y = y(X) is unknown
Solution approach: conduct experiments to learn about the response surface; each experiment has some cost and yields one sample <X, y> of the surface

12 iTuned Planner
Uses an adaptive sampling algorithm:
1. Bootstrapping: conduct an initial set of experiments via Latin Hypercube Sampling (k-furthest-first)
2. Sequential sampling: select the next experiment X_NEXT based on previous samples, by calculating the improvement IP(X) of each candidate sample and selecting the candidate with the highest improvement as X_NEXT (a sketch follows below)
3. Stopping criterion: based on the cost budget R
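A minimal sketch of this loop in Python, with parameters normalized to the unit cube. This is a sketch under stated assumptions, not iTuned's implementation: scikit-learn's GaussianProcessRegressor stands in for the paper's GP model, the k-furthest-first refinement is omitted, and candidates are drawn uniformly at random:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def latin_hypercube(n, d, rng):
    # Bootstrapping design: one point per stratum in each dimension,
    # with strata randomly paired across dimensions.
    return (np.array([rng.permutation(n) for _ in range(d)]).T
            + rng.random((n, d))) / n

def expected_improvement(mu, sigma, y_best):
    # Closed-form EIP for minimizing y (see slide 16); y_best = y(X*).
    sigma = np.maximum(sigma, 1e-12)
    u = (y_best - mu) / sigma
    return sigma * (u * norm.cdf(u) + norm.pdf(u))

def adaptive_sampling(run_experiment, d, budget, n_init=10, n_cand=500, seed=0):
    rng = np.random.default_rng(seed)
    X = latin_hypercube(n_init, d, rng)               # 1. bootstrapping
    y = np.array([run_experiment(x) for x in X])      # one DB run per setting
    while len(y) < budget:                            # 3. stop on cost budget R
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
        cand = rng.random((n_cand, d))                # candidate settings
        mu, sigma = gp.predict(cand, return_std=True)
        x_next = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]
        X = np.vstack([X, x_next])                    # 2. sequential sampling
        y = np.append(y, run_experiment(x_next))
    return X[np.argmin(y)]                            # best setting found
```

Here run_experiment is whatever the Executor provides: it applies a setting X to a workbench instance, runs the workload, and returns the measured running time y(X).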

13 Improvement of an Experiment
The improvement IP(X) is defined as y(X*) - y(X) if y(X) < y(X*), and 0 otherwise
Issue: IP(X) is known only after y(X) is known, i.e., an experiment has to be conducted at X to measure y(X)
We estimate IP(X) by calculating the expected improvement EIP(X)
To calculate EIP(X), we need to approximate the probability density function of y(X) (an uncertainty estimate) at each configuration setting
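In symbols (a reconstruction consistent with the slide's definitions, where X* is the best setting found so far and p(t) is the density of y(X) supplied by the Gaussian process on the next slides):

```latex
IP(X) = \begin{cases}
  y(X^*) - y(X) & \text{if } y(X) < y(X^*) \\
  0             & \text{otherwise}
\end{cases}
\qquad
EIP(X) = \mathbb{E}\bigl[IP(X)\bigr]
       = \int_{-\infty}^{y(X^*)} \bigl(y(X^*) - t\bigr)\, p(t)\, dt
```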

14 Conducting the Experiment at X_NEXT Using Expected Improvement
[Figure: 1D projection of the response surface together with the EIP(X) curve; the next experiment is conducted where EIP(X) peaks.]

15 Generating the pdf through a Gaussian Process
We estimate the performance as y(X) = f(X) + Z(X), where f(X) is a regression model and Z(X) is the residual of the model, captured through a Gaussian process
The Gaussian process captures the uncertainty of the surface and is specified by mean and covariance functions
We use a zero-mean Gaussian process whose covariance is a kernel function that depends inversely on the distance between two samples X_i and X_j: residuals at nearby points exhibit higher correlation
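A minimal sketch of one such distance-based kernel (the squared-exponential form and the length-scale value are my assumptions; the slide only requires a covariance that decays with distance):

```python
import numpy as np

def sq_exp_kernel(Xi, Xj, length_scale=0.5):
    # Covariance between residuals Z(Xi) and Z(Xj): it decays with the
    # squared distance, so nearby settings get highly correlated residuals.
    d2 = np.sum((Xi[:, None, :] - Xj[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * length_scale ** 2))
```

Given the n settings sampled so far, the n x n matrix sq_exp_kernel(X, X) defines the Gaussian over residuals from which the per-setting mean and variance on the next slide are derived.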

16 Calculating Expected Improvement Using the Gaussian Process
Lemma: the Gaussian process models y(X) as a univariate Gaussian with mean μ(X) and variance σ²(X)
Theorem: there exists a closed form for EIP(X) [see paper for proof and details]
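The slide defers the closed form to the paper; for a Gaussian posterior the standard expected-improvement formula, which has the structure the theorem asserts, is the following (Φ and φ are the standard normal cdf and pdf; that iTuned's exact expression matches is an assumption here):

```latex
u = \frac{y(X^*) - \mu(X)}{\sigma(X)}, \qquad
EIP(X) = \sigma(X)\,\bigl[\,u\,\Phi(u) + \phi(u)\,\bigr]
```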

17 Tradeoff Between Exploration and Exploitation
Settings X with high EIP are either:
- Close to known good settings (assists exploitation)
- In highly uncertain regions (assists exploration)
[Figure: EIP(X) curve illustrating both kinds of high-EIP settings.]
The Gaussian process tries to achieve a balance between exploration and exploitation

18 Outline of the talk
- iTuned Planner
- iTuned Executor
- Evaluation
- Conclusion

19 Goal of the Executor
To conduct experiments:
- Without impacting the production system
- As close to real production runs as possible
Traditional choices:
- The production system itself: may impact running applications
- A test system: hard to replicate exact production settings; manual set-up

20 iTuned Executor
Exploits underutilized resources to conduct experiments: production systems, standby systems, test systems, the cloud
Design:
- Mechanisms: home & garage containers, efficient snapshots of data
- Policies: specified by admins, e.g., if CPU, memory, and disk utilization have been below 20% for the past 10 minutes, then 70% of resources can be taken for experiments (a sketch of such a policy check follows below)
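A minimal sketch of that example policy as a predicate (the Sample type and the utilization-sampling interface are hypothetical; a real executor would read OS or DBMS counters):

```python
from dataclasses import dataclass

@dataclass
class Sample:
    cpu: float    # utilization fractions in [0.0, 1.0]
    mem: float
    disk: float

def experiment_share(last_10_min, idle_thresh=0.20, grant=0.70):
    # Grant 70% of resources to experiments only if CPU, memory, and disk
    # utilization all stayed below 20% for the whole 10-minute window.
    all_idle = all(max(s.cpu, s.mem, s.disk) < idle_thresh for s in last_10_min)
    return grant if all_idle else 0.0
```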

21 Example Mechanism Set-up on a Standby System Using ZFS, Solaris, and PITR
[Architecture diagram: clients drive the database in the production environment; write-ahead logs are shipped to the standby machine, where the home DBMS applies the WAL continuously; a garage DBMS, created from copy-on-write snapshots of the data, serves as the workbench for conducting experiments; a middle tier (interface engine, policy manager, experiment planner & scheduler) coordinates the two environments.]
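The copy-on-write step can be sketched with standard ZFS commands driven from Python (the dataset and snapshot names are hypothetical; the slide only indicates that ZFS snapshots back the garage):

```python
import subprocess

def make_garage(dataset="tank/pgdata", snap="exp1"):
    # Snapshot the home DBMS's data, then clone it copy-on-write: the clone
    # appears instantly and shares all unchanged blocks with the snapshot,
    # so the garage DBMS can run experiments without copying the database.
    subprocess.run(["zfs", "snapshot", f"{dataset}@{snap}"], check=True)
    clone = f"tank/garage_{snap}"
    subprocess.run(["zfs", "clone", f"{dataset}@{snap}", clone], check=True)
    return clone  # dataset to point the garage DBMS's data directory at
```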

22 Outline of the talk
- iTuned Planner
- iTuned Executor
- Evaluation
- Conclusion

23 Empirical Evaluation (1)
- Two database systems: PostgreSQL v8.2 and MySQL v5.0
- Cluster of machines with 2 GHz processors and 3 GB RAM
- Mixture of workloads:
  - OLAP: mixes of TPC-H queries, varying #queries, #query_types, and MPL, with scale factors from SF = 1 to SF = 10
  - OLTP: TPC-W and RUBiS
- Number of parameters varied: up to 30

24 Empirical Evaluation (2)
Techniques compared:
- Default parameter settings as shipped (D)
- Manual rule-based tuning (M)
- Smart Hill Climbing (S): a state-of-the-art technique
- Brute-force search (B): run many experiments to find an approximation to the optimal setting
- iTuned (I)
Evaluation metrics:
- Quality: workload running time after tuning
- Efficiency: time needed for tuning

25 Comparison of Tuning Quality
[Charts: a simple workload with one TPC-H query (Q1), and a complex workload with a mix of TPC-H queries (Q1+Q18).]

26 iTuned's Efficiency and Scalability
- Run experiments in parallel
- Abort low-utility experiments early

27 iTuned's Sensitivity Analysis
- Identify important parameters quickly
- Use sensitivity analysis to reduce the number of experiments

28 Related Work
Parameter tuning:
1. Approaches that focus on specific classes of parameters, mainly memory-related buffer pools [ACM TOS08, VLDB06]
2. Statistical approaches for ranking parameters [SMDB08]
3. Brute-force approaches to experiment design
Tools like the DB2 Configuration Advisor and pg_tune recommend default settings
Adaptive approaches to sampling [SIGMETRICS06]
Work related to iTuned's executor:
- Oracle SQL Performance Analyzer [SIGMOD09, ICDE09]
- Virtualization, snapshots, suspend-resume

29 Conclusion
iTuned automates the tuning process by adaptively conducting experiments, and our initial results are promising
Future work:
- Apply database-specific knowledge (e.g., query plan information) to keep the optimizer in the loop for end-to-end tuning
- Workload compression
- Experiments in the cloud

30 Questions? Thank You
