An Efficient, Cost-Driven Index Selection Tool for MS-SQL Server

An Efficient, Cost-Driven Index Selection Tool for MS-SQL Server
Surajit Chaudhuri, Vivek Narasayya Presented by Robert Chen

Motivation Automate the choice of indexes in the physical design of SQL database input: a workload of SQL queries suggest a set of suitable indexes achieve performance competitive with that of systems cares by administrators.

Basic approaches Textbook solutions Expert system approach
take semantic information, produce a design ignore workload info. Expert system approach knowledge are encoded as rules disconnect from the query optimizer Cost-driven approach use optimizer’s cost estimates to compare goodness of hypothetical designs

Efficiency Want to reduce Reduce Cost Evaluation module
The number of indexes considered The number of configurations enumerated Index Selection Tool iteratively picks more complex index structures Reduce total number of optimizer invocation Cost Evaluation module Queries Cost Config

Architecture of Index Selection Tool
Workload What-if Index Creation Candidate index selection SQL Server Configuration Enumeration Cost Evaluation Multi-column Index Generation Final indexes

Starting Point admissible index batch invocation on the optimizer
indexable columns: columns that appear in the where clause of a query. Index can be multi-column. Starting points for the index selection tool batch invocation on the optimizer invocation is expensive since it requires communication across process boundaries.

Cost Evaluation efficiency goal: reduce the number of optimizer calls, in addition to batching. Atomic Configurations: a configuration C is automatic if a query uses all indexes in C. Derive cost of a Configuration from Atomic configuration: Cost(Q, C) = Mini{ Cost(Q, Ci)} (for select) Cost(Q, C) = T (for insert/delete, it’s an overestimation)

Identifying Atomic Config
Reduce the number of atomic configurations limit the number of indexes per table(j) limit the number of tables per configuration(t) (2, 2), called single-join atomic configurations two tier Reduce the cost of evaluation Relevant index set optimization Cost(Q, C) = Cost(Q, C’), C’ is a subset of C that only consists of indexable columns.

Relevant Atomic Configurations
Decide the atomic configs to evaluate (adaptive Detection) 1. N = 2, A = {atomic configuration of size <=2} 2. Evaluate everything in A. A’ = {} 3. For each C in A, tests if the evaluated cost of C differs significantly from the derived cost. If so add all atomic configurations of size N+1 that are supersets of C to A’ 4. If A’ = {}, exit else A = A’, N = N+1, goto 2.

Candidate Index Selection
Goal: Pick the set of indexes from admissible indexes Approach:Determine the best configuration for each query, the candidate index set is just the union of all such best configurations. Determine the best configuration for each query? Enumerate(Ii, Wi)

Configuration Enumeration
The Problem: Given an index set (I) and a Workload(W), determine the best subset of I with K or less indexes. Enumerate(K, I, W) Greedy(M, K) In our system, the value K is constrained by the single-join atomic configuration pruning.

Greedy(M, k) let S be the best M index configuration using naïve enumeration. If M= k exit while(|S| < k){ Pick a new index I s.t. Cost(S {I}) < = Cost(S {I’}) for all I’ <> I; if(Cost(S {I}) >= Cost(S) ) exit else S = S {I}; }

Multi-Column Index Generation
Goal: Choose a set of admissible multi(two) column indexes M(a, b) MC-LEAD: leading column (a) should be from the output of the configurations enumeration step. MC-ALL: Both columns (a, b) are important

Putting it all together
Candidate index set = admissible index set; Repeat{ index-selection; Greedy(2, k), subject to (2, 2) constraints; select multi-column indexes using MC-LEAD; Candidate index set = Union of the above 2 steps; }until( no considerable improvements are made)

Performance It works Overall running time of algorithms improves by a factor of 4 to 10 over baseline algorithms Drop in quality is small, below 10%.

Summary 3 “novel” techniques:
remove spurious indexes from consideration by taking into account syntax and cost info an iterative approach to handle multi-column indexes reduce the number of atomic configurations (thus the number of optimizer calls) that must be evaluated for a workload.

An Efficient, Cost-Driven Index Selection Tool for MS-SQL Server

Similar presentations

Presentation on theme: "An Efficient, Cost-Driven Index Selection Tool for MS-SQL Server"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

An Efficient, Cost-Driven Index Selection Tool for MS-SQL Server

Similar presentations

Presentation on theme: "An Efficient, Cost-Driven Index Selection Tool for MS-SQL Server"— Presentation transcript:

Similar presentations

About project

Feedback