CS346: Advanced Databases Graham Cormode Physical Database Design and Tuning.

CS346: Advanced Databases Graham Cormode G.Cormode@warwick.ac.uk Physical Database Design and Tuning

Outline
Chapter: “Physical Database Design and Tuning” in Elmasri and Navathe
• Database design: going from logical design to physical layout
  – Which indexes to create? When to apply denormalization?
  – Informed by expected statistics on query frequency
• Database tuning: modifying the design from experience
  – Which indexes to add/remove? How to rewrite stored queries?
• Why?
  – Fill in the gaps between database design and database use
  – See compromises between a clean design and practical issues

Physical Database Design and Tuning
• So far, we assumed we are given indexes and file organization
  – But someone has to determine which indexes to create...
• Physical Database Design:
  – How is the data organized?
  – What indexes are available?
• Determined by expert knowledge and heuristics
  – Controlled by the Database Administrator (DBA)
• Database Tuning: modifying the database design
  – In response to usage history
  – Based on statistics (size of tables, frequency of access)
  – May add or remove indexes, change their type etc.

Physical Database Design
• Physical design includes what data goes in what relation
  – Recall database normalization from the Database Systems module
  – Distinct from logical design of entities, relationships, attributes
• It also includes structuring data to allow good performance
  – From a given schema, there are many possible physical designs
• Choosing which is best depends on how the database will be used
  – Depends on the workload of transactions and queries: the job mix
  – Need to know how often queries are executed, and their timing requirements
  – Need to know the frequency of updates
  – Need to know any additional constraints on attributes

Information to collect
• Stored queries: often there are standard queries fixed up front
  – A payroll query run every month to produce salary amounts
  – A query to produce all active jobs in a job management system
• Queries are of two types: retrieval queries and update transactions
• For retrieval queries, we need to know:
  1. Which files are accessed by the query?
  2. Which attributes is selection applied to?
  3. Is the selection equality, inequality or range?
  4. Which attributes are subjects of join conditions?
  5. Which attributes are retrieved by the query?
• Candidates for index: attributes selected (2.), join conditions (4.)
• Example:
    SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > 30000
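To make the "candidates for index" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table contents, the SSN column and the index name SalaryIndex are assumptions made for illustration; only the query itself comes from the slide. It compares SQLite's query plan before and after indexing the selection attribute SALARY. Other DBMSs report plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (SSN INTEGER PRIMARY KEY, LNAME TEXT, FNAME TEXT, SALARY INTEGER)"
)
# Hypothetical data: 500 employees with salaries 20100, 20200, ..., 70000.
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [(i, f"L{i}", f"F{i}", 20000 + 100 * i) for i in range(1, 501)],
)

QUERY = "SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > 30000"


def plan(sql):
    """Return SQLite's plan for sql as one string (the last column is the description)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(QUERY)  # no index on SALARY yet, so the table is scanned
conn.execute("CREATE INDEX SalaryIndex ON EMPLOYEE (SALARY)")
plan_after = plan(QUERY)   # the range selection can now be answered via SalaryIndex
rows = conn.execute(QUERY).fetchall()

print(plan_before)
print(plan_after)
print(len(rows), "employees earn more than 30000")
```

The selection attribute (question 2) with a range condition (question 3) is exactly what the new index serves; the retrieved attributes LNAME and FNAME (question 5) are then fetched from the base table.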

Information to collect
• For update transactions, we need to know:
  1. The files that will be updated
  2. The type of operation on each file (insert, modify, delete)
  3. The attributes for selection of a record to delete/modify
  4. The attributes whose values will be changed by a modify
• Attributes in selection conditions (3.) are candidates for index
  – Used to locate the records to update more quickly
• Attributes whose values change (4.) are candidates for no index!
  – Because of the extra cost of maintaining an index under updates
• Example:
    UPDATE Customers SET Contact='Jo Bloggs', City='Warwick' WHERE CustomerName='John Smith';

Frequency of Queries
• How often do we expect each query to be invoked?
  – Monthly (payroll, sales reports)
  – Daily (job lists, delivery routes)
  – Every minute (“dashboard” apps)
  – Many times a second (web-facing search queries)
• Generates an expected access frequency for each attribute in a file
  – Heuristic: 80-20 rule, 80% of processing is on only 20% of the data
  – Application: optimize only the most frequently accessed 20%
• CS mantra: “Optimize for the most common case”
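The frequency bookkeeping above can be sketched in a few lines of Python. The query names, run counts and attribute names below are hypothetical and not taken from any real workload. The sketch adds up how often each attribute is touched per 30-day month, then keeps the most frequently accessed attributes until they cover 80% of all accesses.

```python
from collections import Counter

# Hypothetical job mix: (query name, runs per 30-day month, attributes selected or joined on).
WORKLOAD = [
    ("payroll", 1, ["EMPLOYEE.Salary"]),                      # monthly
    ("job_list", 30, ["JOB.Status"]),                         # daily
    ("dashboard", 60 * 24 * 30, ["JOB.Status", "SALES.Region"]),  # every minute
    ("web_search", 2 * 60 * 60 * 24 * 30, ["PRODUCT.Name"]),  # twice a second
]

# Expected access frequency per attribute.
access = Counter()
for _name, runs, attributes in WORKLOAD:
    for attr in attributes:
        access[attr] += runs

# Take attributes in decreasing frequency until 80% of accesses are covered.
total = sum(access.values())
hot = []
covered = 0
for attr, count in access.most_common():
    if covered >= 0.8 * total:
        break
    hot.append(attr)
    covered += count

print(access.most_common())
print("Optimize first:", hot)
```

With these made-up numbers a single attribute dominates, which is the situation the 80-20 heuristic describes: tuning effort goes to the web-facing search attribute long before the monthly payroll query.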

Time, Update, Uniqueness constraints
• Some queries have performance constraints
  – “Should terminate within 5 seconds” (95% of the time)
  – “Should never take more than 20 seconds”
  – Affected attributes are high priority for access paths
• Some attributes are known to be frequently updated
  – Avoid too many indexes on them: slow to update
• Attributes with uniqueness (key) constraints should have indexes
  – Allows the DBMS to check uniqueness when inserting a new record
  – E.g. ensure at most one record per student id number
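As an illustration of the uniqueness check, the following sketch uses Python's sqlite3 module with a hypothetical STUDENT table and index name. A unique index lets the DBMS reject a second record with the same student id at insert time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (StudentId INTEGER, Name TEXT)")
# The unique index is what allows a fast duplicate check on every insert.
conn.execute("CREATE UNIQUE INDEX StudentIdIndex ON STUDENT (StudentId)")

conn.execute("INSERT INTO STUDENT VALUES (1001, 'Ada')")
try:
    conn.execute("INSERT INTO STUDENT VALUES (1001, 'Bob')")  # same id again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]
print(duplicate_rejected, student_count)
```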

Decisions to Make
• What indexes should we create?
  – Which relations should have indexes?
  – What fields should be the search key?
  – Should we build multiple indexes for the relation?
• For each index, what kind of index is it?
  – Clustering? Primary? Secondary?
• Should we modify the schema?
  – Consider alternative normalized forms?
  – Maybe undo some normalization for better performance?

Creating an index in SQL
• Index creation is not part of the SQL standard
  – But similar syntax is supported by most DBMSs
• CREATE [ UNIQUE ] INDEX <index name> ON <table name> ( <column name> [ <order> ] { , <column name> [ <order> ] } ) [ CLUSTER ] ;
  – CLUSTER means the index should also sort the data based on the attribute
      Creates a clustering index if the attribute is non-key
      Creates a primary index if the attribute is a key
  – UNIQUE means indexed attribute values must be unique
  – <order> is either ASC (ascending, default) or DESC (descending)
• Examples:
    CREATE INDEX DnoIndex ON EMPLOYEE (Dno) CLUSTER;
    CREATE INDEX EmpIndex ON EMPLOYEE (Lname, Fname);
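The two example statements can be tried out with Python's sqlite3 module, with one caveat: SQLite does not accept a CLUSTER keyword, so it is dropped in this sketch. The Ssn column and the sample rows are assumptions. The sketch also checks how the composite index on (Lname, Fname) is used: a query fixing both attributes can search the index, while a query on Fname alone falls back to a scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (Ssn INTEGER PRIMARY KEY, Lname TEXT, Fname TEXT, Dno INTEGER)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [(1, "Smith", "John", 5), (2, "Wong", "Franklin", 5), (3, "Zelaya", "Alicia", 4)],
)

# CLUSTER is omitted here because SQLite has no such keyword.
conn.execute("CREATE INDEX DnoIndex ON EMPLOYEE (Dno)")
conn.execute("CREATE INDEX EmpIndex ON EMPLOYEE (Lname ASC, Fname ASC)")


def plan(sql):
    """Return SQLite's plan for sql as one string (the last column is the description)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_full = plan("SELECT * FROM EMPLOYEE WHERE Lname = 'Smith' AND Fname = 'John'")
plan_fname = plan("SELECT * FROM EMPLOYEE WHERE Fname = 'John'")
index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list('EMPLOYEE')"))

print(index_names)
print(plan_full)
print(plan_fname)
```

The order of attributes in a composite index matters: the index is sorted by Lname first, so a condition on Fname alone gives the DBMS no starting point in it.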

Denormalization to Speed up Queries
• Why normalize? [Third normal form 3NF, Boyce-Codd normal form BCNF]
  – Split attributes across tables to minimize redundancy and errors
  – Ensures every row has a unique key and avoids update anomalies
• Normalization can sometimes work against efficiency
  – Data that “logically” belongs together can be split across tables
• Denormalization: store the database in a weaker normal form (e.g. 2NF)
  – Faster access time, in exchange for more expensive updates
  – E.g. store the join of tables R and S, rather than R and S separately
• Denormalization (re)introduces redundancy in the base tables
  – E.g. the data now contains functional dependencies that normalization would have removed

Denormalization Example
• Consider three relations in 3NF:
  – EMP (Emp_id, Emp_name, Emp_job_title)
  – PROJ (Proj_id, Proj_name, Proj_mgr_id)
  – EMP_PROJ (Emp_id, Proj_id, Percent_assigned)
• Create a table of employee assignments from joins:
    ASSIGN (Emp_id, Proj_id, Emp_name, Emp_job_title, Percent_assigned, Proj_name, Proj_mgr_id, Proj_mgr_name)
  – Avoids performing the joins if ASSIGN is repeatedly queried
• ASSIGN only meets 1NF, as there are (non-key) functional dependencies:
  – E.g. Proj_id → Proj_name, Proj_mgr_id
• Can create ASSIGN as a view [result of a stored query] on the tables
  – Materialized view: ASSIGN is stored on disk (vs created on demand)
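A minimal sketch of this example using Python's sqlite3 module follows. The sample rows are invented, and since SQLite has no materialized views, ASSIGN is built here as an ordinary table with CREATE TABLE ... AS SELECT. This makes it a one-off snapshot that would have to be rebuilt or maintained by hand when the base tables change. Proj_mgr_name is obtained by joining PROJ back to EMP on the manager id, which is an assumption about how that column is meant to be derived.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMP (Emp_id INTEGER PRIMARY KEY, Emp_name TEXT, Emp_job_title TEXT);
CREATE TABLE PROJ (Proj_id INTEGER PRIMARY KEY, Proj_name TEXT, Proj_mgr_id INTEGER);
CREATE TABLE EMP_PROJ (Emp_id INTEGER, Proj_id INTEGER, Percent_assigned INTEGER,
                       PRIMARY KEY (Emp_id, Proj_id));

-- Hypothetical sample data.
INSERT INTO EMP VALUES (1, 'Ann', 'Engineer'), (2, 'Bob', 'Manager'), (3, 'Cat', 'Analyst');
INSERT INTO PROJ VALUES (10, 'Alpha', 2), (20, 'Beta', 2);
INSERT INTO EMP_PROJ VALUES (1, 10, 50), (1, 20, 50), (3, 10, 100);

-- Store the join result once instead of recomputing it for every query.
CREATE TABLE ASSIGN AS
SELECT ep.Emp_id, ep.Proj_id, e.Emp_name, e.Emp_job_title, ep.Percent_assigned,
       p.Proj_name, p.Proj_mgr_id, m.Emp_name AS Proj_mgr_name
FROM EMP_PROJ ep
JOIN EMP e ON e.Emp_id = ep.Emp_id
JOIN PROJ p ON p.Proj_id = ep.Proj_id
JOIN EMP m ON m.Emp_id = p.Proj_mgr_id;
""")

# Queries on ASSIGN need no joins.
ann_projects = [
    row[0]
    for row in conn.execute(
        "SELECT Proj_name FROM ASSIGN WHERE Emp_id = 1 ORDER BY Proj_name"
    )
]
assign_rows = conn.execute("SELECT COUNT(*) FROM ASSIGN").fetchone()[0]
# The price is redundancy: the name of project 10 is now stored once per assignment.
alpha_copies = conn.execute(
    "SELECT COUNT(*) FROM ASSIGN WHERE Proj_name = 'Alpha'"
).fetchone()[0]

print(ann_projects, assign_rows, alpha_copies)
```

Renaming project Alpha now means updating every ASSIGN row that mentions it, which is the update anomaly that the dependency of Proj_name on Proj_id brings back.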

Database Tuning
• Database Tuning:
  – Revising/adjusting the physical database design
  – Monitor resource utilization and internal DBMS processing
  – Reveal bottlenecks such as contention for data or devices
• Goals of database tuning can include:
  – To make a particular application run faster
  – To lower the response time of queries/transactions
  – To improve the overall throughput of transactions
• Considerations for tuning are very close to those for design
  – But statistics are available in addition to those mentioned before

Tuning Statistics
• For tuning, the DBMS can maintain more statistics:
  – Size of individual tables
  – Number of distinct values in a column
  – How often a particular query or transaction is submitted/executed
  – Time required for different phases of query/transaction processing
• Further statistics obtained from monitoring:
  – Storage statistics: how many disk blocks are used
  – I/O and device performance statistics
  – Query/transaction processing: how long query optimization takes
  – Locking/logging statistics: how long these steps take
  – Index statistics (levels, blocks etc.)
• Can try to optimize many things through tuning:
  – Optimize buffer size, processor scheduling, disk/RAM usage
  – Avoid lock contention, minimize logging overhead [see later]

Tuning indexes
• A main focus of tuning is on the use of indexes
  – Certain queries may take too long to run for lack of an index
  – Some indexes may not get used at all
  – An index may be too costly because its attribute is frequently updated
• A few options to tune indexes
  – Drop and/or build new indexes
  – Change a non-clustered index to a clustered index (or vice versa)
  – Rebuild the index / reorganize the file
• May need to pause operations while doing this work
  – Part of “scheduled maintenance”

Tuning Database Design
• If processing requirements change, so may the database design
  – Denormalization: existing tables may be joined
  – Alternate design: switch to a different design, still in BCNF/3NF
  – Vertical partitioning: break one relation into many, with the same key
  – Repeat attribute(s) from one table in another
  – Horizontal partitioning: split data by attribute values
      E.g. break SALES into UK_SALES, FR_SALES, DE_SALES…
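Horizontal partitioning can be sketched with Python's sqlite3 module as follows. The SALES columns (Sale_id, Country, Amount), the sample rows and the ALL_SALES view are assumptions; only the table names SALES, UK_SALES, FR_SALES and DE_SALES come from the slide. Each partition holds the rows for one country, and a UNION ALL view puts them back together for queries that span countries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SALES (Sale_id INTEGER PRIMARY KEY, Country TEXT, Amount INTEGER);
INSERT INTO SALES VALUES (1, 'UK', 100), (2, 'FR', 250), (3, 'UK', 75), (4, 'DE', 300);

-- One partition per value of the partitioning attribute.
CREATE TABLE UK_SALES AS SELECT * FROM SALES WHERE Country = 'UK';
CREATE TABLE FR_SALES AS SELECT * FROM SALES WHERE Country = 'FR';
CREATE TABLE DE_SALES AS SELECT * FROM SALES WHERE Country = 'DE';

-- The partitions are disjoint, so UNION ALL reconstructs the original relation.
CREATE VIEW ALL_SALES AS
    SELECT * FROM UK_SALES UNION ALL
    SELECT * FROM FR_SALES UNION ALL
    SELECT * FROM DE_SALES;
""")

# A query about one country touches only that country's (smaller) table.
uk_total = conn.execute("SELECT SUM(Amount) FROM UK_SALES").fetchone()[0]

original = sorted(conn.execute("SELECT * FROM SALES"))
recombined = sorted(conn.execute("SELECT * FROM ALL_SALES"))
print(uk_total, original == recombined)
```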

Tuning Queries
• Recall: databases often have stored queries that are run frequently
• Indications we need to tune stored queries:
  – A query causes too many disk accesses
  – The query plan shows that relevant indexes are not being used
• Many complex considerations in tuning queries
  – Should we use temporary result tables between queries?
      Can be better than recomputing results many times
  – Are multiple join conditions possible? [E.g. SSN, or name and d.o.b.]
      Pick those that use clustering indexes and avoid string comparisons
  – Ordering of tables in a FROM clause may affect join processing
  – Some query optimizers perform worse on nested queries
  – Queries posed against a view could be faster on the base tables

Tuning Query Guidelines
• The query optimizer may not make use of indexes in some cases
  – E.g. a query with many conditions ORed together
  – Split it into the union of multiple queries that will each use an index
• Example of query splitting to encourage use of indexes:
  – With ‘OR’:
      SELECT Fname, Lname, Salary, Age
      FROM EMPLOYEE
      WHERE Age > 45 OR Salary < 50000;
  – With ‘UNION’:
      SELECT Fname, Lname, Salary, Age
      FROM EMPLOYEE
      WHERE Age > 45
      UNION
      SELECT Fname, Lname, Salary, Age
      FROM EMPLOYEE
      WHERE Salary < 50000;
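The rewrite can be checked on a small example with Python's sqlite3 module. The sample rows and the index names AgeIndex and SalaryIndex are assumptions. One caveat applies: UNION removes duplicate rows, so the two forms return the same result only when the selected rows are distinct, as they are here. If duplicates must be preserved, the rewrite needs more care.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Fname TEXT, Lname TEXT, Salary INTEGER, Age INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [
        ("John", "Smith", 30000, 50),        # satisfies both conditions
        ("Alicia", "Zelaya", 25000, 30),     # satisfies Salary < 50000 only
        ("James", "Borg", 55000, 60),        # satisfies Age > 45 only
        ("Jennifer", "Wallace", 60000, 40),  # satisfies neither
    ],
)
# One index per condition, so each branch of the UNION has an index available.
conn.execute("CREATE INDEX AgeIndex ON EMPLOYEE (Age)")
conn.execute("CREATE INDEX SalaryIndex ON EMPLOYEE (Salary)")

WITH_OR = """
    SELECT Fname, Lname, Salary, Age FROM EMPLOYEE
    WHERE Age > 45 OR Salary < 50000
"""
WITH_UNION = """
    SELECT Fname, Lname, Salary, Age FROM EMPLOYEE WHERE Age > 45
    UNION
    SELECT Fname, Lname, Salary, Age FROM EMPLOYEE WHERE Salary < 50000
"""

or_rows = sorted(conn.execute(WITH_OR))
union_rows = sorted(conn.execute(WITH_UNION))
print(or_rows == union_rows, len(or_rows))
```

Whether the split form is actually faster depends on the optimizer. Some systems already handle ORed conditions with several indexes, so the query plan is worth checking before rewriting stored queries.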

Query Tuning Guidelines
• Convert a NOT condition into a positive expression:
  – E.g. NOT (Age < 18) becomes Age >= 18
• If an equijoin has a range condition on the join attribute in one table, repeat it for the other table
• Ensure WHERE conditions make use of composite indexes
  – E.g. suppose we have an index on (Region, Product)
  – Q1:
      SELECT Region, Product, Month, Sales
      FROM SalesStats
      WHERE Region=3 AND ((Product=3) OR (Product=8));
  – Q2:
      SELECT Region, Product, Month, Sales
      FROM SalesStats
      WHERE (Region=3 AND Product=3) OR (Region=3 AND Product=8);
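The following sketch, again using Python's sqlite3 module with invented rows and a hypothetical index name RegionProductIndex, confirms that Q1 and Q2 are logically equivalent. The point of the Q2 form is that each ORed branch names both attributes of the composite index (Region, Product), so each branch can be matched against the index as a full key. Whether a given optimizer needs this help is system dependent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesStats (Region INTEGER, Product INTEGER, Month INTEGER, Sales INTEGER)"
)
conn.executemany(
    "INSERT INTO SalesStats VALUES (?, ?, ?, ?)",
    [
        (3, 3, 1, 100),  # region 3, product 3: selected
        (3, 8, 1, 200),  # region 3, product 8: selected
        (3, 5, 1, 50),   # region 3, other product: not selected
        (2, 3, 1, 70),   # other region: not selected
        (3, 3, 2, 120),  # region 3, product 3, later month: selected
    ],
)
conn.execute("CREATE INDEX RegionProductIndex ON SalesStats (Region, Product)")

Q1 = """
    SELECT Region, Product, Month, Sales FROM SalesStats
    WHERE Region=3 AND ((Product=3) OR (Product=8))
"""
Q2 = """
    SELECT Region, Product, Month, Sales FROM SalesStats
    WHERE (Region=3 AND Product=3) OR (Region=3 AND Product=8)
"""

q1_rows = sorted(conn.execute(Q1))
q2_rows = sorted(conn.execute(Q2))
print(q1_rows == q2_rows, len(q1_rows))
```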

Summary
• Database design: going from logical design to physical layout
  – Which indexes to create? When to apply denormalization?
  – Informed by expected statistics on query frequency
• Database tuning: modifying the design from experience
  – Which indexes to add/remove? How to rewrite stored queries?
Chapter: “Physical Database Design and Tuning” in Elmasri and Navathe
