CS346: Advanced Databases Graham Cormode Physical Database Design and Tuning.

Outline
• Chapter: "Physical Database Design and Tuning" in Elmasri and Navathe
• Database design: going from logical design to physical layout
  – Which indexes to create? When to apply denormalization?
  – Informed by expected statistics on query frequency
• Database tuning: modifying the design from experience
  – Which indexes to add/remove? How to rewrite stored queries?
• Why?
  – Fill in the gaps between database design and database use
  – See compromises between a clean design and practical issues

Physical Database Design and Tuning  So far, we assumed we are given indexes and file organization – But someone has to determine which indexes to create...  Physical Database Design: – How is the data organized? – What indexes are available?  Determined by expert knowledge and heuristics – Controlled by the Database Administrator (DBA)  Database Tuning: modifying the database design – In response to usage history – Based on statistics (size of tables, frequency of access) – May add or remove indexes, change their type etc. CS346 Advanced Databases 3

Physical Database Design  Physical design includes what data goes in what relation – Recall database normalization from Database Systems module – Distinct from logical design of entities, relationships, attributes  It also includes structuring data to allow good performance – From a given schema, there are many possible physical designs  Choosing which is best depends on how the database will be used – Depends on the workload of transactions, queries: the job mix – Need to know how often queries executed, timing requirements – Need to know frequency of updates – Need to know any additional constraints on attributes CS346 Advanced Databases 4

Information to collect  Stored queries: Often there are standard queries fixed up front – A payroll query run every month to produce salary amounts – A query to produce all active jobs in a job management system  Queries of two types: retrieval queries and update transactions  Retrieval queries: 1. Which files are accessed by the query? 2. Which attributes is selection applied to? 3. Is the selection equality, inequality or range? 4. Which attributes are subjects of join conditions? 5. Which attributes are retrieved by the query?  Candidates for index: Attributes selected (2.), join conditions (4.) CS346 Advanced Databases 5 SELECTLNAME, FNAME FROMEMPLOYEE WHERE SALARY > 30000

Information to collect  For update transactions, we need to know: 1. The files that will be updated 2. The type of operation on each file (insert, modify, delete) 3. The attributes for selection of a record to delete/modify 4. The attributes whose values will be changed by a modify  Attributes in selection conditions (3.) are candidates for index – Used to locate the records to update more quickly  Attributes whose values change (4.) are candidates for no index! – Because of the extra cost of maintaining an index under updates CS346 Advanced Databases 6 UPDATE Customers SET Contact=‘Jo Bloggs’, City=‘Warwick’ WHERE CustomerName=‘John Smith’;

Frequency of Queries  How are often do we think each query will be invoked? – Monthly (payroll, sales reports) – Daily (job lists, delivery routes) – Every minute (“dashboard” apps) – Many times a second (web-facing search queries)  Generates expected access frequency for each attribute in file – Heuristic: rule, 80% of processing on only 20% of data – Application: optimize only the most frequently accessed 20% CS mantra: “Optimize for the most common case” CS346 Advanced Databases 7

Time, Update, Uniqueness constraints  Some queries have performance constraints – “Should terminate with 5 seconds” (95% of the time) – “Should never take more than 20 seconds” – Affected attributes are high priority for access paths  Some attributes are known to be frequently updated – Avoid too many indexes: slow to update  Attributes with uniqueness (key) constraints should have indexes – Allows DBMS to check uniqueness when inserting new record – E.g. ensure at most one record per student id number CS346 Advanced Databases 8

Decisions to Make  What indexes should we create? – Which relations should have indexes? – What fields should be the search key? – Should we build multiple indexes for the relation?  For each index, what kind of index is it? – Clustering? Primary? Secondary?  Should we modify the schema? – Consider alternative normalized forms? – Maybe undo some normalization for better performance? CS346 Advanced Databases 9

Creating an index in SQL  Index creation is not part of the SQL standard – But similar syntax supported by most DBMSs  CREATE [ UNIQUE ] INDEX ON ( [ ] {, [ ]} ) [CLUSTER] ; – CLUSTER means index should also sort data based on the attribute Creates a clustering index for non-key Creates a primary index if attribute is key – UNIQUE means indexed attribute values must be unique – ORDER is either ASC (ascending, default) or DESC (descending) CS346 Advanced Databases 10 CREATE INDEX DnoIndex ON EMPLOYEE (Dno) CLUSTER; CREATE INDEX EmpIndex ON EMPLOYEE (Lname, Fname);

Denormalization to Speed up queries  Why normalize? [Third normal form 3NF, Boyce-Codd BCNF] – Split attributes across tables to minimize redundancy, errors – Ensures every row has a unique key, avoid update anomalies  Normalization can sometimes work against efficiency – Data that “logically” belongs together can be split across tables  Denormalization: store database in weaker normal form (eg 2NF) – Faster access time, for more expensive updates – E.g. store the join of tables R and S, rather than R and S separately  Denormalization (re)introduces redundancy in the base tables – E.g. now there are functional dependencies in the data CS346 Advanced Databases 11

Denormalization Example  Consider three relations in 3NF: – EMP (Emp_id, Emp_name, Emp_job_title) – PROJ (Proj_id, Proj_name, Proj_mgr_id) – EMP_PROJ (Emp_id, Proj_id, Percent_assigned)  Create a table of employee assignments from joins: ASSIGN (Emp_id, Proj_id, Emp_name, Emp_Job_title, Percent_Assigned, Proj_name, Proj_mgr_id, Proj_mgr_name) – Avoids performing the joins if ASSIGN is repeatedly queried  Only meets 1NF as there are (non-key) functional dependencies: – E.g. Proj_id  Proj_name, Proj_mgr_id  Can create ASSIGN as a view [result of a stored query] on tables – Materialized view: ASSIGN is stored on disk (vs created on demand) CS346 Advanced Databases 12

Database Tuning  Database Tuning: – Revising/adjusting the physical database design – Monitor resource utilization and internal DBMS processing – Reveal bottlenecks such as contention for data or devices  Goals of database tuning can include: – To make a particular application run faster – To lower the response time of queries/transactions – To improve the overall throughput of transactions  Considerations of tuning are very close to those for design – But additional statistics are available to those mentioned before CS346 Advanced Databases 13

Tuning Statistics  For tuning, the DBMS can maintain more statistics: – Size of individual tables – Number of distinct values in a column – How often a particular query or transaction is submitted/executed – Time required for different phases of query/transaction processing  Further statistics obtained from monitoring: – Storage statistics: how many disk blocks are used – I/O and device performance statistics – Query/transaction processing: how long query optimization takes – Locking/logging statistics: how long these steps take – Index statistics (levels, blocks etc.)  Can try to optimize many things through tuning: – Optimize buffer size, processor scheduling, disk/RAM usage – Avoid lock contention, minimize logging overhead [see later] CS346 Advanced Databases 14

Tuning indexes  A main focus of tuning is on the use of indexes – Certain queries may take too long to run for lack of an index – Some indexes may not get used at all – An index may be too costly as attribute is frequently updated  A few options to tune indexes – Drop or/and build new indexes – Change a non-clustered index to a clustered index (or vice versa) – Rebuild the index / reorganize the file  May need to pause operations while doing this work – Part of “scheduled maintenance” CS346 Advanced Databases 15

Tuning Database Design  If processing requirements change, so may database design – Denormalization: existing tables may be joined – Alternate design: switch to a different design, still in BCNF/3NF – Vertical partitioning: break one relation into many, with same key – Repeat attribute(s) from one table to another – Horizontal partitioning: split data by attribute values E.g. break SALES into UK_SALES, FR_SALES, DE_SALES… CS346 Advanced Databases 16

Tuning Queries  Recall: often databases have stored queries that are run often  Indications we need to tune stored queries: – A query causes too many disk accesses – The query plan shows that relevant indexes are not being used  Many complex considerations in tuning queries – Should we use temporary result tables between queries? Can be better than recomputing results many times – Are multiple join conditions possible? [E.g. SSN or name and d.o.b.] Pick those that use cluster indexes, avoid strings – Ordering of tables in a FROM clause may affect join processing – Some query optimizers perform worse on nested queries – Queries posed against a view could be faster on the base tables CS346 Advanced Databases 17

Tuning Query Guidelines  Query optimizer may not make use of indexes in some cases – E.g. A query with many conditions ORed together – Split into the union of multiple queries that will use index  Example of query splitting to encourage use of indexes: – With ‘OR’: SELECT Fname, Lname, Salary, Age FROM EMPLOYEE WHERE Age > 45 OR Salary < 50000; – With ‘UNION’: SELECT Fname, Lname, Salary, Age FROM EMPLOYEE WHERE Age > 45 UNION SELECT Fname, Lname, Salary, Age FROM EMPLOYEE WHERE Salary < 50000; CS346 Advanced Databases 18

Query Tuning guidelines:  Convert NOT condition into a positive expression: – NOT Age = 18  If an equijoin has a range condition on the join attribute in one table, repeat it for the other table  Ensure WHERE conditions make use of composite indexes – E.g. suppose we have an index on (Region, Product) – Q1: SELECT Region, Product, Month, Sales FROM SalesStats WHERE Region=3 AND ((Product=3) OR (Product=8)); – Q2: SELECT Region, Product, Month, Sales FROM SalesStats WHERE (Region=3 AND Product=3) OR (Region=3 AND Product=8); CS346 Advanced Databases 19

Summary
• Database design: going from logical design to physical layout
  – Which indexes to create? When to apply denormalization?
  – Informed by expected statistics on query frequency
• Database tuning: modifying the design from experience
  – Which indexes to add/remove? How to rewrite stored queries?
• Chapter: "Physical Database Design and Tuning" in Elmasri and Navathe