Database Tuning
Prerequisites: clustered index, B+ tree indexing, hash indexing, ISAM (indexed sequential access method)

Physical Database Design and Tuning
– Physical database design: creating the physical schema once the conceptual schema and integrity constraints have been defined.
– Tuning: refining the conceptual schema, and adjusting the physical schema and indexing schemes accordingly, to improve the performance of the underlying database.
– Workload: an estimate of the frequency and selectivity of the expected queries and update operations, together with any user performance preferences.

Workload
– List of queries and their frequencies
– List of updates and their frequencies
– Performance goals (as expected by users) for the listed queries and updates
Analysis of the workload
– For each query, identify:
    – the relations accessed by the operation
    – the attributes retained (SELECT clause)
    – the attributes with selection or join conditions on them (WHERE clause)
    – the selectivity of the conditions
– For each update operation, identify:
    – the relations accessed by the operation
    – the kind of update (DELETE, INSERT or UPDATE)
    – the fields modified by an UPDATE

Important decisions made during tuning
– Creation of indexes
    – What type of index to use
    – Whether the index should be clustered
– Alterations to the conceptual schema
    – Choosing alternative normalization schemes
    – Denormalization
    – Vertical partitioning
    – Views
– Rewriting frequently run queries and transactions so that they run more efficiently

Guidelines with examples
Guideline 1: Decide whether or not to index a particular query attribute
– Issues
    – Without an index, searching for a tuple takes O(n) time, where n is the total number of tuples
    – Indexing an attribute improves query performance, because the matching tuple can be located without a sequential scan
– Caution
    – When the attribute is updated more often than it is queried, adding an index may increase the overhead of update operations
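The effect of adding an index can be observed directly in a query plan. The following is a minimal sketch using Python's built-in sqlite3 module; the table contents, the index name idx_employee_ename and the helper plan() are made up for illustration, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

def plan(conn, sql):
    """Return the human-readable detail strings of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (eid INTEGER, ename TEXT, dno INTEGER)")
# Hypothetical data: 1000 employees spread over 10 departments.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(i, f"emp{i}", i % 10) for i in range(1000)],
)

query = "SELECT dno FROM Employee WHERE ename = 'emp42'"

# Without an index the only option is to scan every tuple.
plan_without_index = plan(conn, query)

# With an index on ename the engine can search for the matching entry.
conn.execute("CREATE INDEX idx_employee_ename ON Employee (ename)")
plan_with_index = plan(conn, query)

print(plan_without_index)
print(plan_with_index)
```

The first plan reports a scan of the table and the second reports a search through the index. The caution above still applies: every INSERT, DELETE or UPDATE that touches ename now has to maintain the index as well.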

Guideline 2: Selection conditions decide the indexing technique
– An equality selection is usually faster with a hash index
    SELECT E.Dno FROM Employee E WHERE E.ename = 'Roy'
– A range selection is usually faster with a B+ tree index
    SELECT E.Dno FROM Employee E WHERE E.Age < 40
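The reason behind this guideline can be sketched with two in-memory stand-ins: a Python dict plays the role of a hash index, and a sorted list searched with bisect plays the role of the ordered leaf level of a B+ tree. The rows and function names are hypothetical, and this is an analogy, not how a DBMS stores its indexes.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical Employee rows: (ename, age, dno)
employees = [("Roy", 35, 1), ("Ann", 42, 2), ("Raj", 28, 1), ("Mia", 51, 3), ("Roy", 39, 2)]

# Hash-style index on ename: one probe finds all rows with an equal key,
# but the keys are stored in no useful order.
hash_index = defaultdict(list)
for row in employees:
    hash_index[row[0]].append(row)

# Tree-style index on age: entries are kept in key order,
# so the rows of a range form one contiguous run.
sorted_index = sorted(employees, key=lambda row: row[1])
ages = [row[1] for row in sorted_index]

def dnos_for_name(name):
    """Equality selection: WHERE ename = name."""
    return [dno for (_, _, dno) in hash_index.get(name, [])]

def dnos_younger_than(limit):
    """Range selection: WHERE age < limit."""
    return [dno for (_, _, dno) in sorted_index[:bisect_left(ages, limit)]]

print(dnos_for_name("Roy"))
print(dnos_younger_than(40))
```

The hash structure cannot answer the range query without visiting every key, whereas the ordered structure answers it with one search followed by a sequential read.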

Guideline 3: Join conditions decide the indexing scheme
– When a join condition involves the attribute to be indexed, a hash index on the inner relation's join attribute is usually a better choice than a B+ tree
    SELECT E.ename, D.mgr FROM Employee E, Department D WHERE E.dno = D.dno
– Here a hash index on D.dno may be faster than a B+ tree
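One way to see why a hash index suits the inner relation is an index nested loops join, in which each outer tuple triggers one equality probe on the inner relation. The sketch below assumes that join method and assumes dno is unique in Department; a dict stands in for the hash index, and all data and names are hypothetical.

```python
# Hypothetical relations.
employees = [("Roy", 1), ("Ann", 2), ("Raj", 1), ("Mia", 3)]   # (ename, dno), outer
departments = [(1, "Lee"), (2, "Kim"), (3, "Das")]              # (dno, mgr), inner

# Stand-in for a hash index on D.dno (assumes dno is a key of Department).
dept_by_dno = {dno: mgr for dno, mgr in departments}

def join_with_hash_index(outer, index):
    """For each outer tuple, probe the inner relation's index once on the join attribute."""
    result = []
    for ename, dno in outer:
        mgr = index.get(dno)          # equality probe, no ordering needed
        if mgr is not None:
            result.append((ename, mgr))
    return result

print(join_with_hash_index(employees, dept_by_dno))
```

Every probe is an equality lookup, which is the case where a hash index does best; the ordering that a B+ tree maintains is not used.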

Guideline 4: Choosing the attribute for clustering is a very important decision
– At most one attribute can be used for clustering (i.e., the physical ordering of the records on disk)
– Only sequential scans on the attribute benefit from clustering, since fewer disk accesses are needed to fetch the tuples
– If the selectivity of a query is low, or if the query is run infrequently, clustering on the corresponding attributes may not be beneficial
– If there is no sequential access but only random access on the attribute, a hash index without clustering may be enough
– To choose the best attribute for clustering, consider all the queries and updates, their frequencies and their selectivity, and choose the attributes that appear in frequent queries with range conditions of high selectivity

Guideline 5: Balancing the cost of index maintenance
– Indexes may need to be dropped if they slow down any of the frequent update operations
– Indexes may also speed up an update operation that has a WHERE condition
    UPDATE Employee E SET E.Income = … WHERE E.eid = 123

Additional Guideline 1: Analyse the query evaluation plan of the underlying database engine
    SELECT E.ename, D.mgr FROM Employee E, Department D WHERE D.dname = 'Toy' AND E.dno = D.dno
– Here both D.dname and D.dno could be indexed

Query evaluation plan guides index choices
[Plan tree: ∏ E.ename over the join D.dno = E.dno of (σ Dname = 'Toy' applied to Department D) with Employee E]
– All tuples with Dname = 'Toy' are selected from the stored Department relation
– The join is performed on the tuples produced by that selection, not on the stored relation, so an index on D.dno is unnecessary
– Always index the attributes used at the lowest points of the plan; the others need not be indexed

Clustering indexes
[Figure: a clustered B+ tree and an unclustered B+ tree over records stored in pages on a secondary storage device]
– Clustered B+ tree: if three records fit in a page, a range operation < 60 makes 2 disk accesses
– Unclustered B+ tree: if three records fit in a page, the same range operation < 60 makes 6 disk accesses, one for every tuple
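The 2 versus 6 figure can be reproduced with a small simulation. This is a sketch under stated assumptions: the key values and page layouts are invented (the slide's figure is not recoverable), three records fit in a page, index traversal itself is not counted, and only the most recently read page is assumed to stay in memory.

```python
def page_fetches(heap_pages, upper_bound):
    """Count page reads when fetching records with key < upper_bound via an index.

    The index supplies matching entries in key order. A page is read again
    whenever it differs from the page read just before (one-page buffer).
    """
    # Index entries: (key, page number), sorted by key.
    index = sorted(
        (key, page_no)
        for page_no, page in enumerate(heap_pages)
        for key in page
    )
    fetches, last_page = 0, None
    for key, page_no in index:
        if key >= upper_bound:
            break
        if page_no != last_page:
            fetches += 1
            last_page = page_no
    return fetches

# Clustered: records are stored on disk in the same order as the index keys.
clustered = [[10, 20, 30], [40, 50, 55], [70, 80, 90]]
# Unclustered: the same records, stored in an order unrelated to the key.
unclustered = [[10, 40, 70], [20, 50, 80], [30, 55, 90]]

print(page_fetches(clustered, 60))
print(page_fetches(unclustered, 60))
```

With the clustered layout the six qualifying records sit on two adjacent pages; with the unclustered layout each successive index entry points to a different page, so every tuple costs one access.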

Selectivity decides between clustered and unclustered index schemes
– When more than one query plan is available, the selectivity of a condition decides the best plan and also the attribute to cluster on
    SELECT E.ename, D.Mgr FROM Employee E, Department D WHERE E.hobby = 'stamps' AND E.salary BETWEEN 10000 AND 30000 AND E.Dno = D.Dno

Query Plan 1
[Plan tree: ∏ E.ename over σ E.Hobby = 'Stamps' and σ Salary Between(10000, 30000), applied to the join E.dno = D.dno of Employee E (outer) with Department D (inner)]
– If almost all employees collect stamps and earn a salary in the given range (high selectivity), no index is needed on those two attributes
– The selections are applied to the tuples produced by the join, so indexes on E.hobby and E.salary are unnecessary
– The join uses E.dno from the outer relation, so a clustered index on E.dno is needed
– A hash index is desirable on the inner relation's attribute D.dno; it need not be clustered because it is not accessed sequentially

Query Plan 2
[Plan tree: ∏ E.ename over the join D.dno = E.dno of (σ E.Hobby = 'Stamps' and σ Salary Between(10000, 30000) applied to Employee E) with Department D]
– If only a few employees collect stamps (low selectivity), the selection is applied first
– The hobby selection is applied to the stored Employee relation, so an index on E.hobby is needed, preferably a clustered one
– The salary selection and the join are applied to the tuples produced by that selection, so indexes on E.salary and E.dno are not necessary

Impact of clustering on cost of operation
[Figure: cost plotted against the percentage of tuples retrieved (selectivity) for an unclustered index scheme, a clustered index scheme and a sequential scan, followed by a combined plot of all schemes that marks the range in which an unclustered index is better than a sequential scan]
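The shape of these curves can be approximated with a deliberately simplified cost model. The model is an assumption, not taken from the slides: cost is counted in page reads, the cost of traversing the index is ignored, a clustered index reads only the pages that hold matching tuples, and an unclustered index reads one page per matching tuple.

```python
def costs(total_pages, records_per_page, matching):
    """Approximate page reads for retrieving `matching` tuples (simplified model)."""
    return {
        # A sequential scan always reads the whole file.
        "sequential scan": total_pages,
        # Clustered: matching tuples are contiguous; ceiling division gives pages touched.
        "clustered index": -(-matching // records_per_page),
        # Unclustered: worst case of one page read per matching tuple.
        "unclustered index": matching,
    }

# Hypothetical file: 1000 pages of 100 records each (100000 tuples).
low = costs(total_pages=1000, records_per_page=100, matching=500)     # 0.5% retrieved
high = costs(total_pages=1000, records_per_page=100, matching=5000)   # 5% retrieved

print(low)
print(high)
```

Under this model the unclustered index beats a sequential scan only while the number of matching tuples stays below the number of pages, that is, while the fraction retrieved is below 1/records_per_page (1% in this example). The clustered index never costs more than the scan.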

Co-clustered indexing
– A co-clustered index on manager income in the Department tuples and employee income in the Employee tuples
– Each Department record is followed on the secondary storage device by the records of its employees
[Figure: a clustered B+ tree, a clustered index on manager income, in which a key of 22 stands for 22000]
    SELECT ALL E.ename FROM Employee E, Department D WHERE D.MgrIncome > 53 AND D.Dno = E.Dno
– The query selects all employees who work under managers with income > 53
– It only looks through the index on manager income and then directly accesses the records of that manager's employees on disk

Indexes on multiple-attribute search keys
– The following query returns all employees aged between 20 and 30 who have a salary from 3000 to 5000
    SELECT E.eid FROM Employee E WHERE E.age BETWEEN 20 AND 30 AND E.sal BETWEEN 3000 AND 5000
– A composite clustered B+ tree index on (age, sal) would be a better choice than one on (sal, age), since more employees are likely to share the same age than the same salary
– A composite clustered index on (age, sal) first sorts the tuples by age and then, within each age group, sorts them by salary
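The ordering imposed by a composite key can be sketched with sorted Python tuples, which compare element by element just as a composite search key does. The rows and the function name are hypothetical, and integer ages are assumed so that "age > age_hi" can be written as "age >= age_hi + 1".

```python
from bisect import bisect_left

# Hypothetical Employee rows: (eid, age, sal)
employees = [(1, 25, 4000), (2, 25, 2500), (3, 31, 4500),
             (4, 22, 3500), (5, 28, 6000), (6, 19, 3200)]

# Entries of a composite index on (age, sal): sorted by age,
# then by sal within each age group. eid is carried along as the record pointer.
index = sorted((age, sal, eid) for eid, age, sal in employees)

def eids_in_ranges(age_lo, age_hi, sal_lo, sal_hi):
    """WHERE age BETWEEN age_lo AND age_hi AND sal BETWEEN sal_lo AND sal_hi."""
    start = bisect_left(index, (age_lo,))      # first entry with age >= age_lo
    stop = bisect_left(index, (age_hi + 1,))   # first entry with age > age_hi
    # The age range is one contiguous run; sal is checked entry by entry inside it.
    return [eid for age, sal, eid in index[start:stop] if sal_lo <= sal <= sal_hi]

print(index)
print(eids_in_ranges(20, 30, 3000, 5000))
```

Only the leading attribute narrows the scan to a contiguous run; the second attribute is ordered only within each value of the first, which is why the order of attributes in the key matters.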

Index-only plans
– If only the index files are used to answer a query, without accessing the actual data in the table, the plan is called an index-only plan
– If a composite index has all the relevant attributes of a query (including those in the SELECT clause) in its search key, no access to the stored relation is needed
    SELECT E.eid FROM Employee E WHERE E.age BETWEEN 20 AND 30 AND E.sal BETWEEN 3000 AND 5000
– If a composite index on (age, sal, eid) is defined, no access to the stored relation is needed: all eids in the index file that satisfy the condition can be returned as the result
– In an index-only access like this one, clustering of the physical records is not necessary
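SQLite reports index-only access in its query plan as a "covering index", which gives a quick way to check the idea. The sketch below uses Python's built-in sqlite3 module with invented data and an invented index name; other engines use different terminology and may choose differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (eid INTEGER, ename TEXT, age INTEGER, sal INTEGER)")
# Hypothetical data: ages 18..57 and salaries 2000..7999.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(i, f"emp{i}", 18 + i % 40, 2000 + (37 * i) % 6000) for i in range(500)],
)

# The composite index contains every attribute the query mentions.
conn.execute("CREATE INDEX idx_employee_age_sal_eid ON Employee (age, sal, eid)")

query = ("SELECT E.eid FROM Employee E "
         "WHERE E.age BETWEEN 20 AND 30 AND E.sal BETWEEN 3000 AND 5000")

# Column 3 of each EXPLAIN QUERY PLAN row is the readable description.
plan_details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
rows = conn.execute(query).fetchall()

print(plan_details)
print(len(rows))
```

If ename were added to the SELECT list, the index would no longer contain everything the query needs and the engine would have to visit the table rows as well.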

Index-only plans (continued)
    SELECT E.dno, COUNT(*) FROM Employee E GROUP BY E.dno
    SELECT E.dno, MIN(E.sal) FROM Employee E GROUP BY E.dno
– In the first example, if there is an index on E.dno, we can simply count the number of index entries for each dno and return that as the answer for COUNT(*). The tuples on secondary storage need not be accessed, so this is also an index-only plan
– Since index-only schemes are faster, composite indexes may be defined to include attributes that are only projected. In the second example sal is not used in any condition, but a composite index on (dno, sal) saves accessing the secondary storage device, because the index files alone answer the query
– See pages 472 and 473 in Gehrke for more examples
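Both aggregates can be computed from index entries alone, as the following sketch suggests. A sorted list of (dno, sal) pairs stands in for the entries of the composite index; the values and the function name are hypothetical.

```python
from itertools import groupby

# Hypothetical entries of a composite index on (dno, sal), kept in key order.
index_entries = sorted([(10, 4000), (20, 5200), (10, 3100),
                        (30, 2800), (20, 4100), (10, 3900)])

def count_and_min_sal_per_dno(entries):
    """Answer COUNT(*) and MIN(sal) per dno using the index entries only."""
    result = {}
    # Entries with the same dno are adjacent because the index is sorted on dno first.
    for dno, group in groupby(entries, key=lambda entry: entry[0]):
        sals = [sal for _, sal in group]
        # Within a dno the entries are sorted by sal, so the first one is the minimum.
        result[dno] = (len(sals), sals[0])
    return result

print(count_and_min_sal_per_dno(index_entries))
```

No Employee record is touched: the count comes from the number of entries per dno, and the minimum salary is simply the first entry of each group.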

Database Tuning: Query Rewriting
– Eliminating the GROUP BY operation avoids expensive sorting
    Original:  SELECT MIN(E.age) FROM Employee E GROUP BY E.Dno HAVING E.Dno = 102
    Rewritten: SELECT MIN(E.age) FROM Employee E WHERE E.Dno = 102
– Combining steps to form a single query avoids creating an unnecessary table Temp
    Original:  SELECT * INTO TEMP FROM Employee E, Department D WHERE E.Dno = D.dno AND D.mgrname = 'robby'
               SELECT T.Dno, AVG(T.Sal) FROM TEMP T GROUP BY T.Dno
    Rewritten: SELECT E.Dno, AVG(E.Sal) FROM Employee E, Department D WHERE E.Dno = D.dno AND D.mgrname = 'robby' GROUP BY E.Dno
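That the rewritten queries return the same answers can be checked on a small data set. The sketch below uses Python's built-in sqlite3 module with invented rows. SQLite has no SELECT ... INTO, so the intermediate table is created with CREATE TEMP TABLE ... AS, and it is named TempResult here to avoid clashing with the TEMP keyword. ORDER BY is added only to make the comparison deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (dno INTEGER, mgrname TEXT);
    CREATE TABLE Employee (eid INTEGER, age INTEGER, sal INTEGER, dno INTEGER);
    INSERT INTO Department VALUES (101, 'robby'), (102, 'robby'), (103, 'ann');
    INSERT INTO Employee VALUES
        (1, 30, 4000, 101), (2, 41, 5000, 102), (3, 25, 3000, 102),
        (4, 52, 6000, 103), (5, 36, 2000, 101);
""")

# Rewrite 1: GROUP BY ... HAVING on a single group versus a plain WHERE.
having_version = conn.execute(
    "SELECT MIN(E.age) FROM Employee E GROUP BY E.dno HAVING E.dno = 102").fetchall()
where_version = conn.execute(
    "SELECT MIN(E.age) FROM Employee E WHERE E.dno = 102").fetchall()

# Rewrite 2: two steps through an intermediate table versus one query.
conn.execute(
    "CREATE TEMP TABLE TempResult AS SELECT E.dno AS dno, E.sal AS sal "
    "FROM Employee E, Department D WHERE E.dno = D.dno AND D.mgrname = 'robby'")
two_step = conn.execute(
    "SELECT T.dno, AVG(T.sal) FROM TempResult T GROUP BY T.dno ORDER BY T.dno").fetchall()
single_query = conn.execute(
    "SELECT E.dno, AVG(E.sal) FROM Employee E, Department D "
    "WHERE E.dno = D.dno AND D.mgrname = 'robby' GROUP BY E.dno ORDER BY E.dno").fetchall()

print(having_version, where_version)
print(two_step, single_query)
```

One caveat on the first rewrite: when no employee belongs to department 102, the HAVING form returns no rows while the WHERE form returns a single row containing NULL, so the two are interchangeable only when the department has at least one employee.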

Impact of concurrency
– The duration for which transactions hold a lock can affect performance significantly
– Tuning transactions by writing to local variables and deferring database access can improve performance
– Replacing a transaction with several smaller transactions can improve performance
– A careful partitioning of the tuples in a relation and its associated indexes across a collection of disks can improve concurrent access
– If the DBMS uses specialized locking protocols for tree indexes and sets fine-granularity locks, concurrency improves

Performance benchmarks
– To assist users in choosing a DBMS that suits their needs, several performance benchmarks have been developed
– Benchmarks should be portable, easy to understand, and scale naturally to larger problem instances
– They should measure peak performance as well as price/performance ratios
– The Transaction Processing Performance Council (TPC) was created to define benchmarks for transaction processing and database systems
– Examples:
    – TPC-A and TPC-B benchmarks (online transaction processing benchmarks)
    – Wisconsin benchmark (query benchmark)
    – OO1 and OO7 benchmarks (object database benchmarks)
– Read Section 16.8 in Gehrke for tuning the conceptual schema