CS122A: Introduction to Data Management. Lecture #15: Physical DB Design. Instructor: Chen Li.

1 CS122A: Introduction to Data Management Lecture #15: Physical DB Design Instructor: Chen Li

2 Overview
- After ER design, schema refinement, and the definition of views, we have the conceptual and external schemas for our database.
- The next step is to choose indexes, make clustering decisions, and refine the conceptual and external schemas (if necessary) to meet performance goals.
- We should start by understanding the workload:
  - The most important queries and how often they arise.
  - The most important updates and how often they arise.
  - The desired performance goals for those queries and updates.

3 Decisions to Be Made Include...
- What indexes should we create?
  - Which relations should have indexes? What field(s) should be their search key? Should we build several indexes?
- For each index, what kind of index should it be?
  - B+ tree? Hashed? Clustered or unclustered?
- Should we make changes to the conceptual schema?
  - Consider alternative normalized schemas? (There are multiple choices when decomposing into BCNF, etc.)
  - Should we "undo" some decomposition steps and settle for a lower normal form? (Denormalization.)

4 Understanding the Workload
- For each query in the workload:
  - Which relations does it access?
  - Which attributes are retrieved?
  - Which attributes are involved in selection/join conditions? (And how selective are these conditions likely to be?)
- For each update in the workload:
  - Which attributes are involved in selection/join conditions? (And how selective are these conditions likely to be?)
  - The type of update (INSERT/DELETE/UPDATE), and the attributes that are affected.
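The workload questions above can be captured in a small data structure. The following is a minimal sketch, not part of the lecture: the `WorkloadItem` fields, the sample statements, and the frequencies are all hypothetical, and weighting attributes by frequency is only one rough way to shortlist index candidates.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class WorkloadItem:
    kind: str               # "SELECT", "INSERT", "DELETE", or "UPDATE"
    relations: tuple        # relations the statement accesses
    condition_attrs: tuple  # attributes used in selection/join conditions
    frequency: int          # executions per day (assumed unit)


def rank_condition_attributes(workload):
    """Weight each condition attribute by how often its statements run."""
    counts = Counter()
    for item in workload:
        for attr in item.condition_attrs:
            counts[attr] += item.frequency
    return counts.most_common()


# Hypothetical workload, for illustration only.
workload = [
    WorkloadItem("SELECT", ("Emp",), ("Emp.age",), 500),
    WorkloadItem("SELECT", ("Emp", "Dept"), ("Emp.dno", "Dept.dno"), 200),
    WorkloadItem("UPDATE", ("Emp",), ("Emp.dno",), 50),
]
ranking = rank_condition_attributes(workload)
for attr, weight in ranking:
    print(attr, weight)
```

Attributes near the top of the ranking are the first places to look for useful indexes. The update entries are a reminder that every index on an affected attribute also has to be maintained.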

5 Index Classification (Review)
- Primary vs. secondary: If the search key contains the primary key, the index is called a primary index.
  - Unique index: The search key contains a candidate key.
- Clustered vs. unclustered: If the order of the data records is the same as, or "close to", the order of the index's data entries, the index is called a clustered index.
  - A table can be clustered on at most one search key.
  - The cost of retrieving data records via an index varies greatly based on whether the index is clustered or not!

6 Clustered vs. Unclustered Indexes (Review)
[Figure: index entries in the index file direct the search to the data entries, which point to the data records in the data file. CLUSTERED: read each page once. UNCLUSTERED: read more pages, and repeatedly!]
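The gap between the two cases can be made concrete with a back-of-the-envelope estimate. This is a rough sketch under stated assumptions: it counts only data-page reads, ignores the cost of traversing the index itself, and treats the unclustered case as the worst case where every matching record sits on a different page and nothing is cached. The function names and numbers are illustrative.

```python
import math


def clustered_data_pages(matching_records, records_per_page):
    """Matching records are stored next to each other, so each page is read once."""
    return math.ceil(matching_records / records_per_page)


def unclustered_data_pages_worst_case(matching_records):
    """Worst case: one page read per matching record."""
    return matching_records


# Illustrative numbers: 10,000 matching records, 100 records per page.
clustered = clustered_data_pages(10_000, 100)
unclustered = unclustered_data_pages_worst_case(10_000)
print(clustered, unclustered)
```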

7 Choice of Indexes (Contd.)
- One approach: Consider the most important queries in turn. Consider the best query plan using the current indexes, and see if a better plan is possible with an additional index. If so, create it.
  - This implies that we must understand how a DBMS evaluates its queries (query evaluation plans).
  - Let's start by discussing simple 1-table queries!
- Before creating an index, we must also consider its impact on updates in the workload.
  - Trade-off: Indexes can make queries go faster, but updates will become slower. (Indexes require disk space, too.)
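One way to try this "compare the plan before and after adding an index" loop is with SQLite, which ships with Python. This is a sketch under assumptions: the lecture does not name a DBMS, the `Emp` columns and the index name `idx_emp_age` are made up for illustration, and the exact plan text differs between SQLite versions.

```python
import sqlite3


def plan(conn, sql):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Emp (eid INTEGER PRIMARY KEY, dno INTEGER, age INTEGER, hobby TEXT)"
)
conn.executemany(
    "INSERT INTO Emp (dno, age, hobby) VALUES (?, ?, ?)",
    [(i % 10, 20 + i % 50, "Stamps") for i in range(1000)],
)

query = "SELECT E.dno FROM Emp E WHERE E.age > 40"
before = plan(conn, query)
rows_before = sorted(conn.execute(query).fetchall())

# Candidate index for the range condition on age.
conn.execute("CREATE INDEX idx_emp_age ON Emp(age)")
after = plan(conn, query)
rows_after = sorted(conn.execute(query).fetchall())

print("before:", before)
print("after: ", after)
```

The index changes only the plan, not the answer. Whether the new plan is actually faster still depends on how selective the condition is and on the update cost the index adds.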

8 Index Selection Guidelines
- Attributes in the WHERE clause are candidates for index keys.
  - Exact match condition → hashed index (B+ tree if not available).
  - Range query → B+ tree index. Clustering is especially useful for range queries; it also helps equality queries with duplicates (non-key field index).
- Multi-attribute search keys should be considered when a WHERE clause contains several conditions.
  - The order of attributes is important for range queries.
  - Such indexes can sometimes enable index-only strategies for important queries (e.g., aggregates / grouped aggregates). For index-only strategies, clustering isn't important!
- Choose indexes that benefit as many queries as possible.
- Only one index can be clustered per relation, so choose it based on the important queries that can benefit the most from clustering.
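Why attribute order matters for range queries can be illustrated with the usual prefix-matching rule for B+ tree search keys: equality conditions can be matched on leading attributes, and at most one range condition can follow them. The helper below is a hypothetical sketch of that rule, not a description of any particular optimizer.

```python
def matched_prefix(index_columns, equality_columns, range_columns):
    """Return the leading index columns a B+ tree search can use directly."""
    matched = []
    for col in index_columns:
        if col in equality_columns:
            matched.append(col)   # equality keeps the search narrowing
        elif col in range_columns:
            matched.append(col)   # one range condition may end the prefix
            break
        else:
            break                 # no condition on this column: stop
    return matched


# Assumed query shape: WHERE dno = 5 AND age > 40
good = matched_prefix(["dno", "age"], {"dno"}, {"age"})
poor = matched_prefix(["age", "dno"], {"dno"}, {"age"})
print(good, poor)
```

With the search key (dno, age), both conditions narrow the scan. With (age, dno), only the range on age does, and the dno condition has to be checked entry by entry.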

9 Examples of Clustered Indexes
- A B+ tree index on E.age can be used to get qualifying tuples.
  - Query: SELECT E.dno FROM Emp E WHERE E.age > 40
  - How selective is the condition?
  - Should the index be clustered?
- Consider the GROUP BY query.
  - Query: SELECT E.dno, COUNT (*) FROM Emp E WHERE E.age > 10 GROUP BY E.dno
  - If most tuples have E.age > 10, using the E.age index and sorting the retrieved tuples may be costly.
  - A clustered E.dno index may win!
- Equality queries and duplicates:
  - Query: SELECT E.dno FROM Emp E WHERE E.hobby = 'Stamps'
  - Clustering on E.hobby helps!

10 Index-Only Query Plans
- Some queries can be answered without retrieving any tuples from one or more of the relations involved if a suitable index is available.
  - Query: SELECT E.dno, COUNT (*) FROM Emp E GROUP BY E.dno
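An index-only plan for this query can be observed in SQLite, which reports such plans as using a "covering index". This is a sketch under assumptions: SQLite stands in for the unnamed DBMS, the `Emp` columns and the index name `idx_emp_dno` are illustrative, and the plan wording varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (eid INTEGER PRIMARY KEY, dno INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO Emp (dno, age) VALUES (?, ?)",
    [(i % 3, 25 + i) for i in range(9)],
)
conn.execute("CREATE INDEX idx_emp_dno ON Emp(dno)")

query = "SELECT E.dno, COUNT(*) FROM Emp E GROUP BY E.dno"

# The query needs only dno, so the index alone can answer it.
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_text = " | ".join(row[3] for row in plan_rows)
counts = conn.execute(query).fetchall()

print(plan_text)
print(counts)
```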

11 Tuning the Conceptual Schema
- The choice of conceptual schema should be guided by the workload, in addition to redundancy issues:
  - We may settle for a 3NF schema rather than BCNF.
  - The workload may influence the choice we make in decomposing a relation into 3NF or BCNF.
  - We might denormalize (i.e., undo a decomposition step), or we might add fields to a relation.
  - We might consider vertical decompositions.
- If such changes come after a database is in use, it's called schema evolution; we might want to mask some of the changes from applications by defining views.

12 Example Schemas
Contracts (Cid, Sid, Jid, Did, Pid, Qty, Val)
- We will concentrate on Contracts, denoted as CSJDPQV. The following ICs are given to hold: JP → C, SD → P, C → CSJDPQV.
  - Find the candidate keys for CSJDPQV.
  - Currently this relation is in 3NF.
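The candidate-key exercise can be checked mechanically with attribute closures. The sketch below is a brute-force illustration, not the lecture's method: it tries attribute sets in order of increasing size and keeps those whose closure is the whole schema and that contain no smaller key. The function names are made up for this example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Compute the set of attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Enumerate minimal attribute sets whose closure covers the schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(schema):
                keys.append("".join(combo))
    return keys


schema = "CSJDPQV"
fds = [("JP", "C"), ("SD", "P"), ("C", "CSJDPQV")]
keys = candidate_keys(schema, fds)
print(keys)
```

Under the three dependencies as stated, the sketch reports C, JP, and DJS. The last one arises because SD determines P, and J together with P then determines C.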

13 Settling for 3NF vs BCNF
- CSJDPQV can be decomposed into SDP and CSJDQV, and both relations are then in BCNF.
  - Lossless decomposition, but not dependency-preserving.
  - Adding CJP makes it dependency-preserving as well.
- Suppose that this query is very important:
  - Find the number of copies Q of part P ordered in contract C.
  - It requires a join on the decomposed schema, but can be answered by a scan of the original relation CSJDPQV.
  - This could lead us to settle for the 3NF schema CSJDPQV!
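The cost of the decomposition for this query can be seen by writing it against both schemas. The sketch below uses SQLite with made-up sample rows that satisfy the stated dependencies. The lowercase column names are assumptions for illustration, while the table names SDP and CSJDQV follow the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Contracts (cid INTEGER, sid INTEGER, jid INTEGER, "
    "did INTEGER, pid INTEGER, qty INTEGER, val REAL)"
)
conn.executemany(
    "INSERT INTO Contracts VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(1, 10, 100, 7, 55, 20, 300.0), (2, 11, 101, 7, 56, 5, 80.0)],
)

# Decomposed schema: SDP and CSJDQV.
conn.execute("CREATE TABLE SDP AS SELECT DISTINCT sid, did, pid FROM Contracts")
conn.execute(
    "CREATE TABLE CSJDQV AS SELECT cid, sid, jid, did, qty, val FROM Contracts"
)

# Original relation: a single-table lookup.
single = conn.execute("SELECT pid, qty FROM Contracts WHERE cid = 1").fetchall()

# Decomposed schema: the part id lives in SDP, so a join is required.
joined = conn.execute(
    "SELECT S.pid, C.qty FROM CSJDQV C "
    "JOIN SDP S ON C.sid = S.sid AND C.did = S.did "
    "WHERE C.cid = 1"
).fetchall()

print(single, joined)
```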

14 More Guidelines for Query Tuning
- Minimize the use of DISTINCT: it isn't needed if duplicates are acceptable or the answer contains a key.
- Perhaps minimize the use of GROUP BY and HAVING:
  - Instead of: SELECT MIN (E.age) FROM Employee E GROUP BY E.dno HAVING E.dno=102
  - Write: SELECT MIN (E.age) FROM Employee E WHERE E.dno=102
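The two formulations can be compared directly. The sketch below runs both in SQLite on a small made-up Employee table. The rows are illustrative, and SQLite is only a stand-in for whatever system the course uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (eid INTEGER PRIMARY KEY, dno INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO Employee (dno, age) VALUES (?, ?)",
    [(102, 30), (102, 25), (101, 40)],
)

# Groups every department, then discards all groups except dno = 102.
with_having = conn.execute(
    "SELECT MIN(E.age) FROM Employee E GROUP BY E.dno HAVING E.dno = 102"
).fetchall()

# Filters first, so only department 102 is ever aggregated.
with_where = conn.execute(
    "SELECT MIN(E.age) FROM Employee E WHERE E.dno = 102"
).fetchall()

print(with_having, with_where)
```

One caveat when rewriting: if no employee is in department 102, the HAVING form returns no rows, while the WHERE form returns a single row containing NULL.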

15 Guidelines for Query Tuning (Contd.)
- Avoid using intermediate relations:
  - Two steps: SELECT * INTO Temp FROM Emp E, Dept D WHERE E.dno=D.dno AND D.mgrname='Joe'
    followed by SELECT T.dno, AVG (T.sal) FROM Temp T GROUP BY T.dno
  - vs. one query: SELECT E.dno, AVG (E.sal) FROM Emp E, Dept D WHERE E.dno=D.dno AND D.mgrname='Joe' GROUP BY E.dno
- The single query does not materialize the intermediate relation Temp.
- If there is a dense B+ tree index on the Emp fields the query needs, an index-only plan can be used to avoid retrieving Emp tuples in the second query!
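Both approaches can be run side by side to confirm they return the same answer. This sketch uses SQLite, which has no SELECT ... INTO, so the intermediate relation is created with CREATE TABLE ... AS instead. The table name `JoeTemp`, the explicit column list, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dept (dno INTEGER PRIMARY KEY, mgrname TEXT)")
conn.execute("CREATE TABLE Emp (eid INTEGER PRIMARY KEY, dno INTEGER, sal REAL)")
conn.executemany("INSERT INTO Dept VALUES (?, ?)", [(1, "Joe"), (2, "Ann"), (3, "Joe")])
conn.executemany(
    "INSERT INTO Emp (dno, sal) VALUES (?, ?)",
    [(1, 100.0), (1, 200.0), (2, 50.0), (3, 300.0)],
)

# Two-step version: materialize the join, then aggregate over it.
conn.execute(
    "CREATE TABLE JoeTemp AS "
    "SELECT E.dno AS dno, E.sal AS sal FROM Emp E, Dept D "
    "WHERE E.dno = D.dno AND D.mgrname = 'Joe'"
)
two_step = conn.execute(
    "SELECT T.dno, AVG(T.sal) FROM JoeTemp T GROUP BY T.dno"
).fetchall()

# One-step version: no intermediate relation is stored.
one_step = conn.execute(
    "SELECT E.dno, AVG(E.sal) FROM Emp E, Dept D "
    "WHERE E.dno = D.dno AND D.mgrname = 'Joe' GROUP BY E.dno"
).fetchall()

print(sorted(two_step), sorted(one_step))
```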

16 Summary
- Database design consists of several tasks: requirements analysis, conceptual design, schema refinement, physical design and tuning.
  - In general, we have to go back and forth between these tasks to refine a database design, and decisions in one task can influence the choices in another task.
- Understanding the nature of the workload for the application, and the performance goals, is essential to developing a good design.
  - What are the important queries and updates? What attributes/relations are involved?

17 Summary (Cont'd.)
- The conceptual schema should be refined by considering performance criteria and workload:
  - May choose 3NF or a lower normal form over BCNF.
  - May choose among alternative decompositions into BCNF (or 3NF) based upon the workload.
  - May denormalize, or undo, some decompositions.

18 Summary (Cont'd.)
- Over time, indexes may have to be fine-tuned (dropped, created, re-built, ...) for performance.
  - We should determine the query plan(s) used by the system, and adjust the choice of indexes appropriately.
- The system may still not find a good plan:
  - Null values, arithmetic conditions, string expressions, the use of ORs, etc., can potentially confuse an optimizer.
- So, we may have to rewrite the query/view:
  - Avoid nested queries, temporary relations, complex conditions, and operations like DISTINCT and GROUP BY.