IT 20303 The Relational DBMS Section 06. Relational Database Theory Physical Database Design.


1 IT 20303 The Relational DBMS Section 06

2 Relational Database Theory Physical Database Design

3 Relational Database Theory
Physical Database Design
–Goals
  Improve performance
    –By minimizing disk I/O
  Improve management of the data
    –By grouping tables that can be managed as a group

4 Relational Database Theory
Physical design decisions are based on:
–Use of the data (volume, frequency)
–Features supported by the specific RDBMS
–Disk storage configuration

5 Relational Database Theory
DBA initially sets up the physical database
–Tunes physical parameters on an ongoing basis
  As usage patterns change
  As new hardware/software options become available

6 Relational Database Theory
Steps in Physical Design Process
–Determine which tables can be managed as a group
  Many RDBMSs support the concept of a container (Oracle tablespace, db space; Access uses the .mdb file)
    –A collection of tables and indexes

7 Relational Database Theory
–Develop a plan for allocating tables to disk devices
  Consider parallel disk controllers
  Group together tables that are frequently joined
  Distribute heavily accessed tables to different disk devices
    –To avoid excessive head movement on one disk

8 Relational Database Theory
–Build indexes on table columns, based on frequency of use
–Restructure tables if necessary
  Fragment large tables into multiple smaller ones
  De-normalize tables if appropriate

9 Example of a Container
[Diagram labels: Table 1, Table 2, Table N; Tablespace; OS File]

10 Relational Database Theory
Managing a collection of Tables, Indexes
–Purpose of container concept
  Relate tables, indexes to physical disk files
  Aid in the management of the database
    –Example: A tablespace can be taken offline, backed up, and restored while the remainder of the database is online

11 Relational Database Theory
–Support clustering data from related tables in the same file
  So that related data is read with the same I/O request

12 Relational Database Theory
How the RDBMS processes a user request
–RDBMS parses, validates, and optimizes the SQL request
–Determines disk file in which the table is written
  Specific to each RDBMS & OS

13 Relational Database Theory
–Initiates I/O request to operating system, if necessary
  I/O is requested if file is not currently in buffers
–Processes execution plan using data in its buffer
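The buffer check on slides 12 and 13 can be sketched with a toy buffer pool. This is only an illustration under stated assumptions: the `BufferPool` class, its least-recently-used eviction, and the `disk` dictionary standing in for OS files are all hypothetical, and a real RDBMS uses far more elaborate buffer management.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool (hypothetical): a page is read from 'disk' only on a miss."""

    def __init__(self, disk, capacity):
        self.disk = disk              # dict: page_id -> contents, stands in for OS files
        self.capacity = capacity      # maximum number of pages held in memory
        self.pages = OrderedDict()    # buffered pages, least recently used first
        self.io_requests = 0          # number of simulated disk reads

    def get_page(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # buffer hit: no I/O needed
            return self.pages[page_id]
        self.io_requests += 1                 # buffer miss: request the page from "disk"
        contents = self.disk[page_id]
        self.pages[page_id] = contents
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict the least recently used page
        return contents


disk = {1: ["row a", "row b"], 2: ["row c"], 3: ["row d"]}
pool = BufferPool(disk, capacity=2)
for page_id in (1, 1, 2, 1, 3, 2):
    pool.get_page(page_id)
print(pool.io_requests)
```

Repeated requests for a page that is already buffered cost no I/O, which is why the count of disk reads is lower than the count of page requests.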

14 Relational Database Theory
Indexes
–Index is a separate structure (table)
  Points into the data table
  Built on one or more columns in the data table

15 Relational Database Theory
Comments on Indexing
–An index can be built on any column or combination of columns
–An index can be unique or non-unique
–An index on the primary key is called the primary index
–Most RDBMSs use an internal row id as the pointer to the row
–Use of the index is transparent to the user

16 Relational Database Theory
Use of an index
–Provides access to a row based on data value(s)
–Prevents duplicate values (a unique index is the only way to enforce this)
–Supports sequential processing on the indexed field
–Improves performance
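The index properties listed above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the `employee` table, its columns, and the index names are invented for illustration, and SQLite stands in for whichever RDBMS is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, email TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "ann@example.com", "IT"), (2, "bob@example.com", "HR"), (3, "cy@example.com", "IT")],
)

# Non-unique index on a frequently searched column.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
# Unique index: duplicate values in the column are rejected.
conn.execute("CREATE UNIQUE INDEX idx_employee_email ON employee (email)")

try:
    conn.execute("INSERT INTO employee VALUES (4, 'ann@example.com', 'IT')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The query never names the index; the optimizer decides whether to use it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT emp_id FROM employee WHERE dept = 'IT'"
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)

# Ordered retrieval on the indexed column (sequential processing).
ordered = [r[0] for r in conn.execute("SELECT dept FROM employee ORDER BY dept")]
print(duplicate_rejected, plan_text, ordered)
```

The `SELECT` statements are written exactly as they would be without any index, which is what "transparent to the user" means in practice.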

17 Relational Database Theory
Use of an index improves performance on Retrieval
–Processing an index is more efficient than processing a table (for reads)
  Index is usually small, relative to the table
    –Can be held entirely in memory
  The smaller the index value, the more entries fit per block, and the more likely the index will be in memory

18 Relational Database Theory
Most RDBMSs use a type of B-Tree Index
–B-tree indexes were designed for efficient search of a sorted list
–Algorithms exist for managing and maintaining B-trees

19 Relational Database Theory
B-trees were introduced by Bayer and McCreight (1972).
–They are a special m-ary balanced tree used in databases because their structure allows records to be inserted, deleted, and retrieved with guaranteed worst-case performance
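To make the balanced m-ary structure concrete, here is a minimal in-memory B-tree sketch supporting insert and search. It follows the common textbook formulation with a minimum degree `t` (each node holds at most 2t - 1 keys); the `Node` and `BTree` names are illustrative, keys are assumed to be unique, and deletion is omitted. It is not the implementation of any particular RDBMS, many of which use B+-tree variants stored in disk blocks.

```python
import bisect


class Node:
    def __init__(self, leaf=True):
        self.keys = []        # sorted keys held in this node
        self.children = []    # child nodes (empty for a leaf)
        self.leaf = leaf


class BTree:
    def __init__(self, t=2):
        self.t = t            # minimum degree: a node holds at most 2t - 1 keys
        self.root = Node()

    def search(self, key, node=None):
        if node is None:
            node = self.root
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if node.leaf:
            return False
        return self.search(key, node.children[i])

    def insert(self, key):
        if len(self.root.keys) == 2 * self.t - 1:
            # Root is full: split it, so the tree grows upward and stays balanced.
            new_root = Node(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        self._insert_nonfull(self.root, key)

    def _split_child(self, parent, i):
        t = self.t
        full = parent.children[i]
        right = Node(leaf=full.leaf)
        median = full.keys[t - 1]
        right.keys = full.keys[t:]            # upper t - 1 keys move to the new node
        full.keys = full.keys[:t - 1]         # lower t - 1 keys stay
        if not full.leaf:
            right.children = full.children[t:]
            full.children = full.children[:t]
        parent.keys.insert(i, median)         # median key moves up into the parent
        parent.children.insert(i + 1, right)

    def _insert_nonfull(self, node, key):
        while not node.leaf:
            i = bisect.bisect_right(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)    # split full children on the way down
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        bisect.insort(node.keys, key)

    def height(self):
        levels, node = 1, self.root
        while not node.leaf:
            node = node.children[0]
            levels += 1
        return levels


tree = BTree(t=2)
for k in range(1, 101):
    tree.insert(k)
print(tree.search(42), tree.search(1000), tree.height())
```

Because nodes split only when full and the tree grows only at the root, every leaf stays at the same depth, which is where the worst-case guarantee comes from.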

20 Relational Database Theory
B-Tree

21 Relational Database Theory
Use of index degrades performance on Updates
–Inserting a row is the source of much disk I/O (overhead)
  Every index on the table must be searched and updated also

22 Relational Database Theory
Frequently inserting rows leads to index block overflow
–Causes much disk I/O as overflow condition is processed

23 Relational Database Theory
Techniques for managing volatile tables (many inserts, deletes)
–Partially fill index blocks when creating the index
–Periodically restructure (Drop, Create) the indexes
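The effect of partially filling index blocks can be illustrated with a small simulation. This is a toy model under stated assumptions: blocks are plain Python lists with a hypothetical capacity of four keys, and `build_blocks`, `insert_key`, and `count_splits` are invented helper names, not features of any RDBMS.

```python
def build_blocks(keys, capacity, fill_factor):
    """Bulk-load sorted keys into blocks, filling each to capacity * fill_factor."""
    per_block = max(1, int(capacity * fill_factor))
    keys = sorted(keys)
    return [keys[i:i + per_block] for i in range(0, len(keys), per_block)]


def insert_key(blocks, key, capacity):
    """Insert key into the block that covers it; return 1 if that block split."""
    # Pick the last block whose first key is <= key (default: the first block).
    target = 0
    for i, block in enumerate(blocks):
        if block[0] <= key:
            target = i
    block = blocks[target]
    block.append(key)
    block.sort()
    if len(block) <= capacity:
        return 0                      # room was available: no overflow
    mid = len(block) // 2             # overflow: split the block in two
    blocks[target:target + 1] = [block[:mid], block[mid:]]
    return 1


def count_splits(fill_factor, capacity=4):
    blocks = build_blocks(range(0, 160, 10), capacity, fill_factor)
    new_keys = range(5, 160, 40)      # four inserts spread across the key range
    return sum(insert_key(blocks, k, capacity) for k in new_keys)


print(count_splits(1.0), count_splits(0.5))
```

With fully packed blocks every insert in this example overflows a block, while blocks created half full absorb the same inserts without splitting. The cost is that the half-full index occupies twice as many blocks.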

24 Relational Database Theory
Indexing: Strengths and Weaknesses
–Strengths
  Improves performance on retrieval of data
  Can be built or dropped at any time
  Usage is transparent to the user
–Weaknesses
  Degrades update performance

25 Relational Database Theory
De-normalization
–De-normalization means combining two (or more) tables
  Usually done when tables are frequently joined
–De-normalization (joining two tables) depends on usage
  Depends on how applications and users access the data

26 Relational Database Theory
De-normalization is done to improve performance
–Tailors data structures for one specific application’s use
–Improves performance of one type of access at expense of others
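A small `sqlite3` sketch shows both sides of this trade. The `customer`, `orders`, and `orders_denorm` tables are hypothetical examples; the de-normalized table answers the report query without a join but stores the customer name once per order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 75.0), (12, 2, 20.0);

    -- De-normalized copy: the customer name is repeated on every order row.
    CREATE TABLE orders_denorm AS
        SELECT o.order_id, o.cust_id, c.name AS cust_name, o.amount
        FROM orders o JOIN customer c ON c.cust_id = o.cust_id;
""")

# Normalized design: the report needs a join.
joined = conn.execute(
    "SELECT o.order_id, c.name, o.amount FROM orders o "
    "JOIN customer c ON c.cust_id = o.cust_id ORDER BY o.order_id"
).fetchall()
# De-normalized design: the same report reads a single table.
flat = conn.execute(
    "SELECT order_id, cust_name, amount FROM orders_denorm ORDER BY order_id"
).fetchall()

# Renaming a customer touches one row in the normalized design...
normalized_rows = conn.execute(
    "UPDATE customer SET name = 'Acme Corp' WHERE cust_id = 1").rowcount
# ...but every copy of the name in the de-normalized table.
denorm_rows = conn.execute(
    "UPDATE orders_denorm SET cust_name = 'Acme Corp' WHERE cust_id = 1").rowcount
print(joined == flat, normalized_rows, denorm_rows)
```

The read path gets simpler, while the update path has to keep every redundant copy consistent, which is the update anomaly that normalization avoids.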

27 De-normalization Trade-Offs
Normalization
–Eliminates update anomalies
–Minimizes data redundancy
–Supports simpler logic
–Provides application-independent database design
–Encourages sharing of data
De-normalization
–Improves performance for specific application(s)

28 Relational Database Theory
When to De-Normalize
–This is EVIL, Do Not Do…
–When does de-normalization have minimal impact?
  Data is accessed primarily on a read-only basis
  Data is accessed primarily by one application

29 Relational Database Theory
When to de-normalize
–After database design is done and tables are normalized to 3NF
–After clustering related tables in the same logical container
–After considering trade-offs and usage of data

30 Relational Database Theory
Alternatives to de-normalization
–Physical placement of data
  Use of container
  Can improve performance without impacting logical design
–Selective hardware upgrades
  More main memory, expanded storage, cache storage devices

31 Relational Database Theory
Fragmentation: a better alternative to de-normalization
–Means breaking one table into two (or more) tables
  Usually done when one table is very large
  Or when a group of users almost exclusively accesses a subset of the data in a table

32 Relational Database Theory
Fragmentation can be based on selection or projection
–Must be able to reconstruct the original table (by union or join)
–Primary key column(s) must be included in all vertical fragments
  Disadvantage is that the DBA must be aware of all the fragmented tables
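Both kinds of fragmentation, and the way the original table is reconstructed from each, can be sketched with `sqlite3`. The `employee` table and the fragment names are hypothetical; selection produces horizontal fragments that are recombined by union, and projection produces vertical fragments that are recombined by a join on the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, region TEXT, salary REAL);
    INSERT INTO employee VALUES
        (1, 'Ann', 'EAST', 100.0), (2, 'Bob', 'WEST', 90.0), (3, 'Cy', 'EAST', 80.0);

    -- Horizontal fragments (selection): each holds a subset of the rows.
    CREATE TABLE employee_east AS SELECT * FROM employee WHERE region = 'EAST';
    CREATE TABLE employee_west AS SELECT * FROM employee WHERE region <> 'EAST';

    -- Vertical fragments (projection): each holds a subset of the columns
    -- and repeats the primary key so rows can be matched up again.
    CREATE TABLE employee_public AS SELECT emp_id, name, region FROM employee;
    CREATE TABLE employee_pay AS SELECT emp_id, salary FROM employee;
""")

original = conn.execute("SELECT * FROM employee ORDER BY emp_id").fetchall()

# Reconstruct from horizontal fragments by union.
from_horizontal = conn.execute(
    "SELECT * FROM employee_east UNION ALL SELECT * FROM employee_west ORDER BY emp_id"
).fetchall()

# Reconstruct from vertical fragments by joining on the primary key.
from_vertical = conn.execute(
    "SELECT p.emp_id, p.name, p.region, s.salary "
    "FROM employee_public p JOIN employee_pay s ON s.emp_id = p.emp_id "
    "ORDER BY p.emp_id"
).fetchall()
print(from_horizontal == original, from_vertical == original)
```

Without `emp_id` in both vertical fragments there would be no reliable way to pair a salary with its employee, which is why the primary key must appear in every vertical fragment.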

33 Relational Database Theory
Physical Design Review

34 Physical Database Design
–Goals
  Improve performance
    –By minimizing disk I/O
  Improve management of the data
    –By grouping tables that can be managed as a group

35 Relational Database Theory
Indexing: Strengths and Weaknesses
–Strengths
  Improves performance on retrieval of data
  Can be built or dropped at any time
  Usage is transparent to the user
–Weaknesses
  Degrades update performance

36 Relational Database Theory
Questions?

