TM 5-1 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Chapter 5: Physical Database Design and Performance Jason C. H. Chen,

TM 5-1 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Chapter 5: Physical Database Design and Performance Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration Gonzaga University Spokane, WA 99258 chen@jepson.gonzaga.edu

TM 5-2 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Objectives Definition of terms Describe the physical database design process Choose storage formats for attributes Select appropriate file organizations Describe three types of file organization Describe indexes and their appropriate use Translate a database model into efficient structures Know when and how to use denormalization

TM 5-3 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Physical Database Design Purpose–translate the logical description of data into the technical specifications for storing and retrieving data Goal–create a design for storing data that will provide adequate performance and insure database integrity, security, and recoverability

TM 5-4 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Physical Design Process Normalized relations Volume estimates Attribute definitions Response time expectations Data security needs Backup/recovery needs Integrity expectations DBMS technology used Inputs Attribute data types Physical record descriptions (doesn’t always match logical design) File organizations Indexes and database architectures Query optimization Leads to Decisions

TM 5-6 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems What will we learn from Usage Map? Why should Manufacutred_Part and Purchased_Part tables be separated? Why should SUPPLIER and SUPPLIES tables be combined?

TM 5-7 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems 7 Figure 5-1 Composite usage map (Pine Valley Furniture Company) (cont.) Database access frequencies are estimated from “transaction volumes.” Data (transaction) volumes

TM 5-9 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Figure 5-1 Composite usage map (Pine Valley Furniture Company) (cont.) Usage analysis: 14,000 purchased parts accessed per hour  8000 quotations accessed from these 140 purchased part accesses  7000 suppliers accessed from these 8000 quotation accesses

TM 5-10 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Figure 5-1 Composite usage map (Pine Valley Furniture Company) (cont.) Usage analysis: 7500 suppliers accessed per hour  4000 quotations accessed from these 7500 supplier accesses  4000 purchased parts accessed from these 4000 quotation accesses

TM 5-11 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems 11 Designing Fields Field: smallest unit of data in database Field design –Choosing data type –Coding, compression, encryption –Controlling data integrity

TM 5-12 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems 12 Choosing Data Types CHAR–fixed-length character VARCHAR2–variable-length character (memo) LONG–large number NUMBER–positive/negative number INEGER–positive/negative whole number DATE–actual date BLOB–binary large object (good for graphics, sound clips, etc.)

TM 5-13 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems 13 Figure 5-2 Example code look-up table (Pine Valley Furniture Company) Code saves space, but costs an additional lookup to obtain actual value

TM 5-14 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems 14 Field Data Integrity Default value–assumed value if no explicit value – e.g., s_state CHAR(2) DEFAULT ‘WI’; Range control–allowable value limitations (constraints or validation rules) Null value control–allowing or prohibiting empty fields Referential integrity–range control (and null value allowances) for foreign-key to primary-key match- ups Sarbanes-Oxley Act (SOX) legislates importance of financial data integrity

TM 5-15 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Handling Missing Data Substitute an estimate of the missing value (e.g., using a formula) Construct a report listing missing values In programs, ignore missing data unless the value is significant (sensitivity testing) Triggers can be used to perform these operations

TM 5-16 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Physical Records Physical Record: A group of fields stored in adjacent memory locations and retrieved together as a unit Page: The amount of data read or written in one I/O operation Blocking Factor: The number of physical records per page

TM 5-17 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Denormalization Transforming normalized relations into unnormalized physical record specifications Benefits: –Can improve performance (speed) by reducing number of table lookups (i.e. reduce number of necessary join queries) Costs (due to data duplication) –Wasted storage space –Data integrity/consistency threats Common denormalization opportunities –One-to-one relationship (Fig. 5-3) –Many-to-many relationship with attributes (Fig. 5-4) –Reference data (1:N relationship where 1-side has data not used in any other relationship) (Fig. 5-5)

TM 5-19 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Figure 5-4 A possible denormalization situation: a many-to-many relationship with nonkey attributes Extra table access required Null description possible

TM 5-21 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Partitioning Horizontal Partitioning: Distributing the rows of a table into several separate files –Useful for situations where different users need access to different rows –Three types: Key Range Partitioning, Hash Partitioning, or Composite Partitioning Vertical Partitioning: Distributing the columns of a table into several separate relations –Useful for situations where different users need access to different columns –The primary key must be repeated in each file Combinations of Horizontal and Vertical Partitions often correspond with User Schemas (user views)

TM 5-22 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Partitioning (cont.) Advantages of Partitioning: –Efficiency: Records used together are grouped together –Local optimization: Each partition can be optimized for performance –Security, recovery –Load balancing: Partitions stored on different disks, reduces contention –Take advantage of parallel processing capability Disadvantages of Partitioning: –Inconsistent access speed: Slow retrievals across partitions –Complexity: Non-transparent partitioning –Extra space or update time: Duplicate data; access from multiple partitions

TM 5-24 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Oracle 11g Horizontal Partitioning Methods Range partitioning –Partitions defined by range of field values –Could result in unbalanced distribution of rows –Like-valued fields share partitions Hash partitioning –Partitions defined via hash functions –Will guarantee balanced distribution of rows –Partition could contain widely varying valued fields List partitioning –Based on predefined lists of values for the partitioning key Composite partitioning –Combination of the other approaches

TM 5-25 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems 25 Designing Physical Files Physical File: –A named portion of secondary memory allocated for the purpose of storing physical records –Tablespace–named set of disk storage elements in which physical files for database tables can be stored –Extent–contiguous section of disk space Constructs to link two pieces of data: –Sequential storage –Pointers : field of data that can be used to locate related fields or records

TM 5-27 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems 27 File Organizations Technique for physically arranging records of a file on secondary storage Factors for selecting file organization: –Fast data retrieval and throughput –Efficient storage space utilization –Protection from failure and data loss –Minimizing need for reorganization –Accommodating growth –Security from unauthorized use Types of file organizations –Sequential (Fig. 5-7a): the most efficient with storage space. –Indexed (Fig. 5-7b) : quick retrieval Join Index, Fig 5-8 –Hashed (Fig. 5-7c) : easiest to update

TM 5-28 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems 28 Figure 5-7a Sequential file organization If not sorted Average time to find desired record = n/2 1 2 n Records of the file are stored in sequence by the primary key field values If sorted – every insert or delete requires resort

TM 5-29 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Indexed File Organizations Indexed File Organization: the storage of records either sequentially or nonsequentially with an index that allows software to locate individual records Index: a table or other data structure used to determine in a file the location of records that satisfy some condition Primary keys are automatically indexed Other fields or combinations of fields can also be indexed; these are called secondary keys (or nonunique keys) Oracle has a CREATE INDEX operation, and MS ACCESS allows indexes to be created for most field types

TM 5-31 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems 31 Figure 5-7c Hashed file or index organization Hash algorithm Usually uses division- remainder to determine record position. Records with same position are grouped in lists

TM 5-32 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Figure 5-8 Join Indexes–speeds up join operations a) Join index for common non-key columns b) Join index for matching foreign key (FK) and primary key (PK)

TM 5-34 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems 34 Clustering Files In some relational DBMSs, related records from different tables can be stored together in the same disk area Useful for improving performance of join operations Primary key records of the main table are stored adjacent to associated foreign key records of the dependent table e.g. Oracle has a CREATE CLUSTER command

TM 5-35 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Cluster Clustering is a method of storing tables that are intimately related and often joined together into the same area on disk. For example, instead of the WORKER table being in one section of the disk and the WORKERSKILL table being somewhere else, their rows could be interleaved together in a single area, called a cluster. The cluster key is the field (or fields) by which the table are usually joined in a query (e.g., Worker_ID in the example). To cluster tables, you must own the tables you are going to cluster together.

TM 5-36 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Fig. 4-10: Mapping an entity with a multivalued attribute (a) Employee entity type with multivalued attribute (b) Mapping a multivalued attribute Multivalued attribute becomes a separate relation with foreign key [Two relations created with one containing all of the attributes except the multivalued attribute, and the second one contains the pk (on the first one) and the multivalued attribute] One–to–many relationship between original entity and new relation WORKER Worker_ID Name LodgingAge Worker_ID Skill Ability WORKERSKILL WORKER Worker_ID Name Age Loding {Skill, Ability}

TM 5-37 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems The following is the basic syntax of the create cluster command: CREATE CLUSTER cluster_name (field-1 datatype, Field-2 datatype …) CREATE CLUSTER WORKERandSKILL ( Judy VARCHAR2(25)); Cluster created. This creates a cluster (a space is set aside, as it would be for a table) with nothing in it. The use of Worker_Cluster for the cluster key is irrelevant; you’ll never use it again. Next, tables are created to be included in this cluster.

TM 5-38 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems CREATE TABLE WORKER ( Worker_ID NUMBER(2), Name VARCHAR2 (25) NOT NULL, Age NUMBER (2), Lodging VARCHAR2 (15) ) CLUSTER WORKERandSKILL (Worker_ID) ; Prior to inserting rows into WORKER, you must create a cluster index: CREATE INDEX WORKERandSKILL_NDX ON CLUSTER WORKERandSKILL; Notice that nowhere does either statement say explicitly that the Worker_ID column goes into the July cluster key. The mentioned in their respective cluster statements. Multiple columns and cluster keys are matched first to first, second to second, third to third, and so on. Now a second table is added to the cluster.

TM 5-39 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems CREATE TABLE WORKERSKILL ( Worker_ID NUMBER(2), Skill VARCHAR2(25) NOT NULL, Ability VARCHAR2 (15) NOT NULL ) CLUSTER WORKERandSKILL (Worker_ID) ; We will investigate how the tables are stored in the cluster. Recall that WORKER table has four fields: Worker_ID, Name, Age, and Lodging. The WORKERSKILL table has three fields: Worker_ID, Skill, and Ability. When these two tables are clustered, each unique Worker_ID is actually stored only once, in the cluster key. To each Worker_ID are attached the columns from both of these tables. The data from both of these tables is actually stored in a single location, almost as if the cluster where a big table containing data drawn from both of the tables that make it up.

TM 5-40 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems AGELODGINGNAMEWORKER_IDSKILLABILITY 23 29 22 18 16 43 41 55 15 33 27 21 25 33 35 32 67 26 25 PAPA KING ROSE HILL CRANMER ROSE HILL MATTS WEITBROCHT ROSE HILL PAPA KING MATTS ROSE HILL CRANMER WEITBROCHT MATTS MULLERS CRANMER ADAH TALBOT ANDREW DYE BART SARJEANT DICK JONES DONALD ROLLO ELBERT TALBOT GEORGE OSCAR GERHARDT KENTGEN HELEN BRANDT JED HOPKINS JOHN PEARSON PALMER WALLBOM PAT LAVAY PETER LAWSON RICHARD KOCH ROLAND BRANDT VICTORIA LYNN WILFRED LOWELL WILLIAM SWING 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 PHOTO SMITHY COMPUTER COMBINE DRIVER COMBINE DRIVR WOODCUTTER SMITHY PHOTO COMPUTER GOOD EXCELLENT SLOW VERY FAST GOOD AVERAGE GOOD PRECISE AVERAGE Figure: How data is stored in clusters from WORKER table from WORKERSKILL table CLUSTER key

TM 5-41 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems AGELODGINGNAMEWORKER_IDSKILLABILITY 23 29 22 18 16 43 41 55 15 33 27 21 25 33 35 32 67 26 25 PAPA KING ROSE HILL CRANMER ROSE HILL MATTS WEITBROCHT ROSE HILL PAPA KING MATTS ROSE HILL CRANMER WEITBROCHT MATTS MULLERS CRANMER ADAH TALBOT ANDREW DYE BART SARJEANT DICK JONES DONALD ROLLO ELBERT TALBOT GEORGE OSCAR GERHARDT KENTGEN HELEN BRANDT JED HOPKINS JOHN PEARSON PALMER WALLBOM PAT LAVAY PETER LAWSON RICHARD KOCH ROLAND BRANDT VICTORIA LYNN WILFRED LOWELL WILLIAM SWING 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 PHOTO SMITHY COMPUTER COMBINE DRIVER COMBINE DRIVR WOODCUTTER SMITHY PHOTO COMPUTER GOOD EXCELLENT SLOW VERY FAST GOOD AVERAGE GOOD PRECISE AVERAGE Figure: How data is stored in clusters Why we do not simply create a table like the one shown above?

TM 5-42 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Optimizing Query Processing - A Quick Way to Speed Access to Your data Indexes may be used to improve the efficiency of data searches to meet particular search criteria after the table has been in use for some time. Therefore, the ability to create indexes quickly and efficiently at any time is important. SQL indexes can be created on the basis of any selected attributes

TM 5-43 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Optimizing Query Processing (conti.) For example: SELECT index_name FROM user_indexes; SELECT * FROM inventory WHERE inv_qoh>= 130; CREATE INDEX qoh_index ON inventory (inv_qoh); Note that this statement defines an index called qoh_index for the qoh column in the inventory table. This index ensures that in the next example SQL only needs to look at row in the database that satisfy the WHERE condition, and is, therefore, quicker to produce an answer.

TM 5-44 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems SQL> SELECT * 2 FROM inventory 3 WHERE inv_qoh>= 130; INV_ID ITEM_ID COLOR INV_SIZE INV_PRICE INV_QOH ---------- ---------- -------------------- ---------- ---------- ---------- 3 3 Khaki S 29.95 150 4 3 Khaki M 29.95 147 6 3 Navy S 29.95 139 7 3 Navy M 29.95 137 9 4 Eggplant S 59.95 135 10 4 Eggplant M 59.95 168 11 4 Eggplant L 59.95 187 19 5 Bright Pink 10 15.99 148 20 5 Bright Pink 11 15.99 137 21 5 Bright Pink 12 15.99 134 10 rows selected. CREATE INDEX qoh_index ON inventory (inv_qoh); SQL> SELECT * 2 FROM inventory 3 WHERE inv_qoh>= 130; INV_ID ITEM_ID COLOR INV_SIZE INV_PRICE INV_QOH ---------- ---------- -------------------- ---------- ---------- ---------- 21 5 Bright Pink 12 15.99 134 9 4 Eggplant S 59.95 135 7 3 Navy M 29.95 137 20 5 Bright Pink 11 15.99 137 6 3 Navy S 29.95 139 4 3 Khaki M 29.95 147 19 5 Bright Pink 10 15.99 148 3 3 Khaki S 29.95 150 10 4 Eggplant M 59.95 168 11 4 Eggplant L 59.95 187 10 rows selected.

TM 5-45 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Optimizing Query Processing (conti.) You may even create an index that prevents you from using a value that has been used before. Such a feature is especially useful when the index field (attribute) is a primary key whose values must not be duplicated: CREATE UNIQUE INDEX ON (the key field); SELECT index_name FROM user_indexes; DROP INDEX ;

TM 5-46 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Summary on Optimizing Query Processing (conti.)  The indexes are defined so as to optimize the processing of SELECT statements. ;  An index is never explicitly referenced in a SELECT statement; the syntax of SQL does not allow this;  During the processing of a statement, SQL itself determines whether an existing index will be used;  An index may be created or deleted at any time;  When updating, inserting or deleting rows, SQL also maintains the indexes on the tables concerned. This means that, on the one hand, the processing time for SELECT statements is reduced, while, on the other hand, the processing time for update statements (such as INSERT, UPDATE and DELETE) can increase.

TM 5-47 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Rules for Using Indexes 1. Use on larger tables 2. Index the primary key of each table 3. Index search fields (fields frequently in WHERE clause) 4. Fields in SQL ORDER BY and GROUP BY commands 5. When there are >100 values but not when there are <30 values

TM 5-48 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Rules for Using Indexes (cont.) 6. Avoid use of indexes for fields with long values; perhaps compress values first 7. If key to index is used to determine location of record, use surrogate (like sequence nbr) to allow even spread in storage area 8. DBMS may have limit on number of indexes per table and number of bytes per indexed field(s) 9. Be careful of indexing attributes with null values; many DBMSs will not recognize null values in an index search

TM 5-49 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Query Optimization Parallel query processing–possible when working in multiprocessor systems Overriding automatic query optimization– allows for query writers to preempt the automated optimization Picking data block size–factors to consider include: –Block contention, random and sequential row access speed, row size Balancing I/O across disk controllers

TM 5-50 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems 50 Query Design Guidelines 1. Understand how indexes are used 2. Keep optimization statistics up-to-date 3. Use compatible data types for fields and literals 4. Write simple queries 5. Break complex queries into multiple, simple parts 6. Don’t use one query inside another 7. Don’t combine a table with itself

TM 5-51 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems 51 Query Design Guidelines (cont.) 8. Create temporary tables for groups of queries 9. Combine update operations 10. Retrieve only the data you need 11. Don’t have the DBMS sort without an index 12. Learn! – Learning to Learn and Learning to Change! 13. Consider the total query processing time for ad hoc queries

TM 5-52 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems 52 Database Architectures (Model) (Figure-Extra) Legacy Systems Current Technology Data Warehouses multidimensional

TM 5-1 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Chapter 5: Physical Database Design and Performance Jason C. H. Chen,

Similar presentations

Presentation on theme: "TM 5-1 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Chapter 5: Physical Database Design and Performance Jason C. H. Chen,"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

TM 5-1 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Chapter 5: Physical Database Design and Performance Jason C. H. Chen,

Similar presentations

Presentation on theme: "TM 5-1 Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Chapter 5: Physical Database Design and Performance Jason C. H. Chen,"— Presentation transcript:

Similar presentations

About project

Feedback