CS 8630 Database Administration, Dr. Guimaraes 10-05-2009, Physical Design and Performance Class Will Start Momentarily… CS8630 Database Administration.


1 CS 8630 Database Administration, Dr. Guimaraes 10-05-2009, Physical Design and Performance Class Will Start Momentarily… CS8630 Database Administration Dr. Mario Guimaraes

2 CS 8630 Database Administration, Dr. Guimaraes Overview
–Introduction: inputs to Physical Design, decisions
–Create Index
–Rewrite SQL / Query Optimizer (Leccotech)
–Denormalization, Materialized Views
–Partition Database
–Redundant Arrays of Inexpensive Disks (RAID)
–Redefine main memory structures (SGA in Oracle)
–Change default block size at installation
–Export/Import (drop indexes): defragment
–Check locks
–Separate data by category in proper tablespaces
–Redefining Client-Server Architecture
Where should a DBA start when trying to optimize? Why? a) DB, b) OS, c) DB Application, d) Other

3 CS 8630 Database Administration, Dr. Guimaraes DB Design Phases Conceptual Design Logical Design Physical Design

4 CS 8630 Database Administration, Dr. Guimaraes Introduction - Inputs to Physical Design Normalized relations. Volume estimates. Attribute definitions. Data usage: entered, retrieved, deleted, updated. Response time requirements. Requirements for security, backup, recovery, retention, integrity. DBMS characteristics.

5 CS 8630 Database Administration, Dr. Guimaraes Physical Design Decisions Specifying attribute data types. Modifying the logical design. Specifying the file organization (sometimes) Choosing indexes.

6 CS 8630 Database Administration, Dr. Guimaraes Designing Fields Choosing PK Choosing data type. Coding, compression, encryption. Controlling data integrity. –Default value. –Range control. –Null value control. –Referential integrity.

7 CS 8630 Database Administration, Dr. Guimaraes Selection of a Primary Key Consider a shorter field or selecting another candidate key to substitute for a long, multi-field primary key (and all associated foreign keys.) –System-generated non-information-carrying key –Versus –Primary key like Phone number

8 CS 8630 Database Administration, Dr. Guimaraes Example of Data Dictionary

9 CS 8630 Database Administration, Dr. Guimaraes Example code-look-up table

10 CS 8630 Database Administration, Dr. Guimaraes Composite usage map

11 CS 8630 Database Administration, Dr. Guimaraes Designing Fields Handling missing data. –Substitute an estimate of the missing value. –Assign default value. –Trigger a report listing missing values. –In programs, ignore missing data unless the value is significant.

12 CS 8630 Database Administration, Dr. Guimaraes END OF INTRODUCTION TO PHYSICAL DESIGN START OF PERFORMANCE (INDEXES, QUERY OPTIMIZATION).

13 CS 8630 Database Administration, Dr. Guimaraes INDEXES What is an INDEX ? Why do we CREATE an INDEX ? A) To speed up query B) To speed up data entry (insert/update/delete) ? C) Both ?

14 CS 8630 Database Administration, Dr. Guimaraes Rules for Using Indexes 1. Use on larger tables. 2. Index the primary key of each table. 3. Index search fields. 4. Fields in WHERE clause of SQL commands. 5. Cardinality is high. For example, not on SEX, where cardinality is 2. Typically: When there are >100 different values but not when there are <10 values.

15 CS 8630 Database Administration, Dr. Guimaraes Rules for Using Indexes 6. DBMS may have limit on number of indexes per table and number of bytes per indexed field(s). 7. Null values may not be referenced from an index. 8. Use indexes heavily for non-volatile databases (Datawarehouse); limit the use of indexes for volatile databases.

16 CS 8630 Database Administration, Dr. Guimaraes Different Types of Indexes
Typical indexes:
–B-Tree (traditional) indexes
–Hash-cluster indexes
–Bitmap indexes
–Index-Organized Tables
–Reverse-Key indexes
When we issue the command: Create index cidx on orders (cid); what type of index do we create? General format: Create index <index_name> on <table_name> (<column_name>);

17 CS 8630 Database Administration, Dr. Guimaraes Indexes (Defaults) Anytime a PK is created, an index is automatically created. Whenever the type of index is not specified, the index created is a B-Tree.
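To see the effect of the cidx example, here is a minimal sketch in Python. It assumes SQLite (via the standard sqlite3 module) as a stand-in for Oracle, so the plan wording differs from Oracle's autotrace output. The oid and amount columns and the row counts are made up for illustration; only orders, cid and cidx come from the slides.

```python
import sqlite3

# SQLite stands in for Oracle here; orders, cid and cidx follow the slide,
# the other columns and the data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (oid INTEGER PRIMARY KEY, cid INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders (cid, amount) VALUES (?, ?)",
    [(i % 500, float(i)) for i in range(5000)],
)

def plan(sql):
    """Return the plan detail strings SQLite reports for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM orders WHERE cid = 42"
plan_before = plan(query)   # no index on cid yet, so the table is scanned
conn.execute("CREATE INDEX cidx ON orders (cid)")   # no type given: a B-tree
plan_after = plan(query)    # the optimizer can now search through cidx

print(plan_before)
print(plan_after)
```

The first plan should report a scan of orders and the second a search using cidx, which mirrors the difference between a full table scan and an index scan.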

18 CS 8630 Database Administration, Dr. Guimaraes B-Tree (Balanced Tree) Most popular type of index structure for any programming language or database. When you don’t know what to do, the best option is usually a B-Tree. They are flexible and perform well (not very well) in several scenarios. It is really the B+ tree or B* tree

19 CS 8630 Database Administration, Dr. Guimaraes B-Trees (continued) One node corresponds to one block/page (minimum disk I/O). Non-leaf nodes hold n keys and n+1 pointers. Leaf nodes hold n entries, where each entry has an index key and a pointer to a data block. Each leaf node also has a pointer to the next leaf node. All leaves are at the same height.

20 CS 8630 Database Administration, Dr. Guimaraes Good Indexing (B-Tree) Candidates Table must be reasonably large. Field is queried frequently. Field has a high cardinality (don’t index by sex, where the cardinality is 2!!). Badly balanced trees may inhibit performance. Destroying and re-creating the index may improve performance.

21 CS 8630 Database Administration, Dr. Guimaraes Bitmap Index Bitmap indexes contain the key value and a bitmap listing the value of 0 or 1 (yes/no) for each row indicating whether the row contains that value or not. May be a good option for indexing fields that have low cardinality (opposite of B-trees).

22 CS 8630 Database Administration, Dr. Guimaraes Bitmap Index (cont.) Syntax: Create Bitmap index …. Bitmap indexes work better with equality tests, = or IN (not with < or >). Bitmap index maintenance can be expensive: an individual bit may not be locked, so a single update locks a large portion of the index. Bitmap indexes are best in read-only data warehouse situations.
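The bitmap idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how Oracle stores bitmap indexes. The column values and the helper names build_bitmap_index and rows_of are hypothetical.

```python
from collections import defaultdict

def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is 1 when row i holds it."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)

def rows_of(bitmap):
    """Decode a bitmap back into a sorted list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical low-cardinality columns for six rows.
sex = ["F", "M", "F", "F", "M", "F"]
status = ["active", "active", "closed", "active", "closed", "closed"]

sex_idx = build_bitmap_index(sex)
status_idx = build_bitmap_index(status)

# WHERE sex = 'F' AND status = 'active' becomes a single bitwise AND.
female_active = rows_of(sex_idx["F"] & status_idx["active"])
# WHERE status IN ('active', 'closed') becomes a bitwise OR.
any_status = rows_of(status_idx["active"] | status_idx["closed"])

print(female_active)
print(any_status)
```

Equality and IN tests map directly onto bitwise operations. A range test has no such shortcut, which fits the slide's advice to use bitmaps with equality tests.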

23 CS 8630 Database Administration, Dr. Guimaraes Hash Indexing
–B-tree and bitmap indexes use keys to find rows, which requires I/O to process the index.
–A hash index gets rows with a key-based algorithm.
–Rows are stored based on a hashed value.
–Index size should be known at index creation.
Example: –create index cidx on orders (cid) hashed;

24 CS 8630 Database Administration, Dr. Guimaraes Hash indexes work best when:
–Columns have very high cardinality
–Only equality (=) tests are used
–Index values do not change
–The number of rows is known ahead of time
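Why a hash index suits only equality tests can be shown with a small Python sketch. A dict plays the role of the hash index and a sorted list stands in for the ordered leaves of a B-tree. The keys, row ids and function names are hypothetical.

```python
import bisect

# Hypothetical (key, row_id) pairs.
rows = [(1042, 0), (17, 1), (530, 2), (999, 3), (256, 4)]

# Hash-style index: one bucket lookup per equality test, no ordering kept.
hash_index = {key: row_id for key, row_id in rows}

# Ordered index: a sorted list standing in for B-tree leaf entries.
ordered = sorted(rows)
ordered_keys = [key for key, _ in ordered]

def equal_lookup(key):
    """Equality test: a single hash probe."""
    return hash_index.get(key)

def range_lookup(low, high):
    """Row ids with low <= key <= high, found by binary search on sorted keys."""
    lo = bisect.bisect_left(ordered_keys, low)
    hi = bisect.bisect_right(ordered_keys, high)
    return [row_id for _, row_id in ordered[lo:hi]]

print(equal_lookup(530))
print(range_lookup(200, 1000))
```

The hash structure answers key = 530 directly, but a range such as 200 to 1000 needs the ordered structure, because hashing discards key order.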

25 CS 8630 Database Administration, Dr. Guimaraes Index-Organized Tables Table data is incorporated into the B-Tree using the PK as the index. Table data is always in order of PK. Many sorts can be avoided. Especially useful for “lookup” type tables Index works best when there are few (and small) columns in your table other than the PK.

26 CS 8630 Database Administration, Dr. Guimaraes Reverse Key Indexes Key ‘1234’ becomes ‘4321’, etc. Only efficient for a few scenarios involving parallel processing and a huge amount of data. By reversing key values, index blocks might be more evenly distributed, reducing the likelihood of densely or sparsely populated indexes.
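A rough Python sketch of the reversal follows. Grouping keys by their first character is only a crude stand-in for "which index block a key lands in", and the key range is hypothetical.

```python
def reverse_key(key):
    """Reverse the characters of a key, as in the '1234' -> '4321' example."""
    return str(key)[::-1]

# Sequential keys, such as values from a sequence generator (hypothetical).
keys = [str(k) for k in range(1230, 1240)]

def leading_chars(ks):
    """Distinct first characters, a crude proxy for distinct index blocks."""
    return sorted({k[0] for k in ks})

plain_blocks = leading_chars(keys)
reversed_blocks = leading_chars([reverse_key(k) for k in keys])

print(plain_blocks)
print(reversed_blocks)
```

Sequential keys all share a prefix and so cluster together, while their reversed forms spread across ten different prefixes. The price is that range scans on the original key order no longer follow the index order.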

27 CS 8630 Database Administration, Dr. Guimaraes Conclusions on Indexes For high-cardinality key values, B-Tree indexes are usually best. B-Trees work with all types of comparisons and gracefully shrink and grow as table changes. For low cardinality read-only environments, Bitmaps may be a good option.

28 CS 8630 Database Administration, Dr. Guimaraes Denormalization Normally, we want to design our tables up to 3NF or BCNF (at least) When do we want to violate 3NF / BCNF ? When do we want to store Derived Data ? –A) Read Only Databases ? –B) Updateable Databases ?

29 CS 8630 Database Administration, Dr. Guimaraes Rules for Adding Derived Columns Use when aggregate values are regularly retrieved. Use when aggregate values are costly to calculate. Permit updating only of source data. Create triggers to cascade changes from source data.
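The last two rules can be sketched as follows, assuming SQLite (through Python's sqlite3 module) instead of Oracle. The orders and order_line tables, the order_total column and the trigger names are all hypothetical. Applications would update only order_line, and the triggers keep the derived total in step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (oid INTEGER PRIMARY KEY, order_total REAL NOT NULL DEFAULT 0);
CREATE TABLE order_line (
    oid INTEGER REFERENCES orders(oid),
    amount REAL NOT NULL
);
-- Cascade every change of the source rows into the derived column.
CREATE TRIGGER line_ins AFTER INSERT ON order_line BEGIN
    UPDATE orders SET order_total = order_total + NEW.amount WHERE oid = NEW.oid;
END;
CREATE TRIGGER line_del AFTER DELETE ON order_line BEGIN
    UPDATE orders SET order_total = order_total - OLD.amount WHERE oid = OLD.oid;
END;
CREATE TRIGGER line_upd AFTER UPDATE OF amount ON order_line BEGIN
    UPDATE orders SET order_total = order_total - OLD.amount + NEW.amount
    WHERE oid = NEW.oid;
END;
""")
conn.execute("INSERT INTO orders (oid) VALUES (1)")
conn.executemany("INSERT INTO order_line VALUES (1, ?)", [(10.0,), (5.5,), (4.5,)])
conn.execute("UPDATE order_line SET amount = 20.0 WHERE amount = 10.0")
conn.execute("DELETE FROM order_line WHERE amount = 4.5")

# Reading the stored total avoids recomputing the aggregate on every query.
derived_total = conn.execute(
    "SELECT order_total FROM orders WHERE oid = 1").fetchone()[0]
recomputed = conn.execute(
    "SELECT SUM(amount) FROM order_line WHERE oid = 1").fetchone()[0]
print(derived_total, recomputed)
```

The stored value and the recomputed aggregate agree after an insert, an update and a delete, which is what the triggers are there to guarantee.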

30 CS 8630 Database Administration, Dr. Guimaraes Rules for Storing Repeating Groups Consider storing repeating groups across columns rather than down rows when: –The repeating group has a fixed number of occurrences, each of which has a different meaning or –The entire repeating group is normally accessed and updated as one unit.

31 CS 8630 Database Administration, Dr. Guimaraes Rules for Storing Repeating Groups Across Columns EMPLOYEE Phone Design Option: EMPLOYEE(EmpID, EmpName, …) EMP_PHONE(EmpID, Phone) Another Design Option: EMPLOYEE(EmpID, EmpName, Phone1, Phone2, …)

32 CS 8630 Database Administration, Dr. Guimaraes Denormalization
One-to-one relationship: Student (1,1) Submits (0,1) Application. STUDENT and APPLICATION become a single relation STUDENT instead of two.
Many-to-many relationship: Vendor (1,N) PriceQuote (1,N) Item. Physical design may suggest collapsing ITEM and PRICE_QUOTE into a single relation ITEM_QUOTE.

33 CS 8630 Database Administration, Dr. Guimaraes A possible denormalization situation: One-to-many relationship

34 CS 8630 Database Administration, Dr. Guimaraes Partitioning Horizontal Partitioning: Distributing the rows of a table into several separate files/locations. Vertical Partitioning: Distributing the columns of a table into several separate files/locations. – The primary key must be repeated in each file.

35 CS 8630 Database Administration, Dr. Guimaraes Partitioning
Advantages of Partitioning:
–Records used together are grouped together.
–Each partition can be optimized for performance.
–Security, recovery.
–Partitions stored on different disks reduce contention.
–Take advantage of parallel processing capability.
Disadvantages of Partitioning:
–Slow retrievals across partitions.
–Complexity.
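Both kinds of partitioning, and the cost of retrieving across partitions, can be illustrated with plain Python lists. The customer rows and column names are hypothetical.

```python
# Hypothetical customer rows: (cust_id, name, region, credit_limit).
customers = [
    (1, "Ames", "EAST", 5000),
    (2, "Brun", "WEST", 1200),
    (3, "Cole", "EAST", 800),
    (4, "Diaz", "WEST", 9500),
]

# Horizontal partitioning: whole rows are split by a predicate (here, region).
horizontal = {}
for row in customers:
    horizontal.setdefault(row[2], []).append(row)

# Vertical partitioning: columns are split; the primary key repeats in each part.
contact_part = [(cid, name, region) for cid, name, region, _ in customers]
credit_part = [(cid, limit) for cid, _, _, limit in customers]

# Rebuilding a full row needs a join on the primary key, which is the
# "slow retrievals across partitions" cost.
credit_by_id = dict(credit_part)
rebuilt = [(cid, name, region, credit_by_id[cid])
           for cid, name, region in contact_part]

print(horizontal["EAST"])
print(rebuilt == customers)
```

A query that touches only EAST rows, or only the credit columns, reads one partition. A query that needs everything has to stitch the pieces back together.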

36 CS 8630 Database Administration, Dr. Guimaraes RAID with four disks and striping RAID

37 CS 8630 Database Administration, Dr. Guimaraes Intro. To Query Processing In network and hierarchical DBMSs, low-level procedural query language is generally embedded in high-level programming language. Programmer’s responsibility to select most appropriate execution strategy. With declarative languages such as SQL, user specifies what data is required rather than how it is to be retrieved. Relieves user of knowing what constitutes good execution strategy Gives DBMS more control over system performance. Disk access tends to be dominant cost in query processing for centralized DBMS. Two main techniques for query optimization: –heuristic rules that order operations in a query; –comparing different strategies based on relative costs, and selecting one that minimizes resource usage.

38 CS 8630 Database Administration, Dr. Guimaraes Goals Aims of QP: –transform query written in high-level language (e.g. SQL), into correct and efficient execution strategy expressed in low-level language (implementing RA); –execute strategy to retrieve required data. As there are many equivalent transformations of same high-level query, aim of QO is to choose one that minimizes resource usage. Generally, reduce total execution time of query. Problem computationally intractable with large number of relations, so strategy adopted is reduced to finding near optimum solution.

39 CS 8630 Database Administration, Dr. Guimaraes 3 alternatives Find all Managers who work at a London branch.
SELECT * FROM Staff s, Branch b WHERE s.branchNo = b.branchNo AND (s.position = 'Manager' AND b.city = 'London');
Three equivalent RA queries are:
(1) σ_{(position='Manager') ∧ (city='London') ∧ (Staff.branchNo=Branch.branchNo)} (Staff × Branch)
(2) σ_{(position='Manager') ∧ (city='London')} (Staff ⋈_{Staff.branchNo=Branch.branchNo} Branch)
(3) (σ_{position='Manager'} (Staff)) ⋈_{Staff.branchNo=Branch.branchNo} (σ_{city='London'} (Branch))

40 CS 8630 Database Administration, Dr. Guimaraes Comparing costs
Assume:
–1000 tuples in Staff; 50 tuples in Branch;
–50 Managers; 5 London branches;
–no indexes or sort keys;
–results of any intermediate operations stored on disk;
–cost of the final write is ignored;
–tuples are accessed one at a time.
Costs (in disk accesses) are:
(1) (1000 + 50) + 2*(1000 * 50) = 101 050
(2) 2*1000 + (1000 + 50) = 3 050
(3) 1000 + 2*50 + 5 + (50 + 5) = 1 160
Cartesian product and join operations are much more expensive than selection, and the third option significantly reduces the size of the relations being joined together.
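The three cost figures can be re-derived with a few lines of Python. The variable names are illustrative, and the grouping of terms in strategy (3) is one reading of the slide's formula (read Staff, write the managers, read Branch, write the London branches, then read both reduced relations). It gives the same total.

```python
staff, branch = 1000, 50       # tuples in Staff and Branch
managers, london = 50, 5       # tuples that survive each selection

# (1) Read both relations, write the Cartesian product, then read it back
#     to apply the selection.
cost1 = (staff + branch) + 2 * (staff * branch)

# (2) Read both relations for the join, write the join result (one tuple per
#     staff member), then read it back to apply the selection.
cost2 = 2 * staff + (staff + branch)

# (3) Select managers (read 1000, write 50), select London branches
#     (read 50, write 5), then read both reduced relations for the join.
cost3 = (staff + managers) + (branch + london) + (managers + london)

print(cost1, cost2, cost3)
print(round(cost1 / cost3, 1))   # how much cheaper (3) is than (1)
```

Strategy (3) costs roughly 87 times fewer disk accesses than strategy (1), because the selections shrink both inputs before the join.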

41 CS 8630 Database Administration, Dr. Guimaraes Phases of Query Processing QP has four main phases: decomposition, optimization, code generation, and execution.

42 CS 8630 Database Administration, Dr. Guimaraes Dynamic versus Static Optimization First three phases of QP can be carried out: –dynamically every time query is run; –statically when query is first submitted. –Similar to compiled vs. interpreted lang. Advantages of dynamic QO arise from fact that information is up to date. Disadvantages are that performance of query is affected, time may limit finding optimum strategy. Advantages of static QO are removal of runtime overhead, and more time to find optimum strategy. Disadvantages arise from fact that chosen execution strategy may no longer be optimal when query is run. Could use a hybrid approach to overcome this.

43 CS 8630 Database Administration, Dr. Guimaraes Query Optimizer - Plan DBMSs allow you to view the query plan In ORACLE, you must use either set autotrace on or explain plan. Set autotrace on is much simpler. Explain plan is a little bit more efficient, but more complicated.

44 CS 8630 Database Administration, Dr. Guimaraes Oracle operations (results of autotrace) TABLE ACCESS FULL TABLE ACCESS BY ROWID INDEX RANGE SCAN INDEX UNIQUE SCAN NESTED LOOPS

45 CS 8630 Database Administration, Dr. Guimaraes TABLE ACCESS FULL (full table scan): Oracle will look at every row in the table to find the requested information. This is usually the slowest way to access a table.

46 CS 8630 Database Administration, Dr. Guimaraes TABLE ACCESS BY ROWID Oracle will use the ROWID method to find a row in the table. ROWID is a special column detailing the exact Oracle block where the row can be found. This is the fastest way to access a table (faster than any index, but less flexible).

47 CS 8630 Database Administration, Dr. Guimaraes INDEX RANGE SCAN Oracle will search an index for a range of values. Usually, this event occurs when a range or BETWEEN operation is specified by the query, or when only the leading columns in a composite index are specified by the WHERE clause. Can perform well or poorly, based on the size of the range and the fragmentation of the index.

48 CS 8630 Database Administration, Dr. Guimaraes INDEX UNIQUE SCAN Oracle will perform this operation when the table’s primary key or a unique key is part of the where clause. This is the most efficient way to search an index.

49 CS 8630 Database Administration, Dr. Guimaraes NESTED LOOPS Indicates that a join operation is occurring. Can perform well or poorly, depending on the performance of the index and table operations on the individual tables being joined.
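A nested loops join is easy to sketch in Python, which also shows why its performance depends on how the inner table is accessed. The Staff and Branch names echo the earlier query example. The rows and the function names are hypothetical, and the dict is only a stand-in for an index on branchNo.

```python
staff = [("S1", "B3"), ("S2", "B5"), ("S3", "B3")]    # (staffNo, branchNo)
branch = [("B3", "Glasgow"), ("B5", "London")]         # (branchNo, city)

def nested_loops(outer, inner):
    """Plain nested loops: scan the whole inner table for every outer row."""
    return [(s, b, city) for s, b in outer for bn, city in inner if bn == b]

def indexed_nested_loops(outer, inner):
    """Same join, but the inner side is probed through an index (a dict)."""
    index = {bn: city for bn, city in inner}
    return [(s, b, index[b]) for s, b in outer if b in index]

plain = nested_loops(staff, branch)
probed = indexed_nested_loops(staff, branch)
print(plain)
print(plain == probed)
```

Both versions return the same rows. The first does one full inner scan per outer row, while the second does one index probe per outer row, which is the difference between a poor and a good nested loops plan.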

50 CS 8630 Database Administration, Dr. Guimaraes Tuning SQL and PL/SQL Queries Sometimes the same query can be written in more than 1000 ways, generating more than 100 execution plans. Some firms have products that automatically rewrite correctly written SQL queries.

51 CS 8630 Database Administration, Dr. Guimaraes ROWID SELECT ROWID, … INTO :EMP_ROWID, … FROM EMP WHERE EMP.EMP_NO = 56722 FOR UPDATE; UPDATE EMP SET EMP.NAME = … WHERE ROWID = :EMP_ROWID;

52 CS 8630 Database Administration, Dr. Guimaraes ROWID (cont.) Fastest Less Flexible Are very useful for removing duplicates of rows
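Removing duplicate rows by row identifier can be sketched as below, assuming SQLite through Python's sqlite3 module. SQLite's rowid is an integer key rather than Oracle's physical row address, so this shows only the shape of the technique. The emp rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_no INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(56722, "Lee"), (56722, "Lee"), (10001, "Kim"), (56722, "Lee"), (10001, "Kim")],
)

# Keep the row with the smallest rowid in each group of identical rows and
# delete the rest. The rowid is the only thing that tells duplicates apart.
conn.execute("""
    DELETE FROM emp
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM emp GROUP BY emp_no, name)
""")
remaining = conn.execute(
    "SELECT emp_no, name FROM emp ORDER BY emp_no").fetchall()
print(remaining)
```

Because the duplicate rows are identical in every user column, no ordinary WHERE clause can single out "all but one" of them. The row identifier makes that possible.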

53 CS 8630 Database Administration, Dr. Guimaraes SELECT STATEMENT
–NOT EXISTS in place of NOT IN
–Joins in place of EXISTS
–Avoid sub-selects
–EXISTS in place of DISTINCT
–UNION in place of OR on an index column
–WHERE instead of ORDER BY
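When swapping NOT IN for NOT EXISTS, note that the two forms are not always interchangeable: they differ when the subquery can return NULL. The sketch below assumes SQLite through Python's sqlite3 module, with hypothetical customers and orders tables. Whether the rewrite is actually faster depends on the optimizer and is not measured here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (cid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (oid INTEGER PRIMARY KEY, cid INTEGER);
INSERT INTO customers VALUES (1, 'Ames'), (2, 'Brun'), (3, 'Cole');
INSERT INTO orders VALUES (10, 1), (11, NULL);
""")

# NOT IN: a NULL in the subquery result makes the test unknown for every
# customer that has no matching order, so those rows are filtered out.
not_in = conn.execute(
    "SELECT name FROM customers "
    "WHERE cid NOT IN (SELECT cid FROM orders) ORDER BY cid"
).fetchall()

# NOT EXISTS: the correlated test simply finds no matching order row.
not_exists = conn.execute("""
    SELECT name FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.cid = c.cid)
    ORDER BY cid
""").fetchall()

print(not_in)
print(not_exists)
```

With a NULL cid in orders, NOT IN returns no rows while NOT EXISTS returns the customers without orders. The rewrite is a safe like-for-like swap only when the subquery column cannot be NULL.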

54 CS 8630 Database Administration, Dr. Guimaraes QUERY OPTIMIZER END OF QUERY OPTIMIZER

55 CS 8630 Database Administration, Dr. Guimaraes End of Lecture End Of Today’s Lecture.

