Database Performance Topics: - DB Design - Optimization & Indexing - Monitoring and Tuning Joe Carola Siemens Medical Solutions, Health Services.

Database Performance Topics: - DB Design - Optimization & Indexing - Monitoring and Tuning Joe Carola Siemens Medical Solutions, Health Services

2 Joe Carola, Siemens HS  Bio 30+ years in Information Technology –26 of them dedicated to Relational Database, covering all areas of database design, implementation, support, performance, etc. Full History: –Development: Prog Trainee, Prog/Analyst, Sys Analyst –DBA trainee, DBA, Mgr-DBA (“Actor on the Scene”) –Lead DB Consultant for Codd and Date Consulting –Director-DBA –Technical Database Architect (Currently) –DB2, Microsoft SQL Server, Oracle, Sybase SQL Server –Mainframe, Unix, Wintel –1993 recipient of an International Database User Group Award for Information Excellence, based on his contributions in the area of Relational Database technology, and has presented locally and internationally on a variety of Relational Database topics. –Chairman, Delaware Valley DB2 User Group

3 Agenda – “Practical Stuff”  DB Design In the simplest terms - Logical to Physical  Optimization and Indexing Optimizer Index Types and how they are used  Monitoring and Tuning Monitoring Process and what to monitor Tuning Steps The above is where I see that the rubber meets the road……

DB Design

5  Logical Design Provides complete understanding of data and it usage Defining data entities and their attributes Provides primary and foreign key definitions  Physical Design Based on information gathered during Logical design –Data must be understood to do this correctly and efficiently Provides physical aspects to enhance data usage –Data types, data lengths, row sizes, Provides precise access paths (Indexes) to rows of data –Support primary and foreign keys –Secondary indexes  Poor Logical and Physical Database design can be the largest reason for performance issues The price for poor DB Design must be paid at execution time

6 DB Design  Normalization: A synthesis of data design 1 st Normal Form – Data is dependent on………….. The Key 2 nd Normal Form – Data is dependent on The whole Key 3 rd Normal Form - “ “ “ And nothing but the Key, “so help me Codd” Edgar F. (Ted) Codd – Developed the Relational Model “A Relational Model for Large Shared Data Banks” (1970) –Solid, yet complex, mathematical foundation Relational Algebra Domains, Attributes, Tuples, and Relations (OMG!) –Re-stated to simpler terms….. Simple to understand tables, rows, and columns The Simplicity is partially the reason for the performance issues being addressed every day –Too many shortcuts are taken –Too many non-experienced data designers are designing and implementing database applications

7 DB Design  3 rd Normal Form is basically 1 st cut physical Next step after 3 rd NF in Physical DB Design is a very important step for Performance, Concurrency, Operations, etc. –De-Normalization takes place here Storing of data in summary or derived format If it doesn’t happen, it takes place at execution time Result: High processing costs – Materialization of the result data Administration costs – Maintenance of the data Low Currency – Concurrent Access to the data However……. –Anomalies are created as a result of De-normalization Insert, Delete, Update They all cost extra processing also –Must strike a balance based on requirements on performance, availability, storage, administration

Optimization & Indexing

9 Optimization and Indexing  Must understand the basics of indexes and performance statistics. As a general rule, indexes should be kept as narrow as possible, most likely following a business use requirement, to reduce the amount of processing overhead associated with each query.  Being familiar with how optimization works will improve the accuracy of your decision making when designing indexes Understanding how the optimizer works is the first step toward the establishment of a truly optimized database environment  As the sophistication of your database implementation increases……. The need to optimize performance will also increase.

10 Optimization and Indexing  SQL Query tuning is one of the most important tasks to improve application performance  Biggest bang for the performance dollar over everything else (“IMO”): Network, Storage, Memory, Processor  Should be done in the design and testing phases However, no amount of Database tuning or SQL statement tuning can make up for inefficient application design/coding –60% to 80% of Application Problems come from poorly written SQL or the code around it i.e. Prog101 abuse can wreck an application too!!

11 Optimizer  Responsible for choosing the least costly way to execute SQL (DML).  Creates an access path with it’s decision Performed at plan compilation time Determines Access Methods –Index Usage –Table Scan –Join Method –Sort Determines if Data and/or Index pages can be read in advance –Asynchronous Pre-fetch

12 The Importance of Statistics  Statistics provide the optimizer with the information to make decisions RDBMS Catalog Or Dictionary Generation Table Indexspace Tablespace As the data in a column changes, index and column statistics can become out-of-date and cause the query optimizer to make less than optimal decisions on how to process a query.

13 Statistical Terms/Concepts  Cardinality Measures how many unique values exist in the table  Density Measures the uniqueness of values within a table. Helps the optimizer determine how many rows will be returned for a given key value Indexes with high densities will likely be ignored by the optimizer –i.e. the index is highly non-unique  Selectivity Measures the number of rows that will be returned by a particular query. Needed by Optimizer to calculate the relative cost of a query plan

From Request to Response REQUEST RESPONSE RELATIONAL DATA SERVICES REQUESTED DATA I/O DATA MANAGER PREDICATE ANY OTHER WITH INDEX (ES) INDEX KEY Non Indexed PREDICATE APPLIES BUFFER MANAGER STAGE 1 PREDICATES. STAGE 1 - Evaluated at the time the data rows are retrieved (SARGABLE). Performance advantage in using STAGE 1 PREDICATES because this stage eliminates ROWS passed to STAGE 2 via the Data Manager. STAGE 2 PREDICATES STAGE 2 - Evaluated after data retrieval via the relational (NON- SARGABLE, Residual) data services which is more expensive than the Data Manager.

15 Indexing  A very necessary part of Successful Database Implementation Why do some of my queries run so slow! I wonder what queries will be run ? What indexes will be needed? What columns will be used as predicates? What ORDER BY will be used most often?

Indexes are a good thing to add, however there is something to avoid….. “Thanks for fixing my query, what did you do?” ”I added an index to one of the columns “Great! Then add indexes to all the columns in my table #!*#!!!

17 Types of Indexes  There are two types of indexes: clustered and non- clustered, each with unique advantages depending on the data set.  Clustered index Dictates the storage order of the data in a table. Because the data is sorted, clustered indexes are more efficient on columns of data that are most often searched for ranges of values. This index type also excels at finding a specific row when the indexed value is unique.  Non-clustered index Similar to an index in a textbook where the data is stored in one place and the data value in another. A query searches for the data value by first searching the non-clustered index to find the location of the data value in the table and then retrieves the data directly from that location. The non-clustered index is useful for queries resulting in exact matches.

18 Basic Index Usage 1 Leaf Page 2 3 Root Page Non- Leaf Page Data Page Data Page Data Page Matching Index Scan Select * From TABLE1 Where INDEXED_COL1 = 12345

19 Basic Index Usage Non-Matching Index Scan 1 2 Root Page Non- Leaf Page Leaf Page Data Page Data Page Data Page Select * From TABLE1 Where INDEXED_COL1 > 00001

20 Basic Index Usage 1 Root Page Non-Leaf Page Leaf Page Index Only Select COL1 From TABLE1 Where INDEXED_COL1 > 00001

21 Join Methods  Nested Loop Join SELECT A,B,X,Y FROM OUTER, INNER WHERE A=10 AND B=X Tables: OUTERINNER COMPOSITE Columns: A B X Y A B X Y 103 101 102 106 101 5A3B2C1D2E9F7G5A3B2C1D2E9F7G 1033B 1011D 1022C 1022E 1011D 1.) Scan the outer table, For each qualifying row……… 2.) find all matching rows in the inner table, via table space scan or index access. The nested loop join produces this result

22 Join Methods  Merge Scan Join SELECT A,B,X,Y FROM OUTER, INNER WHERE A=10 AND B=X Tables: OUTERINNER COMPOSITE Columns: A B X Y A B X Y 101 102 103 106 1D2C2E3B5A7G9F1D2C2E3B5A7G9F 1011D 1022C 1022E 1033B 2.) Scan the outer table, For each qualifying row….… 3.) Scan a group if matching rows in the inner table. The merge scan join produces this result 1.) Condense and sort the outer table, or access it through an index on column B…... Condense and sort the inner table.

23 Join Methods  Hybrid / Hash Join SELECT C2,C33 FROM OUTER, INNER WHERE C1 = A AND C2 = C22 C1C2 A1 A2 A3 A6. OUTER ROW123456ROW123456 C22 C33 1 D 2 C 2 E 3 B 5 A 7 G INNER R I D P1 P2 P3 P4 P5 P6 C2RID 1P1 2P2 2P3 3P4 PARTIAL ROWS C2C33 1D 2C 2E 3B RESULT RID LIST P1P1P2 P3P4 1.) Apply local predicates and organize qualifying rows in join column sequence by either sorting or accessing via join column index…. 3.) Create partial rows, and sort in RID sequence...…. 2.) Obtain only inner table RIDs via index access using sequenced join column key values...…. 4.) List Prefetch inner table rows and complete partial rows

24 An Ounce of Prevention….,  Make your queries simple and efficient, ensuring the least costly access path available. Try not to overload your tables with indexes Try not to overload your indexes Try not to overload your queries  Keep the Database healthy Reorganization –Eliminates empty space, and fragmentation –Reduces I/O  Generate Statistics (if they are not automatic) The Optimizer is very smart, but data attributes are always changing –DB Size/Volume, Data Skewness, Data Content  Analyze SQL Query and access path selection prior to implementing into a production environment. Execute the Explain Plan periodically to determine what method the Optimizer is selecting for an access path.

25 “Explain” Plan / SHOWPLAN  Phase of the optimizer that captures information used in selecting the query access plan  Why use an Explain Plan? Gives clues as to why the optimizer made access decisions Can be used in advance of execution Can be used to maintain a history of problem query access –Before/After new indexes additions –Before/After Statistics are Generated/Re-Generated –Before/After Data additions/changes/deletions Problem determination is easier by comparing reference plans

26 Example  Graphical SHOWPLAN

Monitoring and Tuning

28 Monitor and Tuning  A Constant Process A very necessary part of successful database implementation Must be there to guarantee ongoing, optimal Database Performance 1.) Collect Data 2.) Analyze Data 3.) Consider Fixes 4.) Apply Fixes Design Data Object Data Activity Data Real time Periodic Historical Redesign Tune Repeat

29 Monitoring and Tuning  What to monitor Healthiness of Database Objects –Growth –Fragmentation Exists when TS and/or indexes have pages in which the logical ordering, based on key or link value, does not match the physical ordering of the pages inside the file Causes additional I/O and additional storage Causes of Fragmentation –DML (Insert, Delete, Update) –Inserts/Updates cause Page Splits –Delete/Updates cause holes

30 Monitoring and Tuning  What to monitor Fragmentation illustrated Reorganization –Reorders pages, compresses entries on a page Always be sure to run new Statistics collection (for the Optimizer) Uniform pages in order Non-uniform pages, out of order Index 1 Page 1 Index 1 Page 2 Index 1 Page 3 Index 1 Page 4 Index 1 Page 5 Index 1 Page 6 Index 1 Page 7 Index 1 Page 8 Index 1 Page 1 Index 1 Page 2 Index 2 Page 1 Index 2 Page 2 Index 1 Page 4 Index 1 Page 5 Index 3 Page 1 Index 1 Page 8

31 Monitoring and Tuning  What to monitor Object Usage –Access Patterns (Random, Sequential, Indexed, Non-Indexed) –I/O (Volume, Latency) –They tend to change over time as users learn the application Memory Usage –Buffer Hit Ratio –Data/Index pages in the Buffer will avoid an I/O Processing Activity –CPU utilization Will indicate excessive searching and/or sorting –Parallel, Non-Parallel Can speed up large searches Can also monopolize all the processors Locking –Timeouts –Deadlocks

32 Monitoring and Tuning  How to monitor – Tool usage DBMS SQL Request Result Statistical Generation Performance DB Tool to Collect & Interpret Alerts Reports

33 Monitoring and Tuning Steps Find the statements that consume the most resources –“Heavy Hitters” Physical Reads will indicate SQL requiring disk access to get queries –Most expensive part of a Query!!! Buffer Gets indicate the amount of searching going on within a query High Buffer Gets = Lots of Searching = Lots of Processing Sorts information will indicate if SQL is doing an excessive amount of sorting Find the offending statements without adding to the performance problem –Use simple top down approach Avoid heavy tracing Know the Database Design and Usage Run Explain Plan on SQL

Additional Questions?

Database Performance Topics: - DB Design - Optimization & Indexing - Monitoring and Tuning Joe Carola Siemens Medical Solutions, Health Services.

Similar presentations

Presentation on theme: "Database Performance Topics: - DB Design - Optimization & Indexing - Monitoring and Tuning Joe Carola Siemens Medical Solutions, Health Services."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Database Performance Topics: - DB Design - Optimization & Indexing - Monitoring and Tuning Joe Carola Siemens Medical Solutions, Health Services.

Similar presentations

Presentation on theme: "Database Performance Topics: - DB Design - Optimization & Indexing - Monitoring and Tuning Joe Carola Siemens Medical Solutions, Health Services."— Presentation transcript:

Similar presentations

About project

Feedback