Intro to Query Optimization DB2 UDB for iSeries

1 Intro to Query Optimization DB2 UDB for iSeries
Tom McKinley IBM Rochester, MN USA

2 Background / Foundation

3 IBM's DB2 UDB Family Three code bases...
Based on the system history, architecture and operating system:
  - DB2 UDB for Linux, UNIX, Windows (LUW)
  - DB2 UDB for z/OS (S/390)
  - DB2 UDB for iSeries (AS/400)

4 DB2 UDB for iSeries i5 + i5/OS
  - System viewed as a database server, not just an application system
  - DB2 UDB for iSeries (integrated part of OS/400 or i5/OS)
  - Universal Database support
  - Data Centric focus
  - Business logic moving into the database engine
  - SQL (DDL and DML) as primary interface to database
  - GUI to operating system and database via iSeries Navigator

5 iSeries - Logical Partitioning (LPAR)
[Diagram: partitions running i5/OS, Linux and AIX, plus Windows on IXS/IXA (*** no LPAR support), hosting DB2 UDB for iSeries, DB2 UDB for Linux, DB2 UDB for AIX and DB2 UDB for Win respectively, connected by a virtual 1Gbit Ethernet LAN.]

6 iSeries i5 i5/OS Architecture
[Diagram: a single system running a QUERY, showing MEMORY, multiple CPUs (N-way, SMP), Single Level Storage, 64 bit POWER, Storage Management, a row of IOPs, and a Table spread across the disk units.]
Picture of the overall, high-level architecture of the AS/400. This architecture can and will affect SQL request optimization and implementation. The unique AS/400 architecture allows many of the implementation methods (table scan, table pre-load, SMP, etc.). Cover the picture from the bottom up:
  - Independent I/O subsystem / storage management
  - Tables spread across all disk units
  - Single Level Storage and large memory system
  - 64 bit addressing, 40G max main storage
  - Multiple processors for N-way and SMP support
  - 4th generation 64 bit RISC

7 i5/OS Objects
  SQL                  i5/OS
  schema/collection    library
  table                physical file
  view                 logical file
  index                keyed logical file
  row                  record
  column               field
  log                  journal
AS/400's object-based architecture and naming convention: all objects are in a library, and all libraries are in QSYS (mother of all libraries). Library/Object/Type naming must be unique. Every object has a unique virtual address (or address space). This chart is used to compare and contrast AS/400 objects with SQL database objects. Net: AS/400 objects can be created via the SQL interface, which maps to the underlying AS/400 DB objects.

8 i5/OS Objects
[Diagram: a Library (Schema) containing a Physical File (Table) with Member 1, Member 2 and Member 3. SELECT... FROM Physical File reads the first member. CREATE ALIAS... defines Alias_1, Alias_2 and Alias_3 over the members, each queried with SELECT... FROM Alias_1, SELECT... FROM Alias_2, SELECT... FROM Alias_3.]
  - Default is to open the first member, which has the same name as the physical file
  - To access other members, use an override database file command (OVRDBF) or create an SQL alias
  - The SQL ALIAS is persistent
  - With the SQL ALIAS, the query can reference the specific member (alias) directly

9 i5/OS Objects
AS/400's object based architecture and naming convention
[Diagram: hierarchy of System, Library, Object, Type, Attribute (subtype). The Library/Object/Type combination must be unique.]
  Library      Object      Type     Attribute (subtype)
  My_Schema    DB_Table    *FILE    PF (physical file)
  My_Schema    DB_Index    *FILE    LF (logical file)
  My_Schema    DB_View     *FILE    LF (logical file)
  CREATE TABLE My_Schema.DB_Table ...
  CREATE INDEX My_Schema.DB_Index ...
  CREATE VIEW My_Schema.DB_View ...

10 i5/OS Objects
One Database Management System with multiple interfaces
[Diagram: a single DB2 DB File (PF) object reached through three interfaces:]
  - Structured Query Language (SQL): Embedded, ODBC, JDBC, CLI (CREATE TABLE, SELECT... FROM...)
  - Command Language (CL): CRTPF
  - High Level Language: Native I/O

11 SQL Query Processing
[Diagram: an SQL request enters DB2 UDB for iSeries and passes through Optimize, Open and Run.]
The user has little influence over the validation or execution of a query, but can have a real effect on query optimization, so the focus here is on the optimization phase. The DB2 UDB for AS/400 optimizer is "cost based" with full query rewrite capability. Statistics provided by the database are used to find the best plan, meaning the least costly one (the least estimated time).

12 Query Optimization

13 V5R1 Database Architecture
[Diagram, top to bottom: ODBC / JDBC / ADO / DRDA / XDA; Network; Host Server; CLI / JDBC; Static, Dynamic and Extended Dynamic SQL alongside Native (Record I/O); SQL Optimizer; DB2 UDB (Data Storage & Management).]
  - Static: compiled embedded statements
  - Dynamic: prepare every time
  - Extended Dynamic: prepare once and then reference
The optimizer and database engine are separated at different layers of the operating system.
High-level picture of the DB architecture and where the optimization occurs. ADO = ActiveX Data Objects (i.e. OLE DB), implemented via ODBC or directly to the Host Server (project Lightning). ODBC/JDBC/ADO are client query program interfaces; CLI/JDBC are server query program interfaces. All the components will be covered throughout the course, from the bottom up.

14 V5R2 and V5R3 Database Architecture
[Diagram, top to bottom: ODBC / JDBC / ADO .NET / DRDA / XDA; Network; Host Server; CLI / JDBC; Static, Dynamic and Extended Dynamic SQL alongside Native (Record I/O); SQL Optimizer; DB2 UDB (Data Storage & Management).]
  - Static: compiled embedded statements
  - Dynamic: prepare every time
  - Extended Dynamic: prepare once and then reference
The optimizer and database engine merged to form the SQL Query Engine, and much of the work was moved to SLIC.

16 The Query Dispatcher
  - Determines which engine will optimize and process each query request
  - Only SQL requests are considered for the SQL Query Engine (SQE)
  - Initial step for all query optimization that occurs in i5/OS
  - Ability to “back up” and use the Classic Query Engine (CQE) when non-standard indexes are encountered during optimization
  - Initial goal is to use SQE

17 The Query Dispatcher - V5R2
Dispatched to CQE if:
  - More than one table is referenced (i.e. SQE handles no joins)
  - OR & IN predicates
  - SMP requested
  - Non-Read (INSERT with subselect can use new path)
  - LIKE predicates
  - UNIONS
  - View or Logical File references
  - Subquery
  - Derived Tables & Common Table expressions, UDTFs
  - LOB columns
  - LOWER, TRANSLATE, or UPPER scalar function
  - CHARACTER_LENGTH, POSITION, or SUBSTRING scalar function using UTF-8/16
  - Sort Sequences & CCSID translation between columns
  - Distributed queries via DB2 Multisystem
  - Non-SQL queries (QQQQry API, Query/400, OPNQRYF)
  - ALWCPYDTA(*NO) specified
  - Sensitive Cursor
SQE support was added to V5R2 in May 2003 (Latest DB Group + SI07650); it is not part of any package.
ALWCPYDTA(*YES) will use SQE, but temps may be used. For example, SQE does not use temp indexes, so a join may use a hash table, which is not maintained as the data changes.

18 The Query Dispatcher - V5R3
Dispatched to CQE if:
  - LIKE predicates
  - Logical File references
  - UDTFs
  - LOB columns
  - LOWER, TRANSLATE, or UPPER scalar function
  - CHARACTER_LENGTH, POSITION, or SUBSTRING scalar function using UTF-8/16
  - Sort Sequences & CCSID translation between columns
  - DB2 Multisystem
  - Non-SQL queries (QQQQry API, Query/400, OPNQRYF)
  - ALWCPYDTA(*NO) specified
  - Sensitive Cursor
SQE now optimizes:
  - VIEWS, UNIONS, SubQueries
  - INSERT, UPDATE, DELETE
  - Star Schema Join queries
Only SQE optimizes INTERSECT and EXCEPT.
ALWCPYDTA(*YES) will use SQE, but temps may be used. For example, SQE does not use temp indexes, so a join may use a hash table, which is not maintained as the data changes.

19 The Query Dispatcher
Back up to CQE to complete optimization if any of the following are encountered:
  - Select/omit logical file
  - Logical file over multiple members
  - Join logical file
  - Derived key(s)
      - Native logical files that perform some intermediate mapping of the fields referenced in the key. Common ones are renaming fields, adding a translate, or only selecting a subset of the columns.
      - Specifying an Alternate Collating Sequence (ACS) on a field used for a key will also make a “derived key” (an implied map occurs within the index).
  - Sort Sequence (NLSS) specified for index or logical file
      - Probably the trickiest one for users to detect. The index is built while an NLSS table is specified in the query environment.
Cost to “back up” and revert to CQE adds about 15% to the total optimization time.
QAQQINI parameter to ignore unsupported logical files: Ignore_Derived_Index = *YES
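The dispatcher rules on the last few slides can be read as a two-step decision: route the request, then back up to CQE if an unsupported index turns up. The Python sketch below only illustrates that reading, using the V5R3 list. The feature tags, the `QueryRequest` class and both function names are hypothetical and are not DB2 identifiers; the real dispatcher lives inside the operating system.

```python
from dataclasses import dataclass, field

# Hypothetical tags standing in for the V5R3 "dispatched to CQE" conditions.
CQE_ONLY_FEATURES = frozenset({
    "like_predicate", "logical_file_reference", "udtf", "lob_column",
    "lower_translate_upper", "utf_string_function",
    "sort_sequence_or_ccsid_translation", "db2_multisystem",
    "alwcpydta_no", "sensitive_cursor",
})

# Hypothetical tags for the index kinds that make SQE "back up" to CQE.
BACK_UP_INDEX_KINDS = frozenset({
    "select_omit", "multi_member", "join_logical_file",
    "derived_key", "nlss_sort_sequence",
})

@dataclass
class QueryRequest:
    is_sql: bool = True                              # non-SQL requests never reach SQE
    features: set = field(default_factory=set)       # tags found in the statement
    index_kinds: set = field(default_factory=set)    # kinds of indexes over its tables

def dispatch(request):
    """Initial routing: only SQL requests with no CQE-only feature go to SQE."""
    if not request.is_sql or request.features & CQE_ONLY_FEATURES:
        return "CQE"
    return "SQE"

def engine_used(request, ignore_derived_index=False):
    """Apply the back-up rule; ignore_derived_index models Ignore_Derived_Index = *YES."""
    engine = dispatch(request)
    unsupported = request.index_kinds & BACK_UP_INDEX_KINDS
    if engine == "SQE" and unsupported and not ignore_derived_index:
        return "CQE"  # the slide estimates this adds about 15% to optimization time
    return engine
```

For example, `engine_used(QueryRequest(index_kinds={"derived_key"}))` gives "CQE", while the same request with `ignore_derived_index=True` stays on "SQE".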

20 Optimization: The Optimizer
  - Writes the best(?) program to fulfill your request
  - Provides the recipe
  - Provides the methods
  - Does no cooking
The query optimizer's job is to build the access plan; the DB run time actually executes the plan. Think of the optimizer as the program that writes the program to fulfill the user's request.

21 Optimization... the intersection of various factors
[Diagram: "The Plan" sits at the intersection of these factors:]
  - Server attributes
  - Server configuration
  - Version/Release/Modification Level
  - Server performance
  - SMP
  - Database design
  - Job, Query attributes
  - Table sizes, number of rows
  - SQL Request
  - Interfaces (Static, Dynamic, Extended Dynamic)
  - Views and Indexes (Radix, EVI)
  - Work management
A given query plan can be thought of as an intersection of all the factors that affect cost based optimization on a given server with a given database design. To really understand a given implementation plan and its performance, one must know and understand all the various factors and settings in effect at the time of query optimization and execution. Change any one or more of the factors and the implementation plan and performance may change.

22 (Query) Access Plans
The output of query optimization (“the recipe and methods”)
Contents:
  - A control structure that contains information on the actions necessary to satisfy each SQL request
  - These contents include:
      - Access Method
      - Info on associated tables and indexes
      - Any applicable program and/or environment information
High-level view and explanation of an "access plan". Will be covered in detail later in the course.

23 Cost Based Query Optimization
  - The DB2 for iSeries Optimizer performs "cost based" optimization
  - "Cost" is defined as the estimated time it takes to run the request
  - "Costing" various plans refers to the comparison of a given set of algorithms and methods in an attempt to identify the "fastest" plan
  - Optimization is based on time, not on resource utilization
  - Usually the fastest plan is also the most resource efficient plan, but this is not necessarily true
  - The goal of the optimizer is to eliminate I/O as early as possible by identifying the best path to and through the data
  - The optimizer has the ability and freedom to "rewrite" the query

24 Query Phases
Query processing can be divided into four phases:
  - Query Validation
      - Validate the query request
      - Validate existing access plan
      - Build internal query structures
  - Query Dispatcher
      - Determine which query engine should complete the processing
  - Query Optimization (We can affect this...)
      - Choose most efficient access method
      - Build access plan
  - Query Execution
      - Build the structures needed for query cursor
      - Build the structures for any temporary indexes (if needed)
      - Build and activate query cursor (ODP)
      - Generate any feedback requested
          - Debug messages in the job log
          - DB Monitor records
          - Visual Explain

25 Query Optimization Feedback
[Diagram: an SQL request passes through Query Optimization, which produces feedback in these forms:]
  - SQE Plan Cache
  - DB Monitor Data
  - Visual Explain
  - Joblog Messages
  - SQL Info from PGMs & PKGs

26 Data Access Methods
Cost based optimization dictates that the fastest access method for a given table will vary based upon the selectivity of the query.
[Chart: Response Time (Low to High) against Number of rows searched / accessed (Few to Many), with one curve each for Method 1, Method 2 and Method 3.]
Chart to introduce multiple access methods based on request, DB design, implementation, resources and performance. This chart will be detailed and described as the course progresses.
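To make the chart's point concrete, here is a small Python sketch in which each method has a fixed start-up cost plus a per-row cost, so a different method wins at different row counts. All of the numbers and the linear cost shape are invented for illustration; the slide gives no figures and does not name the three methods.

```python
# Hypothetical (fixed_cost, per_row_cost) pairs, in arbitrary time units.
# Method 1 is cheap to start but expensive per row; Method 3 is the opposite.
METHODS = {
    "Method 1": (1.0, 1.0),
    "Method 2": (200.0, 0.2),
    "Method 3": (1000.0, 0.02),
}

def estimated_cost(method, rows):
    """Estimated response time for accessing `rows` rows with the given method."""
    fixed, per_row = METHODS[method]
    return fixed + per_row * rows

def fastest_method(rows):
    """Pick the method with the lowest estimated response time."""
    return min(METHODS, key=lambda m: estimated_cost(m, rows))

if __name__ == "__main__":
    for rows in (10, 1_000, 100_000):
        print(rows, fastest_method(rows))
```

With these made-up constants, a highly selective query (10 rows) favors Method 1, a mid-range query (1,000 rows) favors Method 2, and a query touching 100,000 rows favors Method 3.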

27 Strategy for Query Optimization
Query optimization will generally follow this simplified strategy:
  - Gather meta-data and statistics for costing
      - Selectivity statistics
      - Indexes available to be costed
      - Sort the indexes based upon their usefulness
      - Environmental attributes that may affect the costs
  - Generate default cost
      - Build an access plan associated with the default plan
  - For each index:
      - Gather information needed specific to this index
      - Build an access plan based on this index
      - Cost the use of the index with this access plan
      - Compare the resulting cost against the cost from the current best plan
The default statistics include filter factors, table size, and indexes over the table. The default cost will differ if the query requires an index for a join, ordering or grouping. It will either be an arrival scan or a temporary index create. (Assume arrival scan for now.) Introduce the concept of time-out while processing the indexes. Check if the index matches the fields in the query. Estimate against the index using the query values and update the filter factors. Cost the index using page faults and the expected number of records as the key components. Use a "greedy" algorithm for costing by only comparing the current cost with the current best until all permutations run out.
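The strategy above amounts to a greedy loop: start from the default plan, cost each index in order of usefulness, and keep whichever plan is cheapest so far. The Python sketch below is an illustration under assumptions, not the optimizer's actual code. The `optimize` function, its parameters and the plan labels are hypothetical, and the time-out is modeled as a plain wall-clock budget, which is a simplification.

```python
import time

def optimize(default_cost, indexes, cost_with_index, usefulness, timeout_s=None):
    """Greedy plan selection: compare each candidate only against the current best.

    default_cost    -- estimated cost of the default plan (arrival scan assumed)
    indexes         -- candidate index names
    cost_with_index -- callable returning the estimated cost of a plan using an index
    usefulness      -- callable used to sort indexes, most useful first
    timeout_s       -- optional optimization time budget in seconds (assumption)
    """
    best_plan, best_cost = "default plan (arrival scan)", default_cost
    deadline = None if timeout_s is None else time.monotonic() + timeout_s

    for index in sorted(indexes, key=usefulness, reverse=True):
        if deadline is not None and time.monotonic() > deadline:
            break  # time-out: stop costing and keep the best plan found so far
        cost = cost_with_index(index)
        if cost < best_cost:
            best_plan, best_cost = f"plan using index {index}", cost

    return best_plan, best_cost
```

For instance, with a default cost of 100.0 and two hypothetical indexes costed at 50.0 and 20.0, the loop returns the plan that uses the second index; if every index costs more than the default, the default plan is kept.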

28 Strategy for Query Optimization
Optimizing indexes will generally follow this simplified strategy:
  - Gather list of indexes for statistics and costing
  - Sort the list of indexes considering how the index can be used
      - Local selection
      - Joining
      - Grouping
      - Ordering
      - Index only access
  - One index may be useful for statistics, and another useful for implementation

29 Statistics
  - All query optimizers rely upon statistics to make plan decisions
  - DB2 UDB for the iSeries has always relied upon indexes as its source for stats
  - Other databases rely upon manual stats collection for their source
  - SQE offers a hybrid approach where column stats will be automatically collected for cases where indexes do not already exist

30 Sources of Information
Meta-data sources, ordered from best to worst:
  - Existing indexes (Radix or Encoded Vector)
      - More accurately describes multi-column key values
      - Stats available immediately as the index maintenance occurs
      - Selectivity estimates from radix by reading n keys
      - Selectivity from EVI by reading symbol table values
  - Column Statistics
      - SQE only
      - Column Cardinality, Histograms & Frequent Values List
      - Constructed over a single column in a table
      - Stored internally as a part of the table object after created
      - Collected automatically by default for the system
      - Stats not immediately maintained as the table changes
      - Stats are refreshed as they become “stale” over time
  - Default sources
      - No representation of actual values in columns
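The best-to-worst ordering of these sources can be sketched as a simple fallback chain. The Python below is a hedged illustration only: the function name, the dictionaries of column cardinalities, the uniform-distribution estimate (1 / cardinality) and the 10% default are all assumptions made for the example, not values or formulas taken from DB2.

```python
# Hypothetical default used when nothing is known about the column's values.
DEFAULT_EQUAL_SELECTIVITY = 0.10

def estimate_equal_selectivity(column, index_cardinality, column_stats_cardinality):
    """Estimate the fraction of rows matching `column = value`.

    Prefers an existing index, then collected column statistics, then a default.
    Both dictionaries map a column name to its number of distinct values, and the
    estimate assumes values are uniformly distributed (a simplification).
    Returns (source, selectivity).
    """
    if column in index_cardinality:
        return "index", 1.0 / index_cardinality[column]
    if column in column_stats_cardinality:
        return "column statistics", 1.0 / column_stats_cardinality[column]
    return "default", DEFAULT_EQUAL_SELECTIVITY

if __name__ == "__main__":
    index_card = {"STATE": 50}                  # an index exists over STATE
    stats_card = {"STATE": 40, "CITY": 2000}    # column stats; STATE's may be stale
    for col in ("STATE", "CITY", "ZIP"):
        print(col, estimate_equal_selectivity(col, index_card, stats_card))
```

Note how STATE uses the index figure even though column statistics also exist, reflecting the slide's point that index-based statistics are maintained immediately while column statistics can go stale.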

31 SQE Automatic Stats Collection
  - i5/OS Statistics collection job
      - Reactive, based on query requests
      - Automatic collection runs in this background job at very low priority
      - QDBFSTCCOL system job
  - Statistics Manager continuously analyzes entries in the Plan Cache and queues up requests for the collection job
  - Controlled by system value QDBFSTCCOL
  - iSeries Navigator graphical interface to manage stats collected by the system
  - APIs also provided to manage the stats

32 Review
  - What is the optimizer's job?
  - What is the optimizer's output?
  - What are some of the key elements used for cost based optimization?
  - What things affect the Access plan? Look at resources used as well as response time.
Audience review questions to ensure they are getting it. Answers:
  - Determine the least costly method of implementing the request, and build the plan to implement the request.
  - Access plan.
  - Type of request, table and index statistics, system environment (CPUs, memory, disk, SMP), interface settings.

33 Trademarks and Disclaimers
IBM Corporation All rights reserved. References in this document to IBM products or services do not imply that IBM intends to make them available in every country. The following terms are trademarks of International Business Machines Corporation in the United States, other countries, or both: Rational is a trademark of International Business Machines Corporation and Rational Software Corporation in the United States, other countries, or both. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Intel, Intel Inside (logos), MMX and Pentium are trademarks of Intel Corporation in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. SET and the SET Logo are trademarks owned by SET Secure Electronic Transaction LLC. Other company, product or service names may be trademarks or service marks of others. Information is provided "AS IS" without warranty of any kind. All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. 
Questions on the capability of non-IBM products should be addressed to the supplier of those products. All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Contact your local IBM office or IBM authorized reseller for the full text of the specific Statement of Direction. Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here. Photographs shown are of engineering prototypes. Changes may be incorporated in production models.

