© IBM Corporation 2006. Informix Chat with the Labs: Unlocking the Mysteries Behind Update Statistics. John F. Miller III, STSM.

Presentation transcript:

Informix Chat with the Labs
Unlocking the Mysteries Behind Update Statistics
John F. Miller III, STSM

The Dice Problem
Throw some dice: how many will come up 1?

Questions about the Dice
How many dice are you throwing?
How many sides does each die have?
Are all the dice the same?
The better the information, the more accurate the estimate.

What does Update Statistics do?
Collects information for the optimizer
  – Statistics: LOW
  – Distributions: MEDIUM & HIGH
Drops distributions
Compiles stored procedures

Statistics Collected
systables
  – Number of rows
  – Number of pages to store the data
syscolumns
  – Second-largest value for a column
  – Second-smallest value for a column
sysindexes
  – # of unique values for the lead key
  – How highly clustered the values for the lead key are

Update Statistics Low: Basic Algorithm
Walk the leaf pages in each index
Submit btree cleaner requests when deleted items are found, causing re-balancing of the indexes
Collect the following information
  – Number of unique items
  – Number of leaf pages
  – How clustered the data is
  – Second-highest and second-lowest value

How to Read Distributions
--- DISTRIBUTION ---
( -1
1: ( , 70, 75)
2: ( , 24, 100)
3: ( , 12, 116)
4: ( , 30, 147)
5: ( , 39, 194)
6: ( , 28, 222)
--- OVERFLOW ---
1: ( , 43)
2: ( , 45)
Fields of a distribution bin, in order: # of rows represented in this bin, # of unique values, highest value in this bin.
Fields of an overflow bin, in order: # of rows for this value, the value.
To get the range of values for a bin, look at the highest value in the previous bin.

Example: Approximating a Value
--- DISTRIBUTION ---
( -1
1: ( , 70, 75)
2: ( , 24, 100)
3: ( , 12, 116)
4: ( , 30, 147)
5: ( , 39, 194)
6: ( , 28, 222)
--- OVERFLOW ---
1: ( , 43)
2: ( , 45)
Bin 1 represents the rows containing a value between -1 and 75.
There are 70 unique values in this range.
The optimizer will deduce (rows in bin 1) / 70 = 12,404 records for each value between -1 and 75.
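The per-value estimate on this slide is a single division: the rows represented by a bin divided by the number of unique values in that bin. A minimal Python sketch of that arithmetic follows. The `Bin` class and the row count of 7,000 are hypothetical, chosen only for illustration; the 70 unique values and the upper bound of 75 come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    rows: int    # number of rows represented in this bin
    unique: int  # number of unique values in this bin
    high: int    # highest value in this bin


def rows_per_value(b: Bin) -> float:
    """Average number of rows assumed for any single value inside the bin."""
    return b.rows / b.unique


# Hypothetical bin: 7,000 rows (invented), 70 unique values, highest value 75.
example = Bin(rows=7000, unique=70, high=75)
print(rows_per_value(example))  # 100.0
```

The same division applied to the real row count of bin 1 gives the 12,404 quoted on the slide.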

Example: Dealing with Data Skew
--- DISTRIBUTION ---
( -1
1: ( , 70, 75)
2: ( , 24, 100)
3: ( , 12, 116)
4: ( , 30, 147)
5: ( , 39, 194)
6: ( , 28, 222)
--- OVERFLOW ---
1: ( , 43)
2: ( , 45)
Data skew
For the value 43, how many records will the optimizer estimate exist?
Answer: the number of rows recorded for the value 43 in overflow bin 1.
Any value that exceeds 25% of the bin size will be placed in an overflow bin.
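The 25% rule can be sketched as a simple filter over value counts. This is an illustrative Python sketch, not the server's implementation: the function name, the sample data, and the bin size of 100 are all hypothetical.

```python
from collections import Counter


def split_overflow(values, bin_size, threshold=0.25):
    """Separate heavily repeated values from the rest.

    A value goes to the overflow set when its row count exceeds
    threshold * bin_size (25% of the bin size, per the slide).
    """
    counts = Counter(values)
    limit = threshold * bin_size
    overflow = {v: n for v, n in counts.items() if n > limit}
    regular = [v for v in values if v not in overflow]
    return overflow, regular


# Hypothetical data: 43 and 45 are heavily repeated, everything else is unique.
data = [43] * 30 + [45] * 26 + list(range(100, 144))
overflow, regular = split_overflow(data, bin_size=100)
print(overflow)      # {43: 30, 45: 26}
print(len(regular))  # 44
```

Because an overflow entry stores the exact row count for its value, the estimate for such a value does not need the rows-per-unique-value average used for ordinary bins.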

Basic Algorithm for Distributions
Develop a scan plan based on available resources
Scan the table
  – High = all rows
  – Medium = sample of rows
Sort each column
Build distributions
Begin transaction
  – Delete the old column distributions
  – Insert the new column distributions
Commit transaction

Sample Size
HIGH
  – All rows in the table
MEDIUM
  – A common misconception is that the number of rows sampled depends on the number of rows in the table. This is incorrect.
  – The number of samples depends on the Confidence and the Resolution.
  – If the sample size is greater than the number of rows in the table, Medium turns into High mode.

Update Statistics Medium: Sample Size

How Much Information is Enough?
The better the information, the more accurate the estimate.

Examining the Running Query: No Statistics vs. Medium Statistics

No Statistics
QUERY: select * from t1 where c1 >
Estimated Cost:
Estimated # of Rows Returned:
1) miller3.t1: SEQUENTIAL SCAN
Filters: miller3.t1.c1 >

Medium Statistics
QUERY: select * from t1 where c1 >
Estimated Cost: 21
Estimated # of Rows Returned: 19
1) miller3.t1: INDEX PATH
(1) Index Keys: c1 (Serial, fragments: ALL)
Lower Index Filter: t1.c1 >

Overall performance improved
The estimates were more accurate
The query plan changed

Examining the Running Query: Medium Statistics vs. High Statistics

Medium Statistics
QUERY: select * from t1 where c1 >
Estimated Cost: 21
Estimated # of Rows Returned: 19
1) miller3.t1: INDEX PATH
(1) Index Keys: c1
Lower Index Filter: t1.c1 > 20250

High Statistics
QUERY: select * from t1 where c1 >
Estimated Cost: 33
Estimated # of Rows Returned: 30
1) miller3.t1: INDEX PATH
(1) Index Keys: c1
Lower Index Filter: t1.c1 >

Overall performance did not change
The estimates were slightly more accurate
The query plan did not change

Versions with the Update Statistics Improvements
All versions of 9.40 and UC
Not fixed: 7.31.UD2

Improvements in Update Statistics
Update statistics could not allocate between 4 MB and 100 MB of sort memory
  – The default has been raised from 4 MB to 15 MB
  – The user can now configure the amount of memory
DBUPSPACE has been augmented to include memory
Format of DBUPSPACE
  – {max disk space}:{default memory}
  – To increase the memory to 35 MB, set DBUPSPACE=0:35
Allow update statistics to use light scans when scanning a table
  – Implemented light scans
  – Set-oriented reads
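The `{max disk space}:{default memory}` format above splits on a single colon. As a hedged illustration, the following Python helper parses such a value; the function name and the treatment of an absent field as `None` are assumptions made for this sketch, and the slide states a unit (MB) only for the memory field.

```python
def parse_dbupspace(value: str):
    """Split a DBUPSPACE value of the form '{max disk space}:{default memory}'.

    Returns (max_disk_space, default_memory_mb). A missing field is
    returned as None (an assumption of this sketch, not documented behavior).
    """
    disk, _, memory = value.partition(":")
    max_disk_space = int(disk) if disk else None
    default_memory_mb = int(memory) if memory else None
    return max_disk_space, default_memory_mb


# The slide's example: raise sort memory to 35 MB.
print(parse_dbupspace("0:35"))  # (0, 35)
```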

Improvements in Update Statistics
Information about building data distributions was not viewable by the DBA
  – Set explain will now print the scan path and resource usage when building data distributions
Update statistics low on fragmented tables did not run in parallel
  – With PDQ turned on, each index fragment will be scanned in parallel
  – PDQ at 1 means 10% of the index fragments are scanned in parallel, while PDQ at 10 means all the index fragments are scanned in parallel
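The two data points above (PDQ 1 gives 10% of the fragments, PDQ 10 gives all of them) suggest a scale of 10% per PDQ unit. The Python sketch below assumes that linear relationship for values in between, rounds up, and scans fragments one at a time when PDQ is off; all three choices are assumptions for illustration rather than documented behavior.

```python
def parallel_fragments(pdq_priority: int, fragments: int) -> int:
    """Estimate how many index fragments are scanned in parallel.

    Assumes 10% of the fragments per PDQ unit, capped at PDQ 10 (all
    fragments), rounded up, with at least one fragment scanned at a time.
    """
    if pdq_priority <= 0:
        return 1  # assumption: without PDQ, one fragment at a time
    pdq = min(pdq_priority, 10)
    # Integer ceiling of fragments * pdq / 10.
    return max(1, (fragments * pdq + 9) // 10)


print(parallel_fragments(1, 40))   # 4
print(parallel_fragments(10, 40))  # 40
```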

Improvements in Update Statistics
Various errors (126, 312, 100,…) when executing update statistics
  – Errors occurred when trying to insert the distributions because set lock mode to wait was not handled properly inside update statistics
Range scanning a fragmented index is slow
  – Replace the next loop merge with a binary search merge when ordering items from index fragments
  – Most noticeable when the number of fragments in an index is large

Update Statistics Medium: Memory Requirements

Update Statistics High: Memory Requirements
In-memory sort
  – Approximate memory = number of rows * sum(column widths + 2 * sizeof(pointer))

Memory Rules
Estimated update statistics memory is below 100 MB
  – Hard-coded limit of 4 MB
  – Attempts to minimize the scans by fitting as many columns as possible into 4 MB
Estimated update statistics memory is above 100 MB
  – Memory is requested from MGM
  – Attempts to minimize the scans by fitting as many columns as possible into the MGM memory
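These rules amount to picking a memory budget and then packing columns into as few table scans as that budget allows. The Python sketch below illustrates the idea with a first-fit-decreasing heuristic. The heuristic, the function names, the per-column sizes, and the handling of an estimate of exactly 100 MB are assumptions for illustration; the slides do not describe the server's actual packing strategy.

```python
def sort_budget_mb(estimated_mb: float, mgm_grant_mb: float) -> float:
    """Pick the sort-memory budget according to the rules on the slide."""
    if estimated_mb < 100:
        return 4.0           # hard-coded limit below 100 MB
    return mgm_grant_mb      # otherwise use whatever MGM granted


def plan_passes(column_mb: dict, budget_mb: float) -> list:
    """Group columns into table scans so each scan fits in the budget.

    First-fit decreasing: place the largest columns first, each into the
    first pass with enough room. A column larger than the budget still
    gets a pass of its own.
    """
    passes = []  # each entry: [remaining_mb, [column names]]
    for name, need in sorted(column_mb.items(), key=lambda kv: -kv[1]):
        for p in passes:
            if need <= p[0]:
                p[0] -= need
                p[1].append(name)
                break
        else:
            passes.append([max(budget_mb - need, 0.0), [name]])
    return [names for _, names in passes]


# Hypothetical per-column sort sizes in MB.
cols = {"c1": 3.0, "c2": 3.0, "c3": 2.0, "c4": 1.0, "c5": 1.0}
print(plan_passes(cols, 4.0))   # [['c1', 'c4'], ['c2', 'c5'], ['c3']]
print(plan_passes(cols, 15.0))  # [['c1', 'c2', 'c3', 'c4', 'c5']]
```

A larger budget lets more columns share a pass, which is why giving update statistics more memory reduces the number of table scans.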

Examples
Customer Table
  Cust_id   integer
  Fname     char(50)
  Lname     char(50)
  Address1  char(200)
  Address2  char(200)
  State     char(2)
  zipcode   integer
Number of rows: 500,000
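The in-memory sort approximation for Update Statistics High (number of rows * sum(column widths + 2 * sizeof(pointer))) can be applied to this customer table. The Python sketch below reads the formula as adding two pointers per column and assumes 8-byte pointers and 4-byte integers; both the reading and the pointer size are assumptions, so the result may differ from the figures on the original slides.

```python
POINTER_SIZE = 8  # assumption: 64-bit pointers

# Column widths in bytes for the customer table on the slide
# (integer taken as 4 bytes, char(n) as n bytes).
CUSTOMER_COLUMNS = {
    "cust_id": 4,
    "fname": 50,
    "lname": 50,
    "address1": 200,
    "address2": 200,
    "state": 2,
    "zipcode": 4,
}


def approx_sort_memory(rows: int, widths, pointer_size: int = POINTER_SIZE) -> int:
    """rows * sum(column width + 2 * sizeof(pointer)), in bytes."""
    return rows * sum(w + 2 * pointer_size for w in widths)


total = approx_sort_memory(500_000, CUSTOMER_COLUMNS.values())
print(total)                    # 311000000
print(round(total / 2**20, 1))  # 296.6
```

Under these assumptions, sorting every column of the table in memory at once needs roughly 297 MB, far more than a 4 MB or 15 MB budget, which is what forces multiple table scans.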

Examples: Memory for Incore Sort

Examples: Number of Table Scans

Confidence
A factor in the number of samples used by update statistics medium

Resolution
Percentage of the data that is represented in a distribution bin
Example
  – 100,000 rows in the table
  – Resolution of 2%
  – Each bin will represent 2,000 rows
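The resolution arithmetic is easy to check in a few lines of Python. The function names are hypothetical, and the bin count (100 divided by the resolution) is only the rough figure implied by the definition; actual counts reported by the server can differ, for example because of overflow bins.

```python
import math


def rows_per_bin(table_rows: int, resolution_pct: float) -> float:
    """Rows represented by one distribution bin at the given resolution."""
    return table_rows * resolution_pct / 100


def approx_bin_count(resolution_pct: float) -> int:
    """Rough number of regular bins implied by the resolution."""
    return math.ceil(100 / resolution_pct)


# The slide's example: 100,000 rows at a resolution of 2%.
print(rows_per_bin(100_000, 2))  # 2000.0
print(approx_bin_count(2))       # 50
```

Lowering the resolution value shrinks each bin and raises the number of bins, which gives the optimizer a finer-grained picture of the column.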

Example
The following example uses:
  – Table size: 215,000 rows
  – Row size: 445 bytes
  – Uniprocessor

Example of the current update statistics
Table: jmiller.t9
Mode: HIGH
Number of Bins: 267
Bin size 1082
Sort data MB
Sort memory granted 4.0 MB
Estimated number of table scans 10
PASS #1 c9
PASS #2 c5
PASS #3 c7
PASS #4 c6
…..
PASS #10 c4
Completed pass 1 in 0 minutes 24 seconds
Completed pass 2 in 0 minutes 20 seconds
Completed pass 3 in 0 minutes 17 seconds
Completed pass 4 in 0 minutes 17 seconds
Completed pass 5 in 0 minutes 17 seconds
Completed pass 6 in 0 minutes 15 seconds
Completed pass 7 in 0 minutes 14 seconds
Completed pass 8 in 0 minutes 15 seconds
Completed pass 9 in 0 minutes 16 seconds
Completed pass 10 in 0 minutes 14 seconds
Total Time 146 seconds

The New Defaults
Table: jmiller.t9
Mode: HIGH
Number of Bins: 267
Bin size 1082
Sort data MB
Sort memory granted 15.0 MB (new memory default)
Estimated number of table scans 7
PASS #1 c9,c8,c10,c5,c7
PASS #2 c6,c1
PASS #3 c3
PASS #4 c2
PASS #5 c4
Completed pass 1 in 0 minutes 34 seconds
Completed pass 2 in 0 minutes 19 seconds
Completed pass 3 in 0 minutes 16 seconds
Completed pass 4 in 0 minutes 14 seconds
Completed pass 5 in 0 minutes 15 seconds
Total Time 98 seconds

Enabling PDQ with Update Statistics
Table: jmiller.t9
Mode: HIGH
Number of Bins: 267
Bin size 1082
Sort data MB
PDQ memory granted MB (PDQ memory)
Estimated number of table scans 1
PASS #1 c1,c2,c3,c4,c5,c6,c7,c8,c9,c10
Index scans disabled (features enabled)
Light scans enabled (features enabled)
Completed pass 1 in 0 minutes 29 seconds
Total Time 29 seconds

Tuning with the New Statistics
Turn on PDQ when running update statistics, but only for tables
  – Avoid PDQ when updating statistics for procedures
When running high or medium, increase the memory that update statistics has to work with
Enable parallel sorting (i.e. PSORT_NPROCS)

Considerations
Change the RESOLUTION to 1.5
  – Increases the number of bins for the distributions
  – Increases the sample size for update statistics medium

Old Recommendations
Start one update statistics for each column of a table
Columns: Fname, Lname, Address
Three sequential scans of the table

New Recommendations
Start one update statistics for ALL columns, giving it more resources (memory)
Requires only one scan of the table to produce distributions on several columns
Columns: Fname, Lname, Address
One scan of the table
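The difference between the old and new recommendations shows up in the statements a maintenance script would issue. The Python sketch below only builds statement text; the table name `customer`, the helper names, and the choice of HIGH mode are hypothetical, and the statement shape follows the general `UPDATE STATISTICS ... FOR TABLE table (columns)` form of Informix SQL.

```python
def per_column_statements(table, columns, mode="HIGH"):
    """Old approach: one statement (and one table scan) per column."""
    return [f"UPDATE STATISTICS {mode} FOR TABLE {table} ({c});" for c in columns]


def combined_statement(table, columns, mode="HIGH"):
    """New approach: one statement covering all columns."""
    return f"UPDATE STATISTICS {mode} FOR TABLE {table} ({', '.join(columns)});"


cols = ["fname", "lname", "address"]
for stmt in per_column_statements("customer", cols):
    print(stmt)
print(combined_statement("customer", cols))
# UPDATE STATISTICS HIGH FOR TABLE customer (fname, lname, address);
```

Whether the combined statement really finishes in a single scan depends on the sort memory available to it, which is why the recommendation pairs it with giving update statistics more memory.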

Other Information
An Overview of the IBM Informix Dynamic Server Optimizer
Understanding and Tuning Update Statistics ml
Predicate Inference in Informix Dynamic Server oswami.html
IBM Informix Performance Manual
IBM Informix SQL Reference Manual

Questions