Download presentation

Presentation is loading. Please wait.

Published byMarlee Brasington Modified over 3 years ago

1
© IBM Corporation 2006 1 Informix Chat with the Labs John F. Miller III Unlocking the Mysteries Behind Update Statistics STSM

2
© IBM Corporation 2006 2 The Dice Problem Throw dice, how many will be 1?

3
© IBM Corporation 2006 3 Questions about the Dice How many dice are you throwing? How many sides does each dice have? Are all the dice the same? The better the information, the more accurate the estimate.

4
© IBM Corporation 2006 4 What does Update Statistics do? Collects information for the optimizer –Statistics LOW –Distributions MEDIUM & HIGH Drop Distributions Compile stored procedures

5
© IBM Corporation 2006 5 Statistics Collected systables syscolumns sysindexes Number of Rows Number of pages to store the data Second largest value for a column Second smallest value for a column # of unique values for the lead key How highly clustered the values for the lead key

6
© IBM Corporation 2006 6 Update Statistics Low Basic Algorithm Walk the leaf pages in each index Submit btree cleaner requests when deleted items are found causing re-balancing of indexes Collects the following information –Number of unique items –Number of leave pages –How clustered the data is –Second highest and lowest value

7
© IBM Corporation 2006 7 --- DISTRIBUTION --- ( -1 1: ( 868317, 70, 75) 2: ( 868317, 24, 100) 3: ( 868317, 12, 116) 4: ( 868317, 30, 147) 5: ( 868317, 39, 194) 6: ( 868317, 28, 222) --- OVERFLOW --- 1: ( 779848, 43) 2: ( 462364, 45) How to Read Distributions To get the range of values look at the highest value in the previous bin. # of rows represented in this bin # of unique values Highest Value in this bin # of rows for this value The value

8
© IBM Corporation 2006 8 Example - Approximating a Value --- DISTRIBUTION --- ( -1 1: ( 868317, 70, 75) 2: ( 868317, 24, 100) 3: ( 868317, 12, 116) 4: ( 868317, 30, 147) 5: ( 868317, 39, 194) 6: ( 868317, 28, 222) --- OVERFLOW --- 1: ( 779848, 43) 2: ( 462364, 45) There are 868317 rows containing a value between -1 and 75 There are 70 unique values in this range The optimizer will deduce 868317 / 70 = 12,404 records for each value between -1 and 75

9
© IBM Corporation 2006 9 Example - Dealing with Data Skew --- DISTRIBUTION --- ( -1 1: ( 868317, 70, 75) 2: ( 868317, 24, 100) 3: ( 868317, 12, 116) 4: ( 868317, 30, 147) 5: ( 868317, 39, 194) 6: ( 868317, 28, 222) --- OVERFLOW --- 1: ( 779848, 43) 2: ( 462364, 45) Data skew For the value 43 how many records will the optimizer estimate will exist? Answer 779848 values Any value that exceeds 25% of the bin size will be placed in an overflow bin

10
© IBM Corporation 2006 10 Basic Algorithm for Distributions Develop scan plan based on available resources Scan table –High = All rows –Medium = Sample of rows Sort each column Build distributions Begin transaction –Delete old columns distributions –Insert new columns distributions Commit transaction

11
© IBM Corporation 2006 11 Sample Size HIGH –All rows in the table Medium –Misconception about the number of rows sampled is based on the number of rows in the table, this is incorrect. –The number of samples depends on the Confidence and Resolution. –If the sample size is greater than the number of row in the table Medium turns into High mode

12
© IBM Corporation 2006 12 Update Statistics Medium Sample Size

13
© IBM Corporation 2006 13 How Much Information is Enough?? The better the information, the more accurate the estimate.

14
© IBM Corporation 2006 14 Examining the Running Query No Statistics VS Medium Statistics No Statistics QUERY: ------ select * from t1 where c1 > 20200 Estimated Cost: 20888 Estimated # of Rows Returned: 6760 1) miller3.t1: SEQUENTIAL SCAN Filters: miller3.t1.c1 > 20200 No Statistics QUERY: ------ select * from t1 where c1 > 20200 Estimated Cost: 20888 Estimated # of Rows Returned: 6760 1) miller3.t1: SEQUENTIAL SCAN Filters: miller3.t1.c1 > 20200 Medium Statistics QUERY: ------ select * from t1 where c1 > 20200 Estimated Cost: 21 Estimated # of Rows Returned: 19 1) miller3.t1: INDEX PATH (1) Index Keys: c1 (Serial, fragments: ALL) Lower Index Filter: t1.c1 > 20250 Medium Statistics QUERY: ------ select * from t1 where c1 > 20200 Estimated Cost: 21 Estimated # of Rows Returned: 19 1) miller3.t1: INDEX PATH (1) Index Keys: c1 (Serial, fragments: ALL) Lower Index Filter: t1.c1 > 20250 Overall performance improved The estimates were more accurate The query plan changed

15
© IBM Corporation 2006 15 Examining the Running Query Medium Statistics VS High Statistics High Statistics QUERY: ------ select * from t1 where c1 > 20200 Estimated Cost: 33 Estimated # of Rows Returned: 30 1) miller3.t1: INDEX PATH (1) Index Keys: c1 Lower Index Filter: t1.c1 > 20250 High Statistics QUERY: ------ select * from t1 where c1 > 20200 Estimated Cost: 33 Estimated # of Rows Returned: 30 1) miller3.t1: INDEX PATH (1) Index Keys: c1 Lower Index Filter: t1.c1 > 20250 Overall performance did not change The estimates were slightly more accurate The query plan did not change Medium Statistics QUERY: ------ select * from t1 where c1 > 20200 Estimated Cost: 21 Estimated # of Rows Returned: 19 1) miller3.t1: INDEX PATH (1) Index Keys: c1 Lower Index Filter: t1.c1 > 20250 Medium Statistics QUERY: ------ select * from t1 where c1 > 20200 Estimated Cost: 21 Estimated # of Rows Returned: 19 1) miller3.t1: INDEX PATH (1) Index Keys: c1 Lower Index Filter: t1.c1 > 20250

16
© IBM Corporation 2006 16 Version of Update Statistics Improvements All version of 9.40 and 10.00 9.30.UC3 9.21 Not fixed 7.31.UD2

17
© IBM Corporation 2006 17 Update statistics can not allocated memory between 4MB and 100MB of sort memory –The default has been raised from 4MB to 15MB –User can now configure the amount of memory Use DBUPSPACE has been augmented to include memory Format of DBUPSPACE –{max disk space}:{default memory} –To increase the memory to 35 MB, set DBUPSPACE=0:35. Allow update statistics to use light scans when scanning a a table –Implemented light scans –Set oriented reads Improvements in Update Statistics

18
© IBM Corporation 2006 18 Improvements in update statistics Information about building data distributions is not viewable by the DBA –Set explain will now print the scan path and resource usage when building data distributions Update statistics low on fragmented tables does not run in parallel –With PDQ turned on each index fragment will be scanned in parallel –PDQ at 1 means 10% of the index fragments scanned in parallel, while PDQ at 10 means all the index fragments will be scanned in parallel

19
© IBM Corporation 2006 19 Improvements in Update Statistics Various errors (126, 312, 100,…) when executing update statistics –Errors when trying to insert the distributions because set lock mode to wait was not handled properly inside update statistics Range scanning a fragmented index is slow Replace the next loop merge with a binary search merge when ordering items from index fragments –Most noticeable when the number of fragments in an index is large

20
© IBM Corporation 2006 20 Update Statistics Medium Memory Requirements

21
© IBM Corporation 2006 21 Update Statistics High Memory Requirements In memory sort –Approximate Memory = number of rows * sum(column widths + 2 * sizeof(pointer) )

22
© IBM Corporation 2006 22 Memory Rules Estimated Update Stats memory is below 100MB –Hard coded limit of 4MB –Attempts to minimize the scans by fitting as many columns into 4MB Estimated Update Stats memory is above 100MB –Memory is requested from MGM –Attempt to minimize the scans by fitting as many columns in the MGM memory

23
© IBM Corporation 2006 23 Examples Customer Table Cust_idinteger Fnamechar(50) Lnamechar(50) Address1char(200) Address2 char(200) Statechar(2) zipcodeinteger Number of Rows 500,000

24
© IBM Corporation 2006 24 Examples Memory for Incore Sort

25
© IBM Corporation 2006 25 Examples Number of Table Scans

26
© IBM Corporation 2006 26 Confidence A factor in the number of samples used by update statistics medium

27
© IBM Corporation 2006 27 Resolution Percentage of data that is represented in a distribution bin Example –100,000 rows in the table –Resolution of 2% –Each bin will represent 2,000 rows

28
© IBM Corporation 2006 28 Example Following Example –Table size 215,000 rows –Row size 445 bytes –Uniprocessor

29
© IBM Corporation 2006 29 Example of the current update statistics Table: jmiller.t9 Mode: HIGH Number of Bins: 267 Bin size 1082 Sort data 101.4 MB Sort memory granted 4.0 MB Estimated number of table scans 10 PASS #1 c9 PASS #2 c5 PASS #3 c7 PASS #4 c6 ….. PASS #10 c4 Completed pass 1 in 0 minutes 24 seconds Completed pass 2 in 0 minutes 20 seconds Completed pass 3 in 0 minutes 17 seconds Completed pass 4 in 0 minutes 17 seconds Completed pass 5 in 0 minutes 17 seconds Completed pass 6 in 0 minutes 15 seconds Completed pass 7 in 0 minutes 14 seconds Completed pass 8 in 0 minutes 15 seconds Completed pass 9 in 0 minutes 16 seconds Completed pass 10 in 0 minutes 14 seconds Total Time 146 seconds

30
© IBM Corporation 2006 30 The New Defaults Completed pass 1 in 0 minutes 34 seconds Completed pass 2 in 0 minutes 19 seconds Completed pass 3 in 0 minutes 16 seconds Completed pass 4 in 0 minutes 14 seconds Completed pass 5 in 0 minutes 15 seconds Total Time 98 seconds New Memory Default Table: jmiller.t9 Mode: HIGH Number of Bins: 267 Bin size 1082 Sort data 101.4 MB Sort memory granted 15.0 MB Estimated number of table scans 7 PASS #1 c9,c8,c10,c5,c7 PASS #2 c6,c1 PASS #3 c3 PASS #4 c2 PASS #5 c4

31
© IBM Corporation 2006 31 Enabling PDQ with Update Statistics Table: jmiller.t9 Mode: HIGH Number of Bins: 267 Bin size 1082 Sort data 101.4 MB PDQ memory granted 106.5 MB Estimated number of table scans 1 PASS #1 c1,c2,c3,c4,c5,c6,c7,c8,c9,c10 Index scans disabled Light scans enabled Completed pass 1 in 0 minutes 29 seconds Total Time 29 seconds PDQ Memory Features Enabled

32
© IBM Corporation 2006 32 Tuning with the New Statistics Turn on PDQ when running update statistics, but only for tables –Avoid PDQ when updating statistics for procedures When running high or medium increase the memory update statistics has to work with Enable parallel sorting (i.e. PSORT_NPROCS)

33
© IBM Corporation 2006 33 Considerations Change the RESOLUTION to 1.5 –Increasing the number of bins for the distributions –Increasing the sample size for update statistics medium

34
© IBM Corporation 2006 34 Old Recommendations Start one update statistics for each column of a table FnameLnameAddress Three sequential scans of the table

35
© IBM Corporation 2006 35 New Recommendations Start one update statistics for ALL columns giving it more resources (memory) Requires only one scan of the table to produce distributions on several columns. FnameLnameAddress One scans of the table

36
© IBM Corporation 2006 36 Other Information An Overview of the IBM Informix Dynamic Server Optimizer www.ibm.com/developerworks/db2/zones/informix/library/techarticle/0211desai/0211desai.html Understanding and Tuning Update Statistics www.ibm.com/developerworks/db2/zones/informix/library/techarticle/miller/0203miller.ht ml Predicate Inference in Informix Dynamic Server www.ibm.com/developerworks/db2/zones/informix/library/techarticle/0206goswami/0206g oswami.html IBM Informix Performance Manual IBM Informix SQL Reference Manual

37
© IBM Corporation 2006 37 Questions

Similar presentations

Presentation is loading. Please wait....

OK

Hashing and Indexing John Ortiz.

Hashing and Indexing John Ortiz.

© 2018 SlidePlayer.com Inc.

All rights reserved.

To make this website work, we log user data and share it with processors. To use this website, you must agree to our Privacy Policy, including cookie policy.

Ads by Google