
1 What’s Up with dbms_stats?
Terry Sutton Database Specialists, Inc. NoCOUG Spring Conference 2006

2 Session Objectives Examine some of the statistics-gathering options and their impact. Focus on actual experience. Learn why to choose various options when gathering statistics.

3 Statistics are Important!
As the optimizer gets more sophisticated in each version of Oracle, the importance of accurate statistics increases.

4 DBMS_STATS Procedures
The package contains over 40 procedures, including ones for:
- Deleting existing statistics for a table, schema, or database
- Setting statistics to desired values
- Exporting and importing statistics
- Gathering statistics for a schema or an entire database
- Monitoring tables for changes
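As an added illustration (not from the original slides), here is roughly what a few of those calls look like in SQL*Plus. The stat table name STATS_BACKUP is made up, and the schema name and row/block counts are borrowed from the test tables described later:

-- Delete existing statistics for one table
EXECUTE dbms_stats.delete_table_stats(ownname=>'TSUTTON', tabname=>'FILE_HISTORY');

-- Set table statistics to desired values (illustrative numbers)
EXECUTE dbms_stats.set_table_stats(ownname=>'TSUTTON', tabname=>'FILE_HISTORY', numrows=>1951673, numblks=>28507);

-- Export statistics to a user-created stat table, for later import elsewhere
EXECUTE dbms_stats.create_stat_table(ownname=>'TSUTTON', stattab=>'STATS_BACKUP');
EXECUTE dbms_stats.export_table_stats(ownname=>'TSUTTON', tabname=>'FILE_HISTORY', stattab=>'STATS_BACKUP');

-- Gather statistics for a whole schema
EXECUTE dbms_stats.gather_schema_stats(ownname=>'TSUTTON');

-- Turn on monitoring so table changes are tracked
EXECUTE dbms_stats.alter_schema_tab_monitoring(ownname=>'TSUTTON', monitoring=>TRUE);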

5 DBMS_STATS Procedures
We will focus on two procedures: DBMS_STATS.GATHER_TABLE_STATS and DBMS_STATS.GATHER_INDEX_STATS, the starting points for gathering the statistics the optimizer needs to make informed decisions.

6 DBMS_STATS.GATHER_TABLE_STATS
DBMS_STATS.GATHER_TABLE_STATS (
   ownname          VARCHAR2,
   tabname          VARCHAR2,
   partname         VARCHAR2 DEFAULT NULL,
   estimate_percent NUMBER   DEFAULT NULL,
   block_sample     BOOLEAN  DEFAULT FALSE,
   method_opt       VARCHAR2 DEFAULT 'FOR ALL COLUMNS SIZE 1',
   degree           NUMBER   DEFAULT NULL,
   granularity      VARCHAR2 DEFAULT 'DEFAULT',
   cascade          BOOLEAN  DEFAULT FALSE,
   stattab          VARCHAR2 DEFAULT NULL,
   statid           VARCHAR2 DEFAULT NULL,
   statown          VARCHAR2 DEFAULT NULL,
   no_invalidate    BOOLEAN  DEFAULT FALSE);
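For orientation (an added example, not from the slides), a typical call with named parameters looks like this; the 10% sample is just an illustrative choice:

begin
  dbms_stats.gather_table_stats(
    ownname          => 'TSUTTON',
    tabname          => 'FILE_HISTORY',
    estimate_percent => 10,                        -- sample 10% of the rows (illustrative)
    method_opt       => 'FOR ALL COLUMNS SIZE 1',  -- the default: no histograms
    cascade          => true);                     -- gather index statistics as well
end;
/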

7 DBMS_STATS.GATHER_INDEX_STATS
DBMS_STATS.GATHER_INDEX_STATS (
   ownname          VARCHAR2,
   indname          VARCHAR2,
   partname         VARCHAR2 DEFAULT NULL,
   estimate_percent NUMBER   DEFAULT NULL,
   stattab          VARCHAR2 DEFAULT NULL,
   statid           VARCHAR2 DEFAULT NULL,
   statown          VARCHAR2 DEFAULT NULL,
   degree           NUMBER   DEFAULT NULL,
   granularity      VARCHAR2 DEFAULT 'DEFAULT',
   no_invalidate    BOOLEAN  DEFAULT FALSE);
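And an added example of a single-index call (the index name is taken from the test tables described later):

begin
  dbms_stats.gather_index_stats(
    ownname          => 'TSUTTON',
    indname          => 'FILEH_FNAME',   -- one of the FILE_HISTORY indexes listed later
    estimate_percent => null);           -- null means compute: read the whole index
end;
/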

8 We’re Going to Test: estimate_percent, block_sample, method_opt, cascade
These are the parameters that address statistics accuracy and gathering performance.

9 estimate_percent The percentage of rows to sample. Null means compute (examine every row).
Use DBMS_STATS.AUTO_SAMPLE_SIZE to have Oracle determine the best sample size for good statistics
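To make the choices concrete (an added illustration; the table name is taken from the test data described later), the three ways of specifying estimate_percent look like this:

-- null: compute, i.e. read every row
EXECUTE dbms_stats.gather_table_stats(ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>null);

-- a fixed percentage, e.g. 5%
EXECUTE dbms_stats.gather_table_stats(ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>5);

-- let Oracle choose the sample size
EXECUTE dbms_stats.gather_table_stats(ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>dbms_stats.auto_sample_size);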

10 block_sample Determines whether or not to use random block sampling instead of random row sampling. "Random block sampling is more efficient, but if the data is not randomly distributed on disk, then the sample values may be somewhat correlated."

11 method_opt Determines whether to collect histograms to help in dealing with skewed data. FOR ALL COLUMNS, FOR ALL INDEXED COLUMNS, or a COLUMN list determine which columns, and SIZE determines how many histogram buckets.

12 method_opt Use SKEWONLY for SIZE to have Oracle determine the columns on which to collect histograms based on their data distribution. Use AUTO for SIZE to have Oracle determine the columns on which to collect histograms based on both data distribution and workload.
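To show what these settings look like in a call (an added sketch, not from the slides; the column name PREF is simply borrowed from the test table described later):

begin
  dbms_stats.gather_table_stats(
    ownname    => 'TSUTTON',
    tabname    => 'FILE_HISTORY',
    -- Other method_opt settings to compare:
    --   'FOR ALL COLUMNS SIZE 1'            (the default: basic column stats, no histograms)
    --   'FOR ALL INDEXED COLUMNS SIZE 30'   (indexed columns only, up to 30 buckets)
    --   'FOR COLUMNS PREF SIZE 10'          (an explicit column list)
    --   'FOR ALL COLUMNS SIZE SKEWONLY'     (Oracle picks columns by data distribution)
    method_opt => 'FOR ALL COLUMNS SIZE AUTO',  -- Oracle picks columns by distribution and workload
    cascade    => true);
end;
/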

13 cascade Determines whether to gather statistics on indexes as well.

14 Our Approach Try different values for these parameters and look at the impact on accuracy of statistics and performance of the statistics gathering job itself.

15 Our Data (Table 1) FILE_HISTORY [1,951,673 rows 28507 blocks, 223MB ]
Column Name   Null?      Type            Distinct Values
FILE_ID       NOT NULL   NUMBER
FNAME         NOT NULL   VARCHAR2(240)
STATE_NO                 NUMBER
FILE_TYPE     NOT NULL   NUMBER
PREF                     VARCHAR2(100)
CREATE_DATE   NOT NULL   DATE
TRACK_ID      NOT NULL   NUMBER
SECTOR_ID     NOT NULL   NUMBER
TEAMS                    NUMBER
BYTE_SIZE                NUMBER
START_DATE               DATE
END_DATE                 DATE
LAST_UPDATE              DATE
CONTAINERS               NUMBER

16 Our Data (Table 1)– Indexes
Table          Unique?     Index Name                   Column Name(s)
FILE_HISTORY   NONUNIQUE   TSUTTON.FILEH_FNAME          FNAME
               NONUNIQUE   TSUTTON.FILEH_FTYPE_STATE    FILE_TYPE, STATE_NO
               NONUNIQUE   TSUTTON.FILEH_PREFIX_STATE   PREF, STATE_NO
               UNIQUE      TSUTTON.PK_FILE_HISTORY      FILE_ID

17 Our Data (Table 2) PROP_CAT [11,486,321 rows 117705 blocks, 920MB ]
Column Name   Null?      Type            Distinct Values
LINENUM       NOT NULL   NUMBER(38)
LOOKUPID                 VARCHAR2(64)
EXTID                    VARCHAR2(20)
SOLD          NOT NULL   NUMBER(38)
CATEGORY                 VARCHAR2(6)
NOTES                    VARCHAR2(255)
DETAILS                  VARCHAR2(255)
PROPSTYLE                VARCHAR2(20)

18 Our Data (Table 2)– Indexes
Table      Unique?     Index Name                  Column Name(s)
PROP_CAT   NONUNIQUE   TSUTTON.PK_PROP_CAT         EXTID, SOLD
           NONUNIQUE   TSUTTON.PROPC_LOOKUPID      LOOKUPID
           NONUNIQUE   TSUTTON.PROPC_PROPSTYLE     PROPSTYLE

19 To Find What Values the Statistics Gathering Obtained:
index.sql:

select ind.table_name, ind.uniqueness, col.index_name, col.column_name,
       ind.distinct_keys, ind.sample_size
  from dba_ind_columns col, dba_indexes ind
 where ind.table_owner = 'TSUTTON'
   and ind.table_name in ('FILE_HISTORY','PROP_CAT')
   and col.index_owner = ind.owner
   and col.index_name = ind.index_name
   and col.table_owner = ind.table_owner
   and col.table_name = ind.table_name
 order by col.table_name, col.column_position;

20 tabcol.sql:

select table_name, column_name, data_type, num_distinct,
       sample_size, to_char(last_analyzed, 'HH24:MI:SS') last_analyzed,
       num_buckets buckets
  from dba_tab_columns
 where table_name in ('FILE_HISTORY','PROP_CAT')
 order by table_name, column_id;
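A companion query (my addition, not part of the original scripts) against DBA_TAB_HISTOGRAMS shows the individual histogram endpoints that a gathering run created:

select table_name, column_name, endpoint_number, endpoint_value
  from dba_tab_histograms
 where owner = 'TSUTTON'
   and table_name in ('FILE_HISTORY','PROP_CAT')
 order by table_name, column_name, endpoint_number;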

21 The Old Days—ANALYZE A “Quick and Dirty” attempt:
SQL> analyze table file_history estimate statistics;
Table analyzed.
Elapsed: 00:00:08.26

SQL> analyze table prop_cat estimate statistics;
Elapsed: 00:00:14.76

22 The Old Days—ANALYZE But someone complains that their query is taking too long:

SQL> SELECT FILE_ID, FNAME, TRACK_ID, SECTOR_ID FROM file_history WHERE FNAME = 'SOMETHING';

no rows selected
Elapsed: 00:00:08.66

23 The Old Days—ANALYZE So we try it with autotrace on:

Execution Plan
  SELECT STATEMENT Optimizer=CHOOSE (Cost=2743 Card=10944 Bytes=678528)
    TABLE ACCESS (FULL) OF 'FILE_HISTORY' (Cost=2743 Card=10944 Bytes=678528)

Statistics
    0  recursive calls
    0  db block gets
       consistent gets
       physical reads
    0  redo size
  465  bytes sent via SQL*Net to client
  460  bytes received via SQL*Net from client
    1  SQL*Net roundtrips to/from client
    0  sorts (memory)
    0  sorts (disk)
    0  rows processed

24 The Old Days—ANALYZE “It’s not using our index!”
Let’s look at the index statistics:

Table          Uniqueness   Index Name           Column Name   Distinct Keys   Sample Size
FILE_HISTORY   NONUNIQUE    FILEH_FNAME          FNAME         1,937,490
               NONUNIQUE    FILEH_FTYPE_STATE    FILE_TYPE
                                                 STATE_NO
               NONUNIQUE    FILEH_PREFIX_STATE   PREF
                                                 STATE_NO
               UNIQUE       PK_FILE_HISTORY      FILE_ID

DBA_INDEXES tells us we have 1,937,490 distinct keys for the index on FNAME, pretty close to the actual value of 1,951,673.

25 The Old Days—ANALYZE But when we look at DBA_TAB_COLUMNS
TABLE_NAME     COLUMN_NAME   DATA_TYPE   NUM_DISTINCT   SAMPLE_SIZE
FILE_HISTORY   FILE_ID       NUMBER
               FNAME         VARCHAR2    178
               STATE_NO      NUMBER
               FILE_TYPE     NUMBER
               PREF          VARCHAR2    478

Only 178 distinct values for FNAME. The optimizer concludes that a full table scan will be more efficient. We’re also told there are only 478 distinct values for PREF, when we know there are 65,345.

26 The Old Days—ANALYZE Let’s try a larger sample.
The first test only sampled about 0.05% of the rows. Let’s try 5% of the rows.

SQL> analyze table file_history estimate statistics sample 5 percent;
Table analyzed.
Elapsed: 00:00:36.21

SQL> analyze table prop_cat estimate statistics sample 5 percent;
Elapsed: 00:02:35.11

27 The Old Days—ANALYZE We try our query:
SQL> SELECT FILE_ID, FNAME, TRACK_ID, SECTOR_ID FROM file_history WHERE FNAME = 'SOMETHING';

no rows selected
Elapsed: 00:00:00.54

Execution Plan
  SELECT STATEMENT Optimizer=CHOOSE (Cost=54 Card=110 Bytes=6820)
    TABLE ACCESS (BY INDEX ROWID) OF 'FILE_HISTORY' (Cost=54 Card=110 Bytes=6820)
      INDEX (RANGE SCAN) OF 'FILEH_FNAME' (NON-UNIQUE) (Cost=3 Card=110)

28 The Old Days—ANALYZE Our query, continued:
Statistics
    0  recursive calls
    0  db block gets
    3  consistent gets
    0  physical reads
    0  redo size
  465  bytes sent via SQL*Net to client
  460  bytes received via SQL*Net from client
    1  SQL*Net roundtrips to/from client
    0  sorts (memory)
    0  sorts (disk)
    0  rows processed

Much better! Let’s look at the stats.

29 The Old Days—ANALYZE

Table          Uniqueness   Index Name           Column Name   Distinct Keys   Sample Size
FILE_HISTORY   NONUNIQUE    FILEH_FNAME          FNAME
               NONUNIQUE    FILEH_FTYPE_STATE    FILE_TYPE
                                                 STATE_NO
               NONUNIQUE    FILEH_PREFIX_STATE   PREF
                                                 STATE_NO
               UNIQUE       PK_FILE_HISTORY      FILE_ID

TABLE_NAME     COLUMN_NAME   DATA_TYPE   NUM_DISTINCT   SAMPLE_SIZE
FILE_HISTORY   FILE_ID       NUMBER
               FNAME         VARCHAR2    11,133
               STATE_NO      NUMBER
               FILE_TYPE     NUMBER
               PREF          VARCHAR2    23,744

30 The Old Days—ANALYZE 11,133 distinct values for FNAME, much better than 178. 23,744 distinct values for PREF, much better than 478. And our query is efficient! But can we do better?

31 The Old Days—ANALYZE How about a full “compute statistics”?
SQL> analyze table file_history compute statistics;
Table analyzed.
Elapsed: 00:07:38.32

SQL> analyze table prop_cat compute statistics;
Elapsed: 00:29:15.29

32 The Old Days—ANALYZE And the statistics:
TABLE_NAME     COLUMN_NAME   DATA_TYPE   NUM_DISTINCT   SAMPLE_SIZE
FILE_HISTORY   FILE_ID       NUMBER
               FNAME         VARCHAR2    78692
               STATE_NO      NUMBER
               FILE_TYPE     NUMBER
               PREF          VARCHAR2

78692 distinct values for FNAME! After examining every row in the table!

33 Is DBMS_STATS Better? Let’s start with a quick run: a 1% estimate and cascade=true, so the indexes are analyzed also.

SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>1, cascade=>true)
PL/SQL procedure successfully completed.
Elapsed: 00:01:41.70

SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON', tabname=>'PROP_CAT', estimate_percent=>1, cascade=>true)
Elapsed: 00:01:44.29

34 And the Statistics:

TABLE_NAME     COLUMN_NAME   DATA_TYPE   NUM_DISTINCT   SAMPLE_SIZE
FILE_HISTORY   FILE_ID       NUMBER
               FNAME         VARCHAR2
               STATE_NO      NUMBER
               FILE_TYPE     NUMBER
               PREF          VARCHAR2    6522

35 1% estimate, cascade=true
Gathering statistics took 3:26. “Analyze estimate statistics” took 23 seconds. “Analyze estimate 5%” took 3:11. Stats are very accurate for FNAME. Stats are off for PREF (6522 vs. 65,345 actual)

36 5% estimate, cascade=true
Let’s try 5%:

SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>5, cascade=>true)
PL/SQL procedure successfully completed.
Elapsed: 00:01:23.52

SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON', tabname=>'PROP_CAT', estimate_percent=>5, cascade=>true)
Elapsed: 00:03:08.75

37 5% estimate, cascade=true
TABLE_NAME     COLUMN_NAME   DATA_TYPE   NUM_DISTINCT   SAMPLE_SIZE
FILE_HISTORY   FILE_ID       NUMBER
               FNAME         VARCHAR2
               PREF          VARCHAR2    24,222

38 5% estimate, cascade=true
A 5% estimate took 4:32. PREF now shows 24,222 distinct values (actual is 65,345). A 5% estimate using analyze took 3:11, but only showed 11,133 distinct values for FNAME

39 Full Compute

SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>null, cascade=>true)
PL/SQL procedure successfully completed.
Elapsed: 00:09:35.13

SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON', tabname=>'PROP_CAT', estimate_percent=>null, cascade=>true)
Elapsed: 00:29:09.46

40 Full Compute

TABLE_NAME     COLUMN_NAME   DATA_TYPE   NUM_DISTINCT   SAMPLE_SIZE   LAST_ANALYZED
FILE_HISTORY   FILE_ID       NUMBER
               FNAME         VARCHAR2
               PREF          VARCHAR2
PROP_CAT       LOOKUPID      VARCHAR2
               EXTID         VARCHAR2
               PROPSTYLE     VARCHAR2

41 Full Compute Total time taken for a “DBMS_STATS compute” was slightly longer than for an “ANALYZE compute” (38:45 vs. 36:54). The stats are right on! There’s another interesting behavior…

42 Full Compute – Indexes

Table          Uniqueness   Index Name           Column Name   Distinct Keys   Sample Size
FILE_HISTORY   NONUNIQUE    FILEH_FNAME          FNAME
               NONUNIQUE    FILEH_FTYPE_STATE    FILE_TYPE
                                                 STATE_NO
               NONUNIQUE    FILEH_PREFIX_STATE   PREF
                                                 STATE_NO
               UNIQUE       PK_FILE_HISTORY      FILE_ID
PROP_CAT       NONUNIQUE    PK_PROP_CAT          EXTID
                                                 SOLD
               NONUNIQUE    PROPC_LOOKUPID       LOOKUPID
               NONUNIQUE    PROPC_PROPSTYLE      PROPSTYLE

43 Full Compute – Indexes We’re doing a full compute, cascade.
Sample size for the indexes ranges from 3% to 100%!

44 block_sample = true If there are more than 20 rows per block, sampling 5% of the rows could still visit every block. So, is block_sample=true faster?
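The slide doesn’t show the call itself; the test amounts to something like this sketch (my reconstruction, not the author’s exact command):

begin
  dbms_stats.gather_table_stats(
    ownname          => 'TSUTTON',
    tabname          => 'FILE_HISTORY',
    estimate_percent => 5,
    block_sample     => true,    -- sample random blocks instead of random rows
    cascade          => true);
end;
/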

45 block_sample = true, estimate = 5%, cascade = true
Results shown in summary table. Took 4:11 (block_sample=false took 4:32). Doesn’t seem “more efficient”. Less accurate.

46 cascade = false People have suggested using a small estimate for the table and a compute for the indexes. Let’s test a 1% estimate for the tables with a compute for the indexes.

47 Estimate = 1%, cascade = false
estimate = 1% for the table, compute for the indexes. Took 2:59 (vs. 3:26 for cascade = true). As accurate as cascade = true. Be careful: if your indexes change, you’ll need to change your DBMS_STATS jobs.
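The exact commands aren’t shown on the slide; the approach amounts to something like the following sketch (index names taken from the earlier listing):

begin
  -- 1% row sample for the table only
  dbms_stats.gather_table_stats(
    ownname => 'TSUTTON', tabname => 'FILE_HISTORY',
    estimate_percent => 1, cascade => false);

  -- estimate_percent defaults to null (compute) for each index, named explicitly
  dbms_stats.gather_index_stats(ownname => 'TSUTTON', indname => 'FILEH_FNAME');
  dbms_stats.gather_index_stats(ownname => 'TSUTTON', indname => 'FILEH_FTYPE_STATE');
  dbms_stats.gather_index_stats(ownname => 'TSUTTON', indname => 'FILEH_PREFIX_STATE');
  dbms_stats.gather_index_stats(ownname => 'TSUTTON', indname => 'PK_FILE_HISTORY');
end;
/

This is also where the caution above bites: any index added or dropped later has to be added to or removed from this list by hand.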

48 cascade = false
BUT the sample sizes are still not 100%:

Table          Uniqueness   Index Name           Column Name   Distinct Keys   Sample Size
FILE_HISTORY   NONUNIQUE    FILEH_FNAME          FNAME
               NONUNIQUE    FILEH_FTYPE_STATE    FILE_TYPE
                                                 STATE_NO
               NONUNIQUE    FILEH_PREFIX_STATE   PREF
                                                 STATE_NO
               UNIQUE       PK_FILE_HISTORY      FILE_ID
PROP_CAT       NONUNIQUE    PK_PROP_CAT          EXTID
                                                 SOLD
               NONUNIQUE    PROPC_LOOKUPID       LOOKUPID
               NONUNIQUE    PROPC_PROPSTYLE      PROPSTYLE

49 Summary of Tests So Far

estimate_percent            block_sample   Elapsed Time   # Distinct FNAME   # Distinct PREF (65345)   # Distinct PROPSTYLE (48936)
1%, cascade                 False          3:26                              6522                      16984
1% table, 20% indexes                      2:44                              6835                      16932
1% table, compute indexes                  2:59                              6797                      16917
5%, cascade                                4:32                              24222                     20095
5%, cascade                 True           4:11                              15927                     10092
5% table, 20% indexes                      4:00                              24188                     20084
10%, cascade                               6:04                              36172                     23547
20%, cascade                               9:21                              48271                     29108
                                           9:13                              40653                     21897
50%, cascade                               20:24                             60804                     39708
null (compute), cascade                    38:45                             65345                     48936

50 Other Options “What about all that ‘Auto’ stuff?”
Two commonly referenced “auto” options: AUTO_SAMPLE_SIZE for estimate_percent, and method_opt=>'FOR ALL COLUMNS SIZE AUTO'.

51 DBMS_STATS.AUTO_SAMPLE_SIZE
This option tells Oracle to choose the proper sample size for estimate_percent. Let’s test it.

52 DBMS_STATS.AUTO_SAMPLE_SIZE
begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON',
    tabname=>'FILE_HISTORY',
    estimate_percent=>dbms_stats.auto_sample_size,
    cascade=>true);
end;
/
Elapsed: 00:08:19.19

(Same call with tabname=>'PROP_CAT')
Elapsed: 00:22:55.99

53 DBMS_STATS.AUTO_SAMPLE_SIZE
TABLE_NAME     COLUMN_NAME   DATA_TYPE   NUM_DISTINCT   SAMPLE_SIZE
FILE_HISTORY   FILE_ID       NUMBER
               FNAME         VARCHAR2
               STATE_NO      NUMBER
               FILE_TYPE     NUMBER
               PREF          VARCHAR2

Took 31:15. That is 7½ minutes less than “compute, cascade”, but 10 minutes longer than “50%, cascade”, and not much more accurate. Looks like it sampled nearly every row in the table anyway.

54 method_opt We tested different histogram options.
First, FOR ALL INDEXED COLUMNS, which seems to be the most commonly used choice. Note that: You’re unlikely to need histograms on all your indexed columns. You may well want histograms on some non-indexed columns. But it’s a good start for a test.

55 method_opt- 'for all indexed columns'
begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON',
    tabname=>'FILE_HISTORY',
    estimate_percent=>10,
    method_opt=>'for all indexed columns size 30',
    cascade=>true);
end;
/
Elapsed: 00:01:35.63

(Same call with ownname=>'TSUTTON', tabname=>'PROP_CAT')
Elapsed: 00:06:01.14

56 method_opt- 'for all indexed columns'
TABLE_NAME     COLUMN_NAME   DATA_TYPE   NUM_DISTINCT   SAMPLE_SIZE   BUCKETS
FILE_HISTORY   FILE_ID       NUMBER
               FNAME         VARCHAR2
               STATE_NO      NUMBER
               FILE_TYPE     NUMBER
               PREF          VARCHAR2
               CREATE_DATE   DATE
               TRACK_ID      NUMBER
               SECTOR_ID     NUMBER
               TEAMS         NUMBER
               BYTE_SIZE     NUMBER
               START_DATE    DATE
               END_DATE      DATE
               LAST_UPDATE   DATE
               CONTAINERS    NUMBER

57 method_opt- 'for all indexed columns'
Took 7:37 (compared to 6:04 for same sample size without histograms) Statistics aren’t gathered for non-indexed columns.

58 method_opt- 'for all columns size skewonly'
Rather than deciding the maximum number of buckets ourselves, let’s let Oracle decide.

begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON',
    tabname=>'FILE_HISTORY',
    estimate_percent=>10,
    method_opt=>'for all columns size skewonly',
    cascade=>true);
end;
/
Elapsed: 00:02:27.02

(Same call with ownname=>'TSUTTON', tabname=>'PROP_CAT')
Elapsed: 00:12:57.15

59 method_opt- 'for all columns size skewonly'
TABLE_NAME     COLUMN_NAME   DATA_TYPE   NUM_DISTINCT   SAMPLE_SIZE   BUCKETS
FILE_HISTORY   FILE_ID       NUMBER
               FNAME         VARCHAR2
               STATE_NO      NUMBER
               FILE_TYPE     NUMBER
               PREF          VARCHAR2
               CREATE_DATE   DATE
               TRACK_ID      NUMBER
               SECTOR_ID     NUMBER
               TEAMS         NUMBER
               BYTE_SIZE     NUMBER
               START_DATE    DATE
               END_DATE      DATE
               LAST_UPDATE   DATE
               CONTAINERS    NUMBER

60 method_opt- 'for all columns size skewonly'
We get statistics on all columns. It took 15:24 (twice as long as 'for all indexed columns'). And 200 buckets for the primary key???

61 Bug # Metalink Note : “DBMS_STATS WITH SKEWONLY GENERATES HISTOGRAMS FOR UNIQUE KEY COLUMN.” The subject of the note says it all. So SKEWONLY isn’t very useful for the affected versions.

62 method_opt- 'for all columns size auto'

Let’s see if the AUTO option for SIZE does any better.

begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON',
    tabname=>'FILE_HISTORY',
    estimate_percent=>10,
    method_opt=>'for all columns size auto',
    cascade=>true);
end;
/
Elapsed: 00:01:40.49

(Same call with ownname=>'TSUTTON', tabname=>'PROP_CAT')
Elapsed: 00:03:53.78

63 method_opt- 'for all columns size auto'
TABLE_NAME     COLUMN_NAME   DATA_TYPE   NUM_DISTINCT   SAMPLE_SIZE   BUCKETS
FILE_HISTORY   FILE_ID       NUMBER
               FNAME         VARCHAR2    18,905                       38
               STATE_NO      NUMBER
               FILE_TYPE     NUMBER
               PREF          VARCHAR2

64 method_opt- 'for all columns size auto'
Took only 5:34 to gather statistics (the best yet for a 10% estimate), but the statistics leave something to be desired: 18,905 distinct values for FNAME, and 38 buckets for FNAME (there is NO skew in this column).

65 Collecting Histograms with DBMS_STATS
Not as efficient as advertised in Oracle 9i.

66 Is 10g Any Better? Performance and accuracy of statistics gathering are reasonable in 9i for “basic” choices, but there are serious deficiencies with AUTO_SAMPLE_SIZE and histogram collection. Are these issues resolved in 10g Release 1?

67 10g- 'for all columns size skewonly'
First we’ll try the SKEWONLY option for SIZE in histogram collection.

begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON',
    tabname=>'FILE_HISTORY',
    estimate_percent=>10,
    method_opt=>'for all columns size skewonly',
    cascade=>true);
end;
/
Elapsed: 00:02:45.05

(Same call with ownname=>'TSUTTON', tabname=>'PROP_CAT')
Elapsed: 00:12:28.17

68 10g- 'for all columns size skewonly'
TABLE_NAME     COLUMN_NAME   DATA_TYPE   NUM_DISTINCT   SAMPLE_SIZE   BUCKETS
FILE_HISTORY   FILE_ID       NUMBER
               FNAME         VARCHAR2    26,167
               STATE_NO      NUMBER
               FILE_TYPE     NUMBER
               PREF          VARCHAR2

69 10g- 'for all columns size skewonly'
Collection took 15:13 (about the same as in 9i). Results are just as bad. Still collecting histograms on primary key and FNAME. 26,167 distinct values of FNAME.

70 10g- 'for all columns size auto'
Let’s try the auto option for size (and start our demo).

begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON',
    tabname=>'FILE_HISTORY',
    estimate_percent=>10,
    method_opt=>'for all columns size auto',
    cascade=>true);
end;
/
Elapsed: 00:03:22.81

(Same call with ownname=>'TSUTTON', tabname=>'PROP_CAT')
Elapsed: 00:08:15.00

71 10g- 'for all columns size auto'
TABLE_NAME     COLUMN_NAME   DATA_TYPE   NUM_DISTINCT   SAMPLE_SIZE   BUCKETS
FILE_HISTORY   FILE_ID       NUMBER
               FNAME         VARCHAR2
               STATE_NO      NUMBER
               FILE_TYPE     NUMBER
               PREF          VARCHAR2

72 10g- 'for all columns size auto'
Statistics gathering took 11:38 (twice as long as in 9i). Statistics are reasonable. No histograms on unique columns. So the AUTO option for SIZE works in Oracle 10g… sort of.

73 10g- 'for all columns size auto'
SQL> select STATE_NO, COUNT(*) from FILE_HISTORY group by STATE_NO;

  STATE_NO    COUNT(*)
         0          95
        20         569
        30
        40          39
       999           4
      9999           9

6 rows selected.

Seems like a candidate for a histogram.

74 10g- 'for all columns size auto'
SQL> select file_type, count(*) from file_history group by file_type;

 FILE_TYPE    COUNT(*)

7 rows selected.

Also a candidate for a histogram?

75 10g- 'for all columns size auto'
Let’s try some queries:

select count(*) from file_history where file_type = 5;
select count(*) from file_history where file_type = 4;
select count(*) from file_history where fname = 'SOMETHING';

76 10g- 'for all columns size auto'
And gather stats again:

begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON',
    tabname=>'FILE_HISTORY',
    estimate_percent=>10,
    method_opt=>'for all columns size auto',
    cascade=>true);
end;
/

77 10g- 'for all columns size auto'
TABLE_NAME     COLUMN_NAME   DATA_TYPE   NUM_DISTINCT   SAMPLE_SIZE   BUCKETS
FILE_HISTORY   FILE_ID       NUMBER
               FNAME         VARCHAR2
               STATE_NO      NUMBER
               FILE_TYPE     NUMBER
               PREF          VARCHAR2

dbms_stats has taken the workload into account and created histograms on the columns we queried (though it’s again creating a histogram on FNAME).
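One way (my addition, not shown on the slide) to confirm which columns ended up with histograms in 10g is the HISTOGRAM column of DBA_TAB_COL_STATISTICS:

select column_name, num_distinct, num_buckets, histogram
  from dba_tab_col_statistics
 where owner = 'TSUTTON'
   and table_name = 'FILE_HISTORY'
 order by column_name;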

78 10g- AUTO_SAMPLE_SIZE Let’s see how 10g does with DBMS_STATS.AUTO_SAMPLE_SIZE for estimate_percent.

begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON',
    tabname=>'FILE_HISTORY',
    estimate_percent=>dbms_stats.auto_sample_size,
    cascade=>true);
end;
/
Elapsed: 00:02:15.95

(Same call with ownname=>'TSUTTON', tabname=>'PROP_CAT')
Elapsed: 00:04:01.00

79 10g- AUTO_SAMPLE_SIZE

TABLE_NAME     COLUMN_NAME   DATA_TYPE   NUM_DISTINCT   SAMPLE_SIZE
FILE_HISTORY   FILE_ID       NUMBER
               FNAME         VARCHAR2
               STATE_NO      NUMBER
               FILE_TYPE     NUMBER
               PREF          VARCHAR2    2200

80 10g- AUTO_SAMPLE_SIZE Statistics gathering took 6:17 (an enormous improvement over the 31:15 in 9i), but accuracy may be lacking: PREF shows 2200 distinct values (actual is 65,345). The results aren’t as accurate as a 5% sample in 9i (which took 1/3 less time).

81 Summary From our testing, it appears that:
The sweet spot for balancing the performance of the gathering process against the accuracy of the statistics lies between 5% and 20%. Gathering index statistics separately is faster than using the cascade=true option while gathering table statistics. And block_sample=true doesn't appreciably speed up statistics gathering, while delivering somewhat less accurate statistics.
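As one possible starting point suggested by these results (my sketch, not a recipe from the presentation), a regular gathering job in that range with separate index passes might look like this:

begin
  dbms_stats.gather_table_stats(
    ownname          => 'TSUTTON',
    tabname          => 'FILE_HISTORY',
    estimate_percent => 10,                        -- inside the 5-20% sweet spot
    block_sample     => false,                     -- row sampling was more accurate for about the same cost
    method_opt       => 'FOR ALL COLUMNS SIZE 1',  -- add histograms only where you know you need them
    cascade          => false);                    -- indexes gathered separately below

  dbms_stats.gather_index_stats(ownname => 'TSUTTON', indname => 'FILEH_FNAME');
  -- ...one call per index; keep this list in step with the actual indexes
end;
/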

82 Summary Using AUTO_SAMPLE_SIZE takes nearly as long as COMPUTE in Oracle 9i. It is much faster in 10gR1, but the accuracy of the statistics is not as good as with an estimate_percent of 5, which gathers the statistics faster.

83 Summary Using the SKEWONLY option to size histograms is inadvisable in both Oracle 9i and 10g Release 1. Using the AUTO option to choose and size histograms is inadvisable in Oracle 9i, but seems to work better in 10gR1 (though it still has issues).

84 In Conclusion These results should help you decide what options you want to use in gathering statistics on your databases. Hopefully you’ll test different options on your data. It is clear that DBMS_STATS provides greater power, accuracy, and flexibility than ANALYZE, and it is improving with each version.

85 The White Paper A companion white paper to this presentation is available for free download from our company’s website at:

86 Resources from Database Specialists
The Specialist newsletter
Database Rx® (dbrx.dbspecialists.com/guest): provides secure, automated monitoring, alert notification, and analysis of your Oracle databases

87 Contact Information
Terry Sutton
Database Specialists, Inc.
388 Market Street, Suite 400
San Francisco, CA 94111
Tel: 415/
Web:

