1, 2, 3… Scatter! Getting Your Humpty-Dumpty Database in Order. Tom Bascom, White Star Software


A Few Words about the Speaker
Tom Bascom; Progress user & roaming DBA since 1987
VP, White Star Software, LLC
– Expert consulting services related to all aspects of Progress and OpenEdge.
– tom@wss.com
President, DBAppraise, LLC
– Remote database management service for OpenEdge.
– Simplifying the job of managing and monitoring the world’s best business applications.
– tom@dbappraise.com

“Fragmentation” vs “Scatter”

Fragmentation
“Fragmentation” is splitting records into multiple pieces.
    $ proutil dbname -C dbanalys > dbname.dba
    …
    RECORD BLOCK SUMMARY FOR AREA "APP_FLAGS_Dat" : 95
    -------------------------------------------------------
                                   Record Size (B)   -Fragments-    Scatter
    Table            Records   Size  Min  Max Mean    Count Factor  Factor
    PUB.APP_FLAGS    1676180  47.9M   28   58   29  1676190    1.0     1.9
    …
Beware! 10 additional fragments: the Fragments Count (1676190) exceeds Records (1676180) by 10.
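
The “10 additional fragments” figure is the Fragments Count minus the Records column. A minimal Python sketch of that subtraction, assuming a table row laid out exactly as in the dbanalys excerpt on this slide (the helper name `extra_fragments` is made up for illustration and is not part of any Progress tool):

```python
def extra_fragments(row: str) -> int:
    """Return fragments minus records for one dbanalys table row.

    Assumes the column order shown on the slide:
    Table, Records, Size, Min, Max, Mean, Fragments Count,
    Fragments Factor, Scatter Factor.
    """
    fields = row.split()
    records = int(fields[1])
    fragments = int(fields[6])
    return fragments - records


row = "PUB.APP_FLAGS 1676180 47.9M 28 58 29 1676190 1.0 1.9"
print(extra_fragments(row))  # 10
```

A result of 0 would mean every record is stored in a single piece.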

Fragmentation Occurs When
– Record data is too big for the block, e.g. 16k of data going into a 4k block.
– Updated data needs more room to expand than is available.
The “create limit” and the “toss limit” can be used to reserve more free space in blocks and control fragmentation.
Progress will automatically “de-frag” when possible (10.1+).
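
For the “16k of data going into a 4k block” case, a rough lower bound on the number of pieces can be worked out as below. This is only a sketch: it assumes the whole block is usable for record data, which a real database block is not (headers and other overhead take space), so the true count can only be higher. The function name is hypothetical.

```python
import math


def min_fragments(record_bytes: int, block_bytes: int) -> int:
    """Lower bound on the pieces a record needs; ignores block overhead."""
    return max(1, math.ceil(record_bytes / block_bytes))


print(min_fragments(16 * 1024, 4 * 1024))  # 4
print(min_fragments(29, 4 * 1024))         # 1
```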

Scatter
“Scatter” is a measure of the “sequentialness” of records.
    $ proutil dbname -C dbanalys > dbname.dba
    …
    RECORD BLOCK SUMMARY FOR AREA "APP_FLAGS_Dat" : 95
    -------------------------------------------------------
                                   Record Size (B)   -Fragments-    Scatter
    Table            Records   Size  Min  Max Mean    Count Factor  Factor
    PUB.APP_FLAGS    1676180  47.9M   28   58   29  1676190    1.0     1.9
    …

What is “Scatter Factor”?
– The “factor” does not care about any ordering, not even the primary key.
– For type 1 areas it measures how well the data fits into the minimum number of blocks required to hold it, with “distance” between blocks taken into account.
– For type 2 areas there is no distance penalty, but free space in a cluster can increase scatter.
– “Logically adjacent” isn’t really reported by dbanalys.

Fragmentation and Scatter
– “Fragmentation” is splitting records into multiple pieces.
– “Scatter” is a measure of the “sequentialness” of records.
– The “scatter factor” that proutil reports might be better described as “density”.
– If you have two or more indexes, at least one of them is probably scattered.

Why is This Important?

Locality of Reference
– When data is referenced there is a high probability that it will be referenced again soon.
– If data is referenced there is a high probability that “nearby” data will be referenced soon.
– Locality of reference is why caching exists at all levels of computing.
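
The effect of locality on a cache can be seen with a toy simulation. The sketch below replays block accesses through a generic LRU cache written in Python; it is not a model of how Progress manages its buffer pool, and the access patterns are invented for illustration.

```python
from collections import OrderedDict


def lru_hit_ratio(accesses, capacity):
    """Replay block accesses through an LRU cache and return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(accesses)


# 16 reads, 4 records per block: physical order hits each block 4 times in a row.
clustered = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
# The same 16 reads, but every consecutive read lands in a different block.
scattered = [0, 1, 2, 3] * 4

print(lru_hit_ratio(clustered, capacity=2))  # 0.75
print(lru_hit_ratio(scattered, capacity=2))  # 0.0
```

Both patterns touch the same four blocks the same number of times; only the order differs.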

How Does Progress Help?
Temporal Locality (will be reused “soon”)
– -B (LRU Chain)
– -mmax
– -Bt
Spatial Locality (“nearby” data will be accessed)
– Type 2 storage areas
– -B2 (no LRU)
– Memory mapped prolib

Cache Effectiveness
    Layer                 Time  # of Recs  # of Ops  Cost per Op  Relative
    Progress 4GL to -B    0.96    100,000   203,473     0.000005         1
    -B to FS Cache       10.24    100,000    26,711     0.000383        75
    FS Cache to SAN       5.93    100,000    26,711     0.000222        45
    -B to SAN Cache      11.17    100,000    26,711     0.000605       120
    SAN Cache to Disk   200.35    100,000    26,711     0.007500      1500
    -B to Disk          211.52    100,000    26,711     0.007919      1585
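
One way to read the table is to blend the cheapest and the most expensive per-op costs by buffer hit ratio. The sketch below does that in Python. The blending formula is an assumption made for illustration, not something stated on the slide; the two constants are copied from the “Cost per Op” column.

```python
COST_BUFFER_HIT = 0.000005  # "Progress 4GL to -B" cost per op, from the table
COST_DISK_READ = 0.007919   # "-B to Disk" cost per op, from the table


def average_cost(hit_ratio: float) -> float:
    """Blend the two per-op costs by hit ratio (a simplifying assumption)."""
    return hit_ratio * COST_BUFFER_HIT + (1.0 - hit_ratio) * COST_DISK_READ


for ratio in (0.95, 0.98, 0.99):
    print(f"{ratio:.0%} hit ratio -> {average_cost(ratio):.6f} per op")
```

Under this assumption the average cost is dominated by the misses, so moving from 95% to 99% cuts it by several times even though the hit ratio changes by only four points.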

Logical Scatter

Definition
– Logical Scatter is the probability that records in a given logical order are also in “physical” order (in the same block).
– Each index has its own ordering and thus its own logical scatter.
– It is very unlikely that more than one index will be well ordered.
– It is quite possible that all indexes might be scattered.

Type 1 Storage Area (records from several tables share the same blocks)
Block 1
    1 Lift Tours Burlington
    3669/239/28 Standard Mail
    11544.86 Shipped
    125523.85 Shipped
Block 2
    13538.77 Shipped
    21192.75 Shipped
    22496.78 Shipped
    231310.99 Shipped
Block 3
    14 Cologne Germany Germany
    2 Upton Frisbee Oslo
    1 Koberlein Kelly
    1531/261/31 FlyByNight
Block 4
    BBB Brawn, Bubba B. 1,600
    DKP Pitt, Dirk K. 1,800
    4 Go Fishing Ltd Harrow
    16 Thundering Surf Inc. Coffee City

Type 2 Storage Area
Block 1
     1  Lift Tours               Burlington
     2  Upton Frisbee            Oslo
     3  Hoops                    Atlanta
     4  Go Fishing Ltd           Harrow
Block 2
     5  Match Point Tennis       Boston
     6  Fanatical Athletes       Montgomery
     7  Aerobics                 Tikkurila
     8  Game Set Match           Deatsville
Block 3
     9  Pihtiputaan Pyora        Pihtipudas
    10  Just Joggers Limited     Ramsbottom
    11  Keilailu ja Biljardi     Helsinki
    12  Surf Lautaveikkoset      Salo
Block 4
    13  Biljardi ja tennis       Mantsala
    14  Paris St Germain         Paris
    15  Hoopla Basketball        Egg Harbor
    16  Thundering Surf Inc.     Coffee City

Tangent…
The preceding slides should be all you need to see to be convinced that type 1 areas are a bad place to put data.
The schema area is always a type 1 area. Should it have data in it?

How to Determine “Logical Scatter”?
You could read the whole database… multiple times… (because every index must be considered)
-or-
– For each table randomly choose a record.
– For each index of that table find the NEXT record.
– Is it in the same block?
– Lather, Rinse and Repeat.
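
The sampling idea above can be sketched in Python as follows. Everything here is illustrative: the record-to-block mapping and both index orders are made up, the function names are hypothetical, and a strict same-block test is used. With that strict test even a perfectly ordered table scores below 100%, because the last record in each block has its NEXT record in the following block, so the numbers are not meant to reproduce any percentages quoted in the deck.

```python
import random


def same_block_fraction(index_order, block_of):
    """Exact share of adjacent pairs (in index order) that share a block."""
    pairs = list(zip(index_order, index_order[1:]))
    same = sum(1 for a, b in pairs if block_of[a] == block_of[b])
    return same / len(pairs)


def sampled_fraction(index_order, block_of, samples, seed=0):
    """Estimate the same share by picking random records and checking NEXT."""
    rng = random.Random(seed)
    same = 0
    for _ in range(samples):
        pos = rng.randrange(len(index_order) - 1)  # last record has no NEXT
        a, b = index_order[pos], index_order[pos + 1]
        same += block_of[a] == block_of[b]
    return same / samples


# Hypothetical layout: 12 records stored 4 per block in id order.
block_of = {rec: (rec - 1) // 4 for rec in range(1, 13)}
by_id = list(range(1, 13))
by_other = [1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12]  # made-up second index

print(same_block_fraction(by_id, block_of))     # 9 of 11 pairs share a block
print(same_block_fraction(by_other, block_of))  # 0.0
print(sampled_fraction(by_id, block_of, samples=1000))
```

The sampled version trades accuracy for not having to walk every index of every table.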

Type 2 Storage Area
For the same 16-customer, four-block layout, sequentialness by index: Id = 100%, Name = 25%, City = 19%.

Which Index?

4GL Index Selection
    compile cust.p xref "cust.xrf".

    /* cust.p */
    for each customer no-lock:
      display custNum.
    end.

    cust.p 1 COMPILE cust.p
    cust.p 1 CPINTERNAL ISO8859-1
    cust.p 1 CPSTREAM ISO8859-1
    cust.p 1 STRING "Customer" 8 NONE UNTRANSLATABLE
    cust.p 1 SEARCH sports2000.Customer CustNum WHOLE-INDEX
    cust.p 2 ACCESS sports2000.Customer CustNum
    cust.p 2 STRING ">>>>9" 5 NONE TRANSLATABLE FORMAT
    cust.p 3 STRING "Cust Num" 8 LEFT TRANSLATABLE
    cust.p 3 STRING "CustNum" 7 NONE UNTRANSLATABLE
    cust.p 3 STRING "--------" 8 NONE UNTRANSLATABLE

Amdahl’s Law
The performance enhancement possible with a given improvement is limited by the fraction of the execution time that the improved feature is used.
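
In its usual textbook form the law reads: overall speedup = 1 / ((1 - p) + p / s), where p is the fraction of run time affected and s is how much faster that fraction becomes. A small Python sketch of the formula (the function and parameter names are chosen here for illustration):

```python
def amdahl_speedup(fraction_improved: float, local_speedup: float) -> float:
    """Overall speedup when a fraction of run time gets local_speedup times faster."""
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / local_speedup)


print(amdahl_speedup(0.5, 2.0))     # half the run time, made twice as fast
print(amdahl_speedup(0.01, 100.0))  # 1% of the run time, made 100x faster
```

The second call returns a value barely above 1: tuning code that accounts for 1% of run time cannot help much, however large the local gain.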

Compile Time is Not Enough
– Dynamic Queries
– SQL-92 Cost Based Optimizer
– Unreached Code
– Rarely Run Code
– Widespread, but Low Impact Code

Execution Time Index Usage
    /* report reads per index from the _indexStat VST */
    for each _indexStat no-lock:
      find _index no-lock
        where _index._idx-num = _indexStat._indexStat-id no-error.
      if available( _index ) then
        do:
          find _file no-lock
            where recid( _file ) = _index._file-recid no-error.
          display
            _indexStat._indexStat-id
            "*" when ( _file._prime-index = recid( _index ))  /* primary index */
            "u" when ( _index._unique = true )                /* unique index */
            _file._file-name when available _file
            _index._index-name when available _index
            _indexStat._indexStat-read.
        end.
    end.  /* closes the for each; missing on the original slide */

Execution Time Index Usage
    Id      File-Name     Index-Name                 Read
     1  *   _File         _File-Name           39,550,882
     2  *U  _Field        _File/Field                  29
     3   U  _Field        _Field-Name           2,457,451
     4   U  _Field        _Field-Position       1,999,506
     5  *U  _Index        _File/Index           4,791,744
     6  *U  _Index-Field  _Index/Number         7,344,550
     7      _Index-Field  _Field                        0
     8  *U  table1        t1_idx1               6,668,593
     9   U  table1        t1_idx2                 224,913
    10  *U  table2        t2_idx1              42,078,065
    11      table2        t2_idx2          19,772,967,351
    12      table2        t2_idx3                       0

VST Note
The default db settings only collect statistics for the first 50 tables and indexes. To fix this, raise the startup parameters:
    -tablerangesize 1000 -indexrangesize 3000
Count the tables and indexes to choose suitable values:
    define variable t as integer no-undo label "Tables".
    define variable i as integer no-undo label "Indexes".
    for each _file no-lock where _hidden = no:
      t = t + 1.
    end.
    for each _index no-lock:
      i = i + 1.
    end.
    display t i.

Case Study

Logical Scatter Case Study
– A process reading approximately 1,000,000 records.
– An initial run time of 2 hours (139 records/sec).
– Un-optimized database.

Baseline
4k DB Block, Type 1 Area
-B 25,000; Hit Ratio 95%; 19,208 IO ops; Run time 2 hours
    Table   Index    %Sequential  %Idx Used  Density
    Table1  t1_idx1           0%       100%     0.09
            t1_idx2           0%                0.09
    Table2  t2_idx1          69%        99%     0.51
            t2_idx2          98%         1%     0.51
            t2_idx3          74%         0%     0.51

Round 1 – Increase Big B
4k DB Block, Type 1 Area
-B 100,000; Hit Ratio 98%; 9,816 IO ops; Run time 60 minutes
    Table   Index    %Sequential  %Idx Used  Density
    Table1  t1_idx1           0%       100%     0.09
            t1_idx2           0%                0.09
    Table2  t2_idx1          69%        99%     0.51
            t2_idx2          98%         1%     0.51
            t2_idx3          74%         0%     0.51

Round 2 – Increase Some More
4k DB Block, Type 1 Area
-B 200,000; Hit Ratio 99%; 6,416 IO ops; Run time 40 minutes
    Table   Index    %Sequential  %Idx Used  Density
    Table1  t1_idx1           0%       100%     0.09
            t1_idx2           0%                0.09
    Table2  t2_idx1          69%        99%     0.51
            t2_idx2          98%         1%     0.51
            t2_idx3          74%         0%     0.51

Restructure DB
– Dump & Load
– Convert to 8KB DB Blocks
– Convert to Type 2 Storage Areas

Round 3 – Back to Baseline -B
8k DB Block, Type 2 Areas
-B 12,500; Hit Ratio 95%; 9,417 IO ops; Run time 55 minutes
(previous %Sequential values in parentheses)
    Table   Index    %Sequential  %Idx Used  Density
    Table1  t1_idx1      71% (0)       100%     0.10
            t1_idx2      63% (0)         0%     0.10
    Table2  t2_idx1     85% (69)        99%     1.00
            t2_idx2    100% (98)         1%     1.00
            t2_idx3     83% (74)         0%     0.99

Round 4 – Bump Big B
8k DB Block, Type 2 Areas
-B 50,000; Hit Ratio 98%; 4,746 IO ops; Run time 30 minutes
    Table   Index    %Sequential  %Idx Used  Density
    Table1  t1_idx1          71%       100%     0.10
            t1_idx2          63%         0%     0.10
    Table2  t2_idx1          85%        99%     1.00
            t2_idx2         100%         1%     1.00
            t2_idx3          83%         0%     0.99

Round 5 – Big B … Again
8k DB Block, Type 2 Areas
-B 100,000; Hit Ratio 99%; 3,192 IO ops; Run time 20 minutes
    Table   Index    %Sequential  %Idx Used  Density
    Table1  t1_idx1          71%       100%     0.10
            t1_idx2          63%         0%     0.10
    Table2  t2_idx1          85%        99%     1.00
            t2_idx2         100%         1%     1.00
            t2_idx3          83%         0%     0.99
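
The six runs can be put side by side with a few lines of Python. The run times and IO op counts are taken from the slides; the record count is the “approximately 1,000,000” quoted for the case study, so the records/sec column is approximate. The `summarize` helper is written here for illustration.

```python
RECORDS = 1_000_000  # "approximately 1,000,000 records" from the case study

# (label, run time in minutes, IO ops) as reported on the slides
rounds = [
    ("Baseline", 120, 19_208),
    ("Round 1", 60, 9_816),
    ("Round 2", 40, 6_416),
    ("Round 3", 55, 9_417),
    ("Round 4", 30, 4_746),
    ("Round 5", 20, 3_192),
]


def summarize(rounds, records=RECORDS):
    """Return (label, records/sec, speedup, IO reduction) relative to the first round."""
    _, base_minutes, base_io = rounds[0]
    return [
        (label, records / (minutes * 60), base_minutes / minutes, base_io / io_ops)
        for label, minutes, io_ops in rounds
    ]


for label, rate, speedup, io_cut in summarize(rounds):
    print(f"{label:<9} {rate:6.0f} rec/sec  {speedup:4.1f}x faster  {io_cut:4.1f}x fewer IO ops")
```

For Round 5 both the speedup and the IO reduction come out at about 6x relative to the baseline.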

Are We Done?
8k DB Block, Type 2 Areas
The most used index is not the most sequential index!
    Table   Index    %Sequential  %Idx Used  Density
    Table1  t1_idx1          71%       100%     0.10
            t1_idx2          63%         0%     0.10
    Table2  t2_idx1          85%        99%     1.00
            t2_idx2         100%         1%     1.00
            t2_idx3          83%         0%     0.99

Restructure DB
– Dump & Load
– Dump Table 2 using the most used index: t2_idx1
– Load Normally

Why Not?
Return on Investment:
– Gains from improving %Sequential are smaller than those from improving the hit ratio.
– That last 15% is a drop in the bucket compared to the 6x improvement already gained.
– Expected improvement would be about 2% of 20 minutes, or around 24 seconds.

However…
At low buffer hit ratios (95% or lower):
– Restructuring to favor the most used index results in a 60% improvement in time.
– The hit ratio improves to 99.75%.
– This comes from eliminating 95% of the disk IO ops (112,247 -> 5,196).
On the other hand… the system in question has grown again and it may now be worth revisiting.
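
The two figures quoted in the return-on-investment argument, “around 24 seconds” and “95% of the disk IO ops”, can be checked with simple arithmetic. A short Python sketch using only the numbers from the slides (the helper name is made up):

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100.0


# About 2% of a 20-minute run, expressed in seconds.
expected_saving_seconds = 20 * 60 * 2 / 100
print(expected_saving_seconds)  # 24.0

# Disk IO ops before and after restructuring: 112,247 -> 5,196.
print(round(percent_reduction(112_247, 5_196), 1))  # 95.4
```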

Conclusion
– Type 2 Storage Areas improve “logical scatter”.
– Addressing “logical scatter” can be a powerful performance improvement technique.
– Addressing “logical scatter” can be an alternative to increasing -B in environments where shared memory is constrained.


Questions?
– Should I use USE-INDEX to force a “well ordered” index?
– Why might scatter grow over time?
– I have two (or more) conflicting, but very important, needs. What can I do?

Thank You!
