The Deepest Depths of promon

1 The Deepest Depths of promon
And how it may help in troubleshooting certain DB problems
Presented by: Dan Foreman

2 Dan Foreman
Progress User since 1984
Progress Employee since October 2014
Author of several Progress-related publications
Author of several cool and useful Progress DBA tools: ProMonitor, ProCheck, LockMon, Pro Dump&Load, Balanced Benchmark
Basketball fanatic… which sometimes leads to unexpected trips to the ER

3 Dan Foreman

4 Gus Bjorklund Progress Wizard

5 Brief History of the debghb option in promon
Added to promon in V6.3 by Gus
Purpose: the shared memory architecture introduced in V6.3 was new and very different from earlier versions, and a way to monitor shared memory activity at a detailed level was needed
The debghb option was not a formally endorsed enhancement; Gus wrote it in his spare time
deb = DEBug
ghb = Gus’s initials… the middle initial is a Top Secret fanatically guarded by the Finnish government

6 Warning
The debghb option is not formally documented by Progress
DO NOT call Gus or PTS for help in using this option
Many of the screens and/or metrics have no value to a DBA
The view of the data is not transactionally consistent, sometimes even on the same screen; example to follow
Some of the data is not accurate (data overflow, rounding errors, etc.)
Some of the screens are broken (they don’t display any data)
The debghb option can be altered, removed, or hidden by Progress any time they (we?) want to

7 Warning #2
DB activities in shared memory can be slowed down if certain options are enabled
05/08/ OpenEdge Release 10 Monitor (R&D) 19:10:
Adjust Latch Options
1. Spins before timeout:
2. Enable latch activity data collection
3. Enable latch timing data collection
4. Initial latch sleep time: 10 milliseconds
5. Maximum latch sleep time: milliseconds
6. Record Free Chain Search Depth Factor: 5

8 How do I access debghb?
Start promon
Enter: R&D (also works in lower case starting in V10)
Enter: debghb
You have now entered the debghb “zone”
Two main differences in the world of debghb:
Extensions to some existing R&D screens
Access to the “This menu is not here Menu”: enter “6” even though there is no visible option 6 (see next slide)
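Because the path into debghb is a fixed sequence of menu entries, it is easy to generate. A minimal Python sketch, assuming promon reads one menu entry per input line (as it does when driven from a here-document); the helper name `debghb_keystrokes` is hypothetical, and the keys needed to leave promon cleanly are deliberately left out because they depend on which menu you end up in.

```python
def debghb_keystrokes(nhm_option: int) -> list[str]:
    """Return the promon input lines that open one screen of the
    'This menu is not here Menu' (hypothetical helper, not a Progress tool)."""
    # The NHM listing on the slide runs from 1 (Cache Entries) to 17 (Shutdown).
    if not 1 <= nhm_option <= 17:
        raise ValueError("NHM options on the slide run from 1 to 17")
    return [
        "R&D",            # open the R&D menus (lower case also works from V10)
        "debghb",         # switch on the debghb extensions
        "6",              # the invisible option 6
        str(nhm_option),  # e.g. 11 = Latch Counts, 15 = Buffer Lock Queue
    ]
```

Joined with newlines, the list could be passed as the `input=` argument of `subprocess.run(["promon", db_path], text=True, capture_output=True)`, where `db_path` stands for your database. Option 17 is labelled Shutdown, so it is best kept out of anything automated.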

9 This menu is not here Menu
06/06/ OpenEdge Release 10 Monitor (R&D) 15:23:
This menu is not here Menu
1. Cache Entries
2. Hash Chain
3. Page Writer Queue
4. Lru Chains
5. Locked Buffers
6. Buffer Locks
7. Buffer Use Counts
8. Resource Queues
9. TXE Lock Activity
10. Adjust TXE Options
11. Latch Counts
12. Latch Times
13. I/O Wait Time by Type
14. I/O by File
15. Buffer Lock Queue
16. Semaphores
17. Shutdown

10 Operating Hints
Allow at least 40-45 lines of screen data
Allow at least columns of screen width
Zero out the stats (“z”) to get a clean starting place
This ‘zeroing’ does not wipe out the actual shared memory counters; it only affects the current promon session
Update the stats periodically (“u”) to get snapshots
All the above can be scripted
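The zero/update behaviour can be modelled as a per-session baseline subtracted from counters that never stop growing. The Python sketch below illustrates that idea only; it is not promon's code, and the class name and counter names are hypothetical.

```python
class SessionView:
    """Model of promon's 'z' and 'u' keys over cumulative shared-memory counters.

    Zeroing stores a baseline for this session only; the raw counters keep
    counting, and each update reports the change since the baseline.
    """

    def __init__(self) -> None:
        self.baseline: dict[str, int] = {}

    def zero(self, counters: dict[str, int]) -> None:
        """'z': remember the current raw values without modifying them."""
        self.baseline = dict(counters)

    def update(self, counters: dict[str, int],
               elapsed_sec: float) -> dict[str, tuple[int, float]]:
        """'u': return (total, per-second) for each counter since the last zero."""
        if elapsed_sec <= 0:
            raise ValueError("elapsed_sec must be positive")
        report = {}
        for name, raw in counters.items():
            delta = raw - self.baseline.get(name, 0)
            report[name] = (delta, delta / elapsed_sec)
        return report
```

Two sessions zeroed at different times would report different totals from the same shared-memory counters, which is why zeroing in one promon session does not disturb another.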

11 Operating Hints - User# -1
Usecnt = # of concurrent processes accessing the block
When the BLQ was first examined there were 5 Clients accessing the same DBKEY, but before all 5 could be displayed:
One Client dropped off, i.e. it released the Buffer Lock
Another one of the 5 is only partially displayed, i.e. the -1 User#
02/06/ Status: Buffer Lock Queue 00:37:21
User DBKEY Area T Status Type Usect
I LOCKED SHARE
I LOCKED SHARE
I LOCKED SHARE
I LOCKED SHARE

12 Useful Screens - Checkpoints
Extensions to the ‘normal’ Checkpoint screen added in V10.2B SP5
Columns of interest:
Duration: the amount of time required to complete a Checkpoint; the entire Database is transactionally frozen during this time (_CheckPoint._CheckPoint-Duration, V10.2B SP5)
Sync Time: a subset of the ‘Duration’ column; the amount of time required to execute the fdatasync() system call (_CheckPoint._CheckPoint-Synctime, V10.2B SP5)
See for an excellent description of fdatasync (don’t confuse it with fsync).
Sample data on the next slide
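To get a feel for what a single synchronous flush costs on a given file system, you can time one yourself. This Python sketch is a rough approximation under stated assumptions, not how OpenEdge measures Sync Time: it writes a small arbitrary payload to a temporary file (pass `directory` to place it on the same file system as the database extents) and times `os.fdatasync`, falling back to `os.fsync` where the platform lacks it. `fdatasync` flushes the data plus only the metadata needed to read it back (such as the file size), while `fsync` also flushes metadata such as modification times.

```python
import os
import tempfile
import time


def time_one_sync(directory=None, payload: bytes = b"x" * 65536) -> float:
    """Write `payload` to a temporary file and return the seconds spent in the sync call."""
    # Prefer fdatasync (data + essential metadata); fall back to fsync if absent.
    sync = getattr(os, "fdatasync", os.fsync)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, payload)
        start = time.perf_counter()
        sync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
```

A single measurement is noisy; repeating it a few times during quiet and busy periods gives a better sense of the flush cost that a checkpoint's Sync Time is made of.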

13 Useful Screens - Checkpoints
The ‘Duration’ of the Checkpoints (i.e. the total freeze time) is very high for most of the CPs displayed
A ‘Duration’ of less than 1 second is a good goal
The 10 sec ‘Duration’ is approximately 1/3 of the CP ‘Freq’. In other words, for up to 10 seconds of every Checkpoint interval, NO transaction activity can take place.
Ckpt Database Writes
No. Time Len Freq Dirty CPT Q Scan APW Q Flushes Duration Sync Time
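The 1/3 observation is arithmetic on two columns of the Checkpoint screen: Duration divided by Freq is the share of each checkpoint interval during which transactions are frozen. A small Python sketch, assuming the rows have already been read off the screen as (checkpoint number, Freq, Duration) in seconds; the sample rows are illustrative, because the slide's own figures did not survive, and are chosen to match the "10 seconds is about 1/3 of Freq" case.

```python
def freeze_share(duration_sec: float, freq_sec: float) -> float:
    """Fraction of a checkpoint interval spent frozen (Duration / Freq)."""
    if freq_sec <= 0:
        raise ValueError("Freq must be positive")
    return duration_sec / freq_sec


def slow_checkpoints(rows, goal_sec: float = 1.0):
    """Return (ckpt_no, duration, share) for rows whose Duration exceeds goal_sec.

    rows: iterable of (ckpt_no, freq_sec, duration_sec) tuples.
    """
    return [
        (ckpt_no, duration_sec, freeze_share(duration_sec, freq_sec))
        for ckpt_no, freq_sec, duration_sec in rows
        if duration_sec > goal_sec
    ]


# Illustrative rows: a 10 s checkpoint every 30 s, and one that meets the 1 s goal.
sample = [(101, 30.0, 10.0), (102, 30.0, 0.4)]
print(slow_checkpoints(sample))
```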

14 Useful Screens – Resource Queues
NHM (Not Here Menu) - Option #8
Do not confuse Resources with Latches
Rich Banville gave a good talk on this at Exchange 2008
In general the busiest locks will be:
DB Buf S Lock
DB Buf X Lock
Record Lock
Waits that can be problematic:
DB Buf I Lock (I = Intent, but these are for Index blocks)
Sample on the next slide

15 Useful Screens – Resource Queues
01/31/ Activity: Resource Queues 00:31:
/31/13 00:26 to 01/31/13 00:31 (5 min)
Requests Waits
Queue Total /Sec Total /Sec Pct
Record Lock
Trans Commit
DB Buf I Lock
Record Get
DB Buf Read
DB Buf Write
DB Buf S Lock
DB Buf X Lock
DB Buf S Lock LRU
DB Buf X Lock LRU
DB Buf Write LRU
BI Buf Read
BI Buf Write
TXE Share Lock
TXE Update Lock
TXE Commit Lock
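The derived columns on this screen can be recomputed from the two totals and the sample interval (5 minutes here). A Python sketch, assuming that Pct means waits as a percentage of requests and that /Sec is a total divided by the interval in seconds; the function names are hypothetical and the counts in the test are made up, since the slide's figures were lost.

```python
def queue_rates(requests: int, waits: int, interval_sec: float) -> dict[str, float]:
    """Recompute /Sec and Pct for one Resource Queues row (assumed definitions)."""
    if interval_sec <= 0:
        raise ValueError("interval must be positive")
    return {
        "requests_per_sec": requests / interval_sec,
        "waits_per_sec": waits / interval_sec,
        # Assumed: Pct = waits as a share of requests, in percent.
        "wait_pct": 100.0 * waits / requests if requests else 0.0,
    }


def worst_waiters(rows: dict[str, tuple[int, int]], interval_sec: float, top: int = 3):
    """Rank queues by waits per second; rows maps queue name -> (requests, waits)."""
    rates = {name: queue_rates(r, w, interval_sec) for name, (r, w) in rows.items()}
    return sorted(rates.items(), key=lambda kv: kv[1]["waits_per_sec"], reverse=True)[:top]
```

Ranking by waits per second rather than by Pct keeps a rarely requested queue with a high wait percentage from crowding out a busy one such as DB Buf I Lock.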

16 Useful Screens – Latch Counts
NHM (Not Here Menu) - Option #11
The R&D Blocked Clients screen doesn’t show Latch contention, so debghb is the only place in promon where detailed Latch activity is visible
Definition of Naps: when -spin is ‘used up’ by a Progress Client, the process Naps (i.e. does no useful work) for a while and then tries again
General Principle: Napping is bad (unless it’s my wife)
Samples on the next few slides
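The spin-then-nap cycle can be illustrated with an ordinary lock. This Python sketch is a conceptual model only, not Progress's latch code: it tries a non-blocking acquire up to `spin` times, then naps for a fixed `nap_sec` and starts another round. The fixed nap length and the `max_naps` limit are simplifications of this sketch (the engine has its own initial and maximum latch sleep times, as on the Adjust Latch Options screen), and spinning in Python costs far more than in a native process, so it says nothing about good -spin values.

```python
import threading
import time


def acquire_with_spin(latch: threading.Lock, spin: int,
                      nap_sec: float = 0.01, max_naps: int = 1000) -> int:
    """Acquire `latch`, spinning up to `spin` non-blocking tries per round and
    napping between rounds. Returns the number of naps (0 = won while spinning)."""
    naps = 0
    while True:
        for _ in range(spin):
            if latch.acquire(blocking=False):   # one "spin": a cheap attempt
                return naps
        if naps >= max_naps:                    # safety valve for this sketch only
            raise TimeoutError("latch still busy after max_naps naps")
        naps += 1
        time.sleep(nap_sec)                     # the nap: no useful work is done
```

An uncontended latch is won on the first spin with zero naps; a latch held elsewhere for longer than the spin loop lasts forces at least one nap, which is exactly the counter the Latch Counts screen reports.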

17 Latch Counts – OM Latch
OM (Object Cache) Latch activity can be eliminated entirely by setting the -omsize parameter equal to or greater than the number of _StorageObject records.
04/24/ Activity: Latch Counts 00:59:
/24/13 00:54 to 04/24/13 00:59 (5 min 1 sec)
Locks Busy Naps Spins Nap Max
Owner Total /Sec /Sec Pct /Sec /Sec /Lock /Busy Total HWM
MTX USR OM

18 Latch Counts – USR Latch
The small contention on the USR (DB Connection Table) Latch is because Statement Caching is enabled
04/25/ Activity: Latch Counts 00:33:
/25/13 00:28 to 04/25/13 00:33 (5 min 0 sec)
Locks Busy Naps Spins Nap Max
Owner Total /Sec /Sec Pct /Sec /Sec /Lock /Busy Total HWM
MTX USR OM BIB SCH LKP GST TXT SEQ AIB TXQ EC LKF BFP BHT PWQ CPQ LRU LRU BUF

19 Latch Counts – LRU Chain
The total number of Locks for LRU is the second highest of all the Latches shown (BHT, the Buffer Hash Table, is #1)
The # of Naps per Second is the highest of all latches (zero is ideal)
01/31/ Activity: Latch Counts 00:05:
/31/13 00:00 to 01/31/13 00:05 (5 min 0 sec)
Locks Busy Naps
Owner Total /Sec /Sec Pct /Sec
MTX OM BHT CPQ LRU LRU BUF BUF BUF BUF

20 Latch Counts – LRU Chain
The # of locks on the second LRU (Alternate Buffer Cache) is zero because all the ABC blocks fit completely in the amount of -B2 memory allocated
01/31/ Activity: Latch Counts 00:05:
/31/13 00:00 to 01/31/13 00:05 (5 min 0 sec)
Locks Busy Naps
Owner Total /Sec /Sec Pct /Sec
BHT CPQ LRU LRU BUF BUF

21 Show before and after LRU contention!!!!!!

22 Latch Counts – LRU Chain
‘Owner’ column: if the User# doesn’t change (in value or frequency), that can be a problem indicator, because Latches should be held for only a fraction of a second
01/31/ Activity: Latch Counts 00:05:
/31/13 00:00 to 01/31/13 00:05 (5 min 0 sec)
Locks Busy Naps
Owner Total /Sec /Sec Pct /Sec
BHT CPQ LRU LRU BUF BUF
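An Owner that never changes is easier to spot across several snapshots (successive “u” updates) than by eye. A Python sketch, assuming each snapshot has already been parsed into a mapping from latch to owning User#, with None for an unowned latch. Because names such as LRU and BUF appear on more than one row, the keys use hypothetical row labels like "LRU#1"; the default of three consecutive samples is an arbitrary choice.

```python
from typing import Optional


def sticky_owners(snapshots: list[dict[str, Optional[int]]],
                  min_repeats: int = 3) -> dict[str, int]:
    """Return {latch: user#} for latches held by the same user in each of the
    last `min_repeats` snapshots; None (unowned) never counts as sticky."""
    if min_repeats < 2 or len(snapshots) < min_repeats:
        return {}
    recent = snapshots[-min_repeats:]
    flagged = {}
    for latch, owner in recent[-1].items():
        if owner is None:
            continue
        if all(snap.get(latch) == owner for snap in recent):
            flagged[latch] = owner
    return flagged
```

A flagged latch is only a lead: with short sampling intervals a busy user can legitimately own the same latch in consecutive snapshots, so the check is most meaningful when the same owner persists across widely spaced updates.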

23 Using Latch Counts to set -spin
Short answer: Forget It! If it were that easy, Progress would have done it already
Past attempts have not been successful
Also, the optimal value of -spin is not going to be the same for each Latch
General guidelines:
Greater than 1,000
Less than 50,000
Current Default: 6,000 * (# of CPU Cores)
Default not advised if you have more than 16 Cores
Dan’s Formula (Patent Pending): (DBA-Birthday-Year * )
Gus’s formula: 5,000
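The guideline numbers on this slide can be turned into a quick sanity check. This Python sketch encodes only what the slide states (a default of 6,000 per CPU core, a 1,000 to 50,000 range, caution about the default above 16 cores); the function names are hypothetical, and passing the check says nothing about whether a value is optimal, which is the slide's point. Taken literally, the default for 9 cores (6,000 × 9 = 54,000) already lies above the 50,000 guideline.

```python
from typing import Optional

SPIN_MIN, SPIN_MAX = 1_000, 50_000   # "greater than 1,000, less than 50,000"
GUS_SPIN = 5_000                     # Gus's formula


def default_spin(cpu_cores: int) -> int:
    """Default -spin as quoted on the slide: 6,000 * (# of CPU cores)."""
    return 6_000 * cpu_cores


def spin_notes(cpu_cores: int, spin: Optional[int] = None) -> list[str]:
    """List guideline concerns for an explicit -spin, or for the default if spin is None."""
    notes = []
    value = default_spin(cpu_cores) if spin is None else spin
    if not SPIN_MIN < value < SPIN_MAX:
        notes.append(f"-spin {value:,} is outside the {SPIN_MIN:,}-{SPIN_MAX:,} guideline")
    if spin is None and cpu_cores > 16:
        notes.append("default -spin not advised with more than 16 cores")
    return notes
```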

24 Useful Screens – Buffer Lock Queue
NHM (Not Here Menu) - Option #15
The ‘normal’ R&D Blocked Clients screen does not show the Area that the DBKEY belongs to
The Buffer Lock Queue (BLQ) screen shows the Area as well as the Block Type
Examples on the next two slides

25 R&D Blocked Clients
The R&D Blocked Clients screen doesn’t show enough information to identify the Object involved in this contention storm on a single DBKEY
There were 29 Clients all blocked on the same DBKEY
01/31/ Status: Blocked Clients 00:26:41
Usr Name Type Wait Wait Info Trans id Login time
730 _AUTO-B SELF/ABL BKSH /30/13 23:22
735 _AUTO-B SELF/ABL BKSH /30/13 23:23
743 _AUTO-B SELF/ABL BKSH /30/13 23:22
747 _AUTO-B SELF/ABL BKSH /30/13 23:22
749 _AUTO-B SELF/ABL BKSH /30/13 23:23
755 _AUTO-B SELF/ABL BKSH /30/13 23:23
769 _AUTO-B SELF/ABL BKSH /30/13 23:22

26 Buffer Lock Queue
If there is a matching DBKEY on the BLQ screen, we can get the Area# and the Block Type (I = Index)
There were 29 processes on the Blocked Clients screen with this DBKEY and only 4 on the BLQ screen with the same DBKEY (remember, the view is not transactionally consistent)
01/31/ Status: Buffer Lock Queue 00:26:41
User DBKEY Area T Status Type Usect
I LOCKED SHARE
I LOCKED SHARE
I LOCKED SHARE
I LOCKED SHARE
I LOCKED SHARE
<lines unrelated to DBKEY snipped>
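Matching the two screens by DBKEY is mechanical once both have been captured. A Python sketch, assuming the rows have already been parsed into tuples; the function name is hypothetical, and the DBKEY values in the test are invented because the real ones did not survive on the slides. A DBKEY with no BLQ match reports None for Area and Type, which, given the consistency caveat, can happen.

```python
from collections import Counter


def locate_blocked(blocked, blq):
    """Join Blocked Clients rows to Buffer Lock Queue rows on DBKEY.

    blocked: iterable of (usr, dbkey) from the R&D Blocked Clients screen
    blq:     iterable of (user, dbkey, area, block_type) from NHM option 15
    Returns {dbkey: (blocked_count, area, block_type)}, busiest DBKEY first.
    """
    counts = Counter(dbkey for _usr, dbkey in blocked)
    where = {}
    for _user, dbkey, area, block_type in blq:
        where.setdefault(dbkey, (area, block_type))  # first BLQ row wins
    return {
        dbkey: (n, *where.get(dbkey, (None, None)))
        for dbkey, n in counts.most_common()
    }
```

With the Area# and Block Type in hand, the Area can then be tied back to its objects, which is the piece of information the Blocked Clients screen alone cannot provide.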

27 Thank You! Questions?

