Promon for Dummies & Savants


1 Promon for Dummies & Savants
What’s important & what isn’t Presented by: Dan Foreman

2 Dan Foreman Intro Progress guy since 1984

3 Dan Foreman Intro I enjoy cycling & basketball (not at the same time)

4 Introduction: Dan Foreman
I used to be a Super Frequent Flier 3.4 million miles on United Airlines 3.5 million miles on American Airlines

5 Dan Foreman Intro Unstable employment history in the last few years
US Navy Polar Cryogenics BravePoint/USI –2015 Progress White Star Software 2017 Retirement ??? Author of several Progress related publications Progress Performance Tuning Guide Progress Database Administration Guide Progress System Table Guide Available at:

6 Disclaimer & Stuff
Almost every DBA/Consultant has an opinion about what is the most/least important data in promon
It is very difficult to adequately cover such an extensive topic in an hour
Mobile devices on mute – if Scott Dulecki hasn’t said that already
Audience Survey
  Progress Version
  Largest Single DB
  Who are the promon Dummies?
  Who are the promon Savants?

7 Promon Basics First appeared in Progress V5
Similar to today’s promon but also hugely different

8 Promon Basics
Totally overhauled in V6.3 when the new Shared Memory architecture was introduced
Basic operation: Take a snapshot of a particular area of DB Shared Memory and render it into human readable form
  Therefore promon doesn’t work in single user mode
Limitations:
  Snapshot of a moving target (the DB stats) quickly becomes out-of-date
  Single database per promon session
  Fixed Interface
  No Warnings or Alerts

9 Promon versus Virtual System Tables
Virtual System Tables (VSTs) were introduced in V8
Initially VSTs just mimicked the promon menu structure
Eventually functionality was added to the VSTs that didn’t exist in promon
Examples:
  _tablestat – Table Level I/O
  _indexstat – Index Level I/O
With a few exceptions, most promon screens shown in this presentation have a corresponding VST table or tables.

10 Promon versus Virtual System Tables
DB Monitoring Tools that use the VSTs
  ProTop (White Star)
  ProMonitor (Dan Foreman, now owned by Progress)
  OpenEdge Management
For super details on VSTs refer to my System Tables Guide

11 Important Metrics – Top Level – 3. Block Access
DB Requests – the number of times a shared memory Client requested a DB Block
DB Reads – the number of times the requested block was not in the DB Buffer Cache (-B/-B2) and had to be retrieved from physical storage, i.e. the DB
Data is displayed for the entire DB (User# 99999) and individual Client processes (if they are connected at the time of the snapshot)
Type Usr:Ten Name Domain DB Requests DB Reads BI Reads AI Reads \Writes \Writes \Writes Acc TOTAL Acc Dan

12 Important Metrics – Top Level – 4. Lock Table
I recommend using the _Lock VST for monitoring & debugging record locking issues
The promon view of the Lock Table doesn’t have filters so it can be difficult to locate locking conflicts
Also, paging through a busy Record Lock Table in promon is a soul-sucking exercise
VSTs allow you to easily see the holder of the lock as well as the Clients waiting for the lock to be released
The performance issues related to accessing the _Lock VST were fixed in V11.4

13 Important Metrics – Top Level – 5. Activity
Activity - Sampled at 04/20/17 10:41 for 0:04:09. Event Total Per Sec Event Total Per Sec Commits Undos Record Updates Record Reads Record Creates Record Deletes DB Writes DB Reads BI Writes BI Reads AI Writes Record Locks Record Waits Checkpoints Buffs Flushed Rec Lock Waits 0 % BI Buf Waits % AI Buf Waits % Writes by APW % Writes by BIW % Writes by AIW % Buffer Hits % Primary Hits % Alternate Hits 0 % DB Size MB BI Size K AI Size K FR chain blocks RM chain blocks Shared Memory K Segments 0 Servers, 2 Users (2 Local, 0 Remote, 0 Batch),1 Apws

14 Important Metrics – Top Level – 5. Activity
DB Uptime – because many of the metrics on this screen are based upon how long the DB has been running
Record Reads – a good general load measurement
Record Waits, i.e. Locking Conflicts
Checkpoints – CPs will be discussed in more detail later in this presentation
Buffers Flushed
  Zero is a perfect ‘score’ but not always possible
  Fix with Async Page Writers (APWs) and a proper BI Cluster Size
Buffer Hits % – MISLEADING METRIC – The Buffer Hit Ratio (DB Requests / DB Reads) is a much better measurement of DB Cache effectiveness
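To see why the ratio is easier to read than the percentage, here is a minimal Python sketch. It assumes the percentage is computed as (DB Requests - DB Reads) / DB Requests; the function names and the sample counters are illustrative only and do not come from promon.

```python
def buffer_hit_ratio(db_requests: int, db_reads: int) -> float:
    """DB Requests per DB Read (the ratio recommended on this slide)."""
    if db_requests < 0 or db_reads < 0:
        raise ValueError("counters must be non-negative")
    if db_reads == 0:
        # No physical reads at all: every request was served from the cache.
        return float("inf")
    return db_requests / db_reads


def buffer_hits_percent(db_requests: int, db_reads: int) -> float:
    """Percentage form, assumed here to be (requests - reads) / requests."""
    if db_requests == 0:
        return 0.0
    return 100.0 * (db_requests - db_reads) / db_requests


if __name__ == "__main__":
    # Hypothetical counters: same number of requests, twice the reads.
    for reads in (10_000, 20_000):
        pct = buffer_hits_percent(1_000_000, reads)
        ratio = buffer_hit_ratio(1_000_000, reads)
        print(f"reads={reads}: hits={pct:.0f}%  ratio={ratio:.0f}:1")
```

With these made-up numbers the percentage moves by a single point while the ratio is cut in half, which is the sense in which the percentage can hide a doubling of disk reads.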

15 Important Metrics – Top Menu – 6. Shared Resources
Locking table (-L parameter) entries in use
Locking table high water mark
Shared Resources:
  Busy After Image Extent: /ai2/sxe/nxt.a25
  Number of database buffers (-B):
  Number of database alternate buffers (-B2): 0
  Number of before image buffers (-bibufs): 100
  Number of after image buffers (-aibufs): 100
  Excess shared memory size (-Mxs): 708
  Before-image truncate interval (-G): 0
  No crash protection (-i): Not enabled
  Maximum private buffers per user (-Bpmax): 64
  Current size of locking table (-L):
  Locking table entries in use: 0
  Locking table high water mark: 995
  Maximum number of clients per server (-Ma): 25
  Max number of JTA transactions (-maxxids): 0
  Delay of before-image flush (-Mf): 3
  Maximum number of servers (-Mn): 70
  Maximum number of users (-n): 1701
  Before-image file I/O (-r -R): Raw

16 Important Settings – Top Level – 7. Database Status
Database block size
Before Image Block Size
After Image Block Size
Before Image Cluster Size
Database Status:
  Database version number: 4269
  Database state: Open (1)
  Database damaged flags: None (0)
  Integrity flags: None (1536)
  Database block size (bytes): << S/B 8192 (8k)
  Total number of database blocks:
  Database blocks high water mark:
  Free blocks below highwater mark: 59
  Record blocks with free space: 3
  Before image block size (bytes): 8192
  Before image cluster size (kb): 512 << S/B > (16mb+)
  After image block size (bytes): 8192
  Last transaction number:
  Highest file number defined: 0

17 R&D Section History of R&D
Originally undocumented
The first documentation available in the “wild” appeared in my Progress Performance Tuning Guide but was not officially “sanctioned” by Progress
Eventually R&D got documented…sort of
Documentation example from R&D > Status > BI Log
  Bytes free in current cluster – The number of free bytes remaining in the current BI cluster.
There is a lot of useless stuff (for a DBA) in R&D
However there is also a huge amount of very important stuff

18 R&D Main Menu 04/20/17 OpenEdge Release 11 Monitor (R&D)
10:49: Main (Top) Menu 1. Status Displays ... 2. Activity Displays ... 3. Other Displays ... 4. Administrative Functions ... 5. Adjust Monitor Options

19 R&D > 1. Status 04/20/17 OpenEdge Release 11 Monitor (R&D)
10:50: Status Displays Menu 1. Database 2. Backup 3. Servers 4. Processes/Clients ... 5. Files 6. Lock Table 7. Buffer Cache 8. Logging Summary 9. BI Log 10. AI Log 11. Two-Phase Commit 12. Startup Parameters 13. Shared Resources 14. Shared Memory Segments 15. AI Extents 16. Database Service Manager 17. Servers By Broker 18. Client Database-Request Statement Cache ... 19. Schema Locks & Wait Queue <<< V10.2B04 20. Broker Startup Parameters <<< V11.5.0

20 R&D > 1. Status > 4. Processes/Clients > 2. Blocked Clients
Shows record lock conflicts (REC) and some (but not all!!) Shared Memory contention
BKSH = Share Lock on a DB Block in the Buffer Cache
More info on Shared Memory conflicts later in the session
Ideally this screen will always be blank or the Clients will only display for one screen iteration
03/01/ Status: Blocked Clients 08:19:16
Usr Name Type Wait Wait Info Trans id Login time
rpt SELF/ABL BKSH /01/18 07:48
rpt SELF/ABL REC /01/18 08:16
rpt SELF/ABL BKSH /01/18 07:55

21 R&D > 1. Status > 4. Processes/Clients > 2. Blocked Clients
Progress V11.7 adds 3 cool new fields Num Txns BI Rread – BI Record (Note) Reads BI Rwries – BI Record (Note) Writes 03/17/ Status: Blocked Clients by user number for all tenants 14:02:13 Usr:Ten Name Domain Type Wait Wait Info Trans id Num Txns BI RRead BI RWries Login time Schema Time dan SELF/ABL REC : /17/18 14: dan SELF/ABL REC : /17/18 14:

22 R&D > 1. Status > 4. Processes/Clients > 3. Active Transactions
Active Transactions are the #1 cause of abnormal, excessive BI growth
10/30/ Status: Active Transactions 6:00:25
Usr Name Type Login time Tx start time Trans id Trans State
6 gus SELF 10/30/02 05:57 10/30/02 05: Active
7 margaret SELF 10/30/02 05:57 10/30/02 05: Active
9 dan SELF 10/30/02 05: Begin

23 R&D > 1. Status > 4. Processes/Clients > 3. Active Transactions V11.7
BI RReads – BI Record Reads for only this transaction
BI RWrites – BI Record Writes for only this transaction
03/17/ Status: Active Transactions by user number for all tenants 14:18:19
Usr:Ten Name Domain Type Login time Tx start time Trans id BI RReads BI RWrites Trans State
dan SELF/ABL 03/17/18 14: Begin FWD

24 R&D > 1. Status > 7. Buffer Cache
Empty Buffers indicate potentially wasted memory because the Buffer Cache (-B) is larger than necessary
If Empty Buffers is consistently non-zero over a span of a few weeks, the Buffer Cache is probably too large
A Buffer Cache that is too large doesn’t hurt anything but that surplus memory can probably be used more productively elsewhere
Total buffers: Hash table size: 887 Used buffers: Empty buffers:

25 R&D > 1. Status > 9. BI Log
The only place where you can actually experience a Checkpoint freeze for the same duration that the users do
Repeatedly refresh the screen as the percent approaches 0% and “feel the pause”… if there is one
More than .5 seconds is poor; > 1 second is terrible
Before-image block size: bytes
Before-image cluster size: kb ( bytes)
Number of before-image extents: 1
Before-image log size (kb):
Bytes free in current cluster: (20 %)
Last checkpoint was at: /19/17 10:58
Number of BI buffers:
Full buffers:
Current BI Cluster: < V11.7
BI Cluster HWM: < V11.7

26 R&D > 1. Status > 16. Database Service Manager
Very important for OpenEdge Replication users
Related to the –pica
Cache Communication Area Size : KB
Total Message Entries :
Free Message Entries :
Used Message Entries :
Used HighWater Mark :
Area Filled Count :
Service Latch Holder :
Access Count :
Access Collisions :

27 R&D > 2. Activity 04/20/17 OpenEdge Release 11 Monitor (R&D)
11:16: Activity Displays Menu 1. Summary 2. Servers 3. Buffer Cache 4. Page Writers 5. BI Log 6. AI Log 7. Lock Table 8. I/O Operations by Type 9. I/O Operations by File 10. Space Allocation 11. Index 12. Record 13. Other

28 R&D > 2. Activity > 5. BI Log
Empty buffer waits. Number of times –bibufs were full and could not accept new Notes (records). Should be zero or very low. 03/13/ Activity: BI Log 11:08: /11/18 06:41 to 03/13/18 10:52 (52 hrs 11 min) Total Per Min Per Sec Per Tx Total BI writes BIW BI writes Records written Bytes written K Total BI Reads Records read Bytes read Clusters closed Busy buffer waits Empty buffer waits Log force waits Log force writes Partial writes Input buffer hits Output buffer hits Mod buffer hits BO buffer hits

29 R&D > 2. Activity > 13. Other
See Rich Banville’s session on Shared Memory Contention from Progress Exchange 2008/2011 download.psdn.com/media/exch_audio/2008/OPS/OPS-28_Banville.ppt pugchallenge.org/slides/Waiting_AmericaPUG.pptx 03/13/ Activity: Other 11:10: /11/18 06:41 to 03/13/18 10:52 (52 hrs 11 min) Total Per Min Per Sec Per Tx Commit Undo Wait on semaphore Non-blocking waits Semaphore latch waits Flush master block

30 R&D > 3. Other 03/13/18 OpenEdge Release 10 Monitor (R&D)
11:20: Other Displays Menu 1. Performance Indicators 2. I/O Operations by Process 3. Lock Requests By User 4. Checkpoints 5. I/O Operations by User by Table 6. I/O Operations by User by Index 7. Total Locks per User

31 R&D > 3. Other > 1. Performance indicators
03/13/ Activity: Performance Indicators 11:14: /11/18 06:41 to 03/13/18 10:52 (52 hrs 11 min) Total Per Min Per Sec Per Tx Commits Undos Index operations K Record operations K Total o/s i/o Total o/s reads Total o/s writes Background o/s writes Partial log writes Database extends Total waits Lock waits Resource waits Latch timeouts Buffer pool hit rate: 99 % Primary pool hit rate: 99 % Alternate pool hit rate: 0 %

32 R&D > 3. Other > 1. Performance indicators
Total o/s reads – An indicator of how much disk read activity is occurring for this DB; high values may indicate a small Buffer Cache, fragmented records, or inefficient code/indexing
Database extends – if you are using the “all fixed extents” model, this number should be zero
For the metrics in green, I refer you again to Rich Banville’s 2008/2011 presentations

33 R&D > 3. Other > 4. Checkpoints
This screen is critical because a huge amount of behind-the-scenes housekeeping takes place during a CP
That housekeeping can be the source of multiple performance problems
History of the last 8 Checkpoints
In V11.7 the default is 32 Checkpoints and is tuneable

34 R&D > 3. Other > 4. Checkpoints
Extensions to the ‘normal’ Checkpoint screen were added in V10.2B SP5
Columns of interest
  Duration: the amount of time required to complete a Checkpoint; the entire Database is transactionally frozen during this time
    VST: _CheckPoint._CheckPoint-Duration
  Sync Time: a subset of the ‘Duration’ column; the amount of time required to execute the fdatasync() system call
    VST: _CheckPoint._CheckPoint.Synctime
See for an excellent description of fdatasync (don’t confuse it with fsync).
Sample data on the next slide

35 R&D > 3. Other > 4. Checkpoints
The Duration of the Checkpoints (i.e. the total freeze time for all transactions) is very high for the CPs displayed
A Duration of less than 1 second is a good goal
The 10 sec Duration is approximately 1/3 of the CP Freq. In other words, a CP is occurring every seconds and for up to 10 seconds of that Freq, NO Client transaction activity can take place.
Ckpt Database Writes ---- No. Time Len Freq Dirty CPT Q Scan APW Q Flushes Duration Sync Time :56: :52: :51: :51: :50: :49: :49:
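The arithmetic behind the "1/3 of the CP Freq" remark can be sketched in a few lines of Python. The function names are hypothetical, and the 30 second frequency in the example is only an illustrative value consistent with that remark, not a number taken from the screen.

```python
def frozen_fraction(duration_sec: float, freq_sec: float) -> float:
    """Fraction of each checkpoint interval during which transactions are frozen."""
    if freq_sec <= 0:
        raise ValueError("checkpoint frequency must be positive")
    return duration_sec / freq_sec


def meets_duration_goal(duration_sec: float, goal_sec: float = 1.0) -> bool:
    """True when the Duration is under the 'less than 1 second' goal from the slide."""
    return duration_sec < goal_sec


if __name__ == "__main__":
    # Illustrative values only: a 10 second Duration with a CP every 30 seconds.
    duration, freq = 10.0, 30.0
    print(f"frozen {frozen_fraction(duration, freq):.0%} of the time, "
          f"goal met: {meets_duration_goal(duration)}")
```

Feeding it the Duration and Freq columns of each checkpoint row gives a quick per-checkpoint view of how much of the interval was lost to the freeze.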

36 R&D > 3. Other > 4. Checkpoints
DB Write. Added in V The time in seconds used to scan and write the buffers from the Buffer Cache (-B & -B2). A large buffer cache can make this value abnormally high.
Bi Write. Added in V The time in seconds needed to write the buffers from the -bibufs memory.
03/24/15 Checkpoints 17:50:12
Ckpt Database Writes Performance Timings No. Time Len Freq Dirty CPT Q Scan APW Q DB Writes Bi Writes Duration Sync Time DB Write Bi Write NumChkpt :50: :50: :50: :50: :50: :49: :49: :49:

37 Brief History of the debghb option in promon
Added to promon in V6.3 by Gus Bjorklund
Purpose: The shared memory architecture introduced in V6.3 was new and very different from earlier versions, and a way to monitor shared memory activity at a detailed level was needed
The debghb option was not an enhancement formally endorsed by PSC but was written by Gus in his spare time
deb = DEBug
ghb = Gus’s initials… the middle initial is a Top Secret fanatically guarded by the Finnish government

38 Warning #1
The debghb option is not documented by Progress
DO NOT contact Gus or PTS for help in using this option (Gus is retired anyway)
Many of the screens and/or metrics have zero value to a DBA
The view of the data is not transactionally consistent, sometimes even on the same screen; example to follow
Some data is not accurate (data overflow, rounding errors, etc)
Some screens are broken/disabled (don’t display any data)
The debghb option can be altered, removed, or hidden by Progress any time they want to

39 Warning #2
DB activities in shared memory can be slowed down if certain options are enabled
05/08/ OpenEdge Release 10 Monitor (R&D) 19:10: Adjust Latch Options
1. Spins before timeout:
2. Enable latch activity data collection
3. Enable latch timing data collection
4. Initial latch sleep time: 10 milliseconds
5. Maximum latch sleep time: milliseconds
6. Record Free Chain Search Depth Factor: 5

40 debghb Operating Hints
Allow at least lines of screen height
Allow at least columns of screen width
Zero out the stats (“z” option) to get a clean starting place
  This ‘zeroing’ does not wipe out the actual Shared Memory counters but only affects the current promon session
Update the stats periodically (“u”) to get a fresh snapshot
All the above can easily be scripted

41 How do I access debghb?
Start promon
Enter: R&D (also works in lower case starting in V10)
Enter: debghb
You have now entered the debghb “universe”
Two main differences in the debghb zone
  Extensions to some existing R&D screens
  Enables access to “This menu is not here Menu”
    Enter: “6” even though there is no visible option 6
    See next slide

42 This menu is not here Menu
06/06/ OpenEdge Release 10 Monitor (R&D) 15:23: This menu is not here Menu 1. Cache Entries 2. Hash Chain 3. Page Writer Queue 4. Lru Chains 5. Locked Buffers 6. Buffer Locks 7. Buffer Use Counts 8. Resource Queues 9. TXE Lock Activity 10. Adjust TXE Options 11. Latch Counts 12. Latch Times 13. I/O Wait Time by Type 14. I/O by File 15. Buffer Lock Queue 16. Semaphores 17. Shutdown

43 debghb Operating Hints - User# -1
Usecnt = # of concurrent processes accessing the block
When initially examining the Buffer Lock Queue there were 5 Clients accessing the same DBKEY
But before all 5 could be displayed:
  One Client dropped off, i.e. released the Buffer Lock, before it was completely displayed
  Another one of the 5 is partially displayed; i.e. the -1 User#
02/06/ Status: Buffer Lock Queue 00:37:21
User DBKEY Area T Status Type Usect
I LOCKED SHARE
I LOCKED SHARE
I LOCKED SHARE
I LOCKED SHARE

44 8. Resource Queues
Do not confuse Resources with Latches
  Rich Banville’s talks on this topic at Exchanges 2008/2011
  download.psdn.com/media/exch_audio/2008/OPS/OPS-28_Banville.ppt
  pugchallenge.org/slides/Waiting_AmericaPUG.pptx
In general the busiest locks will be:
  DB Buf S Lock (Share Lock on a DB Buffer)
  DB Buf X Lock (Exclusive Lock on a DB Buffer)
  Record Lock
Waits that can be problematic (if very high):
  DB Buf I Lock (I = Intent but these are only for Index blocks)
Sample on the next slide

45 8. Resource Queues 01/31/13 Activity: Resource Queues
00:31: /31/13 00:26 to 01/31/13 00:31 (5 min) Queue Requests Waits Total /Sec Total /Sec Pct Record Lock Trans Commit DB Buf I Lock Record Get DB Buf Read DB Buf Write DB Buf S Lock DB Buf X Lock DB Buf S Lock LRU DB Buf X Lock LRU DB Buf Write LRU BI Buf Read BI Buf Write TXE Share Lock TXE Update Lock TXE Commit Lock

46 11. Latch Counts
The R&D Blocked Clients screen does not show Latch contention so debghb is the only place in promon where detailed Latch activity is visible
Definition of Naps: When –spin is ‘used up’ by a Progress Client, the process Naps (i.e. does no useful work) for a while and tries again
General Principle: Napping is evil!
Samples on the next few slides

47 11. Latch Counts – OM Latch
OM (Object Cache) Latch activity can be totally eliminated by setting the -omsize DB startup parameter equal to or greater than the number of _StorageObject records.
Note that the _StorageObject table is not a VST but a System Table.
04/24/ Activity: Latch Counts 00:59: /24/13 00:54 to 04/24/13 00:59 (5 min 1 sec) ----- Locks Busy Naps Spins Nap Max - Owner Total /Sec /Sec Pct /Sec /Sec /Lock /Busy Total HWM MTX USR OM
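The -omsize rule above is a simple comparison, sketched here in Python. The function name is hypothetical, and the record count is something you would obtain yourself (for example by counting _StorageObject records in the database); the numbers in the example are made up.

```python
def omsize_is_sufficient(omsize: int, storage_object_count: int) -> bool:
    """True when -omsize is equal to or greater than the number of _StorageObject records."""
    if omsize < 0 or storage_object_count < 0:
        raise ValueError("values must be non-negative")
    return omsize >= storage_object_count


if __name__ == "__main__":
    # Made-up values: 1,024 object cache entries versus 3,500 _StorageObject records.
    print(omsize_is_sufficient(1024, 3500))   # too small, OM latch activity expected
    print(omsize_is_sufficient(4096, 3500))   # meets the rule from the slide
```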

48 11. Latch Counts – LRU Chain
The total number of Locks for LRU is the second highest of all the Latches shown (BHT – Buffer Hash Table – is #1)
The # of Naps per Second is the highest of all latches (Zero is ideal)
01/31/ Activity: Latch Counts 00:05: /31/13 00:00 to 01/31/13 00:05 (5 min 0 sec) ----- Locks Busy Naps Owner Total /Sec /Sec Pct /Sec MTX OM BHT CPQ LRU LRU(2) BUF BUF BUF BUF

49 11. Latch Counts – LRU Chain
The # of locks on the second LRU (Alternate Buffer Cache) is zero because all the ABC blocks completely fit in the amount of –B2 memory allocated (or the –B2 is not implemented)
01/31/ Activity: Latch Counts 00:05: /31/13 00:00 to 01/31/13 00:05 (5 min 0 sec) ----- Locks Busy Naps Owner Total /Sec /Sec Pct /Sec BHT CPQ LRU LRU(2) BUF BUF

50 11. Latch Counts – Owner
Owner column: if the User# doesn’t change (in value or frequency) that can be a problem indicator
  Latches should be held for only a fraction of a second
  It is unusual for the same Client to be ‘dominant’ on a particular latch
01/31/ Activity: Latch Counts 00:05: /31/13 00:00 to 01/31/13 00:05 (5 min 0 sec) ----- Locks Busy Naps Owner Total /Sec /Sec Pct /Sec BHT CPQ LRU LRU(2) BUF BUF

51 Using Latch Counts to set -spin
Short answer – Forget It, exercise in futility
  If it was that easy Progress would have done it already
  Past attempts have not been successful
  Also the optimal value of –spin is not going to be the same for each type of Latch
General guidelines:
  Greater than 1,000
  Less than 50,000
Current Default: 6,000 * (# of CPU Cores)
  Default not advised if you have 16 Cores or more
Dan’s Formula (Patent Pending): (DBA-Birthday-Year * )
Gus’s “formula”: 5,000
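The general guidelines and the default formula quoted above can be written down as a small Python sketch. This only restates the numbers on the slide; the function names are made up, and nothing here tries to compute an "optimal" –spin, which the slide says is futile.

```python
def default_spin(cpu_cores: int) -> int:
    """Default quoted on the slide: 6,000 * (# of CPU Cores)."""
    if cpu_cores < 1:
        raise ValueError("need at least one core")
    return 6000 * cpu_cores


def spin_within_guidelines(spin: int) -> bool:
    """General guidelines from the slide: greater than 1,000 and less than 50,000."""
    return 1000 < spin < 50000


def default_is_advisable(cpu_cores: int) -> bool:
    """The slide advises against the default with 16 cores or more."""
    return cpu_cores < 16


if __name__ == "__main__":
    for cores in (4, 8, 16):
        spin = default_spin(cores)
        print(cores, spin, spin_within_guidelines(spin), default_is_advisable(cores))
```

At 16 cores the default works out to 96,000, well above the 50,000 guideline, which lines up with the advice not to rely on the default on larger machines.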

52 15. Buffer Lock Queue
The ‘normal’ R&D Blocked Clients screen does not show the Area that the DBKEY belongs to; remember DBKEYs are unique to an Area, not a DB
The debghb Buffer Lock Queue (BLQ) screen shows the Area as well as the Block Type
Examples on the next two slides

53 R&D – 1. Status – 4. Processes/Clients – 2. Blocked Clients
The R&D Blocked Clients screen doesn’t show enough information to identify the Object involved in this contention storm for DBKEY
There were 29 Clients all blocked on the same DBKEY
Where is the problem?
01/31/ Status: Blocked Clients 00:26:41
Usr Name Type Wait Wait Info Trans id Login time
730 _AUTO-B SELF/ABL BKSH /30/13 23:22
735 _AUTO-B SELF/ABL BKSH /30/13 23:23
743 _AUTO-B SELF/ABL BKSH /30/13 23:22
747 _AUTO-B SELF/ABL BKSH /30/13 23:22
749 _AUTO-B SELF/ABL BKSH /30/13 23:23
755 _AUTO-B SELF/ABL BKSH /30/13 23:23
769 _AUTO-B SELF/ABL BKSH /30/13 23:22
…….

54 15. Buffer Lock Queue (BLQ)
IF there is a matching DBKEY on the BLQ screen, we can get the Area# and the Block Type (I = Index)
There were 29 Clients on the Blocked Clients screen accessing this DBKEY and only 4 on the BLQ screen with the same DBKEY (remember, target is moving fast!)
01/31/ Status: Buffer Lock Queue 00:26:41
User DBKEY Area T Status Type Usect
I LOCKED SHARE
I LOCKED SHARE
I LOCKED SHARE
I LOCKED SHARE
I LOCKED SHARE
<lines unrelated to DBKEY snipped>

55 Solution
Wrote a script to invoke promon every few seconds and dump only the Blocked Clients & BLQ screens
The ‘every few seconds’ frequency improves the likelihood of catching the problem but produces a huge amount of data
Then wrote a 4GL program to parse the data and find which DBKEY/Areas were the hot spots
The ultimate source of the problem was dozens of Clients updating an indexed field in the same record
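The parsing step can be sketched as follows. The original was a 4GL program; this is a Python stand-in, not that program. It assumes each captured BLQ row carries seven whitespace-separated fields in the column order shown on the screen (User, DBKEY, Area, T, Status, Type, Usect); the sample rows and their numbers are invented for illustration.

```python
from collections import Counter


def parse_blq_line(line: str):
    """Return (dbkey, area, block_type) for a BLQ data row, or None for anything else."""
    fields = line.split()
    if len(fields) != 7:
        return None
    _user, dbkey, area, block_type, _status, _lock_type, _usect = fields
    try:
        return int(dbkey), int(area), block_type
    except ValueError:
        # Header rows such as "User DBKEY Area ..." end up here.
        return None


def hot_spots(captured_text: str, top: int = 5):
    """Count how often each (DBKEY, Area, Block Type) appears across all captured screens."""
    counts = Counter()
    for line in captured_text.splitlines():
        parsed = parse_blq_line(line)
        if parsed is not None:
            counts[parsed] += 1
    return counts.most_common(top)


# Invented sample in the assumed layout; real captures would be far larger.
SAMPLE = """\
User DBKEY Area T Status Type Usect
 730 123456 12 I LOCKED SHARE 4
 735 123456 12 I LOCKED SHARE 4
 743 123456 12 I LOCKED SHARE 4
  -1 123456 12 I LOCKED SHARE 4
 901 777 8 I LOCKED SHARE 1
"""

if __name__ == "__main__":
    for (dbkey, area, block_type), hits in hot_spots(SAMPLE):
        print(f"DBKEY {dbkey} in Area {area} (type {block_type}): {hits} sightings")
```

Keying the counter on DBKEY and Area together matters because, as noted earlier, DBKEYs are unique to an Area, not a DB.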

56 Questions? dforeman@bravepoint.com dforeman@progress.com
Thank You! Questions? (not checked very often) (I own the domain)

