Database Monitoring With

1 Database Monitoring With
Tom Bascom President, Greenfield Technologies

2 Agenda
  Why do you need a monitor?
  Monitoring Alternatives
  What Are VSTs?
  A Monitoring Architecture
  Customizing And Extending The Code
  Basic Capabilities
  Advanced Features

3 Why Do You Need A Monitor?
  Baselining
  Benchmarking
  Interactive troubleshooting
  Capacity management
  Resource Optimization

5 Monitoring Alternatives
  SAR, vmstat, iostat
  Glance, TOPAS, Navisphere, Measureware, PerfMon …
  TOP, NMON
  PROMON
  Fathom
  ProMonitor
  ProTop!

6
  Progress Focused
  Interactive, Real-Time
  Sample Oriented
  Multi-platform
  VST Based
  4GL Code
  Open Source
  Free!

8 What Are VSTs?
  Virtual System Tables
  A 4GL View of Progress Data Structures (the same as those shown in PROMON).
  No Performance Impact (unless you do some really dumb things!)
  Primarily Read-Only
  Not Terribly "User Friendly"
  Quirky at times…
  Dumb things: for each _block… for each _lock…

9 Some VST Quirks
  Updateable:
    _startup._spin
    Private buffers
    APW settings
  Table & Index Ranges:
    -tablebase, -tablerangesize
    -indexbase, -indexrangesize
    Table & Index Window can be reset!
  Quirky Keys:
    _myconnection…
    _tablestat & _indexstat

10 User Number/Id VST Confusion…
    find _myconnection no-lock.
    find _connect no-lock where _connect-usr = _myconn-userid.
    display _connect-usr _connect-id _myconn-userid.
    find _userio no-lock where _userio-usr = _connect-usr.
    display _userio-id _userio-usr.

    User-Id _Connect-Id MyConn-UserId  _UserIO-Id         Usr
    ======= =========== ============= =========== ===========

11 Table Stats
    /** This does NOT work if -tablebase <> 1!!!
    find _File no-lock where _File._File-num = p_tbl.
    find _TableStat no-lock where _TableStat-id = p_tbl.
    display p_tbl _file-num _TableStat-id.
    **/

    /*** instead, use the following: ***/
    find _File no-lock where _File._File-num = _TableStat-id.

  -tablebase <> 1 is the problem.
  _tablestat-id will be magically fixed when a _tablestat record is retrieved.
  Bug? Or well-known "feature"?

12 Index Name
    find _IndexStat no-lock where _IndexStat-id = p_idx.
    find _Index no-lock where _Index._Idx-num = _IndexStat-id.
    find _File where recid( _File ) = _Index._File-recid.

    /* "P" marks the primary index, "U" marks a unique index */
    tt_index.idxnote = _File._File-name + "." + _Index._Index-name +
        ( if _file._prime-index = recid( _index ) then " P" else " " ) +
        ( if _index._unique then "U" else "" ).

14 A Monitoring Architecture
  VST Based
  Multi-Platform
    UNIX Character
    HTML
    Windows GUI
  Using Publish & Subscribe
  More than just a VST Browser!
  Customizable!

15 A Monitoring Architecture
Need to add some pub & sub animation… but animation always breaks 

17 Customizing And Extending The Code
  Events That A Module Handles
  Structure Of A Module
  Defining the Display
  Maintaining State
  Adding Help
  Making A Module Available

18 Events That A Module Handles
  Mon-Restart
    Empty Temp-Table
    Remove self from memory
  Mon-Init
    Define Display Data Elements
  Mon-Update
    Refresh Data
    Calculate intervals, rates and so forth
    Update UI Temp-Table with results

19 Structure Of A Module
    {lib/protop.i}

    def var support as character no-undo initial "Resources".

    {lib/tt_xstat.i}

    procedure mon-restart:
      empty temp-table tt_xstat.
      delete procedure this-procedure.
    end.

    procedure mon-init:
      empty temp-table tt_xstat.
      /* define labels */
    end.

    procedure mon-update:
      /* the real work */
    end.

    subscribe to "mon-restart" anywhere run-procedure "mon-restart".
    subscribe to "mon-init"    anywhere run-procedure "mon-init".
    subscribe to "mon-update"  anywhere run-procedure "mon-update".

    publish "register-disp-type" ( input support ).
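
The module skeleton above relies on 4GL named events. As a rough, language-neutral illustration of that flow, here is a small Python model. It is only a sketch: `EventBus`, `ResourceModule` and `unsubscribe_all` are hypothetical names invented for this example, not part of ProTop, and the 4GL `delete procedure this-procedure` is approximated by dropping the module's subscriptions.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class EventBus:
    """Tiny stand-in for 4GL named events (publish / subscribe ... anywhere)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[..., None]]] = defaultdict(list)

    def subscribe(self, event: str, handler: Callable[..., None]) -> None:
        self._handlers[event].append(handler)

    def unsubscribe_all(self, owner: object) -> None:
        # Drop every bound-method handler that belongs to `owner`.
        for event, handlers in self._handlers.items():
            self._handlers[event] = [
                h for h in handlers if getattr(h, "__self__", None) is not owner
            ]

    def publish(self, event: str, *args: object) -> None:
        # Iterate over a copy so handlers may unsubscribe while running.
        for handler in list(self._handlers[event]):
            handler(*args)


class ResourceModule:
    """Hypothetical module mirroring the three events on the slide."""

    def __init__(self, bus: EventBus) -> None:
        self.bus = bus
        self.support = "Resources"
        self.rows: dict = {}      # plays the role of tt_xstat
        self.labels: list = []
        self.updates = 0
        bus.subscribe("mon-restart", self.mon_restart)
        bus.subscribe("mon-init", self.mon_init)
        bus.subscribe("mon-update", self.mon_update)
        bus.publish("register-disp-type", self.support)

    def mon_restart(self) -> None:
        self.rows.clear()               # empty temp-table
        self.bus.unsubscribe_all(self)  # "remove self from memory"

    def mon_init(self) -> None:
        self.rows.clear()
        self.labels = ["Id", "Resource", "Locks", "Waits", "Lock%"]

    def mon_update(self) -> None:
        self.updates += 1               # the real work would go here


registered: list = []
bus = EventBus()
bus.subscribe("register-disp-type", registered.append)
module = ResourceModule(bus)
bus.publish("mon-init")
bus.publish("mon-update")
bus.publish("mon-restart")
bus.publish("mon-update")  # ignored: the module has removed itself
print(registered, module.labels, module.updates)
```

The point of the design is that the monitor core never calls a module directly; it only publishes events, so dropping a new module into place requires no change to the core.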

20 Defining the Display
    ui-define-label( support, 1, 1, "xid",        " Id" ).
    ui-define-label( support, 1, 2, "xname",      "Resource " ).
    ui-define-label( support, 1, 5, "stat1",      " Locks" ).
    ui-define-label( support, 1, 6, "stat2",      " Waits" ).
    ui-define-label( support, 1, 8, "stat-ratio", " Lock%" ).

    ui-define-label(
      support,       /* display type      */
      1,             /* variant           */
      8,             /* order             */
      "stat-ratio",  /* data element name */
      " Lock%"       /* label value       */
    ).

21 Maintaining State
    define temp-table tt_xstat no-undo
      field xid        as integer
      field xvalid     as logical
      field xname      as character
      field misc1      as character
      field misc2      as character
      field stat1      as integer extent 5
      field stat2      as integer extent 5
      field stat3      as integer extent 5
      field stat-ratio as decimal
      index xid-idx is unique primary xid.

  This is a general purpose TT useful for a lot of ProTop stats.
  Xid keeps track of the instance of something being tracked: latch id, userid, etc.
  Xvalid determines if the instance is valid (need good example).
  Xname is a descriptive name of the instance.
  Stat# is a metric being tracked. The 5 extents are:
    [1] = base
    [2] = last (previous cumulative value)
    [3] = this
    [4] = cumulative
    [5] = interval
  This allows sampling to proceed in a straightforward manner.
  Stat-ratio is the ratio between stat1 & stat2.
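
To make the five extents concrete, the following Python sketch shows one plausible way a raw counter could be folded into them on each sample. The update rule is an assumption inferred from the extent names above, not taken from the ProTop source, and the `Stat` class is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stat:
    """Hypothetical mirror of one 5-extent stat field in tt_xstat."""
    base: Optional[int] = None  # [1] value seen at the first sample
    last: int = 0               # [2] value seen at the previous sample
    this: int = 0               # [3] value seen at the current sample
    cumulative: int = 0         # [4] change since base
    interval: int = 0           # [5] change since the previous sample

    def update(self, raw: int) -> None:
        """Fold a new raw counter reading into the five extents."""
        if self.base is None:
            # First sample: nothing to compare against yet.
            self.base = raw
            self.this = raw
        self.last = self.this
        self.this = raw
        self.cumulative = self.this - self.base
        self.interval = self.this - self.last


locks = Stat()
for reading in (1000, 1250, 1300):
    locks.update(reading)
print(locks)
```

Keeping both `base` and `last` is what lets a single record serve the "summary" view (since monitoring started) and the "sample" view (since the previous refresh) at the same time.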

22 Sample, Summary, Rate & Raw Data
  BaseValue  LastValue  ThisValue  SampleTime  SummaryTime

    SampleRate  = ( ThisValue - LastValue ) / SampleTime.
    SummaryRate = ( ThisValue - BaseValue ) / SummaryTime.
    SampleRaw   = ( ThisValue - LastValue ) / 1.
    SummaryRaw  = ( ThisValue - BaseValue ) / 1.

  "s" = sample   "S" = Summary   "r" = rate   "R" = raw
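
The four formulas can be checked with a few lines of Python. This is a minimal sketch; the function name, the argument names and the sample numbers are illustrative only, and times are assumed to be in seconds.

```python
def display_values(base_value, last_value, this_value, sample_time, summary_time):
    """Return the four display variants from the slide for one counter."""
    if sample_time <= 0 or summary_time <= 0:
        raise ValueError("elapsed times must be positive")
    return {
        "sample_rate": (this_value - last_value) / sample_time,
        "summary_rate": (this_value - base_value) / summary_time,
        "sample_raw": this_value - last_value,
        "summary_raw": this_value - base_value,
    }


# Counter was 1000 at start, 1250 at the previous sample, 1300 now;
# 10 seconds since the previous sample, 100 seconds since start.
values = display_values(1000, 1250, 1300, sample_time=10, summary_time=100)
print(values)
```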

23 Updating Data
    for each dictdb._Resrc no-lock:
      run update_xstat (
        input _Resrc-Id,
        input _Resrc-name,
        input "",
        input "",
        input _Resrc-lock,
        input _Resrc-wait,
        input 0
      ).
    end.

    ui-det( support, 1, i, 1, "xid",        string( tt_xstat.xid, ">>9" )).
    ui-det( support, 1, i, 2, "xname",      string( tt_xstat.xname, "x(20)" )).
    ui-det( support, 1, i, 5, "stat1",      string( ( tt_xstat.stat1[x] / z ), ">>>>>>>>>9" )).
    ui-det( support, 1, i, 6, "stat2",      string( ( tt_xstat.stat2[x] / z ), ">>>>>>>>>9" )).
    ui-det( support, 1, i, 8, "stat-ratio", string( tt_xstat.stat-ratio, ">>9.99%" )).

24 Adding Help
  Help files are in the "hlp" directory.
  File name is value( "hlp/" + support + ".hlp" )
  Title the screen.
  Provide an overview of the screen.
  Try to explain why the metrics are important and how they are related to other metrics.
  Define each label and give some insight into its meaning.
  Provide explanations of any codes that might appear under a label.

25 FileIO.hlp
  IO Operations to Database Extents

  Id:          The extent id number.
  Extent Name: The file name of the extent.
  Mode:        The "mode" in which the file is opened. Possible values are:
                 BUFIO    The extent is opened for buffered IO.
                 UNBUFIO  The extent is opened for un-buffered IO.
                 BOTHIO   The extent is opened for both buffered and un-buffered IO.
               Variable extents are opened with BOTHIO (there are two file
               descriptors unless you're using -directio.)
  BlkSz:       The Block size for the extent. This potentially varies between
               data, before-image and after-image extents. Values are
               expressed in bytes.

26 Making A Module Available
  Drop it into the "mon/" directory.
    "mon/mymetric.p"
  If it is OS specific use the "os/" directory.
    "os/AIX/df.p"
    "os/Linux/netstat.p"
  Send me a copy so that I can include it in the base distribution!
  If ProTop is already running, restart! Lower-case "w".

28

29 Basic Capabilities
  Summary Data
  Blocked Clients & Open Transactions
  Table & Index Activity
  User Activity
  Estimating Big B
  Latches & Resources
  Storage Area Capacity
  Balancing IO
  Clients & Servers

30 Summary Data
  11:32:52     ProTop xvi -- Progress Database Monitor     07/05/05
  Sample               sports [/db/sports]                 Rate
  Hit Ratio: 182:1 195:       Commits:               Sessions: 2057
  Miss%    : % 0.512%         Latch Waits:           Local:     953
  Hit%     : 99.45% %         Tot/Mod Bufs:          Remote:    956
  Log Reads:                  Evict Bufs:            Batch:    1045
  OS Reads:                   Lock Table:            Server:     97
  Rec Reads:                  LkHWM|OldTrx: :00      Other:      51
  Log/Rec:                    Old/Curr BI:           TRX:        26
  Area Full: %                After Image: Disabled  Blocked:     0

31 BI Clusters
    for each _Trans no-lock where _Trans-usrnum <> ?:
      if _Trans-counter <> ? and _Trans-counter > 0 then
        do:
          if oldbi = 0 or _Trans-counter < oldbi then oldbi = _Trans-counter.
          currbi = max( currbi, _Trans-counter ).
        end.
    end.

    find _BuffStatus no-lock.
    currbi = _BfStatus-LastCkpNum.
    if oldbi = 0 then oldbi = currbi.   /* if no TRX is active… */

  Showing the case where we do NOT know the base cluster#.

    /* Adjust to align with checkpoint numbers -- if we can. _Trans-counter is the
     * checkpoint# since the last "truncate bi". Unfortunately the only way to map
     * from that number to _BfStatus-LastCkpNum involves starting a transaction in
     * mon-init(). And that is optional since ProTop shouldn't require the user to
     * write to a monitored database. So if that option isn't enabled (chkp-base = ?)
     * we have to live with the fact that we can't map them. Which means that we're
     * showing 0 at times when the real active checkpoint is probably something else.
     * That's not as bad as it sounds though because it should only happen when there
     * are no active transactions which means that, while we don't know what the
     * actual current checkpoint # is, we also only have one active checkpoint so
     * there's nothing to be excited about in these two numbers. (The whole point
     * of this metric is to show growth in the number of active bi clusters.)
     *
     */

32 Blocked Sessions
  Blocked Sessions
  Usr Name    Waiting   Note
   24 tom     :00:32    REC XQH 102 [Order] julia, peter
   22 tucker  00:00:02  REC XQH 201 [Cust] astro, tiger
  321 julia   00:00:00  BKSH: :

  REC means record lock; XQH are the lock flags explained in the help text.

  Note: The reason that the session is blocked. The most frequent reason is a
  record lock. This will be displayed as "REC" followed by flags, the RECID of
  the record, the table name of the record and a list of users queued for that
  record (the list may be longer than can be shown on the screen.) Possible
  values of the flags are:
    S  Share Lock
    X  Exclusive Lock
    U  Upgraded Share Lock
    L  Limbo Lock
    Q  Queued Lock
    H  Hold Flag
  Other blocked states (such as BKSH and RGET) may be shown but are relatively
  rare. The "Resource Waits" screen has additional data about these waits and
  the frequency of requests and waits.

  102 = RECID of the locked record.
  Order = name of the table that we're blocked on.
  "julia, peter" is a comma-delimited list of users queued for the record. The
  first user is the current holder; everyone listed is ahead of the blocked user.

33 Locked Records
    for each _Lock no-lock while _Lock-usr <> ?:
      if _Lock-recid = _Connect-wait1 then
        do:
          /* look up the name of the table that owns the contested record */
          find _file where _file._file-num = _Lock-table.
          bxtbl = _file._file-name.
        end.
      if _Lock-usr = _Connect-usr then
        bxwait = bxwait + " " + _Lock-flags.   /* our own (blocked) request */
      else
        bxque = bxque + " " + _Lock-name.      /* other users in the queue  */
      bxnote = bxtbl + bxwait + bxque.
    end.

34 Open Transactions
  Open Transactions
  Usr Name   TRX Num  BI Clstr  Start    Trx Stat  Duration  Wait
    9 tom                       :39:05   ACTIVE    00:00:
   20 jami                               ALLOCATE  00:00:
    5 emily                     :39:06   ACTIVE    00:00:
    7 peter                     :39:06   ACTIVE    00:00:
   23 julia                              ALLOCATE  00:00:
   22 astro                              ALLOCATE  00:00:

  Trx Stat values: Allocated, Active, Dead, Committing, Preparing, Prepared
  (last 3 are 2pc related…)

  Open Transactions Help
  ======================
  Open Transactions displays all transactions -- including those whose status
  is "Allocated". Allocated transactions may not turn into "real" transactions
  if no data is actually updated and are considered a nuisance by some.

  The following fields are displayed for each active transaction:

  Usr:      The Usr# of the connection that owns the transaction.
  Name:     The User Name associated with the connection.
  TRX Num:  The unique id of this transaction.
  BI Clstr: The BI Cluster where this transaction has been opened. BI Clusters
            are numbered sequentially since the last "truncate bi" operation.
            If the difference between the oldest BI cluster number and the
            current BI cluster number is large then you have active
            transactions that span large periods of time -- this is bad in
            itself. You may also be experiencing unusual amounts of BI file
            growth since BI clusters cannot be reused while they still contain
            an active transaction. If there are no active transactions the
            cluster numbers will show as 0.

            The numbers here do not directly correspond to checkpoint numbers
            by default. BI cluster numbers are stored as a sequence since the
            last "truncate BI" was executed and can be aligned to checkpoint
            numbers but that requires a database transaction to accomplish.
            There is experimental code that can be enabled to do this in
            lib/protop.i but these values convey the important data well
            enough.
  Start:    The time when this transaction became active. (It may be a
            different day! If so it isn't displayed.)
  Trx Stat: The status of this transaction.
  Duration: How long has this transaction been active?
  Wait:     What, if anything, are we waiting for? Record locks are typical.
            They are indicated by "REC" followed by the RECID of the record.
            The "Blocked" screen gives more information about such a situation.

35 Table Activity
  Table Statistics
  Tbl# Table Name      Create  Read  Update  Delete
     4 OrderLine
    18 Order
    24 POLine
    23 PurchaseOrder
    21 Bin
     2 Customer
     1 Invoice

36 Index Activity
  Index Statistics
  Idx# Index Name     Create  Read  Split  Delete  BlkDel
   904 usage
    78 journal   P
   435 keyindex
   388 icest     PU
  1251 keyindex
  1247 warehs    U
   900 stuff     PU

37 User IO Activity
  UIO
  Usr Name     Flags  PID  DB Access  OS Rd  OS Wr  Hit%
   13 tom      SB                                   %
   10 jami     SB                                   %
   16 julia    SB                                   %
   17 peter    SB                                   %
   15 emily    SB                                   %
   11 tiger    SB*                                  %
   14 tucker   SB                                   %
   19 granite  SB                                   %
    7 astro    SB                                   %

38 Estimating Big B
  Big B GuessTimator
   Pct  Big B  % db Size  Hit:1  Miss%  Hit%  OS Rd
   10%         %                 %      %      1343
   25%         %                 %      %       849
   50%         %                 %      %       601
  100%         %                 %      %            <=
  150%         %                 %      %       347
  200%         %                 %      %       300
  400%         %                 %      %       213

39 Big B
  http://www.peg.com/lists/dba/history/200301/msg00509.html

    MissPct = 100 * ( 1 - (( LogRd - OSRd ) / LogRd )).
    HitPct  = 100 - MissPct.
    OSRd    = LogRd * ( MissPct / 100 ).
    m2      = m1 * exp(( b1 / b2 ), 0.5 ).
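
A short Python sketch of the same arithmetic follows. It assumes that `m1` and `b1` are the current miss percentage and buffer count and that `m2` is the miss percentage predicted for `b2` buffers; the slide does not spell this out, so treat that reading as an assumption. The 4GL `exp( x, 0.5 )` is a square root, so the prediction scales the miss rate by the square root of the buffer ratio. Function names and the input numbers are illustrative.

```python
import math


def miss_pct(log_rd: float, os_rd: float) -> float:
    """Percentage of logical reads that required an OS read."""
    return 100.0 * (1.0 - (log_rd - os_rd) / log_rd)


def predict_miss_pct(m1: float, b1: float, b2: float) -> float:
    """Scale miss percentage m1 at b1 buffers to b2 buffers (square-root rule)."""
    return m1 * math.sqrt(b1 / b2)


def predicted_os_reads(log_rd: float, miss: float) -> float:
    """OS reads implied by a miss percentage at a given logical read rate."""
    return log_rd * (miss / 100.0)


current_miss = miss_pct(log_rd=10_000, os_rd=100)         # about 1%
quadrupled = predict_miss_pct(current_miss, 1000, 4000)   # 4x the buffers
print(round(current_miss, 3), round(quadrupled, 3),
      round(predicted_os_reads(10_000, quadrupled), 1))
```

The OS Rd figures that survive on the GuessTimator slide fit this rule: going from 25% to 400% of the current Big B is a 16x change, and 849 / 213 is roughly 4, the square root of 16.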

40 Resource Waits
  Resource Waits
  Id Resource           Locks  Waits  Lock%
  10 DB Buf S Lock                    %
   6 Record Get                       %
   7 DB Buf Read                      %
   2 Record Lock                      %
  11 DB Buf X Lock                    %
  19 TXE Share Lock                   %
   8 DB Buf Write                     %
  21 TXE Commit Lock                  %
   1 Shared Memory                    %
   3 Schema Lock                      %

41 Latch Waits
  Latch Waits
  Id Latch     Requests  Waits  Lock%
  28 MTL_BF                     %
  17 MTL_BHT                    %
  21 MTL_LRU                    %
  10 MTL_LHT                    %
  15 MTL_LKF                    %
  26 MTL_BF                     %
  27 MTL_BF                     %
  25 MTL_BF                     %
   4 MTL_OM                     %

42 Storage Area Capacity
  Area Statistics
  A# Area Name     Alloc  Var  Hi Water  Free  %Used  Note
  68 order_idx                                 %      i(3)
  67 order                                     %      t(1)
   6 Schema Area                               %      i(25) *
   3 BI Area                                   %
  13 customer                                  %      t(15)
  92 After Image                               %      Busy
  49 order-line                                %      t(1)
  61 inventory                                 %      t(1)
  55 discount                                  %      t(1)
  57 employee                                  %      t(1)

  ** = tables/indexes in the schema area!
  t(##), i(##)

43 Storage Area Capacity
    for each _AreaStatus no-lock,
        _Area no-lock where _Area._Area-num = _AreaStatus._AreaStatus-Areanum:

      bfree = _AreaStatus-Totblocks - _AreaStatus-Hiwater.
      if ( _AreaStatus-Freenum <> ? ) then bfree = bfree + _AreaStatus-Freenum.
      if bfree = ? then bfree = _AreaStatus-totblocks.
      used = (( _AreaStatus-totblocks - bfree ) / _AreaStatus-totblocks ) * 100.

    end.
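
The same calculation can be traced in Python, with `None` standing in for the 4GL unknown value (`?`). This is a sketch only: the function and argument names are made up for the example, and the guard against a zero block count is an addition that the slide's code does not have.

```python
from typing import Optional


def area_pct_used(tot_blocks: int,
                  hi_water: Optional[int],
                  free_num: Optional[int]) -> float:
    """Percent of an area in use, following the slide's logic step by step."""
    # Blocks above the high water mark have never been used.
    free = None if hi_water is None else tot_blocks - hi_water
    # Blocks on the free chain below the high water mark are also available.
    if free is not None and free_num is not None:
        free += free_num
    # Unknown propagates in the 4GL; the slide then treats the area as empty.
    if free is None:
        free = tot_blocks
    if tot_blocks == 0:  # guard added for this sketch only
        return 0.0
    return (tot_blocks - free) / tot_blocks * 100


print(area_pct_used(1000, 800, 50))   # 250 blocks free
print(area_pct_used(1000, None, 50))  # high water mark unknown
```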

44 Storage Area Contents
    for each _storageobject no-lock where
        _storageobject._area-number = xid and
        _storageobject._object-num > 0 and
        _storageobject._object-associate > 0:

      if _storageobject._object-type = 1 then
        so_tbl = so_tbl + 1.
      else if _storageobject._object-type = 2 then
        so_idx = so_idx + 1.

    end.

    /* ianum = initial area number… */

45 Balancing IO
  Database File IO
   Id Ext Name    Mode     Blksz  Size  Read  Wrt  Ext
   63 s2k_29.d1   F UNBUF
   64 s2k_29.d2   F UNBUF
  124 s2k_55.d2   F UNBUF
  125 s2k_55.d3   F UNBUF
  123 s2k_55.d1   F UNBUF
   67 s2k_30.d1   F UNBUF
   57 s2k_26.d1   F UNBUF
  128 s2k_56.d1   F UNBUF
  135 s2k_57.d6   F UNBUF
  140 s2k_58.d2   F UNBUF
  121 s2k_54.d1   F UNBUF
  139 s2k_58.d1   F UNBUF
  134 s2k_57.d5   F UNBUF
   69 s2k_31.d1   F UNBUF
   73 s2k_33.d1   F UNBUF
    3 s2k.b2      V UNBUF

  Note the modes.

46 Servers and Clients
  Servers
  Srv Type   Port  Con  Max  MRecv  MSent  RRecv  RSent  QSent  Slice
    1 Login
    2 Auto
    3 Auto

  Server IO
  Srv Type   Port  Con  Max  DB Access  OS Rd  OS Wr  Hit%
   19 Auto                                            %
   20 Auto                                            %
   18 Auto                                            %
   16 Auto                                            %

48 Drill Down
  User Details
  Usr#:    Name: tom    PID:    Device: /dev/pts/3
  Transaction: Jul 7 15:20: ACTIVE 00:00:45 REC 5892
  Blocked On:  REC XQH 5892 [Customer] peter

  User 23's Other Sessions
  Usr Name  Flags  PID  DB Access  OS Rd  OS Wr  Hit%
   23 tom   S *                                  %
    0 tom   O                                    %
   22 tom   S                                    %
   24 tom   S                                    %

  Per user table & index stats!
  4gl procedure stack!

    find _myconnection no-lock.
    find _connect no-lock where _connect-usr = _myconn-userid.
    display _connect-usr _connect-id _myconn-userid.
    find _userio no-lock where _userio._userio-usr = _connect-usr.
    display _userio-id _userio-usr.

    User-Id _Connect-Id MyConn-UserId _UserIO-Id Usr

49 ProTop Alerts

50 Alerts & Alarms
    # $PROTOP/etc/alert.cfg
    #
    # Metric    Type ?  Target Message     Action
    # ========= ==== == ====== =========== ====================
    LogRd       num  >         "&1 &2 &3"  alert-log
    OSRd        num  >         "&1 &2 &3"  alert-log
    BufFlsh     num  >         "&1 &2 &3"  alert-log,alert-mail
    Trx         num  >         "&1 &2 &3"  alert-log,alert-mail
    LatchTMO    num  >         "&1 &2 &3"  alert-log,alert-mail
    ResrcWt     num  >         "&1 &2 &3"  alert-log,alert-mail
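
How such a rule might be evaluated can be sketched in Python. Everything here is an assumption made for illustration: the target values did not survive in this transcript, so the threshold of 10000 is invented; the mapping of `&1`, `&2`, `&3` to metric name, current value and target is a guess; and `parse_alert_line` and `check` are hypothetical helpers, not ProTop code.

```python
import operator
import shlex
from typing import Optional

# Comparison operators that the "?" column might contain (assumed set).
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
       "<=": operator.le, "=": operator.eq}


def parse_alert_line(line: str) -> Optional[dict]:
    """Split one config line into its six columns; None for comments/blanks."""
    fields = shlex.split(line, comments=True)  # keeps "&1 &2 &3" as one field
    if not fields:
        return None
    metric, kind, op, target, message, actions = fields
    return {"metric": metric, "type": kind, "op": op, "target": target,
            "message": message, "actions": actions.split(",")}


def check(rule: dict, value: float) -> Optional[str]:
    """Return the formatted alert message if the rule fires, else None."""
    if not OPS[rule["op"]](value, float(rule["target"])):
        return None
    # Assumed substitution order: &1 = metric, &2 = value, &3 = target.
    return (rule["message"]
            .replace("&1", rule["metric"])
            .replace("&2", str(value))
            .replace("&3", rule["target"]))


# Hypothetical rule: the target 10000 is NOT from the original slide.
rule = parse_alert_line('LogRd num > 10000 "&1 &2 &3" alert-log,alert-mail')
print(check(rule, 12000), rule["actions"])
```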

51 Summary
  Reasons to monitor.
  Some tools that are available for monitoring.
  How Progress VSTs work.
  An architecture for monitoring.
  How to modify and extend ProTop.
  What ProTop can do for you "out of the box".
  What is "under the covers" of ProTop.
  How to use VSTs more effectively.
  The Stuff That We've Learned Today!

52 ? Questions

53 Thank you for your time.
  tom@greenfieldtech.com
  http://www

