Mining Gold from the RMF Data Mountain Ivan L. Gelb Gelb Information Systems Corp. Phone: 732-303-1333 CMG ‘007 – San Diego, CA.


1 Mining Gold from the RMF Data Mountain Ivan L. Gelb Gelb Information Systems Corp. Phone: CMG ‘007 – San Diego, CA

2 © 2007 Gelb Information Systems Corp Think Faster with Gelb Information
TRADEMARKS
• The following are trade or service marks of the IBM Corporation: CICS, CICS TS, CICSPlex, DB2, IBM, MVS, OS/390, z/OS, Sysplex, Parallel Sysplex. Any omissions are purely unintended.
• © 2007 Gelb Information Systems Corp. URL: Phone: No part of this material can be reproduced by any means without prior written permission from the author and with proper attribution displayed.

3 MOTHER OF ALL DISCLAIMERS (MOAD)
• All of the information in this document is tried and true. However, this fact alone cannot guarantee that you can get the same results at your place and with your skills. In fact, some of this advice can be hurtful if it is misused or misunderstood. As with all kinds of analysis, anything you may hear or read can be understood and misunderstood in many ways that may seem contradictory to you. Gelb Information Systems Corporation, Ivan Gelb and anyone found anywhere assume no responsibility for this information’s accuracy, completeness or suitability for any purpose. Anyone attempting to adapt these techniques to their own environment does so completely at their own risk.

4 Agenda
• Your Questions…Now
• SMF & RMF Introduction
• RMF Reports Overview
• CPU Reports
• LPAR Reports
• 5 More Reports
• Drawing for attendee prizes
Note: ☞ symbol flags recommendations
PLUS: Rewards for most questions? ?

5 SMF & RMF Introduction
• SMF & RMF Data Collection
• RMF Record Types
• RMF Reports Overview
• RMF Report Types
• Monitor I Reports
• Monitor II Reports
• Monitor III Reports

6 SMF & RMF Data Collection - 1
• ERBRMF00 or 02 member for Monitor I options. Examples:
    CYCLE(250)        /* Sample every 250 msec. */
    SYNC(SMF)         /* Use SMFPRMxx time values */
• SMFPRMxx member for SMF options. Examples:
    INTVAL(mm)        /* recording interval (30) */
    SYNCVAL(mm)       /* recording synchronization (00) */
    INTERVAL(hhmmss)  /* NOINTERVAL is default for SMF 30s */
    SMF,SYNC          /* type 30s synchronized based on SYNCVAL */

7 SMF & RMF Data Collection - 2
☞ Processor overhead for record collection increases as the CYCLE value is decreased.
☞ A shorter INTVAL produces more SMF and RMF records and higher collection-related overhead.
☞ Recommended service definition coefficients:
  • MSO = 0.0
  • CPU = 1.0
  • SRB = 1.0
  • IOC = 1.0 or less by orders of 10 (0.1 or 0.01; IBM recommends 0.5)
• Note the potential impact on chargeback algorithms if they use service units in their calculations.

8 RMF Record Types Summary
• 70-1  Processor
• 70-2  Crypto processor
• 71    Paging activity
• 72-1  Workload PGN-s (compat. mode)
• 72-3  Workload service classes (goal mode)
• 73    Channel path activity
• 74-1  Device activity
• 74-2  XCF activity
• 74-5  Cache activity
• 74-7  FICON director activity
• 75    Page data set activity
• 77    Enqueue activity
• 78-2  Virtual storage activity

9 RMF Report Types
• Monitor I – 20+ real-time reports and long-term data collection
• Monitor II – 20+ activity snapshot reports
• Monitor III – 50+ interactive performance analysis reports and long-term data collection
• Other RMF data based reporting tools (downloads):
  ☞ Spreadsheet reporter
  ☞ RMF PA (Performance Analyzer)

10 Monitor I Reports
• CACHE – Cache Subsystem
• CF – Coupling Facility Activity
• CHAN – Channel Path Activity
• CPU – CPU Activity
• CRYPTO – Crypto Hardware Activity
• DEVICE – Device Activity
• DOMINO – Lotus Domino Server
• ENQ – Enqueue Activity
• FCD – FICON Director Activity
• HFS – Hierarchical File System
• HTTP – HTTP Server
• IOQ – IO Queuing Activity
• OMVS – OMVS Kernel Activity
• PAGESP – Page/Swap Data Set Activity
• PAGING – Paging Activity
• SDEVICE – Shared Device Activity
• TRACE – Trace Activity
• VSTOR – Virtual Storage Activity
• WKLD – Workload Activity (compat mode)
• WLMGL – Workload Activity (goal mode)
• XCF – Cross-system Coupling Activity
Monitor I can produce these reports at the end of each collection interval, or they can be produced by the Postprocessor component at a later time.

11 Monitor II Reports
• ARD / ARDJ – Address Space Resource Data
• ASD / ASDJ – Address Space Data
• ASRM / ASRMJ – Address Space SRM Data
• CHANNEL – Channel Path Activity
• DDMN – Domain Activity
• DEV / DEVV – Device Activity
• HFS – Hierarchical File System
• ILOCK – IRLM Long Lock Detect
• IOQUEUE – IO Queuing Activity
• LLI – Library List
• PGSP – Page/Swap Data Set Activity
• SDS – Sysplex Data Server
• SENQ – Systems ENQ Contention
• SENQR – System ENQ Reserve
• SPAG – Paging Activity
• SRCS – Central Storage / Processor / SRM
• TRX – Transaction Activity
Monitor II can produce snapshot reports on demand or at definable intervals.

12 Monitor III Reports
Monitor III can produce sysplex-wide or single-system reports of the delays experienced by a job, group of jobs, service class, TSO, OMVS, enclaves, etc. We will present just 7 of more than 50 available reports:
1. Delay Report
2. Processor Delays
3. CF Overview
4. CF Systems
5. Device Delays
6. VSAM LRU Overview
7. VSAM RLS Activity by Storage Class and by Data Set

13 Who, What, How Much, & Analysis
• RMF Delay Report
• CPU Activity Report & Analysis
• LPAR Activity Report & Analysis
• CF Activity Report and Analysis
• Workload Activity Report & Analysis
• I/O Device Activity Report & Analysis
• File I/O Activity Report & Analysis

14 M3- Which Resources Cause Delays

15 Which System Resources Cause Delays… NOTES
☞ Use to quickly establish which system resources are delaying the work. “% Delayed for” indicators are:
• PRC = in and ready, but work is not being dispatched on a CPU
• DEV = delayed for disk or tape
• STR = delayed for storage, such as COMM, LOCAL, SWAP, or XMEM, or found on the OUT & READY queue
• SUB = delayed by JES, HSM, XCF
• OPR = delayed by an operator message, a mount request, or a quiesce command by the operator
• ENQ = delayed waiting for any enqueued resource
Address space type column (CX):
  A = ASCH       O (as second character) = indicates OMVS process for this task
  B = Batch      S = Started Task
  E = Enclave    T = TSO
  O = OMVS       ? = invalid/missing data
Cr column indicates the CPU critical or Storage critical attribute for this address space.

16 CPU, LPAR & CF Activity Reports
• CPU Activity Reports
• Processor Delays
• What is Your LPAR’s Guaranteed Capacity?
• LPAR Partition Data Report
• Coupling Facility Activity (CF) Report

17 PP- CPU Activity Report - Part 1

18 PP- CPU Activity Report - Part 2

19 CPU Activity Report…. NOTES
☞ Provides 100% accurate CPU utilization figures for all LPARs together and for each LPAR individually. Use it in conjunction with workload activity measurements to establish CPU utilization capture ratios. Observe and consider:
1. ONLINE TIME – less than 100% indicates a CPU being varied online or offline. IRD or a manual process may cause this.
2. LPAR BUSY % – what % of each allocated CPU this LPAR utilized. Less than 100% indicates possible capacity issues.
3. MVS BUSY % – LPAR’s % CPU utilization. 100% should cause performance and capacity concerns if (a) anyone complains, and (b) critical workloads + SYSTEM make up 90-95%+ of the utilization.
4. QUEUE LENGTHS (%) – indicates how many others you may have to wait behind for CPU access.
5. IN READY – address spaces ready to run but no CPU available.
6. OUT READY – even worse than IN READY if the OUT-s are workloads you care about. See workload activity reports to determine the victims.
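
The capture ratio mentioned in the notes above can be sketched as follows. This is a minimal illustration, assuming the usual definition (CPU time attributed to workloads divided by the total CPU busy time measured for the LPAR). The function name and the sample figures are hypothetical and do not come from the slides.

```python
def capture_ratio(workload_cpu_seconds, lpar_busy_seconds):
    """Fraction of measured CPU busy time that is attributed to workloads.

    Assumption: workload_cpu_seconds holds per-workload CPU time taken from
    workload activity data, and lpar_busy_seconds is the busy time taken
    from the CPU Activity report for the same interval.
    """
    if lpar_busy_seconds <= 0:
        raise ValueError("lpar_busy_seconds must be positive")
    return sum(workload_cpu_seconds) / lpar_busy_seconds


# Hypothetical interval: three workloads against 800 busy seconds.
ratio = capture_ratio([410.0, 220.0, 90.0], 800.0)
print(f"capture ratio = {ratio:.2f}")
```

A ratio well below 1.0 would suggest that a noticeable share of CPU time is not attributed to any workload in the data being compared.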

20 Monitor III (M3) Processor Delays - 1

21 Monitor III Processor Delays NOTES
The Processor Delays report identifies who is delayed and by ABOUT how much.
1. DLY % = (# of Delay Samples / # of Samples) * 100 is the % of time the task is delayed from getting CPU time
2. USG % = (# of Using Samples / # of Samples) * 100 is the % of time the task is receiving CPU service
3. Holding Job(s) – up to three tasks that contributed most to the delay
Note that delays are collected via statistical sampling! The MVS reduced preemption approach is the cause of the always-present CPU delay.
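
The DLY % and USG % formulas above amount to the same sample-ratio calculation. A minimal sketch, with a hypothetical function name and made-up sample counts:

```python
def sample_percent(matching_samples, total_samples):
    """Percentage of samples in a given state, as in DLY % and USG %."""
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    return matching_samples / total_samples * 100


# Hypothetical report interval with 120 samples for one job.
dly_pct = sample_percent(30, 120)   # samples where the job was delayed for CPU
usg_pct = sample_percent(12, 120)   # samples where the job was using CPU
print(f"DLY % = {dly_pct:.1f}, USG % = {usg_pct:.1f}")
```

Because the inputs are samples, the results are estimates, which is why the slide stresses "ABOUT how much".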

22 ☞ What is Your LPAR’s Guaranteed Capacity?
• An LPAR’s share determines its physical CP capacity
• LPAR weights and the number of logical CPUs determine the share:
  Share = LCPU/Tot-PCPU * LPAR weight / ∑ LPAR weights
• Example: two LPARs, PRODA with weight 700 and PRODB with weight 300, each with access to all 10 physical CPs:
  • PRODA Share = 10/10 * 700/1000 = 0.70, a capacity of 7.0 of the 10 CPs
  • PRODB Share = 10/10 * 300/1000 = 0.30, a capacity of 3.0 of the 10 CPs
• LPAR weights are ONLY enforced if Physical CP BUSY = 100% or if the LPAR is capped by PR/SM
• If PRODA only utilizes 2.0 CPs most of the time, PRODB could get the other 8.0 CPs if it needs them! When PRODA gets busy using its maximum share, PRODB will be ☺ !
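
The share formula and the PRODA/PRODB example above can be checked with a short sketch. The function names are hypothetical. Converting the share to a number of CPs by multiplying by the physical CP count is an assumption inferred from the slide's example results (7.0 and 3.0 CPs).

```python
def lpar_share(logical_cpus, physical_cpus, weight, all_weights):
    """Share per the slide: LCPU/Tot-PCPU * LPAR weight / sum of LPAR weights."""
    return logical_cpus / physical_cpus * weight / sum(all_weights)


def guaranteed_cps(logical_cpus, physical_cpus, weight, all_weights):
    """Assumption: guaranteed capacity in CPs = share * total physical CPs."""
    return lpar_share(logical_cpus, physical_cpus, weight, all_weights) * physical_cpus


weights = {"PRODA": 700, "PRODB": 300}
for name, weight in weights.items():
    cps = guaranteed_cps(10, 10, weight, list(weights.values()))
    print(f"{name}: {cps:.1f} CPs")
```

This is only the guaranteed figure. As the slide notes, weights are enforced only when the physical CPs are 100% busy or the LPAR is capped.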

23 PP- LPAR Partition Data Report

24 LPAR Partition Data Report… NOTES
The Partition Data Report is from the RMF post processor. It is the most useful single place to see both defined and actual LPAR capacity.
1. WGT – LPAR’s weight / total defined weight is the % SHARE for which this LPAR will be dispatched by PR/SM if it needs CPU service
2. MSU DEF and ACT – defined and actual LPAR MSUs
3. CAPPING DEF – partition’s capping option
4. CAPPING WLM% – % of time WLM capped this LPAR
5. LPAR MGT – LPAR management overhead
Type = AAP for zAAP processors
Type = IIP for zIIP processors

25 CF Activity Reports and Analysis
• Data collection is controlled by the ERBRMFxx option CFDETAIL or NOCFDETAIL
• CFDETAIL collects a lot of SMF data!
• To reduce system overhead, data collection is done on only one member of the Sysplex, as decided automatically by the RMF Sysplex Data Server

26 M3- CF Activity - 1
☞ If PROCESSOR UTIL% is high (95%+???):
• Under PR/SM, dedicate CPs or add CPs to the partition
• Rebalance by moving structures to a lower utilized CF if available
• Buy more or faster CFs

27 M3- CF Activity - 2
☞ AVG SERV is in microseconds! Do compare Async. Serv. to Disk Serv.!
☞ CHNG% = percent of requests changed from sync to async
☞ DEL% = percent of requests delayed by subchannel contention or dump serialization

28 Workload Activity Reports & Analysis
• RMF Workload Measurements
• RMF Workload Activity – 1 CICS Service
• RMF Workload Activity – 2 TSO Service
• RMF Workload Activity Report Analysis

29 RMF Workload Measurements
Source: Chris Baker, IBM

30 PP- RMF Workload Activity - 1 CICS
Source: Chris Baker, IBM
PP = RMF Post processor Report
This is a sample RMF post processor (ERBRMFPP) output with option SYSRPTS(WLMGL(SCPER)).
REPORT BY: POLICY=HPTSPOL1 WORKLOAD=PRODWKLD SERVICE CLASS=CICSHR RESOURCE GROUP=*NONE PERIOD=1 IMPORTANCE=HIGH
-TRANSACTIONS--: AVG 0.00, MPL 0.00, ENDED 216, END/SEC 0.24, #SWAPS 0, EXECUTD
TRANSACTION TIME (HHH.MM.SS.TTT): ACTUAL, QUEUED, EXECUTION, STANDARD DEVIATION
RESPONSE TIME BREAKDOWN IN PERCENTAGE (rows: SUB TYPE and P; CICS BTE, CICS EXE): TOTAL, ACTIVE, READY, IDLE; WAITING FOR: LOCK, I/O, CONV, DIST, LOCAL, SYSPL, REMOT, TIMER, PROD, MISC; STATE SWITCHED TIME (%): LOCAL, SYSPL, REMOT

31 PP- RMF Workload Activity - 2 TSO Part

32 PP- RMF Workload Activity – 2 TSO Part 2

33 RMF Workload Activity – 2…. NOTES
1. CPU and STORAGE – service class attributes
2. TRANSACTIONS – number of transactions and related statistics
3. TRANS. TIME – various transaction time measures
4. DASD I/O – rate and response time components
5. SERVICE RATES
6. ☞ PAGE-IN RATES – monitor them carefully
7. ☞ MSO coefficient should be 0! Other values will produce unstable performance on zSeries processors!
8. ☞ APPL% can be greater than 100%. If a single CICS region is in the report, it can track CICS TCB saturation risk. APPL% includes AAPCP and IIPCP time!
9. AAPCP and IIPCP are the time that zAAP- and zIIP-eligible work spent on standard CPs.
10. The PROJECTCPU option in SYS1.PARMLIB member IEAOPTxx is needed for AAPCP and IIPCP.
11. APPL% calculation: IIT = I/O interrupt, HST = Hiperspace, RCT = Region Control.

34 RMF Workload Activity Report Analysis
• The response time distribution report is the best, and usually the lowest-overhead, source for designing response time goals.
• The workload activity response time distribution report can be produced for a variety of report classes in support of service policy development activities.
☞ Quick and low overhead source of service and utilization data.
☞ Watch out for “funny” samples in STATE SAMPLE BREAKDOWN (%) – WAITING FOR. Each state sample category’s value, except OTHR, is based on the last 14 non-zero values.

35 I/O Activity Reports & Analysis
• Device Activity Components
• I/O Device Activity – 1
• I/O Device Activity – 2
• Device Delays
• Device Activity Tuning
• VSAM File I/O Activity

36 ☞ Device Activity Components
• CONN = time due to data transfer
• DISC = time disconnected from the channel, consisting of SEEK and SET SECTOR, latency (wait for the record to be under the head), and RPS (obsolete with ESS – Sharks)
• PEND = I/O delays in the access path. May include delays caused by the channel, control unit, or director port. Often caused by shared DASD!
• IOSQ = wait for another task on the same system to finish using this device.
• What I/O response time is too high? WARNING: this is a trick question.
• Analyze the response time components to decide what to do.
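
One way to act on "analyze the response time components" is to sum the four components and see which one dominates. The sketch below assumes device response time is the sum of IOSQ, PEND, DISC and CONN. The function name and the millisecond values are hypothetical.

```python
def response_time_breakdown(iosq, pend, disc, conn):
    """Return total response time and the largest component (all in msec)."""
    components = {"IOSQ": iosq, "PEND": pend, "DISC": disc, "CONN": conn}
    total = sum(components.values())
    dominant = max(components, key=components.get)
    return total, dominant


# Hypothetical device: disconnect time dominates the response time.
total_ms, dominant = response_time_breakdown(iosq=0.2, pend=0.3, disc=4.1, conn=1.4)
print(f"total = {total_ms:.1f} msec, dominant component = {dominant}")
```

The dominant component, not the total alone, points to the tuning action, which is why the "too high" question on the slide is a trick question.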

37 I/O Device Activity (RMF PP Report)

38 M3- Device Delays

39 ☞ Device Activity Tuning - 1
• I/O priority ON (check for APAR OW47667)
• CONN = time due to data transfer
• DISC, IOSQ, PEND are I/O delays
• Enable Parallel Access Volumes (PAV) to reduce / eliminate IOSQ
• Manage static PAVs to minimize IOSQ
• Manage the number of dynamic PAVs via policy to minimize IOSQ
• ESS (Shark) multiple allegiance support reduces contention reported as PEND time.
• Track cache performance and manage it as needed

40 ☞ Device Activity Tuning - 2
• DISC > 2-5 msec with cache may indicate problem(s):
  • Not enough Non-Volatile Storage (NVS), or NVS gets filled.
  • Poor cache hit ratio on IBM ESS.
  • High physical disk utilization. May need to move data to balance the activity between available resources.
  • Very high disk to cache transfer activity rate.
• DISC > 13 msec may indicate RPS misses due to path contention. This should not occur on IBM ESS.
• If %DEV UTIL > 35%, work to reduce the activity rate on the device:
  • Balance activity better across available resources
  • Isolate, or do not cache, files and volumes that are BAD cache candidates
  • Tune based on analysis of caching activity
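
The rules of thumb above can be turned into a simple screening check. This is only a sketch: the function name is hypothetical, and because the slide gives a 2-5 msec range for cached devices, the 5.0 msec default below is an assumed, adjustable choice rather than a value the slide fixes.

```python
def device_flags(disc_ms, dev_util_pct, cached=True, disc_cache_limit_ms=5.0):
    """Return warnings for one device based on the slide's rules of thumb."""
    flags = []
    # Assumed limit: upper end of the slide's 2-5 msec range for cached devices.
    if cached and disc_ms > disc_cache_limit_ms:
        flags.append("DISC high for a cached device")
    if disc_ms > 13.0:
        flags.append("possible RPS misses due to path contention")
    if dev_util_pct > 35.0:
        flags.append("%DEV UTIL above 35%: reduce activity rate")
    return flags


# Hypothetical devices: one busy and slow, one healthy.
print(device_flags(disc_ms=14.0, dev_util_pct=40.0))
print(device_flags(disc_ms=1.0, dev_util_pct=10.0))
```

A flagged device is a candidate for the component-level analysis described on the slide, not a diagnosis by itself.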

41 M3- File I/O Tuning – VSAM LRU - 1
• Buffer goal limit defaults to 100 MB; can be 1.5 GB max; see IGDSMSxx in your PARMLIB for details
☞ “Accel %” = percent of time when LRU aging algorithms were accelerated
• “Reclaim %” = percent of time when aging algorithms were bypassed to reclaim buffers
• “Read BMF%” = data found in local buffers
• “Read CF%” = data found in Coupling Facility (CF) cache
• “Read DASD%” = data read from DASD
• Monitor average CPU time used by BMF LRU

42 M3- File I/O Tuning – VSAM RLS - 1
VSAM RLS activity by data set. Also available by Storage Class.

43 File I/O Tuning – VSAM RLS…NOTES
• “LRU Status” = status of local buffers under Buffer Manager Facility (BMF) control
  • GOOD = BMF at or below goal
  • ACCELERATED = buffer aging algorithms accelerated because BMF is over goal
  • RECLAIMED = buffer aging bypassed to reclaim buffers because BMF is over goal
• “BMF Valid %” = percent of BMF reads that were valid
  NOTE: BMF read hits are the sum of valid and invalid hits. Buffers can be invalid because (A) data was altered, or (B) the CF lost track of buffer status
• BMF READ HIT% = BMF READ% / BMF VALID% * 100
• BMF INVALID READ HIT% = BMF READ HIT% - BMF READ%
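
The two BMF formulas above can be worked through with a short sketch. The function names and the sample percentages are hypothetical. The arithmetic follows the slide's formulas as written.

```python
def bmf_read_hit_pct(bmf_read_pct, bmf_valid_pct):
    """BMF READ HIT% = BMF READ% / BMF VALID% * 100 (per the slide)."""
    if bmf_valid_pct <= 0:
        raise ValueError("bmf_valid_pct must be positive")
    return bmf_read_pct / bmf_valid_pct * 100


def bmf_invalid_read_hit_pct(bmf_read_pct, bmf_valid_pct):
    """BMF INVALID READ HIT% = BMF READ HIT% - BMF READ% (per the slide)."""
    return bmf_read_hit_pct(bmf_read_pct, bmf_valid_pct) - bmf_read_pct


# Hypothetical data set: 72% of reads satisfied from BMF, 90% of BMF hits valid.
hit = bmf_read_hit_pct(72.0, 90.0)
invalid = bmf_invalid_read_hit_pct(72.0, 90.0)
print(f"BMF read hit = {hit:.1f}%, invalid read hit = {invalid:.1f}%")
```

A growing invalid-hit figure means buffers are being found locally but cannot be used, so the read still has to go to the CF or DASD.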

44 Summary
• We examined just 7 main types of reports out of the 90+ available from RMF in real time or via the post-processor. They are:
  ✓ RMF Delay Report
  ✓ CPU Activity Report
  ✓ LPAR Activity Report
  ✓ CF Activity Report
  ✓ Workload Activity Report
  ✓ I/O Device Activity Report
  ✓ VSAM File I/O Activity Report
• With practice, you should be able to find the “gold” and solve performance “mysteries” by looking at just 1 – 3 RMF reports.

45 Need/Want to Know More…and
• Start at
• Documentation:
  • SC RMF Report Analysis
  • SC RMF Performance Management Guide
  • SA z/OS MVS Planning: Workload Management
• RMF Newsletters
• IBM and SHARE presentations
• – Computer Measurement Group
• Large Systems Performance Reference:

