Divide and Conquer: Dealing with 140GB of SMF Data Daily
Chuck Hopf, Merrill Consultants

Gathering Data - Steps
- Determining what data is available
- Determining what data is needed
- Determining data retention requirements
- Gathering the data
- Dealing with increasing volumes

Data Availability
- Different OSs have different sources
- For MVS:
  - SMF
  - RMF
  - Vendor SMF records
  - Other software logs/sources

SMF/RMF
- Job accounting
- System utilization
- CICS transactions
- CICS statistics
- DB2 accounting
- DB2 statistics
- Dataset activity

Vendor SMF Records
- NETSPY
- Stop X37
- MXG Tape Mount Monitor
- IND$FILE
- TPX
- HSM
- many, many more

Other Data Sources
- DCOLLECT - DASD space utilization
- CA1 - tape utilization
- TMON - alternative CICS/DB2 transaction data source
- SYSLOG - console commands
- many more

Data Requirements
- Not all data is always useful
- Largely driven by reporting requirements
- But… some can be discarded outright and others cannot
  - some types are obsolete
  - other types are not generally useful
  - for our purposes, certain SMF types (including 69) are suppressed

Data Retention
- How long do you keep the raw data?
  - Ask your internal auditors
  - At least partially driven by volume
- At 120GB/day, keeping CICS detail for very long is impractical, and DB2 is not far behind
- Third largest volume is type 74 RMF data
- Fourth is type 42

Gathering the Data
- SMF records are written to a SYS1.xxxxxx dataset
- When the dataset is full, recording is automatically switched to a non-full dataset and (if the IEFU29 exit is coded) a dump/clear process is started (a sketch of the control statements follows)
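As a hedged sketch, the dump/clear step started by IEFU29 is an IFASMFDP job whose control statements look something like this (DUMPIN and DUMPOUT are illustrative DD names, with DUMPIN pointing at the just-switched SYS1.MANx dataset and DUMPOUT at the daily dump dataset):

   INDD(DUMPIN,OPTIONS(ALL))
   OUTDD(DUMPOUT,TYPE(0:255))

OPTIONS(ALL) asks IFASMFDP to dump the dataset and then clear it; TYPE(0:255) keeps every record type at this stage, deferring any filtering to later steps.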

Gathering the Data
- Best case - a single MAN dataset lasts all day
  - Dump and clear after a switch at ‘midnight’
- Worst case - a MAN dataset fills every few minutes or less
  - Dump and clear frequently and consolidate after ‘midnight’

Gathering the Data
- ‘Midnight’ may not be midnight. It may depend on a batch cycle and the availability of storage devices (tape drives)
- The process is driven by the volume of data

Flies in the Ointment
- Lost data can occur
  - When buffer expansion occurs
  - When a MANx dataset fills during an interval ‘pop’
  - When something slows down the I/O to the MANx volume

Flies in the Ointment
- CICS and DB2 may complain about SMF not being available
- Can result in CICS slowdowns and response time problems

Flies in the Ointment
- At 10MB/second, it takes about 270 seconds to fill a volume
- At 10MB/second, it takes about 600 seconds to dump and clear a volume
- We are running at about seconds at peak loads to fill a volume
- What happens if volumes double? (a rough calculation follows)
- Can the SMF writer handle the load?
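A rough calculation with the figures above (treating the MAN dataset as a full volume of roughly 2.7GB, which is an assumption rather than something stated on the slide): 10 MB/second x 270 seconds = 2,700 MB, i.e. about one full volume. If the arrival rate doubles to 20 MB/second, the same volume fills in roughly 135 seconds, well under the ~600 seconds needed to dump and clear it, so a single filling dataset can outrun its own dump/clear.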

Fly Swatters
- Must have at least two SYS1.MANx datasets, and three is better
  - As volumes increase, more may be needed
- Make each SYS1.MANx dataset a full volume
  - Eliminates IOSQ delays
  - Eliminates device pending delays
  - Prevents problems

Fly Swatters
- Add an auto operations rule (sketched below):
  - 5 minutes before the intervals are due to ‘pop’, issue a D SMF command and, if the active MAN volume is more than 85% full, issue an I SMF command.
  - Eliminates filling the MAN volume during interval processing in all but some extreme (and bizarre) cases
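The rule boils down to two standard MVS operator commands; a minimal sketch (the 5-minute lead time and the 85% threshold are the values from the slide, while the wrapping rule syntax will depend on your automation product):

   D SMF        display the SMF datasets and how full each one is
   I SMF        switch recording to the next MAN dataset (and drive IEFU29)

The automation simply issues D SMF shortly before each interval, checks the percent-full of the ACTIVE dataset in the response, and issues I SMF if it is over 85%.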

Fly Swatters
- Use maximum values for buffer sizes
- Tune CI size on MAN datasets to match your workload
- APARs are in progress to change the buffer acquisition algorithms

Fly Swatters
- Turn off CACHE in RMF on all but a single system. It will not only reduce the volume of data but will also reduce the CPU time consumed by RMF. Do the same with RMF III (it is a BIG CPU reduction with RMF III).

The Flies Will Win
- Unless changes are made in the way the SMF writer works, we are going to break it in the near term (1-2 years?). It will simply not be able to keep up with the arrival rate of data.

Gathering the Data - Simple Case (diagram): MAN dataset -> IFASMFDP dump and clear -> daily SMF dataset

Dealing with Volume
- Treat it as you would any other application
- Parallelism is the answer as volumes increase

Dealing with Volume
- Split the data using IFASMFDP into chunks of a manageable size
- No arcane exits, just simple control statements (a fuller SYSIN sketch follows):
  - OUTDD(CICS,TYPE(110))
  - OUTDD(DB2,TYPE(100:102))
  - OUTDD(SMFDUMP,NOTYPE(100:102,110))
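Put together, the SYSIN for the splitting step might look roughly like this (SMFIN is an illustrative DD name for the input dump; each OUTDD name must match a DD pointing at its own output dataset):

   INDD(SMFIN,OPTIONS(DUMP))
   OUTDD(CICS,TYPE(110))
   OUTDD(DB2,TYPE(100:102))
   OUTDD(SMFDUMP,NOTYPE(100:102,110))

A record is written to every OUTDD whose TYPE/NOTYPE list it matches, so the lists above are written to be mutually exclusive.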

Dealing with Volume - Less Simple Case (diagram): MAN dataset -> IFASMFDP dump and clear -> daily CMF data, daily DB2 data, daily SMF data

Dealing with Volume
- As volume grows, the structure can grow with it - up to a point

Dealing with Volume - Complex Case (diagram): MAN dataset -> IFASMFDP dump -> CICS data, DB2 data, other SMF -> further IFASMFDP steps -> daily CICS data, daily DB2 data, daily SMF data

Dealing with Volume
- At some point, it gets too huge to process in a single piece
- Time to use the IFASMFDP exits
- Break CICS/DB2 up by APPLID/SUBSYSTEM

Dealing with Volume (diagram): MAN dataset -> IFASMFDP -> other SMF, DB2, CICS1, CICS2, CICS3

Dealing with Volume
- SMF data can be piped with batch pipes
  - Must be VB, NOT VBS, and it must be set to VB at the initial dump of the MAN volume
  - Best not to pipe when dumping MAN volumes
- The dump runs at the speed of the downstream process, which may cause problems (dump/clear process running slower than the next MAN volume fills)

Piping SMF
- If you intend to pipe SMF data, you would be wise to put in an IFASMFDP exit to catch any records larger than the VB maximum and either route them to a separate VBS file or discard them. There are not supposed to be any, but there are (type 8 and type APARs pending).

Building the PDB
- Before you start
  - Customizing
  - Retention
  - What cycles to run

Customizing
- Define workloads
- Define shifts
- Define accounting fields
- Define user records to be added
- Define variables to be kept
- To compress or to not compress - that is a question

Defining Workloads
- Used to be limited to 15 workloads; the new limit is 114 (but that does not mean you should have more than 20 - it becomes cumbersome)
- Was restricted to control performance groups; may now use report performance groups (be careful not to double dip)

Defining Workloads
- Two members affected
  - IMACWORK - old method
  - RMFINTRV - new method

Defining Workloads - IMACWORK
- Simple IF THEN ELSE logic
  - IF PERFGRP=2 THEN WORK='TSO';
  - IF SRVCLASS='TSO' THEN WORK='TSO';

Defining Workloads - RMFINTRV
- Workloads are defined using a parameter passed to a MACRO
- Each workload parameter (WORK1-WORK99) has five components separated with a /
  - Name - a short character name for the workload
  - Description - 9 characters to be used in the label describing the workload
  - List of performance groups to include in the workload
  - List of service classes to include in the workload
  - Number of periods in the perfgrp/service class

Defining Workloads - RMFINTRV
- Performance groups and service classes can be mixed
- Report and control groups can be mixed
  - But… it is not a good idea. If you are going to use report groups, use NOTHING but report groups

Defining Workloads - RMFINTRV
%VMXGRMFI(
   ….
   WORK1=BATT/Test Jobs/1 3/BATCHLO,
   WORK2=TSPD/Dev TSO/2/DEVTSO/2,
   WORK3=DB2A/DB2A/ /DB2A,
   WORK4=DB2B/DB2B/5
   …
);

Defining Shifts
- How are shifts important in your world?
  - Batch cycle vs online
  - Operations shifts
  - Weekends
  - Holidays

Defining Shifts - Holidays
- Holidays usually look like a weekend day
- Can distort weekly data and plans
- Should you or should you not exclude them?
- Let the best technicians in your organization (the managers) make the call

Defining Shifts - IMACSHFT
- More simple IF THEN ELSE logic:
IF 8 LE HOUR(TIMEPART(DATETIME)) LE 16 AND
   2 LE WEEKDAY(DATEPART(DATETIME)) LE 6 THEN DO;
   SHIFT='P';
   DATETIME=DHMS(DATEPART(DATETIME),8,0,0);
END;
ELSE IF ...

Defining Shifts - $SHIFT
- The first question when you present a report will be ‘What does this P mean under SHIFT?’ It will not matter how many other reports show the same thing or how many times you explain it.
- A user format solves the problem.

Defining Shifts - $SHIFT
PROC FORMAT LIB=LIBRARY;
   VALUE $SHIFT
      'P'='First '
      'S'='Second'
      'T'='Third '
      OTHER='Weekend/Holiday';
RUN;
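Once the format is stored in the format library, any report can apply it; a minimal sketch (PDB.JOBS and the OBS= limit are just an illustration of the idea, the point is the FORMAT statement):

PROC PRINT DATA=PDB.JOBS (OBS=20);
   VAR JOB SHIFT;            /* SHIFT now prints as First/Second/Third/Weekend-Holiday */
   FORMAT SHIFT $SHIFT.;
RUN;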

Defining Accounting Fields
- How many do you really really need?
- More than 2 or 3 will start to be redundant
- Limit the size and number

Defining Accounting Fields - IMACACCT
- Assume 3 fields, each 5 bytes long:
DROP ACCOUNT4-ACCOUNT9 SACCT4-SACCT9 LENACCT4-LENACCT9;
LENGTH ACCOUNT4-ACCOUNT9 SACCT1-SACCT9 $ 5;

Define User Records
- Originally required multiple exits
- Now done ‘instream’ in the BUILDPDB process
- Still invokes the original exits
  - EXPDBINC - include the source code
  - EXPDBVAR - build the datasets/variables
  - EXPDBCDE - read the data
  - EXPDBOUT - sort the data into the PDB

Define User Records
- Each user record must have an SMF ID defined
- These should be stored in IMACKEEP:
MACRO _IDTPX  205 %   /* TPX      */
MACRO _NSPYID 132 %   /* NETSPY   */
MACRO _SYNCID 208 %   /* SYNCSORT */
etc...

Define User Records
- Modify SYSIN:
%LET EPDBINC=%QUOTE(
   VMACNSPY VMACTPX VMACSYNC...
);
%LET EPDBVAR=%QUOTE(
   _VARNSPY _VARTPX _VARSYNC...
);

Define User Records
%LET EPDBCDE=%QUOTE(
   _CDENSPY _CDETPX _CDESYNC...
);
%LET EPDBOUT=%QUOTE(
   _SNSPY _STPX _SSYNC...
);

Defining Kept Variables
- Can be done using the ‘MACKEEP’ macro variable to redefine the _Vdddddd macro for a dataset, or by using the _Kdddddd macro for the dataset.

Defining Kept Variables
- Using MACKEEP - drop ZDATE UNITADR UCBTYPE:
%LET MACKEEP=%QUOTE(
MACRO _VTY21
  _WTY21 /* TYPE21 */
  (LABEL='TY21: TYPE 21 SMF - TAPE ERROR STATS'
   KEEP=BLKSIZE BYTEREAD BYTEWRIT CLEAN DCBOFLG DENSITY
        DEVICE DEVNR ERASE ERRORS LCU NOISE OPEN PERMREAD
        PERMWRIT SIOCOUNT SMFTIME SYSTEM TAPCUSER TEMPREAD
        TEMPRBER TEMPRFER TEMPWRER TEMPWRIT VOLSER )
%
);

Defining Kept Variables
- Using _Kdddddd - drop ZDATE UNITADR UCBTYPE:
%LET MACKEEP=%QUOTE(
MACRO _KTY21 DROP=ZDATE UNITADR UCBTYPE %
);

Defining Kept Variables
- Using MACKEEP - ADD variable DRIVE:
%LET MACKEEP=%QUOTE(
MACRO _VTY21
  _WTY21 /* TYPE21 */
  (LABEL='TY21: TYPE 21 SMF - TAPE ERROR STATS'
   KEEP=BLKSIZE BYTEREAD BYTEWRIT CLEAN DCBOFLG DENSITY
        DEVICE DEVNR ERASE ERRORS LCU NOISE OPEN PERMREAD
        PERMWRIT SIOCOUNT SMFTIME SYSTEM TAPCUSER TEMPREAD
        TEMPRBER TEMPRFER TEMPWRER TEMPWRIT UCBTYPE UNITADR
        VOLSER ZDATE DRIVE )
%
);

Defining Kept Variables
- Using _Kdddddd - add variable DRIVE:
%LET MACKEEP=%QUOTE(
MACRO _KTY21 DRIVE %
);

Defining Kept Variables
- Using MACKEEP - ADD variable DRIVE and drop variable UCBTYPE:
%LET MACKEEP=%QUOTE(
MACRO _VTY21
  _WTY21 /* TYPE21 */
  (LABEL='TY21: TYPE 21 SMF - TAPE ERROR STATS'
   KEEP=BLKSIZE BYTEREAD BYTEWRIT CLEAN DCBOFLG DENSITY
        DEVICE DEVNR ERASE ERRORS LCU NOISE OPEN PERMREAD
        PERMWRIT SIOCOUNT SMFTIME SYSTEM TAPCUSER TEMPREAD
        TEMPRBER TEMPRFER TEMPWRER TEMPWRIT UNITADR
        VOLSER ZDATE DRIVE )
%
);

Defining Kept Variables
- Using _Kdddddd - add variable DRIVE and drop variable UCBTYPE:
%LET MACKEEP=%QUOTE(
MACRO _KTY21 DRIVE DROP=UCBTYPE %
);

Which Technique is Correct?
- Both
  - The first, listing all variables, makes it more clear
  - The second is a lot less typing

Confused?
- Use UTILBLDP
  - Based on the parameters you pass, it constructs the SYSIN for a tailored BUILDPDB

Reducing Confusion
- Assume:
  - No CICS processing
  - No DB2 processing
  - No TYPE74 processing
  - Add SYNCSORT records as ID=200
  - Add TSO/MON records as ID=205/206

Reducing Confusion
%UTILBLDP(
   SUPPRESS=110 DB2 74,
   SPINCNT=7,
   USERADD=SYNC/200 TSOM/205,
   TIMEDIF=0,
   BUILDPDB=YES,
   OUTFILE=MYBUILD
);

Reducing Confusion
/**********************************************************/
/* COPYRITE 1999 MERRILL CONSULTANTS DALLAS TX USA        */
/* ARTIFICIALLY CONSTRUCTED SYSIN FOR MXG                 */
/* SMF PROCESSING. BUILT USING UTILBLDP.                  */
%LET MACKEEP=%QUOTE(
MACRO _SPINCNT 7 %   /* SPIN COUNTER */
MACRO _SPINUOW 0 %   /* UOW SPIN COUNTER */
MACRO _TIMEDIF 0 %   /* TIME DIFFERENCE */
/* MXG STRONGLY RECOMMENDS PUTTING THE FOLLOWING */
/* MACRO DEFINITIONS FOR THE ID MACROS IN YOUR   */
/* USERID.SOURCLIB RATHER THAN INSTREAM.         */
MACRO _SYNCID 200 %  /* YOU MAY NEED TO CHANGE THESE MACROS  */
                     /* BECAUSE THERE IS MORE THAN A SINGLE  */
                     /* SMF TYPE FOR THESE RECORDS AND THE   */
                     /* UTILBLDP SYNTAX CAN ONLY HANDLE ONE. */
MACRO _TSOMCMD 205 %
MACRO _TSOMSYS 206 %

Reducing Confusion
MACRO _VARDB2 %                        /* SUPPRESS SMF RECORD TYPE(S) */
MACRO _CDEDB2 IF 99 = -99 THEN RETURN; %
MACRO _SDB2 %                          /* NO SORT */
MACRO _DIFFDB2 %                       /* NO DIFF CODE */
MACRO _VAR110 %                        /* SUPPRESS SMF RECORD TYPE(S) */
MACRO _CDE110 IF 99 = -99 THEN RETURN; %
MACRO _S110 %                          /* NO SORT */
MACRO _SCICEXC %                       /* CICS RECORDS BYPASSED */
MACRO _SCICSYS %

Reducing Confusion
MACRO _ETY74   %   /* NO OUTPUT */
MACRO _ETY74CA %   /* NO OUTPUT */
MACRO _ETY74CF %   /* NO OUTPUT */
MACRO _ETY74CO %   /* NO OUTPUT */
MACRO _ETY74LK %   /* NO OUTPUT */
MACRO _ETY74ME %   /* NO OUTPUT */
MACRO _ETY74OM %   /* NO OUTPUT */
MACRO _ETY74PA %   /* NO OUTPUT */
MACRO _ETY74ST %   /* NO OUTPUT */
MACRO _ETY74SY %   /* NO OUTPUT */
MACRO _ETY74TD %   /* NO OUTPUT */
MACRO _ETY746B %   /* NO OUTPUT */
MACRO _ETY746F %   /* NO OUTPUT */
MACRO _ETY746G %   /* NO OUTPUT */

Reducing Confusion
MACRO _LTY74   _WTY74   %   /* STAY IN WORK */
MACRO _LTY74CA _WTY74CA %   /* STAY IN WORK */
MACRO _LTY74CF _WTY74CF %   /* STAY IN WORK */
MACRO _LTY74CO _WTY74CO %   /* STAY IN WORK */
MACRO _LTY74LK _WTY74LK %   /* STAY IN WORK */
MACRO _LTY74ME _WTY74ME %   /* STAY IN WORK */
MACRO _LTY74OM _WTY74OM %   /* STAY IN WORK */
MACRO _LTY74PA _WTY74PA %   /* STAY IN WORK */
MACRO _LTY74ST _WTY74ST %   /* STAY IN WORK */
MACRO _LTY74SY _WTY74SY %   /* STAY IN WORK */
MACRO _LTY74TD _WTY74TD %   /* STAY IN WORK */
MACRO _LTY746B _WTY746B %   /* STAY IN WORK */
MACRO _LTY746F _WTY746F %   /* STAY IN WORK */
MACRO _LTY746G _WTY746G %   /* STAY IN WORK */
MACRO _S74 %                /* NO SORT */

Reducing Confusion
/* WARNING: ONE OR MORE OF THE RMF RECORDS NEEDED  */
/* WARNING: BY RMFINTRV HAS BEEN SUPPRESSED. SOME  */
/* WARNING: FIELDS MAY BE EMPTY IN RMFINTRV.       */
/* WARNING: SUPPRESSED RECORDS ARE: 74             */
MACRO _VARUSER   /* USER SMF _VAR DEFINITIONS */
   _VARSYNC _VARTSOM %
MACRO _CDEUSER   /* USER SMF _CDE DEFINITIONS */
   _CDESYNC _CDETSOM %
MACRO _EPDBOUT   /* USER SMF OUTPUT DEFINITIONS */
   _SSYNC _STSOM %
);

Reducing Confusion
%LET PTY74=WORK;    /* NO OUTPUT */
%LET PTY74CA=WORK;  /* NO OUTPUT */
%LET PTY74CF=WORK;  /* NO OUTPUT */
%LET PTY74CO=WORK;  /* NO OUTPUT */
%LET PTY74LK=WORK;  /* NO OUTPUT */
%LET PTY74ME=WORK;  /* NO OUTPUT */
%LET PTY74OM=WORK;  /* NO OUTPUT */
%LET PTY74PA=WORK;  /* NO OUTPUT */
%LET PTY74ST=WORK;  /* NO OUTPUT */
%LET PTY74SY=WORK;  /* NO OUTPUT */
%LET PTY74TD=WORK;  /* NO OUTPUT */
%LET PTY746B=WORK;  /* NO OUTPUT */
%LET PTY746F=WORK;  /* NO OUTPUT */
%LET PTY746G=WORK;  /* NO OUTPUT */

Reducing Confusion
/* NOW RUN BUILDPDB */
%INCLUDE SOURCLIB(BUILDPDB);
%INCLUDE SOURCLIB(ASUM70PR);   /* RECOMMENDED */
%INCLUDE SOURCLIB(ASUMTAPE);   /* RECOMMENDED */
%INCLUDE SOURCLIB(ASUMTMNT);   /* RECOMMENDED */
%INCLUDE SOURCLIB(ASUMTALO);   /* RECOMMENDED */

Compression?
- Space can become a problem, but compressing the data comes at a CPU cost of 2-3% minimum
- Two options:
  - SAS compression
  - Hardware compression

SAS Compression
- Applies only to DASD format SAS datasets
- Invoked at the dataset level via a dataset option or at the system level via the COMPRESS=YES option (both sketched below)
- With version 6 of SAS, it may help you avoid the problems of multi-volume WORK datasets
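A minimal sketch of the two ways to turn SAS compression on (the dataset names here are illustrative):

OPTIONS COMPRESS=YES;                /* system level: every dataset created from here on */

DATA PDB.CICSTRAN (COMPRESS=YES);    /* dataset level: just this one dataset             */
   SET WORK.CICSTRAN;
RUN;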

Hardware Compression
- Applies only to tape format SAS datasets on disk
- May be striped
- May be compressed
- May be striped and compressed
- Only a single SAS dataset per DD can be open at any point in time
- Useful for things like CICSTRAN and DB2ACCT (a hedged sketch follows)
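One hedged sketch of how a tape-format library on disk is typically referenced from SAS under OS/390 (CICSOUT is an illustrative DD name; the compression and striping come from the dataset's DFSMS definition in the JCL, not from SAS itself):

/* CICSOUT is a DD allocated in the JCL to a DFSMS-compressed (and      */
/* optionally striped) sequential dataset.  TAPE is the sequential      */
/* format engine, usable even when the dataset actually lives on disk. */
LIBNAME CICSOUT TAPE;

DATA CICSOUT.CICSTRAN;   /* only one dataset in this library can be open at a time */
   SET WORK.CICSTRAN;
RUN;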

Retaining Data
- Weekly/Monthly processing vs WTD/MTD processing
- TREND processing
- How long?

Processing Cycle (flowchart): Daily SMF -> BUILDPDB -> daily PDB (7 daily PDBs kept); on Monday, WEEKBLD combines the daily PDBs into a weekly PDB (5 kept); on the 1st of the month, MONTHBLD combines the weekly PDBs into the monthly PDB; otherwise the cycle stops until the next day.

Processing Cycle
- Original implementation circa 1983
- Worked when SMF volume was small
- Breaks down as volumes increase without major changes
  - Primarily a space issue - how large is the largest dataset when you try to create it at a monthly level?

Weekly Processing
- Uses the previous 7 daily PDBs
- Builds the datasets in sequence
- May be to tape or to disk
- Volume can quickly become a problem

WTD Processing
- Same logic as weekly, but run each day
- At end of week, the same datasets are created but the resource consumption is spread out
- A key is to reduce the number of variables kept

Monthly Processing
- Uses the last 5 weekly and last 7 daily PDBs to construct the previous calendar month
- Logic MUST run on the 1st day of the month
- Can be extraordinarily resource intensive
- Uses tape format datasets on disk to avoid multiple mounts

MTD Processing
- Like the WTD processing, runs each day, with a job at the end of the month to create the next MTD PDB
- Reduce the variables even further than the WTD process
- Reduce the datasets retained

Processing Cycle - WTD/MTD (flowchart): Daily SMF (plus the SPIN file) -> daily BUILD -> daily PDB; the WTD build and MTD build then update the WTD PDB, the MTD PDB, and the TREND database; on the 1st of the week the WTD PDB is copied off as last week, and on the 1st of the month the MTD PDB is copied off as last month.

WTD/MTD Processing
- Both rely on a new macro, VMXG2DTE
  - You specify input and output DDnames, the dataset name, a BY list if appropriate, whether or not to use PROC APPEND, and what cycle and when to start the cycle.

WTD/MTD Processing
%VMXG2DTE(
   DDIN=MTDPDB,
   DDOUT=MTDPDB,
   APPEND=YES,
   PDB=PDB,
   DATASET=JOBS,
   INITIT=M1,
   DROPPER=var1 var2
);

WTD/MTD Processing
%VMXG2DTE(
   DDIN=WTDPDB,
   DDOUT=WTDPDB,
   PDB=PDB,
   DATASET=RMFINTRV,
   BYLIST=SYSPLEX SYSTEM STARTIME,
   INITIT=W2,
   KEEPER=VAR1 var2 …
);

Retention - One Man’s View
- Daily PDB is a GDG with 255 generations
- Weekly PDB is a GDG with 255 generations, with a reduced set of variables and datasets
- Monthly PDB is a GDG with 255 generations, but a drastically reduced set of variables and datasets

TRENDing
- Most of the important data has a TREND component (member TRNDxxxx)
- Utilizes VMXGSUM to do the summarization
- Radical reduction in space for highly summarized data (6 years of history in just under 1000 cylinders)

TRENDing
- Designed to run weekly, but can be done daily by changing the WEEK.xxxxxxxx to PDB.xxxxxxxx in the member:
%VMXGSUM(INVOKEBY=TRNDRMFI,
   INDATA= WEEK.RMFINTRV (IN=INWEEK)
           TREND.TRNDRMFI,
   …..
);

What Cycles Should You Run?
- ‘It depends…’
  - Volume is the key. If volume is small, the canned structures work fine. As volume grows, these structures become untenable. The run time becomes longer than can be tolerated. A daily job that runs for a day is not practical.

So?? What do you do?
- It is, after all, just an application, not unlike any other application
- Apply all of the tricks and techniques you use on applications
- Parallelism may be the answer - if it won’t run serially, run it in parallel

So?? What do you do?
- Parallel jobs - example (diagram): CMF data -> TYPE110 -> CICSTRAN; DB2 data -> TYPEDB2 -> DB2PDB; SMF/RMF -> BUILDPDB -> daily PDB and BLDIOPDB -> daily IO PDB; CICSTRAN and DB2PDB feed ASUMUOW; the daily PDBs feed the WTD/MTD builds that maintain the WTD and MTD PDBs.

So??? What do you do?
- No WEEKLY process
- No MONTHLY process
- TREND updated daily (which means the current week is incomplete)
- Reduced variable counts
- No detail at the WEEKLY/MONTHLY level

Parallel Streams
- Stream 1 - BUILDPDB
  - Processes the normal BUILDPDB but excludes all CICS, DB2, and IO related data
- Is the average response time to DASD across 10,000 volumes a significant metric in RMFINTRV? Probably not.

Parallel Streams
- Stream 2 - BLDIOPDB
  - Process IO related data
    - type 42
    - type 73
    - type 74
    - type 75
    - type 78
    - etc.

Parallel Streams
- Stream 3 - BLDCISTA
  - Process CICS statistics
- Stream 4 - DALYDSET
  - MXG member ANALDSET
- Stream 5 - BILDDCOL
  - Process DCOLLECT data

Parallel Streams
- Stream 6 - BUILDTMS
  - Process the TMS catalog
- Stream 7 - BILDDSNS
  - Combine TMS and DCOLLECT data
- Stream 8 - DB2PDB (3 times daily)
  - Process DB2 data

Parallel Streams
- Stream 9 - BLDCISTR (3 times daily)
  - Process CICS transaction data
- Stream 10 - ASUMUOW
  - Combine CICS/DB2 into the unit-of-work summary

CICS Volume Problem
- Processing of CICS/DB2 SMF data into ASUMUOW is time consuming
- 3 times/day, 5-7 hours each
  - CICS volume is 25GB of SMF data, 30.1M observations (and this was a light day)
  - DB2 volume is 4.7GB of SMF data, 2.0M obs in DB2ACCT
- Volume is growing 30-40%/year

CICS Volume Problem
- CICSTRAN and DB2ACCT must be sorted prior to the merge
- Tape IO is the bottleneck - the CICSTRAN dataset is 5-6 volumes of tape, at several minutes/volume to move the data
- Data is read/written by the data step, read/written by the sort, then read by a data step - 5 full passes

Original Architecture - Last Year, about 6 hours (diagram): CICS SMF -> TYPE110 -> CICSTRAN (5-6 tapes) -> SORT -> sorted CICSTRAN; DB2 SMF -> TYPEDB2 -> DB2ACCT (1-2 tapes) -> SORT -> sorted DB2ACCT; the sorted datasets feed ASUMUOW.

Today - about 4.5 hours, using a VIEW from the DATA step to the SORT (diagram): CICS SMF -> TYPE110/SORT -> sorted CICSTRAN; DB2 SMF -> TYPEDB2/SORT -> sorted DB2ACCT; the sorted datasets feed ASUMUOW. A toy sketch of the view-to-sort idea follows.
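A self-contained toy sketch of the view-to-sort idea (the data is generated purely for illustration; in the real job the view body would be the TYPE110/TYPEDB2 decoding logic):

/* DATA step view: the unsorted detail is never landed on DASD or tape */
DATA TRANVIEW / VIEW=TRANVIEW;
   DO I=1 TO 1000;
      APPLID='CICS'||PUT(MOD(I,3)+1,1.);   /* fake CICS APPLID   */
      RESPTIME=RANUNI(0);                  /* fake response time */
      OUTPUT;
   END;
   DROP I;
RUN;

/* The sort reads directly from the view, eliminating one full write/read pass */
PROC SORT DATA=TRANVIEW OUT=SORTED;
   BY APPLID RESPTIME;
RUN;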

Next Step - about 2.3 hours, using a VIEW from the DATA step to the SORT and a PIPE from the SORT to ASUMUOW (diagram): CICS SMF -> TYPE110/SORT -> sorted CICSTRAN via a pipe; DB2 SMF -> TYPEDB2/SORT -> sorted DB2ACCT via a fitting; both feed ASUMUOW directly.

Solving the CICS Volume Problem
- Requires batch pipes for SAS - that will mean at least V8.2 and maybe V9
- DB2 processing must be in 2 steps
  - The fitting for DB2ACCT can’t be reopened for input within the same job step. There may be a way to get around this, but I haven’t found it yet.

Summary
- Whether it is SMF or the processing of SMF, it is just an application program
- What works for the standard everyday application also works here
- Parallel processing solves most of the problems (but not without raising a few issues itself)

Questions?