Issues after Workload Moves And why we implemented IRD to fix.


1 Issues after Workload Moves And why we implemented IRD to fix

2 Workload Separation For Licence Savings
Original setup
- Mixture of licences across LPARs
- Licence charges based on LPAR size, not product usage within the LPAR
- SYSA: IMS, DB2, CICS, BATCH, DB2 SP
- SYSB: IMS, DB2, CICS, BATCH, DB2 SP
- SYSK: DB2, CICS, BATCH, DB2 SP, MQ
- z9 2094-713

3 Workload Separation For Licence Savings
New setup
- IMS, CICS and general LPAR
- IMS subsystems merged
- DB2 datasharing
- MSU licence cost on SYSB is substantially cheaper
- Move as much non-IMS, non-CICS work to SYSB as possible
- SYSA (IMS LPAR): IMS, DB2, DLIBATCH
- SYSB (BATCH LPAR): DB2, BATCH, DB2 SP
- SYSK (CICS LPAR): DB2, CICS, MQ
- BMPBATCH

4 Working day 1
- Lots of batch, overnight and during the day
- Always our peak day of the month
- Peak occurs during overnight batch
- Batch failures and overruns common
- Online systems are also busiest on this day
- System can often run capped

5 Workload Moves
- No major issues with online
- No major issues with normal batch, but a major issue with WD1 batch
- Elapsed time of many DB2 batch jobs increased 2-3 times when moved to SYSB
- Changing priority made no difference in most cases
- If rerun on SYSA, the jobs would run much faster
- Jobs not delayed for CPU
- Nothing in RMFWDM or DB2 thread stats to explain
- DASD response times similar on SYSA and SYSB
- LPARs on same CEC, shared channels

6 Thoughts and Guesses
- Delay in servicing I/O interrupts
- Higher TPI% on SYSB, but not much
- CPENABLE setting (10,30), in line with IBM recommendations
- Logical processor share% very low on SYSB: 16% vs 80% on SYSA
- Same number of CPUs online on SYSB as on SYSA
- Weightings not changed to reflect workload moves
- Significant when capped
- Weightings need to be more dynamic, overnight vs online
- LPAR dispatch time 12.5 ms

7 Thoughts and Guesses
- Found 'Short CP' paper by Kathy Walsh
- Describes how high priority work may be delayed
- In relation to online CICS
- I/O not mentioned
- But posting I/O complete IS high priority work
- Recommends reducing online CPUs to match workload
- Recommends making sure weights are appropriate
- Recommends using IRD to monitor and automate changes

8 What is a short CP?
- SYSB weight set to expect 120 MSU (approx 2 engines)
- When capped, this is enforced
- SYSB z/OS thinks it has 11 engines, but each is dispatched only about 2/11 of the time
- High priority work is not dispatched
- All dispatched CPs might be disabled for interrupts due to CPENABLE
- Short CP ratio = MVSBUSY / LPARBUSY
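The arithmetic behind the short CP slide can be sketched as follows. This is a minimal illustration, not a tool from the presentation: the function names are hypothetical, and the MVS busy and LPAR busy percentages passed in are made-up values chosen only to show the calculation.

```python
def logical_cp_share(entitled_engines: float, online_logical_cps: int) -> float:
    """Fraction of a physical engine each logical CP gets when the weight is enforced."""
    return entitled_engines / online_logical_cps


def short_cp_ratio(mvs_busy_pct: float, lpar_busy_pct: float) -> float:
    """Short CP ratio as defined on the slide: MVSBUSY / LPARBUSY."""
    return mvs_busy_pct / lpar_busy_pct


# SYSB figures from the slide: weight worth about 2 engines, 11 logical CPs online.
share = logical_cp_share(2, 11)
print(f"Share per logical CP: {share:.1%}")

# Illustrative (not measured) busy percentages.
ratio = short_cp_ratio(mvs_busy_pct=79.0, lpar_busy_pct=50.0)
print(f"Short CP ratio: {ratio:.2f}")
```

A ratio near 1.00 means z/OS and PR/SM agree on how busy the CPs are; the further it rises above 1.00, the more time logical CPs spend looking busy to z/OS while not actually dispatched on a physical engine.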

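9 Metrics From 01/02/2012

| Interval | GCL rem | Short CP Ratio SYSA | Short CP Ratio SYSB | DASD Response SYSA | DASD Response SYSB | DASD Rate SYSA | DASD Rate SYSB | Actual MSU SYSA | Actual MSU SYSB | Ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| 03.00.00 | 321 | 1. | | | | | | | | |
| 03.30.00 | 265 | 1. | | | | | | | | |
| 04.00.00 | 234 | 1. | | | | | | | | |
| 04.30.00 | 194 | 1. | | | | | | | | |
| 05.00.00 | 166 | 1. | | | | | | | | |
| 05.30.00 | 108 | 1.00 | 1.58 | 4.1 | 9.6 | 8008 | 4207 | 209 | 230 | 47.61% |
| 06.00.00 | 49 | 1.00 | 1.09 | 5.2 | 8 | 6012 | 4777 | 277 | 342 | 44.75% |
| 06.30.00 | 67 | 1. | | | | | | | | |
| 07.00.00 | 76 | 1. | | | | | | | | |
| 07.30.00 | 77 | 1.00 | 1.18 | 3.8 | | 6627 | 4973 | 232 | 193 | 54.59% |
| 08.00.00 | 29 | 1.09 | 3.75 | 3.2 | 3.7 | 6661 | 3069 | 536 | 153 | 77.79% |
| 08.30.00 | - | | | | | | | | | |
| 09.00.00 | -37 | 1.34 | 4.79 | 3.6 | 3.5 | 5073 | 2448 | 403 | 137 | 74.63% |
| 09.30.01 | -25 | 1.56 | 4.81 | 3.8 | | 4911 | 1854 | 416 | 124 | 77.04% |
| 10.00.00 | -11 | 1.51 | 5.87 | 3.9 | | 5523 | 2045 | 434 | 100 | 81.27% |
| 10.30.00 | 0 | 1.18 | 3.87 | 4.1 | 3.7 | 5198 | 2373 | 446 | 143 | 75.72% |
| 11.00.00 | -2 | 1.09 | 2.95 | 4.3 | 3.8 | 4651 | 2910 | 331 | 154 | 68.25% |
| 11.30.00 | -21 | 1.19 | 3.79 | 3.6 | 2.9 | 5984 | 2881 | 400 | 126 | 76.05% |
| 12.00.00 | - | | | | | | | | | |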
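The slides do not define the Ratio column, but the complete rows are consistent with it being SYSA's share of the combined SYSA + SYSB actual MSU. The sketch below checks that reading against four rows from the metrics above; the function name `sysa_msu_share` is hypothetical, and the interpretation is an inference from the numbers rather than something the presentation states.

```python
def sysa_msu_share(sysa_msu: int, sysb_msu: int) -> float:
    """SYSA's percentage of the combined SYSA + SYSB actual MSU, rounded to 2 places."""
    return round(100 * sysa_msu / (sysa_msu + sysb_msu), 2)


# (interval, SYSA MSU, SYSB MSU, Ratio as printed on the slide)
rows = [
    ("05.30.00", 209, 230, 47.61),
    ("06.00.00", 277, 342, 44.75),
    ("07.30.00", 232, 193, 54.59),
    ("08.00.00", 536, 153, 77.79),
]

for interval, sysa, sysb, printed in rows:
    computed = sysa_msu_share(sysa, sysb)
    print(interval, computed, "matches" if computed == printed else "differs")
```

Read this way, the column shows the split drifting from roughly 45/55 in the batch window toward the 80/20 weighting once the systems are capped.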

10 2/1/2012 Notes on Metrics
- DASD responses appear similar between systems
- I/O rate appears depressed on SYSB when capped, more than on SYSA
- 90% of I/Os serviced by 4 out of 11 CPs
- But each CP only has 16% logical processor share
- Weights are about 80/20 between SYSA/SYSB
- When capped, MSUs moved toward these ratios
- SYSB needs more than SYSA in the batch window
- SYSA needs more than SYSB in the online window

11 Recreation of problem
- Lower the cap on development system
- Run several low priority looping jobs
- Run I/O intensive job at high priority (no CPU delays)
- Problem recreated only when:
  - System was capped
  - Weight was too low for system MSU
  - Additional CPs were varied online

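12 Metrics From 01/03/2012

| Interval | GCL rem | Short CP Ratio SYSA | Short CP Ratio SYSB | DASD Response SYSA | DASD Response SYSB | DASD Rate SYSA | DASD Rate SYSB | Actual MSU SYSA | Actual MSU SYSB | Ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| 03.00.00 | 163 | 2.10 | 1.01 | 6.9 | 10.9 | 5287 | 6129 | 316 | 411 | 43.47% |
| 03.30.00 | 79 | 1. | | | | | | | | |
| 04.00.00 | 24 | 1. | | | | | | | | |
| 04.30.00 | 5 | 1.08 | 1.00 | 3.4 | 4 | 5503 | 7855 | 239 | 243 | 49.59% |
| 05.00.00 | 28 | 1. | | | | | | | | |
| 05.30.00 | 11 | 1. | | | | | | | | |
| 06.00.00 | - | | | | | | | | | |
| 06.30.00 | -39 | 1. | | | | | | | | |
| 07.00.00 | -7 | 1.00 | | 3.3 | 4.1 | 5147 | 4331 | 331 | 80 | 80.54% |
| 07.30.00 | 42 | 1.00 | | 3.6 | 2.1 | 2632 | 5916 | 175 | 78 | 69.17% |
| 08.00.00 | 87 | 1. | | | | | | | | |
| 08.30.00 | 97 | 1. | | | | | | | | |
| 09.00.00 | 101 | 1.02 | 1.01 | 3.5 | 5 | 6570 | 3637 | 260 | 139 | 65.16% |
| 09.30.01 | 119 | 1. | | | | | | | | |
| 10.00.00 | 117 | 1. | | | | | | | | |
| 10.30.00 | 118 | 1.04 | 1.00 | 4 | 4.6 | 4893 | 2527 | 330 | 82 | 80.10% |
| 11.00.00 | 112 | 1. | | | | | | | | |
| 11.30.00 | 88 | 1. | | | | | | | | |
| 12.00.00 | 54 | 1. | | | | | | | | |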

13 Comparison
- SYSB logical processor share 60-100%, compared to 16% previously
- Helps processor cache
- SYSA logical processor share 60%, compared to 80%
- Job FNMDMP05
  - 01/02: 15h17mins elapsed, 55mins cpu, 642K excp
  - 01/03: 8h35mins elapsed, 48mins cpu, 648K excp
- DB2 DBM1 address space (does database I/O)
  - 01/02: 499,271,506 I/Os take 02:50:58 CPU hours
  - 01/03: 549,277,707 I/Os take 02:28:45 CPU hours
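The comparison figures are easier to judge once normalised. The sketch below derives the elapsed-time improvement for job FNMDMP05 and the CPU cost per I/O for the DBM1 address space from the numbers on the slide. The helper names are hypothetical, and treating the CPU figures as hh:mm:ss is an assumption based on how they are written.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_microseconds_per_io(cpu_hms: str, io_count: int) -> float:
    """CPU time consumed per I/O, in microseconds."""
    return hms_to_seconds(cpu_hms) / io_count * 1_000_000


# Job FNMDMP05 elapsed time in minutes: 15h17m on 01/02, 8h35m on 01/03.
elapsed_before = 15 * 60 + 17
elapsed_after = 8 * 60 + 35
print(f"Elapsed speed-up: {elapsed_before / elapsed_after:.2f}x")

# DB2 DBM1 address space: CPU per I/O on each day.
before = cpu_microseconds_per_io("02:50:58", 499_271_506)
after = cpu_microseconds_per_io("02:28:45", 549_277_707)
print(f"DBM1 CPU per I/O: {before:.2f} us -> {after:.2f} us")
```

Dividing by the I/O count matters here because DBM1 did about 10% more I/O on 01/03 while still using less CPU in total.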
