Java on z/OS: A fresh look

Presentation transcript:

Java on z/OS: A fresh look Scott Chapman American Electric Power

Important notes
  I don’t really like Java as a language
  I’m not a Java expert
  Results presented herein may be installation-dependent
  There are a lot of moving parts here
  I understand there’s zAAP on zIIP; “zAAP” is used generically here
  All trademarks of IBM, Oracle, and everybody else hereby recognized

Why Java on z/OS? Because programmers want to use it http://xkcd.com/801/

Why Java on z/OS
  Because it enables open source projects that are cool/useful/interesting
  Key trick: run the JVM in ASCII
    -Dfile.encoding=ISO8859-1
  Many things will just run with that run-time option!
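To see what the -Dfile.encoding option changes, a small check like the following can be run with and without it. This is only a sketch: the class name EncodingCheck and the helper hex are made up for illustration and are not from the talk. On an ASCII-family encoding such as ISO-8859-1 the letter "A" is the byte 0x41, whereas in EBCDIC it is 0xC1, which is why code written with ASCII assumptions tends to misbehave under the platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative helper (hypothetical name): shows the encoding the JVM is using.
class EncodingCheck {
    /** Hex dump of the bytes a string encodes to in the given charset. */
    static String hex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Value of the system property set by -Dfile.encoding=... (if any)
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        // Charset the JVM uses when no charset is given explicitly
        System.out.println("default charset = " + Charset.defaultCharset());
        // ASCII-family encoding: 'A' is 0x41 (in EBCDIC it would be 0xC1)
        System.out.println("\"A\" in ISO-8859-1 = " + hex("A", StandardCharsets.ISO_8859_1));
    }
}
```

Running it as `java -Dfile.encoding=ISO8859-1 EncodingCheck` and again without the option should show whether the default charset actually changed on a given JVM level.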

What about a GUI?
  Turns out that that just works too!
  Start Xming X server on your PC
    Check the “No Access Control” option
  Set the DISPLAY environment variable
  Run the code
    S147774:/u/s147774: >export DISPLAY=10.97.131.15:0
    S147774:/u/s147774: >java -Xmx320m -jar ga33.jar

Debugging JavaScript code running in Helma on the mainframe, with the GUI connected to Xming on my laptop. Works better than I expected.

Why Java on z/OS
  Because it enables more programming language choices
  JavaScript built in to Java 6
    Rhino interpreter from Mozilla
  In theory, should be able to run any JVM-based language (I haven’t tested these)
    Jython
    Groovy
    Clojure
    Scala
    Ruby (via JRuby)

Why Java on z/OS It may perform better It may save you money If you are on a sub-capacity machine It may save you money Pretty unlikely Only if you can take some work away from your peaks

Which job is better?

How cheap are zAAP/zIIPs?
  $100K/SE (z196, zEC12)
  How much is $100K? Consider adding 1 engine to z196-710:
    710 = 10,250 MIPS, 1191 MSUs
    711 = 11,073 MIPS, 1286 MSUs
    710+1 zIIP = 10,302+1,000 MIPS
  z/OS (base) at this level costs $62/MSU
    Scenario B, z/OS base goes up almost $6K/month
    zIIP costs < 17 months of z/OS Base
  Not to mention features, DB2, CICS, etc.
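The arithmetic behind the "almost $6K/month" and "< 17 months" figures can be checked with a few lines. This sketch assumes that "Scenario B" means upgrading the 710 to a 711, and it applies the slide's $62/MSU as a flat rate to the added MSUs, which is the slide's simplification rather than a real pricing model. The class and method names are illustrative only.

```java
// Illustrative arithmetic only; the flat $/MSU rate is the slide's simplification.
class EngineCost {
    /** Monthly z/OS base increase when MSUs grow, at a flat dollars-per-MSU rate. */
    static long monthlyIncrease(int msusBefore, int msusAfter, int dollarsPerMsu) {
        return (long) (msusAfter - msusBefore) * dollarsPerMsu;
    }

    /** Months of extra software charges that equal the one-time engine price. */
    static double breakEvenMonths(long enginePrice, long monthlyIncrease) {
        return (double) enginePrice / monthlyIncrease;
    }

    public static void main(String[] args) {
        // 710 -> 711: 1191 -> 1286 MSUs at $62/MSU (figures from the slide)
        long perMonth = monthlyIncrease(1191, 1286, 62);
        System.out.println("z/OS base increase per month: $" + perMonth);
        // A $100K specialty engine versus that monthly increase
        System.out.printf("Break-even: %.1f months%n", breakEvenMonths(100_000, perMonth));
    }
}
```

With the slide's numbers the increase is 95 MSUs x $62 = $5,890 per month, so the specialty engine pays for itself in just under 17 months of z/OS base charges alone.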

What about accessing z/OS services? JZOS Classes to easily access z/OS specific constructs z/OS datasets RACF Respond to operator commands Access JES Spool

Ways to Run Java on z/OS WebSphere CICS DB2 Stored Procedures Batch Started Tasks Unix shell

Batch / Started Task options
  BPXBATC
    BPXBATCH (traditional alias)
    BPXBATSL (local spawn alias)
    Traditional approach
    Difficulty with 100-byte JCL Parm
  JZOS
    Ships with z/OS
    Avoids 100-byte parm limit
    Adds a lot of flexibility

Measuring Java

zAAP vs. GCP time
  Watch the normalization factor!
    Most SMF values not normalized
    Tools/reports may normalize for you
  Consider IFAHONORPRIORITY=NO
    Avoid using GCPs to help zAAPs
    Can result in >99% of Java CPU time executed on zAAP

SDSF zAAP vs. GCP columns
This data comes from RMF
  JOBNAME   CPU-Time  GCP-Time  zAAP-Time  zACP-Time  zAAP-NTime
  P3SR01BS   1514.11      9.53     772.02       2.26     1501.82
  P3SR01AS   1706.50     12.82     868.75       1.95     1690.00
  P3SR01B     788.55    197.66     281.64       1.53      547.87
  P3SR01A     763.01    192.47     272.33       1.10      529.77
  P3SR02A    2953.37    422.62    1188.79       5.39     2312.56
  P3SR02B    3051.88    437.74    1226.02       6.55     2385.00
  P3SR01AS   7281.39     62.56    3698.72      11.47     7195.17
  P3SR02BS   2805.58    123.85    1316.22      22.15     2560.45
  P3SR01BS   7783.21     63.38    3955.54      14.38     7694.77
  P3SR02AS   2591.27    118.60    1216.36      10.74     2366.21
  RTMSERVE   2661.39      3.85    1363.45       1.03     2652.34
Column annotations on the slide: TCB + SRB; real; zAAP on GCP; normalized

SMF 30 Accounting
  BPXBATCH vs. BPXBATSL vs. JZOS
    Important due to spawned OMVS tasks
  Single step job results:
    BPXBATSL: 1 step, 1 job record
    BPXBATCH: 6 step, 4 job records
      CPU time collected on type OMVS records
    JZOS: 2 step, 2 job records
      CPU time almost completely on JOB types

Some interesting calculations
  zAAPn = SMF30_TIME_ON_IFA * SMF30ZNF / 256
  percent work done on zAAP = zAAPn / (zAAPn + SMF30CPT + SMF30CPU)
    (“Generosity” or “offload” factor)
  percent zAAP sent to GCP = SMF30_TIME_IFA_ON_CP / (SMF30_TIME_ON_IFA + SMF30_TIME_IFA_ON_CP)
    (“Fallback” percentage: can be <1%, although some fallback is normal and expected)
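The three calculations translate directly into code. The sketch below assumes the SMF 30 fields have already been extracted into plain numbers in a common time unit; the class name, method names, and sample values are illustrative and do not come from the talk. Parameter names mirror the field names as they appear on the slide.

```java
// Sketch of the slide's formulas; inputs are assumed to be pre-extracted SMF 30 values.
class SmfCalc {
    /** zAAPn = SMF30_TIME_ON_IFA * SMF30ZNF / 256 (normalized zAAP time). */
    static double zaapNormalized(double timeOnIfa, int znf) {
        return timeOnIfa * znf / 256.0;
    }

    /** "Offload" factor: zAAPn / (zAAPn + SMF30CPT + SMF30CPU), as written on the slide. */
    static double offloadFraction(double zaapN, double smf30cpt, double smf30cpu) {
        return zaapN / (zaapN + smf30cpt + smf30cpu);
    }

    /** "Fallback": zAAP-eligible time that ran on a GCP, as a fraction of all zAAP-eligible time. */
    static double fallbackFraction(double timeIfaOnCp, double timeOnIfa) {
        return timeIfaOnCp / (timeOnIfa + timeIfaOnCp);
    }

    public static void main(String[] args) {
        // Made-up sample values, all in the same time unit
        double zaapN = zaapNormalized(100.0, 512);   // factor of 512/256 = 2.0
        System.out.println("normalized zAAP time: " + zaapN);
        System.out.println("offload fraction:     " + offloadFraction(zaapN, 40.0, 10.0));
        System.out.println("fallback fraction:    " + fallbackFraction(1.0, 99.0));
    }
}
```

Multiply the two fractions by 100 to get the percentages the slide refers to. A normalization factor of exactly 256 means the zAAP and the GCPs run at the same speed, so normalized and real time are equal.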

Other SMF records
  RMF records
    Look for breakdown of processor types for both hardware and report / service classes
  WAS 120 records
    New subtype 9s for WAS 7+ much better!
  HIS type 113 records
    GCP vs. zAAP vs. zIIP

Java Performance

What about performance? Java on the mainframe has a history of performance problems Java is inherently “heavy” due to the JVM Scott’s Law: “The easier you make it on the programmer, the harder it is on the system” Today’s z hardware and software are up to the task! (But you probably want zAAPs!)

Heard at WAS Week 200x… “Our goal is to get JVM startup time down to about 1 second.” Seemed like a stretch at the time! WAS startup took several minutes

Today: WAS Servant Startup <1 min
  15.49.15 STC14327 ---- MONDAY, 18 APR 2011 ----
  15.49.15 STC14327 $HASP373 P3SR02AS STARTED
  15.49.15 STC14327 IEFUSI BPXBATSL-P3ASRU ABOVE REGION SET TO 1536MB
  15.49.15 STC14327 IEF403I P3SR02AS - STARTED - TIME=15.49.15
  15.49.16 STC14327 +BBOO0004I WEBSPHERE FOR Z/OS SERVANT PROCESS P3CELL/P3NODEA/P3SR02/P3SR02A IS STARTING.
  15.49.16 STC14327 +BBOO0239I WEBSPHERE FOR Z/OS SERVANT PROCESS p3cell/p3nodea/p3sr02a IS STARTING.
  15.49.16 STC14327 +BBOO0308I SERVANT PROCESS P3CELL/P3NODEA/P3SR02/P3SR02A IS EXECUTING IN 64-BIT ADDRESSING MODE.
  15.49.16 STC14327 +BBOM0007I CURRENT CB SERVICE LEVEL IS build level 7.0.0.12 (cf121027.08) release WAS70.ZNATV date 07/09/10 11:02:02.
  ...
  15.49.56 STC14327 +BBOO0222I: WSVR0001I: Server SERVANT PROCESS p3sr02a open for e-business
  15.49.57 STC14327 +BBOO0020I INITIALIZATION COMPLETE FOR WEBSPHERE FOR Z/OS SERVANT PROCESS P3SR02A.
  15.49.57 STC14327 +BBOO0248I INITIALIZATION COMPLETE FOR WEBSPHERE FOR Z/OS SERVANT PROCESS P3CELL/P3NODEA/P3SR02/P3SR02A.
Not much in that particular servant

Today: HelloWorld in <2 seconds
  10.08.55 JOB47259 IEF403I S147774B - STARTED - TIME=10.08.55
  10.08.57 JOB47259 - --TIMINGS (MINS.)-- ----PAGING COUNTS---
  10.08.57 JOB47259 -JOBNAME STEPNAME PROCSTEP RC EXCP CPU SRB CLOCK SERV PG PAGE SWAP VIO
  10.08.57 JOB47259 -S147774B RUNOMVS 00 59 .00 .00 .02 2524 0 0 0 0
  10.08.57 JOB47259 IEF404I S147774B - ENDED - TIME=10.08.57
  10.08.57 JOB47259 -S147774B ENDED. NAME-BPXBATCH TEST TOTAL CPU TIME= .00 TOTAL ELAPSED TIME= .02
  10.08.57 JOB47259 $HASP395 S147774B ENDED
z10 EC 504 with zAAP
Output:
  Hello Scott
  Java runtime: IBM Corporation 1.6.0, vm version 2.4
  Running on: s390 z/OS 01.10.00
  Running for: S147774
  Classpath: /usr/lpp/java/J6.0/lib:/usr/lpp/java/IBM/J1.3/l
JCL:
//RUNOMVS EXEC PGM=BPXBATCH,
// PARM='SH java -Xms32M -Xmx32M HelloWorldApp Scott'
//SYSOUT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//STDENV DD *
//STDOUT DD SYSOUT=*
//STDERR DD SYSOUT=*
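The source of HelloWorldApp is not shown in the slides. Judging only from its output, it could look something like the sketch below, which prints a greeting followed by a handful of standard Java system properties. This is a guess at a program that produces the same kind of output, not the author's actual code; the report method exists only to keep the lines easy to inspect.

```java
// Hypothetical reconstruction: prints output shaped like the slide's sample, using
// standard system properties. Not the original program from the talk.
class HelloWorldApp {
    /** Builds the five output lines for the given name. */
    static String[] report(String name) {
        return new String[] {
            "Hello " + name,
            "Java runtime: " + System.getProperty("java.vendor") + " "
                + System.getProperty("java.version")
                + ", vm version " + System.getProperty("java.vm.version"),
            "Running on: " + System.getProperty("os.arch") + " "
                + System.getProperty("os.name") + " "
                + System.getProperty("os.version"),
            "Running for: " + System.getProperty("user.name"),
            "Classpath: " + System.getProperty("java.class.path")
        };
    }

    public static void main(String[] args) {
        // First argument is the name to greet, as in "HelloWorldApp Scott"
        String name = args.length > 0 ? args[0] : "World";
        for (String line : report(name)) {
            System.out.println(line);
        }
    }
}
```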

Small machine
  10.51.53 JOB10901 IEF403I S147774B - STARTED - TIME=10.51.53
  10.52.04 JOB10901 - --TIMINGS (MINS.)-- ----PAGING COUNTS---
  10.52.04 JOB10901 -JOBNAME STEPNAME PROCSTEP RC EXCP CPU SRB CLOCK SERV PG PAGE SWAP VIO
  10.52.04 JOB10901 -S147774B RUNOMVS 00 86 .00 .00 .18 2252 0 0 0 0
  10.52.04 JOB10901 IEF404I S147774B - ENDED - TIME=10.52.04
  10.52.04 JOB10901 -S147774B ENDED. NAME-BPXBATCH TEST TOTAL CPU TIME= .00 TOTAL ELAPSED TIME= .18
  10.52.04 JOB10901 $HASP395 S147774B ENDED
z10 BC E02 without zAAPs
Not surprising that ~50 MIPS engines can’t keep up with 450 / 900 MIPS engines

What about doing real work? Days of assuming it will run faster on your PC are over Have seen H2 perform better on z/OS Still, it is Java, it’s not CPU-free Performance may depend on: zAAP and GCP capacity System settings (USS, zFS, WLM) Application code Java Settings (heap size, GC policy) Random luck

Application code
  Application code is always important
    Regardless of the language!
  BufferedReader or ZFile?
    Classic “it depends”
    BufferedReader seems like it should be faster
    But they provide different results: byte array vs. string
    What you want to do with the result may impact which is best for any given situation
  Java has lots of similar but slightly different ways of doing things
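The "byte array vs. string" difference can be illustrated without z/OS. In the sketch below, BufferedReader hands back decoded String lines, while a plain InputStream read fills a byte buffer and leaves decoding to the caller. The plain InputStream is only a stand-in for the byte-oriented style; it does not reproduce the JZOS ZFile API, and the class and method names are invented for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative comparison of String-oriented and byte-oriented reading.
class ReadStyles {
    /** String-oriented: every line is decoded into a String object. */
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Byte-oriented: data stays in a reusable buffer; no decoding, no per-line objects. */
    static int countBytes(InputStream in) {
        byte[] buf = new byte[80];
        int total = 0;
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

If the program only needs to copy or count records, the byte-oriented path avoids the decoding work; if it needs to parse text, the decoding has to happen somewhere anyway, which is the "it depends" on the slide.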

Heap settings Heap settings always seen as an issue Size is the usual suggestion Is bigger always better? Does anybody know how much heap they really need? (no) Min / Max sizes same or different? Garbage collection policy options

Memory is an issue Java’s memory usage can be an issue “Requirements” for 100s of MBs are not unusual Often “requirements” seem to be a SWAG Java heap size can’t be reliably predicted from the code & expected volumetrics Test with reasonable numbers before assuming the requirements are real Be sure to get all processing scenarios!

Garbage Collection Options (IBM Java 6)
  optthruput: default
    Probably best for batch
  gencon: generational / concurrent
    Maybe good for large heap, transactional workloads (WAS)
  optavgpause: reduces long pauses
  subpool: “improved” object allocation
  For important workloads, may want to test all of them at various sizes
  Lots of other heap/gc options too
    See IBM JDK Diagnostics Guide!
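When testing heap sizes and GC policies, it helps to have the program report what it actually used. The sketch below relies only on the standard Runtime and java.lang.management APIs; the class name HeapReport is made up, and this is one simple way to collect the numbers rather than the method used for the charts in this talk.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative end-of-run report of heap use and GC activity.
class HeapReport {
    /** Heap currently in use: committed heap minus the free part of it. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Total milliseconds spent in GC so far, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (bytes):  " + rt.maxMemory());
        System.out.println("used heap (bytes): " + usedHeapBytes());
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```

Calling the same reporting code at the end of a real workload, once per heap size and policy under test (for example with -Xms and -Xmx set as in the HelloWorld JCL, and the policy chosen through the IBM JVM's -Xgcpolicy option), gives a rough picture of how much heap is really needed and how much time goes to collection.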

For some workloads, heap size may not matter

Too small a heap can cause a CPU increase

There might be a slight benefit to a fixed heap size

Heap size most important, but GC Policy also can be significant

Don’t mess with the JIT!

Could be good for certain workloads

So what’s the random thing?
  Much more variation in CPU time measurements with today’s CPUs
    Superscalar pipeline and cache issues
  Seems to impact my Java work more than I expected
    Consistently ran same workload
    Extremely lightly utilized LPAR
    Lightly utilized zAAPs
    Same variability over time
  So I tried some more tests…

One zAAP Two zAAPs Zero zAAPs

Why is this? I don’t know, but best guess is CPU cache and memory access effects But I thought I’d look at the 113 records to see if I could find anything interesting….

Data from Test period 1 (One zAAP) Proc 0 = GCP Proc 2 = zAAP

Proc 0 = GCP Proc 2 = zAAP Seems to confirm our SMF30 data

L1.5 Improvement corresponds to dip in machine usage Proc 0 = GCP Proc 2 = zAAP

Dip in GCP TLB Miss overhead due to machine less busy Proc 0 = GCP Proc 2 = zAAP

My Guesses…
  My test Java workloads were too cache and superscalar friendly
    Perhaps makes it more susceptible to pipeline hazards
    But: Wouldn’t the REXX workload be even more superscalar and cache friendly?
    Why were the 113 measurements so consistent?
  Or Java is really doing variable amounts of work?
  Or… something isn’t right someplace?
  Take away: Java CPU measurements might be more variable than you expect

Most recent testing
  Repeated testing later in the year
    z/OS 1.12 vs. 1.10
    1 year more recent Java 6 (Fall 2010 vs. Fall 2009)
  Still saw variability, but the worst of it was closer to 25-30% instead of upwards of 75%
  Saw similar variability when testing on a z9 with zAAPs
  Saw at least one instance in a production LPAR with similar variability (in 3 executions of the same job, the 1st consumed just over half as much CPU as the later runs)
  Could not readily replicate on a WSC system running under z/VM
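The talk does not say how the variability percentages were computed. One plausible definition, assumed here purely for illustration, is the spread between the cheapest and most expensive run of the same job, relative to the cheapest run. The class name and the sample CPU times below are made up.

```java
// Assumed definition of "variability": (max - min) / min, as a percentage.
class CpuSpread {
    /** Spread of repeated CPU-time measurements, as a percentage of the smallest one. */
    static double spreadPercent(double[] cpuSeconds) {
        double min = cpuSeconds[0];
        double max = cpuSeconds[0];
        for (double v : cpuSeconds) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return (max - min) / min * 100.0;
    }

    public static void main(String[] args) {
        // Hypothetical CPU seconds for three runs of the same job
        double[] runs = {100.0, 125.0, 110.0};
        System.out.println("spread: " + spreadPercent(runs) + "%");
    }
}
```

Under this definition, a job whose first run used just over half the CPU of the later runs would show a spread approaching 100%, so other definitions (for example relative to the mean) may match the quoted figures better.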

Summary
  Java enables all sorts of cool things you might not have thought could run on the mainframe
  Mainframe’s Java performance not significantly worse than any other platform
    (Assuming adequate zAAP capacity)
  Lots of tuning knobs for Java
  Java CPU time measurements might be more variable