1 Best Practices for Performance Evaluation and Diagnosis of Java Applications Prashanth K Nageshappa Venkataraghavan Lakshminarayanachar IBM.



2 Agenda
Inside a High Performance Java Virtual Machine (JVM)
Performance Issues – Diagnosis Techniques
The Healthcenter

3 Inside a High Performance JVM

4 Lifting the Hood: Overall Architecture
[Architecture diagram]
Layers: User Code, VM Extensions, Core VM, Portability Layer, Operating systems
Components shown: Debugger, Profilers, Java Application Code; JVMTI; SE5 Classes, SE6 Classes, Harmony Classes; User Natives; GC, JIT, Class Library Natives; Pluggable VM Interfaces, Java Native Interface (JNI); Core VM (Interpreter, Verifier, Stack Walker); Trace & Dump Engines; Port Library (Files, Sockets, Memory); Thread Library
Operating systems: AIX, Linux, Windows, z/OS (on PPC-32, PPC-64, x86-32, x86-64)
Legend: User Code, Java Platform API, VM-aware, Core VM

5 Java: Adaptive Compilation in J9/TR
Methods start out being interpreted
After N invocations (or via interpreter sampling) methods get compiled at ‘cold’ or ‘warm’ level
A low-overhead sampling thread is used to identify hot methods
Methods may get recompiled at ‘hot’ or ‘scorching’ levels (for more optimizations)
Transition to ‘scorching’ goes through a temporary profiling step
[Diagram: interpreter, cold, warm, hot, profiling, scorching]
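The promotion path on this slide can be pictured as a small state machine. The sketch below is only an illustration in plain Java, not J9/TR's actual heuristics: the class name TieredCompilationSketch, the three thresholds, and the rule that the profiling step ends on the next sample are all assumptions made for the example, and the ‘cold’ level is left out for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of tiered promotion. Every threshold here is hypothetical.
final class TieredCompilationSketch {
    enum Level { INTERPRETED, WARM, HOT, PROFILING, SCORCHING }

    static final int COMPILE_THRESHOLD = 1000; // the slide's "N" (assumed value)
    static final int HOT_SAMPLES = 10;         // assumed
    static final int SCORCHING_SAMPLES = 50;   // assumed

    private final Map<String, Integer> invocations = new HashMap<>();
    private final Map<String, Integer> samples = new HashMap<>();
    private final Map<String, Level> levels = new HashMap<>();

    Level levelOf(String method) {
        return levels.getOrDefault(method, Level.INTERPRETED);
    }

    // Called on every invocation: interpreted methods are compiled after N calls.
    void onInvoke(String method) {
        int n = invocations.merge(method, 1, Integer::sum);
        if (levelOf(method) == Level.INTERPRETED && n >= COMPILE_THRESHOLD) {
            levels.put(method, Level.WARM);
        }
    }

    // Called when the sampling thread finds this method running.
    void onSample(String method) {
        int s = samples.merge(method, 1, Integer::sum);
        switch (levelOf(method)) {
            case WARM:
                if (s >= HOT_SAMPLES) levels.put(method, Level.HOT);
                break;
            case HOT:
                // Scorching is reached only via a temporary profiling step.
                if (s >= SCORCHING_SAMPLES) levels.put(method, Level.PROFILING);
                break;
            case PROFILING:
                levels.put(method, Level.SCORCHING);
                break;
            default:
                break;
        }
    }

    public static void main(String[] args) {
        TieredCompilationSketch jit = new TieredCompilationSketch();
        for (int i = 0; i < COMPILE_THRESHOLD; i++) jit.onInvoke("dummy");
        System.out.println("after invocations: " + jit.levelOf("dummy"));
        for (int i = 0; i < SCORCHING_SAMPLES + 1; i++) jit.onSample("dummy");
        System.out.println("after samples: " + jit.levelOf("dummy"));
    }
}
```

Separating the invocation counter from the sample counter mirrors the slide's distinction between the initial compile trigger and the sampling thread that finds hot methods.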

6 Code Example
    public static int total = 55;

    public static int dummy(int i, int j, int N, int[] a) {
        int k = 0;
        for (i = 0; i < N; i++) {
            k = k + j + a[i] + (total + foo());
        }
        return k;
    }

    public static int foo() {
        return 75;
    }
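One way to watch the effect of adaptive compilation on this method is to time it in batches and see whether later batches run faster. The harness below is a rough sketch under assumptions: the class name WarmupHarness, the array size, and the batch counts are arbitrary choices, and the timings it prints depend entirely on the JVM and machine.

```java
// Rough warm-up harness around the slide's dummy() method.
final class WarmupHarness {
    public static int total = 55;

    public static int dummy(int i, int j, int N, int[] a) {
        int k = 0;
        for (i = 0; i < N; i++) {
            k = k + j + a[i] + (total + foo());
        }
        return k;
    }

    public static int foo() {
        return 75;
    }

    public static void main(String[] args) {
        int[] a = new int[10_000];
        for (int i = 0; i < a.length; i++) a[i] = i % 7;

        int sink = 0; // printed at the end so the calls cannot be discarded
        for (int batch = 0; batch < 10; batch++) {
            long start = System.nanoTime();
            for (int call = 0; call < 2_000; call++) {
                sink += dummy(0, call, a.length, a);
            }
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("batch " + batch + ": " + micros + " us");
        }
        System.out.println("checksum " + sink);
    }
}
```

Numbers from a loop like this are indicative only; a proper benchmark harness controls for many effects this sketch ignores.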

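7 Optimization and Effects
[Table: Opt level | Code Size (bytes) | Compilation Time (us) | Wall clock runtime (ms); rows for Cold, Warm, Hot, Profiling (includes an n/a entry), and Scorching. The numeric values were not preserved in this transcript.]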

8 Garbage Collection - Goals
Tidying up…
Fast allocation path
  – Large contributor to overall JVM performance.
Low pause times and concurrent operation
  – Fit for purpose: different algorithms with different tradeoffs.
Hardware exploitation
  – Multiple CPUs & varying memory architectures.
  – Algorithmic and processor parallelism.
Accurate garbage collection
  – Earlier IBM JVMs did a ‘partially conservative’ GC, which was suboptimal.

9 Compressed References
[Diagram: object layouts]
  – 32-bit Object (24 bytes – 100%): clazz, flags, monitor, int field, object field
  – 64-bit Object (48 bytes – 50%): clazz, flags, pad, monitor, int field, pad, object field
  – 64-bit Compressed (24 bytes – 100%): clazz, flags, monitor, int field, object field
Use 32-bit values (offsets) to represent object fields
  – With scaling, between 4 GB and 32 GB can be addressed
To enable the feature: -Xcompressedrefs
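The 4 GB to 32 GB range follows from simple arithmetic: a 32-bit value can name 2^32 positions, and scaling it by a left shift of up to 3 bits multiplies the reachable range by up to 8. The sketch below works that arithmetic through; the class and method names are hypothetical, and it models heap offsets only, not how the JVM actually lays out or decodes references.

```java
// Arithmetic behind scaled 32-bit references (illustrative only).
final class CompressedRefSketch {
    // Bytes reachable with a 32-bit value scaled by the given shift.
    static long maxAddressableBytes(int shift) {
        return (1L << 32) << shift;
    }

    // Store a heap offset as 32 bits; the offset must be aligned to 2^shift.
    static int compress(long heapOffset, int shift) {
        return (int) (heapOffset >>> shift);
    }

    // Recover the offset, treating the stored int as unsigned.
    static long decompress(int ref, int shift) {
        return (ref & 0xFFFFFFFFL) << shift;
    }

    public static void main(String[] args) {
        for (int shift = 0; shift <= 3; shift++) {
            long gb = maxAddressableBytes(shift) >> 30;
            System.out.println("shift " + shift + " -> " + gb + " GB");
        }
    }
}
```

With a shift of 0 the range is 4 GB, and with a shift of 3 (8-byte alignment) it is 32 GB, matching the two ends of the range on the slide.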

10 Threading and Monitors
Java uses monitors everywhere
  – Good: easy to use, safety built-in for many cases!
  – Bad: there’s a tax, even when there’s no contention.
Central to performance in JVMs
  – Avoid it? Escape analysis (but remember JSR 133!).
  – Make it cheaper? Tasuki locks, lock reservations

11 Tasuki locks
Bimodal lock – ‘thin’ or ‘inflated’
Single atomic operation (on enter)
[Diagram: lock word states. Unowned: 0, 0. Thin owned: Thread ID, 0. Inflated owned: Inflated Monitor, 1.]
References:
  – A Study of Locking Objects with Bimodal Fields (Tamiya Onodera & Kiyokuni Kawachiya, IBM Research, OOPSLA 1999)
  – Lock Reservation: Java Locks Can Mostly Do Without Atomic Operations (Kiyokuni Kawachiya, Akira Koseki, Tamiya Onodera, IBM Research, OOPSLA 2002)
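The "single atomic operation on enter" idea can be sketched with one compare-and-swap on a lock word. This is a deliberately reduced illustration, not the Tasuki algorithm from the cited papers: the word layout (thread ID shifted left by one, low bit reserved as the inflated flag), the class name ThinLockSketch, and the caller-supplied thread IDs are all assumptions, and recursion, contention handling, and inflation itself are left out.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Thin-path-only sketch of a bimodal lock word. Hypothetical layout:
// bit 0 = inflated flag, remaining bits = owning thread ID (0 means unowned).
final class ThinLockSketch {
    static final int UNOWNED = 0;
    static final int INFLATED_BIT = 1;

    private final AtomicInteger lockWord = new AtomicInteger(UNOWNED);

    // Thread IDs are assumed to be positive and small enough to shift safely.
    static int thinWord(int threadId) {
        return threadId << 1;
    }

    static boolean isInflated(int word) {
        return (word & INFLATED_BIT) != 0;
    }

    // Enter costs exactly one atomic operation when the lock is free.
    // A false result is where a real implementation would spin or inflate.
    boolean tryEnterThin(int threadId) {
        return lockWord.compareAndSet(UNOWNED, thinWord(threadId));
    }

    // Only the owner writes the word while it is thin, so exit uses a plain
    // volatile store rather than another compare-and-swap.
    void exitThin(int threadId) {
        if (lockWord.get() != thinWord(threadId)) {
            throw new IllegalMonitorStateException("not the thin owner");
        }
        lockWord.set(UNOWNED);
    }

    public static void main(String[] args) {
        ThinLockSketch lock = new ThinLockSketch();
        System.out.println("thread 7 enters: " + lock.tryEnterThin(7));
        System.out.println("thread 9 enters: " + lock.tryEnterThin(9));
        lock.exitThin(7);
        System.out.println("thread 9 retries: " + lock.tryEnterThin(9));
    }
}
```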

12 Historical Perspective
Is it just the hardware? We’ve come a long, long way… Why?
  – Processors: better control & understanding of the memory hierarchy
  – Language understanding (idiom recognition)
  – Processing budget (new instructions, more cores)

13 SPECjbb Trademarks and Results
SPEC and SPECjbb are registered trademarks of the Standard Performance Evaluation Corporation. Results referenced are current as of June, The SPECjbb2005 results are posted at which contains a complete list of published SPECjbb2005 results.
[Table flattened in this transcript. The Xeon, Chips, Core(s), GHz and total SPECjbb2005 bops columns were not preserved intact and are omitted; "…" marks a value that was lost. Hardware model names appear as they survive.]
Data Pt | Leap vs prev | Accum leap vs base | JVM | Date | Hardware | SPECjbb2005 bops/jvm | Math
1 | base | 1.0 | JRockit R25.2 | Jun 05 | Dell PowerEdge SC1425 | |
2 | 5% | 1.05 | J9 5.0GA | Oct 05 | IBM eServer xSeries 346 | | 1.11 – 1.06 = …
3a | 6% | 1.11 | HotSpot 5u5 | Dec 05 | FSC PRIMERGY TX300 S2 | | 28314/24208*3.6/3.8=1.11
3b | | | HotSpot 5u5 | Dec 05 | FSC PRIMERGY RX300 S2 | | 41986/39585= …
4 | …% | 1.3 | JRockit P26.0 | Mar 06 | FSC PRIMERGY TX300 S2 | | 49233/41987=1.17; 1.11*1.17=1.30
5a | 7% | 1.39 | JRockit P26.4 | Jun 06 | Dell PowerEdge 1850 | | 35503/28314=1.25; 1.25*1.11=1.39
5b | | | JRockit P26.4 | Jun 06 | FSC PRIMERGY BX620 S | |
6 | 14% | 1.59 | J9 5.0sr2 | July 06 | IBM System X | | 114941/100407=1.14; 1.39*1.14=1.59
7a | 14% | 1.81 | JRockit P27.1 | Nov 06 | Dell PowerEdge | | 130589/114941=1.14; 1.59*1.14=1.81
7b | | | JRockit P27.1 | Nov 06 | Dell PowerEdge | 105,033 |
8 | 4% | 1.88 | J9 5.0sr5 | Feb 07 | IBM System X | 109,016 | …/210,065=1.04; 1.81*1.04=1.88
9a | 1% | 1.90 | JRockit P27.2 | Mar 07 | Dell PowerEdge | 110,324 | 220648/218032=1.01; 1.88*1.01=1.90
9b | | | JRockit P27.2 | Aug 07 | Dell PowerEdge | 59,618 |
10a | 6% | 2.01 | JRockit P27.4 | Nov 07 | Dell PowerEdge | 63,101 | 252403/238472=1.06; 1.90*1.06= …
10b | | | JRockit P27.4 | Nov 07 | Dell PowerEdge 2950 III | 75,783 |
11 | …% | 2.01 | HotSpot 6u5p | Feb 08 | Sun fire X | 75,824 | 303297/303130= …
12a | 7% | 2.14 | J9 6sr1 | Mar 08 | IBM System X | 80,793 | 323172/303297=1.07; 2.01*1.07= …
12b | | | J9 6sr1 | Sep 08 | IBM System x | 82,651 |
13 | 4% | 2.23 | J9 6sr3 | Oct 08 | IBM System x | 86,109 | 344436/330605=1.04; 2.14*1.04= …
14a | 7% | 2.38 | JRockit P28.0 | Mar 09 | FSC PRIMERGY RX200 S | 92,009 | 368034/344436=1.07; 2.23*1.07= …
…a | | | JRockit P28.0 | Mar 09 | Cisco UCS B200-M | 278,396 |
… | …% | 2.38 | HotSpot 6u14p | Mar 09 | Sun Fire X | 278,411 | 556822/557/792= …
… | …% | 2.58 | J9 6sr5 | Mar 09 | IBM BladeCenter HS | 151,104 | 604417/556822=1.09; 2.38*1.09=2.58

14 Performance Issue : Diagnosis Techniques

15 Debugging Performance Problems
Four layers of deployment:
  – Operating System / Infrastructure
  – Java Runtime / Garbage Collection
  – Application Code
  – External Delays
Simple process is to start at the bottom, and eliminate layers

16 Infrastructure and Java Runtime Issues

17 Application and External Issues

18 MustGather

19 “MustGather” Diagnostics
Set of data requested by IBM Support for initial problem diagnosis
  – Specified on a per-scenario basis
      Requests only the data relevant to the scenario
  – Specified on a per-platform basis
      Leverages OS specific tools and capabilities
  – Split into two parts:
      Setup: to be done before starting the Java application
      Gather: to be done when the problem has occurred
Linked to from product support pages
  – Java:
  – WAS:

20 System Resource Contention

21 Resource Contention: Physical Memory
Lack of physical memory will cause paging/swapping of memory
Swapping is very costly for a Java process
  – Particularly affects Garbage Collection performance
      Garbage collection touches every point of memory in the process
      All memory therefore would need to be paged back in
      Leads to long “mark” and “sweep” phases of GC

22 Resource Contention: CPU
Insufficient CPU time availability will reduce performance
  – Normally surfaces when something periodically takes CPU time on the box, e.g.
      Cron jobs running batch applications
      Database backups

23 System Resource Contention: Solutions
Ensure there are enough resources!
Where resource contention can occur, it is important to ensure the Java application has its own pool of resources
Isolation can be achieved on some platforms using LPARs/WPARs/Zones
Otherwise move other applications onto separate machines

24 Garbage Collection Performance

25 Garbage Collection Performance
GC performance issues can take many forms
Definition of a performance problem is very user centric
  – User requirement may be for:
      Very short GC “pause” times
      Maximum throughput
      A balance of both
First step is to ensure that the correct GC policy has been selected for the workload type
  – Helpful to have an understanding of GC mechanisms
Second step is to look for specific performance issues

26 Object Allocation
Requires a contiguous area of Java heap
Driven by requests from:
  – The Java application
  – JNI code
Most allocations take place in Thread Local Heaps (TLHs)
  – Threads reserve a chunk of free heap to allocate from
      Reduces contention on allocation lock
      Keeps code running in a straight line (fewer failures)
      Meant to be fast
  – Available for objects < 512 bytes in size
Larger allocations take place under a global “heap lock”
  – These allocations are one time costs – out of line allocate
  – Multiple threads allocating larger objects at the same time will contend
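The split between lock-free TLH allocation and the global heap lock can be modelled with a bump pointer per thread. The sketch below is a simulation over plain offsets, not JVM code: the 4096-byte TLH size, the class name TlhSketch, and the decision to discard the unused tail of a TLH on refill are assumptions. Only the 512-byte cut-off comes from the slide.

```java
// Simulated heap: small requests bump a thread-local cursor, large requests
// and TLH refills go through a single global lock.
final class TlhSketch {
    static final int TLH_OBJECT_LIMIT = 512; // from the slide: objects < 512 bytes
    static final int TLH_SIZE = 4096;        // hypothetical chunk size

    private final long heapSize;
    private final Object heapLock = new Object();
    private long heapTop = 0;                // next free offset in the shared heap
    private int heapLockAcquisitions = 0;

    // Per-thread {cursor, end} pair; starts empty so the first request refills.
    private final ThreadLocal<long[]> tlh =
            ThreadLocal.withInitial(() -> new long[] {0L, 0L});

    TlhSketch(long heapSize) {
        this.heapSize = heapSize;
    }

    long allocate(int size) {
        if (size >= TLH_OBJECT_LIMIT) {
            return allocateUnderHeapLock(size); // out-of-line path
        }
        long[] local = tlh.get();
        if (local[0] + size > local[1]) {
            // TLH exhausted: reserve a fresh chunk (the old tail is simply dropped).
            local[0] = allocateUnderHeapLock(TLH_SIZE);
            local[1] = local[0] + TLH_SIZE;
        }
        long address = local[0];
        local[0] += size;                       // bump pointer, no lock taken
        return address;
    }

    private long allocateUnderHeapLock(long size) {
        synchronized (heapLock) {
            heapLockAcquisitions++;
            if (heapTop + size > heapSize) {
                throw new IllegalStateException("sketch heap exhausted");
            }
            long address = heapTop;
            heapTop += size;
            return address;
        }
    }

    int heapLockAcquisitions() {
        synchronized (heapLock) {
            return heapLockAcquisitions;
        }
    }

    public static void main(String[] args) {
        TlhSketch heap = new TlhSketch(1 << 20);
        for (int i = 0; i < 100; i++) heap.allocate(64);
        heap.allocate(1024);
        System.out.println("heap lock taken " + heap.heapLockAcquisitions() + " times");
    }
}
```

Counting lock acquisitions makes the slide's point visible: many small allocations share one reservation, while every large allocation pays for the lock.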

27 Object Reclamation (Garbage Collection)
Occurs under two scenarios:
  – An “allocation failure”
      An object allocation is requested and not enough contiguous memory is available
  – A programmatically requested garbage collection cycle
      A call is made to System.gc() or Runtime.gc()
      The Distributed Garbage Collector is running
      A call to JVMPI/TI is made
Two main technologies used to remove the garbage:
  – Mark Sweep Collector
  – Copy Collector
IBM uses a mark sweep collector
  – or a combination for generational
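A programmatic request can be observed from inside the application through the standard java.lang.management API. The sketch below sums the collection counts the JVM reports before and after calling System.gc(); the class name GcRequestSketch is hypothetical, and because System.gc() is only a request, the count is not guaranteed to change on every JVM or configuration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Observe a programmatically requested collection via the platform MXBeans.
final class GcRequestSketch {
    // Sum of collections reported by all collectors; undefined counts (-1) are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        System.gc(); // a request; the JVM may ignore it
        long after = totalCollections();
        System.out.println("collections before: " + before + ", after: " + after);
    }
}
```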

28 Global Collection Policies
Garbage Collection can be broken down into 2 (3) steps
  – Mark: Find all live objects in the system
  – Sweep: Reclaim unused heap memory to the free list
  – Compact: Reduce fragmentation within the free list
All steps are in a single stop-the-world (STW) phase
  – Application “pauses” whilst garbage collection is done
Each step is performed as a parallel task within itself
Four GC “Policies”, optimized for different scenarios
  – -Xgcpolicy:optthruput    optimized for “batch” type applications
  – -Xgcpolicy:optavgpause   optimized for applications with responsiveness criteria
  – -Xgcpolicy:gencon        optimized for highly transactional workloads
  – -Xgcpolicy:subpools      optimized for large systems with allocation contention
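When checking which policy a running JVM was started with, one option is to read back its own launch arguments. The sketch below scans them for an -Xgcpolicy: option using the standard RuntimeMXBean; the class name GcPolicySketch is hypothetical, treating the last occurrence as the effective one is an assumption, and the sketch deliberately reports nothing about the default when no option is present.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

// Report the -Xgcpolicy value found in the JVM's launch arguments, if any.
final class GcPolicySketch {
    static final String PREFIX = "-Xgcpolicy:";

    // Assumption: if the option appears more than once, the last one is reported.
    static Optional<String> policyFrom(List<String> jvmArgs) {
        Optional<String> found = Optional.empty();
        for (String arg : jvmArgs) {
            if (arg.startsWith(PREFIX)) {
                found = Optional.of(arg.substring(PREFIX.length()));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(policyFrom(jvmArgs)
                .map(policy -> "GC policy option: " + policy)
                .orElse("no -Xgcpolicy option on the command line"));
    }
}
```

For example, launching with `java -Xgcpolicy:gencon GcPolicySketch` on a JVM that accepts the option would report gencon.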

29 Introduction to GCMV
Garbage Collection and Memory Visualizer
  – Verbose GC data visualizer
  – Eclipse based tool available as plugin in ISA and as a standalone tool
  – Parses and plots all verbose GC logs
  – Extensible to parse and plot other forms of input
  – Provides graphical display of wide range of verbose GC data values
  – Handles optthruput, optavgpause, and gencon GC modes
  – Has raw log, tabulated data and graph views and can save data to jpeg or .csv files (for export to spreadsheets)

30 GCMV usage scenarios
Investigate performance problems
  – Long periods of pausing or unresponsiveness
Evaluate your heap size
  – Check heap occupancy and adjust heap size if needed
Garbage collection policy tuning
  – Examine GC characteristics, compare different policies
Look for memory growth
  – Heap consumption slowly increasing over time
  – Evaluate the general health of an application

31 Application Code Performance

32 The Healthcenter

33 Evaluating Your Application through the Healthcenter
Answers to..
  – What is my Java application doing?
  – Why is it doing that?
  – Why is my application going so slowly?
  – Is my application scaling well?
  – Do we need to tune the JVM?
  – Am I using the right options?
Available from/as a part of
  –
  –

34 Health Center Overview

35 Environment Subsystem
Shows
  – Version information for the JVM
  – Operating system and architecture information for the monitored system
  – Process ID
  – All system properties
  – All environment variables
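Much of the information listed here is also reachable from inside a Java program through standard APIs, which can be handy for a quick sanity check. The sketch below gathers a comparable summary; it is not how Health Center collects its data, the class name EnvironmentSketch is hypothetical, and the runtime name returned by RuntimeMXBean.getName() often contains the process ID but its format is not guaranteed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

// Collect JVM, OS, and process details similar to those listed on the slide.
final class EnvironmentSketch {
    static String summary() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        StringBuilder out = new StringBuilder();
        out.append("JVM: ").append(runtime.getVmName())
           .append(' ').append(runtime.getVmVersion()).append('\n');
        out.append("OS: ").append(System.getProperty("os.name"))
           .append(" (").append(System.getProperty("os.arch")).append(")\n");
        // Frequently "pid@hostname", but the format is implementation specific.
        out.append("Runtime name: ").append(runtime.getName()).append('\n');
        out.append("System properties: ").append(System.getProperties().size()).append('\n');
        out.append("Environment variables: ").append(System.getenv().size()).append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(summary());
    }
}
```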

36 Classes Subsystem
  – Shows all loaded classes
  – Shows the time at which each class was loaded
  – Visualizes classloading activity
  – Identifies shared classes
  – Makes recommendations

37 GC Subsystem
  – Shows Used Heap (after collection) & GC pause times
  – Identifies memory leaks
  – Provides tuning recommendations and analysis of GC data

38 Locking Subsystem
  – Always-on lock monitoring
  – All lock usage is profiled, such as lock request totals, blocking requests and hold times
  – Helps to identify points of contention that prevent the application from scaling

39 Profiling Subsystem
  – Sampling based profiler
  – Instantly identifies hottest methods in an application
  – See full call stacks to identify where methods are being called from and what methods they call
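The principle behind a sampling profiler can be shown in a few lines: capture stack traces periodically and count which method is on top most often. The sketch below does this with the standard Thread.getAllStackTraces() call; it illustrates the idea only and is not Health Center's implementation. The class name SamplingProfilerSketch, the 20 samples, and the 10 ms interval are arbitrary assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Minimal sampling profiler: tally the top stack frame of every sampled thread.
final class SamplingProfilerSketch {
    private final Map<String, Integer> hits = new HashMap<>();

    // Record one sample; the first element is the method currently executing.
    void recordSample(StackTraceElement[] stack) {
        if (stack.length == 0) {
            return; // thread had no Java frames at sampling time
        }
        String method = stack[0].getClassName() + "." + stack[0].getMethodName();
        hits.merge(method, 1, Integer::sum);
    }

    void sampleAllThreads() {
        for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
            recordSample(stack);
        }
    }

    // The method seen on top of a stack most often, if any sample was taken.
    Optional<String> hottest() {
        return hits.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) throws InterruptedException {
        SamplingProfilerSketch profiler = new SamplingProfilerSketch();
        for (int i = 0; i < 20; i++) {
            profiler.sampleAllThreads();
            Thread.sleep(10);
        }
        System.out.println("hottest: " + profiler.hottest().orElse("(no samples)"));
    }
}
```

Keeping the whole stack trace instead of only the top frame would be the natural next step toward the call-stack view the slide mentions.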

40 Features (New)
I/O
  – Provides file open events
  – Provides file close events
  – Provides details of files that are currently open
Native Memory
  – Provides native memory usage of the process and system monitored
  – Does not provide a native memory perspective view for the z/OS® 31-bit or z/OS 64-bit platforms.

41 Thank You
Merci (French), Grazie (Italian), Gracias (Spanish), Obrigado (Brazilian Portuguese), Danke (German), Teşekkürler (Turkish)

42 Special notices © IBM Corporation All Rights Reserved. The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. 
Therefore, no assurance can be given that an individual user will achieve results similar to those stated here. All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. The following are trademarks of the International Business Machines Corporation in the United States and/or other countries: AIX, CICS, CICSPlex, DataPower, DB2, DB2 Universal Database, i5/OS, IBM, the IBM logo, IMS/ESA, Power Systems, Lotus, OMEGAMON, OS/390, Parallel Sysplex, pureXML, Rational, Redbooks, Sametime, SMART SOA, System z, Tivoli, WebSphere, and z/OS. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at ibm.com/legal/copytrade.shtml. Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. Intel and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.