JVM Monitoring – Best Practices

Presentation transcript:

JVM Monitoring – Best Practices
Steve Klopfer, Technical Specialist, IBM (scklopf@us.ibm.com)

Definitions
- Monitoring: observing performance data in real time to find and correct resource, throughput, or response time problems.
- Trending: the analysis of data with the intention of identifying discernible patterns.
- Forecasting: the projection of those identified patterns onto business growth patterns to understand the impact on business processes.
- Capacity planning: the response to forecasts that ensures the integrity of business processes.
Patterns of interest: usage patterns, heap usage patterns, resource utilization patterns, response time patterns.

Capacity/Load model

Typical WAS/J2EE Application Components
[Diagram: component interactions in a typical deployment. An HTTP server with the WebSphere plugin routes customer transactions into the production JVM (AIX, AS400, HP-UX, Linux, Solaris, Unix, Windows, OS/390, z/OS), where the J2EE application (servlets, EJBs) uses J2EE services, memory management, file and network I/O, and resource pools (thread pool, EJB pools, JDBC pools). Back-end connectors (JDBC driver, CICS Transaction Gateway, MQSeries connector) reach back-end systems: database, mainframe.]

NOTES This is a typical WAS/J2EE application component overview. Add to this the topology of your environment, especially an ND environment, which adds a deployment manager, node agents, clusters, etc. What we see from this is that there are many configuration points, and the more configuration points there are, the more chances of problems. It is good to keep this in perspective when troubleshooting problems in the WAS environment.

What Kinds of Problems Does JVM Monitoring Help Solve?
- Request/transaction problems: slow or hung requests; intermittent performance problems; correlation to remote EJB containers, CICS, IMS, MQ
- Real-time diagnostics: in-flight request search and diagnosis, with Java stack traces and thread dumps in real time
- Memory leaks: monitor JVM heap size, memory usage, and garbage collection patterns; heap snapshots
- Resource monitoring: connection pools, JDBC, thread pools, etc.
- Non-intrusive diagnostic data collection for key application components: JMS, SCA, portlets (ITCAM for WS only), web services, etc.
- Problem situation automation: alerts and traps for hard-to-recreate problems, with problem context captured for later diagnosis
- Problem recreation: provides production data for hard-to-recreate problems via integration with Rational Performance Tester (RPT) and IBM Performance Optimization Toolkit (IPOT)
- How is it doing today, and how will it do tomorrow? Historical and trending reports

NOTES Here are some of the problems that ITCAM for WebSphere and J2EE helps solve (and many more). Drill down to a specific request, its method, and its stack trace; use L3 to collect method profiling data for every method entry and exit. A newer monitoring level, method profiling, is NOT method tracing: it is lighter weight than method tracing and works at level L2. When setting L2, this special level is offered and can be turned on, on demand, without restarting the application server.

Questions to Ask When Troubleshooting
- Is the problem re-creatable?
- Did it ever work? If it did, what changed (configuration, additional installation, product upgrade, etc.)?
- Does the environment matter (e.g., works in test/development but not in production)?
- What is the topology of the environment?
- What external systems are involved? Are there any connectivity (firewall) or security issues (authentication, expired passwords)?
- Are there workload considerations? Is the problem happening under heavy workloads? Network or bandwidth issues?
- Is there a pattern to the problem (e.g., every Monday morning at 10 AM)?

NOTES This is a generic troubleshooting checklist that applies to any kind of problem, not just problems in the WebSphere environment. Most of the time, errors come from configuration changes where human typing is involved, or from a process that was not followed.

What must a good monitoring product do?
"A clever person solves a problem. A wise person avoids it." -- Einstein
- It must monitor the environment 24x7. Real-time visualization tools are not adequate unless you plan on having highly paid analysts watching them 24x7.
- It must support intelligent alerting. Alerting tools must acquire and correlate metrics from multiple sources.
- It must exhibit a depth of monitoring across the breadth of technologies that spans, at minimum, end-user experience (both real and synthetic), application servers, and database servers.

Monitoring Levels (vertical levels, not horizontal levels)
Monitoring on demand: change the monitoring level as needed without restarting either the applications or the application servers. There is no need to pinpoint specific classes or methods in advance (i.e., no need to designate what needs to be monitored).
- Level 1 (Request Level, Production): 100% of system resource information; 100% of incoming requests/transactions
- Level 2 (Component Level, Problem Determination): view major application events (EJBs, servlets, JDBC, JNDI, etc.)
- Level 3 (Method Level, Tracing): adds method trace information for problem determination and performance analysis

Using the Tool Efficiently
Everyone assumes they need method-level data for every transaction in production. What would you do with that much data?
- Gain application/transaction understanding in Test/QA, and workload understanding in Production
- Use traps and alerts to find anomalies and collect detailed data
In Test/QA, use L2/L3 for transaction/application analysis:
- Top methods used (L3)
- Most CPU-intensive methods (L3)
- Top slowest methods (L3)
- Transaction component trace (L2)
- Transaction method trace (L3)
- SQL profile (L2)

Application Performance Analysis
Work with defined objectives: throughput and response time goals from SLAs. Identify and fix performance problems early: slow transactions, memory leaks, WebSphere performance tuning.
Best practices for performance tuning and analysis:
- Collect information about the applications and the environment
- Identify key transactions
- Conduct transaction profiling
- Conduct workload profiling
- Measure baseline metrics for the various performance parameters before tuning
- Leverage your tools in conjunction with load testing tools to analyze and tune application performance

Focus on Best Practices
Identify all key transactions in the workload mix:
- Most frequently used
- Most important to the application
- Set a workable limit, e.g., 10-20
Conduct transaction profiling to obtain a basic understanding of what these key transactions do:
- Code flow (component and method level)
- Component profile
- Method profile
- Event timings for each component and method

NOTES for Slide 14 To conduct a performance analysis on a WebSphere application, here are the commonly used procedures: Pick a meaningful workload and form a workload mix using a simulator (RPT or LoadRunner). Pick the most frequently used transactions, or the most important ones; the heaviest transactions (CPU-wise) may not be relevant if they are only used rarely. Limit the mix to 10-20 transactions that represent the majority of the production work. First conduct transaction profiling using a single-client run through the selected workload mix (e.g., scripts). Turn on ITCAM for WebSphere at L1 for total CPU cost, at L2 for component tracing and profiling (flow in terms of EJB, JDBC, JNDI, JMS, MQI, JSP, servlet), and at L3 for method tracing and profiling. Then conduct load testing using ITCAM for WebSphere at L1 for system resource analysis, bottleneck identification, and tuning; at L2 for component performance under load; and at L3 for method behavior under contention. Use PAR for all sorts of analyses.

Transaction Profiling
Transaction profiling refers to tracing the entire execution of a selected request (HTTP or EJB invocation).
- Normally the best practice is to prepare a single-user automated test script that fires off such transactions with a think time between invocations.
- At the L2 monitoring level, data is shown at the J2EE component level with contextual data: JSP, EJB, JMS, MQI, JDBC, JNDI.
- At L3, a full application class/method trace is collected by default.

NOTES for Slide 16 Transaction profiling is achieved by an L2 or L3 trace of a request. A request can be a URL or an inbound remote EJB invocation in ITCAM for WAS. At L2, a full trace of the transaction in terms of Java component events is fairly easy to capture without much work. At L3, trace records are generated so fast that at times records have to be dropped on the DC side (a bounded memory queue) before they can be pumped out to the publish server. Hence, running such an exercise with a single client and a reasonable think time between transactions paces out the traffic and helps capture the whole flow. Some tuning also helps: ensure methods-per-request is set to a high number via the administration pane (1 or 2 million is fine), and ensure DB2 on the management server has 200-300K of sort space. Wall clock times are shown running from one entry to the next, in milliseconds. CPU clock times are also shown from one entry to the next, and relate only to the thread executing the code (not other subsystem services such as DB2); this is particularly true of an L3 trace and not always accurate for an L2 trace (an L2 trace can show events under one thread that in fact happened under a different thread; starting a new thread is uncommon in J2EE code but does happen). CPU timing is in tenths of a millisecond and is obtained through the JVMPI API (or JVMTI soon). Some platforms have limited CPU clock resolution; for example, on Windows 2000 with JDK 1.3 the resolution appears to be 15.625 ms.
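The single-user scripted run described above can be sketched as a simple driver that fires each transaction in the mix with a fixed think time between invocations. This is a minimal illustration only: the transaction bodies, iteration count, and think-time values are placeholders, not part of any ITCAM or RPT API.

```java
import java.util.List;

public class SingleUserProfileDriver {
    /**
     * Invokes each transaction in the mix sequentially, pausing for a
     * "think time" between invocations so trace records are not dropped
     * (the pacing advice from the notes above). Returns the number of
     * transactions actually fired.
     */
    public static int run(List<Runnable> transactionMix,
                          int iterations, long thinkTimeMs) {
        int invocations = 0;
        outer:
        for (int i = 0; i < iterations; i++) {
            for (Runnable txn : transactionMix) {
                txn.run();                     // one monitored request
                invocations++;
                try {
                    Thread.sleep(thinkTimeMs); // pace the trace traffic
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break outer;               // stop the run if interrupted
                }
            }
        }
        return invocations;
    }
}
```

In a real run, each Runnable would issue one scripted HTTP or EJB request; the think time is what keeps the L3 trace queue on the DC side from overflowing.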

Workload Analysis
- Workload analysis refers to running the applications via a traffic simulator with a number of clients.
- The monitoring tool normally runs at L1 for this type of analysis, with a sampling rate under 10%.
- Normally the best practice is to prepare a multi-user automated test script that fires off transactions in the right mix to represent the "production" workload.

NOTES for Slide 18 Workload analysis is important for understanding the behavior of the applications under load and the system resources required to sustain the applications' demand at a certain throughput rate. It can be used to identify bottlenecks in the systems as well as in the applications. ITCAM for WAS normally runs at L1 or L2 for this type of analysis, with a sampling rate under 10%. To run L3 under load, one has to accept that some trace records will be lost, and a longer run will be required to collect a good set of sampled records in the database for analysis; extra tuning on the DC and MS will also be required. Each run should be at least 30-60 minutes long to observe the system at steady state.

Workload Analysis
- Each run should be at least 30-60 minutes long to observe the system at steady state.
- During steady state, analysis can be conducted on a large number of metrics: heap, CPU, paging, throughput, response time, WebSphere resource pools, GC activity, etc.
- At the end of the run, a graph of CPU% vs. throughput rate should be plotted. Any non-linearity in the behavior of the workload should be explained and bottlenecks eliminated, re-running until a relatively linear line is obtained.
- More reports can be drawn from Performance Analysis & Reporting (PAR).

Additional Performance Tuning Tips - 1
Here are a few other things to try to help improve performance. Note that these suggestions are given without detailed knowledge of the environment, architecture, or open issues.
- Increase web container max keep-alives.
- Increase the web container thread pool.
- Increase the database connection pool.
- Adjust maximum and minimum heap sizes.
- Disable explicit garbage collection.
- Enable concurrent I/O at the OS level.
- Pre-compile JSPs.
- Increase the priority of the app server process at the OS level.
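A quick way to confirm that heap-size adjustments actually took effect is to ask the running JVM itself. This sketch uses the standard java.lang.Runtime API; the flag values mentioned in the comments are illustrative assumptions, not recommendations for any particular environment.

```java
public class HeapSettingsCheck {
    // If the server was started with, e.g., -Xms512m -Xmx1024m
    // (illustrative values only), the numbers reported here should
    // reflect those settings.
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        System.out.println("max heap (MB):  " + rt.maxMemory() / mb);
        System.out.println("committed (MB): " + rt.totalMemory() / mb);
        System.out.println("free (MB):      " + rt.freeMemory() / mb);
        System.out.println("processors:     " + rt.availableProcessors());
    }
}
```

Running a snippet like this inside the app server (or checking the equivalent values in the monitoring console) avoids tuning against a heap size that was never actually applied.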

Additional Performance Tuning Tips - 2
- If there are many short-lived objects, tuning the NewSize and MaxNewSize JVM parameters can help.
- Changing ulimit on the operating system (AIX, Solaris) may help improve performance.
- Enable dynamic caching, if possible.
- Creating new indexes or re-organizing existing indexes will help improve the performance of database-intensive transactions.
- Adjusting the prepared statement cache size may also help.
- Adjust OS parameters: tcp_time_wait_interval and tcp_fin_wait_2_flush_interval.

Example: Workload Analysis
Let's move on to perform some system-wide analysis of the workload.

Verify System, Java, and App Server Runtime Environment
- Check environmental consistency
- Ensure the platform can support the application

NOTES for Slide 24 It is always a good idea to know (or validate) the environment (hardware/software) in which the application is running. Use Software Consistency Check -> Environment Check and you will get more details about the server where WebSphere is running. In this case, the box is a 4-way 3.4 GHz Intel Linux box with up to 1.2 GB of memory. Very powerful. This also sets the psychology of the analysis: don't start by blaming the box for being too small, because it is not!

Check Server Statistics
- Compare key performance metrics side by side
- Quick overview of application impact on monitored servers
- Shows paging and load balancing in clustered deployments
- Ensures overall throughput matches expected results from the load generator

NOTES for Slide 26 Next we want to observe the server behavior over a period (e.g., a 5-10 minute interval): the number of transactions going through, JVM CPU, platform CPU, heap usage, etc. Follow the link to the Server Statistics Overview (SSO) to get the details. Make the following observations during these 5 minutes: If an excessive paging rate (>10/s) or 90-100% total CPU is recorded during the observation, the workload is exceeding the capacity (memory or CPU power) of the box. At that point you either have to 1) reduce the throughput rate or the number of clients, 2) increase the amount of real memory or upgrade CPU power, or 3) look into why the application code has such a large footprint or high CPU usage, which entails a more in-depth look at the code. If total CPU% is much bigger than JVM CPU%, check whether other jobs/processes that you did not intend to include are running on the same box.

NOTES for Slide 26 3. The delta volume of transactions is the number of transactions completed over a period, normally 15 seconds (unless you have changed the default refresh period). Calculate the throughput rate per second. Also calculate the maximum theoretical throughput rate as: number of simulated sessions x 1 / (think time in seconds between transactions). A big discrepancy means transactions are either taking too long to complete within WebSphere, or they are not getting into the server due to a thread pool size constraint, resulting in queuing. Verify the thread pool usage on the Systems Resource Overview page and the average response time of the transactions on the Recent Requests page. If you are not getting the target throughput rate, these two questions must be answered. If the internal response times of these transactions are short, the requests are not getting into the web container fast enough; if the thread pool is indeed full, increase the thread pool size to allow more throughput (this could be the web container pool or the ORB pool for inbound remote EJBs). If the thread pool is not full but response time is excessively long, further work is warranted to decompose the transaction path length. If the thread pool is not full and response time looks good, the only remaining explanations are that the network between the simulation box and the application server is slow, or that the simulator box is not generating traffic fast enough because it is constrained by its own hardware; in that case, add memory or CPU power to the simulator, or reduce the number of simulated clients. 4. If multiple WebSphere instances/regions are running in the server or server group, also check whether the total number of transactions is relatively evenly distributed across them. If not, the load balancing scheme is not working as desired and you may want to adjust it.
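The maximum theoretical throughput formula above can be sketched as a small helper. The method name and the sample values are illustrative only.

```java
public class ThroughputEstimator {
    /**
     * Maximum theoretical throughput in transactions per second: each
     * simulated session completes at most one transaction per think-time
     * interval, so the upper bound is sessions / thinkTime.
     */
    public static double maxTheoreticalTps(int simulatedSessions,
                                           double thinkTimeSeconds) {
        return simulatedSessions / thinkTimeSeconds;
    }

    public static void main(String[] args) {
        // e.g., 200 virtual users with a 2-second think time can drive
        // at most 100 TPS; a measured rate far below that points to
        // queuing in the server or a constrained load generator.
        System.out.println(maxTheoreticalTps(200, 2.0)); // 100.0
    }
}
```

Comparing the measured delta-volume rate against this bound is what exposes the "too slow inside WebSphere vs. not getting in" distinction described in the notes.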

Validate Throughput vs. Response Time
- Quantify application scalability
- Graphical report showing the number of requests over time
- Correlated plot of response time during the stress test relative to request rate

NOTES for Slide 29 Generally, for a good, unconstrained system (well tuned and well resourced), the internal response time of a transaction should not vary much even as the throughput rate goes up. This of course depends on whether the resource supply can cater for the increase in throughput. In this graph it appears to: response time stays around 34 ms even at a throughput rate of 110 TPS. Also verify the responsiveness against that reported by the simulator.

Calculate Throughput vs. JVM CPU%
- Verify the target transactions-per-second rate is achievable
- Request rate during stress run (same as prior slide)
- Correlated plot reveals low JVM CPU consumption even as throughput increases

NOTES for Slide 31 Generally, as the throughput rate goes up, JVM CPU% should go up accordingly: you cannot get more work done without burning more fuel. In Java you will see some CPU% fluctuation as classes are compiled and execution becomes more efficient as throughput rises, so observe for a longer period to let the system stabilize. Also verify the throughput rate against that reported by the simulator. As the throughput rate stabilizes, note the average CPU%. This gives you a level of confidence about whether you have enough CPU power to reach your target throughput rate, assuming you can tune around other bottlenecks (ultimately CPU is your biggest enemy). In this graph the SMP is running at 35% at 110 TPS. If the target is 160 TPS, there is ample evidence that this target rate is highly achievable, assuming the software bottlenecks can be tuned or avoided.

Throughput vs. Garbage Collection (GC)
- Tune the JVM to minimize GC frequency
- Request rate during stress run
- ! GC frequency not in steady state as throughput rises
- Increased heap size impacts the GC rate, although <= 6 GCs per minute appears affordable as response time remains < 34 ms

NOTES for Slide 33 Frequent GC can cause excessive CPU% because it is a relatively expensive operation. Coupled with paging, a long GC operation can also delay transactions. Use this analysis to see whether, during steady state at a certain throughput rate, the JVM is experiencing a high GC frequency. In this graph the system is performing GC 6 times a minute, which is on the high side, but previous analysis showed the JVM CPU% is not excessive and the overall response time is 34 ms, so this GC rate still seems affordable. The GC frequency is clearly related to the throughput rate in this graph: the more throughput, the more heap used, hence the more frequently the max heap size threshold is hit, resulting in more GC operations. Raising the max heap size of the JVM can reduce the GC rate and perhaps bring JVM CPU% down a little.

Throughput vs. Total GC Time
- Avoid paging (it has a large effect on end-user response time)
- Request rate ramps and tops out
- Total time for GC to complete per cycle, correlated with request rate
- ! Watch for excessive and persistently high total GC time

NOTES for Slide 35 Prolonged GC time delays transaction response times and causes excessive CPU utilization. Normally a GC cycle takes a fraction of a second to finish; if it is taking seconds, find out why. A big heap size (e.g., 2 GB) is one reason, but it only becomes detrimental when coupled with paging (not enough real memory to back the virtual heap). Sometimes there are too many JVMs with huge heaps (e.g., WAS/z) such that the available real memory cannot sustain the demand, and paging during GC becomes inevitable; in that case, lowering the number of servant regions to create a better working set is a better solution. Fragmentation is another reason. Using min = max heap is no longer the best practice in the new JDK. In the graph, the total GC time is about 1 second per minute, so it isn't bad at all. Previous analysis showed about 6 GCs per minute, so each GC averages well under 200 ms, which does not indicate a problem.
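The back-of-the-envelope GC arithmetic above can be written as a tiny helper (a hypothetical method, with illustrative inputs matching the graph discussed in the notes):

```java
public class GcPauseEstimator {
    /** Average GC pause in ms, given total GC time and GC count per minute. */
    public static double avgGcPauseMs(double totalGcMsPerMinute,
                                      double gcCountPerMinute) {
        return totalGcMsPerMinute / gcCountPerMinute;
    }

    public static void main(String[] args) {
        // ~1 second of total GC time per minute spread over ~6 cycles
        // works out to well under 200 ms per cycle.
        System.out.println(avgGcPauseMs(1000, 6));
    }
}
```

Average pause times measured in fractions of a second are expected; averages in the seconds range warrant the paging and heap-sizing investigation described above.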

Throughput vs. Heap Size After GC
- Good indicator of potential memory leaks
- Request rate during stress run (same as prior slide)
- No growth in heap under increased load proves no detectable leaks
- Shows a well-tuned heap size, with little if any growth during high throughput

NOTES for Slide 37 If you can maintain a steady throughput rate into your system, this analysis gives you a definitive answer as to whether your application leaks memory. Given a steady throughput rate, the amount of heap in use after GC should not change much over a period of time (e.g., minutes), provided GC works as expected and the application does not leak objects. In this graph the application ran from 0 to 110 TPS, and throughout the half-hour period the amount of heap in use after each GC remained fairly constant: no memory-leak symptom!
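As a rough illustration of the "heap after GC should stay flat" rule, one could compare early and late samples of heap-in-use-after-GC under steady load. This is a naive heuristic for illustration only; real leak analysis should rely on the heap snapshots and GC-pattern monitoring described earlier.

```java
public class LeakHeuristic {
    /**
     * Compares the average heap-in-use after GC over the first and last
     * quarters of the run; sustained growth beyond the tolerance ratio
     * suggests a possible leak.
     */
    public static boolean looksLikeLeak(double[] heapAfterGcMb, double tolerance) {
        int q = Math.max(1, heapAfterGcMb.length / 4);
        double early = 0, late = 0;
        for (int i = 0; i < q; i++) {
            early += heapAfterGcMb[i];
            late += heapAfterGcMb[heapAfterGcMb.length - 1 - i];
        }
        return (late / q) > (early / q) * (1 + tolerance);
    }
}
```

Applied to the run in the slide, a flat post-GC heap series over half an hour at 110 TPS would return false, matching the "no leak symptom" conclusion.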

WebSphere Resource Utilization Analysis
- Verify the application does not over-tax app server resources
- Overall we see low J2EE resource consumption
- ! Saturated thread pool: a good candidate for tuning

NOTES for Slide 39 Use the Systems Resources page to visualize utilization of the software resources provided within WebSphere. In general the WebSphere runtime is not too complicated in its design and usage, so there are not too many obvious tuning opportunities. Web container thread pool utilization, ORB thread pool utilization (EJB), and JDBC connection pool utilization (and thread wait time) are probably the three most frequently used analyses. One word of caution: a pool-full situation does not mean you should increase the pool size to avoid it. Pool-full can be a symptom of slow response time, since longer residency occupies pool resources for longer. Analyze it in conjunction with the response time trends of the resource concerned: for the web container thread pool, the average servlet response time; for the JDBC pool, the average JDBC response time; for the ORB pool, that of the remote EJB. When the heap fills, it triggers GC; after GC, free heap is re-used, so as long as heap pool utilization is not constantly on the high side, it is normal to see it occasionally go up to 90%+ and back down. Heap pool utilization should constantly go up and down like a yo-yo.
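The caution above follows from Little's law: average pool occupancy is roughly the request rate times the average time each request holds a pooled resource, so slower responses alone can saturate a pool even with no growth in load. A hypothetical helper with illustrative numbers:

```java
public class PoolOccupancy {
    /**
     * Little's law: average number of pooled resources in use
     * = arrival rate (req/s) x average holding time (ms) / 1000.
     */
    public static double avgInUse(double requestsPerSecond, double holdTimeMs) {
        return requestsPerSecond * holdTimeMs / 1000.0;
    }

    public static void main(String[] args) {
        // 100 req/s holding a JDBC connection for 50 ms -> 5 in use.
        System.out.println(avgInUse(100, 50));  // 5.0
        // The same 100 req/s with a 500 ms holding time -> 50 in use:
        // a "full" 50-connection pool caused by slow SQL, not by load.
        System.out.println(avgInUse(100, 500)); // 50.0
    }
}
```

This is why the notes say to check the resource's response time trend before enlarging the pool: if holding time is the variable that grew, a bigger pool just hides the real problem.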

Check Average CPU Time per Transaction
- Based on threads running application classes in the workload mix
- ! Spikes showing high consumption at random intervals
- Otherwise normal consumption rates

NOTES for Slide 41 In ITCAM for WebSphere, average CPU time per transaction is computed on the thread that runs the application classes. The PAR report can compile the average CPU time per transaction across a time interval. Given that one can drive different throughput rates in different periods of a run, PAR can report average CPU time per transaction at lower and higher throughput rates. One should expect the variation to be small, perhaps within 10-20 ms; anything higher indicates some part of the system or application has code whose CPU performance is load dependent and deserves further investigation. Looking at the graph: assume each transaction takes 10 ms of CPU time. Then each CPU can sustain at most 1/0.01 = 100 TPS, assuming no bottleneck, linear scaling, and a simulator that can sustain the drive. Under those conditions, for a 4-way SMP one would estimate the maximum sustainable throughput at between 300 and 400 TPS, with 320/330 as a practical estimate.
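The capacity arithmetic in these notes can be sketched as follows. The method names are hypothetical, and the 80% practical-efficiency factor is an assumption chosen to match the 320/330 estimate in the notes, not an ITCAM formula.

```java
public class CpuCapacityEstimator {
    /** Theoretical upper bound: each CPU delivers 1000/cpuMsPerTxn TPS. */
    public static double theoreticalMaxTps(double cpuMsPerTxn, int cpus) {
        return cpus * (1000.0 / cpuMsPerTxn);
    }

    /** Practical estimate, discounted by an assumed efficiency factor. */
    public static double practicalMaxTps(double cpuMsPerTxn, int cpus,
                                         double efficiency) {
        return theoreticalMaxTps(cpuMsPerTxn, cpus) * efficiency;
    }

    public static void main(String[] args) {
        // 10 ms of CPU per transaction on a 4-way SMP:
        System.out.println(theoreticalMaxTps(10, 4));    // 400.0
        System.out.println(practicalMaxTps(10, 4, 0.8)); // 320.0
    }
}
```

The theoretical figure assumes perfectly linear scaling and no software bottlenecks; the discounted figure is the kind of practical estimate the notes arrive at.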

Check Average CPU Time per Transaction
Based on threads running application classes in the workload mix
Transaction with very high CPU in spike interval


Example: Transaction Analysis Methodology Let's move on to perform some system-wide analysis on the workload.

Analyze Transaction Instances of Interest
Show "Level 2" J2EE component-level events
High-precision timing measurements for each event call
Sequential view of event execution / flow
Highlighted JCA calls exhibit high delta CPU timing difference !

NOTES for the previous slide The L2 trace provides an in-context flow view of a request. You should be able to identify how long each of the major services used by a typical web request takes, which makes this the #1 killer function for analyzing the responsiveness of a transaction and confirming whether or not the delay comes from the application. The L2 trace is also far more powerful than observing summary statistics through PMI or other resource monitors, because this data is at the instance level and helps pinpoint instance-based performance problems that cannot easily be caught by average-based summary data. The entry/exit traces are produced by the XML-based BCM code applied to the WebSphere classes with agreement from WebSphere development; this is an out-of-the-box capability. If time is spent between an entry/exit pair, it is the service itself, not the application, that is generating the delay or cost. However, a service's performance can also depend on how it is invoked: a long JDBC time does not necessarily mean the problem is in the database server; it can be the way the query is constructed, hence the JDBC string. This analysis is not intended to save you the trip back to the code to explain the observations; it is not always possible to do that (collecting the entire URL and the full SQL string). However, IT CAM for WebSphere wants to give you enough hints to take back to development to describe the problem; it does not replace developers. If time is spent between an entry/entry, exit/entry, or exit/exit pair, it is the application code (method) or the system that is taking the time.

Further Analyze Transactions
Show discrete "Level 3" method-level and nested method events
Each row shows method flow and depth
Good candidate for tuning due to high delta CPU consumption !

NOTES The L3 trace provides an in-context flow view of a request. You should be able to identify how long each of the application methods used by a typical web request takes, which makes this the #2 killer function for analyzing the responsiveness and CPU usage of a transaction and confirming whether or not the delay or CPU usage comes from the application. The entry/exit traces are produced by the XML-based BCM code applied to the application code under the installed application directory; this is an out-of-the-box capability. The injected bytecode works like a wrapper around a method: it simply starts and stops a timer. If time is spent between a consecutive entry/exit pair, it is the application method itself that is causing the delay or CPU usage. If time is spent between a consecutive entry/entry pair, it is code executed after the first method entry and before the second method entry that is causing the delay or CPU usage. This can be application code in the first method, code filtered out by the IT CAM for WebSphere filters, or non-application methods (IBM, Sun, etc.) invoked as a result. You will require IBM service to help customize the XML files in order to track non-application classes/methods in the L3 trace.
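Conceptually, the injected wrapper behaves like this minimal sketch (illustrative only; the real BCM instrumentation works at the bytecode level, and these names are hypothetical):

```java
// Illustrative sketch of the injected timer wrapper: record an "entry" timestamp,
// run the wrapped method, record an "exit" timestamp.
public class MethodTimer {

    // Elapsed wall-clock time of the wrapped body, including all inner calls
    // (a total timing, not a "pure" self-time).
    static long timeNanos(Runnable body) {
        long start = System.nanoTime();      // "entry" probe
        body.run();                          // the wrapped application method
        return System.nanoTime() - start;    // "exit" probe
    }

    public static void main(String[] args) {
        long elapsed = timeNanos(() -> {
            // stand-in for application work
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) sb.append(i);
        });
        System.out.println("elapsed ns: " + elapsed);
    }
}
```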

NOTES If time is spent between a consecutive exit/entry pair, it is code executed after the exit of the first method (control returns to the outer method) and before the same outer method invokes the second method (entry) that is causing the delay or CPU usage. This can be application code in the outer method, code filtered out by the IT CAM for WebSphere filters, or non-application methods (IBM, Sun, etc.) invoked in between. If time is spent between a consecutive exit/exit pair, it is code executed after the exit of the first method (control returns to the outer method first) and before the same outer method exits that is causing the delay or CPU usage. Again, this can be application code in the outer method, filtered-out code, or non-application methods. You will require IBM service to help customize the XML files in order to track non-application classes/methods in the L3 trace. Since the amount of trace data typically runs into tens of thousands of events, use the highlighting trick to spend time only on the 'worthwhile' hotspots.
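The four pairing rules in these notes can be sketched as a small trace post-processing helper (hypothetical event format and method names; IT CAM for WebSphere's actual trace records differ):

```java
import java.util.ArrayList;
import java.util.List;

// Classify the time between consecutive L3 trace events using the pairing rules:
// entry/exit = the method itself; entry/entry = first method before the inner call;
// exit/entry = caller code between calls; exit/exit = caller code before its own exit.
public class TraceGaps {

    record Event(String type, String method, long tMillis) {}

    static List<String> classify(List<Event> events) {
        List<String> out = new ArrayList<>();
        for (int i = 1; i < events.size(); i++) {
            Event a = events.get(i - 1), b = events.get(i);
            long gap = b.tMillis() - a.tMillis();
            String where;
            if (a.type().equals("entry") && b.type().equals("exit"))
                where = "inside " + a.method();            // the method itself
            else if (a.type().equals("entry") && b.type().equals("entry"))
                where = "code before the inner call";      // first method before invoking the second
            else if (a.type().equals("exit") && b.type().equals("entry"))
                where = "caller code between calls";       // outer method between two calls
            else
                where = "caller code before its own exit"; // outer method after the inner call returned
            out.add(where + ": " + gap + " ms");
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> trace = List.of(
            new Event("entry", "doGet",    0),
            new Event("entry", "findUser", 5),
            new Event("exit",  "findUser", 45),
            new Event("exit",  "doGet",    50));
        classify(trace).forEach(System.out::println);
    }
}
```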

Analyze SQL Profile
Check the response time for various queries.
Use the data in conjunction with the Top Used Queries report.
Tune queries.

NOTES for Slide 51 Summary reports on SQL (JDBC calls) are available for delay analysis, broken down by table name, SQL type, or the source of the JDBC call (class/method). This type of analysis can also be obtained in a loaded environment, since L2-type data is usually easy to collect even under medium load, and some SQL only performs badly under contention. Again, it is not possible to show the entire SQL string, so some reference back to the source code may be required in some cases. Both class and method are listed as the source of the JDBC call.
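The grouping described here can be sketched as follows, assuming per-call records of table, SQL type, and elapsed time (hypothetical names and data; not the actual report implementation):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Summarize JDBC call timings by table; the same grouping works for SQL type
// or for the calling class/method.
public class SqlProfile {

    record JdbcCall(String table, String sqlType, double millis) {}

    static Map<String, Double> avgByTable(List<JdbcCall> calls) {
        Map<String, double[]> acc = new HashMap<>(); // table -> {sum, count}
        for (JdbcCall c : calls) {
            double[] a = acc.computeIfAbsent(c.table(), k -> new double[2]);
            a[0] += c.millis();
            a[1]++;
        }
        Map<String, Double> avg = new HashMap<>();
        acc.forEach((table, a) -> avg.put(table, a[0] / a[1]));
        return avg;
    }

    public static void main(String[] args) {
        List<JdbcCall> calls = List.of(
            new JdbcCall("ORDERS",   "SELECT", 12.0),
            new JdbcCall("ORDERS",   "SELECT", 18.0),
            new JdbcCall("CUSTOMER", "SELECT",  3.0));
        System.out.println(avgByTable(calls)); // average response time per table
    }
}
```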

Check for Top Methods Used
Identify hot methods by count
Total Invocation Count
Names of hot methods !

NOTES A series of single-threaded tests were conducted with a single-client simulation script while IT CAM for WebSphere was run at L2 and L3. Traces at different levels were turned on and data collected. One common IT CAM for WebSphere analysis at the code level is 'hot methods'. We believe the biggest bang for the buck comes from improving the CPU and response times of the MOST frequently used methods, not necessarily the most expensive methods. This type of analysis is best accomplished with the development staff.

Check for Most CPU-Intensive Methods
Correlate hot methods by CPU cost with highest-count methods
CPU consumption for each method
Names of hot methods !

Notes After you have picked the top 5 or 10 most frequently used methods, use this league table to find out how much CPU they have cost you. If the objective is to reduce CPU usage, this is the right analysis to complement the Top Methods Used analysis. Be reminded that the timing of a method includes all the inner methods it invokes (or repeatedly invokes): IT CAM for WebSphere gives you a total method timing, not a pure one. Also, these timings were recorded on the specific type of CPU hardware used for the test. If you mixed hardware platforms (in a group analysis) when you compiled this report, the values will not be useful; always choose a set of boxes of the same hardware type when compiling CPU timings from a group of servers. By the same token, be aware of the same issue if you project these CPU timings onto a different environment.
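Cross-referencing the two league tables (most frequently used vs. most CPU-intensive) can be sketched like this (hypothetical method names and numbers):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Rank methods by invocation count and by total CPU; methods on BOTH lists
// are the best tuning candidates.
public class HotMethods {

    record MethodStat(String name, long count, double totalCpuMillis) {}

    static List<String> topByCount(List<MethodStat> stats, int n) {
        return stats.stream()
            .sorted(Comparator.comparingLong(MethodStat::count).reversed())
            .limit(n).map(MethodStat::name).toList();
    }

    static List<String> topByCpu(List<MethodStat> stats, int n) {
        return stats.stream()
            .sorted(Comparator.comparingDouble(MethodStat::totalCpuMillis).reversed())
            .limit(n).map(MethodStat::name).toList();
    }

    public static void main(String[] args) {
        List<MethodStat> stats = List.of(
            new MethodStat("Cart.addItem",  50_000,   900.0),
            new MethodStat("Pricing.quote", 48_000, 4_200.0),
            new MethodStat("Report.render",     40, 3_800.0));
        List<String> candidates = new ArrayList<>(topByCount(stats, 2));
        candidates.retainAll(topByCpu(stats, 2)); // intersection of the two lists
        System.out.println(candidates);           // [Pricing.quote]
    }
}
```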

Check for Slowest Methods
Correlate with hot methods to evaluate total contribution to response time
High average response time per method
Names of slow methods !

Notes Similar to the previous analysis, except this one is based on wall-clock time. After you have picked the top 5 or 10 most frequently used methods, use this league table to find out how much they have slowed your transactions. If the objective is to improve responsiveness, this is the right analysis to complement the Top Methods Used analysis. Wall-clock timings are the same everywhere, so the hardware-dependency issues are irrelevant here in 'group' analysis, as long as none of your boxes were traveling at near light speed when your test was done. 8-)

Example: Memory Leak Analysis Let's move on to perform some system-wide analysis on the workload.

Memory Analysis Reporting
Quick check to detect the presence of a leak
Upward slope indicates the possibility of a "slow" memory leak
Constant request rate correlated with JVM heap size

Memory Leak: Average Heap Size after GC vs. Number of Requests
Verify that a leak exists with the Avg. Heap Size After GC graph.
Check whether it is due to an increasing number of requests.
To access this feature: Select PROBLEM DETERMINATION -> Memory Diagnosis -> Memory Analysis -> Change Metrics.

Memory Leak: Average Heap Size after GC vs. Live Sessions
Verify that a leak exists with the Avg. Heap Size After GC graph.
Check whether it is due to an increasing number of users.
To access this feature: Select PROBLEM DETERMINATION -> Memory Diagnosis -> Memory Analysis -> Select Metrics.

Notes The additional analysis of Avg Heap Size after GC vs. Live Sessions complements the throughput-based view. It will reveal heap leaking due to the number of live sessions when the Avg Heap Size after GC vs. Throughput analysis does not indicate leakage: logged-in users might not be driving much throughput but can still contribute to heap loss after they have logged off. This analysis exists in the pre-3.2 release.
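The "upward slope" check from the slides above can be sketched as a least-squares fit over exported heap-after-GC samples (the data and units here are illustrative):

```java
// Least-squares slope of average-heap-used-after-GC samples over time.
// A persistently positive slope under a constant request rate suggests a slow leak.
public class LeakCheck {

    static double slope(double[] t, double[] heapMb) {
        int n = t.length;
        double st = 0, sh = 0, stt = 0, sth = 0;
        for (int i = 0; i < n; i++) {
            st  += t[i];
            sh  += heapMb[i];
            stt += t[i] * t[i];
            sth += t[i] * heapMb[i];
        }
        return (n * sth - st * sh) / (n * stt - st * st);
    }

    public static void main(String[] args) {
        double[] minutes     = {  0,  10,  20,  30,  40};
        double[] heapAfterGc = {210, 214, 219, 223, 228}; // MB, creeping upward
        System.out.printf("slope: %.2f MB/min%n", slope(minutes, heapAfterGc));
    }
}
```

A healthy heap (the "yo-yo" pattern in the earlier notes) yields a slope near zero over a long enough window.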

Find Leaking Candidates
Production-friendly heap-based analysis
Comparison of heap snapshots shows suspected leak candidates
Application class that appears to have some growth
Class name filters

Zero In on Leaking Code
View suspected classes and allocating methods
Each 'allocation pattern' uniquely identifies a set of heap objects of the same class, allocated by the same request type, and from the same point in the application code !
Indicates the specific point in the application code where this object set was allocated

Zero In on Leaking Code (scroll from previous page)
View suspected classes and allocating methods
Each 'allocation pattern' uniquely identifies a set of heap objects of the same class, allocated by the same request type, and from the same point in the application code
Large number of surviving objects since last GC
Additional code and GC performance details help developers isolate the leak and optimize the JVM
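Comparing two heap snapshots to surface growth candidates can be sketched as follows (hypothetical class names and instance counts; the product's snapshot comparison is far richer):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Report classes whose live-instance count grew between two heap snapshots,
// sorted by growth; classes that grow snapshot after snapshot are leak candidates.
public class SnapshotDiff {

    static List<Map.Entry<String, Long>> growers(Map<String, Long> before,
                                                 Map<String, Long> after) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        after.forEach((cls, n) -> {
            long growth = n - before.getOrDefault(cls, 0L);
            if (growth > 0) out.add(Map.entry(cls, growth));
        });
        out.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> snap1 = Map.of("com.acme.CartItem", 1_000L,
                                         "java.lang.String", 50_000L);
        Map<String, Long> snap2 = Map.of("com.acme.CartItem", 9_000L,
                                         "java.lang.String", 50_500L);
        growers(snap1, snap2)
            .forEach(e -> System.out.println(e.getKey() + " +" + e.getValue()));
    }
}
```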

View References to Live Objects
Confirm the allocating class
Also shows other objects on the heap which contain references to the set of objects being analyzed
Helps pinpoint why the objects in question are not getting garbage collected
Allocating method and line number in the code