IBM Tivoli JVM Monitoring – Best Practices
Steve Klopfer, Technical Specialist, IBM
IBM Tivoli
Definitions
- Monitoring: observing performance data in real time to find and correct resource, throughput, or response time problems.
- Trending: analyzing data to identify discernible patterns.
- Forecasting: projecting those identified patterns onto expected business growth to understand the impact on business processes.
- Capacity planning: responding to forecasts in a way that ensures the integrity of business processes.
IBM Software Group | Tivoli software
Capacity/Load model
Typical WAS/J2EE Application Components
[Diagram: customer transactions enter through the HTTP Server plugin into the application server, which runs in a production JVM (AIX, AS400, HP-UX, Linux, Solaris, Unix, Windows, OS/390, z/OS) on CPU (AIX, Solaris, Windows). Labeled elements: J2EE application (servlets, EJBs); J2EE services (thread pool, EJB pools, JDBC pools); back-end connectors (JDBC driver, CICS Transaction Gateway, MQSeries connector); memory management; file and network I/O; back-end systems (database, mainframe); component interactions.]
IBM Software Group | Tivoli software Questions to Ask when troubleshooting Is the problem re-creatable? Did it ever work? If it did, what changed – configuration, additional installation, product upgrade etc. Does environment matter e.g. works in test/development but not in production What is the topology of the environment What external systems are involved? Any connectivity (firewall), security – authentication, expired passwords issues? Is there any workload considerations Is the problem happening under heavy workloads? Network or bandwidth issues? Is there a pattern to the problem e.g. every Monday morning at 10 AM?
What Must a Good Monitoring Product Do?
"A clever person solves a problem. A wise person avoids it." -- Einstein
- It must monitor the environment 24 x 7. Real-time visualization tools are not adequate unless you plan on having highly paid analysts watch them 24 x 7.
- It must support intelligent alerting. Alerting tools must acquire and correlate metrics from multiple sources.
- It must provide depth of monitoring across a breadth of technologies that spans, at minimum, end-user experience (both real and synthetic), application servers, and database servers.
Monitoring Levels
- Vertical levels, not horizontal levels
- Monitoring on demand
  - Change the monitoring level as needed without restarting either the applications or the application servers
  - No need to pinpoint specific classes or methods in advance (i.e., no need to designate what needs to be monitored)
- “Level 1” – Request Level – Production
  - 100% of system resource information
  - 100% of incoming requests/transactions
- “Level 2” – Component Level – Problem Determination
  - View major application events (EJBs, servlets, JDBC, JNDI, etc.)
- “Level 3” – Method Level – Tracing
  - Adds method trace information for problem determination and performance analysis
Using the Tool Efficiently
- Everyone assumes they need method-level data for every transaction in production. What would you do with that much data?
- Gain application/transaction understanding in Test/QA, and workload understanding in production.
- Use traps and alerts to find anomalies and collect detailed data.
- In Test/QA, use L2/L3 for transaction/application analysis:
  - Top Methods Used (L3)
  - Most CPU-Intensive Methods (L3)
  - Top Slowest Methods (L3)
  - Transaction Component Trace (L2)
  - Transaction Method Trace (L3)
  - SQL Profile (L2)
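The traps-and-alerts approach can be illustrated with a rule that keeps production at L1 and asks for detailed data only after an anomaly. This is a minimal Python sketch, not the product's trap API: the `Trap` class, its threshold, and the idea of capturing the next N requests at L3 are assumptions made for illustration.

```python
class Trap:
    """Hypothetical response-time trap (illustration only, not a Tivoli API).

    Requests are recorded at L1 by default. When a request exceeds
    threshold_ms, the trap fires and the next capture_count requests are
    recorded at detail_level (L2 or L3) so the anomaly can be analyzed.
    """

    def __init__(self, threshold_ms, capture_count, detail_level=3):
        self.threshold_ms = threshold_ms
        self.capture_count = capture_count
        self.detail_level = detail_level
        self.remaining = 0   # requests still to be captured in detail
        self.fired = []      # ids of requests that tripped the trap

    def current_level(self):
        return self.detail_level if self.remaining > 0 else 1

    def observe(self, request_id, response_ms):
        """Record one completed request; return the level it was captured at."""
        level = self.current_level()
        if self.remaining > 0:
            self.remaining -= 1
        if response_ms > self.threshold_ms:
            self.fired.append(request_id)
            self.remaining = self.capture_count
        return level


trap = Trap(threshold_ms=2000, capture_count=2)
levels = [trap.observe(f"r{i}", ms) for i, ms in enumerate([100, 2500, 120, 130, 140])]
print(levels, trap.fired)  # [1, 1, 3, 3, 1] ['r1']
```

Keeping the default at L1 and letting the trap decide when to pay for L3 data reflects the point above: method-level data for every production transaction is rarely needed.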
Application Performance Analysis
- Work with defined objectives: throughput and response time goals from SLAs
- Identify and fix any performance problems early: slow transactions, memory leaks, WebSphere performance tuning
- Best practices for performance tuning and analysis:
  - Collect information about the applications and the environment.
  - Identify key transactions.
  - Conduct transaction profiling.
  - Conduct workload profiling.
  - Measure baseline metrics for the key performance parameters before tuning.
  - Use your monitoring tools together with load testing tools to analyze and tune application performance.
Focus on Best Practices
- Identify all key transactions in the workload mix:
  - Most frequently used
  - Most important to the application
  - Set a workable limit, e.g. 10-20
- Conduct transaction profiling to get a basic understanding of what these key transactions do:
  - Code flow (component and method level)
  - Component profile
  - Method profile
  - Event timings for each component and method
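Choosing the key transactions can be sketched as: take the ones the business marks as important, then fill the remaining slots with the most frequently used, up to a workable limit. This Python sketch assumes a plain list of transaction names per request (for example, exported from a workload report); the `critical` input and the default limit of 15 are hypothetical.

```python
from collections import Counter


def select_key_transactions(request_log, critical, limit=15):
    """Choose key transactions for profiling.

    request_log: transaction names, one entry per observed request.
    critical:    names the business flags as most important (hypothetical input).
    limit:       workable cap; the slide suggests roughly 10-20, 15 is an assumption.
    """
    selected = list(dict.fromkeys(critical))[:limit]  # de-duplicate, keep order
    for name, _count in Counter(request_log).most_common():
        if len(selected) >= limit:
            break
        if name not in selected:
            selected.append(name)  # fill remaining slots by frequency
    return selected


log = ["search"] * 50 + ["login"] * 30 + ["checkout"] * 5 + ["help"] * 2
print(select_key_transactions(log, critical=["checkout"], limit=3))
# ['checkout', 'search', 'login']
```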
Transaction Profiling
- Transaction profiling means tracing the entire execution of a selected request (an HTTP or EJB invocation).
- The usual best practice is to prepare a single-user automated test script that fires off these transactions with a think time between invocations.
- At the L2 monitoring level, data is shown at the J2EE component level with contextual data: JSP, EJB, JMS, MQI, JDBC, JNDI.
- At L3, a full application class/method trace is collected by default.
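A single-user profiling script mainly needs to replay each transaction once, in order, with a think time in between. The Python sketch below leaves the actual request to a caller-supplied `send` hook (hypothetical; in practice an HTTP client or test tool call) and makes `sleep` injectable so it can be exercised without a server. The 5-second default think time is an assumption.

```python
import time
from typing import Callable, Iterable, List, Tuple


def run_single_user(transactions: Iterable[str],
                    send: Callable[[str], None],
                    think_time_s: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> List[Tuple[str, float]]:
    """Replay transactions one at a time, pausing think_time_s between them.

    send is a hypothetical hook that issues the real HTTP or EJB request;
    think_time_s = 5.0 is an assumed default. Returns (name, client-side
    elapsed seconds) pairs so they can be matched with the monitor's trace.
    """
    names = list(transactions)
    results = []
    for i, name in enumerate(names):
        start = time.perf_counter()
        send(name)
        results.append((name, time.perf_counter() - start))
        if i < len(names) - 1:  # no think time after the last request
            sleep(think_time_s)
    return results


sent, pauses = [], []
results = run_single_user(["login", "search", "logout"], send=sent.append,
                          think_time_s=2.0, sleep=pauses.append)
print(sent, pauses)  # ['login', 'search', 'logout'] [2.0, 2.0]
```

Spacing requests out this way keeps each transaction's L2/L3 trace separate from the next one, which makes the traces easier to read.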
Workload Analysis
- Workload analysis means running the applications through a traffic simulator with a number of clients.
- The monitoring tool normally runs at L1 for this type of analysis, with a sampling rate under 10%.
- The usual best practice is to prepare a multi-user automated test script that fires off transactions in a mix that represents the production workload.
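The two ideas on this slide, a production-like transaction mix and a sampling rate under 10%, can be sketched together. This Python example is an illustration under assumptions: the mix is given as relative shares per transaction name, the 5% default sampling rate is hypothetical, and a real traffic simulator would also handle concurrency and pacing.

```python
import random


def generate_workload(mix, n_requests, sample_rate=0.05, seed=42):
    """Draw requests in production-like proportions and mark a monitored sample.

    mix:         {transaction_name: relative share of production traffic}
    sample_rate: fraction of requests recorded in detail; the slide says under 10%,
                 and 5% is an assumed default.
    Returns a list of (transaction_name, sampled) tuples.
    """
    if not 0 < sample_rate < 0.10:
        raise ValueError("keep the sampling rate under 10% for workload runs")
    rng = random.Random(seed)  # seeded so runs are repeatable
    names = list(mix)
    picks = rng.choices(names, weights=[mix[n] for n in names], k=n_requests)
    return [(name, rng.random() < sample_rate) for name in picks]


workload = generate_workload({"search": 0.6, "browse": 0.3, "checkout": 0.1}, 20_000)
```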
Workload Analysis
- Each run should last at least 30-60 minutes so the system can be observed at steady state.
- During steady state, analyze a wide range of metrics: heap, CPU, paging, throughput, response time, WebSphere resource pools, GC activity, etc.
- At the end of the run, plot CPU% against throughput rate. Explain any non-linearity in the workload's behavior, eliminate the bottlenecks, and re-run until the plot is roughly linear.
- More reports are available from Performance Analysis & Reporting (PAR).
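One way to put a number on "relatively linear" is to fit a least-squares line to the (throughput, CPU%) points and look at R². This is a Python sketch under assumptions: the samples come from the steady-state part of the run, and the 0.95 cutoff is a hypothetical threshold, not a product setting.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit ys = a + b * xs; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return a, b, r_squared


def looks_linear(throughput, cpu_pct, min_r_squared=0.95):
    """Rough screen: does CPU% grow linearly with throughput? (0.95 is an assumed cutoff)"""
    return linear_fit(throughput, cpu_pct)[2] >= min_r_squared


tps = [10, 20, 30, 40, 50]
print(looks_linear(tps, [12, 22, 32, 42, 52]))   # True
print(looks_linear(tps, [10, 20, 30, 60, 95]))   # False: CPU bends upward at high load
```

R² is only a screen: a curve can still score fairly high, so the plot itself remains the primary check.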
Additional Performance Tuning Tips - 1
Here are a few other things to try to improve performance. Note that these suggestions are given without detailed knowledge of the environment, architecture, or open issues.
- Increase the web container maximum keep-alives.
- Increase the web container thread pool size.
- Increase the database connection pool size.
- Adjust the maximum and minimum heap sizes.
- Disable explicit garbage collection.
- Enable concurrent I/O at the OS level.
- Pre-compile JSPs.
- Increase the priority of the application server process at the OS level.
Additional Performance Tuning Tips - 2
- If there are many short-lived objects, tuning the NewSize and MaxNewSize JVM parameters can help.
- Changing ulimit settings in the operating system (AIX, Solaris) may improve performance.
- Enable dynamic caching, if possible.
- Creating new indexes or reorganizing existing ones can improve the performance of database-intensive transactions.
- Adjusting the prepared statement cache size may also help.
- Adjust the OS parameters tcp_time_wait_interval and tcp_fin_wait_2_flush_interval.
Example: Workload Analysis
Check Environmental Consistency
- Ensure the platform can support the application
- Verify the system, Java, and application server runtime environment
Check Server Statistics
- Compare key performance metrics side by side
- Shows paging and load balancing in clustered deployments
- Ensures overall throughput matches the results expected from the load generator
- Quick overview of the application's impact on the monitored servers
Validate Throughput vs. Response Time
- Quantify application scalability
- Correlated plot of response time relative to request rate during the stress test
- Graphical report showing the number of requests over time
Calculate Throughput vs. JVM CPU%
- Verify that the target transactions-per-second rate is achievable
- Request rate during the stress run (same as prior slide)
- Correlated plot reveals low JVM CPU consumption even as throughput increases
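Checking whether a target rate is achievable can be approximated by fitting a line to the stress-run samples and projecting JVM CPU% at the target. This Python sketch assumes the CPU/throughput relationship stays linear beyond the measured range (which the linearity check in workload analysis is meant to confirm); the 80% CPU ceiling and the sample values are hypothetical.

```python
def projected_cpu(throughput, cpu_pct, target_tps):
    """Least-squares line cpu = a + b * tps from the stress run, evaluated at target_tps."""
    n = len(throughput)
    mean_x = sum(throughput) / n
    mean_y = sum(cpu_pct) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(throughput, cpu_pct))
         / sum((x - mean_x) ** 2 for x in throughput))
    a = mean_y - b * mean_x
    return a + b * target_tps


def target_achievable(throughput, cpu_pct, target_tps, cpu_ceiling=80.0):
    """True if projected JVM CPU% at target_tps stays under an assumed 80% ceiling."""
    return projected_cpu(throughput, cpu_pct, target_tps) <= cpu_ceiling


tps = [10, 20, 30, 40]
cpu = [5, 9, 13, 17]            # hypothetical stress-run samples
print(round(projected_cpu(tps, cpu, 150), 1))   # 61.0
print(target_achievable(tps, cpu, 250))         # False (projects to about 101%)
```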
Throughput vs. Garbage Collection (GC)
- Tune the JVM to minimize GC frequency
- Request rate during the stress run
- GC frequency is not in steady state as throughput rises
- The increased heap size is affecting the GC rate, although up to 6 collections per minute appears affordable, since response time remains under 34 ms!
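GC frequency per minute can be derived from the start times of GC cycles and compared against a budget. This Python sketch assumes the start times (in seconds since the run began) have already been extracted, for example from verbose GC logs; that parsing step is not shown. The default budget of 6 per minute is the figure from this slide's workload, not a general rule.

```python
def gc_counts_per_minute(gc_start_times_s):
    """Number of GC cycles in each one-minute window of the run.

    gc_start_times_s: GC start times in seconds since the run began, e.g. as
    extracted from verbose GC logs (the extraction step is not shown).
    """
    if not gc_start_times_s:
        return []
    counts = [0] * (int(max(gc_start_times_s) // 60) + 1)
    for t in gc_start_times_s:
        counts[int(t // 60)] += 1
    return counts


def minutes_over_budget(counts, max_per_minute=6):
    """Minutes whose GC count exceeds the budget (6/min is the slide's figure for its workload)."""
    return [minute for minute, c in enumerate(counts) if c > max_per_minute]


starts = [5, 15, 70, 75, 80, 85, 90, 95, 100, 110]
counts = gc_counts_per_minute(starts)
print(counts, minutes_over_budget(counts))   # [2, 8] [1]
```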
Throughput vs. Total GC Time
- Avoid paging (it has a large effect on end-user response time)
- Request rate ramps up and tops out
- Excessive and persistently high total GC time
- Total time for GC to complete per cycle, correlated with request rate!
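Total GC time is easier to judge as a fraction of wall-clock time per interval. This Python sketch assumes GC events are available as (start, duration) pairs in seconds; pauses that straddle the interval boundaries are clipped so each second is counted once. The event values are hypothetical.

```python
def gc_overhead(gc_events, interval_start_s, interval_end_s):
    """Fraction of wall-clock time spent in GC within [interval_start_s, interval_end_s).

    gc_events: (start_s, duration_s) pairs, e.g. parsed from verbose GC output.
    Pauses that straddle the interval edges are clipped to it.
    """
    length = interval_end_s - interval_start_s
    total = 0.0
    for start, duration in gc_events:
        overlap = min(start + duration, interval_end_s) - max(start, interval_start_s)
        if overlap > 0:
            total += overlap
    return total / length


events = [(1.0, 0.5), (10.0, 1.0), (59.5, 1.0)]   # last pause runs past the interval
print(round(gc_overhead(events, 0.0, 60.0) * 100, 2))  # 3.33
```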
Throughput vs. Heap Size After GC
- A good indicator of potential memory leaks
- Request rate during the stress run (same as prior slide)
- Shows a well-tuned heap size, with little if any growth during high throughput
- No heap growth under increased load shows there are no detectable leaks
WebSphere Resource Utilization Analysis
- Verify that the application does not over-tax application server resources
- Saturated thread pool: a good candidate for tuning!
- Overall, J2EE resource consumption is low
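"Saturated" can be made concrete as the share of steady-state samples in which the pool ran close to its maximum size. This Python sketch assumes active-thread counts sampled at regular intervals; the 90% busy cutoff and the sample values are hypothetical.

```python
def saturation_ratio(active_threads, max_pool_size, busy_fraction=0.9):
    """Share of samples in which the pool ran at or above busy_fraction of its maximum.

    active_threads: active-thread counts sampled during steady state.
    busy_fraction:  assumed cutoff for "saturated" (90% here).
    """
    busy = sum(1 for a in active_threads if a >= busy_fraction * max_pool_size)
    return busy / len(active_threads)


samples = [10, 48, 50, 50, 30]     # hypothetical web container samples, max size 50
print(saturation_ratio(samples, 50))   # 0.6
```

A pool that sits near its maximum for a large share of the run is the kind of candidate the slide flags; a low ratio matches the "overall low consumption" case.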
Check Average CPU Time per Transaction
- Based on threads running application classes in the workload mix
- Spikes show high consumption at random intervals
- Otherwise, consumption rates are normal
Check Average CPU Time per Transaction
- Based on threads running application classes in the workload mix
- Transaction with very high CPU in the spike interval
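Spike intervals like these can be flagged automatically before drilling into the transactions behind them. This Python sketch uses a median/MAD outlier rule, which is a substitute technique chosen for illustration, not how the product detects spikes; the cutoff k = 5 and the sample values are assumptions.

```python
from statistics import median


def cpu_spikes(avg_cpu_ms, k=5.0):
    """Indices of intervals whose average CPU per transaction is an outlier.

    Median/MAD rule: an interval is a spike if it exceeds the median by more
    than k times the median absolute deviation. k = 5 is an assumed cutoff.
    """
    med = median(avg_cpu_ms)
    mad = median([abs(x - med) for x in avg_cpu_ms])
    if mad == 0:
        # Most intervals are identical: flag anything above the median.
        return [i for i, x in enumerate(avg_cpu_ms) if x > med]
    return [i for i, x in enumerate(avg_cpu_ms) if x - med > k * mad]


intervals = [4.0, 4.2, 3.9, 4.1, 25.0, 4.0, 4.3]   # hypothetical ms per transaction
print(cpu_spikes(intervals))   # [4]
```

The median and MAD are used instead of mean and standard deviation so that the spike itself does not inflate the baseline it is compared against.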
Example: Transaction Analysis Methodology
Analyze Transaction Instances of Interest
- Show “Level 2” J2EE component-level events
- Sequential view of event execution/flow
- High-precision timing measurements for each event call
- Highlighted JCA calls show a large delta CPU time!
Further Analyze Transactions
- Show discrete “Level 3” method-level and nested method events
- Each row shows method flow and depth
- Good candidate for tuning due to high delta CPU consumption!
Analyze SQL Profile
- Check the response time of the various queries.
- Use the data together with the Top Used Queries report.
- Tune the queries.
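Combining per-query response time with how often each query runs gives a single ordering for tuning work: total time = count × average. This Python sketch assumes (SQL text, elapsed ms) samples exported from an SQL profile; the statements and timings are hypothetical.

```python
def rank_queries(samples, top=5):
    """Rank SQL statements by total elapsed time from (sql, elapsed_ms) samples.

    Returns (sql, count, avg_ms, total_ms) tuples, largest total first.
    """
    totals = {}
    for sql, elapsed_ms in samples:
        count, total = totals.get(sql, (0, 0.0))
        totals[sql] = (count + 1, total + elapsed_ms)
    rows = [(sql, c, t / c, t) for sql, (c, t) in totals.items()]
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows[:top]


samples = [
    ("SELECT * FROM orders WHERE id = ?", 5.0),
    ("SELECT * FROM orders WHERE id = ?", 7.0),
    ("SELECT * FROM audit_log", 30.0),
    ("UPDATE cart SET qty = ? WHERE id = ?", 2.0),
]
print([sql for sql, *_ in rank_queries(samples, top=2)])
# ['SELECT * FROM audit_log', 'SELECT * FROM orders WHERE id = ?']
```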
Check for Top Methods Used
- Identify hot methods by invocation count
- Names of hot methods
- Total invocation count!
Check for Most CPU-Intensive Methods
- Correlate the hot methods by CPU cost with the highest-count methods
- Names of hot methods
- CPU consumption for each method!
Check for Slowest Methods
- Correlate with hot methods to evaluate each method's total contribution to response time
- Names of slow methods
- High average response time per method!
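The three method reports (by count, by CPU, by average response time) can be correlated by weighting each average by its invocation count. This Python sketch assumes per-method figures have been exported into a simple record; the `MethodStats` shape and the method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MethodStats:
    """Per-method figures as they might be exported from the L3 reports (hypothetical shape)."""
    name: str
    invocations: int
    avg_cpu_ms: float
    avg_response_ms: float

    @property
    def total_cpu_ms(self) -> float:
        return self.invocations * self.avg_cpu_ms

    @property
    def total_response_ms(self) -> float:
        return self.invocations * self.avg_response_ms


def tuning_candidates(methods, top=3):
    """Order methods by total contribution to response time (count x average)."""
    return sorted(methods, key=lambda m: m.total_response_ms, reverse=True)[:top]


methods = [
    MethodStats("Parser.nextToken", 10_000, 0.01, 0.05),    # hot but cheap
    MethodStats("ReportBuilder.render", 3, 40.0, 900.0),    # slow but rare
    MethodStats("PriceService.quote", 500, 2.0, 8.0),       # moderately slow, frequent
]
print([m.name for m in tuning_candidates(methods)])
# ['PriceService.quote', 'ReportBuilder.render', 'Parser.nextToken']
```

In this made-up data the slowest method per call is not the biggest contributor overall, which is why the slides suggest correlating the reports rather than reading any one of them alone.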
Example: Memory Leak Analysis
Memory Analysis Reporting
- Quick check to detect the presence of a leak
- An upward slope indicates a possible “slow” memory leak
- Constant request rate correlated with JVM heap size
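The "upward slope" check can be expressed as the least-squares slope of heap-after-GC against elapsed time, measured while the request rate is held constant. This Python sketch assumes (hours, MB) samples; what slope counts as worrying depends on the application and is not defined here. The sample values are hypothetical.

```python
def heap_growth_per_hour(samples):
    """Least-squares slope of heap-after-GC (MB) against elapsed time (hours).

    samples: (elapsed_hours, heap_after_gc_mb) pairs taken at a constant
    request rate. A persistently positive slope suggests a possible slow leak.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_h = sum(h for _, h in samples) / n
    stt = sum((t - mean_t) ** 2 for t, _ in samples)
    sth = sum((t - mean_t) * (h - mean_h) for t, h in samples)
    return sth / stt


print(heap_growth_per_hour([(0, 200), (1, 212), (2, 224), (3, 236)]))  # 12.0
```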
Memory Leak: Avg. Heap Size After GC vs. Requests
Average Heap Size after GC vs. Number of Requests: verify that a leak exists with the Avg. Heap Size After GC graph, then check whether the growth is due to an increasing number of requests.
To access this feature, select PROBLEM DETERMINATION -> Memory Diagnosis -> Memory Analysis -> Change Metrics.
Memory Leak: Average Heap Size After GC vs. Live Sessions
Average Heap Size after Garbage Collection (GC) vs. Live Sessions: verify that a leak exists with the Avg. Heap Size After GC graph, then check whether the growth is due to an increasing number of users.
To access this feature, select PROBLEM DETERMINATION -> Memory Diagnosis -> Memory Analysis -> Select Metrics.
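The question on both slides, whether heap growth is explained by more requests or more users, can be screened with a correlation between the driver series and heap-after-GC. If the heap rises with live sessions, the growth may be legitimate session state; if it rises while the driver stays flat, a leak is more likely. This Python sketch assumes aligned samples of both series; the 0.9 cutoff and the data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def growth_tracks(driver, heap_after_gc_mb, min_r=0.9):
    """True if heap-after-GC rises and falls with the driver (requests or live sessions).

    min_r = 0.9 is an assumed cutoff.
    """
    return pearson(driver, heap_after_gc_mb) >= min_r


heap = [300, 340, 380, 420]
print(growth_tracks([100, 150, 200, 250], heap))   # True: growth follows sessions
print(growth_tracks([300, 310, 290, 305], heap))   # False: heap grows, sessions flat
```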
Find Leaking Candidates
- Production-friendly, heap-based analysis
- Comparing heap snapshots shows suspected leak candidates
- Class name filters
- Application class that appears to show some growth
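Snapshot comparison boils down to a per-class difference in live instance counts, filtered by class name. This Python sketch assumes each snapshot is available as a class-to-count mapping taken at comparable load; the `class_prefix` filter, the 1,000-instance cutoff, and the `com.example` classes are hypothetical.

```python
def leak_candidates(before, after, min_growth=1000, class_prefix=""):
    """Classes whose live instance count grew between two heap snapshots.

    before/after: {class_name: live_instance_count} from snapshots taken at
                  comparable load (hypothetical export format).
    class_prefix: keeps only matching classes, standing in for the slide's
                  class name filters; min_growth = 1000 is an assumed cutoff.
    """
    grown = []
    for cls, count in after.items():
        if not cls.startswith(class_prefix):
            continue
        delta = count - before.get(cls, 0)
        if delta >= min_growth:
            grown.append((cls, delta))
    return sorted(grown, key=lambda item: item[1], reverse=True)


before = {"com.example.Order": 1000, "java.lang.String": 50000, "com.example.Cache": 10}
after = {"com.example.Order": 9000, "java.lang.String": 52000,
         "com.example.Cache": 15, "com.example.Session": 1500}
print(leak_candidates(before, after, class_prefix="com.example."))
# [('com.example.Order', 8000), ('com.example.Session', 1500)]
```

Filtering to application packages keeps JDK classes such as `java.lang.String`, which often grow as a side effect, from crowding out the application classes that actually hold the references.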
Zero In on Leaking Code
- View suspected classes and allocating methods
- Each 'allocation pattern' uniquely identifies a set of heap objects of the same class, allocated by the same request type, from the same point in the application code
- Shows the specific point in the application code where this object set was allocated!
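The allocation-pattern definition above maps naturally onto a composite key: (class, request type, allocation site). This Python sketch groups surviving objects by that key; the record layout, request paths, and `OrderService.addItem:142` site are hypothetical, not the product's data format.

```python
from collections import Counter
from typing import NamedTuple


class AllocationPattern(NamedTuple):
    """Same class, same request type, same allocation site (as the slide defines it)."""
    class_name: str
    request_type: str
    allocation_site: str   # e.g. "OrderService.addItem:142" (hypothetical)


def top_patterns(surviving_objects, top=3):
    """Count surviving objects per allocation pattern, largest first.

    surviving_objects: (class_name, request_type, allocation_site) tuples.
    """
    return Counter(AllocationPattern(*obj) for obj in surviving_objects).most_common(top)


objs = [("Order", "/checkout", "OrderService.addItem:142")] * 3 + \
       [("Order", "/browse", "Catalog.load:88")]
pattern, count = top_patterns(objs, 1)[0]
print(pattern.allocation_site, count)  # OrderService.addItem:142 3
```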
Zero In on Leaking Code (scrolled from previous page)
- Additional code and GC performance details help developers isolate the leak and optimize the JVM
- Large number of surviving objects since the last GC
View References to Live Objects
- Confirm the allocating class
- Helps pinpoint why the objects in question are not being garbage collected
- Also shows other objects on the heap that hold references to the set of objects being analyzed
- Allocating method and line number in the code