Presentation on theme: "Performance and Reliability 101 Brent Cromarty Ping Identity"— Presentation transcript:
Performance and Reliability 101 Brent Cromarty Ping Identity firstname.lastname@example.org
A little about me Like – Long walks on the beach – Red wine Dislike – Mean people – Early mornings Encourage questions throughout presentation – Although I may hold off if I am going to address it with material in a future slide. – You may have to ask again
OK Seriously… Spent bulk of my career (14 years) at SAP – By way of Business Objects acquisition By way of Crystal Decisions acquisition – By way of Seagate Software IMG 5 years experience in customer support – Discovered impatience, dislike for people 9 years of performance and reliability (P&R) testing for Crystal Reports in Crystal/Business Objects Enterprise product Currently in my second year with Ping Identity
Why are we here? Types of testing that make up P&R – Design What is the goal of each test type? What does it prove/disprove? – Execution How is the test run? – Results Analysis How to figure out if the test passed or failed? Best Practices (Tips/Tricks/Suggestions/Filler) Suggestions for root cause analysis
So… What are these test types that I speak of? Types of P&R tests – Load – Scalability – Endurance – Stress – Reliability
Load Testing Performance equivalent of functional “smoke” test Functional test/workflow executed under “load” – Typically “load” is in the form of concurrent users Executed with a Load Generator tool – LoadRunner, JMeter, QALoad, Grinder, etc… – Does the component stand up? Does the test pass functionally? For all users? Does it crash? Does the system grind to a halt? – Metrics to consider Response time (average, 90th percentile, min, max) Throughput CPU and memory utilization on the target system
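The response time and throughput metrics listed on this slide can be computed from raw per-request timings. The following is a minimal sketch, assuming the load generator exports one response time per request in seconds. The function name `summarize`, the sample values, and the nearest-rank percentile method are illustrative choices, not something the presentation prescribes.

```python
import math
import statistics

def summarize(response_times_s, test_duration_s):
    """Summarize per-request response times (seconds) from one load test."""
    ordered = sorted(response_times_s)
    # Nearest-rank 90th percentile: the smallest sample with at least
    # 90% of all samples at or below it.
    rank = math.ceil(len(ordered) * 90 / 100)
    return {
        "avg": statistics.mean(ordered),
        "p90": ordered[rank - 1],
        "min": ordered[0],
        "max": ordered[-1],
        # Throughput as completed requests per second of test time.
        "throughput_rps": len(ordered) / test_duration_s,
    }

# Hypothetical sample: ten requests completed during a 5-second run.
sample = [0.12, 0.15, 0.11, 0.40, 0.13, 0.14, 0.16, 0.12, 0.95, 0.13]
summary = summarize(sample, test_duration_s=5.0)
print(summary)
```

Reporting the 90th percentile next to the average matters because a few slow requests (0.40 s and 0.95 s in the sample) barely move the average but are exactly what some users experience.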
Scalability Testing Executed as a series of Load Tests – Workload Scalability Vary the user load from test to test – Resource Scalability Vary the resources from test to test – Functional success If the error rate is “too high”, scalability results are meaningless – How does performance change from test to test? Response time (average, 90th percentile, min, max) Throughput CPU and memory utilization on the target system – Do not discount single user performance A system can exhibit linear scalability, but still perform poorly
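One way to compare a series of load tests is to express each run's throughput relative to the smallest run, normalized by the increase in users. The sketch below assumes that approach. The `LoadTestResult` type, the 1% error-rate cutoff, and the sample numbers are all hypothetical. A value near 1.0 suggests roughly linear workload scalability, and runs whose error rate is too high are dropped rather than compared.

```python
from dataclasses import dataclass

@dataclass
class LoadTestResult:
    users: int
    throughput_rps: float
    avg_response_s: float
    error_rate: float  # fraction of failed requests, 0.0 to 1.0

MAX_ERROR_RATE = 0.01  # hypothetical cutoff for "too high"

def scaling_efficiency(results, max_error_rate=MAX_ERROR_RATE):
    """Map user load -> throughput gain divided by user-load gain."""
    ordered = sorted(results, key=lambda r: r.users)
    # Runs that fail functionally say nothing useful about scalability.
    valid = [r for r in ordered if r.error_rate <= max_error_rate]
    if not valid:
        return {}
    base = valid[0]
    return {
        r.users: (r.throughput_rps / base.throughput_rps) / (r.users / base.users)
        for r in valid
    }

# Hypothetical series of load tests with increasing user load.
runs = [
    LoadTestResult(users=1, throughput_rps=10.0, avg_response_s=0.100, error_rate=0.0),
    LoadTestResult(users=10, throughput_rps=95.0, avg_response_s=0.105, error_rate=0.0),
    LoadTestResult(users=50, throughput_rps=300.0, avg_response_s=0.166, error_rate=0.002),
    LoadTestResult(users=100, throughput_rps=310.0, avg_response_s=0.320, error_rate=0.08),
]
efficiency = scaling_efficiency(runs)
print(efficiency)
```

The single-user run is kept as the baseline on purpose: if that baseline is already slow, a perfectly linear curve still describes a slow system.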
Endurance Testing Also known as “Soak” testing Load test executed over an extended duration – Typically overnight or over the weekend Proves “reliability” of the system – Consistency of functional results Very first result same as very last and all those in between? Depending on requirements, error rate > 0 can be acceptable – Consistency of performance Does response time or throughput degrade over time? – Consistency of resource utilization Are we leaking memory? How does CPU usage look over the duration?
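A simple way to answer "are we leaking memory?" or "does response time degrade?" is to fit a straight line through samples taken during the soak run and look at the slope. This is a sketch under the assumption that memory is sampled once per hour. The sample values are made up, and what slope counts as a leak is left to the requirements of the system under test.

```python
def slope(xs, ys):
    """Least-squares slope of ys against xs (change in y per unit of x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

# Hypothetical hourly samples from an overnight endurance run.
hours = [0, 1, 2, 3, 4, 5, 6, 7]
memory_mb = [512, 530, 548, 561, 580, 597, 615, 634]
steady_mb = [512, 514, 511, 513, 512, 514, 511, 513]

growth_mb_per_hour = slope(hours, memory_mb)
steady_mb_per_hour = slope(hours, steady_mb)
print(f"suspect run: {growth_mb_per_hour:.1f} MB/hour")
print(f"steady run:  {steady_mb_per_hour:.1f} MB/hour")
```

The same function can be applied to response time or throughput samples to check consistency of performance over the run.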
Stress Testing Often mistakenly referred to as “Load” testing Best thought of as “extreme” load testing – Resiliency of the system when pushed beyond limits 150% to 200% of the “nominal” load for the system Half the system resources suggested for a given load – CPUs, memory, network bandwidth, etc… Looking for “graceful failure” – Best: System returns “Too Busy” – Acceptable: System slows down, maybe some requests time out Better: effective error messaging so that users know system is maxed out – Bad: Crash – Worst: Unpredictable results, misleading error messages
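To make the "Best" outcome concrete, here is a minimal sketch of a service that refuses work immediately once it is at capacity instead of queueing without bound. The `BoundedService` class, the HTTP-style 503 status code, and the limit of one in-flight request are assumptions for illustration only. The presentation does not say how a system should implement "Too Busy".

```python
import threading

class BoundedService:
    """Hypothetical service that sheds load once max_in_flight is reached."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def handle(self, work):
        # Non-blocking acquire: if no slot is free, fail fast with a clear
        # message rather than slowing down, timing out, or crashing.
        if not self._slots.acquire(blocking=False):
            return (503, "Too Busy")
        try:
            return (200, work())
        finally:
            self._slots.release()

service = BoundedService(max_in_flight=1)

# Simulate a second request arriving while the first is still running:
# the inner call finds the only slot taken and is rejected.
nested = service.handle(lambda: service.handle(lambda: "inner"))
# Once the first request finishes, the slot is free again.
after = service.handle(lambda: "ok")
print(nested, after)
```

A stress test at 150% to 200% of nominal load would then expect a mix of successful responses and explicit "Too Busy" responses, with no crashes and no misleading errors.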
Reliability Testing Negative condition Load Testing Test resiliency under error conditions – Error condition code paths typically don’t get the same coverage as the “happy path” – Is the system consistent under constant error conditions? Are results consistent and predictable over time? Consistency of resource utilization – Error conditions are notorious for resource leaks Security tests – e.g., Denial of Service
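The point about error paths leaking resources can be checked in miniature by driving a failing operation many times and measuring how much memory stays allocated afterwards. The sketch below uses Python's standard `tracemalloc` module. The two operations, `leaky_op` and `clean_op`, are contrived examples written for this illustration, not code from any real product.

```python
import tracemalloc

_leaked = []  # stands in for a cache or registry that error handling forgets to clean

def leaky_op():
    buf = bytearray(1024)
    _leaked.append(buf)  # bug: the buffer stays referenced after the failure
    raise ValueError("simulated failure")

def clean_op():
    buf = bytearray(1024)  # released when the function exits
    raise ValueError("simulated failure")

def growth_under_errors(op, iterations=2000):
    """Return bytes still allocated after repeatedly hitting the error path."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        try:
            op()
        except ValueError:
            pass  # the test expects the failure; it only watches resources
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

leaky_growth = growth_under_errors(leaky_op)
clean_growth = growth_under_errors(clean_op)
print(f"leaky error path: {leaky_growth} bytes retained")
print(f"clean error path: {clean_growth} bytes retained")
```

In a real reliability test the same idea applies at system level: hold the error condition constant under load and watch memory, handles, and connections on the target system over time.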
Random Suggestions (Time Filler) Choose workflows that fit the “80/20 rule” – Some workflows need P&R testing, others don’t. Choose wisely. Use sufficient hardware for your Load Generator application – Size your client hardware like you would your target system Don’t use “intrusive” validation in your test cases – Heavy test validation will slow down your test and affect concurrency Avoid use of “intrusive” monitoring when possible Beware of logging – Logging is useful, but can kill performance Visualize your results – A picture is worth a thousand words. Who doesn’t like charts? – Include context (resource utilization of the systems under test)
So what do I do if I think there is a problem? Too slow? – Is your system tuned? Ensure you have not configured a bottleneck in your deployment – Try a profiling tool Can show which areas of the code are taking the most time – Add some lightweight logging to code Add “timing code” to log out elapsed time in functions/paths – Use a stack dumping utility Repeated stack dumps can show where you are “stuck” Using too much memory or leaking? – Try a profiling tool Can show – Add “size” logging for container classes Can show you if your containers are growing unbounded
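The "timing code" and container "size" logging suggestions can be sketched with the standard `logging` and `time` modules. The decorator name `timed`, the helper `log_size`, the logger name, and the `build_report` example function are all hypothetical. Keep this kind of logging lightweight, since heavy logging can itself hurt performance.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("perf")

def timed(func):
    """Log the elapsed wall-clock time of each call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Logged even when func raises, so slow failures are visible too.
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s took %.2f ms", func.__name__, elapsed_ms)
    return wrapper

def log_size(name, container):
    """Log the current size of a container and return it."""
    size = len(container)
    logger.info("%s size=%d", name, size)
    return size

@timed
def build_report(rows):
    return sum(rows)

# A cache that only ever grows: repeated size logging makes that visible.
cache = {}
for key in range(3):
    cache[key] = build_report(range(1000))
    log_size("cache", cache)
```

If the logged size of a container keeps climbing across a long run and never levels off, that container is a good first suspect for unbounded growth.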