1 Performance Testing in a Nutshell
with Apache JMeter Example Tomas Veprek Software Performance Tester Tieto, Testing Services

2 Agenda: Performance Testing
1. What is it?
2. Why do we bother?
3. How do we do it?
4. When do we do it?
5. Who benefits from it?
6. Where do we do it?

3 About Me
Graduated from Technical University of Ostrava in 2003
Since 2005 employed at Tieto Czech: functional software tester; 3rd tier support of EMC Documentum applications; performance tester (… – present)
Proponent of the Context-Driven School of Testing
The Context-Driven School of Testing was originally developed by Cem Kaner, James Bach, Bret Pettichord and Brian Marick.
Seven basic principles of the context-driven school:
1. The value of any practice depends on its context.
2. There are good practices in context, but there are no best practices.
3. People, working together, are the most important part of any project's context.
4. Projects unfold over time in ways that are often not predictable.
5. The product is a solution. If the problem isn't solved, the product doesn't work.
6. Good software testing is a challenging intellectual process.
7. Only through judgment and skill, exercised cooperatively throughout the entire project, are we able to do the right things at the right times to effectively test our products.

4 1. What is performance testing?

5 Software testing is … "The process of evaluating a product by learning about it through exploration and experimentation, which includes to some degree: modelling, study, observation, inference, etc." -- James Bach
Evaluate: how good or bad is the system (product)? In other words, we evaluate various qualities of the software system. Examples of qualities: capability (functions), reliability, performance, scalability, stability, maintainability, security, installability, usability.
Exploration: we move through the system in a way that has a high probability of discovering problems that matter to our clients. Each test we carry out should bring some valuable information about the system; in other words, each test should decrease the entropy of the system.
Experimentation: conducting tests (experiments) that are to provide some useful information about the system.
Modelling: we create models to look at the system from different perspectives. Examples of models: the set of functions the system provides its users, user scenarios describing how users will interact with the system to achieve their tasks, the set of components the system consists of and the relationships between them, the data the system consumes and produces.
Observation: the process of overseeing how the system operates during the experiments.
Inference: drawing conclusions based on the results of the experiments and the observations we made. We create theories about how the system works and does not work.

6 Performance testing is …
A type of software testing that evaluates a software product mainly for the following qualities: performance, stability, scalability.

7 Software Product / System
[Diagram: the entire system stack]
Application (application code; web, application and database servers)
System libraries
Kernel (file system, memory management, task scheduling, network stack)
Drivers
Hardware
The system libraries, kernel and drivers form the operating system stack.

8 Performance
How fast are operations performed by the system under a certain load?
Examples of operations: submitting data after clicking a button on an HTML page; a REST API call; a disk read / write operation.
Different people see performance differently, and different tasks imply different performance expectations.
Is performance only about time? Time is the most important metric for measuring performance, but in addition to time there are other metrics we measure.

9 Key Performance Metrics
Response time
Latency
Throughput
IOPS
Utilisation
Saturation

10 Response Time and Latency
Response time = time for an operation to complete, including the time spent waiting and being serviced
Latency = time an operation spends waiting in a queue
Response time = Latency + Service time
These metrics make degradation and improvement easy to quantify.
Their interpretation depends on the target, i.e. the purpose of the operation.

11 Latency - HTTP GET Request
Assume that a GET request consists of: DNS resolution, establishing the TCP connection, and data transfer from the server to the client.
What do we mean by latency and response time? The operation is an HTTP GET request, and the purpose of the GET request is to get some data from the web server. What do we mean by the time spent waiting before being serviced? Being serviced in this context means delivering data from the server to the client => the data transfer time.

12 Latency – HTTP GET Request (2)
[Timeline diagram: latency = DNS resolution + establishing the TCP connection; response time = latency + data transfer]
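This split can be observed directly. Below is a minimal sketch using pycurl (an editorial tool choice, not one mentioned in the talk; the URL is a placeholder): libcurl reports cumulative timers for each phase of the request.

```python
import pycurl
from io import BytesIO

buf = BytesIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, "http://example.com/")  # placeholder URL
c.setopt(pycurl.WRITEDATA, buf)
c.perform()

# libcurl timers are cumulative from the start of the request
dns     = c.getinfo(pycurl.NAMELOOKUP_TIME)   # DNS resolution finished
connect = c.getinfo(pycurl.CONNECT_TIME)      # TCP connection established
total   = c.getinfo(pycurl.TOTAL_TIME)        # last byte received
c.close()

latency = connect           # DNS resolution + TCP connection establishment
service = total - connect   # data transfer from the server to the client
print(f"dns={dns:.3f}s  latency={latency:.3f}s  "
      f"service={service:.3f}s  response time={total:.3f}s")
```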

13 Latency – HTML Rendering
The user submits the data entered into a form on an HTML page and waits until the page is rendered with the server response. What do we mean by latency and response time here?

14 Latency – HTML Rendering (2)
[Timeline diagram: latency = DNS resolution + establishing the TCP connection; response time = latency + data transfer + HTML page rendering]

15 Throughput
The rate at which work is completed. The meaning depends on the target evaluated.
Examples: bytes / bits per second, SQL queries per second, REST calls per second, transactions per second.

16 IOPS
Input / output operations completed per second: a throughput-oriented metric. The meaning depends on the target evaluated.
Examples: network devices (TCP) – packets received / sent per second; block devices – reads / writes per second.

17 Utilisation
Time-based definition: how busy a resource was during a period of time. From queuing network theory: U = B / T * 100 [%], where B is the time the resource was busy during the observation interval T. Examples: CPU utilisation, disk utilisation.
Capacity-based definition: the extent to which the capacity of a resource was used during a time period.
Under the time-based definition, 100% utilisation does not have to lead to saturation of the resource; we need to examine other metrics of the resource that indicate its saturation.
The capacity-based definition comes from the area of capacity planning and assumes that every resource (component) has some maximum capacity it can deliver. However, it is very difficult to determine the maximum capacity of some resources, for instance hard disks: to do so, the other parts of the computer system that the disks use must be evaluated as well.
The time-based definition of utilisation is the most commonly used.
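A minimal sketch of the time-based definition on Linux (assuming /proc/stat is available): B is the number of non-idle CPU ticks counted during the observation interval T.

```python
import time

def cpu_ticks():
    # First line of /proc/stat: "cpu  user nice system idle iowait irq softirq ..."
    with open("/proc/stat") as f:
        fields = [int(v) for v in f.readline().split()[1:]]
    idle = fields[3] + fields[4]          # idle + iowait ticks
    return sum(fields), idle

total0, idle0 = cpu_ticks()
time.sleep(5)                             # observation interval T
total1, idle1 = cpu_ticks()

busy = (total1 - total0) - (idle1 - idle0)   # B: busy ticks during T
print("U = B / T * 100 = %.1f %%" % (100.0 * busy / (total1 - total0)))
```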

18 Saturation
The extent to which a resource has queued work because it cannot accept more work
Capacity-based utilisation >= 100%
Increases latency and response time
Bottleneck = the resource that limits performance. A bottleneck does not have to be a hardware resource; it can be a software component that limits performance because it is misconfigured.
The difference between time-based and capacity-based utilisation can be understood from the following example. Imagine we are observing the operation of a cable car at its intermediate station. According to the time-based definition, the cable car might be 100% utilised (busy) while passengers can still board => it is not saturated. According to the capacity-based definition, if the cable car is 100% utilised (its full capacity is used), it is saturated.
For CPUs, disks and other resources measured with time-based utilisation, we need other metrics to determine whether the resource is saturated or not.

19 Scalability
How does the performance of the product change under increasing load?
Examples of load: hourly number of users accessing the product; API calls per second; amount of data uploaded to the product per second.
A system may perform well under light load and still not be scalable.

20 Throughput vs. Load Graph
[Graphs: throughput vs. load, showing slow and fast performance deterioration]

21 Response Time vs Load Faster increase – memory thrashing (virtual memory in constant paging – pages in memory are transferred to the disk and the required pages are read to memory), saturation of disk devices Slower increase – CPU saturation (high context switches) How do we protect the system against an excessive workload? We can set a throughput limit (throughput throttle) => HTTP 503 errors

22 Stability
Does the system perform over a long time?
Does performance get back to normal after an exceptional situation?
Typical causes of instability: memory leaks, excessive logging to disk devices.

23 Memory Leak
The outcome of an endurance test generating a stable workload, carried out on a document management application. The test was supposed to run for 12 hours but ended after 3 hours and 40 minutes because heap memory was exhausted.

24 2. Why do performance testing?

25 It is all about risks and consequences
Performance testing isn't free. How do we know it will pay off?
We need to analyse performance risks and their consequences and decide whether the benefits of mitigating them will outweigh the costs.
Not only economic aspects should be evaluated. It is also important to know whether the clients of performance testing will use the information from performance testers, and how they will use it. It makes no sense to do any performance testing if the clients will not act upon the information performance testers provide.
When we build a software application for 5 users, why would we spend time and money on performance testing?
[Diagram: weighing costs against benefits]

26 Risks and Consequences
Risk: System may crash after being released for public use
Consequences: the company will lose money; the company may lose its customers and its credit

27 Risks and Consequences (2)
Risk: System may be slower than our competitors' systems
Consequences: users may complain; the company may lose money
An example of this trade-off is the planned low-latency connection between the London and New York stock exchanges: the expected latency improvement is 6 ms, the overall cost of the solution is about $300 million, and the benefits of the solution are expected to outweigh the costs.

28 Risks and Consequences (3)
Risk: A new software component integrated into our system may degrade its performance
Consequences: user experience will be adversely affected; additional costs for the company to fix the problem
An example: Recently, I was called in to help find the root cause of a problem that occurred after a new piece of code was integrated into one of the software components of a large software system. The new piece of code was supposed to regularly check the availability of two servers (a software implementation of load balancing). Once the new version of the component was deployed in production, the system became unresponsive within a few minutes and a rollback had to be done. To find the root cause, the component with the problematic piece of code was installed in a test environment. Together with my colleague, the maintainer of the component, we tried to build a picture of how the component is used in production. Apache JMeter was used as the load generation tool. The load generation alone would have been meaningless if no monitoring had been set up beforehand: in addition to the key performance metrics, more specific metrics were observed, for example Java heap utilisation and the number of outbound connections to the load-balanced servers.

29 Component as a Performance Issue
[Diagram: a web service with a software load balancer distributing requests between database server 1 and database server 2]

30 Risks and Consequences (4)
Risk: System may not be able to handle occasionally high loads throughout the year
Consequences: customers will complain loudly; the company will lose money and credit; customers may leave for competitors' systems
Example: A few years ago, I was called in to find out whether an e-shop could handle an exceptionally high load after the release of a new version of the iPhone (iPhone 6). There was not enough time to study the nitty-gritty details of how the system is used in production, so the character of the load was estimated from a conversation with the business people rather than from a more comprehensive study. Having identified the most critical actions, we designed a scalability test to find out how much load the e-shop could handle before its performance degraded. The results showed that the e-shop would most likely not be able to handle the expected load due to the insufficient capacity of the web servers.

31 Risks and Consequences (5)
Risk: System will not be able to run over a long time
Consequences: the system has to be restarted during the evening hours; users may complain about system unavailability

32 3. How to do performance testing?

33 Common Performance Testing Activities
Problem statement clarification
Workload modelling
Understanding performance requirements
Performance test design
Load generation tool selection

34 Common Performance Testing Activities (2)
Performance test implementation
Test data creation
Monitoring setup
Performance test execution
Test results analysis
Reporting to stakeholders

35 How does it all begin?
[Diagram: the client approaches the performance tester]

36 Conversation with Client
Client: "We are just about to release a new version of a software application and we want to do performance testing." P.Tester: "When exactly are you releasing?" Client: "In one week." P.Tester: "What do you want to learn from performance testing?" Client: "How fast the application is compared to the previous version."

37 Conversation with Client (2)
P.Tester: "Did you performance test the previous version?" Client: "No, we didn't. I thought you would do it now." P.Tester: "Do you realise there's only one week left before deploying the new version in production?" Client: "Yes, the schedule is very tight, but you can just do simple performance tests." P.Tester: "Uff!… What do you mean by simple tests?" …
What are the issues? The client has some idea of what they want from performance testing, but they don't realise how time-consuming it is, particularly the preparation activities. It happens very often that the client thinks of performance testing as just pressing a button and automatically getting results that answer the question they're after. Another problem revealed by the conversation is that performance testing is going to be done very late in the development cycle. Problems found that late are usually connected with the system architecture, and any attempt to fix them poses a lot of risk.

38 Activity #1: Clarify Problem Statement
What is the system to be performance tested?
What is the mission of performance testing (risks to be reduced, information to be collected)?
What context will performance testing be done in? Clients; time and budget; system status; development team & lifecycle; experience and skills of the performance testers.
Why is the clarification of the problem important? Unless we understand the system (software product) that is going to be performance tested, we cannot test anything.
What do we need to know about the system: Is it an almost-ready software system or a component? The architecture of the system (e.g. a 3-tier web-based application), its interfaces, its communication protocols, and the data processed and produced by the system.

39 Problem Statement: Example
Simple air ticket purchase system: a web-based application
Features: user registration; search for outbound and return flights; purchase an air ticket; browse the itinerary
Mission: test how the application performs under the anticipated load in production.

40 System Architecture
[Diagram: Xitami web server running CGI scripts written in Perl; a data store of flights and purchased tickets; everything hosted on a single physical machine]

41 Air Ticket System: Login Page

42 Air Ticket System: Landing Page

43 Air Ticket System: Flights

44 Air Ticket System: Choose Flight

45 Air Ticket System: Make Payment

46 Air Ticket System: Invoice Summary

47 Air Ticket System: Itinerary

48 Activity #2: Workload Modelling
Analyse how the system under test is, or is going to be, used
Determine how much load will be applied
[Diagram: input and disturbances act on the system under test, producing the resulting performance]

49 Workload Modelling: Input
Web applications: user actions over web pages
Web services: REST / SOAP requests
Relational database: SQL queries
Java component: method calls
SMTP server: requests sending e-mails

50 Workload Modelling: Input (2)
Which operations do we consider as input? Classify operations using the following criteria: frequency (popularity), business criticality, data amount, execution time.
The input can comprise a high number of various operations and we can't cover them all. What do we do? We choose the best representatives: group together the operations that have a similar performance impact.

51 Workload Modelling: Input (3)
Determine the distribution of operations: operations do not occur with the same frequency, and the distribution may significantly affect performance.
Determine the range of anticipated load; focus on peak load (usually calculated per hour).
Sources of usage information: discussion with stakeholders; analysis of access log files (e.g. via Google Analytics; see the sketch below); best guess based on similar software systems.
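As a sketch of the access-log route (the file name is a placeholder; any Apache/Nginx combined-format log would do), one can bucket requests by hour and take the maximum:

```python
import re
from collections import Counter

hourly = Counter()
timestamp = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2})")   # e.g. "[10/Oct/2000:13"
with open("access.log") as log:                          # hypothetical file name
    for line in log:
        match = timestamp.search(line)
        if match:
            hourly[match.group(1)] += 1                  # requests per hour bucket

peak_hour, peak_load = hourly.most_common(1)[0]
print(f"peak load: {peak_load} requests in the hour starting {peak_hour}:00")
```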

52 Workload Modelling: Disturbances
Scheduled jobs (e.g. database backups, search engine indexing) Multiple tenants on virtualised servers Distributed system architectures

53 Workload Modelling: Example
What is the input and how do we model it?
Identify who and what interacts with the web application
View user actions as user scenarios
Create a behavioural diagram
Analyse the load applied over time

54 Workload Modelling: Example (2)

55 Workload Modelling: Example (3)
The number of sessions started every hour (either login or registration actions) gives the user arrival rate.

56 Activity #3: Understanding Performance Requirements
Clients' expectations of how the system should perform; requirements reduce the subjectivity of performance and depend on the system under test.
Examples:
The e-shop must handle 1200 orders every hour, and 98% of all orders should be placed within 10 s.
The web service must handle the hourly volume of searches for available phone numbers in less than 5 s each.
The system must have enough capacity to accommodate the load created by 500 more users without any effect on performance.
How to deal with the subjectivity of performance? Quantify the client's expectations to some extent. Performance requirements strongly depend on the software system under test and should be defined by the people who define the quality of the system.
Examples of poorly defined performance requirements:
The e-shop must handle 500 concurrent users ("concurrent users" might be misleading; it needs to be clarified what it actually means).
The web service must complete SOAP requests in 5 s (the load isn't mentioned at all).

57 Performance Requirements: Example
The air ticket sales application must handle a peak load of 400 users arriving every hour with the following conditions met:
Login must complete in less than 3 s
In 98% of cases, available flights should be found in less than 5 s
Purchase of flight tickets should not take longer than 7 s
Notice the requirements are incomplete. What about the performance of the other operations not explicitly mentioned? What if the flight purchase takes 7.1 s? Is it still acceptable or not? Is it a problem or not? To answer these questions, human judgement is needed. Always clarify what the numbers mean by asking the person or people who defined them!

58 Activity #4: Performance Test Design
What tests do we carry out in order to fulfil our mission? We define the load intensity and profile, the test duration and the ramp-up period.
Performance test types: load test, scalability test, stability test, high availability test.
The load intensity and profile are provided by the workload model created for the system under performance scrutiny.
Ramp-up period: for load tests, the ramp-up period usually equals the duration of the longest user flow. For scalability tests, the ramp-up period takes longer (dozens of minutes, even hours) so that we can observe the effect on performance and spot the load intensity at which performance degrades.

59 Performance Test Design: Example
Load intensity: 400 sessions started every hour
Think times: 10 s between page transitions (+- 25%)
How many virtual users (threads) do we need? (See the sketch below.)
Ramp-up period: the longest session
Duration: 2 hours
In reality, think times vary across users, and even the same user interacts with the application differently every time. To reflect this behaviour in performance tests, think times should be randomised.
The workload of load, scalability and stability tests should be as realistic as possible. Stress tests (more general than scalability tests) do not have to apply realistic loads, because their primary purpose is to explore how sensitive the system is to exceptional situations (e.g. extreme loads, regular peaks in the load).
Most load test tools work with the term "virtual users". From the operating system's point of view, virtual users are implemented as threads or processes; threads are preferred, as they do not consume as many system resources as processes.
How many virtual users do we choose for the target load of 400 sessions started every hour? We have two options:
1. Choosing the same number of virtual users as sessions per hour. This is straightforward, but if the duration of a user session (user flow) is a fraction of an hour, the threads spend too much time waiting, which wastes system resources.
2. Estimating the number of virtual users from the average duration of each session. If we determine the average duration of each user session by experiment or from available usage statistics, we can work out the number of times the session occurs within an hour and, from that, calculate the number of virtual users. This approach might fail when one of the threads waits a long time for a response from the server, which can lead to not fulfilling the target load. We might increase the number of virtual users to deal with such a situation, but even that might not always help.
In version 1.4.0 of the JMeter Plugins, a new thread group element (Arrivals Thread Group) was added, allowing the number of virtual users to change over time based on the actual performance. For more information, see http://jmeter-plugins.org/.
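A sketch of option 2 using Little's law (concurrency N = arrival rate x average time in the system). The 400 sessions/hour and the 10 s +/- 25 % think time come from the slide; the 180 s average session duration is an assumed, illustrative figure.

```python
import random

arrival_rate = 400 / 3600.0       # target: 400 sessions started per hour
avg_session  = 180.0              # assumed average session duration in seconds

# Little's law: virtual users needed = arrival rate * average session duration
virtual_users = arrival_rate * avg_session
print(f"virtual users: {virtual_users:.0f}")   # -> 20

def think_time(base=10.0, jitter=0.25):
    """Randomised think time: 10 s between page transitions, +/- 25 %."""
    return random.uniform(base * (1 - jitter), base * (1 + jitter))
```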

60 Activity #5: Load Generation Tool Selection
In the majority of cases, the use of a software tool for load generation is indispensable
Commercial or free: depends on the budget
The software tool is only the tip of the iceberg, not the centre of performance testing
Choose whatever load generation tool does the job

61 Load Generation Tool: JMeter
Java-based, free and open-source load generation tool
Suitable for HTTP(S), FTP, SMTP, TCP, JDBC, MongoDB, shell scripts, …
Scripts stored as XML files
Currently at version 2.13 (version 3.0 coming soon)
The out-of-the-box bundle lacks reporting functions; JMeter Plugins enhance JMeter's functionality (@jmeter_plugins)
JMeter cloud solutions: Blazemeter, OctoPerf, Flood IO

62 Load Generation Tool: HP LoadRunner
Commercial; only the community edition (limited to 50 virtual users) is free
Great range of supported protocols
Scripts written in the C language
Advanced reporting of performance test results
Applications: VuGen, Controller, Analysis, Load Agent
Licences are determined by the number of virtual users and protocol bundles, and are relatively costly

63 Activity #6: Performance Test Implementation
Workload models are transformed into test scripts (for web applications, either a one-to-one relationship between scripts and paths, or one complex script covering all paths)
Performance tests are composed of test scripts
Supportive tools (e.g. HTTP proxies and recorders) are used to implement test scripts; examples: Chrome Developer Tools, Firebug, Fiddler

64 Performance Test Implementation: Example
Paths in the workload model:
Path #1: Login -> Flights -> Find a flight -> exit
Path #2: Login -> Flights -> Find a flight -> Choose a flight -> Make a payment -> exit
Path #3: Login -> View Itinerary -> exit
Path #4: Login -> View Itinerary -> Cancel Flight -> exit
Path #5: Signup -> Register -> exit
Probability of exercising path #1: 0.8 * 0.75 * 0.05 = 0.03 (3%)
Probability of exercising path #2: 0.8 * 0.75 * 0.95 = 0.57 (57%)
Probability of exercising path #3: 0.8 * 0.25 * 0.05 = 0.01 (1%)
Probability of exercising path #4: 0.8 * 0.25 * 0.95 = 0.19 (19%)
Probability of exercising path #5: 0.20 (20%)
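In a load script, each virtual-user session can draw its path according to these probabilities. A minimal sketch (the path labels are shorthand for the flows above):

```python
import random

# Weights taken from the path probabilities computed above
paths = {
    "login -> find flight -> exit":               0.03,
    "login -> find -> choose -> pay -> exit":     0.57,
    "login -> view itinerary -> exit":            0.01,
    "login -> itinerary -> cancel -> exit":       0.19,
    "signup -> register -> exit":                 0.20,
}

def pick_path():
    # Draw one path per session; over many sessions the observed
    # frequencies converge to the workload model
    return random.choices(list(paths), weights=list(paths.values()))[0]

print(pick_path())
```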

65 Activity #7: Test Data Creation
The amount, diversity and accuracy of test data may significantly affect performance test results
Recyclable and non-recyclable data; static and dynamic data
Using the same user name all the time may leave certain parts of the system uncovered due to caching
Examples of recyclable data: e-mail addresses, first names, last names (depends on the situation)
Example of non-recyclable data: a mobile phone number registered in a customer self-service application
Examples of static data: a list of user names and passwords
Examples of dynamic data: a randomly chosen birth date, a timestamp

66 Test Data: Example
User names and passwords: read from a CSV file
Available flights: chosen randomly on the fly
Seat preference and type: chosen randomly on the fly
Number of passengers: kept constant during the test

67 Activity #8: Monitoring Setup
Observing how the system performs under load
System-level monitoring: CPU, memory, disk, network, IOPS
Application-level monitoring: heap memory utilisation, web server active connections, most time-consuming SQL queries
Software tools: built-in OS tools, HP SiteScope, New Relic, Nagios
Example of a Windows performance tool: perfmon
Examples of Linux performance tools: vmstat, iostat, mpstat, pidstat, top, free, sar, dstat
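As a home-grown alternative to the listed tools, system-level metrics can be sampled with the third-party psutil package; a minimal sketch:

```python
import psutil   # third-party package: pip install psutil

interval = 5.0
prev = psutil.disk_io_counters()
for _ in range(3):                                  # three samples for the demo
    cpu = psutil.cpu_percent(interval=interval)     # blocks for the interval
    mem = psutil.virtual_memory().percent
    cur = psutil.disk_io_counters()
    iops = (cur.read_count + cur.write_count
            - prev.read_count - prev.write_count) / interval
    prev = cur
    print(f"cpu={cpu:5.1f}%  mem={mem:5.1f}%  disk IOPS={iops:8.1f}")
```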

68 Activity #9: Performance Test Execution
Ensure that a sufficient amount of test data has been created
Activate the monitoring of the software system
Launch the performance test manually or schedule its execution
If possible, actively monitor the software system under load (e.g. log files, additional monitoring tools, profiling)
The test may have to be stopped before its planned completion as a result of a high number of errors

69 Activity #10: Performance Test Result Analysis
The analysis depends on the purpose of the performance test: was the mission of the test met? If not, why?
Load tests: was the course of transaction response times stable during the test? If peaks occurred, what was their cause?
In complex software systems, interpreting the test results and any suspicious behaviour is team work.
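A sketch of one such analysis step: computing the 98th-percentile response time per transaction from JMeter's CSV results (a .jtl file with the default label and elapsed columns; the file name is assumed). This matches requirements phrased as "98% of operations within N seconds".

```python
import csv
from collections import defaultdict

elapsed = defaultdict(list)
with open("results.jtl", newline="") as f:          # assumed file name
    for row in csv.DictReader(f):
        elapsed[row["label"]].append(int(row["elapsed"]))   # elapsed is in ms

for label, times in sorted(elapsed.items()):
    times.sort()
    p98 = times[min(len(times) - 1, int(0.98 * len(times)))]  # 98th percentile
    print(f"{label:35s} samples={len(times):6d}  98th pct={p98} ms")
```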

70 Activity #11: Reporting to Stakeholders
Summarising the observations made from the performance test results in a way that is understandable and helpful for the stakeholders we report to
Stakeholders: programmers, product owner, project manager, users
Form: verbal or written (agreed beforehand with the stakeholders)
Examples of written forms: e-mail, PPT presentation, Word document (more or less formal)

71 Useful Links
Apache JMeter plugins: http://jmeter-plugins.org/
Scott Barber's "User Experience, not Metrics" series
Blazemeter JMeter cloud
James Bach's blog (Context-Driven School of Testing)

72 Tomas Veprek Software Performance Tester Tieto, Testing Services

