Experts Helping Experts: Rapid Bottleneck Identification (R.B.I.) Methodology as used in Load Testing Managed Services Dan Koloski CTO and Director of.

Presentation transcript:

Experts Helping Experts: Rapid Bottleneck Identification (R.B.I.) Methodology as used in Load Testing Managed Services Dan Koloski CTO and Director of Strategy Web Business Unit, Empirix

Agenda About Empirix and e-LoadExpert Understanding the value proposition of Managed Services The Theory - Methodology Overview Importance of Throughput testing The Experience – Case Study – A New Online Ticketing System Summary Questions

Empirix – At a Glance Testing & Monitoring Solutions for: Privately held, founded employees Worldwide presence; HQ in Boston & offices in Japan, UK, Germany and South Korea Patents in test, monitoring and VoIP diagnostics methodology Web Applications & SOA Contact Center & CRM Carrier-Class VoIP & Telco

Quality Solutions to Fit Your Needs Empirix Offerings Delivered Outcome PRODUCTS MANAGED SERVICES EXPERT SERVICES Onsite Consulting, Training, Mentoring e-TEST suite™ OneSight™ e-LoadExpert™ Do-it-yourself Remote Validation Performance Tuning Supplement Your Team We Do it for You

“Communication” NEQ “Reports” in Load Testing Collaboration during testing is essential for efficiency, so your load tools should allow for collaboration. –Access and manipulate load test data live over the web Individual component owners (DBAs, Middleware, Apps, Network, ISP, Hoster, LOB, etc.) need to analyze load test data during the test, but the data each one needs is different! –Multiple users logged into the same load session separately –No desktop sharing!

Managed Services: An often-forgotten approach Load testing is… –high-intensity –short-duration –low-frequency Challenge to maintain… –Tool skills –Environment availability –Proper methodology –Consistent team Sometimes build/buy/lease and execute yourself isn’t the best approach

Managed Services: A different approach Complement to in-house testing –or a complete alternative in some cases Our testers (tool and methodology experts) Our load infrastructure (high-availability, short notice) Our tools (no need to buy) Your production or staging environment Your application and infrastructure experts WHOLE IS GREATER THAN THE SUM OF THE PARTS

In Load Testing, Communication is Real-Time e-Load 8 is Web-enabled to allow collaboration Massive tuning efficiency gains when used properly Invite the business to attend! Each user can view results from the same load test as it runs

e-LoadExpert: Managed Load Test Services Fully managed, outsourced performance & tuning service Award-winning high-capacity infrastructure built on Empirix’s e-Load Superb methodology delivered by expert consultants with thousands of hours of experience Ongoing reporting throughout the test and a comprehensive summary report upon completion A hosted performance tuning service managed and executed by Empirix experts using Empirix infrastructure.

The Art of Load Testing Aristotle said, ‘Theory plus Experience equals Art’. The Theory is Empirix’ RBI Load Testing. The Experience is the work of the eLoadExpert team presented here. The Art of Load Testing, then, is the skill you will begin to develop by implementing RBI load testing with the added benefit of our experience.

RBI Load Testing Fast – Focusing on throughput testing takes less time (minutes vs hours) Simple – Much testing can be done without even looking at the application Modular & Iterative – Focus on one piece of the puzzle until it works, then proceed The ‘Knowledge’ is built in – when problems occur, each previous step has already been ruled out Easy to Plan

How the Methodology Can Help You This methodology is based on years of performance testing experience. Something will be learned from every test run. By uncovering bottlenecks in this stepped approach you can isolate issues in components of which you have limited knowledge. Very easy to replicate for future testing. Easy to re-test. By focusing on throughput testing, you save time.

Our Theory: RBI Load Testing What is Load testing? “Testing conducted to isolate and identify the system and application issues (bottlenecks) that will keep the application from scaling to meet its performance requirements”

What is a Bottleneck? Any resource (hardware, software or bandwidth) that places defining limits on data flow or processing speed: your application is only as efficient as its least efficient element On the Web, bottlenecks directly affect performance and scalability Most untested systems have more than one bottleneck, but they can only be identified and resolved one at a time.

Where is the Bottleneck in this example? The more variables in a test, the harder it is to determine what is causing the bottleneck. 3 Scripts 10+ pages each Browsing, Searching, Adding to Shopping Cart and Purchasing

Our Theory: RBI Load Testing Rapid Bottleneck Identification (RBI) Load testing starts with some basic assumptions: 1. All web applications have bottlenecks. 2. These bottlenecks can only be uncovered one at a time. 3. Focus on where the bottlenecks are most likely to be found. At a high level: –RBI isolates bottlenecks quickly by starting with the simplest possible tests and working to build in complexity, so that when an issue is uncovered all of the simpler causes have been ruled out. –It’s fast because we focus on the bottlenecks first, and on where they are most likely to be found: in the throughput.

Testing Web Applications The key to successful performance testing of any application (not just for the web) is to understand how real users use the application. For web applications, this means knowing: 1. What pages of the application users hit, and in what percentage 2. How many users access the application, and how quickly they navigate the application. This gives us two key sets of data: –What to test –How to test it

What to test: For an application that is currently deployed, this can be gathered from web tracking tools like WebTrends. For example, the daily averages for our sample site tell us that the most commonly hit pages are the homepage and the search page, and that currently 6200 orders are placed per day. Two more pieces of data we’ll use later: users, on average, spend 9 minutes on the site and look at 17 pages.

How to test it: Empirix recommends testing key functionalities independently at first, and then together in real world scenarios. Using the previous example: –Test the homepage –Test the search page –Test the business critical Checkout pages –Put them all together in a scenario

Where are the Bottlenecks? In our experience, bottlenecks can be found anywhere in the system infrastructure / network, application code, or the database.

Where are the Bottlenecks? But the vast majority of these issues are caused by throughput limitations – not concurrency.

Throughput Vs Concurrency Throughput: The measurement of the flow of data, e.g. hits/second, pages/second, Mbps (megabits/second). Throughput tests are run with 1 second page delays between requests, ramping until the throughput rate fails to keep up with the increase in load. It’s not JUST bandwidth. Concurrency: The measurement of independent users that a system can support. On the system level, concurrency is limited by sessions and socket connections. On the application level, flaws in the code can limit concurrency, as can incorrect settings in server configuration. Concurrency tests are run by ramping a number of users running with realistic page delay times, ensuring that the ramp up is slow enough to gather data at meaningful points.
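
One way to make "fails to keep up with the increase in load" concrete is sketched below. This is only an illustration, not part of the Empirix tooling: the function name, the 0.5 ratio threshold, and the sample measurements are all assumptions chosen for the example.

```python
def find_throughput_knee(samples, min_ratio=0.5):
    """Return the last sample before throughput stops keeping up with load.

    samples: list of (virtual_users, pages_per_second) tuples, ordered by
    increasing virtual_users, with all values greater than zero.
    min_ratio: assumed threshold; throughput must grow at least this
    fraction as fast as the load, otherwise a bottleneck is flagged.
    """
    for (users0, rate0), (users1, rate1) in zip(samples, samples[1:]):
        load_growth = (users1 - users0) / users0
        throughput_growth = (rate1 - rate0) / rate0
        if throughput_growth < min_ratio * load_growth:
            return (users0, rate0)
    return None  # throughput kept scaling for the whole ramp


# Hypothetical measurements from a ramp with 1 second page delays.
SAMPLE_RAMP = [(10, 9.8), (20, 19.5), (40, 38.0), (80, 41.0), (160, 40.5)]

if __name__ == "__main__":
    print(find_throughput_knee(SAMPLE_RAMP))
```

With the hypothetical ramp above, doubling the load from 40 to 80 virtual users raises throughput by only about 8%, so the sketch reports the 40-user sample as the point where the bottleneck appears.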

Throughput Vs Concurrency [Figure: Pages/Second, Throughput vs. Concurrency. 100 VUs with 1 sec delay (100 sessions); 1000 VUs with 10 sec delay (1000 sessions).] Is the load generated from a 100 Virtual User load test, with 1 second think times, equivalent to a 1000 Virtual User load test with 10 second think times? So when we say a bottleneck is related to throughput, not concurrency, we mean it is a limitation on the amount of data the system can supply at any given time, regardless of the number of users requesting the data.
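
The question on this slide can be checked with simple arithmetic. The sketch below assumes a closed-loop test in which each virtual user requests a page, waits for the response, then pauses for the think time; the response-time parameter is an assumption added for the example and is set to zero to keep the comparison simple.

```python
def approx_pages_per_second(virtual_users, think_time_s, response_time_s=0.0):
    """Approximate steady-state page rate for a closed-loop load test.

    Each virtual user completes one page every
    (think_time_s + response_time_s) seconds.
    """
    return virtual_users / (think_time_s + response_time_s)


if __name__ == "__main__":
    print(approx_pages_per_second(100, 1))    # throughput-style test
    print(approx_pages_per_second(1000, 10))  # concurrency-style test
```

Under these assumptions both tests request roughly 100 pages per second, so they stress throughput equally; what differs is the number of open sessions (100 versus 1000).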

Using RBI Load Testing Going back to our example application: –Start with throughput testing of the most commonly hit pages to ensure they meet the application goals. –Start simply, running tests to hit each page or functionality (homepage, search, checkout) on its own before putting them together. –Then run a real world scenario concurrency test based on the percentages seen in the logging tool: –50% of users hitting the homepage and browsing –25% of users searching and browsing –25% adding to cart and checking out* * This is a higher percentage than seen in the statistics, but takes into account higher sales conversions during the peak Christmas season
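
As a rough sketch of how the scenario percentages translate into a test setup, the snippet below splits a pool of virtual users across the three scripts. The script names and the total of 1000 virtual users are hypothetical; only the 50/25/25 split comes from the slide.

```python
# Scenario mix from the slide; script names are illustrative labels only.
SCENARIO_MIX = {
    "homepage_and_browse": 0.50,
    "search_and_browse": 0.25,
    "add_to_cart_and_checkout": 0.25,
}


def allocate_virtual_users(total_users, mix):
    """Split total_users across scripts according to each script's share."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("scenario percentages must add up to 100%")
    # Rounding can leave the total off by a user or two for awkward splits.
    return {script: round(total_users * share) for script, share in mix.items()}


if __name__ == "__main__":
    # 1000 is an assumed peak user count, not a figure from the deck.
    print(allocate_virtual_users(1000, SCENARIO_MIX))
```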

Using RBI Load Testing (cont’d) Concurrency tests must accurately reflect what users are doing and how quickly they are doing it. The last piece, then, is how quickly users should run the scripts. Using the data presented earlier we can give an example: –If users spend 9 minutes in a typical session –And look at 17 pages –Then the user think time should be roughly 32 seconds: (9*60)/17 ≈ 31.8 Run the scenario, ramping slowly (1 user every few seconds), and watch response times as users are added. When response times increase, a bottleneck has been reached, and customers may look for other options…
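
The think-time arithmetic on this slide can be written as a small helper. The function name is an assumption for illustration; the 9 minute session and 17 pages are the figures quoted from the web tracking data.

```python
def think_time_seconds(session_minutes, pages_per_session):
    """Average delay between page requests for a typical user session."""
    return (session_minutes * 60) / pages_per_session


if __name__ == "__main__":
    # 9 minutes and 17 pages come from the example statistics.
    print(round(think_time_seconds(9, 17), 1))
```

Note that this treats the whole session as think time; any time the user spends waiting for pages to load is folded into the same figure.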

So How Would We Test an Application? Step into the system. Start simple. If a test passes its requirements, move on. If not, fix the problem and retest. If a step fails a requirement, there is no sense in proceeding, because adding the next layer of complexity can only make performance worse.

Basic Network Tests What are common network issues? 1. Bandwidth 2. Hit Rate 3. Connections How to test for network issues: 1. Throughput test of very large image (small number of hits = lots of bandwidth) 2. Throughput test of very small file (large number of hits but no danger of saturating bandwidth) 3. Concurrency test of a test file with long page view delay (open lots of connections that do very little work)
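
The reason a large file stresses bandwidth while a small file stresses hit rate follows from a one-line calculation, sketched below. The file sizes and hit rates are hypothetical values picked for the example, not measurements from the deck.

```python
def bandwidth_mbps(file_size_bytes, hits_per_second):
    """Megabits per second consumed by serving one file at a given hit rate."""
    return file_size_bytes * 8 * hits_per_second / 1_000_000


if __name__ == "__main__":
    # Assumed sizes: a 2 MB image and a 1 KB text file.
    print(bandwidth_mbps(2_000_000, 10))   # few hits, lots of bandwidth
    print(bandwidth_mbps(1024, 1000))      # many hits, little bandwidth
```

With these assumed numbers, 10 hits per second on the large image consumes 160 Mbps, while 1000 hits per second on the small file consumes only about 8 Mbps, which is why the two tests isolate different limits.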

Network Sample Issue

Web Servers Mostly covered as part of the network tests, but can have issues of their own. Typically not a problem unless using SSL, in which case problems will manifest as either high CPU utilization or connection errors (12000 level).

Application Servers Where the code is processed to deliver dynamic pages. Common Application Server Issues: –High CPU utilization –Memory consumption, memory leaks A thought to keep in mind… the difference in pages-per-second rate between a blank application page (ASP, JSP, etc.) and your actual application pages represents how much code tuning can increase your throughput.
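
The "thought to keep in mind" above can be expressed as a quick calculation. This is a sketch with made-up rates: the 400 and 100 pages per second figures are assumptions, and the function name is hypothetical.

```python
def tuning_headroom(blank_page_rate, actual_page_rate):
    """Return (absolute gap, ratio) between blank-page and real-page throughput.

    The gap is an upper bound on what code tuning alone could recover,
    since the blank page still pays the application server's fixed cost.
    """
    gap = blank_page_rate - actual_page_rate
    ratio = blank_page_rate / actual_page_rate
    return gap, ratio


if __name__ == "__main__":
    # Hypothetical measurements in pages per second.
    print(tuning_headroom(400.0, 100.0))
```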

Application Servers How to test for Application Server issues: 1. Throughput test of a test application page. 2. Throughput test of the homepage (if an application page) or representative application pages. Just changing the extension on an HTML page (to .ASP or .JSP, for example) will be sufficient to cause a decrease in throughput.

Application Server Sample Issue

Database Servers How do database issues manifest? 1. High CPU utilization on DB Server 2. Queued Requests on Application Servers Usually caused by inefficient SQL queries or poor DB optimization. To test the database separately from the application, create a simple query to pull a single piece of data from the database.

Now We Get to the Application… If we have come this far without encountering issues, any problems are caused by the application itself. Poorly performing pages (errors or high response times) need to be investigated for code optimization. –Response times will be the same at 1000 users as they were at 1 user if no bottlenecks are encountered. –Therefore, response times at this point in the testing are only useful as indicators of issues. It really is that simple…

Testing the Application - Throughput Again, test iteratively and step into the application. For example, if users log in to your site, search, then add items to the cart and make purchases, use the following series of tests: 1. Homepage 2. Homepage + login 3. Homepage + login + search 4. Homepage + login + search + add to cart 5. Homepage + login + search + add to cart + check out As steps are added, degradation in response times or page throughput will be caused by the newly added step, making it easier to isolate what code needs to be looked at.
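
The stepped series above lends itself to a small sketch: build each test by adding one step to the previous one, then attribute the first failure to the step that was just added. The helper names, the measured rates, and the 100 pages per second requirement are all assumptions for illustration.

```python
STEPS = ["Homepage", "login", "search", "add to cart", "check out"]


def incremental_tests(steps):
    """Build the cumulative test series: each test adds one more step."""
    return [" + ".join(steps[: i + 1]) for i in range(len(steps))]


def first_failing_step(steps, pages_per_second, required_rate):
    """Return the newly added step of the first test that misses the requirement.

    pages_per_second[i] is the measured rate for the test ending at steps[i].
    """
    for step, rate in zip(steps, pages_per_second):
        if rate < required_rate:
            return step
    return None


if __name__ == "__main__":
    for test in incremental_tests(STEPS):
        print(test)
    # Hypothetical measured rates for the five tests above.
    print(first_failing_step(STEPS, [120, 115, 40, 38, 35], required_rate=100))
```

In the hypothetical run, throughput drops below the requirement as soon as search is added, so search is the code to investigate before moving on to the cart and checkout steps.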

Putting it All Together – A Sample Test Plan: Basic System Level Testing
Throughput Tests
Bandwidth: Ramp up users, 1 VU every 1 second, requesting a large file (loadtest.bmp) with a 1 second page view delay; run until bandwidth bottleneck is reached.
Hit Rate: Ramp up users, 1 VU every 1 second, requesting a small file (1kb.txt) with a 1 second page view delay; run until page/sec bottleneck is reached.
Test Application Page (with & without DB access): Ramp up users, 1 VU every 1 second, requesting a test application page (loadtest.jsp) with a 1 second page view delay; run until page/sec bottleneck is reached.
Homepage: Ramp up users, 1 VU every 1 second, requesting the application homepage with a 1 second page view delay; run until page/sec bottleneck is reached.
Concurrency Tests
Homepage: Ramp up users, 1 VU every 1 second, requesting the homepage with a realistic think time delay script, up to maximum users.

Putting it All Together – A Sample Test Plan, cont’d: Application Level Testing
Throughput Tests
Browse: Ramp up users, 1 VU every 1 second, running browse script with a 1 second page view delay; run until bandwidth bottleneck is reached.
Search: Ramp up users, 1 VU every 1 second, running search script with a 1 second page view delay; run until page/sec bottleneck is reached.
Add to Cart: Ramp up users, 1 VU every 1 second, running add to cart script with a 1 second page view delay; run until page/sec bottleneck is reached.
All 3 scripts: Ramp up users, 1 VU every 1 second, running all 3 scripts with a 1 second page view delay; run until page/sec bottleneck is reached.
Concurrency Tests
All 3 scripts: Ramp up users, 1 VU every 1 second, running all 3 scripts with a realistic think time delay script, up to maximum users.

Summary The RBI methodology can be utilized for any and all web applications. The methodology can be summed up as quickly isolating bottlenecks by isolating components, starting with the simplest and building in complexity. By breaking a system into its component parts, and starting with the simplest components all of the bottlenecks can be uncovered the only way they can be uncovered – One at a time.

Where to find answers: QAZone Free resource, open to all (Empirix & non-Empirix users) –Forums (both QA-centric and Product related discussions) –Knowledge Base –Resource Center –Events, Announcements, Community & Industry News Allows members to interact with peers, industry experts and a larger Empirix audience

Q&A For more information: Dan Koloski Web BU CTO and Director of Strategy Empirix

Appendix – R.B.I in action

Our Methodology In Action Online Ticketing System for a Large Stadium The expected volume of traffic is 30,000 tickets per hour with the average user buying 3 tickets. The ticket purchasing site is really made of two sites: one for ticket purchasing and another to queue the users when the purchasing system has reached capacity. Tickets can be purchased by users that have pre-registered and also by people who will register at the time of purchase.

Creating the Test Plan & Goals Components –Queue – 15,000 in Queue –Pipeline must support 10,000 orders per hour –Login –Register – Withdrawn from this set of tests –Search –Add to Cart and Purchase A workaround was developed to bypass the queue and go directly into the purchase pipeline.
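
The pipeline goal of 10,000 orders per hour follows from the traffic figures quoted for the stadium: 30,000 tickets per hour at an average of 3 tickets per user. A minimal sketch of that derivation, with a hypothetical function name:

```python
def required_orders_per_hour(tickets_per_hour, tickets_per_order):
    """Orders the purchase pipeline must sustain to meet the ticket volume."""
    return tickets_per_hour / tickets_per_order


if __name__ == "__main__":
    # 30,000 tickets/hour and 3 tickets per user are the stated expectations.
    print(required_orders_per_hour(30_000, 3))
```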

Some Results… The Queue - Concurrency Test The queue could only support 5000 sessions before the SQL server maintaining the sessions would peg on CPU and start dropping the sessions. With 6500 users on the system the error rate was almost 10%.

Some Results… Logins – Throughput Tests – The login function, the only ‘home grown’ piece, scaled to logins per hour. There were some errors, but these were at a throughput level much in excess of what is needed.

Some Results… The Search Engine – Throughput Tests The Results weren’t encouraging. When searching for 3 tickets at a time the system could support 1000 searches per hour. When looking for 1 ticket, the system could handle 3000 searches per hour. Either way, the result is only 3000 tickets per hour…

Some Results… Orders – Throughput Tests Running a combination of 1 and 3 ticket searches, the system was able to support roughly 1100 orders per hour, for a total sale of only 1700 tickets.

Some Results… Queue Re-Test After overhauling the SQL queries and the application page that handles the user queuing, the test was re-run. 10,000 users were added to the system and no further errors were encountered.

Some Results… Summary –By the end of the first round of tests, only the Queuing system and login had achieved their goals. –The search engine had only achieved 10% of its goal. –The purchase process was closer to 5% of its goal. The application needed more work, and unfortunately a major event was looming… The client made the hard decision to delay the release, find a new vendor for the ticket search engine, and force the purchase process vendor to fix their code.
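
The percentages in this summary can be reproduced from the figures reported on the earlier result slides, measured against the 30,000 tickets per hour goal. The helper below is only an illustrative sketch; its name is an assumption.

```python
TICKET_GOAL_PER_HOUR = 30_000  # expected traffic stated for the stadium


def percent_of_goal(achieved_per_hour, goal_per_hour=TICKET_GOAL_PER_HOUR):
    """Share of the hourly goal that a component actually achieved."""
    return 100.0 * achieved_per_hour / goal_per_hour


if __name__ == "__main__":
    # 3000 tickets/hour from the search tests, 1700 from the order tests.
    print(round(percent_of_goal(3000), 1))
    print(round(percent_of_goal(1700), 1))
```

The search engine comes out at 10% of the goal and the purchase process at just under 6%, in line with the "10%" and "closer to 5%" figures quoted above.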