Experts Helping Experts: Rapid Bottleneck Identification (R.B.I.) Methodology as used in Load Testing Managed Services Dan Koloski CTO and Director of.


1 Experts Helping Experts: Rapid Bottleneck Identification (R.B.I.) Methodology as used in Load Testing Managed Services Dan Koloski CTO and Director of Strategy Web Business Unit, Empirix

2 Agenda About Empirix and e-LoadExpert Understanding the value proposition of Managed Services The Theory - Methodology Overview Importance of Throughput testing The Experience – Case Study – A New Online Ticketing System Summary Questions

3 Empirix – At a Glance Privately held, founded 2000. 380+ employees. Worldwide presence; HQ in Boston & offices in Japan, UK, Germany and South Korea. Patents in test, monitoring and VoIP diagnostics methodology. Testing & Monitoring Solutions for: Web Applications & SOA; Contact Center & CRM; Carrier-Class VoIP & Telco.

4 Quality Solutions to Fit Your Needs Empirix Offerings Delivered Outcome PRODUCTS MANAGED SERVICES EXPERT SERVICES Onsite Consulting, Training, Mentoring e-TEST suite™ OneSight™ e-LoadExpert™ Do-it-yourself Remote Validation Performance Tuning Supplement Your Team We Do it for You

5 “Communication” NEQ “Reports” in Load Testing Collaboration during testing is essential for efficiency, so your load tools should allow for collaboration –Access and manipulate load test data live over the web Individual component owners (DBAs, Middleware, Apps, Network, ISP, Hoster, LOB, etc.) need to analyze load test data during the test, but the data each one needs is different! –Multiple users logged into the same load session separately –No desktop sharing!

6 Managed Services: An often-forgotten approach Load testing is… –high-intensity –short-duration –low-frequency Challenge to maintain… –Tool skills –Environment availability –Proper methodology –Consistent team Sometimes build/buy/lease and execute yourself isn’t the best approach

7 Managed Services: A different approach Complement to in-house testing –or a complete alternative in some cases Our testers (tool and methodology experts) Our load infrastructure (high-availability, short notice) Our tools (no need to buy) Your production or staging environment Your application and infrastructure experts WHOLE IS GREATER THAN THE SUM OF THE PARTS

8 In Load Testing, Communication is Real-Time e-Load 8 is Web-enabled to allow collaboration Massive tuning efficiency gains when used properly Invite the business to attend! Each user can view results from the same load test as it runs

9 e-LoadExpert: Managed Load Test Services Fully managed, outsourced performance & tuning service Award-winning high-capacity infrastructure built on Empirix’s e-Load Superb methodology delivered by expert consultants with thousands of hours of experience Ongoing reporting throughout the test and a comprehensive summary report upon completion A hosted performance tuning service managed and executed by Empirix experts using Empirix infrastructure.

10 The Art of Load Testing Aristotle said, ‘Theory plus Experience equals Art’ The Theory is Empirix’s RBI Load Testing. The Experience is the work of the e-LoadExpert team presented here. The Art of Load Testing, then, is the skill you will begin to develop through implementing RBI load testing with the added benefit of our experience.

11 RBI Load Testing Fast – Focusing on throughput testing takes less time (minutes vs hours) Simple – Much testing can be done without even looking at the application Modular & Iterative – Focus on one piece of the puzzle until it works, then proceed The ‘Knowledge’ is built in – when problems arise, each previous step has already been ruled out Easy to Plan

12 How the Methodology Can Help You This methodology is based on years of performance testing experience. Something will be learned from every test run. By uncovering bottlenecks in this stepped approach you can isolate issues in components of which you have limited knowledge. Very easy to replicate for future testing. Easy to re-test. By focusing on throughput testing, you save time.

13 Our Theory: RBI Load Testing What is Load testing? “Testing conducted to isolate and identify the system and application issues (bottlenecks) that will keep the application from scaling to meet its performance requirements”

14 What is a Bottleneck? Any resource (hardware, software or bandwidth) that places defining limits on data flow or processing speed: your application is only as efficient as its least efficient element On the Web, bottlenecks directly affect performance and scalability Most untested systems have more than one bottleneck, but they can only be identified and resolved one at a time.

15 Where is the Bottleneck in this example? The more variables in a test, the harder it is to determine what is causing the bottleneck. 3 Scripts 10+ pages each Browsing, Searching, Adding to Shopping Cart and Purchasing

16 Our Theory: RBI Load Testing Rapid Bottleneck Identification (RBI) Load testing starts with some basic assumptions: 1. All web applications have bottlenecks. 2. These bottlenecks can only be uncovered one at a time. 3. Focus on where the bottlenecks are most likely to be found. At a high level: –It isolates bottlenecks quickly by starting with the simplest possible tests and building in complexity, so that when an issue is uncovered all of the simpler causes have already been ruled out. –It’s fast because we focus on the bottlenecks first, and on where they are most likely to be found: in the throughput.

17 Testing Web Applications The key to successful performance testing of any application (not just for the web) is to understand how real users use the application. For web applications, this means knowing: 1. What pages of the application users hit, and in what percentage 2. How many users access the application, and how quickly they navigate the application. This gives us two key sets of data: –What to test –How to test it

18 What to test: For an application that is currently deployed, this can be gathered from web tracking tools like WebTrends. For example: These daily averages tell us that the most commonly hit pages are the homepage and the search page, and that 6200 orders are currently placed per day. Two more pieces of data we’ll use later: users, on average, spend 9 minutes on the site and look at 17 pages.

19 How to test it: Empirix recommends testing key functionalities independently at first, and then together in real world scenarios. Using the previous example: –Test the homepage –Test the search page –Test the business critical Checkout pages –Put them all together in a scenario

20 Where are the Bottlenecks? In our experience, bottlenecks can be found anywhere in the system infrastructure / network, application code, or the database.

21 Where are the Bottlenecks? But the vast majority of these issues are caused by throughput limitations – not concurrency.

22 Throughput Vs Concurrency Throughput: The measurement of the flow of data, e.g. hits/second, pages/second, Mbps (megabits/second). Throughput tests are run with 1 second page delays between requests, ramping until the throughput rate fails to keep up with the increase in load. It’s not JUST bandwidth. Concurrency: The measurement of independent users that a system can support. On the system level, concurrency is limited by sessions and socket connections. On the application level, flaws in the code can limit concurrency, as can incorrect settings in server configuration. Concurrency tests are run by ramping a number of users running with realistic page delay times, ensuring that the ramp up is slow enough to gather data at meaningful points.
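The stopping rule for a throughput test ("ramp until the throughput rate fails to keep up with the increase in load") can be sketched in a few lines. This is a minimal illustration, not part of e-Load: the function name, the sample format and the `min_gain_ratio` threshold are all assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple


def find_throughput_knee(
    samples: Sequence[Tuple[int, float]],
    min_gain_ratio: float = 0.5,
) -> Optional[int]:
    """Return the user count after which throughput stops keeping up with load.

    samples: (virtual_users, pages_per_second) pairs in ramp order.
    min_gain_ratio: hypothetical threshold. A ramp step "keeps up" when the
    relative throughput gain is at least this fraction of the relative
    increase in users.
    """
    for (prev_users, prev_rate), (users, rate) in zip(samples, samples[1:]):
        if prev_users <= 0 or prev_rate <= 0:
            continue  # cannot compute a relative change from zero
        load_growth = (users - prev_users) / prev_users
        rate_growth = (rate - prev_rate) / prev_rate
        if load_growth > 0 and rate_growth < min_gain_ratio * load_growth:
            return prev_users  # last load level where throughput still scaled
    return None  # no bottleneck reached within the measured range


if __name__ == "__main__":
    # Made-up ramp data: throughput flattens out at about 40 pages/second.
    ramp = [(10, 10.0), (20, 20.0), (40, 39.0), (80, 42.0), (160, 41.0)]
    print(find_throughput_knee(ramp))
```

With the made-up ramp data, doubling from 40 to 80 users adds less than 8% throughput, so the knee is reported at 40 users.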

23 Throughput Vs Concurrency [Figure: pages/second (throughput) versus sessions (concurrency). 100 VUs with a 1 sec delay hold 100 sessions; 1000 VUs with a 10 sec delay hold 1000 sessions.] Is the load generated from a 100 Virtual User load test, with 1 second think times, equivalent to a 1000 Virtual User load test with 10 second think times? So when we say a bottleneck is related to throughput, not concurrency, we mean it is a limitation on the amount of data the system can supply at any given time, regardless of the number of users requesting the data.
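The question on this slide can be checked with simple arithmetic. The sketch below assumes an idealized steady state in which each virtual user requests one page per think-time interval; the function name and the optional `response_time_s` parameter are illustrative assumptions.

```python
def steady_state_pages_per_second(
    virtual_users: int,
    think_time_s: float,
    response_time_s: float = 0.0,
) -> float:
    """Approximate page rate when every user loops: request, wait, repeat."""
    return virtual_users / (think_time_s + response_time_s)


if __name__ == "__main__":
    # Same throughput, ten times the concurrency.
    print(steady_state_pages_per_second(100, 1.0))    # 100 VUs, 1 s delay
    print(steady_state_pages_per_second(1000, 10.0))  # 1000 VUs, 10 s delay
```

Both configurations request about 100 pages per second, but the second one holds 1000 sessions open instead of 100. The throughput load is the same; only the concurrency differs.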

24 Using RBI Load Testing Going back to our example application: –Start with throughput testing of the most commonly hit pages to ensure they meet the application goals. –Start simply, running tests to hit each page or functionality (homepage, search, checkout) on its own before putting them together. –Then run a real world scenario concurrency test based on the percentages seen in the logging tool: –50% of users hitting the homepage and browsing –25% of users searching and browsing –25% adding to cart and checking out* * This is a higher percentage than seen in the statistics, but takes into account higher sales conversions during the peak Christmas season
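A scenario mix like the 50/25/25 split above has to be turned into whole numbers of virtual users per script. The following is one possible sketch; the script names and the largest-remainder rounding are assumptions for illustration, not an e-Load feature.

```python
from typing import Dict


def allocate_users(total_users: int, mix: Dict[str, float]) -> Dict[str, int]:
    """Split total_users across scripts so that the counts sum to total_users."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("scenario percentages must sum to 100%")
    exact = {name: total_users * share for name, share in mix.items()}
    counts = {name: int(value) for name, value in exact.items()}
    leftover = total_users - sum(counts.values())
    # Hand out the users lost to rounding, largest fractional part first.
    by_remainder = sorted(mix, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical script names for the three scenarios on the slide.
    scenario_mix = {"homepage_browse": 0.50, "search_browse": 0.25, "cart_checkout": 0.25}
    print(allocate_users(1000, scenario_mix))
```

For 1000 virtual users this gives 500, 250 and 250 users for the three scripts.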

25 Using RBI Load Testing (cont’d) Concurrency tests must accurately reflect what users are doing and how quickly they are doing it. The last piece, then, is how quickly users should run the scripts. Using the data presented earlier we can give an example: –If users spend 9 minutes in a typical session –And look at 17 pages –Then the user think time should be about 32 seconds: (9*60)/17 ≈ 31.8 Run the scenario, ramping slowly (1 user every few seconds), and watch response times as users are added. When response times increase, a bottleneck has been reached, and customers may look for other options…
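The think-time arithmetic is simply session length divided by pages viewed. A minimal sketch, with a function name chosen for this example:

```python
def think_time_seconds(session_minutes: float, pages_per_session: int) -> float:
    """Average delay between page requests for a realistic concurrency script."""
    if pages_per_session <= 0:
        raise ValueError("pages_per_session must be positive")
    return session_minutes * 60 / pages_per_session


if __name__ == "__main__":
    # 9 minutes on the site, 17 pages viewed (figures from the example above).
    print(round(think_time_seconds(9, 17), 1))
```

For the example data this is 540 / 17, or roughly 31.8 seconds between pages. The calculation ignores page response time, which is small compared with the think time as long as no bottleneck has been reached.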

26 So How Would We Test an Application? Step into the system. Start simple. If a test passes its requirements, move on. If not, fix the problem and retest. If a step fails a requirement, there is no sense in proceeding, because adding the next layer of complexity can only make performance worse.

27 Basic Network Tests What are common network issues? 1. Bandwidth 2. Hit Rate 3. Connections How to test for network issues: 1. Throughput test of very large image (small number of hits = lots of bandwidth) 2. Throughput test of very small file (large number of hits but no danger of saturating bandwidth) 3. Concurrency test of a test file with long page view delay (open lots of connections that do very little work)
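The reason for pairing a very large file with a very small one becomes clear from the bandwidth arithmetic. The sketch below uses assumed file sizes (1 MB for the large image, 1 KB for the small file) and an assumed hit rate; none of these numbers come from the deck's test results.

```python
def bandwidth_mbps(hits_per_second: float, file_size_bytes: int) -> float:
    """Payload bandwidth in megabits/second, ignoring protocol overhead."""
    return hits_per_second * file_size_bytes * 8 / 1_000_000


if __name__ == "__main__":
    LARGE_IMAGE_BYTES = 1_000_000  # assumed size of the large test image
    SMALL_FILE_BYTES = 1024        # assumed size of the small test file

    # At the same hit rate the large file needs about 1000x the bandwidth.
    print(bandwidth_mbps(100, LARGE_IMAGE_BYTES))
    print(bandwidth_mbps(100, SMALL_FILE_BYTES))
```

At 100 hits per second the large file needs 800 Mbps while the small file needs under 1 Mbps, so the first test stresses bandwidth and the second stresses hit rate.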

28 Network Sample Issue

29 Web Servers Mostly covered as part of the network tests, but can have issues of their own. Typically not a problem unless using SSL, in which case problems will manifest as either high CPU utilization or connection errors (12000 level).

30 Application Servers Where the code is processed to deliver dynamic pages. Common Application Server Issues: –High CPU utilization –Memory consumption, memory leaks A thought to keep in mind… the difference in pages-per-second rate between a blank application page (ASP, JSP, etc.) and your actual application pages represents how much code tuning can increase your throughput.
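The "thought to keep in mind" amounts to a subtraction: the blank page sets the ceiling, and the gap to the real page is what code tuning could recover. A small sketch with made-up rates (both figures are hypothetical, as is the function name):

```python
def tuning_headroom(blank_page_rate: float, app_page_rate: float) -> float:
    """Pages/second that code tuning could at most recover on this server."""
    return max(blank_page_rate - app_page_rate, 0.0)


if __name__ == "__main__":
    # Hypothetical measurements from two throughput tests on the same server.
    blank_rate = 400.0  # pages/second for a blank application page
    app_rate = 150.0    # pages/second for a real application page
    print(tuning_headroom(blank_rate, app_rate))
```

With these example rates, tuning the application code could gain at most 250 pages per second; anything beyond that requires changes below the application layer.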

31 Application Servers How to test for Application Server issues: 1. Throughput test of test application page. 2. Throughput test of homepage (if an application page) or representative application pages. Just changing the extension on an HTML page (to .ASP or .JSP, for example) will be sufficient to cause a decrease in throughput.

32 Application Server Sample Issue

33 Database Servers How do database issues manifest? 1. High CPU utilization on DB Server 2. Queued Requests on Application Servers Usually caused by inefficient SQL queries or poor DB optimization. To test the database separately from the application, create a simple query to pull a single piece of data from the database.
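A "simple query to pull a single piece of data" can be timed on its own to get a baseline query rate. The sketch below is only an illustration of the idea: it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the real database server, and the table name, column names and function name are all assumptions.

```python
import sqlite3
import time


def measure_single_row_query_rate(num_queries: int = 1000) -> float:
    """Return queries/second for a trivial single-row lookup."""
    conn = sqlite3.connect(":memory:")  # stand-in for the real DB server
    try:
        conn.execute("CREATE TABLE loadtest (id INTEGER PRIMARY KEY, value TEXT)")
        conn.execute("INSERT INTO loadtest (id, value) VALUES (1, 'ok')")
        conn.commit()
        start = time.perf_counter()
        for _ in range(num_queries):
            row = conn.execute("SELECT value FROM loadtest WHERE id = 1").fetchone()
            if row is None or row[0] != "ok":
                raise RuntimeError("unexpected query result")
        elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    finally:
        conn.close()
    return num_queries / elapsed


if __name__ == "__main__":
    print(f"{measure_single_row_query_rate():.0f} queries/second")
```

In a real test the same kind of query would typically sit behind a test application page, so the difference between that page with and without database access isolates the database's contribution.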

34 Now We Get to the Application… If we have come this far without encountering issues, any problems are caused by the application itself. Poorly performing pages (errors or high response times) need to be investigated for code optimization. –Response times will be the same at 1000 users as they were at 1 user if no bottlenecks are encountered. –Therefore, response times at this point in the testing are only useful as indicators of issues. It really is that simple…

35 Testing the Application - Throughput Again, test iteratively and step into the application. For example, if users log in to your site, search, then add items to the cart and make purchases, use the following series of tests: 1. Homepage 2. Homepage + login 3. Homepage + login + search 4. Homepage + login + search + add to cart 5. Homepage + login + search + add to cart + check out As steps are added, degradation in response times or page throughput will be caused by the newly added step, making it easier to isolate what code needs to be looked at.
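The stepped series and the "blame the newly added step" rule can be expressed directly. This is a sketch under the assumption that each test in the series yields a simple pass/fail against its requirement; the function names are invented for the example.

```python
from typing import List, Optional, Sequence


def stepped_series(steps: Sequence[str]) -> List[List[str]]:
    """Build the cumulative test series: [s1], [s1, s2], [s1, s2, s3], ..."""
    return [list(steps[:count]) for count in range(1, len(steps) + 1)]


def newly_added_step_at_first_failure(
    steps: Sequence[str], passed: Sequence[bool]
) -> Optional[str]:
    """passed[i] is the result of the test that runs steps[0..i].

    The first failing test points at the step it added, because every
    earlier combination has already been ruled out.
    """
    for step, ok in zip(steps, passed):
        if not ok:
            return step
    return None


if __name__ == "__main__":
    flow = ["homepage", "login", "search", "add to cart", "check out"]
    for test in stepped_series(flow):
        print(" + ".join(test))
    # Hypothetical results: the third test is the first to miss its goal.
    print(newly_added_step_at_first_failure(flow, [True, True, False, False, False]))
```

With the hypothetical results shown, the search step is the one to investigate, since homepage and login already passed on their own.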

36 Putting it All Together – A Sample Test Plan: Basic System Level Testing
Throughput Tests
–Bandwidth: Ramp up users, 1 VU every 1 second, requesting a large file (loadtest.bmp) with a 1 second page view delay, run until bandwidth bottleneck is reached.
–Hit Rate: Ramp up users, 1 VU every 1 second, requesting a small file (1kb.txt) with a 1 second page view delay, run until page/sec bottleneck is reached.
–Test Application Page (with & without DB access): Ramp up users, 1 VU every 1 second, requesting a test application page (loadtest.jsp) with a 1 second page view delay, run until page/sec bottleneck is reached.
–Homepage: Ramp up users, 1 VU every 1 second, requesting the application homepage with a 1 second page view delay, run until page/sec bottleneck is reached.
Concurrency Tests
–Homepage: Ramp up users, 1 VU every 1 second, requesting the homepage with a realistic think time delay script up to maximum users.

37 Putting it All Together – A Sample Test Plan, cont’d: Application Level Testing
Throughput Tests
–Browse: Ramp up users, 1 VU every 1 second, running browse script with a 1 second page view delay, run until bandwidth bottleneck is reached.
–Search: Ramp up users, 1 VU every 1 second, running search script with a 1 second page view delay, run until page/sec bottleneck is reached.
–Add to Cart: Ramp up users, 1 VU every 1 second, running add to cart script with a 1 second page view delay, run until page/sec bottleneck is reached.
–All 3 scripts: Ramp up users, 1 VU every 1 second, running all 3 scripts with a 1 second page view delay, run until page/sec bottleneck is reached.
Concurrency Tests
–All 3 scripts: Ramp up users, 1 VU every 1 second, running all 3 scripts with a realistic think time delay script up to maximum users.

38 Summary The RBI methodology can be used for any web application. The methodology can be summed up as quickly finding bottlenecks by isolating components, starting with the simplest and building in complexity. By breaking a system into its component parts and starting with the simplest components, all of the bottlenecks can be uncovered the only way they can be uncovered: one at a time.

39 Where to find answers: QAZone http://qazone.empirix.com Free resource, open to all (Empirix & non-Empirix users) –Forums (both QA-centric and Product related discussions) –Knowledge Base –Resource Center –Events, Announcements, Community & Industry News Allows members to interact with peers, industry experts and a larger Empirix audience

40 Q&A For more information: Dan Koloski Web BU CTO and Director of Strategy Empirix dkoloski@empirix.com http://www.empirix.com

41 Appendix – R.B.I in action

42 Our Methodology In Action Online Ticketing System for a Large Stadium The expected volume of traffic is 30,000 tickets per hour with the average user buying 3 tickets. The ticket purchasing site is really made of two sites: one for ticket purchasing and another to queue the users when the purchasing system has reached capacity. Tickets can be purchased by users that have pre-registered and also by people who will register at the time of purchase.

43 Creating the Test Plan & Goals Components –Queue – 15,000 in Queue –Pipeline must support 10,000 orders per hour –Login –Register – Withdrawn from this set of tests –Search –Add to Cart and Purchase A workaround was developed to bypass the queue and go directly into the purchase pipeline.

44 Some Results… The Queue - Concurrency Test The queue could only support 5000 sessions before the SQL server maintaining the sessions would peg on CPU and start dropping the sessions. With 6500 users on the system the error rate was almost 10%.

45 Some Results… Logins – Throughput Tests – The login function, the only ‘home grown’ piece, scaled to 65000 logins per hour. There were some errors, but these were at a throughput level much in excess of what is needed.

46 Some Results… The Search Engine – Throughput Tests The Results weren’t encouraging. When searching for 3 tickets at a time the system could support 1000 searches per hour. When looking for 1 ticket, the system could handle 3000 searches per hour. Either way, the result is only 3000 tickets per hour…

47 Some Results… Orders – Throughput Tests Running a combination of 1 and 3 ticket searches, the system was able to support roughly 1100 orders per hour, for a total sale of only 1700 tickets.

48 Some Results… Queue Re-Test After overhauling the SQL queries and the application page that handles user queuing, the test was re-run. 10,000 users were added to the system and no further errors were encountered.

49 Some Results… Summary –By the end of the first round of tests, only the Queuing system and login had achieved their goals. –The search engine had only achieved 10% of its goal. –The purchase process was closer to 5% of its goal. The application needed more work, and unfortunately a major event was looming… The client made the hard decision to delay the release, find a new vendor for the ticket search engine, and force the purchase process vendor to fix their code.
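The percentages in this summary follow from the case-study figures quoted earlier: a goal of 30,000 tickets per hour at 3 tickets per user, about 3000 tickets per hour through search, and about 1700 tickets per hour through the purchase process. A short sketch of the arithmetic, with variable and function names chosen for this example:

```python
TICKET_GOAL_PER_HOUR = 30_000
TICKETS_PER_ORDER = 3

# Order goal implied by the ticket goal: matches the 10,000 orders per hour
# that the purchase pipeline was required to support.
ORDER_GOAL_PER_HOUR = TICKET_GOAL_PER_HOUR // TICKETS_PER_ORDER


def percent_of_goal(achieved_per_hour: float, goal_per_hour: float) -> float:
    """Achieved hourly rate as a percentage of the goal."""
    return 100.0 * achieved_per_hour / goal_per_hour


# Search engine: 1000 searches/hour for 3 tickets, or 3000 searches/hour for 1.
search_tickets_per_hour = max(1000 * 3, 3000 * 1)
# Purchase process: roughly 1700 tickets sold per hour.
purchase_tickets_per_hour = 1700

if __name__ == "__main__":
    print(ORDER_GOAL_PER_HOUR)
    print(percent_of_goal(search_tickets_per_hour, TICKET_GOAL_PER_HOUR))
    print(round(percent_of_goal(purchase_tickets_per_hour, TICKET_GOAL_PER_HOUR), 1))
```

Search reaches 10% of the ticket goal and the purchase process about 5.7%, which is consistent with the "10%" and "closer to 5%" figures on the slide.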

