Tales from the Lab: Experiences and Methodology. Demand Technology User Group, December 5, 2005. Ellen Friedman, SRM Associates, Ltd.
Ellen Friedman, SRM Associates, Ltd. Testing in the Lab Experiences of a consultant Taming the Wild West Bringing order to Chaos HOW? Methodology: Capacity Planning, SPE, Load Testing, Discipline Checklists/Procedures What happens when procedures aren't followed Detective Work
Ellen Friedman, SRM Associates, Ltd. Agenda Introduction Software Performance Engineering and Benefits of Testing Back to Basics: Workload Characterization/Forecasting Capacity Planning Building the Test Labs Testing Considerations Scripts and test execution Some Examples Documenting the test plan and reporting results Summary
Ellen Friedman, SRM Associates, Ltd. Software Performance Engineering Performance engineering is the process by which new applications (software) are tested and tuned with the intent of realizing the required performance. Benefit: Identify problems early-on in the application life-cycle Manage Risk Facilitates the identification and correction of bottlenecks to Minimize end to end response time Maximize application performance
Ellen Friedman, SRM Associates, Ltd. Should we bother to Test?? WE CAN'T PLAN FOR WHAT WE DON'T KNOW
Ellen Friedman, SRM Associates, Ltd. What do we need to achieve? Scalability Predictable scaling of software/hardware architecture Do we have capacity to meet resource requirements? How many users will system handle before we need to upgrade or add web servers/app servers Stability Ability to achieve results under unexpected loads and conditions Performance vs Cost Achieving SLA and minimizing cost
Ellen Friedman, SRM Associates, Ltd. Testing throughout the application lifecycle Fixing a problem late in development is extremely costly ($$$$$$$)
Ellen Friedman, SRM Associates, Ltd. What is a Performance Test Lab? A facility to pro-actively assess the satisfactory delivery of Service to users prior to system Implementation or roll-out. - A test drive capability.
Ellen Friedman, SRM Associates, Ltd. Lab- What is it Good For? Before you deploy the application- create an environment that simulates the production environment Use this environment to reflect the conditions of target production environment
Ellen Friedman, SRM Associates, Ltd. Evaluate system (SLAs, workload characterization, volumes); Develop scripts and test strategy (obtain tools, methodology, build scripts); Execute baseline tests (run the tests in the lab and obtain baseline); Validate baseline (ensure that test scripts adequately represent the production environment); Run controlled benchmarks (testing plan); Analyze results and report findings
Ellen Friedman, SRM Associates, Ltd. Evaluate System: Workload Characterization Identify Critical Business Functions Define Corresponding System Workloads/Transactions Map business workloads to system transactions Identify flow of transactions through the system Identify current and expected future volume Determine resource requirements for business-based workloads at all architectural tiers Web server, Applications server, Database server
Ellen Friedman, SRM Associates, Ltd. Evaluate System: Workload Forecasting Define key volume indicators: What are the drivers for volume and/or resource usage for the system? Examples: Banking: Checks processed Insurance: Claims processed Financial: Trades processed Shipping: Packages processed
Ellen Friedman, SRM Associates, Ltd. Workload Forecasting: Historical Review Does the business have a set peak? December for retail, and shipping Peak/Average Ratio? 20% or 30% higher? Volume vs. Resource Usage Larger centers require greater computing resources Need to determine scaling of hardware/software resources as a function of volume
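As a rough illustration of the historical review above, the sketch below computes a peak-to-average ratio from periodic volumes and fits a straight line of resource usage against volume. The function names and all figures are hypothetical, and the linear relationship is only an assumption; real hardware and software scaling may not be linear.

```python
def peak_to_average(volumes):
    """Return the ratio of the peak volume to the average volume."""
    if not volumes:
        raise ValueError("volumes must not be empty")
    average = sum(volumes) / len(volumes)
    if average == 0:
        raise ValueError("average volume is zero")
    return max(volumes) / average


def fit_line(xs, ys):
    """Least-squares fit of ys = slope * xs + intercept (assumes linear scaling)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all volumes are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical monthly package volumes (thousands) with a December peak.
monthly_volume = [100] * 11 + [160]
ratio = peak_to_average(monthly_volume)

# Hypothetical observations: volume (thousands of packages per hour) vs. CPU %.
slope, intercept = fit_line([10, 20, 30], [25, 45, 65])
projected_cpu_at_40 = slope * 40 + intercept
print(f"peak/average = {ratio:.2f}, projected CPU at 40k PPH = {projected_cpu_at_40:.0f}%")
```

Extrapolating far beyond the measured range is risky; the fitted line is best treated as a starting point to be checked in the lab.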
Ellen Friedman, SRM Associates, Ltd. Volume vs. Response Time Scale: Volume *1000 PPH
Ellen Friedman, SRM Associates, Ltd. Service Level Considerations e-Business System: Tracking System for Package Inquiries: WHERE IS MY PACKAGE? Call center handles real-time customer inquiries SLA- caller cannot be put on hold >3 minutes 90% of all calls should be cleared on first contact Responsiveness to customer needs Web-interface for customers Page load time and query resolution <6-8 seconds
Ellen Friedman, SRM Associates, Ltd. Lab can be used throughout the Application Lifecycle Testing throughout the Application Life Cycle Planning Design/ coding Development/testing/UAT Production Deployment Post-production-change management Optimization (performance and volume testing) Labs reduce risk to your production environment Solid testing leads to cleaner implementations !!
Ellen Friedman, SRM Associates, Ltd. How many Labs? Where to put them Locations for testing in various technical, business, or political contexts. The following factors influence the decisions you make about your test environment: Your testing methodology Features and components you will test PEOPLE, MONEY, Location Personnel who will perform the testing Size, location, and structure of your application project teams. Size of your budget. Availability of physical space. Location of testers. Use of the labs after deployment.
Ellen Friedman, SRM Associates, Ltd. Types of Labs and their Purpose Application unit testing Hardware or software incompatibilities Design flaws Performance issues Systems integration testing lab User Acceptance Testing (UAT) Application compatibility Operational or deployment inefficiencies Windows 2003 features Network infrastructure compatibility Interoperability with other network operating systems Hardware compatibility Tools (OS, third-party, or custom) Volume testing lab Performance and capacity planning Baseline traffic patterns traffic volumes without user activity Certification Lab Installation and configuration documentation Administrative procedures and documentation Production rollout (processes, scripts, and files; back-out plans)
Ellen Friedman, SRM Associates, Ltd. Testing Concepts 101 Define the problem- Test Objectives Limit the scope Establish metrics & analysis methodology Tools/analysis Establish the environment Design the test bed Simulate the key business functions Develop scripts and their frequency of execution
Ellen Friedman, SRM Associates, Ltd. Testing Process 101 Ensure that Lab mimics production ( H/W, S/W, Workload/business functions being tested ) Test measurement tools and develop analysis tools ARM the application Instrumentation to provide end to end response time Instrumentation to provide business metrics to correlate Execute controlled test Single variable manipulation Ensure repeatability Analyze data & repeat if required (e.g., tune system) Extrapolate Document Test set-up and results
Ellen Friedman, SRM Associates, Ltd. Developing the script: Meet with the Business Team, Applications Team to understand the workload. What is typical? What is most resource intensive? Determine the appropriate mix of work Typical navigation and screen flow % of time each screen is accessed by user Number of users to test with, number of different accounts to use (other factors impacting representativeness of test) Include cases to test resource intensive activities and functions Include cases where user may abandon session because r/t is too long Test for time-outs
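One way to turn an agreed mix of work into a script plan is weighted random selection. The sketch below is only illustrative: the screen names and percentages are hypothetical placeholders for whatever the business and applications teams report, and `build_script_plan` is not taken from any load-testing tool.

```python
import random
from collections import Counter

# Hypothetical mix: share of script iterations that exercise each screen.
SCREEN_MIX = {
    "track_package": 0.50,
    "rate_quote": 0.30,
    "schedule_pickup": 0.15,
    "abandon_session": 0.05,  # user gives up because response time is too long
}


def build_script_plan(mix, iterations, seed=0):
    """Pick a screen for each iteration, weighted by the agreed mix."""
    rng = random.Random(seed)  # fixed seed keeps the plan repeatable between runs
    screens = list(mix)
    weights = [mix[screen] for screen in screens]
    return rng.choices(screens, weights=weights, k=iterations)


plan = build_script_plan(SCREEN_MIX, 10_000)
shares = {screen: count / len(plan) for screen, count in Counter(plan).items()}
for screen in SCREEN_MIX:
    print(f"{screen:16s} target {SCREEN_MIX[screen]:.2f}  actual {shares.get(screen, 0.0):.3f}")
```

Seeding the generator keeps the plan identical from run to run, which supports the repeatability the testing process calls for.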
Ellen Friedman, SRM Associates, Ltd. Load Testing Parameters Simulating Volume and distribution of arrival rate Hourly volume- distribution is not uniform, Bursty arrival rate Web sessions are only about 3 minutes long When is traffic heaviest? How long does the user spend at the site? Need to vary the number of users started over the hour/User Think Time Package Shipping Example: Different from web site- more predictable Arrival rate: highest in first hour Limited by capacity of site to load the packages/speed of belts etc. Package scanning: some automated but still has human involvement
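The non-uniform arrival pattern described above can be approximated by spreading user starts across intervals according to weights and by drawing random think times. This is a minimal sketch: the interval weights, the 100-user total, the 20-second mean, and the exponential think-time model are all assumptions, not measurements from the case study.

```python
import random


def users_per_interval(total_users, weights):
    """Split total_users across intervals in proportion to weights.

    Uses largest-remainder rounding so the counts add up exactly.
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    raw = [total_users * w / total_weight for w in weights]
    counts = [int(r) for r in raw]
    leftover = total_users - sum(counts)
    # Hand the remaining users to the intervals with the largest fractions.
    by_fraction = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


def think_times(rng, mean_seconds, n):
    """Draw n think times from an exponential distribution (an assumed model)."""
    return [rng.expovariate(1.0 / mean_seconds) for _ in range(n)]


# Hypothetical hour split into six 10-minute intervals, heaviest at the start.
interval_weights = [4, 3, 2, 1, 1, 1]
starts = users_per_interval(100, interval_weights)
pauses = think_times(random.Random(42), mean_seconds=20.0, n=5)
print("users started per interval:", starts)
print("sample think times (s):", [round(p, 1) for p in pauses])
```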
Ellen Friedman, SRM Associates, Ltd. Chart: X read bytes/second over time. Note: reduction in read bytes/sec over time. How long should the test run? Need to reach steady state! Test run is four hours here!
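A simple heuristic for deciding whether a run has reached steady state is to compare the averages of adjacent windows of a metric and look for the point where they stop changing. The window size, the 5% tolerance, and the sample values below are assumptions chosen for illustration; they are not the criteria used in the talk.

```python
def steady_state_index(samples, window=5, tolerance=0.05):
    """Return the start index of the first window whose mean is within
    `tolerance` (relative) of the mean of the window that follows it.

    Returns None if no such pair of adjacent windows exists.
    """
    for start in range(0, len(samples) - 2 * window + 1):
        first = samples[start:start + window]
        second = samples[start + window:start + 2 * window]
        mean_first = sum(first) / window
        mean_second = sum(second) / window
        if mean_first == 0:
            continue  # relative change is undefined
        if abs(mean_second - mean_first) / abs(mean_first) <= tolerance:
            return start
    return None


# Hypothetical read bytes/sec (MB) sampled at fixed intervals: high while the
# cache warms up, then levelling off.
read_mb_per_sec = [100, 90, 80, 70, 60, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50]
warm_up_ends_at = steady_state_index(read_mb_per_sec, window=3)
print("steady state begins at sample", warm_up_ends_at)
```

Measurements taken before that index would be discarded or reported separately, which is one reason a run may need to be hours long.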
Ellen Friedman, SRM Associates, Ltd. Creating the Test Environment in the Lab Creating the data/database Copy database from production- subset it Manually key/Edit some of the data Create image copy of system for use in each run Verifying the test conditions Utilize ghost imaging or software such as Powerquest or Live State to save the database and system state between test runs May need to also verify configuration settings that aren't saved in the image copy Make sure that you are simulating the correct conditions (End of Day/Beginning of Day/Normal production flow) Scripting the key business functions Vary the test data as part of scripting Vary users/accounts/pathing
Ellen Friedman, SRM Associates, Ltd. What type of staff do we need? Programmers Korn Shell Programmers Mercury Mavens?
Ellen Friedman, SRM Associates, Ltd. Establish Metrics & Analysis Methodology Based on the testing objectives, what data do we need to collect and measure? CPU, Memory, I/O, network, response time What tools do we need for measurement? Do not over-measure: don't risk over-sampling and incurring high overhead Create a Template to use for comparison between test runs
Ellen Friedman, SRM Associates, Ltd. Build a Template for Comparison Before vs. After Comparison of Test Cases Collect the performance data- Metrics CPU: Processor Metrics System, User and Total Processor Utilization Memory: Available bytes, Page reads/second, Page Ins/second, Virtual/Real bytes Network Bytes sent/received, Packets sent/received per NIC Disk Reads and Writes/second, Read and Write bytes/second, Seconds/Read, Seconds/Write, Disk utilization Process: SQL Server (2 instances) CPU Working set size Read/Write bytes per second Database- SQL Database Reads/Writes per instance, Stored Procedure Timings Log Bytes flushed per database
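A before-vs-after template can be as simple as two mappings of metric name to value plus a percent-change column. The sketch below assumes the metrics have already been averaged per run; the metric names echo the list above, but the values and the `compare_runs` helper are hypothetical.

```python
def compare_runs(before, after):
    """Return {metric: (before, after, percent_change)} for metrics in both runs.

    percent_change is None when the 'before' value is zero.
    """
    report = {}
    for metric, base in before.items():
        if metric not in after:
            continue  # only compare metrics collected in both runs
        new = after[metric]
        change = None if base == 0 else (new - base) / base * 100.0
        report[metric] = (base, new, change)
    return report


# Hypothetical per-run averages for a baseline run and a changed configuration.
baseline = {
    "Total Processor Utilization (%)": 42.0,
    "Disk Reads/sec": 100.0,
    "Disk Writes/sec": 50.0,
    "Log Bytes Flushed/sec": 20000.0,
}
candidate = {
    "Total Processor Utilization (%)": 41.0,
    "Disk Reads/sec": 116.0,
    "Disk Writes/sec": 69.0,
    "Log Bytes Flushed/sec": 20000.0,
}

for metric, (old, new, change) in compare_runs(baseline, candidate).items():
    shown = "n/a" if change is None else f"{change:+.1f}%"
    print(f"{metric:34s} {old:>10.1f} {new:>10.1f} {shown:>8s}")
```

Keeping the same metric names in every run is what makes the comparison mechanical.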
Ellen Friedman, SRM Associates, Ltd. CASE STUDY Packaging- Shipping System Many centers throughout the country Same Applications Same Hardware Testing in the lab is required to identify bottlenecks and optimize performance SLA not being met in some larger centers Suspect Database Performance
Ellen Friedman, SRM Associates, Ltd. Case Study Configuration Architecture Database Server: Runs 2 Instances of SQL (Main, Reporting) Databases are configured on the X drives TempDB and Logs are configured on D drive
Ellen Friedman, SRM Associates, Ltd. Scanning the package on the Belt If the SLA is not met, packages aren't processed automatically Additional manual work is required to handle exceptions
Ellen Friedman, SRM Associates, Ltd. Case Study – Hardware Application Server Database Servers Database 1 Database 2 Database Server- DB #1 G3 (2.4 GHz) with 4 GB memory Raid 10 Configuration Internal 1 C/D logically partitioned External (10 slots) 2 X drives- mirrored 2 Y drives- mirrored Application Server G3 (2.4 GHz) with 3 GB memory 2 Internal Drives (C/D) Database Server- DB # 2 G3 (2.4 GHz) with 4 GB memory Internal 1 C/D logically partitioned 2 X mirrored drives
Ellen Friedman, SRM Associates, Ltd. Case Study: Software and OS Windows 2000 SQL Server Database Instances Reporting Main Instance- Multiple Databases Replication of Main Instance to Reporting Instance on the same server Main Instance and Reporting Instance share same drives
Ellen Friedman, SRM Associates, Ltd. Case Study: When do we test in the Lab? Hardware Changes OS Changes Software patch level changes to main suite of applications Major application changes Changes to other applications which coexist with primary application suite.
Ellen Friedman, SRM Associates, Ltd. Checklists and Forms Test Objectives Application Groups must identify: Specific application version to be tested as well as those of other co-dependent applications Database set-up to process the data Special data Workstation set-up Volume- Induction rate/flow(arrival rate) Workflow and percentages Scripts/percentage/flow rate
Ellen Friedman, SRM Associates, Ltd. Case Study: Hardware Checklist
Ellen Friedman, SRM Associates, Ltd. Sign-offs on Procedures/Pre-flight Who? Applications team Lab group Systems groups Network Distributed Systems Database Performance
Ellen Friedman, SRM Associates, Ltd. Script Development: Collected data from Production Systems Applications to include for testing and to be used to determine resource profiles for key transactions and business functions Volumes to test with Database conditions including: database size, database state requirements (e.g. end of day conditions) Application workflow- based on operational characteristics in various centers a. Job and queue dependencies b. Requirements for specific data feeds to include
Ellen Friedman, SRM Associates, Ltd. Case Study: Developing a Script Major business functions for labeling and shipping: Verifying the name and address of the item to be shipped Interface to other system and uses algorithms for parsing names/addresses Route planning- interface with OR systems to optimize routing Scanning the package information (local operation) Determining the type of shipment: freight/letter/overnight small package for shipping the item, and the appropriate route Sorting the packages according to type of shipment Printing the smart labels how/where to load the package Tracking the package
Ellen Friedman, SRM Associates, Ltd. Case Study: Performance Testing in the Lab Production Analysis indicated: Insufficient memory to support database storage requirements Resulting in increased I/O processing OPTIONS Add memory: not feasible, requires OS upgrade to address more than 4 GB of memory with Windows 2000 Standard Edition Make the I/O faster- faster drives or more drives Spread the I/O across multiple drives (external disk storage is expandable up to 10 slots available) Separate the database usage across 2 sets of physical drives Split the database across multiple servers (2 database servers) Easier upgrade than OS change Change the database design (Expected in 1Q2006, testing now)
Ellen Friedman, SRM Associates, Ltd. Planning: Testing out the configuration options Test out each of the options and provide a recommendation SLA: 99% of packages must complete their processing in under 500 milliseconds Each option was evaluated based on its relative ability to satisfy the SLA criteria.
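The SLA above (99% of packages processed in under 500 milliseconds) can be checked directly against the end-to-end timings from each test run. The sketch below assumes timings are available as a list of milliseconds; the sample data and option names are made up for illustration.

```python
def fraction_within(times_ms, threshold_ms):
    """Fraction of measurements strictly under the threshold."""
    if not times_ms:
        raise ValueError("no measurements supplied")
    return sum(1 for t in times_ms if t < threshold_ms) / len(times_ms)


def sla_met(times_ms, threshold_ms=500.0, required_fraction=0.99):
    """True when at least required_fraction of measurements beat the threshold."""
    return fraction_within(times_ms, threshold_ms) >= required_fraction


# Hypothetical results for two configuration options (milliseconds per package).
option_results = {
    "baseline": [120.0] * 980 + [900.0] * 20,         # 98.0% under 500 ms
    "separate_server": [110.0] * 995 + [700.0] * 5,   # 99.5% under 500 ms
}

for option, timings in option_results.items():
    share = fraction_within(timings, 500.0)
    verdict = "meets SLA" if sla_met(timings) else "misses SLA"
    print(f"{option:16s} {share:.1%} under 500 ms -> {verdict}")
```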
Ellen Friedman, SRM Associates, Ltd. Validating the baseline: Taming the West! If you can't measure it, you can't manage it! (CMG slogan)
Ellen Friedman, SRM Associates, Ltd. Case Study What are we measuring? 1. End to End Response Time (percentiles, average) 2. SQL Stored Procedure Timings (percentiles, average) SQL Trace information summarized for each stored procedure for a period of time 3. Perfmon: System, Process, SQL (average, max) CPU, Memory, Disk Process: Memory, Disk, Processor SQL: Database Activity, Checkpoints, Buffer Hit etc.
Ellen Friedman, SRM Associates, Ltd. Validating the Baseline Data from two production systems was obtained to produce: Test database from multiple application systems Database states were obtained, system inter-dependencies were satisfied, application configuration files Baseline test was executed- Multiple Iterations Performance measurements from two other systems were collected and compared against baseline execution Results were compared Database and scripts were modified to better reflect production conditions
Ellen Friedman, SRM Associates, Ltd. Story: Creating a new Environment A series of performance tests were conducted in Green Environment to evaluate I/O performance To be reviewed in presentation on Thursday Green Environment was required for another project. So moved to a new Red Environment Data created from a different source (2 different production environments) Simulating high volume What happened? Different page densities Different distribution of package delivery dates Different database size for critical database Red was much fatter!
Ellen Friedman, SRM Associates, Ltd. Analysis to evaluate new Baseline Compare I/O activity for Green and Red Metrics: End to End Response Time SQL Stored Procedure Timings SQL Activity Database Page Reads/Writes overall and for each database (X drive containing database) Log Bytes Flushed per second (each database)- D-drive (logs) SQL Read and Write bytes/second SQL reads and writes is overall so it includes database I/O and log activity Disk Activity Overall Drive D/X Read/Write bytes/second
Ellen Friedman, SRM Associates, Ltd. Comparing Overall Response Time Red vs. Green and Separate Server Green and Red tests with 2 mirrored pair of X drives are baselines Results of baselines should be comparable!!!
Ellen Friedman, SRM Associates, Ltd. Comparison of Green and Red Environments (X drive –database) Read Activity 16% higher Write Activity 38% higher
Ellen Friedman, SRM Associates, Ltd. Comparison of Green and Red Environments (D drive –Tempdb/logs) Read Activity 1% higher Write Activity 13% higher I/O activity is approximately same on D drive
Ellen Friedman, SRM Associates, Ltd. Comparison of I/O Load SQL Activity: Green vs. Red Increase in Reads in Red due to Main Increase in Writes in Red caused by both
Ellen Friedman, SRM Associates, Ltd. I/O Load Change: Main Instance Separate server vs. Baseline Read Activity is reduced by 43% with separate server
Ellen Friedman, SRM Associates, Ltd. Differences between Red and Green D Drive activity is approximately the same TempDB and logging X Drive activity is increased in Red environment Most of differences are due to an increase in Reads on X drive for Main Instance Implies that the database was much fatter Confirm this by reviewing Page reads/Page Writes per database from SQL statistics Review database sizes (unfortunately didn't have this data so we inferred it based on I/O data and SQL trace data) SQL trace data showed more Page Reads for key databases
Ellen Friedman, SRM Associates, Ltd. Red Environment: Comparing Three Days Background Several large databases: Main: UOWIS, PAS Reporting: Adhoc, UW1, Distribution 4-1: Replication turned off for UW1 database 4-4: Replication on for UW1 database 4-8: Separate server for UOWIS, replication turned on for UW1 Expectations 4-1 will perform better than 4-4 reduce I/O significantly Expect significant reduction in Reporting Database I/O 4-8 separate server will separate out the critical database Expect same amount of work performed as 4-4 but a reduction in Read Activity for UOWIS because data will now be in memory
Ellen Friedman, SRM Associates, Ltd. Reviewing Log Write Activity Note: No log bytes – no replication of UW1 database on 4-4
Ellen Friedman, SRM Associates, Ltd. RED: Comparing Three Days Database Disk Activity Note: 4-8 UOWIS results are for separate server Increase in work performed on 4-8 vs. 4-4
Ellen Friedman, SRM Associates, Ltd. Comparing Database Reads/Writes Main Instance
Ellen Friedman, SRM Associates, Ltd. Comparing Database Reads/Writes Reporting Instance Total page reads for reporting instance should remain constant Why did it increase on 4-8?
Ellen Friedman, SRM Associates, Ltd. Where are the differences on the two days ? Note: Differences in Stored Procedure- Total Reads (logical) Data Cap Summary and in Belt Summary Reports (not main functionality)
Ellen Friedman, SRM Associates, Ltd. What have we uncovered about test differences? Processor usage approximately the same Amount of Write Activity per instance is same Reviewed log bytes/flushed for each instance Reporting instance performed more I/O- more reads Additional report jobs were executed on 4-8 and not on 4-4 Reports run 4 times per hour (every 15 minutes- causes burst in I/O activity) When UOWIS database is on the same server (sharing same drives as other Main Instance and Reporting Instance work) response time is higher Response Time is directly related to physical reads and physical disk read performance Spreading the I/O across more drives and/or providing more memory for the critical database instance improves performance
Ellen Friedman, SRM Associates, Ltd. Testing Summary Need to create and follow a test plan which outlines All pre-flight procedures Confirm that environment is ready to go Validate baselines Run tests in organized fashion following the plan Do a sanity check! Do results make sense? Otherwise search for the truth- don't bury the results
Ellen Friedman, SRM Associates, Ltd. Measurement Summary The nature of performance data is that it is long tailed Averages aren't representative Get percentiles Need to understand the variability of tests conducted Run the same test multiple times to obtain a baseline Helps you iron out your procedures Can get a measure of variability of test case so that you can determine if the change you are testing is significant If the variability experienced between your base test runs is small that is good- you have repeatability If the variability is large You need to make sure that any change you make shows an even greater change
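The two points above, percentiles over averages and comparing a change against run-to-run variability, can be sketched as follows. The nearest-rank percentile definition, the "two standard deviations" threshold, and all sample numbers are assumptions for illustration rather than the method used in the case study.

```python
import statistics


def percentile(values, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    if not values:
        raise ValueError("no values supplied")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(values)
    rank = (p * len(ordered) + 99) // 100  # ceiling of p * n / 100
    return ordered[rank - 1]


def exceeds_noise(baseline_runs, candidate, k=2.0):
    """True when the candidate differs from the baseline mean by more than
    k standard deviations of the repeated baseline runs (a rule of thumb)."""
    mean = statistics.mean(baseline_runs)
    spread = statistics.stdev(baseline_runs)
    return abs(candidate - mean) > k * spread


# Hypothetical long-tailed response times (ms): most fast, a few very slow.
response_ms = [100] * 95 + [5000] * 5
print("mean:", statistics.mean(response_ms))
print("median:", percentile(response_ms, 50))
print("99th percentile:", percentile(response_ms, 99))

# Hypothetical 95th-percentile response time (ms) from three repeated baseline runs.
baseline_p95 = [400.0, 410.0, 390.0]
print("350 ms is a real change:", exceeds_noise(baseline_p95, 350.0))
print("405 ms is a real change:", exceeds_noise(baseline_p95, 405.0))
```

In the sample data the mean sits well above the median, which is the sense in which an average misrepresents a long-tailed distribution.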
Ellen Friedman, SRM Associates, Ltd. Reporting the Test Results: Template Executive Summary Graphs of results- e.g., end to end response time Scalability of solution Overall findings Background Hardware/OS/Applications Scripts Analysis of Results System and application performance Decomposition of response time Web tier, Application, Database Drill down again for details as necessary e.g., database metrics Next steps
Ellen Friedman, SRM Associates, Ltd. Summary Can't always simulate everything - do the best you can. Implement the change in production and go back to the lab to understand why it matched or didn't When you discover a problem, Apply what you've learned Make necessary changes to procedures, documentation, methodology- in the lab and recommend changes for outside the lab Improve the process, don't just bury or hide the flaws! Result: better testing and smoother implementations
Ellen Friedman, SRM Associates, Ltd. Questions????????? Contact Info: Ellen Friedman SRM Associates, Ltd Part II. To be presented at CMG Conference Thursday 9:15-10:15 Session 512 Measuring Performance in the Lab: A Windows Case Study