How OPNFV Should Act Beyond Breaking Points



2 How OPNFV Should Act Beyond Breaking Points
Under Stress: How OPNFV Should Act Beyond Breaking Points (OPNFV BOTTLENECKS project)

3 Contents
- Considerations on Stress Testing
- Highlights of Discussion in Testperf
- Test Cases in Danube Release
- Comparison Results for Stress Ping
- What We Should Do Next

4 Stress Testing == Load Testing?
Stress testing determines the robustness of software by testing beyond the limits of normal operation; it tests the system under unfavorable conditions. Load testing, in contrast, tests whether the system meets certain standards or requirements, and usually involves tuning the system to do so.
Stress Testing != Load Testing:
- Different purpose
- Different testing conditions
- Different testing/analyzing methods

5 Do We Break the System for Fun?
Apart from providing a PASS or FAIL result, stress testing can provide a more detailed result: reliability, the probability that the system will survive during a given time interval. Reliability can serve as another indication of the level of confidence in the system. Stress testing also allows observation of how the system reacts to failures. An additional purpose behind this madness is to make sure that the system fails and recovers gracefully, a quality known as recoverability. In short, stress testing gives users a level of confidence in the system.
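As a minimal sketch of this notion of reliability, one can estimate the probability of surviving a given interval from repeated stress runs. The function name and sample data below are illustrative, not taken from any OPNFV project:

```python
def reliability(survival_times, t):
    """Fraction of stress runs that survived at least t seconds.

    This is a simple empirical estimate of R(t), the probability that
    the system survives a stress run of duration t.
    """
    if not survival_times:
        raise ValueError("need at least one run")
    survived = sum(1 for s in survival_times if s >= t)
    return survived / len(survival_times)

# Example: 10 one-hour stress runs, three of which crashed early.
runs = [3600, 3600, 2900, 3600, 1200, 3600, 3600, 3400, 3600, 3600]
print(reliability(runs, 3600))  # 0.7 -> 70% confidence of surviving one hour
```

A richer model would fit a survival distribution instead of counting runs, but even this fraction is more informative to users than a bare PASS/FAIL.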

6 Questions when Executing Stress Tests
What is the first thing that crashed, and how and why? Does it save its state or crash suddenly? Does it just hang and freeze, or does it fail gracefully? Can the system/component recover gracefully? On restart, is it able to recover from the last good state? Does it print meaningful error messages to the user, or merely display incomprehensible hex codes? Is the security of the system compromised by unexpected failures? The list goes on.

7 Accelerate the Maturity and Adoption of OPNFV
Stress Test -> Breaking Points -> Reliability

8 Stress Testing for Danube – Highlights
Stress testing principles and test cases were discussed in Testperf.
Test requirements: from a user perspective, the stress test should be
- Easy to understand (what the test does and how the system is being stressed)
- Easy to run (i.e. "out-of-the-box": having deployed the OPNFV platform, it should be possible to run the test with minimal effort)
- Based, where possible, on existing proven methods, test cases and tools
- Able to work with all OPNFV release scenarios
For Danube the stress test result is not part of the release criteria; for future releases, a stress test threshold (metric TBD) should be part of the release criteria.
It should be possible to increase and decrease the load ("stress") and monitor the effect on the system with a simple performance metric.
The application running on the SUT must be highly optimized to avoid being the bottleneck.
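The requirement to increase and decrease the load while monitoring one simple metric can be sketched roughly as follows. `ramp_load`, `apply_load` and `measure_metric` are hypothetical names for illustration, and the metric here is faked; they are not part of any OPNFV project API:

```python
def ramp_load(apply_load, measure_metric, start, step, max_load):
    """Increase the load step by step and record the metric at each level."""
    results = []
    load = start
    while load <= max_load:
        apply_load(load)                          # e.g. spawn VM pairs, raise traffic rate
        results.append((load, measure_metric()))  # e.g. latency or packet loss
        load += step
    return results

# Toy stand-ins: the "metric" simply degrades as the load grows.
state = {"load": 0}
def apply_load(n):
    state["load"] = n
def measure_metric():
    return 100 - state["load"]

print(ramp_load(apply_load, measure_metric, 10, 10, 50))
# [(10, 90), (20, 80), (30, 70), (40, 60), (50, 50)]
```

The point of the abstraction is the single (load, metric) trace: one curve that a user can read without understanding the stressor internals.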

9 Test Cases in Discussion
Two load categories are under discussion: Data-Plane Traffic (determine baseline test cases: TC1 throughput, TC2 CPU limit) and Life-Cycle Events (VM pairs/stacks testing: TC3 ping, TC4 throughput, TC5 CPU limit). Desired qualities: easy to understand, easy to run, heavy load, robustness, availability, confidence. Open questions: 1 hour max? General release criteria?

10 Test Cases in Discussion
Category: Data-plane traffic for a virtual or bare-metal POD
- TC1 – Determine baseline for throughput: initiate one virtual/bare-metal POD and generate traffic; increase the packet size; measure throughput, latency and packet loss up to x%.
- TC2 – Determine baseline for CPU limit: decrease the packet size; measure CPU usage up to x%.
Category: Life-cycle events for VM pairs/stacks
- TC3 – Perform life-cycle events for ping: spawn VM pairs, ping and destroy them; increase the number of simultaneous live VMs; measure latency and packet loss up to x%, testing time, count of failures (OOM killer?).
- TC4 – Perform life-cycle events for throughput: spawn VM pairs, generate traffic and destroy them; serially or in parallel, increase the packet size; measure max load, packet loss vs. load, and throughput, latency and packet loss up to x% for either pair.
- TC5 – Perform life-cycle events for CPU limit: serially or in parallel, decrease the packet size; measure CPU usage up to x%.

11 Test Cases in Danube: TC1 – Determine baseline for throughput
Preliminary work from Bottlenecks and Yardstick: a Load Manager drives the load category "Data-Plane Traffic".
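TC1 raises the packet size step by step until packet loss exceeds the x% budget. A rough sketch of that loop follows; `send_traffic` and `fake_traffic` are illustrative stand-ins for a real traffic generator, not OPNFV APIs:

```python
def baseline_throughput(send_traffic, pkt_sizes, max_loss_pct):
    """Return (pkt_size, throughput) for the largest size within the loss budget."""
    baseline = None
    for size in pkt_sizes:
        throughput, loss_pct = send_traffic(size)
        if loss_pct > max_loss_pct:
            break                      # breaking point reached: stop increasing
        baseline = (size, throughput)  # this size stayed within the budget
    return baseline

# Toy traffic generator: loss grows with packet size.
def fake_traffic(size):
    return size * 1000, size / 512    # (throughput in bps, loss in %)

print(baseline_throughput(fake_traffic, [64, 128, 512, 1024, 1518], 1.0))
# (512, 512000)
```

A real run would also record latency at each step, per the TC1 description on the previous slide.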

12 Test Cases in Danube: TC3 – Perform life-cycle events for ping
Testing flow (Bottlenecks, with Yardstick as the runner): the Load Manager starts an initial stress test; VM pair creation, ping and destruction run in parallel; the criteria are checked; if they pass, the load is increased (e.g. 5, 10, 20, 50, 100, 200? parallel stacks) and the flow iterates; the test ends when the time runs out or the criteria fail. The visualization covers time points t0 through t5 (t6?). Caveats: the exact time needed to reach the expected number of VMs cannot be determined in advance; the flow checks whether the VMs were successfully built; quotas must be watched (Type 1).
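The iterate-until-failure flow above can be sketched as a small driver loop. `run_ping_stacks` and `criteria_ok` are hypothetical stand-ins for the real "create / ping / destroy VM pairs in parallel" step and the criteria check:

```python
def stress_ping(run_ping_stacks, criteria_ok, levels):
    """Return the highest load level (stack count) that still passed the criteria."""
    passed = None
    for n_stacks in levels:           # e.g. 5, 10, 20, 50, 100, 200?
        result = run_ping_stacks(n_stacks)
        if not criteria_ok(result):
            break                     # breaking point: stop increasing the load
        passed = n_stacks             # this level passed; iterate with more load
    return passed

# Toy example: pretend the system can sustain at most 40 parallel stacks.
print(stress_ping(lambda n: n, lambda r: r <= 40, [5, 10, 20, 50, 100]))
# 20
```

A real driver would also enforce the overall time limit and stop early when it expires, as the flow on this slide requires.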

13 Comparison Results for Stress Ping
Testing contents: execute the stress test and provide comparison results for two installers (Installer A and Installer B).
- Up to 100 stacks for Installer A (completes the test)
- Up to 40 stacks for Installer B (system fails to complete the test)
Testing steps:
1. Enter the Bottlenecks repo: cd /home/opnfv/bottlenecks
2. Prepare the virtual environment: . pre_virt_env.sh
3. Execute the ping test case: bottlenecks testcase run posca_factor_ping
4. Clean up the virtual environment: . rm_virt_env.sh
(Video demo)

14 Comparison Results for Stress Ping
Testing for Installer A (up to 100 stacks in the configuration file):
- One stack hit an SSH error when the stack count was raised to 50.
- At 100 stacks, most of the errors were Heat response timeouts.
- All 100 stacks were established successfully in the end.
Testing for Installer B (up to 40 stacks in the configuration file):
- At 30 stacks, the system failed to create all the stacks: 21 stacks either failed to create or remained stuck in creation.
- To verify system performance, we cleaned up and ran the test again; at 20 stacks the same situation occurred, i.e. the system performance degrades.
- Unlike the test for Installer A, we performed this verification step because the system clearly malfunctioned.
- Not shown in the demo: after 3 rounds of the stress test, the system failed even to create 5 stacks.

15 What Should We Do Next?
Stress test cases from testing projects: collect requirements as a joint effort across the testing community.
Analytic tools for breaking points (root causes): track the failures; build models to determine the root causes; quantify the LOC or reliability.
Long-duration PODs for stress testing across OPNFV releases: feed back bugs, provide LOC or reliability figures, etc.

16 Thank You

