Software Testing “Program testing can be used to show the presence of bugs, but never to show their absence!” —Edsger Dijkstra “There are only 2 hard problems in Computer Science. Naming things, cache invalidation and off-by-one errors.” —Phil Haack
Humans are fallible; software is written by humans; expect software to have defects. Testing is the most common way of removing defects from software and improving its quality.
Outline Foundations; Motivations; Terminology Principles and Concepts Levels of Testing Test Process Techniques Measures Deciding when to stop
Defects are Bad At a minimum defects in software annoy users. Glitchy software reflects poorly on the company issuing the software. If defects aren’t controlled during a software project, they increase the cost and duration of the project. For safety critical systems the consequences can be even more severe.
Spectacular Failures Ariane 5, 1996 Rocket + Cargo = $500M Patriot Missile, 1991 Failed to destroy an Iraqi Scud missile which hit a barracks. Therac-25 Software defects between 1985 and 1987 led to six accidents. Three patients died as a direct consequence.
Controlling defects in software
What is testing? Testing is the dynamic execution of the software for the purpose of uncovering defects. Testing is one technique for improving product quality. Don’t confuse testing with other, distinct techniques for improving product quality: –Inspections and reviews (sometimes called static testing) –Debugging –Defect prevention –Quality assurance –Quality control
Testing and its relationship to other related activities
Benefits of Testing Testing improves product quality (at least when the defects that are revealed are fixed). The rate and number of defects found during testing give an indication of overall product quality. A high rate of defect detection suggests that product quality is low. Finding few errors after rigorous testing increases confidence in overall product quality. Such information can be used to decide the release date. Or, it could mean…??? Defect data from testing may suggest opportunities for process improvement, preventing certain types of defects from being introduced into future systems.
Errors, Faults and Failures! Oh my! Error or Mistake – human action or inaction that produces an incorrect result. Fault or Defect – the manifestation of an error in code or documentation. Failure – an incorrect result; the observable deviation of the software from its expected behavior when a fault is executed.
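The chain can be made concrete with a small hypothetical example: a mistaken belief about the passing threshold (the error) produces a wrong comparison in code (the fault), which only shows up as a wrong result (the failure) for a boundary input.

```java
public class FaultVsFailure {
    // Error: the programmer believed "passing" meant strictly MORE than
    // half the points. Fault: this ">" should have been "score * 2 >=".
    static boolean isPassingFaulty(int score, int possible) {
        return score > possible / 2;
    }

    public static void main(String[] args) {
        // Most inputs mask the fault, so no failure is observed:
        System.out.println(isPassingFaulty(60, 100)); // true, as expected
        // The boundary input triggers a failure: 50/100 should pass.
        System.out.println(isPassingFaulty(50, 100)); // false -- a failure
    }
}
```

Note how the fault is present for every execution, but a failure occurs only when an input exercises it.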
Software Bugs 1947 log book entry for the Harvard Mark II
Verification and Validation Verification and validation are two complementary testing objectives. Verification – Comparing program outcomes against a specification. “Are we building the product right?” Validation – Comparing program outcomes against user expectations. “Are we building the right product?” Verification and validation is accomplished using both dynamic testing and static evaluation (peer review) techniques.
Principles of Testing
Organization Who should do the testing? Developers shouldn’t system test their own code. There is no problem with developers unit testing their own code—they are probably the most qualified to do so—but experience shows programmers are too close to their own code to system test it effectively. Independent testers are more effective. Levels of independence: independent testers on a team; independent of the team; independent of the company.
The cost of finding and fixing a defect increases with the length of time the defect remains in the product
Cost to correct late-stage defects For large projects, a requirements or design error is often 100 times more expensive to find and fix after the software is released than during the phase the error was injected.
Correspondence between Development and different opportunities for Verification and Validation
Two dimensions to testing
Levels of testing Unit – testing individual cohesive units (modules). Usually white-box testing done by the programmer. Integration – verifying the interaction between software components. Integration testing is done on a regular basis during development (possibly once a day/week/month depending on the circumstances of the project). Architecture and design defects typically show up during integration. System – testing the behavior of the system as a whole. Testing against the requirements (system objectives and expected behavior). Also a good environment for testing non-functional software requirements such as usability, security, performance, etc. Acceptance – used to determine if the system meets its acceptance criteria and is ready for release.
Other types of testing Regression testing Alpha and Beta testing – limited release of a product to a few select customers for evaluation before the general release. The primary purpose of a beta test isn’t to find defects, but rather to assess how well the software works in the real world under a variety of conditions that are hard to simulate in the lab. Customers’ impressions start to form during beta testing, so the product should have release-like quality. Stress testing, load testing, etc. Smoke test – a very brief test to determine whether or not there are obvious problems that would make more extensive testing futile.
Regression Testing Imagine adding a 24-inch lift kit and monster truck tires to your sensible sedan: After making the changes you would of course test the new and modified components, but is that all that should be tested? Not by a mile!
Regression Testing [Cont]
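A minimal sketch of the idea, using a hypothetical two-function module: after adding multiply, the suite reruns the old tests for add as well, since a change anywhere can break existing functionality.

```java
public class RegressionSuite {
    static int add(int a, int b) { return a + b; }       // old, unchanged code
    static int multiply(int a, int b) { return a * b; }  // newly added code

    public static void main(String[] args) {
        // Tests for the new feature:
        check(multiply(3, 4) == 12, "multiply");
        // Regression tests: re-verify behavior that already worked.
        check(add(2, 2) == 4, "add");
        check(add(-1, 1) == 0, "add with negative");
        System.out.println("all tests passed");
    }

    static void check(boolean ok, String name) {
        if (!ok) throw new AssertionError("regression in " + name);
    }
}
```

Automating the suite is what makes rerunning every old test after every change affordable.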
Testing Objectives Conformance testing (aka correctness or functional testing) – does the observed behavior of the software conform to its specification (SRS)? Non-functional requirements testing – have non-functional requirements such as usability, performance and reliability been met? Regression testing – does an addition or change break existing functionality? Stress testing – how well does the software hold up under heavy load and extreme circumstances? Installation testing – can the system be installed and configured with reasonable effort? Alpha/Beta testing – how well does the software work under the myriad of real-world conditions? Acceptance testing – how well does the software work in the user’s environment?
Integration Strategies What doesn’t work? –All-at-once or Big Bang – waiting until all of the components are ready before attempting to build the system for the first time. Not recommended. What does work? –Top-Down – high-level components are integrated and tested before low level components are complete. Example high-level components: life-cycle methods of component framework, screen flow of web application. –Bottom-Up – low-level components are integrated and tested before top-level components. Example low-level components: abstract interface onto database, component to display animated image. –Incremental features
Advantages of Incremental/ Continuous Integration Easier to find problems. If there is a problem during integration testing it is most likely related to the last component integrated—knowing this usually reduces the amount of code that has to be examined in order to find the source of the problem. Testing can begin sooner. Big bang testing postpones testing until the whole system is ready.
Top-Down Integration Stubs and mock objects are substituted for as yet unavailable lower-level components. Stubs – A stub is a unit of code that simulates the activity of a missing component. A stub has the same interface as the low-level component it emulates, but is missing some or all of its full implementation. Stubs return minimal values to allow the functioning of top-level components. Mock Objects – mock objects are stubs that simulate the behavior of real objects. The term mock object typically implies a bit more functionality than a stub. A stub may return pre-arranged responses. A mock object has more intelligence. It might simulate the behavior of the real object or make assertions of its own.
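A sketch of the stub idea (the class and method names are invented for illustration): the high-level report code is exercised before any real data layer exists.

```java
// Interface the real low-level component will eventually implement:
interface UserStore {
    int countUsers();
}

// Stub: same interface, canned minimal answer -- just enough to let
// the high-level code run.
class UserStoreStub implements UserStore {
    public int countUsers() { return 1; }
}

public class TopDownDemo {
    // High-level component under test:
    static String summary(UserStore store) {
        return "users: " + store.countUsers();
    }

    public static void main(String[] args) {
        System.out.println(summary(new UserStoreStub())); // users: 1
    }
}
```

A mock version of UserStore might additionally record that countUsers was called and fail the test if it wasn’t.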
Bottom-Up Integration Scaffolding code or drivers are used in place of high-level code. One advantage of bottom-up integration is that it can begin before the system architecture is in place. One disadvantage of bottom-up integration is it postpones testing of system architecture. This is risky because architecture is a critical aspect of a software system that needs to be verified early.
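A sketch of a driver (names hypothetical): a throwaway main method stands in for the missing high-level code and exercises a completed low-level component.

```java
public class BottomUpDriver {
    // Completed low-level component under test:
    static String formatCurrency(long cents) {
        return String.format("$%d.%02d", cents / 100, cents % 100);
    }

    // The driver plays the role of the not-yet-written high-level code:
    public static void main(String[] args) {
        System.out.println(formatCurrency(1999)); // $19.99
        System.out.println(formatCurrency(5));    // $0.05
    }
}
```

Unlike a stub, a driver sits above the component under test; both are scaffolding that is discarded once the real neighbors exist.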
Continuous Integration Top-down and bottom-up is how you are going to integrate. Continuous integration is when or how often you are going to integrate. Continuous integration = frequent integration where frequent = daily, maybe hourly, but not longer than weekly. You can’t find integration problems early unless you integrate frequently.
Test Process Test planning Test case generation Test environment preparation Execution Test results evaluation Problem reporting Defect tracking
Testing artifacts/products Test plan – who is doing what when. Test case specification – specification of actual test cases including preconditions, inputs and expected results. Test procedure specification – how to run test cases. Test log – results of testing Test incident report – record and track errors.
Test Plan “A document describing the scope, approach, resources, and schedule of intended test activities. It identifies test items, the features to be tested, the testing tasks, who will do each task, and any risks requiring contingency planning.” [IEEE std]
Test Case “A test case consists of a set of input values, execution preconditions, expected results and execution post-conditions, developed to cover certain test condition”
Oracle When you run a test there has to be some way of determining if the test failed. For every test there needs to be an oracle that compares expected output to actual output in order to determine if the test failed. For tests that are executed manually, the tester is the oracle. For automated unit tests, actual and expected results are compared with code.
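In an automated unit test the oracle is just comparison code: a known-correct expected value checked against the actual result. A minimal sketch:

```java
public class OracleDemo {
    static int square(int x) { return x * x; }

    public static void main(String[] args) {
        int actual = square(7);
        int expected = 49;  // the oracle: the known-correct answer
        if (actual != expected)
            throw new AssertionError("expected " + expected + ", got " + actual);
        System.out.println("test passed");
    }
}
```

The hard part of test automation is often not running the code but obtaining the expected value the oracle compares against.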
Test Procedure “Detailed instructions for the setup, execution, and evaluation of results for a given test case.”
Incident Reporting What you track depends on what you need to understand, control and estimate. Example incident report:
Testing Strategies Two very broad testing strategies are: –White-Box (Transparent) – Test cases are derived from knowledge of the design and/or implementation. –Black-Box (Opaque) – Test cases are derived from external software specifications.
Black-Box Techniques Equivalence Partitioning – Tests are divided into groups according to the criteria that two test cases are in the same group if both test cases are likely to find the same error. Classes can be formed based on inputs or outputs. Boundary value analysis – create test cases with values that are on the edge of equivalence partitions
Equivalence Partitioning What test cases would you use to test the following routine?

// This routine returns true if score is >= 50% of
// possiblePoints, else it returns false.
// This routine throws an exception if either input
// is negative or score is > possiblePoints.
boolean isPassing(int score, int possiblePoints);

ID   Input     Expected Result
1    …,-2      Exception
2    50,100    true
...
Test Cases Write test cases covering all valid equivalence classes. Cover as many valid equivalence classes as you can with each test case. (Note, there are no overlapping equivalence classes in this example.) Write one and only one test case for each invalid equivalence class. When testing a value from an equivalence class that is expected to return an invalid result, all other values should be valid. You want to isolate tests of invalid equivalence classes.

Test Case #   Test Case Data   Expected Outcome   Classes Covered
1             5,10             True               1
2             30,30            True               1
3             19,40            False              2
4             -1,10            Exception          4
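One possible implementation of isPassing, with the table's test cases run as plain assertions. The implementation body is a sketch consistent with the routine's comment, not code from the slides.

```java
public class IsPassingTest {
    static boolean isPassing(int score, int possiblePoints) {
        if (score < 0 || possiblePoints < 0 || score > possiblePoints)
            throw new IllegalArgumentException("invalid input");
        return score * 2 >= possiblePoints;  // score >= 50% of possiblePoints
    }

    public static void main(String[] args) {
        check(isPassing(5, 10));    // test case 1: valid, passing
        check(isPassing(30, 30));   // test case 2: valid, passing
        check(!isPassing(19, 40));  // test case 3: valid, failing
        boolean threw = false;      // test case 4: invalid (negative input)
        try { isPassing(-1, 10); } catch (IllegalArgumentException e) { threw = true; }
        check(threw);
        System.out.println("all equivalence classes covered");
    }

    static void check(boolean ok) {
        if (!ok) throw new AssertionError("test failed");
    }
}
```

Note that test case 4 keeps its other value (10) valid, isolating the invalid class as the slide recommends.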
Boundary Value Analysis Rather than select any element within an equivalence class, select values at the edge of the equivalence class. For example, given the class: 1 <= input <= 12 you would select values: 0, 1, 12, 13.
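For the class 1 <= input <= 12 (a month value, say), boundary value analysis picks the two edge values inside the class and the two just outside it:

```java
public class BoundaryDemo {
    static boolean isValidMonth(int m) { return m >= 1 && m <= 12; }

    public static void main(String[] args) {
        System.out.println(isValidMonth(0));   // false: just below the class
        System.out.println(isValidMonth(1));   // true:  lower boundary
        System.out.println(isValidMonth(12));  // true:  upper boundary
        System.out.println(isValidMonth(13));  // false: just above the class
    }
}
```

Off-by-one faults (writing > for >=, for instance) are caught exactly by these edge values, which is why they are favored over arbitrary in-class values.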
Experience-Based Techniques Error guessing – “testers anticipate defects based on experience”
Defect Density Software engineers often need to quantify how buggy a piece of software is. Defect counts alone are not very meaningful though. Is 12 defects a lot to have in a program? It depends on the size of the product (as measured by features or LOC). –12 defects in a 200-line program = 60 defects/KLOC: low quality. –12 defects in a 20,000-line program = 0.6 defects/KLOC: high quality. Defect counts are more interesting (meaningful) when tracked relative to the size of the software.
Defect Density [Cont] Defect density is an important measure of software quality. Defect density = total known defects / size. Defect density is often measured in defects/KLOC. (KLOC = thousand lines of code) Dividing by size normalizes the measure, which allows comparison between modules of different size. Size is typically measured in LOC or FPs. Measurement is over a particular time period (e.g. from system test through one year after release). Might calculate defect density after inspections to decide which modules should be rewritten or given more focused testing. Be sure to define LOC. Also, consider weighting defects: a severe defect is worse than a trivial one. Beware that using defect density as a performance target can give the wrong incentive (e.g., inflating line counts lowers the measure without improving quality).
Defect Density [Cont] Defect density measures can be used to track product quality across multiple releases.
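The arithmetic from the earlier slide, as code:

```java
public class DefectDensity {
    static double perKloc(int defects, int loc) {
        return defects * 1000.0 / loc;  // defects per thousand lines of code
    }

    public static void main(String[] args) {
        System.out.println(perKloc(12, 200));    // 60.0 defects/KLOC
        System.out.println(perKloc(12, 20000));  // 0.6 defects/KLOC
    }
}
```

The same 12 defects yield a density 100 times higher in the small program, which is what makes the normalized measure comparable across modules.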
Defect removal effectiveness DRE tells you what percentage of defects that are present are being found (at a certain point in time). Example: when you started system test there were 40 errors to be found. You found 30 of them. The defect removal effectiveness of system test is 30/40 or 75%. The trick of course is calculating the latent number of errors at any one point in the development process. Solution: to calculate latent number of errors at time x, wait a certain period after time x to learn just how many errors were present at time x.
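The slide's example as arithmetic:

```java
public class Dre {
    static double drePercent(int found, int latent) {
        return 100.0 * found / latent;
    }

    public static void main(String[] args) {
        // 40 latent defects at the start of system test, 30 of them found:
        System.out.println(drePercent(30, 40) + "%");  // 75.0%
    }
}
```

In practice the denominator (latent defects) is only known in hindsight, after later phases and field use reveal how many defects were actually present.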
Example Calculation of Defect Removal Effectiveness
Levels of White-Box Code Coverage Another important testing metric is code coverage: how thoroughly have paths through the code been tested? Some of the more popular options are: –Statement coverage –Decision coverage (aka branch coverage) –Condition coverage –Basis path coverage –Path coverage
Statement Coverage Each line of code is executed.

if (a) stmt1;
if (b) stmt2;

a=t; b=t gives statement coverage.
a=t; b=f doesn’t give statement coverage.
Decision Coverage Decision coverage is also known as branch coverage. The boolean condition at every branch point (if, while, etc.) has been evaluated to both T and F.

if (a and b) stmt1;
if (c) stmt2;

a=t; b=t; c=t and a=f; b=?; c=f gives decision coverage.
Does statement coverage guarantee decision coverage? if (a) stmt1; If no, give an example of input that gives statement coverage but not decision coverage.
Condition Coverage Each boolean sub-expression at a branch point has been evaluated to true and false.

if (a and b) stmt1;

a=t; b=t and a=f; b=f gives condition coverage.
Condition Coverage Does condition coverage guarantee decision coverage? if (a and b) stmt1; If no, give example input that gives condition coverage but not decision coverage.
Basis Path Coverage A path represents one flow of execution from the start of a method to its exit. For example, a method with 3 decisions has 2³ paths:

if (a) stmt1;
else if (b) stmt2;
else if (c) stmt3;

OR

if (a) stmt1;
if (b) stmt2;
if (c) stmt3;
Basis Path Coverage Loops in code make path coverage impractical for most programs. Each time through a loop is a new path. A practical alternative to path coverage is basis path coverage.
Basis Path Coverage Basis path coverage is the set of all linearly independent paths through a method or section of code. The set of linearly independent paths through a method are special because this set is the smallest set of paths that can be combined to create every other possible path through a method.
Path Coverage Path coverage is the most comprehensive type of code coverage. In order to achieve path coverage you need a set of test cases that executes every possible route through a unit of code. Path coverage is impractical for all but the most trivial units of code. Loops are the biggest obstacle to achieving path coverage. Each time through a loop is a new/different path.
Path Coverage How many paths are there in the following unit of code? if (a) stmt1; if (b) stmt2; if (c) stmt3;
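Each of the three independent conditions is either true or false, so a brute-force enumeration confirms 2³ = 8 distinct paths:

```java
public class PathCount {
    static int countPaths() {
        int paths = 0;
        for (boolean a : new boolean[]{false, true})
            for (boolean b : new boolean[]{false, true})
                for (boolean c : new boolean[]{false, true})
                    paths++;  // one distinct route through the three ifs
        return paths;
    }

    public static void main(String[] args) {
        System.out.println(countPaths());  // 8
    }
}
```

Contrast this with decision coverage, which the earlier slides show can be achieved for similar code with only two test cases.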
Path Coverage What inputs (test cases) are needed to achieve path coverage on the following code fragment?

procedure AddTwoNumbers()
top:
  print “Enter two numbers”;
  read a;
  read b;
  print a+b;
  if (a != -1) goto top;
Deciding when to stop testing “When the marginal cost of finding another defect exceeds the expected loss from that defect.” Both factors (the cost of finding another defect and the expected loss from that defect) can only be estimated. Stopping criteria are something that should be determined at the start of a project. Why?
Old Example Use equivalence partitioning to define test cases for the following function:

// This function takes integer values for day,
// month and year and returns the day of the
// week in string format. The function returns
// an empty string when given invalid input values.
// Year must be >
// Example: DayOfWeek(12,31,2009) → “Thursday”
// Example: DayOfWeek(13,13,2009) → “”
String DayOfWeek(int month, int day, int year);
Test Cases Write test cases covering all valid equivalence classes. Cover as many valid equivalence classes as you can with each test case. Write one and only one test case for each invalid equivalence class. When testing a value from an equivalence class that is expected to return an invalid result, all other values should be valid. You want to isolate tests of invalid equivalence classes.

Test Case #   Test Case Data   Expected Outcome   Classes Covered
1             1,1,2010         “Friday”           3,5
2             0,1,1999         “”                 1
3             45,1,1999        “”                 3
4             4,1,1752         “”                 8