
1 Engineering Bernd Fischer RW344: Software Design ▬ ▬ ▬▬ ▬ ▬

2 Verification and Validation

3 Topics inspection and code reviews, white-box testing, black-box testing, test coverage

4 Verification and Validation Traditionally broken down into two separate activities: verification: do we build the system right? validation: do we build the right system? V&V techniques: reviews (code and documents) prototyping and simulation (documents) testing static analysis (e.g., model checking, proof) verification and validation: the process of showing that the system conforms to its specification and meets the user requirements

5 Verification: do we build the system right? Goal: discover situations in which the behavior of the software is incorrect, undesirable or does not conform to its specification Approach: defect testing –test cases designed to expose defects –test cases can be deliberately obscure –test cases need not reflect how the system is normally used A successful test makes the system perform incorrectly and so exposes a defect in the system.

6 Validation: do we build the right system? Goal: demonstrate to the developer and the customer that the software meets its requirements Approach: validation testing –test cases designed to reflect the system’s expected use –at least one test for every requirement –at least one test for each system feature (plus all feature combinations) A successful test shows that the system operates as intended.

7 V&V aim confidence level given by –residual defect discovery rate  “less than 10 defects discovered in a week” –mean time between failures fitness for purpose depends on –software purpose (game app vs. safety-critical system) –user expectations (free app vs. premium software) –commercial considerations The aim of V&V is to establish with a certain level of confidence that the system is fit for purpose. NOT free of errors... BUT good enough

8 V&V methods dynamic vs. static –dynamic methods run and observe programs –static methods analyze documents (incl. programs) black box vs. white box –black box methods rely on the requirements –white box methods have access to the code structure defect testing vs. debugging –defect testing tries to force failures –debugging tries to localize and repair defects testing vs. proof –testing relies on a statistical argument –proof relies on a logical argument

9 What’s a bug, anyway? The notion of “bug” conflates several concepts: a mistake is a human behavior that introduces defects into the system a defect (or fault) is a characteristic of a system that may lead to errors an error is a deviation from an expected system state a failure is an event where the system does not deliver the expected service humans make mistakes that lead to defects one (or many) defects may lead to an error the error may manifest as a visible failure For a given mistake, the defect, error, and failure can all be at different locations in the program.

10 What’s a bug, anyway? The notion of “bug” conflates several concepts: a mistake is a human behavior that introduces defects into the system a defect (or fault) is a characteristic of a system that may lead to errors an error is a deviation from an expected system state a failure is an event where the system does not deliver the expected service humans make mistakes that lead to defects one (or many) defects may lead to an error the error may manifest as a visible failure For a given mistake, the defect, error, and failure can all be at different locations in the program. debugging testing

11 What’s a bug, anyway? Consider a library that sends out overdue notices 30 days after the due date. mistake: programmer forgot about leap years defect: wrong compute_days function error: wrong value in variable overdue_days failure: notice not sent out on specified date
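The leap-year example can be made concrete in a short Java sketch; the class and method names below are illustrative, not taken from any real library system.

```java
public class OverdueNotice {
    // DEFECT: every year is treated as 365 days, because the
    // programmer's MISTAKE was forgetting about leap years.
    static int computeDays(int years) {
        return years * 365;
    }

    public static void main(String[] args) {
        // ERROR: for a 4-year span the computed value is one leap day short.
        int overdueDays = computeDays(4);
        System.out.println(overdueDays);  // prints 1460 instead of 1461
        // FAILURE: a notice scheduled from this value goes out on the wrong date.
    }
}
```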

12 What’s a bug, anyway? Consider a function that writes a byte array to a serial output port connected to a Lego motor. mistake: programmer misunderstood array indexing defect: array out-of-bounds access error: array index is larger than array size failure: none / random motor jitter / motor catches fire / blue screen of death

13 V&V in the software life cycle V&V works at different levels: unit testing: individual components –methods; test functional behavior (pre/post) module testing: groups of related components –classes; test class invariants system testing: whole (sub-) system(s) –test emergent properties (e.g., security, reliability,...) acceptance testing: use customer data –test that system meets user expectations Each development phase corresponds to a V&V phase!

14 V&V in the software life cycle: V-model Source: D. Firesmith: Using V models for testing, testing-315

15 V&V’s Fate Fact 1: software development is underfunded –i.e., development team lacks resources Fact 2: V&V is always one of the last things to do –the system must work to some degree before testing makes sense Facts 1 and 2 imply that when the money runs out the corners are cut in V&V!!!

16 Testing Fundamentals

17 The limits of testing Consider the following program:
public class Test {
  public static void main(String[] args) {
    int x = Integer.parseInt(args[0]);
    int y = Integer.parseInt(args[1]);
    System.out.println(x + y);
  }
}
How long does it take to test this exhaustively? ( tests / sec, but > 2³² values for each input) ⇒ ~ years Testing can only exercise a small fraction of the possible behaviors! “Testing can only show the presence of errors, never their absence!” E.W. Dijkstra
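The back-of-the-envelope cost of exhaustive testing can be checked with BigInteger; the throughput of 10⁹ tests per second is an assumption chosen purely for illustration.

```java
import java.math.BigInteger;

public class ExhaustiveCost {
    // Years needed to enumerate all 2^64 (x, y) input pairs at an
    // assumed rate of 10^9 tests per second (illustrative figure).
    static long yearsToTestExhaustively() {
        BigInteger inputs = BigInteger.valueOf(2).pow(64);
        BigInteger testsPerYear = BigInteger.valueOf(1_000_000_000L)
                .multiply(BigInteger.valueOf(60L * 60 * 24 * 365));
        return inputs.divide(testsPerYear).longValueExact();
    }

    public static void main(String[] args) {
        System.out.println(yearsToTestExhaustively() + " years");  // 584 years
    }
}
```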

18 The limits of testing Consider the following program:
public class Test {
  public static void main(String[] args) {
    int x = Integer.parseInt(args[0]);
    int y = Integer.parseInt(args[1]);
    System.out.println(x + y);
  }
}
How do you test this program? do you run Test or main? how do you fix the inputs? how do you determine the expected result? how do you compare expected and actual result? how do you handle exceptions?

19 Test cases test outcome is pass or fail... but determining the outcome can be difficult –floating point arithmetic –complex data structures –effects on the “real world” (GUIs, embedded systems) –non-functional requirements (timing,...) “A test case consists of a set of test inputs, execution conditions, and expected results developed for a particular objective, such as to exercise a particular program path or to verify compliance with a specific requirement.” IEEE standard

20 Test cases – a simple example title: open account without error input: customer data conditions: enough storage space for account record execution: fill in mask … and confirm expected results: new account number is generated and shown within mask (underspecified: how do you check this?)

21 Test cases – a more concrete example title: length of empty linked list input: empty linked list conditions: none execution: call method size() expected results: returned value is 0

22 Test cases in JUnit (Java)
/* short description of the test: method size() must return 0 for the empty linked list */
void testEmptyLinkedListSize() throws Exception {  // descriptive name of the test
  List x = new LinkedList();                       // input of the test: empty list
  int s = x.size();                                // execution of the test
  assertEquals("length not 0", 0, s);              // comparison to expected result
}

23 Isolating the system under test with stubs and drivers
system under test:
class A { System.open(f); ... B.out(f, new A("5")); }
class B { void out(f, a) { int y = C.cvt(a.x); System.write(f, y); } }
class C { int cvt(x) { ... return y } }
test configuration:
class Driver { Env.open(f); B.out(f, new A("5")); assert(Env.val(f)==5); }  (driver sets up test environment; oracle compares outcomes)
class B { void out(f, a) { int y = C.cvt(a.x); System.write(f, y); } }
class C { int cvt(x) { if (x=="1") return 1; if (x=="5") return 5; } }  (stub simulates lower-level functionality)

24 Test execution needs a test harness. test case: set of test input data (provided at a specific system state) and expected output oracle: program that compares actual and expected outputs, and decides whether the test is passed driver: program written to test a unit module stub: program written to allow testing of a higher level component harness: environment to run programs with stubs and drivers and check the results test suite: collection of test cases test plan: description of a testing process including overall approach and specific tests
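The harness roles above can be sketched in a few lines of Java; all names here (StubConverter, UnitUnderTest, Driver) are hypothetical, and the stub returns canned answers instead of doing real conversion work.

```java
class StubConverter {                 // STUB: simulates the lower-level cvt()
    static int cvt(String x) {
        if (x.equals("1")) return 1;  // canned answers only, no real logic
        if (x.equals("5")) return 5;
        throw new IllegalArgumentException("stub has no answer for " + x);
    }
}

class UnitUnderTest {                 // the component being tested
    static int out(String s) {
        return StubConverter.cvt(s);  // depends on the stubbed lower layer
    }
}

public class Driver {                 // DRIVER: sets up and runs the test
    public static void main(String[] args) {
        int actual = UnitUnderTest.out("5");
        if (actual != 5)              // ORACLE: compares actual and expected
            throw new AssertionError("expected 5, got " + actual);
        System.out.println("test passed");
    }
}
```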

25 Test adequacy criteria How do you know that you have tested enough? when the money runs out...?? typically based on code coverage –statement, decision, path sometimes based on detection of injected faults –mutation testing A test adequacy criterion determines when testing can be ended.

26 Testing in the software life cycle Testing works at different levels: unit testing: individual components –methods; test functional behavior (pre/post) module testing: groups of related components –classes; test class invariants system testing: whole (sub-) system(s) –test emergent properties (e.g., security, reliability,...) acceptance testing: use customer data –test that system meets user expectations Each development phase corresponds to a testing phase!

27 Testing in the software life cycle: V-model Source: D. Firesmith: Using V models for testing, testing-315

28 More testing terms... Some terms denote specific test techniques or goals: smoke testing: minimal attempt at system operation –identify fundamental problems, build verification test alpha / beta testing: operational testing by users at developer’s / user’s site –identify delivery and acceptance problems regression testing: testing after system changes –ensure that changes do not introduce (new) errors –ensure that faults have been fixed usability testing: –UI, accessibility, performance, security,...

29 Testing strategies for large systems Large systems must be tested incrementally: test each individual subsystem in isolation integrate subsystems into product, test integration integration is determined by software architecture –vertical testing (e.g., multilayered architectures) –horizontal testing can be used when the system is divided into separate sub-applications integration can be tested top-down or bottom-up anti-pattern: big-bang testing...

30 Testing strategies for large systems Top-down testing: start by testing just the user interface –underlying functionality simulated by stubs work downwards, integrating lower layers big drawback: cost of writing the stubs Bottom-up testing: start by testing the lowest layers work upwards, integrating higher layers –requires new drivers to test each new layer big drawback: cost of writing the drivers Sandwich testing: hybrid method

31 Testing strategies for large systems

32 Black-box vs. White-box Black box testing: ignores implementation details exercises all functional requirements typically used in later testing stages –system tests –acceptance tests White box testing follows control structure of procedural design exercises program: conditions, loops, data structures typically used in early testing stages –unit tests –integration tests

33 Black-box vs. White-box Spec: evenORodd shall return 0 if the given number is even, -1 if it is odd. int evenORodd (int number) { int result; result = number mod 2; return (result); } Problem: evenORodd works only for even numbers black-box testing detects the fault –returns 1 for odd numbers instead of -1 white-box testing with 100% coverage doesn’t...
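A corrected evenORodd that actually meets the spec might look as follows in Java (a sketch; the slide's code is pseudocode, so the translation is ours).

```java
public class EvenOrOdd {
    // Returns 0 for even numbers and -1 for odd ones, as the spec
    // demands; the slide's version returned the raw remainder instead.
    static int evenORodd(int number) {
        return (number % 2 == 0) ? 0 : -1;
    }

    public static void main(String[] args) {
        System.out.println(evenORodd(4));   // 0
        System.out.println(evenORodd(7));   // -1
        System.out.println(evenORodd(-3));  // -1 (negative odd inputs too)
    }
}
```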

34 Black-box vs. White-box Spec: pretty shall print an integer input as text. void pretty(int number) { if (number > 1000) printf("%d thousand", number/100); else printf("%d", number); } Problem: pretty works only for numbers ≤ 1000 black-box testing may not detect the fault –only one equivalence class in spec white-box testing with 100% coverage would...
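For contrast, here is a version of pretty with the defect repaired (dividing by 1000 rather than 100); this is a sketch in Java rather than the slide's C-style code, and it returns the string so the result is easy to check.

```java
public class Pretty {
    // Fixed sketch: the slide's version divides by 100, so e.g. 1500
    // would be printed as "15 thousand" instead of "1 thousand".
    static String pretty(int number) {
        if (number > 1000)
            return (number / 1000) + " thousand";
        return Integer.toString(number);
    }

    public static void main(String[] args) {
        System.out.println(pretty(500));   // 500
        System.out.println(pretty(2000));  // 2 thousand
    }
}
```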

35 Best Practice The best results are achieved if you... start with black-box testing measure coverage use white-box testing to increase coverage A warning note... “On average when a system is considered well tested, only about 60% of the branches in the code has been exercised.” R. L. Glass Source: R. L. Glass: Facts and fallacies of software engineering, 2002.

36 Black-box Testing

37 Principles of black-box testing internal code structure is ignored –also designs test cases are derived from requirements –also specifications or models testers provide the system with inputs and observe the outputs –internal state remains unobserved aka specification-based or functional testing

38 Deriving black-box tests Black-box tests can be derived from different sources: use cases and scenarios –use user actions to derive inputs  concretize with data –use system actions to derive expected outputs –use “wrong” actions and inputs to derive failing tests sequence diagrams state machines –use transition labels to derive test inputs –use information about states for expected results

39 Partition testing Observation: For similar inputs the software will behave similarly. Or the other way round... If the software behaves similarly, the inputs are similar – and we don’t need to test for all of them! Approach: partition the input into groups that should be processed in the same way (equivalence classes) consider incorrect inputs as well test with representative members from each class –one test case per equivalence class requires understanding of... structure of input space possible implementation

40 Examples valid input is ‘y’ or ‘Y’ for yes and ‘n’ or ‘N’ for no –equivalence classes are [‘y’, ‘Y’], [‘n’, ‘N’], and one class with all other characters valid input is a month number (1-12) –equivalence classes are [-∞..0], [1..12], [13..∞] –equivalence classes are [-∞..0], [1, 3, 5, 7, 8, 10, 12], [2], [4, 6, 9, 11], [13..∞]
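The month example translates directly into a classifier plus one representative per class; the classify method below is an illustrative stand-in for whatever the system actually does with a month number.

```java
public class MonthClasses {
    // Maps an input to its equivalence class (illustrative helper).
    static String classify(int month) {
        if (month <= 0)  return "too small";  // class [-inf..0]
        if (month <= 12) return "valid";      // class [1..12]
        return "too large";                   // class [13..inf]
    }

    public static void main(String[] args) {
        // one representative member per equivalence class
        System.out.println(classify(-5));  // too small
        System.out.println(classify(6));   // valid
        System.out.println(classify(20));  // too large
    }
}
```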

41 A worked example Spec: employees receive a bonus depending on how long they have been working for the company: employees who have worked more than 3 years get 50% of salary; more than 5 years get 75%; more than 8 years get 100%. Assume that the number of years is positive and less than 70. static int computeBonus(int numYears) throws InvalidData

42 A worked example static int computeBonus(int numYears) throws InvalidData
Equivalence class     Representative
0 < numYears <= 3     2
3 < numYears <= 5     4
5 < numYears <= 8     6
8 < numYears < 70     10
numYears <= 0         -6   (invalid input)
numYears >= 70        80   (invalid input)
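A sketch of computeBonus matching the spec and the table above; we assume the return value is the bonus percentage, that inputs of 3 years or less earn 0%, and InvalidData is declared unchecked here only to keep the example short.

```java
class InvalidData extends RuntimeException {}

public class Bonus {
    static int computeBonus(int numYears) {
        // invalid classes from the table: numYears <= 0 and numYears >= 70
        if (numYears <= 0 || numYears >= 70) throw new InvalidData();
        if (numYears > 8) return 100;
        if (numYears > 5) return 75;
        if (numYears > 3) return 50;
        return 0;  // assumption: no bonus for 3 years or less
    }

    public static void main(String[] args) {
        // the representative from each valid equivalence class
        System.out.println(computeBonus(2));   // 0
        System.out.println(computeBonus(4));   // 50
        System.out.println(computeBonus(6));   // 75
        System.out.println(computeBonus(10));  // 100
    }
}
```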

43 Combinations of equivalence classes combinatorial explosion means that we cannot realistically test every possible system-wide equivalence class: –4 inputs with 5 equivalence classes each ⇒ 5⁴ (i.e., 625) possible system-wide equivalence classes ensure that at least one test is run with every equivalence class of every individual input test all combinations where an input is likely to affect the interpretation of another test a few other random combinations of equivalence classes

44 Combinations of equivalence classes Example: first valid input is either ‘Metric’ or ‘US/Imperial’ –equivalence classes: Metric, US/Imperial, Other second valid input is maximum speed: 1 to 750 km/h or 1 to 500 mph –validity depends on whether metric or US/imperial –eq. classes: [-∞..0], [1..500], [501..750], [751..∞] some test combinations –Metric, [1..500]: valid –US/Imperial, [1..500]: valid –Metric, [501..750]: valid –US/Imperial, [501..750]: invalid

45 Combinations of equivalence classes Spec: The landing gear must be deployed whenever the plane is within 2 minutes from landing or takeoff, or within 2000 feet from the ground. If visibility is less than 1000 feet, then the landing gear must be deployed whenever the plane is within 3 minutes from landing or lower than 2500 feet. Total number of system equivalence classes: 108

46 Boundary value testing Observation: More errors in software occur at the boundaries of equivalence classes. Conclusion: Equivalence class testing should specifically test values at the extremes of each equivalence class. Example: If the valid input is a month number (1-12) equivalence classes are defined as before but use test cases with 0, 1, 12 and 13 as well as very large positive and negative values
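In code, the boundary values sit directly on either side of the class edges; isValidMonth below is an illustrative predicate for the month example, not from the slides.

```java
public class BoundaryMonth {
    // Valid months form the class [1..12]; boundary-value testing
    // probes both sides of each edge of that class.
    static boolean isValidMonth(int m) {
        return m >= 1 && m <= 12;
    }

    public static void main(String[] args) {
        // values at and just outside the boundaries, plus extremes
        int[] probes = {0, 1, 12, 13, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int m : probes)
            System.out.println(m + " -> " + isValidMonth(m));
    }
}
```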

47 Exercise Spec: Given three integers representing the year, month and day, resp., the system shall compute a string that represents the weekday of the given date. Task: derive a test suite for black-box testing.

48 Testing guidelines (sequences) Test software with sequences which have only a single value. Use sequences of different sizes in different tests. Derive tests so that the first, middle and last elements of the sequence are accessed. Test with sequences of zero length. (Source: Chapter 8, Software testing)
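Applied to a simple array sum (an illustrative unit under test, not from the slides), the sequence guidelines give tests like these:

```java
public class SequenceTests {
    // Illustrative unit under test: sum of an int array.
    static int sum(int[] a) {
        int s = 0;
        for (int x : a) s += x;
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {}));              // zero-length sequence: 0
        System.out.println(sum(new int[] {7}));             // single value: 7
        System.out.println(sum(new int[] {1, 2, 3}));       // small sequence: 6
        System.out.println(sum(new int[] {1, 2, 3, 4, 5})); // larger size; first, middle, last all used: 15
    }
}
```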

49 General testing guidelines Choose inputs that force the system to generate all error messages Design inputs that cause input buffers to overflow Repeat the same input or series of inputs numerous times Force invalid outputs to be generated Force computation results to be too large or too small.

50 White-box Testing

51 Principles of white-box testing testers have access to the system design –code, design documents –can observe internal data and execution test cases are derived from program structure –mainly designed to exercise conditions and loops –program represented as control flow graph (CFG)  each statement in the code creates a node  each control flow branch creates an edge testing has to reach a target coverage –cover all possible paths (often infeasible) –cover all possible edges –cover all possible nodes (often too simple)

52 Representing control flow by CFGs (figure: a CFG; statements are nodes, control-flow branches are edges)

53 Representing control flow by CFGs (figure: a CFG with a predicate node)

54 Representing control flow by CFGs Each statement type can be represented by a CFG: sequence, if, while

55 Basis path testing a path is a sequence of instructions that may be performed in the execution of a computer program –also: sequence of connected edges in CFG path testing: –select (all) different paths through the code –check that they work correctly in principle a white-box testing technique but: number of different paths cannot be determined! use approximation: independent paths (i.e., maximal paths where no edge is repeated)

56 Independent paths independent paths are –maximal paths –where no edge is repeated (here: the three paths shown in the figure) basis set: minimal set of independent paths where each node is covered

57 The size of the basis set is bounded by the CFG’s cyclomatic complexity. Size of basis set: # regions in CFG –here: 3 # predicate nodes + 1 –here: 2 + 1 == 3 # edges - # nodes + 2 –here: 8 - 7 + 2 == 3 cyclomatic complexity

58 Example
int binsearch (int[] a, int v) {
  int low = 0;
  int high = a.length - 1;
  int r = -1;
  while (low <= high && r == -1) {
    int mid = (low + high) / 2;
    if (a[mid] > v) { high = mid - 1; }
    else if (a[mid] < v) { low = mid + 1; }
    else { r = mid; }
  }
  return r;
}

59 Example Cyclomatic complexity V(G) = 15 - 12 + 2 == 5 (#edges - #nodes + 2) = 4 + 1 == 5 (#predicate nodes + 1) = 5 (#regions) (figure: CFG with regions R1..R5)

60 Example Basis paths:

61 Example Path for input a == { 4 }, v == 7; output: r == -1
int binsearch (int[] a, int v) {
  int low = 0;              // low == 0
  int high = a.length - 1;  // high == 0
  int r = -1;               // r == -1
  while (low <= high && r == -1) {
    int mid = (low + high) / 2;
    if (a[mid] > v) { high = mid - 1; }
    else if (a[mid] < v) { low = mid + 1; }  // low == 1
    else { r = mid; }
  }
  return r;
}
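The basis paths of binsearch can be exercised with a handful of inputs. This sketch uses our reconstruction of the slides' (garbled) code, so the loop condition and comparisons are our reading of the original.

```java
public class BinsearchPaths {
    static int binsearch(int[] a, int v) {
        int low = 0, high = a.length - 1, r = -1;
        while (low <= high && r == -1) {
            int mid = (low + high) / 2;
            if (a[mid] > v)      high = mid - 1;
            else if (a[mid] < v) low = mid + 1;
            else                 r = mid;
        }
        return r;
    }

    public static void main(String[] args) {
        // inputs chosen to drive execution down different paths
        System.out.println(binsearch(new int[] {4}, 7)); // -1: a[mid] < v branch
        System.out.println(binsearch(new int[] {4}, 2)); // -1: a[mid] > v branch
        System.out.println(binsearch(new int[] {4}, 4)); //  0: match branch
        System.out.println(binsearch(new int[] {}, 4));  // -1: loop never entered
    }
}
```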

62 Inspections

63 Inspection fundamentals inspections are typically conducted as a meeting inspections are not restricted to code –unlike testing –code review: (lightweight) inspection of source code code reviews and testing are complementary –testing can reveal defects buried in complex code –reviews can reveal many defects simultaneously code reviews can be extremely effective An inspection is an activity in which one or more people systematically examine source code or documentation for defects.

64 Test first or review first? It is important to review code before extensively testing it: reviews quickly get rid of many defects if the review leads to a redesign, the testing work has been wasted ⇒ growing consensus that it is most efficient to review code before any testing is done –even before developer testing –Google: no check-in without code review

65 Inspection principles inspect the most important documents of all types –code, design documents, test plans, requirements inspect only documents that are ready –inspecting a very poor document will miss defects choose an effective and efficient inspection team –two to five people –including experienced software engineers do not rush the inspection –200 lines of code per hour (including comments) –or ten pages of text per hour re-inspect when large (>20%) changes are made

66 Conducting an inspection meeting moderator calls meeting and distributes documents participants prepare for the meeting in advance –use checklist to guide looking for defects moderator explains the procedures (at beginning) –checks that everybody is prepared –keeps meetings short reviewers take turns explaining the contents of the document or code, without reading it verbatim –author not a reviewer –ensures that reviewers say what they see, not what the author intended to say everybody speaks up when they notice a defect

67 Conducting an inspection meeting avoid discussing how to fix defects –can be left to the author avoid discussing style issues –but enforce coding standards nobody should be blamed –inspection team members should feel they are all working together to create a better document keep managers away –allows the participants to speak openly

68 Example: data reference errors Is a variable referenced whose value is unset or uninitialized? For all array references, is each subscript value within the defined bounds of the corresponding dimension? For all array references, does each subscript have an integer value? For all references through pointer or reference variables, is the referenced storage currently allocated? When a storage area has alias names with differing attributes, does the data value in this area have the correct attributes when referenced via one of these names?

69 Example: data reference errors Does a variable’s value have a type or attribute other than that expected by the compiler? Are there any explicit or implicit addressing problems if, on the target machine, the units of storage allocation are smaller than the units of storage addressability? If a data structure is referenced in multiple procedures or subroutines, is the structure defined identically in each procedure? When indexing into a string, are the limits of the string exceeded? Are there any “off by one” errors in indexing operations or in subscript references to arrays?

70 Example: control flow errors If the program contains a multiway branch ( switch ), can the index variable ever exceed the number of branch possibilities? Will every loop eventually terminate? Will the program, module, or subroutine eventually terminate? Is it possible that, because of the conditions upon entry, a loop will never execute? If so, does this represent an oversight? while(notfound) { for(i = x; i < z; i++) {... } } What happens if notfound is initially false, or if x is greater than z ?

71 Example: control flow errors For a loop controlled by both iteration and a Boolean condition, what are the consequences of “loop fallthrough” (if the iteration completes and the Boolean condition is never triggered)? Are there any “off by one” errors (e.g., one too many or too few iterations)? Are statements grouped (begin/end) correctly? Are there any non-exhaustive decisions? –E.g., if an input parameter’s expected values are 1, 2, and 3, does the logic assume that it must be 3 if it is not 1 or 2? If so, is the assumption valid?

72 Static Analysis

