
1 Engineering Bernd Fischer RW344: Software Design ▬ ▬ ▬▬ ▬ ▬

2 Verification and Validation

3 Topics inspection and code reviews, white-box testing, black-box testing, test coverage

4 Verification and Validation Traditionally broken down into two separate activities: verification: do we build the system right? validation: do we build the right system? V&V techniques: reviews (code and documents) prototyping and simulation (documents) testing static analysis (e.g., model checking, proof) verification and validation: the process of showing that the system conforms to its specification and meets the user requirements

5 Verification: do we build the system right? Goal: discover situations in which the behavior of the software is incorrect, undesirable or does not conform to its specification Approach: defect testing –test cases designed to expose defects –test cases can be deliberately obscure –test cases need not reflect how the system is normally used A successful test makes the system perform incorrectly and so exposes a defect in the system.

6 Validation: do we build the right system? Goal: demonstrate to the developer and the customer that the software meets its requirements Approach: validation testing –test cases designed to reflect the system’s expected use –at least one test for every requirement –at least one test for each system feature (plus all feature combinations) A successful test shows that the system operates as intended.

7 V&V aim confidence level given by –residual defect discovery rate  “less than 10 defects discovered in a week” –mean time between failures fitness for purpose depends on –software purpose (game app vs. safety-critical system) –user expectations (free app vs. premium software) –commercial considerations The aim of V&V is to establish with a certain level of confidence that the system is fit for purpose. NOT free of errors... BUT good enough

8 V&V methods dynamic vs. static –dynamic methods run and observe programs –static methods analyze documents (incl. programs) black box vs. white box –black box methods rely on the requirements –white box methods have access to the code structure defect testing vs. debugging –defect testing tries to force failures –debugging tries to localize and repair defects testing vs. proof –testing relies on a statistical argument –proof relies on a logical argument

9 What’s a bug, anyway? The notion of “bug” conflates several concepts: a mistake is a human behavior that introduces defects into the system a defect (or fault) is a characteristic of a system that may lead to errors an error is a deviation from an expected system state a failure is an event where the system does not deliver the expected service humans make mistakes that lead to defects one (or many) defects may lead to an error the error may manifest as a visible failure For a given mistake, the defect, error, and failure can all be at different locations in the program.

10 What’s a bug, anyway? The notion of “bug” conflates several concepts: a mistake is a human behavior that introduces defects into the system a defect (or fault) is a characteristic of a system that may lead to errors an error is a deviation from an expected system state a failure is an event where the system does not deliver the expected service humans make mistakes that lead to defects one (or many) defects may lead to an error the error may manifest as a visible failure For a given mistake, the defect, error, and failure can all be at different locations in the program. debugging testing

11 What’s a bug, anyway? Consider a library that sends out overdue notices 30 days after the due date. mistake: programmer forgot about leap years defect: wrong compute_days function error: wrong value in variable overdue_days failure: notice not sent out on specified date
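The leap-year example can be made concrete in a short Java sketch; the class and method names below are illustrative, not taken from any real library system.

```java
public class OverdueNotice {
    // DEFECT: every year is treated as 365 days, because the
    // programmer's MISTAKE was forgetting about leap years.
    static int computeDays(int years) {
        return years * 365;
    }

    public static void main(String[] args) {
        // ERROR: for a 4-year span the computed value is one leap day short.
        int overdueDays = computeDays(4);
        System.out.println(overdueDays);  // prints 1460 instead of 1461
        // FAILURE: a notice scheduled from this value goes out on the wrong date.
    }
}
```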

12 What’s a bug, anyway? Consider a function that writes a byte array to a serial output port connected to a Lego motor. mistake: programmer misunderstood array indexing defect: array out-of-bounds access error: array index is larger than array size failure: none / random motor jitter / motor catches fire / blue screen of death

13 V&V in the software life cycle V&V works at different levels: unit testing: individual components –methods; test functional behavior (pre/post) module testing: groups of related components –classes; test class invariants system testing: whole (sub-) system(s) –test emergent properties (e.g., security, reliability,...) acceptance testing: use customer data –test that system meets user expectations Each development phase corresponds to a V&V phase!

14 V&V in the software life cycle: V-model Source: D. Firesmith: Using V models for testing, testing-315

15 V&V’s Fate Fact 1: software development is underfunded –i.e., development team lacks resources Fact 2: V&V is always one of the last things to do –the system must work to some degree before testing makes sense Facts 1 and 2 imply that when the money runs out the corners are cut in V&V!!!

16 Testing Fundamentals

17 The limits of testing Consider the following program:
public class Test {
  public static void main(String[] args) {
    int x = Integer.parseInt(args[0]);
    int y = Integer.parseInt(args[1]);
    System.out.println(x + y);
  }
}
How long does it take to test this exhaustively? ( tests / sec, but > 2³² values for each input) ⇒ ~ years Testing can only exercise a small fraction of the possible behaviors! “Testing can only show the presence of errors, never their absence!” E.W. Dijkstra
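The back-of-the-envelope cost of exhaustive testing can be checked with BigInteger; the throughput of 10⁹ tests per second is an assumption chosen purely for illustration.

```java
import java.math.BigInteger;

public class ExhaustiveCost {
    // Years needed to enumerate all 2^64 (x, y) input pairs at an
    // assumed rate of 10^9 tests per second (illustrative figure).
    static long yearsToTestExhaustively() {
        BigInteger inputs = BigInteger.valueOf(2).pow(64);
        BigInteger testsPerYear = BigInteger.valueOf(1_000_000_000L)
                .multiply(BigInteger.valueOf(60L * 60 * 24 * 365));
        return inputs.divide(testsPerYear).longValueExact();
    }

    public static void main(String[] args) {
        System.out.println(yearsToTestExhaustively() + " years");  // 584 years
    }
}
```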

18 The limits of testing Consider the following program:
public class Test {
  public static void main(String[] args) {
    int x = Integer.parseInt(args[0]);
    int y = Integer.parseInt(args[1]);
    System.out.println(x + y);
  }
}
How do you test this program? do you run Test or main? how do you fix the inputs? how do you determine the expected result? how do you compare expected and actual result? how do you handle exceptions?

19 Test cases test outcome is pass or fail... but determining the outcome can be difficult –floating point arithmetic –complex data structures –effects on the “real world” (GUIs, embedded systems) –non-functional requirements (timing,...) “A test case consists of a set of test inputs, execution conditions, and expected results developed for a particular objective, such as to exercise a particular program path or to verify compliance with a specific requirement.” IEEE standard

20 Test cases – a simple example title: open account without error input: customer data conditions: enough storage space for account record execution: fill in mask … and confirm expected results: new account number is generated and shown within mask (underspecified: how do you check this?)

21 Test cases – a more concrete example title: length of empty linked list input: empty linked list conditions: none execution: call method size() expected results: returned value is 0

22 Test cases in JUnit (Java)
/* short description of the test: method size() must return 0 for the empty linked list */
void testEmptyLinkedListSize() throws Exception {  // descriptive name of the test
  List x = new LinkedList();                       // input of the test: empty list
  int s = x.size();                                // execution of the test
  assertEquals("length not 0", 0, s);              // comparison to expected result
}

23 Isolating the system under test with stubs and drivers
system under test:
class A { System.open(f); ... B.out(f, new A("5")); }
class B { void out(f, a) { int y = C.cvt(a.x); System.write(f, y); } }
class C { int cvt(x) { ... return y } }
test configuration:
class Driver { Env.open(f); B.out(f, new A("5")); assert(Env.val(f)==5); }  (driver sets up test environment; oracle compares outcomes)
class B { void out(f, a) { int y = C.cvt(a.x); System.write(f, y); } }
class C { int cvt(x) { if (x=="1") return 1; if (x=="5") return 5; } }  (stub simulates lower-level functionality)

24 Test execution needs a test harness. test case: set of test input data (provided at a specific system state) and expected output oracle: program that compares actual and expected outputs, and decides whether the test is passed driver: program written to test a unit module stub: program written to allow testing of a higher level component harness: environment to run programs with stubs and drivers and check the results test suite: collection of test cases test plan: description of a testing process including overall approach and specific tests
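The harness roles above can be sketched in a few lines of Java; all names here (StubConverter, UnitUnderTest, Driver) are hypothetical, and the stub returns canned answers instead of doing real conversion work.

```java
class StubConverter {                 // STUB: simulates the lower-level cvt()
    static int cvt(String x) {
        if (x.equals("1")) return 1;  // canned answers only, no real logic
        if (x.equals("5")) return 5;
        throw new IllegalArgumentException("stub has no answer for " + x);
    }
}

class UnitUnderTest {                 // the component being tested
    static int out(String s) {
        return StubConverter.cvt(s);  // depends on the stubbed lower layer
    }
}

public class Driver {                 // DRIVER: sets up and runs the test
    public static void main(String[] args) {
        int actual = UnitUnderTest.out("5");
        if (actual != 5)              // ORACLE: compares actual and expected
            throw new AssertionError("expected 5, got " + actual);
        System.out.println("test passed");
    }
}
```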

25 Test adequacy criteria How do you know that you have tested enough? when the money runs out...?? typically based on code coverage –statement, decision, path sometimes based on detection of injected faults –mutation testing A test adequacy criterion determines when testing can be ended.

26 Testing in the software life cycle Testing works at different levels: unit testing: individual components –methods; test functional behavior (pre/post) module testing: groups of related components –classes; test class invariants system testing: whole (sub-) system(s) –test emergent properties (e.g., security, reliability,...) acceptance testing: use customer data –test that system meets user expectations Each development phase corresponds to a testing phase!

27 Testing in the software life cycle: V-model Source: D. Firesmith: Using V models for testing, testing-315

28 More testing terms... Some terms denote specific test techniques or goals: smoke testing: minimal attempt at system operation –identify fundamental problems, build verification test alpha / beta testing: operational testing by users at developer’s / user’s site –identify delivery and acceptance problems regression testing: testing after system changes –ensure that changes do not introduce (new) errors –ensure that faults have been fixed usability testing: –UI, accessibility, performance, security,...

29 Testing strategies for large systems Large systems must be tested incrementally: test each individual subsystem in isolation integrate subsystems into product, test integration integration is determined by software architecture –vertical testing (e.g., multilayered architectures) –horizontal testing can be used when the system is divided into separate sub-applications integration can be tested top-down or bottom-up anti-pattern: big-bang testing...

30 Testing strategies for large systems Top-down testing: start by testing just the user interface –underlying functionality simulated by stubs work downwards, integrating lower layers big drawback: cost of writing the stubs Bottom-up testing: start by testing the lowest layers work upwards, integrating higher layers –requires new drivers to test each new layer big drawback: cost of writing the drivers Sandwich testing: hybrid method

31 Testing strategies for large systems

32 Black-box vs. White-box Black box testing: ignores implementation details exercises all functional requirements typically used in later testing stages –system tests –acceptance tests White box testing follows control structure of procedural design exercises program: conditions, loops, data structures typically used in early testing stages –unit tests –integration tests

33 Black-box vs. White-box Spec: evenORodd shall return 0 if the given number is even, -1 if it is odd. int evenORodd (int number) { int result; result = number mod 2; return (result); } Problem: evenORodd works only for even numbers black-box testing detects the fault –returns 1 for odd numbers instead of -1 white-box testing with 100% coverage doesn’t...
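A corrected evenORodd that actually meets the spec might look as follows in Java (a sketch; the slide's code is pseudocode, so the translation is ours).

```java
public class EvenOrOdd {
    // Returns 0 for even numbers and -1 for odd ones, as the spec
    // demands; the slide's version returned the raw remainder instead.
    static int evenORodd(int number) {
        return (number % 2 == 0) ? 0 : -1;
    }

    public static void main(String[] args) {
        System.out.println(evenORodd(4));   // 0
        System.out.println(evenORodd(7));   // -1
        System.out.println(evenORodd(-3));  // -1 (negative odd inputs too)
    }
}
```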

34 Black-box vs. White-box Spec: pretty shall print an integer input as text. void pretty(int number) { if (number > 1000) printf("%d thousand", number/100); else printf("%d", number); } Problem: pretty works only for numbers ≤ 1000 black-box testing may not detect the fault –only one equivalence class in spec white-box testing with 100% coverage would...
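For contrast, here is a version of pretty with the defect repaired (dividing by 1000 rather than 100); this is a sketch in Java rather than the slide's C-style code, and it returns the string so the result is easy to check.

```java
public class Pretty {
    // Fixed sketch: the slide's version divides by 100, so e.g. 1500
    // would be printed as "15 thousand" instead of "1 thousand".
    static String pretty(int number) {
        if (number > 1000)
            return (number / 1000) + " thousand";
        return Integer.toString(number);
    }

    public static void main(String[] args) {
        System.out.println(pretty(500));   // 500
        System.out.println(pretty(2000));  // 2 thousand
    }
}
```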

35 Best Practice The best results are achieved if you... start with black-box testing measure coverage use white-box testing to increase coverage A warning note... “On average when a system is considered well tested, only about 60% of the branches in the code has been exercised.” R. L. Glass Source: R. L. Glass: Facts and fallacies of software engineering, 2002.

36 Black-box Testing

37 Principles of black-box testing internal code structure is ignored –also designs test cases are derived from requirements –also specifications or models testers provide the system with inputs and observe the outputs –internal state remains unobserved aka specification-based or functional testing

38 Deriving black-box tests Black-box tests can be derived from different sources: use cases and scenarios –use user actions to derive inputs  concretize with data –use system actions to derive expected outputs –use “wrong” actions and inputs to derive failing tests sequence diagrams state machines –use transition labels to derive test inputs –use information about states for expected results

39 Partition testing Observation: For similar inputs the software will behave similarly. Or the other way round... If the software behaves similarly, the inputs are similar – and we don’t need to test for all of them! Approach: partition the input into groups that should be processed in the same way (equivalence classes) consider incorrect inputs as well test with representative members from each class –one test case per equivalence class requires understanding of... structure of input space possible implementation

40 Examples valid input is ‘y’ or ‘Y’ for yes and ‘n’ or ‘N’ for no –equivalence classes are [‘y’, ‘Y’], [‘n’, ‘N’], and one class with all other characters valid input is a month number (1-12) –equivalence classes are [-∞..0], [1..12], [13..∞] –equivalence classes are [-∞..0], [1, 3, 5, 7, 8, 10, 12], [2], [4, 6, 9, 11], [13..∞]
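The month example translates directly into a classifier plus one representative per class; the classify method below is an illustrative stand-in for whatever the system actually does with a month number.

```java
public class MonthClasses {
    // Maps an input to its equivalence class (illustrative helper).
    static String classify(int month) {
        if (month <= 0)  return "too small";  // class [-inf..0]
        if (month <= 12) return "valid";      // class [1..12]
        return "too large";                   // class [13..inf]
    }

    public static void main(String[] args) {
        // one representative member per equivalence class
        System.out.println(classify(-5));  // too small
        System.out.println(classify(6));   // valid
        System.out.println(classify(20));  // too large
    }
}
```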

41 A worked example Spec: employees receive a bonus depending on how long they have been working for the company: employees who have worked more than 3 years get 50% of salary; more than 5 years get 75%; more than 8 years get 100%. Assume that the number of years is positive and less than 70. static int computeBonus(int numYears) throws InvalidData

42 A worked example static int computeBonus(int numYears) throws InvalidData
Equivalence class     Representative
0 < numYears <= 3     2
3 < numYears <= 5     4
5 < numYears <= 8     6
8 < numYears < 70     10
numYears <= 0         -6   (invalid input)
numYears >= 70        80   (invalid input)
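A sketch of computeBonus matching the spec and the table above; we assume the return value is the bonus percentage, that inputs of 3 years or less earn 0%, and InvalidData is declared unchecked here only to keep the example short.

```java
class InvalidData extends RuntimeException {}

public class Bonus {
    static int computeBonus(int numYears) {
        // invalid classes from the table: numYears <= 0 and numYears >= 70
        if (numYears <= 0 || numYears >= 70) throw new InvalidData();
        if (numYears > 8) return 100;
        if (numYears > 5) return 75;
        if (numYears > 3) return 50;
        return 0;  // assumption: no bonus for 3 years or less
    }

    public static void main(String[] args) {
        // the representative from each valid equivalence class
        System.out.println(computeBonus(2));   // 0
        System.out.println(computeBonus(4));   // 50
        System.out.println(computeBonus(6));   // 75
        System.out.println(computeBonus(10));  // 100
    }
}
```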

43 Combinations of equivalence classes combinatorial explosion means that we cannot realistically test every possible system-wide equivalence class: –4 inputs with 5 equivalence classes each ⇒ 5⁴ (i.e., 625) possible system-wide equivalence classes ensure that at least one test is run with every equivalence class of every individual input test all combinations where an input is likely to affect the interpretation of another test a few other random combinations of equivalence classes

44 Combinations of equivalence classes Example: first valid input is either ‘Metric’ or ‘US/Imperial’ –equivalence classes: Metric, US/Imperial, Other second valid input is maximum speed: 1 to 750 km/h or 1 to 500 mph –validity depends on whether metric or US/imperial –eq. classes: [-∞..0], [1..500], [501..750], [751..∞] some test combinations –Metric, [1..500]: valid –US/Imperial, [1..500]: valid –Metric, [501..750]: valid –US/Imperial, [501..750]: invalid

45 Combinations of equivalence classes Spec: The landing gear must be deployed whenever the plane is within 2 minutes from landing or takeoff, or within 2000 feet from the ground. If visibility is less than 1000 feet, then the landing gear must be deployed whenever the plane is within 3 minutes from landing or lower than 2500 feet. Total number of system equivalence classes: 108

46 Boundary value testing Observation: More errors in software occur at the boundaries of equivalence classes. Conclusion: Equivalence class testing should specifically test values at the extremes of each equivalence class. Example: If the valid input is a month number (1-12) equivalence classes are defined as before but use test cases with 0, 1, 12 and 13 as well as very large positive and negative values
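In code, the boundary values sit directly on either side of the class edges; isValidMonth below is an illustrative predicate for the month example, not from the slides.

```java
public class BoundaryMonth {
    // Valid months form the class [1..12]; boundary-value testing
    // probes both sides of each edge of that class.
    static boolean isValidMonth(int m) {
        return m >= 1 && m <= 12;
    }

    public static void main(String[] args) {
        // values at and just outside the boundaries, plus extremes
        int[] probes = {0, 1, 12, 13, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int m : probes)
            System.out.println(m + " -> " + isValidMonth(m));
    }
}
```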

47 Exercise Spec: Given three integers representing the year, month and day, resp., the system shall compute a string that represents the weekday of the given date. Task: derive a test suite for black-box testing.

48 Testing guidelines (sequences) Test software with sequences which have only a single value. Use sequences of different sizes in different tests. Derive tests so that the first, middle and last elements of the sequence are accessed. Test with sequences of zero length. (Source: Chapter 8, Software testing)
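Applied to a simple array sum (an illustrative unit under test, not from the slides), the sequence guidelines give tests like these:

```java
public class SequenceTests {
    // Illustrative unit under test: sum of an int array.
    static int sum(int[] a) {
        int s = 0;
        for (int x : a) s += x;
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {}));              // zero-length sequence: 0
        System.out.println(sum(new int[] {7}));             // single value: 7
        System.out.println(sum(new int[] {1, 2, 3}));       // small sequence: 6
        System.out.println(sum(new int[] {1, 2, 3, 4, 5})); // larger size; first, middle, last all used: 15
    }
}
```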

49 General testing guidelines Choose inputs that force the system to generate all error messages Design inputs that cause input buffers to overflow Repeat the same input or series of inputs numerous times Force invalid outputs to be generated Force computation results to be too large or too small.

50 White-box Testing

51 Principles of white-box testing testers have access to the system design –code, design documents –can observe internal data and execution test cases are derived from program structure –mainly designed to exercise conditions and loops –program represented as control flow graph (CFG)  each statement in the code creates a node  each control flow branch creates an edge testing has to reach a target coverage –cover all possible paths (often infeasible) –cover all possible edges –cover all possible nodes (often too simple)

52 Representing control flow by CFGs (figure: a CFG; statements are nodes, control-flow branches are edges)

53 Representing control flow by CFGs (figure: a CFG with a predicate node)

54 Representing control flow by CFGs Each statement type can be represented by a CFG: sequence, if, while

55 Basis path testing a path is a sequence of instructions that may be performed in the execution of a computer program –also: sequence of connected edges in CFG path testing: –select (all) different paths through the code –check that they work correctly in principle a white-box testing technique but: number of different paths cannot be determined! use approximation: independent paths (i.e., maximal paths where no edge is repeated)

56 Independent paths independent paths are –maximal paths –where no edge is repeated (here: the three paths shown in the figure) basis set: minimal set of independent paths where each node is covered

57 The size of the basis set is bounded by the CFG’s cyclomatic complexity. Size of basis set: # regions in CFG –here: 3 # predicate nodes + 1 –here: 2 + 1 == 3 # edges - # nodes + 2 –here: 8 - 7 + 2 == 3 cyclomatic complexity

58 Example
int binsearch (int[] a, int v) {
  int low = 0;
  int high = a.length - 1;
  int r = -1;
  while (low <= high && r == -1) {
    int mid = (low + high) / 2;
    if (a[mid] > v) { high = mid - 1; }
    else if (a[mid] < v) { low = mid + 1; }
    else { r = mid; }
  }
  return r;
}

59 Example Cyclomatic complexity V(G) = 15 - 12 + 2 == 5 (#edges - #nodes + 2) = 4 + 1 == 5 (#predicate nodes + 1) = 5 (#regions) (figure: CFG with regions R1..R5)

60 Example Basis paths:

61 Example Path for input a == { 4 }, v == 7; output: r == -1
int binsearch (int[] a, int v) {
  int low = 0;              // low == 0
  int high = a.length - 1;  // high == 0
  int r = -1;               // r == -1
  while (low <= high && r == -1) {
    int mid = (low + high) / 2;
    if (a[mid] > v) { high = mid - 1; }
    else if (a[mid] < v) { low = mid + 1; }  // low == 1
    else { r = mid; }
  }
  return r;
}
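The basis paths of binsearch can be exercised with a handful of inputs. This sketch uses our reconstruction of the slides' (garbled) code, so the loop condition and comparisons are our reading of the original.

```java
public class BinsearchPaths {
    static int binsearch(int[] a, int v) {
        int low = 0, high = a.length - 1, r = -1;
        while (low <= high && r == -1) {
            int mid = (low + high) / 2;
            if (a[mid] > v)      high = mid - 1;
            else if (a[mid] < v) low = mid + 1;
            else                 r = mid;
        }
        return r;
    }

    public static void main(String[] args) {
        // inputs chosen to drive execution down different paths
        System.out.println(binsearch(new int[] {4}, 7)); // -1: a[mid] < v branch
        System.out.println(binsearch(new int[] {4}, 2)); // -1: a[mid] > v branch
        System.out.println(binsearch(new int[] {4}, 4)); //  0: match branch
        System.out.println(binsearch(new int[] {}, 4));  // -1: loop never entered
    }
}
```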

62 Inspections

63 Inspection fundamentals inspections are typically conducted as a meeting inspections are not restricted to code –unlike testing –code review: (lightweight) inspection of source code code reviews and testing are complementary –testing can reveal defects buried in complex code –reviews can reveal many defects simultaneously code reviews can be extremely effective An inspection is an activity in which one or more people systematically examine source code or documentation for defects.

64 Test first or review first? It is important to review code before extensively testing it: reviews quickly get rid of many defects if the review leads to a redesign, the testing work has been wasted ⇒ growing consensus that it is most efficient to review code before any testing is done –even before developer testing –Google: no check-in without code review

65 Inspection principles inspect the most important documents of all types –code, design documents, test plans, requirements inspect only documents that are ready –inspecting a very poor document will miss defects choose an effective and efficient inspection team –two to five people –including experienced software engineers do not rush the inspection –200 lines of code per hour (including comments) –or ten pages of text per hour re-inspect when large (>20%) changes are made

66 Conducting an inspection meeting moderator calls meeting and distributes documents participants prepare for the meeting in advance –use checklist to guide looking for defects moderator explains the procedures (at beginning) –checks that everybody is prepared –keeps meetings short reviewers take turns explaining the contents of the document or code, without reading it verbatim –author not a reviewer –ensures that reviewers say what they see, not what the author intended to say everybody speaks up when they notice a defect

67 Conducting an inspection meeting avoid discussing how to fix defects –can be left to the author avoid discussing style issues –but enforce coding standards nobody should be blamed –inspection team members should feel they are all working together to create a better document keep managers away –allows the participants to speak openly

68 Example: data reference errors Is a variable referenced whose value is unset or uninitialized? For all array references, is each subscript value within the defined bounds of the corresponding dimension? For all array references, does each subscript have an integer value? For all references through pointer or reference variables, is the referenced storage currently allocated? When a storage area has alias names with differing attributes, does the data value in this area have the correct attributes when referenced via one of these names?

69 Example: data reference errors Does a variable’s value have a type or attribute other than that expected by the compiler? Are there any explicit or implicit addressing problems if, on the target machine, the units of storage allocation are smaller than the units of storage addressability? If a data structure is referenced in multiple procedures or subroutines, is the structure defined identically in each procedure? When indexing into a string, are the limits of the string exceeded? Are there any “off by one” errors in indexing operations or in subscript references to arrays?

70 Example: control flow errors If the program contains a multiway branch ( switch ), can the index variable ever exceed the number of branch possibilities? Will every loop eventually terminate? Will the program, module, or subroutine eventually terminate? Is it possible that, because of the conditions upon entry, a loop will never execute? If so, does this represent an oversight? while(notfound) { for(i = x; i < z; i++) {... } } What happens if notfound is initially false, or if x is greater than z ?

71 Example: control flow errors For a loop controlled by both iteration and a Boolean condition, what are the consequences of “loop fallthrough” (if the iteration completes and the Boolean condition is never triggered)? Are there any “off by one” errors (e.g., one too many or too few iterations)? Are statements grouped (begin/end) correctly? Are there any non-exhaustive decisions? –E.g., if an input parameter’s expected values are 1, 2, and 3, does the logic assume that it must be 3 if it is not 1 or 2? If so, is the assumption valid?

72 Static Analysis

