Basic Definitions: Testing


Basic Definitions: Testing
- What is software testing? Running a program in order to find faults, a.k.a. defects, a.k.a. errors, a.k.a. flaws, a.k.a. BUGS.
- Hrm... that's a lot of "a.k.a."s.
- Let's refine this terminology a bit.

Faults, Errors, and Failures
- Fault: a static flaw in a program. This is what we usually think of as "a bug."
- Error: a bad program state that results from a fault. Not every fault always produces an error.
- Failure: an observable incorrect behavior of a program as a result of an error. Not every error ever becomes visible.

To Expose a Fault with a Test
- Reachability: the test must actually reach and execute the location of the fault.
- Infection: the fault must actually corrupt the program state (produce an error).
- Propagation: the error must persist and cause an incorrect output, that is, a failure.

An Example: Find the Fault
    int findLast (int a[], int n, int x) {
        // Returns index of last element in a
        // equal to x, or -1 if no such.
        // n is length of a
        int i;
        for (i = n-1; i > 0; i--) {
            if (a[i] == x)
                return i;
        }
        return -1;
    }

An Example
- Here's a test case for findLast: a = {}, n = 0, x = 2
- Does not even reach the fault.

An Example
- Here's another: a = {3, 9, 4}, n = 3, x = 2
- Reaches the fault.
- Infects state with error.
- But no failure.

An Example
- And finally: a = {2, 9, 4}, n = 3, x = 2
- Reaches the fault.
- Infects state with error.
- And fails: returns -1 instead of 0.
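The three test cases above can be replayed mechanically. The sketch below is one possible arrangement, not the slides' own harness: `find_last_faulty` copies the faulty loop from the example, while `find_last_fixed` and `exposes_failure` are hypothetical helpers introduced here to act as an oracle.

```c
/* Faulty version from the example: the loop condition i > 0 means
   a[0] is never examined. */
int find_last_faulty(const int a[], int n, int x) {
    for (int i = n - 1; i > 0; i--) {
        if (a[i] == x)
            return i;
    }
    return -1;
}

/* Hypothetical oracle: the same routine with the loop condition repaired. */
int find_last_fixed(const int a[], int n, int x) {
    for (int i = n - 1; i >= 0; i--) {
        if (a[i] == x)
            return i;
    }
    return -1;
}

/* Returns 1 when the faulty output differs from the oracle's output,
   i.e. when the test case turns the fault into an observable failure. */
int exposes_failure(const int a[], int n, int x) {
    return find_last_faulty(a, n, x) != find_last_fixed(a, n, x);
}
```

Only the input whose sole match sits at index 0 makes the two versions disagree; the other two inputs produce the correct answer despite the fault.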

Controllability and Observability
- Goals for a test case:
  - Reach a fault.
  - Produce an error.
  - Make the error visible as a failure.
- In order to make this easy, the program must be controllable and observable.
- Controllability: how easy it is to drive the program where we want to go.
- Observability: how easy it is to tell what the program is doing.

Design for Testability
- If a program is not designed to be controllable and observable, it generally won't be.
- We have to start preparing for testing before we write any code.
- Testing as an after-the-fact, ad hoc exercise is often limited by earlier design choices.

Test-Driven Development
- One way to design for testability is to write the test cases before the code.
- Idea arising from Extreme Programming and agile development:
  - Write automated test cases first.
  - Then write the code to satisfy the tests.
- Helps focus attention on making software well-specified.
- Forces observability and controllability: you have to be able to handle the test cases you've already written (before deciding they were impractical).
- Reduces temptation to tailor tests to idiosyncratic behaviors of the implementation.

Controllability: Simulation and Stubbing
- A key to controllable code is effective simulation and stubbing.
- Simulation of low-level hardware devices through a clean driver interface:
  - Real hardware may be slow.
  - It may be impossible or expensive to induce some hardware failure modes on real hardware.
  - Real hardware may be a limited resource.
- Stubbing for other routines and code:
  - Other code/modules may not be complete.
  - They may be slow and irrelevant to the test.
  - We may need to simulate failure of other modules.
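A minimal sketch of the "clean driver interface" idea, assuming a made-up device: every name here (`flash_dev`, `sim_init`, `store_record`, the page geometry) is hypothetical and chosen only for illustration. The code under test talks to the device through a function pointer, so a test can swap in a simulated device and inject a write failure on demand.

```c
#include <string.h>

#define SIM_PAGE_SIZE 8   /* assumed, deliberately tiny geometry */
#define SIM_NUM_PAGES 4

/* Hypothetical driver interface: the code under test only calls write_page. */
typedef struct flash_dev {
    int (*write_page)(struct flash_dev *dev, int page, const unsigned char *data);
    unsigned char pages[SIM_NUM_PAGES][SIM_PAGE_SIZE];
    int fail_on_write;   /* fail the Nth write; 0 means never fail */
    int writes_seen;
} flash_dev;

/* Simulated device: stores pages in memory and fails when told to. */
static int sim_write_page(flash_dev *dev, int page, const unsigned char *data) {
    dev->writes_seen++;
    if (page < 0 || page >= SIM_NUM_PAGES)
        return -1;
    if (dev->fail_on_write != 0 && dev->writes_seen == dev->fail_on_write)
        return -1;   /* injected hardware failure */
    memcpy(dev->pages[page], data, SIM_PAGE_SIZE);
    return 0;
}

void sim_init(flash_dev *dev, int fail_on_write) {
    memset(dev, 0, sizeof *dev);
    dev->write_page = sim_write_page;
    dev->fail_on_write = fail_on_write;
}

/* Code under test: write to the primary page, fall back to a spare page.
   Returns the page actually used, or -1 if both writes fail. */
int store_record(flash_dev *dev, const unsigned char *rec, int primary, int spare) {
    if (dev->write_page(dev, primary, rec) == 0)
        return primary;
    if (dev->write_page(dev, spare, rec) == 0)
        return spare;
    return -1;
}
```

With `sim_init(&dev, 1)` the first write fails, which exercises the fallback path that would be hard to trigger on real hardware.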

Simulation and Stubbing: JPL Example
- When testing JPL flash storage modules we rely on software simulation of flash devices.
- Real flash devices are slow:
  - Can't do aggressive random testing.
- Real flash devices are expensive:
  - JPL only has a few boards, so there is constant competition to test on these.
  - Running hundreds of thousands of tests will wear the flash hardware out.
- Simulation enables us to introduce rare hardware failures:
  - System resets, spontaneous bad blocks and write failures, etc.

Controllability: Downwards Scalability
- Another important aspect of controllability is to make code "downwards scalable."
- Many faults cause an error only in a corner case due to a resource limit.
- An effective strategy for finding errors is to reduce the resource limits:
  - Test a version of the program with very tight bounds.
  - Finding corner cases is easier if the corners are close together.
- Too many programs hard-code resource limits or make assumptions about resources unconnected to defined limits.
  - E.g., not checking the result of malloc.
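As an illustration of downwards scalability, the sketch below takes its resource limit as a parameter instead of hard-coding it. The `ring` type and its functions are hypothetical, not from the slides; the point is that a test can instantiate the structure with a capacity of 2, so the "full" and "wrap-around" corners are reached after a handful of operations.

```c
#include <stddef.h>

/* Hypothetical bounded queue whose capacity is set by the caller. */
typedef struct {
    int *slots;
    size_t capacity;   /* resource limit is a parameter, not a constant */
    size_t head;
    size_t count;
} ring;

void ring_init(ring *r, int *storage, size_t capacity) {
    r->slots = storage;
    r->capacity = capacity;
    r->head = 0;
    r->count = 0;
}

/* Returns 0 on success, -1 when the queue is full. */
int ring_push(ring *r, int v) {
    if (r->count == r->capacity)
        return -1;
    r->slots[(r->head + r->count) % r->capacity] = v;
    r->count++;
    return 0;
}

/* Returns 0 on success, -1 when the queue is empty. */
int ring_pop(ring *r, int *out) {
    if (r->count == 0)
        return -1;
    *out = r->slots[r->head];
    r->head = (r->head + 1) % r->capacity;
    r->count--;
    return 0;
}
```

A production build might pass a capacity in the thousands, while the test build passes 2 and hits the same boundary logic immediately.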

Downwards Scalability: JPL Example
- Flight flash hardware is usually a 1-4 GB device.
  - E.g., 64 blocks of 32 pages of 8192 bytes.
- We primarily test with much smaller "devices" (using software simulation):
  - 6 blocks of 4 pages of 64 bytes.
  - Forces the flash file system to compact storage more often.
  - Tests assumptions about how space is used on flash.
  - Forces more multi-page writes and directory entries over multiple pages.

Downwards Scalability: JPL Example
- Easier to explore various combinations of states of blocks/pages of the device:
  - Used page
  - Free page
  - Dirty page
  - Bad block

Controllability
- Other important themes for controllability:
- Network/file access:
  - If a program reads from the network or from remote files, this is hard to control.
  - Again, simulation and stubbing are key.
- System calls:
  - Similarly, reading the time from the operating system can be hard to control.
  - Simulation and stubbing: Operating System Abstraction Layer, etc.
- GUI control:
  - Allow scripted control of GUI elements so tests can be automated.

Observability: Assertions
- Assertions improve observability by making (some) errors into failures.
  - Even if the effect of a fault doesn't propagate, it may be visible if an assertion checks the state at the right time.
- Assertions also improve observability by making the error, rather than the failure, visible.
  - We know how the state was corrupted directly, not just the eventual effect.

Observability: Invariant Checkers
- Can extend the idea of assertions to writing "full" invariant checkers:
  - Do a crawl of the code's basic data structures.
  - Check various invariants that would be too expensive to check at runtime.
- The invariant checker can be written in an easy-to-use style (recursion, memory allocation, etc.) because it won't run on the actual system.
- But be careful! If your invariant checker has a bug and changes the system state...
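A small sketch of an invariant checker, assuming a doubly linked list as the "basic data structure" (the `node`, `list`, and `list_check` names are hypothetical). The checker takes `const` pointers throughout, which is one way to guard against the danger noted above: a checker that accidentally modifies the state it inspects.

```c
#include <stddef.h>

typedef struct node {
    struct node *prev;
    struct node *next;
    int value;
} node;

typedef struct {
    node *head;
    node *tail;
    size_t count;
} list;

/* Read-only crawl of the list. Returns 1 if every invariant holds, else 0.
   Invariants checked: back links mirror forward links, tail is the last
   node reached, and the stored count matches the number of nodes. */
int list_check(const list *l) {
    size_t seen = 0;
    const node *prev = NULL;
    for (const node *n = l->head; n != NULL; n = n->next) {
        if (n->prev != prev)
            return 0;              /* broken back link */
        if (++seen > l->count)
            return 0;              /* too many nodes, possibly a cycle */
        prev = n;
    }
    if (l->tail != prev)
        return 0;                  /* tail does not point at the last node */
    return seen == l->count;
}
```

A test harness could call `list_check` after every operation, which makes a corrupted link visible at the point of corruption rather than at some later crash.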

Observability
- Other important themes for observability:
- Logging:
  - Especially critical for GUI interfaces, to mirror GUI events in ordered, parseable messages.
- Network/file access:
  - If a program writes to the network or to remote files, this is hard to observe.

Controllability & Observability: Memory Allocation
- More extreme case: embedded code for mission- or safety-critical systems.
  - May be running without memory protection.
  - Dynamic allocation often forbidden.
- Design the module to accept a static block allocated elsewhere, and only access this memory.
- Controllability: allows us to introduce memory faults and simulate warm reboots.
- Observability: allows us to easily instrument code with low-overhead checks to find memory safety violations during testing.
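One way to picture the "static block allocated elsewhere" design is a bump allocator over a caller-supplied buffer. This is a sketch under assumptions: the `arena` type, the guard value `0xA5`, and the tail check are hypothetical, and alignment is ignored, so it suits byte buffers only. Because the module touches nothing outside the block, a test can pre-fill the block with a known pattern and later check that untouched bytes are still intact.

```c
#include <stddef.h>
#include <string.h>

#define GUARD_BYTE 0xA5   /* assumed fill pattern for unallocated memory */

typedef struct {
    unsigned char *base;
    size_t size;
    size_t used;
} arena;

/* The block is owned by the caller (typically a static array). */
void arena_init(arena *a, void *block, size_t size) {
    a->base = block;
    a->size = size;
    a->used = 0;
    memset(block, GUARD_BYTE, size);
}

/* Returns n zeroed bytes from the block, or NULL if it is exhausted.
   No alignment is applied; this sketch is for byte buffers only. */
void *arena_alloc(arena *a, size_t n) {
    if (n > a->size - a->used)
        return NULL;
    unsigned char *p = a->base + a->used;
    memset(p, 0, n);
    a->used += n;
    return p;
}

/* Low-overhead check: bytes never handed out must still hold the guard. */
int arena_tail_intact(const arena *a) {
    for (size_t i = a->used; i < a->size; i++) {
        if (a->base[i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}
```

Calling `arena_tail_intact` after each test step detects a write past the end of the most recent allocation without any help from hardware memory protection.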

Coverage
- The literature of software testing is primarily concerned with various notions of coverage.
- Ammann and Offutt identify four basic kinds of coverage:
  - Graph coverage
  - Logic coverage
  - Input space partitioning
  - Syntax-based coverage

Graph Coverage
- Cover all the nodes, edges, or paths of some graph related to the program.
- Examples:
  - Statement coverage
  - Branch coverage
  - Path coverage
  - Data flow (def-use) coverage
  - Model-based testing coverage
  - Many more: this is the most common kind of coverage, by far.

Graph Coverage
- Most FSM testing algorithms can be seen as graph coverage.
- Consider VC: computing a spanning tree to nodes is standard graph exploration.
- Beizer: "find a graph and cover it."

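Statement/Basic Block Coverage
    if (x < y) {
        y = 0;
        x = x + 1;
    }
    else
        x = y;
[Figure: control-flow graph with nodes 1-4. Node 1 branches on x < y to the block "y = 0; x = x + 1" and on x >= y to "x = y"; both branches join at node 4.]
    if (x < y) {
        y = 0;
        x = x + 1;
    }
[Figure: control-flow graph with nodes 1-3. Node 1 branches on x < y to node 2, the block "y = 0; x = x + 1", and on x >= y directly to node 3.]
- Statement coverage: cover every node of these graphs.
- "y = 0; x = x + 1" is treated as one node because if one statement executes, the other must also execute (the code is a basic block).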

Branch Coverage
    if (x < y) {
        y = 0;
        x = x + 1;
    }
    else
        x = y;
- Branch coverage vs. statement coverage: the same for if-then-else.
    if (x < y) {
        y = 0;
        x = x + 1;
    }
[Figure: control-flow graph with nodes 1-3. Node 1 branches on x < y to node 2, the block "y = 0; x = x + 1", and on x >= y directly to node 3.]
- But consider this if-then structure. For branch coverage we can't just cover all nodes, but must cover all edges: get to node 3 both after 2 and without executing 2!
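The difference can be made concrete by instrumenting the if-then fragment by hand. In this sketch the `cov` counters and the `step` wrapper are hypothetical additions; the explicit `else` exists only to count the edge that skips the body, since the original has no statement there.

```c
/* Hand-rolled coverage counters for the if-then example. */
typedef struct {
    int body_hits;   /* times the block "y = 0; x = x + 1" ran */
    int skip_hits;   /* times the edge that bypasses the block was taken */
} cov;

/* The if-then fragment, instrumented. */
void step(int *x, int *y, cov *c) {
    if (*x < *y) {
        c->body_hits++;
        *y = 0;
        *x = *x + 1;
    } else {
        c->skip_hits++;   /* no statement in the original, only an edge */
    }
}

/* Statement coverage only needs the body to have executed. */
int statement_covered(const cov *c) {
    return c->body_hits > 0;
}

/* Branch coverage needs both outgoing edges of the decision. */
int branch_covered(const cov *c) {
    return c->body_hits > 0 && c->skip_hits > 0;
}
```

A single test with x < y already satisfies `statement_covered`, but `branch_covered` stays false until a second test with x >= y takes the skipping edge.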

Path Coverage
- How many paths through this code are there? We need one test case for each to get path coverage.
    if (x < y) {
        y = 0;
        x = x + 1;
    }
    else
        x = y;
[Figure: control-flow graph with nodes 1-6: the if-then-else above (nodes 1-4) followed by an if-then on x < y with body "y = 0; x = x + 1" (nodes 4-6).]
- To get statement and branch coverage, we only need two test cases: 1 2 4 5 6 and 1 3 4 6.
- Path coverage needs two more, 1 2 4 6 and 1 3 4 5 6, for four paths in total.
- In general: exponential in the number of conditional branches!

Data Flow Coverage
    x = 3;            // node 1: Def(x)
    y = 3;            // node 2: Def(y)
    if (w) {          // node 3
        x = y + 2;    // node 4: Def(x), Use(y)
    }
    if (z) {          // node 5
        y = x - 2;    // node 6: Def(y), Use(x)
    }
    n = x + y;        // node 7: Use(x), Use(y)
- Annotate the program with locations where variables are defined and used (very basic static analysis).
- Def-use pair coverage requires executing all possible pairs of nodes where a variable is first defined and then used, without any intervening re-definitions.
- E.g., this path covers the pair where x is defined at 1 and used at 7: 1 2 3 5 6 7.
- But this path does NOT: 1 2 3 4 5 6 7 (x is redefined at node 4).
- There may be many pairs, some not actually executable.

Logic Coverage
- What if, instead of:
    if (x < y) {
        y = 0;
        x = x + 1;
    }
- we have:
    if (((a > b) || G) && (x < y)) {
        y = 0;
        x = x + 1;
    }
[Figure: control-flow graph with nodes 1-3. The edge from node 1 into the body (node 2) is labeled ((a > b) || G) && (x < y); the edge from node 1 to node 3 that skips the body is labeled ((a <= b) && !G) || (x >= y).]
- Now, branch coverage will guarantee that we cover all the edges, but does not guarantee we will do so for all the different logical reasons.
- We want to test the logic of the guard of the if statement.

Active Clause Coverage
Predicate: ((a > b) or G) and (x < y)

    Row  (a > b)  G  (x < y)  Predicate
    1    T        F  T        T
    2    F        F  T        F
    3    F        T  T        T
    4    F        F  T        F    (duplicate of row 2)
    5    T        T  T        T
    6    T        T  F        F

- Rows 1-2: with these values for G and (x < y), (a > b) determines the value of the predicate.
- Rows 3-4: with these values for (a > b) and (x < y), G determines the value of the predicate.
- Rows 5-6: with these values for (a > b) and G, (x < y) determines the value of the predicate.
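"Determines" can be checked mechanically: a clause determines the predicate, for given values of the other clauses, when flipping that clause alone flips the result. The sketch below encodes the slide's predicate with each clause as a boolean; the function names are hypothetical.

```c
#include <stdbool.h>

/* The predicate ((a > b) || G) && (x < y), with each clause as a boolean:
   agb stands for (a > b), g for G, xly for (x < y). */
bool pred(bool agb, bool g, bool xly) {
    return (agb || g) && xly;
}

/* A clause determines the predicate if flipping it alone flips the result. */
bool agb_determines(bool g, bool xly) {
    return pred(true, g, xly) != pred(false, g, xly);
}

bool g_determines(bool agb, bool xly) {
    return pred(agb, true, xly) != pred(agb, false, xly);
}

bool xly_determines(bool agb, bool g) {
    return pred(agb, g, true) != pred(agb, g, false);
}
```

For example, `agb_determines(false, true)` holds, matching the first pair of rows, while `agb_determines(true, true)` does not, because G alone already makes the disjunction true.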

Input Domain Partitioning
- Partition scheme q of domain D.
- The partition q defines a set of blocks, $B_q = \{b_1, b_2, \ldots, b_Q\}$.
- The partition must satisfy two properties:
  - Blocks must be pairwise disjoint (no overlap): $b_i \cap b_j = \emptyset, \ \forall i \neq j, \ b_i, b_j \in B_q$
  - Together the blocks cover the domain D (complete): $\bigcup_{b \in B_q} b = D$
- Coverage then means using at least one input from each of b1, b2, b3, ...

Input Domain Partitioning
- Some subtleties here...
- What's wrong with this partition of file contents?
  - b1: Sorted ascending file
  - b2: Sorted descending file
  - b3: Neither sorted ascending nor sorted descending
- Recall the two required properties: $b_i \cap b_j = \emptyset, \ \forall i \neq j, \ b_i, b_j \in B_q$ and $\bigcup_{b \in B_q} b = D$.
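One candidate answer can be checked in code, modeling a file as an array of integers (an assumption made here for brevity; the helper names are hypothetical). A valid partition puts every input in exactly one block, so counting the blocks an input matches exposes any overlap.

```c
#include <stddef.h>

/* Non-strict ascending order: no element is smaller than its predecessor. */
int is_ascending(const int *v, size_t n) {
    for (size_t i = 1; i < n; i++) {
        if (v[i - 1] > v[i])
            return 0;
    }
    return 1;
}

/* Non-strict descending order: no element is larger than its predecessor. */
int is_descending(const int *v, size_t n) {
    for (size_t i = 1; i < n; i++) {
        if (v[i - 1] < v[i])
            return 0;
    }
    return 1;
}

/* Number of blocks b1, b2, b3 the input falls into.
   A valid partition yields exactly 1 for every input. */
int blocks_matched(const int *v, size_t n) {
    int asc = is_ascending(v, n);
    int desc = is_descending(v, n);
    return asc + desc + (!asc && !desc);
}
```

Under these definitions a one-element input, or one whose elements are all equal, is both ascending and descending, so b1 and b2 are not disjoint.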

Syntax-Based Coverage
- Based on mutation testing (a pet topic of Ammann and Offutt, who are heavily into this research area).
- A bit of a different kind of creature than the other coverages we've looked at.
- Idea: generate many syntactic mutants of the original program.
- Coverage: how many mutants does a test suite kill (detect)?

Mutating Our Buggy Program
    int findLast (int a[], int n, int x) {
        // Returns index of last element in a
        // equal to x, or -1 if no such.
        // n is length of a
        int i;
        for (i = n-1; i > 0; i--) {
            if (a[i] == x)
                return i;
        }
        return -1;
    }

Mutant #1 (loop starts at n instead of n-1)
        for (i = n; i > 0; i--) {

Mutant #2 (returns 0 instead of -1 when x is not found)
        return 0;

Mutant #3 (comparison changed from == to !=)
            if (a[i] != x)

Mutant #4 (compares against n instead of x)
            if (a[i] == n)

Mutant #5: Wait, this one's the fix! (loop condition changed from i > 0 to i >= 0)
        for (i = n-1; i >= 0; i--) {

Syntax-Based Coverage
[Figure: program P surrounded by the mutants of P.]
- 100% coverage means you kill all the mutants with your test suite.
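A rough sketch of how a mutation score could be computed for the running example. It assumes a mutant counts as killed when some test makes its output differ from the original program's output; the harness names (`find_fn`, `test_case`, `count_killed`) are hypothetical. Mutant #1 is left out because it reads a[n], which is out of bounds.

```c
typedef int (*find_fn)(const int a[], int n, int x);

typedef struct {
    const int *a;
    int n;
    int x;
} test_case;

/* Original (buggy) program P from the slides. */
int find_last_p(const int a[], int n, int x) {
    for (int i = n - 1; i > 0; i--)
        if (a[i] == x) return i;
    return -1;
}

/* Mutant #2: returns 0 instead of -1. */
int mutant_return_zero(const int a[], int n, int x) {
    for (int i = n - 1; i > 0; i--)
        if (a[i] == x) return i;
    return 0;
}

/* Mutant #3: == replaced by !=. */
int mutant_not_equal(const int a[], int n, int x) {
    for (int i = n - 1; i > 0; i--)
        if (a[i] != x) return i;
    return -1;
}

/* Mutant #5: i > 0 replaced by i >= 0 (this one is actually the fix). */
int mutant_ge_zero(const int a[], int n, int x) {
    for (int i = n - 1; i >= 0; i--)
        if (a[i] == x) return i;
    return -1;
}

/* A mutant is killed if any test distinguishes it from the original. */
int is_killed(find_fn original, find_fn mutant,
              const test_case *suite, int num_tests) {
    for (int t = 0; t < num_tests; t++) {
        const test_case *tc = &suite[t];
        if (original(tc->a, tc->n, tc->x) != mutant(tc->a, tc->n, tc->x))
            return 1;
    }
    return 0;
}

/* Mutation coverage numerator: how many mutants the suite kills. */
int count_killed(find_fn original, const find_fn *mutants, int num_mutants,
                 const test_case *suite, int num_tests) {
    int killed = 0;
    for (int m = 0; m < num_mutants; m++)
        killed += is_killed(original, mutants[m], suite, num_tests);
    return killed;
}
```

With only the test ({3, 9, 4}, 3, 2) the suite kills two of the three mutants; adding ({2, 9, 4}, 3, 2) kills the third, and that is exactly the test that exposes the original fault.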

Generation vs. Recognition
- Generation of tests based on coverage means producing a test suite to achieve a certain level of coverage.
  - As you can imagine, generally very hard.
  - Consider: generating a suite for 100% statement coverage easily reaches "solving the halting problem" level.
  - Obviously hard for, say, mutant-killing.
- Recognition means seeing what level of coverage an existing test suite reaches.

Coverage and Subsumption
- Sometimes one coverage approach subsumes another.
  - If you achieve 100% coverage of criterion A, you are guaranteed to satisfy B as well.
  - For example, consider node and edge coverage (there's a subtlety here, actually; can you spot it?).
- What does this mean? Unfortunately, not a great deal.
  - If test suite X satisfies the "stronger" criterion A and test suite Y satisfies the "weaker" criterion B, Y may still reveal bugs that X does not!
  - For example, consider our running example and statement vs. branch coverage.
- It means we should take coverage with a grain of salt, for one thing.

Testing "for" Coverage
- Never seek to improve coverage just for the sake of increasing coverage.
  - Well, unless it's a command from on high.
- Coverage is not the goal.
  - Finding failures that expose faults is the goal.
  - No amount of coverage will prove that the program cannot fail.
- "Program testing can be used to show the presence of bugs, but never to show their absence!" (E. Dijkstra, Notes On Structured Programming)

The Purpose of Testing
- "Program testing can be used to show the presence of bugs, but never to show their absence!" (E. Dijkstra, Notes On Structured Programming)
- Dijkstra meant this as a criticism of testing and an argument in favor of more disciplined and total approaches (proving programs correct).
- But he also points out what testing is good for: exposing errors.
- Coverage is valuable if and only if test sets with higher coverage are more likely to expose failures.

The Purpose of Testing
- "Program testing can be used to show the presence of bugs."
- When we first start "testing," we often want to "see that the program works."
  - Try out some scenarios and watch the program "do its stuff."
  - Surprised (annoyed) when (if) the program fails.
- This is not really testing: testing is not the same as a demonstration.
- Aim to break (your) code, if it can be broken.

Levels of Testing
Adapted from Beizer, by Ammann and Offutt:
- Level 0: Testing is debugging.
- Level 1: Testing is to show the program works.
- Level 2: Testing is to show the program doesn't work.
- Level 3: Testing is not to prove anything specific, but to reduce the risk of using the program.
- Level 4: Testing is a mental discipline that helps develop higher quality software.

What's So Good About Coverage?
- Consider a fault that causes failure every time the code is executed.
- Don't execute the code: cannot possibly find the fault!
- That's a pretty good argument for statement coverage.
    int findLast (int a[], int n, int x) {
        // Returns index of last element
        // in a equal to x, or -1 if no
        // such. n is length of a
        int i;
        for (i = n-1; i >= 0; i--) {
            if (a[i] == x)
                return i;
        }
        return 0;
    }

What's So Good About Coverage?
- We should have an argument for any kind of coverage: "If I don't cover this, then there is more chance I'll miss a fault like that."
- Backed with empirical data, preferably!

Return to Our Example
    int findLast (int a[], int n, int x) {
        // Returns index of last element in a
        // equal to x, or -1 if no such.
        // n is length of a
        int i;
        for (i = n-1; i > 0; i--) {
            if (a[i] == x)
                return i;
        }
        return -1;
    }
- Let's write a tester for this version of the program (back to the first off-by-one bug).
- Forget for a moment that we know what the bug is!

Return to Our Example
- What kind of coverage might we want to think about when testing findLast?

Return to Our Example
- What kind of coverage does this tester exploit?
    #define N 5 // 5 is "big enough"?
    int testFind () {
        int a[N];
        int p, i;
        for (p = 0; p < N; p++) {
            random_assign(a, N);
            a[p] = 3;
            // Remove any 3s after position p, so p is the last match
            for (i = p + 1; i < N; i++) {
                if (a[i] == 3)
                    a[i] = a[i] - 1;
            }
            printf ("TEST: findLast({");
            print_array(a, N);
            printf ("}, %d, 3)", N);
            assert (findLast(a, N, 3) == p);
        }
        return 0;
    }