Basic Definitions: Testing

Name: Basic Definitions: Testing
Uploaded: 2017-07-22T18:44:33+00:00
Duration: PTM33S4

Basic Definitions: Testing
What is software testing?
  Running a program
  In order to find faults
    a.k.a. defects
    a.k.a. errors
    a.k.a. flaws
    a.k.a. faults
    a.k.a. BUGS
Hrm... that’s a lot of “a.k.a.”s
Let’s refine this terminology a bit

Faults, Errors, and Failures
Fault: a static flaw in a program
  What we usually think of as “a bug”
Error: a bad program state that results from a fault
  Not every fault always produces an error
Failure: an observable incorrect behavior of a program as a result of an error
  Not every error ever becomes visible

To Expose a Fault with a Test
Reachability: the test must actually reach and execute the location of the fault
Infection: the fault must actually corrupt the program state (produce an error)
Propagation: the error must persist and cause an incorrect output, that is, a failure

An Example
Find the fault

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n-1; i > 0; i--) {
        if (a[i] == x)
          return i;
      }
      return -1;
    }

An Example
Here’s a test case (findLast as listed on the first example slide):
  a = {}
  n = 0
  x = 2
Does not even reach the fault

An Example
Here’s another (same findLast, with the comparison a[i] == x):
  a = {3, 9, 4}
  n = 3
  x = 2
Reaches the fault
Infects state with error
But no failure

An Example
And finally (same findLast, with the comparison a[i] == x):
  a = {2, 9, 4}
  n = 3
  x = 2
Reaches the fault
Infects state with error
And fails: returns -1 instead of 0
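The three test cases can be replayed mechanically. The sketch below is an illustration rather than part of the original slides: `find_last_faulty` copies the running example, while `find_last_fixed` and `exposes_failure` are hypothetical names introduced here, the former assuming the loop bound should be `i >= 0`. Standard C has no zero-length arrays, so the empty-array case passes a one-element dummy with n = 0.

```c
#include <assert.h>

/* Running example with the fault: the loop stops at i > 0,
   so a[0] is never examined. */
int find_last_faulty(const int a[], int n, int x) {
    for (int i = n - 1; i > 0; i--) {
        if (a[i] == x) return i;
    }
    return -1;
}

/* Reference version, assuming the intended loop bound is i >= 0. */
int find_last_fixed(const int a[], int n, int x) {
    for (int i = n - 1; i >= 0; i--) {
        if (a[i] == x) return i;
    }
    return -1;
}

/* Returns 1 when the faulty version's output differs from the
   reference, i.e. when the test observes a failure. */
int exposes_failure(const int a[], int n, int x) {
    return find_last_faulty(a, n, x) != find_last_fixed(a, n, x);
}
```

Only an input whose sole match sits at index 0 makes the two versions disagree, which is why the first two tests pass even though the fault is present.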

Controllability and Observability
Goals for a test case:
  Reach a fault
  Produce an error
  Make the error visible as a failure
In order to make this easy the program must be controllable and observable
  Controllability: how easy it is to drive the program where we want to go
  Observability: how easy it is to tell what the program is doing

Design for Testability
If a program is not designed to be controllable and observable, it generally won’t be
We have to start preparing for testing before we write any code
Testing as an after-the-fact, ad hoc exercise is often limited by earlier design choices

Test-Driven Development
One way to design for testability is to write the test cases before the code
Idea arising from Extreme Programming and agile development
  Write automated test cases first
  Then write the code to satisfy tests
Helps focus attention on making software well-specified
Forces observability and controllability: you have to be able to handle the test cases you’ve already written (before deciding they were impractical)
Reduces temptation to tailor tests to idiosyncratic behaviors of implementation

Controllability: Simulation and Stubbing
A key to controllable code is effective simulation and stubbing
Simulation of low-level hardware devices through a clean driver interface
  Real hardware may be slow
  May be impossible/expensive to induce some hardware failure modes on real hardware
  Real hardware may be a limited resource
Stubbing for other routines and code
  Other code/modules may not be complete
  May be slow and irrelevant to test
  May need to simulate failure of other modules
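A clean driver interface with a simulated device behind it might look like the sketch below. Everything here is hypothetical (`page_device`, `sim_device`, `store_twice`, the sizes); it only illustrates how a simulation lets a test inject a write failure that would be hard to provoke on real hardware.

```c
#include <assert.h>
#include <string.h>

#define SIM_PAGES 4
#define SIM_PAGE_BYTES 8

/* Driver interface the code under test is written against. */
struct page_device {
    int (*write_page)(void *ctx, int page, const unsigned char *data);
    void *ctx;
};

/* Simulated device state, including a fault-injection knob. */
struct sim_device {
    unsigned char pages[SIM_PAGES][SIM_PAGE_BYTES];
    int writes_until_failure; /* negative: never fail; 0: writes fail */
};

/* Simulated write: 0 on success, -1 on a bad page or injected failure. */
int sim_write_page(void *ctx, int page, const unsigned char *data) {
    struct sim_device *sim = ctx;
    if (page < 0 || page >= SIM_PAGES) return -1;
    if (sim->writes_until_failure == 0) return -1;
    if (sim->writes_until_failure > 0) sim->writes_until_failure--;
    memcpy(sim->pages[page], data, SIM_PAGE_BYTES);
    return 0;
}

/* Code under test: writes one record to two pages and reports
   how many of the writes succeeded. */
int store_twice(struct page_device *dev, const unsigned char *record) {
    int ok = 0;
    if (dev->write_page(dev->ctx, 0, record) == 0) ok++;
    if (dev->write_page(dev->ctx, 1, record) == 0) ok++;
    return ok;
}
```

Setting `writes_until_failure` to 1 makes the first write succeed and the second fail, so a test can check how the caller copes with a partial write.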

Simulation and Stubbing: JPL Example
When testing JPL flash storage modules we rely on software simulation of flash devices
Real flash devices are slow
  Can’t do aggressive random testing
Real flash devices are expensive
  JPL only has a few boards, so there is constant competition to test on these
  Running hundreds of thousands of tests will wear the flash hardware out
Enables us to introduce rare hardware failures
  System resets, spontaneous bad blocks and write failures, etc.

Controllability: Downwards Scalability
Another important aspect of controllability is to make code “downwards scalable”
Many faults cause an error only in a corner case due to a resource limit
An effective strategy for finding errors is to reduce the resource limits
  Test a version of the program with very tight bounds
  Finding corner cases is easier if the corners are close together
Too many programs hard-code resource limits or make assumptions about resources unconnected to defined limits
  E.g., not checking the result of malloc
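As a minimal sketch of downwards scalability, the hypothetical `bounded_buffer` below takes its capacity as a run-time parameter and checks the result of malloc, so a test can shrink the limit to 2 and hit the "full" corner case on the third push.

```c
#include <assert.h>
#include <stdlib.h>

/* Buffer whose capacity is a parameter, not a hard-coded constant. */
struct bounded_buffer {
    int *items;
    size_t capacity;
    size_t count;
};

/* Returns 0 on success, -1 if the allocation fails. */
int buffer_init(struct bounded_buffer *b, size_t capacity) {
    b->items = malloc(capacity * sizeof *b->items);
    if (b->items == NULL) return -1;   /* malloc result is checked */
    b->capacity = capacity;
    b->count = 0;
    return 0;
}

/* Returns 0 on success, -1 when the buffer is full. */
int buffer_push(struct bounded_buffer *b, int value) {
    if (b->count == b->capacity) return -1;
    b->items[b->count++] = value;
    return 0;
}

/* Releases the storage and resets the bookkeeping. */
void buffer_free(struct bounded_buffer *b) {
    free(b->items);
    b->items = NULL;
    b->capacity = 0;
    b->count = 0;
}
```

With a production-sized capacity the full case might take thousands of operations to reach; with a capacity of 2 every short random test exercises it.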

Downwards Scalability: JPL Example
Flight flash hardware is usually a 1-4 GB device
  E.g., 64 blocks of 32 pages of 8192 bytes
We primarily test with much smaller “devices” (using software simulation)
  6 blocks of 4 pages of 64 bytes
  Forces flash file system to compact storage more often
  Tests assumptions about how space is used on flash
  Forces more multi-page writes and directory entries over multiple pages

Downwards Scalability: JPL Example
Easier to explore various combinations of states of blocks/pages of the device
  Used page
  Free page
  Dirty page
  Bad block

Controllability
Other important themes for controllability
Network/file access
  If program reads from the network or from remote files, this is hard to control
  Again, simulation and stubbing are key
System calls
  Similarly, reading the time from the operating system can be hard to control
  Simulation and stubbing: Operating System Abstraction Layer etc.
GUI control
  Allow scripted control of GUI elements so tests can be automated
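For the time-of-day case, one common way to get control is to pass the clock in rather than call the operating system directly. The sketch below assumes a hypothetical `clock_fn` type and `is_expired` routine; it is a tiny stand-in for an abstraction layer, not a full one.

```c
#include <assert.h>
#include <time.h>

/* Clock source supplied by the caller. */
typedef time_t (*clock_fn)(void);

/* Production clock: a thin wrapper around the standard time() call. */
time_t system_clock(void) {
    return time(NULL);
}

/* Code under test: has the deadline passed according to this clock? */
int is_expired(time_t deadline, clock_fn now) {
    return now() >= deadline;
}

/* Test stub: a clock whose value the test sets directly. */
time_t fake_now_value;

time_t fake_clock(void) {
    return fake_now_value;
}
```

Production code would call `is_expired(deadline, system_clock)`, while a test sets `fake_now_value` and passes `fake_clock` to reach the exact boundary instant.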

Observability: Assertions
Assertions improve observability by making (some) errors into failures
  Even if the effect of a fault doesn’t propagate, it may be visible if an assertion checks the state at the right time
Assertions also improve observability by making the error, rather than failure, visible
  Know how the state was corrupted directly, not just eventual effect
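Applied to the running example, a check on the loop's exit state makes the error visible even for the test a = {3, 9, 4}, x = 2, whose output is correct. The `CHECK` macro and `check_violations` counter are assumptions made for this sketch, so that a harness can inspect the result instead of aborting the way `assert` would.

```c
#include <assert.h>

/* Number of violated checks, inspected by the test harness. */
int check_violations = 0;

/* Records a violation instead of aborting the program. */
#define CHECK(cond) do { if (!(cond)) check_violations++; } while (0)

/* Faulty running example, instrumented on the not-found path. */
int find_last_checked(const int a[], int n, int x) {
    int i;
    for (i = n - 1; i > 0; i--) {
        if (a[i] == x) return i;
    }
    /* If every index was examined, the loop must have ended at i == -1. */
    CHECK(i == -1);
    return -1;
}
```

The function still returns the right answer for that test, but the counter shows the loop stopped at i == 0, pointing directly at the corrupted state.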

Observability: Invariant Checkers
Can extend the idea of assertions to writing “full” invariant checkers
  Do a crawl of code’s basic data structures
  Check various invariants that would be too expensive to check at runtime
Invariant checker can be written in an easy-to-use style (recursion, memory allocation, etc.)
  Won’t run on actual system
But be careful! If your invariant checker has a bug and changes the system state...
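A minimal sketch of such a checker, for a hypothetical doubly linked list: it crawls the structure and verifies that back links mirror forward links. Taking only `const` pointers is one way to reduce the risk the slide warns about, since the compiler then rejects accidental writes to the state being checked.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    struct node *prev;
    struct node *next;
    int value;
};

/* Crawls the list from head. Returns 1 if every link is consistent,
   0 otherwise. max_nodes bounds the walk so a cycle cannot hang it. */
int list_is_consistent(const struct node *head, size_t max_nodes) {
    const struct node *prev = NULL;
    size_t seen = 0;
    for (const struct node *cur = head; cur != NULL; cur = cur->next) {
        if (cur->prev != prev) return 0;   /* back link must mirror forward link */
        if (++seen > max_nodes) return 0;  /* too many nodes: likely a cycle */
        prev = cur;
    }
    return 1;
}
```

A test harness could call this after every operation on the list, which would be far too slow in the deployed system but is cheap on a small simulated configuration.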

Observability
Other important themes for observability
Logging
  Especially critical for GUI interfaces, to mirror GUI events in ordered parseable messages
Network/file access
  If program writes to the network or to remote files, this is hard to observe

Controllability & Observability: Memory Allocation
More extreme case: embedded code for mission or safety critical systems
  May be running without memory protection
  Dynamic allocation often forbidden
Design module to accept a static block allocated elsewhere, and only access this memory
  Controllability: allows us to introduce memory faults, simulate warm reboots
  Observability: allows us to easily instrument code with low-overhead checks to find memory safety violations during testing
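One possible shape for such a module is sketched below. The `arena` type, guard size and guard value are all assumptions for illustration: the module works only inside a caller-supplied block and reserves a few guard bytes at each end, so a cheap check can detect writes that stray outside the usable area.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GUARD_BYTES 4u
#define GUARD_VALUE 0xA5

/* Module state lives entirely inside a block supplied by the caller. */
struct arena {
    unsigned char *base;  /* start of the usable area */
    size_t size;          /* usable bytes between the two guard regions */
};

/* Reserves guard regions at both ends of the block.
   Returns 0 on success, -1 if the block is too small. */
int arena_init(struct arena *ar, unsigned char *block, size_t block_size) {
    if (block_size <= 2 * GUARD_BYTES) return -1;
    memset(block, GUARD_VALUE, GUARD_BYTES);
    memset(block + block_size - GUARD_BYTES, GUARD_VALUE, GUARD_BYTES);
    ar->base = block + GUARD_BYTES;
    ar->size = block_size - 2 * GUARD_BYTES;
    return 0;
}

/* Low-overhead check: 1 if both guard regions are intact, 0 otherwise. */
int arena_guards_intact(const struct arena *ar) {
    const unsigned char *low = ar->base - GUARD_BYTES;
    const unsigned char *high = ar->base + ar->size;
    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (low[i] != GUARD_VALUE || high[i] != GUARD_VALUE) return 0;
    }
    return 1;
}
```

Because the test owns the block, it can also corrupt bytes deliberately or re-run initialization on a block that still holds old contents, which is one way to approximate a memory fault or a warm reboot.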

Coverage
Literature of software testing is primarily concerned with various notions of coverage
Ammann and Offutt identify four basic kinds of coverage:
  Graph coverage
  Logic coverage
  Input space partitioning
  Syntax-based coverage

Graph Coverage
Cover all the nodes, edges, or paths of some graph related to the program
Examples:
  Statement coverage
  Branch coverage
  Path coverage
  Data flow (def-use) coverage
  Model-based testing coverage
  Many more: this is the most common kind of coverage, by far

Graph Coverage
Most FSM testing algorithms can be seen as graph coverage
  Consider VC – computing a spanning tree to nodes is standard graph exploration
Beizer: “find a graph and cover it”

Statement/Basic Block Coverage

    if (x < y) {
      y = 0;
      x = x + 1;
    }
    else
      x = y;

[Figure: control-flow graph of the if-then-else, nodes 1-4, edges labeled x < y and x >= y]

    if (x < y) {
      y = 0;
      x = x + 1;
    }

[Figure: control-flow graph of the if-then, nodes 1-3, edges labeled x < y and x >= y]

Statement coverage: cover every node of these graphs
Treat y = 0; x = x + 1 as one node because if one statement executes the other must also execute (code is a basic block)

Branch Coverage

    if (x < y) {
      y = 0;
      x = x + 1;
    }
    else
      x = y;

[Figure: control-flow graph of the if-then-else, nodes 1-4, edges labeled x < y and x >= y]

Branch coverage vs. statement coverage: same for if-then-else

    if (x < y) {
      y = 0;
      x = x + 1;
    }

[Figure: control-flow graph of the if-then, nodes 1-3, edges labeled x < y and x >= y]

But consider this if-then structure. For branch coverage we can’t just cover all nodes, but must cover all edges: get to node 3 both after 2 and without executing 2!

Path Coverage
How many paths through this code are there? Need one test case for each to get path coverage

    if (x < y) {
      y = 0;
      x = x + 1;
    }
    else
      x = y;

[Figure: control-flow graph with nodes 1-6: the if-then-else above (nodes 1-4) followed by a second x < y / x >= y branch (nodes 4-6) guarding y = 0; x = x + 1]

To get statement and branch coverage, we only need two test cases [figure: two highlighted paths]
Path coverage needs two more [figure: two further highlighted paths]
In general: exponential in the number of conditional branches!
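The growth in path count can be seen on a simplified sketch with two independent conditionals in sequence. The functions `path_taken` and `path_count` are hypothetical; the return value of `path_taken` encodes which combination of branches was followed.

```c
#include <assert.h>

/* Two independent conditionals in sequence. Bit 0 of the result is set
   if the first branch is taken, bit 1 if the second is taken. */
int path_taken(int p, int q) {
    int path = 0;
    if (p) path |= 1;
    if (q) path |= 2;
    return path;
}

/* Paths through k sequential, independent if statements: 2^k.
   Only meaningful while k is smaller than the bit width of unsigned long. */
unsigned long path_count(unsigned k) {
    return 1UL << k;
}
```

The inputs (0, 0) and (1, 1) already take every branch both ways, yet they follow only two of the four paths; (1, 0) and (0, 1) are needed for the rest. When the conditions depend on each other, some of the 2^k combinations may be infeasible.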

Data Flow Coverage

    x = 3;            // Def(x)
    y = 3;            // Def(y)
    if (w) {
      x = y + 2;      // Def(x), Use(y)
    }
    if (z) {
      y = x - 2;      // Def(y), Use(x)
    }
    n = x + y;        // Use(x), Use(y)

[Figure: control-flow graph of this code with nodes numbered 1-7 and branch edges labeled w, !w, z and !z; x = 3 is node 1 and n = x + y is node 7]

Annotate program with locations where variables are defined and used (very basic static analysis)
Def-use pair coverage requires executing all possible pairs of nodes where a variable is first defined and then used, without any intervening re-definitions
E.g., this path covers the pair where x is defined at 1 and used at 7 [figure: highlighted path]
But this path does NOT [figure: highlighted path]
May be many pairs, some not actually executable

Logic Coverage
What if, instead of:

    if (x < y) {
      y = 0;
      x = x + 1;
    }

we have:

    if (((a > b) || G) && (x < y)) {
      y = 0;
      x = x + 1;
    }

[Figure: control-flow graph, nodes 1-3; the edge into node 2 (y = 0; x = x + 1) is labeled ((a > b) || G) && (x < y), the edge that bypasses it is labeled ((a <= b) && !G) || (x >= y)]

Now, branch coverage will guarantee that we cover all the edges, but does not guarantee we will do so for all the different logical reasons
We want to test the logic of the guard of the if statement

Active Clause Coverage
Predicate: ((a > b) or G) and (x < y)

    Row  (a > b)  G  (x < y)  Predicate
    1    T        F  T        T
    2    F        F  T        F
    3    F        T  T        T
    4    F        F  T        F   (duplicate of row 2)
    5    T        T  T        T
    6    T        T  F        F

Rows 1-2: with these values for G and (x < y), (a > b) determines the value of the predicate
Rows 3-4: with these values for (a > b) and (x < y), G determines the value of the predicate
Rows 5-6: with these values for (a > b) and G, (x < y) determines the value of the predicate
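Whether a clause "determines" the predicate can be checked mechanically: flip that clause while holding the others fixed and see whether the predicate's value changes. The helper names below are hypothetical, and each clause is passed as a plain truth value.

```c
#include <assert.h>

/* The predicate from the slide: ((a > b) or G) and (x < y). */
int predicate(int a_gt_b, int g, int x_lt_y) {
    return (a_gt_b || g) && x_lt_y;
}

/* Returns 1 if clause number major (0, 1 or 2) determines the predicate
   when the other two clauses hold the values given in v[]. */
int determines(int major, const int v[3]) {
    int t[3] = { v[0], v[1], v[2] };
    int f[3] = { v[0], v[1], v[2] };
    t[major] = 1;   /* major clause true */
    f[major] = 0;   /* major clause false */
    return predicate(t[0], t[1], t[2]) != predicate(f[0], f[1], f[2]);
}
```

For example, with G false and (x < y) true, clause 0 determines the predicate, but once G is true the value of (a > b) no longer matters.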

Input Domain Partitioning
Partition scheme q of domain D
The partition q defines a set of blocks, $B_q = \{b_1, b_2, \ldots, b_Q\}$
The partition must satisfy two properties:
  blocks must be pairwise disjoint (no overlap): $b_i \cap b_j = \emptyset, \ \forall i \neq j, \ b_i, b_j \in B_q$
  together the blocks cover the domain D (complete): $\bigcup_{b \in B_q} b = D$
Coverage then means using at least one input from each of $b_1, b_2, b_3, \ldots$

Input Domain Partitioning
Some subtleties here…
What’s wrong with this partition of file contents?
  b1: Sorted ascending file
  b2: Sorted descending file
  b3: Neither sorted ascending nor sorted descending
Required properties: $b_i \cap b_j = \emptyset, \ \forall i \neq j, \ b_i, b_j \in B_q$ and $\bigcup_{b \in B_q} b = D$
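One way to probe a proposed partition is to count how many blocks a given input falls into; a valid partition always yields exactly one. The sketch below models a file as an array of integers and assumes "sorted" allows equal neighbors, which is an interpretation rather than something the slide states.

```c
#include <assert.h>
#include <stddef.h>

/* 1 if no element is smaller than its predecessor. */
int is_ascending(const int a[], size_t n) {
    for (size_t i = 1; i < n; i++) {
        if (a[i - 1] > a[i]) return 0;
    }
    return 1;
}

/* 1 if no element is larger than its predecessor. */
int is_descending(const int a[], size_t n) {
    for (size_t i = 1; i < n; i++) {
        if (a[i - 1] < a[i]) return 0;
    }
    return 1;
}

/* Number of blocks (b1, b2, b3) that contain the input. */
int blocks_containing(const int a[], size_t n) {
    int asc = is_ascending(a, n);
    int desc = is_descending(a, n);
    return asc + desc + (!asc && !desc);
}
```

Under that assumption, a one-element file (or a file of identical values) lands in both b1 and b2, so the blocks are not pairwise disjoint.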

Syntax-Based Coverage
Based on mutation testing (a pet topic of Ammann and Offutt, who are heavily into this research area)
A bit different kind of creature than the other coverages we’ve looked at
Idea: generate many syntactic mutants of the original program
Coverage: how many mutants does a test suite kill (detect)?

Mutating Our Buggy Program

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n-1; i > 0; i--) {
        if (a[i] == x)
          return i;
      }
      return -1;
    }

Mutant #1

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n; i > 0; i--) {      // mutated: n instead of n-1
        if (a[i] == x)
          return i;
      }
      return -1;
    }

Mutant #2

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n-1; i > 0; i--) {
        if (a[i] == x)
          return i;
      }
      return 0;                      // mutated: 0 instead of -1
    }

Mutant #3

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n-1; i > 0; i--) {
        if (a[i] != x)               // mutated: != instead of ==
          return i;
      }
      return -1;
    }

Mutant #4

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n-1; i > 0; i--) {
        if (a[i] == n)               // mutated: n instead of x
          return i;
      }
      return -1;
    }

Mutant #5: Wait, this one’s the fix!

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n-1; i >= 0; i--) {   // mutated: >= instead of >
        if (a[i] == x)
          return i;
      }
      return -1;
    }

Syntax-Based Coverage
[Figure: program P surrounded by the mutants of P]
100% coverage means you kill all the mutants with your test suite
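Counting killed mutants can be sketched as follows, treating a mutant as killed when some test makes its output differ from the original program's. The harness names (`find_fn`, `test_case`, `is_killed`, `killed_count`) are hypothetical, and only mutants #2, #3 and #5 are included: #1 reads a[n], which is outside the array, and #4 is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*find_fn)(const int a[], int n, int x);

/* Original (buggy) program P. */
int p_original(const int a[], int n, int x) {
    for (int i = n - 1; i > 0; i--) if (a[i] == x) return i;
    return -1;
}

/* Mutant #2: returns 0 instead of -1. */
int mutant2(const int a[], int n, int x) {
    for (int i = n - 1; i > 0; i--) if (a[i] == x) return i;
    return 0;
}

/* Mutant #3: != instead of ==. */
int mutant3(const int a[], int n, int x) {
    for (int i = n - 1; i > 0; i--) if (a[i] != x) return i;
    return -1;
}

/* Mutant #5: i >= 0 instead of i > 0 (the fix). */
int mutant5(const int a[], int n, int x) {
    for (int i = n - 1; i >= 0; i--) if (a[i] == x) return i;
    return -1;
}

struct test_case { const int *a; int n; int x; };

/* 1 if some test distinguishes the mutant from the original. */
int is_killed(find_fn mutant, const struct test_case *tests, size_t count) {
    for (size_t t = 0; t < count; t++) {
        if (mutant(tests[t].a, tests[t].n, tests[t].x) !=
            p_original(tests[t].a, tests[t].n, tests[t].x)) return 1;
    }
    return 0;
}

/* Number of mutants killed by the suite. */
int killed_count(const find_fn *mutants, size_t m,
                 const struct test_case *tests, size_t count) {
    int killed = 0;
    for (size_t i = 0; i < m; i++) killed += is_killed(mutants[i], tests, count);
    return killed;
}
```

A suite that kills mutant #5 necessarily contains a test on which the original program and the fix disagree, which is exactly a test that exposes the original fault.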

Generation vs. Recognition
Generation of tests based on coverage means producing a test suite to achieve a certain level of coverage
  As you can imagine, generally very hard
  Consider: generating a suite for 100% statement coverage easily reaches “solving the halting problem” level
  Obviously hard for, say, mutant-killing
Recognition means seeing what level of coverage an existing test suite reaches

Coverage and Subsumption
Sometimes one coverage approach subsumes another
  If you achieve 100% coverage of criterion A, you are guaranteed to satisfy B as well
  For example, consider node and edge coverage (there’s a subtlety here, actually: can you spot it?)
What does this mean?
  Unfortunately, not a great deal
  If test suite X satisfies “stronger” criterion A and test suite Y satisfies “weaker” criterion B, Y may still reveal bugs that X does not!
  For example, consider our running example and statement vs. branch coverage
It means we should take coverage with a grain of salt, for one thing

Testing “for” Coverage
Never seek to improve coverage just for the sake of increasing coverage
  Well, unless it’s a command from on high
Coverage is not the goal
  Finding failures that expose faults is the goal
  No amount of coverage will prove that the program cannot fail
“Program testing can be used to show the presence of bugs, but never to show their absence!” – E. Dijkstra, Notes On Structured Programming

The Purpose of Testing
“Program testing can be used to show the presence of bugs, but never to show their absence!” – E. Dijkstra, Notes On Structured Programming
Dijkstra meant this as a criticism of testing and an argument in favor of more disciplined and total approaches (proving programs correct)
But he also points out what testing is good for: exposing errors
Coverage is valuable if and only if test sets with higher coverage are more likely to expose failures

The Purpose of Testing
“Program testing can be used to show the presence of bugs”
When we first start “testing,” we often want to “see that the program works”
  Try out some scenarios and watch the program “do its stuff”
  Surprised (annoyed) when (if) the program fails
This is not really testing: testing is not the same as a demonstration
Aim to break (your) code, if it can be broken

Levels of Testing
Adapted from Beizer, by Ammann and Offutt
  Level 0: Testing is debugging
  Level 1: Testing is to show the program works
  Level 2: Testing is to show the program doesn’t work
  Level 3: Testing is not to prove anything specific, but to reduce risk of using program
  Level 4: Testing is a mental discipline that helps develop higher quality software

What’s So Good About Coverage?
Consider a fault that causes failure every time the code is executed
Don’t execute the code: cannot possibly find the fault!
That’s a pretty good argument for statement coverage

    int findLast (int a[], int n, int x) {
      // Returns index of last element
      // in a equal to x, or -1 if no
      // such. n is length of a
      int i;
      for (i = n-1; i >= 0; i--) {
        if (a[i] == x)
          return i;
      }
      return 0;
    }

What’s So Good About Coverage?
We should have an argument for any kind of coverage:
  “If I don’t cover this, then there is more chance I’ll miss a fault like that”
  Backed with empirical data, preferably!
(Shown alongside the same findLast variant as on the previous slide, the one ending in return 0.)

Return to Our Example

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n-1; i > 0; i--) {
        if (a[i] == x)
          return i;
      }
      return -1;
    }

Let’s write a tester for this version of the program (back to the first off-by-one bug)
Forget for a moment that we know what the bug is!

Return to Our Example
(Same findLast as on the previous slide: loop bound i > 0, comparison a[i] == x.)
What kind of coverage might we want to think about when testing this code?

Return to Our Example
What kind of coverage does this tester exploit?

    #include <assert.h>
    #include <stdio.h>

    #define N 5 // 5 is "big enough"?

    // random_assign and print_array are helpers assumed to be defined elsewhere
    int testFind () {
      int a[N];
      int p, i;
      for (p = 0; p < N; p++) {
        random_assign(a, N);
        a[p] = 3;                      // plant the target at position p
        for (i = p + 1; i < N; i++) {  // remove any later 3, so p is the last one
          if (a[i] == 3)
            a[i] = a[i] - 1;
        }
        printf ("TEST: findLast({");
        print_array(a, N);
        printf ("}, %d, 3)", N);
        assert (findLast(a, N, 3) == p);
      }
      return 0;
    }
