1 Welcome to CS 362 Applied Software Engineering Dr. Alex Groce (KEC 3067) Testing, debugging, running programs Design for testability Implementation (actual.


1 1 Welcome to CS 362 Applied Software Engineering Dr. Alex Groce (KEC 3067) Testing, debugging, running programs Design for testability Implementation (actual coding!) Maintenance and bug tracking “Compile-time” testing – warnings, static analysis, build systems Testing, testing, testing

2 2 Today Some general background Topics of the class Testing project Basic definitions Black box testing (FSM) algorithms Why is testing difficult, in theory and practice?

3 3 Before we start What do I know about testing, anyway? I’ve written programs and tested them So have most of you, I would bet Split my time at JPL (before I came to OSU) between model checking & testing research E.g., testing the file systems that will be used in the Mars Science Laboratory – JPL’s next big Mars mission

4 4 Black box (Finite State Machine) testing Design for testability Coverage measures Random testing Constraint-based testing Debugging and test case minimization Using model checkers for testing Coverage revisited (“small model property”) Topics in Testing We’ll Cover

5 5 Static program analysis Including by the compiler Coding standards/rules/ inspections Other Topics of Interest

6 6 Read All About It Books I like that have something important to say about this side of software engineering (though none of these are about testing): The Practice of Programming, Kernighan and Pike Programming Pearls, Bentley Why Programs Fail: A Guide to Systematic Debugging, Zeller Code Complete, McConnell The Mythical Man-Month, Brooks

7 7 Read All About It Book about testing (our textbook, IF you want it) Introduction to Software Testing, Ammann and Offutt I like it myself Recommended by colleagues who’ve taught classes on testing (and are first-rate testing researchers) Book is thorough and cleverly organized, provokes some real thought about how to test programs NOT REQUIRED, marginally recommended I will only loosely follow this book As it is, we’ll take more of a “hit the highlights” approach – More concentrated on automated techniques: random testing, constraint-based testing, and model checking Won’t stop me from using some of their slides for areas they cover well We’ll also cover some non-testing topics from time to time

8 8 Read All About It Practical info on how to test, from James Whittaker (Microsoft, now Google): How to Break Software How to Break Software Security

9 9 Testing Project Start with: interface and a chunk of (buggy) code Implement one function Start writing tests! Have a “clean” suite that will run for your/others’ code Apply tests to other students’ code and submit bug reports in the repository Write periodic test reports informing “management” what you’ve done

10 10 Testing Project Grading criteria Effectiveness of your testing approach: coverage, bugs found, depth/interest, originality of ideas! Quality of test/bug reports Can I figure out how you tested the system? Can I figure out what wasn’t tested? Can I figure out how reliable you think the code is?

11 11 Testing Project Expectations You can program in C You can figure out possibly poorly written specifications (a key software engineering skill) Or know when to ask someone who can! You can (learn to) use makefiles / build system “he who learns to play the harp learns to play by playing it” - Aristotle, Metaphysics, Book IX

12 12 Testing Project Oh, right. What are we implementing and testing? A card game Well defined notion of correctness “Specifications” not intended for implementation, but intended to be unambiguous Games are a good example of formal specifications: no one wants to have to “make up” the rule or interpret ambiguity in the middle of playing a game

13 13 Testing Project What are we implementing and testing? Dominion (+ expansions) Card game published by Rio Grande Games Players start with their own draw decks Buy cards to add to decks (using cards) Some cards worth points if in deck Play this card & you can draw a card and gain two actions (playing a card is an action)

14 14 Grading & Other Admin Stuff Project: 75% Late “midterm”: 15% (Possibly-in-class) quizzes/exercises/homework: 10% My office hours: Wed, 11-12:30 KEC 3067 TA: Amin Alipour, Anirban Roy

15 15 Show of Hands Who here has: Debugged a program Written a unit test Written a test for a full program Tested someone else’s program Submitted a bug report on a program Debugged someone else’s program Used a source control system (svn, cvs, etc.) Used a static analysis tool

16 16 Basic Definitions: Testing What is software testing? Running a program In order to find faults a.k.a. defects a.k.a. errors a.k.a. flaws a.k.a. faults a.k.a. BUGS

17 17 Bugs “It has been just so in all of my inventions. The first step is an intuition, and comes with a burst, then difficulties arise—this thing gives out and [it is] then that 'Bugs'—as such little faults and difficulties are called—show themselves and months of intense watching, study and labor are requisite... ” – Thomas Edison “an analyzing process must equally have been performed in order to furnish the Analytical Engine with the necessary operative data; and that herein may also lie a possible source of error. Granted that the actual mechanism is unerring in its processes, the cards may give it wrong orders. ” – Ada, Countess Lovelace (notes on Babbage’s Analytical Engine) Hopper’s “bug” (moth stuck in a relay on an early machine)

18 18 Testing What isn’t software testing? Purely static analysis: examining a program’s source code or binary in order to find bugs, but not executing the program Good stuff, and very important, but it’s not testing We’ll get back to this in a future class Fuzzy borderline: if we only symbolically execute the program For this class, we’ll call it testing when the program actually runs (but maybe in a virtual machine)

19 19 Why Testing? Ideally: we prove code correct, using formal mathematical techniques (with a computer, not chalk) Extremely difficult: for some trivial programs (100 lines) and many small (5K lines) programs Simply not practical to prove correctness in most cases – often not even for safety or mission critical code

20 20 Why Testing? Nearly ideally: use symbolic or abstract model checking to prove the system correct Automatically extracts a mathematical abstraction from a system Proves properties over all possible executions In practice, can work well for very simple properties (“this program never crashes in this particular way”), but can’t handle complex properties (“this is a working file system”) Doesn’t work well for programs with complex data structures (like a file system)

21 21 As a last resort… … we can actually run the program, to see if it works This is software testing Always necessary, even when you can prove correctness – because the proof is seldom directly tied to the actual code that runs “Beware of bugs in the above code; I have only proved it correct, not tried it” – Knuth

22 22 Why Does Testing Matter? NIST report, “The Economic Impacts of Inadequate Infrastructure for Software Testing” (2002) Inadequate software testing costs the US alone between $22 and $59 billion annually Better approaches could cut this amount in half Major failures: Ariane 5 explosion, Mars Polar Lander, Intel’s Pentium FDIV bug Insufficient testing of safety-critical software can cost lives: THERAC-25 radiation machine: 3 dead We want our programs to be reliable Testing is how, in most cases, we find out if they are [Image captions: Mars Polar Lander crash site?; THERAC-25 design; Ariane 5: exception-handling bug forced self-destruct on maiden flight (64-bit to 16-bit conversion; about $370 million lost)]

23 23 Testing and Monitoring In this class, we’ll look at which executions of a program to run I’ll call this problem “the” testing problem Second problem: how do we know if an execution reveals a bug? Key question when monitoring deployed programs to handle faults or send in bug reports from the field I’ll (mostly) take this for granted: we have a reference model or assertions to check

24 24 Example: File System Testing File system is a library, called by other components of the flight software Accepts a fixed set of operations that manipulate files:
Operation | Result
mkdir (“/eng”, …) | SUCCESS
mkdir (“/data”, …) | SUCCESS
creat (“/data/image01”, …) | SUCCESS
creat (“/eng/fsw/code”, …) | ENOENT
mkdir (“/data/telemetry”, …) | SUCCESS
unlink (“/data/image01”) | SUCCESS
[Figure: file system tree with root /, directories /eng and /data, and image01 and telemetry under /data]

25 25 Example: File System Testing Easy to detect many errors: we have access to many working file systems, and can just compare results Choose operation F Perform F on Tested FS Perform F on Reference (if applicable) Compare return values Compare error codes Compare file systems Check invariants (inject a fault?) (in this unusual case, the oracle problem isn’t really there!)
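The compare-against-a-reference loop on this slide can be sketched roughly as follows. This is only an illustration under loud assumptions: `RefFS` is a toy in-memory model with simplified error codes, `BuggyFS` is a hypothetical implementation with a deliberately seeded fault, and the path list is made up for the example. None of these names come from the flight software.

```python
import random


class RefFS:
    """Toy in-memory reference model (hypothetical, simplified error codes)."""

    def __init__(self):
        self.dirs = {"/"}
        self.files = set()

    def _parent(self, path):
        # "/eng" -> "/", "/eng/fsw/code" -> "/eng/fsw"
        return path.rsplit("/", 1)[0] or "/"

    def mkdir(self, path):
        if path in self.dirs or path in self.files:
            return "EEXIST"
        if self._parent(path) not in self.dirs:
            return "ENOENT"
        self.dirs.add(path)
        return "SUCCESS"

    def creat(self, path):
        if path in self.dirs:
            return "EISDIR"
        if self._parent(path) not in self.dirs:
            return "ENOENT"
        self.files.add(path)
        return "SUCCESS"

    def unlink(self, path):
        if path in self.dirs:
            return "EISDIR"
        if path not in self.files:
            return "ENOENT"
        self.files.remove(path)
        return "SUCCESS"

    def snapshot(self):
        return (frozenset(self.dirs), frozenset(self.files))


class BuggyFS(RefFS):
    """Hypothetical file system under test with one seeded fault."""

    def mkdir(self, path):
        # Seeded bug: the parent-directory check is missing.
        if path in self.dirs or path in self.files:
            return "EEXIST"
        self.dirs.add(path)
        return "SUCCESS"


PATHS = ["/eng", "/data", "/eng/fsw", "/data/image01", "/eng/fsw/code"]
OPERATIONS = ["mkdir", "creat", "unlink"]


def differential_test(tested, reference, steps, seed):
    """Apply the same random operations to both; return the first mismatch."""
    rng = random.Random(seed)
    for step in range(steps):
        op = rng.choice(OPERATIONS)
        path = rng.choice(PATHS)
        got = getattr(tested, op)(path)
        want = getattr(reference, op)(path)
        # Compare return values, then compare the resulting file systems.
        if got != want or tested.snapshot() != reference.snapshot():
            return (step, op, path, got, want)
    return None
```

Calling `differential_test(BuggyFS(), RefFS(), 200, seed)` for a handful of seeds should report a step where the tested `mkdir` succeeds and the reference returns `ENOENT`. The reference plays the role of the oracle, which is why the slide calls this case unusual.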

26 26 Example: File System Testing How hard would it be to just try “all” the possibilities? Consider only core 7 operations ( mkdir, rmdir, creat, open, close, read, write ) Most of these take either a file name or a numeric argument, or both Even for a “reasonable” (but not provably safe) limitation of the parameters, there are $266^{10}$ executions of length 10 to try Not a realistic possibility (unless we have $10^{12}$ years to test)
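To get a feel for these numbers, the arithmetic can be checked directly. The sketch below assumes 266 choices at each of the 10 steps (which is how the count on the slide reads) and computes what test throughput would be needed to finish within the quoted time budget. The variable names are illustrative only.

```python
# Assumption: 266 possible (operation, argument) choices at each of 10 steps.
CHOICES_PER_STEP = 266
SEQUENCE_LENGTH = 10
SECONDS_PER_YEAR = 365.25 * 24 * 3600

executions = CHOICES_PER_STEP ** SEQUENCE_LENGTH

# Time budget quoted on the slide: 10^12 years.
budget_seconds = 10 ** 12 * SECONDS_PER_YEAR
required_rate = executions / budget_seconds

print(f"executions to try: {executions:.3e}")
print(f"executions per second needed for a 10^12-year budget: {required_rate:,.0f}")
```

Even at tens of thousands of executions per second, exhaustive testing of length-10 sequences stays far out of reach.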

27 27 The Testing Problem This is a primary topic of this class: what “questions” do we pose to the software, i.e., How do we select a small set of executions out of a very large set of executions? Fundamental problem of software testing research and practice An open (and essentially unsolvable, in the general case) problem

28 28 The Testing Problem / Terms This is not a class in the management or even the basic practices of testing Hard, important problem But not the focus of this class This class is going to focus on state-of-the-art automated approaches Using tools To catch the bugs that you don’t catch with basic practices I will briefly cover some basic terms of testing and testing management today, then we’ll mostly dive into “How To Test It” at a more technical level

29 29 Terms: Unit, Integration, System Testing Stages of testing Unit testing is the first phase, done by developers of modules Integration testing combines unit tested modules and tests how they interact System testing tests a whole program to make sure it meets requirements “Design testing” is testing prototypes or very abstract models before implementation – seldom mentioned, but when possible it can save your bacon Exhaustive model checking may be possible at this stage

30 30 Terms: Regression Testing Regression testing Changes can break code, reintroduce old bugs Things that used to work may stop working (e.g., because of another “fix”) – software regresses Usually a set of cases that have failed (& then succeeded) in the past Finding small regressions is an ongoing research area – analyze dependencies “... as a consequence of the introduction of new bugs, program maintenance requires far more system testing.... Theoretically, after each fix one must run the entire batch of test cases previously run against the system, to ensure that it has not been damaged in an obscure way. In practice, such regression testing must indeed approximate this theoretical idea, and it is very costly." - Brooks, The Mythical Man-Month

31 31 Terms: The Oracle Problem The oracle problem How to know if a test fails If the oracle says every execution is good, why bother running the program? Some obvious, easily automated approaches: The program probably shouldn’t crash Assertions shouldn’t be violated Automatable, but more difficult to apply: Differential testing (McKeeman, etc.) – when you have another program, likely correct, that does the same thing, just compare outputs over same inputs Last resort, not automatable: Hand inspection of executions (oracle: a magical source of truth, often cryptic, given by the gods)

32 32 Terms: Test (Case) vs. Test Suite Test (case): one execution of the program, that may expose a bug Test suite: a set of executions of a program, grouped together A test suite is made of test cases Tester: a program that generates tests Line gets blurry when testing functions, not programs – especially with persistent state

33 33 Terms: Black Box Testing Black box testing Treats a program or system as a black box That is, testing that does not look at source code or internal structure of the system Send a program a stream of inputs, observe the outputs, decide if the system passed or failed the test Abstracts away the internals – a useful perspective for integration and system testing Sometimes you don’t have access to source code, and can make little use of object code True black box? Access only over a network

34 34 Terms: White Box Testing White box testing Opens up the box! (also known as glass box, clear box, or structural testing) Use source code (or other structure beyond the input/output spec.) to design test cases Brings us to the idea of coverage

35 35 Terms: Coverage Coverage measures or metrics Abstraction of “what a test suite tests” in a structural sense Best explained by giving examples Common measures: Statement coverage A.k.a line coverage or basic block coverage Which statements execute in a test suite Decision coverage Which boolean expressions in control structures evaluated to both true and false during suite execution Path coverage Which paths through a program’s control flow graph are taken in the test suite
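As a small illustration of how statement coverage and decision coverage can differ, here is a hand-instrumented sketch. The function, the labels `S1` to `S3` and `D1`, and the recording scheme are all made up for this example. Real coverage tools instrument code automatically.

```python
STATEMENTS = {"S1", "S2", "S3"}
DECISIONS = {("D1", True), ("D1", False)}


def abs_value(x, covered):
    """Absolute value, instrumented by hand to record what executes."""
    covered.add("S1")           # statement: evaluate the condition
    outcome = x < 0
    covered.add(("D1", outcome))  # decision: record which way it went
    if outcome:
        covered.add("S2")       # statement: negate
        x = -x
    covered.add("S3")           # statement: return
    return x


def coverage(inputs):
    """Return (statement coverage, decision coverage) for a list of inputs."""
    covered = set()
    for x in inputs:
        abs_value(x, covered)
    statement = len(covered & STATEMENTS) / len(STATEMENTS)
    decision = len(covered & DECISIONS) / len(DECISIONS)
    return statement, decision
```

With the single input `-3`, every statement runs but the condition is only ever true, so `coverage([-3])` gives full statement coverage and half decision coverage. Adding a non-negative input such as `3` covers the false outcome as well.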

36 36 Terms: Coverage Measures In general, used to measure the quality of a test suite Even in cases where the suite was designed for some other purpose (such as testing lots of different use scenarios) Not always a very good measure of suite quality, but “better than nothing” We “open the box” in white box testing partly in order to look at (and design tests to achieve) coverage We’ll cover coverage in much more detail

37 37 Terms: Mutation Testing A mutation of a program is a version of the program with one or more random changes Mutation testing is another way to measure the quality of a test suite Ammann and Offutt call it syntax-based coverage Idea: generate a large number of mutants Run the test suite on these If few mutants are detected, the test suite may not be very good Difficulties Cost of testing many versions of a program How to generate mutants (operators) In principle, can subsume many other forms of coverage
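A minimal sketch of the idea, assuming a toy `clamp` function and three hand-written mutants (real mutation tools generate mutants automatically by applying mutation operators; everything named here is hypothetical):

```python
def clamp(x, lo, hi):
    """Original program: restrict x to the range [lo, hi]."""
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


def mutant_flip_lower(x, lo, hi):
    if x > lo:          # mutation: `x < lo` became `x > lo`
        return lo
    if x > hi:
        return hi
    return x


def mutant_wrong_upper_result(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return lo       # mutation: `return hi` became `return lo`
    return x


def mutant_flip_upper(x, lo, hi):
    if x < lo:
        return lo
    if x < hi:          # mutation: `x > hi` became `x < hi`
        return hi
    return x


MUTANTS = [mutant_flip_lower, mutant_wrong_upper_result, mutant_flip_upper]

# Each test case is (arguments, expected result).
WEAK_SUITE = [((5, 0, 10), 5)]
STRONG_SUITE = WEAK_SUITE + [((-1, 0, 10), 0), ((11, 0, 10), 10)]


def killed(suite):
    """Count mutants for which at least one test case fails."""
    count = 0
    for mutant in MUTANTS:
        if any(mutant(*args) != expected for args, expected in suite):
            count += 1
    return count
```

Here `killed(WEAK_SUITE)` leaves one mutant alive because no test exercises the upper bound, while `killed(STRONG_SUITE)` detects all three. A surviving mutant points at behavior the suite never checks.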

38 38 Black Box (Finite State Machine) Testing “The latter situation was not very easy to see, because it is very dark inside a black box.” – Henry Petroski (The Essential Engineer)

39 39 Black box(FSM) testing Let’s step back from software testing Let’s look at a simpler model Finite state machines Software is a finite state machine What? Software is a Turing machine, right? Lego “Turing machine” Only with an infinite tape. That is, only if your software has access to infinite memory.

40 40 Black box (FSM) testing With static memory allocation or with limited dynamic allocation nothing is infinite Even if you add in disk or network storage We don’t have infinite electrons, much less memory So software systems are finite state machines, in reality Don’t you feel better now? No more late nights worrying about the halting problem! [Figure caption: there are only ~$10^{79}$ of these little guys, y’know?]

41 41 Black box (FSM) testing Theoretical issues aside, why do we care about testing finite state machines? Abstraction: designs can often be best understood as finite-state machines String processing/searching Protocols – communication, cache coherence, etc. Control component of any discrete system Automatic abstraction: Tools that take systems and produce (coarse) finite state abstractions

42 42 Black box (FSM) testing Useful for modeling aspects of many designs FD = open (“/foo”) close(FD) read(FD, buf, nbytes) write(FD, buf, nbytes)

43 43 Very Simple FSM Model FSM is a tuple $(S, \Sigma, T, I)$: $S$ is a set of states, $\Sigma$ is the input alphabet, $T$ is the transition relation $T \subseteq S \times \Sigma \times S$, $I \in S$ is the initial state Further assume: Machine is deterministic $T$ is a (partial) function $S \times \Sigma \to S$ Given an input from $\Sigma$, machine either Outputs 0 (if no transition) Or outputs 1 and takes the transition to $s'$ [Figure: example FSM with transitions labeled a, b, c, d and sample input/output pairs a/1, b/1, c/0]
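This model is small enough to write down directly. The sketch below stores the partial transition function as a dictionary and follows the output convention from the slide (1 when a transition is taken, 0 when none exists, in which case the machine stays put). The class name and the three-state example machine are hypothetical, not the machine from the figure.

```python
class FSM:
    """Deterministic FSM whose transition relation is a partial function."""

    def __init__(self, transitions, initial):
        # transitions maps (state, symbol) -> next state
        self.transitions = dict(transitions)
        self.initial = initial
        self.state = initial

    def reset(self):
        """Return to the initial state."""
        self.state = self.initial

    def step(self, symbol):
        """Output 1 and move if a transition exists, otherwise output 0."""
        key = (self.state, symbol)
        if key not in self.transitions:
            return 0
        self.state = self.transitions[key]
        return 1

    def run(self, inputs):
        """Reset, feed the whole input sequence, and collect the outputs."""
        self.reset()
        return [self.step(symbol) for symbol in inputs]


# Hypothetical three-state machine, for illustration only.
toy = FSM({("s0", "a"): "s1", ("s1", "b"): "s2", ("s2", "a"): "s0"}, "s0")
```

For example, `toy.run("ab")` takes both transitions, while `toy.run("ba")` is refused on the first input and accepted on the second.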

44 44 Conformance Testing How do we test finite state machines? Let’s say we have Known FSM A Know all states and transitions Unknown FSM B (same alphabet) Can only perform experiments How do we tell if A = B? Known as the conformance testing or equivalence testing problem As stated, we cannot solve the problem Why? [Figure: machine A, with transitions labeled a, b, c, d]

45 45 Combination Lock Machine How many states does B have? If we don’t know, we can never be sure it is the same machine as A B is a combination lock: looks like A unless we input exact sequence “b u g” – in which case it deadlocks [Figure: Machine A (edge a-z) and Machine B (edges a, c-z; b; a-t, v-z; u; a-f, h-z; g)]

46 46 Combination Lock Machine Even if we know upper limit $n$ on B’s size, for alphabet of size $|\Sigma|$ It takes $|\Sigma|^n$ tests to check equivalence to this particular A This pathological case imposes some limits on conformance testing in general [Figure: Machine A (edge a-z) and Machine B (edges a, c-z; b; a-t, v-z; u; a-f, h-z; g)]
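The combination lock can be simulated to see how rare the distinguishing inputs are. This is a sketch based on one reading of the figure: machine A accepts every letter, and machine B behaves identically except that a wrong letter resets its progress and the exact sequence "bug" leaves it deadlocked. The function names are invented for the example.

```python
from itertools import product
import string

ALPHABET = string.ascii_lowercase


def run_a(inputs):
    """Machine A: one state, a self-loop on every letter, so it always outputs 1."""
    return [1 for _ in inputs]


def run_b(inputs, secret="bug"):
    """Machine B: like A until the exact secret has been entered, then deadlocked."""
    outputs = []
    progress = 0  # how many letters of the secret have been matched so far
    for symbol in inputs:
        if progress == len(secret):
            outputs.append(0)  # deadlocked: no transition for any input
            continue
        outputs.append(1)
        # A wrong letter sends the machine back to the start (as read from the figure).
        progress = progress + 1 if symbol == secret[progress] else 0
    return outputs


def distinguishing_tests(length):
    """Exhaustively list input sequences of this length on which A and B differ."""
    return [
        "".join(seq)
        for seq in product(ALPHABET, repeat=length)
        if run_a(seq) != run_b(seq)
    ]
```

B here has four states (three lock positions plus the deadlock state). No sequence of length 3 separates the machines, and among all $26^4$ sequences of length 4 only those that begin with "bug" do, so a tester that does not know the combination has little choice but to enumerate.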

47 47 Conformance Testing (VC Algorithm) Algorithm due to Vasilevskii and Chow for conformance testing Assumptions A is minimized, has $m$ states B has no more than $n$ states A, B both have a reliable reset We can start from initial state at will Worst-case complexity: $O(n^2 m |\Sigma|^{n-m+1})$ I’ll cover this quickly and informally, skipping over the sub-algorithms

48 48 Conformance Testing (VC Algorithm) First, we find a path to each state of A Typically, we compute a spanning tree For example, by a depth first search (DFS) Call this set P [Figure: machine A and its spanning tree] Read the paths off of the tree: a, aa, aad, ab (plus the empty path for the initial state)
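Computing P amounts to a graph search that remembers how each state was first reached. The sketch below uses breadth-first search rather than the DFS mentioned on the slide, simply because it yields shortest access sequences. The five-state machine is a hypothetical stand-in, chosen so that its access sequences match the paths listed on the slide; it is not claimed to be the machine in the figure.

```python
from collections import deque

# Hypothetical machine: (state, symbol) -> next state.
TRANSITIONS = {
    ("s0", "a"): "s1",
    ("s1", "a"): "s2",
    ("s1", "b"): "s4",
    ("s2", "d"): "s3",
    ("s3", "a"): "s2",
    ("s4", "c"): "s0",
}


def access_sequences(transitions, initial):
    """Map each reachable state to one input sequence that reaches it."""
    paths = {initial: ""}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for (source, symbol), target in sorted(transitions.items()):
            # Only tree edges extend the set: skip states already reached.
            if source == state and target not in paths:
                paths[target] = paths[state] + symbol
                queue.append(target)
    return paths


P = access_sequences(TRANSITIONS, "s0")
```

The values of `P` are the access sequences. The empty string stands for the initial state, which needs no input to reach.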

49 49 Conformance Testing (VC Algorithm) Next, compute a characterizing (or distinguishing) set for A Set $W$ of input sequences such that $\forall s, s' \in S$: $s \neq s' \Rightarrow \exists w \in W$ such that the output for $w$ from $s$ is not equal to the output for $w$ from $s'$ i.e., we can use W to tell what state we’re in

50 50 Conformance Testing (VC Algorithm) Next, compute a characterizing (or distinguishing) set for A For example, W for A might be: {aa, b} aa: 11, 10, 01, 00, 10 Distinguishes all but these two states Which are distinguished by b (1 vs. 0) Can we find another (better?) set? [Figure: machine A]
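Whether a candidate W really is a characterizing set can be checked mechanically: compute each state's outputs on every sequence in W and look for states with identical signatures. The sketch below does this for a hypothetical five-state machine. It is not the slide's machine A, so the set that works here differs from the slide's {aa, b}.

```python
# Hypothetical machine: (state, symbol) -> next state.
TRANSITIONS = {
    ("s0", "a"): "s1",
    ("s1", "a"): "s2",
    ("s1", "b"): "s4",
    ("s2", "d"): "s3",
    ("s3", "a"): "s2",
    ("s4", "c"): "s0",
}
STATES = ["s0", "s1", "s2", "s3", "s4"]


def outputs(transitions, state, word):
    """Outputs produced when `word` is applied starting from `state`."""
    result = []
    for symbol in word:
        key = (state, symbol)
        if key in transitions:
            state = transitions[key]
            result.append(1)
        else:
            result.append(0)  # no transition: output 0 and stay in place
    return tuple(result)


def undistinguished_pairs(transitions, states, W):
    """Pairs of states that produce identical outputs on every sequence in W."""
    signature = {
        s: tuple(outputs(transitions, s, w) for w in W) for s in states
    }
    ordered = sorted(states)
    return [
        (s, t)
        for i, s in enumerate(ordered)
        for t in ordered[i + 1:]
        if signature[s] == signature[t]
    ]
```

For this stand-in machine, `["aa", "b"]` leaves one pair of states indistinguishable, and adding `"d"` separates them. An empty result means W is a characterizing set.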

51 51 Conformance Testing (VC Algorithm) Now we can compute $Z = W \cup \Sigma \cdot W \cup \Sigma^2 \cdot W \cup \dots \cup \Sigma^{n-m} \cdot W$ To test B for conformance with A Run the tests produced by taking cross-product of P and Z on both A and B

52 52 Conformance Testing (VC Algorithm) P: {<>, a, aa, aad, ab} W: {aa, b} Let’s say we know B has no more than 6 states The complete testing sequence (with reset before each test on each machine) is: {aa, b, a aa, a b, aa aa, aa b, aad aa, aad b, ab aa, ab b, a aa, b aa, c aa, d aa, a b, b b, c b, d b, a a aa, a b aa, a c aa, a d aa, a a b, a b b, a c b, a d b, aa a aa, aa b aa, aa c aa, aa d aa, aa a b, aa b b, aa c b, aa d b, aad a aa, aad b aa, aad c aa, aad d aa, aad a b, aad b b, aad c b, aad d b, ab a aa, ab b aa, ab c aa, ab d aa, ab a b, ab b b, ab c b, ab d b} [Figure: machine A]
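The test list on this slide can be regenerated from P, W, and the alphabet. The sketch below assumes the alphabet is {a, b, c, d} (the letters that appear in the list) and that A has 5 states, one per access sequence in P, so with a bound of 6 states on B there is one extra state to account for. The function and variable names are illustrative.

```python
from itertools import product

P = ["", "a", "aa", "aad", "ab"]  # access sequences; "" is the empty path <>
W = ["aa", "b"]                   # characterizing set from the slide
SIGMA = "abcd"                    # assumed input alphabet
M_STATES = len(P)                 # assumed: one state of A per access sequence
N_STATES = 6                      # upper bound on the states of B


def build_z(w_set, sigma, extra_states):
    """Z = W, plus every word of up to `extra_states` symbols followed by a w in W."""
    z = []
    for k in range(extra_states + 1):
        for w in w_set:
            for middle in product(sigma, repeat=k):
                z.append("".join(middle) + w)
    return z


Z = build_z(W, SIGMA, N_STATES - M_STATES)

# Each test is run after a reset: reach a state via p, then apply z.
tests = [p + z for p in P for z in Z]

print(len(tests), "tests,", len(set(tests)), "distinct")
```

The generated list has the same 50 entries as the slide, in a different order. Some sequences occur twice (for example, a followed by aa arises both from P·W and from P·Σ·W), so the number of distinct tests is somewhat smaller.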

53 53 Conformance Testing (VC Algorithm) As this small example shows, exhaustive tests can be very expensive In general, we cannot computationally afford to perform complete testing We will always face the risk of missing errors Even when we reduce our problem to the simplest model The complexity of testing full equivalence to a reference model is simply too high Exhaustion is exhausting

54 54 From FSM Testing to the Big Picture Testing (almost always) is an attempt to Cover some measure of a structure Nodes of a graph (e.g., VC’s spanning tree) Inputs that give different outputs (e.g., VC’s distinguishing set) All possible inputs (e.g., $\Sigma^{n-m}$) Logical expression evaluations Predicates over program variables Pairs of where a variable is defined and where it is used (data flow) Usually, we can’t even guarantee that coverage directly correlates to more bugs found

55 55 Basic Definitions: Testing What is software testing? Running a program In order to find faults a.k.a. defects a.k.a. errors a.k.a. flaws a.k.a. faults a.k.a. BUGS But also, in order to Increase our confidence that the program has high quality and low risk Because we can never be sure we caught all bugs How does a set of executions increase confidence? Sometimes, by algorithmic argument (VC) Sometimes by less formal arguments (coverage in general)

