Model Counting for Test Coverage, CodeHunt & Mutations Willem Visser Stellenbosch University.

Slides:



Advertisements
Similar presentations
A Survey of Approaches for Automated Unit Testing
Advertisements

PLDI’2005Page 1June 2005 Example (C code) int double(int x) { return 2 * x; } void test_me(int x, int y) { int z = double(x); if (z==y) { if (y == x+10)
Model Counting >= Symbolic Execution Willem Visser Stellenbosch University Joint work with Matt Dwyer (UNL, USA) Jaco Geldenhuys (SU, RSA) Corina Pasareanu.
1 Symbolic Execution for Model Checking and Testing Corina Păsăreanu (Kestrel) Joint work with Sarfraz Khurshid (MIT) and Willem Visser (RIACS)
1/20 Generalized Symbolic Execution for Model Checking and Testing Charngki PSWLAB Generalized Symbolic Execution for Model Checking and Testing.
Symbolic execution © Marcelo d’Amorim 2010.
Necessity of Systematic & Automated Testing Techniques Moonzoo Kim CS Dept, KAIST.
Unit Testing CSSE 376, Software Quality Assurance Rose-Hulman Institute of Technology March 27, 2007.
CSE503: SOFTWARE ENGINEERING SYMBOLIC TESTING, AUTOMATED TEST GENERATION … AND MORE! David Notkin Spring 2011.
DART Directed Automated Random Testing Patrice Godefroid, Nils Klarlund, and Koushik Sen Syed Nabeel.
Parameterizing Random Test Data According to Equivalence Classes Chris Murphy, Gail Kaiser, Marta Arias Columbia University.
Copyright by Scott GrissomCh 1 Software Development Slide 1 Software Development The process of developing large software projects Different Approaches.
1 Software Testing and Quality Assurance Lecture 5 - Software Testing Techniques.
Path testing Path testing is a “design structural testing” in that it is based on detailed design & the source code of the program to be tested. The methodology.
Black Box Software Testing
CompSci 230 Software Design and Construction
1 Functional Testing Motivation Example Basic Methods Timing: 30 minutes.
Software Testing Sudipto Ghosh CS 406 Fall 99 November 9, 1999.
Feed Back Directed Random Test Generation Carlos Pacheco1, Shuvendu K. Lahiri2, Michael D. Ernst1, and Thomas Ball2 1MIT CSAIL, 2Microsoft Research Presented.
Symbolic Execution with Mixed Concrete-Symbolic Solving (SymCrete Execution) Jonathan Manos.
Chapter 12: Software Testing Omar Meqdadi SE 273 Lecture 12 Department of Computer Science and Software Engineering University of Wisconsin-Platteville.
Today’s Agenda  Quick Review  Finish Graph-Based Testing  Predicate Testing Software Testing and Maintenance 1.
Coverage Literature of software testing is primarily concerned with various notions of coverage Four basic kinds of coverage: Graph coverage Logic coverage.
Exercise Solutions 2014 Fall Term. Week 2: Exercise 1 public static Boolean repOK(Stack mystack) { if (mystack.capacity() < 0) { return false;
Verification of Java Programs using Symbolic Execution and Loop Invariant Generation Corina Pasareanu (Kestrel Technology LLC) Willem Visser (RIACS/USRA)
24/3/00SEM107 - © Kamin & ReddyClass 16 - Searching - 1 Class 16 - Searching r Linear search r Binary search r Binary search trees.
Software Testing. 2 CMSC 345, Version 4/12 Topics The testing process  unit testing  integration and system testing  acceptance testing Test case planning.
1 Debugging. 2 A Lot of Time is Spent Debugging Programs Debugging. Cyclic process of editing, compiling, and fixing errors. n Always a logical explanation.
Testing Testing Techniques to Design Tests. Testing:Example Problem: Find a mode and its frequency given an ordered list (array) of with one or more integer.
Test Coverage CS-300 Fall 2005 Supreeth Venkataraman.
Unit Testing 101 Black Box v. White Box. Definition of V&V Verification - is the product correct Validation - is it the correct product.
Boolean expressions, part 2: Logical operators. Previously discussed Recall that there are 2 types of operators that return a boolean result (true or.
16 October Reminder Types of Testing: Purpose  Functional testing  Usability testing  Conformance testing  Performance testing  Acceptance.
Model Counting A Quest for Nails 2 Willem Visser Stellenbosch University Joint work with Matt Dwyer (UNL, USA) Jaco Geldenhuys (SU, RSA) Corina Pasareanu.
Summarizing “Structural” Testing Now that we have learned to create test cases through both: – a) Functional (blackbox)and – b) Structural (whitebox) testing.
Today’s Agenda  Reminder: HW #1 Due next class  Quick Review  Input Space Partitioning Software Testing and Maintenance 1.
Review, Pseudocode, Flow Charting, and Storyboarding.
Week 6.  Lab 1 and 2 results  Common mistakes in Style  Lab 1 common mistakes in Design  Lab 2 common mistakes in Design  Tips on PE preparation.
Symbolic Execution with Abstract Subsumption Checking Saswat Anand College of Computing, Georgia Institute of Technology Corina Păsăreanu QSS, NASA Ames.
Learning Symbolic Interfaces of Software Components Zvonimir Rakamarić.
Com1040 Systems Design and Testing Part II – Testing (Based on A.J. Cowling’s lecture notes) LN-Test1+2: Introduction to Testing Marian Gheorghe ©University.
Types of Triangles. Angles The angles in a triangle add up to o + 60 o + 60 o =
Software Construction Lecture 19 Software Testing-2.
Model Counting with Applications to CodeHunt Willem Visser Stellenbosch University South Africa.
PROGRAMMING TESTING B MODULE 2: SOFTWARE SYSTEMS 22 NOVEMBER 2013.
Dynamic Testing.
Chapter 12: Software Testing Omar Meqdadi SE 273 Lecture 12 Department of Computer Science and Software Engineering University of Wisconsin-Platteville.
( = “unknown yet”) Our novel symbolic execution framework: - extends model checking to programs that have complex inputs with unbounded (very large) data.
/ PSWLAB Evidence-Based Analysis and Inferring Preconditions for Bug Detection By D. Brand, M. Buss, V. C. Sreedhar published in ICSM 2007.
Week 6 MondayTuesdayWednesdayThursdayFriday Testing III Reading due Group meetings Testing IVSection ZFR due ZFR demos Progress report due Readings out.
CSE 331 SOFTWARE DESIGN & IMPLEMENTATION SYMBOLIC TESTING Autumn 2011.
Symstra: A Framework for Generating Object-Oriented Unit Tests using Symbolic Execution Tao Xie, Darko Marinov, Wolfram Schulte, and David Notkin University.
Finding bugs with a constraint solver daniel jackson. mandana vaziri mit laboratory for computer science issta 2000.
Testing It is much better to have a plan when testing your programs than it is to just randomly try values in a haphazard fashion. Testing Strategies:
Testing (final thoughts). equals() and hashCode() Important when using Hash-based containers class Duration { public final int min; public final int sec;
Software Testing. Software Quality Assurance Overarching term Time consuming (40% to 90% of dev effort) Includes –Verification: Building the product right,
A Review of Software Testing - P. David Coward
Advanced Concepts for/using Symbolic Execution
Paul Ammann & Jeff Offutt
Input Space Partition Testing CS 4501 / 6501 Software Testing
Verification and Testing
Types of Testing Visit to more Learning Resources.
White-Box Testing Using Pex
Testing, conclusion Based on material by Michael Ernst, University of Washington.
Conditional Branching
Paul Ammann & Jeff Offutt
Automatic Test Generation SymCrete
Software Development Process
Example (C code) int double(int x) { return 2 * x; }
CSE 1020:Software Development
Presentation transcript:

Model Counting for Test Coverage, CodeHunt & Mutations Willem Visser Stellenbosch University

In a perfect world… only linear integer constraints and only uniform distributions

void test(int x, int y) { if (y == x*10) S0; else S1; if (x > 3 && y > 10) S2; else S3; } Symbolic Execution [ Y=X*10 ] S0 [ X>3 & 10<Y=X*10] S2 [ true ] test (X,Y) [ Y!=X*10 & !(X>3 & Y>10) ] S3 [ Y!=X*10 ] S1 [ Y=X*10 & !(X>3 & Y>10) ] S3 [ X>3 & 10<Y!=X*10] S2 Test(1,10) reaches S0,S3 Test(0,1) reaches S1,S3 Test(4,11) reaches S1,S2

void test(int x, int y: 0..99) { if (y == x*10) S0; else S1; if (x > 3 && y > 10) S2; else S3; } Almost Rivers [ Y=X*10 ] [ Y!=X*10 ] [ X>3 & 10<Y=X*10] [ X>3 & 10<Y!=X*10][ Y!=X*10 & !(X>3 & Y>10) ] [ true ] [ Y=X*10 & !(X>3 & Y>10) ] y=10x x>3 & y> Which of 1, 2, 3 or 4 is the most likely?

void test(int x, int y: 0..99) { if (y == x*10) S0; else S1; if (x > 3 && y > 10) S2; else S3; } Rivers [ Y=X*10 ] [ Y!=X*10 ] [ X>3 & 10<Y=X*10] [ X>3 & 10<Y!=X*10][ Y!=X*10 & !(X>3 & Y>10) ] [ true ] [ Y=X*10 & !(X>3 & Y>10) ] y=10x x>3 & y>10

LattE Model Counter Count solutions for conjunction of Linear Inequalities

void test(int x, int y: 0..99) { if (y == x*10) S0; else S1; if (x > 3 && y > 10) S2; else S3; } Rivers of Values [ Y=X*10 ][ Y!=X*10 ] [ X>3 & 10<Y=X*10] [ X>3 & 10<Y!=X*10][ Y!=X*10 & !(X>3 & Y>10) ] [ true ] [ Y=X*10 & !(X>3 & Y>10) ] y=10x x>3 & y>

[ Y=X*10 ] [ Y!=X*10 ] [ X>3 & 10<Y=X*10] [ X>3 & 10<Y!=X*10] [ Y!=X*10 & !(X>3 & Y>10) ] [ true ] [ Y=X*10 & !(X>3 & Y>10) ] y=10x x>3 & y> Program Understanding

[ X>3 & 10<Y=X*10] [ X>3 & 10<Y!=X*10] [ Y!=X*10 & !(X>3 & Y>10) ] [ Y=X*10 & !(X>3 & Y>10) ] y=10x x>3 & y> Probabilities

[ X>3 & 10<Y=X*10] [ X>3 & 10<Y!=X*10] [ Y!=X*10 & !(X>3 & Y>10) ] [ Y=X*10 & !(X>3 & Y>10) ] y=10x x>3 & y> Reliability Reliable

Coverage Revisited The goal of coverage is to measure the quality of a test suite or When can I stop testing? but what you really want is the likelihood of there being any errors left when you stop testing

Input Coverage How many of the inputs have been covered Use model counting on input partitions Symbolic execution gives you the input partitions

Input Coverage Method Run the test suite and collect all the symbolic input partitions i.e. all the path conditions Count the solutions for these Divide by the input domain size to obtain coverage %

public static int classify(int i, int j, int k) { if ((i <= 0) || (j <= 0) || (k <= 0)) return 4; int type = 0; if (i == j) type = type + 1; if (i == k) type = type + 2; if (j == k) type = type + 3; if (type == 0) { // Confirm it is a legal triangle before declaring it to be scalene. if ((i + j <= k) || (j + k <= i) || (i + k <= j)) type = 4; else type = 1; return type; } // Confirm it is a legal triangle before declaring it to be isosceles or // equilateral. if (type > 3) type = 3; else if ((type == 1) && (i + j > k)) type = 2; else if ((type == 2) && (i + k > j)) type = 2; else if ((type == 3) && (j + k > i)) type = 2; else type = 4; return type; } DeMillo/Offutt Triangle Classification

(k+i>=j)&&((k+j)>i)&&(j+i)>k)&&(j!=k)&&(i!=k)&&(i!=j)&&(k>0)&&(j>0)&&(i>0) (j+i 0)&&(j>0)&&(i>0) (k+j k)&&(j!=k)&&(i!=k)&&(i!=j)&&(k>0)&&(j>0)&&(i>0) (k+i i)&&(j+i>k)&&(j!=k)&&(i!=k)&&(i!=j)&&(k>0)&&(j>0)&&(i>0) (i<=0) 9900 (j 0) 9801 (k 0)&&(i>0) 7252 (k+i)>j)&&(i==k)&&(i!=j)&&(k>0)&&(j>0)&&(i>0) 7252 (j+i)>k)&&(i!=k)&&(i==j)&&(k>0)&&(j>0)&&(i>0) 7252 (k+j)>i)&&(j==k)&&(i!=k)&&(i!=j)&&(k>0)&&(j>0)&&(i>0) 2450 (k+i) 0)&&(j>0)&&(i>0) 2450 (j+i) 0)&&(j>0)&&(i>0) 2450 (k+j) 0)&&(j>0)&&(i>0) 99 (i==k)&&(i==j)&&(k>0)&&(j>0)&&(i>0) Domain All Partitions with sizes Buggy

463344(k+i>j)&&((k+j)>i)&&(j+i)>k)&&(j!=k)&&(i!=k)&&(i!=j)&&(k>0)&&(j>0)&&(i>0) (k+i i)&&(j+i>k)&&(j!=k)&&(i!=k)&&(i!=j)&&(k>0)&&(j>0)&&(i>0) (k+j k)&&(j!=k)&&(i!=k)&&(i!=j)&&(k>0)&&(j>0)&&(i>0) (j+i 0)&&(j>0)&&(i>0) (i<=0) 9900 (j 0) 9801 (k 0)&&(i>0) 7252 (k+j>i)&&(j==k))&&(i!=k)&&(i!=j)&&(k>0)&&(j>0)&&(i>0) 7252 (j+i>k)&&(i!=k)&&(i==j)&&(k>0)&&(j>0)&&(i>0) 7252 (k+i>j)&&(i==k)&&(i!=j)&&(k>0)&&(j>0)&&(i>0) 2450 (k+j 0)&&(j>0))&&(i>0) 2450 (j+i 0)&&(j>0)&&(i>0) 2450 (k+i 0)&&(j>0)&&(i>0) 99 (i==k)&&(i==j)&&(k>0)&&(j>0)&&(i>0) Domain All Partitions with sizes

Random Testing Revisited SamplesBranch CoverageInput Coverage 124%46% %77% %94% %99% -99

classify(8,9,2) classify(0,0,0) classify(8,-1,-1) classify(12,9,2) classify(2,2,5) classify(1,1,1) classify(8,8,14) User Paddy’s Test Cases “My Program is Correct!”

(a!=c)&&(a!=b)&&(b>(a-c)&&(a>(b-c)&&(a>(c-b)&&(c!=0)&&(b!=0)&&(a!=0) (b (b-c)&&(a>(c-b)&&(c!=0)&&(b!=0)&&(a!=0) (a (c-b)&&(c!=0)&&(b!=0)&&(a!=0) (a<=(c-b)&&(c!=0)&&(b!=0)&&(a!=0) 10000a== (b==0)&&(a!=0) 9801(c==0)&&(b!=0)&&(a!=0) 7252 (a==c)&&(a!=b)&&(b>(a-c)&&(a>(b-c)&&(a>(c-b)&&(c!=0)&&(b!=0)&&(a!=0) 7252 (b!=c)&&(a==b)&&(b>(a-c)&&(a>(b-c)&&(a>(c-b)&&(c!=0)&&(b!=0)&&(a!=0) 99 (b==c)&&(a==b)&&(b>(a-c)&&(a>(b-c)&&(a>(c-b)&&(c!=0)&&(b!=0)&&(a!=0) Domain All Partitions Paddy with sizes

classify(8,9,2) classify(0,0,0) classify(8,-1,-1) classify(12,9,2) classify(2,2,5) classify(1,1,1) classify(8,8,14) User Paddy’s Test Cases Branch Coverage 41% Input Coverage 81% 47% 1% 16% 0% (16%) 16% 0.01% 0.73%

classify(8,9,2) classify(0,0,0) classify(8,-1,-1) classify(12,9,2) classify(2,2,5) classify(1,1,1) classify(8,8,14) User Paddy’s Test Cases on Correct Version Branch Coverage 53% Input Coverage 66% 46% (47%) 1% 1% (16%) 16%(0/16%).25%(16%) 0.01% 0.73%

Example with loops: Buggy Binary Tree //Domains: opt 1,2,3,4: 1..2; val 1,2,3,4: 0..9 public static void driver( int opt1, int opt2, int opt3, int opt4, int val1, int val2, int val3, int val4) { BinTree b = new BinTree(); if (opt1 == 1) b.add(val1); else b.remove(val1); if (opt2 == 1) b.add(val2); else b.remove(val2); … assert !b.checkTree() : " tree broken ”; }

Random Testing for Binary Tree SamplesBranch CoverageInput Coverage %48% %53% %70% %82% 2 paths out of 432 leads to the bug and each has a size of 120, i.e. 240/ (.15%)

Next part inspired by Can you add a progress bar? How wrong is the program?

Triangle Classification 2 Correct versions 2 Buggy versions – One created ourselves accidentally Small typo >= instead of <= – One from a student coding exercise Missed one condition

public static int classify(int i, int j, int k) { if ((i <= 0) || (j <= 0) || (k <= 0)) return 4; int type = 0; if (i == j) type = type + 1; if (i == k) type = type + 2; if (j == k) type = type + 3; if (type == 0) { // Confirm it is a legal triangle before declaring it to be scalene. if ((i + j <= k) || (j + k <= i) || (i + k <= j)) type = 4; else type = 1; return type; } // Confirm it is a legal triangle before declaring it to be isosceles or // equilateral. if (type > 3) type = 3; else if ((type == 1) && (i + j > k)) type = 2; else if ((type == 2) && (i + k > j)) type = 2; else if ((type == 3) && (j + k > i)) type = 2; else type = 4; return type; } Correct1

public static int classify(int i, int j, int k) { if ((i <= 0) || (j <= 0) || (k <= 0)) return 4; int type = 0; if (i == j) type = type + 1; if (i == k) type = type + 2; if (j == k) type = type + 3; if (type == 0) { // Confirm it is a legal triangle before declaring it to be scalene. if ((i + j = j)) type = 4; else type = 1; return type; } // Confirm it is a legal triangle before declaring it to be isosceles or // equilateral. if (type > 3) type = 3; else if ((type == 1) && (i + j > k)) type = 2; else if ((type == 2) && (i + k > j)) type = 2; else if ((type == 3) && (j + k > i)) type = 2; else type = 4; return type; } Buggy1

Correct2 public static int classify(int a, int b, int c) { if (a <= 0 || b <= 0 || c <= 0) return 4; if (! (a > c - b && a > b - c && b > a - c)) return 4; if (a == b && b == c) return 3; if (a == b || b == c || a == c) return 2; return 1; }

Buggy2 public static int classify(int a, int b, int c) { if ((a <= 0) || (b <= 0) || (c <= 0)) return 4; if ((a <= c - b) || (a<= b - c) || (b<= a - c)) return 4; if (a == b && b == c) return 3; if (((a == b) && (b != c)) || ((a == c) && (a != b) && ((b == c) && (b != a)))) return 2; return 1; }

Which of Correct1 or Correct2 is the “best”? public static int classify(int i, int j, int k) { if ((i <= 0) || (j <= 0) || (k <= 0)) return 4; int type = 0; if (i == j) type = type + 1; if (i == k) type = type + 2; if (j == k) type = type + 3; if (type == 0) { if ((i + j <= k) || (j + k <= i) || (i + k <= j)) type = 4; else type = 1; return type; } if (type > 3) type = 3; else if ((type == 1) && (i + j > k)) type = 2; else if ((type == 2) && (i + k > j)) type = 2; else if ((type == 3) && (j + k > i)) type = 2; else type = 4; return type; } public static int classify(int a, int b, int c) { if (a <= 0 || b <= 0 || c <= 0) return 4; if (! (a > c - b && a > b - c && b > a - c)) return 4; if (a == b && b == c) return 3; if (a == b || b == c || a == c) return 2; return 1; }

463344(k+i>j)&&((k+j)>i)&&(j+i)>k)&&(j!=k)&&(i!=k)&&(i!=j)&&(k>0)&&(j>0)&&(i>0) (k+i i)&&(j+i>k)&&(j!=k)&&(i!=k)&&(i!=j)&&(k>0)&&(j>0)&&(i>0) (k+j k)&&(j!=k)&&(i!=k)&&(i!=j)&&(k>0)&&(j>0)&&(i>0) (j+i 0)&&(j>0)&&(i>0) (i<=0) 9900 (j 0) 9801 (k 0)&&(i>0) 7252 (k+j>i)&&(j==k))&&(i!=k)&&(i!=j)&&(k>0)&&(j>0)&&(i>0) 7252 (j+i>k)&&(i!=k)&&(i==j)&&(k>0)&&(j>0)&&(i>0) 7252 (k+i>j)&&(i==k)&&(i!=j)&&(k>0)&&(j>0)&&(i>0) 2450 (k+j 0)&&(j>0))&&(i>0) 2450 (j+i 0)&&(j>0)&&(i>0) 2450 (k+i 0)&&(j>0)&&(i>0) 99 (i==k)&&(i==j)&&(k>0)&&(j>0)&&(i>0) Correct1 All Partitions with sizes

463344(a!=c)&&(b!=c)&&a!=b&&b>a-c&&(a>(b-c)&&(a>(c-b)&&(c>0)&&(b>0)&&(a>0) b (b-c)))&&(a>(c-b)))&&(c>0))&&(b>0))&&(a>0) (a (c-b)))&&(c>0))&&(b>0))&&(a>0) (((a 0))&&(b>0))&&(a>0) (a<=0) 9900 (b 0) 9801 (c 0)&&(a>0) 7252 a==c&&(b!=c)&&(a!=b))&&b>a-c&&a>b-c&&a>c-b&&c>0&&b>0&&(a>0) 7252 b!=c&&a==b&&(b>(a-c)))&&(a>(b-c)))&&(a>(c-b)))&&(c>0))&&(b>0))&&(a>0) 7252 b==c&&a!=b&&(b>(a-c)))&&(a>(b-c)))&&(a>(c-b)))&&(c>0))&&(b>0))&&(a>0) 99 (i==k)&&(i==j)&&(k>0)&&(j>0)&&(i>0) Correct2 All Partitions with sizes Correct2 is “better” it has less paths dealing with the Illegal Triangle case

public static void check(int a, int b, int c) { int test = classifyBuggy1(a, b, c); int oracle = classifyCorrect1(a, b, c); assert (test == oracle); } Test harness Record path conditions when assertion fails and count their sizes

Counting size of failing partitions Correct vs Buggy1 2 failing paths: and Correct vs Buggy2 2 failing paths: 7252 and 7252 Buggy2 is closer than Buggy1

New Insight Mutations in programs can have bigger effect than one might think Notice how a small = change made the program incorrect on more inputs than a logical mistake Can we rank mutation operators according to how much of the behavior of a program they mutate?

Mutation Example boolean classify1(int i, int j : 0..99) { return i <= j; } boolean classify1(int i, int j) { return i >= j; } 1 boolean classify1(int i, int j) { return i < j; } 2 99% wrong1% wrong boolean classify1(int i, int j) { return i == j; } % wrong boolean classify1(int i, int j) { return true; } % wrong

Conclusions Model Counting can be used for many things in Program Analysis Creating a better form of test coverage that looks at input coverage and structural coverage We can rank buggy programs, but needs a bit more work to be perfect

The Green Framework Already integrated into Symbolic PathFinder

Resources ISSTA 2012 – Probabilistic Symbolic Execution FSE 2012 – Green: Reduce, Reuse and Recycle Constraints… ICSE 2013 – Software Reliability with Symbolic PathFinder PLDI 2014 – Compositional Solution Space Quantification for Probabilistic Software Analysis FSE 2014 – Statistical Symbolic Execution with Informed Sampling ASE 2014 – Exact and Approximate Probabilistic Symbolic Execution for Nondeterministic Programs Implemented in Symbolic PathFinder – Using LattE