Automatic Test Generation SymCrete Willem Visser Stellenbosch University Joint work with Corina Pasareanu and Neha Rungta from NASA Ames Research Center
How do we get there? How did we get here?
How do we obtain Statement Coverage?

    void test(int x, int y) {
        if (x > 0) {
            if (y == hash(x)) S0; else S1;
            if (x > 3 && y > 10) S3;
            S4;
        }
    }
Random Inputs

    void test(int x, int y) {
        if (x > 0) {
            if (y == hash(x)) S0; else S1;
            if (x > 3 && y > 10) S3;
            S4;
        }
    }

    int hash(int x) {
        if (0 <= x && x <= 10) return x * 10;
        else return 0;
    }

Might work if you are moderately lucky.
But there is a better way! One where you don’t need to win the lottery.
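To get a feel for "moderately lucky", the sketch below counts how many inputs in a small box actually reach S0. The class name, the method name, and the box [-100, 100] x [-100, 100] are assumptions made for illustration; they are not part of the talk.

```java
final class RandomInputOdds {
    // hash as shown on the slide.
    static int hash(int x) {
        return (0 <= x && x <= 10) ? x * 10 : 0;
    }

    // Counts the pairs (x, y) in [-bound, bound]^2 that reach S0,
    // i.e. that satisfy x > 0 and y == hash(x).
    static long countReachingS0(int bound) {
        long hits = 0;
        for (int x = -bound; x <= bound; x++) {
            for (int y = -bound; y <= bound; y++) {
                if (x > 0 && y == hash(x)) hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int bound = 100; // assumed input range, chosen only for illustration
        long total = (2L * bound + 1) * (2L * bound + 1);
        System.out.println(countReachingS0(bound) + " of " + total + " inputs reach S0");
    }
}
```

With this box, 100 of the 40401 candidate inputs reach S0, so a uniformly random pick hits it roughly once in 400 tries. The odds shrink further as the input range grows.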
Symbolic Execution

    void test(int x, int y) {
        if (x > 0) {
            if (y == hash(x)) S0; else S1;
            if (x > 3 && y > 10) S3;
            S4;
        }
    }

    int hash(int x) {
        if (0 <= x && x <= 10) return x * 10;
        else return 0;
    }

test(X,Y)
    [ X > 0 ]
        hash(X)
            [ 0<X<=10 ] ret X*10
            [ X>10 ] ret 0
        [ 0<X<=10 & Y=X*10 ] S0
            [ 3<X<=10 & 10<Y=X*10 ] S3
            [ 0<X<=10 & Y=X*10 & !(X>3 & Y>10) ] S4
        [ 0<X<=10 & Y!=X*10 ] S1
            [ 3<X<=10 & 10<Y!=X*10 ] S3
            [ 0<X<=10 & Y!=X*10 & !(X>3 & Y>10) ] S4
        [ X>10 & … ] …
Symbolic Execution

    void test(int x, int y) {
        if (x > 0) {
            if (y == hash(x)) S0; else S1;
            if (x > 3 && y > 10) S3;
            S4;
        }
    }

    int hash(int x) {
        if (0 <= x && x <= 10) return x * 10;
        else return 0;
    }

test(X,Y)
    [ X > 0 ]
        hash(X)
            [ 0<X<=10 ] ret X*10
            [ X>10 ] ret 0
        [ 0<X<=10 & Y=X*10 ] S0    Solve: X=1,Y=10
            [ 3<X<=10 & 10<Y=X*10 ] S3
            [ 0<X<=10 & Y=X*10 & !(X>3 & Y>10) ] S4    Solve: X=1,Y=10
        [ 0<X<=10 & Y!=X*10 ] S1    Solve: X=1,Y=0
            [ 3<X<=10 & 10<Y!=X*10 ] S3    Solve: X=4,Y=11
            [ 0<X<=10 & Y!=X*10 & !(X>3 & Y>10) ] S4
        [ X>10 & … ] …
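The "Solve" step can be sketched as follows. A bounded exhaustive search stands in for a real decision procedure, and the class, interface, and field names are hypothetical. Only the path conditions themselves are taken from the tree.

```java
import java.util.Arrays;

final class PathConditionSolver {
    /** A path condition over the symbolic inputs X and Y. */
    interface PathCondition {
        boolean holds(int x, int y);
    }

    // Path conditions copied from the symbolic execution tree.
    static final PathCondition S0 = (x, y) -> 0 < x && x <= 10 && y == x * 10;
    static final PathCondition S1 = (x, y) -> 0 < x && x <= 10 && y != x * 10;
    static final PathCondition S3_AFTER_S1 =
            (x, y) -> 3 < x && x <= 10 && y > 10 && y != x * 10;

    // Stand-in "decision procedure" (an assumption for this sketch):
    // returns the first (x, y) in [lo, hi]^2 that satisfies pc, or null if none does.
    static int[] solve(PathCondition pc, int lo, int hi) {
        for (int x = lo; x <= hi; x++) {
            for (int y = lo; y <= hi; y++) {
                if (pc.holds(x, y)) return new int[] {x, y};
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("S0: " + Arrays.toString(solve(S0, 0, 100)));
        System.out.println("S1: " + Arrays.toString(solve(S1, 0, 100)));
        System.out.println("S3 after S1: " + Arrays.toString(solve(S3_AFTER_S1, 0, 100)));
    }
}
```

Each solution is a concrete test input that drives execution down the corresponding path. A real tool would hand the path condition to a constraint solver rather than enumerate values.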
Symbolic Execution is not perfect; it has a few serious weaknesses, namely:
It is inherently white-box
It is only as good as the decision procedures
    void test(int x, int y) {
        if (x > 0) {
            if (y == hash(x)) S0; else S1;
            if (x > 3 && y > 10) S3;
            S4;
        }
    }

    native int hash(int x);

Code is not available, so no SE is possible.

OR

    int hash(int x) { return x * x % 1023; }

Assuming we only have a linear integer arithmetic DP, we cannot handle the non-linearity here.
Concolic Execution or Directed Automated Random Testing (DART)
Godefroid, Klarlund and Sen 2005
Novel combination of concrete and symbolic execution to overcome the two weaknesses of classic symbolic execution.
Executes the program concretely but collects the path condition (PC), negates constraints on the PC after a run, and executes again with the newly found solutions.
Concolic Execution

    void test(int x, int y) {
        if (x > 0) {
            if (y == hash(x)) S0; else S1;
            if (x > 3 && y > 10) S3;
            S4;
        }
    }

    native int hash(int x);
    // Hidden from the tool; it behaves as:
    //     if (0 <= x && x <= 10) return x * 10; else return 0;

test(1,0)
    [ X > 0 ]
    [ X > 0 & Y != 10 ]
    [ X>0 & Y!=10 & X<=3 ]
Negate the last constraint and solve [ X>0 & Y!=10 & X>3 ]: test(4,0)
test(4,0)
    [ X > 0 ]
    [ X > 0 & Y != 40 ]
    [ X>0 & Y!=40 & X>3 & Y<=10 ]
Concolic Execution

    void test(int x, int y) {
        if (x > 0) {
            if (y == hash(x)) S0; else S1;
            if (x > 3 && y > 10) S3;
            S4;
        }
    }

    native int hash(int x);
    // Hidden from the tool; it behaves as:
    //     if (0 <= x && x <= 10) return x * 10; else return 0;

Solve [ X>0 & Y!=40 & X>3 & Y>10 ]: test(4,11)
test(4,11)
    [ X > 0 ]
    [ X > 0 & Y != 40 ]
    [ X>0 & Y!=40 & X>3 & Y>10 ]
Solve [ X>0 & Y=40 & X>3 & Y>10 ]: test(4,40)
test(4,40)
    [ X > 0 ]
    [ X > 0 & Y = 40 ]
    [ X>0 & Y=40 & X>3 & Y>10 ]
Concolic Execution

    void test(int x, int y) {
        if (x > 0) {
            if (y == 40) S0; else S1;
            if (x > 3 && y > 10) S3;
            S4;
        }
    }

    native int hash(int x);
    // Hidden from the tool; it behaves as:
    //     if (0 <= x && x <= 10) return x * 10; else return 0;

Solve [ X>0 & Y=40 & X>3 & Y>10 ]: test(4,40)
test(4,40)
    [ X > 0 ]
    [ X > 0 & Y = 40 ]
    [ X>0 & Y=40 & X>3 & Y>10 ]
Concolic Execution: Divergence!

    void test(int x, int y) {
        if (x > 0) {
            if (y == hash(x)) S0; else S1;
            if (x > 3 && y > 10) S3;
            S4;
        }
    }

    native int hash(int x);
    // Hidden from the tool; it behaves as:
    //     if (0 <= x && x <= 10) return x * 10; else return 0;

Solve [ X>0 & Y=40 & X<=3 & Y>10 ]: test(1,40)
test(1,40)
    [ X > 0 ]
    [ X > 0 & Y != 10 ]
    [ X>0 & Y!=10 & X<=3 & Y>10 ]
Divergence! Aimed to get S0;S4 but reached S1;S4
ASSERT not via S0
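The divergence can be reproduced with a small sketch. It replays only the last step of the walkthrough: the PC recorded for test(4,40), with hash concretized to 40 and X>3 negated. The class and method names are hypothetical, and a bounded search stands in for the solver.

```java
final class ConcolicDivergence {
    // The "native" function: it can be executed but not analysed.
    static int hash(int x) {
        return (0 <= x && x <= 10) ? x * 10 : 0;
    }

    // Concrete execution of test(x, y); returns the statements reached, in order.
    static String run(int x, int y) {
        StringBuilder path = new StringBuilder();
        if (x > 0) {
            path.append(y == hash(x) ? "S0;" : "S1;");
            if (x > 3 && y > 10) path.append("S3;");
            path.append("S4");
        }
        return path.toString();
    }

    // Solves the negated PC [ X>0 & Y=40 & X<=3 & Y>10 ] by bounded search.
    // Y=40 is the concretized hash(4) from the previous run, which is the source of the problem.
    static int[] solveNegated() {
        for (int x = 0; x <= 100; x++) {
            for (int y = 0; y <= 100; y++) {
                if (x > 0 && y == 40 && x <= 3 && y > 10) return new int[] {x, y};
            }
        }
        return null;
    }

    public static void main(String[] args) {
        int[] next = solveNegated();
        String expected = "S0;S4"; // path the negated PC was meant to cover
        String actual = run(next[0], next[1]);
        System.out.println("input (" + next[0] + "," + next[1] + "): expected "
                + expected + ", got " + actual);
    }
}
```

The solver proposes (1,40), but hash(1) is 10 rather than 40, so the run takes S1 instead of S0. The stale concrete value 40 no longer describes hash once X changes.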
Symbolic Execution with Mixed Concrete-Symbolic Solving Pasareanu, Rungta, Visser 2011 Symbolic Execution that falls back onto concrete values when it doesn’t have access to the code or the decision procedures don’t work. SymCrete = Symbolic + Concrete vs Concolic = Concrete + Symbolic
Symbolic Execution

    void test(int x, int y) {
        if (x > 0) {
            if (y == hash(x)) S0; else S1;
            if (x > 3 && y > 10) S3;
            S4;
        }
    }

    native int hash(int x);
    // Hidden from the tool; it behaves as:
    //     if (0 <= x && x <= 10) return x * 10; else return 0;

test(X,Y)
    [ X > 0 ]
        hash(X)
            [ 0<X<=10 ] ret X*10
            [ X>10 ] ret 0
        [ 0<X<=10 & Y=X*10 ] S0
            [ 3<X<=10 & 10<Y=X*10 ] S3
            [ 0<X<=10 & Y=X*10 & !(X>3 & Y>10) ] S4
        [ 0<X<=10 & Y!=X*10 ] S1
            [ 3<X<=10 & 10<Y!=X*10 ] S3
            [ 0<X<=10 & Y!=X*10 & !(X>3 & Y>10) ] S4
        [ X>10 & … ] …
Symbolic Execution

    void test(int x, int y) {
        if (x > 0) {
            if (y == hash(x)) S0; else S1;
            if (x > 3 && y > 10) S3;
            S4;
        }
    }

    native int hash(int x);
    // Hidden from the tool; it behaves as:
    //     if (0 <= x && x <= 10) return x * 10; else return 0;

test(X,Y)
    [ X > 0 ]
        hash(X)

SymCrete: 3 Steps
1. Split the PC into two parts: the part you can solve and the part you cannot solve
2. Solve the easy part and evaluate the hard part with the solutions
3. Replace the hard part with the evaluated results and check SAT
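A minimal sketch of the three steps for path conditions of the form "easy & Y=hash(X)". All names are hypothetical and a bounded search stands in for the decision procedure. Step 3 here also pins X to the value used in Step 2, which the later slide on unsoundness calls for.

```java
import java.util.Arrays;

final class SymCreteSteps {
    /** A constraint over the symbolic inputs X and Y. */
    interface Constraint {
        boolean holds(int x, int y);
    }

    // Stands in for the native method: it can be called but not analysed.
    static int hash(int x) {
        return (0 <= x && x <= 10) ? x * 10 : 0;
    }

    // Stand-in decision procedure (an assumption): bounded search over [0, 100]^2.
    static int[] solve(Constraint c) {
        for (int x = 0; x <= 100; x++) {
            for (int y = 0; y <= 100; y++) {
                if (c.holds(x, y)) return new int[] {x, y};
            }
        }
        return null;
    }

    // Handles a PC of the form "easy & Y=hash(X)".
    // Step 1 (the split) is done by the caller, who passes only the easy part.
    static int[] solveWithHash(Constraint easy) {
        int[] s = solve(easy);   // Step 2: solve the easy part
        if (s == null) return null;
        int xs = s[0];
        int h = hash(xs);        // Step 2: evaluate the hard part concretely
        // Step 3: replace Y=hash(X) by Y=h, keep X=xs, and check SAT again.
        return solve((x, y) -> easy.holds(x, y) && x == xs && y == h);
    }

    public static void main(String[] args) {
        // [ X>0 & Y=hash(X) ] S0
        System.out.println(Arrays.toString(solveWithHash((x, y) -> x > 0)));
        // [ X>3 & Y=hash(X) & Y>10 ] S3
        System.out.println(Arrays.toString(solveWithHash((x, y) -> x > 3 && y > 10)));
    }
}
```

The solver never sees the body of hash. It only sees the concrete value that hash returned for one particular X.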
SymCrete Execution

    void test(int x, int y) {
        if (x > 0) {
            if (y == hash(x)) S0; else S1;
            if (x > 3 && y > 10) S3;
            S4;
        }
    }

    native int hash(int x);
    // Hidden from the tool; it behaves as:
    //     if (0 <= x && x <= 10) return x * 10; else return 0;

test(X,Y)
    [ X > 0 ]
        hash(X)
        [ X>0 & Y=hash(X) ] S0
        [ X>0 & Y!=hash(X) ] S1

For S0:
    1. easy: X>0    hard: Y=hash(X)
    2. X=1    Y=hash(1)=10
    3. X>0 & Y=10 is SAT
For S1:
    X>0 & Y!=10 is SAT
SymCrete Execution

    void test(int x, int y) {
        if (x > 0) {
            if (y == hash(x)) S0; else S1;
            if (x > 3 && y > 10) S3;
            S4;
        }
    }

    native int hash(int x);
    // Hidden from the tool; it behaves as:
    //     if (0 <= x && x <= 10) return x * 10; else return 0;

test(X,Y)
    [ X > 0 ]
        hash(X)
        [ X>0 & Y=hash(X) ] S0
            [ X>3 & Y=hash(X) & Y>10 ] S3
            [ 3>=X>0 & Y=hash(X) ] S4

For S3:
    1. easy: X>3 & Y>10    hard: Y=hash(X)
    2. X=4 & Y=11    Y=hash(4)
    3. X>3 & Y=40 & Y>10 is SAT
For S4:
    1. easy: 3>=X>0    hard: Y=hash(X)
    2. X=1    Y=hash(1)
    3. 3>=X>0 & Y=10 is SAT
SymCrete Execution

    void test(int x, int y) {
        if (x > 0) {
            if (y == hash(x)) S0; else S1;
            if (x > 3 && y > 10) S3;
            S4;
        }
    }

    native int hash(int x);
    // Hidden from the tool; it behaves as:
    //     if (0 <= x && x <= 10) return x * 10; else return 0;

test(X,Y)
    [ X > 0 ]
        hash(X)
        [ X>0 & Y=hash(X) ] S0    x=1,y=10
            [ X>3 & Y=hash(X) & Y>10 ] S3    x=4,y=40
            [ 3>=X>0 & Y=hash(X) ] S4    x=1,y=10
        [ X>0 & Y!=hash(X) ] S1    x=1,y=0
            [ X>3 & Y!=hash(X) & Y>10 ] S3    x=4,y=11
            [ 3>=X>0 & Y!=hash(X) ] S4    x=1,y=0
The Risk of Unsoundness

    void test(int x, int y) {
        if (x >= 0 && x > y && y == x * x) S0; else S1;
    }

[ X>=0 & X>Y & Y=X*X ] S0    Not Reachable

    1. easy: X>=0 & X>Y    hard: Y=X*X
    2. X=0, Y=-1    Y=0*0=0
    3. X>=0 & X>Y & Y=0 is SAT, which implies S0 is Reachable

Must add constraints on the solutions from Step 2 in Step 3:
    X>=0 & X>Y & Y=0 & X=0 is NOT SAT

Concolic will diverge instead
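The difference between the two versions of Step 3 can be checked with a small sketch. It treats Y=X*X as the "hard" part, as the slide does, and uses a bounded search over [-20, 20] x [-20, 20] as a stand-in solver. The class and method names are hypothetical, and the solver's Step 2 solution may differ from the slide's X=0, Y=-1 in its Y value.

```java
final class UnsoundnessCheck {
    /** A constraint over the symbolic inputs X and Y. */
    interface Constraint {
        boolean holds(int x, int y);
    }

    // Easy part of the PC: X>=0 & X>Y. The hard part is Y=X*X.
    static final Constraint EASY = (x, y) -> x >= 0 && x > y;

    // Stand-in decision procedure (an assumption): bounded search over [-20, 20]^2.
    static int[] solve(Constraint c) {
        for (int x = -20; x <= 20; x++) {
            for (int y = -20; y <= 20; y++) {
                if (c.holds(x, y)) return new int[] {x, y};
            }
        }
        return null;
    }

    // Runs Steps 2 and 3. If pinX is true, Step 3 also keeps X equal to the Step 2 value.
    static boolean step3Sat(boolean pinX) {
        int[] s = solve(EASY);   // Step 2: finds X=0 with some negative Y
        int xs = s[0];
        int h = xs * xs;         // Step 2: evaluate the hard part, 0*0 = 0
        return solve((x, y) -> EASY.holds(x, y) && y == h && (!pinX || x == xs)) != null;
    }

    // Ground truth within the search bounds: is the full PC satisfiable at all?
    static boolean trulyReachable() {
        return solve((x, y) -> EASY.holds(x, y) && y == x * x) != null;
    }

    public static void main(String[] args) {
        System.out.println("without X=0: SAT = " + step3Sat(false));
        System.out.println("with X=0:    SAT = " + step3Sat(true));
        System.out.println("S0 reachable within bounds: " + trulyReachable());
    }
}
```

Without the pin, Step 3 accepts X=1, Y=0, a point at which the hard part was never evaluated. With the pin, the check agrees with the ground truth.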
3 More Enhancements Incremental Solving User Annotations Random Solving
Problem for Concolic

    void test(int x, int y) {
        if (x > 0) {
            if (y == hash(x)) S0; else S1;
            if (y > 10) S3;
            S4;
        }
    }

    native int hash(int x);
    // Hidden from the tool; it behaves as:
    //     if (0 <= x && x <= 10) return x * 10; else return 0;

test(1,0)
    [ X > 0 ]
    [ X > 0 & Y != 10 ]
    [ X>0 & Y!=10 & Y<=10 ]
Negate the last constraint and solve [ X>0 & Y!=10 & Y>10 ]: test(1,11)
test(1,11)
    [ X > 0 ]
    [ X > 0 & Y != 10 ]
    [ X>0 & Y!=10 & Y>10 ]
After negation, Concolic is stuck: [ X>0 & Y=10 & Y>10 ]
SymCrete Execution: Incremental Solving

    void test(int x, int y) {
        if (x > 0) {
            if (y == hash(x)) S0; else S1;
            if (y > 10) S3;
            S4;
        }
    }

    native int hash(int x);
    // Hidden from the tool; it behaves as:
    //     if (0 <= x && x <= 10) return x * 10; else return 0;

test(X,Y)
    [ X > 0 ]
        hash(X)
        [ X>0 & Y=hash(X) ] S0
            [ X>0 & Y=hash(X) & Y>10 ] S3

    1. easy: X>0 & Y>10    hard: Y=hash(X)
    2. X=1    Y=hash(1)=10
    3. X>0 & Y>10 & Y=10 & X=1 is UNSAT
Get another solution!
    2. X=2    Y=hash(2)=20
    3. X>0 & Y>10 & Y=20 & X=2 is SAT
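The retry loop can be sketched as follows. Excluding previously tried X values is one simple way to "get another solution" and is an assumption of this sketch, not necessarily how the tool asks its solver for a new one. The names and the bounded search are likewise hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class IncrementalSolving {
    /** A constraint over the symbolic inputs X and Y. */
    interface Constraint {
        boolean holds(int x, int y);
    }

    // Stands in for the native method: it can be called but not analysed.
    static int hash(int x) {
        return (0 <= x && x <= 10) ? x * 10 : 0;
    }

    // Stand-in decision procedure (an assumption): bounded search over [0, 100]^2.
    static int[] solve(Constraint c) {
        for (int x = 0; x <= 100; x++) {
            for (int y = 0; y <= 100; y++) {
                if (c.holds(x, y)) return new int[] {x, y};
            }
        }
        return null;
    }

    // Handles a PC of the form "easy & Y=hash(X)", retrying with fresh X values.
    static int[] solveIncrementally(Constraint easy, int maxAttempts) {
        Set<Integer> tried = new HashSet<>();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Step 2: solve the easy part, skipping X values that already failed.
            int[] s = solve((x, y) -> easy.holds(x, y) && !tried.contains(x));
            if (s == null) return null; // no further candidates
            int xs = s[0];
            int h = hash(xs);           // evaluate the hard part concretely
            // Step 3: check the PC with Y=hash(X) replaced by Y=h and X pinned to xs.
            int[] t = solve((x, y) -> easy.holds(x, y) && x == xs && y == h);
            if (t != null) return t;
            tried.add(xs);              // UNSAT: get another solution
        }
        return null;
    }

    public static void main(String[] args) {
        // [ X>0 & Y=hash(X) & Y>10 ] S3
        System.out.println(Arrays.toString(solveIncrementally((x, y) -> x > 0 && y > 10, 10)));
    }
}
```

For this PC the first candidate X=1 fails because hash(1)=10 is not greater than 10, and the second candidate X=2 succeeds with Y=20, matching the slide.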
SymCrete Execution: User Annotations

    @Partition({"x>3","x<=3"})
    void test(int x, int y) {
        if (x > 0) {
            if (y == hash(x)) S0; else S1;
            if (y > 10) S3;
            S4;
        }
    }

    native int hash(int x);
    // Hidden from the tool; it behaves as:
    //     if (0 <= x && x <= 10) return x * 10; else return 0;

test(X,Y)
    [ X > 0 ]
        hash(X)
        [ X>0 & Y=hash(X) ] S0
            [ X>0 & Y=hash(X) & Y>10 ] S3

    1. easy: X>0 & Y>10    hard: Y=hash(X)
    2. X=1    Y=hash(1)=10
    3. X>0 & Y>10 & Y=10 & X=1 is UNSAT
Add user partitions one at a time
    1. easy: X>0 & Y>10 & X>3    hard: Y=hash(X)
    2. X=4    Y=hash(4)=40
    3. X>3 & Y>10 & Y=40 & X=4 is SAT
Random Solving
Pick solutions randomly from the solution space
Current implementation only picks randomly if the solution space is completely unconstrained
- Not all solvers support the general feature -
Implementation
Symcrete: Custom Listeners on SPF
Symbolic PathFinder (SPF): Symbolic Execution extension for JPF called jpf-symbc
Java PathFinder (JPF): Model Checker for Java, Open Source, http://babelfish.arc.nasa.gov/trac/jpf
Conclusions
Symbolic-driven use of concrete values to address problems with classic symbolic execution
In the process, it addresses issues with DART/Concolic analysis
But the two approaches are still incomparable in strength (see next slide)
Open source implementation for Java
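DART/Concolic gets “Lucky”

    public void test(boolean b, int x, int y) {
        if (b) {
            if (y <= 0) { ... }
            else { }
        } else {
            if (x <= 0 && identity(y) == 1) { HERE! }
        }
    }

b=true,x=0,y=0    PC: b, y<=0
Negating last constraint y <= 0: b=true,x=0,y=1
Now b branches done, so negate b: b=false,x=0,y=1
The leftover values x=0, y=1 happen to satisfy x <= 0 && identity(y) == 1 (assuming identity returns its argument), so HERE! is reached.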