Masahiro Fujita Yoshihisa Kojima University of Tokyo May 2, 2008

Intelligent automatic test pattern generation for C-based HW/SW co-design descriptions through combined use of concrete and symbolic simulations

Background
- In high-level SoC design, system behavior can be described in C-like programming languages
  - These descriptions target both hardware and software
  - Tool support is not sufficient
- Difficulties compared with RTL or lower-level design descriptions
  - Many wide word-level signals (large exploration space)
  - Complicated control flow (many paths)
  - Difficulty in modeling various descriptions
    - SW: pointers, pointer arithmetic, casting, dynamic allocation, recursive calls…
    - HW: concurrency, synchronization, throughput, latency…
- Our goal is to assist test case generation for system-level descriptions in C-like languages
  - Automatic input pattern generation
  - Assertion-based verification to find bugs
  - Higher code coverage, which results in higher confidence

Most important issues in debugging
- Generally speaking, counterexamples generated by simulation/emulation are very "long"
  - They can run to billions of cycles
  - It is not easy at all to understand why the error occurs
- Much shorter counterexamples are needed just to understand why the bug happens
  - Are those long sequences really necessary?
- Bounded model checking is based on assertions with "constraints"
  - Bounds cannot be large
  - Can we derive good constraints from the counterexamples found in simulation/emulation?
[Figure: two state-space diagrams, each with a trace from the initial state to the bug. There can be a more direct path; loops can be skipped.]

Target language
- SpecC = ANSI-C + mechanisms for HW
  - Structural hierarchy
  - Parallelism
  - Synchronization
  - Channel
- Languages discussed here
  - C language
  - Some additional features
[Figure: SpecC behavior diagram. Labels: behavior B, ports p1 and p2, child behaviors b1 and b2, variable (wire) v1, channel c1, interfaces.]

Outline
- Background
- Problem definitions for input pattern generation
- Preliminaries
  - Branch / path / coverage definitions
- Concrete/symbolic hybrid simulation
  - Concrete simulation, symbolic simulation
  - Hybrid simulation
- Proposed method for branch coverage
- Implementation
- Experimental results
- Conclusion and future work

Requirements for input pattern generation (1)
For assertion failure detection
- Given a design description annotated with
  - Input variable definitions
  - Assumptions on input variables, as predicates
  - Assertion predicates
- Possible results
  - Assertion violation (with input value assignments)
  - Assertion holds for all possible input values
  - Unknown
Example:
    int func(int x, int y) {
        int r = 0;
        if (x - y > 0) r = x - y;
        else r = y - x;
        return r;
    }

    int x, y;
    FL_INPUT(x);
    FL_INPUT(y);
    FL_ASSUME(x >= 0);
    FL_ASSUME(y >= 0);
    FL_ASSERT(func(x, y) > 0);
Result: assertion failure. Counterexamples exist: (x = 0, y = 0), (x = 3, y = 3), ...

Requirements for input pattern generation (2)
For branch coverage
- Given a design description with annotations and a target branch coverage
- Generate a set of test cases (input value assignments) that cover the branches
  - I.e. show how to activate as many code fragments as possible (over multiple runs)
Example:
    int x, y;
    FL_INPUT(x);
    FL_INPUT(y);
    if (x > 2) { }
    if (y > 2) { }
The test cases (1) (x = 0, y = 0) and (2) (x = 3, y = 3) achieve 100% branch coverage.
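The coverage claim above can be checked with a small hand-instrumented sketch. The names probe, run_example, and branches_covered are hypothetical and are not part of the FL_* annotation set; plain function arguments stand in for FL_INPUT.

```c
#include <assert.h>
#include <stdbool.h>

enum { NUM_CONDS = 2 };

/* covered[c][0]: condition c was seen false; covered[c][1]: seen true. */
static bool covered[NUM_CONDS][2];

/* Record the outcome of condition `id` and pass its value through. */
static bool probe(int id, bool outcome) {
    assert(id >= 0 && id < NUM_CONDS);
    covered[id][outcome ? 1 : 0] = true;
    return outcome;
}

/* The two-branch example from the slide, instrumented by hand. */
void run_example(int x, int y) {
    if (probe(0, x > 2)) { }
    if (probe(1, y > 2)) { }
}

/* Number of branch outcomes covered so far (at most 2 * NUM_CONDS). */
int branches_covered(void) {
    int n = 0;
    for (int c = 0; c < NUM_CONDS; c++)
        n += covered[c][0] + covered[c][1];
    return n;
}
```

Calling run_example(0, 0) and then run_example(3, 3) covers the false and true sides of both conditions, which is the 4 out of 4 branches that the slide counts as 100%.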

Branch / path definitions
- A (pair of) conditional branch(es) is associated with each if, do-while, for, switch-case, and while statement
- A branch is covered when the associated condition has been evaluated as true (or false) at least once (over multiple runs)
[Figure: an if (cond) node with a "then" edge, branch condition BC = cond, and an "else" edge, BC = !cond.]

Branch / path definitions
- A path is a sequence of branches taken
- A path condition is defined as the conjunction of all the branch conditions taken
- A false (infeasible) path is a path for which no value assignment satisfies the path condition
Example 1:
    void func(int x, int y) {
        if (x > 2) {
        }
        if (x < 2) {
        }
    }
There appear to be 4 paths, but the path that takes both THEN branches has the path condition (x > 2) AND (x < 2), which is INFEASIBLE!
Example 2:
    void func(int x, int y) {
        if (x > 2) {
        } else {
        }
        if (y > 2) {
        } else {
        }
    }
There are 4 paths. For the path through the first THEN branch and the second ELSE branch, the path condition is (x > 2) AND NOT(y > 2).
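To make the infeasible-path example concrete, here is a minimal sketch that searches a small range of x for a witness of each of the four syntactic paths through `if (x > 2)` followed by `if (x < 2)`. The function names are hypothetical, and a bounded search is only an illustration: in general it cannot prove infeasibility, although here (x > 2) AND (x < 2) is contradictory for every x.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Path condition for `if (x > 2) {...} if (x < 2) {...}`:
 * take1 / take2 say whether each THEN branch is taken on the path. */
bool path_condition(int x, bool take1, bool take2) {
    return ((x > 2) == take1) && ((x < 2) == take2);
}

/* Search [lo, hi] for an input that drives execution down the given path. */
bool path_feasible(bool take1, bool take2, int lo, int hi, int *witness) {
    assert(lo <= hi);
    for (int x = lo; x <= hi; x++) {
        if (path_condition(x, take1, take2)) {
            if (witness != NULL) *witness = x;
            return true;
        }
    }
    return false;
}

/* Count how many of the 4 syntactic paths have a witness in [lo, hi]. */
int count_feasible_paths(int lo, int hi) {
    int n = 0;
    for (int t1 = 0; t1 <= 1; t1++)
        for (int t2 = 0; t2 <= 1; t2++)
            n += path_feasible(t1 != 0, t2 != 0, lo, hi, NULL) ? 1 : 0;
    return n;
}
```

On the range [-10, 10] only three of the four paths have a witness; the path that skips both THEN branches is reached only by x = 2.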

Branch / path coverage definitions
- Branch coverage: # of branches covered out of # of all branches
- Path coverage: # of paths covered out of # of all (or feasible) paths
  - Difficult to use in practice because:
    - The number of feasible paths cannot easily be known
    - The number of possible paths can be huge: exponential w.r.t. # of if-statements * loop iterations
Example: two consecutive if statements, exercised with 2 runs
- Branch coverage: 4 / (2 + 2) (100%)
- Path coverage: 2 / (2 * 2) (50%)
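The two-if arithmetic above can be reproduced with a short sketch that records both metrics for the same runs. All names are hypothetical, and the path total used here is the syntactic count 2^NUM_IFS, not the number of feasible paths.

```c
#include <assert.h>
#include <stdbool.h>

enum { NUM_IFS = 2, NUM_PATHS = 1 << NUM_IFS };

static bool branch_seen[NUM_IFS][2];  /* [i][0]: false side, [i][1]: true side */
static bool path_seen[NUM_PATHS];     /* indexed by the bit pattern of outcomes */

/* Record one run; outcomes[i] is the truth value of the i-th if condition. */
void record_run(const bool outcomes[NUM_IFS]) {
    unsigned path = 0;
    for (int i = 0; i < NUM_IFS; i++) {
        branch_seen[i][outcomes[i] ? 1 : 0] = true;
        if (outcomes[i]) path |= 1u << i;
    }
    assert(path < (unsigned)NUM_PATHS);
    path_seen[path] = true;
}

/* Covered branch outcomes as a percentage of 2 * NUM_IFS. */
int branch_coverage_percent(void) {
    int hit = 0;
    for (int i = 0; i < NUM_IFS; i++)
        hit += branch_seen[i][0] + branch_seen[i][1];
    return 100 * hit / (2 * NUM_IFS);
}

/* Covered paths as a percentage of the 2^NUM_IFS syntactic paths. */
int path_coverage_percent(void) {
    int hit = 0;
    for (int p = 0; p < NUM_PATHS; p++)
        hit += path_seen[p];
    return 100 * hit / NUM_PATHS;
}
```

Two runs, one taking both conditions false and one taking both true, give 100% branch coverage but only 50% path coverage, matching the numbers on the slide.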

Traditional (concrete) simulation approach
- Create test cases (input values) by hand
  - Not so easy
- Or generate them randomly
  - Automated, but it may be difficult to activate the corner cases
  - In system-level descriptions, the search space can be huge (e.g. 32-bit word-level signals)
- Run simulation
  - Very simple, but how long does it take to hit the failure?
  - Incomplete: cannot prove that the assertion ALWAYS holds unless all possible values have been exercised (not practically possible)
- Confidence (quality of tests) is given by coverage metrics
  - E.g. branch coverage
Example (checking func(x, y) > 0):
    Try (x=3, y=100)  => r=97 > 0  OK
    Try (x=1, y=20)   => r=19 > 0  OK
    ...
    Try (x=10, y=10)  => r=0 > 0   NG! (may eventually happen, but only rarely)
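A minimal sketch of the random approach, applied to the func example from the requirements slide. The generator, the seed handling, and the name random_simulation are assumptions made for illustration; only func comes from the slides. With inputs drawn from a range of size N, a trial hits x == y with probability of roughly 1/N, which is why the failing case shows up only rarely for wide inputs.

```c
#include <assert.h>
#include <stdint.h>

/* The function under test, as on the slides. */
int func(int x, int y) {
    int r = 0;
    if (x - y > 0) r = x - y;
    else r = y - x;
    return r;
}

/* Small linear congruential generator (Numerical Recipes constants),
 * used instead of rand() so that a run is reproducible from its seed. */
static uint32_t next_random(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Try `trials` random inputs with 0 <= x, y < range (the FL_ASSUME part).
 * Returns the index of the first trial that violates func(x, y) > 0,
 * or -1 if no violation was seen. */
long random_simulation(long trials, uint32_t seed, int range,
                       int *bad_x, int *bad_y) {
    assert(range > 0);
    uint32_t state = seed;
    for (long t = 0; t < trials; t++) {
        int x = (int)((next_random(&state) >> 16) % (uint32_t)range);
        int y = (int)((next_random(&state) >> 16) % (uint32_t)range);
        if (!(func(x, y) > 0)) {
            *bad_x = x;
            *bad_y = y;
            return t;
        }
    }
    return -1;
}
```

A return value of -1 says nothing about correctness; it only means that none of the sampled inputs failed, which is the incompleteness noted above.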

Formal approach
- Build the formal expressions and mathematically solve the constraints
  - Precise & complete
  - Computationally expensive
- Word-level approach: symbolic simulation
  - Evaluates values as symbolic expressions instead of concrete values

Symbolic Simulation
- Needs to enumerate all the paths
  - Sometimes a path can be infeasible (false-path problem)
- Enumerates possible paths (including infeasible ones)
Example:
    int func(int x, int y) {
        int r = 0;
        if (x - y > 0) r = x - y;   /* Path1 */
        else r = y - x;             /* Path2 */
        return r;
    }
Path conditions:
    Path1: (r_1 = 0) AND (x - y > 0) AND (r_2 = x - y) AND (x >= 0) AND (y >= 0) -> (r_2 > 0)
           VALID for all x, y
    Path2: (r_1 = 0) AND NOT(x - y > 0) AND (r_2 = y - x) AND (x >= 0) AND (y >= 0) -> (r_2 > 0)
           INVALID. Counterexample: (y - x = 0) (some of them may be reported)
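The two path formulas can be sanity-checked by bounded exhaustive search. This is only a stand-in for what an SMT solver decides symbolically over all values; the names CheckResult and check_path and the bound parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool valid;   /* true if no violation was found within the bound */
    int x, y;     /* counterexample, meaningful only when !valid */
} CheckResult;

/* Check "assumptions AND path condition -> (r_2 > 0)" for one path of func().
 * path 1 is the THEN branch (x - y > 0), path 2 is the ELSE branch.
 * Only 0 <= x, y <= bound is explored, so `valid` is a bounded result. */
CheckResult check_path(int path, int bound) {
    assert(path == 1 || path == 2);
    assert(bound >= 0);
    CheckResult res = { true, 0, 0 };
    for (int x = 0; x <= bound; x++) {          /* FL_ASSUME(x >= 0) */
        for (int y = 0; y <= bound; y++) {      /* FL_ASSUME(y >= 0) */
            bool on_path = (path == 1) ? (x - y > 0) : !(x - y > 0);
            if (!on_path) continue;
            int r2 = (path == 1) ? x - y : y - x;
            if (!(r2 > 0)) {
                res = (CheckResult){ false, x, y };
                return res;
            }
        }
    }
    return res;
}
```

Path 1 shows no violation within the bound, while Path 2 fails as soon as x equals y; the first counterexample found is (x = 0, y = 0).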

Symbolic simulation (cont’d)
- Employs an SMT (satisfiability modulo theories) solver
  - To solve path conditions
  - To evaluate assertions
- For each path: one symbolic simulation on a path corresponds to concrete simulations of all possible values on that path
- Limitations:
  - # of paths (including false paths)
  - Size of symbolic expressions
  - Solver capability (non-linear algebra)
  - How to model complicated descriptions
- May not be straightforwardly applicable to complex / large descriptions

Concrete-symbolic hybrid approach
- Combines concrete simulation and symbolic simulation (originally proposed by Larson [5])
- CUTE [11] was proposed for unit testing
  - Exhaustive traversal of all paths
- A concrete run guides the path for symbolic simulation (initially random simulation)
- A symbolic run on that path derives the path condition
  - Concrete values are used for approximation if the constraints cannot be processed (e.g. non-linear)
- Solve the constraints to guide execution to another path
  - Negate some path-condition term to take another branch

Concolic Simulation (1st)
Goal: find the inputs to reach reach_me()
    void test(int x, int y, int z) {
        if (x > 3)                      // B1
            if (y > 11)                 // B2
                if (z == y*y)           // B3
                    if (x < 5)          // B4
                        reach_me();
    }
- Concrete states (initially random): x=0, y=0, z=0; (0 > 3)? -> no!
- Symbolic states: x=i1, y=i2, z=i3; (i1 > 3)?
- Path condition: (i1 <= 3)
- Negate this condition and solve to take the THEN branch at B1

Concolic Simulation (2nd)
(Same test() as in the 1st run; goal: find the inputs to reach reach_me())
- Concrete states: x=10, y=0, z=0; (10 > 3), (0 > 11)? -> no!
- Symbolic states: x=i1, y=i2, z=i3; (x > 3), (y <= 11)
- Path condition: (i1 > 3) (i2 <= 11)
- Negate the last condition, (i2 <= 11), and solve to take the THEN branch at B2

Concolic Simulation (3rd)
(Same test() as in the 1st run; goal: find the inputs to reach reach_me())
- Concrete states: x=10, y=20, z=0; (10 > 3), (20 > 11), (0 == 400)? -> no!
- Symbolic states: x=i1, y=i2, z=i3; (x > 3), (y > 11), (z != y*y)
- Path condition: (i1 > 3) (i2 > 11) (i3 != 400)
  - The non-linear term i2*i2 is replaced by its concrete value 400
- Negate the last condition, (i3 != 400), and solve to take the THEN branch at B3

Concolic Simulation (4th)
(Same test() as in the 1st run; goal: find the inputs to reach reach_me())
- Concrete states: x=10, y=20, z=400; (10 > 3), (20 > 11), (400 == 400), (10 < 5)? -> no!
- Symbolic states: x=i1, y=i2, z=i3; (x > 3), (y > 11), (z == 400), (x >= 5)
- Path condition: (i1 > 3) (i2 > 11) (i3 == 400) (i1 >= 5)
- Negate the last condition, (i1 >= 5), and solve to take the THEN branch at B4

Concolic Simulation (5th)
(Same test() as in the 1st run; goal: find the inputs to reach reach_me())
- Concrete states: x=4, y=20, z=400; (4 > 3), (20 > 11), (400 == 400), (4 < 5)
- Symbolic states: x=i1, y=i2, z=i3; (x > 3), (y > 11), (z == 400), (x < 5)
- Path condition: (i1 > 3) (i2 > 11) (i3 == 400) (i1 < 5)
- Reached reach_me() successfully!
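The five runs above can be mimicked by a small driver for this one function. It is a sketch under strong assumptions, not CUTE or the authors' tool: every recorded condition has the form "variable OP constant", the non-linear y*y is replaced by its concrete value, the toy interval solver supports at most one != constraint per variable, and the driver always negates the last recorded condition. All names (Cond, run, solve, concolic_search) are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

enum { VX, VY, VZ, NVARS };
typedef enum { GT, LE, LT, GE, EQ, NE } Op;
typedef struct { int var; Op op; int k; } Cond;   /* in[var] OP k */

static Op negate(Op op) {
    switch (op) {
    case GT: return LE;
    case LE: return GT;
    case LT: return GE;
    case GE: return LT;
    case EQ: return NE;
    default: return EQ;
    }
}

/* Concrete run of test() that also records the path condition.
 * Returns true when reach_me() would be called. */
bool run(const int in[NVARS], Cond pc[4], size_t *n) {
    *n = 0;
    pc[(*n)++] = (Cond){ VX, in[VX] > 3 ? GT : LE, 3 };       /* B1 */
    if (!(in[VX] > 3)) return false;
    pc[(*n)++] = (Cond){ VY, in[VY] > 11 ? GT : LE, 11 };     /* B2 */
    if (!(in[VY] > 11)) return false;
    int sq = in[VY] * in[VY];   /* non-linear y*y, concretized (small y only) */
    pc[(*n)++] = (Cond){ VZ, in[VZ] == sq ? EQ : NE, sq };    /* B3 */
    if (in[VZ] != sq) return false;
    pc[(*n)++] = (Cond){ VX, in[VX] < 5 ? LT : GE, 5 };       /* B4 */
    return in[VX] < 5;
}

/* Toy solver: intersect the bounds per variable, keep the old value if it
 * still fits, otherwise clamp it into the interval. Returns false if the
 * constraints are unsatisfiable (in[] may then be partly updated). */
bool solve(const Cond pc[], size_t n, int in[NVARS]) {
    for (int v = 0; v < NVARS; v++) {
        long long lo = INT_MIN, hi = INT_MAX, ne = 0;
        bool has_ne = false;
        for (size_t i = 0; i < n; i++) {
            if (pc[i].var != v) continue;
            long long k = pc[i].k;
            switch (pc[i].op) {
            case GT: if (k + 1 > lo) lo = k + 1; break;
            case GE: if (k > lo) lo = k; break;
            case LT: if (k - 1 < hi) hi = k - 1; break;
            case LE: if (k < hi) hi = k; break;
            case EQ: if (k > lo) lo = k; if (k < hi) hi = k; break;
            case NE: has_ne = true; ne = k; break;
            }
        }
        long long val = in[v];
        if (val < lo) val = lo;
        if (val > hi) val = hi;
        if (has_ne && val == ne) val = (val < hi) ? val + 1 : val - 1;
        if (val < lo || val > hi) return false;
        in[v] = (int)val;
    }
    return true;
}

/* Run, negate the last recorded condition, solve, and repeat.
 * Returns the number of runs needed to reach reach_me(), or -1. */
int concolic_search(int in[NVARS], int max_runs) {
    assert(max_runs > 0);
    for (int runs = 1; runs <= max_runs; runs++) {
        Cond pc[4];
        size_t n;
        if (run(in, pc, &n)) return runs;
        pc[n - 1].op = negate(pc[n - 1].op);
        if (!solve(pc, n, in)) return -1;
    }
    return -1;
}
```

The concrete values differ from the slides because this solver picks the nearest satisfying value instead of an arbitrary one. Starting from (0, 0, 0) it moves x to 4 directly, so the run that fails at B4 never occurs and reach_me() is reached with (x=4, y=12, z=144).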

Concolic approach
- Can be applied to work around non-linear constraints
- Can be used to enumerate the paths
  - Good for path coverage
- Can be used to guide the path
- But CUTE does not consider which path should be tried next
  - CUTE's strategy is exhaustive
  - It may not terminate if the # of paths is huge

Proposed method
- Flip a branch condition on a path only when the branch it leads to is not covered yet
  - Gives a priority for path enumeration
  - Skips the uncovered paths that do not contribute to the branch coverage
- Terminates when the target coverage is achieved
  - Tries to avoid enumerating all the paths
- Not guaranteed to cover all possible branches
  - Derived alternative paths may not be feasible
  - Worst case: all paths need to be enumerated
  - Also limited by the solver's capability (i.e. a path condition may not be solvable)
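The selection rule can be sketched as follows. The data layout and the names Taken, mark_covered, and pick_flip are hypothetical, and choosing the deepest candidate is an assumption: the slides say only that a condition is flipped when the opposite branch is still uncovered, not which candidate is preferred.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One step of an executed path: which condition was evaluated, and how. */
typedef struct { int branch_id; bool outcome; } Taken;

/* Mark every branch outcome on an executed path as covered. */
void mark_covered(const Taken path[], size_t n, bool covered[][2]) {
    for (size_t i = 0; i < n; i++) {
        assert(path[i].branch_id >= 0);
        covered[path[i].branch_id][path[i].outcome ? 1 : 0] = true;
    }
}

/* Return the index of the deepest path position whose opposite outcome is
 * not covered yet, or -1 if flipping anything on this path gains nothing. */
int pick_flip(const Taken path[], size_t n, bool covered[][2]) {
    for (size_t i = n; i-- > 0; ) {
        if (!covered[path[i].branch_id][path[i].outcome ? 0 : 1])
            return (int)i;
    }
    return -1;
}
```

For a loop whose body condition has already been seen both true and false, every later iteration is skipped as a flip candidate, which is how this rule avoids walking all of the exponentially many paths.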

Our implementation
- Implemented on FLEC (our C equivalence checker)
  - Used as a SpecC [3] frontend
  - Control/data/communication/… dependencies have been extracted
- AST interpreter
  - Evaluates AST nodes (expressions / statements) one by one
  - C.f. CUTE: instrument & compile
  - We can start from any point in the program!
- Concrete simulator: evaluates with concrete values
- Symbolic simulator: evaluates with symbolic expressions
- Branch/path coverage profiler
- Input pattern generator
  - For alternative paths
  - For assertion failures
- SMT solver: CVC3 [12]
  - To generate input patterns
  - To evaluate assertions
  - C.f. CUTE: lpsolve

Experimental results (1/3)
Simple example
- Achieved 2 / 2 (100%) branch coverage with 2 runs
- Detected assertion failure with (x=0, y=0)
    int func(int x, int y) {
        int r = 0;
        if (x - y > 0)
            r = x - y;
        else
            r = y - x;
        return r;
    }
    void main() {
        int x, y;
        FL_INPUT(x);
        FL_INPUT(y);
        FL_ASSUME(x >= 0);
        FL_ASSUME(y >= 0);
        FL_ASSERT(func(x, y) > 0);
    }

Experimental results (2/3)
Calculate factorial with two implementations
- With recursive function calls
- With a for-loop
Results
- Validated for one path (i = 8)
- Achieved 4/4 (100%) branch coverage with 1 run
    /* Recursive implementation */
    unsigned int fact_rec(unsigned int s) {
        if (s <= 1) {
            return 1;
        } else {
            unsigned int t;
            unsigned int p;
            t = s * fact_rec(s - 1);
            return t;
        }
    }
    /* Iterative implementation */
    unsigned int fact_for(unsigned int s) {
        unsigned int i;
        unsigned int p;
        p = 1;
        for (i = 1; i <= s; i++) {
            p *= i;
        }
        return p;
    }
    void main() {
        int i, o1, o2;
        FL_INPUT(i);
        FL_ASSUME(i <= 10);
        o1 = fact_for(i);
        o2 = fact_rec(i);
        FL_ASSERT(o1 == o2);
    }

Experimental results (3/3)
- # of branches: 10
- # of paths: 4 * 2^100
- Achieved 10 / 10 (100%) branch coverage with 5 runs
- Detected assertion failure with (x=1, y=2, z=3)
- CUTE got stuck due to too many paths
    int f(int x, int y, int z) {
        int p;
        if (x + y + z == 6)
            if (2*x + 7*y + 3*z == 25)
                if (-4*x - 2*y + 2*z == -2)
                    FL_ASSERT(0);
        for (p = 0; p < 100; p++) {
            if (p == z) {
            }
        }
    }
    void main() {
        int x, y, z;
        FL_INPUT(x);
        FL_INPUT(y);
        FL_INPUT(z);
        f(x, y, z);
    }

Elevator controller profile
- Elevator controller (abstracted model)
  - Cycle-based behavior
  - Simple, but designed by a real engineer
  - Contains an unintended bug
- Inputs:
  - 3 floors
    - Up request buttons on 1F and 2F
    - Down request buttons on 2F and 3F
  - 1 cabin
    - 3 buttons for floor stop requests
    - 2 buttons for door open / close
- Outputs:
  - Up, Down request status
  - Floor stop request status
  - Door open/close
  - Cabin vertical speed (0: stopped, +1: up, -1: down)
  - Cabin position (on 1F, b/w 1F and 2F, on 2F, b/w 2F and 3F, on 3F)
  - Service direction (0: none, +1: up, -1: down)
[Figure: elevator with floors 1F, 2F, and 3F, floor stop buttons, and door open / close buttons.]

Elevator controller profile (cont’d)
- State variables:
  - Up/Down request status (2+2)
  - Floor stop request status (3)
  - Door status (1)
  - Cabin position (on 1F, b/w 1F and 2F, on 2F, b/w 2F and 3F, on 3F)
  - Cabin speed (0: stopped, +1: up, -1: down)
  - Service direction (0: none, +1: up, -1: down)
  - 2^8 * 5 * 3 * 3 = 11.5k states (including infeasible ones)
  - Initially stopped on 1F, door closed, no request active
- Original code: 396 lines in SpecC
  - 145 million paths (including infeasible ones)
- Replaced if-then-else & switch-case statements with conditional (cond ? true : false) expressions
  - To handle multiple paths at once
  - Simple control flow (straight line), but very complex data flow
  - Reduced to 155 lines
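As an illustration of the if-then-else to conditional-expression rewrite, the sketch below shows one update rule in both styles. The rule and its parameter names are made up for this example and are not taken from the actual elevator design; the point is only that the second form has a single straight-line path whose result is one nested ITE expression.

```c
#include <assert.h>

/* Branching form: each if/else adds paths that would be explored separately. */
int next_speed_branching(int door_open, int request_above, int request_below) {
    int speed;
    if (door_open) {
        speed = 0;
    } else if (request_above) {
        speed = +1;
    } else if (request_below) {
        speed = -1;
    } else {
        speed = 0;
    }
    return speed;
}

/* Straight-line form: one path, with the choice folded into the data flow. */
int next_speed_ite(int door_open, int request_above, int request_below) {
    return door_open ? 0
         : request_above ? +1
         : request_below ? -1
         : 0;
}

/* Exhaustively compare the two forms over all boolean input combinations. */
void check_equivalence(void) {
    for (int d = 0; d <= 1; d++)
        for (int a = 0; a <= 1; a++)
            for (int b = 0; b <= 1; b++)
                assert(next_speed_branching(d, a, b) == next_speed_ite(d, a, b));
}
```

The trade-off matches the slide: the number of paths drops, but the symbolic expression for each variable grows because every alternative is carried along in the ITE.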

Elevator controller profile (cont’d)
Property examples
- The elevator must be on or between 1F and 3F
    ASSERT((out_position >= 0) && (out_position <= 4));
- The door opens only when the elevator is stopped on one of 1F, 2F, and 3F
    ASSERT(!out_door || ((out_speed == 0) && ((out_position == 0) || (out_position == 2) || (out_position == 4))));
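The two properties can also be written as plain C predicates, which makes them easy to evaluate on the outputs of each simulated cycle. The function names are hypothetical. The position encoding (0 = on 1F, 2 = on 2F, 4 = on 3F, odd values between floors) is inferred from the assertions and the five listed positions, so treat it as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Property 1: the cabin is on or between 1F (0) and 3F (4). */
bool position_in_range(int out_position) {
    return out_position >= 0 && out_position <= 4;
}

/* Property 2: the door is open only when the cabin is stopped on a floor. */
bool door_safe(bool out_door, int out_speed, int out_position) {
    bool on_floor = out_position == 0 || out_position == 2 || out_position == 4;
    return !out_door || (out_speed == 0 && on_floor);
}

/* Check both properties on the outputs of one cycle. */
void check_outputs(bool out_door, int out_speed, int out_position) {
    assert(position_in_range(out_position));
    assert(door_safe(out_door, out_speed, out_position));
}
```

For example, door_safe(true, 0, 2) holds (door open, stopped on 2F), while door_safe(true, 0, 1) does not (door open between floors).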

Symbolic simulation result
Symbolic expression explodes in 3-4 cycles of symbolic simulation With constant propagation/substitution With simplifications for ITE, AND, OR, and other operators Without concrete-value substitution (approximation) Without common sub-expression sharing # of cycles of symbolic simulation must be highly bounded! Beginning of Symbolic simulation 300k nodes and more! Reset sequence

User guided simulation
Starts symbolic simulation from the specified state by the user Explore with respect to the states of user’s interest Some of the states (proved to be) reachable by concrete (random) simulation Jump into the states (which may or may not be feasible) Will need to check its feasibility later Cycle is bounded Concrete simulation Symbolic simulation Symbolic simulation State space Might be infeasible Initial states Paths unknown

User guided result (1)
- Try to generate an input pattern that produces a situation where
  - The cabin is located on 2F
  - Speed = -1 (down)
  - (This is not a bug)
- I.e. try to violate ASSERT(!((out_speed == -1) && (out_position == 2)))
- This state is out of bound from the initial state (stopped on 1F)
  - More than 3 cycles are needed for the elevator to accept a request on 1F, start moving, go up at least to 2F, and go down…

User guided result (1) (cont’d)
- So let's jump into one of the feasible states
  - state_position = 4, state_door = false, state_speed = 0 …
  - Known a priori to be reachable, from random simulation
- Found an input pattern that violates the assertion at cycle 5 (3rd cycle of symbolic sim.)
  - Up request on cycle 1 = true
  - Up request on cycle 1 = false
  - Down request on cycle 1 = false
  - Stop on 1F cycle 1 = false
  - Stop on 2F cycle 1 = false

User guided result (2)
- Try to violate the assertion
  - The elevator must be on or between 1F and 3F
    ASSERT((out_position >= 0) && (out_position <= 4));
- Let's jump into one of the states
  - state_position = 4 (on 3F)
  - state_speed = +1 (up)
- The next state goes to out_position = 5 (higher than 3F!) and violates the assertion!
- However, the state (state_position = 4, state_speed = +1) is actually infeasible
  - A wrong assumption may lead to a wrong conclusion
  - The feasibility of the originating state should be verified in some way

Conclusion & Future work
- Implemented a concrete/symbolic hybrid simulator based on an AST interpreter
- Proposed a method of input pattern generation for branch coverage
- Experimental results demonstrate the input pattern generation
  - For assertion failure detection
  - For better branch coverage
- Future work
  - Capability to cover a specified target branch
  - Handling of concurrent executions
  - Hybrid simulation heuristic tuning
  - Efficient management of symbolic expressions

References
[3] D. D. Gajski, J. Zhu, R. Domer, A. Gerstlauer, and S. Zhao. SpecC: Specification Language and Methodology. Kluwer Academic Publishers, 2000.
[5] E. Larson and T. Austin. High coverage detection of input-related security faults. In SSYM'03: Proc. of the 12th USENIX Security Symposium, 2003.
[11] K. Sen, D. Marinov, and G. Agha. CUTE: a concolic unit testing engine for C. In Proc. of ESEC/SIGSOFT FSE-13, 2005.
[12] A. Stump, C. Barrett, and D. Dill. CVC: a cooperating validity checker. In 14th Int'l Conf. on Computer-Aided Verification, 2002.

Difficulty compared with RTL or lower
- In the traditional methodology for RTL or gate level
  - Word signals are converted into bit-vectors
  - Then solved with Boolean algebra
  - Efficient algorithms are available: SAT, BDDs…
- In system-level descriptions
  - Too many word signals, too wide words (32 bit / 64 bit)
    - Too wide a space to explore
  - Complicated control flow
    - Data flow changes dynamically depending on the path
    - Control conditions are complex
    - Too many paths

Difficulty compared with RTL or lower (cont’d)
- In system-level descriptions
  - To model software
    - Recursive calls, pointers, pointer arithmetic, type casting, dynamic allocation…
  - To model hardware
    - Concurrency, synchronization, throughput, latency…
- As word-level solvers, SMT solvers can be employed, but with limited capability
  - Usually up to linear algebra
  - Approximation / workarounds are needed, otherwise it would not work!
