COMPI: Concolic Testing for MPI Applications

COMPI: Concolic Testing for MPI Applications
Hongbo Li , Sihuan Li, Zachary Benavides, Zizhong Chen, and Rajiv Gupta May 24th, 2018

Testing in Industry Software bugs can be VERY costly
In 1998, the crash of NASA’s Mars Climate Orbiter costs $125 millions In 2004, a software bug in the child support agency IT system in UK costs over $1 billion Testing is widely used in industry to ensure code quality Because of such potential devastating consequence,

HPC Also Needs Testing The study of practical systematic testing techniques is scarce in the field of HPC HPC applications drives scientific discovery and technological innovation The study of testing is a must in our field However, … As we all know, … Such a great mission is in need of high-quality software. .. With this paper, we made a small step of introducing practical software testing techniques to our community.

Outline Concolic Testing Challenges & Solutions Evaluation
This is the outline of my talk. I will first introduce how we test a SEQUENTIAL program based on concolic testing. Following that, I will present the challenges of applying concolic testing to MPI programs and also I present the corresponding solutions. In the end is the evaluation of COMPI. 00:45

Let’s check how concolic testing works.

Concolic Testing Concolic testing automatically generates inputs for a given program so as to drive the testing. Its goal is to uncover as many branches as possible Here I will show how concolic testing works with a sequential program The program is given here, it consists of four branch conditions in total, and variable x and y read input values from users.

Concolic Testing Suppose, at the beginning random inputs are generated, which are x = 10 and y = 50.

Concolic Testing With these inputs, we execute the program. The execution helps us uncover 2 branches, 0T and 1F, with the branch coverage being 50%. Additionally, we record a set of constraints, which are x != 100 and x/2 + y <= 200.

Concolic Testing In order to generate a new set of inputs to uncover a new branch, we supposedly select the first constraint x != 100 and negate it, which gives us x=100.

Concolic Testing Hence, we have the new constraint set.

Concolic Testing By solving it, we get new inputs that could uncover a new branch, i.e., 0F.

Concolic Testing After the execution, we uncover the new branch 0F with the branch coverage being increased to 75%. Most importantly, we manifest a bug located in this branch. This is a very good example showing why higher branch coverage could potentially expose bugs.

Concolic Testing So on and so forth, we negate the second constraint, solve the constraints and get new inputs. With the new inputs, we uncovered the last branch. Ideally, we get 100% branch coverage.

Then let’s check what are the challenges of applying the idea to MPI programs as well as our solutions. 06:00

Challenge (1) Fail to tackle important MPI semantics
Multi-process execution Branch coverage using ONLY one process is not enough! How many processes should be used? MPI rank Which process should be the FOCUS process that is used for input generation (concolic testing)? The first challenge is standard concolic testing fails to identify following MPI semantics: (1) MPI programs run with multiple processes instead of just one; and (2) each process has an MPI rank. Failure of identifying multi-process execution causes two problems: (1) .. as branches not encountered in one process can possibly be executed by other processes; (2) … Failure of … fails to answer one question that is …

Challenge (1) Fail to tackle important MPI semantics
Multiple processes Branch coverage using ONLY one process is not enough! How many processes should be used? MPI rank Which process should be the FOCUS process that is used for input generation (concolic testing)? Next, I am gonna give an example showing the two highlighted issue. For simplicity, we ignore The second as it is similar to the third.

Concolic testing with ONLY process 0
Executed by process 0 Executed process i (𝑖≠0) Here we show a very typical MPI program skeleton. In this program, there is a branch conditional statement that checks if MPI rank is 0 or not, which impacts the execution of four branch conditions …. Suppose we only perform standard concolic testing with process 0, i.e., input generation is based on process 0. During the testing, blue, red , orange because input generation logic is only in use by process 0. This testing will fail to (1) … and also fail to (2) Not executed Concolic testing with ONLY process 0 Fail to record branches 3F & 4T Fail to uncover branch 4F

Solution (1) COMPI’s Framework
Record branch coverage based on ALL processes Dynamically vary the number of processes Dynamically vary the focus

Concolic testing USING our Framework
With our framework, we identify variable rank as it denotes MPI rank and thus we can exploit the region enclosed by the blue rectangle below, i.e., the else branch. More specifically, we can uncover both 3F and 4T as we record coverage across all processes Also, we can uncover 4F because we can switch the focus process. This is because we identify the variable denoting MPI rank and we can make the rank equal a value other than 0. 10:30 Concolic testing USING our Framework Help uncover: 3F & 4T Help uncover branch 4F

Challenge (2) Too high testing cost hinders COMPI’s practicality
Too large input value Require long execution time Break testing platform’s memory limit Crash a computer when too many processes are started The other challenge is without careful design too high testing cost hinders COMPI’s practicality, which could result from three sources First, too large input value can fail the testing for two reasons: (a) too large a problem size might exceed the testing platform's memory limit (b) A too large input value that determines the number of processes can crash the platform, which occurred once in our preliminary tool

Hence, matrix sizes like 100 or 200 are far better than 1000.
Execution time and coverage for HPL using different matrix sizes. Therefore, we provide a solution to avoid too large values without sacrificing coverage. Intuition… Take HPL for e.g… HPL is a widely used benchmark whose execution time is mainly determined by the matrix size. We measure the execution time and the recorded coverage at different matrix sizes. As we can see, the coverage almost stays the same even if we increases the matrix size, i.e., input value. the time increases about 45 times. Hence, matrix sizes like 100 or 200 are far better than 1000. Solution: input capping --- set an upper bound for input variables that dominate a program’s execution time

Too large input value Heavy instrumentation

Two-way instrumentation incurs less I/O.
One-way instrumentation launch all processes including non-focus processes with the same heavily instrumented program Solution: two-way instrumentation launch only the focus process with the heavily instrumented program and launch non-focus processes with lightly instrumented program Instrumentation is very important for concolic testing. At runtime, it collects symbolic execution information such as input values, variables, and symbolic constraints. At the end of execution, it write to file that would be read by COMPI to analyze for input generation. A lazy yet easy way to realize our framework is to use one-way, but it is no efficient as non-focus processes perform unnecessary symbolic execution logic and output unnecessary data though only branch coverage is needed. In this way, we avoid redundant execution logic as well as redundant I/O for non-focus processes.

Too large inputs Heavy instrumentation Redundant constraints in loops

{𝑥 | 𝑥 + 𝑖 < 100 and 0<𝑖 < 100} ⊂ 𝑥 𝑥<100}
Constraints reduction. Redundant constraints can be generated from the same branch conditional statement in a loop. Testing on them doesn’t boost coverage and incurs significant time cost. Here we show an loop skeleton supposing the value of x is 0 before entering the loop. After the completion of executing this loop, we end up having 101 constraints. However, the first 100 constraints are reducible as shown here. That is the first would be satisfied once any of the first 100 constraints is satisfied. Hence, our solution is … Solution: constraints reduction --- only record a constraint (a) at the first time a branch is encountered or (b) the branch’s evaluated Boolean value changes

Solution Summary Concolic testing framework targeting MPI programs
Controlling testing cost Input capping Two-way instrumentation Constraints reduction 18:00

Regarding systematic testing, we … 01:45

Evaluation Setting Hardware platform Programs
One single computer with two intel E5607 CPUs totaling 8 cores and 32 GB DRAM Programs Programs Lines of code # Reachable branches Selected variable SUSY-HMC 19,201 2,030 Lattice size HPL 15,699 3,754 Matrix width IMB-MPI1 7,092 1,290 # iterations IMB-MPI1 is from Intel and it is used for benchmarking MPI-1 functions Denoted as 𝑁

2 or 4 Processes fail the test! 1 or 3 processes succeed!
Evaluation – Bugs 2 or 4 Processes fail the test! 1 or 3 processes succeed! First and most importantly, we find four bugs in SUSY-HMC that are confirmed by the developers. Also, it should be noted that one of the bug only manifest when we run 2 or 4 processes, but it doesn’t when we do 1 or 3. This shows the importance of why we need to dynamically vary the number of processes.

Evaluation – Controlling Testing Cost
Input capping forms the basis of practical testing Over here, for each applications we observe the time cost and the coverage by giving different cap values that are the upper bounds for the input variable N. As we can see that the coverages are almost the same, but the time cost increases significantly when the cap values increases. The observation applies to all applications here. Considering what will happen if we don’t have such technique. Testing would just be impossible.

Two-way instrumentation saves up to 66% testing time cost One-way v.s. Two-way We run each program by fixing N to different values and measure the time cost as well as the non-focus processes’ logs’ size.

With constraints reduction COMPI achieves % more branch coverage than without using it Each figure shows the constraint set size frequency of COMPI and two COMPI variations that don’t use this method. Obviously, our method makes the constraint set size is at most a few hundreds, which is far smaller than the constraint set size that can be up to a few thousands or even a few millions. High Reduction Efficiency: A few thousands or even millions to a few hundreds

Evaluation – COMPI Framework
COMPI (Fwk) No_Fwk: concolic testing without COMPI’s framework Random: random input values generated for each test Effectiveness of COMPI’s framework.

Thank you!

COMPI: Concolic Testing for MPI Applications

Similar presentations

Presentation on theme: "COMPI: Concolic Testing for MPI Applications"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

COMPI: Concolic Testing for MPI Applications

Similar presentations

Presentation on theme: "COMPI: Concolic Testing for MPI Applications"— Presentation transcript:

Similar presentations

About project

Feedback