Maple: A Coverage-Driven Testing Tool for Multithreaded Programs

Chunkun Bo, Haoyu Chen, Ke Dou, Lin Gong
Hello everyone! Today is Chinese New Year, and since all four of us are from China, we would like to wish you a happy Chinese New Year. Now, back to class: today we will talk about Maple, a coverage-driven testing tool for multithreaded programs.

Problem being addressed
How to test multithreaded programs. Like the papers we have discussed before, this paper focuses on testing multithreaded programs. Its specific aim is a coverage-driven testing tool.

Background Why is it hard to find concurrency bugs?
It is difficult to expose the interleavings that can trigger a concurrency bug, especially the rare ones.

Background How to expose these concurrency bugs?
Stress testing: execute the program again and again, hoping to hit a buggy interleaving. An alternative is systematic testing, in which the thread scheduler systematically explores all legal thread interleavings.
Active testing: use bug detectors and an active scheduler to predict buggy interleavings, then test the predicted interleavings. Shortcoming: it targets specific bug types, e.g. data races.
In this paper, the authors propose a tool called Maple that employs a coverage-driven approach for testing multithreaded programs. An interleaving coverage-driven approach has the potential to find different types of concurrency bugs, and it also provides a metric that helps programmers understand the quality of their tests.
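To make the contrast between stress testing and systematic testing concrete, here is a minimal sketch of systematic exploration on a toy program. It is not Maple's algorithm. The two-step "load, then store" threads and the names `interleavings` and `run` are hypothetical, chosen only to show that enumerating every legal interleaving finds the lost-update schedules that repeated random runs might miss.

```python
from itertools import combinations

def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that preserves each thread's program order."""
    n, m = len(t1), len(t2)
    for slots in combinations(range(n + m), n):
        slot_set = set(slots)
        it1, it2 = iter(t1), iter(t2)
        yield [next(it1) if i in slot_set else next(it2) for i in range(n + m)]

def run(schedule):
    """Execute a schedule of (thread, op) steps on one shared counter."""
    shared = 0
    local = {}
    for thread, op in schedule:
        if op == "load":
            local[thread] = shared          # read the shared counter
        else:
            shared = local[thread] + 1      # "store": write back local value + 1
    return shared

# Hypothetical toy program: each thread performs a non-atomic increment.
T1 = [("T1", "load"), ("T1", "store")]
T2 = [("T2", "load"), ("T2", "store")]

all_schedules = list(interleavings(T1, T2))
buggy = [s for s in all_schedules if run(s) != 2]
print(len(all_schedules), "interleavings,", len(buggy), "lose an update")
# prints: 6 interleavings, 4 lose an update
```

Even this tiny program has 6 interleavings, and the count grows combinatorially with thread length, which is why exhaustive exploration does not scale on its own.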

Background Two hypotheses
In this paper, the authors define a set of interleaving idioms. These idioms are based on two hypotheses.
Small scope hypothesis: most concurrency bugs can be exposed using a small number of preemptions.
Value-independence hypothesis: a majority of concurrency bugs are triggered irrespective of the data values.
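The small scope hypothesis matters because bounding preemptions shrinks the search space sharply. The sketch below counts schedules of two threads by number of preemptions. It assumes the common definition of a preemption as a switch away from a thread that still has steps left, and the helper names are hypothetical.

```python
from itertools import combinations

def schedules(n1, n2):
    """Yield thread-id sequences for two threads with n1 and n2 steps."""
    total = n1 + n2
    for slots in combinations(range(total), n1):
        chosen = set(slots)
        yield [1 if i in chosen else 2 for i in range(total)]

def preemptions(schedule, lengths):
    """Count switches away from a thread that still has steps left to run."""
    remaining = dict(lengths)
    count = 0
    for cur, nxt in zip(schedule, schedule[1:]):
        remaining[cur] -= 1
        if nxt != cur and remaining[cur] > 0:
            count += 1
    return count

def count_bounded(n1, n2, bound):
    """Number of schedules that use at most `bound` preemptions."""
    lengths = {1: n1, 2: n2}
    return sum(1 for s in schedules(n1, n2) if preemptions(s, lengths) <= bound)

total = sum(1 for _ in schedules(6, 6))
small = count_bounded(6, 6, 2)
print(f"{small} of {total} schedules use at most 2 preemptions")
```

If most bugs need only a few preemptions, a tester can concentrate on the small bounded subset rather than the full set of schedules.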

Background Two usage models:
1. Test the program with an existing input to find interleavings that were not tested in the past. In this scenario, Maple helps the programmer actively expose thread interleavings that have not been tested before.
2. Expose the buggy interleaving after a bug is found by accident. Here a programmer has accidentally exposed a bug for some input but is unable to reproduce the failed execution. The programmer can run Maple with the bug-triggering input to quickly expose the buggy interleaving.

Background Interleaving idiom
Interleaving idiom: a pattern of inter-thread dependencies and the associated memory operations. Two requirements: an idiom should be generic and should have a small coverage domain.
iRoot: a dynamic instance of an idiom in a program’s execution.
An inter-thread memory dependency (denoted using ⇒) is an immediate (read-write or write-write) dependency between two memory accesses in two threads. A memory access can be to either a data variable or a synchronization variable.
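As an illustration of the definition, the following sketch extracts single-dependency (idiom1-style) iRoots from a recorded trace. It makes a simplifying assumption that "immediate" means consecutive accesses to the same address. The `Access` record, the variable names, and the trace are hypothetical and are not taken from Maple's profiler.

```python
from collections import namedtuple

# Hypothetical trace record: thread id, static instruction label,
# accessed address, and whether the access is a write.
Access = namedtuple("Access", "thread instr addr is_write")

def idiom1_iroots(trace):
    """Collect pairs A => B of conflicting, consecutive accesses from two threads."""
    last = {}       # address -> most recent access to that address
    iroots = set()
    for acc in trace:
        prev = last.get(acc.addr)
        if (prev is not None
                and prev.thread != acc.thread
                and (prev.is_write or acc.is_write)):
            iroots.add((prev.instr, acc.instr))
        last[acc.addr] = acc
    return iroots

trace = [
    Access("T1", "I1", "x", True),    # T1 writes x
    Access("T2", "I4", "x", False),   # T2 reads x
    Access("T1", "I2", "y", True),    # T1 writes y
    Access("T2", "I5", "y", False),   # T2 reads y
]
print(sorted(idiom1_iroots(trace)))
# prints: [('I1', 'I4'), ('I2', 'I5')]
```

Because iRoots are keyed by static instructions rather than by dynamic instances or data values, the coverage domain stays small.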

Background Six idioms
The first is the simple idiom and the other five are compound idioms. Together they can represent a majority of concurrency bugs:
atomicity violations, both single-variable (idiom1, idiom2, idiom3) and multi-variable (idiom4, idiom5);
typical deadlock bugs (idiom5);
generic order-related concurrency bugs (idiom1, idiom6).

Background Compound idioms have two constraints:
First, the number of instructions between two events in the same thread should be less than a threshold (vw). Second, if the atomicity of two memory accesses to V in a thread T is violated by accesses in another thread, no other access to V is allowed between those two accesses in thread T. For example, in idiom3 we do not allow any access to the variable X between the two memory accesses AX and DX, but there can be accesses to X between BX and CX.

Background What’s the relation between iRoot and Concurrency Bug?
The iRoot of a concurrency bug provides the minimum set of inter-thread dependencies and the associated memory or synchronization accesses, which if satisfied, can trigger that bug in an execution.

Background An idiom1 concurrency bug.
Figure 3 shows an example of a concurrency bug. The bug is triggered whenever the inter-thread dependency I2 ⇒ I5 is satisfied in an execution. Therefore, this is an idiom1 bug and its iRoot is I2 ⇒ I5. Note that there is another inter-thread dependency, I1 ⇒ I4, that must be satisfied before the iRoot I2 ⇒ I5 can be exposed. This dependency affects the control flow of thread T2 and determines whether I5 is executed at all. We refer to conditions like this, which must hold before the idiom conditions can be satisfied, as pre-conditions.

Background Real concurrency bug.
Figure 4 shows a real concurrency bug in MySQL and its idiom. In this example, the bug is exposed when the critical sections in Thread-1 are interleaved with the critical section in Thread-2. The iRoot for this bug is of type idiom4 and consists of the two inter-thread dependencies between the lock and unlock operations. This example conveys an important observation: even if a concurrency bug is fairly complex and involves many different variables and inter-thread dependencies, the iRoot of that bug (the minimum set of interleaving conditions that must be satisfied to trigger it) can be quite simple. Thus, by testing iRoots for a small set of idioms, we can hope to expose a significant fraction of concurrency bugs.

Background Empirical analysis
The authors study 17 real-world concurrency bugs and find that only one of them cannot be found with these idioms.

Background Using Memoization
Memoization is an optimization technique that speeds up programs by having function calls avoid recomputing results for previously processed inputs. In Maple, if an iRoot has already been exposed in an earlier execution for some input, Maple will not try to expose the same iRoot again.

Challenges Expose the rare interleaving that can trigger a concurrency bug. Define the idioms. Build the tools.

Goals Build a coverage-driven testing tool for multithreaded programs.

Take Home Ideas
Develop a practical tool: Maple.
Put a set of interleaving idioms into practice.
Develop advanced algorithms that build on naive approaches.
Evaluate Maple against other tools to demonstrate its efficiency and effectiveness.

Methodology

Definition Ax represents a dynamic memory access.
A: static instruction. x: variable. E.g., for “b = 3;”, A is the instruction “b = 3;” and x is “b”.

Framework Overview
[Figure: framework overview. Components: test input, profiler, idioms, candidate interleavings (iRoots), “Test?” check, iRoots to be tested, active scheduler, tested interleaving, bug exposed, tested iRoots, failed-to-test iRoots.]

Naïve Approach

Infeasible iRoots --- From Non-mutex happens-before
Main thread: line1: x = 1; fork(child)
Child thread: line2: tmp = x
Infeasible iRoot: line2 ⇒ line1

Solution: Vector Clock (VC)
VC(Ax) = {Ax->Bx}, VC(Bx) = {Ax->Bx}. Comparing the vector clocks of Ax and Bx shows that Ax happens before Bx, so Bx->Ax is impossible.
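A minimal sketch of how a vector-clock comparison can rule out an iRoot for the fork example. It assumes clocks are dictionaries from thread name to counter and that a forked child inherits its parent's clock. The names `happens_before` and `feasible` are hypothetical and do not reflect Maple's actual profiler code.

```python
def happens_before(vc_a, vc_b):
    """True if the event stamped vc_a is ordered before the event stamped vc_b."""
    threads = set(vc_a) | set(vc_b)
    no_later = all(vc_a.get(t, 0) <= vc_b.get(t, 0) for t in threads)
    strictly = any(vc_a.get(t, 0) < vc_b.get(t, 0) for t in threads)
    return no_later and strictly

def feasible(src_vc, dst_vc):
    """The iRoot src => dst is infeasible when dst must happen before src."""
    return not happens_before(dst_vc, src_vc)

# Main thread: line1 (x = 1), then fork(child).
main_clock = {"main": 1}
vc_line1 = dict(main_clock)        # timestamp of the write at line1
main_clock["main"] += 1            # the fork is a later event in main

# Child thread: inherits main's clock at the fork, then runs line2 (tmp = x).
child_clock = dict(main_clock)
child_clock["child"] = 1
vc_line2 = dict(child_clock)       # timestamp of the read at line2

print(feasible(vc_line1, vc_line2))  # line1 => line2, prints: True
print(feasible(vc_line2, vc_line1))  # line2 => line1, prints: False
```

The fork orders line1 before everything in the child, so the predicted iRoot line2 ⇒ line1 can be dropped without ever being handed to the scheduler.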

Infeasible iRoots --- From Mutual exclusion
Thread 1: lock(m); line1: x = 1; line2: x = 2; unlock(m)
Thread 2: lock(m); line3: tmp = x; unlock(m)
Infeasible iRoots: line1 ⇒ line3, line3 ⇒ line2

Solution: Annotated Lockset
AnnoLS(Cx) = {Ax->Cx, Lock(m)}, AnnoLS(Bx) = {Bx->Dx, Lock(m)}. The two locksets are not disjoint (both contain m), but the accesses still meet the "last" and "first" requirement.
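One way to read the "last" and "first" requirement is sketched below. It assumes that a dependency between two accesses guarded by a common lock is feasible only if the source is the last access to the variable in its critical section and the destination is the first access in its own. The `AnnotatedAccess` type and its fields are hypothetical, and the data models the three-line mutual exclusion example rather than Maple's real data structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AnnotatedAccess:
    name: str
    locks: frozenset      # locks held at the access
    first_in: frozenset   # locks whose critical section this access opens for x
    last_in: frozenset    # locks whose critical section this access closes for x

def feasible(src, dst):
    """src => dst survives if, for every common lock, src is last and dst is first."""
    common = src.locks & dst.locks
    return all(m in src.last_in and m in dst.first_in for m in common)

M = frozenset({"m"})
NONE = frozenset()

# Thread 1: lock(m); line1: x = 1; line2: x = 2; unlock(m)
line1 = AnnotatedAccess("line1", M, first_in=M, last_in=NONE)
line2 = AnnotatedAccess("line2", M, first_in=NONE, last_in=M)
# Thread 2: lock(m); line3: tmp = x; unlock(m)
line3 = AnnotatedAccess("line3", M, first_in=M, last_in=M)

for src, dst in [(line1, line3), (line3, line2), (line2, line3), (line3, line1)]:
    print(f"{src.name} => {dst.name}:", feasible(src, dst))
# prints:
# line1 => line3: False
# line3 => line2: False
# line2 => line3: True
# line3 => line1: True
```

Under this reading, line2 always overwrites x before Thread 2 can enter its critical section, so line1 ⇒ line3 is pruned, and line3 cannot run between line1 and line2, so line3 ⇒ line2 is pruned as well.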

How about compound Idioms?

Predicting iRoots for Compound Idioms
A. Identifying local pairs
B. Correlating with idiom1 prediction results
Note: vw is a preset threshold, specified in the idiom definition.
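The two steps can be sketched as follows. Step A collects pairs of accesses in one thread that fall within the vw window, and step B keeps only combinations whose cross-thread edges were already predicted as idiom1 iRoots. The combination shape used here (A ⇒ B and C ⇒ D, with A, D in one thread and B, C in the other) is only one plausible compound shape, and all names and data are hypothetical.

```python
def local_pairs(thread_trace, vw):
    """Step A: pairs (a, b) from one thread whose distance is below vw.

    thread_trace is a list of (position, label) sorted by position, where
    position is the dynamic instruction count within the thread.
    """
    pairs = set()
    for i, (pos_a, a) in enumerate(thread_trace):
        for pos_b, b in thread_trace[i + 1:]:
            if pos_b - pos_a >= vw:
                break                     # later accesses are even farther away
            pairs.add((a, b))
    return pairs

def compound_candidates(pairs_t1, pairs_t2, idiom1):
    """Step B: keep combinations whose inter-thread edges are predicted idiom1 iRoots."""
    out = set()
    for a, d in pairs_t1:
        for b, c in pairs_t2:
            if (a, b) in idiom1 and (c, d) in idiom1:
                out.add((a, b, c, d))
    return out

t1 = [(0, "A"), (3, "D"), (50, "E")]      # E is too far from A and D
t2 = [(0, "B"), (2, "C")]
idiom1 = {("A", "B"), ("C", "D")}         # assumed output of idiom1 prediction

p1 = local_pairs(t1, vw=10)
p2 = local_pairs(t2, vw=10)
print(p1, p2)
print(compound_candidates(p1, p2, idiom1))
# prints: {('A', 'D')} {('B', 'C')}
# prints: {('A', 'B', 'C', 'D')}
```

The window keeps the number of local pairs roughly linear in trace length for a fixed vw, which is what keeps compound-idiom prediction tractable.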

Active Scheduler
The active scheduler is used to expose the predicted candidate iRoots.
[Figure: framework with the active scheduler. Inputs: x86 binary + test input, idioms. Components: profiler, candidate interleavings (iRoots), “Test?” check, iRoots to be tested, active scheduler + recorder. Outputs: tested interleaving + replay log, bug exposed, tested iRoots, failed-to-test iRoots, potentially infeasible iRoots.]

Naïve Approach
[Figure: (a) ideal case; (b) deadlock case.]

Solution for Deadlock
Leverage a non-preemptive, strict-priority scheduler.

Complementary Schedules
Main point: use two test runs on each candidate iRoot

Asynchronous External Events

Limitations Pre-condition: from unlock to lock
[Figure: Case 1: T2 first (Bx -> Ax); Case 2: T1 first (Ax -> Bx).]
From this picture, we can see that if T2 executes first, then to satisfy Ax -> Bx, execution must pass through unlock(m) on T2 before Ax can execute. The current active scheduler cannot handle pre-conditions. Without knowing the pre-condition, it may hit its timeout and give up on exposing the iRoot Ax -> Bx. If it knew the pre-condition, it would not give up merely because of the timeout.

Memoization Module
Past work ignores information about the interleavings tested in previous test runs. The authors therefore propose adding two databases: one stores the tested iRoots, and the other stores the iRoots that failed to be tested. Every candidate iRoot is checked against these two databases, which reduces the number of interleavings that need to be tested for a given program input.
[Figure: framework with the memoization module. Components: test input, profiler, idioms, candidate interleavings (iRoots), “Test?” check, iRoots to be tested, failed-to-test iRoots, tested iRoots, active scheduler, tested interleaving, bug exposed.]
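The two-database check can be sketched as a pair of sets. This is a simplified illustration: the class name `MemoDB`, its methods, and the policy of skipping every previously failed iRoot are assumptions, not Maple's documented behavior.

```python
class MemoDB:
    """Hypothetical memoization store shared across test runs and inputs."""

    def __init__(self):
        self.tested = set()   # iRoots already exposed in some earlier run
        self.failed = set()   # iRoots the scheduler tried and failed to expose

    def select(self, candidates):
        """Return the candidates that still need to be tested, in order."""
        return [r for r in candidates
                if r not in self.tested and r not in self.failed]

    def record(self, iroot, exposed):
        """Store the outcome of one active-scheduler attempt."""
        (self.tested if exposed else self.failed).add(iroot)

db = MemoDB()

# First input: three candidates, none seen before.
first = [("I1", "I4"), ("I2", "I5"), ("I3", "I6")]
assert db.select(first) == first
db.record(("I1", "I4"), exposed=True)
db.record(("I2", "I5"), exposed=False)
db.record(("I3", "I6"), exposed=True)

# Second input: only the genuinely new candidate is scheduled.
second = [("I2", "I5"), ("I3", "I6"), ("I7", "I8")]
print(db.select(second))
# prints: [('I7', 'I8')]
```

Since iRoots are identified by static instructions, the same database can be reused across different inputs of the same program.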

Evaluation
Question: how fast can Maple expose bugs, given bug-triggering inputs, compared to PCT?
PCT: a randomized testing technique.
Methodology:
Choose 13 buggy applications with their bug-triggering inputs.
For each bug, run the application repeatedly with that input until the bug is triggered.
Repeat with each testing technique.

Maple Can Quickly Expose Bugs
Total time (in seconds) needed to trigger the bug. Timeout: 24 hours.
Columns: Bug, PCT, Maple w/o Memo, Maple w/ Memo. Values as they survive in the transcript:
LogProcSweep: Timeout, 17.1
StringBuffer: 56.4, 12.8
CircularList: 9.1, 10.6
BankAccount: 17.4, 10.0
MySQL-LogMiss: 29.0, 133.9
Pbzip2: 155.1
Apache #25520: 2124.9
MySQL #791:
Aget #2: 355.0, 177.4
Memcached #127: 3635.1, 316.0
Aget #1: 198.6
CNC: 4214.4
Glibc: 1157.0
Maple w/ Memo: 375.8, 2316.5, 122.3, 284.8
Callouts: Memoization helps expose bugs faster. Unknown bugs.

Evaluation
Question: is Maple better at coverage-driven testing than other testing tools (PCT, PCTLarge, RandDelay, and CHESS)?
Methodology:
Use iRoot coverage as the coverage metric.
Implement a tool called observer to measure the iRoot coverage.
Use 7 multithreaded applications.
Run Maple until completion, then give each of the other tools the same amount of time.

Maple gains iRoot Coverage faster
[Figure: iRoot coverage of each tool, normalized to the iRoot coverage achieved by Maple.]
Maple gains iRoot coverage faster than the other tools.

Memoization Test the application using 8 different inputs.
The first test runs without a memoization database. The (i+1)th test uses the database built from the 1st through the ith inputs. Only idiom1 iRoots are tested, due to time constraints.

Memoization Helps

Overhead of Maple Comparison to native execution time
Application   Profiler   Active Scheduler
fft           30.9X      16.3X
radix         67.7X      17.8X
pfscan        31.9X      27.7X
pbzip2        183.3X     45.4X
aget          34.4X      98.8X
memcached     4.8X       4.1X
apache        6.2X       6.0X
mysql         15.7X      2.5X
Profiler: overhead ranges from about 5X to 200X, about 50X on average.
Active scheduler: overhead ranges from about 3X to 100X, about 30X on average.
There is still room to improve the performance of Maple.
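As a quick arithmetic check on the quoted averages, the per-application figures from the overhead table can be averaged directly. The numbers are copied from the table, and the dictionary layout is just a convenience for this sketch.

```python
from statistics import mean

# (profiler slowdown, active scheduler slowdown) relative to native execution
overhead = {
    "fft":       (30.9, 16.3),
    "radix":     (67.7, 17.8),
    "pfscan":    (31.9, 27.7),
    "pbzip2":    (183.3, 45.4),
    "aget":      (34.4, 98.8),
    "memcached": (4.8, 4.1),
    "apache":    (6.2, 6.0),
    "mysql":     (15.7, 2.5),
}

profiler_avg = mean(p for p, _ in overhead.values())
scheduler_avg = mean(s for _, s in overhead.values())
print(round(profiler_avg, 1), round(scheduler_avg, 1))
# prints: 46.9 27.3
```

The arithmetic means come out near 47X and 27X, consistent with the rounded "50X" and "30X" figures on the slide.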

Effectiveness of Active Scheduler
Success rate of the active scheduler: 28% on idiom1, 17% on idiom2, 9% on idiom3, 10% on idiom4, and 9% on idiom5. The accuracy of the active scheduler can be further improved.

Pros The paper is well organized and its logic is clear.
It explains complex terms with easy-to-understand examples. The authors present the methodology and the basic algorithms in a sensible order.

Pros The evaluation of Maple is convincing: it covers an adequate number of applications and presents the results in several different ways. It uses real-world applications. The paper includes detailed comparisons between Maple and other tools.

Cons The number of samples used to assess the coverage rate of the idioms is too small.
The paper does not cover idiom6, simply because it does not find any instance of it. The overhead of Maple is high.

Cons The success rate for the active scheduler is low.
The paper does not provide computational complexity for the algorithms. The random arbitration algorithm the paper uses may cause a later access exponentially.

Next steps
Use more samples to verify the coverage rate of the idioms.
Explore idiom6 further.
Reduce the overhead of Maple.
Compute the complexity of the algorithms.
Handle pre-conditions.
Solve the problem caused by random arbitration.

Thank you!
