Efficient Instrumentation for Code Coverage Testing


1 Efficient Instrumentation for Code Coverage Testing
Mustafa M. Tikir Jeffrey K. Hollingsworth Today, I am going to talk about an efficient instrumentation technique for code coverage testing. This work was done by me and my advisor, Dr. Jeffrey Hollingsworth, at the University of Maryland.

2 Evaluation of Code Coverage
Measuring code coverage is important to Identify unexecuted program statements Verify that each path is taken at least once Requires extensive instrumentation Must determine if every statement is executed As you may know, code coverage testing is the problem of identifying the parts of a program that did not execute in one or more runs. In short, it identifies what ran and what did not. It is important to verify that all, or substantially all, paths are taken at least once during the testing phase. Code coverage requires extensive program instrumentation to determine whether every statement is executed.

3 Instrumentation for Code Coverage
Traditional Approach Static instrumentation using counters Instrumentation code remains for entire execution Conservative instrumentation of all possibly needed instrumentation code for all functions Useless instrumentation wastes time especially for long running programs such as servers and enterprise software Full instrumentation increases setup time Traditionally, code coverage tools are built using static instrumentation, in which counters are inserted into the program during compilation or linking. The instrumentation code remains in the executable for the entire execution, even though after its first execution it produces no additional coverage information. In addition, static instrumentation requires conservative insertion of all possibly needed instrumentation code for all functions, even functions that will never be called. Leaving useless instrumentation in place wastes execution time, especially for long-running programs such as servers and enterprise software, and inserting all possibly needed instrumentation increases setup time, especially for programs with many libraries.
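The counter scheme described above can be sketched as follows. This is a minimal illustration of the traditional static approach, not any particular tool's code; the names are made up for the example:

```python
# Minimal sketch of traditional static counter instrumentation: a probe is
# compiled into every basic block and fires on every execution, even though
# only the first hit yields new coverage information.
counters = {}

def probe(block_id):
    """Statically inserted probe: stays in place for the entire run."""
    counters[block_id] = counters.get(block_id, 0) + 1

def run_hot_loop():
    # The same block executes many times; the probe cost is paid each time.
    for _ in range(1000):
        probe("loop_body")

run_hot_loop()
print(counters["loop_body"])  # 1000 probe executions for one line of coverage data
```

This is exactly the waste the talk targets: 999 of those 1000 probe executions produce no new information.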

4 Our Approach Insert instrumentation code dynamically
Pre-Instrument all functions at program start On-Demand instrument at first function call Periodically remove instrumentation Use dominator trees Reduces instrumentation within a function Use incremental function instrumentation Insertion of instrumentation on first call Reduces which functions get instrumented To eliminate the disadvantages of static instrumentation, we propose a new technique that inserts and removes instrumentation code dynamically. We either pre-instrument all functions at program start or instrument functions on demand at their first invocation. We then periodically remove instrumentation code that has already executed and collect its coverage information. We also use dominator trees to reduce the number of instrumentation points needed within a function; I'll explain this later in the talk. Incremental function instrumentation, which instruments functions at their first invocation, reduces the number of functions that need to be instrumented.

5 Dyninst API
Modify code in a running program Implementations available for Alpha, Sparc, Power, MIPS and x86 architectures A mutator program Generates machine code from high-level code Transfers machine code to a running mutatee program The mutatee is the application being instrumented The base trampoline contains relocated instructions and slots for calling code (save registers, set up arguments, run the code snippet, restore registers, before and after the relocated instruction) The mini trampoline stores the inserted code Our technique would not be possible without the Dyninst API, an application programming interface to a library that permits insertion and deletion of code in a running program. Dyninst is machine independent, and implementations are available for Alpha, Sparc, MIPS, Power, and x86 architectures. Dyninst involves two processes. The mutator program is responsible for generating machine code from high-level instrumentation code and transferring it to the application. The mutatee is the application being modified; it contains the trampolines. The base trampoline contains the relocated instructions and slots for calling instrumentation before and after those instructions. The mini trampoline stores the machine code for the instrumentation.

6 Using Dominator Trees
Definitions: A dom B if all paths from the entry to basic block B go through basic block A. A idom B if, for all C, (C != A) and (C dom B) implies (C dom A). Fact: If a basic block B is executed, all basic blocks along the path from B to the root of the dominator tree also execute. Before explaining how we use dominator trees, I'll give some definitions. In a CFG, basic block A dominates basic block B if all paths from the entry to B go through A. In the example, all paths to B go through A and C. The immediate dominator of a basic block is the last dominator of that block on all paths from the entry to it; here, both A and C dominate B, but A is the immediate dominator. The dominator tree of a CFG is built using the immediate-dominator relation. The importance of dominator trees for our work is the fact that if a basic block B is executed, all blocks along the path from B to the entry in the dominator tree are also executed. In this example, if B is executed, A and C are also executed.
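The dominance relation defined above can be computed with a standard iterative data-flow algorithm. The following Python sketch is my own illustration (the function names and the example CFG, a reconstruction of slide 8's figure, are not from the talk):

```python
def dominators(cfg, entry):
    """Dominator sets for a CFG given as {block: [successor blocks]}.
    A dom B iff A is in dom[B]."""
    preds = {n: set() for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(cfg) for n in cfg}       # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:                          # iterate to a fixed point
        changed = False
        for n in cfg:
            if n == entry or not preds[n]:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def idom(dom, n, entry):
    """Immediate dominator: the strict dominators of n form a chain,
    so the closest one has the largest dominator set."""
    if n == entry:
        return None
    return max(dom[n] - {n}, key=lambda c: len(dom[c]))

# Example CFG (my reading of slide 8): Entry->1, Entry->2, 1->2, 2->3, 2->Exit, 3->Exit
cfg = {"Entry": ["1", "2"], "1": ["2"], "2": ["3", "Exit"],
       "3": ["Exit"], "Exit": []}
dom = dominators(cfg, "Entry")
print(sorted(dom["Exit"]))         # ['2', 'Entry', 'Exit']
print(idom(dom, "Exit", "Entry"))  # 2
```

The simple iterative formulation suffices at this scale; production compilers typically use faster algorithms for the same relation.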

7 Leaf Node Instrumentation
Leaf nodes in dominator tree are instrumented Coverage of internal nodes will be inferred Coverage information propagated from leaf nodes to entry Here is a CFG and its dominator tree. Using the fact above, we instrument only the leaf nodes of the dominator tree. Assume the flow of control goes through Entry, 1, 2, 5, 4, and Exit. At function exit we record that Exit and 5 executed. From Exit we infer that 4, 2, 1, and Entry also executed; similarly, from 5 we infer that 2, 1, and Entry executed. The coverage information obtained from the instrumented basic blocks is propagated toward the root node.
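The propagation step just described is a walk up the dominator tree from each executed leaf. A small sketch, where the `idoms` map is my reading of slide 7's tree (Entry at the root; 1 under Entry; 2 and 3 under 1; 4 and 5 under 2; Exit under 4):

```python
# Dominator tree encoded as child -> immediate dominator (parent).
idoms = {"Entry": None, "1": "Entry", "2": "1", "3": "1",
         "4": "2", "5": "2", "Exit": "4"}

def propagate(executed, idoms):
    """Infer coverage: every dominator-tree ancestor of an executed block ran too."""
    covered = set()
    for block in executed:
        n = block
        while n is not None and n not in covered:  # stop at root or known-covered
            covered.add(n)
            n = idoms[n]
    return covered

# Only the instrumented leaves that actually ran are recorded at function exit:
print(sorted(propagate({"Exit", "5"}, idoms)))  # ['1', '2', '4', '5', 'Entry', 'Exit']
```

Note that block 3, which never ran, is correctly absent from the result, and the early stop at already-covered nodes keeps the walk linear in the number of covered blocks.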

8 Non-Leaf Node Instrumentation
Leaf node instrumentation is necessary but not sufficient Control flow might cause cross edges in dominator tree We also instrument a basic block A if A has at least one outgoing edge to a basic block B and A does not dominate B Here is an example of a CFG and its dominator tree. Assume we instrument basic blocks Exit and 3. At function termination we record that Exit executed. From Exit we infer that 2 and Entry also executed; however, this says nothing about 1. Even though 1 executed, its execution cannot be inferred from the leaf nodes, because the flow of control does not have to follow a path in the dominator tree, and there can be cross edges. To capture these cases, we also instrument any basic block that has an outgoing edge to a block it does not dominate. In this example, basic block 1 is also instrumented.
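The full selection rule from slides 7-8 can be sketched as: instrument every leaf of the dominator tree, plus every block with an edge to a block it does not dominate. The CFG, dominator sets, and immediate dominators below are hardcoded from my reconstruction of slide 8's example:

```python
# CFG (my reconstruction of slide 8): Entry->1, Entry->2, 1->2, 2->3, 2->Exit, 3->Exit
cfg = {"Entry": ["1", "2"], "1": ["2"], "2": ["3", "Exit"],
       "3": ["Exit"], "Exit": []}
# Precomputed dominator sets and immediate dominators for that CFG.
dom = {"Entry": {"Entry"}, "1": {"1", "Entry"}, "2": {"2", "Entry"},
       "3": {"3", "2", "Entry"}, "Exit": {"Exit", "2", "Entry"}}
idoms = {"Entry": None, "1": "Entry", "2": "Entry", "3": "2", "Exit": "2"}

def instrumentation_points(cfg, dom, idoms):
    """Blocks to instrument: dominator-tree leaves plus cross-edge sources."""
    children = {n: set() for n in cfg}
    for n, parent in idoms.items():
        if parent is not None:
            children[parent].add(n)
    leaves = {n for n in cfg if not children[n]}                 # leaf rule
    cross = {a for a in cfg for b in cfg[a] if a not in dom[b]}  # cross-edge rule
    return leaves | cross

print(sorted(instrumentation_points(cfg, dom, idoms)))  # ['1', '3', 'Exit']
```

This matches the slide: the leaves 3 and Exit are instrumented, and block 1 is added because its edge 1->2 targets a block it does not dominate, while Entry and 2 need no probes at all.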

9 Code Coverage Algorithm
Pre-Instrumentation: At Start, create the CFG and dominator trees for all functions and instrument the basic blocks selected. During Execution, stop at fixed time intervals and delete the executed instrumentation code. At Termination, propagate and collect the coverage information. On-Demand Instrumentation: At Start, insert breakpoints at each function entry. During Execution, on a breakpoint, identify the function, create its CFG and dominator tree, and instrument the basic blocks selected; at fixed time intervals, delete the executed code. At Termination, propagate and collect the coverage information. Here is a summary of the algorithms our coverage tools use. With pre-instrumentation, at program start we create CFGs and dominator trees for all functions and instrument the selected basic blocks. During execution, we stop at fixed time intervals and delete instrumentation that has already executed. At termination, we propagate and collect the coverage information. With on-demand instrumentation, at start we only insert breakpoints at function entries. During execution, when a breakpoint is reached, we identify the function that caused it, create its CFG and dominator tree, and instrument the selected basic blocks. We again stop at fixed time intervals to delete executed instrumentation code, and collect the coverage information at termination.
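The on-demand variant above can be simulated in a few lines. This is a toy sketch of the control flow, not real Dyninst calls; class and method names are invented for illustration:

```python
class OnDemandCoverage:
    """Toy simulation of on-demand instrumentation with periodic deletion."""

    def __init__(self):
        self.probes = {}      # (func, block) -> probe still inserted?
        self.covered = set()  # coverage recorded so far
        self.probe_hits = 0   # instrumentation executions (the runtime overhead)

    def on_first_call(self, func, selected_blocks):
        """Breakpoint handler: in the real tool, build the CFG and dominator
        tree here, then instrument only the selected basic blocks."""
        for b in selected_blocks:
            self.probes[(func, b)] = True

    def execute_block(self, func, block):
        key = (func, block)
        if self.probes.get(key):   # probe fires only while still inserted
            self.covered.add(key)
            self.probe_hits += 1

    def deletion_pass(self):
        """Fixed-interval pass: delete probes whose block is already covered."""
        for key in self.covered:
            self.probes[key] = False

cov = OnDemandCoverage()
cov.on_first_call("f", ["entry", "loop", "exit"])  # first call hits breakpoint
for _ in range(3):                  # hot path runs a few times before the pass
    cov.execute_block("f", "entry")
    cov.execute_block("f", "loop")
cov.deletion_pass()                 # probes for entry/loop are removed
for _ in range(1000):
    cov.execute_block("f", "loop")  # no further instrumentation cost
print(cov.probe_hits)               # 6: probes fired only until the deletion pass
```

Compare this with the static-counter sketch earlier: the same hot loop now costs six probe executions instead of thousands, which is the source of the bursty overhead profile shown on slide 15.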

10 Reduction In Instrumentation Points
34-49% with pre-instrumentation 42-79% with on-demand instrumentation Here is a chart showing the reduction in the number of instrumentation points needed. The x-axis shows the benchmarks we tested and the y-axis the reduction percentage. Each benchmark has four bars: the first two are for our coverage tools using pre-instrumentation, the other two for on-demand instrumentation. Within each pair, the first bar represents all-basic-blocks instrumentation and the second instrumentation using dominator trees. The chart shows that our coverage tools reduced the number of instrumentation points needed by 34-49% using pre-instrumentation and by 42-79% using on-demand instrumentation.

11 SPEC/compress Coverage Curve
Covers 76% in the first 18% of the execution Most of the basic blocks that will execute are covered at the beginning of the program Rest of the run is re-executions Now I will discuss two benchmarks in detail using their coverage curves and slowdown ratios. Here is the coverage percentage curve for the SPEC/compress benchmark; compress is a small program. The x-axis represents time in seconds and the y-axis the coverage percentage in number of source lines. The graph shows that coverage rapidly increases to 76% at the beginning of the execution: most of the basic blocks that will execute are executed early, and the rest of the run is re-executions.

12 SPEC/compress Execution Time
Purecov is a state-of-the-art commercial code coverage tool Our code coverage tools outperform purecov Significant reduction when dynamic code deletion is enabled Most of the instrumentation code is deleted at the beginning Setup time and deletion overhead are insignificant Only a few basic blocks to instrument and check To evaluate the performance of our technique, we compared it with purecov, a state-of-the-art commercial coverage tool. This chart compares our code coverage tools to purecov with respect to slowdown ratio. The number under each bar is the deletion-interval value for our tools; 0 means no dynamic code deletion. The first bar represents the purecov execution. The chart shows that our coverage tools outperform purecov, with a significant reduction when deletion is enabled. This is because most of the instrumentation code executes and is deleted at the beginning, and the setup time and deletion-interval overhead are insignificant for compress, which has only a few basic blocks.

13 PostgreSQL Coverage Curve
Wisconsin benchmark queries Measure the performance of database systems Execute select/join queries repeatedly Here is the coverage curve for postgres running the Wisconsin benchmark queries. Postgres is a large, complex program. The Wisconsin benchmark queries are used to measure the performance of database systems; they run join and select queries repeatedly. The graph shows that the coverage curve gradually increases to 19% during the execution, staying around 10% in the first half.

14 PostgreSQL Execution Time
Setup, Instrumentation and Deletion Time Using on-demand instrumentation Our coverage tools outperform purecov almost always On-demand instrumentation outperforms pre-instrumentation Here are the execution-time slowdown ratios for postgres running the Wisconsin benchmark. The top part of each bar represents the setup time and deletion-interval overhead, and the bottom part the execution time. To show the impact of the other factors, this chart also includes our tool with pre-instrumentation and all-basic-blocks instrumentation. Unlike the compress example, we see significant setup time and deletion-interval overhead for postgres, because there are 45K basic blocks in the postgres executable. However, using on-demand instrumentation, our coverage tools outperform purecov almost always, and on-demand instrumentation outperforms pre-instrumentation. Another important point is the effect of the deletion-interval value: the curve over the bars in each group shows that there is an optimal deletion-interval value that controls the deletion-interval overhead. The chart also shows that on-demand instrumentation and dominator trees are complementary in reducing the setup time and deletion-interval overhead.

15 Instrumentation Execution Frequency
Overhead of our dynamic code coverage system is bursty Running previously unexecuted code results in the execution of a significant amount of instrumentation Running previously executed code does not result in the execution of any instrumentation The overhead of our code coverage tools is bursty: because we delete instrumentation code shortly after it executes, the instrumentation code runs during only a few intervals. The x-axis represents time in seconds. The coverage curve for postgres running the Wisconsin benchmark uses the first y-axis; the bars represent the execution frequency of instrumentation code, in log10 scale, on the second y-axis. The graph shows that whenever new code is covered, there is a significant number of instrumentation-code executions; when the coverage curve stays steady, there are few or no such executions.

16 Overall Slowdown Results for our code coverage tool are for a 2-second deletion interval Slowdown using purecov ranges from 1.83 to 19.78 Slowdown using our code coverage tools ranges Up to 2.6 for on-demand instrumentation Up to 4.96 for pre-instrumentation Here is the comparison of purecov (yellow) and our code coverage tool (orange) using dominator trees and on-demand instrumentation, with respect to execution-time slowdown ratio. The dotted line marks the uninstrumented execution time. The results for our coverage tools are taken with a 2-second deletion interval. The execution-time slowdown for purecov ranges from 1.83 to 19.78. Our code coverage tool slows execution by up to 2.6 using on-demand instrumentation and by up to 4.96 using pre-instrumentation.

17 Dyninst Coverage Tool Here is a snapshot of the graphical user interface of our coverage tools. It has different windows for navigation: one window lets you navigate over the source files; the next lets you navigate over the functions in a source file and see their coverage percentages; and the larger window displays the source file with the executed statements highlighted. It also has two graphs, one showing the coverage curve versus time and the other the number of instrumentation points deleted in each deletion interval.

18 Conclusions Dominator trees
Reduce instrumentation points by 34-49% Plus on-demand instrumentation reduces instrumentation points by 42-79% Combining dominator trees and on-demand instrumentation reduces Setup time and deletion interval overhead Runtime overhead by 38-90% compared to purecov Dynamic deletion of instrumentation Computes coverage information faster From our work we conclude that dominator trees reduce the number of instrumentation points by 34-49%, and adding on-demand instrumentation reduces instrumentation points by 42-79%. Combining dominator trees and on-demand instrumentation is complementary, as it reduces the setup time and deletion-interval overhead; runtime overhead is reduced by 38-90% compared to purecov. Dynamic deletion of instrumentation code produces coverage information faster.

19 Conclusions (cont) Code Coverage overhead reduced to about 10%
Code coverage can now be included as part of production code Information about the execution of extremely infrequent error cases will be provided Reduced overhead for residual testing Dyninst library + Dyninst Coverage tools Using our coverage tools, we have reduced the overhead of code coverage testing to around 10%. We therefore believe code coverage can be included as part of production code, so that information about the execution of extremely infrequent error cases can be obtained, and the overhead of residual testing can be significantly reduced. The Dyninst library and our coverage tools are available for download.

