Efficient x86 Instrumentation: Dynamic Rewriting and Function Relocation
Itai Gurari, Computer Science Department, University of Wisconsin, 1210 W. Dayton St., Madison, WI
Paradyn/Condor Week, Madison, WI, March 12-14, 2001

Introduction
Dynamic instrumentation inserts instrumentation into an application while it is executing.
Paradyn uses it to gather performance data.
Paradyn inserts instrumentation at three types of points: function entry, exit, and call.

Paradyn Instrumentation Points
Executable code: foo () { ... call <bar> ... }
Instrumentation points in foo: Entry (start of the function), Call (at call <bar>), and Exit (end of the function).
Example instrumentation: startTimer() at the entry, counter++ at the call, stopTimer() at the exit.

Goal
Transfer from function to instrumentation code as quickly as possible.

Control Transfer
To switch execution from a function to its instrumentation code, overwrite instructions in the function with a control transfer instruction. Equivalents of the overwritten instructions are copied to the code patch area.
On the x86, Paradyn uses, by default, a 5-byte jump to transfer control to the instrumentation code. The range of a 5-byte jump is the whole address space.
If a 5-byte instruction won’t fit, we use a 1-byte trap (the int3 instruction).
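As a rough illustration of the two control transfer instructions named on this slide, the sketch below encodes a 5-byte near jump (opcode E9 followed by a 32-bit displacement measured from the end of the jump) and the 1-byte int3 trap (opcode CC). The function name and the addresses are hypothetical and are not taken from Paradyn.

```python
import struct

JMP_REL32_SIZE = 5      # opcode 0xE9 + 4-byte signed displacement
INT3 = b"\xcc"          # 1-byte breakpoint trap


def encode_jmp_rel32(jump_addr: int, target_addr: int) -> bytes:
    """Encode `jmp target_addr` for a jump placed at jump_addr (32-bit x86)."""
    # The displacement is relative to the address of the next instruction.
    disp = (target_addr - (jump_addr + JMP_REL32_SIZE)) & 0xFFFFFFFF
    return b"\xe9" + struct.pack("<I", disp)


# Hypothetical addresses: a function entry and its instrumentation code.
patch = encode_jmp_rel32(0x08048000, 0x40000000)
```

Because the displacement wraps modulo 2^32, any 32-bit target is reachable, which is why the 5-byte form covers the whole address space.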

Inserting Control Transfer Instructions
Dynamically rewrite the function in place.
Different techniques are used for different types of instrumentation points.

Jumps and Traps: Instrument Entry Point
Case 1: the function begins with push, mov, sub. There is enough room to replace these instructions with a jump: jmp <instrumentation>.
Case 2: the function begins with push, mov, jmp, where the jmp branches backwards into the entry sequence. Inserting a jump instruction would interfere with the target of the backwards jump, so a trap instruction must be used to get to the instrumentation: int3, mov, jmp.
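The two entry cases can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not Paradyn's logic: the name `entry_transfer`, the instruction lengths, and the representation of branch targets as offsets from the entry are all hypothetical.

```python
JMP_SIZE = 5  # size of a 5-byte near jump


def entry_transfer(insn_lengths, branch_targets):
    """Return "jump" or "trap" for a function entry.

    insn_lengths: byte lengths of the instructions, in order from the entry.
    branch_targets: offsets (from the entry) that branches in the function
    jump to.
    """
    # Find the end of the last instruction the jump would overwrite.
    end = 0
    for length in insn_lengths:
        if end >= JMP_SIZE:
            break
        end += length
    if end < JMP_SIZE:
        return "trap"  # the function is shorter than a jump
    # A branch into the middle of the overwritten bytes would land inside
    # the jump (offset 0 is fine: it is the jump itself).
    if any(0 < t < end for t in branch_targets):
        return "trap"
    return "jump"


# Case 1: push (1 byte), mov (2), sub (3); no branches into the entry.
case1 = entry_transfer([1, 2, 3], set())
# Case 2: push (1), mov (2), jmp (2) that branches back to the mov at offset 1.
case2 = entry_transfer([1, 2, 2], {1})
```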

Jumps and Traps: Instrument Call Point
Original code: call <Foo>.
There is enough room to replace the instruction with a jump: jmp <instrumentation>.

Jumps and Traps: Instrument Exit Point, Case 1
Original code: mov, leave, ret.
Back up far enough to replace the instructions with a jump: jmp <instrumentation>.

Jumps and Traps: Instrument Exit Point, Case 2
Original code: call <Foo>, leave, ret.
Backing up to fit a jump interferes with the preceding call: call, jmp <instrumentation>.

Jumps and Traps: Instrument Exit Point, Case 2a
Original code: call <Foo>, leave, ret, followed by the beginning of the next function on a 4-byte boundary.
The compiler pads the gap with “bonus bytes”: call <Foo>, leave, ret, ?, ?, ?.
Replace the instructions and the padding with a jump: call <Foo>, jmp <instrumentation>.

Jumps and Traps: Instrument Exit Point, Case 2b
Not enough “bonus bytes” (if any) to overwrite with a jump: call <Foo>, leave, ret, ?.
Overwrite the return with a trap: call <Foo>, leave, int3, ?.
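The exit-point cases above (1, 2, 2a, 2b) can be summarized in a similar sketch. It assumes a 1-byte ret, treats a preceding call as a barrier that the jump may not overwrite, and ignores branch targets for brevity. The function name and the instruction lengths are hypothetical.

```python
JMP_SIZE = 5


def exit_transfer(preceding, bonus_bytes):
    """Return "jump" or "trap" for an exit point ending in a 1-byte ret.

    preceding: (mnemonic, length) pairs for the instructions before ret,
    in program order.
    bonus_bytes: padding bytes between ret and the next function.
    """
    room = 1 + bonus_bytes  # the ret itself plus any padding
    # Back up over earlier instructions, but never over a call.
    for mnemonic, length in reversed(preceding):
        if room >= JMP_SIZE or mnemonic == "call":
            break
        room += length
    return "jump" if room >= JMP_SIZE else "trap"


case1 = exit_transfer([("mov", 3), ("leave", 1)], bonus_bytes=0)
case2a = exit_transfer([("call", 5), ("leave", 1)], bonus_bytes=3)
case2b = exit_transfer([("call", 5), ("leave", 1)], bonus_bytes=1)
```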

Jumps and Traps: Extra Slot
Original code: push, mov, sub, mov, with no jumps to the first ten bytes of the function.
There is enough space to overwrite the entry with a jump: jmp <instrumentation>, mov.
Make a 2-byte jump to the “extra slot” and overwrite the “extra slot” with a jump to instrumentation: jmp <instrumentation>, jmp <instrumentation>.

Control Transfer: Traps on x86
A trap generates an exception that is caught by either the application (Solaris, Linux) or the Paradyn daemon (Windows NT).
The address of the trap instruction is used to calculate which instrumentation code to execute.
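The lookup from trap address to instrumentation can be pictured as a table keyed by the address of each int3. The sketch below is an illustration only: the table, the function names, and the addresses are hypothetical. On x86 the processor saves the address of the byte following the int3, so the sketch subtracts one; some operating systems adjust the reported address before delivering the exception, in which case the subtraction would not be needed.

```python
INT3_SIZE = 1

# Hypothetical table: address of each int3 -> instrumentation routine.
trap_table = {}


def register_trap(trap_addr, instrumentation):
    """Remember which instrumentation belongs to the trap at trap_addr."""
    trap_table[trap_addr] = instrumentation


def handle_trap(reported_pc):
    """Run the instrumentation for the trap that raised the exception."""
    # Assumes the reported program counter points just past the trap byte.
    trap_addr = reported_pc - INT3_SIZE
    return trap_table[trap_addr]()


register_trap(0x08048000, lambda: "entry instrumentation for foo")
result = handle_trap(0x08048001)
```

Every execution of the instrumented point pays for the exception delivery and this lookup, which a direct jump avoids.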

Problem
Trap handling is slow: on Solaris 2.6 jumps are over 1000 times faster than traps, and on Linux 2.2 jumps are over 200 times faster than traps.
Traps limit instrumentation: we can’t insert as much or at as fine a granularity.
Trap handling logic is difficult: it is susceptible to bugs and difficult to understand and maintain.

Solution
Rewrite functions that do not have enough room for jumps into functions that do have enough room for jumps.
Rewrite the function on the fly: this combines dynamic instrumentation and binary rewriting.

Dynamic Rewriting
Overwrite existing instructions.
Expand instrumentation points.
Relocate the function.

Function Rewriting and Relocation
In Paradyn we rewrite a function:
only if the function contains an instrumentation point that would require using a trap to instrument;
the first time a request to instrument the function is made;
even if the instrumentation to be inserted is not for a point that requires using a trap (e.g. the exit needs a trap, the entry can use a jump, and the request is to instrument the entry).
When the function is rewritten, all instrumentation points that cannot use a jump are expanded.
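The policy above can be sketched in a few lines. The `Function` record and `on_instrument_request` are hypothetical names used only for illustration; each point is assumed to be classified in advance as needing a "jump" or a "trap".

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    name: str
    # Maps each instrumentation point to "jump" or "trap".
    points: dict
    relocated: bool = False
    expanded: list = field(default_factory=list)


def on_instrument_request(func, point):
    """Relocate on the first request if any point would need a trap."""
    needs_trap = [p for p, kind in func.points.items() if kind == "trap"]
    if needs_trap and not func.relocated:
        # Expand every point that cannot use a jump, not only `point`.
        func.expanded = needs_trap
        func.relocated = True
    return func.relocated


foo = Function("foo", {"entry": "jump", "exit": "trap"})
relocated = on_instrument_request(foo, "entry")
```

Note that the requested point does not influence the decision: a request for the entry still triggers relocation because the exit would need a trap.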

Rewriting A Function
Original function: push, mov, call <Foo>, call <Bar>, ret, with instrumentation points at the entry, at each call, and at the exit.
Insert a nop at the entry: push, nop, mov, call <Foo>, call <Bar>, ret. The entry can now hold a jump: jmp <instrumentation>, call <Foo>, call <Bar>, ret.
Insert nops at the exit: jmp <instrumentation>, call <Foo>, call <Bar>, ret, nop, nop, nop, nop. The exit can now hold a jump: jmp <instrumentation>, call <Foo>, call <Bar>, jmp <instrumentation>.
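The expansion step can be sketched as padding with nops until the entry and the exit each have 5 bytes available. This is a simplified illustration, not Paradyn's rewriter: `expand` is a hypothetical name, the instruction lengths (for instance a 3-byte mov) are assumed, the last instruction is assumed to be the only ret, and the first call is treated as the limit of the entry sequence.

```python
JMP_SIZE = 5
NOP = ("nop", 1)


def expand(insns):
    """Return a copy of insns with nops added at the entry and the exit.

    insns: (mnemonic, length) pairs; the last one is assumed to be a ret.
    """
    out = list(insns)
    # Entry: count the bytes before the first call, then pad up to JMP_SIZE.
    first_call = next(
        (i for i, (m, _) in enumerate(out) if m == "call"), len(out)
    )
    entry_room = sum(length for _, length in out[:first_call])
    pad = max(0, JMP_SIZE - entry_room)
    # Insert the padding after the first instruction when there is one.
    pos = min(1, first_call)
    out[pos:pos] = [NOP] * pad
    # Exit: pad after the ret so that ret plus nops spans JMP_SIZE bytes.
    out += [NOP] * max(0, JMP_SIZE - out[-1][1])
    return out


original = [("push", 1), ("mov", 3), ("call", 5), ("call", 5), ("ret", 1)]
rewritten = expand(original)
```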

Rewriting A Function: Original Function
Original function: push, mov, call <Foo>, call <Bar>, ret, with instrumentation points at the entry, at each call, and at the exit.
Overwrite the entry of the original function with a jump to the rewritten function: jmp <rewritten function>, call <Foo>, call <Bar>, ret.

Update Jumps and Calls
PC-relative jump and call instructions:
those with destinations outside the function will have incorrect displacements;
some jumps to locations inside the function will have incorrect displacements.
2-byte jumps:
have a range of 127 bytes forward and 128 bytes backward, measured from the end of the instruction;
if the target address is no longer in range, replace the 2-byte instruction with a 5-byte instruction that has further reach.
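A sketch of the displacement fix-up follows, covering unconditional direct jumps and calls only. The encodings are the standard ones (EB rel8 for the 2-byte jump, E9 rel32 for the 5-byte jump, E8 rel32 for a direct call); the function name, the `kind` labels, and the addresses are hypothetical.

```python
import struct


def fits_rel8(disp):
    """True if disp fits the signed 8-bit displacement of a 2-byte jump."""
    return -128 <= disp <= 127


def reencode_branch(kind, new_addr, target):
    """Re-encode a direct branch that now lives at new_addr.

    kind: "jmp_short" (2 bytes), "jmp_near" or "call" (5 bytes).
    target: absolute destination in the rewritten layout.
    """
    if kind == "jmp_short":
        disp = target - (new_addr + 2)
        if fits_rel8(disp):
            return b"\xeb" + struct.pack("<b", disp)
        kind = "jmp_near"  # out of range: widen to the 5-byte form
    disp = (target - (new_addr + 5)) & 0xFFFFFFFF
    opcode = b"\xe8" if kind == "call" else b"\xe9"
    return opcode + struct.pack("<I", disp)


still_short = reencode_branch("jmp_short", 0x1000, 0x1010)
widened = reencode_branch("jmp_short", 0x1000, 0x1100)
moved_call = reencode_branch("call", 0x2000, 0x1000)
```

Widening a jump adds three bytes to the function, which can push other short jumps out of range, so a real rewriter would need to repeat the check until the layout stops changing.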

Status
Dynamic rewriting and function relocation are operational in Paradyn release 3.2 for x86 (Solaris, Linux, Windows NT).

Current Limitations
We do not relocate a function if:
the application is executing within the function we want to instrument;
it has a jump table.

Jumps vs. Traps
Trap handling: average time to get to instrumentation and back (time in microseconds)

          Trap    Jump
Solaris   37.6    0.03
Linux      8.3    0.04
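As a quick check of the measurements on this slide, the ratios can be computed directly. The sketch assumes the four values pair up as Solaris 37.6 (trap) and 0.03 (jump), Linux 8.3 (trap) and 0.04 (jump), which is the pairing consistent with the "over 1000 times" and "over 200 times" figures quoted on the Problem slide.

```python
# Round-trip times in microseconds, under the pairing described above.
times = {
    "Solaris": {"trap": 37.6, "jump": 0.03},
    "Linux": {"trap": 8.3, "jump": 0.04},
}

ratios = {name: t["trap"] / t["jump"] for name, t in times.items()}
for name, ratio in ratios.items():
    print(f"{name}: a trap round trip costs roughly {ratio:.0f} jump round trips")
```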

Jumps vs. Traps
Relocating functions that are performance bottlenecks leads to the greatest speedup.
More instrumentation can be inserted, since perturbation to the system is minimized.
In Paradyn, the speedup ratio depends on the type of metric (e.g. CPU time, number of procedure calls).

Some Results: bubba (circuit layout)
Instrumented 9 functions for CPU; all required a trap for the exit point.
The 5 relocated functions were called 400 thousand times and consumed 20% of CPU.
23 seconds to execute using relocation.
42 seconds to execute without relocation.

Some Results: fspx (2-D heat transfer simulation)
4 of 46 functions required traps, all for exit points.
Instrumented __atan for CPU: it required a trap for the exit, was called 107 million times, and consumed 25% of CPU.
7.5 minutes to execute using relocation.
115 minutes to execute without relocation.
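The speedups implied by the two results slides can be worked out from the reported execution times. The sketch only restates the slide figures in seconds; the dictionary layout and key names are illustrative.

```python
# Execution times from the two results slides, converted to seconds.
runs = {
    "bubba": {"with_relocation": 23, "without_relocation": 42},
    "fspx": {"with_relocation": 7.5 * 60, "without_relocation": 115 * 60},
}

speedups = {
    name: r["without_relocation"] / r["with_relocation"]
    for name, r in runs.items()
}
for name, s in speedups.items():
    print(f"{name}: {s:.1f}x faster with relocation")
```

The much larger gain for fspx is consistent with its instrumented function being called far more often (107 million calls versus 400 thousand), so the per-call trap cost dominates.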

Conclusions
Dynamic rewriting and function relocation:
are used by Paradyn to allow jumps, instead of traps, when profiling applications, which improves performance;
are crucial for large-scale and fine-grained instrumentation.
