1 Efficient x86 Instrumentation: Dynamic Rewriting and Function Relocation
Itai Gurari
Computer Science Department, University of Wisconsin
1210 W. Dayton St., Madison, WI
Paradyn/Condor Week, Madison, WI, March 12-14, 2001

2 Introduction
Dynamic Instrumentation:
  Insert instrumentation into an application while it is executing
  Used by Paradyn to gather performance data
  Paradyn inserts instrumentation at three types of points: function entry, exit, and call

3 Instrumentation Points
Paradyn Instrumentation Points
Executable Code:
  foo () { call <bar> }

4 Instrumentation Points
Paradyn Instrumentation Points
Executable Code:
  foo () { call <bar> }
Points: Entry, Call, Exit

5 Instrumentation Points
Paradyn Instrumentation Points
Executable Code:
  foo () { call <bar> }
Instrumentation:
  Entry: startTimer()
  Call: counter++
  Exit: stopTimer()

6 Goal
Transfer from function to instrumentation code as quickly as possible

7 Control Transfer
To switch execution from a function to its instrumentation code:
  Overwrite instructions in the function with a control transfer instruction.
  Equivalents of the overwritten instructions are copied to the code patch area.
On the x86, Paradyn uses, by default, a 5-byte jump to transfer control to the instrumentation code.
  The range of a 5-byte jump is the whole address space.
  If a 5-byte instruction won't fit, we use a 1-byte trap (int3 instruction).
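As a rough illustration of the two control transfer instructions named on this slide, the sketch below encodes a 5-byte `jmp rel32` (opcode 0xE9) and falls back to the 1-byte `int3` (0xCC) when fewer than five bytes are available. The function names and the `bytes_available` parameter are hypothetical and are not taken from Paradyn.

```python
import struct

JMP_REL32_OPCODE = 0xE9  # x86 near jump with a 32-bit relative displacement
INT3_OPCODE = 0xCC       # x86 one-byte breakpoint trap
JMP_REL32_SIZE = 5       # one opcode byte plus four displacement bytes


def encode_jmp_rel32(jump_addr: int, target_addr: int) -> bytes:
    """Encode a 5-byte `jmp rel32` located at jump_addr that lands on target_addr."""
    # The displacement is measured from the address of the next instruction.
    # Masking to 32 bits lets the jump wrap around and reach any 32-bit address.
    disp = (target_addr - (jump_addr + JMP_REL32_SIZE)) & 0xFFFFFFFF
    return struct.pack("<BI", JMP_REL32_OPCODE, disp)


def encode_control_transfer(point_addr: int, bytes_available: int, patch_addr: int) -> bytes:
    """Pick a jump when five bytes can be overwritten, otherwise a trap (hypothetical helper)."""
    if bytes_available >= JMP_REL32_SIZE:
        return encode_jmp_rel32(point_addr, patch_addr)
    return bytes([INT3_OPCODE])
```

For example, a jump placed at 0x1000 that targets 0x2000 encodes as `e9 fb 0f 00 00`, because the displacement is 0x2000 - 0x1005 = 0xFFB.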

8 Inserting Control Transfer Instructions
Dynamically rewrite function in place
Different techniques for different types of instrumentation points

9 Jumps and Traps
Instrument Entry Point, Case 1
  push
  mov
  sub

10 Jumps and Traps
Instrument Entry Point, Case 1
  push
  mov
  sub
Enough room to replace instructions with a jump:
  jmp <instrumentation>

11 Jumps and Traps
Instrument Entry Point, Case 2
  push
  mov
  jmp

12 Jumps and Traps
Instrument Entry Point, Case 2
  push
  mov
  jmp
Inserting a jump instruction interferes with the target of the backwards jump:
  jmp <instrumentation>
  jmp

13 Jumps and Traps
Instrument Entry Point, Case 2
  push
  mov
  jmp
Must use a trap instruction to get to instrumentation:
  int3
  mov
  jmp
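The difference between Case 1 and Case 2 can be sketched as a simple check: a 5-byte jump is only safe if no branch in the function lands inside the bytes it overwrites. The function below is a hypothetical illustration, not Paradyn's code; `branch_targets` stands in for whatever the disassembler reports, and the instruction sizes in the comments are assumed for the example.

```python
JMP_REL32_SIZE = 5  # bytes overwritten by a `jmp rel32`


def choose_entry_transfer(point_addr, function_end, branch_targets):
    """Return "jump" if a 5-byte jump is safe at point_addr, otherwise "trap".

    branch_targets: addresses inside the function that some branch jumps to
    (hypothetical input; a real tool would derive these from disassembly).
    """
    overwrite_end = point_addr + JMP_REL32_SIZE
    if overwrite_end > function_end:
        return "trap"  # fewer than five bytes remain in the function
    for target in branch_targets:
        # Landing on point_addr itself is fine: that is the start of the new jump.
        # Landing on a later overwritten byte would resume in the middle of it.
        if point_addr < target < overwrite_end:
            return "trap"
    return "jump"


# Case 1: push (1 byte), mov (2), sub (3); nothing branches into the first 5 bytes.
case1 = choose_entry_transfer(0x1000, 0x1040, branch_targets=[])
# Case 2: push at 0x1000, mov at 0x1001, and a backwards jmp whose target is the mov.
case2 = choose_entry_transfer(0x1000, 0x1040, branch_targets=[0x1001])
```

Under these assumptions `case1` is "jump" and `case2` is "trap", matching the two cases on the slides.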

14 Jumps and Traps
Instrument Call Point
  call <Foo>

15 Jumps and Traps
Instrument Call Point
  call <Foo>
Enough room to replace instruction with a jump:
  jmp <instrumentation>

16 Jumps and Traps
Instrument Exit Point, Case 1
  mov
  leave
  ret

17 Jumps and Traps
Instrument Exit Point, Case 1
  mov
  leave
  ret
Back up far enough to replace instructions with a jump:
  jmp <instrumentation>

18 Jumps and Traps
Instrument Exit Point, Case 2
  call <Foo>
  leave
  ret

19 Jumps and Traps
Instrument Exit Point, Case 2
  call <Foo>
  leave
  ret
Jump interferes with the preceding call:
  call
  jmp <instrumentation>

20 Jumps and Traps
Instrument Exit Point, Case 2a
  call <Foo>
  leave
  ret
  (beginning of next function, on a 4-byte boundary)

21 Jumps and Traps
Instrument Exit Point, Case 2a
Compiler pads with "bonus bytes":
  call <Foo>
  leave
  ret
  ? ? ?
  (beginning of next function, on a 4-byte boundary)

22 Jumps and Traps
Instrument Exit Point, Case 2a
Compiler pads with "bonus bytes":
  call <Foo>
  leave
  ret
  ? ? ?
  (beginning of next function, on a 4-byte boundary)
Replace instructions with a jump:
  call <Foo>
  jmp <instrumentation>

23 Jumps and Traps
Instrument Exit Point, Case 2b
Not enough "bonus bytes" (if any) to overwrite with a jump:
  call <Foo>
  leave
  ret
  ?

24 Jumps and Traps
Instrument Exit Point, Case 2b
Not enough "bonus bytes" (if any) to overwrite with a jump:
  call <Foo>
  leave
  ret
  ?
Overwrite return with a trap:
  call <Foo>
  leave
  int3
  ?
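The exit-point cases can be read as a byte-counting exercise: the jump needs five bytes, and those bytes can come from instructions before the `ret` that are safe to overwrite, from the `ret` itself, and from any padding before the next function. The sketch below is a hypothetical illustration of that count, not Paradyn's logic; it treats the preceding `call` as not overwritable (presumably because its return address would land inside the new jump). The addresses are made up for the example.

```python
JMP_REL32_SIZE = 5
LEAVE_SIZE = 1  # `leave` is a single byte (0xC9)
RET_SIZE = 1    # near `ret` is a single byte (0xC3)


def choose_exit_transfer(overwritable_before_ret, ret_end, next_function_start):
    """Return "jump" if five bytes are available at the exit, otherwise "trap".

    overwritable_before_ret: bytes of instructions before `ret` that may be overwritten
    ret_end: address of the first byte after `ret`
    next_function_start: address where the next function begins
    """
    bonus_bytes = next_function_start - ret_end  # compiler padding, if any
    room = overwritable_before_ret + RET_SIZE + bonus_bytes
    return "jump" if room >= JMP_REL32_SIZE else "trap"


# Case 2a: leave + ret, followed by 3 bonus bytes -> 1 + 1 + 3 = 5 bytes of room.
case_2a = choose_exit_transfer(LEAVE_SIZE, ret_end=0x2005, next_function_start=0x2008)
# Case 2b: leave + ret, followed by 1 bonus byte -> 1 + 1 + 1 = 3 bytes of room.
case_2b = choose_exit_transfer(LEAVE_SIZE, ret_end=0x2007, next_function_start=0x2008)
```

With these numbers Case 2a yields "jump" and Case 2b yields "trap".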

25 Jumps and Traps
Extra slot
No jumps to first ten bytes of function:
  push
  mov
  sub
  mov

26 Jumps and Traps
Extra slot
No jumps to first ten bytes of function:
  push
  mov
  sub
  mov
Enough space to overwrite entry with a jump:
  jmp <instrumentation>
  mov

27 Jumps and Traps
Extra slot
No jumps to first ten bytes of function:
  push
  mov
  sub
  mov
Enough space to overwrite entry with a jump
Make a 2-byte jump to the "extra slot", and overwrite the "extra slot" with a jump to instrumentation:
  jmp <instrumentation>
  jmp <instrumentation>

28 Control Transfer
Traps on x86:
  Generate an exception that is caught by either the application (Solaris, Linux) or the Paradyn daemon (Windows NT).
  The address of the trap instruction is used to calculate which instrumentation code to execute.
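One way to picture the address-based lookup is a table keyed by trap address. The sketch below is a hypothetical model, not Paradyn's handler: the class and method names are invented, and it assumes the handler receives the saved program counter, which on x86 normally points at the byte just after the `int3`.

```python
INT3_SIZE = 1  # `int3` is a single byte (0xCC)


class TrapDispatcher:
    """Maps trap addresses to instrumentation callbacks (hypothetical model)."""

    def __init__(self):
        self._handlers = {}

    def register(self, trap_addr, instrumentation):
        """Record which instrumentation belongs to the trap placed at trap_addr."""
        self._handlers[trap_addr] = instrumentation

    def handle(self, saved_pc):
        """Run the instrumentation for the trap that produced saved_pc."""
        # Assumption: saved_pc points just past the int3, so step back one byte.
        trap_addr = saved_pc - INT3_SIZE
        handler = self._handlers.get(trap_addr)
        if handler is None:
            raise LookupError(f"no instrumentation registered for {trap_addr:#x}")
        return handler()


dispatcher = TrapDispatcher()
dispatcher.register(0x8048123, lambda: "stopTimer")
result = dispatcher.handle(0x8048124)
```

Even in this toy form, every trap costs an exception delivery plus a lookup before any instrumentation runs, whereas a jump reaches the instrumentation directly.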

29 Problem
Trap handling is slow:
  On Solaris 2.6, jumps are over 1000 times faster than traps.
  On Linux 2.2, jumps are over 200 times faster than traps.
Traps limit instrumentation:
  can't insert as much, or at as fine a granularity
Trap handling logic is difficult:
  Susceptible to bugs
  Difficult to understand and maintain

30 Solution
Rewrite functions that do not have enough room for jumps into functions that do have enough room for jumps.
Rewrite the function on the fly: this combines dynamic instrumentation and binary rewriting.

31 Dynamic Rewriting

32 Dynamic Rewriting
  overwrite existing instructions

33 Dynamic Rewriting
  overwrite existing instructions
  expand instrumentation points

34 Dynamic Rewriting
  overwrite existing instructions
  expand instrumentation points
  Relocate Function

35 Function Rewriting and Relocation
In Paradyn we rewrite a function:
  only if the function contains an instrumentation point that would require using a trap to instrument
  the first time a request to instrument the function is made
    even if the instrumentation to be inserted is not for a point that requires using a trap
    e.g. the exit needs a trap, the entry can use a jump, and the request is to instrument the entry

36 Function Rewriting and Relocation (continued)
All instrumentation points that cannot use a jump are expanded.

37 Rewriting A Function
Instrumentation points: Entry, Call, Call, Exit
  push
  mov
  call <Foo>
  call <Bar>
  ret

38 Rewriting A Function
Instrumentation points: Entry, Call, Call, Exit
Insert nop at entry:
  push
  nop
  mov
  call <Foo>
  call <Bar>
  ret

39 Rewriting A Function
Instrumentation points: Entry, Call, Call, Exit
Insert nop at entry:
  jmp <instrumentation>
  call <Foo>
  call <Bar>
  ret

40 Rewriting A Function
Instrumentation points: Entry, Call, Call, Exit
Insert nop at entry
Insert nops at exit:
  jmp <instrumentation>
  call <Foo>
  call <Bar>
  ret
  nop
  nop
  nop
  nop

41 Rewriting A Function
Instrumentation points: Entry, Call, Call, Exit
Insert nop at entry
Insert nops at exit:
  jmp <instrumentation>
  call <Foo>
  call <Bar>
  jmp <instrumentation>
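The expansion step on these slides amounts to padding an instrumentation point with nops until it spans five bytes, so that a jump can later overwrite it. The helper below is a minimal sketch of that idea on a raw byte string; its name and parameters are hypothetical, and it does not fix up the displacements that shift when bytes are inserted.

```python
JMP_REL32_SIZE = 5
NOP = 0x90  # x86 one-byte no-op


def pad_point(code: bytes, offset: int, room: int) -> bytes:
    """Insert nops after the `room` bytes at `offset` so the point spans 5 bytes.

    code: bytes of the function copy being rewritten
    offset: where the instrumentation point starts within `code`
    room: bytes already available at the point for a jump
    """
    missing = max(0, JMP_REL32_SIZE - room)
    insert_at = offset + room
    return code[:insert_at] + bytes([NOP]) * missing + code[insert_at:]


# Exit point: `leave` (0xC9) then `ret` (0xC3); only the 1-byte ret may be overwritten.
expanded = pad_point(b"\xc9\xc3", offset=1, room=1)
```

Here `expanded` is `c9 c3 90 90 90 90`: the `ret` followed by four nops, as in the exit-point expansion shown on slide 40.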

42 Rewriting A Function
Original Function
Instrumentation points: Entry, Call, Call, Exit
  push
  mov
  call <Foo>
  call <Bar>
  ret

43 Rewriting A Function
Original Function
Instrumentation points: Entry, Call, Exit
Overwrite entry of original function with jump to rewritten function:
  jmp <rewritten function>
  call <Foo>
  call <Foo>
  ret

44 Update Jumps and Calls
PC-relative jump and call instructions:
  those with destinations outside the function will have incorrect displacements
  some jumps to locations inside the function will have incorrect displacements
2-byte jumps:
  have a range of 127 bytes forward and 128 bytes backward (measured from the end of the jump instruction)
  if the target address is no longer in range, replace the 2-byte instruction with a 5-byte instruction that has further reach
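The displacement fix-ups described here follow from how x86 encodes relative branches: the displacement is added to the address of the next instruction. The sketch below recomputes a `rel32` displacement after an instruction moves, and chooses between the 2-byte `jmp rel8` (0xEB) and the 5-byte `jmp rel32` (0xE9) depending on reach. The function names are hypothetical and this is not Paradyn's implementation.

```python
def relocate_rel32(disp: int, old_addr: int, new_addr: int, size: int = 5) -> int:
    """New displacement for a rel32 call/jmp moved from old_addr to new_addr.

    The absolute target (outside the relocated function) stays the same.
    """
    target = old_addr + size + disp
    return target - (new_addr + size)


def fits_rel8(disp: int) -> bool:
    """A 2-byte jump holds a signed 8-bit displacement."""
    return -128 <= disp <= 127


def encode_jmp(jump_addr: int, target_addr: int) -> bytes:
    """Use the 2-byte form when the target is in range, else the 5-byte form."""
    disp8 = target_addr - (jump_addr + 2)
    if fits_rel8(disp8):
        return bytes([0xEB, disp8 & 0xFF])
    disp32 = (target_addr - (jump_addr + 5)) & 0xFFFFFFFF
    return bytes([0xE9]) + disp32.to_bytes(4, "little")


# A call at 0x1000 with displacement 0x100 targets 0x1105; after moving the
# call to 0x5000, the displacement must shrink by the distance moved.
new_disp = relocate_rel32(0x100, old_addr=0x1000, new_addr=0x5000)
```

Note that widening a jump from two bytes to five shifts the code after it, which can push other short jumps out of range, so a full rewriter would likely need to repeat the check until nothing changes.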

45 Status
Dynamic rewriting and function relocation is operational in Paradyn release 3.2 for x86 (Solaris, Linux, Windows NT).

46 Current Limitations
We do not relocate a function if:
  the application is executing within the function we want to instrument
  it has a jump table
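Taken together, the relocation policy from slide 35 and the limitations above can be summarized as a single predicate. The sketch below is only an interpretation of the slides; the `FunctionInfo` record and its fields are hypothetical and do not correspond to Paradyn data structures.

```python
from dataclasses import dataclass


@dataclass
class FunctionInfo:
    """Hypothetical summary of what the instrumenter knows about a function."""
    name: str
    points_needing_trap: int          # instrumentation points with no room for a jump
    has_jump_table: bool = False
    currently_executing: bool = False  # application is inside this function right now
    relocated: bool = False


def should_relocate(func: FunctionInfo) -> bool:
    """Decide, at the first instrumentation request, whether to rewrite and relocate."""
    if func.relocated:
        return False  # already rewritten on an earlier request
    if func.points_needing_trap == 0:
        return False  # every point can already use a jump
    if func.has_jump_table or func.currently_executing:
        return False  # current limitations: fall back to traps
    return True


candidate = FunctionInfo("foo", points_needing_trap=1)
busy = FunctionInfo("bar", points_needing_trap=2, currently_executing=True)
```

Under this reading, `should_relocate(candidate)` is true regardless of which point the request targets, while `busy` keeps using traps.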

47 Jumps vs. Traps
Trap handling: average time to get to instrumentation and back (time in microseconds)

           Trap    Jump
Solaris    37.6    0.03
Linux       8.3    0.04
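As a quick check of these timings against the "over 1000 times" and "over 200 times" claims on the Problem slide, the snippet below computes the trap-to-jump ratios. It assumes the table pairs Solaris with 37.6 (trap) and 0.03 (jump), and Linux with 8.3 (trap) and 0.04 (jump), which is the reading consistent with those claims.

```python
# Average round-trip time to instrumentation and back, in microseconds.
# The pairing of values to platforms is an assumption (see the note above).
timings_us = {
    "Solaris": {"trap": 37.6, "jump": 0.03},
    "Linux": {"trap": 8.3, "jump": 0.04},
}


def trap_to_jump_ratio(platform: str) -> float:
    """How many times slower a trap is than a jump on the given platform."""
    t = timings_us[platform]
    return t["trap"] / t["jump"]


solaris_ratio = trap_to_jump_ratio("Solaris")
linux_ratio = trap_to_jump_ratio("Linux")
```

The ratios come out to roughly 1250 for Solaris and roughly 207 for Linux.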

48 Jumps vs. Traps
Relocating functions that are performance bottlenecks leads to the greatest speedup.
More instrumentation can be inserted, since perturbation to the system is minimized.
In Paradyn, the speedup ratio depends on the type of metric (e.g. CPU time, number of procedure calls).

49 Some Results
bubba (circuit layout)
  instrumented 9 functions for CPU
    all required a trap for the exit point
  5 relocated functions
    called 400 thousand times
    consumed 20% of CPU
  23 seconds to execute using relocation
  42 seconds to execute without relocation

50 Some Results
fspx (2-D heat transfer simulation)
  4 of 46 functions required traps, all for exit points
  instrumented __atan for CPU
    required a trap for the exit
    called 107 million times
    consumed 25% of CPU
  7.5 minutes to execute using relocation
  115 minutes to execute without relocation

51 Conclusions
Dynamic rewriting and function relocation:
  Used by Paradyn to allow jumps, instead of traps, when profiling applications, which improves performance.
  Crucial for large-scale and fine-grained instrumentation.

