Pin2 Tutorial1 Pin Tutorial Kim Hazelwood Robert Muth VSSAD Group, Intel.


1 Pin2 Tutorial1 Pin Tutorial Kim Hazelwood Robert Muth VSSAD Group, Intel

2 Pin2 Tutorial2 Pin People Robert Cohn Kim Hazelwood Artur Klauser Geoff Lowney CK Luk Robert Muth Harish Patil Ramesh Peri Vijay Janapareddi Steven Wallace

3 Pin2 Tutorial3 Outline Pin Overview Instrumentation Basics Advanced Topics

4 Pin2 Tutorial4 What is Pin? Pin Is Not a TLA Pin is a dynamic binary rewriting engine Derived from Spike: a static rewriter Two versions available: –Pin2 is the current version –Pin0 (IPF only) is not covered in this talk

5 Pin2 Tutorial5 Pin Features Rewritten program exists only in memory No tool chain dependence –No issues with code/data mixing, missing relocs, etc. Rewrites all user level code including shared libs Multi-ISA: Itanium, IA32, EM64T, XScale Attach/detach to/from running process (like gdb) Transparent: unchanged program behavior Efficient: very good performance

6 Pin2 Tutorial6 Pin Applications Optimization Security (program shepherding) Debugging Instrumentation Instrumentation is our current focus

7 Pin2 Tutorial7 Uses for Instrumentation Profiling for optimization –Basic block counts, edge counts –Value profiles, stride profiling, load latencies Micro-architectural studies –Branch predictor simulation –Cache simulation –Trace generation Bug checking –Find uninitialized or unallocated data references

8 Pin2 Tutorial8 Pin Instrumentation Features User programmable via plug-ins –many examples provided –plug-ins are typically ISA agnostic Can take advantage of symtab info Automatic register saving/restoring Various instrumentation granularities –Instruction, “Trace”, Routine ATOM compatibility mode (AOTI)

9 Pin2 Tutorial9 Other Dynamic Rewriting Engines (and what they focus on) Dynamo (PA-RISC HPUX) –Dynamic optimization DynamoRIO (IA32 Linux + Win32) –Originally: Dynamic optimization –Now: Sandboxing, some instrumentation Valgrind (IA32 Linux) –Originally: Special-purpose instrumentation –Now: General-purpose instrumentation

10 Pin2 Tutorial10 Static Instrumentation (“Atom Style”) (Way) Ahead-of-time Persistent Good but not perfect transparency Shared libraries can be a problem Program Instrumented Program ATOM

11 Pin2 Tutorial11 Dynamic Instrumentation (“Pin Style”) Execution driven –Occurs when code is executed Original program is NOT modified –Code is “copied” into code cache –Only code in code cache is executed Instrumentation is not persistent Can also instrument libraries

12 Pin2 Tutorial12 Dynamic Instrumentation [Figure: original code (blocks 1 to 7), Pin, and an empty code cache] Pin has grabbed control before execution of block 1

13 Pin2 Tutorial13 Dynamic Instrumentation [Figure: original code (blocks 1 to 7); code cache holds 1’, 2’, 7’] Pin fetches trace and allows for instrumentation

14 Pin2 Tutorial14 Dynamic Instrumentation [Figure: original code (blocks 1 to 7); code cache holds 1’, 2’, 7’] Pin transfers control into code cache (block 1)

15 Pin2 Tutorial15 Dynamic Instrumentation [Figure: original code (blocks 1 to 7); code cache holds 1’, 2’, 7’ and 3’, 5’, 6’] Pin fetches new trace and ‘links’ it

16 Pin2 Tutorial16 Dynamic Instrumentation [Figure: original code (blocks 1 to 7); code cache holds 1’, 2’, 7’ and 3’, 5’, 6’] Pin transfers control into code cache (block 3)

17 Pin2 Tutorial17 Running Pin Three program images are involved: 1. pin 2. pintool/plug-in 3. application “Shell mode”: $ pin -t inscount -- xclock “Gdb mode” (attaching to an existing process): $ pin -pid 1067 -t inscount (can detach and re-attach with a different plug-in)

18 Pin2 Tutorial18 Transparency Program execution under Pin is transparent: Program state is unchanged –Code/data addresses, memory content Will not expose latent bugs Instrumentation sees the original program –Code/data address, memory content (But: intentional program state changes possible, e.g. fault injection)

19 Pin2 Tutorial19 Transparency (Example) Push 0x1006 on stack, then jump to 0x4000 Original Code: 0x1000 call 0x4000 Code cache address mapping: 0x1000 ->0x7000 “caller” 0x4000 -> 0x8000 “callee” Translated Code: 0x7000 Push 0x1006 0x7006 Jmp 0x8000 Stack content remains unchanged

20 Pin2 Tutorial20 Transparency has a Price Pop 0x1006 from stack, then jump to 0x1006 Original Code: 0x4400 ret Translated Code: 0x8400 Pop rx 0x84… ry = Translate(rx) 0x84… Jmp ry Pin needs to translate program address to code cache address. Main reason for slowdowns in dynamic instrumentation systems!
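The call/ret example on the two transparency slides can be modeled in a few lines. This is a minimal sketch, not Pin's implementation: `CodeCacheModel`, `EmulateCall`, and `EmulateRet` are hypothetical names, and the code cache address used for 0x1006 in the usage note is an assumption, since the slides do not give one. It shows why the call target can be translated once at translation time, while the return target needs a lookup each time the `ret` executes.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical model of a code cache address map; not Pin's data structure.
using Addr = std::uint64_t;

struct CodeCacheModel {
    std::unordered_map<Addr, Addr> orig_to_cache;  // original address -> code cache address
    std::vector<Addr> stack;                       // application stack (return addresses only)

    // Translate an original program address to its code cache address.
    Addr Translate(Addr orig) const {
        auto it = orig_to_cache.find(orig);
        assert(it != orig_to_cache.end() && "address not yet in code cache");
        return it->second;
    }

    // Translated "call": push the ORIGINAL return address so the application
    // sees an unchanged stack, then continue at the callee's code cache copy.
    Addr EmulateCall(Addr orig_return, Addr orig_target) {
        stack.push_back(orig_return);
        return Translate(orig_target);  // target is known when the call is translated
    }

    // Translated "ret": pop the original address, then translate it at run time.
    Addr EmulateRet() {
        assert(!stack.empty());
        Addr rx = stack.back();
        stack.pop_back();
        return Translate(rx);  // the per-execution lookup the slide warns about
    }
};
```

With the slide's mapping (0x1000 to 0x7000, 0x4000 to 0x8000) plus an assumed entry for 0x1006, `EmulateCall(0x1006, 0x4000)` leaves 0x1006 on the stack and continues at 0x8000; `EmulateRet()` then pops 0x1006 and has to look it up before jumping.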

21 Pin2 Tutorial21 Portability Challenges

                   | ARM                      | IA-32/EM64T               | IPF
Type               | RISC                     | CISC                      | VLIW
Instruction        | Fixed length             | Variable length, prefixes | Bundled
Memory instruction | LD/ST                    | Any, implicit             | LD/ST
Memory op size     | Fixed                    | Variable length           | Fixed
Addressing modes   | Pre/post/iprel increment | Index/offset/scale/iprel  | post
Predication        | Cond. codes              | None                      | Predicate regs
Parameters         | Registers                | Stack/registers           | Stacked registers

22 Pin2 Tutorial22 Pin Instrumentation Query API ISA independent part (usually sufficient) –INS_Address(), INS_Size(), INS_IsRet(), INS_IsCall(), INS_MemoryReadSize(), INS_Mnemonic(), etc. ISA dependent part (optional) –INS_GetPredicate(), INS_RegR(), INS_RegW(), etc.

23 Pin2 Tutorial23 Performance Comparison: No Instrumentation (latest numbers are even better)

24 Pin2 Tutorial24 Performance Comparison: Basic-Block Counting (latest numbers are even better)

25 Pin2 Tutorial25 Pin2 Status ISAs: IA32, IA32E, XScale, (IPF soon) Distros: Debian, Suse, Mandrake, Red Hat 7.2, 8.0, 9.0, EL3, FC3 >2500 downloads Multithreading support in beta Windows support in preparation

26 Pin2 Tutorial26 Project Engineering Automatic nightly testing –>4 platforms –>7 Linux distributions –>8 compilers –>9000 binaries Automatically generated user manual, internal documentation using Doxygen

27 Pin2 Tutorial27 Outline Pin Overview Instrumentation Basics Advanced Topics

28 Pin2 Tutorial28 Instrumentation vs. Analysis Concepts borrowed from ATOM Instrumentation routines define where instrumentation is inserted –e.g. before instruction Occurs at compile time (JIT time) Analysis routines define what to do when instrumentation is activated –e.g. increment counter Occurs at runtime

29 Pin2 Tutorial29 Instrumentation vs. Analysis (2) In ATOM: Instrumentation and analysis occurred in separate phase Code was in separate files In Pin: Difference is somewhat blurred Instrumentation and analysis are interleaved User plug-in provides code for both These are difficult terms to remember! Mental Bridge: Instrumentation → Insertion Analysis → Action
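The split between the two kinds of routine is easier to see in a toy model. The sketch below is not Pin code: `MiniJit`, `InstrumentFn`, and `AnalysisFn` are hypothetical names, and the "code cache" is just a per-instruction list of inserted calls. It assumes each instruction is instrumented once, the first time it is executed, and that the inserted analysis calls then run on every execution.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical mini "JIT"; none of these names are Pin API.
using AnalysisFn = std::function<void()>;
using InstrumentFn =
    std::function<void(std::size_t ins, std::vector<AnalysisFn>& before)>;

struct MiniJit {
    std::vector<std::vector<AnalysisFn>> inserted;  // analysis calls per instruction
    std::vector<bool> in_cache;                     // has this instruction been "compiled"?
    InstrumentFn instrument;                        // the instrumentation routine

    MiniJit(std::size_t program_size, InstrumentFn fn)
        : inserted(program_size),
          in_cache(program_size, false),
          instrument(std::move(fn)) {}

    // Execute original instruction `ins` once.
    void Execute(std::size_t ins) {
        assert(ins < inserted.size());
        if (!in_cache[ins]) {
            // "JIT time": the instrumentation routine decides WHERE to insert calls.
            instrument(ins, inserted[ins]);
            in_cache[ins] = true;
        }
        // "Run time": the inserted analysis routines define WHAT happens.
        for (const AnalysisFn& fn : inserted[ins]) fn();
    }
};
```

Running a three-instruction loop body four times through this model calls the instrumentation routine 3 times (once per static instruction) and an inserted counter 12 times (once per executed instruction), which is the "Insertion" versus "Action" distinction from the slide.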

30 Pin2 Tutorial30 Instrumentation Routine Written in C++ Invoked by Pin via Callback mechanism Invoked when Pin places new code in code cache (different granularities: instruction, trace, …) Instruments using the Pin API for –inserting calls to analysis routines –picking arguments for analysis routines

31 Pin2 Tutorial31 Analysis Routines Written in any language: C, C++, Asm, etc. Invoked when surrounding code executes Isolated from application by –separate memory areas –separate register state Automatically optimized by Pin (inlining, register allocation, etc.)

32 Pin2 Tutorial32 Example: Instruction Count mov r2 = 2 add r3 = 4, r3 beq L1 add r4 = 8, r4 beq L2 IncCounter(); Instrumentation: Insert call to IncCounter() before every instruction Analysis: VOID IncCounter() { icount++; }

33 Pin2 Tutorial33 Example: Instruction Count (output of inscount plug-in)

$ /bin/ls
Makefile atrace.o imageload.out
$ pin -t inscount -- /bin/ls
Makefile atrace.o imageload.out
Count 422838
$

34 Pin2 Tutorial34 inscount.C (ISA independent!)

#include <iostream>
#include "pin.H"

UINT64 icount = 0;

// analysis
VOID IncCounter() { icount++; }

// instrumentation
VOID Instruction(INS ins, VOID *v) {
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)IncCounter, IARG_END);  // (2)
}

VOID Fini(INT32 code, VOID *v) {
    std::cerr << "Count " << icount << std::endl;
}

// driver
int main(int argc, char * argv[]) {
    PIN_Init(argc, argv);
    INS_AddInstrumentFunction(Instruction, 0);  // (1)
    PIN_AddFiniFunction(Fini, 0);               // (3)
    PIN_StartProgram();
    return 0;
}

35 Pin2 Tutorial35 Explanations 1. Register Instruction() to be called back for every instruction placed into the code cache 2. Insert call to IncCounter() before code cache instruction 3. Register Fini() to be called back at the end

36 Pin2 Tutorial36 Instrumentation Points Relative to an instruction (“beq L2”): 1. Before (IPOINT_BEFORE) 2. After (IPOINT_AFTER) 3. On taken branch (IPOINT_BRANCH_TAKEN) [Figure: code around “beq L2”, with the branch target L2, and the three points marked 1, 2, 3]

37 Pin2 Tutorial37 Example: Instruction Trace mov r2 = 2 add r3 = 4, r3 beq L1 add r4 = 8, r4 beq L2 traceInst(ip);

38 Pin2 Tutorial38 Example: Instruction Trace

$ pin -t itrace -- /bin/ls
Makefile atrace.o imageload.out
$ head itrace.out
0x40001e90
0x40001e91
0x40001ee4
0x40001ee5
0x40001ee7
0x40001ee8
…
$

39 Pin2 Tutorial39 itrace.C

#include <stdio.h>
#include "pin.H"

FILE * trace;

// analysis: print the address of the instruction about to execute
VOID traceInst(VOID *ip) { fprintf(trace, "%p\n", ip); }

// instrumentation: pass the instruction pointer as an extra argument
VOID Instruction(INS ins, VOID *v) {
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)traceInst,
                   IARG_INST_PTR, IARG_END);  // (1)
}

int main(int argc, char * argv[]) {
    trace = fopen("itrace.out", "w");
    PIN_Init(argc, argv);
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_StartProgram();
    return 0;
}

40 Pin2 Tutorial40 Explanations 1. Insert traceInst() before code cache instruction; traceInst() takes an extra argument! (Bad coding practice: we should have closed the file using a Fini function)

41 Pin2 Tutorial41 Analysis Routine Parameters IARG_UINT32 IARG_REG_VALUE [*] IARG_INST_PTR IARG_BRANCH_TAKEN IARG_BRANCH_TARGET_ADDR IARG_G_ARG0_CALLER IARG_MEMORY_READ_EA IARG_SYSCALL_NUMBER … [*] Will result in ISA dependent tool

42 Pin2 Tutorial42 BBL1 BBL2 Example: Fast Instruction Count mov r2 = 2 add r3 = 4, r3 beq L1 add r4 = 8, r4 beq L2 IncCounter(1); IncCounter(3); IncCounter(2);

43 Pin2 Tutorial43 inscount.C

#include <stdio.h>
#include "pin.H"

UINT64 icount = 0;

// analysis: add the number of instructions in the basic block
VOID IncCounter(INT32 c) { icount += c; }

// instrumentation: one call per basic block instead of one per instruction
VOID Trace(TRACE trace, VOID *v) {
    for (BBL b = TRACE_BblHead(trace); BBL_Valid(b); b = BBL_Next(b)) {  // (2)
        BBL_InsertCall(b, IPOINT_BEFORE, (AFUNPTR)IncCounter,
                       IARG_UINT32, BBL_NumIns(b), IARG_END);
    }
}

VOID Fini(INT32 code, VOID *v) {
    fprintf(stderr, "Count %llu\n", (unsigned long long)icount);
}

int main(int argc, char * argv[]) {
    PIN_Init(argc, argv);
    TRACE_AddInstrumentFunction(Trace, 0);  // (1)
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();
    return 0;
}

44 Pin2 Tutorial44 Explanations 1. Register Trace() to be called back for every trace placed in the code cache. As a first approximation, a “trace” is a sequence of basic blocks (BBLs) 2. For each trace, walk the BBLs and insert IncCounter() with the appropriate integer parameter at the beginning of each BBL
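Why the per-BBL version is faster can be checked with a small model. This is a sketch under simplifying assumptions, not Pin code: a basic block is reduced to its instruction count, the executed path is given as a list of block indices, and `Bbl`, `CountResult`, `CountPerInstruction`, and `CountPerBbl` are hypothetical names. Both strategies arrive at the same instruction count, but the per-BBL one makes far fewer analysis calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: a basic block is reduced to its instruction count.
struct Bbl {
    std::uint32_t num_ins;
};

struct CountResult {
    std::uint64_t icount;          // instructions counted
    std::uint64_t analysis_calls;  // how many times the analysis routine ran
};

// One analysis call before every instruction (first inscount example).
CountResult CountPerInstruction(const std::vector<Bbl>& bbls,
                                const std::vector<std::size_t>& path) {
    CountResult r{0, 0};
    for (std::size_t b : path) {
        assert(b < bbls.size());
        for (std::uint32_t i = 0; i < bbls[b].num_ins; ++i) {
            r.icount += 1;
            ++r.analysis_calls;
        }
    }
    return r;
}

// One analysis call per basic block, passing the block size (fast version).
CountResult CountPerBbl(const std::vector<Bbl>& bbls,
                        const std::vector<std::size_t>& path) {
    CountResult r{0, 0};
    for (std::size_t b : path) {
        assert(b < bbls.size());
        r.icount += bbls[b].num_ins;
        ++r.analysis_calls;
    }
    return r;
}
```

For two blocks of 3 and 2 instructions executed in the order 0, 1, 1, 0, both functions count 10 instructions; the per-instruction variant needs 10 analysis calls and the per-BBL variant needs 4.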

45 Pin2 Tutorial45 Further Reading The following material is also covered in the Pin user manual Go to http://rogue.colorado.edu/Pin/ Then follow the “manuals” link

46 Pin2 Tutorial46 Summary Pin instrumentation is: –Robust –Transparent –Easy-to-use –Efficient –Portable Try it: http://rogue.colorado.edu/Pin

47 Pin2 Tutorial47 Outline Pin Overview Instrumentation Basics Advanced Topics

48 Pin2 Tutorial48 Trace vs. Instruction Instrumentation

VOID Instruction(INS ins, VOID *v) {
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)Cnt, IARG_END);
}

Can be emulated by:

VOID Trace(TRACE trace, VOID *v) {
    for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl)) {
        for (INS ins = BBL_InsHead(bbl); INS_Valid(ins); ins = INS_Next(ins)) {
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)Cnt, IARG_END);
        }
    }
}

49 Pin2 Tutorial49 Definition: Pin Trace (JITI) List of instructions that is only entered from top, but may have multiple exits No side entries (Pin duplicates code to ensure this!) Multiple copies of instruction in code cache

Program:
    mov r2 = 2
L2: add r3 = 4, r3
    add r4 = 8, r4
    beq L2
    …

Trace 1:
    mov r2 = 2
    add r3 = 4, r3
    add r4 = 8, r4
    beq L2
    …

Trace 2:
    add r3 = 4, r3
    add r4 = 8, r4
    beq L2
    …
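The "no side entries" rule can be illustrated with a toy trace builder. This is a sketch, not Pin's trace selection: `Ins`, `Trace`, `BuildTrace`, and `CopiesInCache` are hypothetical names, and the rule for where a trace stops (a flag on the instruction, or the end of the modeled program) is an assumption made only to keep the example finite. The point it shows is that a jump to L2 gets its own trace starting at L2, so the instructions from L2 onward exist twice in the code cache.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical instruction model; `ends_trace` is an assumed stop condition.
struct Ins {
    std::string text;
    bool ends_trace;
};

// A trace is a list of indices into the original program.
using Trace = std::vector<std::size_t>;

// Build a single-entry trace: it can only be entered at `entry`, so any
// other entry point needs its own trace (and its own copies of the code).
Trace BuildTrace(const std::vector<Ins>& program, std::size_t entry) {
    assert(entry < program.size());
    Trace t;
    for (std::size_t i = entry; i < program.size(); ++i) {
        t.push_back(i);
        if (program[i].ends_trace) break;
    }
    return t;
}

// Count how many code cache copies exist of original instruction `index`.
std::size_t CopiesInCache(const std::vector<Trace>& cache, std::size_t index) {
    std::size_t n = 0;
    for (const Trace& t : cache) {
        for (std::size_t i : t) {
            if (i == index) ++n;
        }
    }
    return n;
}
```

Using the slide's four instructions, a trace built from index 0 holds all four, a trace built from index 1 (L2) holds the last three, and `CopiesInCache` reports two copies of the instruction at L2 but only one of the `mov`.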

50 Pin2 Tutorial50 Instrumentation Modes Just-In-Time Instrumentation (JITI) –Per instruction, per trace –“basic block” notion Ahead-Of-Time Instrumentation (AOTI) –Per instruction, per function, per section/image –Emulated using JITI –Functionality similar to ATOM –Extra startup overhead –No “basic blocks” notion

51 Pin2 Tutorial51 Per Image Instrumentation (AOTI) Hooking Image (Un)Loading

$ pin -t imageload -- /bin/ls
Makefile imageload.o inscount0.o
$ cat imageload.out
Loading /bin/ls
Loading /lib/ld-linux.so.2
…
Unloading /bin/ls
Unloading /lib/ld-linux.so.2
…

52 Pin2 Tutorial52

…
FILE * T;

VOID ImageLoad(IMG img, VOID *v) {
    fprintf(T, "Loading %s\n", IMG_Name(img).c_str());
}

VOID ImageUnload(IMG img, VOID *v) {
    fprintf(T, "Unloading %s\n", IMG_Name(img).c_str());
}

VOID Fini(INT32 code, VOID *v) { fclose(T); }

int main(int argc, char * argv[]) {
    T = fopen("imageload.out", "w");  // the slide assigned to an undeclared "trace"
    PIN_Init(argc, argv);
    IMG_AddInstrumentFunction(ImageLoad, 0);
    IMG_AddUnloadFunction(ImageUnload, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();
    return 0;
}

53 Pin2 Tutorial53 “Walking” Images

VOID ImageLoad(IMG img, VOID *v) {
    for (SEC sec = IMG_SecHead(img); SEC_Valid(sec); sec = SEC_Next(sec)) {
        for (RTN rtn = SEC_RtnHead(sec); RTN_Valid(rtn); rtn = RTN_Next(rtn)) {
            RTN_Open(rtn);
            for (INS ins = RTN_InsHead(rtn); INS_Valid(ins); ins = INS_Next(ins))
                static_count++;
            RTN_Close(rtn);
        }
    }
}

54 Pin2 Tutorial54 Explanations Image->Section->Routine->Instruction We are essentially walking the symtab For each function symbol: –Disassemble function (RTN_Open) –Then walk instructions –NB: no basic blocks available!
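The Image->Section->Routine->Instruction nesting can be mirrored with plain structs. The sketch below is a stand-in, not Pin's object model: `Image`, `Section`, `Routine`, and `StaticInstructionCount` are hypothetical names, and the `Open()`/`Close()` pair only imitates the rule that a routine must be opened (disassembled) before its instructions can be walked.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the image hierarchy; not Pin types.
struct Routine {
    std::string name;
    std::vector<std::string> instructions;
    bool open;

    Routine(std::string n, std::vector<std::string> ins)
        : name(std::move(n)), instructions(std::move(ins)), open(false) {}

    void Open() { open = true; }    // imitates RTN_Open (disassemble the function)
    void Close() { open = false; }  // imitates RTN_Close

    // Instructions are only reachable while the routine is open.
    const std::vector<std::string>& Instructions() const {
        assert(open && "routine must be opened before walking instructions");
        return instructions;
    }
};

struct Section {
    std::string name;
    std::vector<Routine> routines;
};

struct Image {
    std::string name;
    std::vector<Section> sections;
};

// Walk image -> sections -> routines -> instructions and count statically.
std::size_t StaticInstructionCount(Image& img) {
    std::size_t static_count = 0;
    for (Section& sec : img.sections) {
        for (Routine& rtn : sec.routines) {
            rtn.Open();
            for (const std::string& ins : rtn.Instructions()) {
                (void)ins;  // only the count matters here
                ++static_count;
            }
            rtn.Close();
        }
    }
    return static_count;
}
```

Note that the model has no basic block level at all, matching the slide's remark: the walk goes straight from routine to instruction, and the result is a static count that does not depend on which code is ever executed.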

55 Pin2 Tutorial55 “Walking” And Instrumenting

VOID ImageLoad(IMG img, VOID *v) {
    for (SEC sec = IMG_SecHead(img); SEC_Valid(sec); sec = SEC_Next(sec)) {
        for (RTN rtn = SEC_RtnHead(sec); RTN_Valid(rtn); rtn = RTN_Next(rtn)) {
            RTN_Open(rtn);
            for (INS ins = RTN_InsHead(rtn); INS_Valid(ins); ins = INS_Next(ins)) {
                INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)Cnt, IARG_END);
            }
            RTN_Close(rtn);
        }
    }
}

56 Pin2 Tutorial56 Explanations With AOTI, instrumentation requests are cached until the code is executed Effect is like the 1st instruction count example But: –worse (startup) performance –higher memory consumption Requires symbol table → Bad use of AOTI!

57 Pin2 Tutorial57 “Searching” And Instrumenting (SimpleExamples/malloctrace.C)

VOID ImageLoad(IMG img, VOID *v) {
    RTN mallocRtn = RTN_FindByName(img, "malloc");
    if (RTN_Valid(mallocRtn)) {
        RTN_Open(mallocRtn);
        RTN_InsertCall(mallocRtn, IPOINT_BEFORE, (AFUNPTR)MBefore,
                       IARG_G_ARG0_CALLEE, IARG_END);
        RTN_InsertCall(mallocRtn, IPOINT_AFTER, (AFUNPTR)MAfter,
                       IARG_G_RESULT0, IARG_END);
        RTN_Close(mallocRtn);
    }
}

58 Pin2 Tutorial58 Explanations Instrument prolog and epilogs of malloc() using RTN_InsertCall Instrumentation really happens on instruction level, hence we must call RTN_Open Requires symbol table Good use of AOTI!

59 Pin2 Tutorial59 Performance Considerations

VOID count( ADDRINT s, ADDRINT d ) {
    COUNTER *pedg = Lookup( s, d );  // expensive!
    pedg->_count++;
}

VOID Instruction(INS ins, void *v) {
    ...
    if ( [ins is a branch or a call instruction] )
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)count,
                       IARG_INST_PTR, IARG_BRANCH_TARGET_ADDR, IARG_END);
    ...
}

60 Pin2 Tutorial60 Improved Version

VOID count_fast( COUNTER *pedg ) { pedg->_count++; }

VOID InstructionFast(INS ins, void *v) {
    …
    if (INS_IsDirectBranchOrCall(ins)) {
        // Look up the edge counter once, at instrumentation time
        COUNTER *pedg = Lookup( INS_Address(ins),
                                INS_DirectBranchOrCallTargetAddress(ins) );
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)count_fast,
                       IARG_ADDRINT, pedg, IARG_END);
    } else {
        ...
    }
}

61 Pin2 Tutorial61 Remarks If possible, move work from analysis to instrumentation! Keep analysis routines small so that they get inlined!
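The effect of moving the lookup out of the analysis routine can be measured in a stand-alone model. This is a sketch, not the deck's tool: `EdgeProfile`, `Counter`, `CountSlow`, and `CountFast` are hypothetical names, and a `std::map` stands in for whatever `Lookup()` does in the original. It relies on the fact that `std::map` never moves its elements, so a pointer obtained at "instrumentation time" stays valid for later "analysis" calls.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

using Addr = std::uint64_t;

struct Counter {
    std::uint64_t count = 0;
};

// Hypothetical edge profile keyed by (source, destination) address.
struct EdgeProfile {
    std::map<std::pair<Addr, Addr>, Counter> edges;  // node-based: element addresses are stable
    std::uint64_t lookups = 0;                       // how often the expensive lookup ran

    Counter* Lookup(Addr src, Addr dst) {
        ++lookups;
        return &edges[std::make_pair(src, dst)];
    }
};

// Slow analysis routine: performs the lookup on every executed branch.
void CountSlow(EdgeProfile& profile, Addr src, Addr dst) {
    profile.Lookup(src, dst)->count++;
}

// Fast analysis routine: the counter was resolved once, ahead of time.
void CountFast(Counter* pedg) {
    assert(pedg != nullptr);
    pedg->count++;
}
```

Executing the same direct branch 1000 times costs 1000 lookups with `CountSlow` and a single lookup with `CountFast`, while both end with the same edge count. As on the slide, this only works when the target is known at instrumentation time, that is, for direct branches and calls.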

62 Pin2 Tutorial62 Plug-ins Shipped with Pin2 Data cache simulation Malloc/Free tracer Syscall tracer Opcode mix profiler Register usage profiler …

63 Pin2 Tutorial63 Debugging Pin Plug-ins Pause Pin for 7 sec to attach with gdb

$ pin -pause_tool 7 -t inscount -- /bin/ls
Pausing to attach to pid 28769

$ gdb
(gdb) attach 28769
…
(gdb) break main
...
(gdb) cont

64 Pin2 Tutorial64 Summary Pin instrumentation is: –Robust –Transparent –Easy-to-use –Efficient –Portable Try it: http://rogue.colorado.edu/Pin

