PinOS: A Programmable Framework for Whole-System Dynamic Instrumentation. Prashanth P. Bungale, 14th June 2007. Joint work with Chi-Keung Luk.

1 PinOS: A Programmable Framework for Whole-System Dynamic Instrumentation
Prashanth P. Bungale
14th June 2007
Joint work with Chi-Keung Luk

2 Outline
- Pin Overview
- PinOS motivation and goals
- Architecture
- Design Issues
- Evaluation
- Future work

3 What is Pin?
- A Dynamic Binary Instrumentation System
  - Injects and deletes instructions in the instruction stream at run time, without source code
- Programmable Instrumentation
  - Provides APIs for writing instrumentation tools (called Pintools) in C/C++
- Multiplatform
  - Supports 32-bit and 64-bit x86, Itanium
  - Supports Linux, Windows, MacOS
- Robust
  - Instruments real-life and multithreaded applications
  - Databases, search engines, web browsers
- Increasingly Popular
  - Over 10,000 downloads since Pin was released in June 2004

4 Pin Instrumentation Uses
- Computer Architecture Research
  - Branch predictor simulation
  - Cache simulation
  - Trace generation
  - Instruction emulation, e.g., emulating newly proposed instructions
- Software Instrumentation
  - Profiling for optimization: basic block counts, edge counts
  - Bug checking

5 PinOS Goals
- Extend Pin to instrument OS code as well
  - Programmable through an extended Pintool API
- Fine-grained instrumentation of both kernel- and user-level code
  - No limitation on where and what kind of instrumentation can be inserted
  - Not achievable with existing probe-based tools (e.g., DTrace and Kprobes)
- Only active when needed
  - Attach/detach PinOS to/from the guest as and when needed
- Generalized Infrastructure
  - Single framework to instrument Linux, Windows, etc.

6 PinTool on PinOS: Tracing Memory Writes

    FILE * trace;

    // Print a memory write record
    VOID RecordMemWrite(VOID * ip, VOID * va, VOID * pa, UINT32 size)
    {
        Host_fprintf(trace, "%p: W %p %p %d\n", ip, va, pa, size);
    }

    // Called for every instruction
    VOID Instruction(INS ins, VOID *v)
    {
        if (INS_IsMemoryWrite(ins))
            INS_InsertCall(ins, IPOINT_BEFORE, AFUNPTR(RecordMemWrite),
                           IARG_INST_PTR,
                           IARG_MEMORYWRITE_VA,
                           IARG_MEMORYWRITE_PA,
                           IARG_MEMORYWRITE_SIZE,
                           IARG_END);
    }

    int main(int argc, char *argv[])
    {
        PIN_Init(argc, argv);
        trace = Host_fopen("atrace.out", "w");
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_StartProgram(); // Never returns
        return 0;
    }

7 Architecture
[Diagram: the Xen Virtual Machine Monitor (VMM) runs on the hardware and hosts Xen-Domain0 (host OS) and Xen-DomainU (guest OS). PinOS sits between the guest OS and the hardware; its labeled parts are Engine, CodeCache, PinTool and I/O.]
To run PinOS between guest and hardware:
1. Use Xen
2. Virtualize and present a fake processor to the guest OS

8 Xen 3.0 - A Convenient Environment
- Uses Intel VT to run unmodified operating systems
- Open-source availability
- We modify Xen 3.0 to customize it for PinOS purposes:
  - Steal physical and virtual memory for PinOS
  - Provide I/O services to PinOS
  - Hijack initial control of guest domain
  - Perform PinOS attach/detach
- Provides support for debugging PinOS

9 Stealing Physical Memory
- Memory requirements
  - PinOS exe, Pintool exe, Code Cache, PinOS stack, heap, I/O buffers
- Physical Memory
  - Pre-allocate a separate range of machine pages for PinOS
[Figure: physical pages and machine pages]

10 Stealing Virtual Memory
- Steal some portion of the guest address space
  - Current strategy: steal part of the guest's kernel address space
    - Minimizes the chance of VA space conflicts
- Map the stolen VA space to the pre-allocated pages in the Xen shadow page tables
- Propagate stealing to every shadow table
  - i.e., in every address space ever encountered in the guest OS
- Detect and report any conflicts
  - No guest OS mapping activity encountered so far in the stolen VA space
  - Should be less of an issue with a 64-bit address space

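11 Memory Virtualization
[Figure: the guest OS page table maps virtual pages V0, V1, ..., Vi to physical pages P0, P1, ..., Pi. The Xen shadow page table maps V0, V1, ..., Vi to machine pages M0, M1, ..., Mi, and also maps Vk, Vk+1, ..., Vn to machine pages Mk, Mk+1, ..., Mn, labeled "PinOS Memory".]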

12 I/O Services for PinOS
- I/O Service requirements
  - PinOS's own debugging log, Pintools' input/output
- I/O channels implemented as shared ring buffers
  - PinOS writes I/O requests to a buffer shared between the guest and host domains
  - A daemon process in the host domain periodically polls and processes requests
- Sharing the ring buffers
  - Allocated in the guest domain
  - "Mapped in" by the host domain
[Diagram: PinOS in the guest domain, daemon process in the host domain]
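The ring-buffer channel described on this slide can be sketched roughly as follows. This is an illustration only: the `IoRequest` layout, the channel numbering and the `IoRing` class are assumptions, not PinOS or Xen code, and the buffer here is ordinary process memory rather than pages shared between domains, so no memory barriers or mapping calls are shown.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical request record; the real PinOS layout is not given in the slides.
struct IoRequest {
    std::uint32_t channel;  // assumed: 0 = PinOS debug log, 1 = Pintool output
    std::uint32_t length;   // number of valid bytes in payload
    char payload[56];
};

// Fixed-capacity single-producer/single-consumer ring.
// One slot is always left empty so that "full" and "empty" can be told apart.
template <std::size_t N>
class IoRing {
    static_assert(N >= 2, "ring needs at least two slots");
public:
    // Producer side (PinOS in the guest domain). Returns false when the ring is full.
    bool push(const IoRequest& request) {
        std::size_t next = (head_ + 1) % N;
        if (next == tail_) return false;
        slots_[head_] = request;
        head_ = next;
        return true;
    }

    // Consumer side (daemon in the host domain), called on each periodic poll.
    // Returns false when there is nothing to process.
    bool poll(IoRequest& out) {
        if (tail_ == head_) return false;
        out = slots_[tail_];
        tail_ = (tail_ + 1) % N;
        return true;
    }

private:
    std::array<IoRequest, N> slots_{};
    std::size_t head_ = 0;  // next slot to write
    std::size_t tail_ = 0;  // next slot to read
};
```

Polling keeps the guest side simple: PinOS only ever writes to memory and never has to signal the host, at the cost of some latency before a request is processed.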

13 PinOS Attach/Detach
- Attach/Detach allows PinOS to be used only on the subject execution
  - Avoid overhead
    - e.g., can avoid PinOS being active during OS boot every time
  - Precision / accuracy
    - PinOS on the entire run may pollute the collected instrumentation data
- Implementing Attach
  - Read the entire state of the guest machine
  - Start PinOS activity from that point on
  - Use VT support for reading and setting hidden register state
[Diagram: execution switches from Native to PinOS at Attach and back to Native at Detach]

14 Code-Cache Indexing and Sharing
- Pin uses VA as the code cache index
- In PinOS, different processes can use the same VA for different code
  - Virtual address alone is not sufficient to distinguish code
- Option 1:
  - Easy to implement (on x86, use the CR3 value)
  - But, no sharing of code across address spaces
- Option 2:
  - Can share code across address spaces
  - Persistence across application runs
  - But, much more challenging to implement
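A small sketch of the two indexing choices may help. The slide text does not name the keys, so this assumes that Option 1 indexes the code cache by (CR3 value, VA) and Option 2 by physical address; the class names and the integer `TraceId` are made up for the example.

```cpp
#include <cstdint>
#include <map>
#include <utility>

using Addr = std::uint64_t;
using TraceId = int;  // stand-in for a pointer into the code cache

// Assumed Option 1: key = (address-space id, VA). On x86 the CR3 value
// identifies the address space, so two processes never share an entry.
class PerAddressSpaceIndex {
public:
    void insert(Addr cr3, Addr va, TraceId trace) { index_[{cr3, va}] = trace; }

    // Returns -1 when nothing is cached for this (cr3, va) pair.
    TraceId lookup(Addr cr3, Addr va) const {
        auto it = index_.find({cr3, va});
        return it == index_.end() ? -1 : it->second;
    }

private:
    std::map<std::pair<Addr, Addr>, TraceId> index_;
};

// Assumed Option 2: key = physical address. The same code mapped into
// several address spaces resolves to one shared translation.
class PhysicalIndex {
public:
    void insert(Addr pa, TraceId trace) { index_[pa] = trace; }

    // Returns -1 when nothing is cached for this physical address.
    TraceId lookup(Addr pa) const {
        auto it = index_.find(pa);
        return it == index_.end() ? -1 : it->second;
    }

private:
    std::map<Addr, TraceId> index_;
};
```

With the first index, a second process running the same shared library misses in the cache and pays for retranslation; with the second, it hits, which is where the sharing benefit comes from.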

15 Results on booting FC4-Linux
 is the Clear Winner!
[Charts: execution time; code cache space used]

16 Correctness Issue with Trace Linking
[Diagram: guest code in process A and guest code in process B each contain "jmp V2". The code cache holds V1', a translation that ends in "jmp V2'", and V2'.]
- Step 1: Process A is instrumented and its translation is cached.
- Step 2: Process B is instrumented and finds that its code is already translated, so there is no need to re-translate. However, the jump to V2' is incorrect because V2 is now mapped to P3 instead of P2!

17 Code-Cache Indexing and Sharing
- Our solution:
  - Check the predicted page mapping against the actual one at each trace entry
  - Maintain a "SoftTLB" that caches current guest page mappings
  - Assign once and always use the same TLB entry for a given VA->PA mapping
    - So that the trace entry check can involve a constant address lookup

A Translated Trace in Code Cache:

    V2': Translation of
        if (SoftTLB[V2] != P2) {
            // is invalid.
            call PinOS();  // Never return
        }
        // is still valid.
        // Execute the rest of the trace.

SoftTLB:

    VA    PA
    V1    P1
    V2    P3
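The trace-entry check on this slide can be modeled in a few lines. This is a simplified sketch: `SoftTlb`, `Trace` and `trace_entry_check` are names invented here, and a hash map stands in for the fixed per-mapping TLB entry that the slide says makes the real check a constant-address lookup.

```cpp
#include <cstdint>
#include <unordered_map>

using Page = std::uint64_t;

// Marker returned when a virtual page has no cached mapping.
constexpr Page kNoMapping = ~Page{0};

// Caches the guest's current VA -> PA page mappings.
struct SoftTlb {
    std::unordered_map<Page, Page> va_to_pa;

    Page lookup(Page va) const {
        auto it = va_to_pa.find(va);
        return it == va_to_pa.end() ? kNoMapping : it->second;
    }
};

// A cached trace remembers the mapping that held when it was translated.
struct Trace {
    Page va;            // guest virtual page the trace was built from
    Page predicted_pa;  // physical page it mapped to at translation time
};

// Returns true if the trace may run as is. Returns false if the prediction
// no longer holds and control must go back to PinOS instead.
bool trace_entry_check(const SoftTlb& tlb, const Trace& trace) {
    return tlb.lookup(trace.va) == trace.predicted_pa;
}
```

In the slide's example, a trace translated while V2 mapped to P2 fails this check once the SoftTLB shows V2 mapped to P3, so the stale translation is never executed.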

18 Coherence: Handling Page-Mapping Changes
- Problem
  - Guest's page mappings may change after PinOS caches them in the SoftTLB
- Solution
  - Xen already marks guest page-table pages as read-only and thus tracks all writes to them
  - Modify Xen to inform PinOS once it figures out which page-table entries have changed
  - PinOS then invalidates these page mappings in its SoftTLB
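The invalidation step can be pictured as a callback that drops stale entries. The sketch below is hypothetical: the notification interface between Xen and PinOS is not described in the slides, so `on_page_table_entries_changed` and its argument (the virtual pages whose page-table entries changed) are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Page = std::uint64_t;

class SoftTlb {
public:
    // Cache a guest mapping the first time it is used.
    void fill(Page va, Page pa) { entries_[va] = pa; }

    bool contains(Page va) const { return entries_.count(va) != 0; }

    // Assumed notification path: the hypervisor reports which virtual pages
    // had their page-table entries rewritten. Returns how many cached
    // mappings were dropped.
    std::size_t on_page_table_entries_changed(const std::vector<Page>& changed_vas) {
        std::size_t dropped = 0;
        for (Page va : changed_vas) {
            dropped += entries_.erase(va);  // erase returns 0 or 1
        }
        return dropped;
    }

private:
    std::unordered_map<Page, Page> entries_;
};
```

After an entry is dropped, the next trace-entry check for that page fails and PinOS gets the chance to look up the new mapping.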

19 Interrupt/Exception Virtualization
- PinOS virtualizes interrupts and exceptions:
  - Maintaining control
    - Ex: Timer interrupt triggering process preemption
  - Maintaining transparency
    - Ex: Guest interrupt handler attempting to identify thread ID based on ESP
- Install own interrupt handlers in the Interrupt Descriptor Table (IDT)
  - So all interrupts and exceptions are routed through PinOS
- Handling interrupts (asynchronous)
  - When received by PinOS, put it on a queue
  - Add a pending-interrupts check at every trace entry
  - Set up the interrupted guest context with the trace address and context
  - Continue instrumentation at the corresponding guest interrupt handler
- Handling exceptions (synchronous)
  - Recover the excepting guest address and context and set up the context
  - Continue instrumentation at the corresponding guest exception handler
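The asynchronous path (queue on arrival, deliver at the next trace entry) might look like the sketch below. Everything here is illustrative: `InterruptQueue`, `Delivery` and the flat table of guest handler addresses are assumptions made for the example, and real delivery would also build the guest's interrupt stack frame.

```cpp
#include <array>
#include <cstdint>
#include <deque>

// Result of the check performed at a trace entry.
struct Delivery {
    bool taken;                     // true if an interrupt was delivered
    std::uint32_t interrupted_eip;  // guest address reported as interrupted
    std::uint32_t handler_eip;      // guest handler where instrumentation continues
};

class InterruptQueue {
public:
    // Called from PinOS's own IDT handler: only record the vector, so that
    // translated code is never interrupted between guest instruction boundaries.
    void on_hardware_interrupt(std::uint8_t vector) { pending_.push_back(vector); }

    // Called at every trace entry. trace_guest_eip is the guest address the
    // trace starts at; guest_handlers holds the guest's handler per vector.
    Delivery check_at_trace_entry(std::uint32_t trace_guest_eip,
                                  const std::array<std::uint32_t, 256>& guest_handlers) {
        if (pending_.empty()) return Delivery{false, 0, 0};
        std::uint8_t vector = pending_.front();
        pending_.pop_front();
        return Delivery{true, trace_guest_eip, guest_handlers[vector]};
    }

private:
    std::deque<std::uint8_t> pending_;
};
```

Deferring delivery to a trace entry means the guest always observes the interrupt at an address that corresponds to a real guest instruction.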

20 Exception Virtualization
- Precise Exception Delivery
  - In the face of "pseudo" instruction boundaries
  - Log and roll back all guest-visible state changes to the most recent guest instruction boundary
- Faithful Exception Delivery
  - While emulating instructions, conditions must be checked, and exceptions raised as guaranteed by hardware semantics

Original Guest Code:

    movw %ds, (%edx)
    call proc

Translated Code:

    spill %eax
    movw M.%ds, %ax
    movw %ax, (%edx)
    restore %eax
    pushl
    jmp xlated-proc

[Diagram annotations: guest instruction boundaries separate the original guest instructions; "pseudo" instruction boundaries separate the translated instructions that implement a single guest instruction]
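The log-and-rollback idea can be illustrated with an undo log over a toy byte-addressed memory. This is a sketch under assumptions: `GuestMemory` and its method names are invented, and real guest-visible state also includes registers and flags, which are left out here.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

class GuestMemory {
public:
    // Unwritten bytes read as zero in this toy model.
    std::uint8_t read(std::uint32_t addr) const {
        auto it = bytes_.find(addr);
        return it == bytes_.end() ? 0 : it->second;
    }

    // Every guest-visible write first records the old value.
    void write(std::uint32_t addr, std::uint8_t value) {
        undo_log_.push_back(std::make_pair(addr, read(addr)));
        bytes_[addr] = value;
    }

    // A guest instruction completed: earlier changes become permanent.
    void guest_instruction_boundary() { undo_log_.clear(); }

    // An exception hit between pseudo-instruction boundaries: undo, newest
    // first, everything written since the last guest instruction boundary.
    void rollback() {
        for (auto it = undo_log_.rbegin(); it != undo_log_.rend(); ++it) {
            bytes_[it->first] = it->second;
        }
        undo_log_.clear();
    }

private:
    std::unordered_map<std::uint32_t, std::uint8_t> bytes_;
    std::vector<std::pair<std::uint32_t, std::uint8_t>> undo_log_;
};
```

Replaying the log in reverse order matters when the same location is written more than once inside one guest instruction: the oldest saved value is restored last.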

21 Coherence: Handling Self-Modifying Code
- Self-modifying code problem
  - Content of a code page may change after Pin has cached that page
- Write-monitoring Solution
  - Standard page-table trick
- Bookkeeping
  - Maintain a reverse page-mapping table
    - i.e., a PA -> VA mapping table
  - Upon bringing in code from a given physical page:
    - Write-protect all virtual pages that ever map into this physical page
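The bookkeeping on this slide amounts to a PA -> VA reverse map plus a set of protected pages. The sketch below models that with standard containers; `WriteMonitor` and its methods are hypothetical names, and protecting a page is reduced to inserting it into a set rather than editing a real page table. Protecting aliases that appear after the code was cached is this sketch's reading of "ever map".

```cpp
#include <cstdint>
#include <set>
#include <unordered_map>

using Page = std::uint64_t;

class WriteMonitor {
public:
    // Record that virtual page va maps to physical page pa.
    // If pa already holds cached code, the new alias is protected at once.
    void note_mapping(Page va, Page pa) {
        pa_to_vas_[pa].insert(va);
        if (cached_code_pages_.count(pa) != 0) write_protected_.insert(va);
    }

    // Called when code from physical page pa is brought into the code cache:
    // write-protect every virtual page known to map to it.
    void on_code_cached(Page pa) {
        cached_code_pages_.insert(pa);
        for (Page va : pa_to_vas_[pa]) write_protected_.insert(va);
    }

    bool is_write_protected(Page va) const { return write_protected_.count(va) != 0; }

private:
    std::unordered_map<Page, std::set<Page>> pa_to_vas_;  // reverse page-mapping table
    std::set<Page> cached_code_pages_;
    std::set<Page> write_protected_;
};
```

The reverse map is needed because a write through any alias of the physical page would change the cached code, not only a write through the virtual address the code was fetched from.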

22 Experiment Setup
- Environment:
  - Xen 3.0.2 running on Intel VT-enabled machines
  - Guest domain installed with Fedora Core 4 Linux
- Benchmarks:
  - Fedora Core 4 Linux boot
  - Apache-bench (web-server)
  - Mysql-test (database server)
- Pintools:
  - Insmix
    - Code profiler that collects basic-block and instruction mix info
  - CMP$im
    - Cache simulator that models a multi-level cache hierarchy
    - Results in paper

23 Distribution of Kernel and User-level Instructions

24 Basic Block Count Results
Top 5 hottest kernel-level basic blocks of mysql-test-alter-table:

    Bbl Addr      Bbl Symbol Name                Count      Num-Ins   Ins % Contribution
    0xc0111a40    delay_pit + 0x1a               93531291   2         1.17%
    0xc8aac20b    ext3_do_update_inode + 0x82    10177398   6         0.38%
    0xc011d58f    __might_sleep + 0x2a           5170776    5         0.16%
    0xc011d57f    __might_sleep + 0x1a           5170776    4         0.13%
    0xc011d565    __might_sleep                  5170776    10        0.32%

25 Insmix Results
Privileged instruction counts for fc4-boot and mysql-test-alter-table. In each row the two counts appear run together, fc4-boot first.

    Privileged Instruction    fc4-boot, mysql-test-alter-table
    CLI                       28069918912950
    STI                       8459212217286
    IRETD                     574204599646
    OUT                       57181551209
    OUTSW                     9824990104
    IN                        31994311762
    INSW                      48403240
    HLT                       4458207
    INVLPG                    54923619
    CLTS                      801350
    RDTSC                     17777043

    Privileged Instruction    fc4-boot, mysql-test-alter-table
    LGDT                      20
    LTR                       10
    INVD                      00
    WBINVD                    00
    RDMSR                     150
    RDPMC                     00
    LMSW                      00
    MOV CR                    NA
    LIDT                      20
    LLDT                      20
    WRMSR                     00
    MOV DR                    NA

26 Performance of PinOS

27 Related Work I
- Dynamic Optimization
  - Dynamo [2000], DynamoRIO [2003]
  - Mojo [2000]
- Software Dynamic Translation
  - Strata [2003]
- Dynamic Binary Analysis and Instrumentation
  - Shade [1994] - SPARC & MIPS
  - Walkabout [2002], Valgrind [2004]
  - Pin [2005], HDTrans [2006]
- Probe-based Dynamic Binary Instrumentation
  - KernInst [1999], DynInst [2000], LTT [2000]
  - DProbes [2001], KProbes [2004]
  - DTrace [2004], SystemTap [2005]

28 Related Work II
- Full Machine Simulation/Emulation
  - Embra (SimOS) [1996] - MIPS
  - Simics [2002]
  - Bochs [2002], QEmu [2005]
- Para-Virtualization
  - Denali [2002], Xen [2003]
- Full Virtualization
  - VMware [2002]
- Hardware-assisted Virtualization
  - Intel Virtualization Technology (VT) [2006]
  - AMD Pacifica Technology [2006]

29 Future Work
- Make PinOS capable of instrumenting Windows
- PinOS Infrastructure Support
  - 64-bit support (x86_64)
  - Multi-Processor support (MP)
- Now that we have this powerful infrastructure, let's write Pintools!
  - Interesting Pintools include debuggers, profilers, tracing tools, etc.
- Plan to release to public
  - Interesting users and uses may demand further enhancements

30 Acknowledgments
- Thanks to the entire Pin team
  - For giving us a robust Pin to start with
- Thanks to:
  - Mark Charney
    - For helping us better understand XED
    - For fixing XED issues (only a few) very promptly
  - Greg Lueck
    - For many helpful discussions, esp. about signals
    - For fixing related bugs in mainline Pin
  - Prof. Jonathan Shapiro and Swaroop Sridhar
    - For collaboration on initial ideas about segmentation virtualization

31 Thank You! Questions?

32 Backup Slides…

33 Virtualization of System-Level State
- Segmentation Support
  - Segment Registers
  - GDT/LDT
- Paging Support
  - CR3 (PDBR)
  - Page-table structures
- Interrupt/Exception Delivery
  - IDT
- Task support
  - TR
- EFLAGS
  - Including privileged bits like IF

34 Review of IA-32 Memory Management

35 Review of segment addressing
[Diagram: the segment registers CS, DS, SS, ES, FS and GS each hold a segment selector. A selector refers to a segment descriptor in the GDT or the LDT; each table has up to 8K entries.]
Courtesy: Gregory Lueck

36 Review of segment addressing
[Diagram: a segment selector consists of an index, a table indicator (0 = GDT, 1 = LDT) and privilege info. A segment descriptor holds a base address, a limit and other fields.]
Courtesy: Gregory Lueck
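The selector fields on this slide can be unpacked with a few shifts and masks. The bit positions used below are the standard IA-32 layout (requested privilege level in bits 1..0, table indicator in bit 2, index in bits 15..3); the `Selector` struct and `decode_selector` function are names chosen for this example.

```cpp
#include <cstdint>

// Decoded view of a 16-bit IA-32 segment selector.
struct Selector {
    std::uint16_t index;  // bits 15..3: entry number in the descriptor table
    bool uses_ldt;        // bit 2: table indicator, false = GDT, true = LDT
    std::uint8_t rpl;     // bits 1..0: requested privilege level
};

inline Selector decode_selector(std::uint16_t raw) {
    Selector s;
    s.index = static_cast<std::uint16_t>(raw >> 3);
    s.uses_ldt = (raw & 0x4) != 0;
    s.rpl = static_cast<std::uint8_t>(raw & 0x3);
    return s;
}
```

The index field is 13 bits wide, which is why each descriptor table can hold at most 8K entries.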

37 Review of segment addressing
[Diagram: for "mov %fs:0x10, %eax", the selector in FS (index, table indicator 1) picks a descriptor in the LDT; the descriptor's base address is added to the effective address.]
Courtesy: Gregory Lueck

38 Hidden Part of Segment Register
[Diagram: visible part = index, GDT/LDT; hidden part = base, limit, access rights]
- Hidden part "cached" from LDT / GDT
  - Might be out of sync; software depends on this!
- Saving a segment register writes only the visible part to memory
- Restoring reads the hidden part from GDT / LDT
- Asymmetry: save / restore may change contents!
Courtesy: Gregory Lueck

39 Irreversible Segmentation Problem
[Diagram: three successive states of DS under the instrumentation engine]
1. GDT[0x10] holds A. DS: selector 0x10, descriptor cache A.
2. Guest writes B into GDT[0x10]. DS: selector 0x10, descriptor cache A.
3. Gratuitous load (Save DS, Restore DS) performed by the instrumentation system. DS: selector 0x10, descriptor cache B.
Wrong! Should still be A, as the guest has not yet explicitly performed a load into DS!

40 Segmentation Virtualization
- Key Insight: Just virtualize the hardware descriptor caches
  - Don't virtualize the segmentation tables GDT/LDT at all!
- As and when the guest explicitly loads hardware registers:
  - Copy guest segment descriptors into the corresponding caches
  - Issue hardware register load instructions with a modified selector
  - Use dynamic translation for doing this
[Diagram: the guest issues "mov 0x10 -> ds". The descriptor at 0x10 in the guest GDT/LDT is copied into the DS descriptor-cache entry of the PinOS GDT that is active on hardware. That GDT holds PinOS stolen entries for the CS, DS, ES, FS, GS, SS, LDTR and TR descriptor caches. "mov 0x2 -> ds" is issued on hardware, and the emulated DS register is updated with 0x10.]
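A toy model of this scheme for a single register (DS) is sketched below. It is an assumption-laden illustration rather than PinOS code: `DsVirtualizer`, the two-field `Descriptor` and the method names are invented, and the stolen hardware GDT entry is reduced to one member variable.

```cpp
#include <cstdint>
#include <map>

// Simplified descriptor: only base and limit are modeled.
struct Descriptor {
    std::uint32_t base;
    std::uint32_t limit;
};

class DsVirtualizer {
public:
    // The guest edits its own GDT freely; nothing is tracked or trapped.
    void guest_write_gdt(std::uint16_t selector, Descriptor d) { guest_gdt_[selector] = d; }

    // The guest explicitly loads DS: copy the guest descriptor into the stolen
    // hardware entry and remember the guest's selector in the emulated register.
    bool guest_load_ds(std::uint16_t selector) {
        auto it = guest_gdt_.find(selector);
        if (it == guest_gdt_.end()) return false;  // real hardware would fault here
        hw_ds_slot_ = it->second;
        emulated_ds_ = selector;
        return true;
    }

    // What the guest sees when it reads DS back.
    std::uint16_t guest_read_ds() const { return emulated_ds_; }

    // A gratuitous save/restore by the instrumentation system reloads DS from
    // the stolen hardware entry, which guest GDT writes never touch.
    Descriptor reload_ds_on_hardware() const { return hw_ds_slot_; }

private:
    std::map<std::uint16_t, Descriptor> guest_gdt_;
    Descriptor hw_ds_slot_{0, 0};
    std::uint16_t emulated_ds_ = 0;
};
```

Because the reload reads the stolen entry and not the guest table, a guest write to its GDT changes what DS means only when the guest itself performs the next explicit load, which matches the hardware behavior the guest expects.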

41 Irreversible Segmentation Problem Solved
[Diagram: the same three states, now with segmentation virtualization]
1. Guest GDT[0x10] holds A; H/W GDT[0x2] holds A. DS: selector 0x2, descriptor cache A. Emulated DS: selector 0x10.
2. Guest writes B into GDT[0x10]; H/W GDT[0x2] still holds A. DS: selector 0x2, descriptor cache A. Emulated DS: selector 0x10.
3. Gratuitous load (Save DS, Restore DS) performed by the instrumentation system. DS: selector 0x2, descriptor cache A. Emulated DS: selector 0x10.
Correct!

42 Implications of Virtualization Scheme
- Gratuitous loads now performed with cached descriptors
  - Ensures preservation of guest-expected hardware semantics
- Allows PinOS to easily steal the rest of the table for its own descriptors
- With this scheme, no need for tracking guest table writes!
- However, need to tame/emulate all segmentation instructions
  - lds/es/fs/gs/ss
  - mov ds/es/fs/gs/ss, […]
  - mov […], ds/es/fs/gs/ss
  - pop ds/es/fs/gs/ss
  - push ds/es/fs/gs/ss
  - lgdt, sgdt
  - lldt, sldt
  - lar, lsl, verr, verw
  - ltr, str, task gate transfer through interrupt
  - Far jumps, calls and returns, iret, sysenter and sysexit
  - Software interrupt: int n, into, int 3
  - Hardware interrupt / exception
