Low Overhead Debugging with DISE Marc L. CorlissE Christopher LewisAmir Roth Department of Computer and Information Science University of Pennsylvania.

Slides:



Advertisements
Similar presentations
Debugging operating systems with time-traveling virtual machines Sam King George Dunlap Peter Chen CoVirt Project, University of Michigan.
Advertisements

Performance Evaluation of Cache Replacement Policies for the SPEC CPU2000 Benchmark Suite Hussein Al-Zoubi.
Software & Services Group PinPlay: A Framework for Deterministic Replay and Reproducible Analysis of Parallel Programs Harish Patil, Cristiano Pereira,
Dynamic Thread Assignment on Heterogeneous Multiprocessor Architectures Pree Thiengburanathum Advanced computer architecture Oct 24,
Corliss, Lewis, + Roth ISCA-30 DISE: A Programmable Macro Engine for Customizing Applications Marc Corliss, E Lewis, Amir Roth University of Pennsylvania.
Using Instruction Block Signatures to Counter Code Injection Attacks Milena Milenković, Aleksandar Milenković, Emil Jovanov The University of Alabama in.
ECE 454 Computer Systems Programming Compiler and Optimization (I) Ding Yuan ECE Dept., University of Toronto
Extensibility, Safety and Performance in the SPIN Operating System Presented by Allen Kerr.
Helper Threads via Virtual Multithreading on an experimental Itanium 2 processor platform. Perry H Wang et. Al.
Secure web browsers, malicious hardware, and hardware support for binary translation Sam King.
CS 153 Design of Operating Systems Spring 2015
Efficient and Flexible Architectural Support for Dynamic Monitoring YUANYUAN ZHOU, PIN ZHOU, FENG QIN, WEI LIU, & JOSEP TORRELLAS UIUC.
1 Low Overhead Program Monitoring and Profiling Department of Computer Science University of Pittsburgh Pittsburgh, Pennsylvania {naveen,
1 DISE: A Programmable Macro Engine for Customizing Applications Marc Corliss, E Lewis, Amir Roth (U Penn), ISCA 2003 DISE (Dynamic Instruction Steam Editing)
What Great Research ?s Can RAMP Help Answer? What Are RAMP’s Grand Challenges ?
Using DISE to Protect Return Addresses from Attack Marc L. Corliss, E Christopher Lewis, Amir Roth University of Pennsylvania.
University of Kansas Construction & Integration of Distributed Systems Jerry James Oct. 30, 2000.
Extensibility, Safety and Performance in the SPIN Operating System Brian Bershad, Stefan Savage, Przemyslaw Pardyak, Emin Gun Sirer, Marc E. Fiuczynski,
Operating Systems Concepts Professor Rick Han Department of Computer Science University of Colorado at Boulder.
Ritu Varma Roshanak Roshandel Manu Prasanna
Microprocessors Introduction to RISC Mar 19th, 2002.
ISBN Lecture 01 Preliminaries. Copyright © 2004 Pearson Addison-Wesley. All rights reserved.1-2 Lecture 01 Topics Motivation Programming.
Chapter 2: Impact of Machine Architectures What is the Relationship Between Programs, Programming Languages, and Computers.
Techniques for Efficient Processing in Runahead Execution Engines Onur Mutlu Hyesoon Kim Yale N. Patt.
1 ES 314 Advanced Programming Lec 2 Sept 3 Goals: Complete the discussion of problem Review of C++ Object-oriented design Arrays and pointers.
MemTracker Efficient and Programmable Support for Memory Access Monitoring and Debugging Guru Venkataramani, Brandyn Roemer, Yan Solihin, Milos Prvulovic.
1 RAKSHA: A FLEXIBLE ARCHITECTURE FOR SOFTWARE SECURITY Computer Systems Laboratory Stanford University Hari Kannan, Michael Dalton, Christos Kozyrakis.
To GPU Synchronize or Not GPU Synchronize? Wu-chun Feng and Shucai Xiao Department of Computer Science, Department of Electrical and Computer Engineering,
Improving the Performance of Object-Oriented Languages with Dynamic Predication of Indirect Jumps José A. Joao *‡ Onur Mutlu ‡* Hyesoon Kim § Rishi Agarwal.
Secure Embedded Processing through Hardware-assisted Run-time Monitoring Zubin Kumar.
IT253: Computer Organization Lecture 4: Instruction Set Architecture Tonga Institute of Higher Education.
Bob Gilber, Richard Kemmerer, Christopher Kruegel, Giovanni Vigna University of California, Santa Barbara RAID 2011,9 報告者:張逸文 1.
Cosc 2150: Computer Organization
Y. Kotani · F. Ino · K. Hagihara Springer Science + Business Media B.V Reporter: 李長霖.
Native Client: A Sandbox for Portable, Untrusted x86 Native Code
29th ACSAC (December, 2013) SPIDER: Stealthy Binary Program Instrumentation and Debugging via Hardware Virtualization Zhui Deng, Xiangyu Zhang, and Dongyan.
1 COMP 3438 – Part II-Lecture 1: Overview of Compiler Design Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
1 Fast and Efficient Partial Code Reordering Xianglong Huang (UT Austin, Adverplex) Stephen M. Blackburn (Intel) David Grove (IBM) Kathryn McKinley (UT.
By Teacher Asma Aleisa Year 1433 H.   Goals of memory management  To provide a convenient abstraction for programming  To allocate scarce memory resources.
Kyushu University Koji Inoue ICECS'061 Supporting A Dynamic Program Signature: An Intrusion Detection Framework for Microprocessors Koji Inoue Department.
Colorama: Architectural Support for Data-Centric Synchronization Luis Ceze, Pablo Montesinos, Christoph von Praun, and Josep Torrellas, HPCA 2007 Shimin.
Title of Selected Paper: IMPRES: Integrated Monitoring for Processor Reliability and Security Authors: Roshan G. Ragel and Sri Parameswaran Presented by:
Implementing Precise Interrupts in Pipelined Processors James E. Smith Andrew R.Pleszkun Presented By: Ravikumar Source:
 Virtual machine systems: simulators for multiple copies of a machine on itself.  Virtual machine (VM): the simulated machine.  Virtual machine monitor.
By Teacher Asma Aleisa Year 1433 H.   Goals of memory management  To provide a convenient abstraction for programming.  To allocate scarce memory.
EXTENSIBILITY, SAFETY AND PERFORMANCE IN THE SPIN OPERATING SYSTEM
Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Compiler.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University IWPSE 2003 Program.
Practical Path Profiling for Dynamic Optimizers Michael Bond, UT Austin Kathryn McKinley, UT Austin.
1 "Efficient Software-based Fault Isolation" by R. Wahbe, S. Lucco, T. E. Anderson, and S. L. Graham. Presenter: Tom Burkleaux.
Implementing Precise Interrupts in Pipelined Processors James E. Smith Andrew R.Pleszkun Presented By: Shrikant G.
Exploiting Instruction Streams To Prevent Intrusion Milena Milenkovic.
Source Level Debugging of Parallel Programs Roland Wismüller LRR-TUM, TU München Germany.
Efficient Software-Based Fault Isolation Authors: Robert Wahbe Steven Lucco Thomas E. Anderson Susan L. Graham Presenter: Gregory Netland.
QEMU, a Fast and Portable Dynamic Translator Fabrice Bellard (affiliation?) CMSC 691 talk by Charles Nicholas.
Prefetching Techniques. 2 Reading Data prefetch mechanisms, Steven P. Vanderwiel, David J. Lilja, ACM Computing Surveys, Vol. 32, Issue 2 (June 2000)
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association SYSTEM ARCHITECTURE GROUP DEPARTMENT OF COMPUTER.
Some of the utilities associated with the development of programs. These program development tools allow users to write and construct programs that the.
1 University of Maryland Using Information About Cache Evictions to Measure the Interactions of Application Data Structures Bryan R. Buck Jeffrey K. Hollingsworth.
A Framework For Trusted Instruction Execution Via Basic Block Signature Verification Milena Milenković, Aleksandar Milenković, and Emil Jovanov Electrical.
Qin Zhao1, Joon Edward Sim2, WengFai Wong1,2 1SingaporeMIT Alliance 2Department of Computer Science National University of Singapore
Dynamic and On-Line Design Space Exploration for Reconfigurable Architecture Fakhreddine Ghaffari, Michael Auguin, Mohamed Abid Nice Sophia Antipolis University.
Efficient Software-Based Fault Isolation
Threads Cannot Be Implemented As a Library
nZDC: A compiler technique for near-Zero silent Data Corruption
Chapter 9 – Real Memory Organization and Management
Overview Introduction General Register Organization Stack Organization
Douglas Lacy & Daniel LeCheminant CS 252 December 10, 2003
Threads and Memory Models Hal Perkins Autumn 2009
Outline Chapter 2 (cont) OS Design OS structure
Presentation transcript:

Low Overhead Debugging with DISE Marc L. CorlissE Christopher LewisAmir Roth Department of Computer and Information Science University of Pennsylvania

Low Overhead Debugging with DISE – Marc Corliss2 Overview Goal: Low overhead interactive debugging Solution: Implement efficient debugging primitives e.g. breakpoints and watchpoints using Dynamic Instruction Stream Editing (DISE) [ISCA ‘03] : General-purpose tool for dynamic instrumentation

Low Overhead Debugging with DISE – Marc Corliss3 Breakpoints and Watchpoints Breakpoint Interrupts program at specific point Watchpoint Interrupts program when value of expression changes Conditional Breakpoint/Watchpoint Interrupts program only when predicate is true break test.c:100 watch x break test.c:100 if i==93

Low Overhead Debugging with DISE – Marc Corliss4 User Debugging Architecture DebuggerApplication int main() { } User/debugger transitions Debugger/application transitions High overhead May be masked by user/debugger transitions Otherwise perceived as latency Spurious Transitions

Low Overhead Debugging with DISE – Marc Corliss5 Eliminating Spurious Transitions Instrument app. with breakpoint/watchpoint logic No spurious transitions Static approaches already exist During compilation or post-compilation (binary rewriting) We propose dynamic instrumentation Using DISE

Low Overhead Debugging with DISE – Marc Corliss6 Talk Outline Introduction Watchpoint implementations DISE background Watching with DISE Evaluation Related work and conclusion

Low Overhead Debugging with DISE – Marc Corliss7 Watchpoint Implementations Single-stepping Virtual memory support Dedicated hardware registers

Low Overhead Debugging with DISE – Marc Corliss8 Single-Stepping UserApplication int main() { } Trap after every statement Debugger + Easy to implement – Poor performance (many spurious transitions) DebuggerApplication int main() { } diff? yes run no diff?

Low Overhead Debugging with DISE – Marc Corliss9 Virtual Memory Support Trap when pages containing watched variables written DebuggerApplication int main() { } diff? yes run diff? page written + Reduces spurious transitions – Coarse-granularity (still may incur spurious transitions) – Spurious transitions on silent writes

Low Overhead Debugging with DISE – Marc Corliss10 Dedicated Hardware Registers Trap when particular address is written + Reduces spurious transitions – Spurious transitions on silent writes – Number and size of watchpoints limited DebuggerApplication int main() { } diff? yes run diff? watchpt written

Low Overhead Debugging with DISE – Marc Corliss11 Conditional Watchpoints Trap like unconditional, debugger evaluates predicate DebuggerApplication int main() { } + Simple extension of unconditional implementation – Introduces more spurious transitions yes run diff? pred? diff? pred? diff? pred?

Low Overhead Debugging with DISE – Marc Corliss12 Instrumenting the Application Embed (conditional) watchpoint logic into application DebuggerApplication int main() { } run diff? pred? diff? pred? diff? pred? + Eliminates all spurious transitions – Adds small overhead for each write

Low Overhead Debugging with DISE – Marc Corliss13 Dynamic Instruction Stream Editing (DISE) [ISCA ‘03] Programmable instruction macro-expander Like hardware SED (DISE = dynamic instruction SED) General-purpose mechanism for dynamic instrumentation I$executeDISE app app+instrumentation srli r9,4,r1 cmp r1,r2,r1 bne r1,Error store r4,8(r9) Example: memory fault isolation DISE

Low Overhead Debugging with DISE – Marc Corliss14 DISE Productions Production: static rewrite rule T.OPCLASS==store =>srli T.RS,4,dr0 cmp dr0,dr1,dr0 bne dr0,Error T.INST srli r9,4,dr0 cmp dr0,dr1,dr0 bne dr0,Error store r4,8(r9) Expansion: dynamic instantiation of production PatternDirective DISE Register Parameterized replacement sequence

Low Overhead Debugging with DISE – Marc Corliss15 Watching with DISE Monitor writes to memory Check if watched value(s) modified – Requires expensive load(s) for every write Optimization: address match gating Split into address check (fast) and value check (slow) Check if writing to watched address If so, then handler routine called Handler routine does value check

Low Overhead Debugging with DISE – Marc Corliss16 Watchpoint Production Interactive debugger injects production: T.OPCLASS == store =>T.INST# original instruction lda dr1,T.IMM(T.RS)# compute address bic dr1,7,dr1# quad align address cmpeq dr1,dwr,dr1# cmp to watched address ccall dr1,HNDLR# if equal call handler

Low Overhead Debugging with DISE – Marc Corliss17 Other Implementation Issues Conditional watchpoints Inline simple predicates in replacement sequence Put complex predicates in handler routine Multiple watchpoints/complex expressions For small #, inline checks in replacement sequence For large #, use bloom filter Key point: DISE is flexible

Low Overhead Debugging with DISE – Marc Corliss18 Virtues of DISE Versus dedicated hardware registers General-purpose: DISE has many other uses Safety checking [ISCA ‘03], security checking [WASSA ‘04], profiling [TR ‘02 ], (de)compression [LCTES ‘03], etc. Efficient: no spurious transitions to the debugger Flexible: more total watchpoints permitted Versus static binary transformation Simple-to-program: transformation often cumbersome Efficient: no code bloat, no transformation cost Less intrusive: Debugger and application separate

Low Overhead Debugging with DISE – Marc Corliss19 Evaluation Show DISE efficiently supports watchpoints Compare performance to other approaches Analyze debugging implementations in general Characterize performance of different approaches

Low Overhead Debugging with DISE – Marc Corliss20 Methodology Simulation using SimpleScalar Alpha Modeling aggressive, 4-way processor Benchmarks (subset of) SPEC Int 2000 Watchpoints for each benchmark HOT, WARM1, WARM2, COLD Debugger/application transition overhead 100,000 cycles

Low Overhead Debugging with DISE – Marc Corliss21 Unconditional Watchpoints GCC Single-stepping has slowdowns from 6,000-40,000

Low Overhead Debugging with DISE – Marc Corliss22 Unconditional Watchpoints VM sometimes good, sometimes awful Erratic behavior primarily due to coarse-granularity GCC

Low Overhead Debugging with DISE – Marc Corliss23 Unconditional Watchpoints Hardware registers usually good (no overhead) Hardware registers perform poorly for HOT Significant number of silent writes GCC

Low Overhead Debugging with DISE – Marc Corliss24 Unconditional Watchpoints GCC DISE overhead usually less than 25%

Low Overhead Debugging with DISE – Marc Corliss25 Conditional Watchpoints In many cases DISE outperforms hardware regs. Spurious transitions for HW regs. whenever WP written DISE/HW registers can differ by 3 orders of magnitude

Low Overhead Debugging with DISE – Marc Corliss26 Conditional Watchpoints Instrumentation overhead more consistent Instrumentation adds small cost on all writes Non-instrumentation adds high cost on some writes

Low Overhead Debugging with DISE – Marc Corliss27 Multiple Watchpoints For <5 watchpoints can use hardware registers Performance good 1-3, degrades at 4 due to silent writes For >4 watchpoints must use virtual memory Performance degrades due to coarse-granularity GCC

Low Overhead Debugging with DISE – Marc Corliss28 Multiple Watchpoints For <4 watchpoints DISE/Inlined slightly worse DISE/Inlined much better for >3 watchpoints GCC

Low Overhead Debugging with DISE – Marc Corliss29 Multiple Watchpoints For <4 DISE/B.F. slightly worse than Inlined DISE/B.F. replacement sequence includes load For >3 DISE/B.F. does the best DISE/Inlined replacement sequence too large GCC

Low Overhead Debugging with DISE – Marc Corliss30 Evaluation Results DISE watchpoints have low overhead DISE overhead usually less than 25% In many cases DISE outperforms other approaches Silent writes/conditionals  spurious transitions DISE flexibility helps keep low overhead in all scenarios Overhead of instrumentation more consistent Small cost on all writes rather than occasional large cost Non-instrumentation has 1x to 100,000x slowdown

Low Overhead Debugging with DISE – Marc Corliss31 Related Work iWatcher [Zhou et. al ‘04] Hardware-assisted debugger Associates program-specified functions with memory locations Address-based versus instruction-based Not general-purpose mechanism like DISE More significant hardware modifications than DISE Other related areas Static transformation [Kessler ‘90, Wahbe et al. ‘93] Instrumentation mechanisms [Valgrind, ATOM, EEL, Etch]

Low Overhead Debugging with DISE – Marc Corliss32 Conclusion DISE effectively supports low overhead debugging Virtues: general-purpose, flexible, simple-to-program, efficient, non-intrusive Characterize interactive debugging implementations Instrumentation has consistently low overhead