Low Overhead Program Monitoring and Profiling

Presentation transcript:

1 Low Overhead Program Monitoring and Profiling
Naveen Kumar, Bruce Childers: Department of Computer Science, University of Pittsburgh, Pittsburgh, Pennsylvania
Mary Lou Soffa: Department of Computer Science, University of Virginia, Charlottesville, Virginia

2 Introduction
Program instrumentation: insertion of additional code into a program
–Monitor program behavior or gather information
–Can be inserted at the source, intermediate, or binary level
Applications
–Detect program invariants [Ernst]
–Dynamic slicing [Zhang]
–Software testing [Misurda]
–Software security checks [Scott]

3 Running Example
Consider a software security system that monitors the memory behavior of untrusted programs (e.g., DynamoRIO)
–Instrumentation at the binary instruction level
–Instrument all loads and stores
–Program can be instrumented statically as well as dynamically

4 Static instrumentation
Example from gzip. Instrumentation performed before execution starts.
r[o1] = r[o1] << 10
r[o1] = r[o1] + 0x228
r[o0] = r[o2] << 0x14
r[l4] = r[o0] << 0x14
M[r[l0] + 0x10] = r[o2]
M[r[o1] + 0x228] = r[o0]
r[i4] = r[o1]
r[l1] = r[o0]
jmp r[31]
…
M[r[l0] + 0x20] = r[o0]
r[sp] = r[sp] - 112
r[o0] = r[o0] << 10
r[o1] = M[r[o0] + 0x3d0]
…
…
jmp probe1
jmp probe2
jmp probe3
jmp probe4
probe1: M[r[sp]] = r[l0]
save
call save_gp_regs
…
r[o0] = M[r[sp] + 0x68]
r[o0] = r[o0] + 0x10
call secure
r[o1] = r[g0] + 1
call restore_gp_regs
restore
r[sp] = r[sp]
M[r[l0] + 0x10] = r[o2]
jmp probe1_ret
probe1: call secure(…)
probe2: call secure(…)
probe3: call secure(…)
probe4: call secure(…)

5 Dynamic instrumentation
Instrumentation performed at run-time on code that executes
More powerful than static instrumentation, possibly less expensive
r[o1] = r[o1] << 10
r[o1] = r[o1] + 0x228
r[o0] = r[o2] << 0x14
r[l4] = r[o0] << 0x14
M[r[l0] + 0x10] = r[o2]
M[r[o1] + 0x228] = r[o0]
r[i4] = r[o1]
r[l1] = r[o0]
jmp r[31]
…
M[r[l0] + 0x20] = r[o0]
r[sp] = r[sp] - 112
r[o0] = r[o0] << 10
r[o1] = M[r[o0] + 0x3d0]
…
…
jmp probe1
jmp probe2
jmp probe3
jmp probe4
probe1: call secure(…)
probe2: call secure(…)
probe3: call secure(…)
probe4: call secure(…)

6 Motivation
Stumbling block: high overhead
–Slowdown by an order of magnitude or more [Ernst]
Existing solutions: user guided
–Sampling [Arnold]
–Smaller data sets analyzed (test data set of SPEC instead of Ref) [Mock]
–Less aggressive uses, especially in dynamic settings [Duesterwald]
–User has to decide how best to apply instrumentation
What is needed: automatic techniques that mitigate the overheads systematically

7 Goals
Gather exact information
Separate accuracy from efficiency
–User should focus on what to gather, rather than how to gather it efficiently
Efficient
–Comparable to hand-optimized instrumentation
Automatic
–Little or no user guidance

8 Instrumentation Optimization
Costs associated with instrumentation
–Dynamic probe count: number of probes executed
–Probe cost: number of instructions in a probe
–Payload cost: frequency of invocation and cost of the payload
Optimize instrumentation code to reduce costs
–Dynamic probe coalescing
–Partial context switches
–Partial payload inlining
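The three costs on this slide can be combined into a rough overhead estimate. The sketch below is an assumption made for illustration, not a model taken from the talk: it treats the overhead of each instrumentation site as its dynamic probe count multiplied by the sum of its probe cost and payload cost. The struct, its field names, and the function name are hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-site cost record; field names are illustrative only. */
struct probe_site {
    uint64_t exec_count;    /* dynamic probe count: times this probe executes */
    uint32_t probe_insns;   /* probe cost: instructions spent entering/leaving the probe */
    uint32_t payload_insns; /* payload cost: instructions executed in the payload per call */
};

/* Assumed model: total extra instructions = sum over sites of
 * exec_count * (probe_insns + payload_insns). */
uint64_t instrumentation_cost(const struct probe_site *sites, int n)
{
    uint64_t total = 0;
    for (int i = 0; i < n; i++) {
        uint64_t per_exec = (uint64_t)sites[i].probe_insns + sites[i].payload_insns;
        total += sites[i].exec_count * per_exec;
    }
    return total;
}
```

Under this assumed model, each of the three optimizations targets one factor: probe coalescing lowers `exec_count`, a partial context switch lowers `probe_insns`, and partial payload inlining lowers `payload_insns`.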

9 Base Instrumenter
Base instrumenter generates a list of instrumentation points
r[o1] = r[o1] << 10
r[o1] = r[o1] + 0x228
r[o0] = r[o2] << 0x14
r[l4] = r[o0] << 0x14
M[r[l0] + 0x10] = r[o2]
M[r[o1] + 0x228] = r[o0]
r[i4] = r[o1]
r[l1] = r[o0]
jmp r[31]
…
M[r[l0] + 0x20] = r[o0]
r[sp] = r[sp] - 112
r[o0] = r[o0] << 10
r[o1] = M[r[o0] + 0x3d0]
…
…
jmp probe1
jmp probe2
jmp probe3
jmp probe4
probe1: call secure(…)
probe2: call secure(…)
probe3: call secure(…)
probe4: call secure(…)

10 Dynamic Probe Coalescing
r[o1] = r[o1] << 10
r[o1] = r[o1] + 0x228
r[o0] = r[o2] << 0x14
r[l4] = r[o0] << 0x14
M[r[l0] + 0x10] = r[o2]
M[r[o1] + 0x228] = r[o0]
r[i4] = r[o1]
r[l1] = r[o0]
jmp r[31]
…
M[r[l0] + 0x20] = r[o0]
r[sp] = r[sp] - 112
r[o0] = r[o0] << 10
r[o1] = M[r[o0] + 0x3d0]
…
…
jmp probe1
jmp probe2
jmp probe3
jmp probe4
jmp probe5
jmp probe6
probe1: call secure(…)
probe2: call secure(…)
probe3: call secure(…)
probe4: call secure(…)
probe5: call secure(…)
probe6: call secure(…)
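A minimal sketch of the idea behind probe coalescing, assuming that instrumentation points arrive in program order and that points in the same basic block may share a single probe. The `ip` and `probe` types, the `block_id` field, and the `coalesce` function are hypothetical names. The legality checks a real system would need (for example, that each effective address can still be computed at the merged probe) are omitted.

```c
#include <stddef.h>

/* Hypothetical instrumentation point: the block it belongs to and its address. */
struct ip {
    int block_id;
    unsigned long addr;
};

/* Hypothetical coalesced probe covering ips[first_ip .. first_ip + count - 1]. */
struct probe {
    size_t first_ip;
    size_t count;
};

/* Merge each run of consecutive IPs that share a block into one probe.
 * 'out' must have room for n entries. Returns the number of probes produced. */
size_t coalesce(const struct ip *ips, size_t n, struct probe *out)
{
    size_t np = 0;
    for (size_t i = 0; i < n; i++) {
        if (np > 0 && ips[i].block_id == ips[out[np - 1].first_ip].block_id) {
            out[np - 1].count++;      /* extend the current probe */
        } else {
            out[np].first_ip = i;     /* start a new probe */
            out[np].count = 1;
            np++;
        }
    }
    return np;
}
```

Each merged probe then pays for one jump and one context switch while still invoking the payload once per covered instrumentation point.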

11 Partial Context Switch
Analyze register usage in payload
Remove spill and reload of GP registers
Regs. used in payload: {…} Not used: {g0…g7}
r[o1] = r[o1] << 10
r[o1] = r[o1] + 0x228
r[o0] = r[o2] << 0x14
r[l4] = r[o0] << 0x14
M[r[l0] + 0x10] = r[o2]
M[r[o1] + 0x228] = r[o0]
r[i4] = r[o1]
r[l1] = r[o0]
jmp r[31]
…
M[r[l0] + 0x20] = r[o0]
r[sp] = r[sp] - 112
r[o0] = r[o0] << 10
r[o1] = M[r[o0] + 0x3d0]
…
jmp probe4
jmp probe6
probe6: M[r[sp] - 20] = r[l0]
M[r[sp] - 28] = r[o1]
save
call save_gp_regs
… effective address …
call secure
… effective address …
call secure
… effective address …
call secure
call restore_gp_regs
restore
…
…
jmp probe6_ret
probe4: call secure(…)
probe6: call secure(…)
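The register analysis on this slide can be sketched with bit masks. This is an illustrative assumption about how such an analysis might be expressed, not the INS-OP implementation: a register needs to be spilled and reloaded around the payload only if it belongs to the saved context and the payload actually uses it. The register numbering and all names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical register set: bit i stands for register i. */
typedef uint32_t regmask_t;

/* Registers that must be spilled around the payload: those in the full
 * context that the payload actually uses. Everything else can be skipped. */
regmask_t regs_to_save(regmask_t context, regmask_t payload_used)
{
    return context & payload_used;
}

/* Number of registers in a set, to compare full and partial switches. */
int count_regs(regmask_t m)
{
    int c = 0;
    while (m) {
        m &= m - 1; /* clear the lowest set bit */
        c++;
    }
    return c;
}
```

If bits 0 to 7 stand for g0…g7 and the payload never touches them, they drop out of the save set, which mirrors the removal of the `save_gp_regs` and `restore_gp_regs` calls from the probe.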

12 Partial Payload Inlining
void secure(address) {
  if(address > REDZONE) return;
  redAlerts++;
  createReport();
  if(critical(address)) assert(address);
}
r[o1] = M[r[g1] + 0]
r[o1] = r[o1] - r[o0]
r[i0] = 1
jmp r[31]
…
r[o3] = M[r[g2] + 0]
r[o3] = r[o3] + 1
…
!call createReport
…
!call assert
call __full_secure
probe6: M[r[sp] - 20] = r[l0]
M[r[sp] - 28] = r[o1]
r[sp] = r[sp] - 140
… effective address …
call secure
… effective address …
call secure
… effective address …
call secure
r[sp] = r[sp] …
jmp probe6_ret
jmp probe6
r[o1] = r[o1] << 10
r[o1] = r[o1] + 0x228
r[o0] = r[o2] << 0x14
r[l4] = r[o0] << 0x14
M[r[l0] + 0x10] = r[o2]
M[r[o1] + 0x228] = r[o0]
r[i4] = r[o1]
r[l1] = r[o0]
jmp r[31]
…
M[r[l0] + 0x20] = r[o0]
r[sp] = r[sp] - 112
r[o0] = r[o0] << 10
r[o1] = M[r[o0] + 0x3d0]
…
…
jmp probe4
void __full_secure(address, tag) { __full_secure(address, tag); }
void __inlined_secure(address) {
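One way to read this slide is that `secure` is split into a short, frequently taken check that can be inlined into the probe and a rarely taken remainder that stays out of line. The sketch below illustrates that split under stated assumptions. The value of `REDZONE`, the bodies of `critical` and `createReport`, and the `criticalHits` counter (which stands in for the slide's `assert` so the code can run to completion) are all hypothetical. The names `inlined_secure` and `full_secure` drop the slide's leading double underscore, which is reserved in C.

```c
#include <stdint.h>
#include <stdbool.h>

#define REDZONE ((uintptr_t)0x1000) /* hypothetical boundary address */

unsigned long redAlerts;    /* accesses at or below REDZONE */
unsigned long criticalHits; /* stand-in for the slide's assert(address) */

/* Placeholder policy and reporting hooks, assumed for illustration. */
bool critical(uintptr_t address) { return address < 0x100; }
void createReport(void) { /* a real payload would log details here */ }

/* Cold path: the rarely executed part of the payload, kept out of line. */
void full_secure(uintptr_t address)
{
    redAlerts++;
    createReport();
    if (critical(address))
        criticalHits++;
}

/* Hot path: the cheap check an instrumenter could inline into the probe.
 * Most accesses are expected to return here without a call. */
void inlined_secure(uintptr_t address)
{
    if (address > REDZONE)
        return;
    full_secure(address);
}
```

With this split, the common case costs a compare and a branch inside the probe, and the call overhead is paid only when an access falls in the red zone.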

13 Implementation
Strata: dynamic translation system [Scott et al.]
–Generates code at run-time for an application
–Suitable for dynamic instrumentation
FIST: base instrumentation system [Kumar et al.]
–Flexible for diverse instrumentation needs
–Generates a list of instrumentation points (IPs)
INS-OP: developed in this work
–Constructs an IR for the list of IPs obtained from FIST
–Each optimization is a pass that modifies the IR

14 Case Studies
Case study 1: Program profiling
–Lightweight instrumentation application
–Lower initial overhead implies smaller benefits
–Demonstrates efficacy of the optimizations in an unfavorable scenario
Case study 2: Memory simulation
–Relatively heavyweight instrumentation application
–Can compare with state-of-the-art systems to see the benefits of optimization

15 Case study 1: Program profiling
The benefit of optimization varies with the initial overhead
The speedups range from 1.26 to 2.63

16 Case study 2: Memory Simulation
Strata-Embra is a SPARC implementation of the cache simulator from SimOS
Strata-Embra-Opt is the cache simulator optimized using INS-OP
INS-OP optimizes the fastest cache simulator we could find by … times

17 Conclusions
Introduced “instrumentation optimization” to reduce the cost of instrumented code
–Reduce probe count
–Reduce the cost of an individual probe
–Reduce the cost of the payload
–Speedups between … times
More detailed information gathering
–Accuracy need not be sacrificed for efficiency
Feasibility of certain applications
–Run-time monitoring more feasible
–Example: applications that perform continuous testing

18 Effectiveness of optimizations