Optimistic Hybrid Analysis

Presentation transcript:

Optimistic Hybrid Analysis
David Devecsery, Peter M. Chen, Satish Narayanasamy, Jason Flinn
University of Michigan
Thank you for the introduction. We all know program analysis is essential to understanding and debugging our code and systems. Today I will tell you about a new analysis method that combines unsound static analysis and speculation to accelerate dynamic analysis without sacrificing its accuracy or soundness.

Motivation – Problem
Dynamic analysis is essential for checking many program properties:
- Memory errors: buffer overflow, use-after-free, double free, memory leak…
- Information flow: taint tracking, information provenance, program slicing…
- Race freedom
- Type safety
- Many more
We want to enforce these properties in programs. That requires program analysis that is precise (no false positives) and sound (no false negatives).

Motivation – Problem
Problem: dynamic analysis is too expensive.
- Information flow: 1-2 orders of magnitude overhead
- Race detection: an order of magnitude slowdown
Question: can we create a faster dynamic analysis?
Diagram labels: Code, Instrumenter, Binary + Dynamic Checks
Code:
    int *a = my_malloc();
    int *b = my_malloc();
    *a = read();
    *b = read();
    int c = *a + *b;
    free(a);
    free(b);

Motivation – Hybrid Analysis
Use static analysis to optimize dynamic analysis. If the static analysis is sound, the dynamic analysis retains its soundness.
Diagram labels: Code, Sound Static Analysis, Instrumenter, Binary + Dynamic Checks
Code:
    int *a = my_malloc();
    int *b = my_malloc();
    *a = read();
    *b = read();
    int c = *a + *b;
    free(a);
    free(b);
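
The idea on this slide can be illustrated with a small sketch. This is not the authors' implementation: the instruction list, the `instrument` function, and the `proven_safe` set (indices that some sound static analysis has shown cannot violate the checked property) are all assumptions made for illustration. The sketch only shows an instrumenter emitting dynamic checks for the instructions the static analysis could not prove safe.

```python
from typing import List, Set, Tuple


def instrument(instructions: List[str], proven_safe: Set[int]) -> List[Tuple[str, bool]]:
    """Pair each instruction with a flag saying whether it needs a dynamic check.

    proven_safe holds indices that a (hypothetical) sound static analysis has
    shown cannot violate the checked property, so their checks are elided.
    """
    return [(instr, idx not in proven_safe) for idx, instr in enumerate(instructions)]


# Toy program based on the slide; which accesses are provably safe is assumed.
program = ["*a = read()", "*b = read()", "c = *a + *b"]
instrumented = instrument(program, proven_safe={0, 1})

# Number of dynamic checks that remain after static pruning.
checks_left = sum(1 for _, needs_check in instrumented if needs_check)
```

The more instructions the static analysis can prove safe, the fewer checks run at execution time, which is why the precision of the static analysis matters so much for the speed of the dynamic one.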

Sound Static Analysis is Inaccurate
Sound static analysis makes conservative approximations to preserve soundness, so it has a high false positive rate. It considers too many states to adequately optimize dynamic analysis!
Diagram: P (possible program states) inside S (the over-approximation used by the analysis)

Optimistic Hybrid Analysis
A novel methodology for optimizing dynamic analysis. It combines profiling, unsound static analysis, and speculative dynamic execution:
- Unsound static analysis: better optimizes the dynamic analysis
- Speculative execution: handles the unsoundness of the static analysis
No loss of soundness or precision.
Optimistic hybrid backward slicer: OptSlice
- Retains the accuracy of a traditional dynamic slicer
- On average, 7.7x faster than a hybrid dynamic slicer

Overview
- Motivation
- Optimistic Hybrid Analysis Overview
- Likely Invariant Gathering
- Optimistic Static Analysis
- Speculative Dynamic Analysis
- OptSlice – Optimistic Dynamic Backward Slicer
- Evaluation
- Conclusion

Idea: Optimistic Analysis
Observation: hybrid analysis needs static analysis to explore only the states executed by the analyzed executions.
Diagram: P (possible program states), S (over-approximation used by the analysis), C (common executed states)

Idea: Optimistic Analysis
Idea: use unsound static analysis instead. We call this optimistic static analysis. The dynamic analysis detects and handles the unsoundness.
Diagram: P (possible program states), S (over-approximation used by the analysis), C (common states), O (optimistic static analysis)

Motivation – Hybrid Analysis
Diagram labels: Code, Sound Static Analysis, Optimistic Static Analysis (P, C, O), Instrumenter, Binary + Dynamic Checks, Speculation Checks
Code:
    int *a = my_malloc();
    int *b = my_malloc();
    *a = read();
    *b = read();
    int c = *a + *b;
    free(a);
    free(b);

Optimistic Hybrid Analysis Phases
1. Profiling Likely Invariants
2. Optimistic Static Analysis
3. Speculative Dynamic Analysis
Diagram labels: Profiling; P, C, O; Binary + Dynamic Checks, Speculation Checks

Optimistic Hybrid Analysis
1. Profiling Likely Invariants
2. Optimistic Static Analysis
3. Speculative Dynamic Analysis
Diagram labels: Profiling; P, C, O; Binary + Dynamic Checks, Speculation Checks

Likely Invariants
Goal: guide optimistic static analysis towards C, the common executed states.
- Static analysis assumes the likely invariants are true
- This reduces the state space searched
- It adds bounded unsoundness: the analysis is unsound only if an invariant turns out to be false
Likely invariants should be:
- Strong: help prune the states considered by static analysis
- Cheap: be inexpensive to check dynamically
- Stable: hold true for most analyzed executions
They are gathered by a profiling pass over the program.
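
As a rough illustration of what a profiling pass might collect, the sketch below derives two kinds of likely invariants from recorded runs. It is a minimal sketch under stated assumptions, not the authors' profiler: the function names, the representation of a run as a set of executed block names, and the `(call site, target)` pairs are all hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def likely_unreachable(all_blocks: Set[str], profiled_runs: Iterable[Set[str]]) -> Set[str]:
    """Blocks never executed in any profiled run are assumed unreachable."""
    executed: Set[str] = set()
    for run in profiled_runs:
        executed |= run
    return all_blocks - executed


def likely_callee_sets(
    profiled_calls: Iterable[Tuple[Hashable, str]]
) -> Dict[Hashable, Set[str]]:
    """Map each indirect call site to the targets observed while profiling."""
    callees: Dict[Hashable, Set[str]] = {}
    for site, target in profiled_calls:
        callees.setdefault(site, set()).add(target)
    return callees
```

Both invariants are cheap to check later: the first needs a trap at the entry of each assumed-unreachable block, the second a set-membership test at each indirect call. How stable they are depends on how representative the profiled runs are.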

Our Invariants
- Likely Unreachable Code: identifies regions of code which are not dynamically run; stops static analysis from considering that code
- Likely Callee Sets: identifies the targets of indirect function calls; removes indirect function calls from analysis
- Likely Unreachable Call Contexts: identifies call stacks reached dynamically; prunes the call graph in context-sensitive analysis

Likely Callee Sets
Assume indirect calls can only call dynamically observed functions.
Source code:
    fcn() {
    1:  hash_fcn_t *fcn = get_hash_fcn();
    2:  hash = (*fcn)(arg1, arg2);
    }
    3: sha1(arg1, arg2) { … }
    4: sha256(arg1, arg2) {…}
    5: sha3(arg1, arg2) {…}
    6: md5(arg1, arg2) {…}
    7: ripemd(arg1, arg2) {…}
    8: whirlpool(arg1, arg2) {…}
Static analysis graph nodes: 2, 3, 4, 5, 6, 7, 8

Likely Callee Sets
Source code: unchanged from the previous slide.
Static analysis graph nodes: 2, 3, 4, 5, 6, 7, 8

Likely Callee Sets
Indirect calls typically target only a small set of functions.
Source code: unchanged from the previous slide.
Likely invariant: *fcn is sha256 or sha3
Static analysis graph nodes: 2, 3, 4, 5, 6, 7, 8

Likely Callee Sets
Indirect calls typically target only a small set of functions.
Source code: unchanged from the previous slide.
Likely invariant: *fcn is sha256 or sha3
Static analysis graph nodes remaining: 2, 4, 5
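
A speculation check for a likely callee set could look roughly like the following. This is an illustrative sketch only: the `MisSpeculation` exception, the `LIKELY_CALLEES` table, and the `indirect_call` wrapper are hypothetical names, and the hash functions are string-returning stand-ins rather than real digests.

```python
class MisSpeculation(Exception):
    """Signals that a likely invariant did not hold in this execution."""


# Stand-ins for the hash functions on the slide (they do not compute digests).
def sha1(data):
    return "sha1:" + data


def sha256(data):
    return "sha256:" + data


def sha3(data):
    return "sha3:" + data


# Hypothetical likely callee set for the indirect call at line 2,
# as it might be recorded by a profiling pass.
LIKELY_CALLEES = {2: {sha256, sha3}}


def indirect_call(site, fcn, data):
    """Perform an indirect call guarded by a likely-callee-set check."""
    if fcn not in LIKELY_CALLEES[site]:
        # The static analysis never considered this target, so its results
        # cannot be trusted for this execution.
        raise MisSpeculation(f"unexpected callee {fcn.__name__} at site {site}")
    return fcn(data)
```

The check is a single set-membership test per indirect call, which fits the requirement that likely invariants be cheap to verify dynamically.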

Likely Unused Call Contexts
Source code:
    main() {
    1:  a = my_malloc();
    2:  b = my_malloc();
    }
    my_malloc() {
        if (!g_init)
    3:      return do_init();
    4:  return malloc(…);
    }
    do_init() {
        g_init = true;
    5:  // Long complex initialization code
    }
Context-sensitive analysis graph nodes (line_context): 1_1, 2_1, 3_1, 4_1, 3_2, 4_2, 5_1, 5_2

Likely Unused Call Contexts
Remove call stacks which never dynamically happen.
Source code: unchanged from the previous slide.
Context-sensitive analysis graph: same nodes as on the previous slide, with some marked "Dynamically never called"
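
One way to picture the corresponding runtime check is a shadow call stack that is compared against the set of call contexts seen during profiling. The sketch below is an assumption-laden illustration, not the authors' mechanism: `ContextChecker`, `MisSpeculation`, and the encoding of a context as a tuple of call-site line numbers are all hypothetical.

```python
class MisSpeculation(Exception):
    """Signals that a likely invariant did not hold in this execution."""


class ContextChecker:
    """Tracks the current call stack and rejects unprofiled call contexts."""

    def __init__(self, observed_contexts):
        self.observed = set(observed_contexts)
        self.stack = []

    def enter(self, call_site):
        # The context is the chain of call sites leading to this call.
        context = tuple(self.stack) + (call_site,)
        if context not in self.observed:
            raise MisSpeculation(f"unprofiled call context {context}")
        self.stack.append(call_site)

    def leave(self):
        self.stack.pop()


# Assumed profile for the slide's program: do_init (called at line 3) was
# only ever reached from the first my_malloc call (line 1), never from line 2.
OBSERVED = {(1,), (1, 3), (2,)}
```

With this profile, entering call site 3 while call site 2 is on the stack raises `MisSpeculation`, matching a context that the context-sensitive static analysis was allowed to prune.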

Optimistic Hybrid Analysis
1. Profiling Likely Invariants
2. Optimistic Static Analysis
3. Speculative Dynamic Analysis
Diagram labels: Profiling; P, C, O; Binary + Dynamic Checks, Speculation Checks

Optimistic Static Analysis
Assumes the likely invariants from profiling are true. Likely invariants allow the analysis to approximate C. If the likely invariants hold, the analysis is sound for that execution.
Diagram (P, S, C, O as defined earlier):
- Sound static analysis: S > P
- Perfect static analysis: S = P
- Optimistic static analysis: O < P
(Presenter note: do the transition from traditional analysis to optimistic analysis.)

Optimistic Hybrid Analysis
1. Profiling Likely Invariants
2. Optimistic Static Analysis
3. Speculative Dynamic Analysis
Diagram labels: Profiling; P, C, O; Binary + Dynamic Checks, Speculation Checks

Speculative Dynamic Analysis
Runs the optimized analysis:
- Only a fraction of the original checks
- Added likely-invariant speculation checks, which are lightweight by design (through the choice of likely invariants)
May have to roll back on mis-speculation:
- Mis-speculations are rare
- Worst case: full-execution roll-back and re-execution
- On re-execution, use a conservative dynamic analysis, e.g. a traditional hybrid analysis
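
The worst-case recovery path described on this slide can be sketched as a simple try-and-fall-back wrapper. This is a minimal illustration under assumptions, not the authors' recovery mechanism: it assumes the execution can be discarded and deterministically re-run on the same input, and the names `run_with_recovery`, `toy_optimistic`, and `toy_conservative` are hypothetical. The toy "analyses" just sum their input so that the control flow stays visible.

```python
class MisSpeculation(Exception):
    """Signals that a likely invariant did not hold in this execution."""


def run_with_recovery(optimistic_analysis, conservative_analysis, program_input):
    """Run the optimistic analysis; on mis-speculation, re-run conservatively."""
    try:
        return optimistic_analysis(program_input), "optimistic"
    except MisSpeculation:
        # Full-execution roll-back: discard the optimistic run entirely and
        # re-execute from the start under the conservative analysis.
        return conservative_analysis(program_input), "conservative"


def toy_optimistic(values):
    # Assumed likely invariant for this toy: all inputs are non-negative.
    if any(v < 0 for v in values):
        raise MisSpeculation("negative input violates the likely invariant")
    return sum(values)


def toy_conservative(values):
    # Makes no assumptions about the input (and would be slower in practice).
    return sum(values)
```

Either path yields the same answer for a given input; only the cost differs. That is the sense in which speculation preserves soundness and precision while letting the common case run the cheaper analysis.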

Overview
- Motivation
- Optimistic Hybrid Analysis Overview
- Likely Invariants
- Optimistic Static Analysis
- Speculative Dynamic Analysis
- OptSlice – Optimistic Dynamic Backward Slicer
- Evaluation
- Conclusion

OptSlice
Optimistic dynamic backward slicer:
- Uses the three likely invariants discussed
- Uses optimistic static points-to and slicing analyses
- Heavyweight (full-execution) roll-back recovery
Diagram labels: Profiler, Profiled Likely Invariants, Optimistic Points-To, Points-to Sets, Used Likely Invariants, Optimistic Static Slicer, Static Slices & Used Likely Invariants, Dynamic Slicing Instrumenter, Instrumented Binary w/ Invariant Checks, Dynamic Execution

OptSlice – Misc. Details
The static analyses have call-context-sensitive and context-insensitive versions. We use the most accurate version that will scale.
Optimistic static analysis processes fewer states, so it is generally more scalable.
Built on top of the Giri dynamic slicer (ASPLOS '13).

Evaluation
All experiments are bounded to 8 hours and 16 GB of RAM. We use the best static analysis that will complete. The analysis is built in LLVM 3.1.
Goal: show how OHA techniques enable OptSlice to accelerate dynamic analysis.
We evaluate three portions:
- OptSlice speedup
- Invariant profiling requirements
- Static analysis improvements

Runtime
[Graph]

Runtime
On average: 7.7x speedup
[Graph]

Profiling
[Graph]

Static Points-to Analysis
[Graph]

Conclusion
Optimistic Hybrid Analysis
- Novel analysis methodology: combines unsound static analysis and speculative dynamic analysis
- Accelerates dynamic analysis without sacrificing accuracy or soundness
OptSlice
- Optimistic hybrid backward slicer
- 7.7x average speedup versus traditional hybrid slicing

Questions?