DieHard: Memory Error Fault Tolerance in C and C++ Ben Zorn Microsoft Research In collaboration with Emery Berger and Gene Novark, Univ. of Massachusetts.

Slides:



Advertisements
Similar presentations
Why Have The OSGi Specifications Been Based On Java Technology ? By Peter Kriens, CEO aQute OSGi Technology Officer
Advertisements

Lazy Asynchronous I/O For Event-Driven Servers Khaled Elmeleegy, Anupam Chanda and Alan L. Cox Department of Computer Science Rice University, Houston,
Design and Evaluation of an Autonomic Workflow Engine Thomas Heinis, Cesare Pautasso, Gustavo Alsonso Dept. of Computer Science Swiss Federal Institute.
Software Fault Tolerance for Type-unsafe Languages Ben Zorn Microsoft Research In collaboration with Emery Berger, Univ. of Massachusetts Karthik Pattabiraman,
1 Concurrency: Deadlock and Starvation Chapter 6.
Chapter 17 vector and Free Store Bjarne Stroustrup
Distributed Systems Architectures
Chapter 7 Constructors and Other Tools. Copyright © 2006 Pearson Addison-Wesley. All rights reserved. 7-2 Learning Objectives Constructors Definitions.
© 2008 Pearson Addison Wesley. All rights reserved Chapter Seven Costs.
Copyright © 2003 Pearson Education, Inc. Slide 1.
1 Copyright © 2013 Elsevier Inc. All rights reserved. Chapter 3 CPUs.
Remote Educational Programming Of Robots (REPOR) Tord Fauskanger Aurelie Aurilla Bechina Arntzen Dag Samuelsen Buskerud University College.
1 Building a Fast, Virtualized Data Plane with Programmable Hardware Bilal Anwer Nick Feamster.
By Rick Clements Software Testing 101 By Rick Clements
DCV: A Causality Detection Approach for Large- scale Dynamic Collaboration Environments Jiang-Ming Yang Microsoft Research Asia Ning Gu, Qi-Wei Zhang,
Chapter 5 Input/Output 5.1 Principles of I/O hardware
Chapter 6 File Systems 6.1 Files 6.2 Directories
All You Ever Wanted to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask) Edward J. Schwartz, ThanassisAvgerinos,
SE-292 High Performance Computing
Configuration management
Copyright © 2009 EMC Corporation. Do not Copy - All Rights Reserved.
Mehdi Naghavi Spring 1386 Operating Systems Mehdi Naghavi Spring 1386.
Troubleshooting Startup Problems
Debugging operating systems with time-traveling virtual machines Sam King George Dunlap Peter Chen CoVirt Project, University of Michigan.
13 Copyright © 2005, Oracle. All rights reserved. Monitoring and Improving Performance.
Database Performance Tuning and Query Optimization
ETS4 - What's new? - How to start? - Any questions?
Software testing.
David Luebke 1 6/7/2014 ITCS 6114 Skip Lists Hashing.
SE-292: High Performance Computing
Page Replacement Algorithms
Cache and Virtual Memory Replacement Algorithms
Chapter 10: Virtual Memory
1 CMSC421: Principles of Operating Systems Nilanjan Banerjee Principles of Operating Systems Acknowledgments: Some of the slides are adapted from Prof.
CS 6143 COMPUTER ARCHITECTURE II SPRING 2014 ACM Principles and Practice of Parallel Programming, PPoPP, 2006 Panel Presentations Parallel Processing is.
Making Time-stepped Applications Tick in the Cloud Tao Zou, Guozhang Wang, Marcos Vaz Salles*, David Bindel, Alan Demers, Johannes Gehrke, Walker White.
Component-Based Software Engineering Main issues: assemble systems out of (reusable) components compatibility of components.
Chapter 6 File Systems 6.1 Files 6.2 Directories
Error-handling using exceptions
Defect Tolerance for Yield Enhancement of FPGA Interconnect Using Fine-grain and Coarse-grain Redundancy Anthony J. YuGuy G.F. Lemieux September 15, 2005.
Chapter 10 Software Testing
Restartability Manage- ment in the Cisco Core Router CRS/NG Stefan Schaeckeler (Cisco Systems, Inc.) Ashwin Narasimha Murthy (Google, Inc.)
Executional Architecture
KAIST Computer Architecture Lab. The Effect of Multi-core on HPC Applications in Virtualized Systems Jaeung Han¹, Jeongseob Ahn¹, Changdae Kim¹, Youngjin.
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science 1 MC 2 –Copying GC for Memory Constrained Environments Narendran Sachindran J. Eliot.
Copyright © 2003 by Prentice Hall Computers: Tools for an Information Age Chapter 15 Programming and Languages: Telling the Computer What to Do.
SE-292 High Performance Computing
SE-292 High Performance Computing Memory Hierarchy R. Govindarajan
PSSA Preparation.
Chapter 11 Creating Framed Layouts Principles of Web Design, 4 th Edition.
1 Symbolic Execution Kevin Wallace, CSE
 2003 Prentice Hall, Inc. All rights reserved. 1 Chapter 13 - Exception Handling Outline 13.1 Introduction 13.2 Exception-Handling Overview 13.3 Other.
Installing Windows XP Professional Using Attended Installation Slide 1 of 30Session 8 Ver. 1.0 CompTIA A+ Certification: A Comprehensive Approach for all.
 RAID stands for Redundant Array of Independent Disks  A system of arranging multiple disks for redundancy (or performance)  Term first coined in 1987.
Computer Security: Principles and Practice EECS710: Information Security Professor Hossein Saiedian Fall 2014 Chapter 10: Buffer Overflow.
U NIVERSITY OF M ASSACHUSETTS A MHERST Department of Computer Science 2007 Exterminator: Automatically Correcting Memory Errors with High Probability Gene.
DIEHARDER: SECURING THE HEAP. Previously in DieHard…  Increase Reliability by random positioning of data  Replicated Execution detects invalid memory.
By Gene Novark, Emery D. Berger and Benjamin G. Zorn Presented by Matthew Kent Exterminator: Automatically Correcting Memory Errors with High Probability.
Ben Livshits Based in part of Stanford class slides from and slides from Ben Zorn’s talk slides.
By Emery D. Berger and Benjamin G. Zorn Presented by: David Roitman.
Tolerating and Correcting Memory Errors in C and C++ Ben Zorn Microsoft Research In collaboration with: Emery Berger and Gene Novark, Umass - Amherst Karthik.
U NIVERSITY OF M ASSACHUSETTS A MHERST Department of Computer Science PLDI 2006 DieHard: Probabilistic Memory Safety for Unsafe Programming Languages Emery.
U NIVERSITY OF M ASSACHUSETTS A MHERST Department of Computer Science 2006 Exterminator: Automatically Correcting Memory Errors Gene Novark, Emery Berger.
15-740/ Oct. 17, 2012 Stefan Muller.  Problem: Software is buggy!  More specific problem: Want to make sure software doesn’t have bad property.
Computer Security and Penetration Testing
Presentation of Failure- Oblivious Computing vs. Rx OS Seminar, winter 2005 by Lauge Wullf and Jacob Munk-Stander January 4 th, 2006.
A Tool for Pro-active Defense Against the Buffer Overrun Attack D. Bruschi, E. Rosti, R. Banfi Presented By: Warshavsky Alex.
Mixed Criticality Systems: Beyond Transient Faults Abhilash Thekkilakattil, Alan Burns, Radu Dobrin and Sasikumar Punnekkat.
Fault Tolerant, Efficient, and Secure Runtimes Ben Zorn Research in Software Engineering (RiSE) Microsoft Research In collaboration with: Emery Berger.
Presented by: Daniel Taylor
Presentation transcript:

DieHard: Memory Error Fault Tolerance in C and C++ Ben Zorn Microsoft Research In collaboration with Emery Berger and Gene Novark, Univ. of Massachusetts Ted Hart, Microsoft Research Ben Zorn, Microsoft Research1DieHard: Memory Error Fault Tolerance in C and C++

Buffer overflow char *c = malloc(100); c[101] = a; Dangling reference char *p1 = malloc(100); char *p2 = p1; free(p1); p2[0] = x; a Focus on Heap Memory Errors Ben Zorn, Microsoft Research DieHard: Memory Error Fault Tolerance in C and C++ 2 c 0 99 p p2 x

Ben Zorn, Microsoft Research Motivation Consider a shipped C program with a memory error (e.g., buffer overflow) By language definition, undefined In practice, assertions turned off – mostly works I.e., data remains consistent What if you know it has executed an illegal operation? Raise an exception? Continue unsoundly (failure oblivious computing) Continue with well-defined semantics 3 DieHard: Memory Error Fault Tolerance in C and C++

Research Vision Increase robustness of installed code base Potentially improve millions of lines of code Minimize effort – ideally no source mods, no recompilation Reduce requirement to patch Patches are expensive (detect, write, deploy) Patches may introduce new errors Enable trading resources for robustness E.g., more memory implies higher reliability Ben Zorn, Microsoft Research DieHard: Memory Error Fault Tolerance in C and C++ 4

Ben Zorn, Microsoft Research Research Themes Make existing programs more fault tolerant Define semantics of programs with errors Programs complete with correct result despite errors Go beyond all-or-nothing guarantees Type checking, verification rarely a 100% solution C#, Java both call to C/C++ libraries Traditional engineering allows for errors by design Complement existing approaches Static analysis has scalability limits Managed code especially good for new projects DART, Fuzz testing effective for generating illegal test cases 5 DieHard: Memory Error Fault Tolerance in C and C++

Ben Zorn, Microsoft Research Approaches to Protecting Programs Unsound, may work or abort Windows, GNU libc, etc. Unsound, might continue Failure oblivious (keep going) [Rinard] Invalid read => manufacture value Illegal write => ignore Sound, definitely aborts (fail-safe, fail-fast) CCured [Necula], others Sound and continues DieHard, Rx, Boundless Memory Blocks, hardware fault tolerance 6 DieHard: Memory Error Fault Tolerance in C and C++

Ben Zorn, Microsoft Research Outline Motivation DieHard Collaboration with Emery Berger Replacement for malloc/free heap allocation No source changes, recompile, or patching, required Exterminator Collaboration with Emery Berger, Gene Novark Automatically corrects memory errors Suitable for large scale deployment Conclusion 7 DieHard: Memory Error Fault Tolerance in C and C++

Ben Zorn, Microsoft Research DieHard: Probabilistic Memory Safety Collaboration with Emery Berger Plug-compatible replacement for malloc/free in C lib We define infinite heap semantics Programs execute as if each object allocated with unbounded memory All frees ignored Approximating infinite heaps – 3 key ideas Overprovisioning Randomization Replication Allows analytic reasoning about safety 8 DieHard: Memory Error Fault Tolerance in C and C++

Overprovisioning, Randomization Ben Zorn, Microsoft Research DieHard: Memory Error Fault Tolerance in C and C++ 9 Expand size requests by a factor of M (e.g., M=2) Randomize object placement Pr(write corrupts) = ½ ? Pr(write corrupts) = ½ !

Replication (optional) Ben Zorn, Microsoft Research DieHard: Memory Error Fault Tolerance in C and C++ 10 Replicate process with different randomization seeds P P3 input Broadcast input to all replicas Compare outputs of replicas, kill when replica disagrees P1 Voter

Ben Zorn, Microsoft Research DieHard Implementation Details Multiply allocated memory by factor of M Allocation Segregate objects by size (log2), bitmap allocator Within size class, place objects randomly in address space Randomly re-probe if conflicts (expansion limits probing) Separate metadata from user data Fill objects with random values – for detecting uninit reads Deallocation Expansion factor => frees deferred Extra checks for illegal free 11 DieHard: Memory Error Fault Tolerance in C and C++

Segregated size classes - Static strategy pre-allocates size classes - Adaptive strategy grows each size class incrementally Ben Zorn, Microsoft Research Over-provisioned, Randomized Heap 2 H = max heap size, class i L = max live size H/2 F = free = H-L object size = 16object size = 8 … 12 DieHard: Memory Error Fault Tolerance in C and C++

Ben Zorn, Microsoft Research Randomness enables Analytic Reasoning Example: Buffer Overflows k = # of replicas, Obj = size of overflow With no replication, Obj = 1, heap no more than 1/8 full: Pr(Mask buffer overflow), = 87.5% 3 replicas: Pr(ibid) = 99.8% 13 DieHard: Memory Error Fault Tolerance in C and C++

Ben Zorn, Microsoft Research DieHard CPU Performance (no replication) 14 DieHard: Memory Error Fault Tolerance in C and C++

Ben Zorn, Microsoft Research DieHard CPU Performance (Linux) 15 DieHard: Memory Error Fault Tolerance in C and C++

Ben Zorn, Microsoft Research Correctness Results Tolerates high rate of synthetically injected errors in SPEC programs Detected two previously unreported benign bugs (197.parser and espresso) Successfully hides buffer overflow error in Squid web cache server (v 2.3s5) But dont take my word for it… 16 DieHard: Memory Error Fault Tolerance in C and C++

DieHard Demo DieHard (non-replicated) Windows, Linux version implemented by Emery Berger Available: Adaptive, automatically sizes heap Detours-like mechanism to automatically redirect malloc/free calls to DieHard DLL Application: Mozilla, version Known buffer overflow crashes browser Takeaways Usable in practice – no perceived slowdown Roughly doubles memory consumption 20.3 Mbytes vs Mbytes with DieHard Ben Zorn, Microsoft Research DieHard: Memory Error Fault Tolerance in C and C++ 17

Ben Zorn, Microsoft Research Caveats Primary focus is on protecting heap Techniques applicable to stack data, but requires recompilation and format changes DieHard trades space, extra processors for memory safety Not applicable to applications with large footprint Applicability to server apps likely to increase DieHard requires non-deterministic behavior to be made deterministic (on input, gettimeofday(), etc.) DieHard is a brute force approach Improvements possible (efficiency, safety, coverage, etc.) 18 DieHard: Memory Error Fault Tolerance in C and C++

Ben Zorn, Microsoft Research Outline Motivation DieHard Collaboration with Emery Berger Replacement for malloc/free heap allocation No source changes, recompile, or patching, required Exterminator Collaboration with Emery Berger, Gene Novark Automatically corrects memory errors Suitable for large scale deployment Conclusion 19 DieHard: Memory Error Fault Tolerance in C and C++

Exterminator Motivation DieHard limitations Tolerates errors probabilistically, doesnt fix them Memory and CPU overhead Provides no information about source of errors Note – DieHard still extremely useful Ideal addresses the limitations Program automatically detects and fixes memory errors Corrected program has no memory, CPU overhead Sources of errors are pinpointed, easier for human to fix Exterminator = correcting allocator Joint work with Emery Berger, Gene Novark Random allocation => isolates bugs instead of tolerating them Ben Zorn, Microsoft Research DieHard: Memory Error Fault Tolerance in C and C++ 20

Exterminator Components Architecture of Exterminator dictated by solving specific problems How to detect heap corruptions effectively? DieFast allocator How to isolate the cause of a heap corruption precisely? Heap differencing algorithms How to automatically fix buggy C code without breaking it? Correcting allocator + hot allocator patches Ben Zorn, Microsoft Research DieHard: Memory Error Fault Tolerance in C and C++ 21

DieFast Allocator Randomized, over-provisioned heap Canary = random bit pattern fixed at startup Leverage extra free space by inserting canaries Inserting canaries Initialization – all cells have canaries On allocation – no new canaries On free – put canary in the freed object with prob. P Remember where canaries are (bitmap) Checking canaries On allocation – check cell returned On free – check adjacent cells Ben Zorn, Microsoft Research DieHard: Memory Error Fault Tolerance in C and C

12 Installing and Checking Canaries Ben Zorn, Microsoft Research DieHard: Memory Error Fault Tolerance in C and C++ 23 Allocate Install canaries with probability P Check canary Free Initially, heap full of canaries 1

Heap Differencing Strategy Run program multiple times with different randomized heaps If detect canary corruption, dump contents of heap Identify objects across runs using allocation order Key insight: Relation between corruption and object causing corruption is invariant across heaps Detect invariant across random heaps More heaps => higher confidence of invariant Ben Zorn, Microsoft Research DieHard: Memory Error Fault Tolerance in C and C++ 24

12 Attributing Buffer Overflows Ben Zorn, Microsoft Research DieHard: Memory Error Fault Tolerance in C and C++ 25 One candidate! 43 corrupted canary Which object caused? delta is constant but unknown ? 1243 Run 2 Run 1 Now only 2 candidates Run Precision increases exponentially with number of runs

Detecting Dangling Pointers (2 cases) Dangling pointer read/written (easy) Invariant = canary in freed object X has same corruption in all runs Dangling pointer only read (harder) Sketch of approach (paper explains details) Only fill freed object X with canary with probability P Requires multiple trials: log 2 (number of callsites) Look for correlations, i.e., X filled with canary => crash Establish conditional probabilities Have: P(callsite X filled with canary | program crashes) Need: P(crash | filled with canary), guess prior to compute Ben Zorn, Microsoft Research DieHard: Memory Error Fault Tolerance in C and C++ 26

Correcting Allocator Group objects by allocation site Patch object groups at allocate/free time Associate patches with group Buffer overrun => add padding to size request malloc(32) becomes malloc(32 + delta) Dangling pointer => defer free free(p) becomes defer_free(p, delta_allocations) Fixes preserve semantics, no new bugs created Correcting allocation may != DieFast or DieHard Correction allocator can be space, CPU efficient Patches created separately, installed on-the-fly Ben Zorn, Microsoft Research DieHard: Memory Error Fault Tolerance in C and C++ 27

Deploying Exterminator Exterminator can be deployed in different modes Iterative – suitable for test environment Different random heaps, identical inputs Complements automatic methods that cause crashes Replicated mode Suitable in a multi/many core environment Like DieHard replication, except auto-corrects, hot patches Cumulative mode – partial or complete deployment Aggregates results across different inputs Enables automatic root cause analysis from Watson dumps Suitable for wide deployment, perfect for beta release Likely to catch many bugs not seen in testing lab Ben Zorn, Microsoft Research DieHard: Memory Error Fault Tolerance in C and C++ 28

DieFast Overhead Ben Zorn, Microsoft Research DieHard: Memory Error Fault Tolerance in C and C++ 29

Exterminator Effectiveness Squid web cache buffer overflow Crashes glibc malloc 3 runs sufficient to isolate 6-byte overflow Mozilla buffer overflow (recall demo) Testing scenario - repeated load of buggy page 23 runs to isolate overflow Deployed scenario – bug happens in middle of different browsing sessions 34 runs to isolate overflow Ben Zorn, Microsoft Research DieHard: Memory Error Fault Tolerance in C and C++ 30

Comparison with Existing Approaches Static analysis, annotations Finds individual bugs, developer still has to fix High cost developing, testing, deploying patches DieHard reduces threat of all memory errors Testing, OCA / Watson dumps Finds crashes, developer still has find root cause Type-safe languages (C#, etc.) Large installed based of C, C++ Managed runtimes, libraries have lots of C, C++ Also has a memory cost Ben Zorn, Microsoft Research DieHard: Memory Error Fault Tolerance in C and C++ 31

Ben Zorn, Microsoft Research Conclusion Programs written in C / C++ can execute safely and correctly despite memory errors Research vision Improve existing code without source modifications Reduce human generated patches required Increase reliability, security by order of magnitude Current projects and results DieHard: overprovisioning + randomization + replicas = probabilistic memory safety Exterminator: automatically detect and correct memory errors (with high probability) Demonstrated success on real applications 32 DieHard: Memory Error Fault Tolerance in C and C++

Ben Zorn, Microsoft Research Hardware Trends Hardware transient faults are increasing Even type-safe programs can be subverted in presence of HW errors Academic demonstrations in Java, OCaml Soft error workshop (SELSE) conclusions Intel, AMD now more carefully measuring Not practical to protect everything Faults need to be handled at all levels from HW up the software stack Measurement is difficult How to determine soft HW error vs. software error? Early measurement papers appearing 33 DieHard: Memory Error Fault Tolerance in C and C++

Ben Zorn, Microsoft Research Power to Spare DRAM prices dropping 2Gb, Dual Channel PC 6400 DDR2 800 MHz $85 Multicore CPUs Quad-core Intel Core 2 Quad, AMD Quad-core Opteron Eight core Intel by 2008? Challenge: How should we use all this hardware? 34 DieHard: Memory Error Fault Tolerance in C and C++

Additional Information Web sites: Ben Zorn: DieHard: Exterminator: Publications Emery D. Berger and Benjamin G. Zorn, "DieHard: Probabilistic Memory Safety for Unsafe Languages", PLDI06. Gene Novark, Emery D. Berger and Benjamin G. Zorn, Exterminator: Correcting Memory Errors with High Probability", PLDI07. Ben Zorn, Microsoft Research35 DieHard: Memory Error Fault Tolerance in C and C++

Backup Slides Ben Zorn, Microsoft Research36 DieHard: Memory Error Fault Tolerance in C and C++

Ben Zorn, Microsoft Research Related Work Conservative GC (Boehm / Demers / Weiser) Time-space tradeoff (typically >3X) Provably avoids certain errors Safe-C compilers Jones & Kelley, Necula, Lam, Rinard, Adve, … Often built on BDW GC Up to 10X performance hit N-version programming Replicas truly statistically independent Address space randomization (as in Vista) Failure-oblivious computing [Rinard] Hope that program will continue after memory error with no untoward effects 37 DieHard: Memory Error Fault Tolerance in C and C++