Backwards-Compatible Array Bounds Checking for C with Very Low Overhead Dinakar Dhurjati and Vikram Adve ICSE 2006 Itay Polack 02.03.2014.

Slides:



Advertisements
Similar presentations
Symbol Table.
Advertisements

Dynamic memory allocation
An introduction to pointers in c
Dynamic Memory Management
The Interface Definition Language for Fail-Safe C Kohei Suenaga, Yutaka Oiwa, Eijiro Sumii, Akinori Yonezawa University of Tokyko.
Module R2 CS450. Next Week R1 is due next Friday ▫Bring manuals in a binder - make sure to have a cover page with group number, module, and date. You.
Computer Security: Principles and Practice EECS710: Information Security Professor Hossein Saiedian Fall 2014 Chapter 10: Buffer Overflow.
Lecture 16 Buffer Overflow modified from slides of Lawrie Brown.
Dec 5, 2007University of Virginia1 Efficient Dynamic Tainting using Multiple Cores Yan Huang University of Virginia Dec
Pointer applications. Arrays and pointers Name of an array is a pointer constant to the first element whose value cannot be changed Address and name refer.
Kernighan/Ritchie: Kelley/Pohl:
Memory allocation CSE 2451 Matt Boggus. sizeof The sizeof unary operator will return the number of bytes reserved for a variable or data type. Determine:
Malloc Recitation Section K (Kevin Su) November 5 th, 2012.
Hastings Purify: Fast Detection of Memory Leaks and Access Errors.
A Practical Dynamic Buffer Overflow Detector (CRED) Olatunji Ruwase Monica S. Lam Transmeta Corp. Stanford University Network and Distributed Security.
CS 326 Programming Languages, Concepts and Implementation Instructor: Mircea Nicolescu Lecture 18.
1 Pointers A pointer variable holds an address We may add or subtract an integer to get a different address. Adding an integer k to a pointer p with base.
SAFECode SAFECode: Enforcing Alias Analysis for Weakly Typed Languages Dinakar Dhurjati University of Illinois at Urbana-Champaign Joint work with Sumant.
Chapter 10.
1 1 Lecture 4 Structure – Array, Records and Alignment Memory- How to allocate memory to speed up operation Structure – Array, Records and Alignment Memory-
Memory Arrangement Memory is arrange in a sequence of addressable units (usually bytes) –sizeof( ) return the number of units it takes to store a type.
More on protocol implementation Packet parsing Memory management Data structures for lookup.
1 Efficient Memory Safety for TinyOS 2.1 Yang Chen Nathan Cooprider Will Archer Eric Eide David Gay † John Regehr University of Utah School of Computing.
CS 61C L03 C Arrays (1) A Carle, Summer 2005 © UCB inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #3: C Pointers & Arrays
HARDBOUND: ARCHITECURAL SUPPORT FOR SPATIAL SAFETY OF THE C PROGRAMMING LANGUAGE Kyle Yan Yu Xing 2014/10/15.
Adapted from Dr. Craig Chase, The University of Texas at Austin.
Guide To UNIX Using Linux Third Edition
1 ES 314 Advanced Programming Lec 2 Sept 3 Goals: Complete the discussion of problem Review of C++ Object-oriented design Arrays and pointers.
. Memory Management. Memory Organization u During run time, variables can be stored in one of three “pools”  Stack  Static heap  Dynamic heap.
Pointers Applications
Lecture 16 Buffer Overflow
1 CISC181 Introduction to Computer Science Dr. McCoy Lecture 19 Clicker Questions November 3, 2009.
Review C++ exception handling mechanism Try-throw-catch block How does it work What is exception specification? What if a exception is not caught?
CS 11 C track: lecture 5 Last week: pointers This week: Pointer arithmetic Arrays and pointers Dynamic memory allocation The stack and the heap.
Operator Precedence First the contents of all parentheses are evaluated beginning with the innermost set of parenthesis. Second all multiplications, divisions,
Presentation of Failure- Oblivious Computing vs. Rx OS Seminar, winter 2005 by Lauge Wullf and Jacob Munk-Stander January 4 th, 2006.
Computer Science Detecting Memory Access Errors via Illegal Write Monitoring Ongoing Research by Emre Can Sezer.
Chapter 0.2 – Pointers and Memory. Type Specifiers  const  may be initialised but not used in any subsequent assignment  common and useful  volatile.
Lecture slides prepared for “Computer Security: Principles and Practice”, 3/e, by William Stallings and Lawrie Brown, Chapter 10 “Buffer Overflow”.
Computer Science and Software Engineering University of Wisconsin - Platteville 2. Pointer Yan Shi CS/SE2630 Lecture Notes.
Storage Management. The stack and the heap Dynamic storage allocation refers to allocating space for variables at run time Most modern languages support.
Use of Coverity & Valgrind in Geant4 Gabriele Cosmo.
1 C++ Classes and Data Structures Jeffrey S. Childs Chapter 4 Pointers and Dynamic Arrays Jeffrey S. Childs Clarion University of PA © 2008, Prentice Hall.
Pointers and Dynamic Memory Allocation Copyright Kip Irvine 2003, all rights reserved. Revised 10/28/2003.
1 Dynamic Memory Allocation –The need –malloc/free –Memory Leaks –Dangling Pointers and Garbage Collection Today’s Material.
Dynamic memory allocation and Pointers Lecture 4.
Can’t provide fast insertion/removal and fast lookup at the same time Vectors, Linked Lists, Stack, Queues, Deques 4 Data Structures - CSCI 102 Copyright.
Buffer Overflow Proofing of Code Binaries By Ramya Reguramalingam Graduate Student, Computer Science Advisor: Dr. Gopal Gupta.
Standard Template Library The Standard Template Library was recently added to standard C++. –The STL contains generic template classes. –The STL permits.
CSCI Rational Purify 1 Rational Purify Overview Michel Izygon - Jim Helm.
Protecting C Programs from Attacks via Invalid Pointer Dereferences Suan Hsi Yong, Susan Horwitz University of Wisconsin – Madison.
Computer Graphics 3 Lecture 1: Introduction to C/C++ Programming Benjamin Mora 1 University of Wales Swansea Pr. Min Chen Dr. Benjamin Mora.
CSE 374 Programming Concepts & Tools Hal Perkins Fall 2015 Lecture 10 – C: the heap and manual memory management.
1 Lecture07: Memory Model 5/2/2012 Slides modified from Yin Lou, Cornell CS2022: Introduction to C.
CS 106 Introduction to Computer Science I 03 / 02 / 2007 Instructor: Michael Eckmann.
ICOM 4035 – Data Structures Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 – August 23, 2001.
MORE POINTERS Plus: Memory Allocation Heap versus Stack.
SEQUENTIAL AND OBJECT ORIENTED PROGRAMMING Arrays.
Eliminating External Fragmentation in a Non-Moving Garbage Collector for Java Author: Fridtjof Siebert, CASES 2000 Michael Sallas Object-Oriented Languages.
Code improvement: Coverity static analysis Valgrind dynamic analysis GABRIELE COSMO CERN, EP/SFT.
Dynamic Allocation in C
YAHMD - Yet Another Heap Memory Debugger
CSE 374 Programming Concepts & Tools
Storage Management.
Checking Memory Management
Pointers Revisited What is variable address, name, value?
Dynamic Memory Allocation
Optimizing Malloc and Free
Pointers C#, pointers can only be declared to hold the memory addresses of value types int i = 5; int *p; p = &i; *p = 10; // changes the value of i to.
CETS: Compiler-Enforced Temporal Safety for C
Presentation transcript:

Backwards-Compatible Array Bounds Checking for C with Very Low Overhead Dinakar Dhurjati and Vikram Adve ICSE 2006 Itay Polack

Introduction Unsafe programming language gives unlimited freedom to programmers – Direct access to memory – Manual resource handling This has many benefits: – Performance – Flexibility – Simpler compilers

Resulting Problems The programmer is responsible for maintaining correct code – In particular, only access valid memory Stack, global or heap – But things can easily go wrong Accessing memory that was not allocated or already released Pointer arithmetic that goes out of bounds

Resulting Problems - Cont. Consequences are severe and unpredictable – Program crashes Unexpected Hard to debug – Unexpected behavior – Security vulnerabilities

Goal Detect out-of-bounds bugs Input-sensitve bugs – For example: string manipulation – Not always caught on development or testing systems – Detect on production systems – And then what?

Proposed Solution: Runtime Monitoring Monitor programs during runtime Detect out-of-bounds errors during runtime – Illegal pointer access – Out-of-bounds arrays access Crash and burn!

Runtime Monitoring Keep track of all pointers during runtime Detect illegal access and react immediately This is not easy to achieve – Performance cost – Memory cost – Compile time cost – Compatibility External libraries Legacy code

How Can We Achieve That? Runtime bounds checking – Add checks during compile time – Keep and validate pointers state during runtime Standard library function wrappers Optimizations!

Pointer Tracking Based on the ANSI-C standard – Pointers must point at valid memory – Pointer arithmetic result must stay within the same object or one byte after it We will keep track of all objects – For a given pointer value, search the object it's pointing on – Make sure that arithemtic operations are not getting out of the same object bounds(+1) Assign illegal pointer values for illegal operations – Immediate crash when the program tries to access it

Pointer Tracking - Example int *p = (int*)malloc(sizeof(int)*4); p[0] = 1; if (p[4] != 5) {... }

Pointer Tracking - Example int *p = (int*)my_alloc(sizeof(int)*4); int *tmp_p = bounds_check(p + 0); *tmp_p = 1; int *tmp_p2 = bounds_check(p + 4); if (*tmp_p2 != 5) {... }

Pointer Tracking - Example my_alloc(void *ptr, size_t size) { ptr = malloc(size); add_object(ptr, size); } bounds_check(void * ptr) { if (ptr == -2 || ! find_object(ptr)) return -2; return ptr; }

Pointer Tracking - One Off We are allowed to point one byte past an object – How can we distinguish pointing one byte past an object, and pointing on another one? – Solution: padding –...but what about backward compatibility ?

Pointer Tracking - One Off

Pointer Tracking - Data Structures What data structure should we use to store the memory objects mapping? Splay-tree: quick insertion and lookup, range searching, good locality – Provides O(log(N)) for basic operations – Use a global splay-tree for the whole application. What happens when N is large

Splay Tree - Reminder Binary tree Self-adjusting (splay) O(log(N)) amortized time for basic operations Splay operation – Re-arrange the tree - bring elements to the top Perform tree rotations – Faster access – Can perform on local variables

Monitoring Out-Of-Bounds Pointers: Improvement Real-world: many programs (~60%) do not follow the rules – Illegal values are fine if we do not access them – We need to keep track of pointers even if they are out-of- bounds Introduce out-of-bound objects – When pointer arithmetic operation results in illegal values, replace in a special out-of-bounds (OOB) object – Keep track of the pointer using this OOB object Holds the original pointer value, and the pre-OOB operation value Use a hash-table to map address-to-OOB – Restore the pointer when its value got back to safety

Maintaining OOB Objects Is Pricey Allocate a new object if arithmetic operation led to illegal value Search the OOB for any pointer arithmetic operation resulting in unknown memory All load/store operations must be checked for OOB De-allocation of any object requires extensive search – Any OOB might originally pointed on this object – Must search the whole OOB table

Are We Done? We have good detection of illegal pointer access, illegal arithmetic operations, and maybe more Backward compatibility is still an issue – Padding requirement Performance cost is very high, it is not really suitable for production – We can limit the checks to string operations only – Still not good enough

Introduction: Automatic Pool Allocation Original purpose: memory access optimization and easier analysis Allocate whole data-structures in a designated pool – All data-structure nodes are in the same memory pool – Locality – better cache and prefetching performance – Easier to analyze

Automatic Pool Allocation - Implementation Pointer analysis Build a data-structure graph – Each node represent memory object – Edge between memory objects that might point to each other – Merge nodes that point on the same data- structure Order all nodes of a data-structure subgraph in their own pool Pools are short-lived, and follow the call-graph

Automatic Pool Allocation

Leveraging Automatic Pool Allocation Reminder: previous works used a single data- structure to maintain all memory objects – Huge splay-tree - high performance cost Apply automatic pool allocation – Each pool will have its own splay-tree – Computed in compile time

Leveraging Automatic Pool Allocation Validating pointer arithmetic operations is much faster now – Result of pointer arithmetic must stay in the same pool – We only need to search one, smaller, splay tree – The pool ID is already known - low overhead

Leveraging Automatic Pool Allocation

Leveraging Automatic Pool Allocation - Challenges Do we always have the pool descriptor? – Casting – External code – Just ignore What about non-heap objects? – Global variables – Stack allocated objects – Create dummy pool descriptors

Handling Out-Of-Bound Objects Reminder: previous works used a special “out-of-bounds” objects – Keep track of pointer arithmetic operations that went out-of-bounds – Very high cost

Handling Out-Of-Bound Objects Assign special memory values to out-of-bounds pointers – Use a reserved range For example: kernel-reserved memory range – Unique address for each OOB pointer – Maintain an additional table for mapping those addresses to OOBs – Hash-table per pool Immediate crash on load/store – no need to monitor Very little to search on free

Compatibility With External Code The modifications we introduced cannot always work with external libraries Memory allocation and deallocation is changed – External libraries are not aware of it – Sometimes they modify variables Functions interfaces change – Functions passed as callbacks cannot change their interface

Compatibility - Solutions Do not change calls to external code Suspect pointers that were passed to external code – Check if they still reside in the same pool Callback functions – Maintain “checked” and “unchecked” versions of the function – Not always possible - exclude functions from bound checking

Library Functions Incorrect usage of library functions is extremely common Considered as an external code – But too important to skip Create instrumented standard library wrappers – Bounds checking based on parameters and pointers status – Optional

Library Functions - Example memcpy(void *p1, void *p2, size_t n) { // Is n > 0? // Are p1 and p2 valid? // Is (p1 + n) valid? // Is (p2 + n) valid? }

Library Functions - Challenges Wrapper functions need to be hand-crafted We don't always have all the information – For example: strlen() – Wrapper might not be always enough

More Optimizations Single-object elements objects are common – Scalar values – Single-element arrays – We still need to check for out-of-bounds errors Avoid entering such objects to splay-trees – Detection: pool size equals the object size – If it has no splay tree but belongs to the pool – it's a single-object element

And Even More Optimizations Caching – Very small cache, before even checking the splay-tree LICM – Do we really need to check the same object each loop iteration?

Implementation LLVM – Compiler infrastructure – Supports automatic pool allocation Apply optimizations and then use GCC for generating the binaries

Evaluation Performance – How are we doing compared to previous works? – How is the overall performance? Effectiveness – Did we spot all the bugs?

Evaluation - Benchmarks Use the Olden benchmark and Linux daemons for comparing performance – Common benchmark used in many relevant works Use Zitser's suite for testing the detection ratio

Evaluation - Baselines Baselines: standard compilation with no instrumentation We want to evaluate each of the steps – Are they really effective? – Pool allocation, with no bounds checking (PA) – Pool allocation, with bound checking (BoundsCheck) – Pool allocation with one pool – Pool allocation with one pool and bound checking

Evaluation – Performance Results

Evaluation - Discussion Automatic pool allocation by itself usually improves performances Average slowdown ratio is 12% – In some cases, it’s much worse – In some cases, it’s better – Why?

Evaluation – Efficiency Results All known bugs were found in the test- suite Checking standard-library functions was mandatory

Evaluation – Conclusions Low overhead in many scenarios – Could be useful for non-critical production systems – Is it possible to evaluate the possible overhead? Good bug-detection ratio – Still limited – Do we really cover anything?

Related Works– Augmented Pointers Pointers hold additional meta-data – Pointer base address and size – Efficient lookup Compatibility with external code is problematic – Need to strip pointers before calling external functions – Need manually written wrappers – What if external library modifies a global variable?

Related Works– Augmented Pointers Suggested improvement: decouple meta- data – Keep the pointers meta-data in a separate table – High performance cost – Global variables issue is not resolved

Related Works – Binary Instrumentation Tools such as Valgrind and Purify Binary instrumentation – No backward compatibility problem Performance cost is too high for production

Questions?

Discussion Is it ready for production ? What about other problems ? – Double-free? – Accessing initialized data? – Memory leaks? Could we use a better data-structure? – Hash-map with partial keys? Other suggestions?