Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, Armando Solar-Lezama MIT.

Presentation transcript:

Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior. Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, Armando Solar-Lezama (MIT CSAIL). 24th ACM SOSP (November 2013), Best Paper.

Outline: Introduction; Model for Unstable Code; Design & Implementation; Evaluation. 2013/11/26 A Seminar at Advanced Defense Lab

Introduction: The specifications of C-family languages designate certain code fragments as having undefined behavior, giving compilers the freedom to generate instructions that behave in arbitrary ways in those cases. Aiming for system programming, the specifications choose to trust programmers and assume that their code will never invoke undefined behavior.

Undefined Behavior in C. Notation: p, q, p': n-bit pointers; x, y: n-bit integers; a: array.

Compiler Optimization: One way in which compilers exploit undefined behavior is to optimize a program under the assumption that the program never invokes undefined behavior. Consequence: the original program and the optimized program can behave differently. We call code that the optimizer can discard in this way optimization-unstable code, or just unstable code for short.

Unstable Code Example: Vulnerability Note VU# (US-CERT) [link]. The compiler concludes that the check is always false.
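The code shown on this slide did not survive in the transcript. A commonly cited shape of the problem is a pointer wraparound check, sketched here as an assumption rather than as the slide's own code. The function names and the `buf_end` parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Unstable form: if buf + len points outside the underlying object, the
 * pointer arithmetic itself is undefined behavior. An optimizer may
 * therefore assume the comparison is always false and drop the check. */
bool wraps_unstable(const char *buf, size_t len)
{
    return buf + len < buf;
}

/* Stable form: compare lengths instead, so no out-of-bounds pointer is
 * ever formed. buf_end points one past the last valid byte. */
bool fits(const char *buf, const char *buf_end, size_t len)
{
    assert(buf <= buf_end);
    return len <= (size_t)(buf_end - buf);
}
```

With a 16-byte buffer `b`, `fits(b, b + 16, 17)` rejects the request without relying on pointer overflow, whereas the unstable form only gives a dependable answer when `buf + len` stays inside the object.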

Unstable Code Example (cont.): CVE [link], Linux kernel [LXR link]. The programmer put the check in the wrong place, but it can still work. The compiler concludes that the check is always false.
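The kernel code itself is not reproduced in the transcript. The pattern described, a check placed after the operation it is meant to guard, can be sketched as a null check that follows a dereference. The `struct device` type and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct device { int flags; };  /* hypothetical type for illustration */

/* Unstable form: dev is dereferenced before it is checked. Dereferencing
 * a null pointer is undefined behavior, so an optimizer may infer that
 * dev is non-null and delete the check that follows. */
int get_flags_unstable(struct device *dev)
{
    int flags = dev->flags;
    if (dev == NULL)
        return -1;
    return flags;
}

/* Stable form: check first, dereference afterwards. */
int get_flags(struct device *dev)
{
    if (dev == NULL)
        return -1;
    return dev->flags;
}
```

Only the second version is safe to call with `NULL`; the first behaves as intended only while the compiler happens to keep the check.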

Is this the programmers’ fault? Poor understanding of unstable code is a major obstacle to reasoning about system behavior. However, these bugs are quite subtle, and understanding them requires detailed knowledge of the language specification.

Is this the compilers’ fault? A story: GCC bug #30475 (2007/01/15) [link]. A GCC user: “This will create MAJOR SECURITY ISSUES in ALL MANNER OF CODE. I don’t care if your language lawyers tell you gcc is right.... FIX THIS! NOW!” A GCC developer: “I am not joking, the C standard explicitly says signed integer overflow is undefined behavior.... GCC is not going to change.”
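The slide does not reproduce the code under dispute. Since the developer's reply turns on signed integer overflow, a check of roughly the following shape illustrates the disagreement; this is an assumption, and the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Unstable form: when a + 100 overflows, the behavior is undefined for
 * signed int, so an optimizer may fold the whole comparison to false. */
bool overflows_unstable(int a)
{
    return a + 100 < a;
}

/* Stable form: test against the limit before adding, so the overflowing
 * addition is never evaluated. */
bool add_100_overflows(int a)
{
    return a > INT_MAX - 100;
}
```

The stable version gives the answer the user wanted and does so without depending on what the compiler does with overflow.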

Unstable Code Test: The default optimization level for release builds is -O

Model for Unstable Code

Approach for Identifying Unstable Code: Stack does this using a two-phase scheme. 1. Run optimizer O without taking advantage of undefined behavior, which resembles optimizations under C* (a C dialect in which fragments that are undefined in C have well-defined semantics). 2. Run optimizer O again, this time taking advantage of undefined behavior, which resembles (more aggressive) optimizations under C. Code that is optimized away only in the second phase is unstable.
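The two phases can be imitated on a toy scale. The sketch below is not Stack's implementation: it replaces the optimizer and solver with brute-force enumeration over an 8-bit signed domain, and it hard-codes one hypothetical fragment, the condition `x + 100 < x`. All function names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value of the condition `x + 100 < x` when 8-bit overflow wraps around,
 * standing in for the C* semantics. The sum is computed in int, so this
 * function itself never overflows. */
bool cond_wrapping(int x)
{
    int sum = x + 100;
    if (sum > INT8_MAX)
        sum -= 256;  /* emulate 8-bit two's-complement wraparound */
    return sum < x;
}

/* True when evaluating the fragment on 8-bit ints would overflow, that
 * is, when it would invoke undefined behavior under C. */
bool ub_condition(int x)
{
    return x + 100 > INT8_MAX;
}

/* Phase 1: may the condition be folded to false without assuming the
 * absence of undefined behavior? */
bool always_false_phase1(void)
{
    for (int x = INT8_MIN; x <= INT8_MAX; x++)
        if (cond_wrapping(x))
            return false;
    return true;
}

/* Phase 2: may it be folded to false when inputs that trigger undefined
 * behavior are assumed never to occur? */
bool always_false_phase2(void)
{
    for (int x = INT8_MIN; x <= INT8_MAX; x++)
        if (!ub_condition(x) && cond_wrapping(x))
            return false;
    return true;
}

/* The fragment is unstable if only phase 2 can eliminate it. */
bool is_unstable(void)
{
    return !always_false_phase1() && always_false_phase2();
}
```

For this fragment phase 1 cannot fold the condition (for example, x = 28 wraps and makes it true), while phase 2 can, so `is_unstable()` reports the fragment as unstable.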

Well-defined Program Assumption

Eliminating Unreachable Code

Simplifying Unnecessary Computation

Simplification Oracle: Boolean oracle: propose true and false in turn for a boolean expression, enumerating possible values. Algebra oracle: propose to eliminate common terms on both sides of a comparison if one side is a subexpression of the other; for example, propose y < 0 for x + y < x.
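An algebra-oracle proposal is only valid under the well-defined program assumption. The sketch below checks the proposal "x + y < x simplifies to y < 0" by brute force over 8-bit signed values, once with and once without the no-overflow assumption. The enumeration stands in for a solver query, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reduce an int to the 8-bit signed range with wraparound. Valid for
 * inputs in [-256, 254], which covers every sum of two 8-bit values. */
int wrap8(int v)
{
    if (v > INT8_MAX)
        v -= 256;
    if (v < INT8_MIN)
        v += 256;
    return v;
}

/* Returns true if `x + y < x` and `y < 0` agree on every pair of 8-bit
 * inputs considered. With assume_no_overflow set, pairs whose sum
 * overflows (undefined behavior in C) are skipped. */
bool algebra_proposal_holds(bool assume_no_overflow)
{
    for (int x = INT8_MIN; x <= INT8_MAX; x++) {
        for (int y = INT8_MIN; y <= INT8_MAX; y++) {
            int sum = x + y;
            bool overflow = sum < INT8_MIN || sum > INT8_MAX;
            if (overflow && assume_no_overflow)
                continue;
            if ((wrap8(sum) < x) != (y < 0))
                return false;
        }
    }
    return true;
}
```

The proposal holds when overflow is assumed away and fails otherwise (x = 100, y = 100 is a counterexample under wraparound), which is exactly the situation in which the simplified comparison counts as unstable.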

Limitation: An optimizer can also exploit the well-defined program assumption in forms other than elimination and simplification, which this model does not cover.

Design & Implementation: Implemented with LLVM and the Boolector solver.

Compiler Frontend: To reduce false warnings, Stack ignores compiler-generated code (for example, code produced by macro expansion or inlining) by tracking code origins, at the cost of missing possible bugs.

UB Condition Insertion: For each instruction that can invoke undefined behavior, Stack inserts a special function call into the IR at the corresponding instruction, passing the undefined-behavior condition as the argument: void bug_on(bool expr).
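The effect of the annotation can be pictured at the source level. In Stack the `bug_on` call is a marker in the IR that the analysis reads; the sketch below makes it executable purely for illustration, which is an assumption. The `ub_reached` flag and `annotated_div` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical recorder: set when an undefined-behavior condition holds. */
bool ub_reached = false;

/* Executable stand-in for the marker: note that the condition was true. */
void bug_on(bool expr)
{
    if (expr)
        ub_reached = true;
}

/* Signed division annotated with its two undefined-behavior conditions:
 * division by zero, and INT_MIN / -1, whose result does not fit in int. */
int annotated_div(int x, int y)
{
    ub_reached = false;
    bug_on(y == 0);
    bug_on(x == INT_MIN && y == -1);
    /* Skip the division when it would be undefined, so that this
     * illustration never executes undefined behavior itself. */
    return ub_reached ? 0 : x / y;
}
```

Each `bug_on` argument states when the following instruction would be undefined, which is the information a later analysis needs in order to assume that the condition never holds.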

Solver-based Algorithm: To implement these algorithms, Stack consults the Boolector solver to decide satisfiability for elimination and simplification queries. However, it is practically infeasible to compute these queries precisely for large programs. To address this challenge, Stack computes approximate queries by limiting the computation to a single function, using Tu and Padua’s algorithm.

Evaluation: 160 new bugs found (July 2012 to March 2013).

Analysis of Bug Reports: non-optimization bugs; urgent optimization bugs; time bombs; redundant code (false alarm).

Analysis of Bug Reports (cont.): Non-optimization bugs. Example: PostgreSQL [link]. Time bomb!

Precision: Kerberos: Stack produced 11 warnings; developers accepted every patch; false warning rate: 0/11. Postgres: Stack produced 68 warnings: 9 patches accepted; 29 patches in discussion (developers blamed compilers); 26 time bombs; 4 false warnings.

Performance: 64-bit Ubuntu (Linux); Intel Core i GHz; 24 GB memory; solver timeout: 5 s.

Prevalence of Unstable Code: All packages in the Debian Wheezy archive: 17,432. Containing C/C++ code: 8,575. Containing unstable code: 3,471 (40%). The analysis took 150 CPU-days.

Prevalence of Unstable Code (cont.)

Completeness: It is difficult to know precisely how much unstable code Stack would miss in general. We analyze what kinds of unstable code Stack misses, using a total of ten tests from real systems. Result: 7/10.

Q & A