Undefined Behavior: What Happened to My Code?

Xi Wang, Haogang Chen, Alvin Cheung, Zhihao Jia, Nickolai Zeldovich, Frans Kaashoek (MIT CSAIL, Tsinghua IIIS)

Undefined behavior (UB). Many languages have UB: C, C++, Haskell, Java, Scheme, … The C standard defines UB as “behavior, … for which this International Standard imposes no requirements”, so compilers are allowed to generate any code. Example: integer division by zero. 1 / 0 traps with gcc on x86, but does not trap with gcc on ppc or with clang on any architecture.

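UB allows generating efficient code. Consider x / y on ppc, which has no hardware division trap. If division by zero were defined to trap, the compiler would have to emit a check before every division:
    if (y == 0) raise(SIGFPE);
    … = x / y;
Because division by zero is UB, the compiler instead assumes no UB (y != 0) and emits only the division.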
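When a program needs a defined result for every input, the check has to be written by hand before the division. The following is a minimal sketch; `checked_div` is a hypothetical helper name, not something from the slides. It also rejects INT_MIN / -1, the other case in which signed division is undefined in C.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores x / y in *out and returns true only when
 * the division is defined; otherwise leaves *out untouched. */
bool checked_div(int x, int y, int *out)
{
    if (y == 0)
        return false;              /* division by zero is UB */
    if (x == INT_MIN && y == -1)
        return false;              /* quotient does not fit in int: UB */
    *out = x / y;
    return true;
}
```

A caller can then choose its own policy (report an error, saturate, abort) when the function returns false, rather than relying on what a given compiler and CPU happen to do.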

UB leads to broken code in two ways. Programmers invoke UB on purpose, which breaks the compiler’s no-UB assumption. Or programmers are unaware of UB, and optimizations exploit it unexpectedly.

Problem 1: programmers invoke UB. Libgcrypt and Python divide by zero on purpose:
    if (x == 0)
        … = 1 / x; /* provoke a signal */
This doesn’t work on ppc, which has no division trap. In fact it doesn’t work on ANY architecture: the division 1 / x implies x != 0 (no UB), so clang treats the check x == 0 as dead.
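If the intent is to deliver SIGFPE, the signal can be raised explicitly instead of through an undefined division. A minimal sketch, assuming the goal is simply "signal when x is zero"; `reciprocal_or_signal` is a hypothetical name, and `raise` is the standard C function from `<signal.h>`.

```c
#include <signal.h>

/* Hypothetical helper: returns 1 / x, raising SIGFPE explicitly when x == 0.
 * With the default disposition, raise(SIGFPE) terminates the program. */
int reciprocal_or_signal(int x)
{
    if (x == 0) {
        raise(SIGFPE);   /* defined behavior on every architecture */
        return 0;        /* reached only if the signal is ignored or handled */
    }
    return 1 / x;        /* x != 0 here, so the division is defined */
}
```

Since the check no longer guards an undefined operation, the compiler has no no-UB assumption from which to conclude that the check is dead.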

Problem 2: innocent UB consequence. PostgreSQL and Ruby:
    if (y == 0)
        my_raise(); /* calls longjmp(), never returns */
    … = x / y;
gcc cannot see that my_raise() never returns, so it treats the division as always reachable.
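One common remedy is to tell the compiler that the error routine does not return, so the division is no longer reachable when y == 0. The sketch below uses the C11 `_Noreturn` function specifier; `error_env` and `guarded_div` are hypothetical names, and this is not necessarily how PostgreSQL or Ruby fixed their code.

```c
#include <setjmp.h>

/* Hypothetical recovery point; the caller must setjmp(error_env) first. */
jmp_buf error_env;

/* _Noreturn (C11) promises the compiler that control never comes back. */
_Noreturn void my_raise(void)
{
    longjmp(error_env, 1);
}

int guarded_div(int x, int y)
{
    if (y == 0)
        my_raise();      /* never returns, so x / y below implies y != 0 */
    return x / y;
}
```

Putting the division in an explicit else branch is an alternative that works without C11.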

Contributions. We survey and identify 7 UB bug patterns in systems. Their effects: sanity checks gone, sanity checks reordered (after uses), and expressions rewritten and broken. They happen with major C compilers (gcc, clang, icc, …) with just -O2 (even -O0).

Outline. 7 UB bug patterns: division by zero, oversized shift, signed integer overflow, out-of-bounds pointer, null pointer dereference, type-punned pointer dereference, uninitialized read. Finding UB is difficult. Research opportunities: better language & tools.

Bug 1: signed integer overflow. [Figure: bit patterns for INT_MAX + 1. Possible outcomes: INT_MIN (wrap), INT_MAX (saturate), or a trap.]

Signed integer overflow in C is undefined behavior, so compilers assume it never happens. The post-overflow check x + 1 < x is a common idiom for unsigned integers, but it doesn’t work for signed integers: the compiler folds it to false.
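The contrast can be sketched as follows. Unsigned arithmetic wraps by definition, so the post-overflow test is valid there; for signed integers the test has to run before the addition, using only values that cannot overflow. The function names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Unsigned: a + b wraps modulo 2^N, so a post-overflow check is defined. */
bool uint_add_wraps(unsigned int a, unsigned int b)
{
    return a + b < a;
}

/* Signed: decide before adding, without ever computing a + b. */
bool int_add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* INT_MAX - b cannot overflow for b > 0 */
    return a < INT_MIN - b;       /* INT_MIN - b cannot overflow for b <= 0 */
}
```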

Broken overflow check in Linux kernel:
    signed long offset = <userspace>;
    signed long len = <userspace>;
    /* Reject negative values */
    if (offset < 0 || len <= 0)
        return -EINVAL;
    /* Sum too large? */
    if (offset + len > s_maxbytes)
        return -EFBIG;
    /* Check for wrap through zero too */
    if (offset + len < 0)
        …
    /* Allocate offset + len bytes */
gcc’s reasoning: ① after the first check, offset >= 0 and len > 0; ② so offset + len > 0 (no signed overflow); ③ the wrap check becomes if (false).
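A version of this check that survives optimization compares against LONG_MAX before adding. This is a sketch under assumptions: `validate_range` and its `max_bytes` parameter (standing in for s_maxbytes) are hypothetical, and it is not the actual kernel patch.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-in for the kernel check; returns 0 when the range
 * [offset, offset + len) is acceptable, or a negative errno value. */
int validate_range(long offset, long len, long max_bytes)
{
    /* Reject negative values */
    if (offset < 0 || len <= 0)
        return -EINVAL;
    /* offset >= 0 here, so LONG_MAX - offset cannot overflow */
    if (len > LONG_MAX - offset)
        return -EFBIG;            /* offset + len would overflow */
    /* Sum too large? The addition is now known to be safe. */
    if (offset + len > max_bytes)
        return -EFBIG;
    return 0;
}
```

The overflow test comes before the first use of offset + len, so no later check depends on an expression the compiler is entitled to assume away.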

Signed overflow is widely misused. A lot of systems got bitten by signed overflow: glibc, MySQL, PostgreSQL, … and even IntegerLib and SafeInt, from security experts. Many analysis tools get this wrong too: KLEE and the clang static analyzer conclude x + 1 < x when x = INT_MAX!

Bug 2: uninitialized read. A local variable in C is uninitialized. Does it hold a random value? No: reading it is undefined behavior. The compiler may assign an arbitrary value to the uninitialized variable, and to any expression derived from it.

Seeding random numbers in BSD libc:
    struct timeval tv;
    unsigned long junk; /* XXX left uninitialized on purpose */
    gettimeofday(&tv, NULL);
    srandom((getpid() << 16) ^ tv.tv_sec ^ tv.tv_usec ^ junk);
The seed is meant to combine the process ID, the time, and a junk value. Because junk is read uninitialized, clang treats the whole expression (getpid() << 16) ^ tv.tv_sec ^ tv.tv_usec ^ junk as undefined.
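A seed computation with no uninitialized read simply drops the junk term. The sketch below is not the BSD libc fix; `mix_seed` and `seed_random` are hypothetical names. It also converts the process ID to an unsigned type before shifting, since shifting a signed value can itself overflow.

```c
#define _XOPEN_SOURCE 700   /* for srandom() and gettimeofday() in strict modes */
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

/* Combine the inputs using unsigned arithmetic only, which cannot overflow. */
unsigned long mix_seed(unsigned long pid, unsigned long sec, unsigned long usec)
{
    return (pid << 16) ^ sec ^ usec;
}

/* Seed random() from the process ID and the current time; every input
 * is initialized before it is read. */
void seed_random(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    srandom((unsigned int)mix_seed((unsigned long)getpid(),
                                   (unsigned long)tv.tv_sec,
                                   (unsigned long)tv.tv_usec));
}
```

This seed is only as unpredictable as the PID and the clock; code that needs real entropy should read it from an operating-system source instead.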

Uninitialized read for randomness. The result: more predictable random numbers. Also appeared in: FreeBSD (DragonFly/OS X), Varnish, Android app.

Finding bugs due to UB is challenging. They slip through code review, even with comments explaining why the code is “correct”. Broken checks are hard to confirm: one has to inspect the assembly code. How can we help programmers?

Existing solutions & ideas: disable optimizations; warn against UB bugs; revise the C language.

Approach 1: disable optimizations. GCC and Clang offer incomplete workarounds: -fwrapv forces signed integers to wrap, but there are no workarounds for division by zero, etc. The workarounds don’t work for other compilers: Intel’s icc doesn’t have -fwrapv. Programmers who want portability must fix their code. Does disabling optimizations hurt performance?
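Where wrapping is the intended behavior and a compiler flag cannot be relied on, the wrap can be written in the code itself by routing the addition through unsigned arithmetic. A minimal sketch, assuming a two's complement int (so that UINT_MAX == 2 * INT_MAX + 1); `wrap_add` is a hypothetical helper.

```c
#include <limits.h>

/* Wrapping signed addition that does not depend on -fwrapv. */
int wrap_add(int a, int b)
{
    /* Unsigned addition is defined to wrap modulo 2^N. */
    unsigned int sum = (unsigned int)a + (unsigned int)b;

    if (sum <= (unsigned int)INT_MAX)
        return (int)sum;          /* value fits in int, conversion is exact */
    /* Map the upper half of the unsigned range onto [INT_MIN, -1]
     * using only in-range values. */
    return (int)(sum - (unsigned int)INT_MAX - 1u) + INT_MIN;
}
```

For example, wrap_add(INT_MAX, 1) yields INT_MIN, which is what -fwrapv would give for INT_MAX + 1.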

Performance impact is inconclusive. Our results on SPECint 2006: a slowdown on only 2 out of 12 programs, up to 7.2% (GCC) and 11.8% (Clang), and minor code changes fix the slowdown. Mixed reports from others: 0 ~ 50% slowdown; noticeable on Android.

Approach 2: generate good warnings.
    if (x == 0)
        x = 1 / x; /* provoke a signal */
              ^
    Warning: division by zero
So what? I did it intentionally!

A better warning:
    if (x == 0)
        ^^^^^^
        x = 1 / x; /* provoke a signal */
            ~~~~^
    Warning: eliminating check (x == 0), under the assumption that x, used as a divisor, cannot be zero.

Approach 3: safer C. One option is to disallow UB: any violation traps, and compilers insert checks for UB. Another is to define UB from common impression (e.g., Java): division by zero → trap; signed integer overflow → wrap; uninitialized variable → zero. Will future compilers exploit other UB? Data races are hard to redefine.

Research questions. How does disabling UB-based optimizations hurt performance? How to generate informative warnings on uses of UB? To what degree can we reduce UB in the language?

Conclusion. We identified 7 UB bug patterns in real-world systems with major compilers; they are subtle to notice and understand. Existing solutions are insufficient. Directions: performance impact evaluation, better compiler warnings, and revising UB in C.