Presentation on theme: "Undefined Behavior What happened to my code?"— Presentation transcript:
1Undefined Behavior What happened to my code? Xi Wang, Haogang Chen, Alvin Cheung, Zhihao Jia,Nickolai Zeldovich, Frans KaashoekMIT CSAIL Tsinghua IIIS
2Undefined behavior (UB) Many languages have UBC, C++, Haskell, Java, Scheme, …UB definition in the C standard“behavior, … for which this International Standard imposes no requirements”Compilers are allowed to generate any codeUB example: integer division by zero1 / 0 → trap (gcc/x86)1 / 0 → no trap (gcc/ppc, clang/any)
3UB allows generating efficient code x / y: on ppc (no hardware div trap)If division by zero is defined to trapif (y == 0)raise(SIGFPE);… = x / y;Compiler assumes no UB: y != 0
4UB leads to broken code Programmers invoke UB on purpose Break compiler’s no-UB assumptionProgrammers are unaware of UBOptimizations exploit UB unexpectedly
5Problem 1: programmers invoke UB Libgcrypt & Python: division by zeroif (x == 0)… = 1 / x; /* provoke a signal */Doesn’t work on ppc: no div trapDoesn’t work on ANY architectureif (x == 0)1 / xx != 0no UBclang: dead check
7Contributions Survey & identify 7 UB bug patterns in systems Sanity checks goneSanity checks reordered (after uses)Expressions rewritten & brokenHappen to major C compilersgcc, clang, icc, …With just -O2 (even -O0)
8Outline 7 UB bug patterns Finding UB is difficult Division by zeroOversized shiftSigned integer overflowOut-of-bounds pointerNull pointer dereferenceType-punned pointer dereferenceUninitialized readFinding UB is difficultResearch opportunitiesBetter language & tools
9Bug 1: signed integer overflow INT_MAX11..10..01110..0INT_MIN (wrap)11..1INT_MAX (saturate)(trap)
10Signed integer overflow in C Undefined behaviorCompilers assume no signed integer overflowPost-overflow check x + 1 < xCommon idiom for unsigned integersDoesn’t work for signed integers→ false
11Broken overflow check in Linux kernel signed long offset = <userspace>; signed long len = <userspace>; /* Reject negative values */ if (offset < 0 || len <= 0) return -EINVAL; /* Sum too large? */ if (offset + len > s_maxbytes) return -EFBIG; /* Check for wrap through zero too */ if (offset + len < 0) /* Allocate offset + len bytes */gcc: offset >= 0len > 0①offset + len > 0(no signed overflow)②③gcc: if (false)
12Signed overflow is widely misused A lot of systems got bitten by signed overflowglibc, MySQL, PostgreSQL, …IntegerLib & SafeInt from security expertsMany analysis tools get this wrongKLEE & clang static analyzerConclude x + 1 < x when x = INT_MAX!
13Bug 2: uninitialized read A local variable in C is uninitializedHold a random value?Undefined behaviorAssign arbitrary value to uninitialized variableAssign arbitrary value to derived expression
14Seeding random numbers in BSD libc struct timeval tv; unsigned long junk; /* XXX left uninitialized on purpose */ gettimeofday(&tv, NULL); srandom((getpid() << 16) ^ tv.tv_sec ^ tv.tv_usec ^ junk);SeedProcess IDTimeJunk value
15Seeding random numbers in BSD libc struct timeval tv; unsigned long junk; /* XXX left uninitialized on purpose */ gettimeofday(&tv, NULL); srandom((getpid() << 16) ^ tv.tv_sec ^ tv.tv_usec ^ junk);(getpid() << 16) ^ tv.tv_sec ^ tv.tv_usec ^clang: expression undefined
16Uninitialized read for randomness More predictable random numbersAlso appeared in:FreeBSD (DragonFly/OS X), Varnish, Android app
17Finding bugs due to UB is challenging Slip through code reviewEven with comments to explain why “correct”Hard to confirm broken checksInspect assembly codeHow to help programmers?
18Existing solutions & ideas Disable optimizationsWarn against UB bugsRevise the C language
19Approach 1: disable optimizations GCC & Clang: incomplete workaroundsForce to wrap for signed integers: -fwrapvNo workarounds for division by zero, etc.Workarounds don’t work for other compilersIntel’s icc doesn’t have –fwrapvFix code if programmers want portabilityHurt performance?
20Performance impact is inconclusive Our results on SPECint 2006Only on 2 out of 12 programsUp to 7.2% (GCC) and 11.8% (Clang) slowdownFix slowdown by minor code changesMixed reports from others0 ~ 50% slowdownNoticeable on Android
21Approach 2: generate good warnings if (x == 0) x = 1 / x; /* provoke a signal */ ^ Warning: division by zeroSo what? I did it intentionally!
22A better warningif (x == 0) ^^^^^^ x = 1 / x; /* provoke a signal */ ~~~~^ Warning: eliminating check (x == 0), under the assumption that x, used as a divisor, cannot be zero.
23Approach 3: safer C Disallow UB Any violation → trapCompilers insert checks for UBDefine UB from common impression (e.g., Java)Division by zero → trapSigned integer overflow → wrapUninitialized variable → zeroWill future compilers exploit other UB?Data race: hard to redefine
24Research questionsHow does disabling UB-based optimizations hurt performance?How to generate informative warnings on uses of UB?To what degree can we reduce UB in the language?
25Conclusion We identified 7 UB bug patterns Real-world systems with major compilersSubtle to notice and understandExisting solutions are insufficientPerformance impact evaluationBetter compiler warningsRevise UB in C