Presentation on theme: "Undefined Behavior What happened to my code?"— Presentation transcript:
1 Undefined Behavior What happened to my code? Xi Wang, Haogang Chen, Alvin Cheung, Zhihao Jia,Nickolai Zeldovich, Frans KaashoekMIT CSAIL Tsinghua IIIS
2 Undefined behavior (UB) Many languages have UBC, C++, Haskell, Java, Scheme, …UB definition in the C standard“behavior, … for which this International Standard imposes no requirements”Compilers are allowed to generate any codeUB example: integer division by zero1 / 0 → trap (gcc/x86)1 / 0 → no trap (gcc/ppc, clang/any)
3 UB allows generating efficient code x / y: on ppc (no hardware div trap)If division by zero is defined to trapif (y == 0)raise(SIGFPE);… = x / y;Compiler assumes no UB: y != 0
4 UB leads to broken code Programmers invoke UB on purpose Break compiler’s no-UB assumptionProgrammers are unaware of UBOptimizations exploit UB unexpectedly
5 Problem 1: programmers invoke UB Libgcrypt & Python: division by zeroif (x == 0)… = 1 / x; /* provoke a signal */Doesn’t work on ppc: no div trapDoesn’t work on ANY architectureif (x == 0)1 / xx != 0no UBclang: dead check
6 Problem 2: innocent UB consequence PostgreSQL & Rubyif (y == 0)my_raise(); /* call longjmp()never return */… = x / y;gcc: division is always reachable
7 Contributions Survey & identify 7 UB bug patterns in systems Sanity checks goneSanity checks reordered (after uses)Expressions rewritten & brokenHappen to major C compilersgcc, clang, icc, …With just -O2 (even -O0)
8 Outline 7 UB bug patterns Finding UB is difficult Division by zeroOversized shiftSigned integer overflowOut-of-bounds pointerNull pointer dereferenceType-punned pointer dereferenceUninitialized readFinding UB is difficultResearch opportunitiesBetter language & tools
9 Bug 1: signed integer overflow INT_MAX11..10..01110..0INT_MIN (wrap)11..1INT_MAX (saturate)(trap)
10 Signed integer overflow in C Undefined behaviorCompilers assume no signed integer overflowPost-overflow check x + 1 < xCommon idiom for unsigned integersDoesn’t work for signed integers→ false
11 Broken overflow check in Linux kernel signed long offset = <userspace>; signed long len = <userspace>; /* Reject negative values */ if (offset < 0 || len <= 0) return -EINVAL; /* Sum too large? */ if (offset + len > s_maxbytes) return -EFBIG; /* Check for wrap through zero too */ if (offset + len < 0) /* Allocate offset + len bytes */gcc: offset >= 0len > 0①offset + len > 0(no signed overflow)②③gcc: if (false)
12 Signed overflow is widely misused A lot of systems got bitten by signed overflowglibc, MySQL, PostgreSQL, …IntegerLib & SafeInt from security expertsMany analysis tools get this wrongKLEE & clang static analyzerConclude x + 1 < x when x = INT_MAX!
13 Bug 2: uninitialized read A local variable in C is uninitializedHold a random value?Undefined behaviorAssign arbitrary value to uninitialized variableAssign arbitrary value to derived expression
14 Seeding random numbers in BSD libc struct timeval tv; unsigned long junk; /* XXX left uninitialized on purpose */ gettimeofday(&tv, NULL); srandom((getpid() << 16) ^ tv.tv_sec ^ tv.tv_usec ^ junk);SeedProcess IDTimeJunk value
15 Seeding random numbers in BSD libc struct timeval tv; unsigned long junk; /* XXX left uninitialized on purpose */ gettimeofday(&tv, NULL); srandom((getpid() << 16) ^ tv.tv_sec ^ tv.tv_usec ^ junk);(getpid() << 16) ^ tv.tv_sec ^ tv.tv_usec ^clang: expression undefined
16 Uninitialized read for randomness More predictable random numbersAlso appeared in:FreeBSD (DragonFly/OS X), Varnish, Android app
17 Finding bugs due to UB is challenging Slip through code reviewEven with comments to explain why “correct”Hard to confirm broken checksInspect assembly codeHow to help programmers?
18 Existing solutions & ideas Disable optimizationsWarn against UB bugsRevise the C language
19 Approach 1: disable optimizations GCC & Clang: incomplete workaroundsForce to wrap for signed integers: -fwrapvNo workarounds for division by zero, etc.Workarounds don’t work for other compilersIntel’s icc doesn’t have –fwrapvFix code if programmers want portabilityHurt performance?
20 Performance impact is inconclusive Our results on SPECint 2006Only on 2 out of 12 programsUp to 7.2% (GCC) and 11.8% (Clang) slowdownFix slowdown by minor code changesMixed reports from others0 ~ 50% slowdownNoticeable on Android
21 Approach 2: generate good warnings if (x == 0) x = 1 / x; /* provoke a signal */ ^ Warning: division by zeroSo what? I did it intentionally!
22 A better warningif (x == 0) ^^^^^^ x = 1 / x; /* provoke a signal */ ~~~~^ Warning: eliminating check (x == 0), under the assumption that x, used as a divisor, cannot be zero.
23 Approach 3: safer C Disallow UB Any violation → trapCompilers insert checks for UBDefine UB from common impression (e.g., Java)Division by zero → trapSigned integer overflow → wrapUninitialized variable → zeroWill future compilers exploit other UB?Data race: hard to redefine
24 Research questionsHow does disabling UB-based optimizations hurt performance?How to generate informative warnings on uses of UB?To what degree can we reduce UB in the language?
25 Conclusion We identified 7 UB bug patterns Real-world systems with major compilersSubtle to notice and understandExisting solutions are insufficientPerformance impact evaluationBetter compiler warningsRevise UB in C