iBinHunt: Binary Hunting with Inter-Procedural Control Flow
Jiang Ming (1), Meng Pan (2), and Debin Gao (3)
(1) College of Information Sciences and Technology, Penn State University
(2) D’Crypt Pte Ltd
(3) School of Information Systems, Singapore Management University
Introduction
Binary hunting: automatically finding semantic differences in binary programs
Need to capture semantic differences
– Differences in functionality (input-output behavior)
Syntactic differences cause false positives
– Differences in instructions
– Register allocation
– Basic-block reordering
– Variable renaming
– …
An example: gzip
Different instructions in the two versions, but with the same semantics, e.g., xor eax, eax vs. and ebx, 0
A patch of 5 lines of code (1), yet all 75 non-empty functions are changed
(1) Gzip Long File Name Buffer Overflow Vulnerability, http://www.securityfocus.com/bid/3712
Importance of Binary Hunting
Security applications of binary hunting:
Finding security vulnerabilities with patched binaries
– “BinHunt: Automatically finding semantic differences in binary programs”, ICICS 2008
Automatic patch-based exploit (1-day exploit) generation
– “Automatic Patch-Based Exploit Generation is Possible”, IEEE S&P 2008
Software plagiarism detection
– “GPLAG: detection of software plagiarism by program dependence graph analysis”, KDD 2006
Adapting trained anomaly detectors to software patches
– “Automatically adapting a trained anomaly detector to software patches”, RAID 2009
Malware analysis
– “Polymorphic worm detection using structural information of executables”, RAID 2005
– “Large-scale malware indexing using function-call graphs”, CCS 2009
…
Challenge
Source code of the binaries is not available
Function names extracted from the binaries are unreliable
A variety of obfuscation techniques
…
Latest solutions: find similarities/differences in control flow structure rather than in binary instructions
– Resistant to “superficial” changes
– Examples: BinDiff, BinHunt, DarunGrim, SMIT
Intra-procedural control flow vs. inter-procedural control flow
Intra-procedural control flow
– Most previous work focuses on intra-procedural control flow.
– The subgraph isomorphism problem is NP-complete.
– However, functions are small: e.g., 96% of the non-empty functions of thttpd have fewer than 30 basic blocks.
– Graph isomorphism is therefore practical for analyzing intra-procedural control flow.
Inter-procedural control flow
– No function boundaries
– A huge graph with a large number of nodes, for which graph isomorphism is impractical
– Example: thttpd-2.25 has more than 4,300 basic blocks in total, giving more than 4,000 candidate matchings for a single basic block.
Function Transformation Obfuscation
Function transformation obfuscation is well studied:
– Inlining functions
– Outlining functions
– Cloning functions
– Interleaving functions
Performing such obfuscation is simple and does not require intensive analysis of the binaries.
[Figure: Inlining and outlining transformations (1)]
(1) C. Collberg, C. Thomborson, and D. Low. A taxonomy of obfuscating transformations. Technical Report 148, Department of Computer Sciences, The University of Auckland, July 1997.
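The following is a rough illustration of why inlining and outlining defeat per-function comparison. It is a hypothetical Python toy, not an obfuscator for real binaries, and the checksum functions are invented for the example. The three variants split the same computation into functions differently, so their per-function control flow graphs differ, yet their input-output behavior is identical:

```python
def mix(total, byte):
    # Helper function used by the original layout.
    return (total * 31 + byte) & 0xFFFFFFFF

def checksum_original(data):
    # Original layout: the loop body calls a separate helper function.
    total = 0
    for byte in data:
        total = mix(total, byte)
    return total

def checksum_inlined(data):
    # Inlining: the helper's body is copied into the caller, so mix() disappears
    # from this function's call graph and the caller's own CFG grows.
    total = 0
    for byte in data:
        total = (total * 31 + byte) & 0xFFFFFFFF
    return total

def _loop(data, total):
    # Outlining target: a fragment of the caller moved into a new function.
    for byte in data:
        total = (total * 31 + byte) & 0xFFFFFFFF
    return total

def checksum_outlined(data):
    # Outlining: the caller shrinks to a single call.
    return _loop(data, 0)

samples = [b"", b"GET /index.html", bytes(range(256))]
for s in samples:
    assert checksum_original(s) == checksum_inlined(s) == checksum_outlined(s)
```

A tool that matches functions one-to-one has no single function in checksum_outlined whose CFG resembles checksum_inlined. Only the inter-procedural view sees the same loop.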
Advanced control flow obfuscation
Control flow flattening
– “Protection of software-based survivability mechanisms”, DSN 2001
– “An Approach to the Obfuscation of Control-Flow of Sequential Computer Programs”, ISC 2001
Redirecting control flow with exceptions
– “Binary Obfuscation Using Signals”, USENIX Security 2007
– “binOb+: a framework for potent and stealthy binary obfuscation”, AsiaCCS 2010
Function boundary information (intra-procedural control flow) is not reliable!
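The following toy Python sketch shows control flow flattening in the general spirit of the cited work, not the transformation of any specific tool. Each original block becomes one case of a dispatcher loop, and a state variable, rather than the program structure, decides which block runs next. The GCD example and the state names are hypothetical:

```python
def gcd_structured(a, b):
    # Ordinary structured control flow: a loop with a condition and a body.
    while b != 0:
        a, b = b, a % b
    return a

def gcd_flattened(a, b):
    # Flattened: all blocks hang off one dispatcher; edges between them are
    # replaced by assignments to `state`.
    state = "check"
    while True:
        if state == "check":
            state = "body" if b != 0 else "exit"
        elif state == "body":
            a, b = b, a % b
            state = "check"
        elif state == "exit":
            return a
```

Both functions return the same results. However, the flattened CFG is essentially a single hub, so structural matching against the original finds little in common.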
Overview of iBinHunt
iBinHunt: Binary Diffing with Inter-Procedural Control Flow Graphs
iBinHunt provides practical solutions to the large number of candidate basic block matchings:
– Dynamic taint: monitor the execution of the two binary programs under a common input and use taint analysis to record all basic blocks involved in processing that input.
– Deep taint: assign different taint tags to different parts of the input; only basic blocks from the two binary programs that are marked with the same taint tags are considered matching candidates (a reduction of up to 74%).
– Basic block comparison: symbolic execution first represents the outputs of each basic block in terms of its input symbols, and a theorem prover then checks whether the outputs of the two basic blocks are semantically equivalent.
– Automatic input generation: increases the coverage of tainted basic blocks by automatically generating inputs that lead to different execution traces.
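The following is a minimal sketch of the deep-taint filtering step. It assumes each recorded basic block has already been labeled with a set of taint tags. It also assumes that "same taint tags" means exact equality of tag sets, with untainted blocks (empty sets) only paired with each other. The block IDs and tag numbers are hypothetical:

```python
from collections import defaultdict

def candidate_pairs(blocks_a, blocks_b):
    """blocks_a / blocks_b map a basic-block id to its frozenset of taint tags.
    Only pairs with identical tag sets are kept as matching candidates."""
    by_tags = defaultdict(list)
    for bid, tags in blocks_b.items():
        by_tags[tags].append(bid)
    return [(a, b) for a, tags in blocks_a.items() for b in by_tags.get(tags, [])]

# Hypothetical tags: 1 = request method, 2 = URI, 3 = protocol version.
old = {"A1": frozenset({1}), "A2": frozenset({2}), "A3": frozenset({2, 3}), "A4": frozenset()}
new = {"B1": frozenset({1}), "B2": frozenset({2}), "B3": frozenset({2, 3}), "B4": frozenset()}

pairs = candidate_pairs(old, new)
print(len(pairs), "of", len(old) * len(new), "pairs remain")
```

Grouping one side by tag set keeps the filter roughly linear in the number of blocks, instead of testing every pair.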
Deep taint for basic block comparison
[Figure: inter-procedural control flow graphs; deep taint; deep taint execution trace; basic block comparison]
An example: thttpd
[Figure: an input and its taint tag colors; dynamic execution traces with deep taint]
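The following toy simulation shows how per-field taint tags could flow into basic blocks. It uses a hand-written trace and a dictionary as shadow memory. The real system tracks taint dynamically on binaries. Several elements here are assumptions for illustration: the tagging policy (one tag per space-separated request field), the trace format, and the rule that a block's tags are the union of the tags it writes:

```python
def tag_input(request):
    """Shadow memory for the input: one hypothetical taint tag per space-separated field."""
    shadow, field = {}, 0
    for i, ch in enumerate(request):
        if ch == " ":
            field += 1
            shadow[f"in[{i}]"] = frozenset()  # separators stay untainted in this sketch
        else:
            shadow[f"in[{i}]"] = frozenset({field})
    return shadow

def run_trace(trace, shadow):
    """Propagate taint along a trace of (block_id, [(dst, [srcs]), ...]).
    A destination gets the union of its sources' tags; a block's tags are the
    union of the tags it writes."""
    block_tags = {}
    for block_id, instructions in trace:
        tags = frozenset()
        for dst, srcs in instructions:
            written = frozenset().union(*(shadow.get(s, frozenset()) for s in srcs))
            shadow[dst] = written
            tags |= written
        block_tags[block_id] = tags
    return block_tags

# "GET /a HTTP/1.1": tag 0 = method, tag 1 = URI, tag 2 = protocol version.
trace = [
    ("BB_1", [("eax", ["in[0]"]), ("ebx", ["in[1]"])]),  # reads the method
    ("BB_2", [("ecx", ["in[4]", "in[5]"])]),            # reads the URI
    ("BB_3", [("edx", ["eax", "ecx"])]),                # mixes method and URI
    ("BB_4", [("esi", [])]),                            # loads a constant
]
print(run_trace(trace, tag_input("GET /a HTTP/1.1")))
```

BB_3 ends up with both tags 0 and 1, so under deep taint it could only be paired with a block that also combines the method and the URI.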
Basic block comparison
Symbolic execution and theorem proving
– Use symbolic execution to represent the final values of outputs (registers and variables)
– Use a theorem prover to test whether the outputs of two basic blocks are always the same given the same inputs
Context aware
– The permutation that matches the outputs of two equivalent basic blocks is used as the permutation of the inputs of their successor blocks.
Obtain the matching strength based on the result from the theorem prover.
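The following sketch checks basic-block equivalence up to a renaming of registers. It is a substitute for the slide's approach: instead of symbolic execution plus a theorem prover, it exhaustively evaluates the blocks over a reduced 4-bit word size, which is only feasible for toy blocks. The three blocks are hypothetical Python models that echo the xor eax, eax / and ebx, 0 example from the gzip slide:

```python
from itertools import permutations, product

BITS = 4                 # reduced word size so exhaustive checking is cheap
MASK = (1 << BITS) - 1

def block_a(eax, ecx):
    # xor eax, eax ; lea edx, [ecx+1]  -> outputs (eax, edx)
    return (eax ^ eax) & MASK, (ecx + 1) & MASK

def block_b(ebx, edi):
    # and ebx, 0 ; lea esi, [edi+1]    -> outputs (esi, ebx)
    return (edi + 1) & MASK, (ebx & 0) & MASK

def block_c(ebx, edi):
    # Not equivalent: adds 2 instead of 1.
    return (edi + 2) & MASK, (ebx & 0) & MASK

def equivalent(f, g, p, q, n_in):
    """True if f's output i equals g's output q[i] on every input, where g's
    input j is f's input p[j]."""
    for x in product(range(1 << BITS), repeat=n_in):
        fy = f(*x)
        gy = g(*(x[p[j]] for j in range(n_in)))
        if any(fy[i] != gy[q[i]] for i in range(len(fy))):
            return False
    return True

def find_matching(f, g, n_in, n_out):
    """Return the first (input permutation, output permutation) that makes
    f and g equivalent, or None if no such pair exists."""
    for p in permutations(range(n_in)):
        for q in permutations(range(n_out)):
            if equivalent(f, g, p, q, n_in):
                return p, q
    return None

print(find_matching(block_a, block_b, 2, 2))
print(find_matching(block_a, block_c, 2, 2))
```

The output permutation found for a matched pair is the kind of mapping that the context-aware step would hand to the successor blocks as their input permutation.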
Basic block matching
We also need to consider two other groups of blocks when finding matched blocks:
– Blocks that are not semantically equivalent but have the same taint tags. They are very likely the differences between the two programs that iBinHunt is trying to locate; e.g., BB_13232 and BB_16184 are locations of a binary difference.
– Blocks that are not tainted but lie on the dynamic execution trace. This happens for various reasons, including limitations of taint analysis and blocks that do not directly process program inputs (e.g., signal processing).
Matching Strength
Basic blocks B1 and B2 are considered matched to one another if B1 and B2 have the same taint tags (possibly none), and
– B1 and B2 are semantically equivalent (as evaluated by symbolic execution and a theorem prover); or
– a predecessor of B1 and a predecessor of B2 match; or
– a successor of B1 and a successor of B2 match.
[Figure: predecessor and successor blocks]
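The following is a minimal sketch of the matching rule as a scoring function. The numeric levels 0 to 3 are hypothetical. They rank the three groups of matched blocks that the evaluation slides distinguish: semantically equivalent, both a predecessor and a successor matched, or only one of them matched. The Block class and the `equivalent` callback (standing in for the theorem-prover result) are also assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    name: str
    tags: frozenset
    preds: list = field(default_factory=list)  # names of predecessor blocks
    succs: list = field(default_factory=list)  # names of successor blocks

def match_strength(b1, b2, equivalent, matched):
    """Hypothetical strength: 3 = semantically equivalent, 2 = a predecessor pair
    and a successor pair matched, 1 = only one of them, 0 = no match.
    `matched` is a set of (name1, name2) pairs already known to match."""
    if b1.tags != b2.tags:  # same taint tags (possibly none) is a precondition
        return 0
    if equivalent(b1, b2):
        return 3
    pred = any((p1, p2) in matched for p1 in b1.preds for p2 in b2.preds)
    succ = any((s1, s2) in matched for s1 in b1.succs for s2 in b2.succs)
    if pred and succ:
        return 2
    return 1 if (pred or succ) else 0

def never_equal(a, b):
    # Pretend the theorem prover rejected this pair.
    return False

t = frozenset({2})
x1 = Block("X1", t, preds=["A1"], succs=["C1"])
x2 = Block("X2", t, preds=["A2"], succs=["C2"])
print(match_strength(x1, x2, never_equal, {("A1", "A2"), ("C1", "C2")}))
```

In a full tool this score would be recomputed as new pairs enter `matched`, so matches can spread outward from semantically equivalent anchors.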
Automatic Input Generation
[Figure: initial input "GET index.html HTTP/1.1 Host:"; concrete execution; symbolic execution; symbolic formula; constraint solver (STP); new input]
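The following toy sketch follows the input generation loop in the figure. It runs the program concretely and records the branch conditions along the path, which stand in for the symbolic formula. It then negates one condition and asks a solver for a new input. Several simplifications are assumptions: toy_parser is a hypothetical stand-in for thttpd's request parsing, branch predicates are Python lambdas, and exhaustive search over two-byte inputs replaces STP:

```python
from itertools import product

def toy_parser(inp, trace):
    """Hypothetical program under test; each branch records (predicate, outcome)."""
    def branch(pred):
        taken = pred(inp)
        trace.append((pred, taken))
        return taken
    if branch(lambda s: s[0] == ord("G")):
        if branch(lambda s: s[1] == ord("E")):
            return "GET-like"
        return "G-other"
    if branch(lambda s: s[0] == ord("P")):
        return "POST-like"
    return "unknown"

def solve(constraints, length):
    """Stand-in for a constraint solver: brute force over all inputs of `length` bytes."""
    for cand in product(range(256), repeat=length):
        if all(pred(cand) == want for pred, want in constraints):
            return bytes(cand)
    return None

def explore(seed, max_runs=20):
    """Keep the first k branch outcomes of each run, negate branch k, and queue
    any new input the solver finds."""
    worklist, seen, outcomes, runs = [seed], {seed}, set(), 0
    while worklist and runs < max_runs:
        inp = worklist.pop(0)
        runs += 1
        trace = []
        outcomes.add(toy_parser(inp, trace))
        for k, (pred, taken) in enumerate(trace):
            new = solve(list(trace[:k]) + [(pred, not taken)], len(inp))
            if new is not None and new not in seen:
                seen.add(new)
                worklist.append(new)
    return outcomes

print(sorted(explore(b"GE")))
```

Starting from one seed, the loop reaches all four paths of toy_parser. This mirrors how new inputs raise the coverage of tainted basic blocks.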
Evaluation
We applied iBinHunt to find semantic differences in several versions of thttpd and gzip. We evaluate two main aspects:
– Efficiency: how many basic blocks can be matched under our definition of matching strength, how many matchings are identified by deep taint, and how long it takes to find these matchings.
– Accuracy: confirm these differences by comparing them to the ground truth (program source code).
Different versions of thttpd and gzip (number of lines changed / total number of lines)
thttpd | 2.20 | 2.20c | 2.21 | 2.25
2.19 | 252/6059 | 254/5843 | 1483/6641 | 2908/7271
gzip | column versions garbled in the source ("18.104.22.168.131.40")
1.2.4 | 1317/4959 | 1351/4929 | 1446/4841
Matching basic blocks
We evaluate:
– Matched basic blocks that are semantically equivalent;
– Matched basic blocks that are not semantically equivalent but have both a predecessor and a successor matched;
– Basic blocks that are not semantically equivalent but have either a predecessor or a successor matched;
– The time taken by input generation and deep taint.
Effectiveness of deep taint
Results show that more than 34% (thttpd) and 67% (gzip) of the matched basic blocks contain the same taint tags:
– a large number of the matchings do contain the same taint tags;
– even though many basic blocks are not tainted by our limited number of program inputs, their neighbors are tainted in most cases, and these tainted neighbors help identify matchings.
Percentage of matched basic blocks with the same taint representation
thttpd | 2.20 | 2.20c | 2.21 | 2.25
2.19 | 34.8% | 38.2% | 39.9% | 37.4%
gzip | column versions garbled in the source ("22.214.171.124.131.40")
1.2.4 | 67.9% | 72.2% | 72.6%
Accuracy
BB_1371 from thttpd-2.19 should match BB_1689 in thttpd-2.25; both handle the “-i” argument. However, BB_1687 in thttpd-2.25 contains the same (types of) instructions, which confuses the binary diffing tool during matching.
Discussions
Limitations
– The power of iBinHunt is limited by imperfect basic block coverage.
– In our experiments with thttpd and gzip, some basic blocks remain uncovered even as we continue to generate new program inputs.
– Performance
Future work
– Further optimization of the code to improve efficiency
– Parallelizing dynamic taint tracking
– More in-depth binary difference analysis, in which (parts of) the programs are semantically equivalent only on a certain subset of the inputs
Conclusion
Introduce function obfuscation attacks on existing binary diffing tools that analyze the intra-procedural control flow of programs.
Propose a novel binary diffing tool, iBinHunt, which analyzes the inter-procedural control flow.
iBinHunt makes use of a novel technique called deep taint.