Saumya Debray The University of Arizona Tucson, AZ 85721.

2 The Problem
• Rapid analysis and understanding of malware code is essential for a swift response to new threats
  ‒ malicious software is usually heavily obfuscated against analysis
• Existing approaches to reverse engineering such code are primitive
  ‒ little high-level tool support
  ‒ require a lot of manual intervention
  ‒ slow, cumbersome, potentially error-prone
• This delays the development of countermeasures

3 Goals
Develop automated techniques for analysis and reverse engineering of obfuscated binaries
• semantics-based
  ‒ output is functionally equivalent to, but simpler than, the input program
• generality
  ‒ should work on any obfuscation
    • even ones we haven’t thought of yet!
  ‒ should minimize assumptions about obfuscations

4 Challenges
• can’t make assumptions about obfuscations
  ‒ what do we leverage for deobfuscation?
  ‒ distinguishing code we care about from code we don’t
    • how do we know which instructions we care about?
• scale
  ‒ “needle in haystack”
    • the number of instructions executed increases by roughly 270x (VMProtect) to roughly 4300x (Themida) [Lau 2008]
• anti-analysis defenses
  ‒ runtime unpacking
  ‒ anti-emulation, anti-debug checks

5 Our Approach
• no obfuscation-specific assumptions
  ‒ treat programs as input-to-output transformations
  ‒ use semantics-preserving transformations to simplify execution traces
• dynamic analysis to handle runtime unpacking
[Figure: input program → taint analysis (bit-level): map flow of values from input to output → semantics-preserving transformations: simplify logic of input-to-output transformation → control flow reconstruction: reconstruct logic of simplified computation → control flow graph]
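The last stage of this pipeline turns a (simplified) trace back into a control flow graph. As a point of reference only, the sketch below is the naive construction, assuming a trace is just a list of instruction addresses: distinct addresses become nodes, consecutive pairs become edges, and straight-line chains are merged into basic blocks. It is not the authors' reconstruction algorithm (slide 17 notes that the general problem is NP-hard), and the function names are hypothetical. Applied directly to an emulation-obfuscated trace, it would mostly recover the emulator's dispatch loop, which is why simplification comes first.

```python
from collections import defaultdict


def trace_edges(trace):
    """Naive CFG edges from a trace of instruction addresses:
    a -> b whenever b executed immediately after a."""
    succs, preds = defaultdict(set), defaultdict(set)
    for a, b in zip(trace, trace[1:]):
        succs[a].add(b)
        preds[b].add(a)
    return succs, preds


def basic_blocks(trace):
    """Group the distinct addresses of a non-empty trace into basic blocks."""
    succs, preds = trace_edges(trace)

    def is_leader(n):
        # A block starts at the entry, at a join point, or right after a branch.
        if n == trace[0] or len(preds[n]) != 1:
            return True
        (p,) = preds[n]
        return len(succs[p]) != 1

    blocks = []
    for n in dict.fromkeys(trace):       # distinct addresses, first-execution order
        if not is_leader(n):
            continue
        block, cur = [n], n
        while len(succs[cur]) == 1:      # extend while control flow is straight-line
            (nxt,) = succs[cur]
            if is_leader(nxt):
                break
            block.append(nxt)
            cur = nxt
        blocks.append(block)
    return blocks


# A loop body (0x14, 0x18) executed twice between an entry and an exit.
print(basic_blocks([0x10, 0x14, 0x18, 0x14, 0x18, 0x1C]))  # [[16], [20, 24], [28]]
```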

6 Ex. 1: Emulation-based Obfuscation
• examination of the code reveals only the emulator’s logic
  ‒ actual program logic is embedded in byte code
• lots of “chaff” during execution
  ‒ separating emulator logic from payload logic is tricky
• emulators can be nested
[Figure: Obfuscator diagram: input program, random seed, mutation engine → bytecode logic (data) + emulator (code)]
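To make “program logic embedded in byte code” concrete, here is a toy stack-machine emulator. It is hypothetical and far simpler than what Themida or Code Virtualizer generate; the seed-dependent opcode numbering stands in, loosely, for the random seed and mutation engine in the diagram. The factorial logic lives only in the bytecode data, while the code itself is a generic fetch-decode-dispatch loop.

```python
import random

OPS = ["PUSH", "LOAD", "STORE", "MUL", "SUB", "JZ", "JMP", "HALT"]


def make_opcode_map(seed):
    """Hypothetical per-build 'mutation': a seed-dependent opcode numbering."""
    codes = list(range(len(OPS)))
    random.Random(seed).shuffle(codes)
    return dict(zip(OPS, codes))


def compile_factorial(opmap):
    """Bytecode (data) for: acc = 1; while n != 0: acc *= n; n -= 1; return acc.
    Variable slot 0 holds n, slot 1 holds acc."""
    program = [
        ("PUSH", 1), ("STORE", 1),                            # 0-1: acc = 1
        ("LOAD", 0), ("JZ", 13),                              # 2-3: if n == 0 goto 13
        ("LOAD", 1), ("LOAD", 0), ("MUL", 0), ("STORE", 1),   # 4-7: acc = acc * n
        ("LOAD", 0), ("PUSH", 1), ("SUB", 0), ("STORE", 0),   # 8-11: n = n - 1
        ("JMP", 2),                                           # 12: loop
        ("LOAD", 1), ("HALT", 0),                             # 13-14: return acc
    ]
    return [(opmap[op], arg) for op, arg in program]


def run(bytecode, opmap, n):
    """The emulator (code): a generic fetch-decode-dispatch loop."""
    decode = {code: op for op, code in opmap.items()}
    slots, stack, pc = [n, 0], [], 0
    while True:
        code, arg = bytecode[pc]
        op = decode[code]
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(slots[arg])
        elif op == "STORE":
            slots[arg] = stack.pop()
        elif op in ("MUL", "SUB"):
            b, a = stack.pop(), stack.pop()
            stack.append(a * b if op == "MUL" else a - b)
        elif op == "JZ":
            if stack.pop() == 0:
                pc = arg
        elif op == "JMP":
            pc = arg
        elif op == "HALT":
            return stack.pop()


opmap = make_opcode_map(seed=42)
print(run(compile_factorial(opmap), opmap, 5))  # 120
```

Reading `run` reveals only how each opcode is interpreted; recovering “factorial” requires analyzing the bytecode or the execution trace, and every payload operation costs a full trip around the dispatch loop, which is where the chaff comes from.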

7 Ex. 2: Return-Oriented Programs (ROP)
• Originally designed to bypass anti-code-injection defenses
  ‒ stitches together existing code fragments (“gadgets”), e.g., in system libraries
• Logic can be difficult to discern
  ‒ gadgets are typically scattered across many different functions and/or libraries
  ‒ gadgets can overlap in memory in weird ways
  ‒ control flow structures (if-else, loops, function calls) are typically implemented using non-standard idioms
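A toy interpreter shows why ROP logic is hard to read: the “program” is a stack of gadget addresses and data, and an if-statement becomes arithmetic on the stack pointer. The gadget addresses and the four-gadget repertoire below are hypothetical; real gadgets are byte sequences ending in `ret` found in existing code, and real chains usually derive the skip distance from a comparison using further arithmetic gadgets, whereas here it is supplied directly to keep the sketch short.

```python
# Hypothetical gadget addresses; in a real chain these point at byte sequences
# ending in "ret" inside existing library code.
POP_EAX, POP_EBX, ADD_EAX_EBX, ADD_ESP_EAX = 0x1000, 0x2040, 0x3177, 0x40A2


def run_chain(stack):
    """Interpret a ROP chain; stack[sp] is the next 'return address'."""
    regs = {"eax": 0, "ebx": 0}
    sp = 0
    while sp < len(stack):
        gadget = stack[sp]            # the previous gadget's 'ret' pops this
        sp += 1
        if gadget == POP_EAX:         # pop eax; ret
            regs["eax"] = stack[sp]
            sp += 1
        elif gadget == POP_EBX:       # pop ebx; ret
            regs["ebx"] = stack[sp]
            sp += 1
        elif gadget == ADD_EAX_EBX:   # add eax, ebx; ret
            regs["eax"] += regs["ebx"]
        elif gadget == ADD_ESP_EAX:   # add esp, eax; ret (counted in slots, not bytes)
            sp += regs["eax"]
        else:
            raise ValueError(f"unknown gadget {gadget:#x}")
    return regs


def conditional_chain(skip):
    """'if skip == 0: ebx = 100', then 'eax = 7 + ebx', encoded purely as stack data."""
    return [POP_EAX, skip, ADD_ESP_EAX,
            POP_EBX, 100,             # jumped over when skip == 2
            POP_EAX, 7, ADD_EAX_EBX]


print(run_chain(conditional_chain(0))["eax"])  # 107
print(run_chain(conditional_chain(2))["eax"])  # 7
```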

8 Example 1 (emulation obfuscation): factorial (Themida)

9 Example 2 (ROP): factorial
[Figure panels: original, ROP]

10 Interactions between Obfuscations
Example: Unpacking + Emulation
[Figure: execution trace with unpack, input, and output phases; instructions “tainted” as propagating values from input to output form the input-to-output computation, which is further simplified and used to construct the control flow graph]
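The idea of keeping only the instructions that carry values from input to output can be sketched as a forward taint pass intersected with a backward pass from the outputs. This is an illustration under assumptions: each trace entry is reduced to one written location plus the locations it reads, taint is tracked per whole location rather than per bit, and the trace itself is made up. Instructions belonging to the unpacker or to the emulator's own bookkeeping never touch input-derived values, so they fall away.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    text: str
    dst: str        # location written
    srcs: tuple     # locations read


def forward_taint(trace, inputs):
    """Indices of instructions whose result is derived from an input."""
    tainted, marked = set(inputs), []
    for i, ins in enumerate(trace):
        if any(s in tainted for s in ins.srcs):
            tainted.add(ins.dst)
            marked.append(i)
        else:
            tainted.discard(ins.dst)   # overwritten by an input-independent value
    return marked


def backward_relevant(trace, outputs):
    """Indices of instructions that contribute to the final value of an output."""
    live, marked = set(outputs), []
    for i in range(len(trace) - 1, -1, -1):
        ins = trace[i]
        if ins.dst in live:
            live.discard(ins.dst)
            live.update(ins.srcs)
            marked.append(i)
    return marked[::-1]


def input_to_output(trace, inputs, outputs):
    fwd = set(forward_taint(trace, inputs))
    return [i for i in backward_relevant(trace, outputs) if i in fwd]


TRACE = [
    Insn("mov ecx, [key]", "ecx", ("key",)),           # unpacker
    Insn("xor [code], ecx", "code", ("code", "ecx")),  # unpacker
    Insn("mov eax, [in]", "eax", ("in",)),
    Insn("add eax, 5", "eax", ("eax",)),
    Insn("mov ebx, eax", "ebx", ("eax",)),             # tainted, but never reaches output
    Insn("add edx, 1", "edx", ("edx",)),               # emulator bookkeeping
    Insn("mov [out], eax", "out", ("eax",)),
]
print([TRACE[i].text for i in input_to_output(TRACE, {"in"}, {"out"})])
# ['mov eax, [in]', 'add eax, 5', 'mov [out], eax']
```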

11 Results
• Ex. 1. binary search: Themida
[Figure panels: original, obfuscated (cropped), deobfuscated]

12 Results
• Ex. 2. Hunatcha (drive infection code): ExeCryptor
[Figure panels: original, obfuscated (cropped), deobfuscated]

13 Results
• Ex. 3. Stuxnet (encryption routine): Code Virtualizer
[Figure panels: original, obfuscated (cropped), deobfuscated]

14 Results
• Ex. 4. fibonacci: ROP
[Figure panels: original, obfuscated, deobfuscated]

15 Results
• Ex. 5. Win32/Kryptik.OHY: Code Virtualizer
  ‒ multiple layers of runtime code generation
  ‒ initial unpacker is emulation-obfuscated
  ‒ the CFG shown materializes incrementally
[Figure panels: obfuscated, deobfuscated (unpacking code labeled)]

16 Results: CFG Similarity

17 Lessons and Issues
• Static vs. dynamic analysis
  ‒ multiple layers of runtime code generation/unpacking limit the utility of static analysis
  ‒ dynamic analysis can run into problems of scale
    • O(n²) algorithms are impractical; even O(n log n) can be problematic
    • trade memory space for execution time/complexity
    • code coverage: multi-path exploration?
• Taint propagation
  ‒ byte/word-level analyses may not be precise enough
    • we use (enhanced) bit-level taint propagation
• Simplified trace → CFG: NP-hard
  ‒ semantic considerations?
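To see why byte-level taint can be too coarse, here is a generic bit-level sketch in which each value carries a 32-bit mask of input-dependent bits. These are standard transfer rules for operations with a constant operand, plus a conservative rule for addition; the slides do not describe the authors' “enhanced” rules, so this is not their implementation. Byte-level tracking is modeled, as an assumption, by widening any tainted bit to its whole byte after every step. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def byte_coarsen(t):
    """Widen a bit-level taint mask to whole bytes (models byte-granular tracking)."""
    out = 0
    for i in range(4):
        if (t >> (8 * i)) & 0xFF:
            out |= 0xFF << (8 * i)
    return out


# Transfer rules for a tainted value combined with an untainted constant c (or shift k).
def t_and(t, c):
    return t & c            # result bits forced to 0 by c carry no input information


def t_or(t, c):
    return t & ~c & MASK32  # result bits forced to 1 by c carry no input information


def t_shr(t, k):
    return t >> k


def t_shl(t, k):
    return (t << k) & MASK32


def t_add(ta, tb):
    """Conservative rule for a + b (tb = taint of the other operand, 0 if constant):
    a carry can move taint from the lowest tainted bit into every higher bit."""
    t = ta | tb
    if t == 0:
        return 0
    lowest = t & -t
    return ~(lowest - 1) & MASK32


def propagate(ops, t, byte_level=False):
    """Apply (rule, operand) pairs in order, optionally coarsening after each step."""
    widen = byte_coarsen if byte_level else (lambda m: m)
    t = widen(t)
    for rule, arg in ops:
        t = widen(rule(t, arg))
    return t


# z = (x & 0x0F) >> 4 is always 0, whatever input byte sits in x's low 8 bits.
ops = [(t_and, 0x0F), (t_shr, 4)]
print(hex(propagate(ops, 0xFF)))                   # 0x0: z correctly untainted
print(hex(propagate(ops, 0xFF, byte_level=True)))  # 0xff: byte-level over-taints z
```

Over-tainting of this kind compounds in obfuscated code, which is full of masking and shifting, so spurious dependences pull emulator and unpacker instructions back into the input-to-output computation.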

18 Conclusions
• Rapid analysis and understanding of malware code is essential for a swift response to new threats
  ‒ need to deal with advanced code obfuscations
  ‒ obfuscation-specific solutions tend to be fragile
• We describe a semantics-based framework for automatic code deobfuscation
  ‒ no assumptions about the obfuscation(s) used
  ‒ promising results on obfuscators (e.g., Themida) not handled by prior research


20 Semantics-based simplification
• Quasi-invariant locations: locations that have the same value at each use.
• Our transformations (currently):
  ‒ Arithmetic simplification
    • adaptation of constant folding to execution traces
    • consider quasi-invariant locations as constants
    • controlled to avoid over-simplification
  ‒ Data movement simplification
    • use pattern-driven rules to identify and simplify data movement
  ‒ Dead code elimination
    • need to consider implicit destinations, e.g., condition code flags
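A minimal sketch of arithmetic simplification and dead code elimination on a straight-line trace follows. Everything here is an assumption for illustration: a hypothetical toy instruction set in which only `cmp` writes the flags, a trace format that records each operand's observed value, and “control” modeled simply as never treating input-tainted locations as constants. Data movement simplification is omitted, since the slide does not give its pattern rules.

```python
from typing import NamedTuple, Optional

MASK32 = 0xFFFFFFFF
ARITH = {
    "add": lambda a, b: (a + b) & MASK32,
    "sub": lambda a, b: (a - b) & MASK32,
    "xor": lambda a, b: a ^ b,
}


class Insn(NamedTuple):
    op: str                     # toy ISA: mov, add, sub, xor, cmp, jz, out
    dst: Optional[str]          # explicit destination (None for cmp, jz, out)
    srcs: tuple                 # location names (str) or immediates (int)
    vals: tuple                 # value of each source as observed in the trace
    writes_flags: bool = False  # implicit destination (only cmp in this toy ISA)
    reads_flags: bool = False


def quasi_invariants(trace, tainted):
    """Locations with the same value at every use, excluding input-tainted ones."""
    seen, varying = {}, set()
    for ins in trace:
        for s, v in zip(ins.srcs, ins.vals):
            if isinstance(s, str) and seen.setdefault(s, v) != v:
                varying.add(s)
    excluded = varying | set(tainted)
    return {s: v for s, v in seen.items() if s not in excluded}


def fold(trace, tainted):
    """Constant folding along the trace, treating quasi-invariants as constants."""
    qinv = quasi_invariants(trace, tainted)
    env = dict(qinv)            # locations currently known to hold a constant
    out = []
    for ins in trace:
        known = [s if isinstance(s, int) else env.get(s) for s in ins.srcs]
        foldable = ins.op == "mov" or ins.op in ARITH
        if ins.dst is not None and foldable and None not in known:
            result = known[0] if ins.op == "mov" else ARITH[ins.op](*known)
            env[ins.dst] = result
            out.append(Insn("mov", ins.dst, (result,), (result,)))
            continue
        if ins.dst is not None and ins.dst not in qinv:
            env.pop(ins.dst, None)  # dst now holds a non-constant value
        # Substitute any operands that are known constants.
        srcs = tuple(s if k is None else k for s, k in zip(ins.srcs, known))
        out.append(ins._replace(srcs=srcs))
    return out


def dce(trace):
    """Backward dead code elimination that also tracks the flags as a location."""
    live, flags_live, kept = set(), False, []
    for ins in reversed(trace):
        needed = (ins.op in ("jz", "out")
                  or (ins.dst is not None and ins.dst in live)
                  or (ins.writes_flags and flags_live))
        if not needed:
            continue
        live.discard(ins.dst)
        if ins.writes_flags:
            flags_live = False
        live.update(s for s in ins.srcs if isinstance(s, str))
        flags_live = flags_live or ins.reads_flags
        kept.append(ins)
    return kept[::-1]


TRACE = [
    Insn("mov", "ecx", (0x1234,), (0x1234,)),
    Insn("xor", "ecx", ("ecx", 0x1230), (0x1234, 0x1230)),  # chaff: ecx = 4
    Insn("mov", "eax", ("in",), (7,)),
    Insn("add", "eax", ("eax", "ecx"), (7, 4)),
    Insn("mov", "edx", ("eax",), (11,)),                     # dead copy
    Insn("cmp", None, ("eax", 0), (11, 0), writes_flags=True),
    Insn("jz", None, (), (), reads_flags=True),
    Insn("out", None, ("eax",), (11,)),
]
for ins in dce(fold(TRACE, tainted={"in", "eax", "edx"})):
    print(ins.op, ins.dst, ins.srcs)
```

Here the chaff computing `ecx` folds to the constant 4, the dead copy into `edx` and both writes to `ecx` disappear, and `cmp` survives only because `dce` treats the flags as an implicit destination read by `jz`; a DCE that looked only at explicit destinations would delete it.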
