Published by Rylee Larkey. Modified over 2 years ago.

1
Saumya Debray
The University of Arizona
Tucson, AZ 85721

2
The Problem
Rapid analysis and understanding of malware code is essential for swift response to new threats
  ‒ Malicious software is usually heavily obfuscated against analysis
Existing approaches to reverse engineering such code are primitive
  ‒ not a lot of high-level tool support
  ‒ requires a lot of manual intervention
  ‒ slow, cumbersome, potentially error-prone
This delays the development of countermeasures

3
Goals
Develop automated techniques for analysis and reverse engineering of obfuscated binaries
semantics-based
  ‒ output is functionally equivalent to, but simpler than, the input program
generality
  ‒ should work on any obfuscation, even ones we haven’t thought of yet!
  ‒ should minimize assumptions about obfuscations

4
Challenges
can’t make assumptions about obfuscations
  ‒ what do we leverage for deobfuscation?
  ‒ distinguishing code we care about from code we don’t: how do we know which instructions we care about?
scale
  ‒ “needle in haystack”: no. of instructions executed increases by 270x (VMProtect) to 4300x (Themida) [Lau 2008]
anti-analysis defenses
  ‒ runtime unpacking
  ‒ anti-emulation, anti-debug checks

5
Our Approach
no obfuscation-specific assumptions
  ‒ treat programs as input-to-output transformations
  ‒ use semantics-preserving transformations to simplify execution traces
dynamic analysis to handle runtime unpacking
Pipeline (from the slide diagram): input program → control flow graph
  ‒ Taint analysis (bit-level): map flow of values from input to output
  ‒ Semantics-preserving transformations: simplify logic of input-to-output transformation
  ‒ Control flow reconstruction: reconstruct logic of simplified computation
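A minimal sketch of the taint-analysis idea, assuming an execution trace in which each entry records the location it writes and the locations it reads. The names `Instr` and `tainted_instructions` and the register names are hypothetical, and the sketch tracks whole locations rather than individual bits, so it is coarser than the bit-level analysis the slides describe.

```python
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    dest: str               # location written by this trace entry
    srcs: Tuple[str, ...]   # locations read by this trace entry

def tainted_instructions(trace, input_locs):
    """Return indices of trace entries that propagate input-derived values."""
    tainted = set(input_locs)
    relevant = []
    for i, ins in enumerate(trace):
        if any(s in tainted for s in ins.srcs):
            tainted.add(ins.dest)       # value derived from input
            relevant.append(i)
        else:
            tainted.discard(ins.dest)   # overwritten with a non-input value
    return relevant

# Hypothetical five-entry trace: two entries are chaff unrelated to the input.
trace = [
    Instr("eax", ("in0",)),         # 0: load input
    Instr("ebx", ()),               # 1: load a constant (chaff)
    Instr("ecx", ("ebx",)),         # 2: copy the constant (chaff)
    Instr("eax", ("eax", "eax")),   # 3: compute on the input value
    Instr("out", ("eax",)),         # 4: write the output
]
relevant = tainted_instructions(trace, {"in0"})
print(relevant)
```

Only the entries on the input-to-output path are kept; later stages would simplify those entries and rebuild a control flow graph from them.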

6
Ex 1: Emulation-based Obfuscation
examination of the code reveals only the emulator’s logic
  ‒ actual program logic embedded in byte code
lots of “chaff” during execution
  ‒ separating emulator logic from payload logic tricky
emulators can be nested
[Diagram labels: input program, random seed, Obfuscator, mutation engine, bytecode logic (data), emulator (code)]
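To illustrate why examining the code reveals only the emulator, here is a toy stack-machine sketch. The opcodes, the `run` function, and `PROGRAM` are all hypothetical and are not modeled on any real obfuscator's byte code.

```python
# Hypothetical opcodes for a toy stack machine.
PUSH, ADD, MUL, HALT = range(4)

def run(bytecode, x):
    """Fetch-decode-execute loop: the only logic visible as code."""
    stack, pc, dispatches = [x], 0, 0
    while True:
        op = bytecode[pc]
        dispatches += 1                 # one dispatch per emulated instruction
        if op == PUSH:
            stack.append(bytecode[pc + 1])
            pc += 2
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
            pc += 1
        elif op == HALT:
            return stack.pop(), dispatches
        else:
            raise ValueError(f"unknown opcode {op}")

# The payload logic f(x) = x * 3 + 1 lives here as data, not as code.
PROGRAM = [PUSH, 3, MUL, PUSH, 1, ADD, HALT]

result, dispatches = run(PROGRAM, 5)
print(result, dispatches)
```

A static look at `run` shows only dispatch logic; the computation itself is recoverable only by following how the input value flows through the executed instructions.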

7
Ex 2: Return-Oriented Programs (ROP)
Originally designed to bypass anti-code-injection defenses
  ‒ stitches together existing code fragments (“gadgets”), e.g., in system libraries
Logic can be difficult to discern
  ‒ gadgets are typically scattered across many different functions and/or libraries
  ‒ gadgets can overlap in memory in weird ways
  ‒ control flow structures (if-else, loops, function calls) are typically implemented using non-standard idioms
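A toy simulation of how a gadget chain expresses a computation, assuming a simplified machine in which each gadget ends with a return that pops the next gadget address from the stack. The gadget addresses, the `GADGETS` table, and `run_chain` are hypothetical.

```python
def pop_eax(state):
    state["eax"] = state["stack"].pop(0)   # 'pop eax; ret'

def pop_ebx(state):
    state["ebx"] = state["stack"].pop(0)   # 'pop ebx; ret'

def add_eax_ebx(state):
    state["eax"] += state["ebx"]           # 'add eax, ebx; ret'

# Hypothetical addresses: gadgets are scattered across unrelated code.
GADGETS = {0x1000: pop_eax, 0x2040: pop_ebx, 0x30F3: add_eax_ebx}

def run_chain(chain):
    """Execute a chain of gadget addresses interleaved with immediate data."""
    state = {"eax": 0, "ebx": 0, "stack": list(chain)}
    while state["stack"]:
        addr = state["stack"].pop(0)       # the 'ret' transfers to the next gadget
        GADGETS[addr](state)
    return state["eax"]

# The program "2 + 40" exists only as addresses and data on the stack.
chain = [0x1000, 2, 0x2040, 40, 0x30F3]
total = run_chain(chain)
print(total)
```

Nothing in the chain looks like conventional control flow, which is why the logic is hard to read directly off the trace.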

8
Example 1 (emulation-obfuscation)
factorial (Themida)

9
Example 2 (ROP)
factorial
[Figure panels: original, ROP]

10
Interactions between Obfuscations
Example: Unpacking + Emulation
[Diagram labels: unpack; input; output; execution trace; instructions “tainted” as propagating values from input to output; input-to-output computation (further simplified), used to construct control flow graph]

11
Results
Ex. 1. binary search: Themida
[Figure panels: original, obfuscated (cropped), deobfuscated]

12
Results
Ex. 2. Hunatcha (drive infection code): ExeCryptor
[Figure panels: original, obfuscated (cropped), deobfuscated]

13
Results
Ex. 3. Stuxnet (encryption routine): Code Virtualizer
[Figure panels: original, obfuscated (cropped), deobfuscated]

14
Results
Ex. 3. fibonacci: ROP
[Figure panels: original, obfuscated, deobfuscated]

15
Results
Ex. 4. Win32/Kryptik.OHY: Code Virtualizer
[Figure panels: obfuscated, deobfuscated; label: unpacking code]
  ‒ multiple layers of runtime code generation
  ‒ initial unpacker is emulation-obfuscated
  ‒ the CFG shown materializes incrementally

16
Results: CFG Similarity

17
Lessons and Issues
Static vs. dynamic analysis
  ‒ multiple layers of runtime code generation/unpacking limit the utility of static analysis
  ‒ dynamic analysis can run into problems of scale
      O(n²) algorithms impractical; even O(n log n) can be problematic
      trade memory space for execution time/complexity
      code coverage: multi-path exploration?
Taint propagation
  ‒ byte/word-level analyses may not be precise enough
      we use (enhanced) bit-level taint propagation
Simplified trace → CFG: NP-hard
  ‒ semantic considerations?
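A small sketch of why byte-level taint can be imprecise, assuming taint for an 8-bit value is kept as a bitmask with one taint bit per data bit. The helper names are hypothetical, and this is plain bit-level propagation, not the enhanced variant the slides mention.

```python
def and_const(taint, const):
    """AND with an untainted constant: only bits kept by the mask stay tainted."""
    return taint & const

def shr_const(taint, k):
    """Logical right shift by a constant: taint bits move with the data bits."""
    return taint >> k

def to_byte_level(taint):
    """Byte-granularity approximation: any tainted bit taints the whole byte."""
    return 0xFF if taint else 0x00

# Compute y = (x & 0x0F) >> 4 for a fully tainted input byte x.
bit_taint = 0xFF
bit_taint = and_const(bit_taint, 0x0F)   # low nibble still tainted
bit_taint = shr_const(bit_taint, 4)      # tainted bits shifted out

byte_taint = 0xFF
byte_taint = to_byte_level(and_const(byte_taint, 0x0F))  # whole byte stays tainted
byte_taint = to_byte_level(shr_const(byte_taint, 4))     # still tainted

print(hex(bit_taint), hex(byte_taint))
```

The result of the computation is always zero, so it carries no information from the input; the bit-level mask reflects that, while the byte-level approximation keeps the value tainted and drags unrelated instructions into the analysis.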

18
Conclusions
Rapid analysis and understanding of malware code is essential for swift response to new threats
  ‒ need to deal with advanced code obfuscations
  ‒ obfuscation-specific solutions tend to be fragile
We describe a semantics-based framework for automatic code deobfuscation
  ‒ no assumptions about the obfuscation(s) used
  ‒ promising results on obfuscators (e.g., Themida) not handled by prior research

20
Semantics-based simplification
Quasi-invariant locations: locations that have the same value at each use.
Our transformations (currently):
  ‒ Arithmetic simplification
      adaptation of constant folding to execution traces
      consider quasi-invariant locations as constants
      controlled to avoid over-simplification
  ‒ Data movement simplification
      use pattern-driven rules to identify and simplify data movement.
  ‒ Dead code elimination
      need to consider implicit destinations, e.g., condition code flags.
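A minimal sketch of quasi-invariant detection and trace-level constant folding, assuming each trace entry records the values it read and the result it produced. The record layout and the names `quasi_invariant` and `fold` are hypothetical, and the sketch omits the controls against over-simplification that the slide mentions (for example, it would also fold an input-dependent location that happens to be read only once).

```python
from collections import defaultdict

def quasi_invariant(trace):
    """Locations whose value is identical at every use (read) in the trace."""
    seen = defaultdict(set)
    for ins in trace:
        for loc, val in ins["reads"].items():
            seen[loc].add(val)
    return {loc for loc, vals in seen.items() if len(vals) == 1}

def fold(trace):
    """Replace entries that read only quasi-invariant locations with constants."""
    qi = quasi_invariant(trace)
    out = []
    for ins in trace:
        if ins["reads"] and all(loc in qi for loc in ins["reads"]):
            out.append({"op": "const", "dest": ins["dest"],
                        "reads": {}, "result": ins["result"]})
        else:
            out.append(ins)
    return out

# Hypothetical trace of two loop iterations: 'key' never changes, 'x' does.
trace = [
    {"op": "add1", "dest": "tmp", "reads": {"key": 7}, "result": 8},
    {"op": "mul", "dest": "acc", "reads": {"x": 5, "tmp": 8}, "result": 40},
    {"op": "add1", "dest": "tmp", "reads": {"key": 7}, "result": 8},
    {"op": "mul", "dest": "acc", "reads": {"x": 6, "tmp": 8}, "result": 48},
]
invariants = quasi_invariant(trace)
folded_ops = [ins["op"] for ins in fold(trace)]
print(sorted(invariants), folded_ops)
```

The entries that depend only on `key` collapse to constants, while the multiplications that read the varying location `x` are left intact.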
