Correct RelocationMarch 20, 2016 Correct Relocation: Do You Trust a Mutated Binary? Drew Bernat

Slides:



Advertisements
Similar presentations
Smashing the Stack for Fun and Profit
Advertisements

Register Allocation CS 320 David Walker (with thanks to Andrew Myers for most of the content of these slides)
© 2006 Nathan RosenblumMarch 2006Unconventional Code Constructs The New Dyninst Code Parser: Binary Code Isn't as Simple as it Used to Be Nathan Rosenblum.
David Brumley Carnegie Mellon University Credit: Some slides from Ed Schwartz.
Native x86 Decompilation Using Semantics-Preserving Structural Analysis and Iterative Control-Flow Structuring Edward J. Schwartz *, JongHyup Lee ✝, Maverick.
Register Allocation CS 671 March 27, CS 671 – Spring Register Allocation - Motivation Consider adding two numbers together: Advantages: Fewer.
ITEC 352 Lecture 27 Memory(4). Review Questions? Cache control –L1/L2  Main memory example –Formulas for hits.
C Programming and Assembly Language Janakiraman V – NITK Surathkal 2 nd August 2014.
Assembly Code Verification Using Model Checking Hao XIAO Singapore University of Technology and Design.
Pipelined Profiling and Analysis on Multi-core Systems Qin Zhao Ioana Cutcutache Weng-Fai Wong PiPA.
TaintCheck and LockSet LBA Reading Group Presentation by Shimin Chen.
PipelinedImplementation Part I CSC 333. – 2 – Overview General Principles of Pipelining Goal Difficulties Creating a Pipelined Y86 Processor Rearranging.
PC hardware and x86 3/3/08 Frans Kaashoek MIT
Accessing parameters from the stack and calling functions.
Position Independent Code self sufficiency of combining program.
Assembly תרגול 8 פונקציות והתקפת buffer.. Procedures (Functions) A procedure call involves passing both data and control from one part of the code to.
Practical Session 8 Computer Architecture and Assembly Language.
Machine-Level Programming 3 Control Flow Topics Control Flow Switch Statements Jump Tables.
6.828: PC hardware and x86 Frans Kaashoek
Computer Architecture and Operating Systems CS 3230 :Assembly Section Lecture 7 Department of Computer Science and Software Engineering University of Wisconsin-Platteville.
Paradyn Project Dyninst/MRNet Users’ Meeting Madison, Wisconsin August 7, 2014 The Evolution of Dyninst in Support of Cyber Security Emily Gember-Jacobson.
A Model for Self-Modifying Code Bertrand Anckaert, Matias Madou and Koen De Bosschere 8 th Information Hiding Conference, July 11 th 2006.
University of Washington x86 Programming III The Hardware/Software Interface CSE351 Winter 2013.
Today’s topics Parameter passing on the system stack Parameter passing on the system stack Register indirect and base-indexed addressing modes Register.
Analysis Of Stripped Binary Code Laune Harris University of Wisconsin – Madison
1 Code Generation Part II Chapter 9 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2005.
Machine-Level Programming 3 Control Flow Topics Control Flow Switch Statements Jump Tables.
Analyzing Memory Accesses in Obfuscated x86 Executables Michael Venable Mohamed R. Choucane Md. Enamul Karim Arun Lakhotia (Presenter) DIMVA 2005 Wien.
Buffer Overflow Proofing of Code Binaries By Ramya Reguramalingam Graduate Student, Computer Science Advisor: Dr. Gopal Gupta.
CS412/413 Introduction to Compilers and Translators April 14, 1999 Lecture 29: Linking and loading.
Introduction to Information Security ROP – Recitation 5.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2010 Binary Concolic Execution for Automatic Exploit Generation Todd Frederick.
Buffer Overflow Attack- proofing of Code Binaries Ramya Reguramalingam Gopal Gupta Gopal Gupta Department of Computer Science University of Texas at Dallas.
Compiler Construction Code Generation Activation Records
Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.
© 2006 Andrew R. BernatMarch 2006Generalized Code Relocation Generalized Code Relocation for Instrumentation and Efficiency Andrew R. Bernat University.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 29-May 1, 2013 Detecting Code Reuse Attacks Using Dyninst Components Emily Jacobson, Drew.
1 Assembly Language: Function Calls Jennifer Rexford.
Calling Procedures C calling conventions. Outline Procedures Procedure call mechanism Passing parameters Local variable storage C-Style procedures Recursion.
Carnegie Mellon Midterm Review : Introduction to Computer Systems October 15, 2012 Instructor:
Practical Session 8. Position Independent Code- self sufficiency of combining program Position Independent Code (PIC) program has everything it needs.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2010 Paradyn Project Safe and Efficient Instrumentation Andrew Bernat.
Recitation 3: Procedures and the Stack
Instruction Set Architecture
Introduction to Information Security
Introduction to programming languages, Algorithms & flowcharts
C function call conventions and the stack
Introduction to programming languages, Algorithms & flowcharts
Performance Optimizations in Dyninst
Conditional Branch Example
Introduction to Information Security
143A: Principles of Operating Systems Lecture 4: Calling conventions
Exploiting & Defense Day 2 Recap
Optimization Code Optimization ©SoftMoore Consulting.
Introduction to Compilers Tim Teitelbaum
Chapter 3 Machine-Level Representation of Programs
Introduction to programming languages, Algorithms & flowcharts
asum.ys A Y86 Programming Example
Computer Architecture and Assembly Language
The Runtime Environment
Optimizing Your Dyninst Program
The Runtime Environment
Efficient x86 Instrumentation:
Miscellaneous Topics.
Chapter 3 Machine-Level Representation of Programs
X86 Assembly Review.
ICS51 Introductory Computer Organization
Computer Architecture and System Programming Laboratory
Computer Architecture and Assembly Language
Return-to-libc Attacks
Presentation transcript:

Correct RelocationMarch 20, 2016 Correct Relocation: Do You Trust a Mutated Binary? Drew Bernat

Correct Relocation -2- Binary Manipulation We want to: –Insert new code –Modify or delete code –These operations move program code Binaries are brittle –Code movement may affect program semantics We want to move code without breaking the program

Correct Relocation -3- Relocation Relocation moves code while maintaining its original execution semantics –May radically transform the code Does not rely on external information Binary tools use relocation extensively –Execute original + relocated code (Dyninst) –Always execute relocated code (PIN, Valgrind, DynamoRIO, VMWare, DELI) Relocation is critical for binary manipulation

Correct Relocation -4- 0x5000: mov 0x1011, ebx... 0x5000: mov 0x1011, ebx... 0x3000: mov 0x8(ebp), eax... 0x3000: mov 0x8(ebp), eax... 0x4000: ja 0x x4000: ja 0x x5000: call ebx_thunk foo: 0x1000: push ebp 0x1001: mov esp, ebp 0x1003: mov 0x8(ebp), eax 0x1006: cmp 0x5, eax 0x1009: ja 0x30 0x100b: call ebx_thunk 0x1011: add ebx, eax... ebx_thunk: 0x2000: mov (esp), ebx 0x2003: ret foo: 0x1000: push ebp 0x1001: mov esp, ebp 0x1003: mov 0x8(ebp), eax 0x1006: cmp 0x5, eax 0x1009: ja 0x30 0x100b: call ebx_thunk 0x1011: add ebx, eax... ebx_thunk: 0x2000: mov (esp), ebx 0x2003: ret Relocation Examples 0x4000: ja -0x2ff7... 0x4000: ja -0x2ff7... 0x5000: push $0x1011 0x5005: jmp ebx_thunk... 0x5000: push $0x1011 0x5005: jmp ebx_thunk... 0x5000: push $0x1011 0x5005: jmp ebx_thunk’... ebx_thunk’: 0x6000: mov (esp), ebx 0x6003: call map_return 0x6008: ret 0x5000: push $0x1011 0x5005: jmp ebx_thunk’... ebx_thunk’: 0x6000: mov (esp), ebx 0x6003: call map_return 0x6008: ret

Correct Relocation -5- Current Approaches Strict Relocation –Maintains the semantics of each individual instruction –Safe in nearly all cases –Can impose severe slowdown –Trades speed for strictness Ad-Hoc Relocation –Emit more efficient code by partially emulating the original code –Pattern matching may fail and generate incorrect code –Trades strictness for speed

Correct Relocation -6- Benefits and Drawbacks SafeFast Strict Relocation GoodPoor Ad-Hoc Relocation PoorGood Partial Relocation Good

Correct Relocation -7- Our Approach Develop a formal model of relocation –Reason about the relationship of the moved code to: Its new location Surrounding code –Based on semantics of code instead of pattern- matching against syntax Strictness of emulation based on demands of the moved code (and surrounding code)

Correct Relocation -8- Effects of Code Movement Moving certain instructions will change their semantics –Relative branches, loads, stores –We call these PC referencing instructions Patching tools overwrite program code –Other code that references this code will be affected Relocation may affect non-relocated code!

Correct Relocation -9- Effects of Moving Code foo: 0x1000: push ebp 0x1001: mov esp, ebp 0x1003: mov 0x8(ebp), eax 0x1004: cmp 0x5, eax 0x1006: ja 0x30 0x1008: call ebx_thunk 0x100d: add ebx, eax 0x100f: mov (eax), edx 0x1011: jmp edx foo: 0x1000: push ebp 0x1001: mov esp, ebp 0x1003: mov 0x8(ebp), eax 0x1004: cmp 0x5, eax 0x1006: ja 0x30 0x1008: call ebx_thunk 0x100d: add ebx, eax 0x100f: mov (eax), edx 0x1011: jmp edx No change Relative branch Relative load Branch to result of relative load

Correct Relocation xf000: push ebp 0xf001: mov esp, ebp... 0xf000: push ebp 0xf001: mov esp, ebp... main:... 0x0050: call foo... foo: 0x1002: push ebp 0x1003: mov esp, ebp... bar:... 0x2010: mov (0x1000), eax 0x2015: add (0x1004), eax main:... 0x0050: call foo... foo: 0x1002: push ebp 0x1003: mov esp, ebp... bar:... 0x2010: mov (0x1000), eax 0x2015: add (0x1004), eax 0x1002: jmp 0xf Effects of Overwriting Code

Correct Relocation -11- Approach Model –Relocated code, surrounding code –Properties of code affected by relocation Analysis –Deriving these properties from the binary Transformations –How do we modify code to run correctly and efficiently?

Correct Relocation -12- Model Define properties of code that relocation affects –PC referencing –Dependence on moved or overwritten code A single instruction may have multiple properties These combinations of properties determine how to relocate the instruction –Or compensate non-relocated instructions

Correct Relocation -13- Program Regions R = {i i, …, i j } –Instructions to relocate A = {i k, …, i l } –Analyzed region –Surrounds R U = {i 0, … i n } – R - A –Unanalyzed region –Models limits of analysis R’ = {i p,..., i q } –Relocated instructions R A U R’

Correct Relocation -14- Direct (REF) –Control (REF C ) –Data (REF D ) –Predicate (REF P ) Indirect (REF*) –Control (REF* C ) –Data (REF* D ) –Predicate (REF* P ) Properties of Moved Code foo: 0x1000: push ebp 0x1001: mov esp, ebp 0x1003: mov 0x8(ebp), eax 0x1004: cmp 0x5, eax 0x1006: ja 0x30 0x1008: call ebx_thunk 0x100d: add ebx, eax 0x100f: mov (eax), edx 0x1011: jmp edx foo: 0x1000: push ebp 0x1001: mov esp, ebp 0x1003: mov 0x8(ebp), eax 0x1004: cmp 0x5, eax 0x1006: ja 0x30 0x1008: call ebx_thunk 0x100d: add ebx, eax 0x100f: mov (eax), edx 0x1011: jmp edx

Correct Relocation -15- Predicate PC References bool dl_open_check(char *name, void *calladdr) { // Check if the caller is // from libdl or libc bool safe_open = false; if (IN(libc_obj, calladdr) || IN(libdl_obj, calladdr) safe_open = true; if (!safe_open) return false; // Perform further checks... } Safety check in library load –Address of caller passed in –Checked against legal callers Predicate expressions

Correct Relocation -16- Properties of Overwritten Code Control (CF) –Instructions with successors in R Data (DF) –Loads from R –Stores to R main:... 0x0050: call foo... foo:... 0x1004: cmp 0x5, eax 0x1006: ja 0x30... bar:... 0x2010: mov (0x1000), eax 0x2015: add (0x1004), eax main:... 0x0050: call foo... foo:... 0x1004: cmp 0x5, eax 0x1006: ja 0x30... bar:... 0x2010: mov (0x1000), eax 0x2015: add (0x1004), eax foo:... 0x1004: cmp 0x5, eax 0x1006: ja 0x30... R A A {0x0050, 0x1004} CF {0x2010, 0x2015} DF

Correct Relocation -17- Properties Summary DF CF REF REF* C C P P D D

Correct Relocation -18- Analysis Overview 1.Choose R and A –R: instruction, basic block, function, … –A: how much do we analyze? 2.Identify sources of REF and REF* in R –Follow data dependence chains into A and U 3.Determine {...} CF and {...} DF –Begin with interprocedural CFG and points-to analysis –Be conservative and assume incomplete information

Correct Relocation -19- REF/REF* Analysis Create the Program Dependence Graph –Covering R + A Identify source instructions Follow data dependence edges –Into A (or U) foo:... 0x1004: cmp 0x5, eax 0x1006: ja 0x30 0x1008: call ebx_thunk 0x100d: add ebx, eax 0x100f: mov (eax) edx 0x1011: jmp edx foo:... 0x1004: cmp 0x5, eax 0x1006: ja 0x30 0x1008: call ebx_thunk 0x100d: add ebx, eax 0x100f: mov (eax) edx 0x1011: jmp edx 0x1006: ja 0x30 0x1008: call ebx_thunk 0x100d: add ebx, eax REF* C

Correct Relocation -20- Transformation Goals We want to emulate the smallest set of original code semantics Transformations must maintain the properties determined by analysis –But any others are not required Our approach: define transformations for each combination of properties

Correct Relocation -21- Granularity of Relocation Current methods relocate by instruction –Maintain equivalence at the instruction boundary “Unobserved” results Relocate instructions as a group –Maintain boundary semantics of the code –Reduce complexity and improve efficiency

Correct Relocation x5000: mov 0x1011, ebx 0x5005: add ebx, eax... 0x5000: mov 0x1011, ebx 0x5005: add ebx, eax... 0x5000: push $0x1011 0x5005: mov (esp), ebx 0x5008: pop 0x5009: add ebx, eax... 0x5000: push $0x1011 0x5005: mov (esp), ebx 0x5008: pop 0x5009: add ebx, eax... Partial Relocation Example 0x5000: push $0x1011 0x5005: jmp ebx_thunk’ 0x500a: add ebx, eax... ebx_thunk’: 0x6000: mov (esp), ebx 0x6003: call map_return 0x6008: ret 0x5000: push $0x1011 0x5005: jmp ebx_thunk’ 0x500a: add ebx, eax... ebx_thunk’: 0x6000: mov (esp), ebx 0x6003: call map_return 0x6008: ret 0x5000: add 0x1011, eax... 0x5000: add 0x1011, eax... REF C REF D

Correct Relocation -23- Research Plan This work is preliminary –Properties are defined –Analysis requirements are defined Still a lot to do –Determine transformations –Implementation in Dyninst –Performance analysis

Correct Relocation -24- Questions?

Correct Relocation xf008: mov 0x1008, ebx 0xf00e: add ebx, eax 0xf010: mov (eax, 4), ebx 0xf012: jmp ebx 0xf040:... 0xf060:... 0xf080:... 0xf0a0:... foo: 0x1008: call ebx_thunk 0x100d: add ebx, eax 0x100f: mov (eax, 4), ebx 0x1011: jmp ebx 0x1012:... 0x1040:... 0x1060:... 0x1080:... 0x10a0:... foo: 0x1008: call ebx_thunk 0x100d: add ebx, eax 0x100f: mov (eax, 4), ebx 0x1011: jmp ebx 0x1012:... 0x1040:... 0x1060:... 0x1080:... 0x10a0:... Relocating a Jump Table foo3: 0xf008: call ebx_thunk 0xf00d: add ebx, eax 0xf00f: mov (eax, 4), ebx 0xf011: jmp ebx 0xf012: <relocated jump table data> 0xf040: 0xf060: 0xf080: 0xf0a0: foo3: 0xf008: call ebx_thunk 0xf00d: add ebx, eax 0xf00f: mov (eax, 4), ebx 0xf011: jmp ebx 0xf012: <relocated jump table data> 0xf040: 0xf060: 0xf080: 0xf0a0: foo1: 0x1008: jmp 0x1012:... 0x1040:... 0x1060:... 0x1080:... 0x10a0:... foo1: 0x1008: jmp 0x1012:... 0x1040:... 0x1060:... 0x1080:... 0x10a0:... foo2: 0x1008: jmp 0x1012:... 0x1040: jmp 0x1060: jmp 0x1080: jmp 0x10a0: jmp foo2: 0x1008: jmp 0x1012:... 0x1040: jmp 0x1060: jmp 0x1080: jmp 0x10a0: jmp

Correct Relocation -26- Complex Instructions Instructions may have multiple properties –Example: a relative branch in R may be both CF and REF C Some overlap is due to implicit control flow –Instructions in R may be tagged as REF C due to fallthrough to next instruction We can model instructions as combinations of independent operations if necessary –Separate out the “next PC” calculation