Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2010 Paradyn Project Safe and Efficient Instrumentation Andrew Bernat.

Slides:



Advertisements
Similar presentations
Practical Malware Analysis
Advertisements

A Binary Agent Technology for COTS Software Integrity Richard Schooler Anant Agarwal InCert Software.
ByteWeight: Learning to Recognize Functions in Binary Code
Copyright © 2000, Daniel W. Lewis. All Rights Reserved. CHAPTER 5 MIXING C AND ASSEMBLY.
Assembly Language for x86 Processors 6th Edition Chapter 5: Procedures (c) Pearson Education, All rights reserved. You may modify and copy this slide.
Smita Thaker 1 Polymorphic & Metamorphic Viruses Presented By : Smita Thaker Dated : Nov 18, 2003.
COMP 2003: Assembly Language and Digital Logic
Recitation 8 – 3/25/02 Outline Dynamic Linking Review prior test questions 213 Course Staff Office Hours: See Posting.
Dec 5, 2007University of Virginia1 Efficient Dynamic Tainting using Multiple Cores Yan Huang University of Virginia Dec
C Programming and Assembly Language Janakiraman V – NITK Surathkal 2 nd August 2014.
TaintCheck and LockSet LBA Reading Group Presentation by Shimin Chen.
Assembly Language for Intel-Based Computers Chapter 5: Procedures Kip R. Irvine.
PC hardware and x86 3/3/08 Frans Kaashoek MIT
1 Lecture 5: Procedures Assembly Language for Intel-Based Computers, 4th edition Kip R. Irvine.
Practical Session 3. The Stack The stack is an area in memory that its purpose is to provide a space for temporary storage of addresses and data items.
Accessing parameters from the stack and calling functions.
Practical Session 3. The Stack The stack is an area in memory that its purpose is to provide a space for temporary storage of addresses and data items.
1 Homework Reading –PAL, pp , Machine Projects –Finish mp2warmup Questions? –Start mp2 as soon as possible Labs –Continue labs with your.
Position Independent Code self sufficiency of combining program.
Web siteWeb site ExamplesExamples Irvine, Kip R. Assembly Language for Intel-Based Computers, Defining and Using Procedures Creating Procedures.
Branch Regulation: Low-Overhead Protection from Code Reuse Attacks Mehmet Kayaalp, Meltem Ozsoy, Nael Abu-Ghazaleh and Dmitry Ponomarev Department of Computer.
CEG 320/520: Computer Organization and Assembly Language ProgrammingIntel Assembly 1 Intel IA-32 vs Motorola
6.828: PC hardware and x86 Frans Kaashoek
Paradyn Project Dyninst/MRNet Users’ Meeting Madison, Wisconsin August 7, 2014 The Evolution of Dyninst in Support of Cyber Security Emily Gember-Jacobson.
Program Behavior Characterization and Clustering: An Empirical Study for Failure Clustering Danqing Zhang School of Software Engineering, Tongji University.
Code Generation Gülfem Savrun Yeniçeri CS 142 (b) 02/26/2013.
Lecture-1 Compilation process
1 Malware Analysis and Instrumentation Andrew Bernat and Kevin Roundy Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin May 2-4, 2011.
1 Malware Analysis and Instrumentation Andrew Bernat and Kevin Roundy Paradyn Project Center for Computing Science June 14, 2011.
IA32 (Pentium) Processor Architecture. Processor modes: 1.Protected (mode we will study) – 32-bit mode – 32-bit (4GB) address space 2.Virtual 8086 modes.
Today’s topics Procedures Procedures Passing values to/from procedures Passing values to/from procedures Saving registers Saving registers Documenting.
Arithmetic Flags and Instructions
CS216: Program and Data Representation University of Virginia Computer Science Spring 2006 David Evans Lecture 22: Unconventional.
CNIT 127: Exploit Development Ch 1: Before you begin.
Assembly Language. Symbol Table Variables.DATA var DW 0 sum DD 0 array TIMES 10 DW 0 message DB ’ Welcome ’,0 char1 DB ? Symbol Table Name Offset var.
AMD64/EM64T – Dyninst & ParadynMarch 17, 2005 The AMD64/EM64T Port of Dyninst and Paradyn Greg Quinn Ray Chen
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2010 Binary Concolic Execution for Automatic Exploit Generation Todd Frederick.
Introduction to Assembly II Abed Asi Extended System Programming Laboratory (ESPL) CS BGU Fall 2013/2014.
Buffer Overflow Attack- proofing of Code Binaries Ramya Reguramalingam Gopal Gupta Gopal Gupta Department of Computer Science University of Texas at Dallas.
Compiler Construction Code Generation Activation Records
1 The Stack and Procedures Chapter 5. 2 A Process in Virtual Memory  This is how a process is placed into its virtual addressable space  The code is.
Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.
University of Amsterdam Computer Systems – the instruction set architecture Arnoud Visser 1 Computer Systems The instruction set architecture.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 29-May 1, 2013 Detecting Code Reuse Attacks Using Dyninst Components Emily Jacobson, Drew.
CSC 221 Computer Organization and Assembly Language Lecture 16: Procedures.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin May 2-4, 2011 unstrip: Restoring Function Information to Stripped Binaries Using Dyninst Emily.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2004 Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2004.
Correct RelocationMarch 20, 2016 Correct Relocation: Do You Trust a Mutated Binary? Drew Bernat
Practical Session 3.
Recitation 3: Procedures and the Stack
Instruction Set Architecture
Exploiting & Defense Day 2 Recap
Aaron Miller David Cohen Spring 2011
Homework In-line Assembly Code Machine Language
Introduction to Compilers Tim Teitelbaum
Assembly IA-32.
High-Level Language Interface
Computer Architecture and Assembly Language
C Prog. To Object Code text text binary binary Code in files p1.c p2.c
Machine-Level Representation of Programs III
Practical Session 4.
Efficient x86 Instrumentation:
Multi-modules programming
X86 Assembly Review.
ICS51 Introductory Computer Organization
Computer Architecture and System Programming Laboratory
Computer Architecture and Assembly Language
Computer Architecture and System Programming Laboratory
Computer Architecture and System Programming Laboratory
Presentation transcript:

Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2010 Paradyn Project Safe and Efficient Instrumentation Andrew Bernat

Binary Instrumentation 2 Safe and Efficient Instrumentation Instrumentation modifies the original code Moves original code Allocates new memory Overwrites original code This affects the behavior of: Moved code Code that references moved code Code that references changed memory And can cause incorrect execution

Sensitivity Models A program is sensitive to a particular modification if that modification changes the program’s behavior Current binary instrumenters rely on fixed sensitivity models And may fail to preserve behavior Compensating for sensitivity imposes overhead 3 Safe and Efficient Instrumentation push $(ret_addr) jmp printf call printf ret pop %eax call compensate_ret_addr jmp %eax

Safe and Efficient Approach Safe and Efficient Approach Efficiency vs Sensitivity 4 Safe and Efficient Instrumentation Sensitivity Malware Optimized Code Conventional Code Efficiency Pin, Valgrind, … Dyninst Safe and Efficient Approach

How do we do this? Formalization of code relocation Visible behavior Instruction sensitivity External sensitivity Implementation in Dyninst Analysis phase Transformation phase Analysis and performance results 5 Safe and Efficient Instrumentation

Three Questions What program behavior do we wish to preserve? How does modification affect instructions? How do instructions change program behavior? 6 Safe and Efficient Instrumentation

Approach Preserve visible behavior Relationship of input to output Identify sensitive instructions Those whose behavior is changed Only compensate for externally sensitive instructions Those whose sensitivity affects visible behavior 7 Safe and Efficient Instrumentation

Original Binary Instrumented Binary Visible Behavior Intuition: we can change anything that does not affect the output of the program 8 Safe and Efficient Instrumentation XYX + AY + B Instrumentation Input Instrumentation Output

Sensitivity What does instrumentation change? Addresses of instructions Contents of memory Shape of the address space Directly affected instructions: Access the PC (and are moved) Read modified memory Test allocated memory 9 Safe and Efficient Instrumentation

Sensitivity Examples 10 Safe and Efficient Instrumentation main: push %ebp mov %esp, %ebp … call worker … leave ret worker: push %ebp mov %esp, %ebp … ret jumptable: push %ebp mov %esp, %ebp call get_pc_thunk add $(offset), %ebx mov (%ebx, %eax, 4), %ecx jmp *%ecx get_pc_thunk: mov (%esp), %ebx ret Call/Return Pair:Jumptable: protect: call initialize … initialize: pop %esi mov $(unpack_base), %edi mov $0x0, %ebx loop_top: mov (%esi, %ebx, 4), %eax call unpack mov %eax, (%edi, %ebx, 4) inc %ebx cmp %ebx, $0x42 jnz loop_top jmp $(unpacked_base) Self-Unpacking Code (Simplified):

Sensitivity Is Not Enough An instruction is externally sensitive if it causes a visible change in behavior Approximation: or changes control flow This requires: The sensitive instruction must produce different values These differences must reach an instruction that affects output (or control flow) … and change its behavior 11 Safe and Efficient Instrumentation

Program Modification 12 Safe and Efficient Instrumentation Analysis Compensation Code Original Binary Modified Binary Original Code Relocated Code

Analysis Phase Identify sensitive instructions InstructionAPI: used and defined sets Determine affected instructions DepGraphAPI: forward slice Analyze effects of modification SymEval: symbolic expansion of the slice 13 Safe and Efficient Instrumentation

Analysis Example: Call/Return Pair 14 Safe and Efficient Instrumentation main: push %ebp mov %esp, %ebp … call worker … leave ret worker: push %ebp mov %esp, %ebp … ret Call/Return Pair: Sensitivity: call (moved, uses PC) Slice: call ret Symbolic Expansion: call: ret:

Analysis Example: Jumptable 15 Safe and Efficient Instrumentation Sensitivity: call (moved, uses PC) Slice: call mov (%esp), %ebx Symbolic Expansion: call: mov: add: mov: jmp: jumptable: push %ebp mov %esp, %ebp call get_pc_thunk add $(offset), %ebx mov (%ebx, %eax, 4), %ecx jmp *%ecx get_pc_thunk: mov (%esp), %ebx ret Jumptable: add $(offset), %ebx mov (%ebx, %eax, 4), %ecx jmp *%ecx

Analysis Example: Unpacking Code 16 Safe and Efficient Instrumentation Sensitivity: call (moved, uses PC) Slice: call initialize pop %esi mov (%esi, %ebx, 4), %eax call unpack … Symbolic Expansion: call: pop: mov: protect: call initialize … initialize: pop %esi mov $(unpack_base), %edi mov $0x0, %ebx loop_top: mov (%esi, %ebx, 4), %eax call unpack mov %eax, (%edi, %ebx, 4) inc %ebx cmp %ebx, $0x42 jnz loop_top jmp $(unpacked_base) Self-Unpacking Code (Simplified)

Compensation Phase Generates the relocated code Current instrumenter approach: Treat each instruction individually May miss optimization opportunities New approach: group transformation Derived from Dyninst heuristics 17 Safe and Efficient Instrumentation

Instruction Transformation Emulate each externally sensitive instruction Replace some instructions (e.g., calls) with sequences Some sequences impose high overhead E.g., run-time compensation 18 Safe and Efficient Instrumentation pop %eax call compensate_ret_addr jmp %eax ret call printf push $(orig_ret_addr) jmp printf

Group Transformation Emulate the behavior of a group of instructions Motivating example: compiler thunk functions Open questions: Which instructions are included in the group? How is the replacement sequence determined? Current status: hand-crafted templates 19 Safe and Efficient Instrumentation mov (%esp), %ebx ret call ebx_thunk mov $(ret_addr), %ebx

Transformation: Call/Return Pair 20 Safe and Efficient Instrumentation main: push %ebp mov %esp, %ebp … call worker … leave ret worker: push %ebp mov %esp, %ebp … ret Original Code main: push %ebp mov %esp, %ebp … call worker … leave ret worker: push %ebp mov %esp, %ebp … ret Relocated Code

Transformation: Jumptable 21 Safe and Efficient Instrumentation Original CodeRelocated Code jumptable: push %ebp mov %esp, %ebp call get_pc_thunk add $(offset), %ebx mov (%ebx, %eax, 4), %ecx jmp *%ecx get_pc_thunk: mov (%esp), %ebx ret jumptable: push %ebp mov %esp, %ebp mov $(ret_addr), %ebx add $(offset), %ebx mov (%ebx, %eax, 4), %ecx jmp *%ecx

Transformation: Unpacking Code 22 Safe and Efficient Instrumentation Relocated Code protect: call initialize … initialize: pop %esi mov $(unpack_base), %edi mov $0x0, %ebx loop_top: mov (%esi, %ebx, 4), %eax call unpack mov %eax, (%edi, %ebx, 4) inc %ebx cmp %ebx, $0x42 jnz loop_top jmp $(unpacked_base) Original Code protect: jmp initialize … initialize: mov $(ret_addr), %esi mov $(unpack_base), %edi mov $0x0, %ebx loop_top: mov (%esi, %ebx, 4), %eax call unpack mov %eax, (%edi, %ebx, 4) inc %ebx cmp %ebx, $0x42 jnz loop_top jmp $(unpacked_base)

Results Type of Binary% PC Sensitive% Externally Sensitive % Unanalyzable Executable (a.out)9.0%0.01%0.59% Library (.so)7.9%0.55%0.72% 23 Safe and Efficient Instrumentation Percentage of PC-Sensitive Instructions (32-bit, GCC, static analysis) Instrumentation Overhead (go, 32-bit, 12.3s base time) Current Dyninst: 23.4s (90.2%) Safe and Efficient Algorithm: 16.3s (32.5%)

Future Work Memory sensitivity and compensation Improved pointer analysis Useful user intervention? Investigate group transformations Widen range of input binaries Expand supported platforms 24 Safe and Efficient Instrumentation

Questions? 25 Safe and Efficient Instrumentation

ASProtect code loop 26 Safe and Efficient Instrumentation : call : mov EDX, ECX : pop EDI : push EAX : pop ESI : add EDI, c: mov ESI, EDI e: push : jz c : adc DH, c: pop EBX d: mov EAX, : mov ECX, EBX(EDI) : jmp c c: add ECX, a2: xor ESI, a8: xor ECX, ae: jmp 80497c c3: sub ECX, c9: sub ESI, ce: push ECX, ESP 80497cf: mov EAX, d4: pop EBX(EDI) 80497d7: jmp 80497ed 80497ed: adc AL, f0: sub EBX, f6: xor EAX, fb: add EBX, : call c c: mov AX, : pop ESI : cmp EBX, : jnz d: or ESI, : jmp : mov ESI, : jmp

Emulation Examples 27 Safe and Efficient Instrumentation add %eax, %ebx jnz 0xf3e call fprintf mov (%esi, %ebx, 4), %eax jnz 0xe498d3 add %eax, %ebx push $ jmp fprintf lea (%esi, %ebx, 4), %eax call mem_addr_translate mov (%eax), %eax ret pop %eax call addr_translate jmp %eax