Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin May 2-4, 2011 unstrip: Restoring Function Information to Stripped Binaries Using Dyninst Emily.

Slides:



Advertisements
Similar presentations
PASTE 2011 Szeged, Hungary September 5, 2011 Labeling Library Functions in Stripped Binaries Emily R. Jacobson, Nathan Rosenblum, and Barton P. Miller.
Advertisements

ByteWeight: Learning to Recognize Functions in Binary Code
© 2006 Nathan RosenblumMarch 2006Unconventional Code Constructs The New Dyninst Code Parser: Binary Code Isn't as Simple as it Used to Be Nathan Rosenblum.
Recitation 8 – 3/25/02 Outline Dynamic Linking Review prior test questions 213 Course Staff Office Hours: See Posting.
Machine-Learning Assisted Binary Code Analysis
Practical session 7 review. Little – Endian What’s in memory? Section.rodata a: DB ‘hello’, 0x20, ’world’, 10, 0 b: DW ‘hello’, 0x20, ’world’, 10, 0 c:
Practical Session 3. The Stack The stack is an area in memory that its purpose is to provide a space for temporary storage of addresses and data items.
CS2422 Assembly Language & System Programming October 26, 2006.
C Prog. To Object Code text text binary binary Code in files p1.c p2.c
INVOKE Directive The INVOKE directive is a powerful replacement for Intel’s CALL instruction that lets you pass multiple arguments Syntax: INVOKE procedureName.
Attacks Using Stack Buffer Overflow Boxuan Gu
Recitation 2: Assembly & gdb Andrew Faulring Section A 16 September 2002.
Y86 Processor State Program Registers
Instrumentation - initial results Sung Kim, Jeff Perkins MIT.
Analysis Of Stripped Binary Code Laune Harris University of Wisconsin – Madison
The Deconstruction of Dyninst: Experiences and Future Directions Drew Bernat, Madhavi Krishnan, Bill Williams, Bart Miller Paradyn Project 1.
Paradyn Project Petascale Tools Workshop Madison, Wisconsin Aug 4-Aug 7, 2014 Binary Code is Not Easy Xiaozhu Meng, Emily Gember-Jacobson, and Bill Williams.
Recitation 4: The Stack & Lab3 Andrew Faulring Section A 30 September 2002.
1 #include void silly(){ char s[30]; gets(s); printf("%s\n",s); } main(){ silly(); return 0; }
Recitation 6 – 2/26/01 Outline Linking Exam Review –Topics Covered –Your Questions Shaheen Gandhi Office Hours: Wednesday.
Recitation 2 – 2/11/02 Outline Stacks & Procedures Homogenous Data –Arrays –Nested Arrays Mengzhi Wang Office Hours: Thursday.
Recitation 2: Outline Assembly programming Using gdb L2 practice stuff Minglong Shao Office hours: Thursdays 5-6PM Wean Hall.
Machine-Level Programming 3 Control Flow Topics Control Flow Switch Statements Jump Tables.
CS216: Program and Data Representation University of Virginia Computer Science Spring 2006 David Evans Lecture 22: Unconventional.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2010 Binary Concolic Execution for Automatic Exploit Generation Todd Frederick.
Stack Usage with MS Visual Studio Without Stack Protection.
Compiler Construction Code Generation Activation Records
Machine Learning for Program Language Research Yao Peisen Prism Group, HKUST
University of Amsterdam Computer Systems – the instruction set architecture Arnoud Visser 1 Computer Systems The instruction set architecture.
1 Linking. 2 Outline Symbol Resolution Relocation Suggested reading: 7.6~7.7.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 29-May 1, 2013 Detecting Code Reuse Attacks Using Dyninst Components Emily Jacobson, Drew.
Recitation 3 Outline Recursive procedure Complex data structures –Arrays –Structs –Unions Function pointer Reminders Lab 2: Wed. 11:59PM Lab 3: start early.
Improvements to the Compiler Lecture 27 Mon, Apr 26, 2004.
Gogul Balakrishnan Thomas Reps University of Wisconsin Analyzing Memory Accesses in x86 Executables.
Linking I Topics Assembly and symbol resolution Static linking Systems I.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2004 Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2004.
Correct RelocationMarch 20, 2016 Correct Relocation: Do You Trust a Mutated Binary? Drew Bernat
OUTLINE 2 Pre-requisite Bomb! Pre-requisite Bomb! 3.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2010 Paradyn Project Safe and Efficient Instrumentation Andrew Bernat.
Recitation 2 – 2/11/02 Outline Stacks & Procedures Homogenous Data –Arrays –Nested Arrays Structured Data –struct s / union s –Arrays of structs.
Recitation 3: Procedures and the Stack
Machine-Level Programming 2 Control Flow
Instruction Set Architecture
Static and dynamic analysis of binaries
C function call conventions and the stack
Performance Optimizations in Dyninst
Computer Architecture and Assembly Language
Exploiting & Defense Day 2 Recap
Recitation 2 – 2/11/02 Outline Stacks & Procedures
Aaron Miller David Cohen Spring 2011
143A: Principles of Operating Systems Lecture 8: Basic Architecture of a Program Anton Burtsev October, 2017.
Introduction to Compilers Tim Teitelbaum
Recitation 2 – 2/4/01 Outline Machine Model
Emily Jacobson and Nathan Rosenblum
Machine-Level Programming 1 Introduction
asum.ys A Y86 Programming Example
Machine-Level Programming III: Switch Statements and IA32 Procedures
Y86 Processor State Program Registers
Discussion Section – 11/3/2012
Machine-Level Programming 4 Procedures
C Prog. To Object Code text text binary binary Code in files p1.c p2.c
Machine-Level Programming 2 Control Flow
Assembly Language Programming II: C Compiler Calling Sequences
The Runtime Environment
Machine-Level Programming 2 Control Flow
The Runtime Environment
Machine-Level Programming: Introduction
Multi-modules programming
Khaled Yakdan University of Bonn Fraunhofer FKIE
X86 Assembly Review.
Presentation transcript:

Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin May 2-4, 2011 unstrip: Restoring Function Information to Stripped Binaries Using Dyninst Emily Jacobson and Nathan Rosenblum

Binary Tools Need Symbol Tables o Debugging Tools o GDB, IDA Pro… o Instrumentation Tools o PIN, Dyninst,… o Static Analysis Tools o CodeSurfer/x86,… o Security Analysis Tools o IDA Pro,… 2 unstrip : Restoring Function Information to Stripped Binaries

: push %ebp mov %esp,%ebp sub %0x8,%esp mov 0x8(%ebp),%eax add $0xfffffff8,%esp push %eax call push %eax call mov %ebp,%esp pop %ebp 3 unstrip : Restoring Function Information to Stripped Binaries push %ebp mov %esp,%ebp sub %0x8,%esp mov 0x8(%ebp),%eax add $0xfffffff8,%esp push %eax call 80c3bd0 push %eax call mov %ebp,%esp pop %ebp unstrip unstrip = stripped parsing + binary rewriting

New Semantic Information o Important semantic information: program’s interaction with the operating system (system calls) o These calls are encapsulated in wrapper functions Library fingerprinting: identify functions based on patterns learned from exemplar libraries 4 unstrip : Restoring Function Information to Stripped Binaries

: push %ebp mov %esp,%ebp sub %0x8,%esp mov 0x8(%ebp),%eax add $0xfffffff8,%esp push %eax call push %eax call mov %ebp,%esp pop %ebp 5 unstrip : Restoring Function Information to Stripped Binaries push %ebp mov %esp,%ebp sub %0x8,%esp mov 0x8(%ebp),%eax add $0xfffffff8,%esp push %eax call 80c3bd0 push %eax call mov %ebp,%esp pop %ebp unstrip = stripped parsing + + binary rewriting library fingerprinting call unstrip

: mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 mov %edx, %ebx cmp %0xffffff83,%eax jae ret mov %esi,%esi mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx Set up system call arguments int $0x80 Invoke a system call mov %edx, %ebx cmp %0xffffff83,%eax jae ret Error check and return

: mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 mov %edx, %ebx cmp %0xffffff83,%eax jae ret mov %esi,%esi mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 mov %edx, %ebx cmp %0xffffff83,%eax jae ret : cmpl $0x0,%gs:0xc jne 80f669c mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx call *0x814e93c mov %edx, %ebx cmp %0xffffff83,%eax jae ret push %esi call libc_enable_asyncancel mov %eax,%esi mov %ebx,%edx mov $0x66,%eax mov $0x5,%ebx lea 0x8(%esp),%ecx call *0x mov %edx, %ebx xchg %eax,%esi call libc_disable_acynancel mov %esi,%eax pop %esi cmp $0xffffff83,%eax jae syscall_error ret glibc 2.5 on RHEL with GCC glibc on RHEL The same function can be realized in a variety of ways in the binary : cmpl $0x0,%gs:0xc jne 80f669c mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 mov %edx, %ebx cmp %0xffffff83,%eax jae ret push %esi call libc_enable_asyncancel mov %eax,%esi mov %ebx,%edx mov $0x66,%eax mov $0x5,%ebx lea 0x8(%esp),%ecx int $0x80 mov %edx, %ebx xchg %eax,%esi call libc_disable_acynancel mov %esi,%eax pop %esi cmp $0xffffff83,%eax jae syscall_error ret glibc 2.5 on RHEL with GCC 4.1.2

Semantic Descriptors o Instead, we’ll take a semantic approach o Record information that is likely to be invariant across multiple versions of the function 8 unstrip : Restoring Function Information to Stripped Binaries : mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 mov %edx, %ebx cmp %0xffffff83,%eax jae ret mov %esi,%esi int $0x80 mov %0x66,%eax mov $0x5,%ebx { }, 5

Building Semantic Descriptors 9 unstrip : Restoring Function Information to Stripped Binaries We parse an input binary, locate system calls and wrapper function calls, and employ dataflow analysis. binary reboot: push %ebp mov %esp,%ebp sub $0x10,%esp push %edi push %ebx mov 0x8(%ebp),%edx mov $0xfee1dead,%edi mov $0x ,%ecx push %ebx mov %edi,%ebx mov $0x58,%eax int $0x80 … SYSTEM CALL 0x580x EAX EBXECX %edi 0xfee 1 dead { } EAX

unstrip Building a Descriptor Database 10 unstrip : Restoring Function Information to Stripped Binaries Descriptor Database : mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 … Locate wrapper functions Build semantic descriptors { }: accept { }: listen { }: getpid … glibc reference library

glibc reference library glibc reference library glibc reference library glibc reference library unstrip Building a Descriptor Database 11 unstrip : Restoring Function Information to Stripped Binaries Descriptor Database Build semantic descriptors Locate wrapper functions { }: accept { }: listen { }: getpid … { }: accept { }: listen { }: getpid … { }: accept { }: listen { }: getpid … { }: accept { }: listen { }: getpid … 1 : mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 … 1 : mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 … 1 : mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 … 1 : mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 …

Identifying Functions in a Stripped Binary unstrip Building a Descriptor Database 12 unstrip : Restoring Function Information to Stripped Binaries Build semantic descriptors Locate functions Descriptor Database glibc reference library glibc reference library glibc reference library glibc reference library 1 : mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 … 1 : mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 … 1 : mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 … 1 : mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 … { }: accept { }: listen { }: getpid … { }: accept { }: listen { }: getpid … { }: accept { }: listen { }: getpid … { }: accept { }: listen { }: getpid …

unstrip Identifying Functions in a Stripped Binary 13 unstrip : Restoring Function Information to Stripped Binaries stripped binary unstripped binary Descriptor Database For each wrapper function { 1. Build the semantic descriptor. 2. Search the database for a match (two stages). 3. Add label to symbol table. }

Evaluation o To evaluate across three dimensions of variation, we constructed three data sets: o compiler version o library version o distribution vendor o In each set, we compiled a test binary for each glibc instance, built a descriptor database, and applied unstrip and IDA Pro FLIRT o Our evaluation measure is accuracy 14 unstrip : Restoring Function Information to Stripped Binaries

Evaluation Results: Compiler Version Study 15 unstrip : Restoring Function Information to Stripped Binaries

Evaluation Results: Library Version Study 16 unstrip : Restoring Function Information to Stripped Binaries

Evaluation Results: Distribution Study 17 unstrip : Restoring Function Information to Stripped Binaries

18 unstrip : Restoring Function Information to Stripped Binaries For full details, tech report available online unstrip is available at: Come see the unstrip demo today at 2:00 or 2:30 (in 1260 WID/MIR)