Emily Jacobson and Nathan Rosenblum

Slides:



Advertisements
Similar presentations
PASTE 2011 Szeged, Hungary September 5, 2011 Labeling Library Functions in Stripped Binaries Emily R. Jacobson, Nathan Rosenblum, and Barton P. Miller.
Advertisements

ByteWeight: Learning to Recognize Functions in Binary Code
© 2006 Nathan RosenblumMarch 2006Unconventional Code Constructs The New Dyninst Code Parser: Binary Code Isn't as Simple as it Used to Be Nathan Rosenblum.
Introduction to Information Security מרצים : Dr. Eran Tromer: Prof. Avishai Wool: מתרגלים : Itamar Gilad
Practical session 7 review. Little – Endian What’s in memory? Section.rodata a: DB ‘hello’, 0x20, ’world’, 10, 0 b: DW ‘hello’, 0x20, ’world’, 10, 0 c:
Practical Session 3. The Stack The stack is an area in memory that its purpose is to provide a space for temporary storage of addresses and data items.
C Prog. To Object Code text text binary binary Code in files p1.c p2.c
Attacks Using Stack Buffer Overflow Boxuan Gu
Recitation 2: Assembly & gdb Andrew Faulring Section A 16 September 2002.
Machine-Level Programming 3 Control Flow Topics Control Flow Switch Statements Jump Tables.
Y86 Processor State Program Registers
Analysis Of Stripped Binary Code Laune Harris University of Wisconsin – Madison
The Deconstruction of Dyninst: Experiences and Future Directions Drew Bernat, Madhavi Krishnan, Bill Williams, Bart Miller Paradyn Project 1.
Paradyn Project Petascale Tools Workshop Madison, Wisconsin Aug 4-Aug 7, 2014 Binary Code is Not Easy Xiaozhu Meng, Emily Gember-Jacobson, and Bill Williams.
Recitation 4: The Stack & Lab3 Andrew Faulring Section A 30 September 2002.
1 #include void silly(){ char s[30]; gets(s); printf("%s\n",s); } main(){ silly(); return 0; }
Recitation 6 – 2/26/01 Outline Linking Exam Review –Topics Covered –Your Questions Shaheen Gandhi Office Hours: Wednesday.
Recitation 2 – 2/11/02 Outline Stacks & Procedures Homogenous Data –Arrays –Nested Arrays Mengzhi Wang Office Hours: Thursday.
Recitation 2: Outline Assembly programming Using gdb L2 practice stuff Minglong Shao Office hours: Thursdays 5-6PM Wean Hall.
Machine-Level Programming 3 Control Flow Topics Control Flow Switch Statements Jump Tables.
CS216: Program and Data Representation University of Virginia Computer Science Spring 2006 David Evans Lecture 22: Unconventional.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2010 Binary Concolic Execution for Automatic Exploit Generation Todd Frederick.
Stack Usage with MS Visual Studio Without Stack Protection.
Compiler Construction Code Generation Activation Records
Machine Learning for Program Language Research Yao Peisen Prism Group, HKUST
Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.
University of Amsterdam Computer Systems – the instruction set architecture Arnoud Visser 1 Computer Systems The instruction set architecture.
1 Linking. 2 Outline Symbol Resolution Relocation Suggested reading: 7.6~7.7.
Recitation 3 Outline Recursive procedure Complex data structures –Arrays –Structs –Unions Function pointer Reminders Lab 2: Wed. 11:59PM Lab 3: start early.
Improvements to the Compiler Lecture 27 Mon, Apr 26, 2004.
Gogul Balakrishnan Thomas Reps University of Wisconsin Analyzing Memory Accesses in x86 Executables.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin May 2-4, 2011 unstrip: Restoring Function Information to Stripped Binaries Using Dyninst Emily.
Linking I Topics Assembly and symbol resolution Static linking Systems I.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2004 Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2004.
OUTLINE 2 Pre-requisite Bomb! Pre-requisite Bomb! 3.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2010 Paradyn Project Safe and Efficient Instrumentation Andrew Bernat.
Recitation 2 – 2/11/02 Outline Stacks & Procedures Homogenous Data –Arrays –Nested Arrays Structured Data –struct s / union s –Arrays of structs.
Recitation 3: Procedures and the Stack
Machine-Level Programming 2 Control Flow
Instruction Set Architecture
Static and dynamic analysis of binaries
C function call conventions and the stack
Performance Optimizations in Dyninst
Computer Architecture and Assembly Language
Exploiting & Defense Day 2 Recap
Recitation 2 – 2/11/02 Outline Stacks & Procedures
Aaron Miller David Cohen Spring 2011
143A: Principles of Operating Systems Lecture 8: Basic Architecture of a Program Anton Burtsev October, 2017.
Introduction to Compilers Tim Teitelbaum
Recitation 2 – 2/4/01 Outline Machine Model
asum.ys A Y86 Programming Example
Machine-Level Programming III: Switch Statements and IA32 Procedures
Y86 Processor State Program Registers
Discussion Section – 11/3/2012
Condition Codes Single Bit Registers
C Prog. To Object Code text text binary binary Code in files p1.c p2.c
Stack Frames and Advanced Procedures
Machine-Level Programming 2 Control Flow
Assembly Language Programming II: C Compiler Calling Sequences
The Runtime Environment
Machine-Level Programming 2 Control Flow
EECE.3170 Microprocessor Systems Design I
The Runtime Environment
EECE.3170 Microprocessor Systems Design I
Machine-Level Programming: Introduction
Multi-modules programming
Khaled Yakdan University of Bonn Fraunhofer FKIE
X86 Assembly Review.
Computer Architecture and Assembly Language
Computer Architecture and System Programming Laboratory
Presentation transcript:

Emily Jacobson and Nathan Rosenblum unstrip: Restoring Function Information to Stripped Binaries Using Dyninst Emily Jacobson and Nathan Rosenblum

Binary Tools Need Symbol Tables Debugging Tools GDB, IDA Pro… Instrumentation Tools PIN, Dyninst,… Static Analysis Tools CodeSurfer/x86,… Security Analysis Tools IDA Pro,… unstrip: Restoring Function Information to Stripped Binaries

unstrip: Restoring Function Information to Stripped Binaries unstrip = stripped parsing + binary rewriting push %ebp mov %esp,%ebp sub %0x8,%esp mov 0x8(%ebp),%eax add $0xfffffff8,%esp push %eax call 80c3bd0 call 8057220 mov %ebp,%esp pop %ebp <targ8056f50>: push %ebp mov %esp,%ebp sub %0x8,%esp mov 0x8(%ebp),%eax add $0xfffffff8,%esp push %eax call <targ80c3bd0> call <targ8057220> mov %ebp,%esp pop %ebp unstrip unstrip: Restoring Function Information to Stripped Binaries

New Semantic Information Important semantic information: program’s interaction with the operating system (system calls) These calls are encapsulated in wrapper functions Library fingerprinting: identify functions based on patterns learned from exemplar libraries unstrip: Restoring Function Information to Stripped Binaries

unstrip: Restoring Function Information to Stripped Binaries unstrip = stripped parsing + library fingerprinting + binary rewriting push %ebp mov %esp,%ebp sub %0x8,%esp mov 0x8(%ebp),%eax add $0xfffffff8,%esp push %eax call 80c3bd0 call 8057220 mov %ebp,%esp pop %ebp <targ8056f50>: push %ebp mov %esp,%ebp sub %0x8,%esp mov 0x8(%ebp),%eax add $0xfffffff8,%esp push %eax call <targ80c3bd0> call <targ8057220> mov %ebp,%esp pop %ebp unstrip call <getpid> call <kill> unstrip: Restoring Function Information to Stripped Binaries

Set up system call arguments <accept>: mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 mov %edx, %ebx cmp %0xffffff83,%eax jae 8048300 ret mov %esi,%esi mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx Set up system call arguments int $0x80 Invoke a system call mov %edx, %ebx cmp %0xffffff83,%eax jae 8048300 ret Error check and return

The same function can be realized in a variety of ways in the binary <accept>: mov %ebx,%edx cmpl $0x0,%gs:0xc mov $0x66,%eax jne 80f669c mov %ebx, %edx lea 0x8(%esp),%ecx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx xchg %eax,%esi int $0x80 call libc_disable_acynancel mov %edx, %ebx cmp %0xffffff83,%eax mov %esi,%eax jae 8048460 pop %esi ret cmp $0xffffff83,%eax push %esi jae syscall_error call libc_enable_asyncancel mov %eax,%esi <accept>: mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 mov %edx, %ebx cmp %0xffffff83,%eax jae 8048300 ret mov %esi,%esi glibc 2.5 on RHEL with GCC 4.1.2 glibc 2.2.4 on RHEL <accept>: mov %ebx,%edx cmpl $0x0,%gs:0xc mov $0x66,%eax jne 80f669c mov %ebx, %edx lea 0x8(%esp),%ecx mov %0x66,%eax call *0x8181578 mov $0x5,%ebx lea 0x4(%esp),%ecx xchg %eax,%esi call *0x814e93c call libc_disable_acynancel mov %edx, %ebx cmp %0xffffff83,%eax mov %esi,%eax jae 8048460 pop %esi ret cmp $0xffffff83,%eax push %esi jae syscall_error call libc_enable_asyncancel mov %eax,%esi The same function can be realized in a variety of ways in the binary glibc 2.5 on RHEL with GCC 3.4.4

Semantic Descriptors Instead, we’ll take a semantic approach Record information that is likely to be invariant across multiple versions of the function <accept>: mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 mov %edx, %ebx cmp %0xffffff83,%eax jae 8048300 ret mov %esi,%esi mov %0x66,%eax mov $0x5,%ebx {<socketcall >} int $0x80 , 5 unstrip: Restoring Function Information to Stripped Binaries

Building Semantic Descriptors reboot: push %ebp mov %esp,%ebp sub $0x10,%esp push %edi push %ebx mov 0x8(%ebp),%edx mov $0xfee1dead,%edi mov $0x28121969,%ecx mov %edi,%ebx mov $0x58,%eax int $0x80 … 0xfee1dead binary 0x58 %edi 0x28121969 EAX EAX EBX ECX SYSTEM CALL {<reboot, 0xfee1dead, 0x2812969>} We parse an input binary, locate system calls and wrapper function calls, and employ dataflow analysis. unstrip: Restoring Function Information to Stripped Binaries

Building a Descriptor Database unstrip <accept>: mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 … Locate wrapper functions glibc reference library Descriptor Database {<socketcall, 5>}: accept {<socketcall, 4>}: listen {<getpid>}: getpid … Build semantic descriptors unstrip: Restoring Function Information to Stripped Binaries

Building a Descriptor Database unstrip 1 <accept>: mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 … Locate wrapper functions 1 <accept>: mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 … glibc reference library 1 <accept>: mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 … 1 <accept>: mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 … glibc reference library glibc reference library glibc reference library Descriptor Database {<socketcall, 5>}: accept {<socketcall, 4>}: listen {<getpid>}: getpid … {<socketcall, 5>}: accept {<socketcall, 4>}: listen {<getpid>}: getpid … {<socketcall, 5>}: accept {<socketcall, 4>}: listen {<getpid>}: getpid … {<socketcall, 5>}: accept {<socketcall, 4>}: listen {<getpid>}: getpid … Build semantic descriptors unstrip: Restoring Function Information to Stripped Binaries

Building a Descriptor Database Identifying Functions in a Stripped Binary Building a Descriptor Database unstrip 1 <accept>: mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 … Locate functions 1 <accept>: mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 … glibc reference library 1 <accept>: mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 … 1 <accept>: mov %ebx, %edx mov %0x66,%eax mov $0x5,%ebx lea 0x4(%esp),%ecx int $0x80 … glibc reference library glibc reference library glibc reference library Descriptor Database {<socketcall, 5>}: accept {<socketcall, 4>}: listen {<getpid>}: getpid … {<socketcall, 5>}: accept {<socketcall, 4>}: listen {<getpid>}: getpid … {<socketcall, 5>}: accept {<socketcall, 4>}: listen {<getpid>}: getpid … {<socketcall, 5>}: accept {<socketcall, 4>}: listen {<getpid>}: getpid … Build semantic descriptors unstrip: Restoring Function Information to Stripped Binaries

Identifying Functions in a Stripped Binary unstrip For each wrapper function { 1. Build the semantic descriptor. 2. Search the database for a match (two stages). 3. Add label to symbol table. } stripped binary Descriptor Database unstripped binary unstrip: Restoring Function Information to Stripped Binaries

unstrip: Restoring Function Information to Stripped Binaries Evaluation To evaluate across three dimensions of variation, we constructed three data sets: compiler version library version distribution vendor In each set, compile statically-linked binaries, build a DBB, compare unstrip to IDA Pro’s FLIRT Evaluation measure is accuracy unstrip: Restoring Function Information to Stripped Binaries

Evaluation Results: Compiler Version Study unstrip: Restoring Function Information to Stripped Binaries

Evaluation Results: Library Version Study unstrip: Restoring Function Information to Stripped Binaries

Evaluation Results: Distribution Study unstrip: Restoring Function Information to Stripped Binaries

unstrip: Restoring Function Information to Stripped Binaries For full details, tech report available online at: ftp://ftp.cs.wisc.edu/paradyn/papers/Jacobson11Unstrip.pdf unstrip is available at: http://www.paradyn.org/html/tools/unstrip.html Come see the unstrip demo today at 2:00 or 2:30 (in 1260 WID/MIR) unstrip: Restoring Function Information to Stripped Binaries

unstrip: Restoring Function Information to Stripped Binaries Extra Slides Some additional results unstrip: Restoring Function Information to Stripped Binaries

Evaluation Results: Distribution Study unstrip: Restoring Function Information to Stripped Binaries

Evaluation Results: Toolchain Study (one predicts the rest) Results from one-predicts-the-rest testing, i.e., 3.4.4 patterns predicting all other binaries in this group (averaged), then 4.0.2 patterns predicting all other binaries in this group (averaged),… unstrip: Restoring Function Information to Stripped Binaries

Evaluation Results: Library Version Study (one predicts the rest) Results from one-predicts-the-rest testing (as on previous slide) unstrip: Restoring Function Information to Stripped Binaries

Evaluation Results: Distribution Study (one predicts the rest) Results from one-predicts-the-rest testing (as on previous slide) unstrip: Restoring Function Information to Stripped Binaries