Ramblr Making Reassembly Great Again

Slides:



Advertisements
Similar presentations
Fabián E. Bustamante, Spring 2007 Machine-Level Programming II: Control Flow Today Condition codes Control flow structures Next time Procedures.
Advertisements

Smashing the Stack for Fun and Profit
ByteWeight: Learning to Recognize Functions in Binary Code
© 2006 Nathan RosenblumMarch 2006Unconventional Code Constructs The New Dyninst Code Parser: Binary Code Isn't as Simple as it Used to Be Nathan Rosenblum.
C Programming and Assembly Language Janakiraman V – NITK Surathkal 2 nd August 2014.
Practical Session 3. The Stack The stack is an area in memory that its purpose is to provide a space for temporary storage of addresses and data items.
Position Independent Code self sufficiency of combining program.
1 Assemblers and Linkers Professor Jennifer Rexford
1 Assemblers and Linkers. 2 Goals for this Lecture Help you to learn about: The assembly process IA-32 machine language Why? Machine language is the last.
Practical Session 8 Computer Architecture and Assembly Language.
Branch Regulation: Low-Overhead Protection from Code Reuse Attacks Mehmet Kayaalp, Meltem Ozsoy, Nael Abu-Ghazaleh and Dmitry Ponomarev Department of Computer.
Recitation 2: Assembly & gdb Andrew Faulring Section A 16 September 2002.
Machine-Level Programming 3 Control Flow Topics Control Flow Switch Statements Jump Tables.
RIVERSIDE RESEARCH INSTITUTE Deobfuscator: An Automated Approach to the Identification and Removal of Code Obfuscation Eric Laspe, Reverse Engineer Jason.
Machine-Level Programming 3 Control Flow Topics Control Flow Switch Statements Jump Tables.
Analyzing Memory Accesses in Obfuscated x86 Executables Michael Venable Mohamed R. Choucane Md. Enamul Karim Arun Lakhotia (Presenter) DIMVA 2005 Wien.
CS216: Program and Data Representation University of Virginia Computer Science Spring 2006 David Evans Lecture 22: Unconventional.
ELF binary # readelf -a foo.out ELF Header:
Assembly Language. Symbol Table Variables.DATA var DW 0 sum DD 0 array TIMES 10 DW 0 message DB ’ Welcome ’,0 char1 DB ? Symbol Table Name Offset var.
Where’s the FEEB?: Effectiveness of Instruction Set Randomization Nora Sovarel, David Evans, Nate Paul University of Virginia Computer Science USENIX Security.
1 Linking. 2 Outline Symbol Resolution Relocation Suggested reading: 7.6~7.7.
Computer Organization & Assembly Language University of Sargodha, Lahore Campus Prepared by Ali Saeed.
Practical Session 8. Position Independent Code- self sufficiency of combining program Position Independent Code (PIC) program has everything it needs.
Gogul Balakrishnan Thomas Reps University of Wisconsin Analyzing Memory Accesses in x86 Executables.
Correct RelocationMarch 20, 2016 Correct Relocation: Do You Trust a Mutated Binary? Drew Bernat
OUTLINE 2 Pre-requisite Bomb! Pre-requisite Bomb! 3.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2010 Paradyn Project Safe and Efficient Instrumentation Andrew Bernat.
Practical Session 3.
Practical Session 5.
Recitation 3: Procedures and the Stack
Instruction Set Architecture
Computer Architecture and Assembly Language
Introduction to Information Security
Assembly language.
Static and dynamic analysis of binaries
Assembly Lab 3.
Computer Architecture and Assembly Language
Exploiting & Defense Day 2 Recap
Aaron Miller David Cohen Spring 2011
Binary, Decimal and Hexadecimal Numbers
Introduction to Compilers Tim Teitelbaum
Assembly IA-32.
Chapter 3 Machine-Level Representation of Programs
asum.ys A Y86 Programming Example
Computer Architecture and Assembly Language
Introduction to Intel x86-64 Assembly, Architecture, Applications, & Alliteration Xeno Kovah – 2014 xkovah at gmail.
C Prog. To Object Code text text binary binary Code in files p1.c p2.c
Machine-Level Programming V: Control: loops Comp 21000: Introduction to Computer Organization & Systems Systems book chapter 3* * Modified slides from.
EECE.3170 Microprocessor Systems Design I
Practical Session 4.
Code Shape I – Basic Constructs Hal Perkins Autumn 2011
EECE.3170 Microprocessor Systems Design I
Bits and Bytes Topics Representing information as bits
EECE.3170 Microprocessor Systems Design I
Bits and Bytes Topics Representing information as bits
Machine-Level Programming: Introduction
Getting Started Download the tarball for this session. It will include the following files: driver 64-bit executable driver.c C driver source bomb.h declaration.
Multi-modules programming
EECE.3170 Microprocessor Systems Design I
Chapter 3 Machine-Level Representation of Programs
Comp Org & Assembly Lang
CSC 497/583 Advanced Topics in Computer Security
Machine-Level Programming V: Control: loops Comp 21000: Introduction to Computer Organization & Systems Systems book chapter 3* * Modified slides from.
Getting Started Download the tarball for this session. It will include the following files: driver 64-bit executable driver.c C driver source bomb.h declaration.
Computer Architecture and System Programming Laboratory
Computer Architecture and Assembly Language
Computer Architecture and System Programming Laboratory
Computer Architecture and System Programming Laboratory
Computer Architecture and System Programming Laboratory
Computer Architecture and System Programming Laboratory
Presentation transcript:

Ramblr Making Reassembly Great Again Ruoyu “Fish” Wang, Yan Shoshitaishvili, Antonio Bianchi, Aravind Machiry, John Grosen, Paul Grosen, Christopher Kruegel, Giovanni Vigna

Motivation

Available Solutions Pure Static Dynamic Static + Dynamic

What is Binary Reassembly?

Disassemble .text 400100 mov [6000a0], eax 400105 jmp 0x40020d ... .data 6000a0 .long 0xc0debeef 6000a4 .long 0x0 Disassemble

Disassemble .text mov [data_0], eax jmp target ... target .long 0xc0debeef data_1 .long 0x0 Disassemble

Non-relocatable Assembly .text 400100 mov [6000a0], eax 400105 jmp 40020d ... Patch & Assemble 40020d CRASH! Disassemble 40020f 40020d mov [6000a4], 1 .data 6000a0 “cat\x00” 6000a0 .long 0xc0debeef 6000a4 .long 0x0 6000a4 6000a8 Non-relocatable Assembly

Patch & Assemble Disassemble Relocatable Assembly .text mov [data_0], eax jmp target ... CRASH! target mov [data_1], 1 .data new “cat\x00” data_0 .long 0xc0debeef data_1 .long 0x0 .text mov [data_0], eax jmp target ... mov [data_1], 1 .data data_0 .long 0xc0debeef data_1 .long 0x0 Patch & Assemble Disassemble Relocatable Assembly

.text .rodata .data .bss Code regions Data regions 0x80486f0 0x804d000 0x804e000

Uroboros .text .rodata .data .bss 0x80486f0 .text: 80488e3: push ebp 80488e4: mov ebp, esp 80488e6: sub esp, 0x48 80488e9: mov DWORD PTR [ebp-0x10], 0x0 80488f0: mov DWORD PTR [ebp-0xc], 0x0 80488f7: mov DWORD PTR [ebp-0xc], 0x80540a0 80488fe: mov eax, 0xfb7 8048903: mov WORD PTR [ebp-0x10], ax 8048907: mov eax, ds:0x805be60 804890c: test eax, eax 804890e: jne 0x804895b 8048910: mov eax, ds:0x805be5c ... .data: 804d538: ec 8e 04 08 804d53c: 05 8f 04 08 804d540: 1e 8f 04 08 0x48 0x804d000 0x0 0x0 0x80540a0 0x804d200 0xfb7 0x805be60 0x804e000 0x804895b 0x805be5c Uroboros 0x8048eec 0x8048f05 USENIX Sec ‘15 0x8048f1e

Problems

False Positives False Negatives Thanks xkcd :-)

Problem: Value Collisions False Positives Problem: Value Collisions 8060080 .db 3d 8060081 .db ec 8060082 .db 04 8060083 .db 08 /* stored at 0x8060080 */ static float a = 4e-34; Byte Representation A Floating-point Variable a 8060080 label_804ec3d Interpreted as a Pointer

Problem: Compiler Optimization False Negatives Problem: Compiler Optimization int ctrs[2] = {0}; int main() { int input = getchar(); switch (input – 'A') case 0: ctrs[input – 'A']++; break; ... } int ctrs[2] = {0}; int main() { int input = getchar(); switch (input – 'A') case 0: ctrs[input – 'A']++; break; ... } A code snippet allows constant folding

Problem: Compiler Optimization False Negatives Problem: Compiler Optimization int ctrs[2] = {0}; int main() { int input = getchar(); switch (input – 'A') case 0: ctrs[input – 'A']++; break; ... } 0x804a034 – ‘A’ * sizeof(int) = 0x8049f30 ; Assuming ctrs is stored at 0x804a034 ; eax holds the input character ; ctrs[input – 'A']++; add 0x8049f30[eax * 4], 1 ... .bss 804a034: ctrs[0] 804a038: ctrs[1] 0x8049f30 does not belong to any section A code snippet allows constant folding Compiled in Clang with –O1

Our Approach

Naïve Strategy False Positives False Negatives Heuristics

Ramblr Heuristics False Positives False Negatives Program Analysis

Content Classification Pipeline 0x804850b Pointer 0xa Integer 0xdc5 63 61 74 00 String 0x80484a2 0x804840b 0xa0000 push offset label_34 push offset label_35 cmp eax, ecx jne label_42 .label_42: mov eax, 0x12fa9e5 ... Symbolization & Reassembly CFG Recovery Content Classification

Content Classification Pipeline 0x804850b Pointer 0xa Integer 0xdc5 63 61 74 00 String 0x80484a2 0x804840b 0xa0000 push offset label_34 push offset label_35 cmp eax, ecx jne label_42 .label_42: mov eax, 0x12fa9e5 ... Symbolization & Reassembly CFG Recovery Content Classification

Recursive Disassembly CFG Recovery 31 ed 5e 89 e1 83 e4 f0 50 54 52 68 00 25 05 08 0x80486f0: xor ebp, ebp pop esi mov ecx, esp and esp, 0xfffffff0 push eax push esp push edx ... Recursive Disassembly Iterative Refinement

Content Classification ((value * 42) ^ 5) / 3 *((int*)0x8045010) value * 42 5 xor / 3 result .text .rodata .data .bss *ptr A Typical Pointer A Typical Value

Content Classification Type Category Examples Primitive types Pointers, shorts, DWORDs, QWORDs, Floating-point values, etc. Strings Null-terminated ASCII strings, Null-terminated UTF-16 strings Jump tables A list of jump targets Arrays of primitive types An array of pointers, a sequence of integers Data Types that Ramblr Recognizes

Content Classification MOVe Scalar Double-precision floating-point value Two floating-points 804d750 Floating point integer 804d758 movsd xmm0, ds:0x804d750 movsd xmm1, ds:0x804d758 Recognizing Types during CFG Recovery

Content Classification chr = _getch(); switch (i) { case 1: a += 2; break; case 2: b += 4; break; case 3: c += 6; break; default: a = 0; break; } switch (i) { case 1: ... case 2: case 3: default: } Slicing Recognizing Types with Slicing & VSA

Content Classification jmp table[i * 4] Quit switch i = [0, 2] with a stride of 1 A jump table of 3 entries table[0] Pointer, jump target table[1] table[2] VSA Recognizing Types with Slicing & VSA

Base Pointer Reattribution False Negatives Base Pointer Reattribution int ctrs[2] = {0}; int main() { int input = getchar(); switch (input – 'A') case 0: ctrs[input – 'A']++; break; ... } ; Assuming ctrs is stored at 0x804a034 ; eax holds the input character ; ctrs[input – 'A']++; add 0x8049f30[eax * 4], 1 ... .bss 804a034: ctrs[0] 804a038: ctrs[1] 0x8049f30 does not belong to any section A code snippet allows constant folding Compiled in Clang with –O1

Base Pointer Reattribution False Negatives Base Pointer Reattribution Belongs to .bss ; Assuming ctrs is stored at 0x804a034 ; eax holds the input character ; ctrs[input – 'A']++; add 0x8049f30[eax * 4], 1 ... .bss 804a034: ctrs[0] 804a038: ctrs[1] 0x8049f30 ‘A’ * 4 Constant un-folding add 0x804a034 0x8049f30 does not belong to any section The Slicing Result Compiled in Clang with –O1

Safety Heuristics: Data Consumer Check DWORD PTR [0x8045000] DWORD PTR [0x8045f08] xor *(the_result) I GIVE UP Unusual Behaviors Triggering the Opt-out Rule

Symbolization & Reassembly 0x400010 label_34 0x400020 label_35 0x400a14 label_42 ... 0x406000 data_3 push offset label_34 push offset label_35 cmp eax, ecx jne label_42 .label_42: mov eax, 0x12fa9e5 ... Symbolization Assembly Generation

Evaluation

Data sets Coreutils 8.25.55 Binaries from CGC Programs 106 143 Compiler CGC 5 Clang 4.4 Optimization levels O0/O1/O2/O3/Os/Ofast Architectures X86/AMD64 X86 Test cases Yes Total binaries 1272 725

Brief Results: Success Rate

Ramblr is the foundation of ... Patching Vulnerabilities Obfuscating Control Flows Optimizing Binaries Hardening Binaries

Conclusion

Conclusion Identified challenges in reassembling Proposed a novel composition of static analysis techniques Developed a systematic approach to reassemble stripped binaries Ramblr is open-sourced Extra data-sets and usable tools will be released soon

Patching support in angr Management Tools Patching support in angr Management Ramblr IDA Plugin Patcherex