Paradyn Project, Paradyn / Dyninst Week, Madison, Wisconsin, April 12-14, 2004

Program Provenance: Guessing the Source Compiler from Binary Code
Nathan Rosenblum

Why compiler provenance?
[Figure: IDA Pro]

Why should this work?

The same C function compiled by GCC and by ICC:

    int bar(int foo) {
        int i, j;  /* j is uninitialized on the slide too; both listings use whatever its register holds */
        for (i = 0; i < foo; ++i) {
            i = j + i;
            j *= i;
        }
        return j;
    }

GCC:

    test edi,edi
    jle  4004ae
    mov  eax,0x0
    lea  eax,[rdx+rax]
    imul edx,eax
    add  eax,0x1
    cmp  edi,eax
    jg   4004a1
    mov  eax,edx
    ret

ICC:

    xor  edx,edx
    test edi,edi
    jle
    add  edx,eax
    imul eax,edx
    inc  edx
    cmp  edx,edi
    jl   40097e
    ret

Modeling binary code
[Figure: a program binary viewed as a sequence of instructions (positions i-1, i, i+1, i+2), each assigned a compiler label: gcc, icc, or none. The underlying bytes include functions (match_init, zp_init_keys, seekable) as well as padding, addresses, and data.]

Describing code

Code is described by short instruction patterns such as 〈mov [IMM], RAX ; * ; sub [IMM], RAX〉. In this notation each mnemonic abstracts several IA32 opcodes, * is a single-instruction wildcard, and [IMM] hides immediate values. Patterns are taken at the instruction level and at the control-flow level (branches).

Example patterns:
- 〈mov [IMM], RAX ; * ; sub [IMM], RAX〉
- 〈add [IMM], RDX ; * ; sub RAX, RCX〉
- 〈push EBP ; mov ESP, EBP〉
- 〈shl [IMM], RAX ; shr [IMM], RAX〉
- 〈* ; * ; sub [IMM], RAX〉

[math elided]

Results [R, Miller, Zhu PASTE '10]
[Figure: results in the single-compiler and mixed-compiler settings for GCC, ICC, and MSVC binaries; values shown: 92.5%, 93.7%; error types: 5.3%, 2.3% or 2.8%, 6.4%.]

Finer detail: compiler versions, optimization

    Question                              Example            Difficulty   Accuracy
    Major versions?                       GCC 3.x vs. 4.x    easy         99%
    Minor versions?                       GCC 4.2 vs. 4.3    easy         85-99%
    Low optimization vs. high optimization?  GCC -O0 vs. -O3    easy         99%
    Highly optimized code?                GCC -O2 vs. -O3    hard         60%

Future work

    int bar(int foo) {
        int i, j;
        for (i = 0; i < foo; ++i) {
            i = j + i;
            j *= i;
        }
        return j;
    }