Paradyn Project, Paradyn / Dyninst Week, Madison, Wisconsin, April 12-14, 2004


1 Paradyn Project, Paradyn / Dyninst Week, Madison, Wisconsin, April 12-14, 2004
Program Provenance: Guessing the Source Compiler from Binary Code
Nathan Rosenblum

2 Why compiler provenance?
[Figure label: IDA Pro]

3 Why should this work?

4 Source:
    int bar(int foo) {
        int i, j;
        for (i = 0; i < foo; ++i) {
            i = j + i;
            j *= i;
        }
        return j;
    }
GCC:
    test edi,edi
    jle  4004ae
    mov  eax,0x0
    lea  eax,[rdx+rax]
    imul edx,eax
    add  eax,0x1
    cmp  edi,eax
    jg   4004a1
    mov  eax,edx
    ret
ICC:
    xor  edx,edx
    test edi,edi
    jle  400989
    add  edx,eax
    imul eax,edx
    inc  edx
    cmp  edx,edi
    jl   40097e
    ret
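One way to see why the two listings on slide 4 are distinguishable is to compare which instruction mnemonics each one uses. The sketch below is only an illustration, not the method from the talk: the helper `mnemonics` is a hypothetical name, and the assignment of the first listing to GCC and the second to ICC is assumed from the order of the labels on the slide.

```python
# Listings copied from slide 4. The GCC/ICC labels are assumed from the
# order in which the labels appear on the slide.
GCC_LISTING = """
test edi,edi
jle 4004ae
mov eax,0x0
lea eax,[rdx+rax]
imul edx,eax
add eax,0x1
cmp edi,eax
jg 4004a1
mov eax,edx
ret
"""

ICC_LISTING = """
xor edx,edx
test edi,edi
jle 400989
add edx,eax
imul eax,edx
inc edx
cmp edx,edi
jl 40097e
ret
"""


def mnemonics(listing):
    """Return the mnemonic (first token) of every non-empty line."""
    return [line.split()[0] for line in listing.strip().splitlines() if line.strip()]


# Mnemonics that appear in one listing but not in the other.
gcc_only = sorted(set(mnemonics(GCC_LISTING)) - set(mnemonics(ICC_LISTING)))
icc_only = sorted(set(mnemonics(ICC_LISTING)) - set(mnemonics(GCC_LISTING)))

print("only in first listing: ", gcc_only)
print("only in second listing:", icc_only)
```

Even for this one small function the two outputs differ in instruction choice (`add eax,0x1` versus `inc edx`, `jg` versus `jl`), which is the kind of signal a classifier can pick up on.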

5 Modeling binary code
[Figure: a program binary viewed as a sequence of positions (…, i₋₁, i, i₊₁, i₊₂, …). Each position carries a compiler label (gcc, icc, or none) above its underlying bytes.]
Byte sequences shown:
    … c7 04 24 10 70 05 08 ff d0 c9 c3 90 81 ec e4 00 00 00 8b b4 24 ec 00 00 00 …
    8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 90
    80 4c 90 80 4c 94 80 4c 98 80 4c 9b
Region labels shown: match_init, zp_init_keys, seekable, padding, addrs., data

6 Describing code
Instruction-level features, e.g. 〈mov [IMM], RAX ; * ; sub [IMM], RAX〉
    abstracts several IA32 opcodes
    single-instruction wildcard (*)
    hide immediate values ([IMM])
Control flow-level features: branch
[Figure: binary bit strings combined (+) with the extracted features]
Example instruction-level features:
    〈mov [IMM], RAX ; * ; sub [IMM], RAX〉
    〈add [IMM], RDX ; * ; sub RAX, RCX〉
    〈push EBP ; mov ESP, EBP〉
    〈shl [IMM], RAX ; shr [IMM], RAX〉
    〈* ; * ; sub [IMM], RAX〉
[math elided]
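The instruction-level features on slide 6 (short instruction sequences with immediates hidden and a single-instruction wildcard) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation behind the talk: the function names `normalize`, `matches_at`, `contains`, and `feature_vector` are hypothetical, instructions are plain Intel-syntax strings rather than the operand notation used on the slide, and any token that starts with a digit is treated as an immediate.

```python
import re

# Assumption: a token that starts with a digit (0x-prefixed or bare hex,
# such as a branch target) is an immediate and is replaced by "IMM".
_IMM = re.compile(r"\b(?:0x[0-9a-f]+|[0-9][0-9a-f]*)\b", re.IGNORECASE)


def normalize(instr):
    """Collapse whitespace and hide immediate values."""
    return _IMM.sub("IMM", " ".join(instr.split()))


def matches_at(instrs, start, idiom):
    """True if the idiom matches the instructions beginning at `start`.

    "*" in the idiom matches any single instruction.
    """
    if start + len(idiom) > len(instrs):
        return False
    return all(p == "*" or p == instrs[start + k] for k, p in enumerate(idiom))


def contains(instrs, idiom):
    """True if the idiom occurs anywhere in the instruction sequence."""
    return any(matches_at(instrs, s, idiom) for s in range(len(instrs)))


def feature_vector(listing, idioms):
    """One 0/1 entry per idiom: does it occur in the normalized listing?"""
    instrs = [normalize(line) for line in listing if line.strip()]
    return [int(contains(instrs, idiom)) for idiom in idioms]


# First listing from slide 4, one instruction per string.
listing = [
    "test edi,edi", "jle 4004ae", "mov eax,0x0", "lea eax,[rdx+rax]",
    "imul edx,eax", "add eax,0x1", "cmp edi,eax", "jg 4004a1",
    "mov eax,edx", "ret",
]

idioms = [
    ("add eax,IMM", "*", "jg IMM"),  # wildcard skips the cmp in between
    ("inc edx",),                    # does not occur in this listing
    ("mov eax,IMM",),                # immediate value hidden
]

vector = feature_vector(listing, idioms)
print(vector)
```

Hiding immediates lets one idiom cover every constant or branch target, and the wildcard lets it tolerate one unrelated instruction in the middle, so a modest idiom set can generalize across many functions.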

7 Results [R, Miller, Zhu PASTE '10]
[Figure: results for single-compiler and mixed-compiler binaries, covering GCC, ICC, and MSVC, with a breakdown of error types. Values shown: 92.5%, 93.7%, 5.3%, 2.3% or 2.8%, 6.4%.]

8 Finer detail: compiler versions, optimization
    Question                                  Comparison          Difficulty  Accuracy
    Major versions?                           GCC 3.x vs 4.x      easy        99%
    Minor versions?                           GCC 4.2 vs 4.3      easy        85-99%
    Low optimization vs. high optimization?   GCC -O0 vs -O3      easy        99%
    Highly optimized code?                    GCC -O2 vs -O3      hard        60%

9 Future work
    int bar(int foo) {
        int i, j;
        for (i = 0; i < foo; ++i) {
            i = j + i;
            j *= i;
        }
        return j;
    }
    ...
[Figure: the source above followed by binary bit strings]

