Published by Natalie Woodis. Modified about 1 year ago.

1
Semantics-Based Obfuscation-Resilient Binary Code Similarity Comparison with Application to Software Plagiarism Detection
Lannan Luo, Jiang Ming, Dinghao Wu, Peng Liu, Sencun Zhu
The Pennsylvania State University, University Park
FSE'14

2
Research Problem
Is a component of Program A similar to Program B? A component is a set of functions, or a whole program.

3
Challenges
- No source code available
- Different compilers and compiler optimization levels
- Code obfuscation techniques

4
Examples
Source: if (condition) { /*then branch*/ }
[Figure: the binary code produced from this source by GCC -O0 and by GCC -O2]

5
Examples
Code obfuscation (CIL): switch/case rewritten as if/else, and if/else rewritten as switch/case

6
Requirements
- Basic: binary code-based, obfuscation resilient
- Other: partial code reuse detection

7
Existing methods
Requirements: (1) binary code-based, (2) obfuscation resilient, (3) partial code detection
- Clone detection (MOSS, JPLag, etc.): (1) ✗, (2) ✗, (3) ✔
- Binary similarity detection (Bdiff, DarunGrim2, etc.): (1) ✔, (2) ✗, (3) ✔
- Software plagiarism detection (birthmark based, core value based, etc.): ✔ ✔/✗ ✔/✗ ✗

8
CoP: a binary-oriented, obfuscation-resilient method
Based on a new concept: Longest Common Subsequence (LCS) of Semantically Equivalent Basic Blocks (SEBB)
[Diagram: SEBB and LCS linked to obfuscation resiliency and scalability]

9
Architecture
Three levels:
- Basic block level
- Path level
- Whole component level

10
Basic block similarity computation
- Symbolic execution: obtain a set of symbolic formulas that represent the input-output relations of basic blocks.
- Theorem proving: check the pair-wise equivalence of the symbolic formulas of two basic blocks.
- Calculate a basic block similarity score. If the score is above a threshold, the two blocks are treated as semantically equivalent basic blocks.
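The three steps above can be sketched in a few lines of Python. This is only an illustration under strong assumptions, not the CoP implementation: blocks are written as source-level assignments rather than binary instructions, only `+` and `-` are supported, and the theorem prover is replaced by comparing canonical linear forms. The names `symbolic_execute`, `block_similarity`, and `is_sebb`, as well as the `obfuscated` block, are hypothetical. The plaintiff and suspicious blocks are the ones used in the example slide.

```python
from itertools import permutations


def symbolic_execute(block, inputs):
    """Symbolically run a straight-line block of (dest, left, op, right) tuples.

    Every value is kept as a linear form: a dict mapping an input name
    (or "1" for the constant term) to an integer coefficient.
    """
    env = {name: {name: 1} for name in inputs}

    def operand(token):
        # A known variable yields its formula; anything else is an integer literal.
        return env[token] if token in env else {"1": int(token)}

    for dest, left, op, right in block:
        sign = 1 if op == "+" else -1
        form = dict(operand(left))
        for var, coeff in operand(right).items():
            form[var] = form.get(var, 0) + sign * coeff
        env[dest] = {v: c for v, c in form.items() if c != 0}
    return env


def block_similarity(block_a, in_a, out_a, block_b, in_b, out_b):
    """Fraction of block_a outputs that equal some block_b output
    under the best mapping of block_a inputs onto block_b inputs."""
    env_a = symbolic_execute(block_a, in_a)
    env_b = symbolic_execute(block_b, in_b)
    forms_b = [env_b[o] for o in out_b]
    best = 0
    for perm in permutations(in_b, len(in_a)):
        mapping = dict(zip(in_a, perm))
        matched = 0
        for out in out_a:
            renamed = {mapping.get(v, v): c for v, c in env_a[out].items()}
            if renamed in forms_b:
                matched += 1
        best = max(best, matched)
    return best / len(out_a)


def is_sebb(score, threshold):
    """Blocks count as semantically equivalent when the score exceeds the threshold."""
    return score > threshold


plaintiff = [("p", "a", "+", "b"), ("q", "a", "-", "b")]
suspicious = [("s", "x", "+", "y"), ("t", "x", "-", "y")]
# Hypothetical obfuscated variant: offsets that cancel out, plus a junk output r.
obfuscated = [
    ("u", "x", "+", "10"), ("v", "y", "+", "10"),
    ("w", "u", "+", "v"), ("s", "w", "-", "20"),
    ("t", "u", "-", "v"), ("r", "x", "+", "1"),
]

score_plain = block_similarity(plaintiff, ["a", "b"], ["p", "q"],
                               suspicious, ["x", "y"], ["s", "t"])
score_obf = block_similarity(plaintiff, ["a", "b"], ["p", "q"],
                             obfuscated, ["x", "y"], ["s", "t", "r"])
print(score_plain, score_obf)  # 1.0 1.0
```

Canonical linear forms are enough here because the blocks only add and subtract. A general tool would need a theorem prover to decide equivalence of richer formulas.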

11
An Example
Plaintiff block: p = a + b; q = a - b;
Suspicious block: s = x + y; t = x - y;
Obfuscated block: u = x + 10; v = y - 10; s = u + v; t = u - v; r = x + 1;
Symbolic execution of the plaintiff block: p = f1(a, b) = a + b; q = f2(a, b) = a - b;
Symbolic execution of the suspicious block: s = f3(x, y) = x + y; t = f4(x, y) = x - y;
Symbolic execution of the obfuscated block: u = f3(x) = x + 10; v = f4(y) = y - 10; s = f5(u, v) = u + v; t = f6(u, v) = u - v; r = f7(x) = x + 1;
Equivalence checks: a = x ∧ b = y ⇒ p = s; a = x ∧ b = y ⇒ q = t
Result: semantically equivalent basic blocks (100%)

12
Examples of semantically equivalent basic blocks with very different instructions

13
Architecture
Three levels:
- Basic block level
- Path level
- Whole component level

14
Path similarity comparison
[Figure: a plaintiff graph and a suspicious graph, with blocks labeled Sp, S1, S2, S3]
- Step 1: Starting blocks.
- Step 2: Linearly independent paths.
- Step 3: Longest common subsequence (LCS) of semantically equivalent basic blocks (SEBB) computation.
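Step 2 can be illustrated with a small sketch. One common way to pick linearly independent paths is a depth-first traversal that keeps a path only when it contains at least one edge not covered by earlier paths. The code below assumes this approach and is not taken from the talk. It also assumes an acyclic control-flow graph given as a hypothetical adjacency dict, and the function name `linearly_independent_paths` is made up for illustration.

```python
def linearly_independent_paths(cfg, entry, exit_block):
    """Collect entry-to-exit paths, keeping a path only if it adds an unseen edge."""
    covered = set()  # edges already used by a kept path
    kept = []

    def dfs(block, path):
        if block == exit_block:
            edges = set(zip(path, path[1:]))
            # Keep the very first path, and any later path with a new edge.
            if not kept or not edges <= covered:
                covered.update(edges)
                kept.append(list(path))
            return
        for succ in cfg.get(block, []):
            if succ not in path:  # ignore back edges; this sketch does not unroll loops
                path.append(succ)
                dfs(succ, path)
                path.pop()

    dfs(entry, [entry])
    return kept


# Hypothetical CFG: two consecutive if/else diamonds.
cfg = {
    "A": ["B", "C"], "B": ["D"], "C": ["D"],
    "D": ["E", "F"], "E": ["G"], "F": ["G"],
    "G": [],
}
paths = linearly_independent_paths(cfg, "A", "G")
for p in paths:
    print(" -> ".join(p))
```

For this graph the sketch keeps three of the four possible paths, which matches the cyclomatic number E - N + 2 = 8 - 7 + 2 = 3. On other graphs the greedy edge test may keep fewer paths than a full basis.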

15
Computing LCS of SEBB
- Breadth-first search
- LCS dynamic programming
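A minimal sketch of how a breadth-first traversal can be combined with the LCS dynamic program follows. It is an illustration under assumptions, not the algorithm from the talk: the suspicious graph is assumed acyclic, block equivalence is assumed to be precomputed and passed in as a predicate, and the names `lcs_path_vs_graph` and `sebb_pairs` are hypothetical. Each block gets one LCS table row, computed from the element-wise maximum of its predecessors' rows.

```python
from collections import deque


def lcs_path_vs_graph(path, graph, start, equivalent):
    """Best LCS length between `path` and any path of the acyclic `graph`
    that begins at `start`, with `equivalent(p, s)` playing the role of equality."""
    n = len(path)

    # Find the blocks reachable from the starting block.
    reachable = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in reachable:
                reachable.add(v)
                queue.append(v)

    indegree = {u: 0 for u in reachable}
    for u in reachable:
        for v in graph.get(u, []):
            indegree[v] += 1

    # incoming[v][j]: best LCS of path[:j] over graph paths that stop just before v.
    incoming = {u: [0] * (n + 1) for u in reachable}
    best = 0
    queue = deque([start])
    while queue:  # breadth-first, visiting a block once all predecessors are done
        u = queue.popleft()
        prev = incoming[u]
        row = [0] * (n + 1)
        for j in range(1, n + 1):
            if equivalent(path[j - 1], u):
                row[j] = prev[j - 1] + 1
            else:
                row[j] = max(prev[j], row[j - 1])
        best = max(best, row[n])
        for v in graph.get(u, []):
            incoming[v] = [max(a, b) for a, b in zip(incoming[v], row)]
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return best


# Hypothetical data: a plaintiff path and a diamond-shaped suspicious graph.
plaintiff_path = ["p1", "p2", "p3"]
suspicious_graph = {"s0": ["s1", "s2"], "s1": ["s3"], "s2": ["s3"], "s3": []}
sebb_pairs = {("p1", "s0"), ("p2", "s2"), ("p3", "s3")}

length = lcs_path_vs_graph(plaintiff_path, suspicious_graph, "s0",
                           lambda p, s: (p, s) in sebb_pairs)
print(length, length / len(plaintiff_path))  # 3 1.0
```

Dividing the LCS length by the plaintiff path length gives a path similarity score between 0 and 1. Taking the maximum over predecessors lets one pass cover every suspicious path without enumerating them.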

16
LCS Refinement

17
Evaluation
- Obfuscation resiliency
  - Experiments: thttpd, openssl, and gzip
  - Different compiler optimization levels
  - Different compilers
  - Different code obfuscation techniques
  - Compared with MOSS, JPLag, Bdiff, and DarunGrim2
- Scalability
  - Gecko vs. Firefox

18
thttpd vs. sthttpd: Different compiler optimization levels
- T0: thttpd -O0
- T2: thttpd -O2
- S0: sthttpd -O0
- S2: sthttpd -O2

19
thttpd vs. sthttpd: Different compilers
- TG: thttpd GCC
- TI: thttpd ICC
- SG: sthttpd GCC
- SI: sthttpd ICC

20
Code obfuscation resiliency testing
- Source code obfuscation tools: Semantic Designs Inc.'s C obfuscator, Stunnix's CXX-obfuscator
- Binary code obfuscation tools: Diablo, Loco
- CIL: possesses many useful source code transformation techniques.


22
thttpd vs. independent programs
To measure false positives, we tested our tool against four independently developed programs:
- thttpd-2.25b
- atphttpd-0.4b
- boa
- lighttpd
Very low similarity scores (below 2%) were reported.

23
Gecko vs. Firefox: Scalability
- Gecko vs. Opera
- Gecko vs. Chrome
CoP reported scores below 3% for all cases.

24
Summary
- We propose a binary-oriented, obfuscation-resilient code similarity comparison approach, named CoP.
- CoP is based on a new concept, Longest Common Subsequence (LCS) of Semantically Equivalent Basic Blocks (SEBB).
- Our experimental results show that CoP is effective and practical when applied to real-world software.
