Lannan Luo, Jiang Ming, Dinghao Wu, Peng Liu, Sencun Zhu The Pennsylvania State University, University Park Semantics-Based Obfuscation-Resilient Binary.

1 Semantics-Based Obfuscation-Resilient Binary Code Similarity Comparison with Application to Software Plagiarism Detection. Lannan Luo, Jiang Ming, Dinghao Wu, Peng Liu, Sencun Zhu. The Pennsylvania State University, University Park. FSE’14.

2 Research Problem. Is a component of Program A similar to a component of Program B? A component is a set of functions, or a whole program. [Diagram: Program A, Program B, Component, Similar?]

3 Challenges. No source code available; different compilers and compiler optimization levels; code obfuscation techniques.

4 Examples. Source: if (condition) { /*then branch*/ }. [Figure: the code compiled with GCC -O2 and with GCC -O0.]

5 Examples. Code obfuscation: switch/case → if/else. [Figure labels: if/else, switch/case, CIL.]

6 Requirements. Basic: binary code-based; obfuscation resilient. Other: partial code reuse detection.

7 Existing methods… Checked against the requirements (binary code-based, obfuscation resilient, partial code detection). Clone detection (MOSS, JPLag, etc.): ✗, ✗, ✔. Binary similarity detection (Bdiff, DarunGrim2, etc.): ✔, ✗, ✔. Software plagiarism detection (birthmark based, core value based, etc.): ✔ ✔/✗ ✔/✗ ✗.

8 CoP: a binary-oriented, obfuscation-resilient method. Based on a new concept: Longest Common Subsequence (LCS) of Semantically Equivalent Basic Blocks (SEBB). SEBB provides obfuscation resiliency; LCS provides scalability and obfuscation resiliency.

9 Architecture. Three levels: basic block level; path level; whole component level.

10 Basic block similarity computation. Symbolic execution: obtain a set of symbolic formulas that represent the input-output relations of basic blocks. Theorem proving: check the pair-wise equivalence of the symbolic formulas of two basic blocks. Calculate a basic block similarity score: score > a threshold ⇒ semantically equivalent basic blocks.

11 An Example. Plaintiff block: p = a + b; q = a - b. Suspicious block: s = x + y; t = x - y. Obfuscated block: u = x + 10; v = y - 10; s = u + v; t = u - v - 20; r = x + 1. Symbolic execution of the plaintiff block: p = f1(a, b) = a + b; q = f2(a, b) = a - b. Symbolic execution of the suspicious block: s = f3(x, y) = x + y; t = f4(x, y) = x - y. Symbolic execution of the obfuscated block: u = f3(x) = x + 10; v = f4(y) = y - 10; s = f5(u, v) = u + v; t = f6(u, v) = u - v - 20; r = f7(x) = x + 1. In both comparisons: a = x ∧ b = y ⇒ p = s, and a = x ∧ b = y ⇒ q = t. Result: semantically equivalent basic blocks (100%).
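The example above can be reproduced with a small sketch. It is not CoP's implementation: it handles only straight-line blocks of additions and subtractions, and it replaces the theorem prover with an exact comparison of canonical linear forms. The names `symbolic_execute` and `block_similarity`, the statement syntax, and the scoring rule (fraction of plaintiff outputs matched under the best input renaming) are all assumptions made for illustration. The obfuscated block is written with t = u - v - 20; the constant matters, because (x + 10) - (y - 10) alone reduces to x - y + 20.

```python
from itertools import permutations

def symbolic_execute(block):
    """Map each assigned variable to a linear formula over the block's inputs.

    A formula is a dict {input_name: coefficient}; the key 1 holds the constant.
    Statements look like "t = u - v - 20" (space-separated tokens, + and - only).
    """
    env = {}
    for stmt in block:
        target, _, *rhs = stmt.split()
        formula, sign = {}, 1
        for tok in rhs:
            if tok in ("+", "-"):
                sign = 1 if tok == "+" else -1
                continue
            if tok.isdigit():
                term = {1: int(tok)}
            else:
                # A name with no earlier assignment is treated as a block input.
                term = env.get(tok, {tok: 1})
            for name, coef in term.items():
                formula[name] = formula.get(name, 0) + sign * coef
        # Dropping zero coefficients gives each formula a canonical form.
        env[target] = {k: c for k, c in formula.items() if c != 0}
    return env

def inputs_of(env):
    """Input variable names that occur in any formula of the block."""
    return sorted({n for f in env.values() for n in f if n != 1})

def block_similarity(plaintiff, suspicious):
    """Best fraction of plaintiff outputs equal to some suspicious output
    under a one-to-one renaming of plaintiff inputs to suspicious inputs."""
    p_env, s_env = symbolic_execute(plaintiff), symbolic_execute(suspicious)
    if not p_env:
        return 0.0
    p_in, s_in = inputs_of(p_env), inputs_of(s_env)
    s_formulas = list(s_env.values())
    best = 0.0
    for image in permutations(s_in, len(p_in)):
        rename = dict(zip(p_in, image))
        rename[1] = 1  # the constant term keeps its key
        matched = sum(
            {rename[n]: c for n, c in f.items()} in s_formulas
            for f in p_env.values()
        )
        best = max(best, matched / len(p_env))
    return best

plaintiff = ["p = a + b", "q = a - b"]
suspicious = ["s = x + y", "t = x - y"]
obfuscated = ["u = x + 10", "v = y - 10", "s = u + v", "t = u - v - 20", "r = x + 1"]
print(block_similarity(plaintiff, suspicious))  # 1.0
print(block_similarity(plaintiff, obfuscated))  # 1.0
```

Extra outputs in the obfuscated block (u, v, r) do not lower the score, because only plaintiff outputs are counted. A real tool would need a solver to handle non-linear and bit-level operations.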

12 Examples of semantically equivalent basic blocks with very different instructions

13 Architecture. Three levels: basic block level; path level; whole component level.

14 Path similarity comparison. [Figure: plaintiff block S_p compared against suspicious blocks S_1, S_2, S_3.] Step 1: Starting blocks. Step 2: Linearly independent paths. Step 3: Computation of the longest common subsequence (LCS) of semantically equivalent basic blocks (SEBB).
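Step 2 can be illustrated with a short sketch. It is an assumption about one simple way to pick linearly independent paths, not the procedure used in CoP: a depth-first walk enumerates simple entry-to-exit paths and keeps a path only if it contributes at least one edge not covered by the paths kept so far, which makes each kept path independent of the earlier ones. The function name `independent_paths` and the adjacency-dict CFG encoding are hypothetical.

```python
def independent_paths(cfg, entry):
    """Return entry-to-exit paths, each adding at least one unseen edge.

    cfg maps a block name to the list of its successor blocks; a block with
    no successors is an exit. Back edges are skipped (simple paths only).
    """
    seen_edges = set()
    paths = []

    def dfs(node, path):
        succs = cfg.get(node, [])
        if not succs:  # reached an exit block
            edges = set(zip(path, path[1:]))
            # Keep the first path, and any later path that has a new edge.
            if not paths or not edges <= seen_edges:
                seen_edges.update(edges)
                paths.append(list(path))
            return
        for nxt in succs:
            if nxt not in path:  # do not re-enter a block already on the path
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(entry, [entry])
    return paths

# Two if/else diamonds in sequence: 8 edges, 7 blocks.
cfg = {
    "A": ["B", "C"], "B": ["D"], "C": ["D"],
    "D": ["E", "F"], "E": ["G"], "F": ["G"], "G": [],
}
for p in independent_paths(cfg, "A"):
    print(" -> ".join(p))
```

For this graph the walk keeps three of the four simple paths, matching the cyclomatic number E - N + 2 = 8 - 7 + 2 = 3. Enumerating simple paths can be exponential in the worst case, so this is suitable only for small functions.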

15 Computing LCS of SEBB. Breadth-first search; LCS dynamic programming.
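One way to combine the two ingredients named on this slide is sketched below. It is a simplified illustration under stated assumptions, not CoP's exact algorithm: the suspicious CFG is assumed acyclic with every block reachable from the entry, the traversal is a breadth-first (Kahn-style) topological walk, and each block carries one row of the standard LCS table against the plaintiff path. The function name `lcs_of_sebb` and the `equivalent` callback (standing in for the basic-block equivalence check) are hypothetical.

```python
from collections import deque

def lcs_of_sebb(plaintiff_path, cfg, entry, equivalent):
    """Highest LCS score between plaintiff_path and any path from entry in cfg.

    cfg maps each suspicious block to its successors and must be acyclic,
    with every block reachable from entry. equivalent(p_block, s_block)
    says whether two basic blocks are semantically equivalent.
    """
    n = len(plaintiff_path)
    indegree = {v: 0 for v in cfg}
    for succs in cfg.values():
        for s in succs:
            indegree[s] = indegree.get(s, 0) + 1
    # best_pred[v][j]: best LCS of plaintiff_path[:j] with any path that ends
    # at a predecessor of v (all zeros for the entry block).
    best_pred = {v: [0] * (n + 1) for v in indegree}
    queue = deque([entry])
    best = 0
    while queue:
        v = queue.popleft()
        prev = best_pred[v]
        row = [0] * (n + 1)
        for j in range(1, n + 1):
            hit = prev[j - 1] + 1 if equivalent(plaintiff_path[j - 1], v) else 0
            row[j] = max(hit, prev[j], row[j - 1])
        best = max(best, row[n])
        for s in cfg.get(v, []):
            # Merge this row into the successor's view of its predecessors.
            best_pred[s] = [max(a, b) for a, b in zip(best_pred[s], row)]
            indegree[s] -= 1
            if indegree[s] == 0:  # all predecessors processed
                queue.append(s)
    return best

# Hypothetical semantic labels for suspicious blocks; equal labels mean SEBB.
label = {"s0": "A", "s1": "X", "s2": "B", "s3": "C", "s4": "D"}
cfg = {"s0": ["s1", "s2"], "s1": ["s3"], "s2": ["s3"], "s3": ["s4"], "s4": []}
same = lambda p_block, s_block: label[s_block] == p_block
print(lcs_of_sebb(["A", "B", "C", "D"], cfg, "s0", same))  # 4
```

Dividing the result by the length of the plaintiff path gives a path similarity score between 0 and 1. Loops in a real CFG would have to be cut or unrolled before this sketch applies.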

16 LCS Refinement

17 Evaluation. Obfuscation resiliency: experiments on thttpd, openssl, and gzip, with different compiler optimization levels, different compilers, and different code obfuscation techniques; compared with MOSS, JPLag, Bdiff, and DarunGrim2. Scalability: Gecko vs. Firefox.

18 thttpd vs. sthttpd: different compiler optimization levels. T0: thttpd -O0; T2: thttpd -O2; S0: sthttpd -O0; S2: sthttpd -O2.

19 thttpd vs. sthttpd: different compilers. TG: thttpd GCC; TI: thttpd ICC; SG: sthttpd GCC; SI: sthttpd ICC.

20 Code obfuscation resiliency testing. Source code obfuscation tools: Semantic Designs Inc.’s C obfuscator; Stunnix’s CXX-obfuscator. Binary code obfuscation tools: Diablo; Loco. CIL: possesses many useful source code transformation techniques.

22 thttpd vs. independent programs. To measure false positives, we tested our tool against four independently developed programs: thttpd-2.25b, atphttpd-0.4b, boa, and lighttpd. Very low similarity scores (below 2%) were reported.

23 Gecko vs. Firefox: scalability. Gecko vs. Opera; Gecko vs. Chrome. CoP reported scores below 3% for all cases.

24 Summary. We propose a binary-oriented, obfuscation-resilient code similarity comparison approach, named CoP. CoP is based on a new concept, Longest Common Subsequence (LCS) of Semantically Equivalent Basic Blocks (SEBB). Our experimental results show that CoP is effective and practical when applied to real-world software.


