
1 FSE’14 Semantics-Based Obfuscation-Resilient Binary Code Similarity Comparison with Application to Software Plagiarism Detection Lannan Luo, Jiang Ming, Dinghao Wu, Peng Liu, Sencun Zhu The Pennsylvania State University, University Park
Hello, everyone. I am Lannan Luo. I am very pleased to be here to talk about our work, semantics-based, obfuscation-resilient binary code similarity comparison, which can be applied to software plagiarism detection. This is joint work with Jiang Ming, Dinghao Wu, Peng Liu, and Sencun Zhu. We are from Penn State University.

2 Research Problem: is a component in Program A similar to a component in Program B? A component can be a set of functions or a whole program.
The basic research problem for code similarity measurement is to detect whether a component in one program is similar to a component in another program, and to measure their similarity quantitatively. A component here can be a set of functions or a whole program.

3 Challenges: no source code available; different compilers and compiler optimization levels; code obfuscation techniques
Detecting code similarity, however, is challenging. First, the source code of both programs is usually not available, especially the source code of the suspicious program. Second, an adversary can use different compilers, different compiler optimization levels, and code obfuscation techniques to transform the stolen code and generate different binary code, hiding its appearance and logic.

4 Example: GCC -O0 vs. -O2 produces highly dissimilar code (-O2 enables around 60 optimizations)
if (condition) { /*then branch*/ }
Let's start with some examples to illustrate these challenges. My first example is an if/else code segment that contains only the then branch, without the else branch. When we compile it using GCC with -O0, the default optimization level, we get two basic blocks. However, when we compile it with -O2, we get only one basic block. The reason is that GCC with -O2 optimizes the code segment using cmov, a conditional move instruction, resulting in different basic blocks and a different control flow graph. Furthermore, GCC with different optimization levels, such as -O0 and -O2, generates highly dissimilar binary code, because -O2 applies around 60 optimizations that -O0 does not.

5 Code obfuscation can dramatically transform code in various ways
Example of code obfuscation: switch/case converted to if/else by CIL
Using different compiler optimization levels to avoid detection is the easiest and simplest method. A more advanced method is to apply code obfuscation techniques. My next example illustrates one kind of code obfuscation, which changes a switch/case statement into if/else statements. This is done by CIL, a source-to-source transformation tool we used in our experiments. When we compile the switch/case statement, we get a control flow graph with a jump table. However, if we use CIL to transform the statement into if/else statements and then compile it again, we get a control flow graph with a binary search tree. As we can see, the two control flow graphs are quite different from each other. Besides this kind of obfuscation, there are many others, such as function inlining and outlining, opaque predicate injection, and control-flow flattening. Therefore, code obfuscation techniques can dramatically transform code in various ways to hide its appearance and logic.

6 Requirements. Basic: binary code-based; obfuscation resilient. Other: partial code reuse detection.
These bring us to the requirements for a code similarity comparison approach. First, the basic requirements are that the approach should be binary code-based and obfuscation resilient. Second, as demonstrated by our research problem, we also require that the approach be able to detect partial code reuse.

7 Existing methods vs. the requirements (binary code-based / obfuscation resilient / partial code detection):
Clone detection (MOSS, JPLag, etc.): ✗ / ✗ / ✔
Binary similarity detection (Bdiff, DarunGrim2, etc.): ✔ / ✗ / ✔
Software plagiarism detection (birthmark based, core value based, etc.): ✔ / ✔ or ✗ / ✗
However, most of the existing methods cannot satisfy all of these requirements. First of all, clone detection approaches such as MOSS and JPLag can detect partial code reuse, but they assume the availability of source code and minimal code obfuscation. Binary similarity detection approaches such as Bdiff and DarunGrim2 are binary code-based, but they do not consider obfuscation in general and hence are not obfuscation resilient; most of them can detect partial code reuse. With respect to software plagiarism detection, different approaches have been proposed, such as birthmark based and core value based methods. They are based on binary code analysis, but birthmark based methods incur false negatives when some code obfuscation techniques are applied, while core value based detection is obfuscation resilient but input sensitive and cannot be applied to detect partial code reuse. Consequently, few of the existing methods satisfy all of these requirements.

8 CoP: a binary-oriented obfuscation-resilient method
Based on a new concept: Longest Common Subsequence (LCS) of Semantically Equivalent Basic Blocks (SEBB). SEBB: obfuscation resiliency; LCS: scalability.
To solve these challenges, we propose a binary-oriented, obfuscation-resilient method named CoP. CoP is based on a new concept, the longest common subsequence of semantically equivalent basic blocks, or LCS of SEBB for short. It combines program semantics with longest-common-subsequence-based fuzzy matching. The SEBB part guarantees the obfuscation resiliency of CoP, and the LCS part ensures both scalability and obfuscation resiliency.

9 Architecture. Three levels: basic block level, path level, whole component level
Let's first look at the architecture of CoP. The inputs are the binary code of the plaintiff and suspicious programs. The front end disassembles the binary code, builds an intermediate representation, and constructs control flow graphs and call graphs. CoP models program semantics at three different levels: basic block, path, and whole component, and combines two techniques: basic block similarity computation and computation of the longest common subsequence of semantically equivalent basic blocks. Basic block similarity computation models semantics at the basic block level. It is the basis for the LCS of SEBB computation, which models semantics at the path level. The whole-component semantics is then modeled collectively as multiple path semantics. Now, let's first look at the basic block similarity computation.

10 Basic block similarity computation
Symbolic execution: obtain a set of symbolic formulas that represent the input-output relations of basic blocks. Theorem proving: check the pair-wise equivalence of the symbolic formulas of two basic blocks. Calculate a basic block similarity score; if the score is above a threshold, the blocks are regarded as semantically equivalent.
The basic block similarity comparison relies on symbolic execution and theorem proving. We use symbolic execution to obtain a set of symbolic formulas representing the input-output relations of basic blocks, and then use a theorem prover to check the pair-wise equivalence of the symbolic formulas of two basic blocks, one from the plaintiff program and one from the suspicious program, and calculate a basic block similarity score. When the score is above the threshold, we regard the two blocks as semantically equivalent basic blocks in the next step.
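As an illustration of how such a check can be scored (this is our own sketch, not the CoP implementation), the pair-wise equivalence test and the similarity score can be expressed with the Z3 theorem prover's Python bindings. The input-variable pairing `premise` is assumed to be given here; CoP explores possible pairings of inputs, which this sketch omits.

```python
# Illustrative sketch only (not the CoP implementation): score two basic
# blocks by proving pair-wise equivalence of their output formulas with Z3.
from z3 import Solver, Not, Implies, unsat

def provably_equal(premise, lhs, rhs):
    # lhs == rhs follows from the premise iff the negation is unsatisfiable
    s = Solver()
    s.add(Not(Implies(premise, lhs == rhs)))
    return s.check() == unsat

def block_similarity(premise, plaintiff_outputs, suspicious_outputs):
    # fraction of plaintiff output formulas with an equivalent counterpart
    matched = sum(
        1 for p in plaintiff_outputs
        if any(provably_equal(premise, p, q) for q in suspicious_outputs)
    )
    return matched / len(plaintiff_outputs)
```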

11 Semantically equivalent basic blocks
An example. Plaintiff block: p = a + b; q = a - b. Suspicious block: s = x + y; t = x - y. Obfuscated block: u = x + 10; v = y - 10; s = u + v; t = u - v; r = x + 1.
Symbolic execution gives, for the plaintiff block, p = f1(a, b) = a + b and q = f2(a, b) = a - b; for the suspicious block, s = f3(x, y) = x + y and t = f4(x, y) = x - y; for the obfuscated block, u = f3(x) = x + 10, v = f4(y) = y - 10, s = f5(u, v) = u + v, t = f6(u, v) = u - v, and r = f7(x) = x + 1. The theorem prover checks queries such as a = x ∧ b = y ⇒ p = s and a = x ∧ b = y ⇒ q = t.
Here, I'd like to use a simple example to show how it works. On this slide there are two blocks, a plaintiff block and a suspicious block. Through symbolic execution, we get two symbolic formulas for each of the two basic blocks. We then use a theorem prover to pair-wise compare their symbolic formulas and find two equivalent formula pairs. We divide the number of equivalent pairs by the number of output variables of the plaintiff block, which is 2 divided by 2, obtain a similarity score of 100%, and conclude that the two basic blocks are semantically equivalent. However, if some obfuscation is applied to the suspicious block, we get an obfuscated block. To check its similarity, we again use symbolic execution to obtain the symbolic formulas of the obfuscated block and pair-wise compare them against the plaintiff formulas. In this way we again find two equivalent pairs, divide by the number of output variables of the plaintiff block, get a similarity score of 100%, and conclude that the two basic blocks are semantically equivalent.
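The first comparison on this slide, the plaintiff block against the unobfuscated suspicious block, can be reproduced directly with Z3; this is just an illustration of the two queries above, not the tool itself.

```python
# Reproducing the two theorem-prover queries from the slide:
#   a = x ∧ b = y  =>  p = s      and      a = x ∧ b = y  =>  q = t
from z3 import Ints, Solver, Not, Implies, And, unsat

a, b, x, y = Ints('a b x y')
premise = And(a == x, b == y)          # pair up the inputs of the two blocks

def equivalent(lhs, rhs):
    s = Solver()
    s.add(Not(Implies(premise, lhs == rhs)))   # look for a counterexample
    return s.check() == unsat                  # none found => equivalent

print(equivalent(a + b, x + y))   # p vs. s -> True
print(equivalent(a - b, x - y))   # q vs. t -> True
# 2 equivalences / 2 plaintiff outputs = 100% similarity
```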

12 Examples of semantically equivalent basic blocks with very different instructions
Our basic block similarity comparison technique works directly on binary code. This slide shows four examples of semantically equivalent basic blocks with very different instructions. As you can see, the instructions in each pair are quite different, yet our technique still effectively identifies all four pairs as semantically equivalent.

13 Architecture. Three levels: basic block level, path level, whole component level
Now that we've seen the basic block similarity computation technique, let's talk about the second technique: computing the longest common subsequence of semantically equivalent basic blocks. It models program semantics at the path level, as well as the whole component level.

14 Path similarity comparison
Step 1: starting blocks. Step 2: linearly independent paths. Step 3: longest common subsequence (LCS) of semantically equivalent basic blocks (SEBB) computation.
Let's revisit the research problem I presented at the beginning. We want to detect whether a component in the plaintiff program is similar to a component in the suspicious program. Since we don't know where to start the comparison, we first need to identify the starting blocks, which are semantically equivalent basic blocks. Suppose that, for block Sp in the plaintiff component, we find three semantically equivalent blocks, S1, S2, and S3, in the suspicious program to serve as starting blocks. We then compare the plaintiff component to each of the three candidate suspicious components to find their semantic similarity. The highest similarity is the final detection result. How do we find each pair's similarity? Assume we want to find the similarity between the plaintiff component and one of the suspicious components. We first select a set of linearly independent paths from the plaintiff component, and then compare each linearly independent path against that suspicious component to find its path similarity by computing the LCS of SEBB (step 3), which I will describe shortly. All of the path similarities together give the component similarity. Now, let's consider how to compute the LCS of SEBB between a linearly independent path and a component to find the path similarity.
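Before moving to that computation, here is a rough sketch of the component-level procedure just described. It is our own illustration with assumed helper names: `lcs_of_sebb_score` stands in for the step-3 path comparison, and the plain averaging of path scores is an assumption, as the paper's exact aggregation may differ.

```python
# Illustrative sketch of component-level comparison (assumed structure).
def component_similarity(plaintiff_paths, starting_blocks, lcs_of_sebb_score):
    """plaintiff_paths: linearly independent paths of the plaintiff component.
    starting_blocks: SEBB found in the suspicious program (e.g. S1, S2, S3).
    lcs_of_sebb_score: path-vs-component similarity from step 3 (assumed)."""
    best = 0.0
    for start in starting_blocks:
        # score every plaintiff path against the suspicious component
        # explored from this starting block
        scores = [lcs_of_sebb_score(path, start) for path in plaintiff_paths]
        # simple average as one way to combine path similarities (assumption)
        best = max(best, sum(scores) / len(scores))
    return best   # highest similarity over all candidate starting blocks
```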

15 Computing LCS of SEBB Breadth-first search LCS dynamic programming
In this figure, we have a plaintiff linearly independent path P and the control flow graph of a suspicious component. We want to compute the longest common subsequence of semantically equivalent basic blocks between them to find the path similarity. To achieve this, we combine breadth-first search with LCS dynamic programming to construct the LCS table. Due to the time limit, I will not go into a detailed, step-by-step discussion of how the LCS table is constructed; our paper presents a detailed description. Finally, from the constructed LCS table, we obtain the highest score in the right-most column, which is used to calculate a path similarity score. In this table, the highest score is 5.
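For a more concrete picture, here is a condensed, simplified sketch of combining breadth-first search with the LCS dynamic program. It is our own simplification: loops and the merging of tables at join points are handled only heuristically, `sem_equiv` stands for the basic block equivalence test from before, and `successors` returns a block's CFG successors.

```python
# Simplified illustration: LCS of semantically equivalent basic blocks
# between a plaintiff path and a suspicious CFG explored by BFS.
from collections import deque

def lcs_of_sebb(plaintiff_path, start_block, successors, sem_equiv):
    m = len(plaintiff_path)
    best_rows = {}                        # block -> best LCS row seen so far
    queue = deque([(start_block, [0] * (m + 1))])
    best = 0
    while queue:
        block, prev_row = queue.popleft()
        row = [0] * (m + 1)
        for i in range(1, m + 1):         # standard LCS recurrence
            if sem_equiv(plaintiff_path[i - 1], block):
                row[i] = prev_row[i - 1] + 1
            else:
                row[i] = max(prev_row[i], row[i - 1])
        best = max(best, row[m])          # right-most column of the table
        seen = best_rows.get(block)
        if seen is None or max(row) > max(seen):   # revisit only on improvement
            best_rows[block] = row
            for succ in successors(block):
                queue.append((succ, row))
    return best                           # LCS length, e.g. 5 in the slide
```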

16 Merge unmatched basic blocks
LCS refinement: merge unmatched basic blocks. Handles basic block splitting and merging, basic block reordering, and conditional obfuscation.
To improve the obfuscation resiliency, we also developed an LCS refinement. The refinement merges unmatched basic blocks and then uses the basic block comparison to determine whether the two merged blocks are semantically equivalent. If so, the current LCS is extended by these merged blocks. The refinement can deal with basic block splitting and merging, basic block reordering, and conditional obfuscation, as sketched below.
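The sketch below illustrates the refinement with assumed helpers (`merge_blocks` composes adjacent blocks into one; `block_similarity` is the symbolic-execution-based score from before); it is not the tool's actual code.

```python
# Illustrative sketch of the LCS refinement step (assumed helper functions).
def refine_lcs(lcs, unmatched_plaintiff, unmatched_suspicious,
               merge_blocks, block_similarity, threshold):
    if not unmatched_plaintiff or not unmatched_suspicious:
        return lcs
    merged_p = merge_blocks(unmatched_plaintiff)    # e.g. undo block splitting
    merged_s = merge_blocks(unmatched_suspicious)
    if block_similarity(merged_p, merged_s) > threshold:
        lcs.append((merged_p, merged_s))            # extend the current LCS
    return lcs
```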

17 Evaluation Obfuscation resiliency Scalability
Experiments: thttpd, openssl, and gzip, with different compiler optimization levels, different compilers, and different code obfuscation techniques; compared with MOSS, JPLag, Bdiff, and DarunGrim2. Scalability: Gecko vs. Firefox.
Now that we've discussed our system design, let's talk about our evaluation results. We evaluated our tool on a set of benchmarks to measure its obfuscation resiliency and scalability. To measure obfuscation resiliency, we conducted three experiments: thttpd, openssl, and gzip. For each experiment, we applied different compiler optimization levels, different compilers, and different code obfuscation techniques, and compared our results with MOSS, JPLag, Bdiff, and DarunGrim2. To measure scalability, we evaluated Gecko against Firefox. Now, let's look at each of these experiments.

18 thttpd vs. sthttpd: Different compiler optimization levels
T0: thttpd -O0; T2: thttpd -O2; S0: sthttpd -O0; S2: sthttpd -O2
In the first experiment, we evaluated thttpd and sthttpd. Sthttpd is forked from thttpd, so their codebases are similar. We first measure the resilience to transformation through different compiler optimization levels. In this figure, there are four executables: T0 is thttpd compiled by GCC with -O0, T2 is thttpd compiled by GCC with -O2, and similarly for S0 and S2. The black circles correspond to the detection results of CoP, and the white diamonds correspond to DarunGrim2. If we examine the left half of the figure, where the same optimization level is applied, we see that both tools produce good results. However, if we examine the right half of the figure, where different optimization levels are applied, the average similarity score from DarunGrim2 is only about 13%, while our tool reports about 88%. Thus, our tool is quite effective compared to DarunGrim2.

19 thttpd vs. sthttpd: Different compilers
TG: thttpd GCC; TI: thttpd ICC; SG: sthttpd GCC; SI: sthttpd ICC
We also measured the resiliency to transformation through different compilers, GCC and ICC. In this figure, TG is thttpd compiled by GCC, TI is thttpd compiled by ICC, and similarly for SG and SI. If we examine the left half of the figure, where the same compiler is used, we see that both tools produce good results. However, if we examine the right half of the figure, where different compilers are used, our tool still reports good similarity scores, but DarunGrim2 fails to recognize the similarity.

20 Code obfuscation resiliency testing
Source code obfuscation tools: Semantic Designs Inc.'s C obfuscator and Stunnix's CXX-obfuscator. Binary code obfuscation tools: Diablo and Loco. CIL: possesses many useful source code transformation techniques.
Our next experiment measured the resiliency to transformation through code obfuscation techniques. For the source code obfuscation tools, we used Semantic Designs Inc.'s C obfuscator and Stunnix's CXX-obfuscator. For the binary code obfuscation tools, we used Diablo and Loco. Moreover, we also used CIL as another source code obfuscation tool, since it has many useful source code transformation techniques.

21 The detection results are shown in this slide
The detection results are shown in this slide. We compared our tool with MOSS, JPLag, DarunGrim2, and Bdiff. We evaluated code with a single obfuscation applied and with multiple obfuscations applied, and show the results in Table 1 and Table 2, respectively. The last column gives the detection results of CoP. Here, I will focus on the three categories of code obfuscation: layout, control flow, and data flow obfuscations. First, layout obfuscations do not affect DarunGrim2, Bdiff, or CoP, but they impair MOSS and JPLag. Second, control flow obfuscations affect MOSS, JPLag, DarunGrim2, and Bdiff, but have little impact on CoP; the same holds for data flow obfuscations. Finally, when multiple obfuscations are applied, CoP still reports good results, while the others fail to identify the similarities. It should be clear by now that CoP has good obfuscation resiliency compared to the others. These are very solid results; you can refer to our paper for more information if you are interested.

22 thttpd vs. independent programs
To measure false positives, we tested our tool against four independently developed programs: thttpd-2.25b, atphttpd-0.4b, boa, and lighttpd. Very low similarity scores (below 2%) were reported.

23 Gecko vs. Firefox: Scalability
Gecko vs. Opera and Gecko vs. Chrome: CoP reported scores below 3% for all cases.
The last experiment measures scalability. We chose Gecko as the plaintiff component and tested it against Firefox. We have 8 versions of Gecko and 8 versions of Firefox; we cross-checked each pair and show the results in this figure. The figure contains 8 lines corresponding to the 8 versions of Gecko. The highest score on each line represents the case where that Gecko version is included in the corresponding Firefox version. For example, if we examine the black line for Gecko-1.8.0, its highest score corresponds to Firefox-1.5, indicating that Firefox-1.5 contains this Gecko version. As we move along the line, the similarity scores decrease, indicating that the likelihood that the corresponding Firefox version contains this Gecko version decreases. The same holds for the other lines. Therefore, we can see that the closer two versions are, the more similar their code is. To measure false positives, we also tested Gecko against Opera and Google Chrome. Our tool reported scores below 3% for all cases.

24 Summary We propose a binary-oriented, obfuscation-resilient code similarity comparison approach named CoP. CoP is based on a new concept, the Longest Common Subsequence (LCS) of Semantically Equivalent Basic Blocks (SEBB). Our experimental results show that CoP is effective and practical when applied to real-world software. In summary, this novel combination results in greater resiliency to code obfuscation.

25 At this moment, I will be very happy to take any questions. Thank you!

