What is program obfuscation? Obfuscation is deliberately making software code so confusing that even those with access to the code can’t figure out what a program is going to do. “The art of making things appear more complicated”
Source: http://www.oreillynet.com/pub/a/mac/2005/04/08/code.html What does this function do?
Three main values: – Potency – Resilience – Cost Many methods in use: – Modify variable names and layout – Replace integer values with complex equations – Change program flow – Modify data structures – Anti-disassembly (“armored” viruses) – Anti-debugging
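To make a few of the listed methods concrete, here is a minimal sketch in Python. It is not taken from the slides: the leap-year rule, the function names, and the specific transformations are illustrative assumptions. It applies three of the methods above (renamed variables, integer constants replaced by arithmetic, and altered program flow) to one small function.

```python
def is_leap_year(year: int) -> bool:
    """Readable version: the Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def O0(l: int) -> bool:
    """Obfuscated version of the same rule (hypothetical example)."""
    I = (3 << 1) - 2          # 4, hidden behind arithmetic
    ll = I * I * I + 36       # 100, hidden behind arithmetic
    state, out = 0, False
    while state != 3:         # flattened control flow: a small state machine
        if state == 0:
            state = 1 if l % I == 0 else 3
        elif state == 1:
            if l % ll != 0:
                out, state = True, 3
            else:
                state = 2
        elif state == 2:
            out, state = (l % (ll * I) == 0), 3
    return out
```

Both functions compute the same result for every input, which is the property an obfuscation must preserve. Only the effort needed to read the second one changes.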
Obfuscation helps to bypass antivirus and delay the security research response. Obfuscated web code is often the first step in a “drive-by download” attack. When the web code is executed by the browser, it calls programs to target local software. The result is infection of the user’s computer.
Attempt to calculate the impact of obfuscated online attacks: $13.2 billion direct damages of malware [1] × 74% of malware spread via compromised websites [2] × 80% of browser-based attacks now obfuscated [3] = $7.8 billion. Sources: [1] http://www.itu.int/ITU-D/cyb/cybersecurity/docs/itu-study-financial-aspects-of-malware-and-spam.pdf [2] http://viruslist.com/en/analysis?pubid=204792056 [3] http://www.securityfocus.com/brief/846
Knowing is half the battle… A few tips to stop obfuscated “drive-by download” attacks: – Use NoScript to block active content on Firefox – Don’t click on web ads – Keep client-side software updated: Adobe Reader, Flash Player, Apple QuickTime, etc.
Program obfuscation has some positive uses as well!
Preventing source code theft – Disrupt reverse engineering – Block code copying – Especially important with the increased use of Java and .NET languages such as C# and Visual Basic, which compile to intermediate code rather than machine code – Microsoft recommends obfuscating ASP files in case of server compromise. Watermarking and Digital Rights Management (DRM)
“If obfuscation technology was ever perfected we would have perfect DRM and perfect malware. Yet, that outcome is unlikely. The computer ultimately has to decipher and follow a software program’s true instructions. Each new obfuscation technique has to abide by this requirement and, thus, will be able to be reverse engineered.” - Chris Wysopal Good Obfuscation, Bad Code
Oracle Access Used by [B+] to define the adversary model. The oracle is some function f. The adversary makes a query q to the oracle and receives the answer f(q). Useful when studying obfuscation: the oracle serves as an interface to the program without exposing its contents.
[Figure: Adversary with Oracle Access. The adversary sends a query q to the oracle, which holds the program and returns f(q).]
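The oracle model can be sketched in a few lines of Python. Everything named here (`Oracle`, `secret_program`, `adversary`) is a hypothetical illustration rather than anything from [B+]. Python does not actually prevent a caller from reading `_program`, so the hiding is by convention only.

```python
from typing import Callable, List


class Oracle:
    """Wraps a program so that callers query it instead of inspecting it."""

    def __init__(self, program: Callable[[int], int]) -> None:
        self._program = program   # hidden from the adversary by convention
        self.queries = 0          # number of queries made so far

    def query(self, q: int) -> int:
        """Answer a single query q with f(q)."""
        self.queries += 1
        return self._program(q)


def secret_program(q: int) -> int:
    """Stand-in for the program behind the oracle (arbitrary choice)."""
    return (q * q + 7) % 11


def adversary(oracle: Oracle) -> List[int]:
    """An adversary that sees only input/output pairs."""
    return [oracle.query(q) for q in range(5)]
```

Calling `adversary(Oracle(secret_program))` gives the adversary five input/output pairs and nothing else. That is the whole of what a simulator with oracle access gets to work with.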
Virtual Black Box Anything one can efficiently compute from the obfuscated program, one should be able to efficiently compute given just oracle access to the program. In other words, for any adversary A there exists a simulator S such that whatever A can learn given an obfuscated program, S can learn from oracle access to that program.
[Figure: example of oracle access. Query q: “Tell me about yourself”. Answer f(q): “¿Qué quieres saber?”. Properties observed from the answer: speaks Spanish; answers in the form of a question.]
[Figure: the adversary, with access to the virtual black box, shown beside the simulator, with oracle access to the function.]
Circuit In the [B+] paper on obfuscation, a circuit is a program that computes a function on inputs of one fixed length, in contrast to a Turing machine, which accepts inputs of any length.
Circuits are easier to put in a virtual black box. Therefore obfuscating circuits is easier than obfuscating TMs. Proofs in the [B+] paper first establish each theorem for TMs and then extend it to circuits.
Obfuscators An obfuscator is an algorithm O that takes a program P and outputs a program O(P), restricting what an adversary can learn about P given O(P).
What is the adversary trying to achieve? – A program that produces the same output as P – A program that produces output with some relation to the output of P – A function that computes some function of P – Decide some property of P. The last goal is the weakest, so we want to prove that even it cannot be prevented.
TM Obfuscator A probabilistic algorithm O is a TM obfuscator if the following conditions hold…
Functionality: For every Turing machine M, the string O(M) describes a Turing machine that computes the same function as M.
Polynomial slowdown: The description length and running time of O(M) are at most polynomially larger than those of M
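“Virtual black box” property: For any PPT A, there is a PPT S and a negligible function α such that for all TMs M: $\left| \Pr[A(O(M)) = 1] - \Pr[S^{\langle M \rangle}(1^{|M|}) = 1] \right| \le \alpha(|M|)$, where $S^{\langle M \rangle}$ denotes S with oracle access to M.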
Circuit Obfuscator Same idea as the TM obfuscator, but intuitively easier since a circuit computes a function with inputs of a particular length. Hence the proposition: if a TM obfuscator exists, then a circuit obfuscator exists. Thus if we prove impossibility for circuit obfuscators, impossibility of TM obfuscators follows.
Unobfuscatable Circuit Ensemble A family of circuits such that: – Every circuit c in the family is efficient – There exists a predicate π(c) such that π(c) is hard to compute with oracle access to the function that c computes, but π(c) is easy to compute with access to any circuit c’ that computes the same function as c
Main Proof Structure [B+] structure their proof of the main impossibility result as follows: 1. Define obfuscators that are secure when applied to two programs 2. Show that such obfuscators do not exist 3. Modify the construction to prove that TM/circuit obfuscators do not exist 4. Show how this proof yields an unobfuscatable function ensemble
2-TM Obfuscator A 2-TM obfuscator is defined the same as a TM obfuscator but with a strengthened “virtual black box property”: the adversary has access to two obfuscated Turing machines.
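Formal definition of the strengthened “virtual black box” property: For any PPT A, there is a PPT S and a negligible function α such that for all TMs M, N: $\left| \Pr[A(O(M), O(N)) = 1] - \Pr[S^{\langle M \rangle, \langle N \rangle}(1^{|M|+|N|}) = 1] \right| \le \alpha(\min\{|M|, |N|\})$. The first term is the adversary with access to two obfuscated TMs; the second is the simulator with oracle access to the two TMs.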
Proposition: 2-TM obfuscators do not exist. According to [B+], “the essence of this proof is that there is a fundamental difference between getting oracle access to a function and getting the program that computes it, no matter how obfuscated”.
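Proof by contradiction… Suppose that there exists a 2-TM obfuscator O. Consider a function that cannot be learned by oracle queries, for example the following Turing machine, defined for strings $\alpha, \beta \in \{0,1\}^k$: $C_{\alpha,\beta}(x) = \beta$ if $x = \alpha$, and $C_{\alpha,\beta}(x) = 0^k$ otherwise.
Define another Turing machine $D_{\alpha,\beta}$ such that: $D_{\alpha,\beta}(C) = 1$ if $C(\alpha) = \beta$, and $D_{\alpha,\beta}(C) = 0$ otherwise. Consider an adversary A such that: A(C, D) = D(C). Then $\Pr[A(O(C_{\alpha,\beta}), O(D_{\alpha,\beta})) = 1] = 1$, whereas for the machine $Z_k$ that always outputs $0^k$, $\Pr[A(O(Z_k), O(D_{\alpha,\beta})) = 1] = 2^{-k}$ (over a random choice of α and β).
Therefore S with oracle access to $C_{\alpha,\beta}$ and $D_{\alpha,\beta}$ must output 1 and with oracle access to $Z_k$ and $D_{\alpha,\beta}$ must output 0 (each with high probability)… but S cannot differentiate between the two, because without knowing α its queries reveal nothing: $\left| \Pr[S^{C_{\alpha,\beta}, D_{\alpha,\beta}}(1^k) = 1] - \Pr[S^{Z_k, D_{\alpha,\beta}}(1^k) = 1] \right| \le 2^{-\Omega(k)}$. So we have a contradiction.
The combination of these equations contradicts the fact that O is a 2-TM obfuscator. Recall that a 2-TM obfuscator O is defined with the “virtual black box” property that: $\left| \Pr[A(O(M), O(N)) = 1] - \Pr[S^{\langle M \rangle, \langle N \rangle}(1^{|M|+|N|}) = 1] \right| \le \alpha(\min\{|M|, |N|\})$ for all TMs M, N.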
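The gap between holding a program and querying it can be illustrated with a small Python sketch. It assumes the point-function construction from [B+]: a machine C that returns a secret β only on a secret input α, a machine D that tests whether a given program maps α to β, and an all-zero machine Z. The byte-string encoding, the parameter `K`, and the helper names are hypothetical choices, and the step bound that D needs in the Turing machine setting is omitted.

```python
from typing import Callable, List

K = 16                 # string length in bytes (hypothetical parameter)
ZERO = bytes(K)        # the all-zero string 0^k

Program = Callable[[bytes], bytes]


def make_C(alpha: bytes, beta: bytes) -> Program:
    """C(x) = beta if x == alpha, else 0^k."""
    def C(x: bytes) -> bytes:
        return beta if x == alpha else ZERO
    return C


def make_D(alpha: bytes, beta: bytes) -> Callable[[Program], int]:
    """D(P) = 1 if P(alpha) == beta, else 0. D takes a program as its input."""
    def D(program: Program) -> int:
        return 1 if program(alpha) == beta else 0
    return D


def Z(x: bytes) -> bytes:
    """The machine that always outputs 0^k."""
    return ZERO


def adversary(C: Program, D: Callable[[Program], int]) -> int:
    """A(C, D) = D(C): possible only because A holds C as a program."""
    return D(C)


def probe(oracle: Program, queries: List[bytes]) -> List[bytes]:
    """What an oracle-only party sees: the answers to its own queries."""
    return [oracle(q) for q in queries]
```

With the programs in hand, `adversary(C, D)` returns 1 and `adversary(Z, D)` returns 0. A party limited to `probe` sees identical all-zero answers from C and from Z unless it happens to query α, which is why the simulator cannot tell the two cases apart.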
In the [B+] paper, the proof that 2-TM obfuscators do not exist is extended to show that 2-circuit obfuscators also do not exist.
TM Obfuscator [B+] extend the two-program obfuscation impossibility result to single program obfuscation. The extension is based on the ability to combine functions/TMs
In [B+] the combination of two functions $f_0$ and $f_1$ is defined as $(f_0 \# f_1)(b, x) = f_b(x)$. A program C is decomposed into $C_0, C_1$ by setting $C_b(x) = C(b, x)$. By this definition, having oracle access to a combined function $f_0 \# f_1$ is the same as having oracle access to $f_0$ and $f_1$ individually.
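A minimal sketch of combining and decomposing, assuming the combined function selects between its two components with a leading bit b, that is, $(f_0 \# f_1)(b, x) = f_b(x)$. The names `combine` and `decompose` and the integer-valued example functions are hypothetical.

```python
from typing import Callable, Tuple

Fn = Callable[[int], int]
Combined = Callable[[int, int], int]


def combine(f0: Fn, f1: Fn) -> Combined:
    """Build f0 # f1: the first argument b selects which function runs."""
    def combined(b: int, x: int) -> int:
        return f1(x) if b else f0(x)
    return combined


def decompose(c: Combined) -> Tuple[Fn, Fn]:
    """Recover C_0 and C_1 from a combined program by fixing b."""
    def c0(x: int) -> int:
        return c(0, x)

    def c1(x: int) -> int:
        return c(1, x)

    return c0, c1
```

Decomposition works on any program that computes the combined function, including an obfuscated one, because it only fixes the selector bit. This is what lets the single-program adversary reuse the two-program attack.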
Theorem: TM obfuscators do not exist. The adversary A is the same as before only modified to decompose the program that it receives.
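Suppose, for the sake of contradiction, that there exists a TM obfuscator O. Applying the adversary to the obfuscated combination of the two machines from the 2-TM proof yields the same pair of equations as before. These equations contradict the virtual black-box property required for O to be a TM obfuscator.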
In [B+] this proof is extended to circuit obfuscators. The challenge with extending to circuit obfuscators is greater than expected: – The size of a circuit is greater than its input length, so one circuit cannot simply take the other as input – The proof is adapted using homomorphic encryption properties
Unobfuscatable Circuit Ensembles The case against obfuscators is further strengthened by proving the existence of unobfuscatable circuit ensembles.
The unobfuscatable circuit ensemble is defined as Reminder:
We can now show that given any circuit that computes the same function as, we can reconstruct the latter. Since D’ computes the same function as D and, we have We can now reconstruct
Indistinguishability Obfuscator Obfuscation models weaker than the “virtual black box” may still be useful for software protection. Indistinguishability obfuscator: obfuscations of equivalent circuits of the same size should be computationally indistinguishable. The [B+] impossibility result does not rule out this model, and later works have proposed candidate constructions.
Software Watermarking We would like to be able to “watermark” a program such that the code will always have a certain identifier that cannot be removed.
A good software watermarking scheme should have the following properties: – Functionality: The marked program computes the same function as the original program. – Meaningfulness: Most other programs don’t have this marking. – Fragility: It is infeasible to remove the mark from the program without (substantially) changing its behavior. [B+] sought to formalize the watermarking problem as it relates to obfuscation.
[B+] sketch a proof showing that no such watermarking scheme exists. For any unobfuscatable program, we know that an adversary will be able to take the obfuscated (marked) program and reconstruct the (unmarked) source code.
Conclusion [B+] have made progress in formalizing the concept of program obfuscation. They have shown that the “virtual black box” paradigm is impossible to satisfy in general. Somewhat strong obfuscation of some programs remains a possibility.
Final Thoughts Program obfuscation has an increasingly important role in the race between hackers and the information security community. Significant progress in obfuscation techniques may break the current signature-based detection model. Additional research is needed to increase the effectiveness of malware detection.