Software Protection and Code obfuscation Amitabh Saxena Department of Information & Communication Technology University of Trento,

Software Protection and Code obfuscation Amitabh Saxena amitabh@dit.unitn.it Department of Information & Communication Technology University of Trento, Italy

Layout of talk
• What is “software protection”?
• What is “code obfuscation”?
• What should an obfuscator be like?
• Discussion of Barak et al.'s result (CRYPTO 2001): what does it imply?
• Some related work (2001-2007)
• Where do we go from here?
• Questions

Two problems related to software
User's problem: Is this code good or bad?
• Viruses, malware (bad programs)
• Wants program code to be “easy” to understand and check for malicious behavior
Developer's problem: Is the user good or bad?
• User may freely distribute program code (piracy)
• User may modify program code and use it in unacceptable ways
• Also the content distributor's problem (DRM)
Software protection techniques essentially aim to solve the latter problem.

Developer's problem
We will focus on the “bad” user.
We want to:
• Stop piracy
• Stop software modification
• Generally ensure the user cannot misuse software code
A bad user will:
• Try to circumvent registration (“cracking”)
• Exploit weaknesses in code (e.g., buffer overflows)
• Extract keys “hidden” within code (e.g., DRM)
• Essentially, try to “reverse-engineer” the program

Two different approaches
Classic techniques (hardware based):
• Special hardware (e.g., dongles)
• Earlier designs stored just static data (e.g., keys)
• Modern designs also have a microprocessor and memory
• Disadvantages: expensive, intrusive
• “Trusted computing” has many issues
Modern ideas (software only):
• Make program code “hard” to understand and modify
• Hide source code (closed source!)
• Online activation (e.g., Windows XP)

Make software difficult to understand?
Solves many problems:
• Programmers can hide keys and other sensitive information and not worry about leakage
• DRM can be implemented nicely in software
• An attacker cannot do much with the program code except “execute” it on a CPU
Creates some problems:
• How do users trust that the programmer is “good”?
• Software may not always behave as promised
Well, it's better than “trusted hardware”...
• Users generally trust developers (reputation based)

Code Transformation
Informally:
• Make programs “hard to understand” and therefore “hard to reverse engineer”
• But the programmer must still be able to understand them!
• Thus, we must transform code from a “readable” form to an “unreadable” (but still executable) form
In other words:
• Transform a readable program P into an unreadable program P', such that P' retains all the functionality of P
• It must be difficult to obtain any other “useful” information about the original program P from the transformed code P'

Code Transformation (obfuscation)
What to retain? (I/O functionality)
• A “good” user only looks at the input/output of the program
  - Does not look “inside” the program
  - However, bad users can do all that and more
• The good user should not suffer
• Retain the same input/output behavior as the original program
• That's all that matters to the good user
What to hide? (other functionality)
• A bad user looks “inside” the program
• Want to ensure that a bad user cannot learn much more than a good user
• Hide program semantics (internal functionality)
  - Algorithms, keys, etc.

What is obfuscation anyway? Obfuscate: To make so confused or opaque so as to be difficult to perceive or understand (American Heritage Dictionary) Classic example: Encryption (ciphertext is “useless” without the key!)

What about program obfuscation?
Goal: Make programs hard to “reverse engineer”
Why?
• Copy protection
• Many other uses (discussed earlier)
Why not simply encrypt the program?
• The obfuscated program must be “executable”!
• How about: “Apart from the ability to execute, the program is otherwise 'useless'”?
Seems like a “reasonable” requirement.

What is an Obfuscator?
An obfuscator is a probabilistic “compiler” O that takes as input a program P and outputs another program O(P) such that:
• O(P) retains all the functionality of P, yet
• It is hard to “reverse engineer” O(P)
In other words:
• O(P) is the same as P in input/output behavior
• Apart from this input/output behavior, O(P) does not give any useful information about P
Access to the code of O(P) is equivalent to having black-box access to P.

Two Types of Programs
Learnable! (can re-create the source code from just a few I/O queries)
  Program_1 (input X){
    /* Ignore input */
    Print (“Hello World!”);
  }
Not learnable! (cannot re-create the source code from a few I/O queries)
  Program_2 (input X){
    If (X == “1668801023012013”)
    Then Print (“You got me!”);
    Else Print (“Hello World!”);
  }
It makes sense to obfuscate only unlearnable programs.
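A minimal Python sketch of the two programs above, to make the "few I/O queries" point concrete. The slides use pseudocode, so the choice of Python, the helper `black_box_view`, and the sample of 1000 queries are all assumptions made for illustration.

```python
SECRET_INPUT = "1668801023012013"  # trigger value taken from the slide


def program_1(x: str) -> str:
    """Learnable: the output never depends on the input."""
    return "Hello World!"


def program_2(x: str) -> str:
    """Not learnable from a few queries: one hidden input changes the output."""
    return "You got me!" if x == SECRET_INPUT else "Hello World!"


def black_box_view(program, queries):
    """Everything a black-box analyzer sees: the answers to its queries."""
    return [program(q) for q in queries]


# Hypothetical query set: the decimal strings "0" .. "999".
queries = [str(n) for n in range(1000)]

# On these queries the two programs are indistinguishable from the outside.
same_view = black_box_view(program_1, queries) == black_box_view(program_2, queries)
```

With only this view, an analyzer could reproduce Program_1 exactly but would have no evidence that Program_2 contains a trigger value, which is why only the second kind of program is worth obfuscating.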

Two Types of Program Analyzers
Code analyzer (Ana) analyzes executable code:
• Static analysis: blocks, variables
• Dynamic analysis: execution trace, registers, flow
• Efficiency analysis: timing, statistics
• Mutational analysis: change fragments of the program
• Others?
Black-box analyzer (BAna) analyzes only the I/O behavior of the code.
For obfuscated code, whatever we can do with Ana, we could also have done with BAna alone.

What is an obfuscator? (Formal)
An obfuscator (O) is a PPT algorithm which takes as input an encoding P1 of a Turing machine and outputs the encoding of an equivalent Turing machine P2: O(P1) = P2
• Polynomial slowdown: There is a polynomial poly such that time(P2) ≤ poly(time(P1))
• Virtual black-box condition: For any Ana, there exists a BAna such that for any Turing machine P1 and its obfuscated version P2 = O(P1):
$$\left|\Pr[\mathrm{Ana}(P2) = 1] - \Pr[\mathrm{BAna}^{P2}(1^{\mathrm{time}(P2)}) = 1]\right| \le \mathrm{Negl}(|P1|)$$
So far so good, but... this definition is IMPOSSIBLE to meet!

Impossibility result...
Originally, there was no formal study of obfuscators.
• Based on “heuristic” models of security
• More of an art than a science
The following paper gave the first theoretical study of obfuscation (and a negative result):
On the (im)possibility of obfuscating programs, by Boaz Barak, Oded Goldreich, Russell Impagliazzo, Steven Rudich, Amit Sahai, Salil Vadhan and Ke Yang, in CRYPTO 2001

Barak et al.'s 2001 result
The paper claimed a very strong negative result.
• The above definition is impossible to meet
• However, it has created some confusion
• This tutorial will try to elaborate on the result of Barak et al.'s CRYPTO 2001 paper
• Finally, we will discuss developments from 2001-2007

Impossibility Result (warm up)
Essentially a proof by contradiction, in the spirit of:
• The halting problem proof
• Gödel's proof of the (first) incompleteness theorem
Gödel's incompleteness theorem
• Q: Does there exist a mathematical theory T that is both:
  - Complete: every TRUE statement can be proved
  - Consistent: no statement can be proved both TRUE and FALSE
• A: No non-trivial theory can satisfy both!
• Gödel constructed a sentence G such that: G = “The statement G cannot be proved TRUE”
• Is G true? If yes, and T is complete, then T is inconsistent! On the other hand, if T is consistent, then it cannot be complete!

Impossibility Result – Secret Leaking Functions
We need something like a “Gödel sentence”. That something is “Barak's program”.
Desired property of Barak's program (P1):
• From P2 = O(P1), Ana can do something that BAna cannot do using only I/O access to P2 (or P1)
Barak constructs a function family $\{f_s\}$ such that:
• Each $f_s$ contains a “secret” $s$
• No BAna using $f_s$ as an oracle can obtain the secret
• Any program that executes $f_s$ will “leak” that secret!

How to Leak Your Secret?
[Figure: Ana is given the code; BAna is given only black-box access.]
The secret-leaking function cannot leak the secret to BAna, but any executable/source code will leak the secret to Ana!

Mission Impossible?
[Figure: Ana is given the code; BAna is given only black-box access.]
Simple approaches don't work!
• Encode the secret inside the code, maybe as a comment?
  - Doesn't work with every code: the obfuscator may remove the secret
• Have the function output the secret when given the correct input?
  - How do we ensure that Ana knows the correct input but BAna doesn't?

Correct Input?
We need the correct input to be:
• Obtainable from any source/executable code
• But not obtainable via black-box access

What's the Correct Input? How about making the program's code itself the correct input ? (Barak et al.'s idea)

Cannibalistic Function (Intuition)
“Feed me somebody that behaves like me, and I'll leak my secret!”
  CANNIBAL (Prog){
    If (Prog behaves like me)
    Then Output (secret);
    Else Output (“Try Again!”);
  }
Without the code, BAna cannot produce a program Prog that behaves like CANNIBAL.
But Ana can, since she has the code for CANNIBAL, which behaves exactly like CANNIBAL!

Formal Construction
CANNIBAL consists of 2 parts, ID and Leaker:
• $\mathrm{ID}_{a,b}(x) = b$ if $x = a$, and $0$ otherwise
• $\mathrm{Leaker}_{a,b,s}(P) = s$ if $P(a) = b$, and $0$ otherwise
ID has the correct “behavior” of CANNIBAL.
Leaker will output the secret $s$ only when the input program has the correct behavior. That is, $\mathrm{Leaker}_{a,b,s}(\mathrm{ID}_{a,b}) = s$.

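Putting 2 functions together
We combine the 2 functions into one single function, using the extra input bit to select between them:
• $\mathrm{CANNIBAL}_{a,b,s}(y, \mathit{bit}) = \mathrm{ID}_{a,b}(y)$ if $\mathit{bit} = 0$
• $\mathrm{CANNIBAL}_{a,b,s}(y, \mathit{bit}) = \mathrm{Leaker}_{a,b,s}(y)$ if $\mathit{bit} = 1$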

How can Ana obtain the secret?
Ana constructs a new program Ana_CANNIBAL:
  Ana_CANNIBAL (Z){
    /* Hard-code 0 */
    Output CANNIBAL_{a,b,s} (Z, 0);
  }
and feeds it to CANNIBAL as follows:
  CANNIBAL_{a,b,s} (Ana_CANNIBAL, 1)

Let's see what happens...
  CANNIBAL_{a,b,s} (Ana_CANNIBAL, 1)
  → Leaker_{a,b,s} (Ana_CANNIBAL)
  → Ana_CANNIBAL (a)
  → CANNIBAL_{a,b,s} (a, 0)   (hard-coded 0)
  → ID_{a,b} (a) = b   (by definition)
The value b is passed back up the chain of calls, so Leaker sees Ana_CANNIBAL(a) = b and outputs s.
Thus, Ana can always get s from CANNIBAL!
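The call chain above can be sketched in Python. This is an illustration under assumptions, not the construction as stated in the paper: the 16-byte length `K`, the all-zero "0" output, and the helper names `make_cannibal` and `ana` are hypothetical, and a Python closure stands in for "a program whose text embeds the code of CANNIBAL".

```python
import secrets

K = 16            # assumed byte length of a, b and s
ZERO = bytes(K)   # stands in for the "0" output


def make_cannibal(a: bytes, b: bytes, s: bytes):
    """Build CANNIBAL_{a,b,s} from its two parts."""

    def identity(x):              # ID_{a,b}
        return b if x == a else ZERO

    def leaker(prog):             # Leaker_{a,b,s}: runs the input program on a
        return s if prog(a) == b else ZERO

    def cannibal(y, bit):         # bit selects which part is evaluated
        return identity(y) if bit == 0 else leaker(y)

    return cannibal


def ana(code):
    """Ana holds runnable code for CANNIBAL and wraps it with bit hard-coded to 0."""

    def ana_cannibal(z):
        return code(z, 0)

    return code(ana_cannibal, 1)


a, b, s = (secrets.token_bytes(K) for _ in range(3))
cannibal = make_cannibal(a, b, s)
recovered = ana(cannibal)         # follows the chain shown on the slide
```

Ana never learns a or b directly here. She only needs some runnable copy of the function, obfuscated or not, to build Ana_CANNIBAL and feed it back in.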

BAna cannot learn much from CANNIBAL (why?)
• $\mathrm{ID}_{a,b}(x) = b$ if $x = a$, and $0$ otherwise
• $\mathrm{Leaker}_{a,b,s}(P) = s$ if $P(a) = b$, and $0$ otherwise
BAna must guess at least one of a, b, or s. If a, b, s are chosen randomly, the probability of finding them is exponentially small!
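To put a number on "exponentially small", here is a back-of-the-envelope calculation in Python. It is a rough illustration, not the paper's proof: it assumes each of a, b, s is a uniformly random k-bit string and that each oracle query amounts to at most one guess at one of them, so q queries succeed with probability at most q / 2^k. The function name `guess_bound` and the sample figures are hypothetical.

```python
from fractions import Fraction


def guess_bound(k_bits: int, queries: int) -> Fraction:
    """Upper bound on hitting one specific k-bit value within `queries` guesses."""
    return min(Fraction(1), Fraction(queries, 2 ** k_bits))


# Assumed figures: 128-bit values and about a trillion (2^40) oracle queries.
bound = guess_bound(128, 2 ** 40)
```

Under these assumptions the bound is 2^-88. A polynomial number of queries only multiplies 2^-k by a polynomial factor, which stays negligible in k.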

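Putting Everything Together...
$\mathrm{CANNIBAL}_{a,b,s}(y, \mathit{bit}) = \mathrm{ID}_{a,b}(y)$ if $\mathit{bit} = 0$, and $\mathrm{Leaker}_{a,b,s}(y)$ if $\mathit{bit} = 1$
• There exists an efficient Ana that always learns $s$
• No PPT BAna can learn $s$ with high probability
• $\{\mathrm{CANNIBAL}_{a,b,s}\}$ is a secret-leaking function family!
No obfuscators exist for CANNIBAL.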

Impossibility Results for Circuits
We just proved this for Turing machines. The result also holds for circuits, but the proof is trickier:
• A circuit cannot eat itself
• We need to chop it into pieces and feed them to the circuit piece by piece
• But the main idea is the same
The proof uses a symmetric-key homomorphic encryption scheme, with the homomorphic operations done using oracles.

Circuits (main idea)
Given the following:
• The circuit for ID (this is the same ID as before)
• An encryption of each bit of a
• Access to a homomorphic encryption oracle
we can evaluate ID gate by gate to obtain the encrypted output on input a, without knowing a!
A second oracle tells us the secret s if some given value decrypts to b; otherwise it outputs 0.
• Ana can use the above oracles to obtain s using the circuit for ID
• BAna cannot do it using black-box access to ID
This is just intuition. See the paper for the full proof!

What did we just prove?
It is impossible to design a general-purpose obfuscator for all functions.
How about general-purpose obfuscators for some special classes of cryptographic functions?
• e.g., private-key encryption schemes
Unfortunately, this cannot be done either!

Secret Leaking Private Key Systems
“Feed me somebody that behaves like me, and I'll leak my secret!”
  CANNIBAL_ENCRYPTOR (X){
    If (X behaves like me)
    Then Output (secret_key);
    Else Output ENCRYPT(X);
  }
CANNIBAL_ENCRYPTOR is a secure private-key system if used as a black box.
Any “executable” implementation of CANNIBAL_ENCRYPTOR is insecure!

More Impossibility Results
There don't exist general obfuscators for:
• Encryption schemes
• Digital signature schemes
• Pseudorandom functions
• Message authentication codes

Is all lost? (Maybe not!)
Barak et al.'s result applies to “cannibalistic” functions and rules out general-purpose obfuscators.
What about special-purpose obfuscators for some “ordinary” functions?
• Mostly an open problem!
• Many other negative results!
Some functions can nevertheless be obfuscated:
• Point functions (password-checking programs)
• Using random oracles
• Possibly some cryptographic schemes

What is a Random Oracle?
A random oracle is simply an oracle (i.e., a black box) with access to true randomness.
• It maintains a table of (INPUT, OUTPUT) pairs
• For any INPUT, it first checks whether this INPUT exists in the first column of the table
  - If it exists, it responds with the corresponding OUTPUT
  - If it does not exist, it tosses some fixed number (say k) of coins to generate the OUTPUT value, adds (INPUT, OUTPUT) to its table, and responds with OUTPUT
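The table-based description above corresponds to what is often called lazy sampling, and it can be sketched in a few lines of Python. The class name `RandomOracle`, the default of k = 128 coin tosses, and the use of the operating system's random source (via the `secrets` module) as a stand-in for "true randomness" are assumptions made for illustration.

```python
import secrets


class RandomOracle:
    """Lazily sampled random function: fresh coins for new inputs, table lookups for old ones."""

    def __init__(self, k_bits: int = 128):
        self.k_bits = k_bits
        self.table = {}  # maps INPUT -> OUTPUT

    def query(self, inp: bytes) -> int:
        if inp not in self.table:
            # New input: toss k coins and remember the result.
            self.table[inp] = secrets.randbits(self.k_bits)
        return self.table[inp]


oracle = RandomOracle()
first = oracle.query(b"hello world!")
second = oracle.query(b"hello world!")   # same input, so the stored answer is returned
other = oracle.query(b"hello world?")    # new input, so fresh coins are tossed
```

The answers are consistent across repeated queries but carry no information about answers to any other input, which is the property the point-function construction relies on.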

Point Functions
Password is “hello world!”
  VERIFY_PASSWORD (X){
    If (X == “hello world!”)
    Then print (“Accept”);
    Else print (“Reject”);
  }
Let ROracle(“hello world!”) = “813841341”
  VERIFY_PASSWORD_OBF (X){
    If (ROracle(X) == “813841341”)
    Then print (“Accept”);
    Else print (“Reject”);
  }
Provably secure obfuscation in the random oracle model!
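A hedged Python sketch of VERIFY_PASSWORD_OBF follows. No real random oracle exists, so SHA-256 is used here as a heuristic stand-in; that substitution is an assumption, and the security proof mentioned on the slide applies to the idealized oracle, not to this code. The helper name `obfuscate_point_function` is hypothetical.

```python
import hashlib
import hmac


def obfuscate_point_function(password: str):
    """Return a checker that stores only the hash of the password, not the password."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()

    def verify_password_obf(x: str) -> str:
        candidate = hashlib.sha256(x.encode("utf-8")).digest()
        # Constant-time comparison of the two digests.
        return "Accept" if hmac.compare_digest(candidate, digest) else "Reject"

    return verify_password_obf


verify_password_obf = obfuscate_point_function("hello world!")
accepted = verify_password_obf("hello world!")
rejected = verify_password_obf("hello world")
```

Someone reading the checker sees only the digest, so the best remaining strategy is to guess inputs, which is exactly what black-box access already allows. For that reason the hiding is only as strong as the password is hard to guess, and practical password storage adds a salt and a deliberately slow hash.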

Other results (2001-2007)
Some work on the practical side (Collberg'97):
• Sometimes relies on human intervention
• Only heuristic arguments for security
Theoretical side: alternate definitions
• Using random oracles (Lynn'04)
• Point functions (passwords)
• Approximate obfuscators (Barak'01, Hofheinz'06)
• Without the “virtual black-box” property: “best possible obfuscation” (Goldwasser'07)
• Obfuscating certain classes of functions

“Best possible” obfuscation?
A notion by Goldwasser et al. (TCC'07)
Quite a formal definition, despite the informal-sounding name.
• Captures the fact that the obfuscator we have is the “best possible” that can exist
• All other obfuscators are either:
  - Too inefficient
  - Generate code that is too long
  - Leak more information
• Thus, we have the “best” that can exist within our constraints

Re-Trust Project (www.re-trust.org)
Remote En-trustment problem
• Similar to “software protection”
• In a client-server environment
  - Trusted server, “bad” client
  - Server needs to verify the runtime integrity of software running on the client
• Client can modify code in many ways (online games, TCP/IP stack)
• We must ensure that certain parts of the software cannot be modified
• “Software only”: no trusted hardware!

Topics for research
• The results of Barak et al. (2001) do not apply to finite-state automata. It may be interesting to see whether these can be obfuscated
• Weaken the virtual black-box property and find alternate (and feasible) definitions
• Develop metrics for the difficulty of reverse engineering software
Question: Is it possible to avoid special hardware and still achieve a sufficient level of protection?

References
• Barak 2001: On the (im)possibility of obfuscating programs (CRYPTO 2001)
• Ke Yang's presentation
• P. C. van Oorschot 2003: Revisiting software protection (survey article)
• Goldwasser 2007: On best possible obfuscation (TCC'07)
• Lynn 2004: Positive results and techniques for obfuscation (Eurocrypt'04)
• Collberg 1997: A taxonomy of obfuscating transformations (technical report)
• Re-Trust site (www.re-trust.org)

Questions/Comments ? Thank you for your attention!
