Software Protection and Code Obfuscation
Amitabh Saxena, Department of Information & Communication Technology, University of Trento, Italy

Layout of talk
- What is "software protection"?
- What is "code obfuscation"?
- What should an obfuscator be like?
- Discussion of Barak et al.'s result (CRYPTO 2001): what does it imply?
- Some related work ( )
- Where do we go from here?
- Questions

Two problems related to software
- User's problem: Is this code good or bad?
  - Viruses, malware (bad programs)
  - Wants program code to be "easy" to understand and check for malicious behavior.
- Developer's problem: Is the user good or bad?
  - The user may freely distribute program code (piracy)
  - Or modify program code and use it in unacceptable ways.
  - Also the content distributor's problem (DRM)
Software protection techniques essentially aim to solve the latter problem.

Developer's problem
We will focus on the "bad" user. We want to:
- Stop piracy
- Stop software modification
- Generally ensure the user cannot misuse the software code
A bad user will:
- Try to circumvent registration ("cracking")
- Exploit weaknesses in code (e.g. buffer overflows)
- Extract keys "hidden" within code (e.g. DRM)
- Essentially, try to "reverse-engineer" the program

Two different approaches
Classic techniques (hardware based):
- Special hardware (e.g. dongles)
- Earlier designs stored just data, e.g. keys (static)
- Modern designs also have a microprocessor and memory
- Disadvantages: expensive, intrusive
- "Trusted computing" has many issues
Modern ideas (software only):
- Make program code "hard" to understand and modify
- Hide source code (closed source!)
- Online activation (e.g. Windows XP)

Make software difficult to understand?
Solves many problems:
- Programmers can hide keys and other sensitive information without worrying about leakage.
- DRM can be implemented nicely in software.
- An attacker cannot do much with the program code except "execute" it on a CPU.
Creates some problems:
- How do users trust that the programmer is "good"?
- Software may not always behave as promised.
Well, it's better than "trusted hardware"...
- Users generally trust developers (reputation based).

Code Transformation
Informally:
- Make programs "hard to understand" and therefore "hard to reverse engineer".
- But the programmer must be able to understand them!
- Thus, we must transform code from a "readable" form to an "unreadable" (but still executable) form.
In other words:
- Transform a readable program P into an unreadable program P', such that P' retains all the functionality of P.
- It must be difficult to obtain any other "useful" information about the original program P from the transformed code P'.

Code Transformation (obfuscation)
What to retain? (I/O functionality)
- A "good" user only looks at the input/output of the program; he does not look "inside" the program. However, bad users can do all that and more.
- The good user should not suffer.
- Retain the same input/output behavior as the original program.
- That's all that matters to the good user.
What to hide? (other functionality)
- A bad user looks "inside" the program.
- We want to ensure that a bad user cannot learn much more than a good user.
- Hide program semantics (internal functionality): algorithms, keys, etc.

What is obfuscation anyway?
Obfuscate: to make so confused or opaque as to be difficult to perceive or understand (American Heritage Dictionary).
Classic example: encryption (the ciphertext is "useless" without the key!)

What about program obfuscation?
Goal: make programs hard to "reverse engineer".
Why?
- Copy protection
- Many other uses (we have discussed these)
Why not simply encrypt the program?
- The obfuscated program must be executable!!
- How about: "Apart from the ability to execute it, the program is otherwise 'useless'"?
Seems like a "reasonable" requirement.

What is an Obfuscator?
An obfuscator is a probabilistic "compiler" O that takes as input a program P and outputs another program O(P) such that:
- O(P) retains all the functionality of P, yet
- it is hard to "reverse engineer" O(P).
In other words:
- O(P) is the same as P in input/output behavior.
- Apart from this input/output behavior, O(P) does not give any useful information about P.
Access to the code of O(P) is equivalent to having black-box access to P.

Two Types of Programs
Learnable (the source code can be re-created from just a few I/O queries):

Program_1 (input X){
  /* Ignore input */
  Print ("Hello World!");
}

Not learnable (the source code cannot be re-created from a few I/O queries):

Program_2 (input X){
  If (X == " ")
    Then Print ("You got me!");
  Else Print ("Hello World!");
}

It makes sense to obfuscate only unlearnable programs.
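The distinction can be demonstrated with a small Python sketch (hypothetical names; a random trigger string stands in for the slide's elided password). A few I/O queries fully determine Program_1's behavior, but reveal essentially nothing about Program_2's secret branch:

```python
# Program_1 is learnable: a single query reveals its entire behavior.
# Program_2 is not: a black-box learner that never hits the trigger
# string sees only the default branch.
import secrets

TRIGGER = secrets.token_hex(16)  # stands in for the elided password

def program_1(x):
    return "Hello World!"        # ignores its input

def program_2(x):
    return "You got me!" if x == TRIGGER else "Hello World!"

# A handful of I/O queries cannot distinguish the two programs...
probes = ["a", "b", "c"]
assert [program_1(p) for p in probes] == [program_2(p) for p in probes]
```

...yet the programs differ on `TRIGGER`, which a random probe finds only with negligible probability.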

Two Types of Program Analyzers
A code analyzer (Ana) analyzes executable code:
- Static analysis: blocks, variables
- Dynamic analysis: execution trace, registers, control flow
- Efficiency analysis: timing, statistics
- Mutational analysis: change fragments of the program
- Others?
A black-box analyzer (BAna) analyzes only the I/O behavior of the code.
For obfuscated code, whatever we can do with Ana, we could also have done with only BAna.

What is an obfuscator? (Formal)
An obfuscator O is a PPT algorithm which takes as input an encoding P1 of a Turing machine and outputs the encoding of an equivalent Turing machine P2: O(P1) = P2.
- Polynomial slowdown: there is a polynomial poly such that time(P2) ≤ poly(time(P1)).
- Virtual black-box condition: for any Ana there exists a BAna such that, for any Turing machine P1 and its obfuscated version P2 = O(P1):
  | Pr[Ana(P2) = 1] − Pr[BAna^P2(1^time(P2)) = 1] | ≤ negl(|P1|)
  (Here BAna^P2 denotes BAna with oracle, i.e. black-box, access to P2.)
So far so good, but... this definition is IMPOSSIBLE to meet!

Impossibility result...
Originally, there was no formal study of obfuscators:
- Based on "heuristic" models of security
- More of an art than a science
The following paper gave the first theoretical study of obfuscation (and a negative result):
"On the (im)possibility of obfuscating programs" by Boaz Barak, Oded Goldreich, Russell Impagliazzo, Steven Rudich, Amit Sahai, Salil Vadhan and Ke Yang, CRYPTO 2001.

Barak et al.'s 2001 result
The paper claimed a very strong negative result:
- The above definition is impossible to meet.
- However, it has created some confusion.
- This tutorial will try to elaborate on the result of Barak et al.'s CRYPTO 2001 paper.
- Finally, we will discuss later developments.

Impossibility Result (warm up)
Essentially a proof by contradiction, in the spirit of:
- the halting problem proof
- Gödel's proof of the (first) incompleteness theorem
Gödel's incompleteness theorem:
- Q: Does there exist a mathematical theory T that is both:
  - Complete: every TRUE statement can be proved
  - Consistent: no statement can be proved both TRUE and FALSE
- A: No non-trivial theory can satisfy both!
- Gödel constructed a sentence G such that: G = "The statement G cannot be proved TRUE".
- Is G true? If yes, and T is complete, then T is inconsistent! On the other hand, if T is consistent, then it cannot be complete!

Impossibility Result – Secret-Leaking Functions
We need something like a "Gödel sentence". That something is "Barak's program".
Desired property of Barak's program (P1):
- From P2 = O(P1), Ana can do something that BAna cannot do using only I/O access to P2 (or P1).
Barak constructs a function family { f_s } such that:
- Each f_s contains a "secret" s.
- No BAna using f_s as an oracle can obtain the secret.
- Any program that executes f_s will "leak" that secret!

How to Leak Your Secret?
[Diagram: Ana analyzes the code; BAna has only black-box access.]
The secret-leaking function cannot leak the secret to BAna, but any executable/source code will leak the secret to Ana!

Mission Impossible?
Simple approaches don't work!
- Encode the secret inside the code, maybe as a comment... This doesn't work with every code; the obfuscator may remove the secret.
- Have the function output the secret if you give the correct input... But how do we ensure that Ana knows the correct input and BAna doesn't?

Correct Input?
We need the correct input to be:
- obtainable from any source/executable code,
- but not obtainable via black-box access.

What's the Correct Input?
How about making the program's code itself the correct input? (Barak et al.'s idea)

Cannibalistic Function (Intuition)
"Feed me somebody that behaves like me, and I'll leak my secret!"

CANNIBAL (Prog){
  If (Prog behaves like me)
    Then Output (secret);
  Else Output ("Try Again!");
}

Without the code, BAna cannot produce a program Prog that behaves like CANNIBAL.
But Ana can, since she has the code for CANNIBAL, which behaves exactly like CANNIBAL!

Formal Construction
CANNIBAL consists of 2 parts, ID and Leaker:
  ID_{a,b}(x) = b if x = a, and 0 otherwise
  Leaker_{a,b,s}(P) = s if P(a) = b, and 0 otherwise
ID has the correct "behavior" of CANNIBAL.
Leaker outputs the secret s only when the input program has the correct behavior. That is, Leaker_{a,b,s}(ID_{a,b}) = s.

Putting 2 functions together
We combine the 2 functions into one single function:
  CANNIBAL_{a,b,s}(y, bit) = ID_{a,b}(y) if bit = 0, and Leaker_{a,b,s}(y) if bit = 1

How can Ana obtain the secret?
Ana constructs a new program Ana_CANNIBAL:

Ana_CANNIBAL (Z){
  /* Hard-code 0 */
  Output CANNIBAL_{a,b,s}(Z, 0);
}

and feeds it to CANNIBAL as follows: CANNIBAL_{a,b,s}(Ana_CANNIBAL, 1)

Let's see what happens...
  CANNIBAL_{a,b,s}(Ana_CANNIBAL, 1) = Leaker_{a,b,s}(Ana_CANNIBAL)
  Leaker runs its input program on a: Ana_CANNIBAL(a) = CANNIBAL_{a,b,s}(a, 0)   (hard-coded 0)
  CANNIBAL_{a,b,s}(a, 0) = ID_{a,b}(a) = b   (by definition)
  So Leaker sees the output b and releases s.
Thus, Ana can always get s from CANNIBAL!

BAna cannot learn much from CANNIBAL (why?)
  ID_{a,b}(x) = b if x = a, and 0 otherwise
  Leaker_{a,b,s}(P) = s if P(a) = b, and 0 otherwise
To get anything useful, BAna must guess at least one of a, b, or s. If a, b, s are chosen randomly, the probability of finding them is exponentially small!

Putting Everything Together...
  CANNIBAL_{a,b,s}(y, bit) = ID_{a,b}(y) if bit = 0, and Leaker_{a,b,s}(y) if bit = 1
- There exists an efficient Ana that always learns s.
- No PPT BAna can learn s with high probability.
- { CANNIBAL_{a,b,s} } is a secret-leaking function family!
- No obfuscator exists for CANNIBAL.
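The trace above can be replayed as a short Python sketch, with programs modeled as callables (an informal stand-in for Turing-machine encodings; all names and parameter sizes are illustrative, not from the paper). ID outputs b exactly on input a, Leaker releases s exactly when its input program maps a to b, and Ana, who holds code for CANNIBAL, always recovers s:

```python
# Sketch of the slides' secret-leaking construction. Programs are
# modeled as Python callables; a, b, s are random strings.
import secrets

a = secrets.token_hex(16)   # random parameters
b = secrets.token_hex(16)
s = secrets.token_hex(16)   # the secret to be leaked

def ID(x):                  # ID_{a,b}: b on input a, "0" otherwise
    return b if x == a else "0"

def Leaker(P):              # Leaker_{a,b,s}: s iff program P maps a to b
    return s if P(a) == b else "0"

def CANNIBAL(y, bit):       # the combined function
    return Leaker(y) if bit == 1 else ID(y)

# Ana's attack: hard-code bit = 0 and feed the result back in.
def Ana_CANNIBAL(z):
    return CANNIBAL(z, 0)

recovered = CANNIBAL(Ana_CANNIBAL, 1)
assert recovered == s       # Ana always learns the secret
```

A black-box BAna, by contrast, would have to guess a (or s) among exponentially many strings to trigger either branch.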

Impossibility Results for Circuits
We just proved this for Turing machines. The result also holds for circuits, but the proof is trickier:
- A circuit cannot eat itself.
- We need to chop it into pieces and feed them to the circuit piece by piece.
- But the main idea is the same.
The proof uses a symmetric-key homomorphic encryption scheme; the homomorphic operations are done using oracles.

Circuits (main idea)
Given the following:
- the circuit for ID (this is the same ID as before),
- an encryption of each bit of a,
- access to a homomorphic encryption oracle,
we can evaluate (using the gates of ID) to obtain the encrypted output on input a, without knowing a!
A second oracle tells us the secret s if a given value decrypts to b; otherwise it outputs 0.
Ana can use the above oracles to obtain s using the circuit for ID. BAna cannot do it using black-box access to ID.
This is just intuition; see the paper for the full proof!

What did we just prove?
It is impossible to design a general-purpose obfuscator for all functions.
How about general-purpose obfuscators for some special classes of cryptographic functions, e.g. private-key encryption schemes?
Unfortunately, this cannot be done either!

Secret-Leaking Private-Key Systems
"Feed me somebody that behaves like me, and I'll leak my secret!"

CANNIBAL_ENCRYPTOR (X){
  If (X behaves like me)
    Then Output (secret_key);
  Else Output ENCRYPT(X);
}

CANNIBAL_ENCRYPTOR is a secure private-key system if used as a black box. Any "executable" implementation of CANNIBAL_ENCRYPTOR is insecure!

More Impossibility Results
There do not exist general obfuscators for:
- Encryption schemes
- Digital signature schemes
- Pseudorandom functions
- Message authentication codes

Is all lost? (Maybe not!)
Barak's result applies to "cannibalistic" functions and rules out general-purpose obfuscators. What about special-purpose obfuscators for some "ordinary" functions?
- Mostly an open problem!
- Many other negative results!
Some functions can nevertheless be obfuscated:
- Point functions (password-checking programs)
- Using random oracles
- Possibly some cryptographic schemes

What is a Random Oracle?
A random oracle is simply an oracle (i.e., a black box) with access to true randomness. It maintains a table of (INPUT, OUTPUT) pairs. For any INPUT, it first checks whether this INPUT exists in the first column of the table:
- If it exists, it responds with the corresponding OUTPUT.
- If it does not exist, it tosses some fixed number (say k) of coins to generate the OUTPUT value, adds (INPUT, OUTPUT) to its table, and responds with OUTPUT.
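This lazy-sampling behavior is easy to simulate. A minimal Python sketch (class and parameter names are hypothetical):

```python
# Minimal random-oracle simulation: fresh coin tosses on first query,
# a memo table so repeated queries are answered consistently.
import secrets

class RandomOracle:
    def __init__(self, k_bytes=16):
        self.table = {}          # (INPUT, OUTPUT) pairs seen so far
        self.k = k_bytes         # output length: k random bytes

    def query(self, inp):
        if inp not in self.table:                  # first time: toss coins
            self.table[inp] = secrets.token_hex(self.k)
        return self.table[inp]                     # consistent afterwards

ro = RandomOracle()
assert ro.query("x") == ro.query("x")   # repeated queries agree
```

Each distinct input gets an independent uniformly random answer, which is exactly what the proofs in the random oracle model assume.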

Point Functions
The password is "hello world!":

VERIFY_PASSWORD (X){
  If (X == "hello world!")
    Then print ("Accept");
  Else print ("Reject");
}

Let ROracle("hello world!") = " ":

VERIFY_PASSWORD_OBF (X){
  If (ROracle(X) == " ")
    Then print ("Accept");
  Else print ("Reject");
}

This gives a provably secure obfuscation in the random oracle model!
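In practice the random oracle is heuristically instantiated with a cryptographic hash function. The following Python sketch uses SHA-256 as that stand-in (an assumption on our part, and strictly weaker than the random-oracle proof on the slide): only the hash of the password ships with the code, never the password itself.

```python
# Obfuscated point-function (password) check: store H(password), not
# the password. SHA-256 here stands in for the random oracle ROracle.
import hashlib

# Published with the obfuscated program: the oracle value for the password.
STORED = hashlib.sha256(b"hello world!").hexdigest()

def verify_password_obf(x: str) -> str:
    # Accept iff H(x) matches the stored oracle value.
    if hashlib.sha256(x.encode()).hexdigest() == STORED:
        return "Accept"
    return "Reject"
```

Recovering the password from `STORED` requires inverting the hash, which is exactly what the random-oracle argument rules out for black-box adversaries.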

Other results ( )
Some work on the practical side (Collberg'97):
- Sometimes relies on human intervention
- Only heuristic proofs of security
Theoretical side (alternate definitions):
- Using random oracles (Lynn'04)
- Point functions (passwords)
- Approximate obfuscators (Barak'01, Hofheinz'06)
- Without the "virtual black-box" property: "best possible obfuscation" (Goldwasser'07)
- Obfuscating certain classes of functions

"Best possible" obfuscation?
A notion by Goldwasser et al. (TCC'07). A quite formal definition despite the informal-sounding name:
- Captures the fact that the obfuscator we have is the "best possible" that can exist.
- All other obfuscators are either too inefficient, generate code that is too long, or leak more information.
- Thus, we have the "best" that can exist within our constraints.

Re-Trust Project (
The remote en-trustment problem:
- Similar to "software protection", but in a client-server environment: trusted server, "bad" client.
- The server needs to verify the runtime integrity of the software running on the client.
- The client can modify the code in many ways (online games, TCP/IP stack).
- We must ensure that certain parts of the software cannot be modified.
- "Software only": no trusted hardware!

Topics for research
- Barak's 2001 results do not apply to finite state automata. It may be interesting to see if these can be obfuscated.
- Weaken the virtual black-box property and find alternate (and feasible) definitions.
- Develop metrics for the difficulty of reverse engineering software.
- Question: Is it possible to avoid special hardware and still achieve a sufficient level of protection?

References
- Barak et al. 2001: "On the (im)possibility of obfuscating programs" (CRYPTO 2001); see also Ke Yang's presentation
- van Oorschot 2003: "Revisiting software protection" (survey article)
- Goldwasser et al. 2007: "On best-possible obfuscation" (TCC'07)
- Lynn et al. 2004: "Positive results and techniques for obfuscation" (Eurocrypt'04)
- Collberg et al. 1997: "A taxonomy of obfuscating transformations" (technical report)
- Re-Trust site (

Questions/Comments? Thank you for your attention!