Provably Secure Program Protection

Lt Col Todd McDonald AFIT/ENG x4639

Research Interests
Program encryption
Program protection / secure coding
Obfuscation / tamperproofing
Mobile agent security / mobile code
Information / database security
Multi-agent architectures
Trust-based computing

Three Focus Areas
Semantic Transformation
Random Program Security Model / Randomizing Obfuscators
Perfectly Secure White Box Obfuscators

Program Scenario

Program Protection
Adversarial Observation: Black Box Analysis. If the adversary cannot determine the function/intent of the device by input/output analysis, we say it is black-box protected.
Adversarial Observation: White Box Analysis. If the adversary cannot determine the function/intent of the device by analyzing the structure of the code, we say it is white-box protected.
Intent Protected: combined black-box and white-box protection does not reveal the function/intent of the program.

Definitions
“The goal of program obfuscation is to make a program unintelligible while preserving its functionality.”
Virtual black box (VBB): anything one can compute from the obfuscated program could also be computed from the input/output behavior of the original program.

Formally… (yuck)
An obfuscator is an efficient compiler O that takes an input program P and produces a semantically equivalent program P’ = O(P) with the following properties:
Functionality: for all x, P(x) = P’(x), where P’ = O(P).
Polynomial slowdown: O(P) is at most polynomially slower than P (for circuits, the requirement is that the size of O(P) is at most polynomially greater than the size of P).
Virtual black box (VBB) property: the generalized VBB property states that you should not be able to learn more from the obfuscated version of a program, O(M), than a simulator S<M> with oracle access to the original program M can learn. It is formulated as follows: for every probabilistic polynomial-time adversary A there exists a probabilistic polynomial-time simulator S such that, for every program M, $\left|\Pr[A(O(M)) = 1] - \Pr[S^{M}(1^{|M|}) = 1]\right| \le \mathrm{negl}(|M|)$, where negl is a negligible function.
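The functionality requirement can be tested directly when inputs are small. The following is a minimal C++ sketch, assuming a program with n input bits is modeled as a function on the low n bits of an integer; the `Program` alias and `functionallyEquivalent` are hypothetical names, and exhaustive checking is only practical for small n.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// A program (or circuit) with n input bits, modeled as a function whose
// argument carries the inputs in its low n bits. This modeling is an
// illustrative assumption, not part of the formal definition.
using Program = std::function<std::uint32_t(std::uint32_t)>;

// Checks the functionality requirement P(x) == P'(x) for every x in {0,1}^n
// by exhaustive enumeration (2^n evaluations of each program).
bool functionallyEquivalent(const Program& p, const Program& pPrime, unsigned n) {
    assert(n < 32);
    for (std::uint32_t x = 0; x < (1u << n); ++x) {
        if (p(x) != pPrime(x)) return false;  // found a distinguishing input
    }
    return true;
}
```

For example, an XOR gate and an (a OR b) AND NOT (a AND b) realization of the same 2-input function pass the check, while XOR and OR do not.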

Totally Unobfuscatable Functions under VBB
1) Given any program that computes f  F the value (f) can be efficiently computed 2) Given oracle access to a (randomly selected) function f  F no efficient algorithm can compute (f) much better than random guessing Property : F  {0,1} Family of functions F This family is constructed from any one-way function This family of functions is UNOBFUSCATABLE if (1) and (2) are true

VBB Proof Methodology
Because a family of (contrived) functions can be shown to be unobfuscatable, general, efficient, secure obfuscators do not exist.

Where are we at?
Known methods of obfuscation are the reverse of good software engineering.
None guarantees that sensitive information or algorithms cannot be retrieved.
A determined specialist, given enough time and resources, is able to deobfuscate any obfuscated program.
In spite of VBB: the impossibility result does not imply that there is no method for making programs “unintelligible” in some meaningful and precise way.

How to Define Security
Explicitly: define the adversary’s task and require that it be computationally difficult. Disadvantage: there are many threats, and some are difficult to formulate in terms of computational problems.
Implicitly: define an ideal security model and require that our case be nearly as good as the ideal one. Disadvantage: the Barak et al. result shows this is impossible under VBB.

Going against the Gold Standard
“Anything that can be efficiently computed from O(P) can be efficiently computed given oracle access to P.” Who died and made them the boss (Barak et al.)? The intent: can you eliminate any advantage of seeing the obfuscated source code beyond getting black-box access to the original program? Our contention and intuition: given (obfuscated) source code, you will always have an advantage in learning something about the original program above what you could learn given just black-box access.

Properties of Random Program Obfuscators
Black Box Protection: Y = O_RAND(A, K)
Y and A are semantically different.
A has input/output consistent with the function of the program.
Y has input/output consistent with a family of one-way function circuits.

Semantic Encryption Transformation

[Figure: program/circuit P maps inputs x1 … xn to outputs y1 … yn]

Strongly Pseudorandom Data Ciphers
K 232 ~256 Truth Tables

Semantically Secure Black Box Protection
P’ = O(P)

Things to Be Done: P + E, Living under Kerckhoffs’s Principle
Program encryption generation engine
Unique encryption ciphers / key-based
Security characterizations: number of E’s, input sizes
Practical implementation issues
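One way to picture P + E is as catenation: the output of P is fed through a keyed cipher E, so P’(x) = E_K(P(x)), and only a holder of K can map results back. The C++ sketch below works on truth tables and uses a seeded shuffle as a stand-in for E; that toy permutation is an assumption for illustration and is not a strongly pseudorandom cipher. Names such as `toyCipher`, `catenate`, and `invert` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

using TruthTable = std::vector<std::uint32_t>;  // entry x holds P(x)

// Toy keyed permutation on m-bit values standing in for the data cipher E_K.
// Built by a shuffle seeded with the key: NOT a secure cipher, only a placeholder.
std::vector<std::uint32_t> toyCipher(unsigned m, std::uint32_t key) {
    std::vector<std::uint32_t> perm(1u << m);
    std::iota(perm.begin(), perm.end(), 0u);
    std::mt19937 rng(key);
    std::shuffle(perm.begin(), perm.end(), rng);
    return perm;
}

// Catenation P' = P + E, computed row by row: P'(x) = E_K(P(x)).
TruthTable catenate(const TruthTable& p, const std::vector<std::uint32_t>& e) {
    TruthTable pPrime(p.size());
    for (std::size_t x = 0; x < p.size(); ++x) {
        assert(p[x] < e.size());  // outputs of P must fit the cipher's block size
        pPrime[x] = e[p[x]];
    }
    return pPrime;
}

// The key holder builds E_K^{-1} and recovers P(x) = E_K^{-1}(P'(x)).
std::vector<std::uint32_t> invert(const std::vector<std::uint32_t>& e) {
    std::vector<std::uint32_t> inv(e.size());
    for (std::uint32_t v = 0; v < e.size(); ++v) inv[e[v]] = v;
    return inv;
}
```

The design keeps P and E separate so that the number of E’s and the input sizes listed above can be varied independently.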

White Box Protection ?? Circuit P’

IEEE International Symposium on Circuits and Systems (ISCAS) Format
# Comment: Inputs
INPUT(WIRE)
# Comment: Outputs
OUTPUT(WIRE)
# Comment: Gate Specifications
GATE = FUNCTION(OPERAND, OPERAND)
where OPERAND ∈ {GATE} ∪ {WIRE} and FUNCTION ∈ {AND, OR, NOT, XOR, NXOR, NAND, NOR}
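A small reader/evaluator makes the format concrete. This C++ sketch assumes one statement per line, `#` comments, whitespace-free wire names, a combinational (acyclic) netlist, and only the gate functions listed above; `parseBench`, `evaluate`, and the structs are hypothetical names, not an existing tool’s API.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct Gate {
    std::string func;                   // e.g. "AND"
    std::vector<std::string> operands;  // wires the gate reads
};

struct Netlist {
    std::vector<std::string> inputs, outputs;  // in declaration order
    std::map<std::string, Gate> gates;         // keyed by the gate's output wire
};

// Removes whitespace so "4 = AND(3, 2)" becomes "4=AND(3,2)".
static std::string stripSpaces(const std::string& s) {
    std::string out;
    for (char c : s)
        if (c != ' ' && c != '\t' && c != '\r') out += c;
    return out;
}

Netlist parseBench(const std::string& text) {
    Netlist net;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        line = stripSpaces(line.substr(0, line.find('#')));  // drop comments
        if (line.empty()) continue;
        const std::size_t open = line.find('('), close = line.rfind(')');
        const std::size_t eq = line.find('=');
        if (open == std::string::npos || close == std::string::npos || close < open)
            throw std::runtime_error("malformed line: " + line);
        const std::string args = line.substr(open + 1, close - open - 1);
        if (eq == std::string::npos) {  // INPUT(w) or OUTPUT(w)
            (line.compare(0, 5, "INPUT") == 0 ? net.inputs : net.outputs).push_back(args);
            continue;
        }
        Gate g{line.substr(eq + 1, open - eq - 1), {}};
        std::istringstream ops(args);
        for (std::string op; std::getline(ops, op, ',');) g.operands.push_back(op);
        net.gates[line.substr(0, eq)] = g;
    }
    return net;
}

// Evaluates one wire, recursively resolving its operands (memoized in `values`).
static bool evalWire(const Netlist& net, const std::string& w, std::map<std::string, bool>& values) {
    auto known = values.find(w);
    if (known != values.end()) return known->second;
    const Gate& g = net.gates.at(w);
    std::vector<bool> v;
    for (const auto& op : g.operands) v.push_back(evalWire(net, op, values));
    bool andAll = true, orAny = false, parity = false;
    for (bool b : v) { andAll = andAll && b; orAny = orAny || b; parity = parity != b; }
    bool r = false;
    if (g.func == "AND") r = andAll;
    else if (g.func == "NAND") r = !andAll;
    else if (g.func == "OR") r = orAny;
    else if (g.func == "NOR") r = !orAny;
    else if (g.func == "XOR") r = parity;
    else if (g.func == "NXOR") r = !parity;
    else if (g.func == "NOT") r = !v.at(0);
    else throw std::runtime_error("unsupported gate: " + g.func);
    values[w] = r;
    return r;
}

// Returns the output bits, in OUTPUT declaration order, for one input vector
// given in INPUT declaration order.
std::vector<bool> evaluate(const Netlist& net, const std::vector<bool>& in) {
    assert(in.size() == net.inputs.size());
    std::map<std::string, bool> values;
    for (std::size_t i = 0; i < in.size(); ++i) values[net.inputs[i]] = in[i];
    std::vector<bool> out;
    for (const auto& w : net.outputs) out.push_back(evalWire(net, w, values));
    return out;
}
```

Fed the netlist of circuit P used later in this talk (INPUT(3), INPUT(2), INPUT(1); gates 4 to 7) with x3 = 1, x2 = 0, x1 = 1, it yields 0 on wire 7 and 1 on wire 6.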

BENCH and BED Formats
[Figure: C17.gif and C17.bench; Binary Expression Diagram (BED)]

BENCH Workflow
[Figure: c1000.bench (ISCAS format) is converted to Graphviz DOT format (C1000.dot) and a graphics format (C1000.gif), and through BED to a C program (C1000.c) that gcc compiles into an executable (c1000); Windows and Linux platforms]
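The bench-to-DOT step of the workflow can be sketched in a few lines of C++: each primary input becomes a box, each gate a node labeled with its function, each operand an edge, and each primary output gets a double outline. The `GateSpec` struct, `toDot` function, and styling choices are assumptions for illustration, not the toolchain used here.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one BENCH gate line, e.g. "4 = AND(3,2)".
struct GateSpec {
    std::string out, func;
    std::vector<std::string> operands;
};

// Emits a Graphviz DOT digraph for a netlist, laid out left to right.
std::string toDot(const std::vector<std::string>& inputs,
                  const std::vector<std::string>& outputs,
                  const std::vector<GateSpec>& gates) {
    std::ostringstream dot;
    dot << "digraph circuit {\n  rankdir=LR;\n";
    for (const auto& w : inputs) dot << "  \"" << w << "\" [shape=box];\n";
    for (const auto& g : gates) {
        dot << "  \"" << g.out << "\" [label=\"" << g.out << " = " << g.func << "\"];\n";
        for (const auto& op : g.operands)
            dot << "  \"" << op << "\" -> \"" << g.out << "\";\n";  // operand feeds gate
    }
    for (const auto& w : outputs) dot << "  \"" << w << "\" [peripheries=2];\n";
    dot << "}\n";
    return dot.str();
}
```

Writing the string to a file such as c17.dot, `dot -Tgif c17.dot -o c17.gif` renders it with Graphviz.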

We Need a Better Security Model… and Provable Security Under that Model

Obfuscation under RPM
[Figure: circuit A is obfuscated as circuit Y = O_RAND(A, K); other labels: circuit X, C_L, C_AL’]

Random Programs/Circuits

Random Programs

White Box Understandability Based on Random Program Oracles

[Figure: the Gulf of Mexico (486,800,000,000,000,000,000 teaspoons) is about 0.69% of the Atlantic Ocean (70,940,000,000,000,000,000,000 teaspoons)]

Correlating Program and Data Encryption
Randomizing Obfuscators

Generating a Circuit Library
1) Number of inputs
2) Number of outputs
3) Circuit size
Gate set: AND, OR, NAND, NOR, XOR, NXOR
Generate all possible combinations
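The library generator can be sketched as an exhaustive enumeration. This C++ sketch covers the single-output case only, assumes every gate has two operands drawn (as ordered pairs) from the primary inputs or earlier gates, takes the last gate as the output, and stores each wire as a 2^n-bit truth table (n ≤ 5); `enumerateLibrary`, `applyOp`, and `LibraryStats` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <set>
#include <vector>

using TT = std::uint64_t;  // truth table: bit x holds the output on input row x

enum class Op { AND, OR, NAND, NOR, XOR, NXOR };
const Op kOps[] = {Op::AND, Op::OR, Op::NAND, Op::NOR, Op::XOR, Op::NXOR};

// Applies one gate to two truth tables; `mask` keeps only the 2^n valid rows.
TT applyOp(Op op, TT a, TT b, TT mask) {
    switch (op) {
        case Op::AND:  return a & b;
        case Op::OR:   return a | b;
        case Op::NAND: return ~(a & b) & mask;
        case Op::NOR:  return ~(a | b) & mask;
        case Op::XOR:  return a ^ b;
        case Op::NXOR: return ~(a ^ b) & mask;
    }
    return 0;
}

struct LibraryStats {
    std::uint64_t circuits = 0;  // number of circuits enumerated
    std::set<TT> functions;      // distinct functions (truth tables) they compute
};

LibraryStats enumerateLibrary(unsigned nInputs, unsigned size) {
    assert(nInputs >= 1 && nInputs <= 5 && size >= 1);
    const unsigned rows = 1u << nInputs;
    const TT mask = (TT{1} << rows) - 1;
    std::vector<TT> wires;  // primary inputs first, then gates in creation order
    for (unsigned i = 0; i < nInputs; ++i) {
        TT t = 0;
        for (unsigned x = 0; x < rows; ++x)
            if ((x >> i) & 1u) t |= TT{1} << x;
        wires.push_back(t);
    }
    LibraryStats stats;
    std::function<void()> extend = [&]() {
        if (wires.size() == nInputs + size) {  // complete: last gate is the output
            ++stats.circuits;
            stats.functions.insert(wires.back());
            return;
        }
        const std::size_t avail = wires.size();
        for (Op op : kOps)
            for (std::size_t a = 0; a < avail; ++a)
                for (std::size_t b = 0; b < avail; ++b) {
                    wires.push_back(applyOp(op, wires[a], wires[b], mask));
                    extend();
                    wires.pop_back();
                }
    };
    extend();
    return stats;
}
```

With 2 inputs and 1 gate this yields 24 circuits computing 12 distinct functions; with 2 gates, 1296 circuits already cover all 16 two-input functions, which shows how quickly many structurally different circuits collapse onto the same truth table.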

Correlating Program and Data Encryption
[Figure: circuit representation of an HLL or ASM program; sub-circuit selection; sub-circuit replacement]
Linear cryptanalysis was first openly published as a means of attacking DES by Mitsuru Matsui at EUROCRYPT ’93 [6]. His method attempts to find a linear relation among the plaintext, ciphertext, and key bits as they pass through the s-boxes. With enough known plaintext/ciphertext pairs, a relation that holds with high enough probability can be used to find the key. Matsui generated linear approximation tables for the 8 DES s-boxes and found the strongest linearity in S5 (the fifth s-box). The tables were created by analyzing all combinations of the input and output bits of the s-boxes. Since there are 6 input bits and 4 output bits, there are 1024 (= 2^6 · 2^4) entries in his table for every s-box. A linear approximation is stronger the further its probability is from 1/2, whether significantly greater or less. Eli Biham took this one step further to help define restrictions that make s-boxes more resistant to linear cryptanalysis [8]. He found that increasing the number of output bits of an s-box can make it significantly more vulnerable to linear cryptanalysis. More precisely, he found that in an m × n s-box, where m is the number of input bits and n is the number of output bits, if n ≥ 2^m − m, the s-box must have a linear relation among its input and output bits. With the primary modes of attack on DES-like algorithms defined, rules can be established for how s-boxes are designed and used. Researchers can also examine how others’ s-box designs hold up against those cryptanalysis techniques. There are several ways of making better s-boxes than the ones specified in DES; however, Schneier states that “… blindingly choosing new sboxes isn’t a good idea” [15]. Among the common and well-documented s-box features considered viable are those that permit the algorithm to satisfy the Strict Avalanche Criterion (SAC).
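Matsui’s linear approximation table can be computed directly from an s-box lookup table. In this C++ sketch, entry [a][b] counts the inputs x for which the parity of a·x equals the parity of b·S(x), minus 2^(m−1), so 0 means no bias and large magnitudes mark strong linear approximations; `linearApproximationTable` is a hypothetical name and the s-box is supplied by the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Parity (XOR of all bits) of v.
static int parity(std::uint32_t v) {
    int p = 0;
    while (v) { p ^= 1; v &= v - 1; }  // clear lowest set bit each step
    return p;
}

// Linear approximation table of an m-bit-in, n-bit-out s-box given as
// sbox[x]. Entry [a][b] = #{x : a.x == b.S(x)} - 2^(m-1), where "." is the
// GF(2) dot product of a mask with a value.
std::vector<std::vector<int>> linearApproximationTable(const std::vector<std::uint32_t>& sbox,
                                                       unsigned m, unsigned n) {
    assert(m >= 1 && sbox.size() == (1u << m));
    std::vector<std::vector<int>> lat(1u << m, std::vector<int>(1u << n, 0));
    for (std::uint32_t a = 0; a < (1u << m); ++a)        // input mask
        for (std::uint32_t b = 0; b < (1u << n); ++b) {  // output mask
            int agree = 0;
            for (std::uint32_t x = 0; x < (1u << m); ++x)
                agree += parity(a & x) == parity(b & sbox[x]);
            lat[a][b] = agree - (1 << (m - 1));
        }
    return lat;
}
```

For a 6 × 4 DES s-box the result has 64 × 16 = 1024 entries, matching the count above.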
The avalanche effect was first published in the cryptography world by Horst Feistel [16]. In that study, it was determined that when a single input bit is changed, on average half of the output bits change. This was taken one step further by Webster and Tavares [17], whose Strict Avalanche Criterion requires each output bit to change with probability one half whenever a single input bit is complemented. Another consideration is the size of the s-box. From the above discussion of cryptanalysis, a large box would be better than a small one. A large number of output bits is needed to protect against differential attacks; however, a correspondingly large number of input bits is also needed to protect against linear cryptanalysis. Obviously, a balance of the two is needed. Finally, there are three requirements regarding the values in the s-box. First, the distribution of outputs must be checked for uniformity to protect against Davies’ attack. Second, the outputs must have no linearity in their function of the input. Third, there must be unique values in every row of the s-box. There are several other requirements; however, they are beyond the scope of this paper.
[Figure: s-box selection; iterative rounds]
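The Strict Avalanche Criterion can be checked by brute force on a small s-box: flip each input bit for every input and count how often each output bit changes. This C++ sketch (hypothetical names `avalancheMatrix` and `satisfiesSAC`) accepts an s-box as a lookup table and requires every count to be exactly 2^(m−1).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Avalanche matrix: entry [i][j] counts inputs x for which complementing
// input bit i flips output bit j.
std::vector<std::vector<unsigned>> avalancheMatrix(const std::vector<std::uint32_t>& sbox,
                                                   unsigned m, unsigned n) {
    assert(m >= 1 && sbox.size() == (1u << m));
    std::vector<std::vector<unsigned>> count(m, std::vector<unsigned>(n, 0));
    for (std::uint32_t x = 0; x < (1u << m); ++x)
        for (unsigned i = 0; i < m; ++i) {
            std::uint32_t diff = sbox[x] ^ sbox[x ^ (1u << i)];  // output change
            for (unsigned j = 0; j < n; ++j) count[i][j] += (diff >> j) & 1u;
        }
    return count;
}

// SAC holds iff every output bit flips for exactly half of the 2^m inputs,
// for every choice of complemented input bit.
bool satisfiesSAC(const std::vector<std::uint32_t>& sbox, unsigned m, unsigned n) {
    for (const auto& row : avalancheMatrix(sbox, m, n))
        for (unsigned c : row)
            if (c != (1u << (m - 1))) return false;
    return true;
}
```

For instance, the 2-input AND function passes, because flipping either input changes the output for exactly 2 of the 4 inputs, while the identity s-box fails, since flipping input bit i always changes output bit i and never the others.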

Perfect White Box Protection
#include <cstdlib>
#include <iostream>
int main(int argc, char *argv[]) {
    int x, y = 0;
    if (argc < 2) return 1;
    /* Get input from the user */
    x = std::atoi(argv[1]);
    /* Super secret algorithm */
    /* …….. */
    /* Output the result */
    std::cout << y;
}

What is the best we can hope for to protect the “structure” of the code that performs the secret algorithm? We want the program to act just like an oracle would. We want the program to be a “black-box” implementation.

Perfect White Box Protection = Black Box Implementation
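#include <cstdlib>
#include <iostream>
int main(int argc, char *argv[]) {
    int x, y = 0;
    if (argc < 2) return 1;
    /* Get input from the user */
    x = std::atoi(argv[1]);
    /* Super secret algorithm */
    if (x == 1) y = /* … */;
    else if (x == 2) y = 23;
    else if (x == 3) y = /* … */;
    /* … */
    /* Output the result */
    std::cout << y;
}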

Problems with this approach: you have to know all inputs/outputs. Therefore, the algorithm could never be efficient for all input sizes n. Therefore, the algorithm could never be general for all programs. This lends support to what Barak was saying…

But: mobile code programs tend to be small programs. Input size might be limited. You may not care about the full range of possible inputs, only some…

Regardless of efficiency: we can define a methodology for perfect white box protection. We could apply that method to programs of small input size n (where “small” is defined only by the amount of time or resources you are willing to spend to get the result). Those programs would be perfectly white box protected.

Circuits
Consider circuit P, which has 3 representations: algebraically (Boolean function), structurally (circuit diagram), and as a truth table (input/output behavior).
Structural view of P:
INPUT(3)
INPUT(2)
INPUT(1)
OUTPUT(7)
OUTPUT(6)
4 = AND(3,2)
5 = OR(4,1)
6 = XOR(4,3)
7 = NAND(5,6)

Circuits Behavioral view of P:
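The behavioral view can be produced by simulating the structural view gate by gate over all 8 inputs. This C++ sketch hard-codes circuit P as given above; `circuitP` and `truthTableP` are hypothetical names, and rows are ordered x1 x2 x3 = 000 to 111 with x1 as the most significant bit.

```cpp
#include <array>
#include <cassert>

// Circuit P from the structural view, gate by gate. Returns {y6, y7}.
std::array<bool, 2> circuitP(bool x1, bool x2, bool x3) {
    bool g4 = x3 && x2;     // 4 = AND(3,2)
    bool g5 = g4 || x1;     // 5 = OR(4,1)
    bool g6 = g4 != x3;     // 6 = XOR(4,3)
    bool g7 = !(g5 && g6);  // 7 = NAND(5,6)
    return {g6, g7};
}

// Behavioral view: the 8-row truth table; row r encodes x1 x2 x3 in binary.
std::array<std::array<bool, 2>, 8> truthTableP() {
    std::array<std::array<bool, 2>, 8> rows{};
    for (unsigned r = 0; r < 8; ++r)
        rows[r] = circuitP(r & 4u, r & 2u, r & 1u);
    return rows;
}
```

The resulting table has y6 = 1 only on rows 001 and 101, and y7 = 0 only on row 101.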

Circuits
Functional view of P: f_P
Derive it from structure:
y6 = ((x3x2(x3x2x3)’)’(x3(x3x2x3)’)’)’
y7 = ((x3x2 + x1)((x3x2(x3x2x3)’)’(x3(x3x2x3)’)’)’)’
Derive it from truth table:
y6 = x1’x2’x3 + x1x2’x3
y7 = x1’x2’x3’ + x1’x2’x3 + x1’x2x3’ + x1’x2x3 + x1x2’x3’ + x1x2x3’ + x1x2x3

Circuits
There are many different but equivalent realizations of the same function (including minterm and maxterm realizations). There is no “right” realization of any given function. If Boolean expressions are written in a certain standard form, it will always be obvious, given two expressions, whether we are dealing with the same function or different functions.

Circuits Such forms are termed “canonical forms”
Canonical forms are “official” forms for writing the algebraic expression of a given type (such as Boolean algebraic expressions).

Circuits
There is one and only one canonical realization for each function. It is (should be) impossible to have different canonical realizations of the same function, with the only exceptions based on commutativity: abc’ + b’c ≡ cb’ + c’ba. There is only one minterm realization of any function.

Circuits
Take these two functions, for example:
b’c’ + bc + a’b
b’c’ + bc + a’c’
These two functions are equivalent, yet neither can be simplified any further.

Circuits
Blake canonical form (BCF) is produced by taking a Boolean function in SOP form and performing a sequence of simplification steps (consensus and absorption); the result is a unique representation of the original function, the sum of all its prime implicants.
b’c’ + bc + a’b
b’c’ + bc + a’c’
The BCF of the two equivalent functions above is b’c’ + bc + a’b + a’c’.
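Blake’s canonical form can be computed by the iterated consensus method: repeatedly add the consensus of any two terms that clash in exactly one variable, and drop any term absorbed by another, until nothing changes; the fixed point is the sum of all prime implicants. In this C++ sketch, a term is a string over {0, 1, -} with one position per variable, and `blakeCanonicalForm`, `covers`, and `consensus` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

using Cube = std::string;  // one char per variable: '0', '1', or '-' (absent)

// True if cube c covers cube d (every literal of c also appears in d).
bool covers(const Cube& c, const Cube& d) {
    for (std::size_t i = 0; i < c.size(); ++i)
        if (c[i] != '-' && c[i] != d[i]) return false;
    return true;
}

// Consensus of p and q; defined only when they clash in exactly one variable.
bool consensus(const Cube& p, const Cube& q, Cube& out) {
    int clashes = 0;
    out = p;
    for (std::size_t i = 0; i < p.size(); ++i) {
        if (p[i] != '-' && q[i] != '-' && p[i] != q[i]) { ++clashes; out[i] = '-'; }
        else if (p[i] == '-') out[i] = q[i];
    }
    return clashes == 1;
}

std::set<Cube> blakeCanonicalForm(std::vector<Cube> cubes) {
    bool changed = true;
    while (changed) {
        changed = false;
        // Absorption: drop cubes covered by another cube (keep one copy of duplicates).
        std::vector<Cube> kept;
        for (std::size_t i = 0; i < cubes.size(); ++i) {
            bool absorbed = false;
            for (std::size_t j = 0; j < cubes.size() && !absorbed; ++j)
                absorbed = j != i && covers(cubes[j], cubes[i]) &&
                           (cubes[j] != cubes[i] || j < i);
            if (!absorbed) kept.push_back(cubes[i]);
        }
        if (kept.size() != cubes.size()) changed = true;
        cubes = kept;
        // Consensus: add one new term that no existing cube covers, then re-absorb.
        for (std::size_t i = 0; i < cubes.size() && !changed; ++i)
            for (std::size_t j = i + 1; j < cubes.size() && !changed; ++j) {
                Cube c;
                if (!consensus(cubes[i], cubes[j], c)) continue;
                bool covered = std::any_of(cubes.begin(), cubes.end(),
                                           [&](const Cube& e) { return covers(e, c); });
                if (!covered) { cubes.push_back(c); changed = true; }
            }
    }
    return std::set<Cube>(cubes.begin(), cubes.end());
}
```

Encoding the variables in the order a, b, c, both b’c’ + bc + a’b (terms -00, -11, 01-) and b’c’ + bc + a’c’ (terms -00, -11, 0-0) converge to the same four terms, b’c’ + bc + a’b + a’c’.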

Circuits
Functional view of P: f_P
Derive it from structure:
y6 = ((x3x2(x3x2x3)’)’(x3(x3x2x3)’)’)’
y7 = ((x3x2 + x1)((x3x2(x3x2x3)’)’(x3(x3x2x3)’)’)’)’
I reduced it by hand:
y6 = x3x2’
y7 = x1’ + x3’ + x2

Circuits No really I did:

Circuits
Functional view of P: f_P
Derive it from truth table:
y6 = x1’x2’x3 + x1x2’x3
y7 = x1’x2’x3’ + x1’x2’x3 + x1’x2x3’ + x1’x2x3 + x1x2’x3’ + x1x2x3’ + x1x2x3
This is in complete-SOP form already. I applied Blake’s method to get:
y6 = x2’x3
y7 = x1’ + x3’ + x2

So what does canonical minimization do?
All you need is the truth table or behavioral view to get an SOP form
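Reading an SOP off a truth table is mechanical: each row with output 1 contributes one minterm. This C++ sketch uses x1 as the most significant bit and the document’s notation (an ASCII apostrophe for complement); `mintermSOP` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Canonical (minterm) SOP for one output column of a truth table. Row r
// encodes the assignment x1 x2 ... xn in binary, x1 most significant.
std::string mintermSOP(const std::vector<bool>& column, unsigned n) {
    assert(column.size() == (std::size_t{1} << n));
    std::string sop;
    for (std::size_t r = 0; r < column.size(); ++r) {
        if (!column[r]) continue;  // only rows where the output is 1
        if (!sop.empty()) sop += " + ";
        for (unsigned i = 1; i <= n; ++i) {
            bool bit = (r >> (n - i)) & 1u;  // value of x_i in row r
            sop += "x" + std::to_string(i) + (bit ? "" : "'");
        }
    }
    return sop.empty() ? "0" : sop;  // the constant-0 function has no minterms
}
```

Applied to the y6 column of circuit P (1 only on rows 001 and 101), it returns x1'x2'x3 + x1x2'x3, the canonical form derived from the truth table above.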

So what does canonical minimization do for us?
This is what an oracle for P would “use” when asked questions about P … Any circuit that implements this truth table would then be a “black box implementation” of P

The “Logic” of Canonical P
if ((x1 == 0) && (x2 == 0) && (x3 == 0)) { y6 = 0; y7 = 1; }
else if ((x1 == 0) && (x2 == 0) && (x3 == 1)) { y6 = 1; y7 = 1; }
…

Can I ever recover the structure of the original P from canonical P?
Minimized: y6 = x3x2’, y7 = x1’ + x3’ + x2
Canonical: y6 = x1’x2’x3 + x1x2’x3, y7 = x1’x2’x3’ + x1’x2’x3 + x1’x2x3’ + x1’x2x3 + x1x2’x3’ + x1x2x3’ + x1x2x3
BOTH are forward derivations.
Structural: y6 = ((x3x2(x3x2x3)’)’(x3(x3x2x3)’)’)’, y7 = ((x3x2 + x1)((x3x2(x3x2x3)’)’(x3(x3x2x3)’)’)’)’
This would reveal the gate structure.

Perfect White Box Obfuscators
Algorithm O: truth table T_P → Algorithm A (complete sum-of-products) or Algorithm B (minimal sum-of-products) → circuit P’
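The last step, turning a sum-of-products into circuit P’, can be sketched as two-level synthesis that emits BENCH text: one NOT per complemented variable, one AND per product term, and one OR for the output. In this C++ sketch, terms are strings over {0, 1, -}, the function name `sopToBench` and wire names (`x1`, `nx1`, `t0`, `y`) are hypothetical, single operands are duplicated (AND(l, l) = l) so every gate has at least two inputs, n-ary AND/OR gates are emitted rather than two-operand trees, and constant functions are not handled.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Joins operand names as "a, b, c".
static std::string join(const std::vector<std::string>& parts) {
    std::string s;
    for (std::size_t i = 0; i < parts.size(); ++i) s += (i ? ", " : "") + parts[i];
    return s;
}

// Emits a two-level NOT/AND/OR BENCH netlist for a single-output SOP over
// inputs x1..xn. Each cube has one char per input: '1', '0', or '-'.
std::string sopToBench(const std::vector<std::string>& cubes, unsigned n) {
    assert(!cubes.empty());  // constant 0 is not handled in this sketch
    std::ostringstream gates;
    std::vector<bool> needNot(n + 1, false);
    std::vector<std::string> terms;
    for (std::size_t t = 0; t < cubes.size(); ++t) {
        assert(cubes[t].size() == n);
        std::vector<std::string> lits;
        for (unsigned i = 1; i <= n; ++i) {
            char c = cubes[t][i - 1];
            if (c == '1') lits.push_back("x" + std::to_string(i));
            if (c == '0') { lits.push_back("nx" + std::to_string(i)); needNot[i] = true; }
        }
        assert(!lits.empty());  // constant-1 cubes are not handled in this sketch
        if (lits.size() == 1) lits.push_back(lits[0]);  // AND(l, l) == l
        std::string name = "t" + std::to_string(t);
        gates << name << " = AND(" << join(lits) << ")\n";
        terms.push_back(name);
    }
    if (terms.size() == 1) terms.push_back(terms[0]);  // OR(t, t) == t
    gates << "y = OR(" << join(terms) << ")\n";

    std::ostringstream bench;
    for (unsigned i = 1; i <= n; ++i) bench << "INPUT(x" << i << ")\n";
    bench << "OUTPUT(y)\n";
    for (unsigned i = 1; i <= n; ++i)
        if (needNot[i]) bench << "nx" << i << " = NOT(x" << i << ")\n";
    bench << gates.str();
    return bench.str();
}
```

Fed the minimized y6 = x2’x3 as the single term -01, it produces a netlist with one NOT, one AND, and one OR gate. Because the output depends only on the SOP, and the SOP depends only on the truth table, the gate structure of the original P does not carry over, which is the point argued above.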

For Designing Catenation-Based Obfuscators: P’ = P + E
[Chart: efficiency (low to high) versus security (low to high) for perfect white box, P + E, randomization, circuit splicing, and subcircuit-canonical minimization]

End-to-End Program Protection Architecture

Questions ???
