
Methods for Software Protection




1 Methods for Software Protection
Clark Thomborson, University of Auckland, 18 November 2003

2 Questions to be (Partially) Answered
What is security?
How does software obfuscation compare with encryption?
Is “perfect obfuscation” possible?
How can software be watermarked?

3 What is Security? (A Taxonomic Overview)
The first step in wisdom is to know the things themselves; this notion consists in having a true idea of the objects; objects are distinguished and known by classifying them methodically and giving them appropriate names. Therefore, classification and name-giving will be the foundation of our science.
Carolus Linnæus, Systema Naturæ, 1735 (quoted in Lindqvist and Jonsson, “How to Systematically Classify Computer Security Intrusions”, 1997)

4 Four Goals of Security
Prohibition: (try to) prevent something from happening.
Permission: (try to) allow something to happen.
Assertion: (try to) serve notice on an end-user, or on some other “security principal”.
Affirmation: (try to) assure the end-user, or other principal, of the authenticity of an object.
Most security analyses focus on prohibition and permission: which principals are allowed what actions on which objects?
Prohibition is my focus – a controversial subject! (Consider alcohol, drugs, and the DMCA.)

5 Standard Taxonomy of Security
Confidentiality: all reads must be authorised.
Integrity: all writes must be authorised.
Availability: authorised reads and writes must be allowed.
Prohibition = (Confidentiality & Integrity)
Permission = Availability
Assertion & Affirmation = ??
(N.B. the standard taxonomy was developed for “information security”, not “software security”.)

6 Prohibiting Attacks on Software (“Defense in Depth”)
Prevention:
Runtime system security (e.g. “execute-only” access to encrypted code; authentication of code by the system and vice versa; tamperproof execution environment)
Make the code difficult to understand (Obfuscation)
Detection:
Monitor principals (User logs)
Monitor activities (Execution logs, intrusion detectors)
Monitor objects (Watermarking)
Response:
“Ask for help”: Send a report to an enforcement agent.
“Self-help”: Software may modify or repair or destroy itself, or the system on which it runs.

7 Security Boundaries
We divide the world into trusted and untrusted portions. The dividing line is the security boundary. Secrets and other valued items start in the trusted portion. Attackers start in the untrusted portion. They try to find or make a “security hole”.

8 What Secrets are in Software?
Algorithms (so competitors or attackers can’t build similar functionality without redesigning from scratch).
Constants, such as an encryption key (typically hidden in code that computes obscure functions of this constant).
Internal function points, such as a license-control predicate “if (not licensed) exit()” – see the sketch below.
External interfaces (to deny access by attackers and competitors to an intentional “service entrance” or an unintentional “backdoor”).
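
To make the third item concrete, here is a hypothetical license-control function point in Java. The class, the key test, and the constant are illustrative assumptions, not taken from the talk; a real product would obfuscate this check rather than leave it as a single patchable predicate.

    public class LicenseGate {
        // Hypothetical check: a stand-in for whatever the vendor really computes.
        static boolean licensed(String key) {
            return key != null && key.hashCode() == 0x5F3759DF;
        }

        public static void main(String[] args) {
            // The “if (not licensed) exit()” function point an attacker looks for:
            if (!licensed(args.length > 0 ? args[0] : null)) {
                System.err.println("Not licensed.");
                System.exit(1);
            }
            System.out.println("Licensed; running.");
        }
    }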

9 Security Boundary for Obfuscation
[Diagram: source code P, containing the secrets (algorithm, function points, secret keys, a secret interface), is compiled into executable X; the obfuscator transforms X into executable X’, with the same behaviour as X. Only X’ crosses the security boundary: it is released to attackers who want to know the secrets – source code P, the algorithm, the unobfuscated X, function points, …]

10 Security Boundary for Encryption
[Diagram: source code P, with its algorithm, function points and secret interface, is compiled into executable X; an encrypter with secret keys produces the encrypted file E(X). Everything beyond that point – the decrypter, a RAM buffer, the CPU, and the attacker’s GUI and I/O – sits inside the attacker’s computer.]

11 (Dis)advantages of Encryption over Obfuscation
Strong encryption E() can be used. The largest security holes will be in the attacker’s computer and (perhaps) in the key-distribution scheme.
Branches into an undecrypted block will stall the CPU until the target is decrypted. This runtime penalty is proportional to block size. Strong encryption ⇒ large blocks ⇒ large runtime penalty. (A demand-decryption sketch follows.)
The RAM buffer and the decrypter must be large and fast, to minimize the number of undecrypted blocks. “Large and fast” ⇒ “expensive or insecure”.
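
A minimal Java sketch of the demand-decryption scheme just described, assuming AES via the standard javax.crypto API; the block ids, the toy key, and the cache policy are all illustrative. The point is only that the first branch into a block pays a decryption cost proportional to the block’s size.

    import javax.crypto.Cipher;
    import javax.crypto.spec.SecretKeySpec;
    import java.util.HashMap;
    import java.util.Map;

    public class DemandDecrypt {
        private final Map<Integer, byte[]> encrypted = new HashMap<>();
        private final Map<Integer, byte[]> ramBuffer = new HashMap<>(); // decrypted cache
        private final SecretKeySpec key;

        DemandDecrypt(byte[] rawKey) { this.key = new SecretKeySpec(rawKey, "AES"); }

        void store(int blockId, byte[] plainBlock) throws Exception {
            Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding"); // ECB only for the sketch
            c.init(Cipher.ENCRYPT_MODE, key);
            encrypted.put(blockId, c.doFinal(plainBlock));
        }

        // “Branch” into a block: the first entry stalls for a full-block decryption.
        byte[] enter(int blockId) throws Exception {
            byte[] code = ramBuffer.get(blockId);
            if (code == null) {
                Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
                c.init(Cipher.DECRYPT_MODE, key);
                code = c.doFinal(encrypted.get(blockId));
                ramBuffer.put(blockId, code);  // buffer it for later branches
            }
            return code;
        }

        public static void main(String[] args) throws Exception {
            DemandDecrypt dd = new DemandDecrypt("0123456789abcdef".getBytes()); // 16-byte key
            dd.store(0, "block zero code".getBytes());
            System.out.println(new String(dd.enter(0))); // first entry: decrypt, then cache
            System.out.println(new String(dd.enter(0))); // second entry: cache hit, no stall
        }
    }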

12 Partial Encryption
Small portions of the executable can be protected with strong encryption, at reasonable cost. The remainder of the executable may be unprotected, or protected with cheap-but-insecure encryption.
“Small portions” = some or all of the control transfers, plus a few of the variables (Maude & Maude, 1984; many similar articles and patents since 1984).
The strongly-protected portions are executed in a secure hardware environment, e.g. a smart card. Extreme case: a dongle is a secure execution environment for just one predicate “if ( licensed(x) ) …”.
Performance penalties may be large, especially when more than one protected program is being executed.

13 How to Obfuscate?
Lexical layer: obscure the names of variables, constants, opcodes, methods, classes, interfaces, etc. (Important for interpreted languages and named interfaces.)
Data obfuscations:
obscure the values of variables (e.g. by encoding several booleans in one int; encoding one int in several floats; encoding values in enumerable graphs) – see the packed-flags sketch below
obscure data structures (e.g. transforming 2-d arrays into vectors, and vice versa)
Control obfuscations (to be explained later)
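
A minimal sketch of the first data obfuscation listed above – several booleans packed into the bits of one int. The class and the flag indices are illustrative only.

    public class PackedFlags {
        private int bits;  // flag i lives at bit position i of a single int

        void set(int i, boolean v) {
            if (v) bits |= (1 << i); else bits &= ~(1 << i);
        }

        boolean get(int i) {
            return ((bits >>> i) & 1) != 0;
        }

        public static void main(String[] args) {
            PackedFlags f = new PackedFlags();
            f.set(0, true);   // e.g. “isLicensed”
            f.set(3, true);   // e.g. “isTrialExpired”
            System.out.println(f.get(0) + " " + f.get(1)); // true false
        }
    }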

14 Attacks on Data Obfuscation
An attacker may be able to discover the decoding function by observing program behaviour immediately prior to output: print( decode( x ) ), where x is an obfuscated variable.
An attacker may be able to discover the encoding function by observing program behaviour immediately after input.
A sufficiently clever human will eventually de-obfuscate any code. Our goal is to frustrate an attacker who wants to automate the de-obfuscation process.
More complex obfuscations are more difficult to de-obfuscate, but they tend to degrade program efficiency and may enable pattern-matching attacks.

15 Cryptographic Data Obfuscation?
Cloakware have patented an algebraic obfuscation on data, but it does not have a cryptographic secret key. An ideal data obfuscator would have a cryptographic key that selects one of 2^64 encoding functions.
Fundamental weakness: the encoding and decoding functions must be included in the obfuscated software. Otherwise the obfuscated variables cannot be read and written.
How can we be confident an attacker cannot (by an automated analysis of the obfuscated code) extract a working implementation of these keyed functions, for use in their automated de-obfuscator and de-obfuscating debugger? (A keyed-encoding sketch follows.)
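
A minimal sketch of what a keyed algebraic encoding might look like – an assumption in the spirit of the slide, not taken from the Cloakware patent. A 64-bit key selects an odd multiplier a and an offset b, giving the affine encoding x’ = a·x + b (mod 2^32); note that it uses only common operators (multiply, add), which also serves the “stealth” goal of slide 17.

    public class AffineEncoding {
        private final int a, aInv, b;

        AffineEncoding(long key) {
            this.a = (int) (key | 1);       // odd, hence invertible mod 2^32
            this.b = (int) (key >>> 32);
            int inv = a;                    // Newton iteration for a^-1 mod 2^32:
            for (int i = 0; i < 4; i++)     // each step doubles the correct low bits
                inv *= 2 - a * inv;
            this.aInv = inv;
        }

        int encode(int x) { return a * x + b; }      // Java ints wrap mod 2^32
        int decode(int y) { return aInv * (y - b); }

        public static void main(String[] args) {
            AffineEncoding e = new AffineEncoding(0x9E3779B97F4A7C15L);
            int hidden = e.encode(42);
            System.out.println(hidden + " decodes to " + e.decode(hidden)); // ... 42
        }
    }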

16 Perfect Obfuscation?
[Diagram: a program P for some f ∈ F – a function ensemble with a property π: F → {0,1} and a polynomial time bound p() – is fed, together with a secret message, into an obfuscator; the obfuscated program P’ still communicates one bit, π(f), of the secret message.]
No obfuscator can prevent this prisoner from sending messages to an accomplice. (Barak et al., 2001)
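
For reference, a standard paraphrase (in my notation, not the authors’ exact wording) of the Barak et al. definition and the negative result behind this slide:

    % Virtual-black-box obfuscation, and why it is impossible in general.
    An obfuscator $O$ satisfies the \emph{virtual black-box} property if for
    every probabilistic polynomial-time adversary $A$ there is a simulator $S$
    and a negligible function $\epsilon$ such that for all programs $P$:
    \[
      \bigl|\, \Pr[A(O(P)) = 1] - \Pr[S^{P}(1^{|P|}) = 1] \,\bigr| \le \epsilon(|P|).
    \]
    Theorem (informal, Barak et al.\ 2001): there exist a function ensemble $F$
    and a property $\pi : F \to \{0,1\}$ such that $\pi(f)$ is efficiently
    computable from \emph{any} program computing $f \in F$, yet is essentially
    unpredictable given only oracle access to $f$. No obfuscator can hide the
    bit $\pi(f)$: the ``prisoner'' always gets one bit of message out.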

17 Techniques for Practical Data Obfuscation
Interleave inlined versions of the encode/decode functions with other segments of obfuscated code, so that the function boundaries are not easily discovered by an automated analysis.
The attacker could pattern-match, so we should use many different versions of the encoder and decoder in a single obfuscated code.
The encoder and decoder should be “stealthy” (indistinguishable from the other code in the obfuscated application):
no uncommon operators such as XOR or integer division;
no uncommon sequences of operators or operands.

18 Graph-Theoretic Data Obfuscation
Observation: any family of graphs that is enumerated by a recurrence relation (or a generating function) has encoding and decoding functions that run in time polynomial in the size of the graph.
We can easily devise efficient algorithms (copy, set to zero, increment) for any variable that is encoded as an enumerable graph. We want all basic arithmetic operations on obfuscated n-bit numbers to run in O(n) time. (Open problem!)
“Pure” graph algorithms that use node and edge operations, as well as tests on pointer equality, but no arithmetic operations, might be especially difficult to de-obfuscate automatically. (A toy sketch follows.)
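
A toy Java sketch of the idea – deliberately a unary encoding, far weaker than the enumerable-graph families the slide has in mind, but it shows the shape: the value lives only in the graph’s structure, and copy / set-to-zero / increment / is-zero use nothing but node operations and pointer tests.

    public class ChainInt {
        static final class Node { Node next; }
        private Node head;                      // a chain of n nodes encodes n

        static ChainInt zero() { return new ChainInt(); }   // set to zero

        void increment() {                      // n := n + 1: one node operation
            Node n = new Node();
            n.next = head;
            head = n;
        }

        ChainInt copy() {                       // structural copy, node by node
            ChainInt c = new ChainInt();
            for (Node p = head; p != null; p = p.next) c.increment();
            return c;
        }

        boolean isZero() { return head == null; }   // a pointer test, no arithmetic

        int reveal() {                          // decode – for demonstration only
            int n = 0;
            for (Node p = head; p != null; p = p.next) n++;
            return n;
        }

        public static void main(String[] args) {
            ChainInt x = ChainInt.zero();
            x.increment(); x.increment(); x.increment();
            System.out.println(x.copy().reveal());  // 3
        }
    }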

19 Control Obfuscations
Inline procedures
Outline procedures
Obscure method inheritances (e.g. refactor classes)
Opaque predicates:
Dead code (which may trigger a tamper-response mechanism if it is executed!)
Variant (duplicate) code
Obscure control flow (“flattened” or irreducible) – see the dispatcher sketch below
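
A minimal sketch of the “flattened” control flow mentioned in the last bullet: the loop structure of an ordinary function is replaced by a state-machine dispatcher. The function and its state numbering are illustrative; a real obfuscator would also compute the next-state values opaquely.

    public class Flattened {
        static int sumToN(int n) {       // originally a simple for-loop
            int state = 0, i = 0, acc = 0;
            while (true) {
                switch (state) {
                    case 0: i = 1; acc = 0; state = 1; break;   // loop init
                    case 1: state = (i <= n) ? 2 : 3; break;    // loop test
                    case 2: acc += i; i++; state = 1; break;    // loop body
                    case 3: return acc;                         // loop exit
                    default: throw new IllegalStateException();
                }
            }
        }

        public static void main(String[] args) {
            System.out.println(sumToN(10)); // 55
        }
    }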

20 Opaque Predicates
[Diagram: three transformations of the block sequence {A; B}:
an “always true” predicate P^T, whose false branch is never taken;
an “indeterminate” predicate P?, whose branches lead to B and to a variant B’;
a “tamperproof” P^T, whose never-taken false branch leads to buggy code B_bug.]
Note: “always false” is not shown on this slide. (A Java sketch follows.)
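
A minimal sketch of an “always true” opaque predicate in Java (my example, not one from the talk): x·(x+1) is a product of consecutive integers, hence even, and this survives Java’s wraparound arithmetic because 2^32 is itself even.

    public class Opaque {
        static boolean opaquelyTrue(int x) {
            return (x * (x + 1)) % 2 == 0;   // P^T: true for every int x
        }

        static void guarded(int x) {
            if (opaquelyTrue(x)) {
                System.out.println("live code: always executes");
            } else {
                // Dead code: never executes, so it can hold decoy code –
                // or a tamper-response that fires only if an attacker
                // breaks the predicate.
                System.out.println("tamper response");
            }
        }

        public static void main(String[] args) {
            guarded(7);
            guarded(-12345);
        }
    }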

21 Opaque Predicates on Graphs
Dynamic analysis is required to de-obfuscate – this is very difficult to automate!
[Diagram: two heap graphs f and g are manipulated at runtime – f.Insert(); g.Move(); g.Delete(); g.Merge(f) – before the program tests “if (f == g) then …”. A Java sketch follows.]
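
A minimal Java sketch of a graph-based opaque predicate, with illustrative names: f and g walk around two disjoint rings, so “f == g” is opaquely false – but deciding that statically is a heap-aliasing question, which is exactly why dynamic analysis is needed.

    import java.util.Random;

    public class GraphOpaque {
        static final class Node { Node next; }

        static Node ring(int n) {               // circular list of n nodes
            Node head = new Node(), p = head;
            for (int i = 1; i < n; i++) { p.next = new Node(); p = p.next; }
            p.next = head;
            return head;
        }

        public static void main(String[] args) {
            Node f = ring(5), g = ring(7);      // two disjoint heap graphs
            Random r = new Random(42);
            for (int i = 0; i < 100; i++) {     // confusing runtime traffic
                if (r.nextBoolean()) f = f.next; else g = g.next;
            }
            if (f == g) {                       // opaquely false: the rings never meet
                System.out.println("dead branch: decoys or tamper-response here");
            } else {
                System.out.println("live branch");
            }
        }
    }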

22 History of Software Obfuscation
“Hand-crafted” obfuscations: IOCCC (Int’l Obfuscated C Code Contest, 1984– ); a few earlier examples.
Automated lexical obfuscations since 1996: Crema, HoseMocha, …
Automated control obfuscations since 1996: Monden.
Opaque predicates since 1997: Collberg, Thomborson, Low.
Commercial activity since 1997: Cloakware, …
It is still a small field, with just a handful of companies selling obfuscation products and services. There are only a few non-trivial published results, and a few patents.

23 Software Watermarking
Key taxonomic questions:
Where is the watermark embedded?
How is the watermark embedded?
When is the watermark embedded?
Why is the watermark embedded? What are its desired properties?

24 What is a Software Watermark?
A software watermarking system can be formally described by three functions:
E(P; W; k) → Pw
R(Pw; k) → W
A(P) → P’
The embedder E embeds the watermark W into the (cover) program P using the secret key k, yielding a watermarked program Pw.
The recognizer R extracts W from Pw, also using k as the key. (Alternatively, W may be a parameter of a boolean-valued R.)
The attack set A models the ways an attacker may disrupt recognition. The attacker’s usual goal is to find a ∈ A with the property R(a(Pw), k) ≠ W, although sometimes an attacker may launch a “protocol attack” by falsely claiming a recognizer R’ such that R’(Pw, k) ≠ W.

25 Where Software Watermarks are Embedded
Static code watermarks are stored in the section of the executable that contains instructions.
Static data watermarks are stored in other sections of the executable.
Static watermarks are detected without executing (or emulating) the code. Recognition is cheaper, but less secure.

26 Dynamic Watermarks
Easter Eggs (visible dynamic behaviour watermarks) are revealed to any end-user who types a special input sequence.
Invisible dynamic behaviour watermarks:
Execution Trace Watermarks are carried in the instruction execution sequence of a program, when it is given a special input sequence (possibly null).
Data Structure Watermarks are built by a program, when it is given a special input.
Data Value Watermarks are produced by a program on a surreptitious channel, when it is given a special input.

27 Easter Eggs
The watermark is visible – if you know where to look!
Not resilient, once the secret is out.

28 Dynamic Graph Watermarking
Represent an integer watermark W by the Wth graph in some easily-enumerated family of graphs, such as Planted Plane Cubic Trees (PPCTs).
After determining a data structure to represent the watermark graph, we must modify the program to build this graph at runtime.

29 Watermark Embedding [Slide design by Dejin Zhao, 2002]
1. Build a Node Class for this Graph: choose a base class in the original program and convert it into a node class.

Before:

    class base {
        int a;
        base() { a = 0; }
    }

After:

    class base {
        int a;
        base left, right;  // additional fields holding outgoing edges to other nodes

        public base() { a = 0; }

        void addedge(base outgoingEdge, int branch) {    // additional method:
            if (branch == 0) this.left = outgoingEdge;   // adds an outgoing edge
            else this.right = outgoingEdge;              // to this node
        }
    }
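
A hypothetical use of the converted node class above (assuming the repaired base class is on the classpath): build, at runtime, a small cycle whose length is the watermark value. Real embeddings build the Wth graph of an enumerable family, but the mechanics are the same.

    public class BuildWatermark {
        public static void main(String[] args) {
            int w = 4;                          // toy watermark value
            base first = new base(), prev = first;
            for (int i = 1; i < w; i++) {
                base n = new base();
                prev.addedge(n, 0);             // left edge to the next node
                prev = n;
            }
            prev.addedge(first, 0);             // close the cycle: W = cycle length
        }
    }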

30 Watermark Embedding [Slide design by Dejin Zhao, 2002]
2. Build and Merge Graph:
(a) Add class A1 extending A, and merge the code for building W into its constructor.

    class A {
        int a1;
        A() { a1 = 0; }
    }

    class A1 extends A {
        A1() {
            a1 = 0;
            // <code for building W>
        }
    }

(b) Substitute “new A1( )” for “new A( )”.
This method is robust to a variety of program-transformation attacks, but:
3. Problem: the extra code for building the graph might be located and removed by an attacker. One solution is to scatter the watermark-building code, so that the watermark is built while a secret input k is being processed by the program.

31 Palsberg’s Tamperproof WM
Use Opaque Predicates! Replace statement “S” in the watermark-building code with “if (x != y) S”.
Note that our watermark graph W is an excellent source of opaque predicates, where x and y refer to nodes of the graph. We can use W to obfuscate the original program P, as shown below.
[Diagram: at runtime, program P = P1 + C + P2, where C builds the watermark graph W, and the statements S of P2 are guarded by “if (predicate from W) S”.]
Here both P1 and C are unobfuscated, because W must be built before “if (x != y) S” can be evaluated.
[Slide design by Dejin Zhao, 2002]

32 2. Obfuscate P and C, by introducing another graph G.
[Diagram: a general-purpose obfuscator transforms program P into P’, in which C’ builds another graph G; the statements of P1 (which runs before C) are guarded by “if (predicates from G) S”; C builds the watermark graph W; and the statements of P2 (which runs after C) are guarded by “if (predicates from W) S”.]
3. Tamperproofing: C’ cannot be altered safely, and it can be very difficult to distinguish W from G.
[Slide design by Dejin Zhao, 2002]

33 History & Development of DDS WMs
Disclosed in our POPL’99 paper; protected by a WIPO filing. Implemented (2000– ).
Experimental findings by Palsberg et al. (2001):
JavaWiz adds less than 10 kilobytes of code, on average.
Embedding a watermark takes less than 20 seconds.
Watermarking increases a program’s execution time by less than 7%.
Watermark retrieval takes about 1 minute per megabyte of heap.

34 SW Watermarking (Review of Taxonomic Questions)
Key taxonomic questions:
Where is the watermark embedded?
How is the watermark embedded?
When is the watermark embedded?
Why is the watermark embedded? What are its desired properties?

35 When is the WM Embedded?
During a design step (“active watermarking”: Kahng et al., 2001). An attacker may have to reverse-engineer, then re-engineer, an expensive or difficult design step. (Software may carry a watermark in its optimisation constraints, e.g. register assignments, although these are not particularly expensive.)
After the design is complete (“passive watermarking”). This is generally simpler than active watermarking, but less expensive to attack. The “unwatermarked object” may be discovered by espionage, or released inadvertently through a procedural error.

36 Why Watermark Software?
Visible robust watermarks: useful for assertion (of copyright or authorship).
Invisible robust watermarks: useful for prohibition (of unlicensed use).
Visible fragile watermarks: useful for affirmation (of authenticity or validity).
Invisible fragile watermarks: useful for permission (of licensed uses).

37 Assertion Marks
An assertion of authorship, and its related copyright and moral rights, can be made in what we call an “Assertion Mark”.
Visibility is desired; otherwise the end-user won’t be given notice.
Robustness is desired; otherwise the authorship mark would not be present in a substantially similar work.

38 Prohibition Marks
A publisher, distributor or other agent of the author might embed a “prohibition mark” on each copy they sell or license, to prevent unauthorised use (when suitable detection & response systems are in place).
Prohibition Marks should be robust, ideally surviving even in modestly-sized excerpts.
Prohibition Marks should be invisible; otherwise they will be easily removed by pirates, and they may annoy the end-user.
Used in the “Content Protection System Architecture” proposal from 4C Entity.

39 Permission Marks
Fragile marks, which are destroyed or suitably modified whenever a copy is made, allow us to design a system that permits licensed use. For example: an object with a “copy-2” mark can be transformed into two objects with “copy-0” marks. (A toy sketch follows.)
Permission Marks are most useful in conjunction with Prohibition Marks: the Prohibition Mark indicates what sort of Permission Mark is required.
Permission Marks should be invisible, so that they may resist attacks by pirates.
Permission & Prohibition Marks are present in the Content Protection for Recordable Media proposal from 4C Entity.
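
A minimal sketch (my assumption, not the 4C design) of the copy-counting behaviour described above: copying a “copy-2” object consumes permission and stamps each child “copy-0”.

    public class PermissionMark {
        private int copiesAllowed;

        PermissionMark(int copiesAllowed) { this.copiesAllowed = copiesAllowed; }

        PermissionMark copy() {                 // copying rewrites the fragile mark
            if (copiesAllowed <= 0)
                throw new IllegalStateException("no copies permitted");
            copiesAllowed--;
            return new PermissionMark(0);       // children carry a “copy-0” mark
        }

        public static void main(String[] args) {
            PermissionMark master = new PermissionMark(2);   // a “copy-2” mark
            master.copy();                                   // first copy: allowed
            master.copy();                                   // second copy: allowed
            try { master.copy(); }                           // third copy: refused
            catch (IllegalStateException ex) { System.out.println(ex.getMessage()); }
        }
    }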

40 Affirmation Marks
Visible, fragile marks can affirm, to the end-user, that the software has not been modified in any important way since manufacture.
Affirmation Marks for software are typically designed to be fragile under anything except verbatim copying; a single-bit change will invalidate these marks.
Cryptographic signature algorithms are used to implement Affirmation Marks in Java’s “sign & seal” and in Microsoft’s Authenticode.

41 A Fifth Function?
Any watermark is useful for the transmission of information irrelevant to software security (espionage, humour, …).
Transmission Marks may involve security for other systems, in which case they can be categorised as Permissions, Prohibitions, etc.

42 Our Functional Taxonomy for WMs
Goal: “… wisdom … by classifying [watermarks] methodically and giving them appropriate names.”

              Robust              Fragile
  Visible     Assertion Mark      Affirmation Mark
  Invisible   Prohibition Mark    Permission Mark

(Summarising the four mark types of slides 36–40.)

43 Error-Correcting Watermarks (WG’03)
We can add redundancy to the watermark, so that it is recognizable even after it has been attacked.
Important categories of attacks on software WMs:
Edge contractions in a bipartite graph
Edge expansions (on all edges “simultaneously”)
Edge relabelings (reorderings): note that there is a total order on outgoing arcs at each node, in control-flow graphs and in most data-structure representations of a graph.
Our PPCTs seem to rely on edge-order; however, in linear time we can construct the (unique) planar embedding of a trivalent PPCT that has been constructed by adding arcs to form an outercycle (on its leaves and root).

44 PPCT with an Outercycle
Each internal node has one incoming arc and two outgoing arcs. Each leaf node has two incoming arcs and one outgoing arc.

45 Dangerous/Difficult Attacks
Node additions and deletions require some care on the part of the attacker (for they may damage program correctness); however, an attacker will be able to make a small number of these attacks.
Theorem: PPCTs with an outercycle can detect and correct a single deletion or insertion error.
Conjecture: PPCTs with an outercycle can detect and correct O(log n) random deletions or insertions, with high probability.

46 ECC Watermark Graphs that Resemble Flowgraphs
PPCTs are stealthy in codes that use tree-like data structures. All codes have flowgraphs (defined by their branching structure). Can we devise an error-correcting graph for use in static code watermarks?
Note: static code graph watermarks are embedded by adding new branchpoints, guarded by opaquely-false predicates; they are recognised by extracting the flowgraph, then looking for the watermark as a subgraph of this flowgraph.

47 An ECC Graph Resembling a Flowgraph
The Preamble and Body have a Hamiltonian path that can be efficiently recovered if the edge-ordering is damaged.
Except for the Head and Foot, all nodes have indegree 2 and outdegree 1 – this is reasonably stealthy as a flowgraph.

48 Summary
New taxonomy of security: prohibition, permission, assertion, affirmation.
Overview of software security techniques, focussing on graph-theoretic techniques in obfuscation and watermarking.
An open problem in data obfuscation.
A new result, and a wide-open field of research, in ECC watermark graphs.

