
Implementing Oblivious Hashing Using Overlapped Instruction Encodings
ACM Multimedia and Security '07, Dallas, TX (USA), September 20-21, 2007
Mariusz H. Jakubowski, Ramarathnam Venkatesan (Microsoft Research)
Matthias Jacob (Nokia Research)

Introduction
Field of work: Software protection
– Obfuscation and tamper-resistance
– Prevention (or delaying) of reverse engineering and hacking
– Securing of content-rights systems (DRM)
Background: Two specific protection techniques
– Oblivious hashing (OH): Computing hashes (fingerprints) of execution traces
– Overlapped code: Jumping into the middle of instructions to obfuscate and protect against disassembly
Goals of our work:
– Apply overlapped code towards obfuscation and tamper-resistance via OH.
– Study new techniques in terms of formal models, avoiding ad hoc approaches.

Overview
Introduction
Background
– Software protection
– Oblivious hashing (OH)
– Overlapped code
Oblivious hashing via overlapped code
– Code interleaving
Conclusion

Software Protection
Obfuscation
– Making programs hard to understand
Tamper-resistance
– Making programs hard to modify
Obfuscation ⇒ tamper-resistance
Tamper-resistance ⇒ obfuscation?

Formal Obfuscation
Impossible in general
– Black-box model (Barak et al.): Source code doesn't help an adversary who can already examine input-output behavior.
– Worst-case programs and poly-time attackers
Possible in specific limited scenarios
– Secret hiding by hashing (Lynn et al.)
– Point functions (Wee, Kalai et al.)
Results are difficult to use in practice.

Tamper-Resistance
Many techniques used in practice, e.g.:
– Code-integrity checksums (e.g., Atallah et al.'s software guards)
– Anti-debugging and anti-disassembly methods
– Virtual machines and interpreters
– Polymorphic and metamorphic code
Never-ending battle in a very active field
– Targets: DRM, CD/DVD protection, games, dongles, licensing, etc.
– Defenses: Binary packers and cryptors, special compilers, transformation tools, programming strategies, etc.
Current techniques tend to be ad hoc:
– No provable security
– No analysis of time required to crack protected instances

Tamper-Resistance Model
Abstraction of software tamper-resistance (Dedić et al., IH '07)
– Program: A graph G
– Execution: A random walk on G
– Integrity checks:
   o Probabilistic monitoring of a set of G's nodes
   o Detection of failures that lead to delayed responses
– Security analysis: Graph game on G between attacker and defender
– OH and overlapped code in context of model:
   o Provide a source of integrity checks.
   o Help enforce local indistinguishability and other engineering assumptions about implementation.

Oblivious Hashing
Computation of hashes over program traces
– Initialize hash values at specific points.
– Update hashes upon assignments and branches.
Original code:
    int x = 123;
    if (GetUserInput() > 10) {
        x = x + 1;
    } else {
        printf("Hello\n");
    }
Hashed code (after the hash transform):
    INITIALIZE_HASH(hash1);
    int x = 123;
    UPDATE_HASH(hash1, x);
    if (GetUserInput() > 10) {
        UPDATE_HASH(hash1, BRANCH_ID_1);
        x = x + 1;
        UPDATE_HASH(hash1, x);
    } else {
        UPDATE_HASH(hash1, BRANCH_ID_2);
        printf("Hello\n");
    }
    VERIFY_HASH(hash1);

Overlapped Code
Code sharing among different paths
– Semantic: Sharing of code blocks among execution paths.
– Physical: Sharing of code bytes among machine or byte-code instructions.
Purposes
– Anti-disassembly and anti-decompilation
– Obfuscation
– Tamper-resistance from code sharing and explicit OH

Semantic Overlap
A code section is shared along different paths:
    int win, loss;
    void increase_ctr(int *ctr) { (*ctr)++; }
    int increase_win(void)  { increase_ctr(&win);  return win; }
    int increase_loss(void) { increase_ctr(&loss); return loss; }
The two paths diverge again only at return win; and return loss;.
Automated via code outlining

Physical Overlap
Sample x86 code: B8 B8 04 05 2D 05 90
Execution and disassembly depend on the entry point into the code:
    Offset 0: B8 B8 04 05 2D   mov eax, 2D0504B8
              05 90 ...        add eax, ...90
    Offset 1: B8 04 05 2D 05   mov eax, 52D0504
              90               nop
    Offset 2: 04 05            add al, 5
              2D 05 90 ...     sub eax, ...9005
    Offset 3: 05 2D 05 90 ...  add eax, ...90052D
    Offset 4: 2D 05 90 ...     sub eax, ...9005
    Offset 5: 05 90 ...        add eax, ...90
(Immediates marked "..." continue past the end of the 7-byte sample.)
Note: Disassembly tends to resynchronize naturally, but we can prevent this.

Disassembly Synchronization
– Often observed in practice, but previously not explained mathematically.
– Limits the effectiveness of code overlapping for security.
– Requires explicit anti-synchronization measures to enforce protection.
– Rigorous explanation: Kruskal count
Example of corruption and synchronization:
Corrupted code (the byte at 00411413 changed from 81 to 12):
    00411410 55                 push ebp
    00411411 8B EC              mov ebp,esp
    00411413 12 EC              adc ch,ah
    00411415 C0 00 00           rol byte ptr [eax],0
    00411418 00 53 56           add byte ptr [ebx+56h],dl
    0041141B 57                 push edi
    0041141C 8D BD 40 FF FF FF  lea edi,[ebp-0C0h]
Original code:
    00411410 55                 push ebp
    00411411 8B EC              mov ebp,esp
    00411413 81 EC C0 00 00 00  sub esp,0C0h
    00411419 53                 push ebx
    0041141A 56                 push esi
    0041141B 57                 push edi
    0041141C 8D BD 40 FF FF FF  lea edi,[ebp-0C0h]
Synchronization point: 0041141B, where both listings agree again.

Disassembly Synchronization
Disassembly: A leapfrog process over code bytes
– Each byte address contains an instruction of a definite length.
– After disassembling an instruction, a disassembler skips to the next instruction.
Example: Instruction lengths at consecutive offsets 0-16:
    3 4 6 2 6 3 4 5 3 3 5 4 2 7 3 1 4
Lengths encountered when disassembling from each starting offset:
    Offset 0: 3 2 3 3 4 1 4
    Offset 1: 4 3 3 4 1 4
    Offset 2: 6 3 4 1 4
    Offset 3: 2 3 3 4 1 4
    Offset 4: 6 5 1 4
The paths from offsets 0, 1 and 3 merge at offset 5, offset 2 joins at offset 8, and offset 4 joins at offset 15 (the synchronization point), after which all five coincide.
Kruskal count: Such disassembly synchronizes in about B²/16 steps, where B = average number of bytes per instruction.

Disassembly Synchronization
Model of the disassembly process:
Let InstructionLength(address) = length of the instruction found at address.
Starting at slightly different addresses x and y, a disassembler iterates:
    x ← x + InstructionLength(x)   (leapfrog x)
    y ← y + InstructionLength(y)   (leapfrog y)
Our goal: Compute N = approximate number of steps before any intermediate x is equal to any intermediate y.
– Treat all possible values of x - y as states of a Markov chain.
– N is the coupling time of this Markov chain.
– Kruskal count: N is about B²/16, where B is the average instruction length.

Code Interleaving
A method to overlap arbitrary code blocks
– Explicitly prevents disassembly resynchronization
– Adds tamper-resistance:
   o Hash of instruction bytes only (like traditional code checksums)
   o Hash of instruction bytes and program state (like oblivious hashing)
Basic algorithm
– Code interspersing: Create a block of interleaved instructions from two code blocks.
– Code merging: Inject hashing instructions overlapped with existing instructions.

Code Interleaving: Basic Idea
Two input code blocks:
    SEQ1: INST_1
          INST_2
    SEQ2: INST_A
          INST_B

Code Interleaving: Basic Idea
After code interspersing:
    SEQ1: INST_1
          JMP L2
    SEQ2: INST_A
          JMP LB
    L2:   INST_2
          JMP L3
    LB:   INST_B
    L3:
Code interspersing: Interleave instructions, injecting jumps as needed to maintain control flow.

Code Interleaving: Basic Idea
After code merging (one byte string with two entry points):
    Disassembly at SEQ1: INST_1 HASH_1 INST_2 HASH_2
    Disassembly at SEQ2: INST_A HASH_A INST_B
– Code merging: Replace jumps with hash instructions, maintaining control flow.
   o E.g., JMP L2; INST_A; JMP LB transforms into HASH_1.
   o HASH_1 contains INST_A and part of HASH_A.
– Suitable hash instructions must be found (and fit together like puzzle pieces).
   o Various possibilities identified on x86.
   o Custom byte-codes can also be designed to maximize the utility of overlapping.

Code Interleaving: Example
Two input code blocks (x86):
    SEQ1: C1 E0 02            shl eax, 2
    I11:  40                  inc eax
          C3                  ret
    SEQ2: 48                  dec eax
    I21:  C1 E8 03            shr eax, 3
          C3                  ret
After code interspersing:
    SEQ1: C1 E0 02            shl eax, 2
          EB 03               jmp I11
    SEQ2: 48                  dec eax
          EB 04               jmp I21
    I11:  90                  nop
          40                  inc eax
          EB 03               jmp O
    I21:  C1 E8 03            shr eax, 3
    O:    90                  nop
          C3                  ret
After code merging (OH instructions: the xor, add and sub on ecx):
  Disassembly at SEQ1:
    SEQ1: C1 E0 02            shl eax, 2
          81 F1 48 81 E9 90   xor ecx, 90E98148
    I11:  40                  inc eax
          81 C1 C1 E8 03 90   add ecx, 9003E8C1
    O:    C3                  ret
  Disassembly at SEQ2:
    SEQ2: 48                  dec eax
          81 E9 90 40 81 C1   sub ecx, C1814090
    I21:  C1 E8 03            shr eax, 3
    O:    90                  nop
          C3                  ret

Code Interleaving
Observations
– Tamper-resistance comes from two main sources:
   o Implicit: Shared instruction bytes
   o Explicit: OH instructions
– Disassembly synchronization is explicitly prevented.
– The method enables code-byte hashes even on architectures that do not allow explicit access to code bytes.
Extensions
– Iteration to build up complexity
   o Enhances security at little or no implementation cost.
   o Complex (emergent) code patterns and behaviors can arise.
– Implementation over custom byte codes designed to maximize the utility of overlapping (unlike x86)

Experimental Results
– Tool implementation using Vulcan (binary-rewriting framework)
– Reasonable impact on performance, depending on the desired security level
– Remaining work on analyzing security in practice
[Figure: Performance impact on SPECint benchmarks; 0 = no overlapping, 1 = full overlapping]

Conclusion
Contributions
– Investigation of overlapped code for software protection
   o Study of disassembly synchronization and other roadblocks
   o Design of code interleaving and outlining to address limitations
   o Integrity checking via oblivious hashing
   o Placement in context of security models, not ad hoc methods
– Tool implementations to verify practical effectiveness
   o Code interleaving and outlining for x86 binaries
   o Iteration framework to enhance security
Future work
– Security analysis in theory and practice
– Other overlapped-code methods
– Porting to custom byte-codes
