Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.


1 Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015

2 IA-32 Primer
Registers: EAX, EBX, ECX, EDX, ESP, EBP, EIP, ESI, EDI
Flags: CF, OF, ZF, SF, PF, …
Example (initially EAX = 20 and ESP = 1000):
push eax: ESP becomes 996 and the stack slot at [ESP] holds 20
mov ebx, eax: EBX becomes 20
mov [esp], 60: the stack slot at [ESP] holds 60
lea esp, [esp+4]: ESP becomes 1000
add eax, ebx: EAX becomes 40

3 Superoptimization
Input: an instruction sequence I (+ timeout t)
– Instructions belong to an ISA
– No loops
Output: an instruction sequence I’
– Instructions belong to the same ISA
– I’ is equivalent to I: on all inputs, I and I’ produce the same results
– I’ is the optimal implementation of I (with a timeout t: an improved implementation of I)
– "Optimal" or "improved" can mean shorter, faster, or lower energy consumption

4 Superoptimization
Input: sub eax, ecx; neg eax; sub eax, 1
Output of the IA-32 superoptimizer: not eax; add eax, ecx
Both sequences compute EAX ← ECX – EAX – 1
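The register value on slide 4 can be checked by hand. The derivation below is a sketch that uses the two's-complement identity ¬x = −x − 1, with all arithmetic taken modulo 2^32; the subscripted names are introduced here only to label intermediate values. It covers the value of EAX only, not the flags.

```latex
\begin{align*}
\text{original:}\quad
  & \mathrm{EAX}_1 = \mathrm{EAX} - \mathrm{ECX}, \\
  & \mathrm{EAX}_2 = -\mathrm{EAX}_1 = \mathrm{ECX} - \mathrm{EAX}, \\
  & \mathrm{EAX}_3 = \mathrm{EAX}_2 - 1 = \mathrm{ECX} - \mathrm{EAX} - 1 \\[4pt]
\text{optimized:}\quad
  & \mathrm{EAX}_1 = \lnot\,\mathrm{EAX} = -\mathrm{EAX} - 1, \\
  & \mathrm{EAX}_2 = \mathrm{EAX}_1 + \mathrm{ECX} = \mathrm{ECX} - \mathrm{EAX} - 1
\end{align*}
```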

5 Superoptimization
Input: add ebp, 4; mov [ebp], eax; add ebp, 4; mov [ebp], ebx
Output of the IA-32 superoptimizer: mov [ebp+4], eax; mov [ebp+8], ebx; add ebp, 8

6 Why do we need a superoptimizer?

7 © Alvin Cheung, CSE 501

8 Peephole Optimization Done at low-level IR – Closer to assembly code Sliding window – “peephole” Peephole rules – Rewrite rules of form “LHS → RHS ” If current peephole matches LHS of some rule, replace with RHS

9 Peephole Rules
Rule category | LHS | RHS
Eliminating redundant loads and stores | mov reg, [addr] (load contents of [addr] into reg); mov [addr], reg (store contents of reg into [addr]); both instructions must be in the same basic block | mov reg, [addr]
Control-flow optimizations | goto L1 … L1: goto L2 | goto L2 … L1: goto L2
Algebraic rewrites | add oprnd, 0 | (none)
Algebraic rewrites | imul oprnd, 2 | shl oprnd, 1
Machine idioms | add oprnd, 1 | inc oprnd
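A sliding-window rewriter for some of the rules on slide 9 might look like the sketch below. Everything about the encoding is an assumption made for illustration: instructions are plain strings, rules are Python functions matched with regular expressions, and the names `single`, `rule_redundant_store`, and `peephole` are hypothetical. The control-flow rule is left out, and the basic-block side condition is not checked.

```python
import re


def rule_redundant_store(window):
    """mov reg, [addr]; mov [addr], reg  ->  mov reg, [addr]"""
    if len(window) < 2:
        return None
    load = re.fullmatch(r"mov (\w+), \[(\w+)\]", window[0])
    store = re.fullmatch(r"mov \[(\w+)\], (\w+)", window[1])
    if (load and store
            and load.group(1) == store.group(2)    # same register
            and load.group(2) == store.group(1)    # same address
            and load.group(1) != load.group(2)):   # load must not clobber the address
        return 2, [window[0]]
    return None


def single(pattern, build):
    """Build a rule that rewrites one instruction matching `pattern`."""
    def rule(window):
        m = re.fullmatch(pattern, window[0])
        return (1, build(m)) if m else None
    return rule


RULES = [
    rule_redundant_store,
    single(r"add (\w+), 0", lambda m: []),                         # algebraic rewrite
    single(r"imul (\w+), 2", lambda m: [f"shl {m.group(1)}, 1"]),  # algebraic rewrite
    single(r"add (\w+), 1", lambda m: [f"inc {m.group(1)}"]),      # machine idiom
]


def peephole(program, width=2):
    """Slide a window over the program; replace the LHS of the first matching rule."""
    out, i = [], 0
    while i < len(program):
        window = program[i:i + width]
        for rule in RULES:
            hit = rule(window)
            if hit is not None:
                consumed, replacement = hit
                out.extend(replacement)
                i += consumed
                break
        else:
            out.append(program[i])   # no rule matched: keep the instruction
            i += 1
    return out


example = ["mov eax, [esp]", "mov [esp], eax", "add ebx, 0", "imul ecx, 2", "add edx, 1"]
print(peephole(example))
```

Each rule returns how many instructions it consumed plus the replacement, so a multi-instruction LHS and an empty RHS are handled the same way.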

10 Problem with Peephole Rules Manually written – Cumbersome – Error-prone

11 Peephole Superoptimizer
The superoptimizer is applied to one peephole of the program at a time.
Peephole: mov eax, [esp]; mov [esp], eax
Replaced with: mov eax, [esp]
The later peephole mov eax, [esp]; add eax, 1; mov [esp], eax is still unchanged.

12 Peephole Superoptimizer
Peephole: mov eax, [esp]; add eax, 1; mov [esp], eax
Replaced with: add [esp], 1; mov eax, [esp]

13 Peephole Superoptimizer Superoptimization is a sloooooooooowwwww process

14 Optimization Database
Training programs → Instruction sequences → Superoptimizer → Optimization database

15 Peephole Superoptimizer
Same program, with the optimization database in place of the superoptimizer.
Peephole: mov eax, [esp]; mov [esp], eax
Replaced with: mov eax, [esp]

16 Peephole Superoptimizer
Same program, with the optimization database in place of the superoptimizer.
Peephole: mov eax, [esp]; add eax, 1; mov [esp], eax
Replaced with: add [esp], 1; mov eax, [esp]
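One way to picture the database-backed peephole pass is a table lookup over windows of the program. The sketch below assumes a database keyed by exact instruction tuples and a longest-window-first lookup; both are illustrative choices, and `OPTIMIZATION_DB` and `optimize` are hypothetical names. The two entries are the rewrites shown on the slides.

```python
# Hypothetical database: instruction sequence -> precomputed replacement.
OPTIMIZATION_DB = {
    ("mov eax, [esp]", "mov [esp], eax"):
        ("mov eax, [esp]",),
    ("mov eax, [esp]", "add eax, 1", "mov [esp], eax"):
        ("add [esp], 1", "mov eax, [esp]"),
}


def optimize(program, max_window=3):
    """Replace every window found in the database, trying the longest window first."""
    out, i = [], 0
    while i < len(program):
        for width in range(min(max_window, len(program) - i), 0, -1):
            key = tuple(program[i:i + width])
            if key in OPTIMIZATION_DB:
                out.extend(OPTIMIZATION_DB[key])
                i += width
                break
        else:
            out.append(program[i])   # nothing in the database starts here
            i += 1
    return out


program = ["mov eax, [esp]", "mov [esp], eax", "nop",
           "mov eax, [esp]", "add eax, 1", "mov [esp], eax"]
print(optimize(program))
```

The slow superoptimizer runs only once, offline, to fill the table; the compile-time pass is then a dictionary lookup per window.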

17 Outline Problem statement Motivation – Peephole optimization – Speeding up critical inner loops Equivalence of two instruction-sequences A naïve superoptimizer Massalin/Bansal-Aiken superoptimizer – Canonicalization – Test cases – Counterexamples as tests – Pruning away sub-optimal candidates Stochastic superoptimization

18 Checking equivalence of two instruction-sequences Key primitive in superoptimizers and related tools “Do two loop-free instruction sequences I and I’ produce the same outputs for all possible inputs?”

19 Checking equivalence of two instruction-sequences
1. Encode I as a logical formula ϕ
2. Encode I’ as a logical formula ψ
3. Use an SMT solver to check if ϕ ⟺ ψ is logically valid
How to encode an instruction sequence as a logical formula?
How to use an SMT solver to check if two formulas are equivalent?

20 How to encode an instruction sequence I as a logical formula ϕ?
1. Symbolically evaluate I on Id state to obtain post-state σ
2. Convert σ into a logical formula

21 Concrete Evaluation IA-32 state IA-32 interpreter – Concrete operational semantics of IA-32 instructions

22 IA-32 State
⟨RegMap, FlagMap, MemMap⟩
RegMap : Register ↦ 32-bit value
FlagMap : Flag ↦ Boolean value
MemMap : 32-bit address ↦ 8-bit value (the examples use 32-bit address ↦ 32-bit value)
The short form ⟨ [EBX ↦ 8, ESP ↦ 1000 ], [ ], [1000 ↦ 2]⟩ stands for the full state
⟨ [EAX ↦ 0, EBX ↦ 8, ECX ↦ 0, …, ESP ↦ 1000, EBP ↦ 0, …, EDI ↦ 0 ], [SF ↦ false, CF ↦ false, … OF ↦ false ], [0 ↦ 0, 4 ↦ 0, 8 ↦ 0, …, 1000 ↦ 2, 1004 ↦ 0, …, 4294967292 ↦ 0]⟩

23 Concrete Evaluation
mov eax, [esp]; add eax, 4; sub ebx, 4; mov [esp], ebx
Initial state: ⟨ [EBX ↦ 8, ESP ↦ 1000 ], [ ], [1000 ↦ 2]⟩

24 Concrete Evaluation
Before mov eax, [esp]: ⟨ [EBX ↦ 8, ESP ↦ 1000 ], [ ], [1000 ↦ 2]⟩

25 Concrete Evaluation
After mov eax, [esp]: ⟨ [EAX ↦ 2, EBX ↦ 8, ESP ↦ 1000 ], [ ], [1000 ↦ 2]⟩

26 Concrete Evaluation
After add eax, 4: ⟨ [EAX ↦ 6, EBX ↦ 8, ESP ↦ 1000 ], [PF ↦ true ], [1000 ↦ 2]⟩

27 Concrete Evaluation
After sub ebx, 4: ⟨ [EAX ↦ 6, EBX ↦ 4, ESP ↦ 1000 ], [PF ↦ false ], [1000 ↦ 2]⟩

28 Concrete Evaluation
After mov [esp], ebx: ⟨ [EAX ↦ 6, EBX ↦ 4, ESP ↦ 1000 ], [PF ↦ false ], [1000 ↦ 4]⟩
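The concrete evaluation above can be reproduced with a very small interpreter. The sketch below is only an illustration: it covers just the four instruction forms used in the example, encodes instructions as hypothetical `(op, dst, src)` tuples, treats memory as a map from 32-bit address to 32-bit value as the example states do, and leaves flags out entirely to stay short.

```python
MASK = 0xFFFFFFFF  # registers hold 32-bit values


def step(state, instr):
    """Return the state after executing one instruction (flags are not modeled)."""
    regs, mem = dict(state["regs"]), dict(state["mem"])
    op, dst, src = instr
    if op == "load":        # mov dst, [src]
        regs[dst] = mem.get(regs[src], 0)
    elif op == "store":     # mov [dst], src
        mem[regs[dst]] = regs[src]
    elif op == "add":       # add dst, imm
        regs[dst] = (regs[dst] + src) & MASK
    elif op == "sub":       # sub dst, imm
        regs[dst] = (regs[dst] - src) & MASK
    else:
        raise ValueError(f"unsupported instruction: {op}")
    return {"regs": regs, "mem": mem}


PROGRAM = [
    ("load", "EAX", "ESP"),   # mov eax, [esp]
    ("add", "EAX", 4),        # add eax, 4
    ("sub", "EBX", 4),        # sub ebx, 4
    ("store", "ESP", "EBX"),  # mov [esp], ebx
]

state = {"regs": {"EBX": 8, "ESP": 1000}, "mem": {1000: 2}}
trace = [state]
for instr in PROGRAM:
    state = step(state, instr)
    trace.append(state)

for s in trace:
    print(s)
```

Each entry of `trace` corresponds to one of the states shown on the slides, minus the flag map.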

29 Symbolic Evaluation Symbolic IA-32 state Symbolic IA-32 interpreter – Symbolic transformers for instructions – Obtained from concrete operational semantics of IA-32 instructions

30 Symbolic IA-32 State
⟨RegMap, FlagMap, MemMap⟩
RegMap : Symbolic register-constant ↦ term
FlagMap : Symbolic flag-constant ↦ formula
MemMap : Function symbol ↦ function-update expression
Terms and formulas are in QFBV + Arrays (quantifier-free bit-vectors with arrays)
Id state: ⟨ [EAX’ ↦ EAX, EBX’ ↦ EBX, …, ESP’ ↦ ESP, EBP’ ↦ EBP, …, EDI’ ↦ EDI ], [SF’ ↦ SF, CF’ ↦ CF, … ZF’ ↦ ZF ], [Mem’ ↦ Mem]⟩

31 Symbolic Evaluation
mov eax, [esp]; add eax, 4; sub ebx, 4; mov [esp], ebx
Id state: ⟨ [EAX’ ↦ EAX, EBX’ ↦ EBX, …, ESP’ ↦ ESP, … ], [SF’ ↦ SF, …, ZF’ ↦ ZF ], [Mem’ ↦ Mem]⟩

32 Symbolic Evaluation
Before mov eax, [esp]: ⟨ [EAX’ ↦ EAX, EBX’ ↦ EBX, …, ESP’ ↦ ESP, … ], [SF’ ↦ SF, …, ZF’ ↦ ZF ], [Mem’ ↦ Mem]⟩

33 Symbolic Evaluation
After mov eax, [esp]: ⟨ [EAX’ ↦ Mem(ESP), EBX’ ↦ EBX, …, ESP’ ↦ ESP, … ], [SF’ ↦ SF, …, ZF’ ↦ ZF ], [Mem’ ↦ Mem]⟩

34 Symbolic Evaluation
After add eax, 4: ⟨ [EAX’ ↦ Mem(ESP) + 4, EBX’ ↦ EBX, …, ESP’ ↦ ESP, … ], [SF’ ↦ ((Mem(ESP) + 4) < 0), …, ZF’ ↦ ((Mem(ESP) + 4) = 0) ], [Mem’ ↦ Mem]⟩

35 Symbolic Evaluation
After sub ebx, 4: ⟨ [EAX’ ↦ Mem(ESP) + 4, EBX’ ↦ EBX – 4, …, ESP’ ↦ ESP, … ], [SF’ ↦ ((EBX – 4) < 0), …, ZF’ ↦ ((EBX – 4) = 0) ], [Mem’ ↦ Mem]⟩

36 Symbolic Evaluation
After mov [esp], ebx: ⟨ [EAX’ ↦ Mem(ESP) + 4, EBX’ ↦ EBX – 4, …, ESP’ ↦ ESP, … ], [SF’ ↦ ((EBX – 4) < 0), …, ZF’ ↦ ((EBX – 4) = 0) ], [Mem’ ↦ Mem[ESP ↦ EBX – 4]]⟩

37 Convert a symbolic state into a logical formula
Symbolic state: ⟨ [EAX’ ↦ Mem(ESP) + 4, EBX’ ↦ EBX – 4, …, ESP’ ↦ ESP, … ], [SF’ ↦ ((EBX – 4) < 0), …, ZF’ ↦ ((EBX – 4) = 0) ], [Mem’ ↦ Mem[ESP ↦ EBX – 4]]⟩
Formula: EAX’ = Mem(ESP) + 4 ∧ EBX’ = EBX – 4 ∧ … ∧ ESP’ = ESP ∧ … ∧ SF’ = ((EBX – 4) < 0) ∧ … ∧ ZF’ = ((EBX – 4) = 0) ∧ Mem’ = Mem[ESP ↦ EBX – 4]
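The two steps (symbolic evaluation from the Id state, then conversion to a conjunction) can be sketched with terms represented as plain strings. This is an illustration under several assumptions: only three registers and the flags SF and ZF are tracked, instructions use a hypothetical `(op, dst, src)` tuple encoding, ASCII `|->` and `-` stand in for ↦ and –, and no real term library or solver is involved.

```python
def identity_state(registers=("EAX", "EBX", "ESP"), flags=("SF", "ZF")):
    """Id state: every post-state symbol maps to its pre-state symbol."""
    return {"regs": {r: r for r in registers},
            "flags": {f: f for f in flags},
            "mem": "Mem"}


def sym_step(state, instr):
    """Apply the symbolic transformer of one instruction; terms are strings."""
    regs, flags, mem = dict(state["regs"]), dict(state["flags"]), state["mem"]
    op, dst, src = instr
    if op == "load":                      # mov dst, [src]
        regs[dst] = f"{mem}({regs[src]})"
    elif op == "store":                   # mov [dst], src
        mem = f"{mem}[{regs[dst]} |-> {regs[src]}]"
    elif op in ("add", "sub"):            # add/sub dst, imm (updates SF and ZF)
        sign = "+" if op == "add" else "-"
        result = f"{regs[dst]} {sign} {src}"
        regs[dst] = result
        flags["SF"] = f"(({result}) < 0)"
        flags["ZF"] = f"(({result}) = 0)"
    else:
        raise ValueError(f"unsupported instruction: {op}")
    return {"regs": regs, "flags": flags, "mem": mem}


def to_formula(state):
    """One conjunct per register, flag, and memory: primed symbol = term."""
    parts = [f"{r}' = {t}" for r, t in state["regs"].items()]
    parts += [f"{f}' = {t}" for f, t in state["flags"].items()]
    parts.append(f"Mem' = {state['mem']}")
    return parts


PROGRAM = [("load", "EAX", "ESP"), ("add", "EAX", 4),
           ("sub", "EBX", 4), ("store", "ESP", "EBX")]

state = identity_state()
for instr in PROGRAM:
    state = sym_step(state, instr)
formula = to_formula(state)
print(" & ".join(formula))
```

The printed conjunction mirrors the formula on slide 37, restricted to the tracked registers and flags.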

38 How to encode an instruction sequence I as a logical formula ϕ?
1. Symbolically evaluate I on Id state to obtain post-state σ
2. Convert σ into a logical formula
Example: push eax
Full formula: EAX’ = EAX ∧ EBX’ = EBX ∧ … ∧ ESP’ = ESP – 4 ∧ … ∧ SF’ = SF ∧ … ∧ ZF’ = ZF ∧ Mem’ = Mem[ESP – 4 ↦ EAX]
Abbreviated (identity conjuncts omitted): ESP’ = ESP – 4 ∧ Mem’ = Mem[ESP – 4 ↦ EAX]

39 Exercise
Convert the following instruction sequence into a QFBV formula:
lea esp, [esp - 4]
mov [esp], 1
push ebp
mov ebp, esp

40 Checking Equivalence of Two Instruction-Sequences
1. Encode I as a logical formula ϕ
2. Encode I’ as a logical formula ψ
3. Use an SMT solver to check if ϕ ⟺ ψ is logically valid
How to encode an instruction sequence as a logical formula?
How to use an SMT solver to check if two formulas are equivalent?

41 How to use SMT solver to check equivalence?
Conventionally used for “satisfiability”
– Symbolic execution for test-case generation
EAX > 0 ∧ EBX = EAX + 4 ∧ EBX > 0 → SMT solver → SAT [EAX ↦ 1, EBX ↦ 5 ]
EAX > 0 ∧ EBX = EAX + 4 ∧ EBX ≤ 0 → SMT solver → UNSAT

42 How to use SMT solver to check equivalence?
To check that ϕ ⟺ ψ is logically valid, ask the solver whether ¬(ϕ ⟺ ψ) is satisfiable.
SMT solver answers UNSAT: the two formulas are equivalent
SMT solver answers SAT, e.g. [EAX ↦ 1, EBX ↦ 1 ]: the model is a counterexample

43 Checking equivalence of two instruction sequences
sub eax, ecx; neg eax; sub eax, 1 is encoded as EAX’ = -1 * (EAX + (-1 * ECX)) - 1 ∧ SF’ = (-1 * (EAX + (-1 * ECX)) - 1) < 0
not eax; add eax, ecx is encoded as EAX’ = -1 * EAX + ECX - 1 ∧ SF’ = (-1 * EAX + ECX - 1) < 0
SMT solver: UNSAT

44 Checking equivalence of two instruction sequences
mov eax, [esp]; add eax, ebx is encoded as EAX’ = Mem(ESP) + EBX ∧ SF’ = (Mem(ESP) + EBX) < 0
mov eax, [esp]; sub eax, ebx is encoded as EAX’ = Mem(ESP) – EBX ∧ SF’ = (Mem(ESP) – EBX) < 0
SMT solver: SAT [EBX ↦ 1, ESP ↦ 1000, Mem[1000 ↦ 1 ]]
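The UNSAT and SAT outcomes for the two pairs of sequences above can be imitated without a solver. The sketch below swaps the SMT query for exhaustive search over registers shrunk to 8 bits, which is an assumption made purely to keep the search cheap; a real superoptimizer would hand the 32-bit formulas to an SMT solver. Finding no differing input plays the role of UNSAT, and a differing input plays the role of the SAT model. All function names are hypothetical.

```python
from itertools import product

BITS = 8                      # assumption: 8-bit registers instead of 32-bit
MASK = (1 << BITS) - 1


def sf(value):
    """Sign flag: the most significant bit of the result."""
    return bool(value >> (BITS - 1))


def seq_sub_neg_sub(eax, ecx):    # sub eax, ecx ; neg eax ; sub eax, 1
    eax = (eax - ecx) & MASK
    eax = (-eax) & MASK
    eax = (eax - 1) & MASK
    return eax, sf(eax)


def seq_not_add(eax, ecx):        # not eax ; add eax, ecx
    eax = ~eax & MASK
    eax = (eax + ecx) & MASK
    return eax, sf(eax)


def seq_load_add(mem_at_esp, ebx):   # mov eax, [esp] ; add eax, ebx
    eax = (mem_at_esp + ebx) & MASK
    return eax, sf(eax)


def seq_load_sub(mem_at_esp, ebx):   # mov eax, [esp] ; sub eax, ebx
    eax = (mem_at_esp - ebx) & MASK
    return eax, sf(eax)


def find_counterexample(f, g):
    """Return an input on which f and g differ, or None if there is none."""
    for inputs in product(range(MASK + 1), repeat=2):
        if f(*inputs) != g(*inputs):
            return inputs
    return None


print(find_counterexample(seq_sub_neg_sub, seq_not_add))   # None: equivalent
print(find_counterexample(seq_load_add, seq_load_sub))     # a differing input
```

Only EAX’ and SF’ are compared here, matching the conjuncts shown on the slides.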

45 Outline Problem statement Motivation – Peephole optimization Equivalence of two instruction-sequences A naïve superoptimizer Massalin/Bansal-Aiken superoptimizer – Canonicalization – Test cases – Counterexamples as tests – Pruning away sub-optimal candidates Stochastic superoptimization

46 A Naïve Superoptimizer
Diagram: I → Instruction-sequence enumerator → Equivalence check → Optimality check → I’ (edge labels: No, Timeout expired)
Problem: thousands of unique IA-32 instruction schemas, combined with the exponential cost of enumeration
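The enumerate-and-check loop of the naïve superoptimizer can be sketched on a toy scale. Everything here is an assumption for illustration: a five-instruction ISA instead of thousands of schemas, 4-bit registers, exhaustive testing in place of the SMT equivalence check, only the final EAX value compared (no flags), and a length bound `max_len` standing in for the timeout. Enumerating shortest sequences first means the first equivalent candidate is also a shortest one.

```python
from itertools import product

BITS = 4                      # assumption: tiny registers keep the check cheap
MASK = (1 << BITS) - 1

# Toy ISA: each instruction maps (eax, ecx) to the new value of eax.
ISA = {
    "not eax":      lambda eax, ecx: ~eax & MASK,
    "neg eax":      lambda eax, ecx: -eax & MASK,
    "add eax, ecx": lambda eax, ecx: (eax + ecx) & MASK,
    "sub eax, ecx": lambda eax, ecx: (eax - ecx) & MASK,
    "sub eax, 1":   lambda eax, ecx: (eax - 1) & MASK,
}


def run(seq, eax, ecx):
    """Execute a sequence of instruction names and return the final eax."""
    for name in seq:
        eax = ISA[name](eax, ecx)
    return eax


def equivalent(a, b):
    """Stand-in for the equivalence check: compare on every input."""
    return all(run(a, x, y) == run(b, x, y)
               for x, y in product(range(MASK + 1), repeat=2))


def superoptimize(target, max_len):
    """Return the first (hence shortest) sequence equivalent to target."""
    for length in range(1, max_len + 1):
        for candidate in product(ISA, repeat=length):
            if equivalent(candidate, target):
                return candidate
    return None


target = ("sub eax, ecx", "neg eax", "sub eax, 1")
print(superoptimize(target, max_len=3))
```

With n instructions there are n^k candidates of length k, which is the exponential cost the slide points at.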

47 Outline Problem statement Motivation – Peephole optimization Equivalence of two instruction-sequences A naïve superoptimizer Massalin/Bansal-Aiken superoptimizer – Canonicalization – Test cases – Counterexamples as tests – Pruning away sub-optimal candidates Stochastic superoptimization

48 Canonicalization
mov ebp, esp; mov esp, ebp → IA-32 superoptimizer → mov ebp, esp
mov ebp, eax; mov eax, ebp → IA-32 superoptimizer → mov ebp, eax
mov esi, esp; mov esp, esi → IA-32 superoptimizer → mov esi, esp

49 Canonicalization
The three rewrites of slide 48 are instances of one canonical rewrite:
mov reg1, reg2; mov reg2, reg1 → IA-32 superoptimizer → mov reg1, reg2

50 Canonicalization
add ebp, 20; add ebp, 30 → IA-32 superoptimizer → add ebp, 50
Canonical form: add reg1, c1; add reg1, c2 → IA-32 superoptimizer → add reg1, c1+c2

51 Canonicalization
Canonicalizer: add ebp, 20; add ebp, 30 becomes add reg1, c1; add reg1, c2, with the mapping reg1 ↦ ebp, c1 ↦ 20, c2 ↦ 30
IA-32 superoptimizer: add reg1, c1; add reg1, c2 becomes add reg1, c1+c2
Uncanonicalizer: add reg1, c1+c2 becomes add ebp, 50
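A canonicalizer and uncanonicalizer pair along the lines of slide 51 could be sketched as follows. The string-based instruction format, the register list, the renaming order (first use), and the constant folding in the uncanonicalizer are all assumptions of this sketch; `canonicalize` and `uncanonicalize` are hypothetical names.

```python
import re

REGISTERS = {"eax", "ebx", "ecx", "edx", "esp", "ebp", "esi", "edi"}
TOKEN = r"\b(?:" + "|".join(sorted(REGISTERS)) + r"|\d+)\b"


def canonicalize(seq):
    """Rename registers to reg1, reg2, ... and constants to c1, c2, ...
    in order of first use. Returns the canonical sequence and the
    mapping from canonical names back to concrete ones."""
    mapping = {}
    counts = {"reg": 0, "c": 0}

    def rename(match):
        token = match.group(0)
        if token not in mapping:
            kind = "reg" if token in REGISTERS else "c"
            counts[kind] += 1
            mapping[token] = f"{kind}{counts[kind]}"
        return mapping[token]

    canonical = [re.sub(TOKEN, rename, instr) for instr in seq]
    inverse = {canon: concrete for concrete, canon in mapping.items()}
    return canonical, inverse


def uncanonicalize(seq, inverse):
    """Substitute concrete names back and fold sums of constants."""
    out = []
    for instr in seq:
        concrete = re.sub(r"\b(?:reg|c)\d+\b",
                          lambda m: inverse[m.group(0)], instr)
        concrete = re.sub(r"\b\d+(?:\+\d+)+\b",
                          lambda m: str(sum(map(int, m.group(0).split("+")))),
                          concrete)
        out.append(concrete)
    return out


canonical, inverse = canonicalize(["add ebp, 20", "add ebp, 30"])
print(canonical, inverse)
print(uncanonicalize(["add reg1, c1+c2"], inverse))
```

Because many concrete sequences share one canonical form, the superoptimizer only has to solve each canonical form once.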

52 A Naïve Superoptimizer
Diagram: I → Instruction-sequence enumerator → Equivalence check → Optimality check → I’ (edge labels: No, Timeout expired)

53 A Naïve Superoptimizer + Canonicalization
Diagram: I → Canonicalizer → Instruction-sequence enumerator → Equivalence check → Optimality check → I’’ → Uncanonicalizer → I’ (edge labels: No, Timeout expired)

54 Outline Problem statement Motivation – Peephole optimization Equivalence of two instruction-sequences A naïve superoptimizer Massalin/Bansal-Aiken superoptimizer – Canonicalization – Test cases – Counterexamples as tests – Pruning away sub-optimal candidates Stochastic superoptimization

55 A Naïve Superoptimizer + Canonicalization
Diagram: I → Canonicalizer → Instruction-sequence enumerator → Equivalence check → Optimality check → I’’ → Uncanonicalizer → I’ (edge labels: No, Timeout expired)

56 Test Cases
Sequence A: mov reg1, [reg2]; add reg1, reg3
Sequence B: mov reg1, [reg2]; sub reg1, reg3
Test input [REG3 ↦ 0, REG2 ↦ 1000, Mem[1000 ↦ 1 ]]
– A produces [REG1 ↦ 1, REG3 ↦ 0, REG2 ↦ 1000, Mem[1000 ↦ 1 ]]
– B produces [REG1 ↦ 1, REG3 ↦ 0, REG2 ↦ 1000, Mem[1000 ↦ 1 ]]
Test input [REG3 ↦ 1, REG2 ↦ 1000, Mem[1000 ↦ 1 ]]
– A produces [REG1 ↦ 2, REG3 ↦ 1, REG2 ↦ 1000, Mem[1000 ↦ 1 ]]
– B produces [REG1 ↦ 0, REG3 ↦ 1, REG2 ↦ 1000, Mem[1000 ↦ 1 ]]

57 Test Cases
Diagram: I → Canonicalizer → Instruction-sequence enumerator → Test cases → Equivalence check → Optimality check → I’’ → Uncanonicalizer → I’ (edge labels: Fail, No, Timeout expired)

58 Checking equivalence of two instruction sequences
mov reg1, [reg2]; add reg1, reg3 is encoded as REG1’ = Mem(REG2) + REG3 ∧ SF’ = (Mem(REG2) + REG3) < 0
mov reg1, [reg2]; sub reg1, reg3 is encoded as REG1’ = Mem(REG2) – REG3 ∧ SF’ = (Mem(REG2) – REG3) < 0
SMT solver: SAT [REG3 ↦ 1, REG2 ↦ 1000, Mem[1000 ↦ 1 ]] (counterexample)

59 Test Cases
Diagram: I → Canonicalizer → Instruction-sequence enumerator → Test cases → Equivalence check → Optimality check → I’’ → Uncanonicalizer → I’ (edge labels: Fail, Counter-example, No, Timeout expired)
Counterexample-guided inductive synthesis (CEGIS)
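The interplay of test cases and counterexamples can be sketched as a loop: a candidate is first run on the stored tests, only survivors reach the expensive equivalence check, and every counterexample returned by that check becomes a new test. The sketch reuses a toy setting that is entirely assumed: a five-instruction ISA, 4-bit registers, exhaustive search in place of the SMT solver, and only EAX compared. The name `cegis` and the returned bookkeeping are hypothetical.

```python
from itertools import product

BITS = 4
MASK = (1 << BITS) - 1

# Toy ISA: each instruction maps (eax, ecx) to the new value of eax.
ISA = {
    "not eax":      lambda eax, ecx: ~eax & MASK,
    "neg eax":      lambda eax, ecx: -eax & MASK,
    "add eax, ecx": lambda eax, ecx: (eax + ecx) & MASK,
    "sub eax, ecx": lambda eax, ecx: (eax - ecx) & MASK,
    "sub eax, 1":   lambda eax, ecx: (eax - 1) & MASK,
}


def run(seq, eax, ecx):
    for name in seq:
        eax = ISA[name](eax, ecx)
    return eax


def find_counterexample(a, b):
    """Stand-in for the solver: an input where a and b differ, or None."""
    for x, y in product(range(MASK + 1), repeat=2):
        if run(a, x, y) != run(b, x, y):
            return (x, y)
    return None


def cegis(target, max_len, tests):
    """Enumerate candidates; filter by tests; learn from counterexamples."""
    tests = list(tests)
    full_checks = 0                      # how often the expensive check ran
    for length in range(1, max_len + 1):
        for cand in product(ISA, repeat=length):
            if any(run(cand, x, y) != run(target, x, y) for x, y in tests):
                continue                 # rejected cheaply by a test case
            full_checks += 1
            cex = find_counterexample(cand, target)
            if cex is None:
                return cand, tests, full_checks
            tests.append(cex)            # counterexample becomes a test
    return None, tests, full_checks


target = ("sub eax, ecx", "neg eax", "sub eax, 1")
best, final_tests, full_checks = cegis(target, max_len=3, tests=[(0, 0)])
print(best, final_tests, full_checks)
```

In this run the stored tests reject almost every candidate, so the full check is needed only for the few that survive.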

60 Outline Problem statement Motivation – Peephole optimization Equivalence of two instruction-sequences A naïve superoptimizer Massalin/Bansal-Aiken superoptimizer – Canonicalization – Test cases – Counterexamples as tests – Pruning away sub-optimal candidates Stochastic superoptimization

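61 Pruning sub-optimal candidates
One-instruction sequences: …, add ebp, c1+c2, …
Two-instruction sequences: …, add ebp, c1; add ebp, c2, …
Three-instruction sequences: …
The two-instruction sequence add ebp, c1; add ebp, c2 already has a shorter equivalent, so longer candidates that contain it need not be enumerated.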

62 Superoptimizer
Diagram: I → Canonicalizer → Instruction-sequence enumerator + pruning → Test cases → Equivalence check → Optimality check → I’ (edge labels: Fail, Counter-example, No, Timeout expired)

63 Results – Search-space pruning
Length | Original search space | After canonicalization | After pruning | Reduction factor
1 | 5453 | 997 | 644 | 8.5
2 | 29 million | 2.49 million | 1.2 million | 24.7
3 | 162.1 billion | 8.6 billion | 3.11 billion | 52.1
Chart legend: Original search space, After canonicalization, After pruning, After test-cases

64 But still …
Superoptimization is still a sloooooooooowwwww process.
After canonicalization, pruning, and testing, checking all candidates of length up to 3 instructions takes several hours.

65 Outline Problem statement Motivation – Peephole optimization Equivalence of two instruction-sequences A naïve superoptimizer Massalin/Bansal-Aiken superoptimizer – Canonicalization – Test cases – Counterexamples as tests – Pruning away sub-optimal candidates Stochastic superoptimization

66 © Eric Schkufza, ASPLOS 2013

67

68

69 Montgomery multiplication kernel in OpenSSL library (116 instructions): STOKE’s result is 16 instructions shorter and 1.6 times faster than gcc -O3’s result

70 Outline Problem statement Motivation – Peephole optimization Equivalence of two instruction-sequences A naïve superoptimizer Massalin/Bansal-Aiken superoptimizer – Canonicalization – Test cases – Counterexamples as tests – Pruning away sub-optimal candidates Stochastic superoptimization

