Compilers have many bugs

Presentation on theme: "Compilers have many bugs"— Presentation transcript:

1 Automatically Checking the Correctness of Program Analyses and Transformations

2 Compilers have many bugs
Searched for “incorrect” and “wrong” in the gcc-bugs mailing list. Some of the results:
[Bug middle-end/19650] New: miscompilation of correct code
[Bug c++/19731] arguments incorrectly named in static member specialization
[Bug rtl-optimization/13300] Variable incorrectly identified as a biv
[Bug rtl-optimization/16052] strength reduction produces wrong code
[Bug tree-optimization/19633] local address incorrectly thought to escape
[Bug target/19683] New: MIPS wrong-code for 64-bit multiply
[Bug c++/19605] Wrong member offset in inherited classes
[Bug java/19295] [4.0 regression] Incorrect bytecode produced for bitwise AND
Total of 545 matches… And this is only for January 2005! On a mature compiler!

3 Compiler bugs cause problems
[Diagram: source program if (…) { x := …; } else { y := …; } …; → Compiler → Exec]
They lead to buggy executables
They rule out having strong guarantees about executables

4 The focus: compiler optimizations
A key part of any optimizing compiler
[Diagram: Original program → Optimization → Optimized program]

5 The focus: compiler optimizations
A key part of any optimizing compiler
Hard to get optimizations right:
Lots of infrastructure-dependent details
There are many corner cases in each optimization
There are many optimizations and they interact in unexpected ways
It is hard to test all these corner cases and all these interactions

6 Goals
Make it easier to write compiler optimizations: a student in an undergrad compiler course should be able to write optimizations
Provide strong guarantees about the correctness of optimizations: automatically (no user intervention at all) and statically (before the opts are even run once)
Expressive enough for realistic optimizations

7 The Rhodium work
A domain-specific language for writing optimizations: Rhodium
A correctness checker for Rhodium optimizations
An execution engine for Rhodium optimizations
Implemented and checked the correctness of a variety of realistic optimizations

8 Broader implications
Many other kinds of program manipulators: code refactoring tools, static checkers
My work is about program analyses and transformations, the core of any program manipulator
Enables safe extensible program manipulators:
Allow end programmers to easily and safely extend program manipulators
Improve programmer productivity

9 Outline
Introduction
Overview of the Rhodium system
Writing Rhodium optimizations
Checking Rhodium optimizations
Evaluation

10 Rhodium system overview
Written by me: Rhodium execution engine, checker
Written by programmer: Rhodium optimizations (Rdm Opt)

12 Rhodium system overview
[Diagram: three Rhodium optimizations (Rdm Opt), each passed through the checker]

13 Rhodium system overview
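[Diagram: Rhodium optimizations (Rdm Opt) pass through the checker and are run by the Rhodium execution engine inside the compiler, which takes the source program if (…) { x := …; } else { y := …; } …; and produces Exec]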

14 The technical problem
Tension between: expressiveness and automated correctness checking
Challenge: develop techniques that will go a long way in terms of expressiveness and that allow correctness to be checked

15 Contribution: three techniques
[Diagram: Rdm Opt → Checker → Verification Task → Automatic Theorem Prover]
Verification task: show that for any original program, behavior of original program = behavior of optimized program

16 Contribution: three techniques
Rdm Opt Verification Task Verification Task Automatic Theorem Prover

18 Contribution: three techniques
Rhodium is declarative: declare intent using rules; the execution engine takes care of the rest
[Diagram: Rdm Opt → Automatic Theorem Prover]
First, Rhodium is a declarative language. There are no loops, no branches, no program counter. Programmers declare their intent using rules, and the execution engine takes care of the rest. These rules are similar to flow functions, which are the way compiler writers typically think about optimizations. As a result, the rules provide a familiar way of expressing optimizations while shielding the programmer from infrastructure-dependent details. Because each rule talks about only one statement at a time, the rules are much easier to reason about automatically, and so the verification task becomes simpler.

19 Contribution: three techniques
Rhodium is declarative: declare intent using rules; the execution engine takes care of the rest
[Diagram: Rdm Opt → Automatic Theorem Prover]

20 Contribution: three techniques
Rhodium is declarative
Factor out heuristics: legal transformations vs. profitable transformations
[Diagram: Rdm Opt split into the part that must be reasoned about (bolded) and heuristics not affecting correctness → Automatic Theorem Prover]
Next, I noticed that a large part of each optimization consists of profitability heuristics that figure out whether or not the optimization will improve performance. These heuristics do not affect correctness. For example, in inlining, a lot of computation goes into determining whether a function call should be inlined, but none of it affects whether inlining the function is correct. Rhodium forces the programmer to factor out profitability heuristics from the rest of the optimization. First, the programmer declares which transformations are legal; this is the bolded part. This part is simple even for complicated optimizations. Then, using an arbitrary language, the programmer can implement profitability heuristics that pick which of these legal transformations to actually perform. Because these heuristics do not affect correctness, the checker only needs to reason about the small bolded part, and so the verification task becomes simpler.

21 Contribution: three techniques
Rhodium is declarative
Factor out heuristics: legal transformations vs. profitable transformations
[Diagram: part that must be reasoned about vs. heuristics not affecting correctness → Automatic Theorem Prover]

22 Contribution: three techniques
Rhodium is declarative
Factor out heuristics
Split verification task: opt-dependent vs. opt-independent
[Diagram: verification task split into an opt-dependent part and an opt-independent part → Automatic Theorem Prover]

23 Contribution: three techniques
Rhodium is declarative Factor out heuristics Split verification task opt-dependent vs. opt-independent Automatic Theorem Prover

25 Contribution: three techniques
Rhodium is declarative
Factor out heuristics
Split verification task
Result: expressive language, automated correctness checking
[Diagram: Automatic Theorem Prover]

26 Outline
Introduction
Overview of the Rhodium system
Writing Rhodium optimizations
Checking Rhodium optimizations
Evaluation

27 MustPointTo analysis
a = &b
  [a points to b]
c = a
  [a and c both point to b]
d = *c
  [can be rewritten to d = b]

28 MustPointTo info in Rhodium
a = &b
  mustPointTo(a, b)
c = a
  mustPointTo(a, b), mustPointTo(c, b)
d = *c

30 MustPointTo info in Rhodium
define fact mustPointTo(X:Var, Y:Var)
with meaning ⟦ X == &Y ⟧
Fact correct on edge if: whenever program execution reaches the edge, the meaning of the fact evaluates to true in the program state
a = &b
  mustPointTo(a, b)
c = a
  mustPointTo(a, b), mustPointTo(c, b)
d = *c

31 Propagating facts
define fact mustPointTo(X:Var, Y:Var)
with meaning ⟦ X == &Y ⟧
a = &b
  mustPointTo(a, b)
c = a
  mustPointTo(a, b), mustPointTo(c, b)
d = *c

33 Propagating facts
define fact mustPointTo(X:Var, Y:Var)
with meaning ⟦ X == &Y ⟧
if currStmt == [X = &Y] then mustPointTo(X, Y)
a = &b
  mustPointTo(a, b)
c = a
  mustPointTo(a, b), mustPointTo(c, b)
d = *c

35 Propagating facts
define fact mustPointTo(X:Var, Y:Var)
with meaning ⟦ X == &Y ⟧
if currStmt == [X = &Y] then mustPointTo(X, Y)
if mustPointTo(X, Y) ∧ currStmt == [Z = X] then mustPointTo(Z, Y)
a = &b
  mustPointTo(a, b)
c = a
  mustPointTo(a, b), mustPointTo(c, b)
d = *c

37 Transformations
define fact mustPointTo(X:Var, Y:Var)
with meaning ⟦ X == &Y ⟧
if currStmt == [X = &Y] then mustPointTo(X, Y)
if mustPointTo(X, Y) ∧ currStmt == [Z = X] then mustPointTo(Z, Y)
if mustPointTo(X, Y) ∧ currStmt == [Z = *X] then transform to [Z = Y]
a = &b
  mustPointTo(a, b)
c = a
  mustPointTo(a, b), mustPointTo(c, b)
d = *c
  [transformed to d = b]
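
The three mustPointTo rules can be mimicked in ordinary code. The following is only a rough Python sketch, not Rhodium syntax and not the Rhodium execution engine: the names `Stmt`, `flow`, `transform` and `optimize` are hypothetical, and the rule that keeps facts about variables the statement does not assign is an assumption that the slides do not show.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    kind: str  # "addr": lhs = &rhs, "copy": lhs = rhs, "load": lhs = *rhs
    lhs: str
    rhs: str


def flow(stmt, facts_in):
    """Propagate facts across one statement.

    A fact is a pair (X, Y) standing for mustPointTo(X, Y).
    """
    # Assumption (not on the slides): facts about the assigned variable are
    # dropped, and all other facts are preserved.
    out = {(x, y) for (x, y) in facts_in if x != stmt.lhs}
    if stmt.kind == "addr":
        # if currStmt == [X = &Y] then mustPointTo(X, Y)
        out.add((stmt.lhs, stmt.rhs))
    elif stmt.kind == "copy":
        # if mustPointTo(X, Y) and currStmt == [Z = X] then mustPointTo(Z, Y)
        out |= {(stmt.lhs, y) for (x, y) in facts_in if x == stmt.rhs}
    return out


def transform(stmt, facts_in):
    """if mustPointTo(X, Y) and currStmt == [Z = *X] then transform to [Z = Y]."""
    if stmt.kind == "load":
        targets = sorted(y for (x, y) in facts_in if x == stmt.rhs)
        if targets:
            return Stmt("copy", stmt.lhs, targets[0])
    return stmt


def optimize(program):
    """Walk straight-line code once, rewriting loads where a fact allows it."""
    facts = set()
    result = []
    for stmt in program:
        result.append(transform(stmt, facts))
        facts = flow(stmt, facts)
    return result


program = [Stmt("addr", "a", "b"), Stmt("copy", "c", "a"), Stmt("load", "d", "c")]
optimized = optimize(program)
```

On the slide's three-statement example, the last statement `d = *c` is rewritten to `d = b`, while the first two statements are left unchanged.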

38 Profitability heuristics
Legal transformations (identified by the Rhodium rules) → Profitability heuristics → Subset of legal transformations (actually performed)

39 Profitability heuristic example 1
Inlining
Many heuristics to determine when to inline a function:
compute function sizes, estimate code-size increase, estimate performance benefit
maybe even use AI techniques to make the decision
However, these heuristics do not affect the correctness of inlining
They are just used to choose which of the correct set of transformations to perform
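
The separation between legal and profitable transformations can be pictured as a filter. This is a hedged Python sketch, not Rhodium's actual interface: `InlineCandidate`, `choose`, the call-site labels and the size threshold are all made up for illustration.

```python
from typing import Callable, List, NamedTuple


class InlineCandidate(NamedTuple):
    call_site: str    # hypothetical label for a call site
    callee_size: int  # hypothetical size estimate, e.g. an instruction count


def choose(legal: List[InlineCandidate],
           heuristic: Callable[[InlineCandidate], bool]) -> List[InlineCandidate]:
    """Return the legal transformations that the heuristic selects.

    The result is filtered from `legal`, so whatever the heuristic computes,
    it cannot introduce a transformation outside the legal set.
    """
    return [t for t in legal if heuristic(t)]


legal = [
    InlineCandidate("main:12", 8),
    InlineCandidate("main:40", 500),
    InlineCandidate("util:3", 20),
]

# An arbitrary heuristic: only inline small callees.
small_only = choose(legal, lambda c: c.callee_size <= 32)
```

Swapping in a different heuristic changes which candidates are performed, but the chosen set stays a subset of `legal`, which is why only the legality part needs to be checked.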

40 Profitability heuristic example 2
Partial redundancy elimination (PRE)
a := ...;
b := ...;
if (...) {
  x := a + b;
} else {
  ...
}
x := a + b;

41 Profitability heuristic example 2
PRE as code duplication followed by CSE
a := ...;
b := ...;
if (...) {
  x := a + b;
} else {
  ...
}
x := a + b;
[Diagram: "Code duplication" arrow from the final x := a + b into the else branch]

42 Profitability heuristic example 2
PRE as code duplication followed by CSE
a := ...;
b := ...;
if (...) {
  x := a + b;
} else {
  ...
  x := a + b;
}
x := x;
[Diagram: code duplication copies x := a + b into the else branch; CSE then rewrites the final x := a + b to x := x]

43 Profitability heuristic example 2
PRE as code duplication followed by CSE
a := ...;
b := ...;
if (...) {
  x := a + b;
} else {
  ...
  x := a + b;
}
[Diagram: code duplication, then CSE, then self-assignment removal deletes the final x := x]

44 Profitability heuristic example 2
a := ...;
b := ...;
if (...) {
  x := a + b;
} else {
  ...
}
[Diagram labels: Legal placements of x := a + b; Profitable placement]

45 Semantics of a Rhodium opt
Run propagation rules in a loop until there are no more changes (optimistic iterative analysis)
Then run transformation rules
Then run profitability heuristics
For better precision, combine propagation rules and transformation rules using our previous composition framework [POPL 02]
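
The three phases can be sketched on a small control-flow graph with a loop. This Python sketch is an illustration under assumptions, not the Rhodium execution engine: the graph encoding, the function names, the fact-preservation rule and the always-true `profitable` heuristic are all hypothetical.

```python
def flow(stmt, facts):
    """Propagation rules for one statement; a fact (X, Y) means mustPointTo(X, Y)."""
    kind, lhs, rhs = stmt
    # Assumption: facts about the assigned variable are dropped, others kept.
    out = {(x, y) for (x, y) in facts if x != lhs}
    if kind == "addr":      # lhs = &rhs
        out.add((lhs, rhs))
    elif kind == "copy":    # lhs = rhs
        out |= {(lhs, y) for (x, y) in facts if x == rhs}
    return out


def facts_in(node, out, preds, universe):
    """Facts on entry to a node: those that hold on every incoming edge."""
    if not preds[node]:
        return set()        # nothing is known at the entry node
    result = set(universe)
    for p in preds[node]:
        result &= out[p]
    return result


def analyze(stmts, preds, universe):
    """Phase 1: iterate the propagation rules until nothing changes."""
    # Optimistic start: assume every fact holds after every node.
    out = {n: set(universe) for n in stmts}
    changed = True
    while changed:
        changed = False
        for n, stmt in stmts.items():
            new_out = flow(stmt, facts_in(n, out, preds, universe))
            if new_out != out[n]:
                out[n], changed = new_out, True
    return {n: facts_in(n, out, preds, universe) for n in stmts}


variables = ["a", "b", "c", "d"]
universe = {(x, y) for x in variables for y in variables}
# n0: a = &b;  n1: c = a;  n2: d = *c;  with a back edge from n2 to n1.
stmts = {"n0": ("addr", "a", "b"), "n1": ("copy", "c", "a"), "n2": ("load", "d", "c")}
preds = {"n0": [], "n1": ["n0", "n2"], "n2": ["n1"]}

incoming = analyze(stmts, preds, universe)

# Phase 2: transformation rules identify the legal rewrites.
legal = {}
for n, (kind, lhs, rhs) in stmts.items():
    if kind == "load":
        for (x, y) in sorted(incoming[n]):
            if x == rhs:
                legal[n] = ("copy", lhs, y)


# Phase 3: a profitability heuristic (hypothetical, always says yes here)
# picks which legal rewrites are actually performed.
def profitable(node, new_stmt):
    return True


performed = {n: t for n, t in legal.items() if profitable(n, t)}
```

Here the fact `mustPointTo(c, b)` survives the loop, so the load at `n2` is a legal candidate for rewriting to `d = b`.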

46 More facts
define fact mustNotPointTo(X:Var, Y:Var)
with meaning ⟦ X ≠ &Y ⟧
define fact doesNotPointIntoHeap(X:Var)
with meaning ⟦ ∃ Y:Var . X == &Y ⟧
define fact hasConstantValue(X:Var, C:Const)
with meaning ⟦ X == C ⟧

47 More rules
if currStmt == [X = *A] ∧ mustNotPointToHeap(A)@in ∧ ∀ B:Var . … ⇒ mustNotPointTo(B,Y) then …
if currStmt == [Y = I + BE] ∧ … ∧ … ∧ ¬mayDef(X) ∧ ¬mayDefArray(A) ∧ unchanged(BE) then …

48 More in Rhodium
More powerful pointer analyses: heap summaries
Analyses across procedures: interprocedural analyses
Analyses that don’t care about the order of statements: flow-insensitive analyses

49 Outline
Introduction
Overview of the Rhodium system
Writing Rhodium optimizations
Checking Rhodium optimizations
Evaluation

50 Rhodium correctness checker
Exec Compiler Rhodium Execution engine Rdm Opt if (…) { x := …; } else { y := …; } …; Checker Rdm Opt Checker

51 Rhodium correctness checker
Rdm Opt Checker

52 Rhodium correctness checker
Rdm Opt Checker Checker Automatic theorem prover

53 Rhodium correctness checker
Rhodium optimization define fact … if … then … if … then transform … Profitability heuristics Checker Automatic theorem prover

54 Rhodium correctness checker
Rhodium optimization define fact … if … then … if … then transform … Checker Automatic theorem prover

55 Rhodium correctness checker
Rhodium optimization: define fact … if … then … if … then transform …
Opt-independent: Lemma: for any Rhodium opt, if the local VCs are true, then the opt is correct (with proof)
Opt-dependent: the checker's VCGen produces local VCs from the optimization, which go to the automatic theorem prover

56 Local verification conditions
define fact mustPointTo(X,Y) with meaning ⟦ X == &Y ⟧
Local VCs (generated and proven automatically)
if mustPointTo(X, Y) ∧ currStmt == [Z = X] then mustPointTo(Z, Y)
  Assume: All incoming facts are correct
  Show: Propagated fact is correct
if mustPointTo(X, Y) ∧ currStmt == [Z = *X] then transform to [Z = Y]
  Assume: All incoming facts are correct
  Show: Original stmt and transformed stmt have same behavior

57 Local correctness of prop. rules
define fact mustPointTo(X,Y) with meaning ⟦ X == &Y ⟧
if mustPointTo(X, Y) ∧ currStmt == [Z = X] then mustPointTo(Z, Y)
  Assume: All incoming facts are correct
  Show: Propagated fact is correct
Local VC (generated and proven automatically)
  Assume: ⟦ X == &Y ⟧(in) ∧ out = step(in, [Z = X])
  Show: ⟦ Z == &Y ⟧(out)

58 Local correctness of prop. rules
define fact mustPointTo(X,Y) with meaning ⟦ X == &Y ⟧
if mustPointTo(X, Y) ∧ currStmt == [Z = X] then mustPointTo(Z, Y)
Local VC (generated and proven automatically)
  Assume: ⟦ X == &Y ⟧(in) ∧ out = step(in, [Z = X])
  Show: ⟦ Z == &Y ⟧(out)
[Diagram: in state "in", X points to Y; after Z := X, does Z point to Y in state "out"?]
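
To get a feel for what the local VC says, it can be tested exhaustively on a tiny model. The Python sketch below is not how the Rhodium checker works (the slides say it uses an automatic theorem prover, which covers all states rather than a bounded set). The three-variable state space, the value domain and the helper names are assumptions chosen for illustration. A second, deliberately broken rule is included to show what a failed VC looks like.

```python
from itertools import product

VARS = ("a", "b", "c")
# Each variable holds either the integer 0 or the address of some variable.
VALUES = (0,) + tuple(("addr", v) for v in VARS)


def must_point_to(state, x, y):
    """Meaning of mustPointTo(X, Y): X == &Y in the given state."""
    return state[x] == ("addr", y)


def step_copy(state, z, w):
    """out = step(in, [Z = W])."""
    out = dict(state)
    out[z] = state[w]
    return out


def find_counterexample(assume, show):
    """Search every rule instantiation and every small state for a VC violation."""
    for x, y, z, w in product(VARS, repeat=4):
        for vals in product(VALUES, repeat=len(VARS)):
            state_in = dict(zip(VARS, vals))
            state_out = step_copy(state_in, z, w)
            if assume(state_in, x, y, z, w) and not show(state_out, x, y, z, w):
                return {"X": x, "Y": y, "Z": z, "W": w, "in": state_in}
    return None


# Rule from the slide: if mustPointTo(X, Y) and currStmt == [Z = X]
# then mustPointTo(Z, Y). The statement copies from X, so W must be X.
sound = find_counterexample(
    assume=lambda s, x, y, z, w: w == x and must_point_to(s, x, y),
    show=lambda s, x, y, z, w: must_point_to(s, z, y),
)

# A deliberately wrong rule (hypothetical): claim mustPointTo(X, Y) survives
# any copy [Z = W], forgetting that Z might be X itself.
broken = find_counterexample(
    assume=lambda s, x, y, z, w: must_point_to(s, x, y),
    show=lambda s, x, y, z, w: must_point_to(s, x, y),
)
```

In this small model no counterexample exists for the slide's rule, while the broken rule fails as soon as the statement overwrites X with a value that is not `&Y`.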

59 Outline
Introduction
Overview of the Rhodium system
Writing Rhodium optimizations
Checking Rhodium optimizations
Evaluation

60 Dimensions of evaluation
Ease of use Correctness guarantees Usefulness of the checker Expressiveness

61 Ease of use
Joao Dias: third-year graduate student in compilers at Harvard; less than 45 mins to write CSE and copy prop
Erika Rice, summer 2004: only knowledge of compilers was one undergrad class; started writing Rhodium optimizations in a few days
Simple interface to the compiler’s structures: pattern matching
“Flow functions” familiar to compiler 101 students

62 Correctness guarantees
Once checked, optimizations are guaranteed to be correct
Caveat: trusted computing base
execution engine
checker implementation
proofs done by hand once by me
Adding a new optimization does not increase the size of the trusted computing base

63 Usefulness of the checker
Found subtle bugs in my initial implementation of various optimizations
define fact equals(X:Var, E:Expr) with meaning ⟦ X == E ⟧
if currStmt == [X = E] then equals(X, E)
x := x + 1
  equals(x, x + 1)

64 Usefulness of the checker
Found subtle bugs in my initial implementation of various optimizations
define fact equals(X:Var, E:Expr) with meaning ⟦ X == E ⟧
Before: if currStmt == [X = E] then equals(X, E)
After: if currStmt == [X = E] ∧ “X does not appear in E” then equals(X, E)
x := x + 1
  equals(x, x + 1)

65 Usefulness of the checker
Found subtle bugs in my initial implementation of various optimizations
define fact equals(X:Var, E:Expr) with meaning ⟦ X == E ⟧
Before: if currStmt == [X = E] ∧ “X does not appear in E” then equals(X, E)
After: if currStmt == [X = E] ∧ “E does not use X” then equals(X, E)
x = x + 1
  equals(x, x + 1)
x = *y + 1
  equals(x, *y + 1)
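
Both bugs can be reproduced by running the statements on concrete states. This is a small Python sketch under assumptions: states are dictionaries, a pointer is modelled as the name of the variable it points to, and `run_assign` and `equals_holds` are hypothetical helpers rather than part of Rhodium.

```python
def run_assign(state, x, expr):
    """Execute [X = E]: evaluate E in the old state, then store the result in X."""
    out = dict(state)
    out[x] = expr(state)
    return out


def equals_holds(state, x, expr):
    """Meaning of equals(X, E): X == E, evaluated in the given state."""
    return state[x] == expr(state)


# Bug 1: x := x + 1. After the assignment, x == x + 1 is false.
x_plus_1 = lambda st: st["x"] + 1
after_incr = run_assign({"x": 1}, "x", x_plus_1)
bug1_fact_holds = equals_holds(after_incr, "x", x_plus_1)

# Bug 2: x = *y + 1, where y points to x. The variable x does not appear
# syntactically in E, but E still uses x through the pointer.
deref_y_plus_1 = lambda st: st[st["y"]] + 1
after_alias = run_assign({"x": 1, "y": "x"}, "x", deref_y_plus_1)
bug2_fact_holds = equals_holds(after_alias, "x", deref_y_plus_1)

# When y points to some other variable z, E does not use x and the fact holds.
after_no_alias = run_assign({"x": 1, "z": 5, "y": "z"}, "x", deref_y_plus_1)
ok_fact_holds = equals_holds(after_no_alias, "x", deref_y_plus_1)
```

The second case is why the syntactic condition “X does not appear in E” is not enough and the rule needs the stronger condition “E does not use X”.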

66 Rhodium expressiveness
Traditional optimizations: const prop and folding, branch folding, dead assignment elim, common sub-expression elim, partial redundancy elim, partial dead assignment elim, arithmetic invariant detection, and integer range analysis
Pointer analyses: must-point-to analysis, Andersen's may-point-to analysis with heap summaries
Loop opts: loop-induction-variable strength reduction, code hoisting, code sinking
Array opts: constant propagation through array elements, redundant array load elimination

68 Expressiveness limitations
May not be able to express your optimization in Rhodium:
opts that build complicated data structures
opts that perform complicated many-to-many transformations (e.g., loop fusion, loop unrolling)
A correct Rhodium optimization may be rejected by the correctness checker:
limitations of the theorem prover
limitations of first-order logic

69 Summary
Rhodium system:
makes it easier to write optimizations
provides correctness guarantees
is expressive enough for realistic optimizations
Rhodium system provides a foundation for safe extensible program manipulators

