Instruction Selection, Part I: Selection via Peephole Optimization
COMP 412, Fall 2010
Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.

1 The Problem
- Writing a compiler is a lot of work
- Would like to reuse components whenever possible
- Would like to automate construction of components
  - Front end construction is largely automated
  - The middle is largely hand crafted
  - (Parts of) the back end can be automated
[Diagram: Front End, Optimizer, and Back End over shared Infrastructure]
Today's lecture: Automating Instruction Selection

2 Definitions
Instruction selection
- Mapping IR into assembly code
- Assumes a fixed storage mapping & code shape
- Combines operations, uses address modes
Instruction scheduling
- Reordering operations to hide latencies
- Assumes a fixed program (set of operations)
- Changes demand for registers
Register allocation
- Deciding which values will reside in registers
- Changes the storage mapping, may add false sharing
- Concerns about placement of data & memory operations

3 The Problem
- Modern computers (still) have many ways to do anything
- Consider a register-to-register copy in ILOC
  - The obvious operation is i2i ri => rj
  - Many others exist
  - A human would ignore all of these
  - An algorithm must look at all of them & find a low-cost encoding
    - Taking context into account (busy functional unit?)
- And ILOC is an overly simplified case

4 The Goal
- Want to automate generation of instruction selectors
- The machine description should also help with scheduling & allocation
[Diagram: a machine description feeds a back-end generator, which emits tables for a pattern-matching engine in the back end; front end, middle end, and back end share common infrastructure; description-based retargeting]

5 The Big Picture
Need pattern matching techniques
- Must produce good code (some metric for good)
- Must run quickly
Our treewalk code generator (Lec. 22) ran quickly. How good was the code?

Tree: x (IDENT, IDENT), operands at offsets 4 and 8 from the ARP (r0)

Treewalk Code:
    loadI  4      => r5
    loadAO r0, r5 => r6
    loadI  8      => r7
    loadAO r0, r7 => r8
    mult   r6, r8 => r9

Desired Code:
    loadAI r0, 4  => r5
    loadAI r0, 8  => r6
    mult   r5, r6 => r7

6 The Big Picture (cont.)
Same tree, treewalk code, and desired code as the previous slide.
Note: This inefficiency is easy to fix. See the digression on page 317 of EaC1e.
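To make the comparison concrete, here is a minimal sketch of a naive treewalk code generator of the kind discussed above, written in Python. It is illustrative only: the tuple-shaped AST, the new_reg counter, and the string mnemonics are assumptions, not the Comp 412 implementation.

    # Sketch of a naive treewalk code generator (illustrative; not the course code).
    # Assumptions: the AST is nested tuples, r0 holds the ARP, and IDENT nodes
    # carry their offset in the activation record.
    next_reg = 4
    def new_reg():
        global next_reg
        next_reg += 1
        return f"r{next_reg}"

    def gen(node, code):
        kind = node[0]
        if kind == "IDENT":                 # variable at <ARP, offset>
            r_off = new_reg()
            code.append(f"loadI  {node[1]} => {r_off}")     # materialize the offset
            r_val = new_reg()
            code.append(f"loadAO r0, {r_off} => {r_val}")   # then an address+offset load
            return r_val
        if kind == "NUMBER":
            r = new_reg()
            code.append(f"loadI  {node[1]} => {r}")
            return r
        if kind == "x":                     # the multiply node from the tree above
            r1, r2 = gen(node[1], code), gen(node[2], code)
            r = new_reg()
            code.append(f"mult   {r1}, {r2} => {r}")
            return r

    code = []
    gen(("x", ("IDENT", 4), ("IDENT", 8)), code)
    print("\n".join(code))
    # Emits the loadI/loadAO pairs shown above; a better selector folds each pair
    # into a single loadAI, which requires looking at more than one node at a time.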

7 The Big Picture
Need pattern matching techniques
- Must produce good code (some metric for good)
- Must run quickly
Our treewalk code generator (Lec. 22) ran quickly. How good was the code?

Tree: x (IDENT at offset 4) (NUMBER 2)

Treewalk Code:
    loadI  4      => r5
    loadAO r0, r5 => r6
    loadI  2      => r7
    mult   r6, r7 => r8

Desired Code:
    loadAI r0, 4  => r5
    multI  r5, 2  => r7

8 The Big Picture (cont.)
Same tree and code as the previous slide.
Note: Producing the desired code must use information from both the IDENT and NUMBER nodes. This is not a local problem.

9 The Big Picture (cont.)
Same tree and treewalk code as slide 7. Another possibility that might take less time & energy, based on an algebraic identity:

Desired Code:
    loadAI r0, 4  => r5
    add    r5, r5 => r7

10 The Big Picture
Need pattern matching techniques
- Must produce good code (some metric for good)
- Must run quickly
Our treewalk code generator (Lec. 22) ran quickly. How good was the code?

Tree: x (IDENT at <@a, 4>) (IDENT at <@b, 4>), two variables at the same offset from different labels

Treewalk Code:
    loadI  @a      => r5
    loadI  4       => r6
    loadAO r5, r6  => r7
    loadI  @b      => r8
    loadI  4       => r9
    loadAO r8, r9  => r10
    mult   r7, r10 => r11

Desired Code:
    loadI  4       => r5
    loadAI r5, @a  => r6
    loadAI r5, @b  => r7
    mult   r6, r7  => r8

11 The Big Picture (cont.)
Need pattern matching techniques
- Must produce good code (some metric for good)
- Must run quickly
Our treewalk code generator met the second criterion (Lec. 22). How did it do on the first?
Same tree and code as the previous slide. The common offset is another nonlocal problem.

12 How do we perform this kind of matching?
Tree-oriented IR suggests pattern matching on trees
- The process takes tree patterns as input and produces a matcher as output
- Each pattern maps to a target-machine instruction sequence
- Use dynamic programming or bottom-up rewrite systems
Linear IR suggests using some sort of string matching
- The process takes strings as input and produces a matcher as output
- Each string maps to a target-machine instruction sequence
- Use text matching (Aho-Corasick) or peephole matching
In practice, both work well; the matchers are quite different. (A small sketch of tree-pattern matching follows.)
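For the tree-oriented approach, here is a minimal sketch of structural pattern matching on expression trees. It is an assumption-laden illustration: trees and patterns are nested tuples, a pattern leaf beginning with '$' is a wildcard, and the dynamic-programming / BURS machinery that chooses among overlapping matches is omitted.

    # Match one tree pattern against a subject tree (illustrative sketch only).
    def match(pattern, tree, bindings):
        if isinstance(pattern, str) and pattern.startswith("$"):
            bindings[pattern] = tree           # wildcard: bind the whole subtree
            return True
        if isinstance(pattern, tuple) and isinstance(tree, tuple):
            return (len(pattern) == len(tree) and
                    all(match(p, t, bindings) for p, t in zip(pattern, tree)))
        return pattern == tree                 # literal leaf must match exactly

    # A pattern for an ARP-relative load, which a selector could map to loadAI.
    pattern = ("load", ("+", "ARP", ("const", "$off")))
    subject = ("load", ("+", "ARP", ("const", 4)))
    env = {}
    if match(pattern, subject, env):
        print(f"loadAI r0, {env['$off']} => r5")   # one instruction per matched pattern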

13 Peephole Matching
Basic idea
- The compiler can discover local improvements locally
  - Look at a small set of adjacent operations
  - Move a "peephole" over the code & search for improvement
- The classic example is a store followed by a load

Original code:
    storeAI r1    => r0, 8
    loadAI  r0, 8 => r15

Improved code:
    storeAI r1    => r0, 8
    i2i     r1    => r15
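A minimal sketch of that store-followed-by-load rewrite, with operations encoded as simple Python tuples. The encoding and the function name are assumptions made for illustration; a real simplifier would also check that the stored value and the address registers are not redefined in between.

    # Rewrite a load from the address just stored into a register copy.
    # Ops are tuples: ("storeAI", src, base, offset) and ("loadAI", base, offset, dst).
    def peephole_store_load(ops):
        out = []
        for op in ops:
            prev = out[-1] if out else None
            if (prev and op[0] == "loadAI" and prev[0] == "storeAI"
                    and op[1:3] == prev[2:4]):        # same base register & offset
                out.append(("i2i", prev[1], op[3]))   # reuse the value just stored
            else:
                out.append(op)
        return out

    before = [("storeAI", "r1", "r0", 8), ("loadAI", "r0", 8, "r15")]
    print(peephole_store_load(before))
    # [('storeAI', 'r1', 'r0', 8), ('i2i', 'r1', 'r15')]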

14 Peephole Matching
Basic idea
- The compiler can discover local improvements locally
  - Look at a small set of adjacent operations
  - Move a "peephole" over the code & search for improvement
- The classic example is a store followed by a load
- Simple algebraic identities

Original code:
    addI  r2, 0  => r7
    mult  r4, r7 => r10
Improved code:
    mult  r4, r2 => r10

Original code:
    multI r5, 2  => r7
Improved code:
    add   r5, r5 => r7

See the table on p. 401 of EaC (§8.3).

15 Peephole Matching
Basic idea
- The compiler can discover local improvements locally
  - Look at a small set of adjacent operations
  - Move a "peephole" over the code & search for improvement
- The classic example is a store followed by a load
- Simple algebraic identities
- A jump to a jump

Original code:
    jumpI => L10
    L10: jumpI => L11
Improved code:
    jumpI => L11
    L10: jumpI => L11

Both operations must be within the window.

16 Peephole Matching
Implementing it
- Early systems used a limited set of hand-coded patterns
- A fixed window size ensured quick processing, reducing the cost from O(n^2) to O(n)
Modern peephole instruction selectors (Davidson)
- Break the problem into three tasks
- Apply symbolic interpretation & simplification systematically

[Pipeline: IR -> Expander (IR to LLIR) -> Simplifier (LLIR to LLIR) -> Matcher (LLIR to ASM) -> ASM]

17 Peephole Matching
Expander
- Turns the IR code into a low-level IR (LLIR) such as RTL
- Operation-by-operation, template-driven rewriting
- The LLIR form includes all direct effects (e.g., setting cc)
- Significant, albeit constant, expansion of size
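A minimal sketch of template-driven expansion. The IR encoding, the template table, and the way condition-code effects are written out are assumptions chosen for illustration, not the LLIR of any particular system.

    # Expander sketch: each IR op is rewritten, by template, into LLIR effects,
    # including direct side effects such as setting the condition code.
    def expand_add(s1, s2, d):
        return [f"{d} <- {s1} + {s2}", f"cc <- f({s1} + {s2})"]

    def expand_multI(s1, const, d):
        return [f"{d} <- {s1} x {const}", f"cc <- f({s1} x {const})"]

    TEMPLATES = {"add": expand_add, "multI": expand_multI}

    def expand(ir):
        llir = []
        for op, *args in ir:              # one template per IR opcode
            llir.extend(TEMPLATES[op](*args))
        return llir

    # Two IR ops expand into four LLIR effects: constant-factor growth in size.
    print(expand([("multI", "r5", 2, "r7"), ("add", "r2", "r7", "r10")]))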

18 Peephole Matching
Simplifier
- Looks at the LLIR through a window and rewrites it
- Uses forward substitution, algebraic simplification, local constant propagation, and dead-effect elimination
- Performs local optimization within the window
- This is the heart of the peephole system: the benefit of peephole optimization shows up in this step
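A minimal sketch of a window-based simplifier, limited to one of the rewrites used in the example later in the lecture: forward-substituting a constant definition into its single use inside the window and dropping the now-dead definition. The tuple encoding of LLIR ops is an assumption for this sketch.

    # Simplifier sketch: 3-op window, forward-substitute a constant into its use.
    # LLIR ops are tuples (assumed encoding):
    #   ("const", dst, value), ("load", dst, address), ("mult"/"add", dst, src1, src2)
    def uses(op, reg):
        return op[0] != "const" and reg in op[2:]

    def simplify(llir, window=3):
        ops, out = list(llir), []
        while ops:
            w = ops[:window]
            head = w[0]
            if head[0] == "const":
                dst, val = head[1], head[2]
                users = [i for i in range(1, len(w)) if uses(w[i], dst)]
                used_later = any(uses(op, dst) for op in ops[window:])
                if (len(users) == 1 and not used_later
                        and ops[users[0]][0] in ("mult", "add")):
                    i = users[0]
                    op = ops[i]
                    ops[i] = (op[0] + "I", op[1]) + tuple(
                        val if a == dst else a for a in op[2:])   # e.g. mult -> multI
                    ops.pop(0)                                    # definition is now dead
                    continue
            out.append(ops.pop(0))       # op rolls out of the window unchanged
        return out

    before = [("const", "r10", 2),
              ("load",  "r13", "r0 + @y"),
              ("mult",  "r14", "r10", "r13")]
    print(simplify(before))
    # [('load', 'r13', 'r0 + @y'), ('multI', 'r14', 2, 'r13')]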

19 Peephole Matching
Matcher
- Compares the simplified LLIR against a library of patterns
- Picks the low-cost pattern that captures the effects
- Must preserve LLIR effects, may add new ones (e.g., setting cc)
- Generates the assembly-code output
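A minimal sketch of a matcher that maps simplified LLIR effects onto ILOC-like instructions one effect at a time. The pattern library, the costs, and the tuple encoding (shared with the simplifier sketch above) are assumptions; a real matcher considers multi-effect patterns and must preserve every live effect.

    # Matcher sketch: pick the cheapest pattern for each simplified LLIR effect.
    PATTERNS = [
        # (LLIR opcode, cost, emitter)
        ("load",  1, lambda d, addr: f"loadAI  r0, {addr} => {d}"),
        ("multI", 1, lambda d, a, b: f"multI   {a}, {b} => {d}"),
        ("mult",  1, lambda d, a, b: f"mult    {a}, {b} => {d}"),
    ]

    def match(llir):
        asm = []
        for op in llir:
            candidates = [(c, emit) for name, c, emit in PATTERNS if name == op[0]]
            cost, emit = min(candidates, key=lambda t: t[0])   # lowest-cost pattern
            asm.append(emit(*op[1:]))
        return asm

    print(match([("load", "r13", "@y"), ("multI", "r14", 2, "r13")]))
    # ['loadAI  r0, @y => r13', 'multI   2, r13 => r14']
    # Operand order follows the LLIR effect; a real matcher would normalize it
    # to ILOC's register, immediate form.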

20 Finding Dead Effects
- The Simplifier must know what is useless (i.e., dead)
- The Expander works in a context-independent fashion
  - It can process the operations in any order
  - Use a backward walk and compute local LIVE information (as in Lab 1)
  - Tag each operation with a list of useless values
- What about non-local effects?
  - Most useless effects are local: defined & used in the same block
  - It can be conservative & assume a value is LIVE until proven dead

Example (expand, simplify, match):

ASM:
    mult r5, r9   => r12
    add  r12, r17 => r13

expands to LLIR:
    r12 <- r5 x r9
    cc  <- f(r5 x r9)
    r13 <- r12 + r17
    cc  <- f(r12 + r17)

The first cc effect is dead; if it were live, it would prevent multiply-add from matching. With it removed, the matcher can select:

ASM:
    madd r5, r9, r17 => r13
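A minimal sketch of that backward walk, tagging effects whose results are never used. The (defined, uses) encoding and the requirement that the caller supply a conservative live-on-exit set are assumptions of the sketch.

    # Backward walk over one block, tagging effects that define values never used.
    # Each effect is (defined_name, [used_names]).
    def tag_dead_effects(block, live_on_exit):
        live = set(live_on_exit)
        dead = set()
        for i in range(len(block) - 1, -1, -1):   # walk the block backward
            defined, used = block[i]
            if defined not in live:
                dead.add(i)                       # result is never used: dead effect
            live.discard(defined)
            live.update(used)
        return dead

    block = [
        ("r12", ["r5", "r9"]),    # r12 <- r5 x r9
        ("cc",  ["r5", "r9"]),    # cc  <- f(r5 x r9)
        ("r13", ["r12", "r17"]),  # r13 <- r12 + r17
        ("cc",  ["r12", "r17"]),  # cc  <- f(r12 + r17)
    ]
    # Conservatively assume r13 and cc may be used after the block.
    print(tag_dead_effects(block, live_on_exit={"r13", "cc"}))
    # {1}: the first cc effect is overwritten before use, so multiply-add can match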

21 Example
Original IR code for w <- x - 2 x y:

    OP    Arg1  Arg2  Result
    mult  2     y     t1
    sub   x     t1    w

The symbolic names refer to memory-bound variables.

22 Example
Original IR code for w <- x - 2 x y:

    OP    Arg1  Arg2  Result
    mult  2     y     t1
    sub   x     t1    w

Expand:

LLIR Code:
    r10 <- 2
    r11 <- @y
    r12 <- r0 + r11
    r13 <- MEM(r12)
    r14 <- r10 x r13
    r15 <- @x
    r16 <- r0 + r15
    r17 <- MEM(r16)
    r18 <- r17 - r14
    r19 <- @w
    r20 <- r0 + r19
    MEM(r20) <- r18

This version of the example assumes that x, y, and w are all stored in the AR. The example in §11 of EaC assumes that x is a call-by-reference formal and y is a global. The results are different.

23 Example
Simplify:

LLIR Code (before):
    r10 <- 2
    r11 <- @y
    r12 <- r0 + r11
    r13 <- MEM(r12)
    r14 <- r10 x r13
    r15 <- @x
    r16 <- r0 + r15
    r17 <- MEM(r16)
    r18 <- r17 - r14
    r19 <- @w
    r20 <- r0 + r19
    MEM(r20) <- r18

LLIR Code (after):
    r13 <- MEM(r0 + @y)
    r14 <- 2 x r13
    r17 <- MEM(r0 + @x)
    r18 <- r17 - r14
    MEM(r0 + @w) <- r18

24 Example
Match:
- Introduced all memory operations & temporary names
- Turned out pretty good code

LLIR Code:
    r13 <- MEM(r0 + @y)
    r14 <- 2 x r13
    r17 <- MEM(r0 + @x)
    r18 <- r17 - r14
    MEM(r0 + @w) <- r18

ILOC Code:
    loadAI  r0, @y   => r13
    multI   r13, 2   => r14
    loadAI  r0, @x   => r17
    sub     r17, r14 => r18
    storeAI r18      => r0, @w

25 Steps of the Simplifier (3-operation window)

LLIR Code:
    r10 <- 2
    r11 <- @y
    r12 <- r0 + r11
    r13 <- MEM(r12)
    r14 <- r10 x r13
    r15 <- @x
    r16 <- r0 + r15
    r17 <- MEM(r16)
    r18 <- r17 - r14
    r19 <- @w
    r20 <- r0 + r19
    MEM(r20) <- r18

(Each of the following slides shows the simplifier's window over this code.)

26 Steps of the Simplifier (3-operation window)

Window:
    r10 <- 2
    r11 <- @y
    r12 <- r0 + r11

27 Steps of the Simplifier (3-operation window)

Window before:
    r10 <- 2
    r11 <- @y
    r12 <- r0 + r11
Window after:
    r10 <- 2
    r12 <- r0 + @y
    r13 <- MEM(r12)

28 Steps of the Simplifier (3-operation window)

Window before:
    r10 <- 2
    r12 <- r0 + @y
    r13 <- MEM(r12)
Window after:
    r10 <- 2
    r13 <- MEM(r0 + @y)
    r14 <- r10 x r13

29 Steps of the Simplifier (3-operation window)

Window before:
    r10 <- 2
    r13 <- MEM(r0 + @y)
    r14 <- r10 x r13
Window after:
    r13 <- MEM(r0 + @y)
    r14 <- 2 x r13
    r15 <- @x
Folding 2 into the computation of r14 made the first op dead.

30 Steps of the Simplifier (3-operation window)

Window before:
    r13 <- MEM(r0 + @y)
    r14 <- 2 x r13
    r15 <- @x
Window after:
    r14 <- 2 x r13
    r15 <- @x
    r16 <- r0 + r15
The Simplifier emits ops that are still live when they roll out of the window.

31 Steps of the Simplifier (3-operation window)

Window before:
    r14 <- 2 x r13
    r15 <- @x
    r16 <- r0 + r15
Window after:
    r14 <- 2 x r13
    r16 <- r0 + @x
    r17 <- MEM(r16)

32 Steps of the Simplifier (3-operation window)

Window before:
    r14 <- 2 x r13
    r16 <- r0 + @x
    r17 <- MEM(r16)
Window after:
    r14 <- 2 x r13
    r17 <- MEM(r0 + @x)
    r18 <- r17 - r14

33 Steps of the Simplifier (3-operation window)

Window before:
    r14 <- 2 x r13
    r17 <- MEM(r0 + @x)
    r18 <- r17 - r14
Window after:
    r17 <- MEM(r0 + @x)
    r18 <- r17 - r14
    r19 <- @w

34 Steps of the Simplifier (3-operation window)

Window before:
    r17 <- MEM(r0 + @x)
    r18 <- r17 - r14
    r19 <- @w
Window after:
    r18 <- r17 - r14
    r19 <- @w
    r20 <- r0 + r19

35 Steps of the Simplifier (3-operation window)

Window before:
    r18 <- r17 - r14
    r19 <- @w
    r20 <- r0 + r19
Window after:
    r18 <- r17 - r14
    r20 <- r0 + @w
    MEM(r20) <- r18

36 Steps of the Simplifier (3-operation window)

Window before:
    r18 <- r17 - r14
    r20 <- r0 + @w
    MEM(r20) <- r18
Window after:
    r18 <- r17 - r14
    MEM(r0 + @w) <- r18

37 Steps of the Simplifier (3-operation window)

Final window:
    r18 <- r17 - r14
    MEM(r0 + @w) <- r18
No further simplification applies; these operations roll out of the window and are emitted.

38 Example
Simplify:

LLIR Code (before, from the Expander):
    r10 <- 2
    r11 <- @y
    r12 <- r0 + r11
    r13 <- MEM(r12)
    r14 <- r10 x r13
    r15 <- @x
    r16 <- r0 + r15
    r17 <- MEM(r16)
    r18 <- r17 - r14
    r19 <- @w
    r20 <- r0 + r19
    MEM(r20) <- r18

LLIR Code (after):
    r13 <- MEM(r0 + @y)
    r14 <- 2 x r13
    r17 <- MEM(r0 + @x)
    r18 <- r17 - r14
    MEM(r0 + @w) <- r18

39 Making It All Work
Details
- The LLIR is largely machine independent (RTL)
  - Some compilers use the LLIR as one of their IRs
  - That eliminates the Expander
- The target machine is described as a set of LLIR -> ASM patterns
- Actual pattern matching
  - Use a hand-coded pattern matcher (gcc)
  - Turn the patterns into a grammar & use an LR parser (VPO)
- Several important compilers use this technology
- It seems to produce good, portable instruction selectors
- Its key strength appears to be late, low-level optimization

40 Other Considerations
Control-flow operations
- Can clear the simplifier's window at a branch or label
- Predication has similar effects
  - May want to special-case predicated single operations so as not to disrupt the flow of the simplifier too often
Physical versus logical windows
- Can run the optimizer over a logical window
  - k operations connected by definition-use chains
- The Expander can link definitions & uses (see the sketch below)
- Logical windows (within a block) improve effectiveness
  - Davidson & Fraser report 30% faster & 20% fewer ops with a local logical window
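A minimal sketch of forming a logical window from definition-use links rather than from physically adjacent operations. The (defined, uses) encoding and the helper name are assumptions for illustration.

    # Build a logical window: operation i together with the most recent in-block
    # definitions of the values it uses, even if they are not physically adjacent.
    def logical_window(block, i):
        defs = {}
        for j, (defined, _) in enumerate(block[:i]):
            defs[defined] = j                 # latest in-block definition wins
        _, used = block[i]
        return sorted({i} | {defs[u] for u in used if u in defs})

    block = [
        ("r11", []),              # r11 <- @y
        ("r14", ["r10", "r13"]),  # r14 <- r10 x r13  (unrelated op in between)
        ("r12", ["r0", "r11"]),   # r12 <- r0 + r11
    ]
    print(logical_window(block, 2))   # [0, 2]: pairs the definition of r11 with its use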