CMPUT 680 - Compiler Design and Optimization, Fall 2003. Topic J: Wavefront Scheduling. José Nelson Amaral

Presentation transcript:


Reading Material
Bharadwaj, J., Menezes, K., McKinsey, C., “Wavefront Scheduling: Path Based Data Representation and Scheduling of Subgraphs,” Proceedings of the 32nd International Symposium on Microarchitecture, Dec. 1996, pp
Bharadwaj, J., “Method and apparatus for instruction scheduling to reduce negative effects of compensation code,” Patent No. 5,894,576, April

New Concepts
Global Code Scheduler (GCS); Region Formation; Wavefront Scheduling; Path Vectors; Deferred Compensation; P-ready Code Motion

Scheduling Regions
Similar to Mahlke’s definition, here a region is a subgraph of a control flow graph that has a unique entry node that dominates all the nodes in the region. There is a further restriction that the regions must be acyclic.

JS-nodes
A Join-Split (JS) edge in a CFG goes from a split node to a join node. A split node in a CFG is a node that has more than one immediate successor. A join node in a CFG is a node that has more than one immediate predecessor.
[Figure: CFG fragment with blocks B, C, and D]

Removal of JS-nodes
The application of the wavefront scheduling technique requires the removal of all JS edges. A JS edge is removed by adding an empty block (called a JS block) between the split node and the join node.
[Figure: the CFG fragment with blocks B, C, and D, before and after JS block G is added]
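A minimal Python sketch of this transformation, assuming the CFG is stored as a dictionary that maps each block to its ordered list of immediate successors. The function name `split_js_edges`, the dictionary layout, and the edges chosen for the three-block fragment (B to C, B to D, C to D) are assumptions made for illustration; the slides give only the figure.

```python
def split_js_edges(succ, new_names):
    """Return a copy of the CFG with an empty block placed on every JS edge.

    succ maps each block to its list of immediate successors; every block
    must appear as a key. new_names supplies names for the inserted blocks.
    """
    # Collect immediate predecessors so that join nodes can be recognised.
    preds = {}
    for node, successors in succ.items():
        for s in successors:
            preds.setdefault(s, []).append(node)

    names = iter(new_names)
    result = {node: list(successors) for node, successors in succ.items()}
    for node, successors in succ.items():
        if len(successors) < 2:
            continue  # not a split node, so no edge leaving it is a JS edge
        for i, s in enumerate(successors):
            if len(preds.get(s, [])) > 1:  # s is a join node
                block = next(names)
                result[node][i] = block    # split node now branches to the JS block
                result[block] = [s]        # JS block falls through to the join node
    return result


# Hypothetical edges for the B, C, D fragment: B is a split node, D a join node.
cfg = {"B": ["C", "D"], "C": ["D"], "D": []}
print(split_js_edges(cfg, ["G"]))
```

In this sketch only the edge from B to D is split, because C has a single successor and therefore is not a split node.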

Interface Blocks
A side entry node is a node in the region that has at least one immediate predecessor in the region, and at least one immediate predecessor outside the region.
[Figure: example region with blocks B, C, D, and E]
Which nodes are side entry nodes in the example? D

Interface Blocks
A side exit node is a node in the region that has at least one immediate successor in the region, and at least one immediate successor outside the region.
[Figure: example region with blocks B, C, D, and E]
Which nodes are side exit nodes in the example? C and D

Interface Blocks
When control enters or leaves the region, GCS may require a block in which to schedule compensation code. Thus an interface block is inserted between two nodes x and y iff: (i) x is outside the region, y is a side entry node, and there is an edge (x,y), or (ii) y is outside the region, x is a side exit node, and there is an edge (x,y).
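Conditions (i) and (ii) can be sketched in Python as follows. The CFG below is a hypothetical reconstruction: the outside blocks X, Y, and Z and all edges are assumptions chosen so that D is the only side entry node and C and D are the side exit nodes, as in the slide example. The function names are illustrative as well.

```python
def predecessors(succ):
    """Build the immediate-predecessor map from a successor map."""
    preds = {node: [] for node in succ}
    for node, successors in succ.items():
        for s in successors:
            preds.setdefault(s, []).append(node)
    return preds


def interface_edges(succ, region):
    """Return the edges (x, y) on which an interface block is needed."""
    preds = predecessors(succ)

    def is_side_entry(y):
        return (y in region
                and any(p in region for p in preds[y])
                and any(p not in region for p in preds[y]))

    def is_side_exit(x):
        return (x in region
                and any(s in region for s in succ[x])
                and any(s not in region for s in succ[x]))

    edges = []
    for x, successors in succ.items():
        for y in successors:
            if x not in region and is_side_entry(y):    # condition (i)
                edges.append((x, y))
            elif y not in region and is_side_exit(x):   # condition (ii)
                edges.append((x, y))
    return edges


# Hypothetical CFG: the region is {B, C, D, E}; X, Y, and Z lie outside it.
cfg = {"X": ["D"], "B": ["C", "D"], "C": ["E", "Y"], "D": ["E", "Z"],
       "E": [], "Y": [], "Z": []}
print(interface_edges(cfg, {"B", "C", "D", "E"}))
```

With these assumed edges the sketch reports three edges that need an interface block, one entering D and one leaving each of C and D.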

Interface Blocks
Where do we need interface blocks in the following example?
[Figure: example region with blocks B, C, D, and E]

Interface Blocks
We need three interface blocks.
[Figure: example region with blocks B, C, D, and E, and interface blocks F, G, and H]

Hierarchical Regions
For the global code scheduler, regions are hierarchical: (1) First the code of an innermost loop is selected and scheduled. (2) Then a summary of the data flow and resource usage of the loop is computed, and the loop is converted into a single node in the graph.

Nested Regions
[Figure: nested-region example with blocks A, B, C, D, E, F1, F2, and F3, shown before and after blocks G, H, I, J, and K are inserted]
G, J, and K are JS blocks. H and I are interface blocks.

Path Vectors
There is a finite number of control paths in an acyclic scheduling region. A path vector is a bit vector in which each bit represents a unique path in a region. A subset of paths can be represented by a path vector by writing 1 for the paths in the subset and writing 0 for the paths not in the subset.

Paths in our Example
[Figure: example CFG with blocks A, B, C, D, E, F, G, H, I, J, and K]
Paths:
P0: ABCDH
P1: ABCDJE
P2: ABGDH
P3: ABGDJE
P4: AFKE
P5: AFI
We can define the subset of all paths that include basic block G as BP(G) = {P2, P3}. We can represent this set by the block path vector BPV(G) = [0 0 1 1 0 0].
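A short Python sketch of how the paths and block path vectors of this example could be computed. It assumes the edges implied by the six listed paths, single-letter block names, and a representation in which bit i of a Python integer stands for path Pi; the function names are illustrative.

```python
def enumerate_paths(succ, entry):
    """List every entry-to-exit path of an acyclic CFG, depth first."""
    paths = []

    def walk(node, prefix):
        prefix = prefix + [node]
        if not succ.get(node):          # no successors: the path ends here
            paths.append("".join(prefix))
            return
        for s in succ[node]:
            walk(s, prefix)

    walk(entry, [])
    return paths


def block_path_vectors(paths):
    """Map each block to an integer whose bit i is set iff the block lies on path i."""
    bpv = {}
    for i, path in enumerate(paths):
        for block in path:
            bpv[block] = bpv.get(block, 0) | (1 << i)
    return bpv


# Edges read off the six paths listed on the slide.
cfg = {"A": ["B", "F"], "B": ["C", "G"], "C": ["D"], "G": ["D"],
       "D": ["H", "J"], "J": ["E"], "F": ["K", "I"], "K": ["E"]}
paths = enumerate_paths(cfg, "A")
bpv = block_path_vectors(paths)
print(paths)
print(format(bpv["G"], "06b"))  # bits written from P5 down to P0
```

Storing each vector in a machine integer keeps the later set operations down to a single bitwise instruction per query.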

Paths in our Example
[Figure: example CFG with blocks A, B, C, D, E, F, G, H, I, J, and K]
Paths: P0: ABCDH; P1: ABCDJE; P2: ABGDH; P3: ABGDJE; P4: AFKE; P5: AFI

Control Flow Relations
We can compute control flow relations such as dominance, post-dominance, control equivalence, and disjointness by performing bitwise operations on these path vectors. If BPV(x) = BPV(y), then blocks x and y are control flow equivalent. If BPV(x) is a superset of BPV(y), then block x either dominates or post-dominates block y.

Paths in our Example
[Figure: example CFG with blocks A, B, C, D, E, F, G, H, I, J, and K]
Paths: P0: ABCDH; P1: ABCDJE; P2: ABGDH; P3: ABGDJE; P4: AFKE; P5: AFI
Example 1: What is the relation between blocks B and D? Blocks B and D are control flow equivalent because BPV(B) = BPV(D).

Paths in our Example
[Figure: example CFG with blocks A, B, C, D, E, F, G, H, I, J, and K]
Paths: P0: ABCDH; P1: ABCDJE; P2: ABGDH; P3: ABGDJE; P4: AFKE; P5: AFI
Example 2: What is the relation between blocks A and E? Block A either dominates or post-dominates block E because BPV(A) is a superset of BPV(E).

Paths in our Example
[Figure: example CFG with blocks A, B, C, D, E, F, G, H, I, J, and K]
Paths: P0: ABCDH; P1: ABCDJE; P2: ABGDH; P3: ABGDJE; P4: AFKE; P5: AFI
Example 3: Likewise, block E either dominates or post-dominates block K because BPV(E) is a superset of BPV(K).
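The three examples can be checked mechanically. The following Python sketch assumes the six paths listed on the slides, with bit i of an integer standing for path Pi; the helper names `equivalent`, `covers`, and `disjoint` are illustrative, not taken from the paper.

```python
PATHS = ["ABCDH", "ABCDJE", "ABGDH", "ABGDJE", "AFKE", "AFI"]


def bpv(block):
    """Block path vector: bit i is set iff the block lies on path Pi."""
    return sum(1 << i for i, path in enumerate(PATHS) if block in path)


def equivalent(x, y):
    """Control flow equivalence: both blocks lie on exactly the same paths."""
    return bpv(x) == bpv(y)


def covers(x, y):
    """BPV(x) is a superset of BPV(y): x dominates or post-dominates y."""
    return bpv(x) & bpv(y) == bpv(y)


def disjoint(x, y):
    """No control path contains both blocks."""
    return bpv(x) & bpv(y) == 0


print(equivalent("B", "D"))  # Example 1
print(covers("A", "E"))      # Example 2
print(covers("E", "K"))      # Example 3
print(disjoint("C", "G"))    # the two arms below B never share a path
```

The superset test alone does not say whether the relation is dominance or post-dominance; the slides leave that distinction to the relative position of the two blocks.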

Problems with Cross-Block Scheduling
Compensation code is code that needs to be scheduled somewhere else to compensate for the execution of an instruction M in a block x.
Most cross-block scheduling techniques are not judicious when scheduling compensation code. Suppose that scheduling an instruction M in block x requires compensation code in block y. Most schedulers cannot evaluate how desirable it is to place the compensation code in y. Some schedulers only allow M to be scheduled in x if y has not been scheduled yet.

Wavefront
A scheduling region is an acyclic region with JS edges eliminated and interface blocks added. A wavefront is a strongly independent cut set that partitions a scheduling region into three parts: (1) nodes above the wavefront, (2) nodes on the wavefront, and (3) nodes below the wavefront. The wavefront is strongly independent in the sense that no control flow path flows through more than one node in the wavefront.

Wavefront Dominance Property
The wavefront nodes collectively dominate all the nodes below the wavefront, and collectively post-dominate all the nodes above the wavefront. Consider two blocks in the region: block k is not in the wavefront, and block w is in the wavefront. This property guarantees that when an instruction originally in block k is scheduled in block w, compensation code can be inserted entirely into blocks in the wavefront.

JS-nodes and Strongly Independent Cuts
[Figure: example CFG with blocks A, B, C, D, E, F, H, I, J, and K, without block G]
Can you build a wavefront that includes C and satisfies the conditions of dominance, post-dominance, and no control path including more than one node in the wavefront?
First try: {C, F}. This wavefront neither post-dominates A and B nor dominates D, H, J, and E.

JS-nodes and Strongly Independent Cuts
[Figure: example CFG with blocks A, B, C, D, E, F, H, I, J, and K, without block G]
Can you build a wavefront that includes C and satisfies the conditions of dominance, post-dominance, and no control path including more than one node in the wavefront?
Second try: {C, D, F}. The path ABCDH includes two nodes in the wavefront, therefore the wavefront is not a strongly independent cut set.

JS-nodes and Strongly Independent Cuts
[Figure: example CFG with JS block G inserted between B and D]
When the proper JS block is inserted, we can easily find a wavefront that: (1) post-dominates all predecessors, (2) dominates all successors, and (3) is a strongly independent cut set (no control path includes more than one node in the wavefront).
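The two failed attempts and a successful one can be reproduced with path vectors. This Python sketch treats a candidate set as a valid wavefront when every control path passes through exactly one of its nodes, which is one way to phrase "strongly independent cut set"; it does not check the dominance properties separately. The path list for the CFG without G and the candidate {C, G, F} are assumptions made for illustration.

```python
def bpv(paths, block):
    """Integer whose bit i is set iff the block lies on paths[i]."""
    return sum(1 << i for i, path in enumerate(paths) if block in path)


def check_wavefront(paths, nodes):
    """Classify a candidate wavefront against the list of control paths."""
    all_paths = (1 << len(paths)) - 1
    covered = 0
    for node in nodes:
        vector = bpv(paths, node)
        if covered & vector:            # some path crosses two wavefront nodes
            return "not strongly independent"
        covered |= vector
    return "ok" if covered == all_paths else "not a cut set"


# Assumed paths when B branches directly to D (no JS block G yet).
without_g = ["ABCDH", "ABCDJE", "ABDH", "ABDJE", "AFKE", "AFI"]
# Paths from the slides once JS block G sits between B and D.
with_g = ["ABCDH", "ABCDJE", "ABGDH", "ABGDJE", "AFKE", "AFI"]

print(check_wavefront(without_g, ["C", "F"]))       # first try
print(check_wavefront(without_g, ["C", "D", "F"]))  # second try
print(check_wavefront(with_g, ["C", "G", "F"]))     # after inserting G
```

The first try fails because the paths through the direct edge from B to D miss the candidate set entirely, which is the gap that the JS block G closes.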

Wavefront Scheduling
In directional scheduling (either top-down or bottom-up) there is a region of code that is already scheduled, another region that is not yet scheduled, and a boundary. In wavefront scheduling, the wavefront is this boundary. The wavefront moves up or down according to the direction of scheduling chosen.

Example of Wavefront Scheduling
[Figure: example CFG with blocks A, B, C, D, E, F, G, H, I, J, and K, showing wavefronts W0 through W6]

Deferred Compensation
[Figure: CFG with blocks A, B, C, D, E, F, and G]
Consider that an instruction M is originally in block A. If we want to move M downward, we have to schedule M in all paths that contain a use of the variable defined by M. For instance, assume that there is a use of that variable in G.

Deferred Compensation
[Figure: CFG with blocks A, B, C, D, E, F, and G]
Path Summary: P0 = AFG; P1 = ABDEG; P2 = ABCEG
Thus a clone of M must appear in paths P0, P1, and P2. The compensation path vector of an instruction M is the set of all paths that must contain a clone of M when M is not scheduled in its original basic block.
CPV(M) = [1 1 1]

Deferred Compensation
[Figure: CFG with blocks A, B, C, D, E, F, and G, showing wavefront W1 and clone M’]
Path Summary: P0 = AFG; P1 = ABDEG; P2 = ABCEG
CPV(M) = [1 1 1]
Assume that we decide that it is desirable to schedule a clone of M, M’, in block F. We update CPV(M) to:
CPV(M) = CPV(M) - BPV(F) = [1 1 1] - [0 0 1] = [1 1 0]

Deferred Compensation
[Figure: CFG with blocks A, B, C, D, E, F, and G, showing wavefront W2 and clone M’]
Path Summary: P0 = AFG; P1 = ABDEG; P2 = ABCEG
CPV(M) = [1 1 0]
Assume that at W2 we decide to schedule a clone of M, M’’, in block C.
CPV(M) = CPV(M) - BPV(C) = [1 1 0] - [1 0 0] = [0 1 0]

Deferred Compensation
[Figure: CFG with blocks A, B, C, D, E, F, and G, showing wavefront W2 and clones M’ and M’’]
Path Summary: P0 = AFG; P1 = ABDEG; P2 = ABCEG
CPV(M) = [0 1 0]
Now we cannot close block D unless we schedule M. Because BPV(D) is a superset of CPV(M), we know that this is the last compensation copy of M to be scheduled.
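The bookkeeping in this example amounts to a few bitwise operations. The Python sketch below assumes the three listed paths with bit i standing for path Pi, so a vector printed in binary reads [P2 P1 P0] as on the slides; the function names are illustrative.

```python
PATHS = ["AFG", "ABDEG", "ABCEG"]  # P0, P1, P2


def bpv(block):
    """Block path vector: bit i is set iff the block lies on path Pi."""
    return sum(1 << i for i, path in enumerate(PATHS) if block in path)


def schedule_clone(cpv, block):
    """Remove the paths through `block` from the compensation path vector."""
    return cpv & ~bpv(block)


def is_last_copy(cpv, block):
    """True when BPV(block) is a superset of CPV, so no path is left uncovered."""
    return cpv & ~bpv(block) == 0


cpv = 0b111                        # M must still reach P0, P1, and P2
cpv = schedule_clone(cpv, "F")     # clone M' in F
print(format(cpv, "03b"))
cpv = schedule_clone(cpv, "C")     # clone M'' in C
print(format(cpv, "03b"))
print(is_last_copy(cpv, "D"))      # a copy in D covers the remaining path
cpv = schedule_clone(cpv, "D")
print(format(cpv, "03b"))
```

Once the vector reaches zero, every path that needs the value defined by M contains a copy of it, and no further compensation code is owed.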

When to Move Code?
Bharadwaj, Menezes and McKinsey define the usefulness of moving code from an origin block O to a target block T in terms of the likelihood that control will flow through T and O given that control reaches T.
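One way to read this definition is as a conditional probability over control paths. The Python sketch below is an interpretation, not the formula from the paper: it assumes that each path of the earlier six-path example carries a known execution probability, and the probabilities themselves are invented for illustration.

```python
from fractions import Fraction

PATHS = ["ABCDH", "ABCDJE", "ABGDH", "ABGDJE", "AFKE", "AFI"]
# Hypothetical path probabilities (they sum to 1); not taken from the slides.
PROB = [Fraction(4, 10), Fraction(1, 10), Fraction(2, 10),
        Fraction(1, 10), Fraction(1, 10), Fraction(1, 10)]


def path_set(block):
    """Indices of the paths that contain the block."""
    return {i for i, path in enumerate(PATHS) if block in path}


def usefulness(origin, target):
    """Probability of passing through both blocks, given that target is reached."""
    through_target = path_set(target)
    through_both = through_target & path_set(origin)
    reach = sum(PROB[i] for i in through_target)
    return sum(PROB[i] for i in through_both) / reach


print(usefulness("E", "D"))  # hoisting from E up to D
print(usefulness("E", "K"))  # hoisting from E up to K
```

Under these assumed probabilities, code hoisted from E into K is always needed once K executes, whereas code hoisted from E into D is needed on only a fraction of the executions of D.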
