CMPUT 680 - Compiler Design and Optimization, Fall 2003. Topic J: Wavefront Scheduling. José Nelson Amaral

1 CMPUT 680 - Compiler Design and Optimization
CMPUT680 - Fall 2003
Topic J: Wavefront Scheduling
José Nelson Amaral
http://www.cs.ualberta.ca/~amaral/courses/680

2 Reading Material
Bharadwaj, J., Menezes, K., McKinsey, C., “Wavefront Scheduling: Path Based Data Representation and Scheduling of Subgraphs,” Proceedings of 32nd International Symposium on Microarchitecture, Dec. 1996, pp. 100-113.
Bharadwaj, J., “Method and apparatus for instruction scheduling to reduce negative effects of compensation code,” Patent No. 5,894,576, April 3 1999.

3 New Concepts
- Global Code Scheduler (GCS)
- Region Formation
- Wavefront Scheduling
- Path Vectors
- Deferred Compensation
- P-ready Code Motion

4 Scheduling Regions
Similar to Mahlke’s definition, here a region is a subgraph of a control flow graph that has a unique entry node that dominates all the nodes in the region. There is a further restriction that the regions must be acyclic.

5 JS Edges
A split node in a CFG is a node that has more than one immediate successor.
A join node in a CFG is a node that has more than one immediate predecessor.
A Join-Split (JS) edge in a CFG goes from a split node to a join node.
[Figure: CFG fragment with blocks B, C, and D]

6 Removal of JS Edges
The application of the wavefront scheduling technique requires the removal of all JS edges. A JS edge is removed by adding an empty block (called a JS block) between the split node and the join node.
[Figure: the CFG fragment with blocks B, C, and D, before and after inserting JS block G]
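
The definitions on the two slides above can be sketched in a few lines of Python. This is a minimal illustration, not the scheduler's actual implementation: the edge-list representation, the function names `find_js_edges` and `remove_js_edges`, and the three-block CFG (B branches to C and D, C falls through to D) are assumptions made for the example.

```python
from collections import defaultdict

def find_js_edges(edges):
    """Return the edges that go from a split node to a join node."""
    succs, preds = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        succs[src].add(dst)
        preds[dst].add(src)
    # Split node: more than one successor. Join node: more than one predecessor.
    return [(s, d) for s, d in edges
            if len(succs[s]) > 1 and len(preds[d]) > 1]

def remove_js_edges(edges, fresh_names):
    """Replace each JS edge (s, d) by (s, js) and (js, d), where js is a new empty block."""
    js_edges = set(find_js_edges(edges))
    names = iter(fresh_names)
    result = []
    for s, d in edges:
        if (s, d) in js_edges:
            block = next(names)  # hypothetical name for the new JS block
            result += [(s, block), (block, d)]
        else:
            result.append((s, d))
    return result

# Assumed example: B is a split node, D is a join node, so B->D is a JS edge.
edges = [("B", "C"), ("B", "D"), ("C", "D")]
js_edges = find_js_edges(edges)
new_edges = remove_js_edges(edges, ["G"])
```

After the transformation, `find_js_edges(new_edges)` is empty: G has a single predecessor and a single successor, so neither of the two new edges connects a split node to a join node.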

7 Interface Blocks
A side entry node is a node in the region that has at least one immediate predecessor in the region, and at least one immediate predecessor outside the region.
Which nodes are side entry nodes in the example? D
[Figure: region with blocks B, C, D, and E]

8 Interface Blocks
A side exit node is a node in the region that has at least one immediate successor in the region, and at least one immediate successor outside the region.
Which nodes are side exit nodes in the example? C and D
[Figure: region with blocks B, C, D, and E]

9 Interface Blocks
When control enters or leaves the region, GCS may require a block in which to schedule compensation code. Thus an interface block is inserted between two nodes x and y iff:
(i) x is outside the region, y is a side entry node, and there is an edge (x,y), or
(ii) y is outside the region, x is a side exit node, and there is an edge (x,y).

10 Interface Blocks
Where do we need interface blocks in the following example?
[Figure: region with blocks B, C, D, and E]

11 Interface Blocks
We need three interface blocks.
[Figure: region with blocks B, C, D, and E, and interface blocks F, G, and H]
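
Rules (i) and (ii) translate directly into a predicate over CFG edges. The sketch below is illustrative only. The slide's figure is not recoverable from this transcript, so the CFG used here is a hypothetical one, chosen so that D is the side entry node and C and D are the side exit nodes, as in the slides; the outside blocks X, Y, and Z and the function name `interface_edges` are assumptions.

```python
def interface_edges(edges, region):
    """Return the edges (x, y) on which an interface block must be inserted."""
    region = set(region)
    preds, succs = {}, {}
    for s, d in edges:
        succs.setdefault(s, set()).add(d)
        preds.setdefault(d, set()).add(s)

    def is_side_entry(y):
        # In the region, with predecessors both inside and outside the region.
        p = preds.get(y, set())
        return y in region and bool(p & region) and bool(p - region)

    def is_side_exit(x):
        # In the region, with successors both inside and outside the region.
        s = succs.get(x, set())
        return x in region and bool(s & region) and bool(s - region)

    result = []
    for x, y in edges:
        if x not in region and is_side_entry(y):    # rule (i)
            result.append((x, y))
        elif y not in region and is_side_exit(x):   # rule (ii)
            result.append((x, y))
    return result

# Hypothetical CFG: X, Y, and Z lie outside the region {B, C, D, E}.
region = {"B", "C", "D", "E"}
edges = [("B", "C"), ("B", "D"), ("C", "E"), ("D", "E"),
         ("X", "D"), ("C", "Y"), ("D", "Z")]
needed = interface_edges(edges, region)
```

For this assumed graph the function reports three edges, one entering at D and one leaving from each of C and D, which matches the count of three interface blocks given on the slide.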

12 Hierarchical Regions
For the global code scheduler, regions are hierarchical:
(1) First the code of an innermost loop is selected and scheduled.
(2) Then a summary of the data flow and resource usage of the loop is computed, and the loop is converted into a single node in the graph.

13 Nested Regions
[Figure: the example CFG (blocks A, B, C, D, E, F1, F2, F3) before and after inserting blocks G, H, I, J, and K]
G, J, and K are JS blocks.
H and I are interface blocks.

14 Path Vectors
There is a finite number of control paths in an acyclic scheduling region.
A path vector is a bit vector in which each bit represents a unique path in a region.
A subset of paths can be represented by a path vector by writing 1 for the paths in the subset and 0 for the paths not in the subset.

15 Paths in our Example
[Figure: CFG with blocks A, B, C, D, E, F, G, H, I, J, and K]
Paths:
P0: ABCDH
P1: ABCDJE
P2: ABGDH
P3: ABGDJE
P4: AFKE
P5: AFI
We can define the subset of all paths that include basic block G as BP(G) = {P2, P3}.
We can represent this set by the block path vector, with bits written in the order P0 to P5: BPV(G) = [0 0 1 1 0 0]
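
The path list and the block path vector above can be reproduced with a short depth-first enumeration. This is a sketch under stated assumptions: the successor map is reconstructed from the six listed paths rather than taken from the figure, and the names `enumerate_paths` and `block_path_vector` are made up for the example. Block names are single characters, so a path can be stored as a string.

```python
def enumerate_paths(succs, entry):
    """Enumerate every entry-to-exit path of an acyclic region, depth first."""
    paths = []

    def walk(node, prefix):
        prefix = prefix + [node]
        if not succs.get(node):          # no successors: the path ends here
            paths.append("".join(prefix))
            return
        for nxt in succs[node]:
            walk(nxt, prefix)

    walk(entry, [])
    return paths

def block_path_vector(block, paths):
    """Bit i is 1 iff path Pi passes through the block (P0 is the leftmost bit)."""
    return [1 if block in p else 0 for p in paths]

# Successor lists reconstructed from paths P0..P5; the order of each list
# is chosen so that the enumeration numbers the paths as the slide does.
succs = {
    "A": ["B", "F"], "B": ["C", "G"], "C": ["D"], "G": ["D"],
    "D": ["H", "J"], "J": ["E"], "F": ["K", "I"], "K": ["E"],
}
paths = enumerate_paths(succs, "A")
bpv_g = block_path_vector("G", paths)
```

The numbering of the paths depends only on the order in which successors are visited, so any fixed order works as long as every vector in the region uses the same one.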

16 Paths in our Example
[Figure: CFG with blocks A, B, C, D, E, F, G, H, I, J, and K]
Paths: P0: ABCDH, P1: ABCDJE, P2: ABGDH, P3: ABGDJE, P4: AFKE, P5: AFI

17 Control Flow Relations
We can compute control flow relations such as dominance, post-dominance, control equivalence, disjointness, etc., by performing bitwise operations on these path vectors.
If BPV(x) = BPV(y), then blocks x and y are control flow equivalent.
If BPV(x) is a superset of BPV(y), then block x either dominates or post-dominates block y.

18 Paths in our Example
Paths: P0: ABCDH, P1: ABCDJE, P2: ABGDH, P3: ABGDJE, P4: AFKE, P5: AFI
Example 1: What is the relation between blocks B and D?
Blocks B and D are control flow equivalent because BPV(B) = BPV(D).

19 Paths in our Example
Paths: P0: ABCDH, P1: ABCDJE, P2: ABGDH, P3: ABGDJE, P4: AFKE, P5: AFI
Example 2: What is the relation between blocks A and E?
Block A either dominates or post-dominates block E because BPV(A) is a superset of BPV(E).

20 Paths in our Example
Paths: P0: ABCDH, P1: ABCDJE, P2: ABGDH, P3: ABGDJE, P4: AFKE, P5: AFI
Example 3: Likewise, block E either dominates or post-dominates block K because BPV(E) is a superset of BPV(K).
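
The three examples above can be checked mechanically once each block path vector is stored as an integer bitmask. The sketch below is illustrative; the helper names `bpv`, `equivalent`, `covers`, and `disjoint` are assumptions, and bit i of a mask stands for path Pi.

```python
# Paths P0..P5 of the running example; block names are single characters.
PATHS = ["ABCDH", "ABCDJE", "ABGDH", "ABGDJE", "AFKE", "AFI"]

def bpv(block):
    """Block path vector as an integer: bit i is set iff Pi passes through the block."""
    mask = 0
    for i, path in enumerate(PATHS):
        if block in path:
            mask |= 1 << i
    return mask

def equivalent(x, y):
    """Control flow equivalence: both blocks lie on exactly the same paths."""
    return bpv(x) == bpv(y)

def covers(x, y):
    """BPV(x) is a superset of BPV(y): x dominates or post-dominates y."""
    return (bpv(x) & bpv(y)) == bpv(y)

def disjoint(x, y):
    """No path passes through both blocks."""
    return (bpv(x) & bpv(y)) == 0

example1 = equivalent("B", "D")   # Example 1
example2 = covers("A", "E")       # Example 2
example3 = covers("E", "K")       # Example 3
```

Each relation costs one AND and one comparison per query, which is the practical appeal of the path-vector representation. The superset test alone does not say which of dominance or post-dominance holds; deciding that needs the relative position of the two blocks in the region.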

21 Problems with Cross-Block Scheduling
Compensation code is code that needs to be scheduled somewhere else to compensate for the execution of an instruction M in a block x.
Most cross-block scheduling techniques are not judicious when scheduling compensation code.
Suppose that scheduling an instruction M in block x requires compensation code in block y. Most schedulers cannot evaluate how desirable it is to place the compensation code in y. Some schedulers only allow M to be scheduled in x if y has not been scheduled yet.

22 Wavefront
A scheduling region is an acyclic region with JS edges eliminated and interface blocks added.
A wavefront is a strongly independent cut set that partitions a scheduling region into three parts:
- nodes above the wavefront
- nodes on the wavefront
- nodes below the wavefront
The wavefront is strongly independent in the sense that no control flow path flows through more than one node in the wavefront.

23 Wavefront Dominance Property
The wavefront nodes collectively dominate all the nodes below the wavefront, and collectively post-dominate all the nodes above the wavefront.
Consider two blocks in the region: block k, which is not in the wavefront, and block w, which is in the wavefront.
This property guarantees that when an instruction originally in block k is scheduled in block w, compensation code can be inserted entirely into blocks in the wavefront.

24 JS Edges and Strongly Independent Cuts
[Figure: CFG with blocks A, B, C, D, E, F, H, I, J, and K, before JS block G is inserted]
Can you build a wavefront that includes C and satisfies the conditions of dominance, post-dominance, and no control path including more than one node in the wavefront?
First try: {C, F}
This wavefront neither post-dominates A and B nor dominates D, H, J, and E.

25 JS Edges and Strongly Independent Cuts
[Figure: CFG with blocks A, B, C, D, E, F, H, I, J, and K, before JS block G is inserted]
Can you build a wavefront that includes C and satisfies the conditions of dominance, post-dominance, and no control path including more than one node in the wavefront?
Second try: {C, D, F}
The path ABCDH includes two nodes in the wavefront (C and D); therefore this wavefront is not a strongly independent cut set.

26 JS Edges and Strongly Independent Cuts
[Figure: CFG with blocks A, B, C, D, E, F, G, H, I, J, and K]
When the proper JS block is inserted, we can easily find a wavefront that:
(1) post-dominates all predecessors,
(2) dominates all successors, and
(3) is a strongly independent cut set (no control path includes more than one node in the wavefront).
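
One way to read the three conditions in terms of paths is that every control path of the region crosses exactly one wavefront node: at least one, so that the set is a cut, and at most one, so that it is strongly independent. The sketch below tests candidate wavefronts this way. It is an interpretation for illustration, not the paper's algorithm. The path list `before` assumes that the JS edge in the earlier figure is B->D (the edge that block G later splits), and `{C, G, F}` is a candidate chosen for this example; the slides do not name it.

```python
def crossings(wavefront, paths):
    """For each path, count how many of its blocks are in the candidate wavefront."""
    return [sum(1 for block in path if block in wavefront) for path in paths]

def is_wavefront(wavefront, paths):
    """True iff every path crosses exactly one node of the candidate set."""
    return all(count == 1 for count in crossings(wavefront, paths))

# Assumed paths before G is inserted: B reaches D directly over the JS edge.
before = ["ABCDH", "ABCDJE", "ABDH", "ABDJE", "AFKE", "AFI"]
# Paths after JS block G is inserted on the edge B->D.
after = ["ABCDH", "ABCDJE", "ABGDH", "ABGDJE", "AFKE", "AFI"]

first_try = crossings({"C", "F"}, before)        # some paths miss the set entirely
second_try = crossings({"C", "D", "F"}, before)  # some paths hit the set twice
with_g = is_wavefront({"C", "G", "F"}, after)
```

A count of 0 corresponds to the first failure on the slides (the set neither dominates nor post-dominates everything it should), and a count of 2 corresponds to the second (the set is not strongly independent).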

27 Wavefront Scheduling
In directional scheduling (either top-down or bottom-up) there is a region of code that is already scheduled, another region that is not yet scheduled, and a boundary between them.
In wavefront scheduling, the wavefront is this boundary.
The wavefront moves up or down according to the direction of scheduling chosen.

28 Example of Wavefront Scheduling
[Figure: the example CFG (blocks A through K) with successive wavefronts W0 through W6]

29 Deferred Compensation
[Figure: CFG with blocks A, B, C, D, E, F, and G]
Suppose that an instruction M is originally in block A. If we want to move M downward, we have to schedule M in all paths that contain a use of the variable defined by M.
For instance, assume that there is a use of the variable defined by M in G.

30 Deferred Compensation
Path Summary: P0 = AFG, P1 = ABDEG, P2 = ABCEG
Thus a clone of M must appear in paths P0, P1, and P2.
The compensation path vector of an instruction M is the set of all paths that must contain a clone of M when M is not scheduled in its original basic block.
CPV(M) = [1 1 1]

31 Deferred Compensation
Path Summary: P0 = AFG, P1 = ABDEG, P2 = ABCEG
CPV(M) = [1 1 1]
Assume that at W1 we decide that it is desirable to schedule a clone of M, M’, in block F. With bits written in the order [P2 P1 P0], we update CPV(M) to:
CPV(M) = CPV(M) - BPV(F) = [1 1 1] - [0 0 1] = [1 1 0]

32 Deferred Compensation
Path Summary: P0 = AFG, P1 = ABDEG, P2 = ABCEG
CPV(M) = [1 1 0]
Assume that at W2 we decide to schedule a clone of M, M’’, in block C. With bits written in the order [P2 P1 P0]:
CPV(M) = CPV(M) - BPV(C) = [1 1 0] - [1 0 0] = [0 1 0]

33 Deferred Compensation
Path Summary: P0 = AFG, P1 = ABDEG, P2 = ABCEG
CPV(M) = [0 1 0]
Clones M’ and M’’ are already placed in F and C. Now we cannot close block D unless we schedule M in it.
Because BPV(D) is a superset of CPV(M), we know that this is the last compensation copy of M to be scheduled.
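
The bookkeeping on the deferred compensation slides can be replayed with integer bitmasks. This is a minimal sketch, assuming that bit i of a mask stands for path Pi; the helper names `bpv`, `show`, and `place_clone` are made up for the example. `show` prints the bits in the order [P2 P1 P0], which is the order the vectors on these slides appear to use.

```python
PATHS = ["AFG", "ABDEG", "ABCEG"]   # P0, P1, P2

def bpv(block):
    """Block path vector as an integer: bit i is set iff Pi passes through the block."""
    mask = 0
    for i, path in enumerate(PATHS):
        if block in path:
            mask |= 1 << i
    return mask

def show(mask):
    """Format a mask with the highest-numbered path first: [P2 P1 P0]."""
    bits = [(mask >> i) & 1 for i in reversed(range(len(PATHS)))]
    return "[" + " ".join(str(b) for b in bits) + "]"

def place_clone(cpv, block):
    """Scheduling a clone in `block` removes that block's paths from the CPV."""
    return cpv & ~bpv(block)

cpv0 = bpv("G")                 # the use is in G, so every path needs a clone
cpv1 = place_clone(cpv0, "F")   # clone M' scheduled in F
cpv2 = place_clone(cpv1, "C")   # clone M'' scheduled in C
# D covers every path still owed a clone, so a clone in D is the last one.
last_copy = (bpv("D") & cpv2) == cpv2
cpv3 = place_clone(cpv2, "D")
```

The set difference written as CPV(M) - BPV(x) on the slides becomes `cpv & ~bpv(x)` on bitmasks, and the instruction is fully compensated when the vector reaches zero.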

34 When to Move Code?
Bharadwaj, Menezes and McKinsey define the usefulness of moving code from an origin block O to a target block T in terms of the likelihood that control will flow through T and O given that control reaches T.
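
Read as a conditional probability, this definition can be evaluated from per-path execution probabilities. The sketch below is one possible reading, not the formula from the paper: the function name `usefulness`, the idea of summing path probabilities, and the probability values themselves are all hypothetical. The paths are those of the deferred compensation example.

```python
def usefulness(origin, target, path_prob):
    """Estimate P(control flows through T and O | control reaches T)."""
    through_t = sum(p for path, p in path_prob.items() if target in path)
    through_both = sum(p for path, p in path_prob.items()
                       if target in path and origin in path)
    return through_both / through_t if through_t else 0.0

# Hypothetical profile: probability of each path being executed.
path_prob = {"AFG": 0.5, "ABDEG": 0.25, "ABCEG": 0.25}

e_to_b = usefulness("E", "B", path_prob)   # every path through B reaches E
c_to_b = usefulness("C", "B", path_prob)   # half of the paths through B reach C
c_to_a = usefulness("C", "A", path_prob)   # a quarter of all paths reach C
```

Under this reading, a value of 1.0 means the moved instruction would have executed anyway whenever T executes, while lower values mean the move is increasingly speculative.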
