CS 201 Compiler Construction
Software Pipelining: Circular Scheduling

Motivation
Trace scheduling uncovers ILP in acyclic segments of code; another technique is needed to exploit ILP across loop iterations.
1. Loop Unrolling: unrolling a loop converts ILP across loop iterations into ILP within a single iteration, which can then be exploited using trace scheduling.
   * The drawback is growth in code size.
2. Software Pipelining:
   * Converts ILP across loop iterations into ILP within a single iteration without significant growth in code size.

Software Pipelining
2. Software Pipelining Contd:
   * A single iteration of the transformed loop contains a single occurrence of each instruction; this is why code growth is smaller than with unrolling.
   * A loop iteration so constructed brings instances of statements from different iterations of the original loop into the same iteration.

Software-pipelined schedule of Ld, Add, St over n iterations:
Ld 1
Ld 2    Add 1
Ld 3    Add 2    St 1
…..     Add 3    St 2
…..     …..      St 3
Ld n-2  …..      …..
Ld n-1  Add n-2  …..
Ld n    Add n-1  St n-2
        Add n    St n-1
                 St n
Pipelined form:
Prologue: Ld 1; then Ld 2, Add 1
Loop (i = 2, n-1): Ld i+1, Add i, St i-1
Epilogue: Add n, St n-1; then St n
Prologue + Epilogue = 2 iterations; Loop = n-2 iterations
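The prologue / loop / epilogue split of the Ld, Add, St schedule can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `pipelined_schedule` and `check` are hypothetical, each row is treated as one group of operations issued together, and the sketch assumes n >= 2.

```python
def pipelined_schedule(n):
    """Return (prologue, loop, epilogue) rows for n >= 2 original iterations.

    Each row is a list of operations issued together; ("Ld", k) is the
    load that belongs to original iteration k (1-based).
    """
    if n < 2:
        raise ValueError("this sketch assumes at least 2 iterations")
    prologue = [[("Ld", 1)], [("Ld", 2), ("Add", 1)]]
    # Steady state: one Ld, one Add and one St, each from a different iteration.
    loop = [[("Ld", i + 1), ("Add", i), ("St", i - 1)] for i in range(2, n)]
    epilogue = [[("Add", n), ("St", n - 1)], [("St", n)]]
    return prologue, loop, epilogue


def check(n):
    """Verify every operation appears once and Ld k < Add k < St k in time."""
    prologue, loop, epilogue = pipelined_schedule(n)
    issued = {}
    for t, row in enumerate(prologue + loop + epilogue):
        for op in row:
            assert op not in issued, "operation scheduled twice"
            issued[op] = t
    for k in range(1, n + 1):
        assert issued[("Ld", k)] < issued[("Add", k)] < issued[("St", k)]
    return len(loop), len(issued)
```

Calling `check(n)` returns the number of steady-state loop rows and the total number of operations, which should come out as n-2 and 3n respectively.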

Circular Scheduling
An algorithm for software pipelining that is suitable for scalar architectures:
* Only a limited amount of ILP can be exploited.
* Only a limited number of registers are available.
Assumption: register allocation has already been done.
Approach:
* Identify idle slots in the instruction schedule and try to fill them by propagating instructions across loop iterations.
* Continue to do so as long as the schedule continues to improve.
* If register allocation needs to be modified to allow instruction motion, then do so.

Circular Scheduling Contd..
* Construct a DAG for the loop body.
* Moving an instruction from a later iteration to an earlier iteration corresponds to moving an instruction from the top of the DAG to the bottom of the DAG.
* An instruction moved from the top of the loop to the bottom is called a circled instruction.
* If each instruction can only circle once, then for a loop of N iterations: the circled instructions form the prologue; the remaining instructions form the epilogue; the loop is executed N-1 times.

Circular Scheduling Contd..
[Figure: iterations I1, I2, …, IN. Labels: Prologue (circled instructions), Loop (N-1 iterations), Epilogue (non-circled instructions).]

Circular Scheduling Contd..
[Figure: ramp-up and ramp-down effect, before and after.]

Circular Scheduling Contd..
for (i=0; i<N; i=i+1) X[i] := X[i] + C

-- initialization
F8 ← C
R3 ← 0
R2 ← N
-- loop body
Loop: F4 ← 0(R3)
      R3 ← R3+1
      F6 ← F4+F8
      BNE R3,R2,Loop
      -1(R3) ← F6

Circular Scheduling Contd..
After circular scheduling:
-- initialization
F8 ← C
R3 ← 0
R2 ← N
-- prologue
      R3 ← R3+1
      BEQ R3,R2,Lend
      F4 ← -1(R3)
-- loop body
Loop: F6 ← F4+F8
      F4 ← 0(R3)
      R3 ← R3+1
      BNE R3,R2,Loop
      -2(R3) ← F6
-- epilogue
Lend: F6 ← F4+F8
      -1(R3) ← F6

Original loop (circled instructions: F4 ← 0(R3) and R3 ← R3+1):
-- initialization
F8 ← C
R3 ← 0
R2 ← N
-- loop body
Loop: F4 ← 0(R3)
      R3 ← R3+1
      F6 ← F4+F8
      BNE R3,R2,Loop
      -1(R3) ← F6
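One way to convince yourself that the transformed loop computes the same result is to simulate both versions. The Python sketch below is not part of the slides. It assumes that the instruction written after each branch executes in the branch delay slot (so it runs whether or not the branch is taken), that R3 indexes elements of X in steps of 1, and that N >= 1. The function names are hypothetical. The epilogue store uses offset -1, because at that point R3 is one past the element whose value is held in F4.

```python
def original_loop(x, c):
    """Simulate the original loop body: X[i] := X[i] + C for N >= 1."""
    x = list(x)
    f8, r3, r2 = c, 0, len(x)
    while True:
        f4 = x[r3]            # F4 <- 0(R3)
        r3 += 1               # R3 <- R3+1
        f6 = f4 + f8          # F6 <- F4+F8
        taken = r3 != r2      # BNE R3,R2,Loop
        x[r3 - 1] = f6        # -1(R3) <- F6 (assumed delay slot)
        if not taken:
            return x


def circular_loop(x, c):
    """Simulate the loop after the load and the increment have been circled."""
    x = list(x)
    f8, r3, r2 = c, 0, len(x)
    # prologue: the circled instructions of the first iteration
    r3 += 1                   # R3 <- R3+1
    skip = r3 == r2           # BEQ R3,R2,Lend
    f4 = x[r3 - 1]            # F4 <- -1(R3) (assumed delay slot)
    while not skip:
        f6 = f4 + f8          # F6 <- F4+F8   (iteration i)
        f4 = x[r3]            # F4 <- 0(R3)   (iteration i+1)
        r3 += 1               # R3 <- R3+1    (iteration i+1)
        skip = r3 == r2       # BNE R3,R2,Loop falls through when equal
        x[r3 - 2] = f6        # -2(R3) <- F6 (assumed delay slot)
    # epilogue: the non-circled instructions of the last iteration
    f6 = f4 + f8              # F6 <- F4+F8
    x[r3 - 1] = f6            # store back the last element
    return x
```

Both functions should return the same list for any non-empty input, including the single-element case that takes the BEQ straight to the epilogue.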

Algorithm
1. Apply basic block scheduling to the loop; if no stalls are present, use the schedule; otherwise continue.
2. If the loop has no procedure calls and no if-statements, then perform circular scheduling; otherwise give up.
3. Select one of the root nodes of the DAG for circling; choose one on the longest path (a simple heuristic).
4. Rebuild the DAG assuming the circling has been performed.
5. If no stalls are present, use the current schedule; else if there are more stalls than before, use the previous schedule; else repeat steps 3 and 4 to remove additional stalls.
6. Create the prologue and epilogue; alter the number of times the loop body is executed.
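The iteration in steps 1 and 3 to 5 can be sketched in Python as follows. This is a simplified illustration, not the algorithm as implemented in any compiler. It assumes a single-issue machine, models only register dependences, does not adjust address offsets when an instruction is circled, ignores latencies that cross the loop back edge, and lets each instruction circle at most once. The `Instr` record, the latencies in the example body, and all function names are hypothetical.

```python
from collections import namedtuple

# Hypothetical instruction record: destination register (or None),
# source registers, and an assumed result latency in cycles.
Instr = namedtuple("Instr", "name dest srcs latency")


def build_dag(body):
    """Map each index to {predecessor index: minimum issue distance}."""
    preds = {i: {} for i in range(len(body))}
    for i, cur in enumerate(body):
        for j in range(i):
            prev = body[j]
            raw = prev.dest is not None and prev.dest in cur.srcs
            war = cur.dest is not None and cur.dest in prev.srcs
            waw = cur.dest is not None and cur.dest == prev.dest
            if raw:
                preds[i][j] = prev.latency
            elif war or waw:
                preds[i][j] = 1
    return preds


def heights(body, preds):
    """Longest latency-weighted path from each node to a leaf."""
    h = [ins.latency for ins in body]
    for i in reversed(range(len(body))):
        for j in preds[i]:
            h[j] = max(h[j], body[j].latency + h[i])
    return h


def schedule(body):
    """Single-issue list scheduling; returns (issue order, stall count)."""
    preds = build_dag(body)
    h = heights(body, preds)
    issued, order, cycle = {}, [], 0
    while len(issued) < len(body):
        ready = [i for i in range(len(body)) if i not in issued and
                 all(j in issued and issued[j] + d <= cycle
                     for j, d in preds[i].items())]
        if ready:
            pick = max(ready, key=lambda i: h[i])
            issued[pick] = cycle
            order.append(body[pick].name)
        cycle += 1
    return order, cycle - len(body)


def circular_schedule(body):
    """Circle DAG roots while the stall count does not get worse."""
    _, stalls = schedule(body)                      # step 1
    circled = []
    while stalls > 0:
        preds = build_dag(body)
        h = heights(body, preds)
        roots = [i for i in range(len(body))
                 if not preds[i] and body[i].name not in circled]
        if not roots:
            break
        r = max(roots, key=lambda i: h[i])          # step 3: root on longest path
        candidate = body[:r] + body[r + 1:] + [body[r]]
        _, new_stalls = schedule(candidate)         # step 4: rebuild and reschedule
        if new_stalls > stalls:
            break                                   # step 5: keep previous schedule
        circled.append(body[r].name)
        body, stalls = candidate, new_stalls
    return body, circled, stalls


# Example body with assumed latencies (load 3, add 2, others 1).
body = [
    Instr("ld",  "F4", ("R3",),      3),
    Instr("inc", "R3", ("R3",),      1),
    Instr("add", "F6", ("F4", "F8"), 2),
    Instr("st",  None, ("F6", "R3"), 1),
]
```

Calling `circular_schedule(body)` returns the reordered body, the names of the circled instructions, and the remaining stall count. Creating the prologue and epilogue (step 6) is left out of this sketch.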

Register Renaming
Since register allocation is done prior to circular scheduling, dependences due to register usage may inhibit code motion.
Solution: perform register renaming during circular scheduling.
Example: Def R1, Use R1, Def R1, Use R1 vs. Def R1, Use R1, Def R2, Use R2

Register Renaming Contd..
1. Identify registers that are not live at the beginning and the end of the basic block: these registers form the pool of temporary registers available for use during renaming.
2. Ignore dependences due to reuse of registers while building the DAG.
3. Pick an instruction.
   * If the instruction uses a temporary register, replace that register with the new register (from the pool) that was assigned when the Def corresponding to this Use was processed. If this is the last use, put the register back in the available pool.

Register Renaming Contd..
   * If the instruction defines a temporary register, a new register is chosen from the available pool of registers.
   * Repeat the above steps until the basic block has been scheduled.
To avoid running out of registers, given two candidate instructions, first select an instruction that does not need a new register or that frees up a temporary register.
If renaming fails, give up and use the previous schedule.
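The renaming steps can be sketched in Python as below. This is a minimal illustration under assumptions, not the slides' implementation: instructions are reduced to `(dest, srcs)` pairs, `order` is a schedule (a permutation of instruction indices) that already respects true dependences, and the function name `rename` is hypothetical. The candidate-selection heuristic for avoiding register exhaustion is not modeled; the sketch simply returns `None` when the pool runs dry.

```python
def rename(block, order, temps):
    """Rename temporary registers while emitting `block` in `order`.

    block: list of (dest, srcs) in original program order (dest may be None).
    order: scheduled permutation of indices into block.
    temps: registers not live at the start or end of the block.
    Returns the renamed instructions in scheduled order, or None on failure.
    """
    # In original order, find the Def that reaches each Use of a temporary.
    last_def, reaching, uses_left = {}, {}, {}
    for i, (dest, srcs) in enumerate(block):
        for r in dict.fromkeys(srcs):          # de-duplicate, keep order
            if r in temps and r in last_def:
                reaching[(i, r)] = last_def[r]
                uses_left[last_def[r]] += 1
        if dest in temps:
            last_def[dest] = i
            uses_left[i] = 0

    pool = sorted(temps)                       # available temporaries
    assigned = {}                              # Def index -> new register
    out = []
    for i in order:
        dest, srcs = block[i]
        mapping = {}
        for r in dict.fromkeys(srcs):
            d = reaching.get((i, r))
            if d is None:
                continue                       # not a renamed temporary
            mapping[r] = assigned[d]           # register chosen at the Def
            uses_left[d] -= 1
            if uses_left[d] == 0:
                pool.append(assigned[d])       # last use: back to the pool
        new_srcs = tuple(mapping.get(r, r) for r in srcs)
        if dest in temps:
            if not pool:
                return None                    # renaming fails
            assigned[i] = pool.pop(0)
            dest = assigned[i]
            if uses_left[i] == 0:
                pool.append(dest)              # Def with no uses in the block
        out.append((dest, new_srcs))
    return out


# Def R1, Use R1, Def R1, Use R1 with both Defs scheduled first.
block = [("R1", ()), (None, ("R1",)), ("R1", ()), (None, ("R1",))]
renamed = rename(block, [0, 2, 1, 3], {"R1", "R2"})
```

With two temporaries available, the second Def and its Use end up on R2, so the two Defs can be scheduled back to back. With only R1 in the pool, the same schedule makes `rename` return `None`, which corresponds to giving up and keeping the previous schedule.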
