
CS 6461: Computer Architecture. Basic Compiler Techniques for Exposing ILP. Instructor: Morris Lancaster. Corresponding to Hennessy and Patterson, Fifth Edition.


1 CS 6461: Computer Architecture
Basic Compiler Techniques for Exposing ILP
Instructor: Morris Lancaster
Corresponding to Hennessy and Patterson, Fifth Edition, Section 3.2

2 January 2013, CS 6461 Compiler Based Scheduling
Basic Compiler Techniques for Exposing ILP
Crucial for processors that use static issue, and important for processors that make dynamic issue decisions but use static scheduling

3 Basic Pipeline Scheduling and Loop Unrolling
Exploiting parallelism among instructions
 –Finding sequences of unrelated instructions that can be overlapped in the pipeline
 –Separating a dependent instruction from its source instruction by a distance in clock cycles equal to the pipeline latency of the source instruction, which avoids the stall
The compiler works from knowledge of the amount of ILP available in the program and of the latencies of the functional units within the pipeline
 –This couples the compiler to the hardware, sometimes to the specific chip version, or at least requires setting appropriate compiler flags

4 Assumed Latencies
Instruction Producing Result   Instruction Using Result   Latency in Clock Cycles (needed to avoid stall)
FP ALU op                      Another FP ALU op          3
FP ALU op                      Store double               2
Load double                    FP ALU op                  1
Load double                    Store double               0
The result of the load can be bypassed to the store without stalling it
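The latency table can be read as a lookup from a (producer, consumer) pair to the number of cycles that must separate the two instructions. The following is a minimal Python sketch of that reading; the names `LATENCY` and `stalls_needed` are illustrative and do not come from the slides, and pairs missing from the table are assumed to need no separation.

```python
# Latency table from the slide: (producer kind, consumer kind) -> cycles
# that must separate the two instructions to avoid a stall.
LATENCY = {
    ("fp_alu", "fp_alu"): 3,
    ("fp_alu", "store"): 2,
    ("load", "fp_alu"): 1,
    ("load", "store"): 0,
}


def stalls_needed(producer, consumer, gap):
    """Stall cycles when `gap` independent instructions sit between the pair.

    Assumption for this sketch: pairs not listed in the table need no
    separation.
    """
    return max(0, LATENCY.get((producer, consumer), 0) - gap)


print(stalls_needed("load", "fp_alu", 0))   # L.D followed directly by ADD.D
print(stalls_needed("fp_alu", "store", 0))  # ADD.D followed directly by S.D
print(stalls_needed("fp_alu", "store", 2))  # two instructions placed in between
```

Filling the gap with independent instructions is what the scheduler does; the stall count drops to zero once the gap reaches the table entry.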

5 Basic Pipeline Scheduling and Loop Unrolling (cont)
Assume the standard 5-stage integer pipeline
 –Branches have a delay of one clock cycle
Functional units are fully pipelined or replicated (as many times as the pipeline depth)
 –An operation of any type can be issued on every clock cycle and there are no structural hazards

6 Basic Pipeline Scheduling and Loop Unrolling (cont)
Sample code
  for (i = 1000; i > 0; i = i - 1)
      x[i] = x[i] + s;
MIPS code
  Loop: L.D    F0,0(R1)     ;F0 = array element
        ADD.D  F4,F0,F2     ;add scalar in F2
        S.D    F4,0(R1)     ;store back
        DADDUI R1,R1,#-8    ;decrement index by 8 bytes
        BNE    R1,R2,Loop   ;R2 is precomputed so that
                            ;8(R2) is the last element
                            ;to be computed

7 Basic Pipeline Scheduling and Loop Unrolling (cont)
MIPS code, unscheduled
                              Clock cycle issued
  Loop: L.D    F0,0(R1)       1
        stall                 2
        ADD.D  F4,F0,F2       3
        stall                 4
        stall                 5
        S.D    F4,0(R1)       6
        DADDUI R1,R1,#-8      7
        stall                 8
        BNE    R1,R2,Loop     9
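The cycle counts in the unscheduled listing can be reproduced with a small model of a single-issue, in-order pipeline. This is a sketch under stated assumptions, not the course's tool: the function name `issue_cycles` and the instruction encoding are invented here, the four FP and memory latencies come from the Assumed Latencies slide, and the one-cycle gap between an integer ALU operation and a dependent branch is inferred from the stall shown between DADDUI and BNE.

```python
# Assumed latencies. The ("int_alu", "branch") entry is inferred from the
# stall between DADDUI and BNE in the listing; the rest are from the slides.
LATENCY = {
    ("load", "fp_alu"): 1,
    ("load", "store"): 0,
    ("fp_alu", "fp_alu"): 3,
    ("fp_alu", "store"): 2,
    ("int_alu", "branch"): 1,
}


def issue_cycles(program):
    """Return the issue cycle of each instruction, one issue per cycle, in order.

    Each instruction is (kind, dest_register_or_None, [source_registers]).
    Only read-after-write dependences inside the sequence are tracked.
    """
    last_writer = {}  # register -> (kind, issue cycle) of its latest producer
    cycles = []
    cycle = 0
    for kind, dest, sources in program:
        earliest = cycle + 1
        for reg in sources:
            if reg in last_writer:
                prod_kind, prod_cycle = last_writer[reg]
                gap = LATENCY.get((prod_kind, kind), 0)
                earliest = max(earliest, prod_cycle + gap + 1)
        cycle = earliest
        cycles.append(cycle)
        if dest is not None:
            last_writer[dest] = (kind, cycle)
    return cycles


unscheduled = [
    ("load", "F0", ["R1"]),          # L.D    F0,0(R1)
    ("fp_alu", "F4", ["F0", "F2"]),  # ADD.D  F4,F0,F2
    ("store", None, ["F4", "R1"]),   # S.D    F4,0(R1)
    ("int_alu", "R1", ["R1"]),       # DADDUI R1,R1,#-8
    ("branch", None, ["R1", "R2"]),  # BNE    R1,R2,Loop
]

print(issue_cycles(unscheduled))  # [1, 3, 6, 7, 9]
```

The gaps in the printed issue cycles correspond to the stall rows in the listing: one after the load, two after the FP add, and one before the branch.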

8 Rescheduling Gives
Sample code
  for (i = 1000; i > 0; i = i - 1)
      x[i] = x[i] + s;
MIPS code
                              Clock cycle issued
  Loop: L.D    F0,0(R1)       1
        DADDUI R1,R1,#-8      2
        ADD.D  F4,F0,F2 *     3
        stall                 4
        stall                 5
        S.D    F4,8(R1) *     6
        BNE    R1,R2,Loop     7
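Moving DADDUI ahead of the S.D is only legal if the store offset is adjusted so that the same memory location is written. A minimal Python check of that arithmetic follows; the function names are illustrative and the starting addresses are arbitrary.

```python
def store_address_original(r1):
    """Address written by S.D F4,0(R1) when it runs before DADDUI."""
    return r1 + 0


def store_address_rescheduled(r1):
    """Address written by S.D F4,8(R1) when DADDUI R1,R1,#-8 runs first."""
    r1 = r1 - 8    # DADDUI R1,R1,#-8 has already executed
    return r1 + 8  # the offset of 8 compensates for the decrement


# Arbitrary sample addresses; the two orderings must always agree.
for r1 in (8, 4096, 8000):
    assert store_address_original(r1) == store_address_rescheduled(r1)
print("offset adjustment preserves the store address")
```

The adjusted offset is simply the original offset minus the amount added to R1 by the instruction that was moved above the store: 0 - (-8) = 8.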

9 Unrolling Summary (continued)
Simple Unroll
  Loop: L.D    F0,0(R1)
        ADD.D  F4,F0,F2
        S.D    F4,0(R1)
        L.D    F0,-8(R1)
        ADD.D  F4,F0,F2
        S.D    F4,-8(R1)
        L.D    F0,-16(R1)
        ADD.D  F4,F0,F2
        S.D    F4,-16(R1)
        L.D    F0,-24(R1)
        ADD.D  F4,F0,F2
        S.D    F4,-24(R1)
        DADDUI R1,R1,#-32
        BNE    R1,R2,Loop
Slide labels: Name Dependences, Data Dependences

10 Unrolling and Renaming Gives
MIPS code
  Loop: L.D    F0,0(R1)
        ADD.D  F4,F0,F2     ;we have a stall coming
        S.D    F4,0(R1)
        L.D    F6,-8(R1)
        ADD.D  F8,F6,F2
        S.D    F8,-8(R1)
        L.D    F10,-16(R1)
        ADD.D  F12,F10,F2
        S.D    F12,-16(R1)
        L.D    F14,-24(R1)
        ADD.D  F16,F14,F2
        S.D    F16,-24(R1)
        DADDUI R1,R1,#-32
        BNE    R1,R2,Loop
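The renamed listing follows a regular pattern: each copy of the body gets its own pair of FP registers and an offset 8 bytes lower than the previous copy, and the single DADDUI subtracts 8 per copy. The Python sketch below generates that text; `unroll_with_renaming` is a hypothetical helper, and its register numbering is chosen only to match the registers used on the slide.

```python
def unroll_with_renaming(copies):
    """Emit the unrolled loop body as MIPS text, with fresh FP registers per copy.

    Register numbering is an assumption chosen to match the slide:
    F0/F4, F6/F8, F10/F12, F14/F16.
    """
    lines = []
    for k in range(copies):
        load_reg = 0 if k == 0 else 4 * k + 2
        sum_reg = 4 * k + 4
        offset = -8 * k
        lines.append(f"L.D    F{load_reg},{offset}(R1)")
        lines.append(f"ADD.D  F{sum_reg},F{load_reg},F2")
        lines.append(f"S.D    F{sum_reg},{offset}(R1)")
    # One pointer update and one branch serve all copies.
    lines.append(f"DADDUI R1,R1,#{-8 * copies}")
    lines.append("BNE    R1,R2,Loop")
    return lines


for line in unroll_with_renaming(4):
    print(line)
```

With four copies the output has 14 instructions, 12 for the bodies plus the shared DADDUI and BNE, instead of the 20 that four separate iterations would execute.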

11 Unrolling and Removing Hazards Gives
MIPS code
  Loop: L.D    F0,0(R1)     ;total of 14 clock cycles
        L.D    F6,-8(R1)
        L.D    F10,-16(R1)
        L.D    F14,-24(R1)
        ADD.D  F4,F0,F2
        ADD.D  F8,F6,F2
        ADD.D  F12,F10,F2
        ADD.D  F16,F14,F2
        S.D    F4,0(R1)
        S.D    F8,-8(R1)
        DADDUI R1,R1,#-32
        S.D    F12,16(R1)
        S.D    F16,8(R1)
        BNE    R1,R2,Loop
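The last two stores in the scheduled listing execute after DADDUI, so their offsets are 16 and 8 rather than -16 and -24. The Python sketch below checks that the scheduled code still writes the same four addresses as four iterations of the original loop, and works out the cycles per element from the 14-cycle total. The function names and the starting address are illustrative.

```python
def original_store_addresses(r1, iterations=4):
    """Addresses written by S.D F4,0(R1) over successive original iterations."""
    addresses = []
    for _ in range(iterations):
        addresses.append(r1 + 0)  # S.D F4,0(R1)
        r1 -= 8                   # DADDUI R1,R1,#-8
    return addresses


def scheduled_store_addresses(r1):
    """Addresses written by the four stores in the scheduled, unrolled loop."""
    addresses = [r1 + 0, r1 - 8]     # S.D F4,0(R1); S.D F8,-8(R1)
    r1 -= 32                         # DADDUI R1,R1,#-32
    addresses += [r1 + 16, r1 + 8]   # S.D F12,16(R1); S.D F16,8(R1)
    return addresses


r1 = 8000  # arbitrary starting address for the check
assert original_store_addresses(r1) == scheduled_store_addresses(r1)

# 14 clock cycles cover four array elements.
print(14 / 4)  # 3.5 cycles per element
```

For comparison, the single-iteration listings earlier in the deck take 9 cycles per element unscheduled and 7 after rescheduling.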

12 Unrolling Summary for Above
Determine that it was legal to move the S.D after the DADDUI and BNE, and find the amount by which to adjust the S.D offset
Determine that unrolling the loop would be useful by finding that the loop iterations were independent, except for the loop maintenance code
Use different registers to avoid unnecessary constraints that would be forced by using the same registers
Eliminate the extra test and branch instructions and adjust the loop termination and iteration code
Determine that the loads and stores can be interchanged by determining that the loads and stores from different iterations are independent
Schedule the code, preserving any dependences
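At the source level, the steps above amount to replicating the loop body, giving each copy its own temporary, and keeping a single test per group of copies. The Python sketch below illustrates this for the sample loop; it assumes the trip count is a multiple of 4 (as 1000 is), and the function names are invented for illustration.

```python
def add_scalar(x, s, n=1000):
    """Original loop: for (i = n; i > 0; i = i - 1) x[i] = x[i] + s."""
    for i in range(n, 0, -1):
        x[i] = x[i] + s


def add_scalar_unrolled(x, s, n=1000):
    """Loop unrolled four times; assumes n is a multiple of 4."""
    for i in range(n, 0, -4):
        # Separate temporaries play the role of the renamed registers.
        t0 = x[i] + s
        t1 = x[i - 1] + s
        t2 = x[i - 2] + s
        t3 = x[i - 3] + s
        # Iterations are independent, so the stores can follow all the adds.
        x[i] = t0
        x[i - 1] = t1
        x[i - 2] = t2
        x[i - 3] = t3


a = [float(i) for i in range(1001)]  # index 0 is unused, as in x[1..1000]
b = list(a)
add_scalar(a, 2.5)
add_scalar_unrolled(b, 2.5)
assert a == b
print("unrolled loop matches the original")
```

When the trip count is not a multiple of the unroll factor, a compiler typically adds a short cleanup loop for the leftover iterations; that case is left out of this sketch.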

13 Unrolling Summary (continued)
Example on Page 311 shows the steps
  Loop: L.D    F0,0(R1)
        ADD.D  F4,F0,F2
        S.D    F4,0(R1)
        L.D    F0,-8(R1)
        ADD.D  F4,F0,F2
        S.D    F4,-8(R1)
        L.D    F0,-16(R1)
        ADD.D  F4,F0,F2
        S.D    F4,-16(R1)
        L.D    F0,-24(R1)
        ADD.D  F4,F0,F2
        S.D    F4,-24(R1)
        DADDUI R1,R1,#-32
        BNE    R1,R2,Loop
Slide labels: Name Dependences, Data Dependences

14 Unrolling Summary (Renaming)
Example on Page 311 shows the steps
  Loop: L.D    F0,0(R1)
        ADD.D  F4,F0,F2
        S.D    F4,0(R1)
        L.D    F6,-8(R1)
        ADD.D  F8,F6,F2
        S.D    F8,-8(R1)
        L.D    F10,-16(R1)
        ADD.D  F12,F10,F2
        S.D    F12,-16(R1)
        L.D    F14,-24(R1)
        ADD.D  F16,F14,F2
        S.D    F16,-24(R1)
        DADDUI R1,R1,#-32
        BNE    R1,R2,Loop
Slide labels: Name Dependences, Data Dependences

15 Unrolling Summary (continued)
Limits to Impacts of Unrolling Loops
 –Each additional unrolling amortizes less of the loop overhead, so the improvement per unroll shrinks
 –Growth in code size
 –Shortfall in available registers (register pressure)
   Scheduling the code to increase ILP causes the number of live values to increase
   This could generate a shortage of registers and negatively impact the optimization
Loop unrolling is useful in a variety of processors today
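The first and third limits can be made concrete with a little arithmetic. The sketch below assumes that, for an unroll factor k of 2 or more, the scheduler finds a stall-free ordering of the 3k body instructions plus the DADDUI and BNE (as in the 14-cycle schedule for k = 4), and that each copy uses two FP registers plus the shared scalar in F2. Both are assumptions for this loop only, and the function names are illustrative.

```python
def cycles_per_element(k):
    """Cycles per array element, assuming a stall-free schedule (k >= 2)."""
    return (3 * k + 2) / k  # 3 instructions per copy + DADDUI + BNE


def fp_registers_needed(k):
    """FP registers under the slide's renaming: two per copy, plus F2."""
    return 2 * k + 1


for k in (2, 4, 8, 16):
    print(k, cycles_per_element(k), fp_registers_needed(k), 3 * k + 2)
```

Going from k = 2 to k = 4 saves 0.5 cycles per element, while going from k = 8 to k = 16 saves only 0.125, yet the register count and the code size (last column, in instructions) keep growing linearly.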

