1
VLIW Compilation Techniques in a Superscalar Environment
Kemal Ebcioglu, Randy D. Groves, Ki-Chang Kim, Gabriel M. Silberman and Isaac Ziv
PLDI 1994
Presented by Jason Horihan
2
Why do we need a special compiler when we have “Super Beast” superscalar processors that extract ILP for us?
- Processor hardware can only look ahead a small distance to extract ILP.
- Branch prediction is not perfect and can only take us so far.
3
VLIW Scheduling Techniques
- Speculative Load/Store Motion out of Loops
- Unspeculation
- Scheduling
- Limited Combining
- Basic Block Expansion
- Prolog Tailoring
All of these are implemented at the code generation stage of the compiler.
4
Speculative Load/Store Motion out of Loops
Loads and stores can be moved if:
1. Within each group of loads and stores:
- Each instruction uses the same base register
- Each instruction has the same displacement from this base
- Each instruction operates on identical operand data length and type
5
2. The base register of each group is not written to in the loop.
3. There is no overlap between the group operands and any other memory reference in the loop.
4. On every path to the entrance of the loop, there is either a load of an address constant into the base register or a load or store to the same location, to ensure “safe” operation.
6
Original Code:
    Ld r4, a(r2)
    ….
L1: Ld r12,a(r2)
    Ai r12,r12,6
    St r12,a(r2)
    …..
    Br L1

Transformed Code:
    Ld r4, a(r2)
    ….
    Ld r10,a(r2)
L1: Mv r12,r10
    Ai r12,r12,6
    Mv r10,r12
    …..
    Br L1
    St r10, a(r2)
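As a rough illustration, the two loops can be simulated in Python to check that they leave memory in the same state. This is only a sketch: the dictionary `mem`, the function names, and the access counters are made up for the example, and it assumes the second move in the transformed loop copies r12 back into r10.

```python
def run_original(mem, iterations):
    """Original loop: one load and one store per iteration."""
    accesses = 0
    for _ in range(iterations):
        r12 = mem["a"]          # L1: Ld r12,a(r2)
        accesses += 1
        r12 = r12 + 6           # Ai r12,r12,6
        mem["a"] = r12          # St r12,a(r2)
        accesses += 1
    return accesses


def run_transformed(mem, iterations):
    """Transformed loop: load hoisted above the loop, store sunk below it."""
    accesses = 0
    r10 = mem["a"]              # Ld r10,a(r2), executed once before the loop
    accesses += 1
    for _ in range(iterations):
        r12 = r10               # L1: Mv r12,r10 (replaces the load)
        r12 = r12 + 6           # Ai r12,r12,6
        r10 = r12               # Mv r10,r12 (replaces the store)
    mem["a"] = r10              # St r10,a(r2), executed once after the loop
    accesses += 1
    return accesses


m1, m2 = {"a": 1}, {"a": 1}
print(run_original(m1, 5), run_transformed(m2, 5), m1 == m2)
```

Note that the hoisted load and the sunk store execute even if the loop body never runs, which is why the motion is speculative and why the safety conditions matter.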
7
Unspeculation
- Instructions moved above conditional branches to improve performance can lower performance when execution goes down the path where the speculative instructions were not needed.
- Moving some of these speculative instructions down into one of the paths can increase performance.
8
To perform unspeculation on an instruction (or a group of instructions), the following conditions must be met:
1. The destination register(s) of the speculative group must ALL be dead on one of the paths.
2. Any instructions between the speculative instruction and the conditional branch must not define or use any of the registers used in the speculative instructions.
3. The instructions cannot have side effects.
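The three conditions can be sketched as a small Python predicate. This is an illustration, not the compiler's actual data structures: the `Instr` record, the `can_unspeculate` function, and the precomputed set of registers live on the candidate path are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    defs: frozenset = frozenset()   # registers written
    uses: frozenset = frozenset()   # registers read
    side_effect: bool = False       # e.g. a store or a call


def can_unspeculate(group, between, live_on_path):
    """Check whether `group` may be moved below the branch onto the other path.

    group        -- speculative instructions sitting above the branch
    between      -- instructions between the group and the conditional branch
    live_on_path -- registers live on the path that does NOT need the group
    """
    group_defs = frozenset().union(*(i.defs for i in group))
    group_regs = group_defs.union(*(i.uses for i in group))

    # 1. Every destination register must be dead on one of the paths.
    if not group_defs.isdisjoint(live_on_path):
        return False
    # 2. Nothing in between may define or use a register the group touches.
    for ins in between:
        if not group_regs.isdisjoint(ins.defs | ins.uses):
            return False
    # 3. The moved instructions must have no side effects.
    return not any(i.side_effect for i in group)


load = Instr("Ld r5,0(r6)", frozenset({"r5"}), frozenset({"r6"}))
other = Instr("Ai r7,r7,1", frozenset({"r7"}), frozenset({"r7"}))
print(can_unspeculate([load], [other], live_on_path={"r7"}))
```

Here the load can be pushed down because r5 is dead on the path that does not need it and the intervening add touches neither r5 nor r6.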
9
Scheduling
- Loop Unrolling
- Renaming
- Global Scheduling
- Software Pipelining
10
Limited Combining
Similar to value numbering, but spans multiple blocks.
1. Starts with a load immediate or a move register instruction.
2. Searches the sequence of following instructions, following unconditional jumps, until a last use is found.
3. The source and destination registers of the starting instruction cannot be set in the sequence.
11
- If the search succeeds, the entire sequence of instructions, from the instruction after the starting instruction to the last use instruction, is inserted in place of the starting instruction.
- Occurrences of the destination register of the starting instruction are replaced with its source register.
- A branch from the “new” last use instruction is inserted to jump to the instruction after the “old” last use instruction.
12
Original Code
    Mv r5, r4
    ….
    Br L3
    ….
L3: Ld r3, 4(r5)
    ….
    Br L4
    ….
L4: Ld r7, 8(r5)

Transformed Code
    ….
    Ld r3, 4(r4)
    ….
    Ld r7, 8(r4)
    Br L10
L3: Ld r3, 4(r5)
    ….
    Br L4
    ….
L4: Ld r7, 8(r5)
L10:
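The search and rename steps might look roughly like the sketch below. It handles only the move-register case and works on a trace that has already been linearized by following the unconditional jumps. The `Instr` tuple, the `limited_combine` name, and the simplification that the trace ends where the destination register is dead are assumptions for the example.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str
    dest: Optional[str]   # register written, or None
    srcs: tuple           # registers read
    disp: int = 0         # displacement for loads, unused otherwise


def limited_combine(start, trace):
    """Return the renamed copy of the sequence up to the last use, or None.

    start -- the starting instruction, here a register move `Mv dst, src`
    trace -- the instructions that follow it along unconditional jumps
    """
    if start.op != "Mv":
        return None
    dst, src = start.dest, start.srcs[0]

    # Find the last instruction in the trace that reads the move's destination.
    last_use = None
    for i, ins in enumerate(trace):
        if dst in ins.srcs:
            last_use = i
    if last_use is None:
        return None

    seq = trace[: last_use + 1]
    # Neither register of the starting instruction may be set in the sequence.
    if any(ins.dest in (dst, src) for ins in seq):
        return None

    # Copy the sequence, replacing the destination with the source register.
    return [
        ins._replace(srcs=tuple(src if r == dst else r for r in ins.srcs))
        for ins in seq
    ]


start = Instr("Mv", "r5", ("r4",))
trace = [Instr("Ld", "r3", ("r5",), 4), Instr("Ld", "r7", ("r5",), 8)]
for ins in limited_combine(start, trace):
    print(ins)
```

The returned copy is what would be inserted in place of the move, followed by a branch to the instruction after the old last use (L10 in the slide).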
13
Basic Block Expansion
- Main goal is to eliminate unconditional jumps at the end of some basic blocks.
- Begin by copying instructions at the target of the unconditional branch and inserting them before the unconditional branch.
- When enough consecutive non-branch instructions have been gathered, the copy stops.
14
Original Code
     ….
     Bz r1, L1
     Op
     Br L2
     ….
L2:  Bz r3, Lx
     Op1
     Op2
     Br L2
L3:

Transformed Code
     ….
     Bz r1, L1
     Op
     Bz r3, Lx
     Op1
     Op2
     Br L2
     ….
L2:  Bz r3, Lx
     Op1
     Op2
L2a: Br L2
L3:
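A minimal sketch of the copying step, assuming instructions are plain strings, `Br` is the only unconditional branch, and `Bz` is a conditional branch. The `window` threshold, the function names, and the `L2+n` notation for a resume point are hypothetical; the slides do not say how many instructions count as "enough".

```python
def is_uncond(ins):
    return ins.split()[0] == "Br"


def is_branch(ins):
    return ins.split()[0] in ("Br", "Bz")


def copy_from_target(target, window=2):
    """Copy instructions from the branch target.

    Stops at an unconditional branch, or once `window` consecutive
    non-branch instructions have been gathered.
    Returns (copied instructions, index in target where execution resumes).
    """
    copied, run, i = [], 0, 0
    while i < len(target) and not is_uncond(target[i]) and run < window:
        ins = target[i]
        copied.append(ins)
        run = 0 if is_branch(ins) else run + 1
        i += 1
    return copied, i


def expand_block(block, target, target_label, window=2):
    """Replace the trailing `Br target_label` of `block` with copied code."""
    assert is_uncond(block[-1])
    copied, resume = copy_from_target(target, window)
    if resume < len(target) and is_uncond(target[resume]):
        tail = target[resume]                    # forward through that branch
    else:
        tail = f"Br {target_label}+{resume}"     # hypothetical resume label
    return block[:-1] + copied + [tail]


block = ["Bz r1, L1", "Op", "Br L2"]
target = ["Bz r3, Lx", "Op1", "Op2", "Br L2"]
print(expand_block(block, target, "L2"))
```

With these inputs the expanded block matches the transformed code in the slide: the copied `Bz`, `Op1`, `Op2` followed by a branch back to L2.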
15
Prolog Tailoring
- When entering and exiting a procedure, registers must be saved and restored in the prolog and epilog.
- Prolog Tailoring delays the saving of the registers until absolutely necessary.
- This shortens the execution path and only saves what is necessary for a given path.
- Exception handlers must be changed.
16
Prolog Tailoring Algorithm:
1. Generate a “MustKill” set for each node in the program graph.
2. If, at a given node, a register that hasn’t been saved before will definitely be killed, code must be generated to save this register.
17
Before prolog tailoring:
    Proc p1
    save r1,r2,r3,r4
    ….
    Ld r2,...
    Ld r1,…
    ….
    restore r1,r2
    return
L1: ld r3,..
    ….
    ld r4,..
    ld r3,…
    ….
    ….
    restore r3,r4
    return

After prolog tailoring:
    Proc p1
    ….
    save r1,r2
    Ld r2,...
    Ld r1,…
    ….
    restore r1,r2
    return
L1: save r3
    ld r3,..
    ….
    save r4
    ld r4,..
    ld r3,…
    restore r4
    ….
    restore r3
    return
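One way to read the two algorithm steps is as a backward dataflow problem. The sketch below assumes MustKill(n) is the set of registers killed at n plus those killed on every path leaving n. That definition, the node names, and the graph shape (a guess at the control flow hidden behind the "…." lines) are assumptions for illustration, not taken from the paper. Save placement here also assumes an acyclic graph in which any join node needs no further saves.

```python
def must_kill(kill, succ):
    """MustKill(n) = kill(n) | intersection of MustKill over successors."""
    all_regs = set().union(*kill.values())
    # Start optimistic (everything) at interior nodes and shrink to a fixed point.
    mk = {n: set(kill[n]) if not succ[n] else set(all_regs) for n in kill}
    changed = True
    while changed:
        changed = False
        for n in kill:
            if not succ[n]:
                continue
            new = set(kill[n]) | set.intersection(*(mk[s] for s in succ[n]))
            if new != mk[n]:
                mk[n], changed = new, True
    return mk


def place_saves(entry, kill, succ):
    """At each node, save the must-kill registers not already saved above it."""
    mk = must_kill(kill, succ)
    saves = {}

    def visit(node, saved):
        need = mk[node] - saved
        saves[node] = need
        for s in succ[node]:
            visit(s, saved | need)

    visit(entry, frozenset())
    return saves


# Hypothetical graph loosely modeled on procedure p1.
kill = {
    "entry": set(),
    "A": {"r1", "r2"},       # the path that loads r1 and r2, then returns
    "L1": {"r3"},            # first load of r3
    "B": {"r3", "r4"},       # the region that also loads r4
    "exit": set(),
}
succ = {"entry": ["A", "L1"], "A": [], "L1": ["B", "exit"], "B": ["exit"], "exit": []}
print(place_saves("entry", kill, succ))
```

Under these assumptions nothing is saved at entry, r1 and r2 are saved only on their own path, r3 is saved at L1, and r4 is saved only once the region that kills it is reached.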
18
Results
SPECint92 Measurements (yeah!)

Benchmark   Xlc Time   Xlc Specmark   VLIW Time   VLIW Specmark
Espresso    41.70      54.44          38.30       59.27
Li          99.00      62.66          81.90       75.82
Eqntott     13.60      80.88          10.70       102.80
Compress    53.90      51.39          48.10       57.59
Sc          69.20      65.46          62.40       72.60
Gcc         91.40      59.61          90.20       60.53
SPECint92              61.73                      69.93

Measurements done on a RS/6000 model 980
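The SPECint92 row is the geometric mean of the six per-benchmark Specmarks, which can be checked with a few lines of Python. The numbers are copied from the table; the variable and function names are just for this example.

```python
import math

# benchmark: (xlc time, xlc specmark, vliw time, vliw specmark)
results = {
    "Espresso": (41.70, 54.44, 38.30, 59.27),
    "Li":       (99.00, 62.66, 81.90, 75.82),
    "Eqntott":  (13.60, 80.88, 10.70, 102.80),
    "Compress": (53.90, 51.39, 48.10, 57.59),
    "Sc":       (69.20, 65.46, 62.40, 72.60),
    "Gcc":      (91.40, 59.61, 90.20, 60.53),
}


def geomean(values):
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


xlc = geomean(r[1] for r in results.values())
vliw = geomean(r[3] for r in results.values())
print(f"xlc  SPECint92 = {xlc:.2f}")
print(f"VLIW SPECint92 = {vliw:.2f}")
print(f"overall ratio  = {vliw / xlc:.3f}")

# Per-benchmark speedup from the execution times.
for name, (t_xlc, _, t_vliw, _) in results.items():
    print(f"{name:9s} {t_xlc / t_vliw:.3f}")
```

The overall ratio comes out at roughly 1.13, with Eqntott and Li showing the largest gains and Gcc the smallest.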
19
Questions?