Slide 1: Instruction Level Parallelism (ILP)
- Pipelining is one form of concurrency
- With several pipelines, there is an opportunity to execute several (pipelined) instructions concurrently
- With more sophisticated control units, several instructions can be issued in the same cycle
  - Order of execution determined by the compiler: static scheduling, for superscalar processors
  - Order of execution determined at run time: dynamic scheduling, for out-of-order execution processors
Slide 2: ILP improves (reduces) CPI
- Recall, for a single pipeline: CPI = 1 (ideal CPI) + CPI contributed by stalls
- Allowing n instruction issues per cycle: CPI_n = 1/n (ideal CPI) + CPI contributed by stalls (see the worked example below)
- ILP will increase structural hazards
- ILP makes reducing the other stalls even more important: a "bubble" now costs more than the loss of a single instruction issue
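To make that trade-off concrete, here is a quick worked example (the 4-issue width and the 0.5 stall CPI are illustrative numbers, not from the slides):

```latex
% Illustrative figures: a 4-issue machine, 0.5 stall cycles per instruction
\mathrm{CPI}_1 = 1 + 0.5 = 1.5 \qquad \mathrm{CPI}_4 = \tfrac{1}{4} + 0.5 = 0.75
% The speedup is only 1.5 / 0.75 = 2, not 4: once the ideal CPI shrinks,
% the stall term dominates, which is why each bubble hurts relatively more.
```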
Slide 3: Where can we optimize? (control)
- The unit of code where one can find ILP is the basic block: a contiguous block of instructions with a single entry point and a single exit point (illustrated in the sketch after this list)
- Basic blocks are small (branches occur about 30% of the time)
- ILP can be increased (CPI due to control stalls decreased) by:
  - Reducing the number of branches: loop unrolling (compiler)
  - Branch prediction: static (compiler) or dynamic (hardware)
  - Code movement based on trace scheduling (compiler) and/or other methods (not covered in this course)
  - Predication (compiler and hardware): see later
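As a rough picture of what "single entry, single exit" means at the source level, here is a hypothetical C fragment (function name and code invented for the illustration) with its basic blocks marked:

```c
/* Hypothetical example: basic-block boundaries in a short function.
 * Each marked region is straight-line code with a single entry and a
 * single exit; the compiler can only schedule freely within one block. */
int clamp_sum(const int *a, int n, int limit) {
    int s = 0;                /* B0: function entry                    */
    for (int i = 0; i < n; i++) {
        s += a[i];            /* B1: loop body, ...                    */
        if (s > limit)        /* ... ended by this conditional branch  */
            s = limit;        /* B2: taken path                        */
    }                         /* back-edge branch ends each iteration  */
    return s;                 /* B3: exit block                        */
}
```

Note how small the blocks are (one or two operations each), which is exactly why branch-reducing transformations such as unrolling matter.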
Slide 4: Where can we optimize? (data)
- CPI contributed by data hazards can be decreased by:
  - Compiler optimizations: load scheduling, dependence analysis, software pipelining, trace scheduling
  - Hardware (run-time) techniques: forwarding (we know all about that!) and register renaming
Slide 5: Compiler optimizations (a sample)
- The processor (number of pipelines, latencies of the various units, number of registers) is exposed to the compiler
- Loop unrolling (see next slide) to increase the potential for ILP
- Avoid dependences (data, name, control) at the instruction level to increase ILP:
  - Data dependence translates into RAW: load scheduling and code reorganization
  - Name dependence (register allocation): anti-dependence (WAR) and output dependence (WAW)
  - Control dependence = branches: predicated execution
Slide 6: Loop unrolling
- Replicate the body of the loop and reduce the number of iterations correspondingly (cf. software pipelining)
- Pros:
  - Decreases loop overhead (branches, counter settings)
  - Allows better scheduling (longer basic blocks, hence better opportunities to find ILP and to hide the latency of "long" operations)
- Cons:
  - Increases register pressure
  - Increases code length (I-cache occupancy)
  - Requires a prologue or epilogue (beginning or end of loop), as in the sketch below
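A minimal sketch of 4-way unrolling in C (the saxpy-style loop and the unroll factor are assumptions for the illustration, not from the slides):

```c
/* 4-way unrolled loop: one branch and one counter update per four
 * elements, and a longer basic block for the scheduler to fill. */
void saxpy(float *y, const float *x, float a, int n) {
    int i;
    for (i = 0; i + 3 < n; i += 4) {  /* unrolled body */
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
    for (; i < n; i++)                /* epilogue: leftover 0..3 iterations */
        y[i] += a * x[i];
}
```

The epilogue is the cost mentioned above: whenever n is not a multiple of the unroll factor, the leftover iterations need their own copy of the body.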
Slide 7: Data dependencies: RAW
- Instruction j is dependent on instruction i, preceding it in sequential program order, if the result of i is a source operand of j
- Transitivity: if j is dependent on k and k is dependent on i, then j is dependent on i
- Dependence is a property of the program; hazards (RAW in this case) and their (partial) removal are a property of the pipeline organization
- Code optimization goals (see the sketch after this list):
  - Maintain the dependence but avoid the hazard (e.g., by scheduling)
  - Eliminate the dependence by modifying the code (e.g., by register allocation)
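Here is the "maintain dependence, avoid hazard" idea as a compilable C sketch (the r* variables and their values are stand-ins for machine registers, invented for the example):

```c
/* The RAW dependence on r1 must be preserved; the independent
 * subtraction is moved between producer and consumer so the pipeline
 * does useful work where it would otherwise stall.
 * Before scheduling the order was:
 *   r1 = r2 * r3;   long-latency producer
 *   r4 = r1 + r5;   RAW on r1: stalls right behind the producer
 *   r6 = r7 - r8;   independent instruction
 */
int schedule_demo(int r2, int r3, int r5, int r7, int r8) {
    int r1 = r2 * r3;
    int r6 = r7 - r8;   /* fills (part of) the multiply's latency */
    int r4 = r1 + r5;   /* r1 is ready, or closer to ready, now   */
    return r4 + r6;
}
```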
Slide 8: Name dependencies: WAR and WAW
- Anti-dependence (WAR):
  - Instruction i: R1 <- R2 (long operation) R3
  - Instruction j: R2 <- R4 (short operation) R5
  - This is a WAR hazard if instruction j finishes first, i.e., j writes R2 before i has read it
- Output dependence (WAW):
  - Instruction i as above, but instruction j: R1 <- R4 (short operation) R5
  - This is a WAW hazard if instruction j finishes first, leaving R1 with i's stale result
- In both cases this is not really a dependence but a "naming" problem (it would not happen if we had enough registers); see the renaming sketch below
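A sketch of how renaming dissolves the WAR case above, again in C with invented register-like names (t2 plays the role of a fresh physical register):

```c
/* Instruction j's write to R2 is redirected to a fresh name t2, so j
 * can finish before i without clobbering the value i still reads.
 * Assumes r3 != 0 so the divide cannot trap. */
int rename_demo(int r2, int r3, int r4, int r5) {
    int r1 = r2 / r3;   /* i: long-latency op, reads the old r2  */
    int t2 = r4 + r5;   /* j: "R2 <- R4 + R5", renamed to t2     */
    return r1 + t2;     /* later readers of R2 are pointed at t2 */
}
```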
Slide 9: Control dependencies
- Branches restrict the scheduling of instructions
- Speculation (i.e., executing an instruction that might not be needed) must be:
  - Safe: no additional exceptions
  - Legal: the end result is the same as without speculation
- Speculation can be implemented by (see the if-conversion sketch below):
  - The compiler (code motion)
  - Hardware (branch prediction)
  - Both (branch prediction, conditional operations)
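One concrete compiler/hardware combination is if-conversion to a conditional operation, sketched below in C (hypothetical functions; many ISAs compile the second version to a conditional move or select):

```c
/* Branch version: the comparison is a control dependence that blocks
 * scheduling across it. */
int max_branchy(int a, int b) {
    if (a > b)
        return a;
    return b;
}

/* Predicated version: both candidate values are computed unconditionally.
 * Safe (neither arm can raise an exception) and legal (same final value). */
int max_predicated(int a, int b) {
    return (a > b) ? a : b;
}
```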
Slide 10: Static vs. dynamic scheduling
- Assumptions (for now): one instruction issued per cycle; several pipelines with a common IF and ID
- Static scheduling (optimized by the compiler): when there is a stall (hazard), no further instructions are issued (e.g., Alpha 21064 and 21164)
- Dynamic scheduling (enforced by the hardware): instructions following the one that stalls can "issue" if there is no structural hazard or dependence
- "Issue" might take different meanings, as we'll see
Slide 11: Dynamic scheduling
- Implies the possibility of:
  - Out-of-order issue (and hence out-of-order execution)
  - Out-of-order completion (also possible in static scheduling, but less frequent)
  - Imprecise exceptions
- Example, with different pipes for add/subtract and divide (see the sketch below):
  - R1 = R2 / R3 (long latency)
  - R2 = R1 + R4 (stalls because of the RAW on R1)
  - R6 = R7 - R8 (can be issued before the add, and executed and completed before the other two)
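The same three instructions as compilable C (the source register numbers and values are reconstructions/assumptions, since the original slide text only fixes the destinations and the RAW on R1):

```c
/* One possible dynamic schedule, with separate divide and add/sub pipes:
 * the subtraction issues and completes while the divide is still in
 * flight (out-of-order completion). If the divide then trapped (r3 == 0),
 * r6 would already be written: an imprecise exception. Assumes r3 != 0. */
int ooo_demo(int r2, int r3, int r4, int r7, int r8) {
    int r1  = r2 / r3;  /* long-latency divide                        */
    int r6  = r7 - r8;  /* independent: completes before the divide   */
    int r2n = r1 + r4;  /* waits on the RAW for r1; the write to "R2" */
                        /* is shown as a new name to avoid the WAR    */
    return r2n + r6;
}
```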
Slide 12: Conceptual execution on a processor with ILP
- Instruction fetch and branch prediction: corresponds to IF in the simple pipeline (we have seen that)
- Instruction decode, dependence check, dispatch, issue: corresponds (with many variations) to ID
  - Although instructions are issued (i.e., assigned to functional units), they might not execute right away (they may wait for dependent instructions to forward their results)
- Instruction execution: corresponds to EX and/or MEM (with various latencies)
- Instruction commit: corresponds to WB, but is more complex because of speculative and out-of-order completion