Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 14 Instruction Level Parallelism and Superscalar Processors

Similar presentations


Presentation on theme: "Chapter 14 Instruction Level Parallelism and Superscalar Processors"— Presentation transcript:

1 Chapter 14 Instruction Level Parallelism and Superscalar Processors

2 What is Superscalar? A Superscalar machine executes multiple independent instructions in parallel. Common instructions (arithmetic, load/store, conditional branch) can be executed independently. Equally applicable to RISC & CISC, but more straightforward in RISC machines. The order of execution is usually determined by the compiler.

3 Example Superscalar Organization

4 Superpipelined Machines
Superpiplined machines overlap pipe stages - rely on stages being able to begin operations before the last is complete.

5 Superscalar v Superpipeline

6 Limitations of Superscalar
Dependent upon: Instruction level parallelism Compiler based optimization Hardware support Limited by True data dependency Procedural dependency Resource conflicts Output dependency Antidependency (one instruction can overwrite a value that an “earlier instruction” has not yet read)

7 Compare with the following?
True Data Dependency ADD r1, r2 (r1 := r1+r2) MOVE r3, r1 (r3 := r1) Can fetch and decode second instruction in parallel with first Can NOT execute second instruction until first is finished Compare with the following? LOAD r1, X (r1 := X) MOVE r3, r1 (r3 := r1) What additional problem do we have here?

8 Procedural Dependency
Can not execute instructions after a branch in parallel with instructions before a branch, because? Also, if instruction length is not fixed, instructions have to be decoded to find out how many fetches are needed

9 Solution - Can possibly duplicate resources
Resource Conflict Two or more instructions requiring access to the same resource at the same time e.g. two arithmetic instructions Solution - Can possibly duplicate resources e.g. have two arithmetic units

10 Antidependancy ADD R3, R5, 1 ADD R4, R3, 1
Cannot complete the second instruction before the first has read R3

11 Effect of Dependencies

12 Instruction-level Parallelism
Consider: LOAD R1, R2 ADD R3, 1 ADD R4, R2 These can be handles in parallel ADD R4, R3 STO (R4), R0 These cannot

13 Instruction Issue Policies
Order in which instructions are fetched Order in which instructions are executed Order in which instructions change registers and memory

14 In-Order Issue In-Order Completion
Issue instructions in the order they occur: Not very efficient May fetch >1 instruction Instructions must stall if necessary

15 In-Order Issue In-Order Completion (Diagram)
Assume: I1 requires 2 cycles to execute I3 & I4 conflict for the same functional unit I5 depends upon value produced by I4 I5 & I6 conflict for a functional unit

16 In-Order Issue Out-of-Order Completion
How does this effect interrupts?

17 Out-of-Order Issue Out-of-Order Completion
Decouple decode pipeline from execution pipeline Can continue to fetch and decode until this pipeline is full When a functional unit becomes available an instruction can be executed Since instructions have been decoded, processor can look ahead

18 Out-of-Order Issue Out-of-Order Completion (Diagram)
Note: I5 depends upon I4, but I6 does not

19 Register Renaming Output and antidependencies occur because register contents may not reflect the correct ordering from the program Could result in a pipeline stall One solution: Registers allocated dynamically

20 Register Renaming example
R3b:=R3a + R5a (I1) R4b:=R3b (I2) R3c:=R5a (I3) R7b:=R3c + R4b (I4) Without subscript refers to logical register in instruction With subscript is hardware register allocated Note R3a R3b R3c

21 Machine Parallelism Support
Duplication of Resources Out of order issue Renaming Windowing

22 Speedups of Machine Organizations Without Procedural Dependencies

23 Study Conclusions Not worth duplication functional units without register renaming Need instruction window large enough (more than 8, probably not more than 32)

24 Branch Prediction in Superscalar Machines
Delayed branch not used much. Why? Branch prediction used - Branch history may still be useful

25 Superscalar Execution

26 Committing or Retiring Instructions
Sometimes results must be held in temporary storage until it is certain they can be placed in “permanent” storage. This temporary storage needs to be regularly cleaned up.

27 Superscalar Hardware Support
Simultaneously fetch multiple instructions Logic to determine true dependencies involving register values and Mechanisms to communicate these values Mechanisms to initiate multiple instructions in parallel Resources for parallel execution of multiple instructions Mechanisms for committing process state in correct order


Download ppt "Chapter 14 Instruction Level Parallelism and Superscalar Processors"

Similar presentations


Ads by Google