Lecture 9: Dynamic ILP Topics: out-of-order processors

Slides:

Advertisements

Similar presentations

Hardware-Based Speculation. Exploiting More ILP Branch prediction reduces stalls but may not be sufficient to generate the desired amount of ILP One way.

Advertisements

Lecture 19: Cache Basics Today’s topics: Out-of-order execution

CS6290 Speculation Recovery. Loose Ends Up to now: –Techniques for handling register dependencies Register renaming for WAR, WAW Tomasulo’s algorithm.

1 Lecture: Out-of-order Processors Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ.

1 Lecture 20: Speculation Papers: Is SC+ILP=RC?, Purdue, ISCA’99 Coherence Decoupling: Making Use of Incoherence, Wisconsin, ASPLOS’04 Selective, Accurate,

THE MIPS R10000 SUPERSCALAR MICROPROCESSOR Kenneth C. Yeager IEEE Micro in April 1996 Presented by Nitin Gupta.

Spring 2003CSE P5481 Reorder Buffer Implementation (Pentium Pro) Hardware data structures retirement register file (RRF) (~ IBM 360/91 physical registers)

CPE 731 Advanced Computer Architecture ILP: Part IV – Speculative Execution Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.

1 Lecture: Pipelining Extensions Topics: control hazards, multi-cycle instructions, pipelining equations.

1 Lecture 18: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.

1 Zvika Guz Slides modified from Prof. Dave Patterson, Prof. John Kubiatowicz, and Prof. Nancy Warter-Perez Out Of Order Execution.

1 Lecture 4: Advanced Pipelines Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)

1 Lecture 7: Out-of-Order Processors Today: out-of-order pipeline, memory disambiguation, basic branch prediction (Sections 3.4, 3.5, 3.7)

CS 152 Computer Architecture and Engineering Lecture 15 - Advanced Superscalars Krste Asanovic Electrical Engineering and Computer Sciences University.

1 Lecture 8: Branch Prediction, Dynamic ILP Topics: branch prediction, out-of-order processors (Sections )

1 Lecture 8: Branch Prediction, Dynamic ILP Topics: branch prediction, out-of-order processors (Sections )

1 Lecture 5: Pipeline Wrap-up, Static ILP Basics Topics: loop unrolling, VLIW (Sections 2.1 – 2.2) Assignment 1 due at the start of class on Thursday.

March 9, 2011CS152, Spring 2011 CS 152 Computer Architecture and Engineering Lecture 12 - Advanced Out-of-Order Superscalars Krste Asanovic Electrical.

1 Lecture 18: Pipelining Today’s topics:  Hazards and instruction scheduling  Branch prediction  Out-of-order execution Reminder:  Assignment 7 will.

1 Lecture 10: ILP Innovations Today: handling memory dependences with the LSQ and innovations for each pipeline stage (Section 3.5)

1 Lecture 8: Branch Prediction, Dynamic ILP Topics: static speculation and branch prediction (Sections )

1 Lecture 4: Advanced Pipelines Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)

1 Lecture 4: Advanced Pipelines Control hazards, multi-cycle in-order pipelines, static ILP (Appendix A.4-A.10, Sections )

1 Lecture 9: Dynamic ILP Topics: out-of-order processors (Sections )

1 Lecture 7: Branch prediction Topics: bimodal, global, local branch prediction (Sections )

1 Lecture 7: Speculative Execution and Recovery Branch prediction and speculative execution, precise interrupt, reorder buffer.

OOO Pipelines - II Smruti R. Sarangi IIT Delhi 1.

1 Lecture: Out-of-order Processors Topics: branch predictor wrap-up, a basic out-of-order processor with issue queue, register renaming, and reorder buffer.

1 Lecture: Out-of-order Processors Topics: a basic out-of-order processor with issue queue, register renaming, and reorder buffer.

March 1, 2012CS152, Spring 2012 CS 152 Computer Architecture and Engineering Lecture 12 - Advanced Out-of-Order Superscalars Krste Asanovic Electrical.

1 Lecture 10: Memory Dependence Detection and Speculation Memory correctness, dynamic memory disambiguation, speculative disambiguation, Alpha Example.

CS203 – Advanced Computer Architecture ILP and Speculation.

1 Lecture 20: OOO, Memory Hierarchy Today’s topics:  Out-of-order execution  Cache basics.

Ch2. Instruction-Level Parallelism & Its Exploitation 2. Dynamic Scheduling ECE562/468 Advanced Computer Architecture Prof. Honggang Wang ECE Department.

Lecture: Out-of-order Processors

Dynamic Scheduling Why go out of style?

/ Computer Architecture and Design

Smruti R. Sarangi IIT Delhi

Out of Order Processors

Lecture: Pipelining Extensions

Lecture: Out-of-order Processors

Lecture: Static ILP Topics: predication, speculation (Sections C.5, 3.2)

Lecture 6: Advanced Pipelines

Lecture 16: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.

Lecture 10: Out-of-order Processors

Lecture 11: Out-of-order Processors

Lecture: Out-of-order Processors

Lecture 19: Branches, OOO Today’s topics: Instruction scheduling

Lecture 18: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.

Lecture 8: ILP and Speculation Contd. Chapter 2, Sections 2. 6, 2

Lecture 18: Pipelining Today’s topics:

Smruti R. Sarangi IIT Delhi

Lecture 11: Memory Data Flow Techniques

Lecture 18: Pipelining Today’s topics:

Lecture: Out-of-order Processors

Lecture 8: Dynamic ILP Topics: out-of-order processors

Adapted from the slides of Prof

15-740/ Computer Architecture Lecture 5: Precise Exceptions

Krste Asanovic Electrical Engineering and Computer Sciences

Lecture 7: Dynamic Scheduling with Tomasulo Algorithm (Section 2.4)

Lecture 19: Branches, OOO Today’s topics: Instruction scheduling

Lecture: Pipelining Extensions

Lecture 20: OOO, Memory Hierarchy

Lecture 20: OOO, Memory Hierarchy

Adapted from the slides of Prof

Dynamic Hardware Prediction

Lecture 10: ILP Innovations

Lecture 9: ILP Innovations

Conceptual execution on a processor which exploits ILP

Lecture 7: Branch Prediction, Dynamic ILP

Presentation transcript:

Lecture 9: Dynamic ILP Topics: out-of-order processors (Sections 2.3-2.6, class notes)

An Out-of-Order Processor Implementation Reorder Buffer (ROB) Branch prediction and instr fetch Instr 1 Instr 2 Instr 3 Instr 4 Instr 5 Instr 6 T1 T2 T3 T4 T5 T6 Register File R1-R32 R1  R1+R2 R2  R1+R3 BEQZ R2 R3  R1+R2 R1  R3+R2 Decode & Rename T1  R1+R2 T2  T1+R3 BEQZ T2 T4  T1+T2 T5  T4+T2 ALU ALU ALU Instr Fetch Queue Results written to ROB and tags broadcast to IQ Issue Queue (IQ)

Design Details - I Instructions enter the pipeline in order No need for branch delay slots if prediction happens in time Instructions leave the pipeline in order – all instructions that enter also get placed in the ROB – the process of an instruction leaving the ROB (in order) is called commit – an instruction commits only if it and all instructions before it have completed successfully (without an exception) To preserve precise exceptions, a result is written into the register file only when the instruction commits – until then, the result is saved in a temporary register in the ROB

Design Details - II Instructions get renamed and placed in the issue queue – some operands are available (T1-T6; R1-R32), while others are being produced by instructions in flight (T1-T6) As instructions finish, they write results into the ROB (T1-T6) and broadcast the operand tag (T1-T6) to the issue queue – instructions now know if their operands are ready When a ready instruction issues, it reads its operands from T1-T6 and R1-R32 and executes (out-of-order execution) Can you have WAW or WAR hazards? By using more names (T1-T6), name dependences can be avoided

Design Details - III If instr-3 raises an exception, wait until it reaches the top of the ROB – at this point, R1-R32 contain results for all instructions up to instr-3 – save registers, save PC of instr-3, and service the exception If branch is a mispredict, flush all instructions after the branch and start on the correct path – mispredicted instrs will not have updated registers (the branch cannot commit until it has completed and the flush happens as soon as the branch completes) Potential problems: ?

Managing Register Names Temporary values are stored in the register file and not the ROB Logical Registers R1-R32 Physical Registers P1-P64 At the start, R1-R32 can be found in P1-P32 Instructions stop entering the pipeline when P64 is assigned R1  R1+R2 R2  R1+R3 BEQZ R2 R3  R1+R2 P33  P1+P2 P34  P33+P3 BEQZ P34 P35  P33+P34 What happens on commit?

The Commit Process On commit, no copy is required The register map table is updated – the “committed” value of R1 is now in P33 and not P1 – on an exception, P33 is copied to memory and not P1 An instruction in the issue queue need not modify its input operand when the producer commits When instruction-1 commits, we no longer have any use for P1 – it is put in a free pool and a new instruction can now enter the pipeline  for every instr that commits, a new instr can enter the pipeline  number of in-flight instrs is a constant = number of extra (rename) registers

The Alpha 21264 Out-of-Order Implementation Reorder Buffer (ROB) Branch prediction and instr fetch Instr 1 Instr 2 Instr 3 Instr 4 Instr 5 Instr 6 Committed Reg Map R1P1 R2P2 Register File P1-P64 R1  R1+R2 R2  R1+R3 BEQZ R2 R3  R1+R2 R1  R3+R2 Decode & Rename P33  P1+P2 P34  P33+P3 BEQZ P34 P35  P33+P34 P36  P35+P34 ALU ALU ALU Speculative Reg Map R1P36 R2P34 Instr Fetch Queue Results written to regfile and tags broadcast to IQ Issue Queue (IQ)

Out-of-Order Loads/Stores Ld R1  [R2] Ld R3  [R4] St R5  [R6] Ld R7  [R8] Ld R9[R10] What if the issue queue also had load/store instructions? Can we continue executing instructions out-of-order?

Memory Dependence Checking Ld 0x abcdef The issue queue checks for register dependences and executes instructions as soon as registers are ready Loads/stores access memory as well – must check for RAW, WAW, and WAR hazards for memory as well Hence, first check for register dependences to compute effective addresses; then check for memory dependences Ld St Ld Ld 0x abcdef St 0x abcd00 Ld 0x abc000 Ld 0x abcd00

Memory Dependence Checking Load and store addresses are maintained in program order in the Load/Store Queue (LSQ) Loads can issue if they are guaranteed to not have true dependences with earlier stores Stores can issue only if we are ready to modify memory (can not recover if an earlier instr raises an exception) Ld 0x abcdef Ld St Ld Ld 0x abcdef St 0x abcd00 Ld 0x abc000 Ld 0x abcd00

The Alpha 21264 Out-of-Order Implementation Reorder Buffer (ROB) Branch prediction and instr fetch Instr 1 Instr 2 Instr 3 Instr 4 Instr 5 Instr 6 Instr 7 Committed Reg Map R1P1 R2P2 Register File P1-P64 R1  R1+R2 R2  R1+R3 BEQZ R2 R3  R1+R2 R1  R3+R2 LD R4  8[R3] ST R4  8[R1] Decode & Rename P33  P1+P2 P34  P33+P3 BEQZ P34 P35  P33+P34 P36  P35+P34 P37  8[P35] P37  8[P36] ALU ALU ALU Speculative Reg Map R1P36 R2P34 Results written to regfile and tags broadcast to IQ Instr Fetch Queue Issue Queue (IQ) ALU P37  [P35 + 8] P37  [P36 + 8] D-Cache LSQ

Title Bullet