1 Lecture 10: ILP Innovations Today: handling memory dependences with the LSQ and innovations for each pipeline stage (Section 3.5)

Slides:



Advertisements
Similar presentations
Lecture 19: Cache Basics Today’s topics: Out-of-order execution
Advertisements

1 Lecture: Out-of-order Processors Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ.
1 Advanced Computer Architecture Limits to ILP Lecture 3.
1 Lecture 20: Speculation Papers: Is SC+ILP=RC?, Purdue, ISCA’99 Coherence Decoupling: Making Use of Incoherence, Wisconsin, ASPLOS’04 Selective, Accurate,
THE MIPS R10000 SUPERSCALAR MICROPROCESSOR Kenneth C. Yeager IEEE Micro in April 1996 Presented by Nitin Gupta.
CPE 731 Advanced Computer Architecture ILP: Part IV – Speculative Execution Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.
Limits on ILP. Achieving Parallelism Techniques – Scoreboarding / Tomasulo’s Algorithm – Pipelining – Speculation – Branch Prediction But how much more.
1 Lecture: Pipelining Extensions Topics: control hazards, multi-cycle instructions, pipelining equations.
1 Lecture: Pipeline Wrap-Up and Static ILP Topics: multi-cycle instructions, precise exceptions, deep pipelines, compiler scheduling, loop unrolling, software.
1 Lecture 18: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.
1 Lecture 4: Advanced Pipelines Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)
1 Lecture 11: ILP Innovations and SMT Today: out-of-order example, ILP innovations, SMT (Sections 3.5 and supplementary notes)
1 Lecture 7: Out-of-Order Processors Today: out-of-order pipeline, memory disambiguation, basic branch prediction (Sections 3.4, 3.5, 3.7)
1 Lecture 8: Branch Prediction, Dynamic ILP Topics: branch prediction, out-of-order processors (Sections )
1 Lecture 19: Core Design Today: issue queue, ILP, clock speed, ILP innovations.
CS 152 Computer Architecture & Engineering Andrew Waterman University of California, Berkeley Section 8 Spring 2010.
1 Lecture 8: Branch Prediction, Dynamic ILP Topics: branch prediction, out-of-order processors (Sections )
1 Lecture 9: More ILP Today: limits of ILP, case studies, boosting ILP (Sections )
1 Lecture 5: Pipeline Wrap-up, Static ILP Basics Topics: loop unrolling, VLIW (Sections 2.1 – 2.2) Assignment 1 due at the start of class on Thursday.
March 9, 2011CS152, Spring 2011 CS 152 Computer Architecture and Engineering Lecture 12 - Advanced Out-of-Order Superscalars Krste Asanovic Electrical.
1 Lecture 12: ILP Innovations and SMT Today: ILP innovations, SMT, cache basics (Sections 3.5 and supplementary notes)
1 Lecture 8: Instruction Fetch, ILP Limits Today: advanced branch prediction, limits of ILP (Sections , )
1 Lecture 18: Pipelining Today’s topics:  Hazards and instruction scheduling  Branch prediction  Out-of-order execution Reminder:  Assignment 7 will.
1 Lecture 8: Branch Prediction, Dynamic ILP Topics: static speculation and branch prediction (Sections )
1 Lecture 10: ILP Innovations Today: ILP innovations and SMT (Section 3.5)
1 Lecture 4: Advanced Pipelines Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)
1 Lecture 4: Advanced Pipelines Control hazards, multi-cycle in-order pipelines, static ILP (Appendix A.4-A.10, Sections )
1 Lecture 9: Dynamic ILP Topics: out-of-order processors (Sections )
1 Lecture 7: Branch prediction Topics: bimodal, global, local branch prediction (Sections )
Ch2. Instruction-Level Parallelism & Its Exploitation 2. Dynamic Scheduling ECE562/468 Advanced Computer Architecture Prof. Honggang Wang ECE Department.
1 Lecture 5 Overview of Superscalar Techniques CprE 581 Computer Systems Architecture, Fall 2009 Zhao Zhang Reading: Textbook, Ch. 2.1 “Complexity-Effective.
1 Lecture 5: Dependence Analysis and Superscalar Techniques Overview Instruction dependences, correctness, inst scheduling examples, renaming, speculation,
1 Lecture 7: Speculative Execution and Recovery Branch prediction and speculative execution, precise interrupt, reorder buffer.
The life of an instruction in EV6 pipeline Constantinos Kourouyiannis.
1 Lecture 20: Core Design Today: Innovations for ILP, TLP, power Sign up for class presentations.
1 Lecture: Out-of-order Processors Topics: branch predictor wrap-up, a basic out-of-order processor with issue queue, register renaming, and reorder buffer.
1 Lecture: Out-of-order Processors Topics: a basic out-of-order processor with issue queue, register renaming, and reorder buffer.
1 Lecture: Pipelining Extensions Topics: control hazards, multi-cycle instructions, pipelining equations.
CSE431 L13 SS Execute & Commit.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 13: SS Backend (Execute, Writeback & Commit) Mary Jane.
CS203 – Advanced Computer Architecture ILP and Speculation.
1 Lecture 20: OOO, Memory Hierarchy Today’s topics:  Out-of-order execution  Cache basics.
1 Lecture: Pipeline Wrap-Up and Static ILP Topics: multi-cycle instructions, precise exceptions, deep pipelines, compiler scheduling, loop unrolling, software.
Lecture: Out-of-order Processors
Lecture: Pipelining Extensions
Lecture: Out-of-order Processors
Lecture 6: Advanced Pipelines
Lecture 16: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.
Lecture 10: Out-of-order Processors
Lecture 11: Out-of-order Processors
Lecture: Out-of-order Processors
Lecture 19: Branches, OOO Today’s topics: Instruction scheduling
Lecture 18: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.
Lecture 8: ILP and Speculation Contd. Chapter 2, Sections 2. 6, 2
Lecture 18: Pipelining Today’s topics:
Lecture 11: Memory Data Flow Techniques
Lecture 17: Core Design Today: implementing core structures – rename, issue queue, bypass networks; innovations for high ILP and clock speed.
Lecture 18: Pipelining Today’s topics:
Lecture: Out-of-order Processors
Lecture 8: Dynamic ILP Topics: out-of-order processors
Lecture 19: Branches, OOO Today’s topics: Instruction scheduling
Lecture: Pipelining Extensions
Lecture 20: OOO, Memory Hierarchy
Lecture 20: OOO, Memory Hierarchy
Lecture 19: Core Design Today: implementing core structures – rename, issue queue, bypass networks; innovations for high ILP and clock speed.
Lecture 10: ILP Innovations
Lecture 9: ILP Innovations
Lecture 5: Pipeline Wrap-up, Static ILP
Lecture 9: Dynamic ILP Topics: out-of-order processors
Conceptual execution on a processor which exploits ILP
Lecture 7: Branch Prediction, Dynamic ILP
Presentation transcript:

1 Lecture 10: ILP Innovations Today: handling memory dependences with the LSQ and innovations for each pipeline stage (Section 3.5)

2 The Alpha Out-of-Order Implementation Branch prediction and instr fetch R1  R1+R2 R2  R1+R3 BEQZ R2 R3  R1+R2 R1  R3+R2 Instr Fetch Queue Decode & Rename Instr 1 Instr 2 Instr 3 Instr 4 Instr 5 Instr 6 Reorder Buffer (ROB) P33  P1+P2 P34  P33+P3 BEQZ P34 P35  P33+P34 P36  P35+P34 Issue Queue (IQ) ALU Register File P1-P64 Results written to regfile and tags broadcast to IQ Register Map Table R1  P1 R2  P2

3 Out-of-Order Loads/Stores LdR1  [R2] Ld St Ld What if the issue queue also had load/store instructions? Can we continue executing instructions out-of-order? R3  [R4] R5  [R6] R7  [R8] R9  [R10]

4 Memory Dependence Checking Ld0x abcdef Ld St Ld 0x abcdef St0x abcd00 Ld0x abc000 Ld0x abcd00 The issue queue checks for register dependences and executes instructions as soon as registers are ready Loads/stores access memory as well – must check for RAW, WAW, and WAR hazards for memory as well Hence, first check for register dependences to compute effective addresses; then check for memory dependences

5 Memory Dependence Checking Ld0x abcdef Ld St Ld 0x abcdef St0x abcd00 Ld0x abc000 Ld0x abcd00 Load and store addresses are maintained in program order in the Load/Store Queue (LSQ) Loads can issue if they are guaranteed to not have true dependences with earlier stores Stores can issue only if we are ready to modify memory (can not recover if an earlier instr raises an exception)

6 The Alpha Out-of-Order Implementation Branch prediction and instr fetch R1  R1+R2 R2  R1+R3 BEQZ R2 R3  R1+R2 R1  R3+R2 Instr Fetch Queue Decode & Rename Instr 1 Instr 2 Instr 3 Instr 4 Instr 5 Instr 6 Reorder Buffer (ROB) P33  P1+P2 P34  P33+P3 BEQZ P34 P35  P33+P34 P36  P35+P34 Issue Queue (IQ) ALU Register File P1-P64 Results written to regfile and tags broadcast to IQ Register Map Table R1  P1 R2  P2 P37  [P34 + 4] P35  [P36 + 4] LSQ ALU D-Cache

7 Improving Performance Techniques to increase performance:  pipelining  improves clock speed  increases number of in-flight instructions  hazard/stall elimination  branch prediction  register renaming  efficient caching  out-of-order execution with large windows  memory disambiguation  bypassing  increased pipeline bandwidth

8 Deep Pipelining Increases the number of in-flight instructions Decreases the gap between successive independent instructions Increases the gap between dependent instructions Depending on the ILP in a program, there is an optimal pipeline depth Tough to pipeline some structures; increases the cost of bypassing

9 Increasing Width Difficult to find more than four independent instructions Difficult to fetch more than six instructions (else, must predict multiple branches) Increases the number of ports per structure

10 Reducing Stalls in Fetch Better branch prediction  novel ways to index/update and avoid aliasing  cascading branch predictors Trace cache  stores instructions in the common order of execution, not in sequential order  in Intel processors, the trace cache stores pre-decoded instructions

11 Reducing Stalls in Rename/Regfile Larger ROB/register file/issue queue Virtual physical registers: assign virtual register names to instructions, but assign a physical register only when the value is made available Runahead: while a long instruction waits, let a thread run ahead to prefetch (this thread can deallocate resources more aggressively than a processor supporting precise execution) Two-level register files: values being kept around in the register file for precise exceptions can be moved to 2 nd level

12 Stalls in Issue Queue Two-level issue queues: 2 nd level contains instructions that are less likely to be woken up in the near future Value prediction: tries to circumvent RAW hazards Memory dependence prediction: allows a load to execute even if there are prior stores with unresolved addresses Load hit prediction: instructions are scheduled early, assuming that the load will hit in cache

13 Functional Units Clustering: allows quick bypass among a small group of functional units; FUs can also be associated with a subset of the register file and issue queue

14 Title Bullet