Computer Organization and Architecture (AT70.01)
Comp. Sc. and Inf. Mgmt., Asian Institute of Technology
Instructor: Dr. Sumanta Guha
Slide Sources: Based on CA:aQA by Hennessy/Patterson, supplemented from various freely downloadable sources

Advanced Topic: Hardware-Based Speculation CA:aQA Sec. 3.7

Speculating on Branch Outcomes
To optimally exploit ILP (instruction-level parallelism) – e.g., with pipelining, Tomasulo, etc. – it is critical to handle control dependences (= branch dependences) efficiently
Key idea: speculate on the outcome of branches (= predict) and execute instructions as if the predictions are correct
- of course, we must proceed in such a manner that we can recover if our speculation turns out to be wrong
Three components of hardware-based speculation:
1. dynamic branch prediction to pick the branch outcome (see the predictor sketch after this list)
2. speculation to allow instructions to execute before control dependences are resolved, i.e., before branch outcomes become known – with the ability to undo the effects of incorrect speculation
3. dynamic scheduling
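Dynamic branch prediction can take many forms; below is a minimal Python sketch of a 2-bit saturating-counter predictor, one common scheme. The class name, table size, and PC indexing are illustrative assumptions, not taken from the slides.

class TwoBitPredictor:
    """Per-branch 2-bit counters: 0,1 predict not taken; 2,3 predict taken."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.counters = [1] * entries          # start in "weakly not taken"

    def _index(self, pc):
        return (pc >> 2) % self.entries        # low-order PC bits select a counter

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2   # True = predict taken

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

bp = TwoBitPredictor()
for actual in [True, True, True, False, True]:   # outcomes of one loop branch
    print(bp.predict(0x400), actual)             # prediction vs. actual outcome
    bp.update(0x400, actual)                     # actual outcome trains the counter

The 2-bit counter needs two consecutive mispredictions to flip its prediction, which is why a single loop exit does not disturb the prediction for the next execution of the loop.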

Speculating with Tomasulo
Modern processors such as the PowerPC 603/604, MIPS R10000, Intel Pentium II/III/4, and Alpha extend Tomasulo's approach to support speculation
Key ideas:
- separate execution from completion: allow instructions to execute speculatively, but do not let them update registers or memory until they are no longer speculative
- therefore, add a final step – after an instruction is no longer speculative – when it is allowed to make register and memory updates, called instruction commit
- allow instructions to execute and complete out of order, but force them to commit in order
- add a hardware buffer, called the reorder buffer (ROB), with registers to hold the result of an instruction between completion and commit

Tomasulo Hardware with Speculation
Basic structure of the MIPS floating-point unit based on Tomasulo and extended to handle speculation:
- the ROB is added, and the store buffer of the original Tomasulo design is eliminated, since its functionality is integrated into the ROB
- the ROB is a queue! (sketched below)
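To illustrate "the ROB is a queue", here is a minimal Python sketch that models the ROB as a fixed-size circular buffer: an instruction takes the tail slot at issue time and leaves from the head at commit time, which is what enforces in-order commit. The class and field names, and the buffer size, are illustrative assumptions.

class ReorderBuffer:
    def __init__(self, size=16):
        self.size = size
        self.entries = [None] * size   # None = free slot
        self.head = 0                  # oldest instruction, next to commit
        self.tail = 0                  # next free slot, used at issue
        self.count = 0

    def has_free_slot(self):
        return self.count < self.size  # otherwise issue stalls (structural hazard)

    def allocate(self, entry):
        """Issue: reserve the tail slot and return its number (the #ROB tag)."""
        assert self.has_free_slot()
        slot = self.tail
        self.entries[slot] = entry
        self.tail = (self.tail + 1) % self.size
        self.count += 1
        return slot

    def retire_head(self):
        """Commit: free the head slot and advance to the next oldest entry."""
        self.entries[self.head] = None
        self.head = (self.head + 1) % self.size
        self.count -= 1

rob = ReorderBuffer(size=4)
tag = rob.allocate({"instruction": "MUL.D F0,F2,F4", "ready": False})
print("allocated ROB entry #", tag)
rob.entries[tag]["ready"] = True     # write result: value becomes available
rob.retire_head()                    # commit: head entry leaves the queue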

Tomasulo's Algorithm with Speculation: Four Stages
1. Issue: get instruction from the instruction queue
   - if a reservation station and a ROB slot are free (no structural hazard), control issues the instruction to the reservation station and the ROB, and sends to the reservation station the operand values (or the sources that will supply the values) as well as the allocated ROB slot number
2. Execution: operate on operands (EX)
   - when both operands are ready, execute; if not ready, watch the CDB for the result
3. Write result: finish execution (WB)
   - write the result on the CDB to all awaiting units and to the ROB; mark the reservation station available
4. Commit: update register or memory with the ROB result
   - when an instruction reaches the head of the ROB and its result is present, update the register with the result (or store to memory) and remove the instruction from the ROB
   - if an incorrectly predicted branch reaches the head of the ROB, flush the ROB and restart fetch at the correct successor of the branch
Compare with the original Tomasulo algorithm: the ROB bookkeeping at issue and write result, and the new commit stage, are the differences (a sketch of the commit step follows)
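As a rough illustration of the commit stage (step 4), the Python sketch below treats the ROB as a list ordered oldest-first. The field names, the string-valued state tags, and the one-commit-per-call simplification are assumptions made for illustration, not the slides' exact hardware.

def try_commit(rob, regs, memory):
    """Commit at most one instruction, always from the head of the ROB."""
    if not rob:
        return
    head = rob[0]
    if head["state"] != "write_result":         # result not yet present: wait
        return
    if head["type"] == "branch" and head["mispredicted"]:
        rob.clear()                              # flush the ROB and restart fetch
        print("flush; refetch from", hex(head["correct_target"]))
        return
    if head["type"] == "store":
        memory[head["dest"]] = head["value"]     # memory updated only at commit
    elif head["type"] == "register":
        regs[head["dest"]] = head["value"]       # register updated only at commit
    rob.pop(0)                                   # remove the committed instruction

regs, memory = {}, {}
rob = [
    {"type": "register", "state": "write_result", "dest": "F0", "value": 3.5},
    {"type": "branch", "state": "write_result", "mispredicted": True,
     "correct_target": 0x400},
]
try_commit(rob, regs, memory)   # head is the register op: commits, updates F0
try_commit(rob, regs, memory)   # head is now the bad branch: flush everything behind it
print(regs)                     # {'F0': 3.5}

Because updates to registers and memory happen only here, any instruction still in the ROB when a misprediction is detected can simply be discarded without undoing anything.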

ROB Data Structure
ROB entry fields (see the sketch after this list):
- Instruction type: branch, store, or register operation (i.e., ALU or load)
- State: indicates whether the instruction has completed execution and its value is ready
- Destination: where the result is to be written – a register number for a register operation (i.e., ALU or load), a memory address for a store; a branch has no destination result
- Value: holds the value of the instruction result until it is time to commit
Additional reservation station field:
- Destination: the corresponding ROB entry number
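A minimal sketch of these fields as Python dataclasses, assuming string-valued type/state tags and ROB entries identified by integer numbers; the names and tag values are illustrative, not mandated by the slides.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ROBEntry:
    instr_type: str                 # "branch", "store", or "register" (ALU/load)
    state: str                      # e.g. "issue", "execute", "write_result", "commit"
    destination: Optional[str]      # register name, memory address, or None for a branch
    value: Optional[float] = None   # result held here until commit time

@dataclass
class ReservationStation:
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None      # operand values, if already available
    vk: Optional[float] = None
    qj: Optional[int] = None        # ROB entry numbers producing missing operands
    qk: Optional[int] = None
    dest: Optional[int] = None      # additional field: ROB entry allocated to this instruction

entry = ROBEntry(instr_type="register", state="write_result",
                 destination="F0", value=2.5)
print(entry)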

Example
1. L.D F6, 34(R2)
2. L.D F2, 45(R3)
3. MUL.D F0, F2, F4
4. SUB.D F8, F6, F2
5. DIV.D F10, F0, F6
6. ADD.D F6, F8, F2
Assumed latencies: load 1 clock, add 2 clocks, multiply 10 clocks, divide 40 clocks
Show the data structures just before MUL.D goes to commit…

Reservation Stations

Name    Busy   Op    Vj                 Vk                 Qj   Qk   Dest   A
Load1   no
Load2   no
Add1    no
Add2    no
Add3    no
Mult1   yes    MUL   Mem[45+Regs[R3]]   Regs[F4]                     #3
Mult2   yes    DIV                      Mem[34+Regs[R2]]   #3        #5

Addi indicates the ith reservation station for the FP add unit, etc.

Reorder Buffer

Entry   Busy   Instruction          State          Destination   Value
1       no     L.D F6, 34(R2)       Commit         F6            Mem[34+Regs[R2]]
2       no     L.D F2, 45(R3)       Commit         F2            Mem[45+Regs[R3]]
3       yes    MUL.D F0, F2, F4     Write result   F0            #2 × Regs[F4]
4       yes    SUB.D F8, F6, F2     Write result   F8            #1 – #2
5       yes    DIV.D F10, F0, F6    Execute        F10
6       yes    ADD.D F6, F8, F2     Write result   F6            #4 + #2

At the time MUL.D is ready to commit, only the two L.D instructions have committed, though the others have completed execution
Actually, MUL.D is at the head of the ROB – the committed L.D instructions are shown only for understanding purposes
#X represents the Value field of ROB entry number X

Registers

Field      F0    F1   F2   F3   F4   F5   F6    F7   F8    F10
Reorder#   3                             6          4     5
Busy       yes   no   no   no   no   no  yes   …    yes   yes

Floating-point register status (Reorder# gives the ROB entry that will write the register)

Example
Loop: L.D    F0, 0(R1)
      MUL.D  F4, F0, F2
      S.D    F4, 0(R1)
      DADDUI R1, R1, #-8
      BNE    R1, R2, Loop
Assume the instructions in the loop have been issued twice
Assume the L.D and MUL.D from the first iteration have committed and all other instructions have completed execution
Assume the effective address for the store is computed prior to its issue
Show the data structures…

Reorder Buffer

Entry   Busy   Instruction            State          Destination    Value
1       no     L.D F0, 0(R1)          Commit         F0             Mem[0+Regs[R1]]
2       no     MUL.D F4, F0, F2       Commit         F4             #1 × Regs[F2]
3       yes    S.D F4, 0(R1)          Write result   0 + Regs[R1]   #2
4       yes    DADDUI R1, R1, #-8     Write result   R1             Regs[R1] – 8
5       yes    BNE R1, R2, Loop       Write result
6       yes    L.D F0, 0(R1)          Write result   F0             Mem[#4]
7       yes    MUL.D F4, F0, F2       Write result   F4             #6 × Regs[F2]
8       yes    S.D F4, 0(R1)          Write result   0 + #4         #7
9       yes    DADDUI R1, R1, #-8     Write result   R1             #4 – 8
10      yes    BNE R1, R2, Loop       Write result

Registers

Field      F0    F1   F2   F3   F4    F5   F6   F7   F8
Reorder#   6                    7
Busy       yes   no   no   no   yes   no   no   …    no

Notes
If a branch is mispredicted, recovery is done by:
- flushing the ROB of all entries that appear after the mispredicted branch
- allowing the entries before the branch to continue
- restarting fetch at the correct branch successor (see the sketch below)
When an instruction commits or is flushed from the ROB, the corresponding slot becomes available for subsequent instructions
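A small sketch of this recovery rule, again modeling the ROB as a plain oldest-first list; the names are illustrative. Entries issued before the mispredicted branch are kept, while the branch (now resolved) and everything issued after it leave the ROB, freeing their slots.

def flush_after_branch(rob, branch_index):
    """Keep entries older than the branch; squash the branch and all younger entries."""
    older = rob[:branch_index]            # unaffected, still commit in order
    squashed = rob[branch_index + 1:]     # wrong-path instructions; slots become free
    return older, squashed

rob = ["DIV.D (#1)", "BNE mispredicted (#2)", "L.D (#3)", "MUL.D (#4)"]
older, squashed = flush_after_branch(rob, branch_index=1)
print(older)      # ['DIV.D (#1)'] -- still allowed to complete and commit
print(squashed)   # ['L.D (#3)', 'MUL.D (#4)'] -- discarded, their slots are reused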