CS203 – Advanced Computer Architecture


Tomasulo Algorithm - Superscalar

Tomasulo Example

Loop: LD     R2,0(R1)
      DADDIU R2,R2,#1
      SD     R2,0(R1)
      DADDIU R1,R1,#8
      BNE    R2,R3,LOOP

Assumptions:
  Add/Branch: 1 cycle
  Load/Store: 1 cycle for address generation, 1 cycle for memory access
  2-issue superscalar
  2 instructions can commit per clock (2 CDBs)

[Figure: pipeline diagram with blocks labeled Issue, Memory, FP Adder, Branch]

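Initial state

Assume: (R1) = 100, (R3) = 10, Mem[100] = 5, Mem[108] = 6

Instruction status:
Iter  Instruction         Issue @  Exec @  Mem access @  Wrt. CDB @  Comment
1     LD R2, 0(R1)        -        -       -             -           (R2) = 5
1     DADDIU R2, R2, #1   -        -       -             -           (R2) = 6
1     SD R2, 0(R1)        -        -       -             -           Mem[100] = 6
1     DADDIU R1, R1, #8   -        -       -             -           (R1) = 108
1     BNE R2, R3, LOOP    -        -       -             -           6 ≠ 10
2     LD R2, 0(R1)        -        -       -             -           (R2) = 6
2     DADDIU R2, R2, #1   -        -       -             -           (R2) = 7
2     SD R2, 0(R1)        -        -       -             -           Mem[108] = 7
2     DADDIU R1, R1, #8   -        -       -             -           (R1) = 116
2     BNE R2, R3, LOOP    -        -       -             -           7 ≠ 10

Reservation stations (Add1, Add2, Add3, Br1, Br2, Load1, Load2, Load3, Store1, Store2; fields Busy, Op, Vj, Vk, Qj, Qk, Addr): all empty.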
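The expected values in the Comment column can be cross-checked by running the loop body in plain program order, with no timing at all. The following is a minimal Python sketch under that assumption; the function name `run_iterations` and the dictionary-based register file and memory are illustrative choices, not part of the slides. The initial values are the ones given in the "Assume" line.

```python
def run_iterations(regs, mem, iterations):
    """Execute the loop body in program order and record each iteration's results."""
    trace = []
    for _ in range(iterations):
        regs["R2"] = mem[regs["R1"]]        # LD     R2, 0(R1)
        regs["R2"] += 1                     # DADDIU R2, R2, #1
        stored_at = regs["R1"]
        mem[stored_at] = regs["R2"]         # SD     R2, 0(R1)
        regs["R1"] += 8                     # DADDIU R1, R1, #8
        taken = regs["R2"] != regs["R3"]    # BNE    R2, R3, LOOP
        trace.append((regs["R2"], stored_at, regs["R1"], taken))
    return trace


# Initial state from the slide's "Assume" line.
regs = {"R1": 100, "R2": 0, "R3": 10}
mem = {100: 5, 108: 6}

trace = run_iterations(regs, mem, 2)
for i, (r2, addr, r1, taken) in enumerate(trace, start=1):
    print(f"iter {i}: R2={r2}, Mem[{addr}]={mem[addr]}, R1={r1}, branch taken={taken}")
```

The sketch runs a fixed number of iterations rather than following the branch, because the walkthrough only covers the first two.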

Cycle 1

LD1 – Issue
ADD1a – Issue

Instruction status:
Iter  Instruction         Issue @  Exec @  Mem access @  Wrt. CDB @  Comment
1     LD R2, 0(R1)        1        -       -             -           (R2) = 5
1     DADDIU R2, R2, #1   1        -       -             -           (R2) = 6
1     SD R2, 0(R1)        -        -       -             -           Mem[100] = 6
1     DADDIU R1, R1, #8   -        -       -             -           (R1) = 108
1     BNE R2, R3, LOOP    -        -       -             -           6 ≠ 10
2     LD R2, 0(R1)        -        -       -             -           (R2) = 6
2     DADDIU R2, R2, #1   -        -       -             -           (R2) = 7
2     SD R2, 0(R1)        -        -       -             -           Mem[108] = 7
2     DADDIU R1, R1, #8   -        -       -             -           (R1) = 116
2     BNE R2, R3, LOOP    -        -       -             -           7 ≠ 10

Reservation stations (fields in order Busy, Op, Vj, Vk, Qj, Qk, Addr; only stations with entries and their recorded fields are listed):
  Add1: 1, ADD, Load1
  Load1: LD, 100
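The issue step of Cycle 1 can be sketched as follows. This is an illustrative Python model, not the slides' implementation: the `Station` class, the `issue` function, and the `reg_status` map (register name to producing station) are assumed names, and the load's base and offset are modeled as Vj and Vk for simplicity even though the slide's table has a separate Addr column.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[int] = None    # operand values, once known
    vk: Optional[int] = None
    qj: Optional[str] = None    # names of stations that will produce the operands
    qk: Optional[str] = None


def issue(station, op, src_j, src_k, dest, regs, reg_status):
    """Fill a free station; each source is a register name or an int immediate."""
    assert not station.busy, "structural hazard: station is occupied"
    station.busy, station.op = True, op
    for src, v_field, q_field in ((src_j, "vj", "qj"), (src_k, "vk", "qk")):
        if isinstance(src, int):              # immediate operand
            setattr(station, v_field, src)
        elif src in reg_status:               # value is still being produced
            setattr(station, q_field, reg_status[src])
        elif src is not None:                 # value is available in the register file
            setattr(station, v_field, regs[src])
    if dest is not None:
        reg_status[dest] = station.name       # later readers of dest wait on this station


regs = {"R1": 100, "R2": 0, "R3": 10}
reg_status = {}
load1, add1 = Station("Load1"), Station("Add1")

issue(load1, "LD", "R1", 0, "R2", regs, reg_status)     # LD     R2, 0(R1)
issue(add1, "ADD", "R2", 1, "R2", regs, reg_status)     # DADDIU R2, R2, #1
print(load1)
print(add1)
```

Because the load is issued first, the add finds R2 marked as pending and records `Load1` in Qj instead of reading a stale value, which matches the "Add1: 1, ADD, Load1" entry for this cycle.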

Cycle 2

LD1 – Calc. Addr.
ADD1a – Wait for R2 (LD1)
SD1 – Issue
ADD1b – Issue

Instruction status:
Iter  Instruction         Issue @  Exec @  Mem access @  Wrt. CDB @  Comment
1     LD R2, 0(R1)        1        2       -             -           (R2) = 5
1     DADDIU R2, R2, #1   1        -       -             -           Wait for R2 (LD); (R2) = 6
1     SD R2, 0(R1)        2        -       -             -           Mem[100] = 6
1     DADDIU R1, R1, #8   2        -       -             -           (R1) = 108
1     BNE R2, R3, LOOP    -        -       -             -           6 ≠ 10
2     LD R2, 0(R1)        -        -       -             -           (R2) = 6
2     DADDIU R2, R2, #1   -        -       -             -           (R2) = 7
2     SD R2, 0(R1)        -        -       -             -           Mem[108] = 7
2     DADDIU R1, R1, #8   -        -       -             -           (R1) = 116
2     BNE R2, R3, LOOP    -        -       -             -           7 ≠ 10

Reservation stations (fields in order Busy, Op, Vj, Vk, Qj, Qk, Addr; only stations with entries and their recorded fields are listed):
  Add1: 1, ADD, Load1
  Add2: 100, 8
  Load1: LD
  Store1: SD

Cycle 3

LD1 – Load from Mem.
ADD1a – Wait for R2 (LD1)
SD1 – Calc. Addr.
ADD1b – Execute
BNE1 – Issue

Instruction status:
Iter  Instruction         Issue @  Exec @  Mem access @  Wrt. CDB @  Comment
1     LD R2, 0(R1)        1        2       3             -           (R2) = 5
1     DADDIU R2, R2, #1   1        -       -             -           Wait for R2 (LD); (R2) = 6
1     SD R2, 0(R1)        2        3       -             -           Wait for R2 (ADD); Mem[100] = 6
1     DADDIU R1, R1, #8   2        3       -             -           Exec. directly; (R1) = 108
1     BNE R2, R3, LOOP    3        -       -             -           6 ≠ 10
2     LD R2, 0(R1)        -        -       -             -           (R2) = 6
2     DADDIU R2, R2, #1   -        -       -             -           (R2) = 7
2     SD R2, 0(R1)        -        -       -             -           Mem[108] = 7
2     DADDIU R1, R1, #8   -        -       -             -           (R1) = 116
2     BNE R2, R3, LOOP    -        -       -             -           7 ≠ 10

Reservation stations (fields in order Busy, Op, Vj, Vk, Qj, Qk, Addr; only stations with entries and their recorded fields are listed):
  Add1: 1, ADD, Load1
  Add2: 100, 8
  Br1: BNE, 10
  Load1: LD
  Store1: SD

Cycle 4

LD1 – Write to CDB
ADD1a – Wait for R2 (LD1)
SD1 – Wait for R2 (ADD1a)
ADD1b – Write to CDB
BNE1 – Wait for R2 (ADD1a)
LD2 – Issue
ADD2a – Issue

Instruction status:
Iter  Instruction         Issue @  Exec @  Mem access @  Wrt. CDB @  Comment
1     LD R2, 0(R1)        1        2       3             4           (R2) = 5
1     DADDIU R2, R2, #1   1        -       -             -           Wait for R2 (LD); (R2) = 6
1     SD R2, 0(R1)        2        3       -             -           Wait for R2 (ADD); Mem[100] = 6
1     DADDIU R1, R1, #8   2        3       -             4           Exec. directly; (R1) = 108
1     BNE R2, R3, LOOP    3        -       -             -           6 ≠ 10
2     LD R2, 0(R1)        4        -       -             -           (R2) = 6
2     DADDIU R2, R2, #1   4        -       -             -           (R2) = 7
2     SD R2, 0(R1)        -        -       -             -           Mem[108] = 7
2     DADDIU R1, R1, #8   -        -       -             -           (R1) = 116
2     BNE R2, R3, LOOP    -        -       -             -           7 ≠ 10

Reservation stations (fields in order Busy, Op, Vj, Vk, Qj, Qk, Addr; only stations with entries and their recorded fields are listed):
  Add1: 1, ADD, 5
  Add2: 100, 8
  Add3: Load2
  Br1: BNE, 10
  Load2: LD, 108
  Store1: SD
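The "Write to CDB" step in Cycle 4, where LD1 delivers the value 5 and Add1 picks it up, can be sketched like this. It is a minimal Python illustration under stated assumptions: stations are plain dictionaries, and the function `broadcast` and the `reg_status` map (register name to the station that will produce it) are hypothetical names, not taken from the slides.

```python
def broadcast(tag, value, stations, reg_status, regs):
    """Deliver (tag, value) to every operand waiting on tag, then to the register file."""
    for st in stations:
        if st.get("qj") == tag:
            st["vj"], st["qj"] = value, None
        if st.get("qk") == tag:
            st["vk"], st["qk"] = value, None
    for reg, producer in list(reg_status.items()):
        if producer == tag:          # only the most recent producer may update the register
            regs[reg] = value
            del reg_status[reg]


# State just before LD1 writes the CDB: Add1 (ADD1a) waits on Load1 for R2.
add1 = {"name": "Add1", "op": "ADD", "vj": None, "vk": 1, "qj": "Load1", "qk": None}
regs = {"R1": 100, "R2": 0, "R3": 10}
reg_status = {"R2": "Add1"}          # R2 was renamed again when ADD1a issued

broadcast("Load1", 5, [add1], reg_status, regs)
print(add1["vj"], add1["qj"], regs["R2"], reg_status)
```

In this sketch the loaded value reaches Add1 but not the register file, because R2 has already been renamed to Add1's future result. That is the renaming behavior that lets the waiting add proceed without a write-after-write problem on R2.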

Cycle 5

ADD1a – Execute
SD1 – Wait for R2 (ADD1a)
BNE1 – Wait for R2 (ADD1a)
LD2 – Wait for BNE1
ADD2a – Wait for R2 (LD2)
SD2 – Issue
ADD2b – Issue

Instruction status:
Iter  Instruction         Issue @  Exec @  Mem access @  Wrt. CDB @  Comment
1     LD R2, 0(R1)        1        2       3             4           (R2) = 5
1     DADDIU R2, R2, #1   1        5       -             -           Wait for R2 (LD); (R2) = 6
1     SD R2, 0(R1)        2        3       -             -           Wait for R2 (ADD); Mem[100] = 6
1     DADDIU R1, R1, #8   2        3       -             4           Exec. directly; (R1) = 108
1     BNE R2, R3, LOOP    3        -       -             -           6 ≠ 10
2     LD R2, 0(R1)        4        -       -             -           Wait for BNE; (R2) = 6
2     DADDIU R2, R2, #1   4        -       -             -           (R2) = 7
2     SD R2, 0(R1)        5        -       -             -           Mem[108] = 7
2     DADDIU R1, R1, #8   5        -       -             -           (R1) = 116
2     BNE R2, R3, LOOP    -        -       -             -           7 ≠ 10

Reservation stations (fields in order Busy, Op, Vj, Vk, Qj, Qk, Addr; only stations with entries and their recorded fields are listed):
  Add1: 1, ADD, 5
  Add2: 108, 8
  Add3: Load2
  Br1: BNE, 10
  Load2: LD
  Store1: SD, 100

Cycle 6

ADD1a – Write to CDB
SD1 – Wait for R2 (ADD1a)
BNE1 – Wait for R2 (ADD1a)
LD2 – Wait for BNE1
ADD2a – Wait for R2 (LD2)
SD2 – Wait for R2 (ADD2a)
ADD2b – Wait for BNE1
BNE2 – Issue

Instruction status:
Iter  Instruction         Issue @  Exec @  Mem access @  Wrt. CDB @  Comment
1     LD R2, 0(R1)        1        2       3             4           (R2) = 5
1     DADDIU R2, R2, #1   1        5       -             6           Wait for R2 (LD); (R2) = 6
1     SD R2, 0(R1)        2        3       -             -           Wait for R2 (ADD); Mem[100] = 6
1     DADDIU R1, R1, #8   2        3       -             4           Exec. directly; (R1) = 108
1     BNE R2, R3, LOOP    3        -       -             -           6 ≠ 10
2     LD R2, 0(R1)        4        -       -             -           Wait for BNE; (R2) = 6
2     DADDIU R2, R2, #1   4        -       -             -           (R2) = 7
2     SD R2, 0(R1)        5        -       -             -           Mem[108] = 7
2     DADDIU R1, R1, #8   5        -       -             -           (R1) = 116
2     BNE R2, R3, LOOP    6        -       -             -           7 ≠ 10

Reservation stations (fields in order Busy, Op, Vj, Vk, Qj, Qk, Addr; only stations with entries and their recorded fields are listed):
  Add1: ADD, 5, 1
  Add2: 108, 8
  Add3: Load2
  Br1: BNE, 6, 10
  Load2: LD
  Store1: SD, 100

Cycle 7

SD1 – Write to Mem
BNE1 – Calc. Condition
LD2 – Wait for BNE1
ADD2a – Wait for R2 (LD2)
SD2 – Wait for R2 (ADD2a)
ADD2b – Wait for BNE1
BNE2 – Wait for R2 (ADD2a)

Instruction status:
Iter  Instruction         Issue @  Exec @  Mem access @  Wrt. CDB @  Comment
1     LD R2, 0(R1)        1        2       3             4           (R2) = 5
1     DADDIU R2, R2, #1   1        5       -             6           Wait for R2 (LD); (R2) = 6
1     SD R2, 0(R1)        2        3       7             -           Wait for R2 (ADD); Mem[100] = 6
1     DADDIU R1, R1, #8   2        3       -             4           Exec. directly; (R1) = 108
1     BNE R2, R3, LOOP    3        7       -             -           6 ≠ 10
2     LD R2, 0(R1)        4        -       -             -           Wait for BNE; (R2) = 6
2     DADDIU R2, R2, #1   4        -       -             -           (R2) = 7
2     SD R2, 0(R1)        5        -       -             -           Mem[108] = 7
2     DADDIU R1, R1, #8   5        -       -             -           (R1) = 116
2     BNE R2, R3, LOOP    6        -       -             -           7 ≠ 10

Reservation stations (fields in order Busy, Op, Vj, Vk, Qj, Qk, Addr; only stations with entries and their recorded fields are listed):
  Add2: 1, ADD, 108, 8
  Add3: Load2
  Br1: BNE, 6, 10
  Load2: LD
  Store1: SD, 100

Cycle 8

LD2 – Addr. Calc.
ADD2a – Wait for R2 (LD2)
SD2 – Wait for R2 (ADD2a)
ADD2b – Exec
BNE2 – Wait for R2 (ADD2a)

Note: SD2 is also ready to calculate its address, but a structural hazard exists with LD2.

Instruction status:
Iter  Instruction         Issue @  Exec @  Mem access @  Wrt. CDB @  Comment
1     LD R2, 0(R1)        1        2       3             4           (R2) = 5
1     DADDIU R2, R2, #1   1        5       -             6           Wait for R2 (LD); (R2) = 6
1     SD R2, 0(R1)        2        3       7             -           Wait for R2 (ADD); Mem[100] = 6
1     DADDIU R1, R1, #8   2        3       -             4           Exec. directly; (R1) = 108
1     BNE R2, R3, LOOP    3        7       -             -           6 ≠ 10
2     LD R2, 0(R1)        4        8       -             -           Wait for BNE; (R2) = 6
2     DADDIU R2, R2, #1   4        -       -             -           (R2) = 7
2     SD R2, 0(R1)        5        -       -             -           Mem[108] = 7
2     DADDIU R1, R1, #8   5        8       -             -           (R1) = 116
2     BNE R2, R3, LOOP    6        -       -             -           7 ≠ 10

Reservation stations (fields in order Busy, Op, Vj, Vk, Qj, Qk, Addr; only stations with entries and their recorded fields are listed):
  Add2: 1, ADD, 108, 8
  Add3: Load2
  Br2: BNE, 10
  Load2: LD
  Store2: SD
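The structural hazard noted for Cycle 8 can be illustrated with a small arbitration sketch. The slides only state that LD2 computes its address first and SD2 has to wait; the oldest-first policy used here, the single shared address unit, and the names `pick_for_address_unit` and `seq` (position in program order) are assumptions made for illustration.

```python
def pick_for_address_unit(ready_ops):
    """Grant the single address unit to the oldest ready memory operation, if any."""
    if not ready_ops:
        return None
    return min(ready_ops, key=lambda op: op["seq"])


# Memory operations ready to compute an address in cycle 8.
# seq is the position in program order across both iterations (LD1 = 1, ..., BNE2 = 10).
ready = [
    {"name": "SD2", "seq": 8},
    {"name": "LD2", "seq": 6},
]

winner = pick_for_address_unit(ready)
waiting = [op["name"] for op in ready if op is not winner]
print("address unit granted to:", winner["name"])
print("stalled by structural hazard:", waiting)
```

Under this assumed policy LD2 wins the unit and SD2 retries in the next cycle, which is consistent with SD2's address calculation appearing in Cycle 9.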

Cycle 9

LD2 – Load from Mem
ADD2a – Wait for R2 (LD2)
SD2 – Addr. Calc.
ADD2b – Write to CDB
BNE2 – Wait for R2 (ADD2a)

Instruction status:
Iter  Instruction         Issue @  Exec @  Mem access @  Wrt. CDB @  Comment
1     LD R2, 0(R1)        1        2       3             4           (R2) = 5
1     DADDIU R2, R2, #1   1        5       -             6           Wait for R2 (LD); (R2) = 6
1     SD R2, 0(R1)        2        3       7             -           Wait for R2 (ADD); Mem[100] = 6
1     DADDIU R1, R1, #8   2        3       -             4           Exec. directly; (R1) = 108
1     BNE R2, R3, LOOP    3        7       -             -           6 ≠ 10
2     LD R2, 0(R1)        4        8       9             -           Wait for BNE; (R2) = 6
2     DADDIU R2, R2, #1   4        -       -             -           (R2) = 7
2     SD R2, 0(R1)        5        9       -             -           Mem[108] = 7
2     DADDIU R1, R1, #8   5        8       -             9           (R1) = 116
2     BNE R2, R3, LOOP    6        -       -             -           7 ≠ 10

Reservation stations (fields in order Busy, Op, Vj, Vk, Qj, Qk, Addr; only stations with entries and their recorded fields are listed):
  Add2: ADD, 108, 8
  Add3: 1, Load2
  Br2: BNE, 10
  Load2: LD
  Store2: SD

Cycle 10

LD2 – Write to CDB
ADD2a – Wait for R2 (LD2)
SD2 – Wait for R2 (ADD2a)
BNE2 – Wait for R2 (ADD2a)

Instruction status:
Iter  Instruction         Issue @  Exec @  Mem access @  Wrt. CDB @  Comment
1     LD R2, 0(R1)        1        2       3             4           (R2) = 5
1     DADDIU R2, R2, #1   1        5       -             6           Wait for R2 (LD); (R2) = 6
1     SD R2, 0(R1)        2        3       7             -           Wait for R2 (ADD); Mem[100] = 6
1     DADDIU R1, R1, #8   2        3       -             4           Exec. directly; (R1) = 108
1     BNE R2, R3, LOOP    3        7       -             -           6 ≠ 10
2     LD R2, 0(R1)        4        8       9             10          Wait for BNE; (R2) = 6
2     DADDIU R2, R2, #1   4        -       -             -           (R2) = 7
2     SD R2, 0(R1)        5        9       -             -           Mem[108] = 7
2     DADDIU R1, R1, #8   5        8       -             9           (R1) = 116
2     BNE R2, R3, LOOP    6        -       -             -           7 ≠ 10

Reservation stations (fields in order Busy, Op, Vj, Vk, Qj, Qk, Addr; only stations with entries and their recorded fields are listed):
  Add3: 1, ADD, 6
  Br2: BNE, 10
  Load2: LD, 108
  Store2: SD

Cycle 11

ADD2a – Exec
SD2 – Wait for R2 (ADD2a)
BNE2 – Wait for R2 (ADD2a)

Instruction status:
Iter  Instruction         Issue @  Exec @  Mem access @  Wrt. CDB @  Comment
1     LD R2, 0(R1)        1        2       3             4           (R2) = 5
1     DADDIU R2, R2, #1   1        5       -             6           Wait for R2 (LD); (R2) = 6
1     SD R2, 0(R1)        2        3       7             -           Wait for R2 (ADD); Mem[100] = 6
1     DADDIU R1, R1, #8   2        3       -             4           Exec. directly; (R1) = 108
1     BNE R2, R3, LOOP    3        7       -             -           6 ≠ 10
2     LD R2, 0(R1)        4        8       9             10          Wait for BNE; (R2) = 6
2     DADDIU R2, R2, #1   4        11      -             -           (R2) = 7
2     SD R2, 0(R1)        5        9       -             -           Mem[108] = 7
2     DADDIU R1, R1, #8   5        8       -             9           (R1) = 116
2     BNE R2, R3, LOOP    6        -       -             -           7 ≠ 10

Reservation stations (fields in order Busy, Op, Vj, Vk, Qj, Qk, Addr; only stations with entries and their recorded fields are listed):
  Add3: 1, ADD, 6
  Br2: BNE, 10
  Store2: SD, 108

Cycle 12

ADD2a – Write to CDB
SD2 – Wait for R2 (ADD2a)
BNE2 – Wait for R2 (ADD2a)

Instruction status:
Iter  Instruction         Issue @  Exec @  Mem access @  Wrt. CDB @  Comment
1     LD R2, 0(R1)        1        2       3             4           (R2) = 5
1     DADDIU R2, R2, #1   1        5       -             6           Wait for R2 (LD); (R2) = 6
1     SD R2, 0(R1)        2        3       7             -           Wait for R2 (ADD); Mem[100] = 6
1     DADDIU R1, R1, #8   2        3       -             4           Exec. directly; (R1) = 108
1     BNE R2, R3, LOOP    3        7       -             -           6 ≠ 10
2     LD R2, 0(R1)        4        8       9             10          Wait for BNE; (R2) = 6
2     DADDIU R2, R2, #1   4        11      -             12          (R2) = 7
2     SD R2, 0(R1)        5        9       -             -           Mem[108] = 7
2     DADDIU R1, R1, #8   5        8       -             9           (R1) = 116
2     BNE R2, R3, LOOP    6        -       -             -           7 ≠ 10

Reservation stations (fields in order Busy, Op, Vj, Vk, Qj, Qk, Addr; only stations with entries and their recorded fields are listed):
  Add3: ADD, 6, 1
  Br2: BNE, 7, 10
  Store2: SD, 108

Cycle 13

SD2 – Store to Mem
BNE2 – Calc. Condition

Instruction status:
Iter  Instruction         Issue @  Exec @  Mem access @  Wrt. CDB @  Comment
1     LD R2, 0(R1)        1        2       3             4           (R2) = 5
1     DADDIU R2, R2, #1   1        5       -             6           Wait for R2 (LD); (R2) = 6
1     SD R2, 0(R1)        2        3       7             -           Wait for R2 (ADD); Mem[100] = 6
1     DADDIU R1, R1, #8   2        3       -             4           Exec. directly; (R1) = 108
1     BNE R2, R3, LOOP    3        7       -             -           6 ≠ 10
2     LD R2, 0(R1)        4        8       9             10          Wait for BNE; (R2) = 6
2     DADDIU R2, R2, #1   4        11      -             12          (R2) = 7
2     SD R2, 0(R1)        5        9       13            -           Mem[108] = 7
2     DADDIU R1, R1, #8   5        8       -             9           (R1) = 116
2     BNE R2, R3, LOOP    6        13      -             -           7 ≠ 10

Reservation stations (fields in order Busy, Op, Vj, Vk, Qj, Qk, Addr; only stations with entries and their recorded fields are listed):
  Br2: 1, BNE, 7, 10
  Store2: SD, 108
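The complete schedule can be checked against the rules the walkthrough relies on. The Python sketch below encodes the cycle numbers from the per-cycle event lists and tests three properties: each consumer of R2 uses the value only after its producer has written the CDB, no second-iteration instruction executes before BNE1 is resolved (there is no speculation in this example), and at most two instructions issue or write the CDB per cycle. The dictionary layout and the helper names `raw_ok` and `after_branch_ok` are illustrative assumptions.

```python
from collections import Counter

ISSUE, EXEC, MEM, CDB = range(4)

# (issue, exec, mem access, write CDB) cycles; None where a stage does not apply.
timing = {
    "LD1":   (1, 2, 3, 4),
    "ADD1a": (1, 5, None, 6),
    "SD1":   (2, 3, 7, None),
    "ADD1b": (2, 3, None, 4),
    "BNE1":  (3, 7, None, None),
    "LD2":   (4, 8, 9, 10),
    "ADD2a": (4, 11, None, 12),
    "SD2":   (5, 9, 13, None),
    "ADD2b": (5, 8, None, 9),
    "BNE2":  (6, 13, None, None),
}

# (consumer, consumer stage that needs R2, producer) for each RAW dependence on R2.
raw_deps = [
    ("ADD1a", EXEC, "LD1"),
    ("SD1", MEM, "ADD1a"),
    ("BNE1", EXEC, "ADD1a"),
    ("ADD2a", EXEC, "LD2"),
    ("SD2", MEM, "ADD2a"),
    ("BNE2", EXEC, "ADD2a"),
]


def raw_ok(consumer, stage, producer):
    """The consumer may use the value only after the producer's CDB write."""
    return timing[consumer][stage] > timing[producer][CDB]


def after_branch_ok(name, branch="BNE1"):
    """Without speculation, execution must wait until the branch is resolved."""
    return timing[name][EXEC] > timing[branch][EXEC]


raw_results = {f"{c} <- {p}": raw_ok(c, s, p) for c, s, p in raw_deps}
branch_results = {n: after_branch_ok(n) for n in ("LD2", "ADD2a", "SD2", "ADD2b", "BNE2")}
issue_per_cycle = Counter(t[ISSUE] for t in timing.values())
cdb_per_cycle = Counter(t[CDB] for t in timing.values() if t[CDB] is not None)

print(raw_results)
print(branch_results)
print("max issues per cycle:", max(issue_per_cycle.values()))
print("max CDB writes per cycle:", max(cdb_per_cycle.values()))
```

The only cycle with two CDB writes is cycle 4 (LD1 and ADD1b), which is why the example assumes two CDBs.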