Tomasulo’s Algorithm and IBM 360 Srivathsan Soundararajan

Presentation transcript:

1 Tomasulo’s Algorithm and IBM 360 Srivathsan Soundararajan

2 What we have seen till now!!! Single-cycle datapath. Multi-cycle datapath. N-stage Pipelined datapath.

3 Tomasulo's algorithm A hardware algorithm for controlling the execution of multiple functional units with varying latencies in a pipelined CPU micro-architecture. A general mechanism for register forwarding and hazard detection. The key idea is to virtually execute each instruction in a single cycle. Out-of-order execution of instructions.

4 So what is so special??? Let instructions behind a stalled instruction proceed. Decode instructions and check for structural hazards. Wait until there are no data hazards, then read operands.

5 Three Stages Issue – if a reservation station is free (i.e. no structural hazard), control issues the instruction and sends operands (renames registers). Execution – if both operands are ready, execute. If not, watch the common data bus (CDB) for the result. Write result – if the CDB is available, write on the common data bus to all awaiting units; mark the reservation station available.
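The three stages can be sketched as a tiny cycle-by-cycle simulation. This is a minimal illustration, not the 360/91 design: the names `Station`, `run`, and `OPS` are hypothetical, every operation is assumed to take one cycle to execute, and only one result is broadcast on the CDB per cycle.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Station:
    busy: bool = False
    op: str = ""
    dest: str = ""
    vj: Optional[float] = None      # operand values, once known
    vk: Optional[float] = None
    qj: Optional[int] = None        # tag (station index) of a pending producer
    qk: Optional[int] = None
    result: Optional[float] = None  # set by Execute, broadcast by Write result

# Assumption: two illustrative operations, each with one-cycle latency.
OPS = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b}

def run(program, regs, n_stations=2):
    """Run (op, dest, src1, src2) tuples; return (registers, cycle count)."""
    regs = dict(regs)
    status = {}                     # register name -> tag of the station that will write it
    rs = [Station() for _ in range(n_stations)]
    pc = cycles = 0

    def operand(reg):
        tag = status.get(reg)
        return (None, tag) if tag is not None else (regs[reg], None)

    while pc < len(program) or any(s.busy for s in rs):
        cycles += 1
        # Write result: one finished station drives the CDB this cycle.
        for tag, s in enumerate(rs):
            if s.busy and s.result is not None:
                for w in rs:        # waiting stations capture matching tags
                    if w.busy and w.qj == tag:
                        w.vj, w.qj = s.result, None
                    if w.busy and w.qk == tag:
                        w.vk, w.qk = s.result, None
                if status.get(s.dest) == tag:
                    regs[s.dest] = s.result
                    del status[s.dest]
                rs[tag] = Station() # mark the reservation station available
                break
        # Execute: any station holding both operand values computes its result.
        for s in rs:
            if s.busy and s.result is None and s.qj is None and s.qk is None:
                s.result = OPS[s.op](s.vj, s.vk)
        # Issue: in program order, and only if a station is free.
        free = next((i for i, s in enumerate(rs) if not s.busy), None)
        if pc < len(program) and free is not None:
            op, dest, a, b = program[pc]
            s = Station(busy=True, op=op, dest=dest)
            s.vj, s.qj = operand(a)
            s.vk, s.qk = operand(b)
            rs[free] = s
            status[dest] = free     # rename: dest now refers to this station
            pc += 1
    return regs, cycles

program = [("MUL", "F4", "F0", "F2"),   # F4 = F0 * F2
           ("ADD", "F6", "F4", "F2"),   # waits for the MUL result on the CDB
           ("ADD", "F8", "F0", "F2")]   # independent of the MUL
final, cycles = run(program, {"F0": 2.0, "F2": 3.0, "F4": 0.0, "F6": 0.0, "F8": 0.0})
print(final, cycles)
```

The stages run in reverse order inside each cycle so that an instruction spends at least one cycle in each of them. The second instruction is issued with a tag instead of a value for F4 and picks the value up from the broadcast.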

6 Virtual result – a promissory note All registers are modified so that they can either hold a true result or a virtual result. When an instruction is issued, a virtual result is placed in the instruction's destination register. A functional unit is assigned to compute the real result. The virtual result is replaced by the real result when the functional unit has completed its computation.
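The promissory-note idea can be sketched in a few lines. This is only an illustration under assumed names: `Virtual`, `issue`, and `complete` are hypothetical, and the functional-unit names are made up for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Virtual:
    unit: str  # hypothetical name of the functional unit that owes the value

def issue(regs, dest, unit):
    """On issue, place a virtual result in the destination register."""
    regs[dest] = Virtual(unit)

def complete(regs, unit, value):
    """On completion, replace every note owed by `unit` with the real result."""
    for name, content in regs.items():
        if content == Virtual(unit):
            regs[name] = value

regs = {"F0": 1.0, "F2": 3.0}
issue(regs, "F0", "mul1")        # F0 now holds a note from mul1
issue(regs, "F0", "add1")        # a later instruction overwrites the note
complete(regs, "mul1", 6.0)      # stale result: F0 no longer waits on mul1
print(regs["F0"])                # still the note from add1
complete(regs, "add1", 5.0)
print(regs["F0"])                # now the real result
```

Because a register only accepts a result whose producer matches the note it currently holds, a slow earlier instruction cannot overwrite the value of a later one. This is how the scheme avoids write-after-write hazards.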

7 Tomasulo Organization

8 The concept Each instruction, as it arrives, fetches its operands from a special register file. Each register in this file holds either an actual value, or a “tag” indicating the reservation station that will produce the register value when it completes. The instruction and its operands (either values or tags) are stored in a reservation station (RS). The RS watches the results returning from the execution pipelines, and when a result's tag matches one of its operands, it records the value in place of the tag
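The tag-matching behaviour of a single reservation station might look like the sketch below. The class and function names (`Tag`, `ReservationStation`, `fetch`, `snoop`) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tag:
    station: str  # name of the reservation station that will produce the value

@dataclass
class ReservationStation:
    name: str
    op: str
    operands: list  # each entry is either an actual value or a Tag

    def snoop(self, tag, value):
        """Watch a returning result; record the value in place of a matching tag."""
        self.operands = [value if o == tag else o for o in self.operands]

    def ready(self):
        """The instruction may execute once no operand is still a tag."""
        return not any(isinstance(o, Tag) for o in self.operands)

def fetch(register_file, name, op, sources):
    """Copy each source register's content (value or tag) into a new station."""
    return ReservationStation(name, op, [register_file[s] for s in sources])

register_file = {"F0": Tag("MUL1"), "F2": 3.0}   # F0 is still being computed
add1 = fetch(register_file, "ADD1", "ADD", ["F0", "F2"])
print(add1.ready())              # False: F0 is only a tag so far
add1.snoop(Tag("MUL1"), 6.0)     # MUL1's result returns from its pipeline
print(add1.ready(), add1.operands)
```

Operands are copied into the station at fetch time, so the instruction no longer depends on the register file and later writes to F0 or F2 cannot disturb it.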

9 A good example 1-s97/tomasulo.htm

10 Why Tomasulo’s Algorithm Hazard detection. Dynamic scheduling (i.e. the hardware reorders instructions).

11 IBM 360 The IBM 360/91 introduced many new concepts, including dynamic detection of memory hazards, generalized forwarding, and reservation stations. The approach is commonly known as Tomasulo’s algorithm.

12 Installation of the IBM 360/91 in the Columbia Computer Center machine room in February or March 1969 Photo: AIS archive.

13 IBM 360 Was introduced by the team led by Michael Flynn in The internal organization of the 360/91 shares many features with the Pentium III and Pentium 4, as well as several other microprocessors. One major difference was that there was no branch prediction in the 360/91 and hence no speculation. Another major difference was that there was no commit unit, so once the instructions finished execution, they updated the registers. Out-of-order instruction commit led to imprecise interrupts, which proved to be unpopular and led to the commit units in dynamically scheduled pipelined processors since that time.

14 IBM 360 Although the 360/91 was not a success, the key ideas were resurrected later and exist in some form in the majority of microprocessors, such as the Pentium II and PowerPC 604. It ran under Operating System/ a powerful programming package of approximately 1.5 million instructions that enabled the system to operate with virtually no manual intervention.

15 IBM 360 Within the central processing unit (CPU), there were five highly autonomous execution units which allowed the machine to overlap operations and process many instructions simultaneously. The five units were processor storage, storage bus control, instruction processor, fixed-point processor and floating-point processor. Not only could these units operate concurrently, they could also perform several functions at the same time.

16 Some uses The IBM-360 family of computers ranged from the model 20 minicomputer (which typically had 24 KB of memory) to the model 91 supercomputer which was built for the North American missile defense system.

17 Quite huge, isn't it???

18 References sjoh/readings/smv/CadenceSMV- docs/smv/tutorial/node36.html#tomasulo1; masulo.ppt#268,9,Three Stages of Tomasulo Algorithm; 11.ppt#723,4,Tomasulo: Organization