Anshul Kumar, CSE IITD CSL718 : Superscalar Processors Issue and Despatch 23rd Jan, 2006.

Presentation transcript:


Slide 2: Early proposals/prototypes
Institutions: IBM, DEC, Stanford U, Kyushu U
Projects (issue rate in parentheses): Cheetah, America project (4), Multititan project (2), Match (2), Torch (4), SIMP (4), DSNS (4)
Term "superscalar"

Slide 3: Commercial superscalars: RISCs
Vendor     Models (issue rate)          Year
Intel      960KA/KB -> 960CA (3)        1989
IBM        Power1 RS/6000 (4)           1990
HP         PA7000 -> PA7100 (2)         1992
SUN        SPARC -> SuperSparc (3)      1992
DEC        Alpha 21064 (2)              1992
Motorola   MC88100 -> MC88110 (2)       1993
Motorola   PowerPC 601/603 (3)          1993
MIPS       R4000 -> R8000 (4)           1994

Slide 4: Commercial superscalars: CISCs
Vendor     Models (issue rate)              Year
Intel      80486 -> Pentium (2)             1993
Motorola   MC68040 -> MC68060 (2)           1993
Gmicro     Gmicro/100p -> Gmicro 500 (2)    1993
AMD        K5 (2), 4 RISC instr             1995
CYRIX      M1 (2)                           1995

Slide 5: Tasks of superscalar processing
Parallel decoding and issue
Parallel instruction execution
Preserving the sequential consistency of instruction execution and exception processing

Slide 6: Superscalar decode and issue
[Figure: two panels, "Scalar Issue" and "Superscalar Issue". In each, the I-cache feeds an instruction buffer, which feeds Decode & Issue. Pipeline stage labels as extracted: "IFD/I" and "IFDI".]

Slide 7: Parallel Decoding
Fetch multiple instructions into the instruction buffer
Decode multiple instructions in parallel (the instruction window)
Possibly check dependencies among these, as well as with the instructions already under execution
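The dependency check mentioned on slide 7 can be illustrated with a minimal Python sketch. It is not taken from the slides: the `Instr` record, the `raw_dependencies` function and the `"in-flight"` marker are hypothetical names, and only read-after-write dependencies on register operands are considered.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    name: str
    dest: str               # destination register
    srcs: Tuple[str, ...]   # source registers


def raw_dependencies(window, in_flight_dests=frozenset()):
    """Map each instruction in the window to the producers it must wait for.

    Only read-after-write dependencies are tracked. Registers still being
    written by instructions already under execution are reported with the
    placeholder producer "in-flight" (an assumption of this sketch).
    """
    deps = {}
    last_writer = {reg: "in-flight" for reg in in_flight_dests}
    for ins in window:
        deps[ins.name] = {last_writer[s] for s in ins.srcs if s in last_writer}
        last_writer[ins.dest] = ins.name   # later readers depend on this instruction
    return deps


window = [
    Instr("a", "r1", ("r2", "r3")),
    Instr("b", "r4", ("r1", "r5")),    # reads r1, written by a
    Instr("c", "r6", ("r7", "r8")),    # independent
    Instr("d", "r9", ("r4", "r10")),   # reads r4 (from b) and r10 (in flight)
]
deps = raw_dependencies(window, in_flight_dests={"r10"})
```

In this example `a` and `c` have no producers to wait for, while `b` waits for `a` and `d` waits for both `b` and an instruction already executing.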

Slide 8: Pre-decoding
Do partial decoding while instructions are being loaded into the I-cache
Decoded information is appended to the instruction
This includes instruction class, resources required, etc.
[Figure: second level cache or main memory -> (N bits/cycle) -> pre-decode unit -> (N + n bits/cycle) -> I-cache]

Slide 9: Number of pre-decode bits
Processor             No. of predecode bits
PA 7200 (1995)        5
PA 8000 (1996)        5
PowerPC 620 (1996)    7
UltraSparc (1995)     4
HAL PM1 (1995)        4
AMD K5 (1995)         5 (per byte)
R (1996)              4

Slide 10: Issue vs Dispatch
Blocking issue
  Decode and issue to EU
  Instructions may be blocked due to data dependency
Non-blocking issue
  Decode and issue to buffer
  From buffer, dispatch to EU
  Instructions are not blocked due to data dependency
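The contrast between blocking and non-blocking issue can be seen in a toy timing model. The Python sketch below is an illustration under simplifying assumptions that are not in the slides: one instruction is decoded per cycle, `ready[i]` gives the first cycle in which instruction i has its operands, and the shelved variant assumes enough buffer entries and EUs. The function names are hypothetical.

```python
def blocking_issue(ready):
    """Cycle in which each instruction reaches an EU with blocking issue.

    Instructions are issued in order, one per cycle; an instruction whose
    operands are not yet available stalls every instruction behind it.
    """
    cycle, start = 0, []
    for r in ready:
        cycle = max(cycle, r)   # stall issue until the operands are available
        start.append(cycle)
        cycle += 1
    return start


def shelved_issue(ready):
    """Cycle in which each instruction reaches an EU with shelved issue.

    Instruction i is issued into a shelving buffer in cycle i without a
    dependency check, then dispatched as soon as its operands are available.
    """
    return [max(i, r) for i, r in enumerate(ready)]


ready = [0, 5, 0, 0]              # the second instruction waits until cycle 5
blocked = blocking_issue(ready)   # [0, 5, 6, 7]
shelved = shelved_issue(ready)    # [0, 5, 2, 3]
```

With blocking issue the two independent instructions behind the stalled one are delayed to cycles 6 and 7; with shelving they start in cycles 2 and 3.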

Slide 11: Blocking Issue
[Figure: instruction buffer (issue window) -> Decode, Check & Issue -> EU]

Slide 12: Non-blocking (shelved) Issue
[Figure: instruction buffer -> Decode & Issue -> three reservation stations, each with dependency checking/dispatch, each feeding its own EU]

Slide 13: Handling of Issue Blockages
Preserving issue order: in-order, out of order
Alignment of instruction issue: aligned, unaligned

Slide 14: Issue Order
[Figure: two panels showing an issue window with instructions a to e, the instructions to be issued and the instructions issued. Left panel: issue in strict program order. Right panel: out of order issue (example: MC 88110, PowerPC 601). Legend: independent instruction, dependent instruction, issued instruction.]

Slide 15: Alignment
[Figure: two panels over instructions a to h, showing which instructions are checked and issued in cycles 1 to 3. Aligned issue: fixed window, followed by the next window. Unaligned issue: gliding window.]
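The difference between a fixed and a gliding window can be sketched in Python as follows. This is an illustrative model rather than the example drawn on the slide: issue is in order, `ready[i]` is the first cycle in which instruction i can be issued, and the function name `issue_cycles` and its parameters are hypothetical. With aligned issue the window is refilled only after every instruction in it has been issued; with unaligned issue it is topped up every cycle.

```python
def issue_cycles(ready, width, aligned):
    """Return the cycle in which each instruction is issued (in-order issue)."""
    n = len(ready)
    issued = [None] * n
    next_i = 0    # oldest instruction not yet issued
    base = 0      # first instruction of the current fixed window (aligned only)
    cycle = 0
    while next_i < n:
        if aligned:
            if next_i >= base + width:
                base += width                  # window drained: load the next one
            limit = min(base + width, n)
        else:
            limit = min(next_i + width, n)     # gliding window, refilled each cycle
        # Issue in program order until a blocked instruction or the window end.
        while next_i < limit and ready[next_i] <= cycle:
            issued[next_i] = cycle
            next_i += 1
        cycle += 1
    return issued


ready = [0, 1, 0, 0, 0]   # the second instruction is blocked during cycle 0
aligned_cycles = issue_cycles(ready, width=2, aligned=True)      # [0, 1, 2, 2, 3]
unaligned_cycles = issue_cycles(ready, width=2, aligned=False)   # [0, 1, 1, 2, 2]
```

In cycle 1 the fixed window holds only the one remaining instruction, whereas the gliding window has already been refilled and issues two.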

Slide 16: Design choices in instruction issue
Coping with false data dependencies: no, register renaming
Coping with unresolved control dependencies: wait, speculative
Use of shelving: blocking, shelved
Handling of issue blockages
Issue rate (2-6)

Slide 17: Frequently used issue policies in scalar processors
Traditional scalar issue: i386, MC68030, R3000, Sparc
Traditional scalar issue with spec. execution: i486, MC68040, R4000, MicroSparc
Scalar issue with shelving: CDC 6600
Scalar issue with shelving and renaming: IBM 360/91

Slide 18: Frequently used issue policies in superscalar processors
(speculative execution in all)
Straightforward superscalar issue (aligned or unaligned)
Straightforward superscalar issue with shelving
Straightforward superscalar issue with renaming
Advanced superscalar issue (renaming + shelving)
Example processors, in extraction order (column assignment not recoverable): Pentium, PowerPC601, PA7100, SuperSparc, Alpha21164, MC68060, PA7200, UltraSparc, MC88110, R8000, PowerPC602, R10000, PentiumPro, PowerPC602, PA8000, Sparc64, Am29000, K5

Slide 19: Frequently used issue policies
Traditional scalar issue
Traditional scalar issue with spec. execution
Straightforward superscalar issue: aligned, unaligned
Advanced superscalar issue

Slide 20: Design Space of Shelving
Scope of shelving: partial, full
Layout of shelving buffers
Operand fetch policy
Instruction dispatch scheme

Slide 21: Layout of Shelving Buffers
Type of the shelving buffers: stand-alone (RS), either individual, group or central; or combined with renaming and reordering
Number of shelving buffer entries: individual 2-4, group 6-16, central 20, total
Number of read and write ports: depends on no. of EUs connected

Slide 22: Reservation Stations (RS)
[Figure: three layouts of RSs in front of the EUs: individual RSs, group RSs, central RS]

Slide 23: Combined Buffer (for Shelving, Renaming, Reordering)
DRIS: Deferred scheduling, Register renaming and Instruction Shelving
[Figure: instructions from decode/issue enter the DRIS, which feeds the EUs]

Slide 24: Operand Fetch Policies
Issue bound fetch
Dispatch bound fetch

Slide 25: Issue bound operand fetch (with single register file)
[Figure: decode/issue, register file (RF), reservation stations (RS) and EUs; instruction and data paths are marked separately]

Slide 26: Dispatch bound operand fetch (with single register file)
[Figure: decode/issue, reservation stations (RS), register file (RF) and EUs; instruction and data paths are marked separately]

Slide 27: Issue bound operand fetch (with multiple register files)
[Figure: decode/issue, register files (RF), reservation stations (RS) and EUs; instruction and data paths are marked separately]

Slide 28: Dispatch bound operand fetch (with multiple register files)
[Figure: decode/issue, reservation stations (RS), register files (RF) and EUs; instruction and data paths are marked separately]

Slide 29: Updating RFs and RSs
[Figure: decode/issue, register file (RF), reservation stations (RS) and EUs; instruction and data paths are marked separately]

Slide 30: Instruction dispatch scheme
Dispatch policy
Dispatch rate: single instr/cycle (individual RS), multiple instr/cycle (group or central RS)
Checking operand availability
Treatment of empty RS

Slide 31: Dispatch policy
Selection rule: rule for identifying instructions which are ready for execution (data dependency check)
Arbitration rule: rule for choosing one out of several ready instructions (earlier instruction has priority)
Dispatch order
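The selection and arbitration rules of slide 31 can be written down as two small Python functions. This is a minimal sketch: the dictionary layout of a reservation station entry, the `age` field (program order, smaller means earlier) and the function names are assumptions made for illustration.

```python
def select_ready(rs_entries, available_regs):
    """Selection rule: keep the entries whose source operands are all available."""
    return [e for e in rs_entries
            if all(src in available_regs for src in e["srcs"])]


def arbitrate(ready_entries):
    """Arbitration rule: among ready entries, the earliest instruction wins."""
    if not ready_entries:
        return None
    return min(ready_entries, key=lambda e: e["age"])


rs_entries = [
    {"op": "add", "age": 3, "srcs": ("r1", "r2")},
    {"op": "mul", "age": 1, "srcs": ("r5", "r6")},   # r5 and r6 not yet produced
    {"op": "sub", "age": 2, "srcs": ("r1", "r3")},
]
available = {"r1", "r2", "r3"}

ready = select_ready(rs_entries, available)   # add and sub
chosen = arbitrate(ready)                     # sub, the earlier of the two
```

The oldest entry (`mul`) is passed over because it fails the selection rule; arbitration only ranks instructions that are already ready.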

Slide 32: Dispatch order
In-order, partially out of order, out of order
[Figure: which RS entries are checked under each dispatch order]

Slide 33: Checking availability of operands
Direct check of score-board bits (usual for dispatch bound operand fetch): control flow approach
Check of explicit status bits in RS (usual for issue bound operand fetch): data flow approach
(Flynn’s terminology)

Slide 34: Score-board
[Figure: register file with a data field and a status field per register]
Introduced with CDC6600

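Slide 35: Checking in dispatch bound fetch
[Figure, reconstructed from labels:
Decoded instruction (Rs1, Rs2, Rd) enters the reservation station; the V bit of Rd is reset in the register file.
Reservation station entry: OC, Rs1, Rs2, Rd.
Before dispatch: check V bits of sources.
To the EU: OC (opcode), Os1, Os2 (operand values), read from the register file.
From the EU: result, Rd; update Rd, set V bit.]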
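The labels on slide 35 suggest the following sequence, sketched here in Python. It is a simplified illustration, not the slide's design: the class name `ScoreboardRF` and its methods are hypothetical, and write-after-write and write-after-read hazards are ignored.

```python
class ScoreboardRF:
    """Register file with one valid (V) bit per register, score-board style."""

    def __init__(self, values):
        self.value = list(values)
        self.valid = [True] * len(self.value)

    def issue(self, rd):
        # At issue the destination becomes pending: reset the V bit of Rd.
        self.valid[rd] = False

    def can_dispatch(self, rs1, rs2):
        # Dispatch check: the V bits of both source registers must be set.
        return self.valid[rs1] and self.valid[rs2]

    def fetch(self, rs1, rs2):
        # Dispatch bound fetch: operand values are read only at dispatch time.
        return self.value[rs1], self.value[rs2]

    def writeback(self, rd, result):
        # The EU returns (result, Rd): update Rd and set its V bit.
        self.value[rd] = result
        self.valid[rd] = True


rf = ScoreboardRF([0, 2, 3, 0, 0])            # r1 = 2, r2 = 3
rs = [("add", 1, 2, 3), ("mul", 3, 1, 4)]     # RS entries: (OC, Rs1, Rs2, Rd)
for _, _, _, rd in rs:
    rf.issue(rd)

mul_ready_before = rf.can_dispatch(3, 1)      # False: r3 is still pending
a, b = rf.fetch(1, 2)
rf.writeback(3, a + b)                        # add completes, r3 = 5
mul_ready_after = rf.can_dispatch(3, 1)       # True
x, y = rf.fetch(3, 1)
rf.writeback(4, x * y)                        # mul completes, r4 = 10
```

The reservation station entries hold only register numbers here; the operand values stay in the register file until dispatch.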

Slide 36: Checking in issue bound fetch
[Figure, reconstructed from labels:
Decoded instruction (Rs1, Rs2, Rd) reads the register file; the V bit of Rd is reset; operand values Os1, Os2 are passed to the reservation station.
Reservation station entry: OC, Os1/Is1, Vs1, Os2/Is2, Vs2, Rd.
Before dispatch: check Vs1, Vs2.
To the EU: OC, Os1, Os2, Rd.
From the EU: result, Rd; update Rd and set V bit in the register file; associative update of Is1, Is2 with Rd, set Vs bits.]
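A Python sketch of the issue bound scheme on slide 36 follows. It is an illustration under stated assumptions: the class `IssueBoundRS`, its methods and the entry layout are hypothetical, the register number Rd itself serves as the identifier (no renaming), and dispatched entries are not removed from the station.

```python
class IssueBoundRS:
    """Register file plus one reservation station with issue bound operand fetch."""

    def __init__(self, values):
        self.value = list(values)
        self.valid = [True] * len(self.value)
        self.entries = []

    def issue(self, op, rs1, rs2, rd):
        srcs = []
        for r in (rs1, rs2):
            if self.valid[r]:
                srcs.append([self.value[r], True])   # operand value Os, Vs = 1
            else:
                srcs.append([r, False])              # identifier Is, Vs = 0
        self.valid[rd] = False                       # reset V bit of Rd
        entry = {"op": op, "srcs": srcs, "rd": rd}
        self.entries.append(entry)
        return entry

    def ready(self, entry):
        # Dispatch check: both Vs bits in the RS entry must be set.
        return all(v for _, v in entry["srcs"])

    def writeback(self, rd, result):
        # Update Rd and set its V bit in the register file.
        self.value[rd] = result
        self.valid[rd] = True
        # Associative update: every waiting source whose identifier matches Rd
        # receives the result and has its Vs bit set.
        for e in self.entries:
            for s in e["srcs"]:
                if not s[1] and s[0] == rd:
                    s[0], s[1] = result, True


m = IssueBoundRS([0, 2, 3, 0, 0])      # r1 = 2, r2 = 3
add = m.issue("add", 1, 2, 3)          # both operand values captured at issue
mul = m.issue("mul", 3, 1, 4)          # first source holds identifier 3
mul_ready_before = m.ready(mul)        # False
m.writeback(3, add["srcs"][0][0] + add["srcs"][1][0])   # r3 = 5
mul_ready_after = m.ready(mul)         # True
```

Unlike the dispatch bound case, the ready check never consults the register file: the result is pushed into the waiting entry when it is produced.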

Slide 37: Treatment of an empty RS
Straightforward approach: at least one cycle stay in RS
Bypassing RS if empty
Example processors, in extraction order (column assignment not recoverable): Nx586, Sparc64, PowerPC 604

Slide 38: Approaches in dispatching
Straightforward: in order, single instr/cycle, individual RSs. Examples: Power1, PPC603, Nx586, Am29000
Enhanced: partially out of order, single instr/cycle, individual RSs. Examples: Power2, PPC604, 620
Advanced: out of order, multiple instr/cycle, group/central RSs. Examples: PM1, PentiumPro, PA8000, R10000