Dynamic Scheduling Why go out of style?

Slides:

Advertisements

Similar presentations

Spring 2003CSE P5481 Out-of-Order Execution Several implementations out-of-order completion CDC 6600 with scoreboarding IBM 360/91 with Tomasulos algorithm.

Advertisements

Hardware-Based Speculation. Exploiting More ILP Branch prediction reduces stalls but may not be sufficient to generate the desired amount of ILP One way.

Computer Structure 2014 – Out-Of-Order Execution 1 Computer Structure Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.

Dynamic ILP: Scoreboard Professor Alvin R. Lebeck Computer Science 220 / ECE 252 Fall 2008.

1 Lecture: Out-of-order Processors Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ.

Dyn. Sched. CSE 471 Autumn 0219 Tomasulo’s algorithm “Weaknesses” in scoreboard: –Centralized control –No forwarding (more RAW than needed) Tomasulo’s.

COMP25212 Advanced Pipelining Out of Order Processors.

1 Advanced Computer Architecture Limits to ILP Lecture 3.

THE MIPS R10000 SUPERSCALAR MICROPROCESSOR Kenneth C. Yeager IEEE Micro in April 1996 Presented by Nitin Gupta.

Instruction-Level Parallelism (ILP)

Spring 2003CSE P5481 Reorder Buffer Implementation (Pentium Pro) Hardware data structures retirement register file (RRF) (~ IBM 360/91 physical registers)

CS 211: Computer Architecture Lecture 5 Instruction Level Parallelism and Its Dynamic Exploitation Instructor: M. Lancaster Corresponding to Hennessey.

CPE 731 Advanced Computer Architecture ILP: Part IV – Speculative Execution Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.

Limits on ILP. Achieving Parallelism Techniques – Scoreboarding / Tomasulo’s Algorithm – Pipelining – Speculation – Branch Prediction But how much more.

Computer Architecture 2011 – Out-Of-Order Execution 1 Computer Architecture Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.

1 Lecture 8: Branch Prediction, Dynamic ILP Topics: branch prediction, out-of-order processors (Sections )

Computer Architecture 2011 – out-of-order execution (lec 7) 1 Computer Architecture Out-of-order execution By Dan Tsafrir, 11/4/2011 Presentation based.

1 Lecture 8: Branch Prediction, Dynamic ILP Topics: static speculation and branch prediction (Sections )

1 Lecture 9: Dynamic ILP Topics: out-of-order processors (Sections )

Computer Architecture 2010 – Out-Of-Order Execution 1 Computer Architecture Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.

Ch2. Instruction-Level Parallelism & Its Exploitation 2. Dynamic Scheduling ECE562/468 Advanced Computer Architecture Prof. Honggang Wang ECE Department.

CSCE 614 Fall Hardware-Based Speculation As more instruction-level parallelism is exploited, maintaining control dependences becomes an increasing.

Spring 2003CSE P5481 Precise Interrupts Precise interrupts preserve the model that instructions execute in program-generated order, one at a time If an.

1 Lecture 7: Speculative Execution and Recovery Branch prediction and speculative execution, precise interrupt, reorder buffer.

1 CPRE 585 Term Review Performance evaluation, ISA design, dynamically scheduled pipeline, and memory hierarchy.

The life of an instruction in EV6 pipeline Constantinos Kourouyiannis.

Recap Multicycle Operations –MIPS Floating Point Putting It All Together: the MIPS R4000 Pipeline.

1 Lecture: Out-of-order Processors Topics: a basic out-of-order processor with issue queue, register renaming, and reorder buffer.

Out-of-order execution Lihu Rappoport 11/ MAMAS – Computer Architecture Out-Of-Order Execution Dr. Lihu Rappoport.

CS203 – Advanced Computer Architecture ILP and Speculation.

Ch2. Instruction-Level Parallelism & Its Exploitation 2. Dynamic Scheduling ECE562/468 Advanced Computer Architecture Prof. Honggang Wang ECE Department.

Instruction-Level Parallelism and Its Dynamic Exploitation

CS 352H: Computer Systems Architecture

CSL718 : Superscalar Processors

/ Computer Architecture and Design

/ Computer Architecture and Design

Tomasulo’s Algorithm Born of necessity

Approaches to exploiting Instruction Level Parallelism (ILP)

Out of Order Processors

Tomasulo Loop Example Loop: LD F0 0 R1 MULTD F4 F0 F2 SD F4 0 R1

CS203 – Advanced Computer Architecture

Lecture: Out-of-order Processors

Microprocessor Microarchitecture Dynamic Pipeline

Pipelining: Advanced ILP

Sequential Execution Semantics

High-level view Out-of-order pipeline

Morgan Kaufmann Publishers The Processor

Lecture 6: Advanced Pipelines

Out of Order Processors

Superscalar Processors & VLIW Processors

Superscalar Pipelines Part 2

Lecture 8: ILP and Speculation Contd. Chapter 2, Sections 2. 6, 2

CS 704 Advanced Computer Architecture

Lecture 8: Dynamic ILP Topics: out-of-order processors

Adapted from the slides of Prof

15-740/ Computer Architecture Lecture 5: Precise Exceptions

Checking for issue/dispatch

Lecture 7: Dynamic Scheduling with Tomasulo Algorithm (Section 2.4)

How to improve (decrease) CPI

Static vs. dynamic scheduling

Instruction Level Parallelism (ILP)

Adapted from the slides of Prof

Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/30/2011

15-740/ Computer Architecture Lecture 10: Out-of-Order Execution

How to improve (decrease) CPI

High-level view Out-of-order pipeline

Lecture 7 Dynamic Scheduling

Lecture 9: Dynamic ILP Topics: out-of-order processors

Conceptual execution on a processor which exploits ILP

Sizing Structures Fixed relations Empirical (simulation-based)

Presentation transcript:

Dynamic Scheduling Why go out of style? expensive hardware for the time (actually, still is, relatively) register files grew so less register pressure early RISCs had lower CPIs Spring 2003 CSE P548

Dynamic Scheduling Why come back? higher chip densities greater need to hide latencies as: discrepancy between CPU & memory speeds increases branch misprediction penalty increases from superpipelining dynamic scheduling was generalized to cover more than floating point operations handles branches & hides branch latencies hides cache misses commits instructions in-order to preserve precise interrupts can be implemented with a more general register renaming mechanism processors now issue multiple instructions at the same time more need to exploit ILP 2 styles: large physical register file & reorder buffer (R10000-style) (PentiumPro-style) Spring 2003 CSE P548

Register Renaming with A Physical Register File Register renaming provides a mapping between 2 register sets architectural registers defined by the ISA physical registers implemented in the CPU hold results of the instructions committed so far hold results of subsequent, independent instructions that have not yet committed more of them than architectural registers ~ issue width * # pipeline stages between register renaming & commit an architectural register is mapped to a physical register during a register renaming stage operands thereafter called by their physical register number hazards determined by comparing physical register numbers Effects: eliminates WAW and WAR hazards (false dependences) increases ILP Spring 2003 CSE P548

Review: Name Dependences & WAR/WAW Hazards Cause of the name dependence: the compiler runs out of registers, so has to reuse them example: anti-dependence (potential WAR hazard) DIVF _, F1, _ ADDF F1, _, _ output dependence (potential WAW hazard) DIVF F0, F1, F2 ADDF F0, F2, F3 Cause of the WAR & WAW hazards: instructions take different numbers of cycles: separate functional units (or execution pipelines) for different instructions that have different numbers of stages out-of-order execution may allow the second instruction to complete before the first Spring 2003 CSE P548

A Register Renaming Example Spring 2003 CSE P548

An Implementation (R10000) Modular design with regular hardware data structures Structures for register renaming 64 physical registers (each, for integer & FP) map tables for the current architectural-to-physical register mapping (separate, for integer & FP) accessed with an architectural register number produces a physical register number a destination register is assigned a new physical register number from a free register list (separate, for integer & FP) source operands refer to the latest defined destination register, i.e., the current mappings Spring 2003 CSE P548

An Implementation (R10000) Instruction “queues” (integer, FP & data transfer) contains decoded & mapped instructions with the current physical register mappings instructions entered into free locations in the IQ sit there until they are dispatched to functional units somewhat analogous to Tomasulo reservation stations without value fields or valid bits used to determine when operands are available compare each source operand for instructions in the IQ to destinations being written this cycle determines when an appropriate functional unit is available dispatches instructions to functional units Spring 2003 CSE P548

An Implementation (R10000) active list for all uncommitted instructions the mechanism for maintaining precise interrupts instructions entered in program-generated order allows instructions to complete in program-generated order instructions removed from the active list when: an instruction commits: the instruction has completed execution all instructions ahead of it have also completed branch is mispredicted an exception occurs contains the previous architectural-to-physical destination register mapping used to recreate the map table for instruction restart after an exception instructions in the other hardware structures & the functional units are identified by their active list location Spring 2003 CSE P548

An Implementation (R10000) busy-register table (integer & FP): indicates whether a physical register contains a value somewhat analogous to Tomasulo’s register status used to determine operand availability bit is set when a register is mapped & leaves the free list (not available yet) cleared when a FU writes the register (now there’s a value) Spring 2003 CSE P548

Spring 2003 CSE P548

The R10000 in Action 1 Spring 2003 CSE P548

The R10000 in Action 2 1 Spring 2003 CSE P548

The R10000 in Action 3 Spring 2003 CSE P548

The R10000 in Action 4 Spring 2003 CSE P548

The R10000 in Action: Interrupts 1 Spring 2003 CSE P548

The R10000 in Action: Interrupts 2 Spring 2003 CSE P548

The R10000 in Action: Interrupts 3 Spring 2003 CSE P548

The R10000 in Action: Interrupts 4 Spring 2003 CSE P548

R10000 Execution In-order issue (have already fetched instructions) rename architectural registers to physical registers via a map table detect structural hazards for instruction queues (integer, memory & FP) & active list issue up to 4 instructions to the instruction queues Out-of-order execution (to increase ILP) reservation-station-like instruction queues that indicate when an operand has been calculated each instruction monitors the setting of the busy-register table set busy-register table entry for the destination register detect functional unit structural & RAW hazards dispatch instructions to functional units In-order completion (to preserve precise interrupts) this & previous program-generated instructions have completed physical register in previous mapping returned to free list rollback on interrupts Spring 2003 CSE P548