Microarchitecture of Superscalars (7): Preserving sequential consistency. Dezső Sima, Fall 2007 (Ver. 2.0). Dezső Sima, 2007.



Overview
1. Processor consistency
2. The Reorder Buffer
3. The introduction of the ROB

1. Processor consistency (1)
Sequential consistency of instruction execution has two aspects:
Processor consistency: consistency of the sequence of instruction completions.
Memory consistency: consistency of the sequence of memory accesses.

1. Processor consistency (2)
Weak processor consistency: instructions may complete out of order, provided that no dependences are violated. Instruction reordering is allowed. Detection and resolution of dependences ensures weak processor consistency.
Strong processor consistency: instructions complete strictly in program order. No instruction reordering is allowed. The ROB ensures strong processor consistency.
Figure (timeline, labelled "Trend"; the assignment of processors to the two columns is not recoverable from the transcript): Power1 (1990), POWER2 (1993), PPC 601, (1991), (1993), -line (till 21264), R8000 (1994), ES/9000 (1992p), PPC, Pentium Pro, UltraSPARC (1995), K5 (1995), PA 8000, R (1996).
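The difference between the two models can be illustrated with a small sketch. It is not taken from the slides: the function name `completion_order` and the input format (one finish cycle per instruction, listed in program order) are assumptions made for illustration, and the weak case assumes the instructions are mutually independent, so that no dependence is violated by completing them early.

```python
def completion_order(finish_cycle, strong):
    """Return (instruction index, completion cycle) pairs in completion order.

    finish_cycle[i] is the cycle in which instruction i (in program order)
    finishes execution. Hypothetical model, for illustration only.
    """
    if not strong:
        # Weak consistency: an instruction completes as soon as it finishes
        # (assumes independent instructions, so no dependence is violated).
        return sorted(enumerate(finish_cycle), key=lambda p: (p[1], p[0]))

    # Strong consistency: an instruction cannot complete before all earlier
    # instructions in program order have completed.
    order = []
    latest = 0
    for index, cycle in enumerate(finish_cycle):
        latest = max(latest, cycle)
        order.append((index, latest))
    return order


finish = [3, 1, 2]  # instruction 0 is slow, instructions 1 and 2 are fast
weak_order = completion_order(finish, strong=False)
strong_order = completion_order(finish, strong=True)
print(weak_order)    # [(1, 1), (2, 2), (0, 3)]
print(strong_order)  # [(0, 3), (1, 3), (2, 3)]
```

Under the strong model the two fast instructions wait for the slow one, which is the in-order completion that the ROB provides.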

2. The Reorder Buffer (1)
Introduction: Smith and Pleszkun (1988)
Figure 2.1: The principle of the ROB
A subsequent ROB entry is allocated, in order, to each dispatched instruction.
Tail pointer (T): identifies the next instruction to be retired.
Head pointer (H): identifies the last dispatched instruction.
The figure distinguishes free entries from entries holding instructions in processing.
Status codes: in processing, finished.
Instructions retire in order and modify the program state. The instruction pointed to by the Tail pointer retires if all previous instructions have already retired and the instruction itself is finished.
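The principle can be sketched as a circular buffer. This is a minimal illustration, not the organization of any particular processor: the class `ReorderBuffer`, its method names and the dictionary layout of an entry are all hypothetical. Only the roles of T and H and the retirement rule follow the slide.

```python
class ReorderBuffer:
    """Hypothetical circular ROB; names and layout are illustrative only."""

    def __init__(self, size):
        self.size = size
        self.entries = [None] * size  # None marks a free entry
        self.tail = 0                 # T: next instruction to be retired
        self.head = size - 1          # H: last dispatched instruction
        self.count = 0                # number of occupied entries

    def dispatch(self, instr):
        """Allocate the next entry in program order and return its index."""
        if self.count == self.size:
            raise RuntimeError("ROB full: dispatch must stall")
        self.head = (self.head + 1) % self.size
        self.entries[self.head] = {"instr": instr, "finished": False}
        self.count += 1
        return self.head

    def report_finished(self, index):
        """Status report from an execution unit: the entry is finished."""
        self.entries[index]["finished"] = True

    def retire(self):
        """Retire finished instructions from the tail, strictly in order."""
        retired = []
        while self.count and self.entries[self.tail]["finished"]:
            retired.append(self.entries[self.tail]["instr"])
            self.entries[self.tail] = None
            self.tail = (self.tail + 1) % self.size
            self.count -= 1
        return retired


rob = ReorderBuffer(size=4)
a = rob.dispatch("add r1, r2, r3")
b = rob.dispatch("sub r4, r1, r5")
rob.report_finished(b)   # the younger instruction finishes first
first = rob.retire()     # nothing retires: the tail entry is not finished
rob.report_finished(a)
second = rob.retire()    # both retire, in program order
```

Because retirement only ever inspects the entry at T, a finished instruction behind an unfinished one simply waits, which is what enforces in-order completion.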

2. The Reorder Buffer (2)
Figure 2.2: The principle of the ROB while supporting speculative execution
A subsequent ROB entry is allocated, in order, to each dispatched instruction.
Tail pointer (T): next instruction to be retired.
Head pointer (H): last occupied entry.
The figure distinguishes free entries from entries holding instructions in processing.
Status codes: in processing, finished, speculative, not speculative.
Instructions retire in order and modify the program state. The instruction pointed to by the Tail pointer retires if all previous instructions have already retired, the instruction itself is finished, and it is not in the speculative state.
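The extended retirement condition can be sketched as follows. The slide does not say how the speculative state is cleared, so `resolve_branch`, the `guard` field and the sample branch instruction are assumptions introduced for illustration; handling of a mispredicted branch is deliberately left out.

```python
from collections import deque


def can_retire(entry):
    """Condition for the entry at the tail: finished and not speculative."""
    return entry["finished"] and not entry["speculative"]


def retire_ready(rob):
    """Retire entries from the front of the queue while the condition holds."""
    retired = []
    while rob and can_retire(rob[0]):
        retired.append(rob.popleft()["instr"])
    return retired


def resolve_branch(rob, branch_id):
    """Assumed behaviour: a correctly predicted branch clears the speculative
    state of the instructions that were dispatched under its prediction."""
    for entry in rob:
        if entry["guard"] == branch_id:
            entry["speculative"] = False
            entry["guard"] = None


rob = deque([
    {"instr": "beq r1, r2, L", "finished": True,
     "speculative": False, "guard": None},
    {"instr": "add r3, r4, r5", "finished": True,
     "speculative": True, "guard": "b0"},
])
before = retire_ready(rob)   # only the branch retires
resolve_branch(rob, "b0")    # prediction confirmed
after = retire_ready(rob)    # now the add may retire
```

The add is finished from the start, yet it stays in the buffer until its speculative status is cleared, so the program state never reflects an unconfirmed prediction.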

2. The Reorder Buffer (3)
Figure 2.3: The operation of the ROB while dispatching two instructions
Entry 25: add r1, r2, r3
Entry 26: sub r4, r1, r5
Figure labels: T, H, instructions in processing.

2. The Reorder Buffer (4)
Figure 2.4: The operation of the ROB during execution
During execution, instructions send status reports (in some cases also results) to their allocated ROB entries.
Figure labels: T, H, instructions in processing.

2. The Reorder Buffer (5)
Figure 2.5: The operation of the ROB during retirement
The instruction corresponding to entry 8 retires and modifies the program state. The next instruction to be retired is the one associated with entry 9.
Retirement takes place if all previous instructions have already retired, the instruction considered is finished, and it is not in the speculative state.
Figure labels: T, H, instructions in processing.
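The retirement step of Figure 2.5 can be sketched for the case where results are held in the ROB until retirement. This is one possible arrangement, assumed here for illustration: the function `retire_in_order`, the entry fields `dest` and `value`, and the register contents are hypothetical, and wrap-around of entry numbers is ignored for brevity.

```python
def retire_in_order(rob, tail, regs):
    """Retire from entry `tail` onward and return the new tail index.

    rob maps an entry number to a dict with the keys finished, speculative,
    dest and value. regs stands for the architectural (program) state.
    """
    while tail in rob:
        entry = rob[tail]
        if not entry["finished"] or entry["speculative"]:
            break                              # retirement stops at this entry
        regs[entry["dest"]] = entry["value"]   # program state changes only here
        del rob[tail]                          # the entry becomes free
        tail += 1
    return tail


regs = {"r1": 0, "r4": 0}
rob = {
    8: {"finished": True, "speculative": False, "dest": "r1", "value": 5},
    9: {"finished": False, "speculative": False, "dest": "r4", "value": None},
}
new_tail = retire_in_order(rob, 8, regs)
print(new_tail, regs)  # 9 {'r1': 5, 'r4': 0}
```

Entry 8 retires and updates r1, and T advances to entry 9, which remains in the buffer because it is not yet finished.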


3. The introduction of the ROB
Figure 3.1: The introduction of the ROB