Chapter 14 Superscalar Processors

What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently. Equally applicable to RISC and CISC, but more straightforward in RISC machines. The program order presented to the processor is fixed by the compiler; the hardware then decides which independent instructions can be executed in parallel. A superscalar machine is one that executes multiple independent instructions in parallel.

Example Superscalar Organization: 2 integer ALU pipelines, 2 FP ALU pipelines, 1 memory pipeline.

Superpipelined Machines. Superpipelined machines overlap work within pipe stages: stages are subdivided so that a new operation can begin in a stage before the previous one has finished it. Superscalar machines instead have multiple instruction pipelines and process multiple instructions in parallel.

Superscalar v Superpipeline

Limitations of Superscalar. The achievable speedup depends on the available instruction-level parallelism, compiler-based optimization, and hardware support. It is limited by true data dependencies, procedural dependencies, resource conflicts, and output dependencies or antidependencies (other forms of data dependency).

True Data Dependency. ADD r1, r2 (r1 + r2 → r1) followed by MOVE r3, r1 (r1 → r3): we can fetch and decode the second instruction in parallel with the first, but we can NOT execute the second instruction until the first is finished. Compare with LOAD r1, X (X → r1) followed by MOVE r3, r1 (r1 → r3). What additional problem do we have here?
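A C-level analogue may help (a sketch only; the names r1, r2, r3 and the array x are stand-ins for registers and a memory location, not something from the slides). In both pairs the second statement reads what the first one writes, so their execute stages cannot overlap; in the load case there is the additional problem that the producer's latency is not known in advance, since it depends on whether the access hits in the cache.

#include <stdio.h>

int main(void)
{
    int r1 = 1, r2 = 2, r3;
    int x[4] = {10, 20, 30, 40};

    /* True (read-after-write) dependency: the second statement reads the
       value the first one produces, so execution cannot overlap. */
    r1 = r1 + r2;   /* ADD r1, r2   (r1 + r2 -> r1) */
    r3 = r1;        /* MOVE r3, r1  (r1 -> r3)      */

    /* Same pattern with a load as the producer: the consumer must wait for
       the memory access, whose latency (cache hit or miss) is unpredictable. */
    r1 = x[2];      /* LOAD r1, X   (X -> r1)  */
    r3 = r1;        /* MOVE r3, r1  (r1 -> r3) */

    printf("%d\n", r3);
    return 0;
}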

Procedural Dependency. We cannot execute instructions after a branch in parallel with instructions before the branch. Why? Note: in addition, if the instruction length is not fixed, instructions have to be partially decoded just to find out how many fetches are needed.

Resource Conflict. Two or more instructions require access to the same resource at the same time, e.g. two arithmetic instructions needing the same arithmetic unit. Solution: resources can possibly be duplicated, e.g. provide two arithmetic units.

Antidependency. ADD R4, R3, 1 (R3 + 1 → R4) followed by ADD R3, R5, 1 (R5 + 1 → R3): we cannot complete the second instruction before the first has read R3. Why?

True Data Dependency & Antidependency. True data dependency: the result of the 1st instruction is used by the 2nd, so the 2nd cannot complete too soon (it must wait for the 1st to produce its value). Antidependency: out-of-order completion of the 2nd instruction can write over a value still to be used by the 1st, so the 1st must read its operand before the 2nd changes it.
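The contrast can be sketched in C as well (illustrative variable names only, not from the slides): in the first pair the constraint is about a value flowing from one statement to the next; in the second pair it is only about reusing the storage named R3.

#include <stdio.h>

int main(void)
{
    int r1 = 1, r2 = 2, r3 = 10, r4, r5 = 20;

    /* True data dependency (read-after-write): the 2nd statement needs the
       result of the 1st, so the 2nd cannot complete too soon. */
    r1 = r1 + r2;
    r4 = r1 + 1;

    /* Antidependency (write-after-read): the 2nd statement overwrites R3,
       which the 1st still has to read.  If the 2nd is allowed to complete
       first, the 1st picks up the wrong value.  No value flows between
       them; only the register name R3 is being reused. */
    r4 = r3 + 1;
    r3 = r5 + 1;

    printf("%d %d\n", r3, r4);
    return 0;
}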

Effect of Dependencies

Instruction-Level Parallelism. Consider: LOAD R1, R2; ADD R3, 1; ADD R4, R2. These can be handled in parallel. Why? Now consider: ADD R3, 1; ADD R4, R3; STO (R4), R0. These cannot. Why?

Instruction Issue Policies. An issue policy covers the order in which instructions are fetched, the order in which they are executed, and the order in which they update registers and memory values. Note: there is also the question of instruction completion policy.

In-Order Issue -- In-Order Completion. Issue instructions in the order they occur and complete them in that same order. Not very efficient: instructions must stall whenever a hazard requires it.

In-Order Issue -- In-Order Completion (Example). Assume: I1 requires 2 cycles to execute; I3 & I4 conflict for the same functional unit; I5 depends upon the value produced by I4; I5 & I6 conflict for a functional unit.

In-Order Issue -- Out-of-Order Completion (Example). How does this affect interrupts? Again: I1 requires 2 cycles to execute; I3 & I4 conflict for the same functional unit; I5 depends upon the value produced by I4; I5 & I6 conflict for a functional unit.

Out-of-Order Issue -- Out-of-Order Completion. Decouple the decode pipeline from the execution pipeline: the processor can continue to fetch and decode until this buffer is full, and when a functional unit becomes available an instruction can be executed. Since instructions have already been decoded, the processor can look ahead.

Out-of-Order Issue -- Out-of-Order Completion (Example). Note: I5 depends upon I4, but I6 does not. Again: I1 requires 2 cycles to execute; I3 & I4 conflict for the same functional unit; I5 depends upon the value produced by I4; I5 & I6 conflict for a functional unit.
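To make the comparison concrete, here is a toy C model of just the execute stage under the constraints above (a sketch only: two issue slots per cycle, one instruction at a time per functional unit, and my own choice of unit numbers and latencies; fetch, decode and writeback are not modeled). With in-order issue a stalled instruction blocks everything behind it; with out-of-order issue I6 slips past the stalled I5 and the sequence finishes a cycle earlier.

#include <stdio.h>

#define N     6
#define WIDTH 2            /* instructions issued per cycle */

/* Toy encoding of the slide's constraints: latency in cycles, functional
   unit id, index of the producing instruction (-1 = no data dependency). */
typedef struct { int latency, unit, dep; } Instr;

static const Instr prog[N] = {
    {2, 0, -1},   /* I1: takes 2 cycles to execute     */
    {1, 1, -1},   /* I2                                */
    {1, 2, -1},   /* I3: needs unit 2                  */
    {1, 2, -1},   /* I4: conflicts with I3 for unit 2  */
    {1, 3,  3},   /* I5: depends on I4, needs unit 3   */
    {1, 3, -1},   /* I6: conflicts with I5 for unit 3  */
};

static void schedule(int out_of_order)
{
    int finish[N], done[N] = {0}, unit_free[4] = {0};
    int issued = 0, cycle = 1, last = 0;

    printf(out_of_order ? "out-of-order issue:\n" : "in-order issue:\n");
    while (issued < N) {
        int slots = WIDTH;
        for (int i = 0; i < N && slots > 0; i++) {
            if (done[i]) continue;
            int dep_ok  = prog[i].dep < 0 ||
                          (done[prog[i].dep] && finish[prog[i].dep] < cycle);
            int unit_ok = unit_free[prog[i].unit] <= cycle;
            if (dep_ok && unit_ok) {
                finish[i] = cycle + prog[i].latency - 1;
                unit_free[prog[i].unit] = finish[i] + 1;
                done[i] = 1; issued++; slots--;
                if (finish[i] > last) last = finish[i];
                printf("  I%d executes cycles %d..%d\n", i + 1, cycle, finish[i]);
            } else if (!out_of_order) {
                break;     /* in-order: a stalled instruction blocks younger ones */
            }
        }
        cycle++;
    }
    printf("  all done at cycle %d\n", last);
}

int main(void)
{
    schedule(0);
    schedule(1);
    return 0;
}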

Register Renaming. Output dependencies and antidependencies occur because register contents may not reflect the ordering of values implied by the program, which can result in a pipeline stall. One solution: allocate registers dynamically, i.e. rename registers.

Register Renaming Example. R3b := R3a + R5a (I1); R4b := R3b + 1 (I2); R3c := R5a + 1 (I3); R7b := R3c + R4b (I4). A register name without a letter suffix refers to the logical register named in the instruction; with a suffix it is the hardware register actually allocated (R3a, R3b, R3c are three different hardware registers). Note: allocating R3c avoids both the antidependency with I2 and the output dependency with I1.
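A minimal renaming sketch in C (a toy under strong assumptions: an unbounded pool of hardware registers and no reclamation, which a real machine would need). Every write to a logical register is given a brand-new hardware register and every read uses the newest mapping, which is exactly what produces the R3a/R3b/R3c allocation above.

#include <stdio.h>

#define NUM_LOGICAL 8

/* logical register -> currently mapped hardware ("physical") register */
static int map[NUM_LOGICAL];
static int next_phys = 0;

static int read_reg(int logical)  { return map[logical]; }
static int write_reg(int logical) { map[logical] = next_phys++; return map[logical]; }

int main(void)
{
    for (int r = 0; r < NUM_LOGICAL; r++)   /* initial mapping: R0..R7 -> p0..p7 */
        map[r] = next_phys++;

    /* the slide's sequence: dst, src1, src2 (src2 = -1 means an immediate) */
    struct { int dst, src1, src2; const char *text; } prog[] = {
        {3, 3,  5, "I1: R3 := R3 + R5"},
        {4, 3, -1, "I2: R4 := R3 + 1 "},
        {3, 5, -1, "I3: R3 := R5 + 1 "},
        {7, 3,  4, "I4: R7 := R3 + R4"},
    };

    for (int i = 0; i < 4; i++) {
        int s1 = read_reg(prog[i].src1);                        /* read sources first   */
        int s2 = prog[i].src2 >= 0 ? read_reg(prog[i].src2) : -1;
        int d  = write_reg(prog[i].dst);                        /* then rename the dest */
        if (s2 >= 0)
            printf("%s   ->   p%d := p%d + p%d\n", prog[i].text, d, s1, s2);
        else
            printf("%s   ->   p%d := p%d + 1\n", prog[i].text, d, s1);
    }
    return 0;
}

Running it, I3's destination gets a fresh hardware register (p10 here), so I3 no longer conflicts with I2's read of R3 (antidependency) or with I1's write of R3 (output dependency).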

Machine Parallelism Support: duplication of resources, out-of-order issue, register renaming, and windowing (a buffer of decoded instructions to choose from).

Speedups of Machine Organizations (Without Procedural Dependencies). Duplicating functional units is not worthwhile without register renaming. The instruction window needs to be large enough (more than 8 entries, but probably not more than 32).

Branch Prediction in Superscalar Machines. The delayed branch is not used much. Why? Multiple instructions would need to execute in the delay slot, and this leads to much complexity in recovery. Branch prediction is used instead, and branch history MAY still be useful. Are there any alternatives?
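As one illustration of how branch history is used, here is the classic 2-bit saturating-counter predictor sketched in C (the table size, the indexing by low PC bits, and the example loop branch are my own illustrative choices, not something taken from the slides).

#include <stdio.h>

#define TABLE_SIZE 1024

/* one 2-bit counter per table entry; 0-1 predict not taken, 2-3 predict taken */
static unsigned char counter[TABLE_SIZE];   /* zero-initialized: strongly not taken */

static int predict(unsigned pc) { return counter[pc % TABLE_SIZE] >= 2; }

static void update(unsigned pc, int taken)
{
    unsigned char *c = &counter[pc % TABLE_SIZE];
    if (taken  && *c < 3) (*c)++;      /* saturate at 3 */
    if (!taken && *c > 0) (*c)--;      /* saturate at 0 */
}

int main(void)
{
    /* a loop branch at pc = 0x40 that is taken 9 times, then falls through */
    int correct = 0;
    for (int i = 0; i < 10; i++) {
        int taken = (i < 9);
        correct += (predict(0x40) == taken);
        update(0x40, taken);
    }
    /* mispredicts only while warming up and at the loop exit */
    printf("correct predictions: %d / 10\n", correct);
    return 0;
}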

Superscalar Execution

Committing or Retiring Instructions. Results need to be put back into program order when they are committed or retired. Results sometimes must be held in temporary storage until it is certain they can be placed in “permanent” storage. This temporary storage requires regular cleanup, which is overhead.
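A minimal C sketch of this temporary storage, modeled here as a reorder buffer (the name and structure are my assumption; the slides do not specify the mechanism): entries are allocated in program order, can be marked finished in any order, but results are only committed from the head, so the “permanent” state is always updated in order.

#include <stdio.h>
#include <stdbool.h>

#define ROB_SIZE 8

typedef struct { int id; bool done; } RobEntry;

static RobEntry rob[ROB_SIZE];
static int head = 0, tail = 0, count = 0;

static void allocate(int id)               /* entries allocated in program order */
{
    rob[tail] = (RobEntry){ .id = id, .done = false };
    tail = (tail + 1) % ROB_SIZE;
    count++;
}

static void finish(int id)                 /* execution may complete out of order */
{
    for (int i = 0, p = head; i < count; i++, p = (p + 1) % ROB_SIZE)
        if (rob[p].id == id) rob[p].done = true;
}

static void retire(void)                   /* commit strictly from the head */
{
    while (count > 0 && rob[head].done) {
        printf("commit I%d\n", rob[head].id);
        head = (head + 1) % ROB_SIZE;
        count--;
    }
}

int main(void)
{
    for (int id = 1; id <= 4; id++) allocate(id);
    finish(3); retire();                   /* nothing commits: I1 and I2 still pending */
    finish(1); retire();                   /* commits I1 only                          */
    finish(2); finish(4); retire();        /* commits I2, I3, I4 in order              */
    return 0;
}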

Superscalar Hardware Support: facilities to fetch multiple instructions simultaneously; logic to determine true dependencies involving register values, and mechanisms to communicate these values; mechanisms to initiate multiple instructions in parallel; resources for parallel execution of multiple instructions; and mechanisms for committing process state in the correct order.

Conclusions. What are the relative benefits of superscalar design and superpipelining?

Superscalar CISC Machines. Can superscalar design be applied to CISC machines?

javax.comm. Basically, javax.comm is no longer supported on Windows (it hasn't been since 2002), so we switched to RxTx, which is nearly identical. According to “Converting a JavaComm Application to RxTx”, all that is required to convert a JavaComm application to an RxTx application is changing the import statement import javax.comm.*; to import gnu.io.*;. Everything else in the program can remain exactly the same, because the package gnu.io apparently provides the same classes as javax.comm. Indeed, the RxTx version of SimpleWrite is identical to the JavaComm version except that it imports gnu.io.* rather than javax.comm.*.

Basic Concepts of the IA-64 Architecture. Instruction-level parallelism: made explicit in the machine instructions rather than determined at run time by the processor. Long or very long instruction words (LIW/VLIW): fetch bigger chunks that the compiler has already “preprocessed”. Branch predication (not the same as branch prediction): go ahead and fetch and decode instructions from both paths of a branch, tagging them with a predicate, so the decision to keep their results, or not, can be made later. Speculative loading: go ahead and load data so it is ready when needed, with a practical way to recover if the speculation proves wrong. Software pipelining: allows multiple iterations of a loop to execute in parallel.
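A rough C-level analogue of the branch-predication idea (only an analogy on my part: real IA-64 predication guards individual machine instructions with predicate registers, not C statements; the point is simply that both candidate results are produced and a predicate selects one, removing the conditional branch from the instruction stream).

#include <stdio.h>

int max_branching(int a, int b)
{
    if (a > b)              /* conditional branch: disrupts a wide machine */
        return a;
    return b;
}

int max_predicated(int a, int b)
{
    int p  = (a > b);       /* compute the predicate                */
    int r1 = a;             /* both candidate results are produced  */
    int r2 = b;
    return p ? r1 : r2;     /* the predicate selects one; compilers often
                               turn this into a conditional move, no branch */
}

int main(void)
{
    printf("%d %d\n", max_branching(7, 3), max_predicated(7, 3));
    return 0;
}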