Superscalar Processors & VLIW Processors

Topics to be covered:
- Introduction to superscalar processors
- Architecture of superscalar processors
- VLIW processors
- Architecture of VLIW processors
- Differences between superscalar and VLIW processors

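A superscalar machine executes multiple independent instructions in parallel, and its execution units are pipelined as well. "Common" instructions (arithmetic, load/store, conditional branch) can be executed independently. The compiler usually assists by ordering the code so that independent instructions are close together, but the hardware decides at run time which instructions actually execute together.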
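To make "independent" concrete, here is a minimal Python sketch that tests whether two instructions could safely execute in parallel. It assumes a hypothetical instruction model made up of an opcode, one destination register, and a tuple of source registers. It checks the three register dependences: read-after-write (RAW), write-after-read (WAR), and write-after-write (WAW). The `Instr` class and the register names are illustrative only and do not come from any real ISA.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: opcode, optional destination, source registers."""
    op: str
    dst: Optional[str]
    srcs: Tuple[str, ...] = ()


def dependences(first: Instr, second: Instr) -> set:
    """Return the register dependences of `second` on the earlier `first`."""
    deps = set()
    if first.dst is not None and first.dst in second.srcs:
        deps.add("RAW")  # second reads what first writes (true data dependence)
    if second.dst is not None and second.dst in first.srcs:
        deps.add("WAR")  # second overwrites a register that first still reads
    if first.dst is not None and first.dst == second.dst:
        deps.add("WAW")  # both write the same register; final value depends on order
    return deps


def independent(first: Instr, second: Instr) -> bool:
    """True if the two instructions share no register dependence."""
    return not dependences(first, second)


i1 = Instr("add", "r1", ("r2", "r3"))
i2 = Instr("sub", "r4", ("r1", "r5"))  # reads r1, which i1 writes
i3 = Instr("mul", "r6", ("r7", "r8"))  # touches none of i1's registers

print(dependences(i1, i2))  # {'RAW'}
print(independent(i1, i3))  # True
```

Under this simplified model, only pairs for which `independent` returns `True` are candidates for execution in the same cycle.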

Super-pipelined processor: In a traditional pipelined system, there is a single pipeline stage for each sub-operation, and every instruction passes through that dedicated segment. A super-pipelined processor, by contrast, has a pipeline in which each of these logical steps may be subdivided into multiple pipeline stages. Each stage then does less work, so the clock cycle can be shorter.

Superscalar v Super-pipelined
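The difference between the two approaches can be quantified with the usual idealized timing model, which assumes no stalls and no dependences. The model uses three parameters:
- a base pipeline of k stages, each taking one base clock cycle;
- n instructions to execute;
- a degree m for the enhanced machine.

Under these assumptions:
- A scalar pipeline needs k + n - 1 base cycles.
- A super-pipelined machine of degree m splits each stage into m sub-stages and issues one instruction per sub-cycle. It needs k + (n - 1)/m base cycles.
- A superscalar machine of degree m issues up to m instructions per base cycle. It needs ceil(n/m) + k - 1 base cycles.

The Python sketch below computes these figures. They are textbook idealizations, not measurements of any real processor.

```python
from fractions import Fraction


def scalar_cycles(n: int, k: int) -> Fraction:
    """Ideal k-stage pipeline: k cycles for the first instruction, then one per cycle."""
    return Fraction(k + n - 1)


def superpipelined_cycles(n: int, k: int, m: int) -> Fraction:
    """Each stage split into m sub-stages; one instruction issued every 1/m base cycle."""
    return k + Fraction(n - 1, m)


def superscalar_cycles(n: int, k: int, m: int) -> Fraction:
    """m parallel pipelines; up to m instructions issued per base cycle."""
    groups = (n + m - 1) // m  # ceil(n / m) issue groups
    return Fraction(k + groups - 1)


n, k, m = 12, 4, 3
base = scalar_cycles(n, k)
for name, t in [
    ("scalar", base),
    ("super-pipelined", superpipelined_cycles(n, k, m)),
    ("superscalar", superscalar_cycles(n, k, m)),
]:
    print(f"{name:16s} {float(t):6.2f} base cycles, speedup {float(base / t):.2f}")
```

With n = 12, k = 4 and m = 3, the three cases take 15, about 7.67, and 7 base cycles. As n grows, both speedups approach m. The superscalar machine gets there by duplicating hardware; the super-pipelined one gets there with a faster clock.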

A more aggressive approach to achieve parallelism is to equip the processor with multiple processing units to handle several instructions in parallel in each processing stage. Such processors are capable of achieving an instruction execution throughput of more than one instruction per cycle. These processors are known as superscalar processors.

In a superscalar processor, the instruction queue must be kept full. Multiple-issue operation requires a wider path to the cache and multiple execution units. Separate execution units are provided for integer and floating-point instructions.

Working principle: The instruction fetch (IF) unit can read two instructions at a time and store them in the instruction queue. In each clock cycle, the dispatch unit retrieves and decodes up to two instructions from the front of the queue. If these are one integer and one floating-point instruction and there are no hazards, both instructions are dispatched in the same clock cycle.
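The dispatch rule above can be sketched as a small cycle-by-cycle simulation in Python. The sketch makes three assumptions:
- Instructions use a hypothetical `(kind, dst, srcs)` format, where kind is `"int"` or `"fp"`.
- A hazard means the second instruction reads or writes the first one's destination.
- Execution latency and stalls on older in-flight instructions are ignored, although a real dispatch unit must also track them.

```python
from collections import deque
from typing import Deque, List, Tuple

# Hypothetical instruction format: (kind, destination, sources), kind is "int" or "fp".
Instr = Tuple[str, str, Tuple[str, ...]]


def conflicts(a: Instr, b: Instr) -> bool:
    """True if b depends on a through a's destination register (RAW or WAW)."""
    _, a_dst, _ = a
    _, b_dst, b_srcs = b
    return a_dst in b_srcs or a_dst == b_dst


def dispatch(program: List[Instr]) -> List[List[Instr]]:
    """Return the instructions dispatched in each cycle by a two-wide in-order dispatcher."""
    queue: Deque[Instr] = deque(program)
    cycles: List[List[Instr]] = []
    while queue:
        first = queue.popleft()
        group = [first]
        if queue:
            second = queue[0]
            # Dual dispatch only for one integer + one floating-point instruction, no hazard.
            if {first[0], second[0]} == {"int", "fp"} and not conflicts(first, second):
                group.append(queue.popleft())
        cycles.append(group)
    return cycles


program = [
    ("int", "r1", ("r2", "r3")),
    ("fp", "f1", ("f2", "f3")),  # pairs with the integer instruction above
    ("fp", "f4", ("f1", "f5")),  # next instruction is also fp, so this goes alone
    ("fp", "f6", ("f7",)),
    ("int", "r4", ("r1",)),      # pairs with the fp instruction above
]
for cycle, group in enumerate(dispatch(program), start=1):
    print(cycle, [ins[1] for ins in group])
```

The five instructions dispatch in three cycles: `r1` and `f1` together, `f4` alone, then `f6` and `r4` together.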

Out-of-order execution also complicates exceptions, because an exception can leave the program in an inconsistent state. Exceptions are of two types: imprecise and precise. Imprecise exception: let I1 and I2 be two instructions issued in the same clock cycle. I1 causes an exception that leaves the program in an inconsistent state, while I2 has already completed its write-back (WB) operation. If such a situation is permitted, the exception is known as an imprecise exception. To keep the program consistent, results must be written into their destinations in program order, i.e., in order.

Precise exception: If an exception occurs during the execution of an instruction, all subsequent instructions that may have been partially executed are discarded. This is called a precise exception.
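The difference between the two kinds of exception shows up in the register state that the exception handler sees. This Python sketch models a hypothetical two-instruction scenario. I1 and I2 are issued together, I2 finishes first, and I1 faults. Write-back is modeled as applying results in some order until the faulting instruction is reached, so only that order differs between the two cases:
- In completion order, I2's result is already visible when I1 faults (imprecise).
- In program order, I2 is discarded along with I1 (precise).

The instruction format is an assumption made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction record: (name, destination register, value it writes).
Instr = Tuple[str, str, int]


def state_at_exception(regs: Dict[str, int], write_order: List[Instr],
                       faulting: str) -> Dict[str, int]:
    """Apply results in `write_order` until the faulting instruction is reached."""
    state = dict(regs)
    for name, dst, value in write_order:
        if name == faulting:
            break  # exception raised here; nothing after this point is written
        state[dst] = value
    return state


regs = {"r1": 0, "r2": 0}
program_order = [("I1", "r1", 10), ("I2", "r2", 20)]
completion_order = [("I2", "r2", 20), ("I1", "r1", 10)]  # I2 finishes first

# Results written as soon as each instruction completes: I2's write is visible.
print(state_at_exception(regs, completion_order, "I1"))  # {'r1': 0, 'r2': 20}
# Results written in program order: I2 is discarded along with I1.
print(state_at_exception(regs, program_order, "I1"))     # {'r1': 0, 'r2': 0}
```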

Execution completion: Out-of-order execution is desirable because it frees execution units for other instructions. However, instructions must be completed in program order to allow precise exceptions. These two requirements conflict with each other. The conflict can be resolved by allowing execution to proceed but writing the results into temporary registers. The results are later transferred to the destination registers in the correct program order. This transfer is called the commitment step.

When out-of-order execution is allowed, a special control unit is needed to guarantee in-order commitment. This is called the commitment unit.
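The temporary-register and commitment scheme can be sketched as a small buffer ordered by program order. In this minimal Python model, the class and method names (`CommitUnit`, `issue`, `complete`, `commit`) are hypothetical:
- Each instruction gets a slot when it is issued, in program order.
- Results go into the slot (the temporary register) when execution finishes, possibly out of order.
- `commit` copies results to the destination registers only from the oldest slot onward.
- If the oldest finished instruction faulted, it and all younger slots are discarded, giving a precise exception.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Slot:
    """Temporary register for one in-flight instruction."""
    dst: str
    value: Optional[int] = None
    done: bool = False
    fault: bool = False


class CommitUnit:
    """Hypothetical in-order commitment unit."""

    def __init__(self) -> None:
        self.regs: Dict[str, int] = {}  # architectural (destination) registers
        self.slots: "OrderedDict[str, Slot]" = OrderedDict()  # kept in program order

    def issue(self, tag: str, dst: str) -> None:
        self.slots[tag] = Slot(dst)

    def complete(self, tag: str, value: Optional[int] = None, fault: bool = False) -> None:
        """Execution finished (in any order): record the result in the temporary register."""
        slot = self.slots[tag]
        slot.value, slot.done, slot.fault = value, True, fault

    def commit(self) -> Optional[str]:
        """Commit finished instructions from the oldest; return the tag of a faulting one."""
        while self.slots:
            tag, slot = next(iter(self.slots.items()))
            if not slot.done:
                break  # oldest instruction still executing; younger results must wait
            if slot.fault:
                self.slots.clear()  # precise exception: discard it and everything younger
                return tag
            self.regs[slot.dst] = slot.value  # transfer temporary result to destination
            self.slots.popitem(last=False)
        return None


cu = CommitUnit()
for tag, dst in [("I1", "r1"), ("I2", "r2"), ("I3", "r3")]:
    cu.issue(tag, dst)
cu.complete("I3", 30)         # finishes first, out of order
cu.complete("I1", 10)
print(cu.commit(), cu.regs)   # None {'r1': 10}  (I3 waits behind unfinished I2)
cu.complete("I2", fault=True)
print(cu.commit(), cu.regs)   # I2 {'r1': 10}  (I3's result is never committed)
```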

Dispatch operation: Should instructions be dispatched out of order? If they are, the processor must ensure that there is no possibility of deadlock. If instructions are dispatched out of order, a deadlock can arise as follows. Suppose that the processor has only one temporary register, and that this register is reserved for I5 when I5 is dispatched. Instruction I4 cannot be dispatched because it is waiting for the temporary register, which will not become free until I5 is retired. Since I5 cannot be retired before I4, the result is a deadlock.
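The deadlock argument can be checked with a tiny Python model of a hypothetical processor:
- It has a fixed pool of temporary registers.
- Dispatching an instruction reserves one of them.
- An instruction can retire, freeing its register, only after every older instruction has retired.

The function reports whether a given dispatch order can run to completion. It also assumes that execution finishes immediately after dispatch.

```python
from typing import List


def runs_to_completion(dispatch_order: List[int], temp_registers: int) -> bool:
    """Simulate dispatch and retirement; return False if the machine deadlocks.

    Instructions are numbered consecutively in program order (an assumption of
    this sketch). Dispatch reserves a temporary register; only the oldest
    unretired instruction may retire, which frees its register.
    """
    pending = list(dispatch_order)        # not yet dispatched, in dispatch order
    in_flight = set()                     # dispatched, each holding a temporary register
    next_to_retire = min(dispatch_order)  # oldest instruction in program order
    last = max(dispatch_order)
    while next_to_retire <= last:
        if next_to_retire in in_flight:   # retire strictly in program order
            in_flight.remove(next_to_retire)
            next_to_retire += 1
        elif pending and len(in_flight) < temp_registers:
            in_flight.add(pending.pop(0))  # dispatch and reserve a register
        else:
            return False                   # nothing can move: deadlock
    return True


print(runs_to_completion([4, 5], temp_registers=1))  # True: in-order dispatch
print(runs_to_completion([5, 4], temp_registers=1))  # False: I5 holds the only register
print(runs_to_completion([5, 4], temp_registers=2))  # True: a second register avoids it
```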

Issues related to superscalar processors
Performance depends upon:
- the instruction-level parallelism available in the program
- compiler-based optimization
- hardware support
It is limited by:
- data dependency
- procedural dependency
- resource conflicts
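The three limiting factors can be illustrated by checking a pair of consecutive instructions. This Python sketch rests on several assumptions:
- Each hypothetical `Op` names the functional unit it needs, its destination and sources, and whether it is a branch.
- The machine has one unit of each type listed in `UNITS`.
- Data dependency is modeled as read-after-write.
- Procedural dependency covers any instruction that follows a branch, because it cannot execute until the branch is executed.
- A resource conflict occurs when two instructions need the same single unit.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

UNITS = {"alu": 1, "fpu": 1, "mem": 1}  # hypothetical: one unit of each kind


@dataclass(frozen=True)
class Op:
    unit: str
    dst: Optional[str]
    srcs: Tuple[str, ...] = ()
    is_branch: bool = False


def issue_limits(first: Op, second: Op) -> List[str]:
    """List the reasons `second` cannot issue in the same cycle as the earlier `first`."""
    reasons = []
    if first.dst is not None and first.dst in second.srcs:
        reasons.append("data dependency")        # needs first's result
    if first.is_branch:
        reasons.append("procedural dependency")  # must wait for the branch
    if first.unit == second.unit and UNITS[first.unit] < 2:
        reasons.append("resource conflict")      # only one unit of that kind
    return reasons


load = Op("mem", "r1", ("r2",))
add = Op("alu", "r3", ("r1", "r4"))
beq = Op("alu", None, ("r3", "r0"), is_branch=True)
store = Op("mem", None, ("r5", "r6"))

print(issue_limits(load, add))    # ['data dependency']
print(issue_limits(add, beq))     # ['data dependency', 'resource conflict']
print(issue_limits(beq, store))   # ['procedural dependency']
print(issue_limits(load, store))  # ['resource conflict']
```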

VLIW Processor

Basic working principles of VLIW:
- Aims to speed up computation by exploiting instruction-level parallelism.
- Uses the same hardware core as superscalar processors, with multiple execution units (EUs) working in parallel.
- An instruction consists of multiple operations; typical word length is from 52 bits to 1 Kbits.
- All operations in an instruction are executed in lock-step mode.
- Relies on the compiler to find parallelism and schedule dependency-free program code.
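Because a VLIW machine relies on the compiler, the key step is packing independent operations into long instruction words. The following Python sketch is a simple greedy list scheduler. It assumes a hypothetical word format with one slot per execution unit, given in `SLOTS`. Every operation takes one cycle, destinations are unique, and only read-after-write dependences are considered. Empty slots are filled with `nop`, as in a fixed-format instruction word.

```python
from typing import Dict, List, Tuple

SLOTS = ("alu0", "alu1", "mem")  # hypothetical word: two ALU slots and one memory slot

# (name, unit type, destination, sources); destinations are assumed unique.
Op = Tuple[str, str, str, Tuple[str, ...]]


def schedule(ops: List[Op]) -> List[Dict[str, str]]:
    """Pack ops into instruction words; each op goes in the earliest word after its producers."""
    words: List[Dict[str, str]] = []
    ready_at: Dict[str, int] = {}  # register -> first word index that may read it
    for name, unit, dst, srcs in ops:
        w = max((ready_at.get(s, 0) for s in srcs), default=0)
        while True:
            if w == len(words):
                words.append({slot: "nop" for slot in SLOTS})  # open a new word
            free = [s for s in SLOTS if s.startswith(unit) and words[w][s] == "nop"]
            if free:
                words[w][free[0]] = name
                break
            w += 1  # no free slot of this type: try the next word
        ready_at[dst] = w + 1  # one-cycle latency: result usable in the next word
    return words


ops = [
    ("ld1", "mem", "r1", ("a",)),
    ("ld2", "mem", "r2", ("b",)),
    ("add", "alu", "r3", ("r1", "r2")),
    ("inc", "alu", "r4", ("c",)),
    ("mul", "alu", "r5", ("r3", "r4")),
]
for word in schedule(ops):
    print(word)  # all operations in one word execute in lock-step
```

The scheduler moves `inc` up into the first word next to `ld1`. The remaining words are mostly `nop`, which shows how code density depends on how much parallelism the compiler can find.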

Basic VLIW Approach

Register File Structure for VLIW

Differences Between VLIW & Superscalar Architecture (I)

Differences Between VLIW & Superscalar Architecture (II)
Instruction formulation:
- Superscalar: receives conventional instructions conceived for sequential processors.
- VLIW: receives (very) long instruction words, each comprising a field (or opcode) for each execution unit. The instruction word length depends on (a) the number of execution units and (b) the code length needed to control each unit (such as opcode length, register names, …). Typical word length is 64 to 1024 bits, much longer than a conventional machine word.
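The dependence of word length on (a) and (b) is a simple multiplication. As a worked example with hypothetical field sizes, assume:
- 128 opcodes, which need 7 bits;
- 64 registers, which need 6 bits per register name;
- three register operands per operation.

One unit's field is then 7 + 3 × 6 = 25 bits. A 4-unit machine needs 100-bit words and an 8-unit machine needs 200-bit words, both inside the range quoted above. The function names in this Python sketch are illustrative only.

```python
from math import ceil, log2


def op_field_bits(num_opcodes: int, num_registers: int, reg_operands: int = 3) -> int:
    """Bits for one execution unit's field: an opcode plus register specifiers."""
    opcode_bits = ceil(log2(num_opcodes))
    reg_bits = ceil(log2(num_registers))
    return opcode_bits + reg_operands * reg_bits


def word_length(num_units: int, field_bits: int) -> int:
    """A VLIW word holds one operation field per execution unit."""
    return num_units * field_bits


field = op_field_bits(num_opcodes=128, num_registers=64)
print(field)                  # 25
print(word_length(4, field))  # 100
print(word_length(8, field))  # 200
```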

Instruction scheduling:
- Superscalar: done dynamically at run time by the hardware. Data dependencies are checked and resolved in hardware, which needs a look-ahead hardware window over the fetched instructions.
- VLIW: static scheduling done at compile time by the compiler. Advantages: reduced hardware complexity, since tasks such as decoding, data dependency detection, and instruction issue become simple; a potentially higher clock rate; and a higher degree of parallelism, because the compiler can use global program information.
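For contrast with compile-time scheduling, the superscalar run-time approach can be sketched as a look-ahead window in Python. In this hypothetical model:
- Each cycle, the hardware examines the oldest `window` unissued instructions.
- It issues up to `width` of them whose source operands are ready, possibly out of program order.
- Results are available one cycle after issue.
- Destinations are unique, so WAR and WAW hazards are ignored.

```python
from typing import Dict, List, Tuple

# (name, destination, sources); destinations are assumed unique.
Instr = Tuple[str, str, Tuple[str, ...]]


def dynamic_issue(program: List[Instr], window: int, width: int) -> List[List[str]]:
    """Return the instruction names issued in each cycle by a look-ahead window scheduler."""
    producer = {dst: i for i, (_, dst, _) in enumerate(program)}
    issued_at: Dict[int, int] = {}  # instruction index -> cycle in which it issued
    cycles: List[List[str]] = []
    cycle = 0

    def operand_ready(i: int, reg: str) -> bool:
        p = producer.get(reg)
        if p is None or p >= i:
            return True  # value comes from before this code fragment
        return p in issued_at and issued_at[p] < cycle  # result ready one cycle after issue

    while len(issued_at) < len(program):
        # Hardware look-ahead window: the oldest `window` instructions not yet issued.
        candidates = [i for i in range(len(program)) if i not in issued_at][:window]
        group = [i for i in candidates
                 if all(operand_ready(i, s) for s in program[i][2])][:width]
        for i in group:
            issued_at[i] = cycle
        cycles.append([program[i][0] for i in group])
        cycle += 1
    return cycles


program = [
    ("ld", "r1", ("a",)),
    ("add", "r2", ("r1",)),  # must wait for ld
    ("mul", "r3", ("b",)),   # independent
    ("sub", "r4", ("c",)),   # independent
]
print(dynamic_issue(program, window=2, width=2))  # [['ld'], ['add', 'mul'], ['sub']]
print(dynamic_issue(program, window=4, width=2))  # [['ld', 'mul'], ['add', 'sub']]
```

A wider window lets the hardware find `mul` while `add` is stalled, saving a cycle. The cost is the dependency-checking hardware that VLIW moves into the compiler.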