Advanced Computer Architecture Dataflow Processing

A.R. Hurson, 128 EECH Building, Missouri S&T, hurson@mst.edu

Control Flow Computation

Operands are accessed by their addresses, and shared memory cells are the means by which data is passed between instructions. The flow of control is implicitly sequential, but special control instructions can be introduced to identify concurrency explicitly.

One or more program counters sequence the execution of instructions in a centralized environment.
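To make the control-flow model concrete, here is a minimal sketch of a program-counter-driven interpreter. The three-address instruction format and the op codes `add`, `sub`, and `mul` are hypothetical, chosen only for illustration. Note that the first two instructions are independent, yet the single program counter still runs them one after the other, and every value travels through a shared memory cell.

```python
def run_control_flow(program, memory):
    """Execute a toy three-address program in program-counter order.

    Each instruction is (op, dest, src1, src2); dest and the sources name
    shared memory cells. The op codes are hypothetical.
    """
    ops = {
        "add": lambda x, y: x + y,
        "sub": lambda x, y: x - y,
        "mul": lambda x, y: x * y,
    }
    pc = 0           # the program counter selects the next instruction
    executed = []    # order in which instructions ran
    while pc < len(program):
        op, dest, src1, src2 = program[pc]
        # Operands are read from, and the result written to, shared memory cells.
        memory[dest] = ops[op](memory[src1], memory[src2])
        executed.append(pc)
        pc += 1      # implicitly sequential flow of control
    return memory, executed


PROGRAM = [
    ("add", "s", "a", "b"),  # s := a + b
    ("mul", "p", "a", "b"),  # p := a * b (independent of the add)
    ("sub", "r", "s", "p"),  # r := s - p
]
memory, executed = run_control_flow(PROGRAM, {"a": 2, "b": 4})
print(memory["r"], executed)  # -2 [0, 1, 2]
```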

A dataflow program is a program with a partial ordering defined by the data interdependencies. In a dataflow program the activation (execution) of an instruction is triggered (fired) by the availability of its input data.

[Figure: a + actor fires once the tokens a and b are present on its input arcs, consuming them and placing a token a+b on its output arc.]

Data Dependencies

input (a,b,c)
a := 2*a
b := -b/a
c := b^2 - 2*a*c
c := sqrt(c)
c := c/a
a := b+c
b := b-c
output (a,b)
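The data dependencies of this program can be extracted mechanically. The sketch below is one way to do it, assuming each statement is reduced to the variable it writes and the variables it reads (the operators are dropped, since only dependencies matter). It records, for every read, which earlier statement produced the value, and then computes the earliest step at which each statement could fire if it waited only for its data. The names `STATEMENTS`, `flow_dependencies`, and `earliest_step` are illustrative.

```python
# Each statement of the program as (target, variables read).
# The operators are omitted: only the data dependencies matter here.
STATEMENTS = [
    ("a", {"a"}),            # 1: a := 2*a
    ("b", {"b", "a"}),       # 2: b := -b/a
    ("c", {"b", "a", "c"}),  # 3: c := b^2 - 2*a*c
    ("c", {"c"}),            # 4: c := sqrt(c)
    ("c", {"c", "a"}),       # 5: c := c/a
    ("a", {"b", "c"}),       # 6: a := b+c
    ("b", {"b", "c"}),       # 7: b := b-c
]


def flow_dependencies(statements):
    """Return (producer, consumer, variable) triples.

    The producer is the number of the statement that last wrote the
    variable before the consumer reads it, or "input" if the value comes
    straight from input (a,b,c).
    """
    last_writer = {}
    edges = []
    for j, (target, reads) in enumerate(statements, start=1):
        for var in sorted(reads):
            edges.append((last_writer.get(var, "input"), j, var))
        last_writer[target] = j
    return edges


def earliest_step(statements):
    """Earliest step at which each statement can fire, waiting only for its data."""
    step = {}
    for producer, consumer, _ in flow_dependencies(statements):
        ready = 1 if producer == "input" else step[producer] + 1
        step[consumer] = max(step.get(consumer, 1), ready)
    return step


print(earliest_step(STATEMENTS))
```

Only read-after-write dependencies are tracked. Statement 7 overwrites b, which statement 6 reads, but in the dataflow model each assignment produces a new token rather than updating a memory cell, so statements 6 and 7 land on the same step and can fire concurrently.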

Dataflow Principles

The dataflow model of computation deviates from the conventional control-flow method in two basic principles, asynchrony and functionality:

Asynchrony: an instruction is fired (executed) only when all of its required operands are available.
Functionality: any two enabled instructions can be executed in either order or concurrently, i.e., they have no side effects.

Within the scope of dataflow processing, implicit parallelism is achieved by allowing side-effect-free expressions and functions to be evaluated in parallel.

In a dataflow environment, conventional concepts such as variables and memory updating do not exist. Objects (operand values) are consumed by an actor (instruction), which yields a result object that is passed to the next actor(s).

Within the scope of a concurrent environment, dataflow computation addresses the issues of programmability, memory latency, and synchronization.

Questions: Define programmability, memory latency, and synchronization. How have these issues been addressed in conventional multiprocessor systems? Why does the dataflow model of computation offer good solutions to these problems?

Classification

The dataflow model of computation has traditionally been classified as either static or dynamic. In the static organization, a dataflow actor can be executed only when all of the tokens are available on its input arcs and no token exists on any of its output arcs. In the dynamic organization, a dataflow actor can be enabled only when all of the tokens with the same tag (color) are available on its input arcs.
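The two firing rules can be stated as small predicates. This is a sketch under simplifying assumptions: an arc is modeled as a Python list of tokens, a static arc holds at most one plain value, and a dynamic token is a (tag, value) pair. The function names are illustrative.

```python
def static_enabled(input_arcs, output_arcs):
    """Static rule: every input arc holds a token and every output arc is empty.

    Each arc is a list of tokens; in the static model it holds at most one.
    """
    return all(arc for arc in input_arcs) and not any(output_arcs)


def dynamic_enabled(input_arcs):
    """Dynamic rule: return the tags (colors) present on every input arc.

    Each arc is a list of (tag, value) tokens. Several tokens may wait on one
    arc, and an instance of the actor can fire for any tag found on all inputs.
    """
    tag_sets = [{tag for tag, _ in arc} for arc in input_arcs]
    return set.intersection(*tag_sets) if tag_sets else set()


print(static_enabled([[3], [4]], [[]]))     # inputs present, output arc free
print(static_enabled([[3], [4]], [[7]]))    # previous result not yet consumed
print(dynamic_enabled([[(1, 3), (2, 5)], [(2, 6)]]))  # only tag 2 is complete
```

The static rule needs no tags because the empty-output-arc condition guarantees that at most one token sits on any arc; the dynamic rule drops that condition and relies on tags to keep different activations apart.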

Dataflow Graph

A dataflow program can be represented as a directed graph, G = G(N,A), where the nodes (actors) in N represent instructions and the arcs in A represent data dependencies among the nodes. Operands are conveyed from one node to another along the arcs in data packets called tokens.

Example: (a+b) - (a*b)

[Figure: dataflow graph in which the inputs a and b feed both a + actor and a * actor, whose outputs feed a - actor.]

Executing the graph with a = 2 and b = 4: the tokens 2 and 4 arrive on the input arcs, so + and * are both ready to fire. The + actor produces 6 and the * actor produces 8, so - is ready to fire. The - actor then produces -2.
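The trace above can be replayed with a small token-driven interpreter. In this sketch (the graph encoding and the `run` function are assumptions for illustration), an actor fires as soon as all of its input slots hold tokens, consumes them, and sends its result to its destinations. Enabled actors are picked in random order to illustrate the functionality principle: whichever of + and * fires first, the result is the same. The copying of a and b to both actors is implicit in the initial tokens.

```python
import random

# The graph (a+b) - (a*b): actor -> (function, number of inputs, destinations).
# A destination is (actor, input port); "out" collects the final result.
GRAPH = {
    "add": (lambda x, y: x + y, 2, [("sub", 0)]),
    "mul": (lambda x, y: x * y, 2, [("sub", 1)]),
    "sub": (lambda x, y: x - y, 2, [("out", 0)]),
}


def run(graph, initial_tokens, seed=None):
    """Fire enabled actors in random order until none is enabled.

    initial_tokens maps (actor, port) -> value. Returns (result, firing order).
    """
    rng = random.Random(seed)
    slots = {name: [None] * graph[name][1] for name in graph}
    for (actor, port), value in initial_tokens.items():
        slots[actor][port] = value
    fired, result = [], None
    while True:
        enabled = [n for n, s in slots.items() if all(v is not None for v in s)]
        if not enabled:
            return result, fired
        name = rng.choice(enabled)              # any enabled actor may fire
        func, arity, dests = graph[name]
        value = func(*slots[name])
        slots[name] = [None] * arity            # input tokens are consumed
        fired.append(name)
        for actor, port in dests:
            if actor == "out":
                result = value
            else:
                slots[actor][port] = value


tokens = {("add", 0): 2, ("add", 1): 4, ("mul", 0): 2, ("mul", 1): 4}
for seed in range(3):
    print(run(GRAPH, tokens, seed))
```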

[Figure: dataflow graph of the Data Dependencies program, built from the inputs a, b, and c, the constant 2, and the actors *, -, sqrt, /, neg, and +.]

Dataflow Computation

Data are stored in the instructions themselves, i.e., there is no shared memory, and data are passed among instructions as tokens. An instruction that is independent of other instructions can begin its execution as soon as it is ready to fire, as determined by the firing rules of the static or dynamic environment.

An Example: a = (b+1) * (b+c)

[Figure: the expression drawn as a set of instruction templates, each with an op code, operand slots, the constant 1 where needed, and destination fields.]
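A sketch of how such templates might execute, assuming a simple encoding: each template holds an op code, two operand slots (a constant can be pre-filled), and destination fields naming (template, slot) pairs. Tokens are delivered from a queue, and a template fires when both slots are full. The template names `t1`, `t2`, `t3` and the one-shot execution (slots are not reset) are assumptions for illustration.

```python
from collections import deque

# Hypothetical templates for a = (b+1) * (b+c).
TEMPLATES = {
    "t1": {"op": "add", "slots": [None, 1], "dest": [("t3", 0)]},     # b + 1
    "t2": {"op": "add", "slots": [None, None], "dest": [("t3", 1)]},  # b + c
    "t3": {"op": "mul", "slots": [None, None], "dest": [("a", 0)]},   # product
}
OPS = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}


def execute(templates, b, c):
    """Deliver tokens to template slots; fire a template when its slots are full."""
    cells = {name: {**t, "slots": list(t["slots"])} for name, t in templates.items()}
    tokens = deque([("t1", 0, b), ("t2", 0, b), ("t2", 1, c)])  # b goes to two templates
    results = {}
    while tokens:
        name, slot, value = tokens.popleft()
        if name not in cells:        # destination outside the graph, e.g. variable a
            results[name] = value
            continue
        cell = cells[name]
        cell["slots"][slot] = value
        if all(v is not None for v in cell["slots"]):
            out = OPS[cell["op"]](*cell["slots"])
            tokens.extend((d, s, out) for d, s in cell["dest"])
    return results["a"]


print(execute(TEMPLATES, 2, 3))  # (2+1) * (2+3) = 15
```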

The Basic Primitives

In a dataflow graph two types of links are distinguished: the data link and the Boolean link. A data link is used to pass data tokens (e.g., real numbers, integers) between actors. A Boolean link is used to pass control tokens between actors.

Operators: an operator produces a data value as the result of some function f of its input tokens, $g = f(J_1, \ldots, J_n)$.

Deciders: a decider generates a true or false control value depending on its input tokens, $b = P(J_1, \ldots, J_n)$. The control token produced at a decider can be combined with other control tokens by means of a Boolean operator.

An Example: the NOR Operator

[Figure: a NOR Boolean operator receiving the control tokens T and F, in either order, and producing F.]
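Operators, deciders, and Boolean operators differ only in what kind of token they produce. The sketch below models each as a plain function; the particular function f and predicate P used in the usage lines are arbitrary examples, not taken from the slides.

```python
def operator(f, *tokens):
    """Operator: produce a data token g = f(J1, ..., Jn)."""
    return f(*tokens)


def decider(p, *tokens):
    """Decider: produce a control token b = P(J1, ..., Jn), True or False."""
    return bool(p(*tokens))


def nor(b1, b2):
    """Boolean operator combining two control tokens: true only if both are false."""
    return not (b1 or b2)


g = operator(lambda x, y: x * y, 3, 4)   # data token 12
b1 = decider(lambda x, y: x > y, g, 10)  # control token True
b2 = decider(lambda x: x < 0, g)         # control token False
print(g, b1, b2, nor(b1, b2), nor(False, False))  # 12 True False False True
```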

Control tokens direct the flow of data tokens by means of T-gates, F-gates, and merge actors.

A T-gate passes the data token on its input arc to its output arc when it receives a control token conveying the value true; with a false control token, no data token appears on its output arc.

An F-gate passes its input data token to its output arc only when it receives a control token conveying the value false.

A merge actor has a true input, a false input, and a control input. It passes to its output arc a data token from the input arc corresponding to the value of the control token received. Any token on the other input is not affected.

[Figure: firing of a merge actor under a false control token and under a true control token.]

A switch actor is a combination of a T-gate and an F-gate: it directs an input data token to one of its output arcs depending on the control input.

A copy actor is an identity operator that duplicates its input token.
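The control primitives can be sketched as pure functions. As an assumption of this sketch, an empty arc is modeled as `None`, so a data token whose value is itself `None` cannot be represented. The merge returns both its output and the token left untouched on the unselected input, matching the definition above.

```python
from typing import Any, Optional, Tuple

Token = Optional[Any]  # None models "no token on the arc"


def t_gate(data: Any, control: bool) -> Token:
    """Pass the data token only when the control token is True."""
    return data if control else None


def f_gate(data: Any, control: bool) -> Token:
    """Pass the data token only when the control token is False."""
    return data if not control else None


def merge(true_input: Token, false_input: Token, control: bool) -> Tuple[Any, Token]:
    """Output the token from the input selected by the control token.

    Returns (output token, token still waiting on the unselected input).
    """
    selected = true_input if control else false_input
    if selected is None:
        raise ValueError("merge cannot fire: the selected input has no token")
    return selected, (false_input if control else true_input)


def switch(data: Any, control: bool) -> Tuple[Token, Token]:
    """A T-gate and an F-gate side by side: (true output, false output)."""
    return t_gate(data, control), f_gate(data, control)


def copy(data: Any, n: int = 2) -> Tuple[Any, ...]:
    """Identity operator that duplicates its input token."""
    return (data,) * n


print(switch(7, False), merge(1, 2, False), copy(3))  # (None, 7) (2, 1) (3, 3)
```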

An Example: Using the basic primitives, draw the dataflow graph of the following program:

Input (a,b);
y := (a+b)/x;
x := (a*(a+b))+b;
Output (x,y);

[Figure: the resulting graph, built from +, *, and / actors with inputs a and b and outputs x and y.]
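Because every variable in this program is assigned only once, the textual order of the two assignments does not matter: y is listed before x but can only fire after x is produced. The sketch below evaluates the program by data availability rather than by position. The intermediate names s (for a+b, which feeds both y and the multiplication) and p (for a*(a+b)) are hypothetical labels for the graph's internal arcs.

```python
# The program's assignments as (target, inputs, function), in textual order.
# s and p are hypothetical names for the intermediate results a+b and a*(a+b).
PROGRAM = [
    ("y", ("s", "x"), lambda s, x: s / x),   # y := (a+b)/x
    ("x", ("p", "b"), lambda p, b: p + b),   # x := (a*(a+b))+b
    ("p", ("a", "s"), lambda a, s: a * s),   # a*(a+b)
    ("s", ("a", "b"), lambda a, b: a + b),   # a+b, used by both y and p
]


def evaluate(program, **inputs):
    """Fire every assignment whose inputs are all available, round by round."""
    values = dict(inputs)
    pending = list(program)
    rounds = []
    while pending:
        ready = [stmt for stmt in pending if all(name in values for name in stmt[1])]
        if not ready:
            raise RuntimeError("no assignment can fire: some input is never produced")
        for target, args, func in ready:
            values[target] = func(*(values[name] for name in args))
        rounds.append([target for target, _, _ in ready])
        pending = [stmt for stmt in pending if stmt not in ready]
    return values["x"], values["y"], rounds


x, y, rounds = evaluate(PROGRAM, a=2, b=3)
print(x, y, rounds)
```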

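Conditional Construct

More complex constructs can be built from the basic primitive structures.

[Figure: conditional (if-then-else) schema. The input data pass through a T-gate into the Then part and through an F-gate into the Else part; both gates are controlled by a token generated by a predicate actor, and the results of the two parts meet at the T and F inputs of a merge.]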

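While Loop

[Figure: while-loop schema. A merge whose control input is initially False admits the input data; a condition actor tests the value; a T-gate feeds it to the loop body, whose result returns to the merge's True input, and an F-gate releases the final value.]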
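The while-loop schema can be traced token by token. This sketch assumes a single loop token (which may be a tuple of values) and a generic `condition` and `body`; the function name, the `max_iterations` guard, and the example loop in the usage line are illustrative, not part of the schema itself.

```python
def while_schema(initial, condition, body, max_iterations=10_000):
    """Token-level sketch of the while-loop schema built from primitives."""
    control = False      # the merge's control input is initially False
    feedback = None      # token returned from the loop body to the merge's True input
    for _ in range(max_iterations):
        token = feedback if control else initial   # merge selects by control token
        decision = condition(token)                # predicate (decider) actor
        if decision:
            feedback = body(token)                 # T-gate: token enters the loop body
        else:
            return token                           # F-gate: token leaves the loop
        control = decision                         # the same control token steers the merge
    raise RuntimeError("loop did not terminate within max_iterations")


print(while_schema(1, lambda v: v < 100, lambda v: v * 3))  # 243
```

The initially False control token is what lets the very first token in from outside; after that, each True decision both sends the token through the body and tells the merge to take the value fed back from it.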

An Example: Show the dataflow graph of:

input (w,x);
y := x;
t := 0;
while t  w do
begin
  if y > 1 then y := y ÷ 2
  else y := y * 3;
  t := t+1;
end
output (y);

Dataflow Architecture

Dataflow computers have a data-driven organization, and being data-driven implies asynchrony. As a result, a high degree of implicit parallelism is expected in a dataflow computer. Since there are no shared memory cells, dataflow programs are free from side effects. Finally, dataflow computations have no far-reaching effects (locality of effect).

Depending on the way data tokens are handled, dataflow computers are divided into the static model and the dynamic model. In a static dataflow machine, only one token is allowed to exist on any arc at any given time. In a dynamic dataflow machine, more than one token can exist on an arc.

A Static Dataflow Machine

The system is composed of five modules:

Memory Section: consists of instruction cells, each holding a dataflow instruction.
Processing Section: consists of processing units that perform the basic dataflow operations on data values.
Arbitration Network: transfers operation packets from the memory section to the processing section.
Distribution Network: transfers the generated data packets from the processing section to the memory section.
Control Network: transfers control packets from the processing section to the memory section.

[Figure: instruction cell blocks in the memory section send operation packets through the arbitration network to the processing units; data tokens return through the distribution network and control tokens through the control network.]

Memory Section: The memory section holds a representation of the program to be executed and its data values. It is organized into instruction cells, each corresponding to an actor of the dataflow program.

Instruction Cell: Each instruction cell is composed of three words. The first word holds the operation code and the addresses of the instruction cells to which the result of the operation is to be directed. The next two words hold the operands; each operand word may be set to behave as a constant or a variable. There are six different instruction formats.

Instruction Format: Operators, deciders, and Boolean operators can each be of the unary or the binary type.

Each operand word has the format (gate flag, value flag, data value):

Gate flag Off: no gate-value control packet has been received
Gate flag True: a true gate-value control packet has been received
Gate flag False: a false gate-value control packet has been received
Value flag Off: no data value has been received
Value flag On: a data value has been received

The remaining fields of the instruction format are:

n: number of acknowledge signals expected
m: number of acknowledge signals received
gi: gate code
ti: result tag, which defines whether a control packet is of gate type or data type

[Figure: transitions of an operand word with gate code True, written as (gate flag, value flag, data) triples such as (off, off, -), (off, on, d), (true, off, -), (true, on, d), and (false, off, -), marking the points at which the operand word becomes active.]
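The operand-word flags and the acknowledge counters can be combined into an enabling test for an instruction cell. The rule coded below is an assumption, not a statement of the machine's exact logic: a cell is taken to be enabled when every operand word is ready (constant, or data received, plus the expected gate value if the word is gated) and all n expected acknowledge signals have arrived (m == n). What happens when a gated word receives the opposite gate value is not modeled.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class OperandWord:
    gate_flag: str = "off"            # "off", "true" or "false"
    value_flag: str = "off"           # "off" or "on"
    data: Any = None
    constant: bool = False            # a constant word needs no incoming data
    gate_code: Optional[bool] = None  # assumption: None means the operand is ungated

    def receive_data(self, value: Any) -> None:
        self.data, self.value_flag = value, "on"

    def receive_gate(self, value: bool) -> None:
        self.gate_flag = "true" if value else "false"

    def ready(self) -> bool:
        has_data = self.constant or self.value_flag == "on"
        if self.gate_code is None:
            return has_data
        # Assumption: a gated operand also needs the gate value it is waiting for.
        return has_data and self.gate_flag == ("true" if self.gate_code else "false")


@dataclass
class InstructionCell:
    opcode: str
    destinations: List[int]
    operands: List[OperandWord]
    n: int = 0  # acknowledge signals expected
    m: int = 0  # acknowledge signals received

    def enabled(self) -> bool:
        """Assumed rule: every operand is ready and every expected acknowledge has arrived."""
        return self.m == self.n and all(op.ready() for op in self.operands)


cell = InstructionCell("add", [7], [OperandWord(), OperandWord(data=1, constant=True)])
cell.operands[0].receive_data(3)
print(cell.enabled())  # True
```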

A Static Dataflow Machine: An Example. The following expression is assumed: Y(t) = A * X(t) + B * Y(t-1) + C * Y(t-2). Show its dataflow graph and its "simple" representation in the memory section.

[Figure: dataflow graph of Y(t) with eight numbered actors, the input X(t) starting at x(0), the constants A, B, and C, the initial values y(-1) and y(-2), and the output Y(t).]

[Figure: initialization of the memory cells for this example.]
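The recurrence computed by this graph can be sketched as a loop over input tokens. Here `y_prev1` and `y_prev2` stand in for the initial tokens y(-1) and y(-2) on the feedback arcs, and each new Y(t) is sent back around the loop. The function name `evaluate_recurrence` and the default initial values of 0 are assumptions for illustration.

```python
from typing import Iterable, List


def evaluate_recurrence(xs: Iterable[float], a: float, b: float, c: float,
                        y_minus1: float = 0.0, y_minus2: float = 0.0) -> List[float]:
    """Evaluate Y(t) = A*X(t) + B*Y(t-1) + C*Y(t-2), one input token at a time."""
    y_prev1, y_prev2 = y_minus1, y_minus2   # initial tokens on the feedback arcs
    ys = []
    for x in xs:
        # The three products are mutually independent and could fire concurrently.
        p_a, p_b, p_c = a * x, b * y_prev1, c * y_prev2
        y = (p_a + p_b) + p_c
        ys.append(y)
        y_prev1, y_prev2 = y, y_prev1       # Y(t) becomes Y(t-1); Y(t-1) becomes Y(t-2)
    return ys


print(evaluate_recurrence([1, 0, 0], 1, 0.5, 0.25))  # [1.0, 0.5, 0.5]
```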

Processing Section: The processing section is a collection of five pipelined processing units:

Multiplier Unit, for complex operands;
Adder and Subtractor Unit, for complex operands;
Distributor Unit, to replicate and distribute complex values;
Integer Processor Unit, for integer and test operations; and
Control Processor Unit, to replicate and distribute integer and Boolean values.

Each functional unit is organized as three independent pipelines: one performs the operation and the other two carry the destination addresses.

[Figure: an instruction packet (op code, destinations d1 and d2, operands x and y) enters the unit; the computation pipeline produces the result z while identity pipelines carry d1 and d2 to form the result packets.]

Arbitration Network: The arbitration network is designed to establish a smooth flow of instruction packets from the memory section to the processing section. It is composed of five basic building blocks: the arbitration unit (arb), the switch unit (sw), the buffer unit (buf), the serial-to-parallel transfer unit (s/p), and the parallel-to-serial transfer unit (p/s).

Distribution Network: The distribution network transfers result packets from the processing section to the memory section. It uses the same basic building blocks as the arbitration network.

Control Network: The control network transfers Boolean values and acknowledge signals from the processing section to the memory section. Because of the simple nature of the values it carries, it is composed of switch and arbitration units only. The control network transfers two types of tokens, gate type and data type:

Gate type: (gate, True/False, address)
Data type: (value, True/False, address)

A Dynamic Dataflow Machine

The machine is a backend system composed of five units connected as a pipeline ring around which the tokens flow:

The processing unit allows concurrent execution of dataflow graph nodes.
The token queue temporarily stores tokens lying on the dataflow graph arcs.
The matching unit gathers pairs of tokens with the same destination node address and label.
The node store holds the information describing the dataflow graph.
The switch unit establishes communication between the frontend and backend processors and reroutes result tokens back into the pipeline ring.

Each unit in the pipeline is internally synchronous, but communication with other units is based on a standard asynchronous protocol.

[Figure: the ring runs from the switch unit (with paths from and to the host) through the token queue, the matching unit (producing token pairs), and the node store (producing instruction packets) to the processing unit, whose result tokens return to the switch unit.]

Token Queue: The token queue is a static RAM FIFO buffer of 16K × 96 bits that allows a read and a write to be performed in a single 200 nsec pipeline period. A multiple-bank structure could be considered; however, it was not adopted in the prototype because of its complexity.

Each token is 96 bits long and has the following format:

Label: 36 bits
Destination Address: 18 bits
Token Value: 32 bits
Type Information and Control: 10 bits
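The field widths add up to 96 bits, and the label and destination address together form the 54-bit key used by the matching unit. The sketch below packs and unpacks a token as a Python integer. The bit order (label in the most significant bits, then destination, value, and type) is an assumption for illustration; the functions are hypothetical.

```python
# Field widths from the token format; the bit order is an assumption.
FIELDS = [("label", 36), ("destination", 18), ("value", 32), ("type_control", 10)]
TOKEN_BITS = sum(width for _, width in FIELDS)  # 96


def pack_token(**fields: int) -> int:
    """Pack the four fields into one 96-bit integer, label first."""
    word = 0
    for name, width in FIELDS:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_token(word: int) -> dict:
    """Recover the fields by peeling them off from the least significant end."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out


def match_key(word: int) -> int:
    """The 54-bit label + destination part compared by the matching unit."""
    return word >> (32 + 10)


t = pack_token(label=5, destination=3, value=42, type_control=1)
print(TOKEN_BITS, unpack_token(t), match_key(t) == (5 << 18) | 3)
```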

Matching Unit: The matching unit is associative in nature and is a critical part of the machine's operation. It must provide storage for a large number of tokens awaiting their matching partners. It is composed of 8 banks of 2K × 96 bits.

An 11-bit hash address is generated from the 54-bit label and destination address parts of each incoming token. This address references 8 memory words, one in each of the eight parallel banks.

Tokens (if any) at the addressed locations are compared with the incoming token. If a match is found, a token pair is generated and passed to the node store; otherwise, the incoming token is written into one of the parallel banks to await its partner. An unsuccessful match takes 320 nsec and a successful match requires 240 nsec.
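The matching procedure can be sketched in software: hash the label and destination down to 11 bits (2K words per bank), examine that address in all eight banks, and either return a token pair or park the token in a free bank. The XOR-folding hash is a hypothetical stand-in, not the machine's actual hash function, and the sketch simply raises an error when all eight words at an address are occupied, since overflow handling is not described above.

```python
from typing import List, Optional, Tuple

NUM_BANKS = 8
HASH_BITS = 11                 # 2K words per bank

Token = Tuple[int, int, int]   # (label, destination, value), simplified


def hash_address(label: int, destination: int) -> int:
    """Hypothetical hash of the 54-bit label+destination key down to 11 bits."""
    key = (label << 18) | destination
    folded = key ^ (key >> 11) ^ (key >> 22) ^ (key >> 33) ^ (key >> 44)
    return folded & ((1 << HASH_BITS) - 1)


class MatchingUnit:
    def __init__(self) -> None:
        # banks[bank][address] holds a waiting token or None
        self.banks: List[List[Optional[Token]]] = [
            [None] * (1 << HASH_BITS) for _ in range(NUM_BANKS)
        ]

    def insert(self, token: Token) -> Optional[Tuple[Token, Token]]:
        """Return a token pair on a match; otherwise store the token and return None."""
        label, dest, _ = token
        addr = hash_address(label, dest)
        # The same address is examined in all eight banks (in parallel in hardware).
        for bank in self.banks:
            waiting = bank[addr]
            if waiting is not None and waiting[:2] == (label, dest):
                bank[addr] = None          # the partner leaves the store
                return waiting, token
        for bank in self.banks:
            if bank[addr] is None:
                bank[addr] = token         # wait for the partner token
                return None
        raise OverflowError("all eight banks are full at this hash address")


mu = MatchingUnit()
print(mu.insert((7, 100, 3)), mu.insert((7, 100, 4)))  # None ((7, 100, 3), (7, 100, 4))
```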

Node Store: The node store is a 16K × 72-bit memory with a 200 nsec access time, augmented by a segment table. It is used to store dataflow graphs. Each instruction has the following format:

Op code: 12 bits
Destination Address 1: 18 bits
Destination Address 2 or Literal: 32 bits
Type Information and Control: 10 bits

Upon receiving a token pair, the node store generates an instruction packet and passes it to the processing unit. The instruction packet contains the op code, destination address 1, destination address 2 or literal, type information and control, the label (36 bits), operand 1, and operand 2.

Processing Unit: The processing unit is a writable microprogrammed processor consisting of two pipeline stages. The first stage handles simple label operations and gathers some performance statistics. The second stage is a parallel array of 15 processing elements, each capable of performing 24-bit integer or 32-bit floating-point arithmetic operations.

The microinstruction cycle time of each processor is 200 nsec. An instruction requires from about 5 to 50 microinstructions, giving an average running time of 4.5 µsec.

A Dynamic Dataflow Machine: An Example. For the following expression, show its representation in the node store and its initial tokens:

A = (W * X) + (Y * Z)

[Figure: W and X feed one * actor and Y and Z feed another; both products feed a + actor whose output is A.]

Node store (L.H. = left-hand operand, R.H. = right-hand operand):

Address   Op code   Dest. Address 1   Dest. Address 2
1         *         3 (L.H.)          -
2         *         3 (R.H.)          -
3         +         Output            -

Initial tokens:

Value   Inst. Address
W       1 (L.H.)
X       1 (R.H.)
Y       2 (L.H.)
Z       2 (R.H.)
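The example can be run through a simplified version of the ring: tokens leave the token queue, the matching step pairs tokens with the same label and destination instruction, the node store supplies the op code and destination, and the processing step computes a result that is queued again. This is a sketch with simplifying assumptions: matching uses a plain dictionary rather than a hashed store, and the token layout (label, (address, side), value) is hypothetical. Two sets of inputs with different labels circulate at the same time without interfering, which is the point of tagging in the dynamic model.

```python
from collections import deque

# Node store from the example: address -> (op code, destination address 1).
# A destination is (instruction address, side) or "output".
NODE_STORE = {
    1: ("*", (3, "L")),
    2: ("*", (3, "R")),
    3: ("+", "output"),
}
OPS = {"*": lambda l, r: l * r, "+": lambda l, r: l + r}


def run_ring(node_store, initial_tokens):
    """Circulate tokens: token queue -> matching -> node store -> processing."""
    queue = deque(initial_tokens)       # token queue
    waiting = {}                        # matching store keyed by (label, address)
    outputs = []
    while queue:
        label, (addr, side), value = queue.popleft()
        key = (label, addr)
        if key not in waiting:
            waiting[key] = (side, value)               # wait for the partner token
            continue
        other_side, other_value = waiting.pop(key)
        operands = {side: value, other_side: other_value}
        op, dest = node_store[addr]                    # node store builds the packet
        result = OPS[op](operands["L"], operands["R"])  # processing unit
        if dest == "output":
            outputs.append((label, result))
        else:
            queue.append((label, dest, result))        # back around the ring
    return outputs


# Label 0 carries W, X, Y, Z = 2, 3, 4, 5; label 1 carries 1, 1, 1, 1.
tokens = [(0, (1, "L"), 2), (1, (1, "L"), 1), (0, (1, "R"), 3), (1, (1, "R"), 1),
          (0, (2, "L"), 4), (1, (2, "L"), 1), (0, (2, "R"), 5), (1, (2, "R"), 1)]
print(sorted(run_ring(NODE_STORE, tokens)))  # [(0, 26), (1, 2)]
```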