Advanced Computer Architecture Dataflow Processing


1 Advanced Computer Architecture Dataflow Processing
A.R. Hurson 128 EECH Building, Missouri S&T

2 Advanced Computer Architecture
Control Flow Computation Operands are accessed by their addresses. Shared memory cells are the means by which data is passed between instructions. Flow of control is implicitly sequential, but special control instructions can be introduced to explicitly identify concurrency.

3 Advanced Computer Architecture
Control Flow Computation Program Counter(s) is (are) used to sequence the execution of instructions in a centralized environment.

4 Advanced Computer Architecture
A dataflow program is a program with a partial ordering defined by the data interdependencies. In a dataflow program the activation (execution) of an instruction is triggered (fired) by the availability of its input data.

5 Advanced Computer Architecture
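Figure: an addition actor (+) with tokens a and b on its input arcs fires (⇒), placing the token a+b on its output arc.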

6 Advanced Computer Architecture
Data Dependencies
input (a,b,c)
a := 2*a
b := -b/a
c := b^2 - 2*a*c
c := sqrt(c)
c := c/a
a := b+c
b := b-c
output (a,b)
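The partial ordering implied by these data dependencies can be sketched in Python. This is only an illustration: the `PROGRAM` table and the `levels` helper are hypothetical names, and the sketch tracks flow (read-after-write) dependencies only, on the assumption that values travel as tokens so a later reassignment of a variable never blocks an earlier reader.

```python
# Each entry: (variable written, set of variables read), in program order.
PROGRAM = [
    ("a", {"a"}),            # a := 2*a
    ("b", {"a", "b"}),       # b := -b/a
    ("c", {"a", "b", "c"}),  # c := b^2 - 2*a*c
    ("c", {"c"}),            # c := sqrt(c)
    ("c", {"a", "c"}),       # c := c/a
    ("a", {"b", "c"}),       # a := b+c
    ("b", {"b", "c"}),       # b := b-c
]


def levels(program):
    """Earliest step at which each statement can fire (flow dependencies only)."""
    last_writer_level = {}  # variable -> level of the statement that last wrote it
    result = []
    for target, reads in program:
        level = 1 + max((last_writer_level.get(v, 0) for v in reads), default=0)
        result.append(level)
        last_writer_level[target] = level
    return result


print(levels(PROGRAM))  # [1, 2, 3, 4, 5, 6, 6]
```

The last two statements land on the same level, so under these assumptions they are the only pair in this program that could fire concurrently.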

7 Advanced Computer Architecture
Dataflow Principles The dataflow model of computation deviates from the conventional control-flow method in two basic principles: asynchrony and functionality.

8 Advanced Computer Architecture
Dataflow Principles Asynchrony: an instruction is fired (executed) only when all the required operands are available. Functionality: any two enabled instructions can be executed in either order or concurrently — i.e., no side-effects.

9 Advanced Computer Architecture
Dataflow Principles Within the scope of dataflow processing, implicit parallelism is achieved by allowing side-effect free expressions and functions to be evaluated in parallel.

10 Advanced Computer Architecture
Dataflow Principles In a dataflow environment, conventional concepts such as variables and memory updating are non-existent. Objects (operand values) are consumed by an actor (instruction) yielding a result object which is passed to the next actor(s).

11 Advanced Computer Architecture
Dataflow Principles Within the scope of a concurrent environment, dataflow computation addresses the programmability, memory latency, and synchronization issues.

12 Advanced Computer Architecture
Questions Define programmability, memory latency, and synchronization. How have these issues been addressed in conventional multiprocessor systems? Why does the dataflow model of computation offer good solutions to these problems?

13 Advanced Computer Architecture
Classification The dataflow model of computation has been traditionally classified as either static or dynamic:

14 Advanced Computer Architecture
In the static organization, a dataflow actor can be executed only when all of the tokens are available on its input arcs and no token exists on any of its output arcs. In the dynamic organization, a dataflow actor can be enabled only when all of the tokens of the same tag (color) are available on its input arcs.
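The two firing rules can be written as small predicates. This is a minimal sketch, not any machine's actual logic: the arc representations are assumptions made for illustration (a static arc is a single slot where `None` means empty, and a dynamic arc is a dictionary from tag to value), and the function names are hypothetical.

```python
def static_enabled(input_arcs, output_arcs):
    """Static rule: every input arc holds a token and every output arc is empty."""
    return all(tok is not None for tok in input_arcs) and all(
        tok is None for tok in output_arcs
    )


def dynamic_enabled(input_arcs, tag):
    """Dynamic rule: every input arc holds a token carrying the given tag."""
    return all(tag in arc for arc in input_arcs)


# Static: one slot per arc (None means the arc is empty).
print(static_enabled([2, 4], [None]))  # True
print(static_enabled([2, 4], [7]))     # False: the output arc is still occupied

# Dynamic: each arc maps tag -> value, so several tokens can coexist on an arc.
arcs = [{"t0": 2, "t1": 5}, {"t1": 9}]
print(dynamic_enabled(arcs, "t1"))     # True
print(dynamic_enabled(arcs, "t0"))     # False: the second arc has no t0 token
```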

15 Advanced Computer Architecture
Dataflow Graph A dataflow program can be represented as a directed graph, G = G(N,A), where nodes (actors) in N represent instructions, and arcs in A represent data dependencies among the nodes. The operands are conveyed from one node to another in data packets called tokens via the arcs.

16 Advanced Computer Architecture
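Dataflow Graph Figure: dataflow graph of (a+b) - (a*b), in which the inputs a and b feed a + actor and a * actor whose results feed a - actor.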

17 Advanced Computer Architecture
Figure (step 1): the graph of (a+b) - (a*b) with input tokens 2 and 4; actors whose inputs are present are marked ready to fire.

18 Advanced Computer Architecture

19 Advanced Computer Architecture
Figure (step 3): tokens 6 and 8 are on the input arcs of the - actor, which is ready to fire.

20 Advanced Computer Architecture
Figure (step 4): the - actor has fired, producing the result token -2.
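The firing sequence for (a+b) - (a*b) with input tokens 2 and 4 can be reproduced with a small data-driven interpreter. This is a sketch under stated simplifications: the `ACTORS` table, the arc names `sum`, `prod`, and `result`, and the `run` function are hypothetical, and input tokens are kept in a dictionary instead of being consumed, which stands in for the copy actors that would fan a and b out to both + and *.

```python
import operator

# Each actor: (function, names of its input arcs, name of its output arc).
ACTORS = {
    "+": (operator.add, ("a", "b"), "sum"),
    "*": (operator.mul, ("a", "b"), "prod"),
    "-": (operator.sub, ("sum", "prod"), "result"),
}


def run(tokens):
    """Fire enabled actors step by step until no actor can fire."""
    tokens = dict(tokens)
    fired = set()
    trace = []
    while True:
        ready = [
            name
            for name, (_, ins, _) in ACTORS.items()
            if name not in fired and all(i in tokens for i in ins)
        ]
        if not ready:
            return tokens, trace
        # Every actor in `ready` is independent and could fire concurrently.
        results = {}
        for name in ready:
            fn, ins, out = ACTORS[name]
            results[out] = fn(*(tokens[i] for i in ins))
        tokens.update(results)
        fired.update(ready)
        trace.append(sorted(ready))


tokens, trace = run({"a": 2, "b": 4})
print(trace)             # [['*', '+'], ['-']]
print(tokens["result"])  # -2
```

No program counter appears anywhere in the sketch: the order of execution falls out of token availability alone.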

21 Advanced Computer Architecture
Figure: dataflow graph with inputs a, b, and c, the constant 2, and the actors *, -, sqrt, /, neg, and +.

22 Advanced Computer Architecture
Dataflow Computation Data are stored in the instructions — i.e., no shared memory. Data are passed among instructions as tokens. An instruction independent of other instructions can begin its execution as soon as it is ready to be fired — e.g., firing rules for static and dynamic environments.

23 Advanced Computer Architecture
An Example Figure: dataflow graph of a = (b+1) * (b+c).

24 Advanced Computer Architecture
The Basic Primitives In a dataflow graph two types of links are distinguished, the data link, and the Boolean link. A data link is used to pass data tokens — i.e., real numbers, integers, ... — among the arcs. Data Link

25 Advanced Computer Architecture
The Basic Primitives A Boolean link is used to pass control tokens among the arcs. Boolean Link

26 Advanced Computer Architecture
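The Basic Primitives Operators: a data value is produced by an operator as a result of some function f. Figure: an operator f with input tokens $\vartheta_1, \ldots, \vartheta_n$ fires (⇒), producing $\gamma = f(\vartheta_1, \ldots, \vartheta_n)$.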

27 Advanced Computer Architecture
The Basic Primitives Decider: a true or false control value is generated by a decider depending on its input tokens. The control token produced at a decider can be combined with other control tokens by means of a Boolean operator. Figure: a decider P with input tokens $\vartheta_1, \ldots, \vartheta_n$ fires (⇒), producing $\beta = P(\vartheta_1, \ldots, \vartheta_n)$.

28 Advanced Computer Architecture
An Example NOR Operator Figure: a NOR actor before and after firing (⇒), with control tokens labeled T and F.

29 Advanced Computer Architecture
An Example NOR Operator Figure: a NOR actor before and after firing (⇒), with control tokens labeled F and T.

30 Advanced Computer Architecture
The Basic Primitives Control tokens direct the flow of data tokens by means of T-gates, F-gates, and merge actors. Figure: symbols for the T-gate, the F-gate, and the merge actor (with T and F inputs).

31 Advanced Computer Architecture
The Basic Primitives A T-gate passes the data token on its input arc to its output arc when it receives a control token conveying the value true. Figure: a T-gate before and after firing (⇒), once with control token T and once with control token F.

32 Advanced Computer Architecture
The Basic Primitives An F-gate passes its input data token to its output arc only when the token on its control input conveys the value false. Figure: an F-gate before and after firing (⇒), once with control token T and once with control token F.

33 Advanced Computer Architecture
The Basic Primitives A merge actor has a true input, a false input and a control input. It passes to its output arc a data token from the input arc corresponding to the value of the control token received. Any token on the other input is not affected.

34 Advanced Computer Architecture
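Figure: a merge actor before and after firing (⇒), once with control token F and once with control token T; the data tokens are labeled $\vartheta_1$ and $\vartheta_2$.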

35 Advanced Computer Architecture
The Basic Primitives A switch actor is a combination of T-gate and F-gate. It directs an input data token to one of its output arcs depending on the control input. Figure: a switch actor with T and F outputs before and after firing (⇒) on control token F.

36 Advanced Computer Architecture
The Basic Primitives A copy actor is an identity operator which duplicates the input token. Figure: a copy actor before and after firing (⇒).
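The gating primitives described on the preceding slides can be modeled as plain functions. This is a minimal sketch with one loud assumption: `None` stands for "no token on the arc", so a data token may not itself be `None`. The function names are chosen for illustration only.

```python
def t_gate(data, control):
    """Pass the data token only when the control token is True."""
    return data if control else None


def f_gate(data, control):
    """Pass the data token only when the control token is False."""
    return data if not control else None


def merge(true_input, false_input, control):
    """Forward the token from the input arc selected by the control token."""
    return true_input if control else false_input


def switch(data, control):
    """Route the data token to the (true_output, false_output) pair of arcs."""
    return (t_gate(data, control), f_gate(data, control))


def copy(data):
    """Identity operator that duplicates its input token."""
    return data, data


print(switch(7, True))         # (7, None)
print(switch(7, False))        # (None, 7)
print(merge("t", "f", False))  # f
print(copy(3))                 # (3, 3)
```

A function model cannot show that the unselected input of a merge keeps its token; in this sketch that token is simply the argument the caller still holds.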

37 Advanced Computer Architecture
An Example: Using the basic primitives, draw the dataflow graph of the following program:
Input (a,b);
y := (a+b)/x;
x := (a*(a+b))+b;
Output (x,y);
Figure: graph with actors +, *, and / and arcs labeled a, b, x, and y.

38 Advanced Computer Architecture
Conditional Construct One can build more complex constructions using the basic primitive structures.

39 Advanced Computer Architecture
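Figure: conditional construct. Labels: Input Data; control token Generated by a Predicate Actor; a T-gate feeding the Then Part; an F-gate feeding the Else Part; a merge with T and F inputs.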

40 Advanced Computer Architecture
While Loop Figure labels: Input Data; a merge with F and T inputs whose control is Initially False; Condition; T-gate; F-gate; Loop Body.

41 Advanced Computer Architecture
An Example: Show the dataflow graph of:
input (w,x);
y := x; t := 0;
while t ≠ w do
begin
  if y > 1 then y := y ÷ 2
  else y := y * 3;
  t := t+1;
end
output (y);
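One way to check a hand-drawn graph for this exercise is to trace the tokens in code. The sketch below mirrors the while-loop schema (merge with an initially false control, a decider, T-gates into the body, F-gates out of the loop) in sequential Python. It rests on three assumptions that are not stated in the slides: the loop test is read as "t not equal to w", ÷ is read as integer division, and w is a non-negative integer. The function name `while_example` is hypothetical.

```python
def while_example(w, x):
    """Trace the loop-carried tokens (y, t) through the while-loop schema."""
    control = False        # merge control is initially False
    entering = (x, 0)      # y := x; t := 0
    feedback = None
    while True:
        # Merge actors: take the outside tokens first, the fed-back ones afterwards.
        y, t = feedback if control else entering
        # Decider: assumed loop condition t != w.
        control = t != w
        if not control:
            # F-gates release the result token out of the loop.
            return y
        # T-gates pass the tokens into the loop body.
        y = y // 2 if y > 1 else y * 3  # conditional construct; ÷ assumed integer
        feedback = (y, t + 1)


print(while_example(0, 5))  # 5
print(while_example(3, 5))  # 3
```

For w = 3 and x = 5 the value of y goes 5, 2, 1, 3 across the three iterations.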

42 Advanced Computer Architecture
Dataflow Architecture Dataflow computers have a data-driven organization. The data-driven concept means asynchrony; as a result, a high degree of implicit parallelism is expected in a dataflow computer. Since shared memory cells are not used, dataflow programs are free from side effects. Finally, dataflow computations have no far-reaching effects (locality of effect).

43 Advanced Computer Architecture
Dataflow Architecture Depending on the way data tokens are handled, dataflow computers are divided into the static model and the dynamic model. In a static dataflow machine only one token is allowed to exist on any arc at any given time. In a dynamic dataflow machine more than one token can exist on an arc.

44 Advanced Computer Architecture
A Static Dataflow Machine System is Composed of Five Modules: Memory Section consists of instruction cells holding a dataflow instruction. Processing Section consists of processing units that perform the basic dataflow operations on data values.

45 Advanced Computer Architecture
A Static Dataflow Machine Arbitration Network transfers operation packets from the memory section to the processing section. Distribution Network transfers the generated data packets from the processing section to the memory section. Control Network transfers control packets from the processing section to the memory section.

46 Advanced Computer Architecture
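Figure: organization of the static dataflow machine. The Memory Section (instruction cell blocks) sends operation packets through the Arbitration Network to the Processing Section (processing units); the Processing Section returns data tokens through the Distribution Network and control tokens through the Control Network.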

47 Advanced Computer Architecture
A Static Dataflow Machine Memory Section — The memory section holds a representation of the program to be executed and the data values. Memory section is organized into instruction cells. Each instruction cell corresponds to an actor of the dataflow program.

48 Advanced Computer Architecture
A Static Dataflow Machine Instruction Cell Each instruction cell is composed of three words. The first word holds the operation code and the addresses of the instruction cells to which the result of the operation is to be directed. The next two words hold the operands. Each operand word may be set to behave as a constant or a variable. There are six different instruction formats.

49 Advanced Computer Architecture
Instruction Format Operators can be of two forms: Unary operator or Binary operator:

50 Advanced Computer Architecture
Instruction format Deciders can be of Unary or Binary types:

51 Advanced Computer Architecture
Instruction Format Boolean operators can be Unary or Binary operators:

52 Advanced Computer Architecture
Instruction Format Each operand value (i.e., gi) has the following format: Gate Flag, Value Flag, Data Value.
Gate flag:
  Off: no gate-value control packet is received
  True: a True gate-value control packet is received
  False: a False gate-value control packet is received
Value flag:
  Off: no data value is received
  On: a data value is received

53 Advanced Computer Architecture
Instruction Format
n: number of acknowledge signals expected
m: number of acknowledge signals received
gi: gate code
ti: result tag; defines whether the control packet is of gate type or data type

54 Advanced Computer Architecture
Instruction Format Figure: state diagram of an operand word. States are written as D, True, (gate flag, value flag, data) and include (off, off), (off, on, d), (true, off), (true, on, d), and (false, off); three transitions are labeled "operand word became active".

55 Advanced Computer Architecture
A Static Dataflow Machine: An Example The following expression is assumed: Y(t) = A * X(t) + B * Y(t-1) + C * Y(t-2). Show its dataflow graph and its "simple" representation in the memory section.
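Before drawing the graph it can help to evaluate the recurrence directly. The following is a sketch only: the function name `filter_outputs`, the default initial values Y(-1) = Y(-2) = 0, and the sample coefficients are assumptions made for illustration.

```python
def filter_outputs(xs, A, B, C, y1=0, y2=0):
    """Return Y(0), Y(1), ... for inputs X(0), X(1), ...; y1 = Y(-1), y2 = Y(-2)."""
    ys = []
    for x in xs:
        # The three products do not depend on one another, so in a dataflow
        # graph their multiplication actors could fire concurrently.
        p1, p2, p3 = A * x, B * y1, C * y2
        # The additions must wait for the product tokens.
        y = (p1 + p2) + p3
        ys.append(y)
        # Y(t) becomes Y(t-1) for the next step, and Y(t-1) becomes Y(t-2).
        y1, y2 = y, y1
    return ys


print(filter_outputs([1, 1, 1], A=1, B=1, C=1))  # [1, 2, 4]
```

The feedback of Y(t) into the next two evaluations is what the initial tokens Y(-1) and Y(-2) in the memory section have to prime.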

56 Advanced Computer Architecture
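A Static Dataflow Machine: An Example Figure: dataflow graph of the expression, with instruction cells numbered 1 to 8, the input x(0) (in), the constants A, B, and C, the initial values y(-1) and y(-2), multiplication and addition actors, and the output (out).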

57 Advanced Computer Architecture
An Example — Initialization of Memory Cells

58 Advanced Computer Architecture
A Static Dataflow Machine Processing Section — It is a collection of five pipeline processing units: Multiplier Unit for complex operands, Adder and Subtractor Unit for complex operands, Distributor Unit to replicate and distribute complex values,

59 Advanced Computer Architecture
A Static Dataflow Machine Integer Processor Unit for integer and test operations, and Control Processor Unit to replicate and distribute the integer and Boolean values.

60 Advanced Computer Architecture
A Static Dataflow Machine Processing Section — Each functional unit is organized as three independent pipelines. One performs the operation and the other two carry destination addresses.

61 Advanced Computer Architecture
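A Static Dataflow Machine Figure: an instruction packet (op. code, destinations d1 and d2, operands x and y) enters a functional unit; the computation pipeline produces Z while the identity pipelines carry the destination addresses, and together they form the result packet.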

62 Advanced Computer Architecture
A Static Dataflow Machine Arbitration Network Arbitration network is designed to establish a smooth flow of the instruction packets from the memory section to the processing section. The network is composed of five basic building blocks:

63 Advanced Computer Architecture
A Static Dataflow Machine Arbitration Network
arb: arbitration unit
sw: switch unit
buf: buffer unit
s/p: serial-to-parallel transfer
p/s: parallel-to-serial transfer

64 Advanced Computer Architecture
A Static Dataflow Machine Distribution Network It is designed to transfer the result packets from the processing section to the memory section. It utilizes the same basic building blocks as the arbitration network does.

65 Advanced Computer Architecture
Static Dataflow Machine Control Network It is used to transfer Boolean values and acknowledge signals from the processing section to the memory section. Because of the very nature of the data value transferred via control networks, it is composed of the switch and arbitration units only. The control network transfers two types of tokens, namely: gate type and data type.

66 Advanced Computer Architecture
Static Dataflow Machine Control Network
Gate Type: (gate, True/False, address)
Data Type: (value, True/False, address)

67 Advanced Computer Architecture
A Dynamic Dataflow Machine It is a backend system composed of five units connected as a pipeline ring around which the tokens flow. The processing unit allows concurrent execution of data graph nodes. The token queue temporarily stores tokens lying on the data graph arcs.

68 Advanced Computer Architecture
A Dynamic Dataflow Machine The matching unit gathers pairs of tokens with the same destination node address and label. The node store holds the representation of the dataflow graph. The switch unit establishes communication between the frontend and backend processors, and reroutes the resultant tokens back to the pipeline ring.

69 Advanced Computer Architecture
A Dynamic Dataflow Machine Each unit in the pipeline is internally synchronous, but communication with other units is based on a standard asynchronous protocol.

70 Advanced Computer Architecture
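A Dynamic Dataflow Machine Figure: the pipeline ring. The Switch Unit (From Host / To Host) passes tokens to the Token Queue, then the Matching Unit, which sends token pairs to the Node Store, which sends instruction packets to the Processing Unit, whose result tokens return to the Switch Unit.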

71 Advanced Computer Architecture
A Dynamic Dataflow Machine Token Queue It is a static RAM FIFO buffer of 16k * 96 bits that allows a read and a write to be performed in a single 200 nsec pipeline period. A multiple-bank structure could have been used, but it was left out of the prototype because of its complexity.

72 Advanced Computer Architecture
A Dynamic Dataflow Machine Each token comprises 96 bits and has the following format:
Token Value (32 bits)
Label (36 bits)
Destination Address (18 bits)
Type Information and Control (10 bits)
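The 96-bit token layout can be checked by packing and unpacking the fields. This is a sketch, reading the widths as value 32, label 36, destination 18, and control 10, which sum to 96 and agree with the 54-bit label-plus-destination figure quoted for the matching unit. The order of the fields inside the word is an assumption, and `FIELDS`, `pack_token`, and `unpack_token` are hypothetical names.

```python
# (field name, width in bits); the ordering within the word is assumed.
FIELDS = (("value", 32), ("label", 36), ("dest", 18), ("control", 10))


def pack_token(**fields):
    """Pack the four fields into a single 96-bit integer."""
    word = 0
    for name, width in FIELDS:
        v = fields[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | v
    return word


def unpack_token(word):
    """Split a 96-bit integer back into its fields."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out


token = pack_token(value=42, label=7, dest=3, control=1)
print(sum(width for _, width in FIELDS))  # 96
print(unpack_token(token))
```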

73 Advanced Computer Architecture
A Dynamic Dataflow Machine Matching Unit It is associative in nature and a critical part of the machine operation. It should provide storage for a large number of tokens awaiting their matching pairs. It is composed of 8 banks of 2k * 96 bits.

74 Advanced Computer Architecture
A Dynamic Dataflow Machine Matching Unit An 11-bit hash address is generated from the 54-bit label and destination address fields of each incoming token. This address references 8 memory words, one in each of the eight parallel banks.

75 Advanced Computer Architecture
A Dynamic Dataflow Machine Matching Unit Tokens (if any) at the addressed locations are compared with the incoming token. If a match is found, a token pair is generated and passed to the node store; otherwise, the incoming token is written to one of the parallel banks to await its partner. An unsuccessful match takes 320 nsec and a successful match requires 240 nsec.
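The hash-and-compare behaviour of the matching unit can be sketched as follows. Several details are assumptions made for illustration: the XOR-folding `hash11` function is hypothetical (the slides do not give the real hash), the class and method names are invented, left and right operand positions are ignored, and raising an error when all eight words are occupied is a placeholder for whatever overflow handling the machine uses.

```python
NUM_BANKS = 8
BANK_WORDS = 2048  # 2k words per bank, addressed by an 11-bit hash


def hash11(label, dest):
    """Hypothetical hash: fold the 54-bit (label, dest) key into 11 bits by XOR."""
    key = (label << 18) | dest
    h = 0
    while key:
        h ^= key & 0x7FF
        key >>= 11
    return h


class MatchingUnit:
    def __init__(self):
        # banks[b][addr] holds a waiting (label, dest, value) token or None.
        self.banks = [[None] * BANK_WORDS for _ in range(NUM_BANKS)]

    def accept(self, label, dest, value):
        """Return a (waiting_value, value) pair on a match, otherwise None."""
        addr = hash11(label, dest)
        # Compare the incoming token with the word at `addr` in every bank.
        for bank in self.banks:
            waiting = bank[addr]
            if waiting is not None and waiting[:2] == (label, dest):
                bank[addr] = None
                return (waiting[2], value)
        # No partner yet: store the token in the first free bank at this address.
        for bank in self.banks:
            if bank[addr] is None:
                bank[addr] = (label, dest, value)
                return None
        raise OverflowError("all eight banks are full at this hash address")


mu = MatchingUnit()
print(mu.accept(label=1, dest=3, value=10))  # None: token waits for its partner
print(mu.accept(label=1, dest=3, value=20))  # (10, 20): pair found
```

Comparing the full label and destination after hashing is what keeps two different keys that collide on the same 11-bit address from being paired by mistake.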

76 Advanced Computer Architecture
A Dynamic Dataflow Machine Node Store It is a 16K * 72-bit memory with a 200 nsec access time, augmented by a segment table. It is used to store dataflow graphs. Each instruction has the following format:
op. code (12 bits)
Destination address1 (18 bits)
Destination address2 or literal (32 bits)
Type information and control (10 bits)

77 Dataflow Processing A Dynamic Dataflow Machine Node Store
Upon receiving a token pair, an instruction packet is generated and passed to the processing unit. The packet has the following fields:
op. code (12 bits)
Destination address1 (18 bits)
Destination address2 or literal
Type information and control (10 bits)
Label (36 bits)
operand1 (32 bits)
operand2

78 Advanced Computer Architecture
A Dynamic Dataflow Machine Processing Unit It is a writable-microprogram processor consisting of two pipeline stages. The first stage handles simple label operations and gathers some performance statistics. The second stage is a parallel array of 15 processing elements. Each element is capable of performing 24-bit integer or 32-bit floating-point arithmetic operations.

79 Advanced Computer Architecture
A Dynamic Dataflow Machine Processing Unit The microinstruction cycle time of each processor is 200 nsec. An instruction requires from five to 50 microinstructions (giving an average running time of 4.5 µsec).

80 Advanced Computer Architecture
A Dynamic Dataflow Machine An Example Assume the following expression, and show its representation in the node store and its initial tokens: A = (W * X) + (Y * Z)

81 Advanced Computer Architecture
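A Dynamic Dataflow Machine An Example Figure: dataflow graph of A = (W * X) + (Y * Z). W and X feed one * actor, Y and Z feed a second * actor, and both products feed a + actor that produces A.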

82 Advanced Computer Architecture
A Dynamic Dataflow Machine An Example: Node Store
Address   op. code   Dest. Address1   Dest. Address2
1         *          3 L.H.
2         *          3 R.H.
3         +          Output           --

83 Dataflow Processing A Dynamic Dataflow Machine
An Example: Initial Tokens
Value   Inst. Address   Label   Control
W       1 L.H.          ?       no
X       1 R.H.
Y       2 L.H.
Z       2 R.H.
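The example can be traced end to end with a toy version of the pipeline ring. This is a sketch under stated assumptions: labels (tags) are ignored because there is only one activation of the graph, the numeric values chosen for W, X, Y, and Z are illustrative, and `NODE_STORE`, `run`, and the `"L"`/`"R"` port markers (for L.H. and R.H.) are hypothetical names.

```python
import operator
from collections import deque

# Node store: address -> (operation, (destination address, destination port)).
NODE_STORE = {
    1: (operator.mul, (3, "L")),
    2: (operator.mul, (3, "R")),
    3: (operator.add, ("output", None)),
}


def run(initial_tokens):
    """Circulate (value, node address, port) tokens until the queue is empty."""
    queue = deque(initial_tokens)  # token queue
    waiting = {}                   # matching store: node -> {port: value}
    outputs = []
    while queue:
        value, node, port = queue.popleft()
        if node == "output":       # switch unit sends the result to the host
            outputs.append(value)
            continue
        slot = waiting.setdefault(node, {})
        slot[port] = value
        if len(slot) == 2:         # matching unit has found the token pair
            del waiting[node]
            op, (dest, dest_port) = NODE_STORE[node]
            # Processing unit executes the node and emits the result token.
            queue.append((op(slot["L"], slot["R"]), dest, dest_port))
    return outputs


# Illustrative values: W = 2, X = 3, Y = 4, Z = 5.
tokens = [(2, 1, "L"), (3, 1, "R"), (4, 2, "L"), (5, 2, "R")]
print(run(tokens))  # [26]
```

The result does not depend on the order in which the four initial tokens are queued, since each node fires only once both of its operands have arrived.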

