Network-on-Chip Introduction Axel Jantsch / Ingo Sander


1 Network-on-Chip Introduction Axel Jantsch / Ingo Sander axel@kth.se

2 Network-on-Chip (SoC Architecture, May 26, 2016): Today, buses are the dominant interconnect technology for systems-on-chip. However, buses have severe limitations that become evident when the number of components in a system is large: the bus is a communication bottleneck, its bandwidth is limited, and buses scale only to a certain extent. Networks-on-chip are intended to overcome these limitations, since they provide a much larger amount of communication resources and are scalable.

3 A Network-on-Chip [Figure: 16 switches (S), each with an attached terminal node (T), connected by channels]

4 Network-on-Chip: A terminal node can be any kind of component, such as a processor, a memory, a hardware component, or a bus-based system with several components (e.g. processor and memory).

5 Network-on-Chip: Information in the form of packets is routed via channels and switches from one terminal node to another.

6 Network Interface: Different terminals with different interfaces must be connected to the network. The network uses a specific protocol, and all traffic on the network has to comply with the format of this protocol. [Figure: Switch, Network Interface, Terminal Node (Resource)]

7 Network Interface: To allow different resources to connect to the network, the network interface can be divided into a resource-independent part (Network Interface) and a resource-dependent part (Resource Network Interface). This is also the solution used in the Nostrum NoC (developed at KTH). [Figure: Switch, Network Interface, Resource Network Interface, Terminal Node (Resource)]

8 Network abstractions: The International Organization for Standardization (ISO) developed the Open Systems Interconnection (OSI) model to describe networks. This 7-layer model provides a standard way to classify network components and operations. Networks-on-chip use a similar protocol stack corresponding to the 4 lowest layers of the OSI model.

9 OSI model (lowest layer first):
  physical        mechanical, electrical
  data link       reliable data transport
  network         end-to-end service
  transport       connections
  session         application dialog control
  presentation    data format
  application     end-use interface

10 OSI layers: Physical: connectors, bit formats, electrical properties. Data link: error detection and control across a single link (single hop). Network: end-to-end multi-hop data communication. Transport: connection-oriented services over multiple links, e.g. ordering of packets, error-free connection.

11 OSI layers, cont'd: Session: services for end-user applications (data grouping, checkpointing, etc.). Presentation: data formats, transformation services. Application: interface between network and end-user programs.

12 Internet Protocol (not an on-chip protocol!) [Figure: node A and node B each implement the full stack (physical, data link, network, transport, session, presentation, application); the router between them implements only the physical, data link, and network layers; IP is marked at the network layer]

13 Units of Resource Allocation: A message is a contiguous group of bits that is delivered from a source terminal to a destination terminal. A message consists of packets. A packet is the basic unit for routing and sequencing. Packets may be divided into flits. A flit (flow control digit) is the basic unit of bandwidth and storage allocation. Flits do not carry any routing or sequencing information and have to follow the route of the whole packet. A phit (physical transfer digit) is the unit that is transferred across a channel in a single clock cycle.
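
The hierarchy of message, packet, flit, and phit can be sketched as nested slicing. This is only an illustration: the sizes (32-byte packets, 8-byte flits, 2-byte phits) and the function names are assumptions, and packet headers and flit type fields are ignored.

```python
def split(data: bytes, size: int) -> list[bytes]:
    """Cut data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def segment(message: bytes, packet_bytes: int, flit_bytes: int, phit_bytes: int):
    """Return the message as a list of packets, each a list of flits,
    each a list of phits (hypothetical sizes, no headers)."""
    packets = []
    for packet in split(message, packet_bytes):
        flits = split(packet, flit_bytes)
        packets.append([split(flit, phit_bytes) for flit in flits])
    return packets


message = bytes(range(64))
packets = segment(message, packet_bytes=32, flit_bytes=8, phit_bytes=2)

# Reassembling all phits in order gives back the original message.
reassembled = b"".join(phit for packet in packets for flit in packet for phit in flit)
```

With these assumed sizes the 64-byte message becomes 2 packets of 4 flits each, and every flit needs 4 channel cycles (phits) to cross a link.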

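14 Units of Resource Allocation [Figure: a message is divided into packets; a packet has a header with routing information (RI) and a sequence number (SN) and is divided into a head flit, body flits, and a tail flit; each flit has type and VC fields and is divided into phits] Messages, packets, flits, and phits are handled in different layers of the network protocol.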

15 Performance Factors: Factors that influence the performance of a network-on-chip are: topology (static arrangement of channels and nodes); routing technique (selection of a path through the network); switching technique (how a route is traversed); flow control (how network resources are allocated as packets traverse the network); router architecture (buffers and switches); traffic pattern.

16 Network-on-Chip Topologies Axel Jantsch / Ingo Sander axel@kth.se Dally: Ch 3, (4), 5

17 Network Topology: The network topology refers to the static arrangement of channels and nodes in the network. A good topology fulfills the requirements of the traffic at reasonable cost. Network topology can be compared with a network of roads.

18 Topology Examples [Figures: 4-node ring (nodes 0-3); "4x4" torus (nodes 00-33); butterfly with 8 nodes (terminals 0-7, switches 00-03, 10-13, 20-23)]

19 Rings, Tori and Meshes [Figures: 4-node ring (4-ary 1-cube); "4x4" torus (4-ary 2-cube); "4x4" mesh (4-ary 2-mesh)]

20 Combined Node: A combined node consists of a terminal node and a switch node. [Figure: a combined node is equivalent to a switch node with an attached terminal node]

21 Nomenclature Network-on-Chip: The topology of an interconnection network is specified by a set of nodes $N^*$ connected by a set of channels $C$. Messages originate and terminate in a set of terminal nodes $N$, where $N \subseteq N^*$. Here: $|N| = |N^*| = 16$ and $|C| = 16 \cdot 4 = 64$. [Figure: "4x4" torus; channels are bidirectional]

22 Nomenclature Network-on-Chip: Each channel $c = (x, y) \in C$ connects a source node $x$ to a destination node $y$, where $x, y \in N^*$. A channel is characterized by its width $w_c$ or $w_{xy}$, which is the number of parallel signals it contains. The source node of a channel is denoted $s_c$ and the destination node $d_c$. [Figure: "4x4" torus; channels are bidirectional]

23 Nomenclature Network-on-Chip: The frequency $f_c$ or $f_{xy}$ of a channel is the rate at which bits are transported on each signal. Its latency $t_c$ or $t_{xy}$ is the time required for a bit to travel from $x$ to $y$. Usually the latency is directly related to the physical length of the channel, $l_c = v t_c$, by a propagation velocity $v$. The bandwidth of the channel is $b_c = w_c f_c$. [Figure: "4x4" torus; channels are bidirectional]

24 Nomenclature Network-on-Chip: Each switch node $x$ has a channel set $C_x = C_{Ix} \cup C_{Ox}$, where $C_{Ix} = \{c \in C \mid d_c = x\}$ is the input channel set and $C_{Ox} = \{c \in C \mid s_c = x\}$ is the output channel set. The degree of $x$ is $\delta_x = |C_x|$, which is the sum of the in-degree and the out-degree. [Figure: "4x4" torus; channels are bidirectional]
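
The nomenclature can be checked on the "4x4" torus with a small sketch. The representation is an assumption made for illustration: nodes are coordinate tuples, and each bidirectional link is modelled as two unidirectional channels `(source, destination)`. The function names are hypothetical.

```python
from itertools import product


def torus_channels(k, n=2):
    """Nodes and unidirectional channels of a k-ary n-cube (torus)."""
    nodes = list(product(range(k), repeat=n))
    channels = set()
    for x in nodes:
        for dim in range(n):
            for step in (1, -1):
                y = list(x)
                y[dim] = (y[dim] + step) % k  # wrap-around link
                channels.add((x, tuple(y)))
    return nodes, channels


def degree(node, channels):
    """Size of C_x: input channels (d_c = x) plus output channels (s_c = x)."""
    inputs = {c for c in channels if c[1] == node}
    outputs = {c for c in channels if c[0] == node}
    return len(inputs | outputs)


nodes, channels = torus_channels(4)
print(len(nodes), len(channels), degree((0, 0), channels))  # 16 64 8
```

Every node has 4 input and 4 output channels, which gives the 16 · 4 = 64 channels stated for this torus and a degree of 8.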

25 Direct and Indirect Networks: In a direct network, every node is both a terminal and a switch. In an indirect network, nodes are either switches or terminals. [Figures: a direct network with nodes 00-33; an indirect network with terminals 0-7]

26 Bisection of a network: A bisection of a network is a cut that partitions the entire network nearly in half. The channel bisection of a network is the minimum channel count over all bisections of the network. The bisection bandwidth of a network is the minimum bandwidth over all bisections of the network.

27 Bisection: Channel bisection $B_C = 4$ (2 bidirectional channels cross the bisection). Bisection bandwidth $B_B = 4b$ ($b$ is the bandwidth of each channel). [Figure: 4-node ring]
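
For a network this small, the channel bisection can be found by brute force. The sketch below assumes an even number of nodes, counts each bidirectional link as two unidirectional channels, and tries every split into two equal halves; the function names are illustrative.

```python
from itertools import combinations


def ring_channels(num_nodes):
    """Unidirectional channels of a ring; each link contributes two channels."""
    channels = set()
    for x in range(num_nodes):
        y = (x + 1) % num_nodes
        channels.add((x, y))
        channels.add((y, x))
    return channels


def channel_bisection(num_nodes, channels):
    """Minimum number of channels cut over all partitions into two equal halves."""
    best = None
    for half in combinations(range(num_nodes), num_nodes // 2):
        half = set(half)
        cut = sum(1 for (x, y) in channels if (x in half) != (y in half))
        best = cut if best is None else min(best, cut)
    return best


print(channel_bisection(4, ring_channels(4)))  # 4
```

Exhaustive search grows exponentially with the number of nodes, so this is only practical for toy examples such as the 4-node ring.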

28 Bisection Bandwidth Mesh, Uniform Traffic: $B_T$: total channel count; $B_C$: bisection channel count; $E$: emitted packets per cycle; $H_{avg}$: average hop count. [Equations: total load and its balance condition; bisection load and its balance condition]
  k     T-limit           B-limit
  2     3/2 = 1.5         2
  4     9/8 = 1.125       1
  6     5/6 = 0.83        2/3 = 0.67
  8     21/32 = 0.656     1/2 = 0.5
  10    27/50 = 0.54      2/5 = 0.4
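
The T-limit and B-limit values can be reproduced with a short sketch. It rests on assumptions not stated on the slide: a k x k mesh with $B_T = 4k(k-1)$ and $B_C = 2k$ unidirectional channels, $H_{avg}$ averaged over distinct source-destination pairs, total-load balance $N \cdot E \cdot H_{avg} \le B_T$, and bisection balance $N \cdot E / 2 \le B_C$. Under these assumptions the results match the slide's values for k = 2, 4, 6, 8, 10.

```python
from fractions import Fraction
from itertools import product


def mesh_limits(k):
    """Upper bounds on E (packets per node per cycle) for a k x k mesh.

    Returns (T-limit, B-limit) under the assumptions named in the text.
    """
    nodes = list(product(range(k), repeat=2))
    n = len(nodes)
    total_channels = 4 * k * (k - 1)   # B_T, unidirectional channels
    bisection_channels = 2 * k         # B_C, unidirectional channels
    hop_sum = sum(abs(ax - bx) + abs(ay - by)
                  for (ax, ay), (bx, by) in product(nodes, repeat=2))
    h_avg = Fraction(hop_sum, n * (n - 1))  # average over distinct pairs
    t_limit = Fraction(total_channels) / (n * h_avg)
    b_limit = Fraction(2 * bisection_channels, n)
    return t_limit, b_limit


for k in (2, 4, 6, 8, 10):
    print(k, *mesh_limits(k))
```

For k = 2 the total channel count is the tighter limit; from k = 4 upwards the bisection limits the injection rate first.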

29 Paths: A path is an ordered set of channels $P = \{c_1, c_2, \ldots, c_n\}$, where $d_{c_i} = s_{c_{i+1}}$ for $i = 1, \ldots, n-1$. The length or hop count of a path is $|P|$. A minimal path from node $x$ to node $y$ is a path with the smallest hop count. [Figure: "4x4" torus with bidirectional channels, showing a minimal path ($|P| = 3$) and a non-minimal path ($|P| = 5$)]

30 Paths: The set of all minimal paths between $x$ and $y$ is denoted $R_{xy}$. [Figure: "4x4" torus with bidirectional channels, showing minimal paths ($|P| = 3$)]

31 Paths: The diameter $H_{\max}$ is the largest minimal hop count over all pairs of terminal nodes. [Figure: "4x4" torus with bidirectional channels; largest minimal hop count (diameter $H_{\max} = 4$)]

32 Paths: The average minimum hop count $H_{\min}$ is defined as the average hop count over all sources and destinations. Here: $H_{\min} = 2$. [Figure: "4x4" torus with bidirectional channels] Distance in hops from node 00:
  0 1 2 1
  1 2 3 2
  2 3 4 3
  1 2 3 2
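
The distance grid, $H_{\min}$, and the diameter of the "4x4" torus can be recomputed directly. The sketch assumes nodes are (row, column) tuples and that the average includes pairs with source equal to destination, which is the convention that yields $H_{\min} = 2$ here.

```python
from itertools import product


def torus_hops(x, y, k):
    """Minimal hop count between nodes x and y in a k-ary torus."""
    return sum(min((a - b) % k, (b - a) % k) for a, b in zip(x, y))


k = 4
nodes = list(product(range(k), repeat=2))

# Distances from node 00, laid out like the grid on the slide.
grid = [[torus_hops((0, 0), (row, col), k) for col in range(k)] for row in range(k)]

# Average over all (source, destination) pairs, including source == destination.
h_min = sum(torus_hops(x, y, k) for x in nodes for y in nodes) / len(nodes) ** 2
h_max = max(torus_hops(x, y, k) for x in nodes for y in nodes)

print(grid)
print(h_min, h_max)  # 2.0 4
```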

33 Paths: A specific implementation may choose to incorporate some non-minimal paths. The actual average hop count $H_{avg}$ is then defined over the paths actually used by the network. [Figure: "4x4" torus with bidirectional channels, showing a non-minimal path ($|P| = 5$)]

34 Paths: The physical distance of the path is $D(P) = \sum_{c \in P} l_c$. The delay of the path is $t(P) = \sum_{c \in P} t_c = D(P) / v$. [Figure: "4x4" torus with bidirectional channels, showing a non-minimal path ($|P| = 5$)]

35 Traffic Patterns: The traffic pattern is a very important factor for the performance of a network. In uniform random traffic, each source is equally likely to send to each destination. Uniform random traffic is the most commonly used traffic pattern. However, it implicitly balances the load, so it is often a benign case that does not stress the network.

36 Throughput: The throughput of a network is the data rate in bits per second that the network accepts per input port. The topology of a network has a significant impact on the throughput (besides flow control and routing). The ideal throughput is defined as the throughput assuming perfect routing and flow control: load is balanced over alternate paths, and there are no idle cycles on bottleneck channels.

37 Throughput: Maximum throughput occurs when some channel of the network becomes saturated. The channel load $\gamma_c$ of a channel is the ratio of the bandwidth demanded from the channel to the bandwidth of the input ports; in other words, it is the amount of traffic that must cross the channel if each input injects one unit of traffic according to the given traffic pattern. The channel that carries the largest fraction of the traffic determines the maximum channel load $\gamma_{\max}$.

38 Throughput: The ideal throughput $\Theta_{ideal}$ is the input bandwidth that saturates the bottleneck channel: $\Theta_{ideal} = b / \gamma_{\max}$. In general it is difficult to determine the maximum channel load $\gamma_{\max}$, but for uniform traffic, bounds can be found. The ideal throughput of a network under uniform traffic, $\Theta_{ideal}(U)$, is used as the capacity of the network.

39 Ideal Throughput in a Torus: Assuming uniform traffic, 50% of the packets cross the bisection channels. Throughput is best if packets are evenly distributed over the bisection channels. The load on these channels is then $\gamma_B = N / (2 B_C)$. Thus $\gamma_{\max} \ge \gamma_B = N / (2 B_C)$, and the ideal throughput is $\Theta_{ideal} = b / \gamma_{\max} \le 2 b B_C / N$. [Figure: "4x4" torus; channels are bidirectional]

40 Ideal Throughput in a Torus: Thus $\gamma_{\max} \ge \gamma_B = N / (2 B_C)$ and $\Theta_{ideal} = b / \gamma_{\max} \le 2 b B_C / N$. Examples: 4 x 4 torus: $\gamma_B = 16 / (2 B_C) = 8/16 = 1/2$. 4 x 4 mesh: $\gamma_B = 16 / (2 B_C) = 8/8 = 1$. n x n torus: $\gamma_B = n^2 / (2 B_C) = n^2 / (2 \cdot 4n) = n/8$. n x n mesh: $\gamma_B = n^2 / (2 B_C) = n^2 / (4n) = n/4$. [Figure: "4x4" torus; channels are bidirectional]
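
The bisection bound is simple enough to evaluate for any k x k torus or mesh. The sketch below assumes the channel bisections used in the examples ($B_C = 4k$ for the torus, $B_C = 2k$ for the mesh, counting unidirectional channels); the function names are illustrative.

```python
from fractions import Fraction


def bisection_channels(k, topology):
    """Channel bisection B_C of a k x k network (unidirectional channels)."""
    if topology == "torus":
        return 4 * k
    if topology == "mesh":
        return 2 * k
    raise ValueError(f"unknown topology: {topology}")


def bisection_load(k, topology):
    """Lower bound gamma_B = N / (2 * B_C) on the maximum channel load."""
    n_nodes = k * k
    return Fraction(n_nodes, 2 * bisection_channels(k, topology))


def ideal_throughput_bound(k, topology, channel_bandwidth):
    """Upper bound b / gamma_B on the ideal throughput per input port."""
    return channel_bandwidth / bisection_load(k, topology)


print(bisection_load(4, "torus"), bisection_load(4, "mesh"))  # 1/2 1
```

For example, `ideal_throughput_bound(4, "torus", 1)` evaluates to 2: under uniform traffic the bisection of a 4 x 4 torus does not limit injection below two channel bandwidths per node.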

41 Another useful lower bound on channel load: A packet needs $H_{\min}$ hops to be delivered. There are $C$ channels in the network. We have $N$ nodes sending packets. With equal load on all channels, we get a lower bound for the maximum channel load: $\gamma_{\max} \ge N H_{\min} / C$.

42 Lower Bounds on Bottleneck: Hop count bound: $\gamma_{\max} \ge N H_{\min} / C$. Bisection bound: $\gamma_{\max} \ge N / (2 B_C)$.

43 Latency: The latency of the network is the time required for a message to traverse the network, from the time the head arrives at the input port to the time the tail of the message departs the output port. Latency depends not only on topology, but also on routing, flow control, and the design of the router. Topology gives a lower bound on latency.

44 Latency: There are two latency components. Head latency $T_h$: time required for the head of the message to traverse the network. Serialization latency $T_s = L/b$: time required for the tail to catch up (the time for a message of length $L$ to cross a channel with bandwidth $b$).

45 Head Latency: Head latency depends on two topology-related factors: router delay $T_r$ (time spent in the routers) and time of flight $T_w$ (time spent on wires). $T_r = H_{\min} t_r$ and $T_w = D_{\min} / v$ (average distance $D_{\min}$, propagation velocity $v$).

46 Latency: Together this gives the average latency $T_0 = H_{\min} t_r + D_{\min} / v + L / b$ (no congestion). Clearly, $H_{\min}$, $D_{\min}$, and $b$ are to a large extent determined by the topology. If there is congestion in the network, there is a fourth term $T_C$.

47 Latency with time-space diagram [Figure: time-space diagram over nodes x, z, and y showing the head and tail of a message; labels: $t_r$, $t_r$, $t_{xy}$, $L/b$, arrival at node x, leave x, arrival at node y, leave y, arrival at switch z]

48 Examples: The network has $N = 64$ nodes; $H_{\min} = 4$; channel width $w_c = 16$; channel frequency $f_c = 1$ GHz; channel latency $t_c = 5$ ns; router delay $t_r = 8$ ns; packet length $L = 64$ bytes. $T_r = H_{\min} t_r = 4 \cdot 8\ \text{ns} = 32\ \text{ns}$. $T_w = H_{\min} t_c = 4 \cdot 5\ \text{ns} = 20\ \text{ns}$. $T_s = L / b = L / (f_c w_c) = 64 \cdot 8 / (1\ \text{GHz} \cdot 16) = 512 / 16\ \text{ns} = 32\ \text{ns}$. $T_0 = 32\ \text{ns} + 20\ \text{ns} + 32\ \text{ns} = 84\ \text{ns}$.

49 To get a feeling about NoC (toy example): The network has $N = 64$ nodes; $H_{\min} = 2 \cdot 8 / 3 \approx 5.33$; channel width $w_c = 32$; channel frequency $f_c = 1$ GHz; channel latency $t_c = 1$ ns; router delay $t_r = 1$ ns; packet length $L = 16$ bytes.

50 To get a feeling about NoC (toy example): The network has $N = 64$ nodes; $H_{\min} = 2 \cdot 8 / 3 \approx 5.33$; channel width $w_c = 32$; channel frequency $f_c = 1$ GHz; channel latency $t_c = 1$ ns; router delay $t_r = 1$ ns; packet length $L = 16$ bytes. $T_r = H_{\min} t_r = 5.33 \cdot 1\ \text{ns} = 5.33\ \text{ns}$. $T_w = H_{\min} t_c = 5.33\ \text{ns}$. $T_s = L / b = L / (f_c w_c) = 16 \cdot 8 / (1\ \text{GHz} \cdot 32) = 128 / 32\ \text{ns} = 4\ \text{ns}$. $T_0 = 5.33\ \text{ns} + 5.33\ \text{ns} + 4\ \text{ns} \approx 14.67\ \text{ns}$.
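
Both latency examples follow the same arithmetic, which can be wrapped in a small helper. Like the examples, the sketch takes the time of flight as $H_{\min} t_c$ rather than $D_{\min} / v$ and ignores congestion; the function and parameter names are illustrative.

```python
def zero_load_latency_ns(hops, router_delay_ns, channel_latency_ns,
                         packet_bytes, width_bits, freq_ghz):
    """Zero-load latency T_0 = T_r + T_w + T_s in nanoseconds."""
    t_r = hops * router_delay_ns                      # time spent in routers
    t_w = hops * channel_latency_ns                   # time spent on wires
    t_s = packet_bytes * 8 / (width_bits * freq_ghz)  # bits / (bits per ns)
    return t_r + t_w + t_s


first = zero_load_latency_ns(hops=4, router_delay_ns=8, channel_latency_ns=5,
                             packet_bytes=64, width_bits=16, freq_ghz=1)
toy = zero_load_latency_ns(hops=2 * 8 / 3, router_delay_ns=1, channel_latency_ns=1,
                           packet_bytes=16, width_bits=32, freq_ghz=1)
print(first, round(toy, 2))  # 84.0 14.67
```

In the first example the router delay and serialization each contribute 32 ns, so shortening wires alone would remove at most 20 of the 84 ns.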

51 Path Diversity: A network with multiple minimal paths between most pairs of nodes is more robust than a network that has only a single route between the nodes.

52 Path Diversity, Random Traffic: Each node is equally likely to send a message to any other node. 50% of the packets pass the bisection. $\gamma_{\max} = 1$. [Figure: butterfly with 8 nodes (terminals 0-7, switches 00-03, 10-13, 20-23)]

53 Traffic Patterns: The performance of a network depends strongly on the traffic pattern. The table below shows a number of different traffic patterns that can be used to analyze the performance of the network.

54 Path Diversity, Bit Rotation Traffic: The node with address $\{b_2, b_1, b_0\}$ sends to $\{b_1, b_0, b_2\}$. Thus we get the following permutation: $\{0, 2, 4, 6, 1, 3, 5, 7\}$. Thus packets from nodes $\{0, 1, 4, 5\}$ will all have to pass switch node 10. $\gamma_{\max,BR} = 4$ (since for instance channel (00, 10) is used by two connections). Max capacity: 25%. [Figure: butterfly with 8 nodes (terminals 0-7, switches 00-03, 10-13, 20-23)]
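
The bit rotation pattern is a rotate-left of the node address by one bit. A minimal sketch, assuming 3-bit addresses for the 8-node network and an illustrative function name:

```python
def bit_rotation(source, bits=3):
    """Destination for source {b2, b1, b0}: rotate left to {b1, b0, b2}."""
    mask = (1 << bits) - 1
    return ((source << 1) | (source >> (bits - 1))) & mask


permutation = [bit_rotation(s) for s in range(8)]
print(permutation)  # [0, 2, 4, 6, 1, 3, 5, 7]
```

Nodes 0 and 7 send to themselves, since rotating 000 or 111 leaves the address unchanged.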

55 Torus and Mesh Networks: Torus and mesh networks, k-ary n-cubes, pack $N = k^n$ nodes. Advantages: the regular structure allows efficient packaging; latency is low for local communication; good path diversity. Disadvantage: comparatively large hop count.

56 Rings, Tori and Meshes [Figures: 4-node ring (4-ary 1-cube); "4x4" torus (4-ary 2-cube); "4x4" mesh (4-ary 2-mesh)]

57 Properties of Tori and Meshes:
  Torus: channel bisection $B_{C,T} = 4N/k$; channel load under uniform traffic (50% of traffic crosses bisection) $\gamma_{T,U} = k/8$; channel load under worst-case traffic (100% of traffic crosses bisection) $\gamma_{T,W} = k/4$; average minimum hop count (k even) $H_{\min,T} = nk/4$.
  Mesh: channel bisection $B_{C,M} = 2N/k$; channel load under uniform traffic (50% of traffic crosses bisection) $\gamma_{M,U} = k/4$; channel load under worst-case traffic (100% of traffic crosses bisection) $\gamma_{M,W} = k/2$; average minimum hop count (k even) $H_{\min,M} = nk/3$.
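
The hop count formulas can be compared against a brute-force average. The sketch assumes the average runs over all (source, destination) pairs including source equal to destination, and uses the fact that the dimensions contribute independently. Under that assumption $nk/4$ is exact for the torus with even $k$, while $nk/3$ for the mesh is an approximation of the exact value $n(k^2 - 1)/(3k)$.

```python
from fractions import Fraction
from itertools import product


def average_min_hops(k, n, wraparound):
    """Average minimal hop count of a k-ary n-cube (wraparound=True)
    or k-ary n-mesh (wraparound=False), self-pairs included."""
    def dist_1d(a, b):
        d = abs(a - b)
        return min(d, k - d) if wraparound else d

    # Average distance in one dimension; the n dimensions simply add up.
    per_dim = Fraction(sum(dist_1d(a, b) for a, b in product(range(k), repeat=2)),
                       k * k)
    return n * per_dim


print(average_min_hops(4, 2, wraparound=True))    # 2, equals nk/4
print(average_min_hops(8, 2, wraparound=False))   # 21/4, close to nk/3 = 16/3
```

For the 8 x 8 mesh the exact average is 5.25 hops against the 5.33 used as $H_{\min}$ in the toy latency example, a difference of under 2%.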

58 Physical implementation of Meshes and Tori: To implement a network on a chip, the abstract nodes of the network must be mapped to real positions in physical space. One goal is to have the same latency for all channels.

59 Folding a network leads to a shorter maximum channel length. [Figure: folded 4-ary 2-cube]

60 Summary: The topology is an important factor for the performance of the network. Meshes and tori offer a large amount of bandwidth and path diversity. Performance depends on the traffic pattern.

