Network-on-Chip Introduction Axel Jantsch / Ingo Sander


May 26, 2016 SoC Architecture

Network-on-Chip
Today buses are the dominant interconnect technology for systems-on-chip. However, buses have severe limitations that become evident when the number of components in a system is large:
- The bus is a communication bottleneck; its bandwidth is limited.
- Buses scale only to a certain extent.
Networks-on-chip are meant to overcome these limitations, since they provide a much larger amount of communication resources and are scalable.

A Network-on-Chip
[Figure: 16 switches (S), each with an attached terminal node (T), connected by channels. Legend: terminal node, switch, channel.]

Network-on-Chip
A terminal node can be any kind of component, such as:
- a processor
- a memory
- a hardware component
- a bus-based system with several components, e.g. a processor and a memory

Network-on-Chip
Information in the form of packets is routed via channels and switches from one terminal node to another.

Network Interface
Terminals with different interfaces must be connected to the network. The network uses a specific protocol, and all traffic on the network has to comply with the format of this protocol.
[Figure: switch, network interface, terminal node (resource)]

Network Interface
To allow different resources to connect to the network, the network interface can be divided into:
- a resource-independent part (Network Interface)
- a resource-dependent part (Resource Network Interface)
This is also the solution used in the Nostrum NoC (developed at KTH).
[Figure: switch, network interface, resource network interface, terminal node (resource)]

Network abstractions
The International Organization for Standardization (ISO) developed the Open Systems Interconnection (OSI) model to describe networks:
- It is a 7-layer model.
- It provides a standard way to classify network components and operations.
Networks-on-chip use a similar protocol stack, corresponding to the 4 lowest layers of the OSI model.

OSI model
- physical: mechanical, electrical
- data link: reliable data transport
- network: end-to-end service
- transport: connections
- session: application dialog control
- presentation: data format
- application: end-use interface

OSI layers
- Physical: connectors, bit formats, electrical properties
- Data link: error detection and control across a single link (single hop)
- Network: end-to-end multi-hop data communication
- Transport: connection-oriented services over multiple links, e.g. ordering of packets, error-free connection

OSI layers, cont'd.
- Session: services for end-user applications: data grouping, checkpointing, etc.
- Presentation: data formats, transformation services
- Application: interface between network and end-user programs

Internet Protocol (not an on-chip protocol!)
[Figure: node A and node B each show the full stack (physical, data link, network, transport, session, presentation, application); the router between them shows only physical, data link, and network. Label: IP.]

Units of Resource Allocation
- A message is a contiguous group of bits that is delivered from a source terminal to a destination terminal. A message consists of packets.
- A packet is the basic unit for routing and sequencing. Packets may be divided into flits.
- A flit (flow control digit) is the basic unit of bandwidth and storage allocation. Flits carry no routing or sequence information and have to follow the route of the whole packet.
- A phit (physical transfer digit) is the unit that is transferred across a channel in a single clock cycle.
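The hierarchy above can be sketched as a simple segmentation routine. This is only an illustration: the function names and the sizes (64-bit packets, 16-bit flits, 8-bit phits) are assumptions chosen for the example, and header overhead is ignored.

```python
def split(bits, size):
    """Cut a bit string into consecutive chunks of at most `size` bits."""
    return [bits[i:i + size] for i in range(0, len(bits), size)]


def segment_message(message_bits, packet_bits=64, flit_bits=16, phit_bits=8):
    """Return the message as nested lists: packets -> flits -> phits.

    All sizes are hypothetical; real networks add header fields per packet.
    """
    packets = []
    for packet in split(message_bits, packet_bits):
        # Each flit is further cut into the phits sent one per clock cycle.
        flits = [split(flit, phit_bits) for flit in split(packet, flit_bits)]
        packets.append(flits)
    return packets


message = "10" * 80                      # a 160-bit example message
packets = segment_message(message)
num_packets = len(packets)               # packets in the message
flits_per_packet = [len(p) for p in packets]
phits_per_flit = len(packets[0][0])      # phits in the first flit
print(num_packets, flits_per_packet, phits_per_flit)
```

With these sizes, a phit is half a flit, so a flit needs two clock cycles to cross a channel.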

Units of Resource Allocation
[Figure: a message is divided into packets; a packet consists of a header (RI, SN) and is divided into a head flit, body flits, and a tail flit; a flit carries type and VC fields and is divided into phits.]
Messages, packets, flits and phits are handled in different layers of the network protocol.

Performance Factors
Factors that influence the performance of a network-on-chip are:
- Topology (static arrangement of channels and nodes)
- Routing technique (selection of a path through the network)
- Switching technique (how a route is traversed)
- Flow control (how network resources are allocated as packets traverse the network)
- Router architecture (buffers and switches)
- Traffic pattern

Network-on-Chip Topologies Axel Jantsch / Ingo Sander Dally: Ch 3, (4), 5

Network Topology
- The network topology is the static arrangement of channels and nodes in the network.
- A good topology meets the requirements of the traffic at reasonable cost.
- A network topology can be compared with a network of roads.

Topology Examples
[Figure: ring, "4x4" torus, butterfly with 8 nodes]

Rings, Tori and Meshes
[Figure: ring (4-ary 1-cube), "4x4" torus (4-ary 2-cube), "4x4" mesh (4-ary 2-mesh)]

Combined Node
A combined node consists of a terminal node and a switch node.
[Figure: a combined node is equivalent to a switch node with an attached terminal node.]

Nomenclature Network-on-Chip
- The topology of an interconnection network is specified by a set of nodes $N^*$ connected by a set of channels $C$.
- Messages originate and terminate in a set of terminal nodes $N$, where $N \subseteq N^*$.
- Here: $|N| = |N^*| = 16$ and $|C| = 16 \cdot 4 = 64$.
[Figure: 4x4 torus (channels are bidirectional)]
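The node and channel counts for the 4x4 torus can be checked by enumerating the topology. The sketch below is an illustration only; the function name `torus_channels` is hypothetical, and each bidirectional channel is counted as two unidirectional channels $(x, y)$.

```python
from itertools import product


def torus_channels(k, n=2):
    """Return (nodes, channels) of a k-ary n-cube.

    Channels are unidirectional (source, destination) pairs, so one
    bidirectional link contributes two entries.
    """
    nodes = list(product(range(k), repeat=n))
    channels = set()
    for x in nodes:
        for dim in range(n):
            for step in (+1, -1):
                y = list(x)
                y[dim] = (y[dim] + step) % k   # wrap-around link of the torus
                channels.add((x, tuple(y)))
    return nodes, channels


nodes, channels = torus_channels(4)
print(len(nodes), len(channels))
```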

Nomenclature Network-on-Chip
- Each channel $c = (x, y) \in C$ connects a source node $x$ to a destination node $y$, where $x, y \in N^*$.
- A channel is characterized by its width $w_c$ or $w_{xy}$, the number of parallel signals it contains.
- The source node of a channel is denoted $s_c$ and the destination node $d_c$.
[Figure: 4x4 torus (channels are bidirectional)]

Nomenclature Network-on-Chip
- The frequency of a channel, $f_c$ or $f_{xy}$, is the rate at which bits are transported on each signal.
- Its latency, $t_c$ or $t_{xy}$, is the time required for a bit to travel from $x$ to $y$.
- Usually the latency is directly related to the physical length of the channel, $l_c = v t_c$, by a propagation velocity $v$.
- The bandwidth of the channel is $b_c = w_c f_c$.
[Figure: 4x4 torus (channels are bidirectional)]

Nomenclature Network-on-Chip
- Each switch node $x$ has a channel set $C_x = C_{Ix} \cup C_{Ox}$, where
  $C_{Ix} = \{c \in C \mid d_c = x\}$ is the input channel set and
  $C_{Ox} = \{c \in C \mid s_c = x\}$ is the output channel set.
- The degree of $x$ is $\delta_x = |C_x|$, the sum of the in-degree and the out-degree.
[Figure: 4x4 torus (channels are bidirectional)]

Direct and Indirect Networks
- Direct network: every node in the network is both a terminal and a switch.
- Indirect network: nodes are either switches or terminals.

Bisection of a network
- A bisection of a network is a cut that partitions the entire network nearly in half.
- The channel bisection of a network is the minimum channel count over all bisections of the network.
- The bisection bandwidth of a network is the minimum bandwidth over all bisections of the network.

Bisection
- Channel bisection: $B_C = 4$ (2 bidirectional channels go through the bisection)
- Bisection bandwidth: $B_B = 4b$ ($b$ is the bandwidth of each channel)
[Figure: bisection of a ring]

Bisection Bandwidth Mesh – Uniform Traffic
- $B_T$: total channel count
- $B_C$: bisection channel count
- $E$: emitted packets per cycle
- $H_{avg}$: average hop count
Total load and its balance condition, bisection load and its balance condition: [equations not preserved in the transcript]

k    T-limit          B-limit
2    3/2 = …          …
…    …/8 = …          …
…    …/6 = 0.83       2/3 = …
…    …/32 = 0.656     1/2 = …
…    …/50 = 0.54      2/5 = 0.4
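The two limits can be recomputed for a k x k mesh under explicitly assumed balance conditions. The conditions used here are assumptions, not taken from the slide: total load $E \cdot N \cdot H_{avg} \le B_T$, bisection load $E \cdot N / 2 \le B_C$, channels counted per direction, and $H_{avg}$ averaged over distinct source/destination pairs. They are chosen because they reproduce the surviving table entries (5/6, 21/32, 27/50 and 2/3, 1/2, 2/5). The function name `mesh_limits` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def mesh_limits(k):
    """Return (T-limit, B-limit) on E for a k x k mesh under uniform traffic.

    Assumed conditions: E*N*H_avg <= B_T and E*N/2 <= B_C.
    """
    nodes = list(product(range(k), repeat=2))
    n_nodes = len(nodes)
    total_channels = 4 * k * (k - 1)     # B_T: 2k(k-1) links, two directions
    bisection_channels = 2 * k           # B_C: k links cut, two directions
    hops = sum(abs(ax - bx) + abs(ay - by)
               for (ax, ay) in nodes for (bx, by) in nodes)
    # Average over distinct pairs only (self traffic excluded by assumption).
    h_avg = Fraction(hops, n_nodes * (n_nodes - 1))
    t_limit = Fraction(total_channels) / (n_nodes * h_avg)
    b_limit = Fraction(2 * bisection_channels, n_nodes)
    return t_limit, b_limit


for k in (2, 4, 6, 8, 10):
    t_limit, b_limit = mesh_limits(k)
    print(k, t_limit, float(t_limit), b_limit, float(b_limit))
```

Under these assumptions the bisection limit becomes the tighter of the two as k grows.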

Paths
- A path is an ordered set of channels $P = \{c_1, c_2, \ldots, c_n\}$, where $d_{c_i} = s_{c_{i+1}}$ for $i = 1 \ldots (n - 1)$.
- The length or hop count of a path is $|P|$.
- A minimal path from node $x$ to node $y$ is a path with the smallest hop count.
[Figure: 4x4 torus (channels are bidirectional) showing a minimal path ($|P| = 3$) and a non-minimal path ($|P| = 5$)]

Paths
The set of all minimal paths between $x$ and $y$ is denoted $R_{xy}$.
[Figure: 4x4 torus (channels are bidirectional) showing the minimal paths ($|P| = 3$)]

Paths
The diameter $H_{max}$ is the largest minimal hop count over all pairs of terminal nodes.
[Figure: 4x4 torus (channels are bidirectional) showing the largest minimal hop count (diameter $H_{max} = 4$)]

Paths
The average minimum hop count $H_{min}$ is defined as the average hop count over all sources and destinations.
Here: $H_{min}$ = [value not preserved in the transcript]
[Figure: 4x4 torus (channels are bidirectional) showing the distance in hops from node 00]
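The diameter and the average minimum hop count of the 4x4 torus can be computed by brute force. A minimal sketch, assuming that the average is taken over all source/destination pairs including a node sending to itself (this convention matches the $nk/4$ formula given later in the deck); the function names are hypothetical.

```python
from itertools import product


def ring_distance(a, b, k):
    """Minimal hop count between positions a and b on a ring of k nodes."""
    d = abs(a - b)
    return min(d, k - d)


def torus_hops(k, n=2):
    """Return (diameter, average minimal hop count) of a k-ary n-cube."""
    nodes = list(product(range(k), repeat=n))
    # The minimal hop count is the sum of the per-dimension ring distances.
    dist = [sum(ring_distance(a, b, k) for a, b in zip(x, y))
            for x in nodes for y in nodes]
    return max(dist), sum(dist) / len(dist)


h_max, h_min = torus_hops(4)
print(h_max, h_min)
```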

Paths
- A specific implementation may choose to use some non-minimal paths.
- The actual average hop count $H_{avg}$ is then defined over the paths actually used by the network.
[Figure: 4x4 torus (channels are bidirectional) showing a non-minimal path ($|P| = 5$)]

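Paths
- The physical distance of the path is the sum of the lengths of its channels: $D(P) = \sum_{c \in P} l_c$
- The delay of the path is the sum of the latencies of its channels: $t(P) = \sum_{c \in P} t_c = D(P) / v$
[Figure: 4x4 torus (channels are bidirectional) showing a non-minimal path ($|P| = 5$)]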

Traffic Patterns
- The traffic pattern is a very important factor for the performance of a network.
- In uniform random traffic, each source is equally likely to send to each destination.
- Uniform random traffic is the most commonly used traffic pattern. However, it implicitly balances the load, so it often does not stress the network.

Throughput
- The throughput of a network is the data rate in bits per second that the network accepts per input port.
- The topology of a network has a significant impact on the throughput (besides flow control and routing).
- The ideal throughput is defined as the throughput assuming perfect routing and flow control:
  - Load is balanced over alternate paths.
  - There are no idle cycles on bottleneck channels.

Throughput
- Maximum throughput is reached when some channel of the network becomes saturated.
- The channel load $\gamma_c$ of a channel is the ratio of the bandwidth demanded from the channel to the bandwidth of the input ports. In other words, it is the amount of traffic that must cross the channel if each input injects one unit of traffic according to the given traffic pattern.
- The channel that carries the largest fraction of the traffic determines the maximum channel load $\gamma_{max}$.

Throughput
- The ideal throughput $\Theta_{ideal}$ is the input bandwidth that saturates the bottleneck channel: $\Theta_{ideal} = b / \gamma_{max}$
- In general it is difficult to determine the maximum channel load $\gamma_{max}$, but for uniform traffic bounds can be found.
- The ideal throughput of a network on uniform traffic, $\Theta_{ideal}(U)$, is used as the capacity of the network.

Ideal Throughput in a Torus
- Assuming uniform traffic, 50% of the packets cross the bisection channels.
- Throughput is best if the packets are evenly distributed over the bisection channels.
- The load on these channels is then $\gamma_B = N / (2 B_C)$.
- Thus $\gamma_{max} \ge \gamma_B = N / (2 B_C)$.
- The ideal throughput is therefore $\Theta_{ideal} = b / \gamma_{max} \le 2 b B_C / N$.
[Figure: 4x4 torus (channels are bidirectional)]

Ideal Throughput in a Torus
Thus $\gamma_{max} \ge \gamma_B = N / (2 B_C)$ and $\Theta_{ideal} = b / \gamma_{max} \le 2 b B_C / N$.
Examples:
- 4 x 4 torus: $\gamma_B = 16 / (2 B_C) = 8/16 = 1/2$
- 4 x 4 mesh: $\gamma_B = 16 / (2 B_C) = 8/8 = 1$
- n x n torus: $\gamma_B = n^2 / (2 B_C) = n^2 / (2 \cdot 4n) = n/8$
- n x n mesh: $\gamma_B = n^2 / (2 B_C) = n^2 / (4n) = n/4$
[Figure: 4x4 torus (channels are bidirectional)]
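The examples above follow directly from $\gamma_B = N / (2 B_C)$ and can be checked with a few lines. This is a small sketch; the function names are hypothetical, and the channel bisection values (16 for the 4x4 torus, 8 for the 4x4 mesh) are the ones used in the examples.

```python
from fractions import Fraction


def bisection_load(num_nodes, bisection_channels):
    """gamma_B = N / (2 * B_C): N/2 units of traffic spread over B_C channels."""
    return Fraction(num_nodes, 2 * bisection_channels)


def ideal_throughput_bound(bandwidth, num_nodes, bisection_channels):
    """Upper bound on the ideal throughput: b / gamma_B = 2 * b * B_C / N."""
    return bandwidth / bisection_load(num_nodes, bisection_channels)


torus_load = bisection_load(16, 16)   # 4x4 torus, B_C = 16
mesh_load = bisection_load(16, 8)     # 4x4 mesh,  B_C = 8
print(torus_load, mesh_load)
```

With a channel bandwidth of 1 unit, the bound for the 4x4 torus is 2 units per input port and for the 4x4 mesh 1 unit.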

Another useful lower bound on channel load
- A packet needs $H_{min}$ hops to be delivered.
- There are $C$ channels in the network.
- We have $N$ nodes sending packets.
- With equal load on all channels, we get a lower bound for the maximum channel load: $\gamma_{max} \ge N H_{min} / C$

Lower Bounds on Bottleneck
- Hop count bound: $\gamma_{max} \ge N H_{min} / C$
- Bisection bound: $\gamma_{max} \ge N / (2 B_C)$
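Both bounds can be evaluated side by side; the larger one is the binding lower bound on the bottleneck channel load. A minimal sketch, assuming the hop count bound has the form $N H_{min} / C$ and the bisection bound the form $N / (2 B_C)$; the function names are hypothetical, and the 4x4 torus values ($N = 16$, $C = 64$, $B_C = 16$, $H_{min} = nk/4 = 2$) come from the deck.

```python
from fractions import Fraction


def hop_count_bound(num_nodes, h_min, num_channels):
    """N * H_min channel traversals spread evenly over C channels."""
    return Fraction(num_nodes) * h_min / num_channels


def bisection_bound(num_nodes, bisection_channels):
    """Half of the N unit flows cross the B_C bisection channels."""
    return Fraction(num_nodes, 2 * bisection_channels)


def channel_load_lower_bound(num_nodes, h_min, num_channels, bisection_channels):
    """The tighter (larger) of the two lower bounds on gamma_max."""
    return max(hop_count_bound(num_nodes, h_min, num_channels),
               bisection_bound(num_nodes, bisection_channels))


torus_bound = channel_load_lower_bound(16, 2, 64, 16)
print(torus_bound)
```

For the 4x4 torus both bounds coincide at 1/2.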

Latency
- The latency of the network is the time required for a message to traverse the network, from the time the head arrives at the input port to the time the tail of the message departs the output port.
- Latency depends not only on topology, but also on routing, flow control and the design of the router.
- Topology gives a lower bound on latency.

Latency
There are two latency components:
- Head latency $T_h$: time required for the head of the message to traverse the network
- Serialization latency $T_s = L/b$: time required for the tail to catch up (the time for a message of length $L$ to cross a channel with bandwidth $b$)

Head Latency
Head latency depends on two topology factors, the router delay $T_r$ (time spent in the routers) and the time of flight $T_w$ (time spent on wires):
- $T_r = H_{min} t_r$
- $T_w = D_{min} / v$ (average distance $D_{min}$, propagation velocity $v$)

Latency
- Together this gives the average latency without congestion: $T_0 = H_{min} t_r + D_{min} / v + L / b$
- Clearly $H_{min}$, $D_{min}$, and $b$ are to a large extent determined by the topology.
- If there is congestion in the network, there is a fourth term $T_C$.

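Latency with time-space diagram
[Figure: time-space diagram for nodes x, y and switch z, showing head and tail of the message, the router delay $t_r$, the channel latency $t_{xy}$ and the serialization latency $L/b$. Marked events: arrival at node x, leave x, arrival at node y, leave y, arrival at switch z.]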

Examples
- The network has $N = 64$ nodes
- $H_{min} = 4$
- Channel width $w_c = 16$
- Channel frequency $f_c = 1$ GHz
- Channel latency $t_c = 5$ ns
- Router delay $t_r = 8$ ns
- Packet length $L = 64$ bytes
$T_r = H_{min} t_r = 4 \cdot 8$ ns $= 32$ ns
$T_w = H_{min} t_c = 4 \cdot 5$ ns $= 20$ ns
$T_s = L / b = L / (f_c w_c) = 64 \cdot 8 / (1\,\mathrm{GHz} \cdot 16) = 512 / 16$ ns $= 32$ ns
$T_0 = 32$ ns $+ 20$ ns $+ 32$ ns $= 84$ ns

To get a feeling about NoC (Toy example)
- The network has $N = 64$ nodes
- $H_{min} = 2 \cdot 8 / 3 = 5.33$
- Channel width $w_c = 32$
- Channel frequency $f_c = 1$ GHz
- Channel latency $t_c = 1$ ns
- Router delay $t_r = 1$ ns
- Packet length $L = 16$ bytes
$T_r = H_{min} t_r = 5.33 \cdot 1$ ns $= 5.33$ ns
$T_w = H_{min} t_c = 5.33$ ns
$T_s = L / b = L / (f_c w_c) = 16 \cdot 8 / (1\,\mathrm{GHz} \cdot 32) = 128 / 32$ ns $= 4$ ns
$T_0 = 5.33$ ns $+ 5.33$ ns $+ 4$ ns $= 14.67$ ns
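Both latency examples use the same three terms, so they can be reproduced with one small function. This is a sketch for checking the arithmetic; the function name and parameter names are hypothetical, and the time of flight is computed as hops times channel latency, as in the examples.

```python
def zero_load_latency(hops, router_delay_ns, channel_latency_ns,
                      packet_bytes, width_bits, freq_ghz):
    """Return (T_r, T_w, T_s, T_0) in nanoseconds, without congestion."""
    t_r = hops * router_delay_ns                       # time spent in routers
    t_w = hops * channel_latency_ns                    # time spent on wires
    t_s = packet_bytes * 8 / (width_bits * freq_ghz)   # bits / (Gbit/s) = ns
    return t_r, t_w, t_s, t_r + t_w + t_s


first = zero_load_latency(4, 8, 5, 64, 16, 1)
toy = zero_load_latency(2 * 8 / 3, 1, 1, 16, 32, 1)
print(first)
print(toy)
```

In the toy example the serialization latency (4 ns) is of the same order as the router and wire delays, whereas in the first example router delay and serialization dominate equally.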

Path Diversity
A network with multiple minimal paths between most pairs of nodes is more robust than a network that has only a single route between the nodes.

Path Diversity: Random Traffic
- Each node is equally likely to send a message to any other node.
- 50% of the packets pass the bisection.
- $\gamma_{max}$ = [value not preserved in the transcript]
[Figure: butterfly with 8 nodes]

Traffic Patterns
- The performance of a network depends strongly on the traffic pattern.
- A number of different traffic patterns can be used to analyze the performance of the network.
[Table of traffic patterns not preserved in the transcript]

Path Diversity: Bit Rotation Traffic
- The node with address $\{b_2, b_1, b_0\}$ sends to $\{b_1, b_0, b_2\}$.
- Thus we get the permutation {0, 2, 4, 6, 1, 3, 5, 7}.
- Thus packets from nodes {0, 1, 4, 5} will all have to pass switch node 10.
- $\gamma_{max,BR} = 4$ (since for instance channel 00, 10 is used by two connections)
- Max capacity: 25%
[Figure: butterfly with 8 nodes]
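The bit rotation permutation itself is easy to reproduce. The sketch below only computes the destination addresses; it does not model the butterfly's switches or channels. The function name `rotate_left` is hypothetical.

```python
def rotate_left(address, bits=3):
    """Map {b2, b1, b0} to {b1, b0, b2}: rotate the address left by one bit."""
    msb = (address >> (bits - 1)) & 1
    return ((address << 1) & ((1 << bits) - 1)) | msb


destinations = [rotate_left(a) for a in range(8)]
# Sources whose destination lies in the lower half of the address range.
lower_half_sources = [a for a in range(8) if destinations[a] < 4]
print(destinations, lower_half_sources)
```

The sources whose destinations fall in the lower half of the address range are exactly the four nodes named above, which is why their packets converge on the same part of the network.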

Torus and Mesh Networks
Torus and mesh networks (k-ary n-cubes) pack $N = k^n$ nodes.
Advantages:
- Regular structure allows efficient packaging
- Latency is low for local communication
- Good path diversity
Disadvantage:
- Comparatively large hop count

Rings, Tori and Meshes
[Figure: ring (4-ary 1-cube), "4x4" torus (4-ary 2-cube), "4x4" mesh (4-ary 2-mesh)]

Properties of Tori and Meshes
Torus:
- Channel bisection: $B_{C,T} = 4N / k$
- Channel load under uniform traffic (50% of traffic crosses bisection): $\gamma_{T,U} = k / 8$
- Channel load under worst-case traffic (100% of traffic crosses bisection): $\gamma_{T,W} = k / 4$
- Average minimum hop count (k even): $H_{min,T} = nk / 4$
Mesh:
- Channel bisection: $B_{C,M} = 2N / k$
- Channel load under uniform traffic (50% of traffic crosses bisection): $\gamma_{M,U} = k / 4$
- Channel load under worst-case traffic (100% of traffic crosses bisection): $\gamma_{M,W} = k / 2$
- Average minimum hop count (k even): $H_{min,M} = nk / 3$
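The mesh hop count $nk/3$ can be compared with a brute-force average. A minimal sketch, assuming the average is taken over all source/destination pairs including a node sending to itself; under that assumption the exact value is slightly below $nk/3$, so the formula reads as an approximation. The function name is hypothetical.

```python
from fractions import Fraction


def mesh_avg_hops(k, n):
    """Exact average minimal hop count of a k-ary n-mesh (self pairs included).

    Coordinates are independent, so the n-dimensional average is n times
    the average distance along one dimension.
    """
    per_dim = Fraction(sum(abs(i - j) for i in range(k) for j in range(k)),
                       k * k)
    return n * per_dim


exact = mesh_avg_hops(8, 2)
approx = Fraction(2 * 8, 3)          # nk/3 for the 8x8 mesh
print(float(exact), float(approx))
```

For the 8x8 mesh of the toy example the difference between the two values is less than 2%.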

Physical implementation of Meshes and Tori
- To implement a network on a chip, the abstract nodes of the network must be mapped to real positions in physical space.
- A goal is to have the same latency for all channels.

Folding
Folding a network reduces the length of its longest channel.
[Figure: folded 4-ary 2-cube]

Summary
- The topology is an important factor for the performance of the network.
- Meshes and tori offer a large amount of bandwidth and path diversity.
- Performance depends on the traffic pattern.