Networks-on-Chip.


1 Networks-on-Chip

2 Seminar contents
The Premises: Homogeneous and Heterogeneous Systems-on-Chip and their interconnection networks. The Network-on-Chip approach. Slide from S. Tota and M. R. Casu [1]

3 The premises
The System-on-Chip (SoC) today. Heterogeneous: ~10 IPs. Homogeneous (MP-SoC): ~10 uP (with exceptions). On-chip bus (AMBA, CoreConnect, Wishbone, …). IPs and uPs are sold with a proprietary bus interface. Near and long-term forecast: 100 IP/uP. Buses are not scalable! Physical design issues: signal integrity, power consumption, timing closure. Clock issues: is it time for the Globally Asynchronous paradigm (still locally synchronous)? Need for “more regular” design. Near and long-term forecast: Heterogeneous: IP; Homogeneous: uP. Clock issues: clusters might work in different clock domains. Slide from S. Tota and M. R. Casu [1]

4 Today’s SoC: Heterogeneous
Block diagram: CPU, DSP, MEM, embedded FPGA, dedicated IP and I/O blocks attached to an interconnection network (BUS). ASIC with embedded FPGAs. Heterogeneous: different IPs connected together. Slide from S. Tota and M. R. Casu [1]

5 Maya (Rabaey’00)
“Uses FPGA for infrequent and bit-level functions that do not justify custom implementation” (Ref.: Rabaey’00). Hierarchical mesh reconfigurable network: uses switches in order to maximize locality of traffic. Slide from S. Tota and M. R. Casu [1]

6 Maya (Rabaey’00) Slide from S. Tota and M. R. Casu [1]

7 Maya (Rabaey’00) Slide from S. Tota and M. R. Casu [1]

8 Homogeneous SoC (MP-SoC)
Block diagram: eight CPU/MEM pairs attached to an interconnection network (BUS, XBAR). MPSoC: Multiprocessor System-on-Chip. Homogeneous: copies of the same IP connected together. Slide from S. Tota and M. R. Casu [1]

9 MP-SoC: Cisco CRS-1 Router
The CRS-1 router uses 188 extensible network processors per “Silicon Packet Processor” chip. CRS: Carrier Routing System. Fully configurable VLIW. Slide from S. Tota and M. R. Casu [1]

10 MP-SoC: Cisco CRS-1 Router
The CRS-1 router uses 188 extensible network processors per “Silicon Packet Processor” chip. 16 PPE clusters of 12 PPEs each. Slide from S. Tota and M. R. Casu [1]

11 Very long wires
Figure: a long wire between blocks A and B; the clock period is 1 ns (1 GHz) in year 2005 and 0.1 ns (10 GHz) in year 2010. Signals take several clock cycles to cross a link, which limits the usable frequency and therefore the bandwidth offered. To improve timing, repeaters can be inserted on the link, and several signals can then be in flight along the wire at once; this technique is called wave pipelining. Slide from S. Tota and M. R. Casu [1]
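The effect of a shrinking clock period on a fixed wire delay can be sketched with a few lines of Python. This is only an illustration: the function name `cycles_to_cross` and the 1 ns wire delay are assumptions chosen for the example; the slide gives only the two clock periods.

```python
import math


def cycles_to_cross(wire_delay_ns: float, clock_ghz: float) -> int:
    """Whole clock cycles a signal needs to traverse a wire.

    wire_delay_ns is a hypothetical end-to-end wire delay; the slide
    does not state one.
    """
    if wire_delay_ns < 0 or clock_ghz <= 0:
        raise ValueError("delay must be >= 0 and clock frequency > 0")
    # delay [ns] * frequency [GHz] = delay expressed in clock periods
    cycles = wire_delay_ns * clock_ghz
    # round() guards against floating-point noise before taking the ceiling
    return max(1, math.ceil(round(cycles, 9)))


# Same assumed 1 ns wire at the two clock rates shown on the slide
print(cycles_to_cross(1.0, 1.0))   # 1 GHz clock
print(cycles_to_cross(1.0, 10.0))  # 10 GHz clock
```

Under this assumption the same wire that fits in one cycle at 1 GHz spans ten cycles at 10 GHz, which is the situation the slide describes.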

12 Bus pros and cons
Cons: Every unit attached adds parasitic capacitance, so electrical performance degrades with growth. Bus timing is difficult in a deep-submicron process. Bus arbiter delay grows with the number of masters; the arbiter is also instance-specific. Bandwidth is limited and shared by all units attached. Pros: The silicon cost of a bus is small. Any bus is almost directly compatible with most available IPs, including software running on CPUs. The concepts are simple and well understood. Slide from S. Tota and M. R. Casu [1]

13 What are NoCs? According to Wikipedia:
“Network-on-a-chip (NoC) is a new paradigm for System-on-Chip (SoC) design. NoC based-systems accommodate multiple asynchronous clocking that many of today's complex SoC designs use. The NoC solution brings a networking method to on-chip communications and claims roughly a threefold performance increase over conventional bus systems.” NoCs are a GALS (Globally Asynchronous Locally Synchronous) solution. NoCs borrow from networking concepts such as OSI in order to abstract the underlying complexity of interconnects. Slide from S. Tota and M. R. Casu [1]

14 NoC example
Figure: nine routing nodes connecting nine processor masters, one global memory slave and two global I/O slaves. Notice how edge routers are connected to shared memory/components. Slide from S. Tota and M. R. Casu [1]

15 Basic Ingredients of a NoC
N Computational Resources Processing Elements (PE) 1 Connection Topology 1 Routing technique M  N Switches N Network Interfaces 1 Addressing system 1 Communication Protocol 1 Programming model Message passing Shared Memory Slide from S. Tota and M. R. Casu [1]

16 Problems
Internal network contention causes (often unpredictable) latency. The network has a significant silicon area. Bus-oriented IPs need smart wrappers. Software needs clean synchronization in multiprocessor systems. System designers need reeducation for new concepts. Other problems: Repeaters used to optimize interconnects consume a lot of silicon area and power. Internal contention can also lower throughput and cause packets to be dropped, not just add latency. There is a lack of simulators covering the full scope of NoCs. Good points: Only point-to-point, one-way wires are used, for all network sizes. Bandwidth scales with network size. Routing can be distributed; the same router can be re-instantiated regardless of network size. Slide from S. Tota and M. R. Casu [1]
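The claim that bandwidth scales with network size can be made concrete with a rough comparison against a shared bus. The sketch below assumes a 2D mesh (the slide does not name a topology here), an equal split of bus bandwidth among units, and hypothetical function names and bandwidth figures. The mesh figure is a raw upper bound on link capacity, not deliverable throughput.

```python
def bus_share_per_unit(bus_bandwidth: float, units: int) -> float:
    """Bandwidth each unit gets if a shared bus is split evenly (assumption)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return bus_bandwidth / units


def mesh_link_count(rows: int, cols: int) -> int:
    """Number of neighbour-to-neighbour connections in a rows x cols 2D mesh."""
    return rows * (cols - 1) + cols * (rows - 1)


def mesh_aggregate_bandwidth(rows: int, cols: int, link_bandwidth: float) -> float:
    """Raw capacity of all one-way links; each connection has one wire per direction."""
    return 2 * mesh_link_count(rows, cols) * link_bandwidth


# A 3x3 mesh versus a 4x4 mesh: capacity grows with the number of routers,
# while the per-unit share of a single bus shrinks as units are added.
print(mesh_aggregate_bandwidth(3, 3, 1.0), mesh_aggregate_bandwidth(4, 4, 1.0))
print(bus_share_per_unit(8.0, 9), bus_share_per_unit(8.0, 16))
```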

17 Network on Chip (NoC)
Adoption of the network-based packet communication paradigm. Use abstraction and layering to decouple communication from computation. Distribute the responsibility for reliable transmission evenly over higher and lower layers of abstraction.
Protocol stack abstraction (Benini & De Micheli, Computer 2002): Software (application, systems); Architecture and control (transport, network, data link); Physical (wiring).
#  Layer         Coverage
7  Application   Not covered
6  Presentation  Wrappers
5  Session       Wrappers
4  Transport     Components
3  Network       Components
2  Link          Components
1  Physical      Components
Slide from L. Benini [2]

18 Physical layer - Synchronization
Physical design: voltage levels, driver design, sizing, physical routing. Synchronization: how and when to sample the channel? Options: avoid a clock (asynchronous communication); the clock travels with the data; the clock can be reconstructed from the data. Synchronization recovery has a cost: it cannot be abstracted away and it can cause errors (e.g., metastability). When the clock is not sent with the signal, the signal needs to be resynchronized, and errors can be made while resynchronizing data. A simple solution is to add a header to the data specifically for synchronization. Slide from L. Benini [2]

19 Data-link layer
Provide reliable data transfer on an unreliable physical channel. Access to the communication medium: dealing with contention and arbitration. Issues: fairness and safe communication; achieving high throughput; error resiliency. Unreliable physical channel: information transfer is unreliable at the electrical level because of timing errors, crosstalk, electromagnetic interference and soft errors. CRC and parity checks can be implemented to detect errors; correcting them then relies on retransmission or on an error-correcting code. Slide from L. Benini [2]
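Error detection at the data-link layer can be sketched as follows. This is a minimal illustration, not the scheme of any particular NoC: the helper names are hypothetical, and CRC-32 from Python's standard `zlib` module is used only because it is readily available; an on-chip link would typically use a much shorter code.

```python
import zlib


def even_parity_bit(word: int) -> int:
    """Parity bit that makes the total number of 1 bits even."""
    return bin(word).count("1") % 2


def parity_ok(word: int, parity_bit: int) -> bool:
    """True if word plus parity bit still contains an even number of 1s."""
    return (bin(word).count("1") + parity_bit) % 2 == 0


def frame_with_crc(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 (assumed code choice) to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare with the trailer."""
    payload, received = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received


frame = frame_with_crc(b"flit")
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip one bit on the "wire"
print(crc_ok(frame), crc_ok(corrupted))
```

Both checks only detect the error; the receiver would still have to request a retransmission or rely on a separate correcting code.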

20 Topologies
Heritage of networks with new constraints: need to accommodate interconnects in a 2D layout; cannot route long wires (clock frequency bound). Examples: SPIN, CLICHÉ, Torus, Folded torus, Octagon, BFT. Slide from S. Tota and M. R. Casu [1]
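One way to see the trade-off between a mesh and a torus is to compare minimal hop counts. The sketch below is illustrative only; the function names and the (x, y) coordinate scheme are assumptions. The torus shortens paths through wrap-around links, which are exactly the long wires the slide warns about; the folded torus rearranges the nodes to shorten those wires.

```python
from typing import Tuple

Node = Tuple[int, int]  # (x, y) router coordinates, an assumed convention


def mesh_hops(src: Node, dst: Node) -> int:
    """Minimal hop count in a 2D mesh (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def torus_hops(src: Node, dst: Node, width: int, height: int) -> int:
    """Minimal hop count in a 2D torus, where each row and column wraps around."""
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    return min(dx, width - dx) + min(dy, height - dy)


# Corner to corner in a 4x4 network
print(mesh_hops((0, 0), (3, 3)), torus_hops((0, 0), (3, 3), 4, 4))
```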

21 Topologies
Figure: throughput as a function of number of IPs. Comparison of topologies according to different QoS parameters. (ADDED SLIDE) Ref.: Pratiksha Gehlot, Shailesh Singh Chouhan, "Performance evaluation of Network on Chip architectures", in Proc. International Conference on Emerging Trends in Electronic and Photonic Devices & Systems, Varanasi, India, December 22-24, 2009, pp

22 Topologies
Figure: drop probability as a function of number of IPs. Comparison of topologies according to different QoS parameters. (ADDED SLIDE) Ref.: Pratiksha Gehlot, Shailesh Singh Chouhan, "Performance evaluation of Network on Chip architectures", in Proc. International Conference on Emerging Trends in Electronic and Photonic Devices & Systems, Varanasi, India, December 22-24, 2009, pp

23 Topologies
Figure: latency as a function of number of IPs. Comparison of topologies according to different QoS parameters. (ADDED SLIDE) Ref.: Pratiksha Gehlot, Shailesh Singh Chouhan, "Performance evaluation of Network on Chip architectures", in Proc. International Conference on Emerging Trends in Electronic and Photonic Devices & Systems, Varanasi, India, December 22-24, 2009, pp

24 Switching
Again, techniques inherited from computer and communication networks. New constraints in silicon: area and power, so use as few buffers as possible. Store & Forward and Virtual Cut-Through need buffer space for an entire packet: unsuited! Wormhole works with a limited buffer size. Deflection routing, a.k.a. “Hot Potato”. Virtual channels increase buffer size… Wormhole routing: route flits instead of packets, which allows for smaller buffers. Deflection routing: packets have preferred outputs along which they want to be routed; if the preferred output is not available, the packet is sent to another output, even though this could lead to a non-optimal path. Virtual channels: more than one logical channel over one physical channel; this allows packets to be interleaved but requires more buffer space. Slide from L. Benini [2]
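The difference between store-and-forward and wormhole switching can be illustrated with the usual first-order, contention-free latency model. The sketch assumes one flit crosses a link per cycle and a fixed per-hop router delay; the function names and parameter values are hypothetical.

```python
def store_and_forward_latency(hops: int, router_delay: int, packet_flits: int) -> int:
    """Each router receives the whole packet before forwarding it."""
    return hops * (router_delay + packet_flits)


def wormhole_latency(hops: int, router_delay: int, packet_flits: int) -> int:
    """The header flit is forwarded at once; the body flits follow in a pipeline."""
    return hops * router_delay + packet_flits


def min_input_buffer_flits(technique: str, packet_flits: int) -> int:
    """Smallest input buffer that lets the technique work at all (simplified)."""
    if technique in ("store_and_forward", "virtual_cut_through"):
        return packet_flits  # must be able to hold an entire packet
    if technique == "wormhole":
        return 1             # a single flit is enough in principle
    raise ValueError(f"unknown technique: {technique}")


# 4 hops, 1-cycle routers, 8-flit packet (illustrative numbers)
print(store_and_forward_latency(4, 1, 8), wormhole_latency(4, 1, 8))
```

In this simplified model the wormhole packet pays the serialization time once rather than at every hop, and it does so with far less buffering per router.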

25 Switching Classification of Switching Techniques :
(ADDED SLIDE) Ref : Ankur Agarwal, Cyril Iskander, and Ravi Shankar, “Survey of Network on Chip (NoC) Architectures & Contributions”, Journal of Engineering, Computing and Architecture[online], vol.3, no.1, 2009 [cited Nov. 21, 2010], available :

26 Routing
Deterministic vs. adaptive: simpler vs. more complex routing logic; easy vs. hard to make deadlock-free; prone vs. robust to congestion. 2D dimension-order routing (XY) is the most used static routing in NoCs (e.g., with wormhole switching and a mesh). Static routing algorithms: X-Y: packets are forwarded first in one dimension, then in the other, for example first in X and then in Y, or vice versa. Street sign: the router makes its decision by looking up the destination address in a table. Routing can also be distributed or global. Distributed: each router makes a routing decision independently of the others. Global: routing decisions are made according to a central repository of the network’s current condition. Slide from L. Benini [2]
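X-Y routing on a mesh is simple enough to sketch in full. The code below is an illustration under assumed conventions: routers are addressed by (x, y) coordinates, y grows towards "NORTH", and the port names are hypothetical. Each router decides on its own, which also makes this an example of distributed routing.

```python
from typing import List, Tuple

Node = Tuple[int, int]

# Coordinate change for each output port (assumed naming and orientation)
STEP = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}


def xy_next_port(current: Node, destination: Node) -> str:
    """Pick the output port: correct the X offset first, then the Y offset."""
    cx, cy = current
    dx, dy = destination
    if dx > cx:
        return "EAST"
    if dx < cx:
        return "WEST"
    if dy > cy:
        return "NORTH"
    if dy < cy:
        return "SOUTH"
    return "LOCAL"  # arrived: deliver to the attached resource


def xy_path(src: Node, dst: Node) -> List[Node]:
    """List of routers visited from src to dst under X-Y routing."""
    path = [src]
    x, y = src
    while (x, y) != dst:
        step_x, step_y = STEP[xy_next_port((x, y), dst)]
        x, y = x + step_x, y + step_y
        path.append((x, y))
    return path


print(xy_path((0, 0), (2, 1)))
```

Because a packet never turns from the Y dimension back into X, the turns that could close a cycle of waiting packets on a mesh are excluded, which is why this scheme is easy to keep deadlock-free.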

27 Routing Classification of Routing Algorithms :
(ADDED SLIDE) Ref : Ankur Agarwal, Cyril Iskander, and Ravi Shankar, “Survey of Network on Chip (NoC) Architectures & Contributions”, Journal of Engineering, Computing and Architecture[online], vol.3, no.1, 2009 [cited Nov. 21, 2010], available :

28 Transport layer
Decompose and reconstruct information. Important choices: packet granularity; admission/congestion control; packet retransmission parameters (e.g., timeout). All these factors heavily affect energy and performance. Application-specific schemes vs. standards. (added info directly on slide) Slide from L. Benini [2]
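The "decompose and reconstruct" role of the transport layer can be sketched as a packetizer and a reassembler. Everything here is hypothetical: the `Packet` fields, the function names and the payload size are chosen for illustration, with `payload_size` standing in for the packet-granularity choice mentioned above.

```python
from typing import Iterable, List, NamedTuple


class Packet(NamedTuple):
    msg_id: int     # which message this packet belongs to
    seq: int        # position of the packet within the message
    total: int      # number of packets in the message
    payload: bytes


def packetize(msg_id: int, message: bytes, payload_size: int) -> List[Packet]:
    """Split a message into packets of at most payload_size bytes."""
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    chunks = [message[i:i + payload_size]
              for i in range(0, len(message), payload_size)] or [b""]
    return [Packet(msg_id, seq, len(chunks), chunk)
            for seq, chunk in enumerate(chunks)]


def reassemble(packets: Iterable[Packet]) -> bytes:
    """Rebuild the message; packets may arrive in any order."""
    ordered = sorted(packets, key=lambda p: p.seq)
    if not ordered:
        raise ValueError("no packets received")
    if [p.seq for p in ordered] != list(range(ordered[0].total)):
        # A real transport layer would trigger a retransmission here
        raise ValueError("missing or duplicated packet")
    return b"".join(p.payload for p in ordered)


packets = packetize(7, b"network-on-chip", 4)
print(len(packets), reassemble(reversed(packets)))
```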

29 Flow control
Determines how resources are allocated to packets moving in the network. Classification of Flow Control Algorithms: (ADDED SLIDE) Ref.: Ankur Agarwal, Cyril Iskander, and Ravi Shankar, “Survey of Network on Chip (NoC) Architectures & Contributions”, Journal of Engineering, Computing and Architecture [online], vol. 3, no. 1, 2009 [cited Nov. 21, 2010], available :
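As one concrete instance of allocating buffer resources, credit-based flow control is sketched below. It is a commonly used scheme, picked here as an assumption; the classification figure of this slide is not reproduced in the transcript, so the sketch does not claim to match any entry in it. The class and method names are hypothetical.

```python
from collections import deque


class CreditedLink:
    """Sender side holds one credit per free slot in the receiver's buffer."""

    def __init__(self, buffer_slots: int) -> None:
        self.credits = buffer_slots   # free slots known to the sender
        self.buffer = deque()         # receiver's input buffer

    def try_send(self, flit) -> bool:
        """Send a flit only if a credit is available; otherwise stall."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.buffer.append(flit)
        return True

    def drain_one(self):
        """Receiver forwards one flit and returns a credit to the sender."""
        flit = self.buffer.popleft()
        self.credits += 1
        return flit


link = CreditedLink(buffer_slots=2)
print(link.try_send("a"), link.try_send("b"), link.try_send("c"))
link.drain_one()
print(link.try_send("c"))
```

Because the sender never transmits without a credit, the receiver's buffer cannot overflow and no flit has to be dropped on this link.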

30 System software
Programming paradigms: shared memory; message passing. Middleware: layered system software; should provide low communication latency; modular, scalable, robust …. Slide from L. Benini [2]

31 Who first had the idea?
The most cited papers according to Google (number of citations): Guerrier’00 (204), A Generic Architecture for On-Chip Packet-Switched Interconnections; Dally’01 (392), Route Packets, Not Wires: On-Chip Interconnection Networks; Benini’02 (417), Networks on Chips: A New SoC Paradigm; Kumar’02 (184), A Network on Chip Architecture and Design Methodology. Another paper from the same period, also from Pierre Guerrier: Pierre Guerrier, Alain Greiner, "A Scalable Architecture for System-On-Chip Interconnections", in Proceedings of the Sophia-Antipolis MicroElectronics Conference, Sophia Antipolis, France, October 1999, pp Slide from S. Tota and M. R. Casu [1]

32 Some NoC References
J. Rabaey et al., “A 1-V heterogeneous reconfigurable DSP IC for wireless baseband digital signal processing,” IEEE Journal of Solid State Circuits, Vol. 35, No. 11, Nov. 2000, pp
P. Guerrier and A. Greiner, “A Generic Architecture for On-Chip Packet-Switched Interconnections,” Proc. Design and Test in Europe (DATE), pp , Mar
A. Adriahantenaina et al., “SPIN: a Scalable, Packet Switched, On-chip Micro-network,” Proc. Design and Test in Europe (DATE), Mar
L. Benini and G. De Micheli, “Networks on Chips: A New SoC Paradigm,” Computer, vol. 35, no. 1, Jan. 2002, pp
S. Kumar et al., “A network on chip architecture and design methodology,” in Proc. ISVLSI, 2002.
W. J. Dally and B. Towles, “Route packets, not wires: on-chip interconnection networks,” in Proc. Design Automation Conf., 2001.
K. Goossens et al., “Trade-offs in the design of a router with both guaranteed and best-effort services for networks on chip,” IEE Proc.-Comput. Digit. Tech., Vol. 150, No. 5, Sep. 2003, pp
P.P. Pande et al., “Performance Evaluation and Design Trade-offs for Network-on-Chip Interconnect Architectures,” IEEE Trans. Computers, vol. 54, no. 8, Aug. 2005, pp
Slide from S. Tota and M. R. Casu [1]

33 References
[1] Sergio Tota and Mario R. Casu, “Networks-on-Chip,” presentation.
[2] L. Benini, “Networks on chip,” presentation,

