1 Lecture 15: NoC Innovations Today: power and performance innovations for NoCs.

Slides:



Advertisements
Similar presentations
A Novel 3D Layer-Multiplexed On-Chip Network
Advertisements

Flattened Butterfly Topology for On-Chip Networks John Kim, James Balfour, and William J. Dally Presented by Jun Pang.
Evaluating Bufferless Flow Control for On-Chip Networks George Michelogiannakis, Daniel Sanchez, William J. Dally, Christos Kozyrakis Stanford University.
Miguel Gorgues, Dong Xiang, Jose Flich, Zhigang Yu and Jose Duato Uni. Politecnica de Valencia, Spain School of Software, Tsinghua University, China, Achieving.
1 Lecture 17: On-Chip Networks Today: background wrap-up and innovations.
1 Lecture 12: Interconnection Networks Topics: dimension/arity, routing, deadlock, flow control.
1 Lecture 23: Interconnection Networks Topics: communication latency, centralized and decentralized switches (Appendix E)
1 Lecture 15: PCM, Networks Today: PCM wrap-up, projects discussion, on-chip networks background.
1 Lecture 23: Interconnection Networks Paper: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton.
Lecture 12: DRAM Basics Today: DRAM terminology and basics, energy innovations.
1 Lecture 16: On-Chip Networks Today: on-chip networks background.
1 Lecture 21: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO’03, Princeton A Gracefully Degrading and.
1 Lecture 13: Interconnection Networks Topics: flow control, router pipelines, case studies.
1 Lecture 25: Interconnection Networks Topics: flow control, router microarchitecture Final exam:  Dec 4 th 9am – 10:40am  ~15-20% on pre-midterm  post-midterm:
1 Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control Final exam reminders:  Plan well – attempt every question.
1 Lecture 24: Interconnection Networks Topics: communication latency, centralized and decentralized switches (Sections 8.1 – 8.5)
1 Lecture 25: Interconnection Networks, Disks Topics: flow control, router microarchitecture, RAID.
1 Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control.
1 Lecture 26: Interconnection Networks Topics: flow control, router microarchitecture.
1 Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance.
1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,
Routing Algorithms ECE 284 On-Chip Interconnection Networks Spring
Performance and Power Efficient On-Chip Communication Using Adaptive Virtual Point-to-Point Connections M. Modarressi, H. Sarbazi-Azad, and A. Tavakkol.
High Performance Embedded Computing © 2007 Elsevier Lecture 16: Interconnection Networks Embedded Computing Systems Mikko Lipasti, adapted from M. Schulte.
Tightly-Coupled Multi-Layer Topologies for 3D NoCs Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi (NII, JAPAN) Hideharu Amano (Keio Univ, JAPAN)
1 Lecture 23: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm Next semester:
On-Chip Networks and Testing
José Vicente Escamilla José Flich Pedro Javier García 1.
Elastic-Buffer Flow-Control for On-Chip Networks
Networks-on-Chips (NoCs) Basics
SMART: A Single- Cycle Reconfigurable NoC for SoC Applications -Jyoti Wadhwani Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramaniam,
Dynamic Interconnect Lecture 5. COEN Multistage Network--Omega Network Motivation: simulate crossbar network but with fewer links Components: –N.
Author : Jing Lin, Xiaola Lin, Liang Tang Publish Journal of parallel and Distributed Computing MAKING-A-STOP: A NEW BUFFERLESS ROUTING ALGORITHM FOR ON-CHIP.
1 Flow Identification Assume you want to guarantee some type of quality of service (minimum bandwidth, maximum end-to-end delay) to a user Before you do.
George Michelogiannakis William J. Dally Stanford University Router Designs for Elastic- Buffer On-Chip Networks.
A Lightweight Fault-Tolerant Mechanism for Network-on-Chip
1 Lecture 26: Networks, Storage Topics: router microarchitecture, disks, RAID (Appendix D) Final exam: Monday 30 th Apr 10:30-12:30 Same rules as the midterm.
CS 8501 Networks-on-Chip (NoCs) Lukasz Szafaryn 15 FEB 10.
15.1 Chapter 15 Connecting LANs, Backbone Networks, and Virtual LANs Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or.
© Sudhakar Yalamanchili, Georgia Institute of Technology (except as indicated) Switch Microarchitecture Basics.
1 Lecture 15: Interconnection Routing Topics: deadlock, flow control.
Anshul Kumar, CSE IITD ECE729 : Advanced Computer Architecture Lecture 27, 28: Interconnection Mechanisms In Multiprocessors 29 th, 31 st March, 2010.
BZUPAGES.COM Presentation On SWITCHING TECHNIQUE Presented To; Sir Taimoor Presented By; Beenish Jahangir 07_04 Uzma Noreen 07_08 Tayyaba Jahangir 07_33.
Lecture # 03 Switching Course Instructor: Engr. Sana Ziafat.
Run-time Adaptive on-chip Communication Scheme 林孟諭 Dept. of Electrical Engineering National Cheng Kung University Tainan, Taiwan, R.O.C.
Lecture 16: Router Design
Interconnect Networks Basics. Generic parallel/distributed system architecture On-chip interconnects (manycore processor) Off-chip interconnects (clusters.
Ch 8. Switching. Switch  Devices that interconnected with each other  Connecting all nodes (like mesh network) is not cost-effective  Some topology.
Intel Slide 1 A Comparative Study of Arbitration Algorithms for the Alpha Pipelined Router Shubu Mukherjee*, Federico Silla !, Peter Bannon $, Joel.
1 Lecture 22: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO’03, Princeton A Gracefully Degrading and.
1 Lecture 24: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix F)
1 Lecture 14: Interconnection Networks Topics: dimension vs. arity, deadlock.
Predictive High-Performance Architecture Research Mavens (PHARM), Department of ECE The NoX Router Mitchell Hayenga Mikko Lipasti.
Boris Grot, Joel Hestness, Stephen W. Keckler 1 The University of Texas at Austin 1 NVIDIA Research Onur Mutlu Carnegie Mellon University.
LECTURE 13 CT1303 LAN. DYNAMIC MAC PROTOCOL: CONTENTION PROTOCOL CSMA\CA Carrier Sense Multiple Access\Collision Avoidance CSMA\CA Each station will sense.
1 Lecture 29: Interconnection Networks Papers: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton Interconnect Design.
1 Lecture 22: Interconnection Networks Topics: Routing, deadlock, flow control, virtual channels.
Chapter 8 Switching Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Lecture 23: Interconnection Networks
Lecture 23: Router Design
OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel
Switching Techniques In large networks there might be multiple paths linking sender and receiver. Information may be switched as it travels through various.
Lecture 16: On-Chip Networks
NoC Switch: Basic Design Principles &
Lecture 17: NoC Innovations
Lecture: Interconnection Networks
Lecture: Networks Topics: TM wrap-up, networks.
Lecture: Interconnection Networks
Lecture 25: Interconnection Networks
Presentation transcript:

1 Lecture 15: NoC Innovations Today: power and performance innovations for NoCs

2 Network Power Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO’03, Princeton Energy for a flit = E R. H + E wire. D = (E buf + E xbar + E arb ). H + E wire. D E R = router energy H = number of hops E wire = wire transmission energy D = physical Manhattan distance E buf = router buffer energy E xbar = router crossbar energy E arb = router arbiter energy This paper assumes that E wire. D is ideal network energy (assuming no change to the application and how it is mapped to physical nodes)

3 Segmented Crossbar By segmenting the row and column lines, parts of these lines need not switch  less switching capacitance (especially if your output and input ports are close to the bottom-left in the figure above) Need a few additional control signals to activate the tri-state buffers Overall crossbar power savings: ~15-30%

4 Cut-Through Crossbar Attempts to optimize the common case: in dimension-order routing, flits make up to one turn and usually travel straight 2/3 rd the number of tristate buffers and 1/2 the number of data wires “Straight” traffic does not go thru tristate buffers Some combinations of turns are not allowed: such as E  N and N  W (note that such a combination cannot happen with dimension-order routing) Crossbar energy savings of 39-52%

5 Write-Through Input Buffer Input flits must be buffered in case there is a conflict in a later pipeline stage If the queue is empty, the input flit can move straight to the next stage: helps avoid the buffer read To reduce the datapaths, the write bitlines can serve as the bypass path Power savings are a function of rd/wr energy ratios and probability of finding an empty queue

6 Express Channels Express channels connect non-adjacent nodes – flits traveling a long distance can use express channels for most of the way and navigate on local channels near the source/destination (like taking the freeway) Helps reduce the number of hops The router in each express node is much bigger now

7 Express Channels Routing: in a ring, there are 5 possible routes and the best is chosen; in a torus, there are 17 possible routes A large express interval results in fewer savings because fewer messages exercise the express channels

8 Express Virtual Channels To a large extent, maintain the same physical structure as a conventional network (changes to be explained shortly) Some virtual channels are treated differently: they go through a different router pipeline and can effectively avoid most router overheads

9 Router Pipelines If Normal VC (NVC):  at every router, must compete for the next VC and for the switch  will get buffered in case there is a conflict for VA/SA If EVC (at intermediate bypass router):  need not compete for VC (an EVC is a VC reserved across multiple routers)  similarly, the EVC is also guaranteed the switch (only 1 EVC can compete for an output physical channel)  since VA/SA are guaranteed to succeed, no need for buffering  simple router pipeline: incoming flit directly moves to ST stage If EVC (at EVC source/sink router):  must compete for VC/SA as in a conventional pipeline  before moving on, must confirm free buffer at next EVC router

10 Hierarchical Buses Udipi et al., HPCA’10 Use buses to reduce routers Use bloom filters to stifle broadcasts Use page coloring to minimize travel beyond a bus

11 MC Placement Abts et al., ISCA 2009 Bullet Tilera

12 MC Placement Diamond placement has the least link contention as it distributes traffic XY routing for requests and YX routing for replies is also effective in distributing traffic

13 NoX Router Hayenga and Lipasti, MICRO 2011 In speculative routers, two packets may be sent out at the same time on the link; receiver gets garbage Instead, send the XOR of the colliding packets In the next cycles, send the XOR of colliding packets, minus one of the colliding packets Receiver decodes all of the packets with XORs Saves one wasted cycle on every collision or mis-speculation

14 Title Bullet