A Lightweight Fault-Tolerant Mechanism for Network-on-Chip

Slides:



Advertisements
Similar presentations
QuT: A Low-Power Optical Network-on-chip
Advertisements

A Novel 3D Layer-Multiplexed On-Chip Network
Flattened Butterfly Topology for On-Chip Networks John Kim, James Balfour, and William J. Dally Presented by Jun Pang.
Evaluating Bufferless Flow Control for On-Chip Networks George Michelogiannakis, Daniel Sanchez, William J. Dally, Christos Kozyrakis Stanford University.
Lizhong Chen and Timothy M. Pinkston SMART Interconnects Group
Allocator Implementations for Network-on-Chip Routers Daniel U. Becker and William J. Dally Concurrent VLSI Architecture Group Stanford University.
High Performance Router Architectures for Network- based Computing By Dr. Timothy Mark Pinkston University of South California Computer Engineering Division.
1 Lecture 23: Interconnection Networks Paper: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton.
MINIMISING DYNAMIC POWER CONSUMPTION IN ON-CHIP NETWORKS Robert Mullins Computer Architecture Group Computer Laboratory University of Cambridge, UK.
Lei Wang, Yuho Jin, Hyungjun Kim and Eun Jung Kim
1 Lecture 21: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO’03, Princeton A Gracefully Degrading and.
Network-on-Chip Examples System-on-Chip Group, CSE-IMM, DTU.
7. Fault Tolerance Through Dynamic or Standby Redundancy 7.6 Reconfiguration in Multiprocessors Focused on permanent and transient faults detection. Three.
Issues in System-Level Direct Networks Jason D. Bakos.
1 Lecture 26: Interconnection Networks Topics: flow control, router microarchitecture.
Orion: A Power-Performance Simulator for Interconnection Networks Presented by: Ilya Tabakh RC Reading Group4/19/2006.
Low-Latency Virtual-Channel Routers for On-Chip Networks Robert Mullins, Andrew West, Simon Moore Presented by Sailesh Kumar.
Dragonfly Topology and Routing
Performance and Power Efficient On-Chip Communication Using Adaptive Virtual Point-to-Point Connections M. Modarressi, H. Sarbazi-Azad, and A. Tavakkol.
Switching, routing, and flow control in interconnection networks.
Approaching Ideal NoC Latency with Pre-Configured Routes George Michelogiannakis, Dionisios Pnevmatikatos and Manolis Katevenis Institute of Computer Science.
High Performance Embedded Computing © 2007 Elsevier Lecture 16: Interconnection Networks Embedded Computing Systems Mikko Lipasti, adapted from M. Schulte.
Tightly-Coupled Multi-Layer Topologies for 3D NoCs Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi (NII, JAPAN) Hideharu Amano (Keio Univ, JAPAN)
1 Fault-Tolerant Computing Systems #2 Hardware Fault Tolerance Pattara Leelaprute Computer Engineering Department Kasetsart University
A Vertical Bubble Flow Network using Inductive-Coupling for 3D CMPs
José Vicente Escamilla José Flich Pedro Javier García 1.
High-Performance Networks for Dataflow Architectures Pravin Bhat Andrew Putnam.
Adding Slow-Silent Virtual Channels for Low-Power On-Chip Networks Hiroki Matsutani (Keio Univ, Japan) Michihiro Koibuchi (NII, Japan) Daihan Wang (Keio.
Report Advisor: Dr. Vishwani D. Agrawal Report Committee: Dr. Shiwen Mao and Dr. Jitendra Tugnait Survey of Wireless Network-on-Chip Systems Master’s Project.
Three-Dimensional Layout of On-Chip Tree-Based Networks Hiroki Matsutani (Keio Univ, Japan) Michihiro Koibuchi (NII, Japan) D. Frank Hsu (Fordham Univ,
Elastic-Buffer Flow-Control for On-Chip Networks
Networks-on-Chips (NoCs) Basics
SMART: A Single- Cycle Reconfigurable NoC for SoC Applications -Jyoti Wadhwani Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramaniam,
High-Level Interconnect Architectures for FPGAs An investigation into network-based interconnect systems for existing and future FPGA architectures Nick.
High-Level Interconnect Architectures for FPGAs Nick Barrow-Williams.
George Michelogiannakis William J. Dally Stanford University Router Designs for Elastic- Buffer On-Chip Networks.
George Michelogiannakis, Prof. William J. Dally Concurrent architecture & VLSI group Stanford University Elastic Buffer Flow Control for On-chip Networks.
CS 8501 Networks-on-Chip (NoCs) Lukasz Szafaryn 15 FEB 10.
Non-Minimal Routing Strategy for Application-Specific Networks-on-Chips Hiroki Matsutani Michihiro Koibuchi Yutaka Yamada Jouraku Akiya Hideharu Amano.
Jose Miguel Montanana (NII, Japan) Michihiro Koibuchi (NII, Japan ) Hiroki Matsutani ( U of Tokyo, Japan ) Hideharu Amano ( Keio U/ NII, Japan ) Stabilizing.
Runtime Power Gating of On-Chip Routers Using Look-Ahead Routing
Run-time Adaptive on-chip Communication Scheme 林孟諭 Dept. of Electrical Engineering National Cheng Kung University Tainan, Taiwan, R.O.C.
Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi.
Networks-on-Chip (NoC) Suleyman TOSUN Computer Engineering Deptartment Hacettepe University, Turkey.
Yu Cai Ken Mai Onur Mutlu
Lecture 16: Router Design
Interconnect Networks Basics. Generic parallel/distributed system architecture On-chip interconnects (manycore processor) Off-chip interconnects (clusters.
Team LDPC, SoC Lab. Graduate Institute of CSIE, NTU Implementing LDPC Decoding on Network-On-Chip T. Theocharides, G. Link, N. Vijaykrishnan, M. J. Irwin.
Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies Alvin R. Lebeck CPS 220.
1 Lecture 15: NoC Innovations Today: power and performance innovations for NoCs.
1 Lecture 22: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO’03, Princeton A Gracefully Degrading and.
Virtual-Channel Flow Control William J. Dally
1 Lecture 24: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix F)
Copyright 2007 Koren & Krishna, Morgan-Kaufman Part.12.1 FAULT TOLERANT SYSTEMS Part 12 - Networks.
Network On Chip Cache Coherency Final presentation – Part A Students: Zemer Tzach Kalifon Ethan Kalifon Ethan Instructor: Walter Isaschar Instructor: Walter.
HAT: Heterogeneous Adaptive Throttling for On-Chip Networks Kevin Kai-Wei Chang Rachata Ausavarungnirun Chris Fallin Onur Mutlu.
1 Lecture 29: Interconnection Networks Papers: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton Interconnect Design.
Runtime Reconfigurable Network-on- chips for FPGA-based systems Mugdha Puranik Department of Electrical and Computer Engineering
Network-on-Chip Paradigm Erman Doğan. OUTLINE SoC Communication Basics  Bus Architecture  Pros, Cons and Alternatives NoC  Why NoC?  Components 
Lecture 23: Interconnection Networks
Fault-tolerant routing
Azeddien M. Sllame, Amani Hasan Abdelkader
Lecture 23: Router Design
OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel
Lecture 17: NoC Innovations
Network-on-Chip & NoCSim
Rahul Boyapati. , Jiayi Huang
Deadlock Free Hardware Router with Dynamic Arbiter
Switching, routing, and flow control in interconnection networks
RECONFIGURABLE NETWORK ON CHIP ARCHITECTURE FOR AEROSPACE APPLICATIONS
Presentation transcript:

A Lightweight Fault-Tolerant Mechanism for Network-on-Chip Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** *National Institute of Informatics, Japan/JST, Japan *Keio University, Japan ***University of Southern California

Background Improvement of the die yield Circuit Level Architecture Level e.g. Cell Brd. Eng. Play Station 3:7SPE HPC-Purpose: 8SPE Fault tolerance of the communication on multi-core systems Lightweight mechanism SPE SPE SPE SPE PPE Ring buses SPE SPE SPE SPE Cell for PS3 (One SPE is disabled) Cell Broadband Engine

Outline Fault patterns on Network-on-Chip (NoC) Default-backup path mechanism (DBP) maintains the connectivity of all healthy PEs, even if the network includes hard faults Objective Provide a highly reliable network using lightweight hardware! Evaluation Energy Amount of Hardware Throughput

Network-on-Chip (NoC) Processor Core Largest component Various fault-tolerant techniques Resource sparing Redundancy On-Chip Router Area is not so large. Infrastructure that affects on-chip communication Duplication Core On-chip router 16-Core Architecture (*) Kyoto U/VDEC/ASPLA 90nm CMOS

Failures in Communication Transient Error (e.g. bit error) Software layer is responsible, and recoverable Link-to-link, and/or end-to-end [Murali,DToC05] Error detection and/or error correction (e.g. CRC) Permanent Error (e.g. hard error) System avoids using the failed modules 0100110 0100010 Bit error Hard error! Router PE PE Router

Existing NoC Fault-tolerant Techniques Router Architecture Speculative Router [Kim ISCA06] Providing fault-tolerance at input buffer, routing computation, and switch allocation unit. Dependability for misrouted packets [Thottethodi IPDPS03] Channel Reconfiguration[DallyText03, Soteriou ICD04] Routing Paths Resource Sparing Dynamic Reconfiguration Fault-Tolerant Routing Each Technique is resilient for portion of possible failures. - Using them together enables high reliability! But, how about simplicity?   - Hard to recover crossbar failures

Outline Fault patterns on Network-on-Chip (NoC) Default-backup path mechanism (DBP) maintains the connectivity of all healthy PEs, even if the network includes hard faults Objective Provide a highly reliable network using lightweight hardware! Evaluation Energy Amount of Hardware Throughput

Motivation NoC Component Router, Link Failure NI Failure Core disabling healthy local PEs Segmentation of the network NI Failure Disabling the healthy local PE Core On-chip router Disabled Unlike off-chip systems, a faulty module cannot be removed and replaced 16-Core Architecture The proposed lightweight fault-tolerant technique on a router maintains network connectivity of all the healthy PEs Disabled healthy PE

Conventional NoC Router(2-D mesh) 5-by-5 Router, channel bit-width (flit size) 64-bit Each input buffer has two VCs(2x64-bit x 4) ARBITER X+ X+ FIFO Each module may fail. Duplication of all the input ports is too expensive. X- X- FIFO Y+ Y+ FIFO Y- Y- FIFO 5x5 XBAR CORE CORE FIFO [Matsutani.ASP-DAC08] Area (after place and route) is 40~45 [KGate]; 75% is FIFO

Minimum Requirements for Communication ARBITER X+ X+ FIFO X- X- FIFO Y+ Y+ FIFO Y- Y- FIFO 5x5 XBAR CORE CORE FIFO To communicate a local core with neighboring cores,  It should send packets to at least one output port  It should receive packets from at least one input port

Default-backup Path(DBP) Mechanism ARBITER X+ FIFO X+ X- Y+ Y- CORE X- FIFO Y+ FIFO Y- FIFO 5x5 XBAR CORE FIFO  A local core can send packets to at least one output port  A local core can receive packets from at least one input port

Default-backup Path(DBP) Mechanism ARBITER X+ FIFO X+ X- Y+ Y- CORE X- FIFO Failure Y+ FIFO Y- FIFO 5x5 XBAR CORE FIFO Tail Head Body  A local core can send packets to at least one output port  A local core can receive packets from at least one input port

Behavior of the DBP Mechanism (within a Router)  Cores can communicate with each other, even if router modules fail maintain packet transfers from X- direction, o X+ direction ARBITER X+ FIFO X+ X- Y+ Y- CORE X- FIFO Failure Y+ FIFO Y- FIFO CORE FIFO 13

Behavior of the DBP Mechanism (bypassing Xbar and NI faults) ARBITER X+ FIFO X+ X- Y+ Y- CORE Failure X- FIFO Using 3:1 Mux instead of 2:1 mux Y+ FIFO Y- FIFO 5x5 XBAR CORE FIFO

Another Issue: Network Connectivity Core On-chip router Router, link failure Disabling healthy local PEs Segmentation of the networks may disable all the PEs The DBP mechanism provides reliability not only on intra-router datapath but also on routing paths 16-Core Architecture Dividing into two regions!

DBP Mechanism (inter-router behavior) ARBITER X+ FIFO X+ X- Y+ Y- CORE X- FIFO Y+ FIFO Y- FIFO Router 5x5 XBAR CORE FIFO Router Y- X- X+ Y+ Set the DBP ports along a unidirectional embedded ring topology (omit PEs)

Routing Bypasses Faults (e.g., failed crossbar) Default-backup path is used only at the faulty port Link The corresponding network graph Router A unidirectional channel on a link

DBP Applied to Up*/Down* Routing The router has only a single output port D Up Down S Down Up Existing deadlock-free routing cannot provide the network connectivity, due to the directional routing restrictions

X X DBP Routing Mechanism Guaranteeing deadlock-freedom and connectivity by imposing routing restrictions Allows packet transfer along the DBP ring Allows VC transitions in increasing order Uses existing deadlock-free routing within every virtual-channel network X X Turn Model[Glass,1992]  Virtual channel (VC) transition The Idea is similar to the SAN routing [koibuchi,ICPP03] We propose a new routing strategy for NoCs with directional routing restrictions!

Outline Fault patterns on Network-on-Chip (NoC) Default-backup path mechanism (DBP) maintains the connectivity of all healthy PEs, even if the network includes hard faults Objective Provide a highly reliable network using lightweight hardware! Evaluation Energy Amount of Hardware Throughput

Energy: NoC Energy Model Ave. flit energy: Send 1-flit to destination How much energy[J] ? Simulation parameters 6/12mm square chip (16/64 cores) 90nm CMOS 12mm [Wang, DATE’05]

Energy Consumption almost constant! 16 cores 64 cores As the number of faulty links increases, DBP gracefully increases the energy, due to the increased hop counts

The ratio of additional HW is decreased, as # of ports increases. Amount of Hardware The ratio of additional HW is decreased, as # of ports increases. Router area with various # of ports. Total router area of 2-D mesh Area is increased by at most only 11.1% (the 2-VC case)

Performance Evaluation Network simulation Throughput and latency 16 cores and 64 cores Topology 2-D mesh Traffic pattern Random (as a baseline) Packet size 16-flit (1-flit header) Buffer size 1-flit per channel Switching Wormhole switching # of VCs 2 Min latency 3-cycle per router

Throughput and Latency Throughput is decreased by the increased path hops. 64 cores 16 cores Topology is changed from 2-D mesh (no faults) to ring at 48/224 faults on 16/64 cores

Extensions of DBP Mechanism Faults within the DBP itself and various ports Partially duplication Multiple embedded DBP rings Another approach To improve the latency, a healthy router enables the DBP Link faults Router ARBITER X+ FIFO X+ X- Y+ Y- CORE X- FIFO Y+ FIFO Y- FIFO CORE FIFO Datapath via no crossbar

Conclusions We proposed a lightweight fault-tolerant mechanism, DBP, for NoCs (architecture level) Resilient for hardware faults of both intra-router modules and routing paths A new routing strategy was developed The idea is applicable to various NoC architectures As well as regular topologies Evaluation Energy consumption almost constant by up to 40 faults (64 cores) Amount of Hardware increasing by at most only 11.1% Throughput performance decreasing by the increased path hops The DBP serves the role of “lifeline” to increase the lifetime of NoCs