A Programmable Adaptive Router for a GALS Parallel System Jian Wu APT Group University of Manchester May 2009.

SpiNNaker System for Neural Simulation
- Massively parallel (1 million ARMs)
- Massive neural net simulations (1 billion neurons in real time)
- GALS infrastructure
- Fault-tolerant
- Node = SpiNNaker CMP + large off-chip memory

SpiNNaker Chip

Router Requirements
Operation requirements:
- Route multicast, point-to-point and nearest-neighbour packets.
- Reprogrammable at run-time.
- Provide an external interface to system resources.
- Fault-tolerant operation.
- Power efficiency.
Bandwidth requirements: ~7.4 Gb/s
- On-chip traffic: (20 - 1) procs x 1000 neurons x 72 bit x 1000 Hz = 1.368 Gb/s
- Inter-chip traffic: 1 Gb/s x 6 links = 6 Gb/s
Bandwidth target = 72 bit x 200 MHz = 14.4 Gb/s
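The bandwidth figures on this slide can be re-derived with a few lines of arithmetic. The sketch below uses only the numbers quoted on the slide; the variable names are illustrative and not taken from the presentation.

```python
# Re-derive the bandwidth figures quoted on the slide.
# All constants come from the slide; the names are illustrative.
PROCESSORS = 20 - 1            # the slide uses (20 - 1) processors
NEURONS_PER_PROC = 1000
PACKET_BITS = 72
SPIKE_RATE_HZ = 1000
LINKS = 6
LINK_RATE_BPS = 1_000_000_000  # 1 Gb/s per inter-chip link
CLOCK_HZ = 200_000_000         # 200 MHz

on_chip_bps = PROCESSORS * NEURONS_PER_PROC * PACKET_BITS * SPIKE_RATE_HZ
inter_chip_bps = LINKS * LINK_RATE_BPS
required_bps = on_chip_bps + inter_chip_bps
target_bps = PACKET_BITS * CLOCK_HZ

print(f"on-chip:    {on_chip_bps / 1e9:.3f} Gb/s")
print(f"inter-chip: {inter_chip_bps / 1e9:.3f} Gb/s")
print(f"required:   {required_bps / 1e9:.3f} Gb/s")
print(f"target:     {target_bps / 1e9:.3f} Gb/s")
```

The required total comes to 7.368 Gb/s, which the slide rounds to ~7.4 Gb/s, so the 14.4 Gb/s target is roughly twice the stated requirement.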

Router architecture
Packet checking:
- Check packet for errors and enable the appropriate routing engine
Multicast (MC) router:
- Route neural spikes according to their source address
Point-to-Point (P2P) router:
- Route system management and control information packets
Nearest-neighbour (NN) router:
- Route system boot-up and debugging info
- Provide external I/F to resources
Adaptive routing:
- Redirect blocked packets
Router interface to System NoC:
- AHB master and slave interfaces
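The split between packet checking and the three routing engines can be pictured as a small dispatch step. This is a minimal software sketch, not the hardware design: the `PacketType` enum, the `check_and_dispatch` function and the choice to return `None` for an erroneous packet are all assumptions made for illustration, since the slide does not say how bad packets are handled.

```python
from enum import Enum


class PacketType(Enum):
    MC = "multicast"
    P2P = "point-to-point"
    NN = "nearest-neighbour"


# Traffic carried by each engine, as listed on the slide.
ENGINE_TRAFFIC = {
    PacketType.MC: "neural spikes, routed by source address",
    PacketType.P2P: "system management and control information",
    PacketType.NN: "system boot-up and debugging info",
}


def check_and_dispatch(packet_type, has_error):
    """Packet checking: enable exactly one routing engine for a good packet.

    Returning None for an erroneous packet is an assumption; the slide
    only says that packets are checked for errors.
    """
    if has_error:
        return None
    return f"{packet_type.name} router"


for ptype in PacketType:
    print(check_and_dispatch(ptype, has_error=False), "<-", ENGINE_TRAFFIC[ptype])
```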

Multicast Router

Default and Adaptive Routing
- Route packets "across chip" by default (save RT entries!)
- Automatically re-route packets destined for congested or failed links
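A rough software model of the two ideas on this slide might look like the following. It assumes the six inter-chip links are numbered 0 to 5 around the chip, so that "across chip" means leaving on the link opposite the arrival link. The fallback used when a link is blocked (try the neighbouring link) is purely hypothetical and stands in for whatever alternative path the real router selects.

```python
NUM_LINKS = 6  # six inter-chip links, as on the requirements slide


def default_route(in_link):
    """Default routing: leave on the link opposite the arrival link.

    Assumes links are numbered 0..5 around the chip (an assumption).
    """
    return (in_link + 3) % NUM_LINKS


def route(in_link, table_link, blocked):
    """Pick an output link for one packet.

    table_link is the routing-table (RT) result, or None on a miss, in
    which case the default route is used and no RT entry is needed.
    blocked is the set of congested or failed links.
    """
    out_link = table_link if table_link is not None else default_route(in_link)
    if out_link not in blocked:
        return out_link
    # Hypothetical adaptive fallback: try the neighbouring link.
    alternative = (out_link - 1) % NUM_LINKS
    return alternative if alternative not in blocked else None


print(route(in_link=0, table_link=None, blocked=set()))  # default route
print(route(in_link=0, table_link=None, blocked={3}))    # re-routed
```

Packets that simply pass straight through a chip need no routing-table entry in this model, which is the saving the slide points to.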

Interfacing with System NoC
- Nearest-Neighbour packets are diverted to the System NoC.
- Programming data is sourced from the System NoC.

Elastic Buffering
- The spiking rate for the great majority of neurons is low, just a few Hz, which leaves pipeline "bubbles" between valid packets.
- More than one request to the datapath can be issued in the same clock cycle.
- The adaptive routing mechanism stalls the pipeline to find an alternative path for the congested packet.
Simple, synthesisable design:
- Use ordinary flip-flops for data latching.
- Use a global, combinatorial circuit to generate stall signals.
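One way to picture how bubbles absorb a stall is a small cycle-level model. This is a sketch under stated assumptions rather than the router's actual logic: the stall rule used here (a stage stalls only if it holds a packet and the stage after it cannot accept one) and the names `stall_signals` and `clock` are illustrative.

```python
def stall_signals(valid, sink_ready):
    """Global combinational stall generation (assumed rule).

    valid[i] is True when stage i holds a packet. A stage stalls only if
    it holds a packet and everything downstream of it is blocked.
    """
    stall = [False] * len(valid)
    downstream_blocked = not sink_ready
    for i in reversed(range(len(valid))):
        stall[i] = valid[i] and downstream_blocked
        downstream_blocked = stall[i]
    return stall


def clock(stages, incoming, sink_ready):
    """Advance the pipeline by one clock edge.

    stages[i] is a packet or None (a bubble). Returns the new stage
    contents, the packet leaving the pipeline (or None), and whether
    the incoming packet was accepted.
    """
    valid = [s is not None for s in stages]
    stall = stall_signals(valid, sink_ready)
    nxt = list(stages)
    output = None
    last = len(stages) - 1
    for i in reversed(range(len(stages))):
        if valid[i] and not stall[i]:
            if i == last:
                output = stages[i]
            else:
                nxt[i + 1] = stages[i]
            nxt[i] = None
    accepted = incoming is not None and nxt[0] is None
    if accepted:
        nxt[0] = incoming
    return nxt, output, accepted


# The output is blocked, but the bubble in stage 1 absorbs the stall,
# so the head of the pipeline still accepts a new packet this cycle.
stages, out, ok = clock(["A", None, "B"], incoming="C", sink_ready=False)
print(stages, out, ok)
```

In this model the stall only reaches the pipeline input once every stage is full, which is the benefit of having bubbles between valid packets.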

Elastic Buffering
[Figure: three pipeline stages (Pipeline 1, Pipeline 2, Pipeline 3), each with its own Pipeline Control block and flag (Flag 1, Flag 2, Flag 3); signals labelled Disable and Back Pressure]

Input Interchangeable Buffer
- Used for flow control at the head of the pipeline.
- One register is used in normal operation.
- The second is used when a stall occurs in the next stage.
- The delay is re-introduced when the stall is removed.

Parallel-Path Synchronizer
- Avoid the 2-cycle penalty to increase throughput

Packet Drop Rate

Power vs. Traffic Load

Power Distribution
[Figure: power distribution under full traffic load and under 10% traffic load]

Thank you