High-Level Interconnect Architectures for FPGAs Nick Barrow-Williams.



Introduction
- Continued shrinking of device dimensions introduces new design challenges
- Moving data around a chip can now be the limiting factor of performance
- Existing interconnection solutions do not scale well

Why do existing solutions not scale?
- Global connections are longer
- Wire depth is increased to counter the decrease in width
- Parasitic capacitive effects increase and slow signal propagation

Why do existing solutions not scale?
- Existing system-level connection uses buses
- Buses increase resource efficiency and decrease wiring congestion
- Not suitable for a large number of modules
- A network-based alternative would offer higher aggregate bandwidth
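
The aggregate-bandwidth argument can be illustrated with a rough count. The sketch below is not from the slides: it assumes a single shared bus carries one transfer at a time, and that a 2D mesh has one unidirectional channel in each direction between neighbouring routers. The name `mesh_channels` is hypothetical, and the count is only an upper bound on concurrent link transfers, not a measured bandwidth.

```python
def mesh_channels(rows: int, cols: int) -> int:
    """Count unidirectional router-to-router channels in a rows x cols 2D mesh."""
    horizontal = rows * (cols - 1)      # neighbouring pairs along each row
    vertical = cols * (rows - 1)        # neighbouring pairs along each column
    return 2 * (horizontal + vertical)  # assumed: one channel per direction per pair


# Assumption: a single shared bus serves one transfer at a time.
BUS_TRANSFERS_AT_ONCE = 1

if __name__ == "__main__":
    for n in (2, 4, 8):
        print(f"{n}x{n} mesh: up to {mesh_channels(n, n)} link transfers at once, "
              f"shared bus: {BUS_TRANSFERS_AT_ONCE}")
```

The gap widens as modules are added, which is the scaling point the slide makes; real throughput depends on traffic patterns and contention.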

Why design for FPGA systems?
- FPGA silicon area already dominated by wiring
- Global wires are limited in number
- Increasing gate count only increases wiring congestion

The Solution: Network-on-Chip
- Use technologies from network systems
- Replace inefficient global wiring with high-level interconnection network
- Create scalable systems to handle large numbers of modules

Existing Solutions
- Most existing systems are for ASIC designs
  - Stanford Interconnect
  - RAW
  - SCALE
  - SPIN
- PNoC: a solution for FPGAs
  - Complex
  - High hardware cost
- Other simulated solutions exist but few are implemented

Proposal: Two network systems
- Existing solutions use either packet switching or circuit switching techniques
- Design, implement, test and synthesise one of each to compare performance and hardware cost
- Map solutions to an FPGA platform to evaluate hardware cost in current generation systems

Network Architecture Design
- Topology
  - Simple
  - Scalable
  - 2 Dimensional
- Solution: 2D mesh topology
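
As a small illustration of the 2D mesh, the sketch below lists the neighbours of a router at grid position (x, y). The port names and the convention that "north" means increasing y are assumptions for illustration; the slides do not specify an orientation.

```python
def neighbours(x, y, width, height):
    """Return {port_name: (x, y)} for the mesh neighbours of router (x, y)."""
    candidates = {
        "north": (x, y + 1),  # assumed orientation: north is +y
        "south": (x, y - 1),
        "east": (x + 1, y),
        "west": (x - 1, y),
    }
    # Routers on an edge or corner simply have fewer neighbours.
    return {
        port: (nx, ny)
        for port, (nx, ny) in candidates.items()
        if 0 <= nx < width and 0 <= ny < height
    }


if __name__ == "__main__":
    print(neighbours(0, 0, 4, 4))  # corner router
    print(neighbours(1, 1, 4, 4))  # interior router
```

Every router has at most four neighbours regardless of mesh size, which is what keeps the topology simple and scalable.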

Network Architecture Design
- Routing Algorithm
  - Deterministic
    - Data always follows same path through network
    - Simple hardware
    - Sensitive to congestion
  - Adaptive
    - Paths through network can change according to load
    - Complex hardware
    - Avoids congestion
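
One common deterministic scheme for a 2D mesh is dimension-order (XY) routing. The slides do not name the algorithm used, so the sketch below is an assumed example of deterministic routing rather than the author's implementation; `decode_port` and `xy_route` are hypothetical names, and "north" is assumed to mean increasing y.

```python
# Coordinate step taken when leaving through each port (assumed orientation).
STEP = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}


def decode_port(current, dest):
    """Pick the output port at `current` for data heading to `dest`: X first, then Y."""
    cx, cy = current
    dx, dy = dest
    if dx > cx:
        return "east"
    if dx < cx:
        return "west"
    if dy > cy:
        return "north"
    if dy < cy:
        return "south"
    return "local"  # already at the destination router


def xy_route(src, dest):
    """Return the list of routers visited from `src` to `dest`, inclusive."""
    path = [src]
    node = src
    while True:
        port = decode_port(node, dest)
        if port == "local":
            return path
        sx, sy = STEP[port]
        node = (node[0] + sx, node[1] + sy)
        path.append(node)


if __name__ == "__main__":
    route = xy_route((0, 3), (1, 0))
    print(route, "hops:", len(route) - 1)
```

Because the port choice depends only on the current and destination coordinates, the same source and destination always produce the same path, which is the property the slide calls deterministic.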

Network Architecture Design
- When choosing routing algorithms, must avoid:
  - Deadlock. Solution: use unidirectional wiring and allow each node to make two connections
  - Livelock. Solution: use deterministic routing

Network Architecture Design
- Flow control methods
  - Circuit switched
    - Circuit request propagates through network
    - Path reserved to destination
    - Grant signal propagates back
    - Data sent, then circuit deallocated
  - Packet switched
    - Use header, body and tail
    - Wormhole routing
      - Forward header and body without waiting for tail
    - Need buffers to store stalled packets
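
The header/body/tail framing used by wormhole routing can be sketched as follows. This is a behavioural illustration only: the flit layout (destination in the header, last payload word in the tail) and the names `make_packet` and `forward` are assumptions, not details taken from the slides.

```python
def make_packet(dest, payload):
    """Frame `payload` words as one header flit, body flits and one tail flit."""
    if not payload:
        raise ValueError("payload must contain at least one word")
    flits = [("header", dest)]                     # header carries the destination
    flits += [("body", word) for word in payload[:-1]]
    flits.append(("tail", payload[-1]))            # assumed: tail carries the last word
    return flits


def forward(flits):
    """Pass flits on one at a time; the output port is held from header to tail."""
    port_held = False
    for kind, data in flits:
        if kind == "header":
            port_held = True                       # route is chosen from the header alone
        if not port_held:
            raise RuntimeError("body or tail flit arrived without a header")
        yield kind, data                           # sent on without waiting for the tail
        if kind == "tail":
            port_held = False                      # tail frees the port for other packets


if __name__ == "__main__":
    packet = make_packet((1, 0), [10, 20, 30])
    for flit in forward(packet):
        print(flit)
```

If the downstream router stalls, the flits already in flight have to wait somewhere, which is why the packet switched design needs buffers.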

Router Design
- Each router contains a number of modules
  - FIFOs (only present in packet switched router)
  - Address to port-request decoder
  - Arbiter
  - Control finite state machines
  - Crossbar
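
The arbiter decides which requesting input gets an output port. The slides do not state the arbitration policy, so the sketch below assumes round-robin priority purely for illustration; the class name and the choice of five ports (four mesh directions plus a local port) are also assumptions.

```python
class RoundRobinArbiter:
    """Grant one request per call, rotating priority after each grant."""

    def __init__(self, n_ports):
        self.n_ports = n_ports
        self.last = n_ports - 1  # start so that port 0 has highest priority

    def grant(self, requests):
        """`requests` is a list of booleans; return the granted index or None."""
        for offset in range(1, self.n_ports + 1):
            port = (self.last + offset) % self.n_ports
            if requests[port]:
                self.last = port  # the winner becomes lowest priority next time
                return port
        return None               # nobody is requesting


if __name__ == "__main__":
    arbiter = RoundRobinArbiter(5)
    requests = [True, False, True, False, False]
    print([arbiter.grant(requests) for _ in range(4)])
```

Rotating the priority keeps one busy input from starving the others; a fixed-priority arbiter would be smaller but less fair.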

Circuit Switched Router Structure
[Block diagram. Blocks: In & Out Ports, Address to Port Decoder, Arbiter, FSM, Crossbar. Signals: Request In, Request Out, Grant In, Grant Out, Data In, Data Out.]
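
The request/grant behaviour of a circuit switched network can be modelled at a high level as reserving every link on a path before any data moves. The sketch below is a simplification and an assumption: it grants or refuses a whole path in one step, whereas in hardware the request travels hop by hop and the grant travels back. `CircuitNetwork` is a hypothetical name.

```python
class CircuitNetwork:
    """Track which unidirectional links are held by an open circuit."""

    def __init__(self):
        self.reserved = set()

    @staticmethod
    def _links(path):
        # A path [a, b, c] uses the directed links (a, b) and (b, c).
        return list(zip(path, path[1:]))

    def request(self, path):
        """Try to reserve every link on `path`; return True if granted."""
        links = self._links(path)
        if any(link in self.reserved for link in links):
            return False                  # a link is busy, so no grant comes back
        self.reserved.update(links)       # path reserved through to the destination
        return True

    def release(self, path):
        """Deallocate the circuit once the data has been sent."""
        self.reserved.difference_update(self._links(path))


if __name__ == "__main__":
    net = CircuitNetwork()
    first = [(0, 0), (1, 0), (2, 0)]
    second = [(1, 0), (2, 0), (2, 1)]
    print(net.request(first), net.request(second))
    net.release(first)
    print(net.request(second))
```

A blocked request has to wait or retry, which is the latency cost of holding a path exclusively; once the circuit is open, no buffering is needed along it.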

Packet Switched Router Structure
[Block diagram. Blocks: In & Out Ports, 5 Queue Modules (FIFO and FSM; signals Data In, Full, Write, Grant, Req, Data), Address to Port Decoder, Arbiter, Control, Crossbar. Signals: Request From FIFOs, Request In, Write Out, Full In, Grant Out, Data From FIFOs, Data Out.]
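
The Full and Write signals in the diagram suggest a simple backpressure handshake on each input queue. The sketch below is a behavioural model under that assumption; the depth, the class name `Fifo`, and the method names are hypothetical and do not come from the slides.

```python
from collections import deque


class Fifo:
    """Bounded queue that refuses writes while full (backpressure)."""

    def __init__(self, depth):
        self.depth = depth
        self.items = deque()

    @property
    def full(self):
        return len(self.items) >= self.depth

    @property
    def empty(self):
        return not self.items

    def write(self, flit):
        """Store `flit` if there is room; return False when the sender must stall."""
        if self.full:
            return False
        self.items.append(flit)
        return True

    def read(self):
        """Remove and return the oldest flit, or None when empty."""
        return self.items.popleft() if self.items else None


if __name__ == "__main__":
    queue = Fifo(depth=2)
    print(queue.write("header"), queue.write("body"), queue.write("tail"))
    print(queue.read(), queue.full)
```

The upstream router would check `full` before asserting a write, so a stalled packet stays in place instead of being dropped.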

Router Implementation and Testing
- Both routers were coded in VHDL
- Simulation and testing used a combination of ModelSim and Xilinx ISE 9.1
- Ad-hoc tests were used for individual modules
- A VHDL testbench was used for system verification

Testbench Structure
[Block diagram. TESTBENCH containing: Command File, Read Input, Input Tables, Test Table, Source, Mesh Network, Sink, Output Table, Compare, Output File, Clock Gen, Reset Gen, Cycle Count.]
Command file columns: #START SOURCE DEST SIZE ID
Sample output file:
Success: ID: 1 Source : (0,3) Dest : (1,0) Hops : 4 Latency: 34
Success: ID: 2 Source : (0,2) Dest : (1,0) Hops : 3 Latency: 27
Success: ID: 3 Source : (3,2) Dest : (1,1) Hops : 3 Latency: 22
Success: ID: 4 Source : (1,3) Dest : (0,1) Hops : 3 Latency: 22
Success: ID: 5 Source : (3,0) Dest : (3,1) Hops : 1 Latency: 12
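
The sample output lends itself to a quick sanity check. The sketch below parses the five records shown on the slide and compares each reported hop count with the Manhattan distance between source and destination, which is what a minimal route on a 2D mesh would take. The regular expression and helper names are assumptions based only on the record format visible above.

```python
import re

# Matches one "Success" record in the format shown on the slide.
RECORD = re.compile(
    r"ID:\s*(\d+)\s+Source\s*:\s*\((\d+),(\d+)\)\s+Dest\s*:\s*\((\d+),(\d+)\)"
    r"\s+Hops\s*:\s*(\d+)\s+Latency:\s*(\d+)"
)

LOG = """\
Success: ID: 1 Source : (0,3) Dest : (1,0) Hops : 4 Latency: 34
Success: ID: 2 Source : (0,2) Dest : (1,0) Hops : 3 Latency: 27
Success: ID: 3 Source : (3,2) Dest : (1,1) Hops : 3 Latency: 22
Success: ID: 4 Source : (1,3) Dest : (0,1) Hops : 3 Latency: 22
Success: ID: 5 Source : (3,0) Dest : (3,1) Hops : 1 Latency: 12
"""


def parse(log):
    """Turn the testbench output text into a list of record dictionaries."""
    records = []
    for match in RECORD.finditer(log):
        ident, sx, sy, dx, dy, hops, latency = map(int, match.groups())
        records.append({"id": ident, "src": (sx, sy), "dst": (dx, dy),
                        "hops": hops, "latency": latency})
    return records


def manhattan(a, b):
    """Minimal hop count between two mesh coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


if __name__ == "__main__":
    for rec in parse(LOG):
        minimal = manhattan(rec["src"], rec["dst"]) == rec["hops"]
        print(rec["id"], rec["hops"], rec["latency"], "minimal" if minimal else "non-minimal")
```

All five sample records report a hop count equal to the Manhattan distance, which is consistent with minimal routing on the mesh.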

Synthesis
- Each router was synthesised for a Virtex-4 LX platform
- Post-synthesis verification
- Resource usage
- Timing

Circuit Switched Resource Usage
[Charts: LUTs, Flip-Flops]
- Total of Input LUTs
- ~0.1% of a Virtex 5
- Total of 202 Flip-Flops

Packet Switched Resource Usage
[Charts: LUTs, Flip-Flops]
- Total of Input LUTs
- +34% compared to circuit switched
- Total of 237 Flip-Flops

Timing Results
Circuit Switched
- Max Freq MHz
- Setup time 5.308 ns
- Hold time 0.272 ns
Packet Switched
- Max Freq MHz
- Setup time 6.125 ns
- Hold time 0.272 ns
Critical path is through the Arbiter in both designs
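
The reported figures can be put side by side with a little arithmetic. The sketch below computes the relative increase of the packet switched router over the circuit switched one for flip-flop count (202 versus 237) and setup time (5.308 ns versus 6.125 ns). It assumes the first set of timing figures belongs to the circuit switched design and the second to the packet switched design, as the order on the slide suggests; `pct_increase` is a hypothetical helper.

```python
def pct_increase(base, new):
    """Relative increase of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


# Figures as reported on the slides (circuit switched first, packet switched second).
FLIP_FLOPS = (202, 237)
SETUP_TIME_NS = (5.308, 6.125)

if __name__ == "__main__":
    print(f"Flip-flops: {pct_increase(*FLIP_FLOPS):.1f}% more")
    print(f"Setup time: {pct_increase(*SETUP_TIME_NS):.1f}% longer")
```

Both increases are smaller than the 34% LUT increase the slides report for the packet switched router.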

Project Appraisal
- Maintaining an accurate software simulation proved difficult
- A great deal was learnt during the implementation of the circuit switched network
- HDL implementations are only prototypes
- Testbench provides a good framework but more time is needed to gather performance data

Conclusions
- It is possible to make low-complexity network-on-chip systems suitable for FPGAs
- Latency has to be traded for throughput
- Hard to collect performance data without application-driven benchmarks
- Both networks are viable, so why not use both?

Future Work
- Cycle-accurate software simulations
- Application-driven benchmarking
- Serial transmission
- Power efficiency
- Industry standard solution