Architecture and Routing for NoC-based FPGA
Israel Cidon*
*Joint work with Roman Gindin and Idit Keidar.



Israel Cidon - Technion
Slide 2: One NoC does not fit all!
[Figure labels: FPGA; SoC; CMP; Flexibility; Traffic uncertainty; single application; General purpose computer; Chip design; Configuration; Run time]
Source: I. Cidon and K. Goossens, in "Networks on Chips", G. De Micheli and L. Benini, Morgan Kaufmann, 2006

Slide 3: Field Programmable Gate Array
- Flexible soft logic
  - Configurable logic blocks (CLBs) and routing channels
  - Programmed look-up tables (LUTs)
  - Configurable switching boxes
- Area-, power- and speed-efficient hard logic
  - Wire and clock infrastructure
  - Special purpose modules, e.g., CPU, SerDes

Slide 4: FPGA Example

Slide 5: Challenges for Future FPGA
- Scalability of design methodology
- Dominance of wire delays
  - Already more than 50% of delay
- Power
- Complex communication patterns
- Prototyping for NoC-based SoCs

Slide 6: NoC Based FPGA Architecture
[Figure labels: Functional unit; Routers; NoC for inter- routing; Configurable region (user logic); Configurable network interface]

Slide 7: Future FPGA: NoC-Based
- Hierarchical
  - Divide chip into regions
  - Programmable wiring inside regions
  - Regions interconnected by NoCs
- Scalable
  - Short wires, spatial reuse, power cost
  - Modular design
  - Prototype for NoC-based SoC

Slide 8: Hard or soft NoC?
- Why hard
  - Interconnect is a performance bottleneck
  - Interconnect power
  - Part of FPGA infrastructure
- Why soft
  - Application is not known when the network is built
  - Provides maximum flexibility
  - Prevents resource lockup

Slide 9: Suggested FPGA NoC Architecture
NoC element                                    | Implementation
Wires, repeaters, etc.                         | Hard
Routers, including VCs, buffers, QoS support   | Hard
Network interfaces                             | Soft: Configurable Network Interface (CNI)
Routing algorithm and headers                  | Soft: determined in CNI
Routing tables                                 | Soft

Slide 10: FPGA Routing: Optimization Problem
- Set of applications
- Different architectures
- Different traffic patterns
- Implemented on the same chip
- Common efficient NoC

Slide 11: The NoC design problem
- The cost
  - Hard grid links
    - For uniform grids: the capacity of the most congested link
  - NoC logic
    - Hard logic for routers
    - Soft logic for routing tables, headers, CNIs
- Design envelope
  - Collection of designs supported by a given programmable chip
- The variables
  - Number of "hard-coded" wires per link
  - Possible configurable routing schemes

Slide 12: Routing Schemes
- XY
  - Very simple logic
  - Deadlock free
  - Unbalanced: high cost in uniform-capacity grids
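The imbalance of XY routing can be seen in a small sketch. It assumes a 3x3 grid, a single hotspot at (1, 1), and unit-rate flows from every other node; the grid size, the traffic, and the helper names `xy_route` and `link_loads` are illustrative and not taken from the slides.

```python
from collections import defaultdict

def xy_route(src, dst):
    """Directed links on the XY route: travel along X first, then along Y."""
    (x, y), (dst_x, dst_y) = src, dst
    links = []
    while x != dst_x:
        nxt = x + (1 if dst_x > x else -1)
        links.append(((x, y), (nxt, y)))
        x = nxt
    while y != dst_y:
        nxt = y + (1 if dst_y > y else -1)
        links.append(((x, y), (x, nxt)))
        y = nxt
    return links

def link_loads(flows):
    """Sum the rate of every flow over each directed link it crosses."""
    loads = defaultdict(float)
    for (src, dst), rate in flows.items():
        for link in xy_route(src, dst):
            loads[link] += rate
    return loads

# Assumed traffic: every node of a 3x3 grid sends 1 unit to the hotspot.
hotspot = (1, 1)
flows = {((x, y), hotspot): 1.0
         for x in range(3) for y in range(3) if (x, y) != hotspot}
loads = link_loads(flows)
max_capacity = max(loads.values())  # capacity needed on the busiest link
```

Under these assumptions the two Y-direction links into the hotspot each carry 3 units while the two X-direction links carry 1 unit, so a uniform-capacity grid has to be sized for the 3-unit links.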

Slide 13: Toggle XY (TXY)
- Split packets evenly between XY and YX routes
- Deadlock avoided with 2 VCs
- Near-optimal for symmetric traffic (permutations) [Seo et al. 05; Towles & Dally 02]
- Simple
- Better balanced
- Split routes
- Does not take the traffic pattern into account
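A minimal sketch of the even split, assuming the same kind of toy setup as a 3x3 grid with one hotspot at (1, 1) and unit-rate flows. The traffic pattern and the names `route` and `txy_loads` are assumptions made for illustration; half of each flow's rate is charged to its XY route and half to its YX route.

```python
from collections import defaultdict

def route(src, dst, order):
    """Directed links of a dimension-ordered route; order is "XY" or "YX"."""
    (x, y), (dst_x, dst_y) = src, dst
    links = []
    for axis in order:
        if axis == "X":
            while x != dst_x:
                nxt = x + (1 if dst_x > x else -1)
                links.append(((x, y), (nxt, y)))
                x = nxt
        else:
            while y != dst_y:
                nxt = y + (1 if dst_y > y else -1)
                links.append(((x, y), (x, nxt)))
                y = nxt
    return links

def txy_loads(flows):
    """Charge half of every flow to its XY route and half to its YX route."""
    loads = defaultdict(float)
    for (src, dst), rate in flows.items():
        for order in ("XY", "YX"):
            for link in route(src, dst, order):
                loads[link] += rate / 2
    return loads

# Assumed traffic: every node of a 3x3 grid sends 1 unit to the hotspot.
hotspot = (1, 1)
flows = {((x, y), hotspot): 1.0
         for x in range(3) for y in range(3) if (x, y) != hotspot}
txy_max = max(txy_loads(flows).values())
```

For this assumed pattern all four links into the hotspot end up with 2 units, which is the best any routing can do when 8 units must enter through 4 links.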

Slide 14: Weighted Schemes
- TXY does not always produce the best results
- [Figure: maximum capacity for a graph with two hotspots at (1,1) and (1,2) on a 5x5 grid, TXY vs. optimum]

Slide 15: WTXY
- Given a traffic pattern, choose the XY/YX ratio that gives the lowest maximum capacity
- Compute the ratio at programming time
- Load it into the C_xy field in the router
- The router chooses the XY route with probability C_xy, otherwise YX
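One way to picture the programming-time computation is a plain sweep over candidate ratios. The sketch below is an assumption-laden illustration, not the authors' procedure: it assumes unit-rate flows from every node to two hotspots at (1, 1) and (1, 2) on a 5x5 grid, a 0.01 search step, and the hypothetical helpers `order_loads`, `max_load_for_ratio`, `best_ratio` and `pick_order`.

```python
import random
from collections import defaultdict

def route(src, dst, order):
    """Directed links of a dimension-ordered route; order is "XY" or "YX"."""
    (x, y), (dst_x, dst_y) = src, dst
    links = []
    for axis in order:
        if axis == "X":
            while x != dst_x:
                nxt = x + (1 if dst_x > x else -1)
                links.append(((x, y), (nxt, y)))
                x = nxt
        else:
            while y != dst_y:
                nxt = y + (1 if dst_y > y else -1)
                links.append(((x, y), (x, nxt)))
                y = nxt
    return links

def order_loads(flows, order):
    """Link loads if every flow used the given dimension order."""
    loads = defaultdict(float)
    for (src, dst), rate in flows.items():
        for link in route(src, dst, order):
            loads[link] += rate
    return loads

def max_load_for_ratio(xy, yx, c_xy):
    """Expected busiest-link load when a fraction c_xy of traffic goes XY."""
    links = set(xy) | set(yx)
    return max(c_xy * xy.get(l, 0.0) + (1 - c_xy) * yx.get(l, 0.0)
               for l in links)

def best_ratio(flows, steps=100):
    """Sweep c_xy over [0, 1] and keep the value with the lowest maximum."""
    xy, yx = order_loads(flows, "XY"), order_loads(flows, "YX")
    candidates = [k / steps for k in range(steps + 1)]
    return min(candidates, key=lambda c: max_load_for_ratio(xy, yx, c))

def pick_order(c_xy, rng):
    """Per-packet router decision: XY with probability c_xy, else YX."""
    return "XY" if rng.random() < c_xy else "YX"

# Assumed traffic: every node sends 1 unit to each of two hotspots.
hotspots = [(1, 1), (1, 2)]
nodes = [(x, y) for x in range(5) for y in range(5)]
flows = {(s, h): 1.0 for h in hotspots for s in nodes if s != h}
xy, yx = order_loads(flows, "XY"), order_loads(flows, "YX")
c_xy = best_ratio(flows)
txy_max = max_load_for_ratio(xy, yx, 0.5)
wtxy_max = max_load_for_ratio(xy, yx, c_xy)
```

With this assumed traffic the sweep settles on a ratio below 0.5, that is, it sends more traffic YX than XY, and the resulting maximum is lower than the even TXY split. The loads are expected values; actual per-packet choices made by `pick_order` fluctuate around them.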

Slide 16: TXY, WTXY Limitation
- Traffic split: packets of the same flow take different paths
- Delays may cause out-of-order arrivals
- Re-ordering buffers are costly
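The reordering problem can be illustrated with a toy timing model. It assumes one packet is sent per time unit, packets alternate between two paths, and each path has a fixed delay; the delays (2 and 5) and the function names are hypothetical.

```python
def arrival_order(num_packets, delay_a, delay_b):
    """Sequence numbers in arrival order when packets alternate paths.

    Packet i is sent at time i; even packets use path A, odd ones path B.
    """
    arrivals = []
    for seq in range(num_packets):
        delay = delay_a if seq % 2 == 0 else delay_b
        arrivals.append((seq + delay, seq))
    arrivals.sort()
    return [seq for _, seq in arrivals]

def reorder_buffer_peak(order):
    """Largest number of packets held while waiting for an earlier one."""
    expected, held, peak = 0, set(), 0
    for seq in order:
        held.add(seq)
        while expected in held:  # release everything now in sequence
            held.remove(expected)
            expected += 1
        peak = max(peak, len(held))
    return peak

split_order = arrival_order(6, delay_a=2, delay_b=5)
single_path_order = arrival_order(6, delay_a=2, delay_b=2)
```

With unequal delays the destination receives packets 0, 2, 1, 4, 3, 5 and must buffer; with a single path (equal delays) the order is preserved and no buffer is needed.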

Slide 17: Ordered Routing Algorithms
- One route per source-destination (S-D) pair
  - No traffic splitting
- [Figure panels: Unordered Routing; Ordered Routing]

Slide 18: Source Toggle XY (STXY)
- The route is a function of source and destination ID
  - Bitwise XOR
- Very simple algorithm
- Maximum capacity is similar to TXY
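The slide only says the route is derived from a bitwise XOR of the two IDs. The sketch below fills in the rest with loudly labeled assumptions: node IDs are row-major integers, and the route bit is the parity of the XOR result (even parity means XY, odd means YX). The exact function used by the authors may differ.

```python
def node_id(x, y, width):
    """Assumed row-major node numbering."""
    return y * width + x

def stxy_order(src_id, dst_id):
    """Assumed rule: parity of the bits of (src_id XOR dst_id) picks the order.

    The same S-D pair always yields the same order, so packets of one
    flow never take different paths.
    """
    parity = bin(src_id ^ dst_id).count("1") % 2
    return "XY" if parity == 0 else "YX"

# Count how the ordered S-D pairs of a 4x4 grid divide between XY and YX.
width = 4
ids = [node_id(x, y, width) for y in range(width) for x in range(width)]
counts = {"XY": 0, "YX": 0}
for s in ids:
    for d in ids:
        if s != d:
            counts[stxy_order(s, d)] += 1
```

Under this assumed rule the pairs divide almost evenly between the two orders, which is the property that makes the scheme behave like a per-flow version of TXY.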

Slide 19: Weighted Ordered Toggle (WOT)
- Route per S-D pair is chosen at programming time
- Each source stores a routing bit for each destination
- Objective: minimize max link capacity
  - Optimal route assignment is difficult

Slide 20: WOT Min-max Route Assignment
- Initial assignment: STXY
- Make changes that reduce the capacity:
  - Find the most loaded link
  - Among the S-D pairs sharing this link, change the one whose re-routing minimizes the max capacity (if possible)
- Sub-optimal
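The iteration above can be sketched as a greedy loop. This is a reading of the slide under stated assumptions, not the authors' code: the example starts from an all-XY assignment instead of STXY to keep the sketch short, ties between equally loaded links are broken by also counting how many links sit at the maximum (an addition not in the slides), and all names are hypothetical.

```python
from collections import defaultdict

def route(src, dst, order):
    """Directed links of a dimension-ordered route; order is "XY" or "YX"."""
    (x, y), (dst_x, dst_y) = src, dst
    links = []
    for axis in order:
        if axis == "X":
            while x != dst_x:
                nxt = x + (1 if dst_x > x else -1)
                links.append(((x, y), (nxt, y)))
                x = nxt
        else:
            while y != dst_y:
                nxt = y + (1 if dst_y > y else -1)
                links.append(((x, y), (x, nxt)))
                y = nxt
    return links

def link_loads(flows, assign):
    loads = defaultdict(float)
    for (src, dst), rate in flows.items():
        for link in route(src, dst, assign[(src, dst)]):
            loads[link] += rate
    return loads

def cost(flows, assign):
    """(max link load, number of links at that load); lower is better."""
    loads = link_loads(flows, assign)
    peak = max(loads.values(), default=0.0)
    return peak, sum(1 for v in loads.values() if v == peak)

def flipped(order):
    return "YX" if order == "XY" else "XY"

def wot_assign(flows, initial):
    """Greedy min-max: re-route one S-D pair on the busiest link per step."""
    assign = dict(initial)
    while True:
        loads = link_loads(flows, assign)
        if not loads:
            return assign
        hot_link = max(loads, key=loads.get)
        best_pair, best_cost = None, cost(flows, assign)
        for pair in flows:
            if hot_link not in route(*pair, assign[pair]):
                continue
            trial = dict(assign)
            trial[pair] = flipped(assign[pair])
            trial_cost = cost(flows, trial)
            if trial_cost < best_cost:
                best_pair, best_cost = pair, trial_cost
        if best_pair is None:  # no single change helps: stop (sub-optimal)
            return assign
        assign[best_pair] = flipped(assign[best_pair])

# Assumed traffic: every node of a 3x3 grid sends 1 unit to a hotspot at (1, 1).
hotspot = (1, 1)
flows = {((x, y), hotspot): 1.0
         for x in range(3) for y in range(3) if (x, y) != hotspot}
initial = {pair: "XY" for pair in flows}
final = wot_assign(flows, initial)
before, after = cost(flows, initial)[0], cost(flows, final)[0]
```

Each accepted change strictly lowers the cost tuple, so the loop terminates. The resulting `final` dictionary corresponds to the per-destination routing bits that each source would store.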

Slide 21: Iteration Demonstration
[Figure labels: S1, S2, S3; D1, D2, D3]

Slide 22: Benchmarks
- Previous work considers uniform permutations
- Chips have one or more hotspots
  - CPU, on-chip memory, off-chip memory interface
- We use several hotspot traffic models
- We also use a real-world example

Slide 23: Single Hotspot

Slide 24: Two Hotspots
[Figures: Maximum Capacity; Design Envelope for various distances between the hotspots for WOT]

Slide 25: Three Hotspots
[Figure: maximum capacity vs. minimum distance between the hotspots]

Slide 26: Mixed Traffic Model
- Three parameters per node
  - A probability to be a hotspot
  - A probability to send data to a hotspot
  - A probability to send data to a non-hotspot
- Average improvement for WOT is 12% vs. TXY and 25% vs. XY
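A traffic pattern in the spirit of this model could be drawn as in the sketch below. It is one possible reading: the three probabilities are shared by all nodes instead of being set per node, every generated flow has unit rate, and the name `mixed_traffic` is hypothetical.

```python
import random

def mixed_traffic(nodes, p_hotspot, p_to_hotspot, p_to_other, rng):
    """Draw a hotspot set and a set of unit-rate flows.

    Each node becomes a hotspot with probability p_hotspot. Each ordered
    pair (src, dst) carries a flow with probability p_to_hotspot if dst
    is a hotspot and p_to_other otherwise.
    """
    hotspots = {n for n in nodes if rng.random() < p_hotspot}
    flows = {}
    for src in nodes:
        for dst in nodes:
            if src == dst:
                continue
            p = p_to_hotspot if dst in hotspots else p_to_other
            if rng.random() < p:
                flows[(src, dst)] = 1.0
    return hotspots, flows

nodes = [(x, y) for x in range(4) for y in range(4)]
hotspots, flows = mixed_traffic(nodes, 0.1, 0.8, 0.05, random.Random(1))
```

The returned `flows` dictionary has the same shape as the traffic used by the routing sketches, so different routing schemes can be compared on many random draws.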

Slide 27: Real-World Example
- Based on Bertozzi: video encoder
  - Mapping and placement are done manually

Slide 28: Real-World Example: Maximum Capacity
[Figure series: WOT; STXY; XY]

Slide 29: Summary
- A new NoC-based architecture for FPGA
- A design methodology for this architecture
- WOT routing algorithm
  - Balanced
  - In-order
  - Low cost