A Flexible Interconnection Structure for Reconfigurable FPGA Dataflow Applications Gianluca Durelli, Alessandro A. Nacci, Riccardo Cattaneo, Christian.

Presentation transcript:

A Flexible Interconnection Structure for Reconfigurable FPGA Dataflow Applications
Gianluca Durelli, Alessandro A. Nacci, Riccardo Cattaneo, Christian Pilato, Donatella Sciuto and Marco Domenico Santambrogio
Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria, Milano, IT
[durelli, nacci, rcattaneo, pilato,
20th Reconfigurable Architectures Workshop, May 20-21, 2013, Boston, USA

Rationale
Strive for performance in compute-intensive applications
Reconfigurable HW is well suited for certain classes of applications
  –Multimedia, computational biology, physical simulation
FPGAs are used in HPC systems
High maintenance costs
  –Need to share resources among users
Need to dynamically share and reuse components on the FPGA among different users

Outline
Goals
State of the Art
Proposed Solution
Design and Evaluation
Case Study
Conclusions and Future Work

Goals
Design an interconnection able to:
  –Create different pipelines reusing available components on the FPGA
  –Share the resources between different applications
  –Avoid inserting any stall in the pipeline
Target: FPGAs in an HPC scenario

State of the Art
Bus interconnection
  –Congestion problem
  –Does not scale
Network on Chip
  –Possible congestion problem
  –Good scalability
Both introduce unexpected delays in computation
  –Cannot guarantee performance when sharing the device between different users

Proposed Solution
Switch-based interconnection
  –Core inputs connected to interconnection outputs
  –Core outputs connected to interconnection inputs
  –Fully pipelined point-to-point communication
Data are read/written only when all the inputs are available
Can be configured by setting, for each input and output channel:
  –Switching configuration: multiplexer configuration to route information
  –From which clock cycle the channel is active
  –How much data has to be read/written through that channel
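The three per-channel settings listed above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware description: the names `ChannelConfig` and `run_switch` are hypothetical, each channel is assumed to move one data item per clock cycle while active, and input data is assumed to be available from the channel's start cycle onward.

```python
from dataclasses import dataclass


@dataclass
class ChannelConfig:
    source: int       # multiplexer setting: which interconnection input feeds this output
    start_cycle: int  # first clock cycle in which the channel is active
    count: int        # number of data items to move through the channel


def run_switch(inputs, configs, cycles):
    """Cycle-by-cycle model of the switch (hypothetical, for illustration only).

    inputs  : one list of data items per interconnection input
    configs : one ChannelConfig per interconnection output
    Returns one list of delivered items per interconnection output.
    """
    outputs = [[] for _ in configs]
    for t in range(cycles):
        for out, cfg in zip(outputs, configs):
            k = t - cfg.start_cycle  # index of the item this channel handles at cycle t
            if 0 <= k < cfg.count:   # active window: one item per cycle, no stalls
                out.append(inputs[cfg.source][k])
    return outputs


if __name__ == "__main__":
    streams = [[1, 2, 3, 4], [10, 20, 30, 40]]
    cfgs = [ChannelConfig(source=1, start_cycle=0, count=4),
            ChannelConfig(source=0, start_cycle=2, count=3)]
    print(run_switch(streams, cfgs, cycles=6))  # [[10, 20, 30, 40], [1, 2, 3]]
```

In this model the second output stays idle for two cycles and then forwards three items from input 0, which mirrors the "from which clock cycle" and "how much data" settings.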

Proposed Solution
Suited for dataflow/pipelined applications
Parameters can be extracted from a high-level description of the application and pipeline structure:
  –Possibility to automate the parameter extraction and interconnection design
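One way such a parameter extraction could look is sketched below. The slides do not describe the actual procedure, so everything here is an assumption: the function name `extract_parameters`, the port numbering (port 0 is the host, port p belongs to core p), the per-core latencies, and the rule that a channel becomes active once all upstream cores have filled their pipelines.

```python
HOST = 0  # assumed convention: interconnection port 0 is the host (PCIe) side


def extract_parameters(pipeline, port_of, latency, n_items):
    """Derive channel settings for a linear pipeline (hypothetical sketch).

    pipeline : ordered list of core names, e.g. ["GS", "ED"]
    port_of  : core name -> interconnection port index
    latency  : core name -> pipeline latency in clock cycles (assumed known)
    n_items  : number of data items streamed through the pipeline
    Returns {output port: (source input port, start cycle, item count)}.
    """
    config = {}
    prev_port = HOST
    start = 0
    for core in pipeline:
        # The output feeding this core selects the input driven by the previous stage.
        config[port_of[core]] = (prev_port, start, n_items)
        start += latency[core]  # data leaves this core 'latency' cycles later
        prev_port = port_of[core]
    # The last stage is routed back to the host.
    config[HOST] = (prev_port, start, n_items)
    return config


if __name__ == "__main__":
    ports = {"GS": 1, "GB": 2, "ED": 3}
    lat = {"GS": 2, "GB": 5, "ED": 4}  # illustrative latencies, not measured values
    print(extract_parameters(["GS", "ED"], ports, lat, n_items=100))
    # {1: (0, 0, 100), 3: (1, 2, 100), 0: (3, 6, 100)}
```

The returned triples correspond to the switching configuration, the activation cycle and the data amount of each channel.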

Implementation
Solution implemented with HLS:
  –HLS is well suited for dataflow/stencil loop synthesis
  –Simplifies HW development
  –Generation of compatible interfaces
Maxeler Technologies:
  –HPC dataflow computing exploiting FPGAs
  –Proprietary HLS starting from a Java-like description: the proposed interconnection solution is easily described in Java
MaxWorkstation 3A:
  –Intel i7 quad-core
  –Xilinx Virtex-6 XC6VSX475T
  –PCIe communication: maximum 8 channels/streams

Evaluation: Area Occupation
Area increment (10-30%) due to the increase in switching logic
The interconnection consumes up to 6% of the FPGA:
  –A lot of space remains for user cores

Evaluation: Frequency
Tested with pass-through cores to evaluate the maximum working frequency of the interconnection (300 MHz)
In real-life applications (Brain network with cores working at 200 MHz) the interconnection does not affect the critical path
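The reasoning behind the frequency result can be written as a short check. It assumes a single clock domain in which the slowest component sets the system clock; the function names are hypothetical, and the numbers are the ones quoted on the slide.

```python
def system_frequency_mhz(interconnect_fmax, core_fmaxes):
    """Achievable clock when all components share one clock domain (assumption)."""
    return min([interconnect_fmax] + list(core_fmaxes))


def interconnect_limits_clock(interconnect_fmax, core_fmaxes):
    """True only if the interconnection is slower than every user core."""
    return interconnect_fmax < min(core_fmaxes)


if __name__ == "__main__":
    # 300 MHz interconnection, cores working at 200 MHz (values from the slide)
    print(system_frequency_mhz(300, [200]))       # 200
    print(interconnect_limits_clock(300, [200]))  # False
```

With these figures the cores, not the interconnection, determine the clock, which is consistent with the statement that the interconnection does not affect the critical path.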

Case Study
Application:
  –Image processing pipeline (up to 4 stages): gray scale (GS), Gaussian blur (GB), edge detection (ED) filters and their combinations
Tested architectures: (A), (B), (C), (D)
Experiments:
  –Single execution of an N-stage pipeline
  –Batch execution of a workload of 100 random applications
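A batch workload of this kind could be generated as in the sketch below. The slides do not say how the random applications were drawn, so the choices here are assumptions: each application is a pipeline of 1 to 4 stages, each stage is picked uniformly from GS, GB and ED with repetition allowed, and a fixed seed keeps the workload reproducible. The name `random_workload` is hypothetical.

```python
import random

FILTERS = ("GS", "GB", "ED")  # gray scale, Gaussian blur, edge detection


def random_workload(n_apps=100, max_stages=4, seed=0):
    """Return n_apps random pipelines, each a list of 1..max_stages filter names."""
    rng = random.Random(seed)  # local generator, so results do not depend on global state
    workload = []
    for _ in range(n_apps):
        n_stages = rng.randint(1, max_stages)  # randint is inclusive on both ends
        workload.append([rng.choice(FILTERS) for _ in range(n_stages)])
    return workload


if __name__ == "__main__":
    apps = random_workload()
    print(len(apps))  # 100
    print(apps[:3])   # first three pipelines of the workload
```

Each generated pipeline could then be turned into channel settings and submitted to the architecture under test.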

Case Study: Single execution
[Two slides of figures for architectures (A), (B), (C) and (D); the figures are not part of the transcript]

Case Study: Batch execution
Proposed solution (D) does not introduce overhead in the overall execution time with respect to the other two architectures
Low system load:
  –Up to 30% reduction in the overall workload execution time

Case Study: Batch execution
Low system load (1-2 applications):
  –Proposed solution (D) does not introduce delays in the execution of a single application of the workload
Higher system loads (more than 2 applications):
  –10%-30% reduction in single application execution time

Conclusions and Future Work
Conclusions:
  –Design of an interconnection to support HW resource sharing in a multi-application scenario
  –Solution suited for dataflow/pipelined systems
  –Possibility to realize different pipeline configurations at run-time
Future work:
  –Design of a mapping/reconfiguration strategy to allocate user cores and configure new core instances at run-time
