Metrics for Reconfigurable Architectures Characterization: Remanence and Scalability Pascal BENOIT G. Sassatelli – L. Torres – D. Demigny M. Robert – G. Cambon


Metrics for Reconfigurable Architectures Characterization: Remanence and Scalability
Pascal BENOIT, G. Sassatelli, L. Torres, D. Demigny, M. Robert, G. Cambon

Outline
- Context
- Remanence
- Operative Density
- Case Study: the Systolic Ring
- Conclusion and perspectives

Context
- SoC and customizable platform-based design
- Specifications: processing power, area, power consumption, etc.
- Figure labels: ASIC 1, ASIC 2, DSP, Reconfigurable Hardware (Coarse Grain), Reconfigurable Hardware (Fine Grain)
- We need metrics to compare!

Context
- Architecture characterization: processing power, power consumption, flexibility, parallelism potential, dynamism, silicon area, scalability, …
- Metrics: DeHon criterion, remanence, operative density
- Generalisation to architectural models: characterisation and metrics depend on architectural parameters
- "Comparing architectures with a minimum of criteria"

Remanence
- Definition (the formula image is not preserved in the transcript; only its symbols Fe and Fc survive)
  - N_PE: # of processing elements (PE)
  - Nc: # of PEs configurable per cycle
  - Fe: operating frequency
  - Fc: configuration frequency
- Characterizes the dynamism
  - # of cycles to (re)configure the whole architecture
  - Amount of data to compute between 2 configurations
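The remanence formula itself did not survive in this transcript. One reading that is consistent with the symbols listed above and with the interpretation "# of cycles to (re)configure the whole architecture" is R = (N_PE · Fe) / (Nc · Fc). This is an assumption, not a quotation of the slide. The Python sketch below computes it; the function name and the example values are hypothetical.

```python
def remanence(n_pe: int, n_c: int, f_e: float, f_c: float) -> float:
    """Remanence under the ASSUMED form R = (N_PE * Fe) / (Nc * Fc).

    n_pe: number of processing elements (N_PE)
    n_c:  number of PEs configurable per configuration cycle (Nc)
    f_e:  operating frequency (Fe), any unit
    f_c:  configuration frequency (Fc), same unit as f_e
    """
    if n_pe <= 0 or n_c <= 0 or f_e <= 0 or f_c <= 0:
        raise ValueError("all parameters must be strictly positive")
    # N_PE / Nc is the number of configuration cycles needed to rewrite
    # every PE; Fe / Fc converts that count into operating cycles.
    return (n_pe * f_e) / (n_c * f_c)


# Hypothetical machines, not values taken from the slides.
single_issue = remanence(n_pe=1, n_c=1, f_e=200e6, f_c=200e6)
small_array = remanence(n_pe=8, n_c=2, f_e=200e6, f_c=200e6)
print(single_issue, small_array)
```

Under this reading, a machine that can rewrite all of its PEs in one cycle has R = 1, and R grows as a smaller fraction of the array can be reconfigured per cycle.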

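Remanence
- Comparisons
  - Only 1 cycle to (re)configure the DSP
  - Few cycles to (re)configure coarse grain RA ([relation symbol lost in extraction] 8)
  - Many cycles to (re)configure fine grain RA
- Table columns: Name, Type, N_PE, Nc, F (MHz), R (numeric values are not preserved in the transcript)
  - ARDOISE: Fine Grain RA
  - Systolic Ring: Coarse Grain RA
  - DART: Coarse Grain RA
  - MorphoSys: Coarse Grain RA
  - TMS320C62: DSP VLIW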

Operative Density
- Definition
  - N_PE: # of PEs
  - A: core area (relative unit λ²)
  - Area can be expressed as a function of N_PE (architectural model)
- Characterizes
  - Fixed N_PE: # of operators per relative area unit
  - Variable N_PE: OD as a function of N_PE
- Area model: $A(N_{PE}) = N_{PE} \cdot A_{PE} + A_{interconnect}(N_{PE}) + A_{memory}(N_{PE}) + A_{sequencer}(N_{PE})$
- $OD(N_{PE})$ constant $\Leftrightarrow$ $A(N_{PE}) = k \cdot N_{PE}$ $\Rightarrow$ the architectural model is scalable
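The area model above can be sketched in Python to show why a linear A(N_PE) corresponds to a scalable model. The sketch assumes OD = N_PE / A, which is one reading of "# of operators per relative area unit"; the component area functions and all coefficients are hypothetical and chosen only for illustration.

```python
from typing import Callable

AreaFn = Callable[[int], float]


def core_area(n_pe: int, a_pe: float, a_interconnect: AreaFn,
              a_memory: AreaFn, a_sequencer: AreaFn) -> float:
    """A(N_PE) = N_PE*A_PE + A_interconnect + A_memory + A_sequencer."""
    return (n_pe * a_pe + a_interconnect(n_pe)
            + a_memory(n_pe) + a_sequencer(n_pe))


def operative_density(n_pe: int, area: float) -> float:
    """ASSUMED definition: operators per relative area unit, N_PE / A."""
    return n_pe / area


def linear_model(n_pe: int) -> float:
    # Hypothetical: every overhead grows linearly with N_PE.
    return core_area(n_pe, 40.0, lambda n: 5.0 * n,
                     lambda n: 4.0 * n, lambda n: 1.0 * n)


def quadratic_interconnect_model(n_pe: int) -> float:
    # Hypothetical: interconnect area grows with the square of N_PE.
    return core_area(n_pe, 40.0, lambda n: 0.5 * n * n,
                     lambda n: 4.0 * n, lambda n: 1.0 * n)


for n in (8, 16, 64):
    print(n,
          operative_density(n, linear_model(n)),
          operative_density(n, quadratic_interconnect_model(n)))
```

With the linear model the operative density stays the same for every N_PE, while the quadratic interconnect term makes it fall as the array grows.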

Operative Density
- Comparisons
  - DSP: sequencer area
  - ARDOISE: fine granularity
  - Coarse granularity reconfigurable architectures: scalability of interconnect resources?
  - Generalization to architectural models
- Table columns: Name, Type, N_PE, Area (Mλ²), OD(N_PE) (numeric values are not preserved in the transcript)
  - ARDOISE: Fine Grain RA
  - Systolic Ring (S=1, C=6, N=2): Coarse Grain RA
  - Systolic Ring (S=1, C=16, N=4): Coarse Grain RA
  - DART: Coarse Grain RA
  - MorphoSys: Coarse Grain RA
  - TMS320C62: DSP VLIW

Architectural Model Characterization - A Case Study: The Systolic Ring

Architectural model characterization: the Systolic Ring
- Architectural model
  - Based on a coarse-grained configurable PE

Architectural model characterization: the Systolic Ring
- Architectural model
  - Based on a coarse-grained configurable PE
  - Circular datapaths
- Figure labels: Dnode, Switch

Architectural model characterization: the Systolic Ring
- Architectural model
  - Based on a coarse-grained configurable PE
  - Circular datapaths
  - 3 parameters
    - C: # of layers
    - N: # of Dnodes per layer
- Figure labels: Dnode, Switch, layer 1 to layer 4
  - # of layers: 4 (C = 4)
  - # of Dnodes per layer: 2 (N = 2)

Architectural model characterization: the Systolic Ring
- Architectural model
  - Based on a coarse-grained configurable PE
  - Circular datapaths
  - 3 parameters
    - C: # of layers
    - N: # of Dnodes per layer
- Figure labels: layer 1 to layer 8
  - # of layers: 8 (C = 8)
  - # of Dnodes per layer: 2 (N = 2)

Architectural model characterization: the Systolic Ring
- Architectural model
  - Based on a coarse-grained configurable PE
  - Circular datapaths
  - 3 parameters
    - C: # of layers
    - N: # of Dnodes per layer
    - S: # of Rings
- Figure labels: layer 1 to layer 8
  - # of layers: 8 (C = 8)
  - # of Dnodes per layer: 2 (N = 2)
  - 1 Systolic Ring (S = 1)

Architectural model characterization: the Systolic Ring
- Architectural model
  - Based on a coarse-grained configurable PE
  - Circular datapaths
  - 3 parameters
    - C: # of layers
    - N: # of Dnodes per layer
    - S: # of Rings
- Figure
  - # of layers: 4 (C = 4)
  - # of Dnodes per layer: 2 (N = 2)
  - 4 Systolic Rings (S = 4)

Architectural model characterization: the Systolic Ring
- Architectural model
  - Based on a coarse-grained configurable PE
  - Circular datapaths
  - 3 parameters
    - C: # of layers
    - N: # of Dnodes per layer
    - S: # of Rings
  - Control units
    - Local Dnode units
- Figure labels: Dnode Sequencer

Architectural model characterization: the Systolic Ring
- Architectural model
  - Based on a coarse-grained configurable PE
  - Circular datapaths
  - 3 parameters
    - C: # of layers
    - N: # of Dnodes per layer
    - S: # of Rings
  - Control units
    - Local Dnode unit
    - Local Ring unit
- Figure labels: Local Ring Sequencer (one per ring, 4 shown)

Architectural model characterization: the Systolic Ring
- Architectural model
  - Based on a coarse-grained configurable PE
  - Circular datapaths
  - 3 parameters
    - C: # of layers
    - N: # of Dnodes per layer
    - S: # of Rings
  - Control units
    - Local Dnode unit
    - Local Ring unit
    - Global unit
- Figure labels: Global Sequencer, Local Ring Sequencer (one per ring, 4 shown)

Architectural model characterization: Remanence
- Only one Systolic Ring: S = 1
- N_PE = # of Dnodes = N*C*S = N*C
- Remanence formalisation, with k = C/N
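The relations N_PE = N*C*S and k = C/N can be captured in a small Python sketch. The class name and field names are hypothetical; the parameter values in the examples are the (C, N, S) combinations that appear on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystolicRingConfig:
    """Hypothetical container for the three Systolic Ring parameters."""
    c: int      # C: number of layers
    n: int      # N: number of Dnodes per layer
    s: int = 1  # S: number of rings

    def __post_init__(self) -> None:
        if self.c <= 0 or self.n <= 0 or self.s <= 0:
            raise ValueError("C, N and S must be strictly positive")

    @property
    def n_pe(self) -> int:
        """N_PE = # of Dnodes = N * C * S."""
        return self.n * self.c * self.s

    @property
    def k(self) -> float:
        """Shape ratio k = C / N."""
        return self.c / self.n


# Configurations mentioned on the slides.
for cfg in (SystolicRingConfig(c=4, n=2, s=1),
            SystolicRingConfig(c=8, n=2, s=1),
            SystolicRingConfig(c=4, n=2, s=4),
            SystolicRingConfig(c=16, n=4, s=1)):
    print(cfg, cfg.n_pe, cfg.k)
```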

Architectural model characterization: A(N_PE) formalisation for OD(N_PE)
- Systolic Ring layout (C = 4, N = 2, S = 1), 0.18 µm CMOS technology
- A(8) = 3.3 mm² = 407 Mλ²
- Area formalisation: A(N_PE) = f(N, C, S), with N_PE = N·C·S; it depends on the C/N ratio and on S
- Area formalisation calibrated on these results
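The two area figures on this slide can be cross-checked with a short Python sketch. It assumes the common scalable-design-rule convention that λ is half the feature size (0.09 µm for a 0.18 µm process); the slide does not state this, and the function name is hypothetical. The operative density line additionally assumes OD = N_PE / A.

```python
def mm2_to_mega_lambda2(area_mm2: float, feature_size_um: float) -> float:
    """Convert an area in mm² to millions of λ² (Mλ²).

    ASSUMPTION: λ is half the technology feature size.
    """
    lambda_um = feature_size_um / 2.0
    area_um2 = area_mm2 * 1e6          # 1 mm² = 1e6 µm²
    return area_um2 / (lambda_um ** 2) / 1e6


# Layout from the slide: C = 4, N = 2, S = 1, so N_PE = 8.
area_mlambda2 = mm2_to_mega_lambda2(3.3, 0.18)
od_8 = 8 / area_mlambda2               # PEs per Mλ², assuming OD = N_PE / A

print(f"A(8)  = {area_mlambda2:.1f} Mλ²")
print(f"OD(8) = {od_8:.4f} PE per Mλ²")
```

Under that assumption 3.3 mm² comes out at roughly 407 Mλ², which matches the value quoted on the slide.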

Architectural model characterization
- OD(N_PE) for 1 Systolic Ring (S = 1)
  - k = C/N in [0.25 ; 4]
  - Decreasing OD(N_PE)
- OD(N_PE) for several Systolic Rings
  - k = C/N = 4
  - Multi-ring instantiations increase scalability

Architectural model characterization: customisation and design technique
- Between 60 and 80 processing elements

Architectural model characterization: customisation and design technique
- Figure label: Design Space

Architectural model characterization
- Figure label: Design Space
- Highlighted region: best OD and remanence; worst interconnect resources and processing power

Architectural model characterization
- Figure label: Design Space
- Highlighted region: worst OD and remanence; best interconnect resources and processing power

Architectural model characterization
- R and OD can be integrated in CAD tools to observe the effects of the architectural parameters and to choose the best trade-offs in the design space
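As a rough illustration of what such a tool might do as a first step, the Python sketch below enumerates the (C, N, S) points whose N_PE = N*C*S falls between 60 and 80 processing elements and whose ratio k = C/N lies in [0.25 ; 4], the ranges quoted on the earlier slides. The function name, the search bounds `max_c`, `max_n`, `max_s` and the sort order are hypothetical. The sketch does not rank points by R or OD, because the calibrated area and remanence formulas are not preserved in this transcript.

```python
from itertools import product


def enumerate_design_space(n_pe_min: int = 60, n_pe_max: int = 80,
                           k_min: float = 0.25, k_max: float = 4.0,
                           max_c: int = 32, max_n: int = 32,
                           max_s: int = 4) -> list[dict]:
    """List Systolic Ring configurations inside the requested window.

    The search bounds (max_c, max_n, max_s) are arbitrary illustration
    limits, not values from the slides.
    """
    points = []
    for c, n, s in product(range(1, max_c + 1),
                           range(1, max_n + 1),
                           range(1, max_s + 1)):
        n_pe = n * c * s   # N_PE = N * C * S
        k = c / n          # shape ratio k = C / N
        if n_pe_min <= n_pe <= n_pe_max and k_min <= k <= k_max:
            points.append({"C": c, "N": n, "S": s, "N_PE": n_pe, "k": k})
    # Group by number of rings, then order by shape ratio.
    return sorted(points, key=lambda p: (p["S"], p["k"]))


design_points = enumerate_design_space()
for point in design_points[:5]:
    print(point)
```

Each returned point could then be scored with whatever R and OD models are available, so the designer can compare trade-offs across the window.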

Conclusion and perspectives
- Specifications: processing power, area, power consumption, etc.
- Figure labels: IP 1 (R 1, OD 1), IP 2 (R 2, OD 2), IP 3 (R 3, OD 3), IP n (R n, OD n)

Conclusion and perspectives
- Specifications: processing power, area, power consumption, etc.
- Figure labels: IP 1 (R 1, OD 1), IP 2 (R 2, OD 2), IP 3 (R 3, OD 3), IP n (R n, OD n)
- Architectural models comparisons

Conclusion and perspectives
- Specifications: processing power, area, power consumption, etc.
- Figure labels: IP 1 (R 1, OD 1), IP 2 (R 2, OD 2), IP 3 (R 3, OD 3), IP n (R n, OD n)
- Architectural model customisation

Thank You