1 © FASTER Consortium Proprietary
Novel Design Methods and a Tool Flow for Unleashing Dynamic Reconfiguration
Kyprianos Papadimitriou, Christian Pilato, Dionisios Pnevmatikatos, Marco D. Santambrogio, Catalin Ciobanu, Tim Todman, Tobias Becker, Tom Davidson, Xinyu Niu, Georgi Gaydadjiev, Wayne Luk, Dirk Stroobandt
Foundation for Research and Technology-Hellas (FORTH)
IEEE/IFIP Int’l Conference on Embedded and Ubiquitous Computing (EUC), Cyprus, Dec 5-7, 2012

2 Reconfiguration
"The process of physically altering the location or functionality of network or system elements. Automatic configuration describes the way sophisticated networks can readjust themselves in the event of a link or device failing, enabling the network to continue operation."
Gerald Estrin, 1960

3 Reconfigurable Technology
Technology for adaptable hardware systems
  – Add/remove components at run time or over the product lifetime
  – Flexibility at hardware speed (not quite ASIC)
  – Parallelism at hardware level (depending on application)
  – Ideally: alter function and interconnection of blocks dynamically
Implementation in:
  – Field Programmable Gate Arrays (FPGAs): fine grain; complex gates + memory blocks + DSP blocks, etc.
  – Coarse-grain chips: multiple ALUs, multiple (simple) programmable processing blocks, etc.

4 Technological Status - Opportunity
Programming has become very difficult
  – Impossible to balance all constraints manually and effectively
More than ever before:
  – Cores are free, reconfigurable computational horsepower (logic) is available on chip, and cores can be heterogeneous
Energy tends to be the #1 priority
  – Software must become energy- and space-aware
FASTER (Facilitating Analysis and Synthesis Technologies for Effective Reconfiguration) vision:
  – Optimize and meet changing requirements while taking advantage of the underlying complex architectures

5 Partners

6 Outline
Motivation
Scope
Design Methods
  1. High-level Analysis
  2. Partitioning Methodology
  3. Region-based Reconfiguration
  4. Micro-reconfiguration
  5. Baseline Scheduling and Mapping
  6. Verification
Tool Flow
Run-time System Manager
Experimental Systems

7 FASTER Motivation
Focus on fine-grain reconfiguration (but not limited to it)
Creating reconfigurable systems is not straightforward. The designer has to:
  – Identify the portions to be reconfigured
  – Establish a schedule that (a) respects dependencies, (b) achieves good performance, and (c) meets constraints
  – Manage system resources (mainly reconfiguration area)
  – Consider the reconfiguration cost
  – Verify a changing system
Tool support for these tasks is still quite basic

8 FASTER Scope
Include reconfigurability as an explicit design concept when designing systems with reconfigurable resources
Balance performance, power and area effectively
Propose new design methods for HW/SW systems and integrate them into a unified tool flow
Provide flexibility while keeping complexity low
Provide efficient and transparent runtime support

9 1. High-level Analysis
Automatically identify and exploit run-time reconfiguration opportunities
  – While optimizing resource utilization
Based on
  – Data Flow Graph (DFG)
  – Application parameters, e.g. input data size
  – Physical design constraints, e.g. area, memory bandwidth
Output
  – Estimated values for consumed area, computation time, reconfiguration time and power consumption
  – Identification of partitions (determination of data dependencies, idle functions, etc.)
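
The estimation step can be pictured with a minimal Python sketch. It assumes each DFG node has already been characterized with a hypothetical area figure, a computation time per input item and a partial-bitstream size, and it flags a kernel as a run-time reconfiguration candidate when its estimated computation time exceeds its estimated reconfiguration time. The `Kernel` class, the field names, the numbers and that rule of thumb are illustrative assumptions, not the FASTER high-level analysis itself.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    """One DFG node with hypothetical, pre-characterized costs."""
    name: str
    area_luts: int          # estimated area in LUTs
    ns_per_item: float      # estimated computation time per input item
    bitstream_bytes: int    # size of the kernel's partial bitstream


def estimate(kernels, n_items, config_bytes_per_ns):
    """Estimate area, computation time and reconfiguration time per kernel.

    n_items is an application parameter (input data size);
    config_bytes_per_ns is the assumed throughput of the configuration path.
    """
    report = {}
    for k in kernels:
        compute_ns = k.ns_per_item * n_items
        reconf_ns = k.bitstream_bytes / config_bytes_per_ns
        report[k.name] = {
            "area_luts": k.area_luts,
            "compute_ns": compute_ns,
            "reconf_ns": reconf_ns,
            # Assumed rule of thumb: loading a kernel at run time is only
            # worthwhile if its computation outweighs the reconfiguration cost.
            "rtr_candidate": compute_ns > reconf_ns,
        }
    return report


kernels = [Kernel("fir", 1200, 2.0, 400_000), Kernel("crc", 300, 0.5, 400_000)]
report = estimate(kernels, n_items=1_000_000, config_bytes_per_ns=0.4)
```

A real analysis would also account for power, memory bandwidth and the data dependencies between kernels; the sketch only shows how an application parameter (here `n_items`) feeds the estimates.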

10 2. Partitioning Methodology
Employ methods for
  – Partitioning tasks between SW and HW
  – Identifying the proper level of reconfiguration for HW tasks, i.e. none, region-based or micro-reconfiguration
  – Task graph transformation, e.g. clustering consecutive tasks assigned to the same processing element
Characteristics taken into account
  – Communication costs
  – Logic dedicated to cores
  – Physical design constraints, e.g. area, memory bandwidth
  – Power consumption
  – Computation time
  – Reconfiguration time
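
Task-graph clustering can be sketched in a few lines of Python. The sketch assumes a linear chain of tasks already listed in topological order, each tagged with the processing element (PE) chosen by partitioning; the task names and PE labels are hypothetical. Merging runs of consecutive tasks on the same PE leaves one data transfer per cluster boundary.

```python
from itertools import groupby


def cluster_consecutive(chain):
    """Merge consecutive tasks mapped to the same processing element.

    `chain` is a list of (task_name, pe) pairs in topological order.
    Returns a list of (pe, [task_names]) clusters.
    """
    return [(pe, [name for name, _ in group])
            for pe, group in groupby(chain, key=lambda task: task[1])]


chain = [("read", "CPU"), ("gray", "CPU"), ("sobel", "FPGA"),
         ("threshold", "FPGA"), ("write", "CPU")]
clusters = cluster_consecutive(chain)
transfers = len(clusters) - 1   # one PE-to-PE transfer per cluster boundary
```

Here five tasks collapse into three clusters, so only two CPU/FPGA transfers remain. A general DAG needs a more careful merge, since clustering non-adjacent tasks can create cycles between clusters.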

11 3. Region-based Reconfiguration
Function(s) encapsulated into a specific region of the FPGA
  – Process carried out at design time by creating bitfiles for specific regions
A region can be reconfigured while the rest of the FPGA executes
  – "On-the-fly reconfiguration"
FASTER research challenge: relocation support
  – Loading function(s) into a different region than the one they were originally created for
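
Relocation is only possible between regions with a compatible layout. The Python sketch below assumes a simplified column-based device in which every column has a single resource type (CLB, BRAM, DSP) across all rows, and each configuration frame is addressed by a (row, column, minor) tuple; `Region`, `can_relocate` and `relocate_frames` are hypothetical helpers. It ignores details a real relocator must handle, such as rewriting the frame address commands and the CRC inside the bitstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A reconfigurable region: `width` consecutive columns within one clock row."""
    name: str
    row: int
    first_col: int
    width: int


def column_signature(device_cols, region):
    """Resource types (CLB, BRAM, DSP, ...) of the columns the region covers."""
    return tuple(device_cols[region.first_col:region.first_col + region.width])


def can_relocate(device_cols, src, dst):
    # A bitstream built for `src` can be moved to `dst` only if both regions
    # have the same width and the same sequence of column resource types.
    return (src.width == dst.width
            and column_signature(device_cols, src) == column_signature(device_cols, dst))


def relocate_frames(frames, src, dst):
    """Shift (row, column, minor) frame addresses by the src -> dst offset."""
    d_row = dst.row - src.row
    d_col = dst.first_col - src.first_col
    return [(row + d_row, col + d_col, minor) for row, col, minor in frames]


device_cols = ["CLB", "CLB", "BRAM", "CLB", "CLB", "BRAM", "DSP"]
r0 = Region("R0", row=0, first_col=0, width=3)
r1 = Region("R1", row=1, first_col=3, width=3)
r2 = Region("R2", row=0, first_col=4, width=3)
```

R0 and R1 both cover CLB, CLB, BRAM, so a function built for R0 can be moved to R1; R2 covers CLB, BRAM, DSP and is rejected.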

12 4. Micro-reconfiguration
In some applications we can identify fast-changing inputs vs. slow-changing "parameters"
  – A parameter change triggers a small-scale reconfiguration that specializes the circuit dynamically
  – The result is a smaller and faster circuit than the original one
We want to
  – Identify the parameters (using a profiler vs. manually)
  – Create a bitfile with "holes"
  – Map parameter values to the reconfiguration bits for the missing "holes"
  – Perform fine-grain changes, which allows for fast reconfiguration
  – Extend the idea from logic (TLUT) to wires (TCON)
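
The "bitfile with holes" idea can be shown on a single lookup table. In a tunable LUT (TLUT) each truth-table bit is a Boolean function of the slow-changing parameters; this minimal Python sketch simply evaluates that function for one fixed parameter value and writes the resulting bits into the hole positions of a toy template. The template format, the hole positions and the helper names are assumptions for illustration, not a real bitstream layout.

```python
def tlut_truth_table(f, k, param):
    """Specialize a k-input LUT for one fixed parameter value.

    Each of the 2**k truth-table bits is f(inputs, param) evaluated with
    the parameter held constant.
    """
    return [int(bool(f(inputs, param))) for inputs in range(2 ** k)]


def fill_holes(template, holes, bits):
    """Write specialized bits into the hole positions of a template configuration."""
    if len(holes) != len(bits):
        raise ValueError("exactly one bit per hole is required")
    frame = list(template)
    for pos, bit in zip(holes, bits):
        frame[pos] = bit
    return frame


def matches(inputs, param):
    # Example circuit: compare a 2-bit input with a slow-changing 2-bit pattern
    return inputs == param


bits = tlut_truth_table(matches, k=2, param=2)
template = [1, None, None, None, None, 0]      # None marks a "hole"
frame = fill_holes(template, holes=[1, 2, 3, 4], bits=bits)
```

Changing `param` only changes the four hole bits, while the rest of the template stays fixed; that is why this kind of reconfiguration can be much smaller than reloading a whole region.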

13 4. Micro-reconfiguration (cont’d)

14 5. Baseline Scheduling and Mapping
Performed after generation of HW cores and interfaces
Cores characterized in terms of resources
  – To evaluate the compatibility of an implementation with a reconfigurable region that is a candidate for the mapping
  – To annotate each task with its associated implementation
Determination of reconfigurable regions
  – Number of regions; size; position; constraints
Provide an initial assignment of the tasks tagged for region-based reconfiguration onto specific regions
Supports alternative mappings
Baseline scheduling, used to drive the runtime scheduler
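
A baseline schedule for region-based reconfiguration can be sketched greedily. The Python sketch below assumes a linear chain of tasks (each depends on the previous one), a single configuration port, and that every task needs a fresh reconfiguration; a task may be configured into a free region while its predecessor is still running, which hides part of the reconfiguration cost. `Task` and `baseline_schedule` are hypothetical names, and the earliest-free-region rule is an illustrative choice, not the FASTER algorithm.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    reconf_time: int   # time to load the task's core into a region
    exec_time: int     # execution time once the core is loaded


def baseline_schedule(chain, n_regions):
    """Greedy schedule of a linear task chain onto n_regions regions.

    Assumes one configuration port and a fresh reconfiguration per task;
    a task may be configured while its predecessor still executes.
    Returns ([(task, region, cfg_start, run_start, run_end)], makespan).
    """
    region_free = [0] * n_regions   # time at which each region becomes free
    port_free = 0                   # time at which the configuration port is free
    prev_end = 0                    # end time of the predecessor task
    plan = []
    for t in chain:
        r = min(range(n_regions), key=lambda i: region_free[i])  # earliest free region
        cfg_start = max(port_free, region_free[r])
        cfg_end = cfg_start + t.reconf_time
        run_start = max(cfg_end, prev_end)   # wait for configuration and predecessor
        run_end = run_start + t.exec_time
        port_free, region_free[r], prev_end = cfg_end, run_end, run_end
        plan.append((t.name, r, cfg_start, run_start, run_end))
    return plan, prev_end


chain = [Task("A", 2, 5), Task("B", 2, 5), Task("C", 2, 5)]
plan1, makespan1 = baseline_schedule(chain, n_regions=1)
plan2, makespan2 = baseline_schedule(chain, n_regions=2)
```

With one region the makespan is 21; with two regions it drops to 17, because configuring B and C overlaps the execution of their predecessors.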

15 6. Verification
Check whether the behaviour of the optimized design (target) equals that of the unoptimized design (source)
Traditional approach: extensive simulation
  – Large test inputs; are all cases covered?
FASTER approach
  – Combine symbolic simulation with equivalence checking
In some cases static approaches aren't enough. Dynamic aspects of behaviour are to be verified at
  – compile time (virtual multiplexers to model mutually exclusive reconfigurable regions)
  – runtime (light-weight support for small impact on performance)

16 6. Verification (cont’d)
[Figure: verification flow. Design optimization: the compiler applies transformations to the source, producing the target. Symbolic simulation: source and target are compiled to simulation and run by a symbolic simulator on a symbolic input. Validation: a checker compares the output from the source with the output from the target and answers "Equivalent?" with either "Equivalent" (yes) or "Not equivalent, counter-example" (no).]
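
The source/target comparison can be illustrated with a small Python sketch. Symbolic simulation needs a dedicated engine, so this sketch swaps in plain exhaustive enumeration over an 8-bit input, which is only feasible for tiny input spaces; the designs (`source`, `target_ok`, `target_bad`) are hypothetical. The `virtual_mux` helper mimics the compile-time idea of modelling the mutually exclusive configurations of a reconfigurable region as a multiplexer selected by whichever configuration is loaded.

```python
MASK = 0xFF   # 8-bit datapath


def check_equivalence(source, target, width):
    """Compare two designs on every `width`-bit input.

    Returns (True, None) if they agree everywhere, else (False, counter_example).
    """
    for x in range(2 ** width):
        if source(x) != target(x):
            return False, x
    return True, None


def source(x):        # unoptimized design: multiply by 5
    return (x * 5) & MASK


def target_ok(x):     # optimized design: shift-and-add, 4x + x
    return ((x << 2) + x) & MASK


def target_bad(x):    # faulty optimization: 2x + x
    return ((x << 1) + x) & MASK


def virtual_mux(configs):
    """Model a reconfigurable region as a mux over mutually exclusive configurations."""
    return lambda sel, x: configs[sel](x)


region = virtual_mux({"fast": target_ok, "buggy": target_bad})
results = {name: check_equivalence(source, lambda x, n=name: region(n, x), 8)
           for name in ("fast", "buggy")}
```

The checker reports the first counter-example it finds, here x = 1 for the faulty shift-and-add.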

17 Shaping the Tool Flow - 1
System described in XML format
  – 4 independent XML parts are analyzed, generated and updated through iterations
Starting point is a C description of the application
  – Annotated with OpenMP pragmas
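
An XML system description can be consumed with Python's standard library. The schema below (an `application` element with `task` and `edge` children) and the edge-detection task names are hypothetical, chosen only to show how a task-graph part might be parsed and ordered for later tool-flow stages.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter   # Python 3.9+

# Hypothetical task-graph part of the XML system description
TASK_GRAPH_XML = """
<application name="edge_detection">
  <task name="gray"/>
  <task name="sobel"/>
  <task name="threshold"/>
  <edge from="gray" to="sobel"/>
  <edge from="sobel" to="threshold"/>
</application>
"""


def load_task_graph(xml_text):
    """Return (task names, {task: set of predecessor tasks})."""
    root = ET.fromstring(xml_text.strip())
    tasks = [t.get("name") for t in root.iter("task")]
    preds = {name: set() for name in tasks}
    for edge in root.iter("edge"):
        preds[edge.get("to")].add(edge.get("from"))
    return tasks, preds


tasks, preds = load_task_graph(TASK_GRAPH_XML)
order = list(TopologicalSorter(preds).static_order())
```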

18 Shaping the Tool Flow - 2
[Figure: tool-flow stages: high-level analysis; estimation of metrics (power, speed, area); application task profiling and identification of reconfigurable cores; optimization of the application for region-based and micro-reconfiguration; compile-time baseline scheduling and mapping of cores onto reconfigurable regions. XML parts exchanged between stages: Platform Architecture, App Task Graph, Performance Characteristics.]

19 Tool Flow - Putting It All Together
Incorporates all design methods
Design automation methodology to generate HW and SW components
Exploits dynamic reconfiguration for different target platforms
Runtime system support
Outcome quality is evaluated with regard to:
  – Amount of FPGA resources
  – Clock frequency
  – Reconfiguration time
  – Power and energy consumption

20 Run-Time System Manager (RTSM)
RTSM in traditional systems
  – Low-level operations that relieve the programmer from dealing with delicate operations
  – Actions transparent to the programmer
  – Implemented as a standard library
RTSM for partial and dynamic reconfiguration
  – Extend the OS capabilities
  – Seamless, easy integration into the existing system
  – Handle efficient on-line scheduling and placement of tasks
Advanced mechanisms need to be supported
  – Scheduling
  – Defragmentation = f(relocation, scheduling)
  – Configuration management (caching, prefetching)
  – Thermal management
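
Configuration caching can be sketched as a replacement policy over the loaded regions: if the requested configuration already sits in a region, no reconfiguration is needed. The Python sketch assumes all regions are interchangeable (any configuration fits any region) and uses least-recently-used (LRU) replacement as a stand-in policy; `ConfigCache` is a hypothetical name, not part of the FASTER RTSM.

```python
from collections import OrderedDict


class ConfigCache:
    """LRU cache of the configurations currently loaded in reconfigurable regions."""

    def __init__(self, n_regions):
        self.n_regions = n_regions
        self.loaded = OrderedDict()   # configuration name -> region, LRU entry first
        self.hits = 0
        self.misses = 0

    def request(self, config):
        """Return the region holding `config`, reconfiguring a region on a miss."""
        if config in self.loaded:
            self.hits += 1
            self.loaded.move_to_end(config)       # mark as most recently used
            return self.loaded[config]
        self.misses += 1                          # a reconfiguration is required
        if len(self.loaded) < self.n_regions:
            region = len(self.loaded)             # next still-empty region
        else:
            _, region = self.loaded.popitem(last=False)   # evict the LRU configuration
        self.loaded[config] = region
        return region


cache = ConfigCache(n_regions=2)
regions = [cache.request(c) for c in "ABACB"]
```

For the request sequence A, B, A, C, B on two regions, only the second A is a hit, so four reconfigurations are issued; a prefetching RTSM would try to start those loads before the requests arrive.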

21 Configuration Content Agnostic ISA I/F
Based on the Molen model
FPGA viewed as a co-processor that extends the GPP architecture
Arbiter between the memory and the GPP
Register file of XREGs used to pass parameters between the GPP and the reconfigurable units
ISA needs to be expanded with more instructions
  – Minimal set: SET, EXECUTE, MOVTX and MOVFX
  – Additional instructions to support partial reconfiguration, prefetching, and GPP-FPGA parallel execution
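
The roles of the four minimal instructions can be illustrated with a toy Python model. It assumes that SET and EXECUTE take the address of a configuration's microcode, that MOVTX copies a GPP register value into an XREG and MOVFX copies an XREG back, and that the configured operation reads and writes the XREG file directly. `MolenModel`, its method names and the adder are hypothetical; in the Molen model the XREGs mainly pass parameters, and the reconfigurable unit can also access memory through the arbiter.

```python
class MolenModel:
    """Toy model of a Molen-style GPP / reconfigurable-unit interface."""

    def __init__(self, n_xregs, microcode):
        self.xregs = [0] * n_xregs     # exchange registers (XREGs)
        self.microcode = microcode     # address -> operation on the XREG file
        self.configured = set()        # addresses loaded by SET

    def movtx(self, xreg, value):      # MOVTX: GPP register value -> XREG
        self.xregs[xreg] = value

    def set_config(self, address):     # SET: load the configuration at `address`
        if address not in self.microcode:
            raise KeyError(f"no configuration microcode at {address:#x}")
        self.configured.add(address)

    def execute(self, address):        # EXECUTE: run a configured operation
        if address not in self.configured:
            raise RuntimeError("EXECUTE issued before SET")
        self.microcode[address](self.xregs)

    def movfx(self, xreg):             # MOVFX: XREG -> GPP register
        return self.xregs[xreg]


def add_xregs(xregs):
    # Hypothetical operation: XREG2 <- XREG0 + XREG1
    xregs[2] = xregs[0] + xregs[1]


m = MolenModel(n_xregs=4, microcode={0x100: add_xregs})
m.movtx(0, 3)
m.movtx(1, 4)
m.set_config(0x100)
m.execute(0x100)
result = m.movfx(2)
```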

22 Actions at Design Time
Task configuration microcode
  – Stored at the Bitstream (BS) Address
  – BS length holds the bitstream size
  – Task Parameter Address (TPA) points to the input/output parameters
  – Task width/height
  – Execution Time Per Unit of Data (ETPUD)

23 Actions at Design Time (cont’d)
Micro-reconfiguration support
  – RT flag: reconfiguration type
  – N: the number of parameters of the parameterized configuration
  – N (parameter width, XREG index) pairs
  – A binary representation of the parameterized configuration data
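
The design-time microcode fields of both slides can be pictured as a packed descriptor. The byte layout below (little-endian, 32-bit addresses and sizes, 16-bit width and height, then the micro-reconfiguration part with one byte each for the RT flag, N and each parameter pair) is a hypothetical encoding chosen for illustration; the slides list the fields but not their sizes or order. `TaskMicrocode` and `BASE_FMT` are hypothetical names.

```python
import struct
from dataclasses import dataclass, field

# Hypothetical little-endian layout of the region-based part:
# BS address, BS length, TPA (uint32 each), width, height (uint16), ETPUD (uint32)
BASE_FMT = "<IIIHHI"


@dataclass
class TaskMicrocode:
    bs_address: int            # where the bitstream is stored
    bs_length: int             # bitstream size in bytes
    tpa: int                   # Task Parameter Address (input/output parameters)
    width: int                 # task width on the fabric
    height: int                # task height on the fabric
    etpud: int                 # Execution Time Per Unit of Data
    rt_flag: int = 0           # reconfiguration type (encoding assumed)
    params: list = field(default_factory=list)  # (parameter width, XREG index) pairs
    param_config: bytes = b""  # binary parameterized configuration data

    def pack(self):
        blob = struct.pack(BASE_FMT, self.bs_address, self.bs_length, self.tpa,
                           self.width, self.height, self.etpud)
        # Micro-reconfiguration part: RT flag, N, then N pairs, then the data
        blob += struct.pack("<BB", self.rt_flag, len(self.params))
        for param_width, xreg_index in self.params:
            blob += struct.pack("<BB", param_width, xreg_index)
        return blob + self.param_config

    def exec_time(self, data_units):
        # Estimated execution time = ETPUD x amount of data
        return self.etpud * data_units


region_task = TaskMicrocode(0x8000_0000, 400_000, 0x1000, 4, 20, 12)
micro_task = TaskMicrocode(0x8010_0000, 2_048, 0x1100, 1, 20, 3, rt_flag=1,
                           params=[(8, 0), (16, 1)], param_config=b"\x01\x02\x03")
```

Storing ETPUD per task lets the runtime estimate execution time as ETPUD times the amount of data, which is the kind of value the RTSM scheduler consumes.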

24 RTSM Scheduler
Responsible for
  – The time slot in which reconfiguration of a task module will occur
  – The portion of the FPGA in which a HW task will be placed
  – The time slot in which its execution will start
How
  – Input from a dependency/communication graph
  – Based on a list of criteria, e.g. reconfiguration time, area constraints, precedence between the modules, fragmentation level, power
  – Directions from baseline scheduling
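
Placement and fragmentation can be sketched with a one-dimensional model in which the reconfigurable area is a row of columns. The Python sketch uses first-fit placement and measures fragmentation as one minus the largest free block divided by the total free area; both are common textbook choices swapped in for illustration, and `ColumnFabric` is a hypothetical name.

```python
class ColumnFabric:
    """1-D model of the reconfigurable area as a row of equal columns."""

    def __init__(self, n_cols):
        self.owner = [None] * n_cols   # task occupying each column, or None

    def free_blocks(self):
        """Return (start, length) for every run of free columns."""
        blocks, start = [], None
        for i, o in enumerate(self.owner + ["<end>"]):   # sentinel closes the last run
            if o is None and start is None:
                start = i
            elif o is not None and start is not None:
                blocks.append((start, i - start))
                start = None
        return blocks

    def place(self, task, width):
        """First-fit placement: return the start column, or None if nothing fits."""
        for start, length in self.free_blocks():
            if length >= width:
                self.owner[start:start + width] = [task] * width
                return start
        return None

    def remove(self, task):
        self.owner = [None if o == task else o for o in self.owner]

    def fragmentation(self):
        """1 - largest free block / total free columns (0.0 = one contiguous block)."""
        blocks = self.free_blocks()
        total = sum(length for _, length in blocks)
        if total == 0:
            return 0.0
        return 1 - max(length for _, length in blocks) / total


fab = ColumnFabric(10)
a = fab.place("A", 3)      # columns 0-2
b = fab.place("B", 4)      # columns 3-6
fab.remove("A")
frag = fab.fragmentation()
c = fab.place("C", 4)      # no free block of 4 columns remains
```

After A is removed, six columns are free but split into two blocks of three, so a four-column task no longer fits; this is the situation that defragmentation through relocation is meant to resolve.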

25 RTSM Scheduler Input Requirements
Static parameters, i.e. determined at compile time and not changed at runtime
  – Size of reconfigurable areas
  – Reconfiguration time = f(bitstream size, reconfiguration mechanism + path)
  – ETPUD (Execution Time Per Unit of Data)
  – Tasks assigned to be executed on fixed Processing Elements (PEs), i.e. the CPU or static HW, or on certain reconfigurable areas
Dynamic parameters, i.e. updated at runtime
  – ABD
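
The static inputs listed above are enough for a simple per-task decision. The Python sketch assumes that reconfiguration time equals bitstream size divided by the throughput of the configuration path, and picks the processing element with the lowest estimated completion time: ETPUD times the amount of data, plus the reconfiguration time when the task is not already loaded. The classes, the throughput value and the cost model are illustrative assumptions, not the FASTER scheduler.

```python
from dataclasses import dataclass


@dataclass
class PE:
    name: str
    reconfigurable: bool
    path_bytes_per_us: float = 0.0   # assumed configuration-path throughput


@dataclass
class Task:
    name: str
    bitstream_bytes: int
    etpud_us: dict                   # PE name -> µs per unit of data


def reconfig_time_us(bitstream_bytes, path_bytes_per_us):
    # Reconfiguration time = f(bitstream size, mechanism + path), here a simple ratio
    return bitstream_bytes / path_bytes_per_us


def pick_pe(task, pes, data_units, loaded):
    """Return (PE name, estimated time) with the lowest estimated completion time.

    `loaded` maps a reconfigurable PE name to the task currently configured on it.
    """
    def cost(pe):
        t = task.etpud_us[pe.name] * data_units
        if pe.reconfigurable and loaded.get(pe.name) != task.name:
            t += reconfig_time_us(task.bitstream_bytes, pe.path_bytes_per_us)
        return t

    best = min(pes, key=cost)
    return best.name, cost(best)


pes = [PE("cpu", False), PE("region0", True, path_bytes_per_us=400.0)]
sobel = Task("sobel", bitstream_bytes=400_000, etpud_us={"cpu": 10.0, "region0": 1.0})
```

A short job (50 units) stays on the CPU because the reconfiguration would dominate, a longer one (200 units) moves to the region, and once `sobel` is already loaded the reconfiguration cost disappears.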

26 Experimental System - Embedded
Fixed interface for communication of cores with a runtime manager
Scheduling policies implemented as libraries
Software cores taken into account during exploration
Edge Detection app running on a XUPV5 FPGA board
  – An embedded processor used as the execution manager
  – A 2nd processor for execution of SW tasks and reconfiguration
GUI for the designer (minimizes errors in the XML file)
  – Add implementations
  – Task mapping
  – Selection of parameters of the architecture (e.g. memory addresses)

27 Experimental System - Desktop
XUPV5 FPGA board plugged into a PCIe x1 slot
  – CentOS; transactions performed through DMA (1.5 Gbps)
SW components include a user application and a kernel driver
  – The user application issues an IOCTL call to send/receive data to/from the kernel driver
  – The driver is responsible for low-level data transfer
Reconfiguration performed through JTAG
  – Using the vendor's USB programmer
Host SW awaits the user's selection
  – Different bitstreams stored on the host HDD
  – Select a precompiled circuit; configure the FPGA; control communication between host and FPGA; deliver results back to the user; operation transparent to the user (runtime system)

28 Summary
Reconfiguration feature inserted at early stages of system design
New design methods combined to form a new tool flow
Abstract view of the system, yet efficient use of resources
Target application domains: embedded, desktop, high-performance computing