GanesanP91 Synthesis for Partially Reconfigurable Computing Systems Satish Ganesan, Abhijit Ghosh, Ranga Vemuri Digital Design Environments Laboratory.

Presentation transcript:

GanesanP9, slide 1: Synthesis for Partially Reconfigurable Computing Systems
Satish Ganesan, Abhijit Ghosh, Ranga Vemuri
Digital Design Environments Laboratory, Dept of ECECS, University of Cincinnati
[satish,
This work is sponsored in part by the US Air Force, Wright Laboratory, WPAFB, under contract number F C-1043

GanesanP9, slide 2: Synthesis System Overview
[Flow diagram. Stages shown: Input Specification (VHDL / C), Translator, High-level Synthesis, Dynamic Reconfiguration Set Generation, Logic Elaboration, Layout Synthesis, Host-side Controller, Partially Reconfigurable FPGA.]

GanesanP9, slide 3: Target Architecture Model
[Diagram: a device divided into two portions, P1 and P2.]
Features:
  Partially reconfigurable device: a portion of the device can be reconfigured while the remaining part is still operational.
  The target device is split into two parts, P1 and P2.
  The design is split into sequential blocks, which are loaded onto the two portions of the device.
  Reconfiguration of one block is overlapped with execution of another.

GanesanP9, slide 4: Input Specification
Behavior specification in a VHDL/C subset, translated into an Intermediate Representation.
[Diagram: the Intermediate Representation as a graph of behavior blocks, Block 1 through Block 6.]
Behavior Block Input Format:
  Single thread of control
  Each block performs a set of computations
  Data transfer through the branch interface
  Supports control constructs

GanesanP9, slide 5: High-level Synthesis (HLS)
[Diagram: High-level Synthesis Engine, performing Scheduling, Allocation, and Binding.]
Inputs: Input Specification (Behavior Blocks), RTL Component Library, Area / Timing Constraints
Output: Register-Transfer Level Design (RTL Blocks)

GanesanP9, slide 6: High-level Synthesis (HLS)
Each behavior block in the block graph is synthesized separately.
[Diagram: HLS maps Block 1 through Block 6 to RTL Blk 1 through RTL Blk 6.]

GanesanP9, slide 7: RTL Model
[Diagram: Glushkovian Model. Labels shown: DATAPATH (net-list of components), CONTROLLER (finite state machine), Design I/O, Clock, Reset, Start, Finish, Flags, Controls.]
  Components in the datapath implement the operations specified in the behavior.
  The controller (FSM) provides the controls necessary for execution.
  HLS generates 4 signals: Clock (in), Reset (in), Start (in), Finish (out).
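The Start/Finish handshake of the controller can be illustrated with a small behavioral model. This is a minimal sketch, not the controller that the HLS tool generates: the three states, the class name `BlockController`, and the fixed step count `num_steps` are assumptions made for illustration. Only the four signal names come from the slide.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RUNNING = auto()
    DONE = auto()


class BlockController:
    """Hypothetical FSM for one RTL block: Reset/Start in, Finish out."""

    def __init__(self, num_steps: int) -> None:
        # num_steps stands in for the length of the block's schedule (assumed).
        self.num_steps = num_steps
        self.state = State.IDLE
        self.step = 0

    @property
    def finish(self) -> bool:
        """Finish (out) is asserted once the block has completed."""
        return self.state is State.DONE

    def clock(self, reset: bool = False, start: bool = False) -> None:
        """Advance the controller by one clock edge."""
        if reset:
            self.state = State.IDLE
            self.step = 0
        elif self.state is State.IDLE and start:
            self.state = State.RUNNING
            self.step = 0
        elif self.state is State.RUNNING:
            self.step += 1
            if self.step >= self.num_steps:
                self.state = State.DONE


if __name__ == "__main__":
    ctrl = BlockController(num_steps=3)
    ctrl.clock(reset=True)
    ctrl.clock(start=True)
    while not ctrl.finish:
        ctrl.clock()
    print("finish asserted after", ctrl.step, "datapath steps")
```

A host or neighbouring block would wait on `finish` before reading results, which is the role the Finish signal plays in the reconfiguration sequence.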

GanesanP9, slide 8: Dynamic Reconfiguration
[Diagram: the RTL block graph (RTL Blk 1 through RTL Blk 6) passes through DR. Labels shown after DR: RTL Blk 1, RTL Blk 2, RTL Blk 3|4, RTL Blk 5, RTL Blk 6.]
Input: RTL block graph, with each block having been separately synthesized
Output: Sequence of reconfiguration sets
  Each reconfiguration set has two blocks: one is being reconfigured while the other executes.
  Intermediate data between blocks is stored in board registers.

GanesanP9, slide 9: Dynamic Reconfiguration: Example
Step 1: RTL Block 1 is loaded on the device.
Step 2: RTL Block 1 is executed; RTL Block 2 is configured.
Step 3: RTL Block 1 completes execution; RTL Block 3 is reconfigured in place of RTL Block 1; RTL Block 2 is executed.
Step 4: Repeat Steps 2 and 3 until all RTL blocks have been loaded and executed.
[Diagram: RTL Blk 1 through RTL Blk 5.]
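The steps above can be sketched as a small generator of reconfiguration sets. This is an illustrative sketch that assumes a purely linear chain of blocks (no conditional branches). The function names and the tuple representation `(executing, configuring)` are hypothetical and not taken from the DRSG tool.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of one reconfiguration set:
# (block that executes, block that is configured at the same time).
ReconfigSet = Tuple[Optional[str], Optional[str]]


def reconfiguration_sets(blocks: List[str]) -> List[ReconfigSet]:
    """Pair each executing block with the next block to configure."""
    sets: List[ReconfigSet] = []
    if not blocks:
        return sets
    # Step 1: the first block is loaded while nothing executes.
    sets.append((None, blocks[0]))
    # Steps 2 and 3, repeated: block i executes while block i+1 is configured.
    for i, blk in enumerate(blocks):
        nxt = blocks[i + 1] if i + 1 < len(blocks) else None
        sets.append((blk, nxt))
    return sets


def partition_of(index: int) -> str:
    """Alternate blocks between the two device portions.

    Block 3 (index 2) lands in the same portion as Block 1 (index 0),
    matching "reconfigured in place of RTL Block 1".
    """
    return "P1" if index % 2 == 0 else "P2"


if __name__ == "__main__":
    chain = ["RTL Blk 1", "RTL Blk 2", "RTL Blk 3", "RTL Blk 4", "RTL Blk 5"]
    for executing, configuring in reconfiguration_sets(chain):
        print(f"execute: {executing!s:10}  configure: {configuring}")
```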

GanesanP9, slide 10: Latency Improvement
Latency of the design without the DRSG approach:
  $L_1 = \sum_{i=1}^{n} (R_i + E_i)$
Latency of the design with the DRSG approach:
  $L_2 = R_1 + \sum_{i=1}^{n} \max(R_{i+1}, E_i)$
where:
  $R_i$: reconfiguration time of the $i$-th block
  $E_i$: execution time of the $i$-th block
It is easily seen that $L_2 \le L_1$.
[Diagram: RTL Blk 1 through RTL Blk 5.]
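The two latency expressions can be evaluated directly. The sketch below makes one assumption that the slide leaves implicit: the term $R_{n+1}$ is taken as 0, since no block follows the last one. The timing values in the usage example are invented for illustration and are not measurements from the presentation.

```python
from typing import Sequence


def latency_without_overlap(R: Sequence[float], E: Sequence[float]) -> float:
    """L1: every block is reconfigured, then executed, strictly in sequence."""
    if len(R) != len(E) or not R:
        raise ValueError("R and E must be non-empty and of equal length")
    return sum(r + e for r, e in zip(R, E))


def latency_with_overlap(R: Sequence[float], E: Sequence[float]) -> float:
    """L2: reconfiguration of block i+1 overlaps execution of block i."""
    if len(R) != len(E) or not R:
        raise ValueError("R and E must be non-empty and of equal length")
    n = len(R)
    total = R[0]  # the first block must be loaded before anything can run
    for i in range(n):
        next_r = R[i + 1] if i + 1 < n else 0.0  # assumed: R_{n+1} = 0
        total += max(next_r, E[i])
    return total


if __name__ == "__main__":
    # Made-up times in microseconds, for illustration only.
    R = [100.0, 80.0, 120.0]
    E = [90.0, 150.0, 60.0]
    print("L1 =", latency_without_overlap(R, E))
    print("L2 =", latency_with_overlap(R, E))
```

Because $\max(a, b) \le a + b$ for non-negative times, each term of $L_2$ is bounded by the corresponding terms of $L_1$, which is why $L_2 \le L_1$. The saving equals the sum of $\min(R_{i+1}, E_i)$, the time during which reconfiguration is hidden behind execution.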

GanesanP9, slide 11: Handling Conditional Constructs
RTL Block 1 is a conditional block. Because there is a single thread of control, either RTL Block 2 or RTL Block 3 is executed, never both.
There are two approaches to handling conditional branching.
Approach I: host polling
  The host waits for the conditional predicate to be evaluated, then loads the appropriate branch.
  $L_1 = R_1 + \sum_{i=1}^{n} \max(R_{i+1}, E_i) + R_j$
  where $R_j$ is the reconfiguration time of the branch that is executed.
[Diagram: RTL Blk 1 branching to RTL Blk 2 and RTL Blk 3, followed by RTL Blk 4.]

GanesanP9, slide 12: Handling Conditional Constructs
Approach II: branch prediction
  The host loads one of the branches based on a user-given profile.
  Latency of the design if the correct branch was loaded:
    $L_1 = R_1 + \sum_{i=1}^{n} \max(R_{i+1}, E_i)$
  Latency of the design if the wrong branch was loaded:
    $L_2 = R_1 + \sum_{i=1}^{n} \max(R_{i+1}, E_i) + R_j$
  where $R_j$ is the reconfiguration time of the branch.
  $L_1 \le L_2$, always.
[Diagram: RTL Blk 1 branching to RTL Blk 2 and RTL Blk 3, followed by RTL Blk 4.]
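The two approaches can be compared with a few lines of arithmetic. In this sketch, `base` stands for the shared term $R_1 + \sum \max(R_{i+1}, E_i)$ and `r_branch` for $R_j$. The probability `p_correct` is an assumption about how a user-given profile might be summarized, and the expected-latency formula is a derived illustration, not something stated on the slides.

```python
def polling_latency(base: float, r_branch: float) -> float:
    """Approach I: the branch is always loaded after the predicate resolves."""
    return base + r_branch


def predicted_latency(base: float, r_branch: float, correct: bool) -> float:
    """Approach II: the extra reconfiguration is paid only on a misprediction."""
    return base if correct else base + r_branch


def expected_predicted_latency(base: float, r_branch: float,
                               p_correct: float) -> float:
    """Average latency if the profile is right with probability p_correct."""
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must lie in [0, 1]")
    return base + (1.0 - p_correct) * r_branch


if __name__ == "__main__":
    # Made-up values in microseconds, for illustration only.
    base, r_branch = 400.0, 50.0
    print("host polling:      ", polling_latency(base, r_branch))
    print("prediction, hit:   ", predicted_latency(base, r_branch, True))
    print("prediction, miss:  ", predicted_latency(base, r_branch, False))
    print("prediction, p=0.75:", expected_predicted_latency(base, r_branch, 0.75))
```

Under these formulas a misprediction costs the same as host polling, so prediction is never worse than polling and is better whenever the profile is right at least some of the time.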

GanesanP9, slide 13: Logic Elaboration
[Diagram: Logic Elaboration (VELAB) takes the Input RTL Specification and the RTL Component Library and produces an elaborated net-list file in EDIF format.]
Features:
  Pre-placed component library to aid layout synthesis
  RTL specification obtained from the HLS tool ASSERTA
  Net-list produced in EDIF format

GanesanP9, slide 14: Layout Synthesis
[Diagram: Layout Synthesis (XACT6000) takes the Input Net-list Specification and produces the FPGA bit-stream.]
Features:
  Manual placement is required to ensure place and route using XACT6000.
  Replaced blocks are placed in the same location as the blocks they substitute.
  Bitmap files are produced in cal format.

GanesanP9, slide 15: Host-side Controller
[Diagram: the Bitmap files and the Reconfiguration Set Sequence feed the Host-side Controller, yielding an RTR implementation of the design.]
Features:
  Manages the partially reconfigurable FPGA device
  Loads and executes bitmap files based on the reconfiguration sequence generated by the DRSG phase
  Device used is the Xilinx 6200

GanesanP9, slide 16: Results: Percentage Configuration Time

Design        Total rec.   Total exec   Overlap    Latency    % conf
4x4 2D FFT    929 us       1025 us      678 us     1276 us
4x4 1D DCT    1416 us      2008 us      1161 us    2263 us
16-tap FIR    338 us       200 us       0 us       538 us

The table presents the percentage of total time spent only in configuration using the synthesis flow.
The examples show significant improvements in overall latency.
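The figures in the table can be cross-checked: in every row, the latency equals total reconfiguration time plus total execution time minus the overlap. The sketch below verifies that relation and also computes one possible reading of "time spent only in configuration", namely the non-overlapped reconfiguration time as a share of latency. That reading is an assumption. The "% conf" values themselves are not in the transcript, so the computed shares should not be taken as the authors' reported numbers.

```python
from typing import Dict, Tuple

# (total rec., total exec, overlap, latency), all in microseconds,
# copied from the results table.
ROWS: Dict[str, Tuple[int, int, int, int]] = {
    "4x4 2D FFT": (929, 1025, 678, 1276),
    "4x4 1D DCT": (1416, 2008, 1161, 2263),
    "16-tap FIR": (338, 200, 0, 538),
}


def overlapped_latency(rec: int, exe: int, overlap: int) -> int:
    """Latency when `overlap` microseconds of reconfiguration are hidden."""
    return rec + exe - overlap


def exposed_config_percent(rec: int, overlap: int, latency: int) -> float:
    """Assumed metric: non-overlapped configuration time as % of latency."""
    return 100.0 * (rec - overlap) / latency


if __name__ == "__main__":
    for design, (rec, exe, overlap, latency) in ROWS.items():
        consistent = overlapped_latency(rec, exe, overlap) == latency
        share = exposed_config_percent(rec, overlap, latency)
        print(f"{design}: latency consistent = {consistent}, "
              f"exposed configuration = {share:.1f}% (assumed metric)")
```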

GanesanP9, slide 17: Conclusions and Future Work
Conclusions:
  Presented a synthesis system for partially reconfigurable FPGAs.
  Proposed a dynamic reconfiguration set generation strategy to improve overall design latency by reducing reconfiguration time.
  Results showed a considerable decrease in reconfiguration times.
Future work:
  Automate the procedure of generating run-time reconfigurable designs for partially reconfigurable FPGAs.