Workshop - November 2011 - Toulouse. J. LACHAIZE (Astrium): High Level Synthesis.



Application to industrial case studies: Astrium
[Figure: design flow diagram. Labels: Global SoC spec., SoC architecture, system requirements, platform assembly, system/HW/SW properties, TLM LT, TLM AT, HLS, metrics, traffic generators, instruction set simulator, C/C++/ASM software, functionality, functionality + timing, functional validation, SW performance validation, co-simulation/co-emulation, IP-XACT, SoC header generation, RTL, silicon, software device execution, requirement traceability.]

Overview
- Algorithm
- Process
- C translation
- C adaptation to GAUT
- First iteration
- Simulation

Algorithm

Process
- Reference model: MATLAB code
- Manual transformation of MATLAB to C code
- Validation based on 3 reference cases
- Output comparison (bit-accurate objective)
- Identification of C functions that can yield better performance in HW
- Synthesis of C code (GAUT)
- Testbench generation (GAUT)
- Test of generated code (ModelSim®)
- Iteration on IO control

Algorithm -> HLS Validation
- Intermediate results required
- Validity criteria (computation precision)
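The output comparison against the reference cases can be sketched in C as follows. This is a minimal sketch under assumptions: the helper names `count_mismatches` and `within_tolerance`, the `int` sample type and the flat buffer layout are hypothetical and do not come from the presentation. The first helper expresses the bit-accurate objective, the second a relaxed validity criterion on computation precision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: counts the samples where the candidate output
 * (e.g. the C model) differs from the reference output (e.g. MATLAB).
 * A result of 0 means the two outputs are bit-accurate. */
size_t count_mismatches(const int *reference, const int *candidate, size_t n)
{
    size_t mismatches = 0;
    assert(reference != NULL && candidate != NULL);
    for (size_t i = 0; i < n; i++) {
        if (reference[i] != candidate[i]) {
            mismatches++;
        }
    }
    return mismatches;
}

/* Hypothetical relaxed criterion: returns 1 when every sample is within
 * max_abs_error of the reference, 0 otherwise. */
int within_tolerance(const int *reference, const int *candidate,
                     size_t n, long long max_abs_error)
{
    assert(reference != NULL && candidate != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Widen before subtracting so the difference cannot overflow. */
        long long diff = (long long)reference[i] - (long long)candidate[i];
        if (llabs(diff) > max_abs_error) {
            return 0;
        }
    }
    return 1;
}
```

The same helpers could be applied to intermediate results as well as to final outputs, which helps locate the stage where a mismatch first appears.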

HLS for architecture exploration
- Metrics (area, performance): about 20% pessimistic, but usable for trade-offs
- Helps the migration to bit-accurate arithmetic (ac_type/sc_type)
- HLS requires considering any IO architecture bottlenecks
- HLS incremental refinement through a try/test loop: heuristic approach
- Allows measuring the latency introduced by pipelining
- Separates the processing from the IO constraints
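To illustrate what migrating to bit-accurate arithmetic involves, here is a minimal sketch of fixed-point arithmetic in plain C. It does not use the ac_type/sc_type libraries mentioned above; the Q8.8 format, the type name `q8_8_t` and all function names are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 8            /* hypothetical format: 8 integer bits, 8 fractional bits */

typedef int16_t q8_8_t;        /* hypothetical 16-bit fixed-point type */

/* Clamp a wide intermediate value to the 16-bit range. */
static q8_8_t saturate_to_q8_8(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (q8_8_t)v;
}

/* Convert a real value to Q8.8, rounding to nearest and saturating. */
q8_8_t q8_8_from_double(double x)
{
    double scaled;
    assert(x == x);            /* reject NaN: converting it to an integer is undefined */
    scaled = x * (double)(1 << FRAC_BITS);
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;
    if (scaled > (double)INT16_MAX) return INT16_MAX;
    if (scaled < (double)INT16_MIN) return INT16_MIN;
    return (q8_8_t)scaled;     /* truncation after the +/-0.5 offset rounds to nearest */
}

/* Convert back to a real value, e.g. to compare against a floating-point reference. */
double q8_8_to_double(q8_8_t v)
{
    return (double)v / (double)(1 << FRAC_BITS);
}

/* Multiply two Q8.8 values: 32-bit product, rescale, then saturate.
 * Division (not a shift) is used so that negative products are well defined;
 * it truncates toward zero. */
q8_8_t q8_8_mul(q8_8_t a, q8_8_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return saturate_to_q8_8(product / (1 << FRAC_BITS));
}
```

Comparing `q8_8_to_double` results against the floating-point reference gives a direct measure of the precision lost for a given word length.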

High Level Synthesis: GAUT
- Academic tool
  - Public domain (CeCILL-B license)
  - Open source and free
- Dedicated to DSP applications
  - Data-dominated algorithms
- Inputs:
  - Algorithm written in bit-accurate C/C++
    - Bit-accurate integer and fixed-point types from Mentor Graphics
  - Synthesis constraints (average data throughput, clock, I/O constraints…)
- Outputs:
  - RTL architecture written in VHDL (IEEE 1076)
  - Simulation model in SystemC
  - Automated test-bench generation
[Figure: design flow diagram, repeated from the "Application to industrial case studies: Astrium" slide.]

Example: Static detection
- Conversion of the RGB pixel (i,j) into a pseudo-chromatic value.
- Generation of a bit mask giving the validity of pixel (i,j): the pixel is valid if the value is in a pre-defined range.

Pseudo-code:

    Val = Pix.R + Pix.G + Pix.B
    If LowThreshold <= Val <= HighThreshold then Mask = 1 else Mask = 0
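The pseudo-code above can be written as a self-contained C sketch. The type `rgb_pixel_t`, the function names, the flat pixel buffer and the 8-bit channel width are assumptions made for illustration; the presentation's own code relies on types and macros from a header that is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pixel type; the channel width is not given in the slides. */
typedef struct {
    uint8_t r;
    uint8_t g;
    uint8_t b;
} rgb_pixel_t;

/* Returns 1 when the pseudo-chromatic value R + G + B lies inside
 * [low_threshold, high_threshold], 0 otherwise. */
uint8_t static_detection_pixel(rgb_pixel_t pix, int low_threshold, int high_threshold)
{
    int val = pix.r + pix.g + pix.b;
    return (uint8_t)((val >= low_threshold) && (val <= high_threshold));
}

/* Applies the per-pixel test to a whole image stored as a flat buffer
 * and writes one mask entry per pixel. */
void static_detection(const rgb_pixel_t *image, uint8_t *mask, size_t n_pixels,
                      int low_threshold, int high_threshold)
{
    assert(image != NULL && mask != NULL);
    for (size_t i = 0; i < n_pixels; i++) {
        mask[i] = static_detection_pixel(image[i], low_threshold, high_threshold);
    }
}
```

Each mask entry depends only on its own pixel, so there is no dependency between loop iterations.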

Original C code

    #include "socket.h"
    #include <stdio.h> /* for printf */

    void staticDetection(T_PARAMS *params, T_IMAGE image, T_MASK mask)
    {
        int ligne, colonne;   /* row and column indices */
        int tmp;              /* pseudo-chromatic value: R + G + B */

    #if DEBUG_STATICDETECTION
        printf("--> StaticDetection \n");
    #endif
        for (colonne = 0; colonne < params->imSizeC; colonne++) {
            for (ligne = 0; ligne < params->imSizeL; ligne++) {
                tmp = MAT(image, ligne, colonne).R
                    + MAT(image, ligne, colonne).G
                    + MAT(image, ligne, colonne).B;
                if ((tmp >= params->lowThres) && (tmp <= params->highThres)) {
                    MAT(mask, ligne, colonne) = (unsigned char)1;   /* valid pixel */
                } else {
                    MAT(mask, ligne, colonne) = 0;                  /* invalid pixel */
    #if DEBUG_STATICDETECTION
                    printf("StaticDetection Invalid pixel (%d,%d) \n", ligne, colonne);
    #endif
                }
            }
        }
    }

C transformation
- GAUT needs a "main" function
  - Inputs passed by value
  - Outputs passed by address (*)
- Remove non-synthesisable code (e.g. printf)
- Complete path for include files (absolute or relative to the C file); corrected in the latest GAUT release
=> Use of pragma

GAUT C code

    #include "../include/socket.h"
    #ifndef GAUT
    #include <stdio.h>   /* only needed outside the GAUT build */
    #endif

    #ifdef GAUT
    /* Thresholds are fixed constants in the GAUT build */
    static const int hiThres = 1022;
    static const int lowThres = 2;
    #endif

    #ifdef GAUT
    int main(T_PARAMS *params, T_IMAGE image, T_MASK mask)
    #else
    void staticDetection(T_PARAMS *params, T_IMAGE image, T_MASK mask)
    #endif
    {
        int ligne, colonne;
        int tmp;
        ….
    }

Graph Generation

Code impact

VHDL simulation

Synthesis improvement
- IO constraints select the bus used by each data item, which allows the transfers to be serialized or parallelized
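The effect of serializing or parallelizing the data transfers can be estimated with a very simple model. This is a sketch under a stated assumption, not GAUT's actual scheduling: each bus is assumed to carry one word per clock cycle, and the function name `transfer_cycles` is hypothetical.

```c
#include <assert.h>

/* Hypothetical model: number of cycles needed to move `words` data words
 * when `buses` buses are available and each bus carries one word per cycle. */
unsigned transfer_cycles(unsigned words, unsigned buses)
{
    assert(buses > 0);
    /* Ceiling division written without an addition that could overflow. */
    return words / buses + (words % buses != 0 ? 1u : 0u);
}
```

Under this model, the three colour components of one pixel need 3 cycles on a single data bus and 1 cycle on three parallel buses.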

Single Data bus

Parallel data buses

GAUT REX (lessons learned) 1/2
- Mainly limited to pipelinable algorithms (which restricts its usage in the industrial world)
- Strong effort needed to write synthesisable C code
- Strong effort needed to write the constraint file
- Some limitations in the HDL generation
  - Missing functions (division)
  - Not critical and partially corrected
  - Generation of non-synthesisable VHDL (for edge cases)
  - Multiple outputs are not synchronous

GAUT REX (lessons learned) 2/2
- Powerful academic tool for Data Flow Graphs
- Good support of Xilinx targets
- ATC18RHA targeting: study on-going
- Generated component directly pluggable on a bus (not used)
- Generated HDL efficiency (number of gates and speed)
- Further evolution in the frame of project P
  - CDFG support
  - IO communication pattern instantiation

HLS usage
- Fast algorithm exploration (eases iteration)
- Area and achievable speed estimation
- HW/SW partitioning trade-offs
- Improves early confidence in the design
- Delays the design freeze
- Data exchange characterisation
- Helps identify the factorisable operators (use of intermediate representation)

HLS highlights
- Eases IP maintenance/evolution.
- Requires both hardware competence and software skills.
- It is quite natural to transform MATLAB to C and then to RTL, but it is a trap: the C implementation is optimized for SW, not for HW, and the SW optimizations can be counterproductive.
- Not optimal for data handling (FIFO, cache, prefetch)
- Manager's view: no gain in the development process itself, but a gain in the exploration process, which helps avoid some dead ends

GAUT expected enhancement
- Pipelining needs to be externally handled
  - Valid signal after the pipeline fill
  - Synchronization of multiple outputs on the same edge
- Control of loop unrolling
  - Automatic loop unrolling under constraint: no manual override
  - Selection of the loop to be unrolled
- Output timing constraint propagation
- Add traceability between the C code and the generated VHDL code
  - Map each C variable to its HDL datapath
- Operator set extension (e.g. fixed-point division)
- IO management (optimal data organization, ...)
- Would require some additional work for tool qualification (documentation, validation of generated HDL, IO interface configuration/control, ...)
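As an illustration of what manual control over loop unrolling means at the C level, the sketch below shows the same accumulation written rolled and unrolled by a factor of 4. The function names and the unrolling factor are assumptions for illustration; this is source-level unrolling, not a GAUT feature or option.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled version: one addition per iteration, each depending on the previous one. */
long long sum_rolled(const int *data, size_t n)
{
    long long acc = 0;
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        acc += data[i];
    }
    return acc;
}

/* Unrolled by 4 with independent accumulators: the four additions in the
 * loop body do not depend on each other, so a synthesis tool is free to
 * schedule them on parallel adders. */
long long sum_unrolled4(const int *data, size_t n)
{
    long long acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i = 0;
    assert(data != NULL || n == 0);
    for (; i + 4 <= n; i += 4) {
        acc0 += data[i];
        acc1 += data[i + 1];
        acc2 += data[i + 2];
        acc3 += data[i + 3];
    }
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++) {
        acc0 += data[i];
    }
    return acc0 + acc1 + acc2 + acc3;
}
```

Both functions return the same result; the unrolled form trades more operators (area) for fewer iterations, which is the kind of trade-off a designer would want to select per loop.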

Perspectives
- ASTRIUM is convinced of the value of HLS
- The quality of the tools depends mainly on the basic library
- Control Data Flow support is mandatory
  - The majority of algorithms embed multiple phases.
- GAUT is accessible for preliminary studies and its performance is comparable with some commercial tools.
- Strong cooperation with Lab-STICC is foreseen to improve the tool in an on-going project.

Thank you for your attention. Any questions?