Some Trends in High-level Synthesis Research Tools Tanguy Risset Compsys, Lip, ENS-Lyon



2 Outline
- Context: Why high-level synthesis?
- HLS hard problems
- Some solutions in existing tools
- Some on-going projects

3 Context: Embedded Computing Systems design
SoC or MPSoC for multimedia applications will soon include:
- Network on chip
- Dozens of initiators (CPU, DMA, …)
- Mbytes of code
- Operating systems
- Shared memory coherency protocols
- …
SoC design problems:
- Time to market
- Design space exploration
- Software complexity

4 Some envisaged solutions
- Time to market: IP re-use, high-level design
- Design space exploration: fast prototyping and performance evaluation, refinement methodology (specification, algorithm, TLM, CABA)
- Software complexity: tools for embedded code generation / embedded OS
High-level synthesis is only a small part of the « High level Design » process

5 Definition of High Level Synthesis
HLS: generates a register-transfer level (RTL) description from a behavioral specification, in an automatic or semi-automatic way.
Input:
- A behavioral specification
- Design constraints
- Library of available RTL components
Output:
- RTL description
- Performance evaluations

6 Refinement: from algorithm to hardware
[Figure: refinement flow. Labels: IP block design; System application design; Matlab; C; block specification; algorithmic exploration; block implementation; RTL synthesis (VHDL, Verilog); algorithm domain; SoC platform design; abstract architecture; virtual prototype; Transaction Level Modeling; SoC Intermediate Representation; Architecture Description Language]

7 Abstraction levels for HLS
- AL = Algorithm: prior to HW/SW partition
- TLM = Transaction-Level Model: after HW/SW partition; models bit-true behavior, register bank, data transfers, system synchronisation; no timing needed
- T-TLM = Timed TLM (also PVT): TLM + timing annotation; refined communication model
- CABA = Cycle Accurate-Bit Accurate: models state at each clock edge
- RT = Register Transfer (ASIC flow entry point): synthesisable model

8 Pros and Cons
« Traditional » motivations:
- Fast design
- Safe design: formal refinement approach
- « Must be used » to cope with Moore's law
But!
- Commercial tools are not there yet
- A new tool is a big investment
- Designers have managed without it

9 New motivations?
- IP re-use: slightly change design parameters to re-use an IP
- New target technologies and languages (FPGA, SystemC, etc.): tools can easily re-target the designs
- CAD tool companies are investing a lot in « high-level-like » synthesis tools: Monet, Behavioral Compiler, VCC, …
- Technological advantage: traditional RTL design will be relocated to Asia

10 Outline
- Context: Why high-level synthesis?
- HLS hard problems
- Some solutions in existing tools
- Some on-going projects

11 HLS Hard Problems
- Huge design space: complex design space exploration, multi-criteria optimization techniques
- Integration into a design environment: lack of a standard interchange format; SoC simulation time is a crucial issue
- Acceptance by the designers: find a language common to SoC designers and tool designers
- Refinement technical problems (detailed hereafter)

12 HLS technical problems
Compilation occurs when the target architecture is precisely known. In HLS, the target architecture is only partially specified. Examples:
- Data-flow architecture / systolic arrays: pure RTL description
- FSM + data path: closer to a processor description
HLS technical problems:
- Initial specification format / language
- Specification refinement: fixed-point arithmetic
- Scheduling/mapping refinement: resource constraints
- Technological mapping refinement

13 Initial specification format
Restrictions on the expressivity of the input language are necessary … but designers hate new languages.
C-like languages (Handel-C, silicon-C, hardware-C, etc.) are actually hardware description languages.
Main problems:
- How to express parallelism/sequentiality: data-flow, CSP-like, process network, event-driven
- How to express both algorithmic and RTL descriptions
- How much expressivity: dynamic control, loops
- How to introduce constraints/hints

14 Fixed point arithmetic
Problem: translate a floating-point computation into a fixed-point computation.
- Most tools start with an initial fixed-point specification found by extensive simulation.
- Automatic techniques do not handle loops.
- For signal processing applications, signal processing theory can help (the transfer function is used to compute the signal-to-noise ratio).
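The simulation-based approach mentioned on this slide can be illustrated with a minimal sketch: quantize a signal to a given number of fractional bits and measure the resulting signal-to-quantization-noise ratio. The function names, the rounding mode, and the sine test signal are assumptions made for illustration, not taken from any of the tools discussed.

```python
import math

def to_fixed(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (no saturation modeled)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def sqnr_db(signal, frac_bits):
    """Signal-to-quantization-noise ratio in dB for a given fractional width."""
    quantized = [to_fixed(s, frac_bits) for s in signal]
    p_signal = sum(s * s for s in signal)
    p_noise = sum((s - q) ** 2 for s, q in zip(signal, quantized))
    return float("inf") if p_noise == 0 else 10 * math.log10(p_signal / p_noise)

# Hypothetical test signal: one period of a sine wave, 256 samples.
signal = [0.9 * math.sin(2 * math.pi * k / 256) for k in range(256)]
for bits in (4, 8, 12):
    print(bits, "fractional bits:", round(sqnr_db(signal, bits), 1), "dB")
```

Sweeping `frac_bits` until the ratio meets a target is the kind of exhaustive simulation the slide refers to; it says nothing about word-length growth inside loops, which is the hard part.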

15 Scheduling/Mapping
For a « basic block », resource-constrained scheduling is NP-hard, but widely studied.
Computations
- Currently, two ways to handle loops: unroll them, or keep them sequential
- Other solutions: use software pipelining theory, or use the polyhedral model
Memory and communication
- Memory mapping is usually strongly guided by the user; highly active research field (Catthoor, Darte)
- Communication refinement is also an important issue; highly dependent on the chosen computation model (Gajski, Kienhuis)
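Because the exact problem is NP-hard, tools usually rely on heuristics. The following is a minimal sketch of one classic heuristic, greedy list scheduling, assuming unit-latency operations, an acyclic dependence graph, and at least one unit of every resource kind used. The operation names and the alphabetical priority rule are illustrative assumptions; this is not the scheduler of Gaut or Ugh.

```python
def list_schedule(ops, deps, resources):
    """Greedy list scheduling of unit-latency operations.

    ops: dict op name -> resource kind (e.g. "mul", "add")
    deps: dict op name -> list of predecessor op names
    resources: dict resource kind -> units available per cycle
    Returns a dict op name -> start cycle.
    """
    start = {}
    cycle = 0
    while len(start) < len(ops):
        free = dict(resources)
        for op in sorted(ops):  # assumed priority: alphabetical order
            if op in start:
                continue
            # An op is ready once all predecessors finished in an earlier cycle.
            ready = all(p in start and start[p] < cycle for p in deps.get(op, []))
            if ready and free.get(ops[op], 0) > 0:
                start[op] = cycle
                free[ops[op]] -= 1
        cycle += 1
    return start

# Hypothetical data-flow graph of a 3-tap FIR: three products, two sums.
ops = {"m0": "mul", "m1": "mul", "m2": "mul", "a1": "add", "a2": "add"}
deps = {"a1": ["m0", "m1"], "a2": ["a1", "m2"]}
one_mul = list_schedule(ops, deps, {"mul": 1, "add": 1})
two_mul = list_schedule(ops, deps, {"mul": 2, "add": 1})
print(one_mul)
print(two_mul)
```

Comparing the two runs shows the area/latency trade-off that the user steers through resource constraints: a second multiplier shortens the schedule by one cycle.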

16 Technological mapping refinement
- Fine technological mapping is very target-dependent
- Predefined libraries are not precise enough: delays on wires, power consumption
- VLSI designers' « tricks » are difficult to integrate in tools
- Submicron technology constraints are changing too fast for high-level tools: crosstalk, capacitance

17 Outline
- Context: Why high-level synthesis?
- HLS hard problems
- Some solutions in existing tools
- Some on-going projects

18 Some solutions in existing tools
- Digital signal processing circuits: Gaut. Source: signal processing (one infinite loop). Target: RTL + FSM
- FSM + datapath: Ugh. Source: restricted C. Target: FSM + data path
- Regular computation and polyhedral model: MMAlpha. Source: functional specification. Target: systolic-like architectures

19 GAUT: Génération Automatique d'Unité de Traitement (automatic generation of processing units)
- Developed first at LASTI (Lannion) and then LESTER (Lorient): free
- Generates an RTL description from a behavioral description for signal processing algorithms
- Kernel technology: highly optimized resource-constrained scheduling
Inputs are:
- a behavioral VHDL description (one process repeated infinitely)
- libraries of pre-characterized operators
- some design constraints
Outputs are:
- a synthesizable RTL VHDL description (data path, memory, and communication units)
- a Gantt chart for I/O specification

20 Gaut design flow
[Figure: design flow. Inputs: behavioral description (VHDL), operator library, user constraints (latency, clock frequency, operators, allocation, etc.). Steps: compiling (analyzing, loop unrolling), producing a graph; synthesis (selection, scheduling, mapping). Outputs: RTL description (data path + control), memory and I/O specifications. File types shown: .src, .lib, .gc, .vhd, .mem]

21 Gaut: VHDL input code
Sequential instructions in one single process (no clock, no reset, no sensitivity list)
ENTITY fir IS
  PORT (xn: IN INTEGER; yn: OUT INTEGER);
END fir;
ARCHITECTURE behavioral OF fir IS
  ...
BEGIN
  PROCESS
    VARIABLE H, x: vecteur;
    VARIABLE tmp: INTEGER;
    VARIABLE i: CONTROL;
  BEGIN
    tmp := xn * H(0);
    FOR i IN 1 TO N-1 LOOP
      tmp := tmp + x(i) * H(i);
    END LOOP;
    yn <= tmp;
    FOR i IN N-1 DOWNTO 2 LOOP
      x(i) := x(i-1);
    END LOOP;
    x(1) := xn;
    WAIT FOR cadence;
  END PROCESS;
END behavioral;
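To make the behavior of this process concrete, here is a small software reference model of one process iteration per input sample. It is only a sketch: the delay line is assumed to start at zero (the VHDL leaves initial values unspecified) and the coefficient values are made up for the example.

```python
def make_fir(h):
    """Return a step function mirroring one iteration of the VHDL process."""
    n = len(h)
    x = [0] * n  # x[1..n-1] hold past inputs; x[0] is unused, as in the VHDL

    def step(xn):
        tmp = xn * h[0]
        for i in range(1, n):            # FOR i IN 1 TO N-1
            tmp += x[i] * h[i]
        for i in range(n - 1, 1, -1):    # FOR i IN N-1 DOWNTO 2: shift the delay line
            x[i] = x[i - 1]
        if n > 1:
            x[1] = xn
        return tmp                        # value driven on yn

    return step

# Hypothetical coefficients; feeding an impulse returns them one by one.
fir = make_fir([1, 2, 3])
outputs = [fir(v) for v in [1, 0, 0, 0]]
print(outputs)
```

A model of this kind is handy as a golden reference when checking the RTL produced by a synthesis tool against the behavioral source.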

22 Gaut: input code
Types:
- Bit, Boolean, Std_Logic, Integer (single size), Bit_Vector, Std_Logic_Vector
- Arrays (to be inlined)
Sequential instructions:
- Signal and variable assignments
- Only one level of if
- For and While loops (to be inlined)
- Procedure calls (to be inlined)
- Function calls corresponding to library elements

23 Gaut step 1: Source code transformation
- Control dependence elimination
- Loop unrolling
    before:
      y(0) := x(0) * h(0);
      for i in 1 to n-1 loop
        y(i) := y(i-1) + x(i) * h(i);
      end loop;
    after:
      y(0) := x(0) * h(0);
      y(1) := y(0) + x(1) * h(1);
      y(2) := y(1) + x(2) * h(2);
      y(3) := y(2) + x(3) * h(3);
- Procedure inlining
- Static single assignment
    before:
      b := x + z; a := b + c; b := e + f; y := b;
    after:
      b := x + z; a := b + c; b0001 := e + f; y := b0001;
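The single-assignment renaming shown on this slide can be sketched for straight-line code as follows. The representation of statements as (target, expression) pairs and the four-digit suffix (chosen to imitate b0001) are assumptions for illustration; the sketch does not guard against a user variable that already carries such a suffix.

```python
import re

def to_ssa(statements):
    """Rename repeated assignments so each name is assigned exactly once.

    statements: list of (target, expression) pairs of straight-line code.
    The first definition keeps its name; later ones get a numeric suffix.
    """
    version = {}   # variable -> number of definitions seen so far
    current = {}   # variable -> name currently holding its value
    result = []
    for target, expr in statements:
        # Rewrite every use with the latest name of the variable it reads.
        expr = re.sub(r"[A-Za-z_]\w*",
                      lambda m: current.get(m.group(0), m.group(0)), expr)
        count = version.get(target, 0)
        name = target if count == 0 else f"{target}{count:04d}"
        version[target] = count + 1
        current[target] = name
        result.append((name, expr))
    return result

program = [("b", "x + z"), ("a", "b + c"), ("b", "e + f"), ("y", "b")]
for name, expr in to_ssa(program):
    print(f"{name} := {expr};")
```

After renaming, every value has a unique producer, which is what allows the program to be turned directly into a data-flow graph.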

24 Gaut step 1: Source code transformation
- Simple expression generation
    before: b := x + z * u;
    after:  tmp := z * u; b := x + tmp;
- Constant propagation
- Generation of the GC graph (Data-Flow Graph Format of Synchronous Programming)
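Simple expression generation amounts to splitting a compound expression into assignments with one operator each. A minimal sketch follows; it borrows Python's own `ast` parser for convenience, supports only `+`, `-` and `*`, and names temporaries tmp0, tmp1, … (all assumptions, since the slide only shows a single `tmp`).

```python
import ast
import itertools

def simplify(target, expr):
    """Split an arithmetic expression into one-operator assignments."""
    ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}
    out = []
    fresh = itertools.count()

    def operand(node):
        """Return a name or literal for node, emitting a temporary if needed."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        tmp = f"tmp{next(fresh)}"
        emit(tmp, node)
        return tmp

    def emit(dest, node):
        if not (isinstance(node, ast.BinOp) and type(node.op) in ops):
            raise ValueError("unsupported expression")
        left = operand(node.left)
        right = operand(node.right)
        out.append(f"{dest} := {left} {ops[type(node.op)]} {right};")

    emit(target, ast.parse(expr, mode="eval").body)
    return out

print(simplify("b", "x + z * u"))
```

Each emitted line then corresponds to one node of the data-flow graph handed to the scheduler.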

25 GAUT step 2: Scheduling/Mapping
In addition to throughput and clock cycle, the user can give:
- Resource constraints and mapping constraints
- Memory constraints
- I/O constraints
- Optimization type
The result is an architecture and Gantt charts:
- For computations
- For I/O
- For memory

26

27 Gaut step 3: memory and communication synthesis
Optimizing memory layout and minimizing buses
[Figure: ASIC block diagram. Labels: I/O, communication unit, datapath, memory unit, control]

28 Gaut: summary
Advantages:
- Advanced development status (still a research tool)
- User-guided synthesis
- Open library
- Active research team: memory optimization, communication synthesis
Drawbacks:
- Loop flattening (complexity problem)
- Predefined timing characteristics
- Hard to get out of 1D signal processing

29 Ugh: User Guided High Level Synthesis
- Developed at LIP6 (Paris), as part of the Disydent project (Digital System Design Environment): open source
- Behavioral-level synthesis tool for control-dominated coprocessors
- Emphasis on precise timing estimation
- Kernel technology: resource-constrained scheduling and (GNU-like) compiler construction technology
Inputs are:
- a C or VHDL behavioral description with KPN communication primitives
- a draft data-path
- a cycle time constraint TC
Outputs are:
- a synthesizable RTL VHDL model
- a cycle-accurate simulation model

30 Coprocessor System Environment
[Figure: system block diagram. Labels: processor R3000, ICache, DCache, PI-BUS, RAM controller, coprocessor, bus unit, M/S interface]

31 UGH Structure
[Figure: tool flow. Labels: Ugh C; data-path draft; UGH-CGS (coarse grain scheduler); UGH-FGS (fine grain scheduler); synthesis + characterization; Synopsys; cell library (depends on the synthesis tool); timing annotations; VHDL data-path; FSM/C; VHDL; CK; data-path + FSM model; CABA simulation]

32 Input 1: UGH-C
Library IEEE;
Use ieee.std_logic_arith.all;
entity HCF is
  port (CK: in bit;
        DINA: in integer; READA: out bit; ROKA: in bit;
        DINB: in integer; READB: out bit; ROKB: in bit;
        DOUT: out integer; WRITE: out bit; WOK: in bit);
end HCF;
C Description
#include
ugh_inChannel32 work2hcfa;
ugh_inChannel32 work2hcfb;
ugh_outChannel32 hcf2work;
uint32 a, b;
void hcf(void)
{
  while (a != b)
    if (a < b) b = b - a;
    else a = a - b;
}
int ugh_main()
{
  while (1) {
    channelRead(work2hcfa, &a);
    channelRead(work2hcfb, &b);
    hcf();
    channelWrite(hcf2work, &a);
  }
}
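The behavior of this coprocessor can be checked with a short software model. In this sketch the channels are modeled as plain queues, which is an assumption about the KPN primitives rather than their actual implementation, and the inputs are assumed to be strictly positive (the subtraction loop does not terminate on zero).

```python
from collections import deque

def hcf(a, b):
    """Highest common factor by repeated subtraction, as in the UGH-C hcf()."""
    while a != b:
        if a < b:
            b = b - a
        else:
            a = a - b
    return a

def run(work2hcfa, work2hcfb):
    """Consume the two input FIFOs pairwise and return what hcf2work receives."""
    hcf2work = deque()
    while work2hcfa and work2hcfb:
        hcf2work.append(hcf(work2hcfa.popleft(), work2hcfb.popleft()))
    return list(hcf2work)

results = run(deque([12, 35]), deque([18, 49]))
print(results)
```

The number of loop iterations depends on the data, which is typical of the control-dominated applications Ugh targets.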

33 Input 2: Draft data-path
[Figure: draft data-path. Labels: registers a and b (D, Q), subtractor Subst (inputs A, B, output S), FIFOs work2hcfa, work2hcfb, hcf2work]
model Hcf(sofifo hcf2work; sififo work2hcfa, work2hcfb)
{
  DFFl a, b;
  SUB subst;
  subst.A = a.Q, b.Q;
  subst.B = a.Q, b.Q;
  a.D = subst.S, work2hcfa;
  b.D = subst.S, work2hcfb;
  hcf2work = subst.S;
}

34 OUTPUT 1 : Refined Data path

35 OUTPUT 2: FSM for control
[Figure: state diagram. Labels: RESET, READY, START, READA, READB, ROKA, ROKB, WHILE, IF, S1, S2, WRITE, WOK]
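How a controller of this kind drives the data-path can be sketched with a cycle-by-cycle simulation. The state names are borrowed from the figure labels, but their roles, the transitions, the one-cycle cost of every state, and the omission of the handshake signals (ROKA, ROKB, WOK) are all assumptions made for this illustration; this is not the FSM actually generated by Ugh.

```python
def simulate(a_in, b_in):
    """Run an assumed FSM + data-path model; return (result, cycle count)."""
    reg = {"a": 0, "b": 0}          # the two data-path registers
    state, cycles = "READA", 0
    while True:
        cycles += 1
        if state == "READA":
            reg["a"] = a_in                 # a.D <- work2hcfa
            state = "READB"
        elif state == "READB":
            reg["b"] = b_in                 # b.D <- work2hcfb
            state = "WHILE"
        elif state == "WHILE":
            if reg["a"] == reg["b"]:
                state = "WRITE"
            else:
                state = "S1" if reg["a"] < reg["b"] else "S2"
        elif state == "S1":
            reg["b"] = reg["b"] - reg["a"]  # subst.A <- b.Q, subst.B <- a.Q, b.D <- subst.S
            state = "WHILE"
        elif state == "S2":
            reg["a"] = reg["a"] - reg["b"]  # subst.A <- a.Q, subst.B <- b.Q, a.D <- subst.S
            state = "WHILE"
        else:                               # WRITE: result leaves on hcf2work
            return reg["a"], cycles

print(simulate(12, 18))
```

Counting cycles this way is the software analogue of the cycle-accurate simulation model that the tool produces next to the RTL.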

36 Ugh summary
Advantages:
- Precise timing information
- Multi-cycle operations
- Almost a compiler approach (restricted target architecture)
- Interfacing (integrated in a SoC design environment)
Drawbacks:
- Development status (research tool)
- Low-level information given by the user
- Highly dependent on a commercial tool (Synopsys)
- Dedicated to control-oriented applications

37 MMAlpha
- Developed at Irisa (Rennes): open source
- High-level synthesis of highly pipelined accelerators
- Kernel technology: polyhedral model and systolic design methodology
- Emphasis on loop transformations
- Input: functional specification (Alpha language)
- Output: RTL description of a systolic-like architecture (Alpha or VHDL)

38 MMAlpha design flow
[Figure: design flow. Labels: loop nest (For i=1:1:N, For j=1:1:N), C, Alpha, uniformization, scheduling, RTL derivation, VHDL, FPGA, host, bus]

39 What is the polyhedral model?
- Abstracts a loop nest by the polyhedron described by the loop indices during execution of the loop
- Can be used for any index-based structure: memory (arrays), communications (accesses), etc.
Example: convolution (FIR filter)
for (i = N; i <= M; i++) {
  y(i) = 0;
  for (n = 0; n <= N-1; n++) {
    y(i) = y(i) + H(n) * x(i-n);
  }
}
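For the FIR loop nest above, the polyhedron is the set of integer points (i, n) satisfying N <= i <= M and 0 <= n <= N-1. The sketch below writes these bounds as affine inequalities and checks that the points they select are exactly the iterations the loops execute; the concrete values of N and M are arbitrary choices for the example.

```python
def in_domain(i, n, N, M):
    """Affine constraints of the FIR loop nest: N <= i <= M and 0 <= n <= N-1."""
    constraints = [i - N, M - i, n, N - 1 - n]   # each must be >= 0
    return all(c >= 0 for c in constraints)

def loop_points(N, M):
    """Iterations visited by the loop nest, in execution order."""
    return [(i, n) for i in range(N, M + 1) for n in range(N)]

N, M = 3, 6
# Scan a bounding box slightly larger than the domain and keep the points inside.
box = [(i, n) for i in range(0, M + 2) for n in range(-1, N + 1)]
polyhedron = [p for p in box if in_domain(*p, N, M)]
print(len(polyhedron), "iterations")
```

Working on the constraints rather than on the enumerated points is what lets polyhedral tools keep N and M symbolic.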

40 FIR: iteration space
[Figure: iteration space with axes i and n. Labels: H(0) … H(N-1), x(N), x(N+1), y(N), y(N+1), 0]

41 FIR polyhedral representation (MMAlpha input language)
[Figure: iteration space with axes i and n. Labels: H(0) … H(N-1), x(N), x(N+1), y(N), y(N+1), 0]

42 MMAlpha polyhedral scheduling
[Figure: iteration space with axes i and n, annotated with schedule times t = 4, 5, 6. Labels: H(0) … H(N-1), x(N), x(N+1), y(N), y(N+1), 0]
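A schedule assigns an execution time to every point of the iteration space, and it is valid only if every value is produced before it is consumed. The sketch below checks this for an affine schedule t(i, n) = i + n on the FIR domain. Both the schedule and the three uniform dependences are assumptions chosen for illustration; the schedule actually computed by MMAlpha is shown only in the figure and may differ.

```python
def schedule(i, n):
    """Hypothetical affine schedule t(i, n) = i + n (not taken from the slides)."""
    return i + n

# Assumed uniform dependences: point (i, n) needs the value from (i - di, n - dn).
DEPENDENCES = {
    "y accumulation": (0, 1),   # partial sum of y(i) comes from (i, n-1)
    "H propagation": (1, 0),    # H(n) is reused from (i-1, n)
    "x propagation": (1, 1),    # x(i-n) is reused from (i-1, n-1)
}

def is_valid(sched, deps, points):
    """True if every producer inside the domain runs strictly before its consumer."""
    pts = set(points)
    for di, dn in deps.values():
        for (i, n) in pts:
            src = (i - di, n - dn)
            if src in pts and sched(*src) >= sched(i, n):
                return False
    return True

N, M = 3, 6
points = [(i, n) for i in range(N, M + 1) for n in range(N)]
print(is_valid(schedule, DEPENDENCES, points))
print(is_valid(lambda i, n: i - n, DEPENDENCES, points))
```

The second call shows a candidate that is rejected: with t = i - n the accumulation would read a partial sum that has not been computed yet.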

43 MMAlpha space-time transformation
[Figure: iteration space redrawn with axes t and p, with times t = 4, 5, 6. Labels: H(0) … H(N-1), x(N), x(N+1), y(N), 0]
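The space-time transformation is a change of basis that re-indexes each iteration by a time t and a processor number p. A minimal sketch, assuming t = i + n and p = n (an illustrative choice, not necessarily the one used in the figure): the mapping must never place two iterations on the same processor at the same time, and the range of p gives the number of processing elements.

```python
def space_time(i, n):
    """Hypothetical change of basis: time t = i + n, processor p = n."""
    return (i + n, n)

N, M = 3, 6
points = [(i, n) for i in range(N, M + 1) for n in range(N)]
mapped = [space_time(i, n) for (i, n) in points]

processors = sorted({p for (_, p) in mapped})           # one cell per filter tap
conflict_free = len(set(mapped)) == len(mapped)         # no two iterations share (t, p)
latency = max(t for (t, _) in mapped) - min(t for (t, _) in mapped) + 1

print(processors, conflict_free, latency)
```

With this choice each processor handles one coefficient H(p), which matches the intuition of a systolic FIR array with N cells.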

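44 MMAlpha mapping
[Figure: space-time diagram with axes t and p (times t = 4, 5, 6) mapped onto a processor array. Labels: H(0) … H(N-1), x(N), x(N+1), y(N), 0, H, y, x, i]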

45 MMAlpha resulting architecture

46 MMAlpha current features
Tool box for designers:
- Powerful analysis tools
- Pipelining, change of basis, multi-dimensional scheduling, control signal generation
- Code generation (C, VHDL)
- Hierarchical design methodology
Work in progress:
- Resource-constrained scheduling (extension to Z-polyhedra)
- Multi-dimensional scheduling and memory synthesis

47 MMAlpha summary
Advantages:
- Design tool integrating loop transformations
- Parameterised design (N, the size of the filter, is not fixed until VHDL generation)
- Formal approach for refinement (functional to operational)
- A real language that syntactically captures the HLS input restrictions
Drawbacks:
- Does not yet handle resource constraints
- A language (Alpha) and design methodology very different from designers' habits
- Implementation status (research tool)

48 Some design results
- Ugh compares IDCT with CoWare and Gaut, but the results are highly dependent upon design parameters
- MMAlpha demonstrates a real implementation on an FPGA co-processor board (DLMS algorithm)
IDCT comparison:
Design | Ck period (ns) | #cycle execution | Exec time (µs) | Area (mm^2) | Area (#inverter)
Manual (time optimised) | N-A242.1
CoWare |
Gaut |
Ugh |
DLMS filter:
tap DLMS filter | Area | Clk cycle | Synthesis time
MMAlpha | 2600 slices | 35 MHz | 112 s

49 Outline
- Context: Why high-level synthesis?
- HLS hard problems
- Some solutions in existing tools
- Conclusion and on-going projects

50 HLS conclusion
- HLS tools are not mature enough to produce the famous « C-to-VHDL » magic tool
- Most tool designers agree that a highly « user-guided » approach is mandatory
- CAD tool vendors are still actively developing tools (Mentor: Catapult-C, CoWare: Cocentric, …)
- Some progress has been made:
  - Domain-specific constraints are more clearly identified (control-oriented or data-flow)
  - Interfacing is studied together with the synthesis
  - Fast simulation is an important issue addressed by HLS tools

51 On-going project: Data-Flow IP interface
Gaut (Lester) and MMAlpha (Irisa, Lip) are developing a common interface for their IPs (data-flow IPs)

52 On-going project: SocLib
SocLib environment:
- Public domain SystemC simulation models for SoC IP: cycle-accurate hardware simulation, TLM simulation
- VCI interconnection standard
- French open academic initiative (should become European through EuroSoc)
Typical platform:
[Figure: platform diagram. Labels: prog.c, GCC-MIPS, MIPS exec, prog, boot, MIPS with cache (several instances), RAM, TTY, DMA, ASIC, VCI interfaces, bus / network on chip (SPIN)]

53 On-going project: Loop transformation for compilation
- Unified loop nest transformation framework for optimization of compute/data intensive programs (Alchemy Inria project: rocq.inria.fr/~acohen/software.html)
- WRaP-IT: an Open-64/ORC interface tool

54 Thanks
Slides with help from Lester, LIP6.
Here are some tools I did not talk about: Amical, Cathedral, High 2, RapidPath, Flash, A/RT, Compaan, Syndex, Phideo, Bach, SPARK, CriticalBlue, Chinook, SCE, CodeSign, Esterel, precisionC, Polis, Atomium, Ptolemy, Handel-C, Cyber, Bridge, MCSE, Madeo, SpecC, and many more …
Any questions?