Courseware High-Level Synthesis an introduction Prof. Jan Madsen Informatics and Mathematical Modelling Technical University of Denmark Richard Petersens.

Presentation transcript:

Courseware
High-Level Synthesis: an introduction
Prof. Jan Madsen
Informatics and Mathematical Modelling, Technical University of Denmark
Richard Petersens Plads, Building 321, DK-2800 Lyngby, Denmark

SoC-MOBINET courseware M-1 High-Level Synthesis

Slide 2: Hardware synthesis
[Figure: processes P1, P2 and P3 mapped onto a CPU (P1) and an ASIC (P2 & P3)]
- Starts from an abstract behavioral description
- Generates an RTL description
- The target hardware must be restricted, otherwise the search space is too large

Slide 3: Hardware synthesis
- How is the behavior specified?
  - Natural languages
  - C/C++
  - VHDL/Verilog
- What is the target architecture of the ASIC?

Slide 4: Hardware model - components
- Most synthesis systems are targeted towards synchronous hardware
- Functional units:
  - Can perform one or more computations
  - Addition, multiplication, comparison, ALU, etc.
- Registers:
  - Store inputs, intermediate results and outputs
  - May be organized as a register file

Slide 5: Hardware model - interconnection
- Multiplexers:
  - Select one output from several inputs
- Buses:
  - Connection shared between several components
  - Only one component can write data at a specific time
  - Exclusive writing may be controlled by tri-state drivers

Slide 6: Hardware model - parameters
- Clocking strategy
  - Single or multiple phase clocks
- Interconnect
  - Allowing or disallowing buses
- Clocking of functional units
  - Multicycle operations
  - Chaining
  - Pipelined units

Slide 7: Hardware model - example

Slide 8: Hardware concepts
- Data path
  - Network of functional units, registers, multiplexers and buses
- Control
  - Takes care of having the data present at the right place at a specific time
  - Takes care of presenting the right instructions to a programmable unit
- High-level synthesis often concentrates on data path synthesis

Slide 9: Methodology
[Figure: design flow between the physical domain (specification, implementation) and the mathematical domain (synthesis)]
- Specification: create a model of the physical problem
- Synthesis: create an algorithm to solve the problem
- Implementation: transform the optimized model back to the physical domain

Slide 10: Input format
- Input
  - Behavior described in textual form
  - Conventional programming language
  - Hardware description language (HDL)
- Has to be parsed and transformed into an internal representation
  - Conventional compiler techniques are used

Slide 11: Internal representation
- Data-flow graph (DFG)
  - Used by most systems
  - May or may not contain information on control flow
- Vertex (node): represents a computation
- Edge: represents a precedence relation

Slide 12: Data flow
    x := a * b;
    y := c + d;
    z := x + y;
[Figure: DFG with inputs a, b, c, d; a multiplication producing x, an addition producing y, and an addition producing z]
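The three assignments on this slide can be turned into a data-flow graph mechanically. The following is a minimal sketch, assuming a made-up tuple encoding `(result, operator, operand1, operand2)` for each statement; the function name `build_dfg` is hypothetical and not part of the courseware.

```python
# Hypothetical encoding of the slide's statements: (result, operator, lhs, rhs).
statements = [
    ("x", "*", "a", "b"),
    ("y", "+", "c", "d"),
    ("z", "+", "x", "y"),
]

def build_dfg(stmts):
    """Return (nodes, edges).

    nodes maps each computed value to the operator that produces it;
    edges lists (producer, consumer) precedence relations.
    """
    nodes = {}
    edges = []
    for result, op, lhs, rhs in stmts:
        for operand in (lhs, rhs):
            # An operand produced by an earlier operation creates an edge;
            # primary inputs (a, b, c, d) are not nodes in this sketch.
            if operand in nodes:
                edges.append((operand, result))
        nodes[result] = op
    return nodes, edges

nodes, edges = build_dfg(statements)
print(nodes)   # operations keyed by the value they produce
print(edges)   # precedence relations between operations
```

The only precedence relations are from x and y to z, so the multiplication and the first addition are free to run in parallel.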

Slide 13: DFG semantics
[Figure: the DFG with inputs a, b, c, d, operations *, +, + and values x, y, z]

Slide 14: Exercise 1 - data flow graph of DiffEq
- Solve the second-order differential equation $y'' + 3zy' + 3y = 0$
- Iterative solution:
    while (z < a) {
        z1 := z + dz;
        u1 := u - (3*z*u*dz) - (3*y*dz);
        y1 := y + (u*dz);
        z := z1;
        u := u1;
        y := y1;
    }
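To see what the loop computes before drawing its DFG, it can be run directly. This is a small sketch in Python that transcribes the loop body as written on the slide; reading `u` as the first derivative y' makes each iteration one forward-Euler step of size `dz`. The function name `diffeq` and the sample inputs are assumptions made for illustration.

```python
def diffeq(z, y, u, dz, a):
    """Run the slide's iteration until z reaches a; return the final (z, y, u)."""
    while z < a:
        z1 = z + dz
        u1 = u - (3 * z * u * dz) - (3 * y * dz)
        y1 = y + (u * dz)
        z, u, y = z1, u1, y1
    return z, y, u

# Two iterations with a deliberately coarse step, chosen only for illustration.
result = diffeq(z=0.0, y=1.0, u=0.0, dz=0.5, a=1.0)
print(result)
```

Each iteration performs six multiplications, two subtractions, two additions and one comparison, which are the operations the DFG of the exercise has to contain.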

Slide 15: Exercise 1 - result
[Figure: DFG of the DiffEq loop body with six multiplications, two additions, two subtractions and one comparison; inputs u, dz, 3, z, y; outputs u1, y1, ctrl]

Slide 16: High-level synthesis
[Figure: the DFG with inputs a, b, c, d, operations *, +, + and values x, y, z]

Slide 17: High-level synthesis
- Scheduling
  - Determine for each operation the time at which it should be performed such that no precedence constraint is violated
- Allocation
  - Specify the hardware resources that will be necessary
- Assignment
  - Provide a mapping from each operation to a specific functional unit and from each variable to a register

Slide 18: High-level synthesis
- Scheduling, allocation and assignment are strongly interrelated
  - But they are often solved separately!
- Scheduling under resource constraints is NP-complete, so heuristics have to be used!

Slide 19: Scheduling
- Input
  - DFG $G(V, E)$
  - Library of resource types $R$
- Mapping $\tau : V \to R$, $\tau(v_i) = r$
  - A given operation may be mapped to different resource types, e.g. + may be performed by an adder or an ALU
- Execution delay: $\delta(v_i) = d_i$
- Resource type cost: $\gamma(r)$

Slide 20: Scheduling
- Start time of operations: $T = \{ t_i : i = 0, 1, \ldots, n \}$
- Scheduling is the task of determining the start times subject to the precedence constraints of the DFG:
  $\sigma : V \to \mathbb{Z}^+$, $\sigma(v_i) = t_i$ such that $t_i \ge t_j + d_j$, $\forall i, j : (v_j, v_i) \in E$
- Latency: $\lambda = t_n - t_0$
- Cost of schedule: $\sum_{r \in R} [\gamma(r) \, N_r(\sigma)]$
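The definitions on this slide can be checked on a concrete schedule. The sketch below assumes a tiny DFG (the x, y, z example with an added zero-delay source `v0` and sink `vn`), made-up unit costs, and hypothetical helper names; it tests the precedence condition that every operation starts no earlier than the finish of each predecessor, then derives the latency and the cost as the sum over resource types of unit cost times the number of units the schedule needs.

```python
def is_valid(edges, start, delay):
    """Precedence check: start[vi] >= start[vj] + delay[vj] for every edge (vj, vi)."""
    return all(start[vi] >= start[vj] + delay[vj] for vj, vi in edges)

def units_needed(start, delay, rtype):
    """For each resource type, the peak number of operations active in one step."""
    busy = {}
    for v, t in start.items():
        for step in range(t, t + delay[v]):      # zero-delay nodes occupy nothing
            key = (rtype[v], step)
            busy[key] = busy.get(key, 0) + 1
    need = {}
    for (r, _), n in busy.items():
        need[r] = max(need.get(r, 0), n)
    return need

def schedule_cost(unit_cost, need):
    """Sum over resource types of unit cost times units needed."""
    return sum(unit_cost[r] * n for r, n in need.items())

# Assumed example data (not from the slides).
edges = [("v0", "x"), ("v0", "y"), ("x", "z"), ("y", "z"), ("z", "vn")]
delay = {"v0": 0, "x": 1, "y": 1, "z": 1, "vn": 0}
rtype = {"v0": "nop", "x": "mul", "y": "add", "z": "add", "vn": "nop"}
start = {"v0": 1, "x": 1, "y": 1, "z": 2, "vn": 3}
unit_cost = {"mul": 4, "add": 1, "nop": 0}      # arbitrary illustrative costs

valid = is_valid(edges, start, delay)
latency = start["vn"] - start["v0"]
need = units_needed(start, delay, rtype)
cost = schedule_cost(unit_cost, need)
print(valid, latency, need, cost)
```

Moving z into step 1 would violate the precedence constraint, which `is_valid` reports by returning `False`.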

Slide 21: Scheduling
[Figure: the methodology applied to scheduling. Physical domain: the specification is a C program. Mathematical domain: the specification becomes a DFG, synthesis applies a scheduling algorithm, and the implementation is a scheduled DFG]

Slide 22: Scheduling - ASAP
- Map operations to their earliest possible start time without violating the precedence constraints
- Easy and fast to compute
  - Find the longest path in a directed acyclic graph
- No attempt to optimize resource cost
- Gives the fastest possible schedule if an unlimited amount of resources is available
  - Gives an upper bound on execution speed

Slide 23: ASAP algorithm
    for each node v_i ∈ V do
        if pred(v_i) = Ø then
            E_i = 1; V = V - { v_i };
        else
            E_i = 0;
        endif
    endfor
    while V ≠ Ø do
        for each node v_i ∈ V do
            if all_sched(pred(v_i), E) then
                E_i = max(pred(v_i), E) + 1; V = V - { v_i };
            endif
        endfor
    endwhile
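The pseudocode above can be sketched in Python as follows. Like the slide, the sketch assumes unit-delay operations and numbers control steps from 1; the dictionary-of-predecessors input format and the cycle check are assumptions added for this example.

```python
def asap(preds):
    """ASAP schedule for unit-delay operations.

    preds maps every node to the list of its predecessors.
    Returns E, the earliest control step (starting at 1) of each node.
    """
    E = {}
    remaining = set(preds)
    while remaining:
        # Nodes whose predecessors have all been scheduled in earlier passes.
        ready = [v for v in remaining if all(p in E for p in preds[v])]
        if not ready:
            raise ValueError("graph contains a cycle")
        for v in ready:
            # One step after the latest predecessor; step 1 if there is none.
            E[v] = 1 + max((E[p] for p in preds[v]), default=0)
        remaining -= set(ready)
    return E

# The x, y, z data-flow example: z depends on x and y.
schedule = asap({"x": [], "y": [], "z": ["x", "y"]})
print(schedule)
```

The step assigned to a node is its longest-path distance from the inputs, which is why ASAP amounts to a longest-path computation on a directed acyclic graph.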

Slide 24: DiffEq
[Figure: DiffEq DFG with operations *, +, -, <; inputs u, dz, 3, z, y; outputs u1, y1, ctrl]

Slide 25: Exercise 2 - latency and resources
- Assume:
  - cycle time = 25 ns
  - d_* = d_+ = d_- = d_< = 25 ns
- What is the latency of the schedule?
- How many resources are needed?
- How many resources are needed if we introduce an ALU (+, -, <)?
- What is the latency if we have only 1 multiplier?
- What is the latency if d_* = 25 ns and d_ALU = 12 ns?

Slide 26: Exercise 2 - result
- What is the latency of the schedule?
  - 4 * 25 ns = 100 ns
- How many resources are needed?
  - 4 multipliers, 1 adder, 1 subtractor, 1 comparator
- How many resources are needed if we introduce an ALU (+, -, <)?
  - 4 multipliers, 2 ALUs
- What is the latency if we have only 1 multiplier?
  - 7 * 25 ns = 175 ns
- What is the latency if d_* = 25 ns and d_ALU = 12 ns?
  - 3 * 25 ns = 75 ns (operator chaining)

Slide 27: Scheduling - ALAP
[Figure: ALAP schedule of the DiffEq DFG with operations *, +, -, <; inputs u, dz, 3, z, y; outputs u1, y1, ctrl]

Slide 28: Scheduling - ALAP
- Map operations to their latest possible start time without violating the precedence constraints
- Needs a latency constraint
- Easy and fast to compute
  - Find the longest path in a directed acyclic graph
- No attempt to optimize resource cost

Slide 29: Scheduling - ASAP / ALAP
- Are ASAP and ALAP useful?
  - $\sigma_{ASAP}(v_i) = E_i$
  - $\sigma_{ALAP}(v_i) = L_i$
- Operator flexibility $= L_i - E_i$, also known as mobility
  - Mobility = 0: the operator has to be scheduled at $E_i$, otherwise the latency constraint is violated
  - Mobility > 0 gives scheduling freedom
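Mobility can be computed by running ASAP and ALAP on the same graph and subtracting. The sketch below does this for a DFG reconstructed from the DiffEq loop body of Exercise 1; the reconstruction, the node names (`m1` to `m6`, `s1`, `s2`, `a1`, `a2`, `c`) and the unit-delay assumption are choices made for this example, not taken from the slides.

```python
# Assumed DiffEq DFG: node -> predecessors (names are made up for this sketch).
PREDS = {
    "m1": [],             # 3 * z
    "m2": [],             # u * dz
    "m3": ["m1", "m2"],   # (3*z) * (u*dz)
    "m4": [],             # 3 * y
    "m5": ["m4"],         # (3*y) * dz
    "m6": [],             # u * dz, used for y1
    "s1": ["m3"],         # u - m3
    "s2": ["s1", "m5"],   # u1 = s1 - m5
    "a1": ["m6"],         # y1 = y + m6
    "a2": [],             # z1 = z + dz
    "c":  ["a2"],         # ctrl = z1 < a
}

def asap(preds):
    """Earliest step of each node (unit delay, steps start at 1)."""
    E = {}
    def visit(v):
        if v not in E:
            E[v] = 1 + max((visit(p) for p in preds[v]), default=0)
        return E[v]
    for v in preds:
        visit(v)
    return E

def alap(preds, latency):
    """Latest step of each node such that everything finishes within `latency` steps."""
    succs = {v: [] for v in preds}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)
    L = {}
    def visit(v):
        if v not in L:
            # One step before the earliest successor; `latency` for sink nodes.
            L[v] = min((visit(s) for s in succs[v]), default=latency + 1) - 1
        return L[v]
    for v in preds:
        visit(v)
    return L

E = asap(PREDS)
latency = max(E.values())            # ASAP length used as the latency constraint
L = alap(PREDS, latency)
mobility = {v: L[v] - E[v] for v in PREDS}
print(latency, mobility)
```

Under these assumptions five operations have mobility 0, two have mobility 1 and four have mobility 2; the zero-mobility operations form the critical path.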

Slide 30: Scheduling - list based
- Generalization of ASAP
- Priority list of ready nodes
  - A ready node is an operator whose predecessors have all been scheduled
  - The priority list is always sorted with respect to a priority function

Slide 31: List scheduling algorithm
    ins_ready_ops(V, PList_r1, PList_r2, …, PList_rm);
    Cstep = 0;
    while ((PList_r1 ≠ Ø) or … or (PList_rm ≠ Ø)) do
        Cstep = Cstep + 1;
        for k = 1 to m do
            for funit = 1 to N_k do
                if PList_rk ≠ Ø then
                    schedule_op(first(PList_rk), Cstep);
                    PList_rk = delete(PList_rk, first(PList_rk));
                endif
            endfor
        endfor
        ins_ready_ops(V, PList_r1, PList_r2, …, PList_rm);
    endwhile
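A compact Python sketch of list scheduling under resource constraints is given below. It is a simplification of the pseudocode: instead of one priority list per resource type it sorts all ready operations once per control step and hands out the free units of each type in priority order. The DiffEq DFG, the node names, and the mobility values used as priorities are assumptions (a reconstruction from the Exercise 1 loop body with unit-delay operations and a latency bound of 4 steps).

```python
# Assumed DiffEq DFG: node -> predecessors, with the resource type of each node.
PREDS = {
    "m1": [], "m2": [], "m3": ["m1", "m2"], "m4": [], "m5": ["m4"], "m6": [],
    "s1": ["m3"], "s2": ["s1", "m5"], "a1": ["m6"], "a2": [], "c": ["a2"],
}
RTYPE = {"m1": "*", "m2": "*", "m3": "*", "m4": "*", "m5": "*", "m6": "*",
         "s1": "-", "s2": "-", "a1": "+", "a2": "+", "c": "<"}
# Mobility of each node for a 4-step latency bound (lower value = more urgent).
MOBILITY = {"m1": 0, "m2": 0, "m3": 0, "s1": 0, "s2": 0,
            "m4": 1, "m5": 1, "m6": 2, "a1": 2, "a2": 2, "c": 2}

def list_schedule(preds, rtype, units, priority):
    """Resource-constrained list scheduling for unit-delay operations.

    units[r] is the number of functional units of type r (assumed >= 1).
    Returns the control step (starting at 1) assigned to each operation.
    """
    start = {}
    step = 0
    while len(start) < len(preds):
        step += 1
        # Ready: unscheduled operations whose predecessors ran in earlier steps.
        ready = [v for v in preds if v not in start
                 and all(p in start and start[p] < step for p in preds[v])]
        ready.sort(key=lambda v: priority[v])
        free = dict(units)               # units still available in this step
        for v in ready:
            if free[rtype[v]] > 0:
                free[rtype[v]] -= 1
                start[v] = step
    return start

one_mult = list_schedule(PREDS, RTYPE, {"*": 1, "+": 1, "-": 1, "<": 1}, MOBILITY)
two_mult = list_schedule(PREDS, RTYPE, {"*": 2, "+": 1, "-": 1, "<": 1}, MOBILITY)
print(max(one_mult.values()), max(two_mult.values()))
```

With a single multiplier this sketch needs 7 control steps, and with two multipliers it reaches the 4-step ASAP length. At 25 ns per step those are 175 ns and 100 ns, the single-multiplier and ASAP latencies quoted in the Exercise 2 results.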

Slide 32: DiffEq
[Figure: list scheduling of the DiffEq DFG. Nodes are labelled [operation, mobility]: [a,0], [b,0], [e,0], [g,0], [h,0], [c,1], [f,1], [d,2], [i,2], [j,2], [k,2]. Initial priority lists: PList_* = a, b, c, d; PList_+ = j; PList_- = Ø; PList_< = Ø]

Slide 33: List scheduling
- Priority may be based on measures other than mobility
  - Length of the longest path to a node with no immediate successor
  - Number of immediate successor nodes
- A high number means high priority
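Both alternative priority measures are easy to compute from the successor lists of the DFG. The sketch below uses the small x, y, z data-flow example; the function names and the dictionary-of-successors format are assumptions made for illustration.

```python
def longest_path_to_sink(succs):
    """Number of edges on the longest path from each node to a node without successors."""
    dist = {}
    def visit(v):
        if v not in dist:
            dist[v] = max((1 + visit(s) for s in succs[v]), default=0)
        return dist[v]
    for v in succs:
        visit(v)
    return dist

def successor_count(succs):
    """Number of immediate successors of each node."""
    return {v: len(s) for v, s in succs.items()}

# x and y both feed z; z has no successor.
SUCCS = {"x": ["z"], "y": ["z"], "z": []}
path_priority = longest_path_to_sink(SUCCS)
fanout_priority = successor_count(SUCCS)

# A high number means high priority, so a ready list is sorted in descending order.
ready = sorted(["z", "x", "y"], key=lambda v: path_priority[v], reverse=True)
print(path_priority, fanout_priority, ready)
```

Because these measures rank higher values first, they sort in the opposite direction to mobility, where the smallest value is the most urgent.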