HIGH LEVEL SYNTHESIS WITH AREA CONSTRAINTS FOR FPGA DESIGNS: AN EVOLUTIONARY APPROACH Tesi di Laurea di: Christian Pilato Matr.n. 674373 Relatore: Prof.


HIGH LEVEL SYNTHESIS WITH AREA CONSTRAINTS FOR FPGA DESIGNS: AN EVOLUTIONARY APPROACH Tesi di Laurea di: Christian Pilato Matr.n Relatore: Prof. Fabrizio FERRANDI Correlatore: Ing. Antonino TUMEO Politecnico di Milano

Summary
Outline:
- High-Level Synthesis
- Proposed methodology
- Experimental results
- Some further extensions
- Conclusion and future work

High-Level Synthesis – Problem description
[Figure: High-Level Synthesis tool. Labels: behavioral specification, design constraints, resource library, objectives, datapath & controller. Steps: scheduling, allocation, binding, controller synthesis.]
“High-Level Synthesis means going from an algorithmic level specification of a behaviour of a digital system to a register level structure that implements that behavior.” McFarland et al., Proc. IEEE, February 1990.

High-Level Synthesis – Problem description
What are the problems?
- All the sub-tasks are NP-complete: no efficient exact algorithms are known
- Interconnections have to be considered: they account for up to 80% of the final area
- All the tasks are closely interdependent
- Most of the information is available only at the end of the synthesis
Genetic algorithms
- Try non-deterministic approaches with feedback information
- Multi-objective optimization: reducing to a single objective (weighted average) is not efficient
- Non-dominated Sorting Genetic Algorithm (NSGA-II)
K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan, “A Fast and Elitist Multi-Objective Genetic Algorithm: NSGA-II,” Proceedings of the Parallel Problem Solving from Nature VI Conference, pp. 849–858, 2000.
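One commonly cited limitation of collapsing several objectives into a weighted average can be illustrated with a small sketch. It is not taken from the thesis: the (area, latency) values are hypothetical, and the slide may have other drawbacks in mind. The sketch compares the set of non-dominated designs with the designs that a sweep over weights can ever select.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (area, latency), both to be minimized


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


def weighted_sum_winners(points: List[Point], steps: int = 101) -> List[Point]:
    """Collect the minimizers of w*area + (1-w)*latency over a sweep of w in [0, 1]."""
    winners = set()
    for i in range(steps):
        w = i / (steps - 1)
        winners.add(min(points, key=lambda p: w * p[0] + (1 - w) * p[1]))
    return sorted(winners)


# Hypothetical (area, latency) values, not measurements from the thesis.
designs = [(1.0, 10.0), (6.0, 6.0), (10.0, 1.0), (9.0, 9.0)]
print(pareto_front(designs))          # three non-dominated designs
print(weighted_sum_winners(designs))  # the balanced design (6, 6) never wins
```

In this example the balanced design is Pareto-optimal, yet no choice of weight selects it, which is one reason a ranking based on dominance (as in NSGA-II) can expose trade-offs that a single weighted objective hides.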

High-Level Synthesis and Design Space Exploration: the proposed methodology

Experimental results
Development framework
- Integrated in the PandA framework, an open-source C++ framework covering different aspects of the hardware-software design of embedded systems
- Evolutionary computation with the Open BEAGLE framework
Functional validation
- Comparison between Verilog and C simulations
Estimation model validation
- Comparison between estimations and logic synthesis values
- Average error: 4.02%
- Standard deviation: 2.82%
- Maximum error: less than 10%
- These values can be effectively used as fitness values

Experimental results
Design Space Exploration validation
- Population size of individuals, evolving up to a maximum of 200 generations: the best trade-off between overall execution time and solution quality.
Considerations:
- It takes into account all elements in the design solution
- It can cover a good number of trade-offs between the fastest solution and the minimal-area solution
- Better approach than existing tools to deal with area constraints
Paper accepted for publication at the International Symposium on Systems, Architectures, MOdeling and Simulation (SAMOS), Samos, Greece, July 2007. Title: “An Evolutionary Approach to Area-Time Optimization of FPGA designs”

Some extensions…
Some recently added features…
Paper submitted to the IEEE Congress on Evolutionary Computation (CEC) 2007, Singapore, September. Title: “Fitness Inheritance In Evolutionary and Multi-Objective High Level Synthesis”
Weighted clique covering: in register allocation, to reduce interconnections
- A higher weight is assigned to a compatibility edge when the two values involve the same functional units
- Clique covering on a weighted graph; results show a further reduction of overall area of up to 10%.
Fitness inheritance: to reduce overall execution time
- A fraction of the expensive real evaluations is replaced with an estimation based on similar, already evaluated individuals
- It is able to reduce overall execution time by over 25%
- No substantial difference in the final Pareto-optimal solutions
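The fitness inheritance idea can be sketched as follows. This is only an illustration under stated assumptions: the Hamming distance as similarity measure, the `1 / (1 + distance)` weighting, the `max_distance` threshold and all function names are hypothetical and are not taken from the thesis or from Open BEAGLE.

```python
import random
from typing import Callable, Dict, Optional, Tuple

Chromosome = Tuple[int, ...]
Fitness = Tuple[float, ...]


def hamming(a: Chromosome, b: Chromosome) -> int:
    """Number of genes in which two chromosomes differ."""
    return sum(x != y for x, y in zip(a, b))


def inherited_fitness(ind: Chromosome, archive: Dict[Chromosome, Fitness],
                      max_distance: int) -> Optional[Fitness]:
    """Similarity-weighted average of the fitness of evaluated neighbours, or None."""
    neighbours = []
    for other, fit in archive.items():
        d = hamming(ind, other)
        if d <= max_distance:
            neighbours.append((1.0 / (1 + d), fit))  # assumed weighting scheme
    if not neighbours:
        return None
    total = sum(w for w, _ in neighbours)
    n_obj = len(neighbours[0][1])
    return tuple(sum(w * fit[k] for w, fit in neighbours) / total for k in range(n_obj))


def evaluate(ind: Chromosome, real_eval: Callable[[Chromosome], Fitness],
             archive: Dict[Chromosome, Fitness], p_inherit: float,
             max_distance: int, rng: random.Random) -> Fitness:
    """With probability p_inherit, try the cheap estimate; otherwise run the real evaluation."""
    if rng.random() < p_inherit:
        estimate = inherited_fitness(ind, archive, max_distance)
        if estimate is not None:
            return estimate  # estimates are deliberately not stored in the archive
    fitness = real_eval(ind)
    archive[ind] = fitness  # only real evaluations feed later estimates
    return fitness
```

Keeping estimated values out of the archive is a design choice of this sketch: it prevents estimates from being built on top of other estimates.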

Conclusion and future work
The main contributions of this thesis are:
- A high-level synthesis flow from C specifications to HDL descriptions
- Integration of a model for fast estimation of synthesis results
- Design space exploration with a genetic algorithm:
  - It takes into account all elements composing the design solution
  - Estimates fit the real values closely
  - Multi-objective concurrent optimization
Future work:
- Optimize the results coming from the synthesis flow
- Further reduce the overall execution time of the proposed methodology
- Refine the estimation model and specialize it for different targets

Christian PILATO Matr. n Thank you!

High-Level Synthesis – Problem definition
High-Level Synthesis
Inputs:
- behavioral description (e.g. C language)
- library of different types of resources
- set of constraints
Output:
- register-transfer level (RTL) design in a hardware description language (e.g. SystemC, VHDL or Verilog)
Problem: design and implementation of digital circuits
Goal: optimize some figures of merit (area, performance, etc.), also called objectives

High-Level Synthesis – Problem definition
Objective vs. Constraint
- A constraint is a design target that must be met for the design to be considered successful (e.g. the area occupied has to be less than the maximum allowed)
- An objective is a design target where more (or less) is better (e.g. the area occupied should be as small as possible)
A constraint has to be respected, otherwise the design solution is wrong (e.g. the solution occupies more area than is available).
An objective has to be optimized, otherwise the design solution is simply bad (e.g. the solution occupies a large area).
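The distinction can be made concrete with a minimal sketch: the constraint acts as a filter that rejects designs, while the objective ranks the designs that pass. The `Design` record, the candidate values and the choice of latency as the objective are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Design:
    name: str
    area: int      # e.g. occupied slices (hypothetical unit)
    latency: int   # e.g. clock cycles


def is_feasible(design: Design, max_area: int) -> bool:
    """Constraint: a design that exceeds the available area is simply wrong."""
    return design.area <= max_area


def best_design(candidates: List[Design], max_area: int) -> Optional[Design]:
    """Objective: among the feasible designs, lower latency is better."""
    feasible = [d for d in candidates if is_feasible(d, max_area)]
    return min(feasible, key=lambda d: d.latency, default=None)


candidates = [
    Design("fast", area=900, latency=10),
    Design("balanced", area=500, latency=14),
    Design("small", area=300, latency=25),
]
print(best_design(candidates, max_area=600))  # the "balanced" design
print(best_design(candidates, max_area=100))  # None: no design meets the constraint
```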

High-Level Synthesis – Problem description
Problem description
Three main tasks:
1. Operation scheduling: provides the cycle steps in which operations start their execution. It determines when the operations are executed.
2. Resource allocation: assigns operations and values to hardware components and interconnects them using connection elements. It determines where operations are executed, where values are stored and how elements are interconnected.
3. Controller synthesis: provides the logic to issue datapath operations, based on the control flow. It determines which operations are executed based on the control flow.

High-Level Synthesis – Problem description
NP-Completeness
- It is desirable to develop polynomial-time algorithms that solve each sub-task optimally
- Unfortunately, no polynomial-time algorithm is known that solves these problems optimally
- They belong to the class of NP-complete problems: problems that are hard to solve and, typically, cannot be solved efficiently for large inputs
- Proving that a problem is NP-complete tells us that solving it exactly (for large, generic inputs) is very hard, which justifies exploring approaches that solve it efficiently but sub-optimally

High-Level Synthesis – Problem description
Problem Interdependence
- Traditional approaches optimize latency and area occupation, where only the area of functional units and registers was considered for reduction
- Recent studies have demonstrated that interconnection costs have to be taken into account, since the area of multiplexers and interconnection elements by far outweighs the area of functional units and registers
D. Chen and J. Cong, “Register binding and port assignment for multiplexer optimization,” ASP-DAC ’04: Proceedings of the 2004 Conference on Asia South Pacific Design Automation, pp. 68–73, 2004.

The proposed High-Level Synthesis flow
High-Level Synthesis Flow
The proposed flow is organized as follows:
- From C to intermediate representation: from GIMPLE to produce a graph representation
- High-Level Synthesis flow:
  1. Partial binding and scheduling
  2. Finite State Machine creation
  3. Register allocation
  4. Interconnection allocation
  5. Performance and area estimations
- From data structures to intermediate representation in the form of a graph
- From intermediate representation to a Hardware Description Language (e.g. Verilog) ready for logic synthesis

1. Partial binding and Scheduling
Partial binding: force an operation to be executed on a selected functional unit instance
β (+1) =
- A technique introduced to partially control the final area occupation
- It can affect scheduling, register allocation and interconnection allocation
Scheduling: assign a starting control step to each operation to be executed
- Many scheduling algorithms are able to support partial binding (Integer Linear Programming formulation, list-based algorithm, etc.)
- Different solutions are produced depending on the selected algorithm
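How a partial binding can steer a list-based scheduler may be sketched as below. This is a simplified illustration, not the scheduler used in the thesis: every operation is assumed to take one control step, priorities follow insertion order, and the operation and unit names are hypothetical.

```python
from typing import Dict, List, Tuple


def list_schedule(ops: Dict[str, str], deps: Dict[str, List[str]],
                  units: Dict[str, List[str]],
                  binding: Dict[str, str]) -> Dict[str, Tuple[int, str]]:
    """Return operation -> (control step, unit instance).

    ops:     operation name -> operation type (e.g. "add")
    deps:    operation name -> predecessors that must finish first
    units:   operation type -> available unit instances
    binding: partial binding, operation name -> forced unit instance
    """
    schedule: Dict[str, Tuple[int, str]] = {}
    step = 0
    while len(schedule) < len(ops):
        busy = set()
        ready = [o for o in ops if o not in schedule and
                 all(p in schedule and schedule[p][0] < step for p in deps.get(o, []))]
        ready.sort(key=lambda o: o not in binding)  # bound operations choose first
        progressed = False
        for o in ready:
            allowed = [binding[o]] if o in binding else units[ops[o]]
            free = [u for u in allowed if u not in busy]
            if free:
                busy.add(free[0])
                schedule[o] = (step, free[0])
                progressed = True
        if not progressed:
            raise ValueError("no operation can be scheduled (cycle or missing unit)")
        step += 1
    return schedule


ops = {"a1": "add", "a2": "add", "a3": "add", "m1": "mul"}
deps = {"m1": ["a1", "a2"]}
units = {"add": ["ADD_0", "ADD_1"], "mul": ["MUL_0"]}

free_choice = list_schedule(ops, deps, units, binding={})
one_adder = list_schedule(ops, deps, units,
                          binding={"a1": "ADD_0", "a2": "ADD_0", "a3": "ADD_0"})
print(free_choice)  # finishes in 2 control steps using both adders
print(one_adder)    # finishes in 3 control steps, ADD_1 is never used
```

Forcing all additions onto one adder lengthens the schedule but leaves the second adder unused, which is the area-versus-latency effect that partial binding is meant to control.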

2. Finite State Machine creation
- Scheduling gives information about operations that have to be executed in the same control step. This information is useful for:
  - Register allocation
  - Controller synthesis
- A Finite State Machine is a good model to represent this information
- The Moore FSM model has been implemented for its natural correspondence with the problem
- The State Transition Graph is created on the scheduled specification:
  - At each cycle step, the system evolves into the next state
  - Conditional operations create bifurcations based on the evaluated condition values

Finite State Machine creation
A finite-state machine can be described by:
- a set of primary inputs X
- a set of primary outputs Y
- a set of states S
- a state transition function δ: X × S → S
- an output function λ: S → Y (Moore model)
Its corresponding graph-based representation is the state transition diagram. The state transition diagram is a labeled directed multi-graph G(V, E), where the vertex set V is in one-to-one correspondence with the state set S and the directed edge set E is in one-to-one correspondence with the transitions specified by δ.
- Concurrent operations can represent states of the finite state machine model
- The finite state machine graph can be created starting from the scheduled graph
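A Moore machine of this kind can be sketched in a few lines. The class below is an illustration only: the names `delta` and `lam` for the state transition and output functions, the use of `None` as a "don't care" input, and the example controller are assumptions, not the data structures of the PandA framework.

```python
from typing import Dict, List, Optional, Sequence, Tuple

State = str
Input = Optional[bool]  # None: no condition is evaluated in this state


class MooreFSM:
    """Moore machine: the output depends only on the current state."""

    def __init__(self, delta: Dict[Tuple[State, Input], State],
                 lam: Dict[State, List[str]], initial: State) -> None:
        self.delta = delta      # state transition function, keyed by (state, input)
        self.lam = lam          # output function: state -> operations issued
        self.initial = initial

    def next_state(self, state: State, x: Input) -> State:
        """Follow the edge labeled x, falling back to the unconditional edge."""
        if (state, x) in self.delta:
            return self.delta[(state, x)]
        return self.delta[(state, None)]

    def run(self, inputs: Sequence[Input]) -> List[List[str]]:
        """Return the operations issued in each visited state, initial state included."""
        state = self.initial
        trace = [self.lam[state]]
        for x in inputs:
            state = self.next_state(state, x)
            trace.append(self.lam[state])
        return trace


# Each state groups the operations scheduled in the same control step;
# the comparison in S1 creates a bifurcation in the graph.
controller = MooreFSM(
    delta={("S0", None): "S1", ("S1", True): "S2", ("S1", False): "S3",
           ("S2", None): "S0", ("S3", None): "S0"},
    lam={"S0": ["a1", "a2"], "S1": ["cmp"], "S2": ["m1"], "S3": ["sub"]},
    initial="S0",
)
print(controller.run([None, True]))  # [['a1', 'a2'], ['cmp'], ['m1']]
```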

3. Register allocation
Register allocation: allocate elements to store values across cycle step boundaries
A compiler approach has been implemented:
- Liveness analysis based on dataflow equations
- Conflict graph creation based on liveness information
- Different heuristics to minimize the number of registers: vertex coloring, clique covering, left-edge algorithm, etc.
Compilers use the Control Flow Graph, which is not able to represent concurrent execution of operations.
This approach uses the State Transition Graph as the base for dataflow analysis: it represents both the control flow and the concurrent operations.
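Of the heuristics listed, the left-edge algorithm is the simplest to sketch. The version below assumes that liveness analysis has already reduced each value to a single half-open interval of control steps `[start, end)`, which holds for straight-line code but not in general for a branching State Transition Graph; the value names and lifetimes are hypothetical.

```python
from typing import Dict, List, Tuple


def left_edge(lifetimes: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Assign each value to a register index using the left-edge algorithm.

    lifetimes maps a value name to its half-open live interval [start, end).
    Values are visited by increasing start step and placed in the first
    register that is already free at that step.
    """
    free_at: List[int] = []          # free_at[r]: step at which register r becomes free
    assignment: Dict[str, int] = {}
    ordered = sorted(lifetimes.items(), key=lambda kv: (kv[1][0], kv[1][1]))
    for value, (start, end) in ordered:
        for reg, step in enumerate(free_at):
            if step <= start:        # register r no longer holds a live value
                free_at[reg] = end
                assignment[value] = reg
                break
        else:                        # every register is still busy: add one
            free_at.append(end)
            assignment[value] = len(free_at) - 1
    return assignment


lifetimes = {"t1": (0, 2), "t2": (0, 1), "t3": (1, 3), "t4": (2, 4)}
print(left_edge(lifetimes))  # {'t2': 0, 't1': 1, 't3': 0, 't4': 1}
```

Two registers suffice here because at most two values are live in any control step.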

4-5. Interconnection allocation and result estimations
The final steps…
Interconnection allocation: allocate elements to interconnect the hardware components
- Mux-based architecture: port swapping for commutative operations
- Glue logic: represents the logic netlist that decodes commands and selects inputs
- Truth tables based on signals from the controller
The RTL structural description is now available and it considers all elements.
- Objective values could be retrieved from logic synthesis, but this is too slow!
Estimation model: perform fast estimations of objective values.
- Area is difficult to estimate
- An existing area model* has been updated and used
*: C. Brandolese, W. Fornaciari, and F. Salice, “An Area Estimation Methodology for FPGA Based Designs at SystemC-Level,” DAC '04: Proceedings of the 41st Annual Conference on Design Automation, pp. 129–132.

Genetic algorithms
Genetic algorithm
Chromosome encoding and fitness evaluation are usually the only elements that depend on the problem

Design Space Exploration by Genetic Algorithm
Problem dependent elements
Chromosome encoding
- Each operation in the specification has a gene to represent a feasible partial binding
- Genes are added to represent the algorithms used to perform the high-level synthesis steps: scheduling, register allocation and interconnection optimization
Fitness evaluation
- Information from the chromosome about partial binding and algorithms is used to perform a synthesis flow
- Objective values are estimated using the proposed model
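A possible shape for such a chromosome is sketched below. It is an assumption-laden illustration rather than the encoding used in the thesis: each gene is taken to be an index into its own list of alternatives, every operation is bound to some unit, and the `Encoding` class, the unit names and the algorithm labels are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Encoding:
    op_units: Dict[str, List[str]]      # operation -> compatible unit instances
    algo_choices: Dict[str, List[str]]  # synthesis step -> available algorithms

    def random_chromosome(self, rng: random.Random) -> List[int]:
        """One gene per operation, followed by one gene per synthesis step."""
        genes = [rng.randrange(len(units)) for units in self.op_units.values()]
        genes += [rng.randrange(len(algos)) for algos in self.algo_choices.values()]
        return genes

    def decode(self, genes: List[int]) -> Tuple[Dict[str, str], Dict[str, str]]:
        """Turn gene indices back into a binding and an algorithm selection."""
        n_ops = len(self.op_units)
        binding = {op: units[g]
                   for (op, units), g in zip(self.op_units.items(), genes[:n_ops])}
        algorithms = {step: algos[g]
                      for (step, algos), g in zip(self.algo_choices.items(), genes[n_ops:])}
        return binding, algorithms


encoding = Encoding(
    op_units={"add1": ["ADD_0", "ADD_1"], "add2": ["ADD_0", "ADD_1"], "mul1": ["MUL_0"]},
    algo_choices={"scheduling": ["list", "ilp"],
                  "register_allocation": ["vertex_coloring", "clique_covering", "left_edge"]},
)
binding, algorithms = encoding.decode([1, 0, 0, 1, 2])
print(binding)     # {'add1': 'ADD_1', 'add2': 'ADD_0', 'mul1': 'MUL_0'}
print(algorithms)  # {'scheduling': 'ilp', 'register_allocation': 'left_edge'}
```

Because every gene ranges only over its own alternatives, any combination of gene values decodes to a valid input for the synthesis flow used in fitness evaluation.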

Design Space Exploration by Genetic Algorithm
Problem independent elements
Generic operators
- Common operators (crossover and mutation) are used without modifications: no infeasible chromosomes can be created.
- If the gene changed by the operators is related to:
  - an operation: a new binding constraint for that operation
  - an algorithm: a different algorithm to solve the related synthesis step
- Initial population created at random or starting from some interesting points, to explore around them.
Solution ranking
- Ranking into different levels according to fitness values.
- Accelerated using the fast non-dominated sort algorithm available in NSGA-II
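The ranking into levels can be sketched with a plain Python version of the fast non-dominated sort described by Deb et al. for NSGA-II. It assumes that all objectives are minimized; the objective values in the example are hypothetical, and this is not the Open BEAGLE implementation.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse in every objective, strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(objs: List[Tuple[float, ...]]) -> List[List[int]]:
    """Return fronts of indices into objs; front 0 is the non-dominated set."""
    n = len(objs)
    dominated_by_me: List[List[int]] = [[] for _ in range(n)]  # solutions that p dominates
    dominators = [0] * n                                       # how many solutions dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(objs[p], objs[q]):
                dominated_by_me[p].append(q)
            elif dominates(objs[q], objs[p]):
                dominators[p] += 1
        if dominators[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front: List[int] = []
        for p in fronts[i]:
            for q in dominated_by_me[p]:
                dominators[q] -= 1          # p is ranked, so it no longer blocks q
                if dominators[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]                      # drop the trailing empty front


# Hypothetical (area, latency) pairs.
objs = [(1, 10), (6, 6), (10, 1), (9, 9), (12, 12)]
print(fast_non_dominated_sort(objs))  # [[0, 1, 2], [3], [4]]
```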