HIGH LEVEL SYNTHESIS WITH AREA CONSTRAINTS FOR FPGA DESIGNS: AN EVOLUTIONARY APPROACH. Thesis (Tesi di Laurea) by: Christian Pilato, student no. (Matr. n.) 674373. Advisor: Prof.


HIGH LEVEL SYNTHESIS WITH AREA CONSTRAINTS FOR FPGA DESIGNS: AN EVOLUTIONARY APPROACH
Thesis (Tesi di Laurea) by: Christian Pilato, student no. (Matr. n.) 674373
Advisor: Prof. Fabrizio FERRANDI
Co-advisor: Ing. Antonino TUMEO
Politecnico di Milano

Summary: outline

- High-Level Synthesis
  - Problem definition
  - Open problems
- Genetic algorithm
  - Overview
- Proposed methodology
  - High-Level Synthesis flow
  - An illustrative example
  - Design space exploration with genetic algorithm
- Experimental results
- Some further extensions…
- Conclusion and future works

High-Level Synthesis: problem definition

"High-Level Synthesis means going from an algorithmic level specification of a behaviour of a digital system to a register level structure that implements that behavior". McFarland, et al., Proc. IEEE, February 1990.

Goal: minimize some figures of merit (area, latency, etc.), also called objectives.

Inputs:
- behavioral description (in C language)
- set of constraints
- library of different types of resources

Output: register-transfer level (RTL) design in a hardware description language (e.g. SystemC, VHDL or Verilog).

High-Level Synthesis: problem description. High-Level Synthesis tasks

Three main tasks:
1. Operation scheduling: when operations start their execution.
2. Resource allocation and binding: where operations are executed (hardware components), where values are stored and how elements are interconnected.
3. Controller synthesis: how operations are issued.

Figure: the High-Level Synthesis tool takes the behavioral specification, the design constraints, the resource library and the objectives as inputs, performs scheduling, allocation, binding and controller synthesis, and produces the datapath and controller.

High-Level Synthesis: problem description. What are the problems?

- All the sub-problems are NP-complete: no efficient (polynomial-time) exact algorithm is known for them.
  - This justifies exploring other options that solve the problem efficiently but sub-optimally.
- Traditional approaches are oriented to optimizing latency and area occupation, with particular attention only to the area of functional units and registers.
  - Recent studies have demonstrated that interconnection costs have to be taken into account.

D. Chen and J. Cong, "Register binding and port assignment for multiplexer optimization", ASP-DAC '04: Proceedings of the 2004 conference on Asia South Pacific design automation, pp. 68-73, 2004.

High-Level Synthesis: problem description. What are the problems? (cont'd)

- All the sub-tasks are NP-complete problems, so no efficient exact algorithms are known to solve them.
- All the tasks are closely interdependent and should be considered together.
- Most of the information is available only at the end of the synthesis.

Genetic algorithms:
- Use nondeterministic approaches with feedback information.
- Multi-objective optimization suggests not reducing the fitness function to a weighted average.
- Non-dominated Sorting Genetic Algorithm (NSGA-II).

K. Deb, S. Agrawal, A. Pratab, and T. Meyarivan, "A Fast and Elitist Multi-Objective Genetic Algorithm: NSGA-II," Proceedings of the Parallel Problem Solving from Nature VI Conference, pp. 849-858, 2000.
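The argument against a weighted average can be made concrete with a small sketch. The design points below are invented (area, latency) pairs, not results from the thesis, and the function names are illustrative. A weighted sum returns a single winner that changes with the weights, while Pareto dominance keeps every trade-off that no other design beats on both objectives.

```python
from typing import List, Sequence, Tuple

Design = Tuple[float, float]  # hypothetical (area, latency) pair, both minimized


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Design]) -> List[Design]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def weighted_best(points: List[Design], weights: Sequence[float]) -> Design:
    """Single winner under a weighted sum of the objectives."""
    return min(points, key=lambda p: sum(w * v for w, v in zip(weights, p)))


# Invented design points, for illustration only.
designs = [(120, 10), (90, 14), (150, 8), (130, 12), (90, 16)]

front = pareto_front(designs)
best_equal = weighted_best(designs, (0.5, 0.5))
best_latency = weighted_best(designs, (0.1, 0.9))
```

With equal weights the small-area design wins; with weights favoring latency a different design wins; the fast but large design (150, 8) is never selected by either setting even though it is a valid trade-off on the Pareto front.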

Genetic algorithm overview

Chromosome encoding and fitness evaluation are usually the only elements that depend on the problem.
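A minimal sketch of that separation, assuming a single-objective toy problem and truncation selection (neither is taken from the thesis): the loop below never inspects what a gene means. The only problem-specific inputs are `random_gene` (the encoding) and `fitness`; both names are illustrative.

```python
import random


def evolve(random_gene, n_genes, fitness, pop_size=20, generations=30, seed=0):
    """Generic GA loop; lower fitness is better."""
    rng = random.Random(seed)
    population = [[random_gene(rng, i) for i in range(n_genes)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # survivors are kept (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from either parent.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Mutation: redraw one gene from its own domain.
            i = rng.randrange(n_genes)
            child[i] = random_gene(rng, i)
            children.append(child)
        population = parents + children
    return min(population, key=fitness), history


# Toy problem (an assumption for the demo): minimize the number of ones.
best, history = evolve(lambda rng, i: rng.randrange(2), 12, sum)
```

Swapping in a different encoding or fitness function requires no change to `evolve` itself.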

The proposed High-Level Synthesis flow

The proposed flow is organized as follows:
- From C to intermediate representation: the graph representation is produced from GIMPLE.
- High-Level Synthesis flow:
  1. Partial binding and scheduling
  2. Finite State Machine creation
  3. Register allocation
  4. Interconnection allocation
  5. Performance and area estimations
- From data structures to a structural description in the form of a graph representation.
- From intermediate representation to a Hardware Description Language (e.g. Verilog), ready for low-level synthesis.

High-Level Synthesis and Design Space Exploration: the proposed methodology

1. Partial binding and scheduling

Partial binding: force an operation to be executed on a selected functional unit instance, e.g. β(+2) = [ADDER,1].
- A technique introduced to partially control the final area occupation.
- It can control scheduling, register allocation and interconnection allocation.

Scheduling: each operation is assigned the cycle step in which it starts execution.
- Different scheduling algorithms can support the partial binding feature:
  - Integer Linear Programming formulation
  - List-based algorithm
- The selected algorithm determines the solution obtained.
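How a partial binding steers a list-based scheduler can be sketched as follows. This is a simplified illustration, not the thesis implementation: every operation is assumed to take one cycle, the priority is simply the order of the `ops` dictionary, and the operation names, unit counts and the function `list_schedule` are all hypothetical.

```python
def list_schedule(ops, deps, units, binding=None):
    """Resource-constrained list scheduling with single-cycle operations.

    ops:     {operation: unit_type}, in priority order
    deps:    {operation: [predecessor operations]}
    units:   {unit_type: number of instances}
    binding: partial binding {operation: (unit_type, instance)}
    """
    binding = binding or {}
    start, assigned = {}, {}
    step = 0
    while len(start) < len(ops):
        busy = set()  # unit instances already taken in this control step
        for op, kind in ops.items():
            if op in start:
                continue
            # Ready only if every predecessor started in an earlier step.
            if any(start.get(p, step) >= step for p in deps.get(op, [])):
                continue
            if op in binding:
                candidates = [binding[op]]  # forced instance
            else:
                candidates = [(kind, i) for i in range(units[kind])]
            free = [c for c in candidates if c not in busy]
            if free:
                busy.add(free[0])
                start[op] = step
                assigned[op] = free[0]
        step += 1
    return start, assigned


OPS = {"a": "ADDER", "b": "ADDER", "c": "MULT", "d": "ADDER"}
DEPS = {"c": ["a", "b"], "d": ["c"]}
UNITS = {"ADDER": 2, "MULT": 1}

free_start, free_units = list_schedule(OPS, DEPS, UNITS)
# Force both additions onto the same adder instance.
bound_start, bound_units = list_schedule(
    OPS, DEPS, UNITS, {"a": ("ADDER", 0), "b": ("ADDER", 0)})
```

In the unconstrained run the two additions execute in parallel on two adders; with the partial binding they are serialized on one adder, which lengthens the schedule by one step but leaves the second adder unused. This is the area/latency lever the exploration acts on.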

2. Finite State Machine creation

- The scheduled specification gives information about the operations that have to be executed in the same control step. This information is useful for:
  - Register allocation
  - Controller synthesis
- The Finite State Machine model is a good representation for this situation.
- The Moore model has been implemented for its natural correspondence with the problem.
- The State Transition Graph is created from the scheduled specification.
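A minimal sketch of deriving a Moore-style State Transition Graph from a schedule, assuming straight-line code (no branches or loops) and assuming the machine returns to its first state after the last step. The function `build_fsm` and the state names are illustrative, not part of the original flow.

```python
def build_fsm(start):
    """Build one state per control step from {operation: start_step}.

    Returns (states, transitions): in a Moore machine the outputs depend
    only on the current state, so each state simply lists the operations
    it issues.
    """
    n_steps = max(start.values()) + 1
    states = {
        f"S{i}": sorted(op for op, step in start.items() if step == i)
        for i in range(n_steps)
    }
    # Assumption: linear chain of states that wraps around at the end.
    transitions = {f"S{i}": f"S{(i + 1) % n_steps}" for i in range(n_steps)}
    return states, transitions


# Hypothetical schedule: two operations in step 0, then one per step.
schedule = {"a": 0, "b": 0, "c": 1, "d": 2}
states, transitions = build_fsm(schedule)
```

A specification with conditionals would need one outgoing edge per branch outcome, which this sketch does not model.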

3. Register allocation

- Storage elements have to be allocated to store values across cycle step boundaries.
- A compiler approach has been implemented:
  - Dataflow equations to compute liveness information
  - Conflict graph creation based on liveness information
  - Vertex coloring heuristic to minimize the number of registers
- Compilers use the Control Flow Graph, which does not capture parallelism (concurrent execution of operations).
- This approach uses the State Transition Graph as the base for dataflow analysis: it represents the control flow and contains information about concurrent operations.
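The three steps can be sketched on a tiny State Transition Graph. Everything here is an illustrative assumption: the states, the value names `t1` to `t3`, the rule that two values conflict when both are live at the exit of the same state, and the greedy largest-degree-first coloring (the slides do not say which heuristic was used).

```python
def liveness(states, succ, use, defs):
    """Iterate the backward dataflow equations to a fixed point:
    live_out[s] = union of live_in over successors,
    live_in[s]  = use[s] | (live_out[s] - defs[s])."""
    live_in = {s: set() for s in states}
    live_out = {s: set() for s in states}
    changed = True
    while changed:
        changed = False
        for s in reversed(states):
            out = set().union(*(live_in[t] for t in succ.get(s, [])))
            inn = use[s] | (out - defs[s])
            if out != live_out[s] or inn != live_in[s]:
                live_out[s], live_in[s] = out, inn
                changed = True
    return live_in, live_out


def conflict_graph(live_out):
    """Values live across the same state boundary need distinct registers."""
    graph = {}
    for values in live_out.values():
        for v in values:
            graph.setdefault(v, set()).update(values - {v})
    return graph


def color(graph):
    """Greedy vertex coloring, highest degree first; colors are register ids."""
    reg = {}
    for v in sorted(graph, key=lambda v: (-len(graph[v]), v)):
        used = {reg[n] for n in graph[v] if n in reg}
        reg[v] = next(r for r in range(len(graph)) if r not in used)
    return reg


STATES = ["S0", "S1", "S2"]
SUCC = {"S0": ["S1"], "S1": ["S2"]}
DEFS = {"S0": {"t1", "t2"}, "S1": {"t3"}, "S2": set()}
USE = {"S0": set(), "S1": {"t1", "t2"}, "S2": {"t3"}}

live_in, live_out = liveness(STATES, SUCC, USE, DEFS)
registers = color(conflict_graph(live_out))
```

Here `t1` and `t2` are produced concurrently in `S0` and both survive into `S1`, so they conflict; `t3` can reuse the register of `t1`, giving two registers for three values.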

4-5. Interconnection allocation and result estimations: the final steps

Interconnection optimization:
- Port swapping for commutative operations
- Boolean logic for decoding and selection: truth tables are created from the enable signals coming from the controller, to select the right inputs of the multiplexers

The final structural description is now available, and real values could be retrieved from low-level synthesis, but that is too slow.

Estimation model:
- Interconnections and glue logic are difficult to estimate, because of the optimizations made by the synthesis tool
- An existing model is used to estimate the area occupied by solutions
- Linear regression is used to update the coefficients to account for a different representation, synthesis tool and target device

C. Brandolese, W. Fornaciari, and F. Salice, "An Area Estimation Methodology for FPGA Based Designs at SystemC-Level", ASP-DAC '04: Proceedings of the 2004 conference on Asia South Pacific design automation, pp. 129-132.
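The coefficient update by linear regression can be illustrated with a one-variable least-squares fit. This is only a sketch: the actual area model has more terms than a single raw estimate, and the sample values below are invented so that the fit is exact.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented calibration set: raw model output vs. area reported by synthesis.
raw_estimates = [100, 200, 300, 400]
synthesized_area = [130, 250, 370, 490]

slope, intercept = fit_line(raw_estimates, synthesized_area)


def calibrated_area(raw):
    """Apply the fitted coefficients to a new raw estimate."""
    return slope * raw + intercept
```

Refitting on a handful of synthesized designs is what lets the same model follow a change of synthesis tool or target device without running low-level synthesis for every candidate.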

Example: an illustrative example

Design Space Exploration by Genetic Algorithm: problem-dependent elements

- Each operation in the behavioral specification has an associated gene that represents a feasible partial binding.
- Additional genes represent the algorithms with which the high-level synthesis sub-tasks are performed. The implementation provides genes for:
  - Scheduling
  - Register allocation
  - Interconnection allocation
- The fitness values are obtained by retrieving the information from the chromosome, using it as a partial binding and starting a synthesis flow. The results are then estimated using the models, and the values are returned to the algorithm.
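One possible way to lay out such a chromosome is sketched below. The operation names, unit instances and algorithm labels are hypothetical (in particular the interconnection options are placeholders), and `decode` is an illustrative helper rather than code from the framework. Each gene is an index into a list of feasible choices, so every chromosome decodes to a valid configuration.

```python
# One gene per operation: an index into that operation's feasible unit instances.
OPERATIONS = ["a", "b", "c", "d"]
COMPATIBLE_UNITS = {
    "a": [("ADDER", 0), ("ADDER", 1)],
    "b": [("ADDER", 0), ("ADDER", 1)],
    "c": [("MULT", 0)],
    "d": [("ADDER", 0), ("ADDER", 1)],
}
# One extra gene per synthesis sub-task: an index into the available algorithms.
# The labels are placeholders chosen for this sketch.
ALGORITHM_CHOICES = {
    "scheduling": ["list_based", "ilp"],
    "register_allocation": ["vertex_coloring", "clique_covering"],
    "interconnection_allocation": ["mux_based", "bus_based"],
}


def decode(chromosome):
    """Split a chromosome into a partial binding and an algorithm selection."""
    n = len(OPERATIONS)
    binding = {
        op: COMPATIBLE_UNITS[op][gene]
        for op, gene in zip(OPERATIONS, chromosome[:n])
    }
    algorithms = {
        task: options[gene]
        for (task, options), gene in zip(ALGORITHM_CHOICES.items(), chromosome[n:])
    }
    return binding, algorithms


binding, algorithms = decode([1, 0, 0, 1, 0, 1, 0])
```

The decoded binding and algorithm selection are what a fitness evaluation would hand to the synthesis flow before estimating area and latency.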

Design Space Exploration by Genetic Algorithm: problem-independent elements

- Common generic operators can be used without modifications:
  - Uniform crossover
  - Uniform mutation
- If the gene changed by the operators is related to:
  - an operation: the result is a new binding for that operation;
  - an algorithm: the algorithm used to solve the related synthesis step is changed.
- The initial population is created by random generation or by starting from a first admissible binding. This allows the algorithm to start from some interesting points (e.g. minimum number of functional units or minimum latency) and then explore around them.
- Solutions are sorted into different levels according to their fitness values. The ranking has been accelerated using the fast non-dominated sort algorithm available in NSGA-II.
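The generic operators and the ranking step can be sketched as below. The operators work on lists of integer genes with a per-gene domain size, which is an assumption about the encoding; the ranking follows the fast non-dominated sort described in the NSGA-II paper, applied here to invented (area, latency) pairs.

```python
import random


def uniform_crossover(a, b, rng):
    """Each gene is copied from one of the two parents with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def uniform_mutation(chromosome, domains, rate, rng):
    """Each gene is redrawn from its own domain with probability `rate`."""
    return [rng.randrange(domains[i]) if rng.random() < rate else gene
            for i, gene in enumerate(chromosome)]


def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(objectives):
    """Return a list of fronts, each a list of indices into `objectives`."""
    n = len(objectives)
    dominated = [[] for _ in range(n)]  # solutions that p dominates
    count = [0] * n                     # how many solutions dominate p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objectives[p], objectives[q]):
                dominated[p].append(q)
            elif dominates(objectives[q], objectives[p]):
                count[p] += 1
        if count[p] == 0:
            fronts[0].append(p)
    while fronts[-1]:
        next_front = []
        for p in fronts[-1]:
            for q in dominated[p]:
                count[q] -= 1
                if count[q] == 0:
                    next_front.append(q)
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


rng = random.Random(1)
child = uniform_crossover([0, 0, 0, 0], [1, 1, 1, 1], rng)
domains = [2, 2, 1, 2]
mutated = uniform_mutation([0, 1, 0, 1], domains, 1.0, rng)

# Invented (area, latency) pairs, for illustration only.
points = [(120, 10), (90, 14), (150, 8), (130, 12), (90, 16), (160, 16)]
fronts = fast_non_dominated_sort(points)
```

The first front holds the current trade-off candidates; later fronts are progressively worse levels, which is the ranking the selection step relies on.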

Experimental results

Development framework:
- Implemented and integrated in a C++ framework named PandA (an open-source framework covering different aspects of the hardware-software design of embedded systems).
- The Open BEAGLE framework has been used for evolutionary computation.

Functional validation:
- A simple question: does it really work? Does the final RTL design really implement the behavioral specification?
- Verilog and C can be integrated, and a simple regression test can be used for functional validation.

Area model validation:
- Comparison between the estimates and the values coming from low-level synthesis: average error of 3.7% and maximum error of 14%.
- These values can be effectively used as fitness values.
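Average and maximum error figures of this kind are typically relative errors of the estimate against the synthesized value. A small sketch of that computation follows, assuming this definition of error and using invented numbers that are unrelated to the reported 3.7% and 14%.

```python
def relative_errors(estimated, measured):
    """Relative error of each estimate with respect to the measured value."""
    return [abs(e - m) / m for e, m in zip(estimated, measured)]


# Invented area values, for illustration only.
estimated_area = [1040, 480, 2150]
synthesized_area = [1000, 500, 2000]

errors = relative_errors(estimated_area, synthesized_area)
average_error = sum(errors) / len(errors)
maximum_error = max(errors)
```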

Experimental results (cont'd)

Design Space Exploration validation:
- A population of individuals evolving up to a maximum of 200 generations proved to be the best trade-off between overall execution time and solution quality.
- The approach deals with area constraints better than existing tools.
- It can explore both the fastest solution (with an unconstrained number of resources) and the minimal area solution, while covering a good number of trade-off solutions in between.

Paper accepted for publication:
- Title: "An Evolutionary Approach to Area-Time Optimization of FPGA designs"
- Conference: International Symposium on Systems, Architectures, MOdeling and Simulation (SAMOS), Samos, Greece, July 2007

Conclusion: some features just provided…

Extensions provided after the thesis was written:
- Weighted clique covering for register allocation:
  - Compatibility edges are weighted. A higher weight is assigned when the two values involve the same functional units (this could reduce interconnections).
  - A branch and bound approach solves clique covering on the weighted graph; results show a further reduction of overall area of up to 10%.
- Fitness inheritance, which substitutes a fraction of the expensive real evaluations with an estimation based on neighbors in a hypothetical design space:
  - A model for inheritance has been created.
  - It is able to reduce overall execution time by over 25%, without any substantial difference in the final Pareto-optimal solution set.

Paper submitted to conference:
- Title: "Fitness Inheritance In Evolutionary and Multi-Objective High Level Synthesis"
- Conference: IEEE Congress of Evolutionary Computation (CEC) 2007, Singapore, September
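The idea of fitness inheritance can be sketched as follows. The slides do not describe the inheritance model itself, so this example assumes one plausible choice: the fitness of a new chromosome is a distance-weighted average over its `k` nearest already-evaluated neighbors, with Hamming distance between chromosomes as the notion of neighborhood. All names and values are hypothetical.

```python
def hamming(a, b):
    """Number of genes in which two chromosomes differ."""
    return sum(x != y for x, y in zip(a, b))


def inherited_fitness(chromosome, evaluated, k=2):
    """Estimate the objectives of `chromosome` from its k nearest neighbors.

    evaluated: list of (chromosome, objectives) pairs that went through a
    real (expensive) evaluation. Closer neighbors get a larger weight.
    """
    nearest = sorted(evaluated, key=lambda e: hamming(chromosome, e[0]))[:k]
    weights = [1.0 / (1 + hamming(chromosome, c)) for c, _ in nearest]
    total = sum(weights)
    n_objectives = len(nearest[0][1])
    return tuple(
        sum(w * objs[i] for w, (_, objs) in zip(weights, nearest)) / total
        for i in range(n_objectives)
    )


# Invented archive of evaluated individuals: (chromosome, (area, latency)).
archive = [
    ([0, 0, 0, 0], (100.0, 10.0)),
    ([1, 1, 0, 0], (80.0, 14.0)),
    ([1, 1, 1, 1], (60.0, 20.0)),
]

estimate = inherited_fitness([1, 0, 0, 0], archive)
```

In a full loop only a fraction of each generation would receive an inherited estimate, with the rest evaluated through the synthesis flow so that the archive stays anchored to real values.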

Conclusion and future works

The main contributions of this thesis are:
- A high-level synthesis flow from C specification to HDL description, which can be both simulated and synthesized
- An area model for fast estimation of synthesis results
- Design Space Exploration using a genetic algorithm that integrates the synthesis flow and the area model estimation to lead the evolution to good design solutions, taking into account all the elements that contribute to area and time in the final designs

Future works:
- Optimize the synthesis flow and provide efficient support for particular constructs (e.g. loops and memory accesses)
- Reduce the overall execution time of the proposed methodology (fitness inheritance, or initializing the population from a given Pareto solution set)
- Refine the area model and allow it to be specialized for different target devices on the fly (e.g. parameters stored in an external file and loaded at start-up)