HIGH LEVEL SYNTHESIS WITH AREA CONSTRAINTS FOR FPGA DESIGNS: AN EVOLUTIONARY APPROACH. Master's thesis (Tesi di Laurea) by: Christian Pilato, student ID no. 674373. Advisor: Prof.


1 HIGH LEVEL SYNTHESIS WITH AREA CONSTRAINTS FOR FPGA DESIGNS: AN EVOLUTIONARY APPROACH. Master's thesis (Tesi di Laurea) by: Christian Pilato, student ID no. 674373. Advisor: Prof. Fabrizio FERRANDI. Co-advisor: Ing. Antonino TUMEO. Politecnico di Milano

2 Summary
Outline:
- High-Level Synthesis
- Proposed methodology
- Experimental results
- Some further extensions…
- Conclusion and future work

3 High-Level Synthesis – Problem description
[Figure: the High-Level Synthesis tool takes a behavioral specification, design constraints, a resource library and objectives; it performs scheduling, allocation, binding and controller synthesis; it produces a datapath and a controller.]
“High-Level Synthesis means going from an algorithmic level specification of a behaviour of a digital system to a register level structure that implements that behavior.” McFarland, et al., Proc. IEEE, February 1990.

4 High-Level Synthesis – Problem description
What are the problems?
- All the sub-tasks are NP-complete: no efficient algorithms are known.
- Interconnections have to be considered: they can account for up to 80% of the final area.
- All the tasks are closely interdependent.
- Most of the information is available only at the end of the synthesis.
Genetic algorithms:
- Try non-deterministic approaches with feedback information.
- Multi-objective optimization: reducing the problem to a single objective (weighted average) is not efficient.
- Non-dominated Sorting Genetic Algorithm (NSGA-II).
K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan, “A Fast and Elitist Multi-Objective Genetic Algorithm: NSGA-II,” Proceedings of the Parallel Problem Solving from Nature VI Conference, pp. 849–858, 2000.

5 High-Level Synthesis and Design Space Exploration: the proposed methodology

6 Experimental results
Development framework:
- Integrated in the PandA framework, an open-source C++ framework covering different aspects of the hardware-software design of embedded systems.
- Evolutionary computation with the Open BEAGLE framework.
Functional validation:
- Comparison between Verilog and C simulations.
Estimation model validation:
- Comparison between estimations and logic synthesis values: average error of 4.02%, standard deviation of 2.82%, maximum error below 10%.
- These estimates are accurate enough to be used as fitness values.
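The validation metrics listed on this slide (average error, standard deviation, maximum error) can be computed as in the minimal sketch below. The area values and the function name `relative_errors` are hypothetical and chosen only for illustration; they are not data from the thesis, and the slide does not say whether the population or the sample standard deviation was used (the sketch assumes the former).

```python
from statistics import mean, pstdev


def relative_errors(estimated, measured):
    """Percentage error of each estimate relative to the logic-synthesis value."""
    return [abs(e - m) / m * 100.0 for e, m in zip(estimated, measured)]


# Hypothetical area values (e.g. FPGA slices); NOT results from the thesis.
estimated = [1040.0, 1960.0, 515.0, 3090.0]
measured = [1000.0, 2000.0, 500.0, 3000.0]

errors = relative_errors(estimated, measured)
summary = {
    "average": mean(errors),
    "std_dev": pstdev(errors),  # population standard deviation (assumption)
    "maximum": max(errors),
}
print(summary)
```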

7 Experimental results
Design Space Exploration validation:
- Population size of 1,000 individuals, evolving up to a maximum of 200 generations: the best trade-off between overall execution time and solution quality.
Considerations:
- The approach takes into account all elements in the design solution.
- It can cover a good number of trade-offs between the fastest solution and the minimal-area solution.
- It deals with area constraints better than existing tools.
Paper accepted for publication at the International Symposium on Systems, Architectures, MOdeling and Simulation (SAMOS), Samos, Greece, July 2007. Title: “An Evolutionary Approach to Area-Time Optimization of FPGA designs”

8 Some extensions…
Some features already provided…
Weighted clique covering, used in register allocation to reduce interconnections:
- A higher weight is assigned to a compatibility edge when the two values involve the same functional units.
- Clique covering is performed on a weighted graph; results show a further reduction of the overall area of up to 10%.
Fitness inheritance, used to reduce the overall execution time:
- A fraction of the expensive real evaluations is replaced with an estimation based on similar, already evaluated individuals.
- It reduces the overall execution time by over 25%.
- No substantial difference in the final Pareto-optimal solution.
Paper submitted to IEEE Congress of Evolutionary Computation (CEC) 2007, Singapore, September 2007. Title: “Fitness Inheritance In Evolutionary and Multi-Objective High Level Synthesis”
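A minimal sketch of the fitness inheritance idea follows. The slide only states that some real evaluations are replaced by an estimate based on similar evaluated individuals; the similarity measure (Hamming distance), the inverse-distance weighting, the neighbourhood size `k` and all function names are assumptions made for illustration, not the scheme used in the thesis.

```python
import random


def hamming(a, b):
    """Number of gene positions in which two chromosomes differ."""
    return sum(x != y for x, y in zip(a, b))


def inherited_fitness(chromosome, evaluated, k=2):
    """Distance-weighted mean of the objectives of the k most similar evaluated individuals."""
    nearest = sorted(evaluated, key=lambda item: hamming(chromosome, item[0]))[:k]
    weights = [1.0 / (1 + hamming(chromosome, c)) for c, _ in nearest]
    total = sum(weights)
    n_objectives = len(nearest[0][1])
    return tuple(
        sum(w * fit[i] for w, (_, fit) in zip(weights, nearest)) / total
        for i in range(n_objectives)
    )


def evaluate(chromosome, evaluated, real_eval, inherit_prob, rng):
    """With probability inherit_prob use the cheap estimate, otherwise run the real evaluation."""
    if evaluated and rng.random() < inherit_prob:
        return inherited_fitness(chromosome, evaluated)
    fitness = real_eval(chromosome)
    evaluated.append((chromosome, fitness))  # only real evaluations are stored
    return fitness


# Hypothetical (area, latency) values for two already evaluated individuals.
evaluated = [((0, 0, 1), (100.0, 10.0)), ((1, 1, 1), (200.0, 20.0))]
estimate = inherited_fitness((0, 1, 1), evaluated)
print(estimate)
```

Only real evaluations are added to the pool in this sketch, so estimates are never derived from other estimates.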

9 Conclusion and future work
The main contributions of this thesis are:
- A high-level synthesis flow from C specifications to HDL descriptions.
- Integration of a model for fast estimation of synthesis results.
- Design space exploration with a genetic algorithm: it takes into account all the elements composing the design solution, its estimates fit the real values closely, and it optimizes multiple objectives concurrently.
Future work:
- Optimize the results coming from the synthesis flow.
- Further reduce the overall execution time of the proposed methodology.
- Refine the estimation model and specialize it for different targets.

10 Christian PILATO, student ID no. 674373. Thank you!

11 High-Level Synthesis – Problem definition
Problem: design and implementation of digital circuits.
Inputs:
- a behavioral description (e.g. in the C language)
- a library of different types of resources
- a set of constraints
Output: a register-transfer level (RTL) design in a hardware description language (e.g. SystemC, VHDL or Verilog).
Goal: minimize some figures of merit (area, performance, etc.), also called objectives.

12 High-Level Synthesis – Problem definition
Objective vs. Constraint
- A constraint is a design target that must be met for the design to be considered successful (e.g. the occupied area has to be less than the maximum allowed).
- An objective is a design target where more (or less) is better (e.g. the occupied area has to be reduced).
A constraint has to be respected, otherwise the design solution is wrong (e.g. the solution occupies more area than is available). An objective has to be optimized, otherwise the design solution is simply bad (e.g. the solution occupies a large area).
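The distinction can be sketched in a few lines: the constraint filters out wrong solutions, while the objective ranks the remaining ones. The `Design` class, the candidate values and the choice of latency as the objective are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    name: str
    area: int     # e.g. FPGA slices (hypothetical unit)
    latency: int  # clock cycles


def feasible(design, max_area):
    """Constraint: a design that exceeds the available area is wrong."""
    return design.area <= max_area


def best_feasible(designs, max_area):
    """Objective: among feasible designs, lower latency is better."""
    candidates = [d for d in designs if feasible(d, max_area)]
    return min(candidates, key=lambda d: d.latency, default=None)


# Hypothetical candidates, not results from the thesis.
designs = [Design("A", 900, 40), Design("B", 1500, 25), Design("C", 1100, 30)]
print(best_feasible(designs, max_area=1200))
```

Design B is the fastest but violates the area constraint, so it is discarded regardless of its latency; the choice is then made among A and C.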

13 High-Level Synthesis – Problem description
Three main tasks:
1. Operation scheduling: provides the control steps in which operations start their execution. It determines when the operations are executed.
2. Resource allocation: assigns operations and values to hardware components and interconnects them using connection elements. It determines where operations are executed, where values are stored and how elements are interconnected.
3. Controller synthesis: provides the logic to issue datapath operations, based on the control flow. It determines which operations are executed based on the control flow.

14 High-Level Synthesis – Problem description
NP-Completeness
It would be desirable to have polynomial-time algorithms that solve each sub-task optimally. Unfortunately, no polynomial-time algorithm is known that solves these problems optimally: they belong to the class of NP-complete problems, a set of problems that are hard to solve (typically they cannot be solved efficiently for large inputs). Proving that a problem is NP-complete tells us that solving it, usually for large and generic inputs, is very hard. This justifies exploring other options that solve the problem efficiently but sub-optimally.

15 High-Level Synthesis – Problem description
Problem Interdependence
Traditional approaches optimize latency and area occupation, where only the area of functional units and registers was considered for reduction. Recent studies have demonstrated that interconnection costs have to be taken into account, since the area of multiplexers and interconnection elements by far outweighs the area of functional units and registers.
D. Chen and J. Cong, “Register binding and port assignment for multiplexer optimization”, ASP-DAC ’04: Proceedings of the 2004 conference on Asia South Pacific design automation, pp. 68–73, 2004.

16 The proposed High-Level Synthesis flow
The proposed flow is organized as follows:
- From C to an intermediate representation: the GIMPLE form is used to produce a graph representation.
- High-Level Synthesis flow:
  1. Partial binding and scheduling
  2. Finite State Machine creation
  3. Register allocation
  4. Interconnection allocation
  5. Performance and area estimations
- From data structures to an intermediate representation in the form of a graph.
- From the intermediate representation to a Hardware Description Language (e.g. Verilog), ready for logic synthesis.

17 1. Partial binding and Scheduling
Partial binding: force an operation to be executed on a selected functional unit instance (slide example: β(+1) = …).
- A technique introduced to partially control the final area occupation.
- It can affect scheduling, register allocation and interconnection allocation.
Scheduling: assign a starting control step to each operation to be executed.
- Many scheduling algorithms are able to support partial binding (Integer Linear Programming formulation, list-based algorithm, etc.).
- Different solutions are obtained depending on the selected algorithm.
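How a partial binding can change a schedule is sketched below with a simplified list-based scheduler. This is an illustration under stated assumptions, not the scheduler of the thesis: every operation takes one control step, the dependency graph is acyclic, the priority is plain alphabetical order, and bound instance indices are assumed to be valid. The operation names and unit counts are hypothetical.

```python
def list_schedule(ops, deps, units, binding=None):
    """Single-cycle list scheduling.

    ops: {operation: unit type}; deps: {operation: [predecessors]};
    units: {unit type: number of instances}; binding: {operation: instance index}.
    Returns (start step per operation, (type, instance) per operation).
    """
    binding = binding or {}
    start, assigned = {}, {}
    step = 0
    while len(start) < len(ops):
        busy = set()  # unit instances already used in this control step
        for op in sorted(ops):
            if op in start:
                continue
            # An operation is ready only if all predecessors finished earlier.
            if any(p not in start or start[p] >= step for p in deps.get(op, [])):
                continue
            kind = ops[op]
            wanted = [binding[op]] if op in binding else range(units[kind])
            free = [i for i in wanted if (kind, i) not in busy]
            if free:
                busy.add((kind, free[0]))
                start[op] = step
                assigned[op] = (kind, free[0])
        if not busy:
            raise ValueError("no operation can be scheduled: check dependencies and bindings")
        step += 1
    return start, assigned


# Hypothetical specification: three additions and one multiplication.
ops = {"a1": "add", "a2": "add", "a3": "add", "m1": "mul"}
deps = {"m1": ["a1", "a2"]}
units = {"add": 2, "mul": 1}

free_start, _ = list_schedule(ops, deps, units)
bound_start, _ = list_schedule(ops, deps, units, binding={"a1": 0, "a2": 0})
print(free_start)
print(bound_start)
```

Forcing `a1` and `a2` onto the same adder serializes them, which delays `m1` and lengthens the schedule by one control step in this example.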

18 2. Finite State Machine creation
Scheduling gives information about the operations that have to be executed in the same control step. This information is useful for:
- Register allocation
- Controller synthesis
A Finite State Machine is a good model to represent this information. The Moore FSM model has been implemented for its natural correspondence with the problem. The State Transition Graph is created on the scheduled specification:
- At each control step, the system evolves into the next state.
- Conditional operations create branches based on the evaluated condition values.

19 Finite State Machine creation
A finite-state machine can be described by:
- a set of primary inputs X
- a set of primary outputs Y
- a set of states S
- a state transition function δ: X × S → S
- an output function λ: S → Y (Moore model)
Its corresponding graph-based representation is the state transition diagram. The state transition diagram is a labeled directed multi-graph G(V,E), where the vertex set V is in one-to-one correspondence with the state set S and the directed edge set E is in one-to-one correspondence with the transitions specified by δ. Sets of concurrent operations can represent the states of the finite state machine model, so the finite state machine graph can be created starting from the scheduled graph.

20 3. Register allocation
Register allocation: allocate elements to store values across control step boundaries.
A compiler approach has been implemented:
- Liveness analysis based on dataflow equations.
- Conflict graph creation based on liveness information.
- Different heuristics to minimize the number of registers: vertex coloring, clique covering, left-edge algorithm, etc.
Compilers use the Control Flow Graph, which is not able to represent the concurrent execution of operations. This approach uses the State Transition Graph as the base for dataflow analysis: it represents both the control flow and the concurrent operations.
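Of the heuristics listed, the left-edge algorithm is the simplest to sketch. The example below assumes straight-line code, so that each value has a single lifetime interval expressed in control steps as a half-open range; the thesis instead derives liveness from the State Transition Graph, which this sketch does not model. The value names and lifetimes are hypothetical.

```python
def left_edge(lifetimes):
    """Left-edge register allocation.

    lifetimes: {value: (first_step, last_step)} as half-open intervals
    [first_step, last_step). Returns {value: register index}.
    """
    last_end = []    # last_end[r] = step at which register r becomes free
    allocation = {}
    # Visit values by increasing left edge (start of lifetime).
    for value, (start, end) in sorted(lifetimes.items(), key=lambda kv: kv[1]):
        for reg, free_at in enumerate(last_end):
            if free_at <= start:  # lifetimes do not overlap: reuse register
                last_end[reg] = end
                allocation[value] = reg
                break
        else:
            last_end.append(end)  # no compatible register: open a new one
            allocation[value] = len(last_end) - 1
    return allocation


# Hypothetical lifetimes of four values.
lifetimes = {"a": (0, 2), "b": (1, 3), "c": (2, 4), "d": (3, 5)}
allocation = left_edge(lifetimes)
print(allocation)
```

Values whose lifetimes do not overlap (here `a` and `c`, `b` and `d`) share a register, so two registers are enough for the four values.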

21 4-5. Interconnection allocation and result estimations
The final steps…
Interconnection allocation: allocate elements to interconnect the hardware components.
- Mux-based architecture: port swapping for commutative operations.
- Glue logic: a logic netlist that decodes commands and selects inputs, described by truth tables based on signals from the controller.
The RTL structural description is now available and it considers all elements. Objective values could be retrieved from logic synthesis, but that is too slow!
Estimation model: perform fast estimations of objective values.
- Area is difficult to estimate.
- An existing area model* has been updated and used.
*: C. Brandolese, W. Fornaciari, and F. Salice, “An Area Estimation Methodology for FPGA Based Designs at SystemC-Level”, DAC '04: Proceedings of the 41st annual conference on Design automation, pp. 129–132, 2004.

22 Genetic algorithms
Chromosome encoding and fitness evaluation are usually the only elements that depend on the problem.

23 Design Space Exploration by Genetic Algorithm
Problem-dependent elements
Chromosome encoding:
- Each operation in the specification has a gene that represents a feasible partial binding.
- Additional genes represent the algorithms used to perform the high-level synthesis steps: scheduling, register allocation and interconnection optimization.
Fitness evaluation:
- The information from the chromosome about partial binding and algorithms is used to perform a synthesis flow.
- Objective values are estimated using the proposed model.
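One possible reading of this encoding is sketched below: one integer gene per operation plus one gene per synthesis step. All names, the unit counts, the use of an extra gene value to mean "left unbound", and the interconnection options are assumptions introduced for illustration; the slides do not specify the concrete gene layout.

```python
import random

# Hypothetical specification and resource library.
OPERATIONS = {"add1": "adder", "add2": "adder", "mul1": "multiplier"}
UNIT_INSTANCES = {"adder": 2, "multiplier": 1}
ALGORITHM_CHOICES = {
    "scheduling": ["list_based", "ilp"],
    "register_allocation": ["vertex_coloring", "clique_covering", "left_edge"],
    "interconnection_optimization": ["none", "port_swapping"],  # hypothetical options
}


def gene_domains():
    """Number of admissible values of each gene, in chromosome order."""
    # Binding genes: one value per unit instance, plus one meaning "unbound".
    domains = [UNIT_INSTANCES[kind] + 1 for kind in OPERATIONS.values()]
    domains += [len(choices) for choices in ALGORITHM_CHOICES.values()]
    return domains


def random_chromosome(rng):
    """Draw every gene uniformly from its own domain."""
    return [rng.randrange(size) for size in gene_domains()]


def decode(chromosome):
    """Turn a chromosome into a partial binding and an algorithm selection."""
    n_ops = len(OPERATIONS)
    binding = {}
    for (op, kind), gene in zip(OPERATIONS.items(), chromosome[:n_ops]):
        if gene < UNIT_INSTANCES[kind]:  # the extra value leaves op unbound
            binding[op] = gene
    algorithms = {
        step: choices[gene]
        for (step, choices), gene in zip(ALGORITHM_CHOICES.items(), chromosome[n_ops:])
    }
    return binding, algorithms


binding, algorithms = decode([0, 2, 0, 1, 2, 0])
print(binding, algorithms)
print(decode(random_chromosome(random.Random(42))))
```

Because each gene ranges over its own finite domain, any operator that keeps a gene inside that domain yields a chromosome that decodes to a valid synthesis configuration.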

24 Design Space Exploration by Genetic Algorithm
Problem-independent elements
Generic operators:
- The common operators (crossover and mutation) are used without modifications: no infeasible chromosomes can be created.
- If the gene changed by an operator is related to an operation, the result is a new binding constraint for that operation; if it is related to an algorithm, the result is a different algorithm to solve the related synthesis step.
- The initial population is created randomly, or starting from some interesting points in order to explore around them.
Solution ranking:
- Solutions are ranked into different levels according to their fitness values.
- Ranking is accelerated using the fast-non-dominated-sort algorithm available in NSGA-II.
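The ranking into levels can be sketched with the fast non-dominated sort from the NSGA-II paper by Deb et al. The sketch assumes that both objectives (area and latency) are minimized; the sample points are hypothetical and are not results from the thesis.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(points):
    """Return a list of fronts, each a list of indices into points (front 0 is the best)."""
    dominated_by = [[] for _ in points]   # indices that each point dominates
    domination_count = [0] * len(points)  # how many points dominate each point
    fronts = [[]]
    for p, fp in enumerate(points):
        for q, fq in enumerate(points):
            if dominates(fp, fq):
                dominated_by[p].append(q)
            elif dominates(fq, fp):
                domination_count[p] += 1
        if domination_count[p] == 0:
            fronts[0].append(p)
    while fronts[-1]:
        next_front = []
        for p in fronts[-1]:
            for q in dominated_by[p]:
                domination_count[q] -= 1
                if domination_count[q] == 0:
                    next_front.append(q)
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical (area, latency) pairs.
points = [(100, 50), (150, 30), (200, 20), (160, 40), (220, 45)]
fronts = fast_non_dominated_sort(points)
print(fronts)
```

The first front contains the area-latency trade-offs that no other solution improves on in both objectives at once; later fronts contain progressively dominated solutions.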

