
HW/SW Synthesis

2 Outline
– Synthesis
– CFSM Optimization
– Software synthesis
  – Problem
  – Task synthesis
  – Performance analysis
  – Task scheduling
  – Compilation

3 POLIS Methodology
[Figure: POLIS design flow. Entry languages (graphical EFSM, Esterel, Java, ...) are translated by compilers into CFSMs. Other blocks: formal verification, partitioning, SW synthesis (SW code + RTOS), HW synthesis (logic netlist), SW estimation, HW estimation, HW/SW co-simulation, performance/trade-off evaluation, and physical prototyping on an Aptix board consisting of a microcontroller of choice, FPGAs and FPICs.]

4 Hardware-Software Architecture
– Hardware:
  – Currently:
    – Programmable processors (microcontrollers, DSPs)
    – ASICs, FPGAs
– Software:
  – Set of concurrent tasks
  – Customized real-time operating system
– Interfaces:
  – Hardware modules
  – Software procedures (polling, interrupt handlers, ...)

5 System Partitioning
[Figure: seven CFSMs (CFSM1 to CFSM7) communicating through events (e1, e2, e3, ...) and ports (port1, port2, ...), grouped into HW partition 1, HW partition 2 and SW partition 3; the SW partition is run by a scheduler.]

6 Software Synthesis
– Two-level process:
  – “Technology” (processor) independent:
    – best decision/assignment sequence given the CFSM
  – “Technology” (processor) dependent:
    – conversion into machine code (currently left to the compiler):
      – instruction selection
      – instruction scheduling
      – register assignment
– Need performance and cost analysis:
  – worst-case execution time
  – code and data size

7 Software Synthesis
– Technology-independent phase:
  – Construction of a control-data flow graph (CDFG) from the CFSM (based on the BDD representation of the transition function)
  – Optimization of the CDFG for execution speed and code size (based on the BDD sifting algorithm)
– Technology-dependent phase:
  – Creation of (restricted) C code
  – Cost and performance analysis
  – Compilation

8 Software Implementation Problem
– Input:
  – Set of tasks (specified by CFSMs)
  – Set of timing constraints (e.g., input event rates and response constraints)
– Output:
  – Set of procedures that implement the tasks
  – Scheduler that satisfies the timing constraints
– Minimizing:
  – CPU cost
  – Memory size
  – Power, etc.

9 Software Implementation
– How to do it?
– Traditional approach:
  – Hand-coding of procedures
  – Hand-estimation of the timing inputs to scheduling algorithms
– Long and error-prone
– Our approach: a three-step automated procedure:
  – Synthesize each task separately
  – Extract (estimated) timing
  – Schedule the tasks
– Customized RTOS (scheduler + drivers)

10 Software Implementation
– Current strategy:
  – Iterate between synthesis, estimation and scheduling
  – Designer chooses the scheduling algorithm
– Future work:
  – Top-down propagation of timing constraints
  – Software synthesis under constraints
  – Automated scheduling selection (based on CPU utilization estimates)
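Utilization-based selection can be illustrated with the classic Liu and Layland tests. This is a standard textbook technique swapped in for illustration, not the POLIS scheduler: it assumes independent, preemptive, periodic tasks whose deadlines equal their periods, with each task's worst-case execution time taken from the timing estimation step. The task tuples and the function names are hypothetical.

```python
from typing import List, Tuple

# Each task is (wcet, period) in the same time unit; values are hypothetical.
Task = Tuple[float, float]


def utilization(tasks: List[Task]) -> float:
    """Total CPU utilization U = sum(C_i / T_i)."""
    return sum(wcet / period for wcet, period in tasks)


def rm_bound(n: int) -> float:
    """Liu & Layland utilization bound for rate-monotonic scheduling of n tasks."""
    return n * (2 ** (1 / n) - 1)


def choose_scheduler(tasks: List[Task]) -> str:
    """Pick a policy from utilization alone (deadlines assumed equal to periods)."""
    u = utilization(tasks)
    if u <= rm_bound(len(tasks)):
        return "rate-monotonic"   # sufficient test: RM is guaranteed to meet deadlines
    if u <= 1.0:
        return "EDF"              # EDF is exact for this task model when U <= 1
    return "infeasible"           # U > 1 cannot fit on a single CPU


print(choose_scheduler([(1, 4), (1, 5), (2, 10)]))  # U = 0.65
print(choose_scheduler([(2, 4), (2, 5)]))           # U = 0.90
```

The rate-monotonic bound is sufficient but not necessary: a task set that fails it may still be RM-schedulable, which an exact response-time analysis would reveal.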

11 Software Synthesis Procedure
[Figure: flow chart: Specification, partitioning -> S-graph synthesis -> Timing estimation -> Scheduling, validation (not feasible: iterate) -> Code generation -> Compilation -> Testing, validation (fail: iterate; pass: Production)]

12 Task Implementation
– Goal: quick response time, within timing and size constraints
– Problem statement:
  – Given a CFSM transition function and constraints
  – Find a procedure implementing the transition function while meeting the constraints
– The procedure code is acyclic:
  – Powerful optimization and analysis techniques apply
  – Looping, state storage, etc. are implemented outside (in the OS)
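To make the split concrete, here is a sketch in which the task body is a loop-free function of (state, inputs) and a hypothetical OS loop owns both the state and the repetition. The reaction mirrors the slide 22 example; the names react and run_os, and the encoding of inputs as (c_present, b) pairs, are illustrative assumptions.

```python
from typing import Dict, List, Tuple

State = Dict[str, int]


def react(state: State, c_present: bool, b: int) -> Tuple[State, List[str]]:
    """One CFSM transition: acyclic code, no loops, no hidden static state."""
    if c_present:
        if state["a"] < b:
            return {"a": state["a"] + 1}, ["y"]
        return {"a": 0}, ["y"]
    return state, []


def run_os(inputs: List[Tuple[bool, int]], initial: State) -> Tuple[State, List[str]]:
    """Hypothetical RTOS side: stores the task state and repeats the reaction."""
    state, emitted = dict(initial), []
    for c_present, b in inputs:     # the "forever" loop lives here, not in react()
        state, outputs = react(state, c_present, b)
        emitted.extend(outputs)
    return state, emitted


print(run_os([(True, 2), (True, 2), (True, 2), (False, 2)], {"a": 0}))
```

Because react() never loops and never keeps data between calls, every execution is a single path through a DAG, which is exactly what the later timing analysis relies on.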

13 SW Modeling Issues
– The software model should be:
  – Low-level enough to allow detailed optimization and estimation
  – High-level enough to avoid excessive details, e.g. register allocation and instruction selection
– Main types of “user-mode” instructions:
  – Data movement
  – ALU
  – Conditional/unconditional branches
  – Subroutine calls
– The RTOS handles I/O, interrupts and so on

14 SW Modeling Issues
– Focus on control-dominated applications
  – Address only CFSM control structure optimization
  – Data path left as “don’t touch”
– Use decision diagrams (Bryant '86)
  – Appropriate for control-dominated tasks
  – Well-developed set of optimization techniques
  – Augmented with arithmetic and Boolean operators, to perform data computations

15 ROBDDs
Reduced Ordered BDDs [Bryant 86]
– A node represents a function given by the Shannon decomposition $f = x\,f_x + \bar{x}\,f_{\bar{x}}$
– A variable appears at most once on any path from the root to a terminal
– Variables are ordered
– No two vertices represent the same function
– Canonical: two functions are equal if and only if their BDDs are isomorphic ⇒ direct application in equivalence checking
[Figure: ROBDD of f = x1 + x2 x3, with one node each for x1, x2 and x3 and terminals 1 and 0.]
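Applying the decomposition to the slide's example shows where the three internal nodes come from (a worked derivation added for illustration):

$$
\begin{aligned}
f &= x_1 + x_2 x_3 \\
f_{x_1} &= 1, \qquad f_{\bar{x}_1} = x_2 x_3 \\
f &= x_1 \cdot 1 + \bar{x}_1 \, x_2 x_3 = x_1 + x_2 x_3 \\
(x_2 x_3)_{x_2} &= x_3, \qquad (x_2 x_3)_{\bar{x}_2} = 0
\end{aligned}
$$

So the x1 node has its 1-edge to terminal 1 and its 0-edge to an x2 node; the x2 node points to an x3 node and to terminal 0; the x3 node points to terminals 1 and 0. That gives three internal nodes plus two terminals, with no node repeated.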

16 ROBDDs and Combinational Verification
– Given two circuits:
  – Build the ROBDDs of the outputs in terms of the primary inputs, using the same variable ordering
  – The two circuits are equivalent if and only if the ROBDDs are isomorphic
– The complexity of verification depends on the size of the ROBDDs
  – Compact in many cases
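A small sketch of the canonicity argument, assuming a toy manager that builds each ROBDD by exhaustive Shannon expansion of a Python predicate. That is only practical for a handful of variables; production packages such as CUDD build BDDs with ITE/apply operations instead. Because both circuits share one unique table and one variable order, equivalence reduces to comparing root node ids. The class and method names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Env = Dict[str, int]


class ToyBDD:
    """Reduced ordered BDD with a shared unique table; 0 and 1 are the terminals."""

    def __init__(self, order: List[str]):
        self.order = order
        self.unique: Dict[Tuple[str, int, int], int] = {}

    def mk(self, var: str, low: int, high: int) -> int:
        if low == high:                    # redundant test: reduce it away
            return low
        key = (var, low, high)
        if key not in self.unique:         # share isomorphic subgraphs
            self.unique[key] = len(self.unique) + 2
        return self.unique[key]

    def build(self, f: Callable[[Env], bool]) -> int:
        def expand(i: int, env: Env) -> int:
            if i == len(self.order):
                return 1 if f(env) else 0
            v = self.order[i]
            # Shannon expansion on v: cofactor v = 0 (low) and v = 1 (high)
            low = expand(i + 1, {**env, v: 0})
            high = expand(i + 1, {**env, v: 1})
            return self.mk(v, low, high)
        return expand(0, {})


def circuit_a(e: Env) -> bool:             # f = x1 + x2 x3
    return bool(e["x1"] or (e["x2"] and e["x3"]))


def circuit_b(e: Env) -> bool:             # same function, De Morgan form
    return not ((not e["x1"]) and not (e["x2"] and e["x3"]))


def circuit_c(e: Env) -> bool:             # a different function: x1 x2 + x3
    return bool((e["x1"] and e["x2"]) or e["x3"])


bdd = ToyBDD(["x1", "x2", "x3"])
root_a, root_b, root_c = bdd.build(circuit_a), bdd.build(circuit_b), bdd.build(circuit_c)
print(root_a == root_b, root_a == root_c)  # prints: True False
```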

17 ROBDDs and Memory Explosion
– ROBDDs are not always compact
  – The size of an ROBDD can be exponential in the number of variables
  – This can happen for real-life circuits too, e.g. multipliers
– Commonly known as the memory explosion problem of ROBDDs

18 Techniques for Handling ROBDD Memory Explosion
[Figure: taxonomy of ROBDD enhancements: variable ordering; relaxed ordering (Free BDDs); node decomposition (OFDDs, OKFDDs); partitioning (Partitioned ROBDDs). All the representations are canonical ⇒ combinational equivalence checking.]

19 Handling Memory Explosion: Variable Ordering
– BDD size is very sensitive to the variable ordering
– Example: a1 b1 + a2 b2 + a3 b3
  – Good ordering (a1, b1, a2, b2, a3, b3): 8 nodes
  – Bad ordering (a1, a2, a3, b1, b2, b3): 16 nodes
[Figure: the two ROBDDs, each ending in terminals 1 and 0.]

20 Handling Memory Explosion: Variable Ordering
– Good static as well as dynamic ordering techniques exist
– Dynamic variable reordering [Rudell 93]:
  – Change the variable order automatically during computations
  – Repeatedly swap a variable with an adjacent variable
  – Swapping can be done locally
  – Select the best location
[Figure: ROBDDs of a1 b1 + a2 b2 under two different variable orders.]
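The 8-node and 16-node counts quoted for a1 b1 + a2 b2 + a3 b3 can be reproduced with a toy builder, and the reordering idea can be sketched on top of it. A caveat: this sifting sketch rebuilds the whole ROBDD for every candidate position, whereas Rudell's method reaches each position through local swaps of adjacent variables inside the BDD. The function names are hypothetical, and node counts include the two terminals.

```python
from typing import Callable, Dict, List, Tuple

Env = Dict[str, int]


def robdd_size(f: Callable[[Env], bool], order: List[str]) -> int:
    """Number of ROBDD nodes (terminals included) for f under a variable order."""
    unique: Dict[Tuple[str, int, int], int] = {}

    def mk(var: str, low: int, high: int) -> int:
        if low == high:                      # redundant node: skip it
            return low
        return unique.setdefault((var, low, high), len(unique) + 2)

    def expand(i: int, env: Env) -> int:
        if i == len(order):
            return 1 if f(env) else 0
        v = order[i]
        return mk(v, expand(i + 1, {**env, v: 0}), expand(i + 1, {**env, v: 1}))

    root = expand(0, {})
    if root < 2:                             # constant function: a lone terminal
        return 1
    # Every node built by mk() is reachable from the root; add the terminals used.
    used = {child for (_, lo, hi) in unique for child in (lo, hi) if child < 2}
    return len(unique) + len(used)


def sift(f: Callable[[Env], bool], order: List[str]) -> List[str]:
    """Move each variable in turn to the position that minimizes the ROBDD size."""
    order = list(order)
    for var in list(order):
        rest = [v for v in order if v != var]
        candidates = [rest[:k] + [var] + rest[k:] for k in range(len(order))]
        # The current position is one of the candidates, so the size never grows.
        order = min(candidates, key=lambda o: robdd_size(f, o))
    return order


def f(e: Env) -> bool:
    return bool((e["a1"] and e["b1"]) or (e["a2"] and e["b2"]) or (e["a3"] and e["b3"]))


good = ["a1", "b1", "a2", "b2", "a3", "b3"]
bad = ["a1", "a2", "a3", "b1", "b2", "b3"]
print(robdd_size(f, good), robdd_size(f, bad))   # prints: 8 16
sifted = sift(f, bad)
print(sifted, robdd_size(f, sifted))
```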

21 SW Model: S-graphs
– Acyclic extended decision diagram computing a transition function
– S-graph structure:
  – Directed acyclic graph
  – Set of finite-valued variables
  – TEST nodes evaluate an expression and branch accordingly
  – ASSIGN nodes evaluate an expression and assign its result to a variable
  – Basic block + branch is a general CDFG model (but we constrain it to be acyclic for optimization)

22 An Example of S-graph
– input event c
– output event y
– state int a
– input int b
forever
    if (detect(c))
        if (a < b)
            a := a + 1
            emit(y)
        else
            a := 0
            emit(y)
[Figure: the s-graph for one iteration: BEGIN -> TEST detect(c); F -> END; T -> TEST a<b; T -> a := a + 1, F -> a := 0; both -> emit(y) -> END.]

23 S-graphs and Functions
– Execution of an s-graph computes a function from a set of input and state variables to a set of output and state variables:
  – Output variables are initially undefined
  – Traverse the s-graph from BEGIN to END
– Well-formed s-graph:
  – Every time a function depending on a variable is evaluated, that variable has a defined value
– How do we derive an s-graph implementing a given function?
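A minimal interpreter makes the execution model and the well-formedness condition concrete. This is a sketch under assumptions: the tuple encoding of nodes, the explicit lists of variables each node reads, and the names run_sgraph and UndefinedRead are made up for illustration, and emit(y) is modelled as assigning y. The check is dynamic, so it only covers the path actually taken, whereas well-formedness is a property of every path.

```python
from typing import Any, Dict

Env = Dict[str, Any]


class UndefinedRead(Exception):
    """Raised when a node evaluates an expression over an undefined variable."""


def run_sgraph(graph: Dict[str, tuple], env: Env) -> Env:
    """Traverse from BEGIN to END once; outputs stay undefined until assigned."""
    env = dict(env)                    # inputs and state; outputs are simply absent
    name = graph["BEGIN"][1]
    while graph[name][0] != "END":
        node = graph[name]
        reads = node[1] if node[0] == "TEST" else node[2]
        missing = [v for v in reads if v not in env]
        if missing:
            raise UndefinedRead(f"node {name!r} reads undefined {missing}")
        if node[0] == "TEST":          # ("TEST", reads, predicate, on_true, on_false)
            name = node[3] if node[2](env) else node[4]
        else:                          # ("ASSIGN", target, reads, expr, successor)
            env[node[1]] = node[3](env)
            name = node[4]
    return env


# The slide 22 example: c is an input event, b an input, a state, y an output event.
EXAMPLE = {
    "BEGIN": ("BEGIN", "detect"),
    "detect": ("TEST", ["c"], lambda e: e["c"], "lt", "END"),
    "lt": ("TEST", ["a", "b"], lambda e: e["a"] < e["b"], "inc", "zero"),
    "inc": ("ASSIGN", "a", ["a"], lambda e: e["a"] + 1, "emit"),
    "zero": ("ASSIGN", "a", [], lambda e: 0, "emit"),
    "emit": ("ASSIGN", "y", [], lambda e: True, "END"),
    "END": ("END",),
}

print(run_sgraph(EXAMPLE, {"a": 1, "b": 3, "c": True}))
```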

24 S-graphs and Functions
– Problem statement:
  – Given: a finite-valued multi-output function over a set of finite-valued variables
  – Find: an s-graph implementing it
– Procedure based on the Shannon expansion $f = x\,f_x + \bar{x}\,f_{\bar{x}}$
– The result depends heavily on the ordering of variables in the expansion:
  – Inputs before outputs: TESTs dominate over ASSIGNs
  – Outputs before inputs: ASSIGNs dominate over TESTs

25 Example of S-graph Construction
x = a b + c
y = a b + d
Order: a, b, c, d, x, y (inputs before outputs)
[Figure: s-graph that TESTs a, b, c and d and then ASSIGNs constants: x := 1, y := 1; x := 1, y := 0; x := 0, y := 1; or x := 0, y := 0.]

26 Example of S-graph Construction
x = a b + c
y = a b + d
Order: a, b, x, y, c, d (interleaving inputs and outputs)
[Figure: s-graph that TESTs only a and b: if a b = 1 then x := 1, y := 1; otherwise x := c, y := d.]

27 S-graph Optimization
– General trade-off:
  – TEST-based is faster than ASSIGN-based (each variable is visited at most once)
  – ASSIGN-based is smaller than TEST-based (there is more potential for sharing)
– Implemented as constrained sifting of the transition function BDD
– The procedure can be iterated over s-graph fragments:
  – Local optimization, depending on fragment criticality (speed versus size)
  – Constraint-driven optimization (still to be explored)

28 From S-graphs to Instructions
– TEST nodes -> conditional branches
– ASSIGN nodes -> ALU ops and data moves
– No loops in a single CFSM transition
  – (User loops are handled at the RTOS level)
– Data flow handling:
  – “Don’t touch” it (except common sub-expression extraction)
  – Map expression DAGs to C expressions
  – The C compiler allocates registers and selects op-codes
– Need a source-level debugging environment (with any of the chosen entry languages)
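A sketch of this mapping: each TEST node becomes a conditional branch and each ASSIGN node a C statement, while registers and op-codes are left to the C compiler. The goto-per-node layout, the node encoding, and the names emit_c, detect_c() and emit_y() are assumptions for illustration, not the code POLIS actually generates.

```python
from typing import Dict, List, Tuple

# Node encoding (hypothetical): ("TEST", c_condition, on_true, on_false),
# ("ASSIGN", c_statement, successor) or ("END",).
SGraph = Dict[str, Tuple]


def emit_c(graph: SGraph, start: str) -> str:
    """Emit one C label per node: TEST -> conditional branch, ASSIGN -> statement."""
    order: List[str] = []

    def visit(name: str) -> None:       # depth-first; shared nodes emitted once
        if name in order:
            return
        order.append(name)
        node = graph[name]
        if node[0] == "TEST":
            visit(node[2])
            visit(node[3])
        elif node[0] == "ASSIGN":
            visit(node[2])

    visit(start)
    lines: List[str] = []
    for name in order:
        node = graph[name]
        lines.append(f"L_{name}:")
        if node[0] == "TEST":
            lines.append(f"    if ({node[1]}) goto L_{node[2]}; else goto L_{node[3]};")
        elif node[0] == "ASSIGN":
            lines.append(f"    {node[1]};")
            lines.append(f"    goto L_{node[2]};")
        else:
            lines.append("    return;")
    return "\n".join(lines)


EXAMPLE: SGraph = {
    "detect": ("TEST", "detect_c()", "lt", "end"),
    "lt": ("TEST", "a < b", "inc", "zero"),
    "inc": ("ASSIGN", "a = a + 1", "emit"),
    "zero": ("ASSIGN", "a = 0", "emit"),
    "emit": ("ASSIGN", "emit_y()", "end"),
    "end": ("END",),
}
print(emit_c(EXAMPLE, "detect"))
```

A practical generator would at least drop jumps to the label that immediately follows.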

29 Software Synthesis Procedure
[Figure: flow chart: Specification, partitioning -> S-graph synthesis -> Timing estimation -> Scheduling, validation (not feasible: iterate; feasible: continue) -> Code generation -> Compilation -> Testing, validation (fail: iterate; pass: Production)]

30 POLIS: S-graph Level Estimation

31 Problems in Software Performance Estimation
– How to link behavior to assembly code?
  -> Model the C code generated from the S-graph and use a set of cost parameters
– How to handle the variety of compilers and CPUs?

32 Software Model

33 Execution Time of a Path and the Code Size
Property: the form of each statement is determined by the type of the corresponding node.
$$T_{struct} = \sum_i p_i \, C_t(\mathit{node\_type\_of}(i), \mathit{variable\_type\_of}(i))$$
– $p_i$: takes value 1 if node $i$ is on the path, otherwise 0
– $C_t(n, v)$: execution time for node type $n$ and variable type $v$
$$S_{struct} = \sum_i C_s(\mathit{node\_type\_of}(i), \mathit{variable\_type\_of}(i))$$
– $C_s(n, v)$: code size for node type $n$ and variable type $v$

34 Cost Parameters
Pre-calculated cost parameters for:
(1) $C_t(n, v)$, $C_s(n, v)$: execution time and code size for node type $n$ and variable type $v$
(2) $T_{pp}$, $S_{pp}$: pre- and post-execution time and code size
(3) $T_{init}$, $S_{init}$: execution time and code size for local variable initialization
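A worked sketch of the two formulas with a made-up cost table for the slide 22 s-graph. All numbers are hypothetical, and adding the pre/post and initialization parameters to the structural terms is an assumption about how the parameters combine, not something the slides state.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical cost table: (node type, variable type) -> (cycles C_t, bytes C_s)
COST: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("TEST", "event"): (4, 3),
    ("TEST", "int"): (6, 4),
    ("ASSIGN", "int"): (5, 3),
    ("ASSIGN", "event"): (8, 5),
}
T_PP, S_PP = 10, 6        # pre-/post-execution overhead (hypothetical)
T_INIT, S_INIT = 2, 2     # local variable initialization (hypothetical)

# Nodes i of the s-graph: (name, node_type_of(i), variable_type_of(i))
NODES: List[Tuple[str, str, str]] = [
    ("detect", "TEST", "event"),
    ("lt", "TEST", "int"),
    ("inc", "ASSIGN", "int"),
    ("zero", "ASSIGN", "int"),
    ("emit", "ASSIGN", "event"),
]


def t_struct(path: Set[str]) -> int:
    """T_struct = sum_i p_i * C_t(...), with p_i = 1 iff node i is on the path."""
    return sum(COST[(n, v)][0] for name, n, v in NODES if name in path)


def s_struct() -> int:
    """S_struct = sum_i C_s(...), over every node of the s-graph."""
    return sum(COST[(n, v)][1] for _, n, v in NODES)


path = {"detect", "lt", "inc", "emit"}
print(t_struct(path) + T_PP + T_INIT, s_struct() + S_PP + S_INIT)  # prints: 35 26
```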

35 Problems in Software Performance Estimation
– How to link behavior to assembly code?
– How to handle the variety of compilers and CPUs?
  -> Prepare cost parameters for each target

36 Extraction of Cost Parameters

37 Algorithm
– Preprocess: extract the set of cost parameters
– Weight the nodes and edges of the given S-graph with the cost parameters
– Traverse the weighted S-graph
– Find the maximum-cost path and the minimum-cost path using depth-first search on the S-graph
– Accumulate 'size' costs over all nodes

38 S-graph Level Estimation: Algorithm
Cost C is a triple (min_time, max_time, code_size)

SGtrace(sg_i)
    if (sg_i == NULL) return C(0, , 0);
    if (sg_i has been visited)
        return pre-calculated C_i(*, *, 0) associated with sg_i;
    C_i = initialize(max_time = 0, min_time = ∞, code_size = 0);
    for each child sg_j of sg_i {
        C_ij = SGtrace(sg_j) + edge cost for edge e_ij;
        C_i.max_time = max(C_i.max_time, C_ij.max_time);
        C_i.min_time = min(C_i.min_time, C_ij.min_time);
        C_i.code_size += C_ij.code_size;
    }
    C_i += node cost for node sg_i;
    return C_i;
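A runnable rendering of SGtrace, under stated assumptions: the s-graph is a dict of successor lists, edge costs are time-only, a node without successors (END) starts from a zero-cost path before its own cost is added, and all cost values are made up (they are not the 68HC11 figures). As in the pseudocode, a revisited node contributes its times but zero code size, so shared nodes are counted once.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Cost:
    min_time: float
    max_time: float
    code_size: int


def sg_trace(succs: Dict[str, List[str]], node_cost: Dict[str, Tuple[int, int]],
             edge_cost: Dict[Tuple[str, str], int], root: str) -> Cost:
    memo: Dict[str, Cost] = {}

    def trace(n: str) -> Cost:
        if n in memo:                            # visited: size already counted
            c = memo[n]
            return Cost(c.min_time, c.max_time, 0)
        if succs[n]:
            lo, hi, size = math.inf, 0, 0        # min_time = inf, max_time = 0
            for s in succs[n]:
                c = trace(s)
                e = edge_cost.get((n, s), 0)
                lo = min(lo, c.min_time + e)
                hi = max(hi, c.max_time + e)
                size += c.code_size
        else:
            lo, hi, size = 0, 0, 0               # assumption: END closes the path
        t, s_bytes = node_cost[n]
        memo[n] = Cost(lo + t, hi + t, size + s_bytes)
        return memo[n]

    return trace(root)


# Slide 22 s-graph with hypothetical (time, size) node costs and edge times.
SUCCS = {"BEGIN": ["detect"], "detect": ["lt", "END"], "lt": ["inc", "zero"],
         "inc": ["emit"], "zero": ["emit"], "emit": ["END"], "END": []}
NODE_COST = {"BEGIN": (0, 0), "detect": (4, 3), "lt": (6, 4), "inc": (5, 3),
             "zero": (3, 2), "emit": (8, 5), "END": (2, 1)}
EDGE_COST = {("detect", "lt"): 1, ("detect", "END"): 3}
print(sg_trace(SUCCS, NODE_COST, EDGE_COST, "BEGIN"))
# prints: Cost(min_time=9, max_time=26, code_size=18)
```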

39 Experiments
– The proposed methods were implemented and examined in the POLIS system.
– Target CPU and compiler: M68HC11 and Introl C compiler.
– Difference D is defined as

40 Experimental Results : S-graph Level

41 Performance and Cost Estimation: Summary
– S-graph: low-level enough to allow accurate performance estimation
– Cost parameters assigned to each node, depending on:
  – System type (CPU, memory, bus, ...)
  – Node and expression type
– Cost parameters evaluated via simple benchmarks
  – Need timing and size measurements for each target system
  – Currently implemented for MIPS, 68332 and 68HC11 processors

42 Performance and Cost Estimation
– Example: 68HC11 timing estimation
– Cost assigned to s-graph edges (different for taken and not-taken branches)
– Estimated time:
  – Min: 26 cycles
  – Max: 126 cycles
– Accuracy: within 20% of profiling
[Figure: the s-graph of the slide 22 example (BEGIN, detect(c), a<b, a := a + 1, a := 0, emit(y), END) annotated with per-edge cycle costs.]

43 Open Problems
– Better synthesis techniques:
  – Add state variables to simplify the s-graph
  – Performance-driven synthesis of critical paths
  – Exact memory/speed trade-off
– Estimation of caching and pipelining effects:
  – May have little impact on control-dominated systems (frequent branches and context switches)
  – Relatively easy during co-simulation
