

1  6. APPLICATION MAPPING (part 2)
6.3 HW/SW partitioning
6.4 Mapping to heterogeneous multi-processors

2  6.3 HW/SW PARTITIONING

6.3.1 Introduction
By hardware/software partitioning we mean the mapping of task graph nodes to either hardware or software. Applying hardware/software partitioning, we can decide which parts must be implemented in hardware and which in software.

6.3.2 COOL (COdesign toOL)
For COOL, the input consists of three parts:
① Target technology: this part of the input comprises information about the available hardware platform components, including the type of the processors used.
② Design constraints: the second part of the input comprises design constraints such as the required throughput, latency, maximum memory size, or maximum area for application-specific hardware.
③ Behavior: the third part of the input describes the required overall behavior. Hierarchical task graphs are used for this. COOL uses two kinds of edges: communication edges and timing edges.
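As an illustration of this third input, the sketch below models a small two-level hierarchical task graph with both edge kinds. All class, field and node names are assumptions made for this example; they are not COOL's actual input format.

from dataclasses import dataclass, field
from enum import Enum

class EdgeKind(Enum):
    COMMUNICATION = "communication"   # data exchanged between nodes
    TIMING = "timing"                 # timing relation between nodes

@dataclass
class TaskNode:
    name: str
    children: list["TaskNode"] = field(default_factory=list)  # non-empty for hierarchical (non-leaf) nodes

@dataclass
class Edge:
    src: str
    dst: str
    kind: EdgeKind

# A small hierarchical task graph: the top-level node "filter" contains two
# leaf tasks; a communication edge carries data from "read" to "fir", and a
# timing edge constrains the same pair of nodes.
graph = TaskNode("filter", children=[TaskNode("read"), TaskNode("fir")])
edges = [Edge("read", "fir", EdgeKind.COMMUNICATION),
         Edge("read", "fir", EdgeKind.TIMING)]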

3  For partitioning, COOL uses the following steps:
① Translation of the behavior into an internal graph model.
② Translation of the behavior of each node from VHDL into C.
③ Compilation of all C programs for the selected target processor type, computation of the resulting program size, and estimation of the resulting execution time.
④ Synthesis of hardware components: for each leaf node, application-specific hardware is synthesized.
⑤ Flattening the hierarchy: the next step is to extract a flat graph from the hierarchical flow graph.
⑥ Generating and solving a mathematical model of the optimization problem: COOL uses integer linear programming (ILP) to solve the optimization problem.
⑦ Iterative improvements: in order to work with good estimates of the communication time, adjacent nodes mapped to the same hardware component are now merged.
⑧ Interface synthesis: after partitioning, the glue logic required for interfacing processors, application-specific hardware and memories is created.

4  The following index sets will be used in the description of the ILP model:
- Index set V denotes task graph nodes. Each v ∈ V corresponds to one task graph node.
- Index set L denotes task graph node types. Each l ∈ L corresponds to one task graph node type.
- Index set M denotes hardware component types. Each m ∈ M corresponds to one hardware component type. For each of the hardware component types, there may be multiple copies, or "instances". Each instance is identified by an index j ∈ J.
- Index set KP denotes processors. Each k ∈ KP identifies one of the processors.

The following decision variables are required by the model:
- X_v,m: this variable is 1 if node v is mapped to hardware component type m ∈ M, and 0 otherwise.
- Y_v,k: this variable is 1 if node v is mapped to processor k ∈ KP, and 0 otherwise.
- NY_l,k: this variable is 1 if at least one node of type l is mapped to processor k ∈ KP, and 0 otherwise.
- Type is a mapping V → L from task graph nodes to their corresponding types.
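Restated compactly in formulas (this only rewrites the definitions above, using the Type mapping just introduced):

\[
X_{v,m} \in \{0,1\},\qquad Y_{v,k} \in \{0,1\},\qquad NY_{l,k} \in \{0,1\}
\]
\[
X_{v,m} = 1 \;\Leftrightarrow\; v \text{ is mapped to hardware component type } m,\qquad
Y_{v,k} = 1 \;\Leftrightarrow\; v \text{ is mapped to processor } k
\]
\[
NY_{l,k} = 1 \;\Leftrightarrow\; \exists\, v \in V:\; \mathit{Type}(v) = l \;\wedge\; Y_{v,k} = 1
\]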

5  The cost function accumulates the total cost of all hardware units:
C = processor costs + memory costs + cost of application-specific hardware

We can now present a brief description of some of the constraints of the ILP model:
- Operation assignment constraints: these constraints guarantee that each operation is implemented either in hardware or in software.
- Additional constraints ensure that the decision variables X_v,m and Y_v,k have 1 as an upper bound and, hence, are in fact 0/1-valued variables.
- If the functionality of a certain node of type l is mapped to some processor k, then the processor's instruction memory must include a copy of the software for this function.
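Written out from the definitions above, these three constraints (operation assignment, the 0/1 bounds, and the software-copy requirement, in that order) take the following form:

\[
\forall v \in V:\quad \sum_{m \in M} X_{v,m} \;+\; \sum_{k \in KP} Y_{v,k} \;=\; 1
\]
\[
\forall v \in V,\ \forall m \in M:\ X_{v,m} \le 1, \qquad \forall v \in V,\ \forall k \in KP:\ Y_{v,k} \le 1
\]
\[
\forall l \in L,\ \forall k \in KP,\ \forall v \in V \text{ with } \mathit{Type}(v) = l:\quad NY_{l,k} \ge Y_{v,k}
\]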

6  Additional constraints ensure that the decision variables NY_l,k are also 0/1-valued variables. Further families of constraints in the model are:
- Resource constraints
- Precedence constraints
- Design constraints
- Timing constraints
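The 0/1 bound on NY_l,k can likewise be written explicitly; the resource, precedence, design and timing constraint families additionally involve scheduling information that is not introduced on these slides, so they are only named here:

\[
\forall l \in L,\ \forall k \in KP:\quad NY_{l,k} \le 1
\]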

7  Example: in the following, we will show how these constraints can be generated for the task graph in Fig. 6.29.

[Fig. 6.29: task graph with nodes T1 to T5]

Suppose that we have a hardware component library containing three component types H1, H2 and H3 with costs of 20, 25 and 30 cost units, respectively. Furthermore, suppose that we can also use a processor P of cost 5. Execution times of tasks T1 to T5 on the components are:

Task   H1   H2   H3    P
 T1    20    -    -   100
 T2     -   20    -   100
 T3     -    -   12    10
 T4     -    -   12    10
 T5    20    -    -   100

8  The following operation assignment constraints must be generated, assuming that a maximum of one processor (P1) is to be used:
X_1,1 + Y_1,1 = 1 (task 1 is either mapped to H1 or to P1)
X_2,2 + Y_2,1 = 1 (task 2 is either mapped to H2 or to P1)
X_3,3 + Y_3,1 = 1 (task 3 is either mapped to H3 or to P1)
X_4,3 + Y_4,1 = 1 (task 4 is either mapped to H3 or to P1)
X_5,1 + Y_5,1 = 1 (task 5 is either mapped to H1 or to P1)

Furthermore, assume that the types of tasks T1 to T5 are l = 1, 2, 3, 3 and 1, respectively. Then the following additional constraints are required:
NY_1,1 ≥ Y_1,1    (6.17)
NY_2,1 ≥ Y_2,1
NY_3,1 ≥ Y_3,1
NY_3,1 ≥ Y_4,1
NY_1,1 ≥ Y_5,1    (6.18)

The total cost function is:
C = 20 · #(H1) + 25 · #(H2) + 30 · #(H3) + 5 · #(P)
where #() denotes the number of instances of hardware components. This number can be computed from the variables introduced so far if the schedule is also taken into account.
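To make the example concrete, the following sketch searches for the minimum-cost mapping by brute-force enumeration of the 2^5 hardware/software choices instead of calling an ILP solver (enumeration is feasible at this size). It relies on two simplifying assumptions not stated on the slide: all software tasks share the single processor P sequentially, and one instance of each allocated hardware type suffices; precedence relations and detailed scheduling are ignored. Under these assumptions it reproduces the result quoted on the next slide.

from itertools import product

# Data from the example: the one feasible hardware type per task, the software
# execution times on P, and the component costs.
HW_TYPE = {1: "H1", 2: "H2", 3: "H3", 4: "H3", 5: "H1"}
SW_TIME = {1: 100, 2: 100, 3: 10, 4: 10, 5: 100}
HW_COST = {"H1": 20, "H2": 25, "H3": 30}
P_COST = 5
DEADLINE = 100

best = None
for choice in product(("HW", "SW"), repeat=5):        # one decision per task T1..T5
    mapping = dict(zip(range(1, 6), choice))
    sw_tasks = [t for t, c in mapping.items() if c == "SW"]
    # Simplification: software tasks execute one after another on processor P.
    if sum(SW_TIME[t] for t in sw_tasks) > DEADLINE:
        continue
    hw_used = {HW_TYPE[t] for t, c in mapping.items() if c == "HW"}
    cost = sum(HW_COST[h] for h in hw_used) + (P_COST if sw_tasks else 0)
    if best is None or cost < best[0]:
        best = (cost, sorted(hw_used), sw_tasks)

print(best)   # -> (50, ['H1', 'H2'], [3, 4]): H1, H2 and P, with T3 and T4 in software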

9  For a timing constraint of 100 time units, the minimum-cost design comprises components H1, H2 and P. This means that tasks T3 and T4 are implemented in software and all others in hardware.

6.4 Mapping to heterogeneous multi-processors
The different approaches for this mapping can be classified by two criteria: mapping tools may either assume a fixed execution platform or may design such a platform during the mapping, and they may or may not include automatic parallelization of the source code.

The DOL tools from ETH Zurich incorporate:
- Automatic selection of computation templates
- Automatic selection of communication techniques
- Automatic selection of scheduling and arbitration

The input to DOL consists of a set of tasks together with use cases. The output describes the execution platform and the mapping of tasks to processors, together with task schedules. This output is expected to meet constraints and to maximize objectives.

10  [Figure: DOL problem graph and DOL architecture graph. The architecture graph contains a RISC processor, hardware modules HWM1 and HWM2, a point-to-point (PTP) bus and a shared bus.]

11  [Figure: DOL specification graph]

12  DOL implementation
- An allocation α: a subset of the architecture graph, representing the hardware components allocated (selected) for a particular design.
- A binding β: a selected subset of the edges between the problem graph and the architecture graph identifies a relation between the two. Selected edges are called bindings.
- A schedule τ: assigns start times to each node v in the problem graph.
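The triple (allocation, binding, schedule) can be captured in a few lines of Python; this is only an illustrative data model of the definitions above, not DOL's actual data structures, and the resource and task names are reused from the earlier figures and examples.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Binding:
    task: str       # node of the problem graph
    resource: str   # allocated node of the architecture graph it is bound to

@dataclass
class Implementation:
    allocation: set[str] = field(default_factory=set)       # alpha: selected architecture-graph nodes
    binding: list[Binding] = field(default_factory=list)    # beta: selected problem-to-architecture edges
    schedule: dict[str, int] = field(default_factory=dict)  # tau: start time per problem-graph node

impl = Implementation()
impl.allocation |= {"RISC", "shared bus"}                   # allocate a processor and a bus
impl.binding.append(Binding(task="T1", resource="RISC"))    # bind task T1 to the RISC processor
impl.schedule["T1"] = 0                                     # T1 starts at time 0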

