© 2004 Wayne Wolf Topics Task-level partitioning. Hardware/software partitioning.  Bus-based systems.

1 Topics
- Task-level partitioning.
- Hardware/software partitioning.
- Bus-based systems.

2 System partitioning
Lagnese et al.: partition a large description based on functional information, not initial allocation.
Thomas et al.:
- developed a Verilog-based simulation system for performance evaluation
- assumes a bus-based CPU-ASIC model
- provides several types of communication primitives
- design evaluation is based on both static evaluation (time for a single execution) and dynamic evaluation

3 Hardware-software partitioning
- Partitioning methods usually allow more than one ASIC.
- Typically ignore CPU memory traffic in bus utilization estimates.
- Typically assume that the CPU process blocks while waiting for the ASIC.
[Figure: CPU, ASIC, and memory (mem) blocks]

4 Gupta and De Micheli
- Target architecture: CPU + ASICs on a bus.
- Break behavior into threads at nondeterministic delay points; the delay of each thread is bounded.
- Software threads run under an RTOS; threads communicate via queues.

5 Specification and modeling
- Specified in HardwareC.
- Spec is divided into threads at nondeterministic delay points.
- Hardware properties: size, number of clock cycles.
- CPU/software thread properties:
  - thread latency
  - thread reaction rate
  - processor utilization
  - bus utilization
- CPU and ASIC execution are non-overlapping.

6 HW/SW allocation
Start with unbounded-delay threads in the CPU and the rest of the threads in the ASIC.
Optimization:
- test one thread for a move
- if the move to SW does not violate the performance requirement, move the thread
- feasibility depends on SW and HW run times and bus utilization
- if a thread is moved, immediately try moving its successor threads
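The allocation steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the published algorithm: the `Thread` fields, the `feasible` test (run times simply add because CPU and ASIC execution do not overlap), and the example numbers are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    # All fields are hypothetical estimates used only for this illustration.
    sw_time: int                 # run time if the thread executes on the CPU
    hw_time: int                 # run time if the thread executes on the ASIC
    bus_load: float              # bus utilization added when the thread is in software
    unbounded: bool = False      # True for unbounded-delay threads
    successors: list = field(default_factory=list)


def feasible(threads, in_sw, deadline, max_bus):
    # Assumed model: CPU and ASIC execution do not overlap, so run times add up.
    total = sum(t.sw_time if n in in_sw else t.hw_time for n, t in threads.items())
    bus = sum(t.bus_load for n, t in threads.items() if n in in_sw)
    return total <= deadline and bus <= max_bus


def allocate(threads, deadline, max_bus):
    # Start with unbounded-delay threads in software, everything else in hardware.
    in_sw = {n for n, t in threads.items() if t.unbounded}

    def try_move(name):
        if name in in_sw:
            return
        # Move the thread to software only if the requirement still holds.
        if feasible(threads, in_sw | {name}, deadline, max_bus):
            in_sw.add(name)
            # A successful move immediately triggers a try on the successors.
            for succ in threads[name].successors:
                try_move(succ)

    for name in threads:
        try_move(name)
    return in_sw


threads = {
    "a": Thread(5, 5, 0.1, unbounded=True),
    "b": Thread(4, 1, 0.1, successors=["c"]),
    "c": Thread(3, 1, 0.1),
    "d": Thread(10, 2, 0.1),
}
in_sw = allocate(threads, deadline=15, max_bus=1.0)
print(sorted(in_sw))  # ['a', 'b', 'c']
```

In this made-up example, thread `d` stays in hardware because moving it to software would push the total run time past the deadline.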

7 COSYMA
- Ernst et al.: moves operations from software to hardware.
- Operations are moved to hardware in units of basic blocks.
- Estimates communication overhead based on bus operations and register allocation.
- Hardware and software communicate by shared memory.

8 COSYMA design flow
[Figure: design flow with blocks labeled C*, ES graph, partitioning, cost estimation, GNU C, run-time analysis, CDFG, high-level synthesis]

9 Cost estimation
Speedup estimate for basic block b:
$c(b) = w \, (t_{HW}(b) - t_{SW}(b) + t_{com}(Z) - t_{com}(Z + b)) \cdot It(b)$
- w = weight, It(b) = number of iterations taken on b
Sources of estimates:
- Software execution time ($t_{SW}$) is estimated from source code.
- Hardware execution time ($t_{HW}$) is estimated by list scheduling.
- Communication time ($t_{com}$) is estimated by data flow analysis of adjacent basic blocks.
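As a worked instance of the estimate above, the short sketch below evaluates the formula term by term. The function name, parameter names, and all numbers are made up for illustration; `t_com_before` and `t_com_after` stand for the communication time without and with block b added to Z.

```python
def cost_increment(w, t_hw, t_sw, t_com_before, t_com_after, iterations):
    # Direct transcription of the slide's formula:
    # c(b) = w * (t_HW(b) - t_SW(b) + t_com(Z) - t_com(Z + b)) * It(b)
    return w * (t_hw - t_sw + t_com_before - t_com_after) * iterations


# Hypothetical block: 100 cycles in software, 20 in hardware,
# 10 extra cycles of communication once it moves, executed 50 times.
example = cost_increment(w=1, t_hw=20, t_sw=100,
                         t_com_before=30, t_com_after=40, iterations=50)
print(example)  # -4500
```

With the formula as written, a block that runs faster in hardware than in software yields a negative value unless the added communication time outweighs the gain.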

10 COSYMA optimization
- Goal: satisfy execution time.
- User specifies the maximum number of function units in the co-processor.
- Start with all basic blocks in software.
- Estimate the potential speedup of moving a basic block to hardware using execution profiling.
- Search using simulated annealing.
- Impose a high cost penalty for solutions that don't meet the execution time.
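The search described on this slide can be illustrated with a generic simulated-annealing loop. This is only a sketch under stated assumptions, not COSYMA's implementation: the cost is taken to be the number of blocks in hardware plus a large penalty when the deadline is missed, the function-unit limit is not modeled, and the block timings, cooling schedule, and seed are arbitrary.

```python
import math
import random


def total_time(blocks, in_hw):
    # blocks maps a name to (software time, hardware time); both are made-up estimates.
    return sum(hw if name in in_hw else sw for name, (sw, hw) in blocks.items())


def cost(blocks, in_hw, deadline, penalty=1000.0):
    c = float(len(in_hw))                 # assumed proxy for hardware cost
    over = total_time(blocks, in_hw) - deadline
    if over > 0:
        c += penalty + over               # high penalty for missing the execution time
    return c


def anneal(blocks, deadline, steps=2000, t0=10.0, alpha=0.995, seed=0):
    rng = random.Random(seed)
    names = list(blocks)
    current = frozenset()                 # start with all basic blocks in software
    cur_cost = cost(blocks, current, deadline)
    best, best_cost = current, cur_cost
    temp = t0
    for _ in range(steps):
        # Neighbor move: flip one block between software and hardware.
        candidate = current ^ {rng.choice(names)}
        cand_cost = cost(blocks, candidate, deadline)
        delta = cand_cost - cur_cost
        # Accept improvements always, worse moves with a temperature-dependent probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_cost = candidate, cand_cost
            if cur_cost < best_cost:
                best, best_cost = current, cur_cost
        temp *= alpha
    return best, best_cost


blocks = {"b1": (40, 10), "b2": (30, 5), "b3": (20, 15), "b4": (10, 8)}
best, best_cost = anneal(blocks, deadline=70)
print(sorted(best), best_cost, total_time(blocks, best))
```

The penalty term makes any deadline-violating partition far more expensive than any feasible one, so the search is steered toward feasible partitions before it starts trimming hardware.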

11 Two-phase optimization
- Inner loop uses estimates to search through the design space quickly.
- Outer loop uses detailed measurements to check the validity of inner loop assumptions:
  - code is compiled and measured
  - ASIC is synthesized
- Results of the detailed estimate are used to apply a correction to the current solution for the next run of the inner loop.

12 Vahid et al.
- Uses binary search to minimize hardware cost while satisfying performance.
- Cost and performance compete; to reduce competition, accept any solution with cost below $C_{size}$.
- Cost function: $k_{perf} \cdot (\sum \text{performance violations}) + k_{area} \cdot (\sum \text{hardware size})$.

13 Kalavade et al.
- Uses both local and global measures to meet performance objectives and minimize cost.
- Global criterion: degree to which performance is critically affected by a component.
- Local criterion: heterogeneity of a node = implementation cost.
  - a function which has a high cost in one mapping but a low cost in the other is an extremity
  - two functions which have very different implementation requirements (precision, etc.) repel each other into different implementations

14 GCLP algorithm
Schedule one node at a time:
- compute critical path
- select node on critical path for assignment
- evaluate effect of change in allocation of this node
- if performance is critical, reallocate for performance, else reallocate for cost
Extremity value helps avoid assigning an operation to a partition where it clearly doesn't belong.
Repellers help reduce implementation cost.
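The per-node decision on this slide (map for performance when time is critical, otherwise map for cost) can be illustrated with a heavily simplified sketch. It is not the GCLP algorithm itself: the task graph is assumed to be a simple chain, so the critical path is the whole chain; the criticality test is a stand-in invented for this example; and extremity and repeller measures are left out. All names and timings are hypothetical.

```python
def gclp_chain_sketch(nodes, deadline):
    """nodes: list of (name, software time, hardware time) in execution order."""
    mapping = {}
    elapsed = 0
    for i, (name, t_sw, t_hw) in enumerate(nodes):
        # Assumed criticality test: even with every later node in hardware,
        # would putting this node in software miss the deadline?
        hw_rest = sum(n[2] for n in nodes[i + 1:])
        critical = elapsed + t_sw + hw_rest > deadline
        if critical:
            mapping[name] = "hw"   # performance is critical: pick the faster mapping
            elapsed += t_hw
        else:
            mapping[name] = "sw"   # otherwise pick the cheaper mapping
            elapsed += t_sw
    return mapping, elapsed


nodes = [("A", 4, 1), ("B", 6, 2), ("C", 3, 1)]
mapping, finish = gclp_chain_sketch(nodes, deadline=10)
print(mapping, finish)  # {'A': 'sw', 'B': 'hw', 'C': 'sw'} 9
```

Only node `B` ends up in hardware here: it is the one point where a software mapping would make the deadline unreachable.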

15 D’Ambrosio et al.
- Use a general-purpose optimizer for HW/SW assignment.
- Can model both hard and soft deadlines.
- Measure expandability of the system as the difference between upper and lower performance bounds.
- A loose upper bound on CPU utilization leads to excessive hardware cost in the final result.
- Use simulation to estimate the execution time of each process.

16 Binary search algorithm
- If a zero-cost solution is found for a given hardware size, a zero-cost solution is guaranteed to exist for any larger hardware size.
- Therefore, binary search can be used to select a satisfying solution.
- Evaluate the cost of a point when it is tested, rather than generating the costs of all points in advance.
- Sufficient to look for a zero-cost solution.
[Figure: sequence of costs 100, 80, 50, 30, 10, 0, 0, 0]
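The monotonicity argument on this slide is what makes binary search applicable, and a minimal sketch of it looks like this. The function name, the `cost_at` callback, and the example cost table are assumptions for illustration; in practice `cost_at` would run a partitioner for the given hardware size.

```python
def smallest_zero_cost_size(sizes, cost_at):
    """sizes: candidate hardware sizes in ascending order.
    cost_at(size) is evaluated lazily, only for the points actually tested."""
    lo, hi = 0, len(sizes) - 1
    answer = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if cost_at(sizes[mid]) == 0:
            # A zero-cost solution exists here, so it also exists for all larger sizes.
            answer = sizes[mid]
            hi = mid - 1
        else:
            lo = mid + 1
    return answer


# Hypothetical cost per hardware size, non-increasing and zero from size 6 onward.
example_costs = {1: 100, 2: 80, 3: 50, 4: 30, 5: 10, 6: 0, 7: 0, 8: 0}
evaluated = []


def cost_at(size):
    evaluated.append(size)   # record which points were actually evaluated
    return example_costs[size]


best_size = smallest_zero_cost_size(sorted(example_costs), cost_at)
print(best_size, evaluated)  # 6 [4, 6, 5]
```

Only three of the eight candidate sizes are evaluated, which is the point of computing costs on demand rather than in advance.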
