
Published by Carson Bettes. Modified over 3 years ago.

1
A Novel Algorithm Combining Temporal Partitioning and Sharing of Functional Units
João M. P. Cardoso
Faculty of Sciences and Technology, University of Algarve, Faro, Portugal
IEEE Symposium on Field-Programmable Custom Computing Machines, Rohnert Park, CA, USA, April 30, 2001

2
Index
  Introduction
  Temporal Partitioning
  Problem Definition
  New vs Previous Approach
  Algorithm Working Through an Example
  Experimental Results
  Related Work
  Conclusions
  Future Work

3
Introduction
Virtual Hardware:
  Reuse of devices
  Save silicon area
  View of unlimited resources
Enabled by dynamically reconfigurable FPGAs
Two concepts:
  Context switching among functionalities
  Allowing a large function to be executed
FPGA devices allowing virtualization:
  off-chip configurations
  on-chip configurations
Several research efforts…

4
Introduction
Size larger than the available reconfigware area?
Answers:
  Temporal Partitioning
  Sharing of Functional Units
Goal: combining the two...
[Figure: dataflow graph over the variables x, y, u, dx, x_1, y_1, u_1]

5
Temporal Partitioning
[Figure: fragment of the dataflow graph placed on a time axis, with labels u, x, dx, aux1, x_1, y_1, y]

6
Temporal Partitioning
[Figure: fragment of the dataflow graph placed on a time axis, with labels aux1, dx, u, u_1, y]

7
Temporal Partitioning
[Figure: both fragments of the dataflow graph on a common time axis, linked by the intermediate value aux1]

8
Temporal Partitioning
Create temporal partitions to be executed by time-sharing the device
Netlist level (structural):
  Difficulties when dealing with feedbacks
  Loss of information
  Flat structure
  Intricate for exploiting sharing of functional units
Behavioral level (functional):
  Loops can be explicitly represented
  Better design decisions
  A must for compilers for reconfigurable computing

9
Problem Definition
But what if we decrease the needed area by sharing functional units?
Simultaneous temporal partitioning and sharing of functional units
THE PROBLEM:
Given a dataflow graph (representing a behavioral description), a library of components,...
Map the dataflow graph onto the available resources of the FPGA device:
  Considering sharing of functional units
  Considering temporal partitioning
  Decreasing the overall execution latency
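
To make the trade-off on this slide concrete, the following is a minimal sketch of how sharing can shrink the area of a partition. It assumes a simplified cost model that is not taken from the slides: the area of a partition is the sum, over operator types, of the largest number of units of that type busy in the same control step, and multiplexer and register overhead is ignored. The function name shared_area and the nodes a, b, c are hypothetical; the area and delay values are those of the worked example later in the deck.

```python
from collections import Counter

AREA = {"+": 1, "x": 2}    # cells per functional unit (worked-example values)
DELAY = {"+": 1, "x": 2}   # control steps per operation (worked-example values)


def shared_area(ops, start):
    """Area of one partition when units of the same type are shared.

    ops:   node -> operator type
    start: node -> first control step of the node
    Assumption: interconnect, multiplexer and register costs are ignored.
    """
    busy = Counter()  # (operator type, control step) -> units in use
    for node, op in ops.items():
        for step in range(start[node], start[node] + DELAY[op]):
            busy[(op, step)] += 1
    units = Counter()  # operator type -> units that must be instantiated
    for (op, _step), count in busy.items():
        units[op] = max(units[op], count)
    return sum(AREA[op] * n for op, n in units.items())


# Hypothetical partition: two additions and one multiplication.
ops = {"a": "+", "b": "+", "c": "x"}
print(shared_area(ops, {"a": 0, "b": 0, "c": 0}))  # 4: both adders busy in step 0
print(shared_area(ops, {"a": 0, "b": 1, "c": 0}))  # 3: the two additions share one adder
```

Under this model, moving one addition to a later control step saves a cell, which is the kind of area reduction that can let more nodes fit in the same temporal partition.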

10
New vs Previous Approach
Previous:
  Inputs: Constraints; DFG, CDFG; Component Library
  Temporal Partitioning
  High-Level Synthesis
  Circuit generation, Logic Synthesis
New:
  Inputs: Constraints; DFG, CDFG; Component Library
  Simultaneous Temporal Partitioning and High-Level Synthesis
  Circuit generation, Logic Synthesis

11
Algorithm Working Through an Example
Suppose the following dataflow graph
[Figure: dataflow graph with nodes 0 to 5]
Consider:
  Area(+) = 1 cell
  Area(x) = 2 cells
  Delay(+) = 1 control step (cs)
  Delay(x) = 2 cs
Total area of the DFG: 8 cells
Available area: 3 cells
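
The numbers on this slide already pin down a few facts about the example, which the short calculation below works out. The operator mix is not stated on the slide; it follows from six nodes totalling 8 cells. The bound on the number of partitions is a simple area argument that holds only when no unit is shared, and is not claimed to be how the algorithm picks its initial value.

```python
import math

AREA = {"+": 1, "x": 2}   # cells per functional unit, from the slide
NUM_NODES = 6             # nodes 0 to 5
TOTAL_AREA = 8            # cells, from the slide
AVAILABLE_AREA = 3        # cells, from the slide

# Solve: adders + multipliers = 6 and 1*adders + 2*multipliers = 8.
multipliers = (TOTAL_AREA - NUM_NODES * AREA["+"]) // (AREA["x"] - AREA["+"])
adders = NUM_NODES - multipliers

# Without sharing, each node needs its own unit, so the 8 cells must be
# spread over at least this many partitions of 3 cells each.
min_partitions = math.ceil(TOTAL_AREA / AVAILABLE_AREA)

print(adders, multipliers, min_partitions)  # 4 2 3
```

So the graph contains four additions and two multiplications, and at least three temporal partitions are needed if every node gets a dedicated unit.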

12
Algorithm Working Through an Example
Calculate ASAP and ALAP values
Node  0  1  2  3  4  5
ASAP  0  0  1  0  2  3
ALAP  1  1  2  0  2  3
[Figure: dataflow graph with nodes 0 to 5]
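
As an illustration of how such a table is obtained, here is a minimal sketch of ASAP and ALAP scheduling with multi-cycle operations. The edges of the example graph did not survive in this transcript, so OPS and EDGES below are a HYPOTHETICAL graph, chosen only because it is consistent with the slide's delays, its total area of 8 cells, and the ASAP/ALAP table above; the graph in the original figure may differ.

```python
from collections import defaultdict

DELAY = {"+": 1, "x": 2}  # control steps, from the slide

# HYPOTHETICAL graph (assumption, see lead-in): node -> operator, and edges.
OPS = {0: "+", 1: "+", 2: "+", 3: "x", 4: "+", 5: "x"}
EDGES = [(0, 2), (1, 2), (2, 5), (3, 4), (4, 5)]


def asap_alap(ops, edges):
    """Return (asap, alap, latency), all measured in control steps."""
    preds, succs = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
        succs[src].append(dst)
    # Assumption: node ids are already in topological order
    # (every edge goes from a lower id to a higher id).
    order = sorted(ops)
    asap = {}
    for n in order:
        # A node starts as soon as its slowest predecessor has finished.
        asap[n] = max((asap[p] + DELAY[ops[p]] for p in preds[n]), default=0)
    latency = max(asap[n] + DELAY[ops[n]] for n in order)
    alap = {}
    for n in reversed(order):
        # A node must finish before its earliest-starting successor begins.
        deadline = min((alap[s] for s in succs[n]), default=latency)
        alap[n] = deadline - DELAY[ops[n]]
    return asap, alap, latency


asap, alap, latency = asap_alap(OPS, EDGES)
print([asap[n] for n in sorted(OPS)])  # [0, 0, 1, 0, 2, 3]
print([alap[n] for n in sorted(OPS)])  # [1, 1, 2, 0, 2, 3]
```

For a graph whose ids are not topologically ordered, the sorted(ops) line would need to be replaced by a real topological sort.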

13
Algorithm Working Through an Example
Identify the critical path
Node  0  1  2  3  4  5
ASAP  0  0  1  0  2  3
ALAP  1  1  2  0  2  3
[Figure: dataflow graph with nodes 0 to 5]
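
One common way to read the critical path off such a table is to pick the nodes with zero mobility, that is, the nodes whose ASAP and ALAP values coincide. The sketch below applies that rule to the values on this slide; whether the algorithm identifies the path in exactly this way is an assumption.

```python
# ASAP and ALAP values copied from the slide's table.
ASAP = {0: 0, 1: 0, 2: 1, 3: 0, 4: 2, 5: 3}
ALAP = {0: 1, 1: 1, 2: 2, 3: 0, 4: 2, 5: 3}

# Mobility (slack): how many control steps a node can slip without
# lengthening the schedule.
mobility = {n: ALAP[n] - ASAP[n] for n in ASAP}

# Nodes with zero mobility lie on the critical path.
critical_path = [n for n in sorted(ASAP) if mobility[n] == 0]

print(mobility)       # {0: 1, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0}
print(critical_path)  # [3, 4, 5]
```

Nodes 3, 4 and 5 come out as critical, and these are the nodes that the later slides show seeded into temporal partitions 1, 2 and 3.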

14
Algorithm Working Through an Example
Create an initial number of TPs: suppose 3
[Figure: dataflow graph with nodes 0 to 5; empty temporal partitions 1, 2 and 3 on a chart with axes Area and MAX CS]

15
Algorithm Working Through an Example
Map each node of the critical path on each temporal partition
[Figure: node 3 placed in temporal partition 1, node 4 in partition 2, node 5 in partition 3; axes Area and MAX CS; labels 2 cs and 1 cs]

16
Algorithm Working Through an Example
Try to map nodes in each temporal partition (1)
[Figure: animation step; temporal partitions 1 to 3 holding critical-path nodes 3, 4 and 5; axes Area and MAX CS]

17
Algorithm Working Through an Example
Try to map nodes in each temporal partition (1)
[Figure: animation step; node 0 considered for mapping]

18
Algorithm Working Through an Example
Try to map nodes in each temporal partition (1)
[Figure: animation step; nodes 0 and 1 considered for mapping]

19
Algorithm Working Through an Example
Try to map nodes in each temporal partition (1)
[Figure: animation step; axes Area and MAX CS]

20
Algorithm Working Through an Example
Try to map nodes in each temporal partition (2)
[Figure: animation step; axes Area and MAX CS]

21
Algorithm Working Through an Example
Try to map nodes in each temporal partition (3)
[Figure: animation step; axes Area and MAX CS]

22
Algorithm Working Through an Example
Relax: add 1 clock step to MAX CS
[Figure: temporal partitions 1 to 3 with MAX CS enlarged by one step; axes Area and MAX CS]

23
Algorithm Working Through an Example
Try to map nodes in each temporal partition (1)
[Figure: animation step after relaxation; axes Area and MAX CS]

24
Algorithm Working Through an Example
Try to map nodes in each temporal partition (2)
[Figure: animation step after relaxation; axes Area and MAX CS]

25
Algorithm Working Through an Example
Try to map nodes in each temporal partition (2)
[Figure: animation step after relaxation; axes Area and MAX CS]

26
Algorithm Working Through an Example
Merge Operation (1)
[Figure: temporal partitions 1, 2 and 3 before merging; axes Area and MAX CS; labels 2 cs and 1 cs]

27
Algorithm Working Through an Example
Merge Operation (1)
[Figure: partitions 1 and 2 merged into partition "1,2", partition 3 unchanged; labels 4 cs and 1 cs]

28
Algorithm Working Through an Example
Merge Operation (2)
[Figure: partition "1,2" and partition 3 before the second merge; labels 4 cs and 1 cs]

29
Algorithm Working Through an Example
Merge Operation (2)
[Figure: all partitions merged into partition "1,2,3"; label 4 cs]

30
Experimental Results
Near-optimal w/o sharing vs sharing
[Figure: chart for the benchmarks EX1, SEHWA, HAL and EWF]

31
Experimental Results
Near-optimal w/o sharing vs sharing
[Figure: chart for the benchmarks FIR and MAT4x4; a data label reading "7237" survives from the plot]

32
Experimental Results
Performance vs No. of Temporal Partitions
[Figure: chart for Mult4x4, R_MAX = 10 (no sharing of adders)]

33
Experimental Results
Is the algorithm good for scheduling?
Comparison to some optimum results
[Figure: charts for the benchmarks EWF and SEHWA]

34
Related Work
List scheduling considering dynamic reconfiguration [Vasilko et al., FPL96]
ASAP [GajjalaPurna et al., IEEE Trans. on Comp., 1999]
Minimize latency taking into account communication costs [Cardoso et al. VLSI99]:
  Enhanced Static-List Scheduling
  Iterative approach (Simulated Annealing)
ILP formulation [SPARCs, DATE98; RAW98]
Enhanced Force-Directed List Scheduling [Pandey et al., SPIE99]
And others [see the Related Work section]

35
Conclusions
Novel algorithm that performs temporal partitioning and sharing of functional units simultaneously
  Low complexity
  Heuristic approach
  Based on gradual enlargement of time slots
Allows exploiting the duality between the number of temporal partitions and resource sharing
Close-to-optimum results with some examples
Results showed that the algorithm is also sound when performing scheduling

36
Future Work
Enhancements to the algorithm:
  consider functional units with pipelining
  consider pipelining between execution and reconfiguration
Study the possibility of taking into account communication and reconfiguration costs
Test results with a reconfigurable computing system (commercial board)

37
Contact Author
João M. P. Cardoso
jmpc@acm.org
http://w3.ualg.pt/~jmcardo
THANK YOU!
