
DAOmap: A Depth-optimal Area Optimization Mapping Algorithm for FPGA Designs Deming Chen and Jason Cong Computer Science Department University of California,


1 DAOmap: A Depth-optimal Area Optimization Mapping Algorithm for FPGA Designs
Deming Chen and Jason Cong
Computer Science Department, University of California, Los Angeles
This work is partially supported by the California MICRO program and the NSF Grant CCR-0306682

2 Outline
- Introduction
- Related Works
- Definitions and Problem Formulation
- Algorithm Description
  - Cut Enumeration
  - Delay and Area Propagation
  - Cost Function for a Cut
  - Global and Local Cost Adjustments
  - Iterative Cut Selection
- Experimental Results
- Conclusions and Future Work

3 Introduction
- Field-programmable gate arrays (FPGAs) have become increasingly popular
  - Fast to market
  - No or very low NRE (non-recurring expenses)
- The lookup-table (LUT) based FPGA architecture dominates the existing programmable chip industry
- FPGA technology mapping converts a given Boolean circuit into a functionally equivalent network composed only of LUTs
- FPGA technology mapping is a crucial optimization step in the FPGA design flow

4 Related Works on FPGA Mapping
- Area Minimization
  - Chortle-crf [Francis, et al, DAC’91]
  - MIS-pga [Murgai, et al, ICCAD’91]
  - Praetor [Cong, et al, FPGA’99]
  - Anti-fuse FPGA Mapper [Kang, et al, ASPDAC’04]
- Delay Minimization
  - DAG-Map [Chen, et al, DTC’92]
  - FlowMap [Cong, et al, ICCAD’92]
  - Edge-map [Yang, et al, ICCAD’94]
- Power Minimization
  - PowerMinMap [Li, et al, ASPDAC’03]
  - Emap [Lamoureux, et al, ICCAD’03]
  - DVmap [Chen, et al, FPGA’04]
- Simultaneous Delay and Area Minimization
  - FlowMap-r [Cong, et al, TVLSI’94]
  - CutMap [Cong, et al, FPGA’95]
  - BoolMap-D [Legl, et al, DAC’96]

5 Definitions
- DAG (directed acyclic graph): represents a Boolean network
- Cone C_v: a sub-network rooted at a node v
- K-feasible cone: $|\mathrm{input}(C_v)| \le K$
- Fanin cone F_v: the largest C_v
- K-feasible cut: a K-feasible C_v
  - Occupies a K-LUT
- Unit delay model
  - One LUT contributes one unit of delay
  - No edge delay
[Figure: example network with PIs, nodes a, b, c, d, e and root v, showing the fanin cone F_v and a 3-feasible cone C_v; labeled "Delay of 2"]
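As a minimal sketch of the K-feasibility test defined above, assume the network is stored as a dictionary that maps each node to its list of fanins. The representation, the helper names `cone_inputs` and `is_k_feasible`, and the small example network are illustrative assumptions, not taken from the slides.

```python
def cone_inputs(fanins, cone):
    """Nodes outside `cone` that drive at least one node inside it."""
    return {u for n in cone for u in fanins[n] if u not in cone}


def is_k_feasible(fanins, cone, k):
    """A cone is K-feasible when it has at most K distinct inputs."""
    return len(cone_inputs(fanins, cone)) <= k


# Hypothetical network: a..e are primary inputs,
# x = f(a, b), y = f(c, d), v = f(x, y, e).
fanins = {
    "a": [], "b": [], "c": [], "d": [], "e": [],
    "x": ["a", "b"], "y": ["c", "d"], "v": ["x", "y", "e"],
}

print(is_k_feasible(fanins, {"v"}, 3))            # inputs are x, y, e
print(is_k_feasible(fanins, {"v", "x"}, 3))       # inputs are a, b, y, e
print(is_k_feasible(fanins, {"v", "x", "y"}, 5))  # inputs are a, b, c, d, e
```

In this toy network the cone {v} fits a 3-LUT, while absorbing x raises the input count to four and would need K = 4.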

6 Problem Formulation
- Delay-optimal Area Optimization problem
  - Given: a Boolean network and an integer K
  - Goal: cover the network with K-feasible cones (K-LUTs) such that
    - the mapping depth is optimal
    - the area (number of LUTs) is minimized
- Area minimization is an NP-hard problem

7 Highlights of Our Algorithm
- Consider potential node duplications and make the mapping-area estimation close to reality
- Search the solution space considering both global and local optimality information
- Carry out an iterative cut selection procedure on top of cost adjustment to further improve solution quality
- Each technique used is simple and intuitive
  - The key is the right combination of them

8 Cut Enumeration
- Combine sub-cuts on the inputs of the gate
- Process each gate in topological order from PIs to POs
[Figure: network with nodes a, b, c, d, w, x, y, z, shown twice; labels: "Subcut", "Another Subcut", "New cut"]
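The two steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the DAOmap implementation: the dictionary-based network, the function name `enumerate_cuts`, and the example gates are hypothetical, and no pruning is applied.

```python
from itertools import product


def enumerate_cuts(fanins, order, k):
    """Return cuts[n]: the set of K-feasible cuts (frozensets of nodes) of n."""
    cuts = {}
    for n in order:  # topological order, PIs first
        trivial = frozenset([n])
        if not fanins[n]:  # primary input: only the trivial cut
            cuts[n] = {trivial}
            continue
        merged = {trivial}  # kept so fanouts of n can use n itself as an input
        # Pick one sub-cut per fanin and merge them into a candidate cut.
        for combo in product(*(cuts[f] for f in fanins[n])):
            cut = frozenset().union(*combo)
            if len(cut) <= k:
                merged.add(cut)
        cuts[n] = merged
    return cuts


# Hypothetical network: x = f(a, b), y = f(b, c), z = f(x, y).
fanins = {"a": [], "b": [], "c": [], "x": ["a", "b"], "y": ["b", "c"], "z": ["x", "y"]}
order = ["a", "b", "c", "x", "y", "z"]

cuts = enumerate_cuts(fanins, order, k=3)
for cut in sorted(sorted(c) for c in cuts["z"]):
    print(cut)
```

For node z with K = 3 the candidates are {a, b, c}, {a, b, y}, {b, c, x}, {x, y}, and the trivial cut {z}.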

9 Complexity Analysis
- The worst-case number of cuts on a node is $O(n^K)$
- In practice, it is a small constant for small K
  - Average over the 20 largest MCNC benchmarks

10 Delay and Area Propagation
- The propagation process visits cuts and nodes iteratively
- The longest best delay on the POs is the optimal mapping delay
[Figure: network with nodes a, b, c, d, e, f, g, w, x, y, z, annotated with the delay and area values of individual cuts and nodes]
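The delay half of this propagation can be sketched under the unit delay model from the definitions: a node's best delay is the minimum, over its cuts, of one plus the largest best delay among the cut inputs. The function name `propagate_delay`, the hand-written cut lists, and the example network are assumptions made for illustration; area propagation is omitted.

```python
def propagate_delay(cuts, order, primary_inputs):
    """Best (minimum) unit-delay depth of every node, given its candidate cuts."""
    delay = {}
    for n in order:  # topological order, PIs first
        if n in primary_inputs:
            delay[n] = 0
        else:
            # A LUT implementing a cut adds one unit to its slowest input.
            delay[n] = min(1 + max(delay[i] for i in cut) for cut in cuts[n])
    return delay


# Hypothetical 3-feasible cuts for x = f(a, b), y = f(c, d), z = f(x, y).
primary_inputs = {"a", "b", "c", "d"}
cuts = {
    "x": [{"a", "b"}],
    "y": [{"c", "d"}],
    "z": [{"x", "y"}, {"a", "b", "y"}, {"x", "c", "d"}],
}
order = ["a", "b", "c", "d", "x", "y", "z"]

delay = propagate_delay(cuts, order, primary_inputs)
primary_outputs = ["z"]
mapping_depth = max(delay[po] for po in primary_outputs)
print(delay["x"], delay["y"], delay["z"], mapping_depth)
```

With K = 3 no cut of z reaches only primary inputs, so z needs two LUT levels and the mapping depth of this toy network is 2.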

11 Area Estimation
- $A_C = \sum_{i \in \mathrm{input}(C)} \frac{A_i}{f(i)} + U_C$
  - A_i: estimated area of the fanin cone on signal i
  - f(i): fanout number of i
  - U_C: area of the cut itself
- Tries to estimate area considering the fanout effect
  - Praetor [Cong, et al, FPGA’99]
- Can under-estimate the area because of node duplications
[Figure: network with nodes m, n, o, p, q, r, s, t, u and cuts C, C_t, C_u; labels: f(p) = 2, A_p, A_s / 2]
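A small worked instance of the area estimate may help. The function name `estimate_cut_area`, the numeric values, and the choice U_C = 1 (one LUT per cut) are assumptions for illustration only.

```python
def estimate_cut_area(cut_inputs, area, fanout, cut_area=1.0):
    """A_C = sum over inputs i of A_i / f(i), plus U_C (assumed to be 1 LUT here)."""
    return sum(area[i] / fanout[i] for i in cut_inputs) + cut_area


# Hypothetical values: p and s each drive two fanouts, q drives one.
area = {"p": 2.0, "q": 1.0, "s": 3.0}
fanout = {"p": 2, "q": 1, "s": 2}

a_c = estimate_cut_area(["p", "q", "s"], area, fanout)
print(a_c)  # 2/2 + 1/1 + 3/2 + 1
```

Dividing A_p and A_s by their fanout of 2 charges only half of each shared fanin cone to this cut, which is the fanout effect the slide refers to.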

12 Cost (Area) Function of a Cut
- Some key parameters
  - I_C: cut size of C
  - N_C: number of nodes covered by C
  - f(v): fanout number of the root node v
  - P_f: duplication cost
[Figure: network with nodes a, b, c, d, e and root v, showing cuts C_1, C_2, C_3 and labels fanin1, fanin2]

13 Duplication Cost Adjustment
- Consider potential node duplications
- Check the sub-cuts for multiple fanouts
- Propagate the adjusted cost globally
- Duplication cost:
  - N_Cf: number of nodes the subcut C_f contains
  - I_C: cut size of C
[Figure: network with nodes m, n, o, p, q, r, s; labels: new cut C with I_C = 4, subcut C_f1, subcut C_f2 with N_Cf2 = 1, "Multiple fanouts"]

14 Cut Selection – Mapping Generation
- From POs to PIs
- Critical paths: optimal delay + best area available
- Non-critical paths: relaxed delay + better area
[Figure: network with nodes a, b, c, d, e, f, g, w, x, y, z; labels: "Critical LUT", "Non-critical LUT"]
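The PO-to-PI traversal can be sketched as below, assuming one cut has already been chosen for every internal node. How that cut is chosen (the delay and area trade-off on critical and non-critical paths) is not modeled here; `generate_mapping` and the example data are hypothetical.

```python
def generate_mapping(chosen_cut, primary_outputs, primary_inputs):
    """Walk from POs toward PIs; every visited internal node becomes a LUT root."""
    luts = {}
    stack = list(primary_outputs)
    while stack:
        n = stack.pop()
        if n in primary_inputs or n in luts:
            continue
        luts[n] = chosen_cut[n]
        stack.extend(chosen_cut[n])  # the cut inputs must be implemented next
    return luts


primary_inputs = {"a", "b", "c"}

# Hypothetical selections for x = f(a, b), y = f(b, c), z = f(x, y).
wide = {"x": {"a", "b"}, "y": {"b", "c"}, "z": {"a", "b", "c"}}
narrow = {"x": {"a", "b"}, "y": {"b", "c"}, "z": {"x", "y"}}

print(len(generate_mapping(wide, ["z"], primary_inputs)))
print(len(generate_mapping(narrow, ["z"], primary_inputs)))
```

Selecting the cut {a, b, c} for z covers x and y inside a single LUT, whereas the cut {x, y} forces x and y to become LUT roots as well.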

15 Techniques for Better Cut Selection
- Cut selection is equivalent to a min-cover problem
  - A greedy approach will not work well
  - Use heuristics to guide the selection
- Iterative Cut Selection Procedure
- Local Cost Adjustment
  - Input Sharing
  - Slack Distribution
  - Cut Probing

16 Iterative Cut Selection (ICS)
- Some valuable information on area is unknown until after mapping
  - Mapped LUT root nodes
  - Duplicated nodes
- ICS carries out multiple mapping iterations
[Flowchart: Start → mapping iteration i, i++ → profiling data → adjust cut cost → repeat while i < threshold; exit when i = threshold]
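The control flow of the flowchart can be sketched as a plain loop. Everything beyond the loop structure is an assumption: `map_once` and `adjust_costs` are hypothetical callables standing in for one mapping pass and for the cost adjustment, and the toy versions below only show how profiling data feeds the next iteration.

```python
def iterative_cut_selection(map_once, adjust_costs, costs, threshold):
    """Run `threshold` mapping iterations, feeding each profile into the next one."""
    mapping = None
    i = 0
    while i < threshold:
        mapping, profile = map_once(costs)  # mapping iteration i
        i += 1
        if i < threshold:
            costs = adjust_costs(costs, profile)  # use the profiling data
    return mapping


# Toy stand-ins (hypothetical): the "mapping" is reduced to a LUT count.
def toy_map_once(costs):
    lut_count = 100 - costs["bonus"]
    profile = {"duplicated_nodes": 4}
    return lut_count, profile


def toy_adjust_costs(costs, profile):
    return {"bonus": costs["bonus"] + profile["duplicated_nodes"]}


result = iterative_cut_selection(toy_map_once, toy_adjust_costs, {"bonus": 0}, threshold=3)
print(result)
```

Returning the mapping of the last iteration is a design choice of this sketch; the slides do not state which iteration's result is kept.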

17 Local Cost Adjustment – Input Sharing
- Takes advantage of existing resources
  - Considers roots from previous iterations
- The more inputs a cut shares with others, the better for the cut
[Figure: nodes d, e, f, g; labels: "Become LUT roots", "Share inputs with existing LUTs", "Duplicated node"]

18 Local Cost Adjustment – Slack Distribution
- $\mathrm{Slack}_C = \mathrm{Req}_v - 1 - \max_{i \in \mathrm{input}(C)} (\mathrm{Arr}_i)$
  - The max term is the largest arrival time among the inputs
- If Slack_C < 0, C is not a timing-feasible cut
- The larger Slack_C is, the better C is in terms of the slack distribution effect
[Figure: network with nodes a, b, c, d, w, x, y, z; label: Req_d, the required time of the root of C]
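The slack formula translates directly into code. The function names and the arrival and required times below are hypothetical values chosen for illustration.

```python
def cut_slack(required_root, arrival, cut_inputs):
    """Slack_C = Req_v - 1 - max(Arr_i) over the inputs i of the cut."""
    return required_root - 1 - max(arrival[i] for i in cut_inputs)


def is_timing_feasible(required_root, arrival, cut_inputs):
    """A cut with negative slack cannot meet the required time of its root."""
    return cut_slack(required_root, arrival, cut_inputs) >= 0


# Hypothetical arrival times under the unit delay model; the root is required at time 2.
arrival = {"a": 0, "b": 0, "x": 1, "y": 2}
required_root = 2

print(cut_slack(required_root, arrival, {"a", "b"}))       # 2 - 1 - 0
print(cut_slack(required_root, arrival, {"a", "b", "x"}))  # 2 - 1 - 1
print(is_timing_feasible(required_root, arrival, {"x", "y"}))
```

The cut {x, y} has slack -1 because input y already arrives at the root's required time, so it is rejected as not timing-feasible.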

19 Local Cost Adjustment – Cut Probing
- Probe the amount of area gain locally before making decisions about a cut
  - Reduce connections between LUTs
  - Reduce potential node duplications based on previous duplication profiling
  - Reconvergent paths handling
- Use C_final to guide cut selection

20 Experimental Results – Settings
- DAOmap is implemented in C within the UCLA RASP system
- Compare LUT counts and runtime against CutMap [Cong et al, FPGA’95]
- Use a 750 MHz SunBlade-1000 Solaris machine
- Test with LUT input counts from 4 to 6
- Benchmarks
  - The 20 largest MCNC benchmarks
  - A set of large industrial benchmarks

21 Experimental Results of DAOmap over CutMap on MCNC Benchmarks

After mapping
         Average Area Reduction   Average Run Time Improvement
4-LUT    -13.98%                  13.2X
5-LUT    -16.02%                  24.2X
6-LUT    -12.44%                  4.7X

After mapping + packing: (daomap + mpack) vs. ("cutmap -x" + mpack)
         Average Area Reduction   Average Run Time Improvement
4-LUT    -7.50%                   57.7X
5-LUT    -11.31%                  38.7X
6-LUT    -7.90%                   10.1X

22 Detailed Experimental Results on Industrial Benchmarks
After mapping into 5-LUTs

            CutMap                   DAOmap                   Comparison
Benchmark   LUT No.   Run Time (s)   LUT No.   Run Time (s)   LUT (Reduce)   Run Time (Improve)
big1        9928      301            9169      93             -7.6%          3.2
big2        -         >10 h          14625     708            -              -
big3        10005     28926          9031      106            -9.7%          272.9
big4        11800     583            9364      156            -20.6%         3.7
big5        -         >10 h          32230     3377           -              -
big6        39000     14437          32028     402            -17.9%         35.9
Ave.                                                          -13.98%        78.9X
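The Comparison columns can be recomputed from the raw counts as a consistency check. The sketch below uses only the four benchmarks for which both tools report results (big1, big3, big4, big6), with the digit grouping that matches the reported percentages; the helper names are illustrative.

```python
# (CutMap LUTs, CutMap seconds, DAOmap LUTs, DAOmap seconds)
rows = {
    "big1": (9928, 301, 9169, 93),
    "big3": (10005, 28926, 9031, 106),
    "big4": (11800, 583, 9364, 156),
    "big6": (39000, 14437, 32028, 402),
}


def lut_change_percent(cutmap_luts, daomap_luts):
    """Relative LUT-count change of DAOmap versus CutMap, in percent."""
    return 100.0 * (daomap_luts - cutmap_luts) / cutmap_luts


def speedup(cutmap_seconds, daomap_seconds):
    """How many times faster DAOmap ran than CutMap."""
    return cutmap_seconds / daomap_seconds


changes = {name: lut_change_percent(r[0], r[2]) for name, r in rows.items()}
speedups = {name: speedup(r[1], r[3]) for name, r in rows.items()}
avg_change = sum(changes.values()) / len(changes)
avg_speedup = sum(speedups.values()) / len(speedups)

for name in rows:
    print(name, round(changes[name], 1), round(speedups[name], 1))
print("Ave.", round(avg_change, 2), round(avg_speedup, 1))
```

The rounded values agree with the slide: the averages are taken over these four benchmarks, since CutMap did not finish big2 and big5 within 10 hours.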

23 Individual Technique Analysis

Techniques                          % dropped
Cut Enumeration
  Min-cost propagation              4.35%
  Global cost adjustment            2.68%
Cut Selection
  Input sharing                     4.55%
  Iterative cut selection (ICS)     2.04%
Others                              <1%

24 Mapping Iteration Analysis
[Chart: improvement % (y-axis, 0.0% to 2.5%) versus number of mapping iterations (x-axis, 1 to 6)]
- For a single iteration only (the base case), use manual profiling [Chen et al, FPGA’04]
- When the iteration number is more than 3, further iterations are no longer helpful

25 Conclusions and Future Work
- We presented a new mapping algorithm, DAOmap, to minimize FPGA delay and area
- We built several cost-adjustment heuristics and used an iterative mapping procedure
- DAOmap achieved significant area and runtime reductions over CutMap, a state-of-the-art algorithm
- Future work includes adding cut-pruning techniques for mapping with larger K values

