LOPASS: A Low Power Architectural Synthesis for FPGAs with Interconnect Estimation and Optimization Harikrishnan K.C. University of Massachusetts Amherst.


1 LOPASS: A Low Power Architectural Synthesis for FPGAs with Interconnect Estimation and Optimization. Harikrishnan K.C., University of Massachusetts Amherst

2 Overview: Motivation; Introduction; FPGA Architecture; LOPASS Synthesis Flow; High Level Power Estimation; Power Optimization Engine; Multiplexer Optimization for Interconnect Reduction; Experimental Results; Conclusion

3 Motivation: Power consumption is a critical constraining factor in the IC design flow. Field Programmable Gate Arrays (FPGAs) are power inefficient because of the large number of transistors needed for programmability. Their logic and routing resources are fixed, which makes power difficult to optimize during the physical design stage.

4 Introduction: Behavioral level optimization covers scheduling, allocation, and binding. Techniques for power reduction: high level power estimation; simultaneous scheduling, allocation, and binding for power optimization; interconnection optimization.

5 Previous Work: Most previous high level synthesis techniques for FPGAs optimized objectives other than power reduction. Dynamic reconfiguration during run time to save area [M. Vasilko, Int. Workshop Logic Architecture Synthesis, 1995]. Tradeoff between power and circuit speed by selecting different implementations of components; power consumption in steering logic and interconnects was not considered [F. G. Wolff, Proc. IEEE Nat. Aerospace Conf., 2000]. Newer studies have looked into simultaneous resource allocation and binding algorithms for power reduction [D. Chen, Proc. Asia South Pacific Des. Autom. Conf., Jan. 2007].

6 Techniques for Power Reduction: High level power estimation, needed for effective power optimization, accounts for wire capacitance, wire length, and FPGA characteristics. The power optimization engine explores the combined solution space with a simulated annealing based algorithm. Interconnect optimization reduces the multiplexer (MUX) requirement.

7 FPGA Architecture: SRAM based technology. Building blocks: Configurable Logic Block (CLB), Basic Logic Element (BLE), Look Up Table (LUT). Routing architecture parameters: channel width ($W$), switch box flexibility ($F_s$), connection box flexibility ($F_c$).

8 LOPASS Synthesis Flow: The design in HDL is converted to a CDFG. The power estimator supplies estimated power values. The low power optimization engine performs power optimization. RTL synthesis uses Design Compiler. The FPGA evaluation tool fpgaEva_LP2 reports delay, power, and area.

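9 High Level Power Estimation: Wire length estimation uses Rent's rule, $T = kN^p$, which relates the number of terminals $T$ to the number of blocks $N$, together with the interconnect density function $i(l)$. Here $p$ is Rent's exponent, $\alpha$ is the fraction of sink terminals, $f.o.$ is the average fan-out, and $k$ is the average number of inputs/outputs per CLB.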
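To make Rent's rule concrete, here is a minimal sketch that evaluates $T = kN^p$. The function name and the numeric values are illustrative assumptions, not figures from the presentation.

```python
def rent_terminals(num_blocks: int, k: float, p: float) -> float:
    """Evaluate Rent's rule T = k * N**p.

    num_blocks: N, number of logic blocks (CLBs) in the region
    k:          average number of inputs/outputs per block
    p:          Rent's exponent
    """
    if num_blocks <= 0:
        raise ValueError("num_blocks must be positive")
    return k * num_blocks ** p


# Hypothetical region: 64 CLBs, 4 terminals per CLB, Rent exponent 0.5.
terminals = rent_terminals(64, k=4.0, p=0.5)
```

A larger exponent $p$ means the terminal count grows faster with region size, which in turn implies more and longer wires for the estimator to account for.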

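10 High Level Power Estimation (cont.): Switching activity estimation is based on CDFG simulation. $C_{in}(O, O')$ denotes the input transitions when an FU switches from operation $O$ to operation $O'$. The switching activity $S_{in}$ is given by [equation not preserved in transcript]. The total switching activity of the overall design is given by [equation not preserved in transcript].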

11 High Level Power Estimation (cont.): Resource library characterization uses DesignWare libraries from Synopsys, which provide different resource versions for implementing the same operation type. [Figure: resource characterization flow]

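12 High Level Power Estimator: Static and dynamic power both need to be considered. Dynamic power is given by $P_{dynamic} = P_{LUT} + P_{REG} + P_{LW} + P_{GW}$. Static power is given by $P_{static} = P_{s\_LUT} + P_{s\_FF} + P_{s\_LB} + P_{s\_GB}$. The dynamic terms are $P_{LUT} = N_{LUT} \cdot S \cdot E_{LUT} \cdot f$, $P_{REG} = N_{REG} \cdot S \cdot E_{REG} \cdot f$, and $P_{LW}, P_{GW} = 0.5 \cdot f \cdot S \cdot V_{dd}^2 \cdot C_{wire}$.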
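The dynamic power terms on this slide can be combined as in the sketch below. The class, its field names, and all numeric values are assumptions for illustration; a single switching activity $S$ is used for every term, as on the slide, and the local and global wire terms differ only in the capacitance passed in.

```python
from dataclasses import dataclass


@dataclass
class DesignEstimate:
    n_lut: int         # N_LUT, number of LUTs
    n_reg: int         # N_REG, number of registers
    switching: float   # S, switching activity
    e_lut: float       # E_LUT, energy per LUT transition (J)
    e_reg: float       # E_REG, energy per register transition (J)
    freq: float        # f, clock frequency (Hz)
    vdd: float         # V_dd, supply voltage (V)
    c_local: float     # C_wire for local wires (F)
    c_global: float    # C_wire for global wires (F)


def wire_power(freq: float, switching: float, vdd: float, c_wire: float) -> float:
    """P = 0.5 * f * S * Vdd^2 * C_wire, used for both P_LW and P_GW."""
    return 0.5 * freq * switching * vdd ** 2 * c_wire


def dynamic_power(d: DesignEstimate) -> float:
    """P_dynamic = P_LUT + P_REG + P_LW + P_GW."""
    p_lut = d.n_lut * d.switching * d.e_lut * d.freq
    p_reg = d.n_reg * d.switching * d.e_reg * d.freq
    p_lw = wire_power(d.freq, d.switching, d.vdd, d.c_local)
    p_gw = wire_power(d.freq, d.switching, d.vdd, d.c_global)
    return p_lut + p_reg + p_lw + p_gw


# Hypothetical design point, chosen only to exercise the formulas.
example = DesignEstimate(n_lut=100, n_reg=50, switching=0.2,
                         e_lut=1e-12, e_reg=2e-12, freq=1e8,
                         vdd=1.0, c_local=1e-11, c_global=2e-11)
total = dynamic_power(example)  # about 4.3 mW for these made-up inputs
```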

13 Power Optimization Engine: FPGAs have an abundance of distributed registers but no efficient support for wide MUXes. The engine uses simulated annealing, based on hill climbing, to gradually reduce overall power. [Figure: power optimization engine]
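As a rough illustration of how a simulated annealing loop reduces a cost gradually, here is a generic sketch. It is not the LOPASS engine: the cooling schedule, the move (rebinding one operation to another FU), and the toy cost (sum of squared FU loads standing in for estimated power) are all assumptions.

```python
import math
import random

NUM_FUS = 3  # hypothetical number of functional units


def toy_cost(binding):
    """Stand-in for estimated power: sum of squared per-FU loads."""
    loads = [0] * NUM_FUS
    for fu in binding:
        loads[fu] += 1
    return sum(load * load for load in loads)


def move_one_op(binding, rng):
    """Neighbor move: rebind one randomly chosen operation to a random FU."""
    i = rng.randrange(len(binding))
    new = list(binding)
    new[i] = rng.randrange(NUM_FUS)
    return tuple(new)


def simulated_annealing(initial, cost, neighbor, t_start=10.0, t_end=0.01,
                        alpha=0.95, moves_per_temp=50, seed=0):
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            cand = neighbor(current, rng)
            delta = cost(cand) - current_cost
            # Always accept improvements; accept uphill moves with
            # probability exp(-delta / t), which shrinks as t cools.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = cand, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha
    return best, best_cost


# Start with all six operations bound to FU 0 (cost 36).
start = (0, 0, 0, 0, 0, 0)
best_binding, best_cost = simulated_annealing(start, toy_cost, move_one_op)
```

Accepting occasional uphill moves at high temperature is what lets the search escape local minima that plain hill climbing would get stuck in.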

14 Multiplexer Optimization for Interconnect Reduction: Register binding uses a cofamily based algorithm; port assignment uses the port assignment algorithm. Definitions: DFG $G = (V, A)$; compatibility graph $G_c = (V_c, A_c)$.

15 Register Binding: Given a compatibility graph $G_c = (V_c, A_c)$, find a subset of $A_c$ that covers all vertices in $V_c$ such that the total sum of the weights of all edges is minimum. This is done by calculating minimum weighted cofamilies of a partially ordered set (POSET). POSET concepts: chain, antichain, k-family, k-cofamily. Theorem: Register binding on a compatibility graph $G_c$ into $k$ registers is equivalent to finding $k$ disjoint chains in the POSET.
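The chain view of register binding can be checked on a small example. The sketch below assumes each variable has a half-open lifetime `(birth, death)` and that `u` precedes `v` in the partial order when `u` dies no later than `v` is born; these conventions and all names are illustrative, not taken from the presentation.

```python
from itertools import combinations


def compatibility_edges(lifetimes):
    """Directed edge (u, v) when u's lifetime ends before v's begins."""
    edges = set()
    for u, (_, death_u) in lifetimes.items():
        for v, (birth_v, _) in lifetimes.items():
            if u != v and death_u <= birth_v:
                edges.add((u, v))
    return edges


def is_chain(variables, edges):
    """A chain is a set of variables that are pairwise comparable."""
    return all((u, v) in edges or (v, u) in edges
               for u, v in combinations(variables, 2))


def is_valid_binding(lifetimes, registers):
    """Valid when the registers are disjoint chains covering every variable."""
    edges = compatibility_edges(lifetimes)
    bound = [v for reg in registers for v in reg]
    covers_each_once = sorted(bound) == sorted(lifetimes)
    return covers_each_once and all(is_chain(reg, edges) for reg in registers)


# Hypothetical lifetimes for four variables.
lifetimes = {"a": (0, 2), "b": (1, 3), "c": (2, 4), "d": (3, 5)}

ok = is_valid_binding(lifetimes, [["a", "c"], ["b", "d"]])   # two chains
bad = is_valid_binding(lifetimes, [["a", "b"], ["c", "d"]])  # a and b overlap
```

Here two registers suffice because the variables split into two disjoint chains; any grouping that puts overlapping lifetimes together fails the chain test.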

16 Register Binding (cont.): Find the minimum weighted k-cofamily in the POSET. Convert the POSET to a network flow graph, the split graph. Find the minimum cost flow for this split graph. The cost of each edge is given by [equation not preserved in transcript].

17 Cost Function Formulation: A MUX occurs in two situations: when two or more registers feed data to a port, and when two or more FUs produce results and store them into a register. The cost function is defined as [equation not preserved in transcript], where $N_{mux}$ is the number of MUXes saved/wasted, $T_{r-f}$ is the total number of connections between registers and fan-out FUs, $T_{fu}$ is the total number of fan-out FUs involved, and $\alpha$ and $\beta$ are positive scaling constants.
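The two MUX situations can be counted directly from a binding. The sketch below assumes a MUX is needed as soon as a destination (an FU port or a register) has two or more distinct sources; the connection lists and names are hypothetical, and this counts MUXes only, it does not reproduce the slide's cost function.

```python
from collections import defaultdict


def count_muxes(reads, writes):
    """Count destinations that need a MUX.

    reads:  iterable of (register, fu_port) pairs, register feeds the port
    writes: iterable of (fu, register) pairs, FU result stored in the register
    """
    port_sources = defaultdict(set)
    for reg, port in reads:
        port_sources[port].add(reg)

    reg_sources = defaultdict(set)
    for fu, reg in writes:
        reg_sources[reg].add(fu)

    fanins = list(port_sources.values()) + list(reg_sources.values())
    # Assumption: two or more distinct sources on one destination need a MUX.
    return sum(1 for sources in fanins if len(sources) >= 2)


# Hypothetical datapath with one adder and one multiplier.
reads = [("r1", "add0.a"), ("r2", "add0.a"), ("r3", "add0.b"),
         ("r1", "mul0.a"), ("r3", "mul0.b")]
writes = [("add0", "r1"), ("mul0", "r1"), ("add0", "r2")]

n_mux = count_muxes(reads, writes)  # add0.a and r1 each have two sources
```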

18 Port Assignment: Technique for reducing MUX connections. [Figures: Case 1, Case 2]
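One way to see why port assignment matters: for a commutative operation, swapping the two operands changes which registers feed which port. The brute-force sketch below is an assumption for illustration, not the presentation's algorithm; it tries every operand orientation for the operations sharing one two-port FU and keeps the one with the fewest distinct register-to-port connections.

```python
from itertools import product


def best_port_assignment(operand_pairs):
    """Return (connections, flips) minimizing distinct sources on ports A and B.

    operand_pairs: list of (reg_x, reg_y) for commutative operations bound
                   to the same two-port FU.
    flips[i] is True when operation i has its operands swapped.
    """
    best = None
    for flips in product((False, True), repeat=len(operand_pairs)):
        port_a, port_b = set(), set()
        for (x, y), flip in zip(operand_pairs, flips):
            if flip:
                x, y = y, x
            port_a.add(x)
            port_b.add(y)
        connections = len(port_a) + len(port_b)
        if best is None or connections < best[0]:
            best = (connections, flips)
    return best


# Hypothetical operand registers for three additions on one adder.
pairs = [("r1", "r2"), ("r2", "r1"), ("r1", "r3")]
connections, flips = best_port_assignment(pairs)
```

Without swapping, this example needs five register-to-port connections; swapping the second operation's operands brings it down to three, so port A needs no MUX at all. The exhaustive search is exponential in the number of operations and is only meant for small examples.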

19 Experimental Results: Power estimation, comparing the estimated values with those reported by fpgaEva_LP2: estimated wire length is 13.7% away from the reported value, and estimated total power is 14.1% away. Multiplexer optimization, comparing the k-cofamily algorithm with the bipartite and left edge algorithms: 24.7% better than the bipartite algorithm and 29.6% better than the left edge algorithm.

20 Experimental Results: LOPASS compared to SPARK: 9.1% better in terms of latency optimization. LOPASS compared to Synopsys Behavioral Compiler: 57.3% reduction in CLBs, 61.6% reduction in total power consumption, and 10.6% reduction in critical delay. LOPASS compared to Impulse C: on average, 77.1% reduction in multipliers and 27.9% reduction in LEs, with 44.1% and 31.1% reductions in dynamic and total power, respectively.

21 Conclusion: A low power architectural synthesis system for FPGA designs, LOPASS, is presented. It includes three major components: a flexible high level power estimator, a simulated annealing based optimization engine, and a k-cofamily based register binding algorithm. LOPASS is 61.6% better on power consumption and 10.6% better on clock period compared to Synopsys Behavioral Compiler. LOPASS is 31.1% better on power consumption, with an 11.8% penalty on clock period, compared to Impulse C.

22 Thank You!

