An Efficient Chiplevel Time Slack Allocation Algorithm for Dual-Vdd FPGA Power Reduction Yan Lin 1, Yu Hu 1, Lei He 1 and Vijay Raghunathan 2 1 EE Department,

1 An Efficient Chiplevel Time Slack Allocation Algorithm for Dual-Vdd FPGA Power Reduction. Yan Lin (1), Yu Hu (1), Lei He (1) and Vijay Raghunathan (2). (1) EE Department, UCLA; (2) Purdue University. Partially supported by NSF. Address comments to lhe@ee.ucla.edu

2 Outline Background, Motivation and Problem Formulation Chip-level Vdd-level Assignment Algorithm [for mixed length wire segments, Hu et al, DAC’06] Network Flow Based Vdd Level Assignment Formulation Experimental Results Conclusions

3 Background
Existing FPGAs are power inefficient compared to ASICs. Interconnect is the dominant component of FPGA power dissipation (dynamic and leakage). [Li, TCAD’05]
Power-aware FPGA architectures and CAD algorithms have been studied extensively:
- CAD algorithms to minimize power-delay product [Lamoureux, ICCAD’03]
- Configuration inversion for leakage reduction [Anderson, FPGA’04]
- Vdd-programmable FPGA logic blocks [Li, FPGA’04] [Li, DAC’04]
- Vdd-programmable FPGA interconnects [Li, ICCAD’04] [Gayasen, FPL’04] [Anderson, ICCAD’04] [Lin, DAC’05]

4 Vdd-Programmable Interconnect Architecture
- Island style with mixed wire segment lengths.
- Routing switch/connection block: two PMOS power transistors, M3 and M4, are inserted between the tri-state buffer and the VddH and VddL power rails, respectively. [Li, ICCAD’04]
- Level-converter-free routing trees (guaranteeing that no VddL switch drives a VddH switch) with the LEAST area and power penalty. [Lin, TCAD’06]
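The level-converter-free rule can be sketched as a simple check over a routing tree. This is a minimal illustration, not the authors' code: the `parent` map, the "H"/"L" labels, and the function name `is_level_converter_free` are all assumptions introduced here.

```python
from typing import Dict, Optional


def is_level_converter_free(parent: Dict[str, Optional[str]],
                            vdd: Dict[str, str]) -> bool:
    """Return True if no VddL switch drives a VddH switch.

    parent maps each switch to the switch that drives it (None for the root);
    vdd maps each switch to "H" (VddH) or "L" (VddL). Both are hypothetical
    data structures used only for this sketch.
    """
    for switch, driver in parent.items():
        if driver is not None and vdd[driver] == "L" and vdd[switch] == "H":
            return False
    return True


# Hypothetical routing tree: b1 drives b2 and b3, and b3 drives b4.
parent = {"b1": None, "b2": "b1", "b3": "b1", "b4": "b3"}
legal = {"b1": "H", "b2": "L", "b3": "H", "b4": "L"}
illegal = {"b1": "L", "b2": "L", "b3": "H", "b4": "L"}

print(is_level_converter_free(parent, legal))    # True
print(is_level_converter_free(parent, illegal))  # False: b1 (VddL) drives b3 (VddH)
```

A legal assignment is therefore VddH near the source and VddL toward the sinks, which is why a bottom-up assignment order fits this constraint naturally.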

5 Limitations of Existing Approaches Uniform wire segment length was assumed, and the approaches cannot be extended directly to mixed wire segment lengths. The LP-based formulation is time consuming and computationally unstable. Computational instability: even small circuits can take a long runtime. Time consuming: runtime goes up quickly for large circuits.

6 Problem Formulation [Dual-Vdd Level Assignment Problem] Given: placement and routing results of an FPGA design. Find: a Vdd-level assignment for each interconnect switch. Objective: minimize interconnect (dynamic and leakage) power. Constraints: (1) meet the delay target T_spec; (2) Vdd-level converters are inserted ONLY at CLB inputs/outputs.

7 Outline Background, Motivation and Problem Formulation Chip-level Vdd-level Assignment Algorithm [for mixed length wire segments, Hu et al, DAC’06]  Interconnect Power Reduction Estimation  LP Based Vdd-level Assignment Algorithm Network Flow Based Vdd Level Assignment Formulation Experimental Results Conclusions

8 Delay and Power Model for Interconnect
Delay model:
- The intrinsic delay and effective driving resistance of each switch have been pre-characterized using SPICE.
- Elmore delay is used to calculate routing delay.
Interconnect power model:
- Dynamic power: $P_d(Vdd_j) = 0.5 \cdot f_{clk} \cdot C \cdot Vdd_j^{2}$
- Leakage power $P_l(Vdd_j)$ is pre-characterized using SPICE.
Interconnect power reduction estimation is the essential part of the dual-Vdd assignment algorithm.
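The two models above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the RC-chain representation, and the clock and capacitance values are hypothetical; the 1.3 V and 0.8 V supply levels are the ones listed in the experimental setting.

```python
def dynamic_power(f_clk: float, cap: float, vdd: float) -> float:
    """Dynamic power P_d = 0.5 * f_clk * C * Vdd^2."""
    return 0.5 * f_clk * cap * vdd ** 2


def elmore_delay_chain(stages):
    """Elmore delay of a simple RC chain (no branching, an assumption here).

    stages is a list of (resistance, capacitance) pairs ordered from the
    driver to the sink. Each capacitance is charged through all of the
    resistance upstream of it.
    """
    delay = 0.0
    r_upstream = 0.0
    for r, c in stages:
        r_upstream += r
        delay += r_upstream * c
    return delay


VDD_H, VDD_L = 1.3, 0.8          # supply levels from the experimental setting
F_CLK, CAP = 100e6, 1e-12        # hypothetical clock frequency and load

p_high = dynamic_power(F_CLK, CAP, VDD_H)
p_low = dynamic_power(F_CLK, CAP, VDD_L)
saving = 1.0 - p_low / p_high    # depends only on (VDD_L / VDD_H)^2

print(f"dynamic power saving per switch: {saving:.1%}")
print(elmore_delay_chain([(1.0, 1.0), (2.0, 3.0)]))  # 1*1 + (1+2)*3
```

Because dynamic power scales with the square of the supply, moving one switch from 1.3 V to 0.8 V saves roughly 62% of its dynamic power in this model, independent of the clock frequency and the load.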

9 Review of Vdd Level Assignment Algorithm [Lin, DAC’05]
- The net-level bottom-up Vdd assignment guarantees the legality of final solutions. [Lin, DAC’05]
- All extra slack is leveraged with VddL switches. [Lin, DAC’05]
[Figure: routing tree with buffers b1 to b4 and sink slacks S1 = 1 and S2 = 3. Panel labels: timing slack assigned at sinks; VddL possibility for switches (interconnect power reduction estimation); Vdd assignment based on the estimation.]
Remaining problem: how to calculate the VddL possibility for mixed wire segments?

10 VddL Possibility Calculation
Represent timing slack as a number of switches: $s_i = L_i \cdot (S_i / D_i)$
- $s_i$ is the number of switches that can be turned to VddL along the path from the source to the i-th sink in the routing tree, for the given timing slack $S_i$.
- $L_i$ is the number of switches along this path.
VddL possibility for switch j at sink i, based on load capacitance: $f(i,j) = s_i \cdot (c_{ij} / C_i)$
- Key idea: distribute timing slack to each switch based on capacitance.
Example (from the figure): buffers b1 (8x), b2 (8x), b3 (16x), b4 (16x); sink slacks S_1 = 6 and S_2 = 10; L_2 = 3, D_2 = 12, so s_2 = 3 * (10/12) = 5/2; f(2,2) = 1, f(2,3) = 1, f(2,4) = 1/2.
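The two formulas on this slide can be sketched as follows. This is a minimal illustration, assuming D_i is the source-to-sink path delay (the slide does not define it) and applying the formulas exactly as written, without any capping; the capacitance values are hypothetical, so the per-switch numbers are not meant to reproduce the figure.

```python
from fractions import Fraction


def slack_in_switches(num_switches, slack, path_delay):
    """s_i = L_i * (S_i / D_i): timing slack expressed as a switch count."""
    return Fraction(num_switches) * Fraction(slack) / Fraction(path_delay)


def vddl_possibility(s_i, caps):
    """f(i, j) = s_i * (c_ij / C_i), with C_i taken as the sum of the c_ij.

    caps maps each switch on the path to its load capacitance.
    """
    total = sum(Fraction(c) for c in caps.values())
    return {j: s_i * Fraction(c) / total for j, c in caps.items()}


# Slide example for sink 2: L_2 = 3, S_2 = 10, D_2 = 12.
s_2 = slack_in_switches(3, 10, 12)
print(s_2)  # 5/2

# Hypothetical relative load capacitances for the three switches on the path.
caps = {"b2": 1, "b3": 2, "b4": 2}
f_2 = vddl_possibility(s_2, caps)
print(f_2)
print(sum(f_2.values()) == s_2)  # the possibilities always sum to s_i
```

Exact fractions are used so that the shares add up to s_i without floating-point drift.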

11 Power Reduction Estimation for Mixed Wire Segments
The lower bound estimation [Y. Lin, DAC’05] for interconnect power reduction is no longer valid for mixed wire segments.
Our solution: develop an upper bound estimation of the number of VddL switches.
- Consistent upper bound of power reduction.
- Removes the non-linear term "min" and the corresponding extra LP constraints from the lower bound estimation.
Example (from the figure): slack S = 2.7; b1 (16x) needs 1.8 slack and b2 (8x) needs 1.0 slack; f_n(i,1) = 0.9 and f_n(i,2) = 0.5. Summing all VddL possibilities gives a lower bound of 0.9 + 0.5 = 1.4 VddL switches. In the actual assignment, b2 consumes 1.0 slack, leaving 1.7, but b1 needs 1.8, so only 1.0 VddL switch can be assigned.
Problem: the lower bound is greater than the actual number!
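The counterexample on this slide can be checked numerically. The sketch below assumes the two switches share the slack of one path additively and finds the true maximum by brute force; the function name and the data layout are hypothetical, while the numbers are the ones in the figure.

```python
from fractions import Fraction
from itertools import combinations


def max_vddl_switches(slack, needs):
    """Largest number of switches whose required slack fits within slack."""
    names = list(needs)
    for k in range(len(names), 0, -1):
        for subset in combinations(names, k):
            if sum(needs[n] for n in subset) <= slack:
                return k
    return 0


slack = Fraction("2.7")
needs = {"b1": Fraction("1.8"), "b2": Fraction("1.0")}       # slack each switch needs
possibility = {"b1": Fraction("0.9"), "b2": Fraction("0.5")}  # f_n values from the figure

estimate = sum(possibility.values())     # the "lower bound" of the old method
actual = max_vddl_switches(slack, needs)

print(float(estimate), actual)  # 1.4 1
print(estimate > actual)        # True: the supposed lower bound overshoots
```

Turning both switches to VddL would need 2.8 slack, which exceeds 2.7, so one switch is the most that fits while the estimate claims 1.4.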

12 LP Formulation for Dual-Vdd Level Assignment
[Equations are missing from the transcript; only the slide labels survive.]
- Objective function: dynamic power reduction upper bound, leakage power reduction upper bound
- Slack constraints: slack upper bound, slack non-negative
- Basic timing constraints: arrival time constraints, arrival time for primary inputs, arrival time for primary outputs
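For orientation, a generic textbook form of slack-budgeting timing constraints is sketched below. It is not the slide's own formulation: the timing graph $G = (V, E)$, arrival times $a_v$, edge delays $d_{uv}$, slack budgets $s_{uv}$ and slack caps $s_{uv}^{\max}$ are all symbols assumed here for illustration.

```latex
\begin{aligned}
& a_v \;\ge\; a_u + d_{uv} + s_{uv}  && \forall (u,v) \in E        && \text{(arrival time constraints)} \\
& a_u \;=\; 0                        && \forall u \in \mathrm{PI}  && \text{(arrival time at primary inputs)} \\
& a_v \;\le\; T_{spec}               && \forall v \in \mathrm{PO}  && \text{(arrival time at primary outputs)} \\
& 0 \;\le\; s_{uv} \;\le\; s_{uv}^{\max} && \forall (u,v) \in E    && \text{(slack non-negative and bounded)}
\end{aligned}
```

Every constraint is linear in the arrival times and slacks, which is what allows the assignment to be posed as an LP once the power reduction estimate is also linear.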

13 Outline Motivation Problem Formulations Chip-level Vdd-level Assignment Algorithm [for mixed length wire segments, Hu et al, DAC’06] Network Flow Based Vdd Level Assignment Formulation  Overview of network flow based timing slack budgeting  Primal-dual reformulation Experimental Results Conclusions

14 Network Flow Based Timing Slack Budgeting Motivated by [Ghiasi, ICCAD’04] for logic-level optimization. Step 1: Reorganize the objective function. Step 2: Eliminate the timing slack variables (by substitution). [Equations are missing from the transcript.]

15 Network Flow Based Timing Slack Budgeting (cont.) Step 3: Reorganize the objective function by timing nodes. Step 4: Generate the dual problem. [Equations are missing from the transcript. Surviving annotations: constant terms are removed; constant coefficients; terms are grouped node by node and edge by edge.]
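A generic version of Steps 1 to 4 can be written out as follows. This is a sketch under assumptions, not the slide's own equations: the weights $w_{uv}$, arrival times $a_v$, edge delays $d_{uv}$, slack caps $s_{uv}^{\max}$ and node coefficients $b_v$ are symbols introduced here, and each slack is assumed to equal $s_{uv} = a_v - a_u - d_{uv}$.

```latex
\begin{aligned}
\max \sum_{(u,v) \in E} w_{uv}\, s_{uv}
  &= \sum_{(u,v) \in E} w_{uv}\,(a_v - a_u)
     \;-\; \underbrace{\sum_{(u,v) \in E} w_{uv}\, d_{uv}}_{\text{constant, removed}} \\
  &\;\to\; \sum_{v \in V} b_v\, a_v ,
  \qquad
  b_v \;=\; \sum_{(u,v) \in E} w_{uv} \;-\; \sum_{(v,x) \in E} w_{vx} \\
\text{subject to}\quad
  & d_{uv} \;\le\; a_v - a_u \;\le\; d_{uv} + s_{uv}^{\max}
  \qquad \forall (u,v) \in E
\end{aligned}
```

After the substitution every constraint bounds a difference of two node variables, and the LP dual of such a program is a min-cost flow problem in which $b_v$ acts as the demand of node $v$ and each two-sided bound contributes a forward and a backward arc.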

16 Link Induced Network from Timing Graph
[Figure: network induced from the timing graph. Solid segments are forward arcs and dotted segments are backward arcs, each labeled with its flow; each node i is labeled with its demand.]
No negative-weight cycle exists in the induced network, so a min-cost flow is guaranteed to exist. A shortest-path-based algorithm is used to produce the solution to the primal problem.
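The slide does not say which shortest-path algorithm is used. As a generic illustration, the Bellman-Ford sketch below computes shortest-path distances on a graph with negative arc weights and reports whether a negative-weight cycle is reachable; the graphs and names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, float]  # (tail, head, weight)


def bellman_ford(nodes: List[str], edges: List[Edge],
                 source: str) -> Optional[Dict[str, float]]:
    """Return shortest distances from source, or None if a negative-weight
    cycle is reachable from it."""
    dist = {n: float("inf") for n in nodes}
    dist[source] = 0.0
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break
    # One extra pass: any further improvement implies a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            return None
    return dist


nodes = ["s", "a", "b", "t"]
edges = [("s", "a", 4), ("s", "b", 2), ("b", "a", -1), ("a", "t", 3), ("t", "b", 1)]
print(bellman_ford(nodes, edges, "s"))  # {'s': 0.0, 'a': 1, 'b': 2, 't': 4}

bad_edges = [("s", "a", 1), ("a", "b", -2), ("b", "a", 1)]
print(bellman_ford(["s", "a", "b"], bad_edges, "s"))  # None
```

When no negative cycle exists, the distances are well defined and can serve as node potentials, which is the usual way a shortest-path computation yields values for the primal variables of a flow dual.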

17 Outline Motivation Problem Formulations Chip-level Vdd-level Assignment Algorithm [for mixed length wire segments, Hu et al, DAC’06] Network Flow Based Formulation Experimental Results Conclusions

18 Experimental Setting
Cluster-based island-style FPGA structure:
- Size-10 cluster and size-4 LUT
- 100% buffered interconnects, subset switch block
- 60% length-4 and 40% length-8 wire segments
- 25x buffer for length-4 and 10x buffer for length-8
ITRS 100nm technology, 1.3 V for VddH and 0.8 V for VddL.
Use VPR [Betz-Rose-Marquardt] for placement and routing.
Use fpgaEva-LP2 [Lin et al, FPGA’05] for power calculation:
- Considers short-circuit power, glitch power and input vectors
- 8% average error compared to SPICE simulation
The 20 biggest sequential MCNC benchmarks are tested.
Use LPsolver to solve the LP.

19 Dual-Vdd Assignment for FPGAs with Mixed Wire Segments Both the LP-based and the netflow-based algorithms achieve 85% VddL assignment on average.

20 Interconnect Power Reduction 52% total interconnect power reduction is achieved!

21 Runtime Comparison The netflow-based algorithm achieves a consistent speedup and a stable runtime. A more significant speedup is expected for larger circuits.

22 Outline Motivation Problem Formulations Chip-level Vdd-level Assignment Algorithm [for mixed length wire segments] Network Flow Based Formulation Experimental Results Conclusions

23 A min-cost network flow based timing budgeting formulation speeds up the budgeting procedure and the overall design flow by up to 6000x and 20x, respectively, compared to the LP-based one. Both chip-level dual-Vdd assignment algorithms handle mixed-length wire segments. Experimental results show an interconnect power reduction of 53% on average compared to single-Vdd FPGA designs.

24 Thank you! Q/A

