Presentation is loading. Please wait.

Presentation is loading. Please wait.

Architecture and Synthesis for Power-Efficient FPGAs Jason Cong University of California, Los Angeles Partially supported by NSF Grants.

Similar presentations


Presentation on theme: "Architecture and Synthesis for Power-Efficient FPGAs Jason Cong University of California, Los Angeles Partially supported by NSF Grants."— Presentation transcript:

1 Architecture and Synthesis for Power-Efficient FPGAs Jason Cong University of California, Los Angeles cong@cs.ucla.edu Partially supported by NSF Grants CCR-0096383, and CCR-0306682, and Altera under the California MICRO program UCLA Joint work with Deming Chen, Lei He, Fei Li, Yan Lin

2 Outline Introduction Understanding Power Consumption in FPGAs Architecture Evaluation and Power Optimization Low Power Synthesis Conclusions

3 Why? FPGA is Known to be Power Inefficient! FPGA consumes 50-100X more power Why do we care about power optimization for FPGAs ?! Source: [Zuchowski, et al, ICCAD02]

4 ASICs Become Increasingly Expensive Traditional ASIC designs are facing rapid increase of NRE and mask-set costs at 90nm and below Source: EETimes 7.5 12 40 60 $0.0 $0.5 $1.0 $1.5 $2.0 $2.5 250nm180nm130nm100nm Total Cost for Mask Set ($M) 0 $10 $20 $30 $40 $50 $60 Cost/Mask ($K)

5 FPGA Advantages Short TAT (total turnaround time) No or very low NRE

6 Our Research Power Efficient FPGAs Circuit Design Fabric Design System Design Synthesis Tools

7 Outline Introduction Understanding Power Consumption in FPGAs Architecture Evaluation and Power Optimization Low Power Synthesis Conclusions

8 FPGA Architecture Program mable IO K LUT Inputs D FF Clock Out BLE # 1 BLE # N N Outputs I Inputs Clock I N Programm able Logic Programm able Routing

9 BC-Netlist Generator Power Simulator Power BLIF Logic Optimization(SIS) Tech-Mapping (RASP) Timing-Driven Packing (TV-Pack) Placement & Routing (VPR) SLIF Delay Area fpgaEva flow [Cong, et al, ICCD’00] Arch Spec BLIF Logic Optimization(SIS) Tech-Mapping (RASP) Timing-Driven Packing (TV-Pack) Placement & Routing (VPR) SLIF Delay Area Evaluation Framework – fpgaEva-LP fpgaEva-LP [Li, et al, FPGA’03]

10 BC-Netlist Generator Mapped Netlist Layout Buffer Extraction Netlist Generation for Logic Clusters Capacitance Extraction Delay Calculation BC-Netlist Back-annotation

11 Mixed-level Power Model – Overview Dynamic power  Switching power  Short-circuit power Related to signal transitions Functional switch Glitch Dynamic Interconnect & clock Macro-model Static Switch-level model Macro-model Logic Block components power sources Static Power  Sub-threshold leakage  Gate leakage  Reverse biased leakage Depending on the input vector

12 Cycle-Accurate Power Simulator Mixed-level Power Model Post-layout extracted delay & capacitance Random Vector Generation BC-Netlist Cycle Accurate Power Simulation with Glitch Analysis All cycles finished? No Power Values Yes

13 Power Breakdown Interconnect power is dominant Cluster Size = 12, LUT Size = 4Cluster Size = 12, LUT Size = 6

14 Power Breakdown (cont’d)  Leakage power becomes increasingly important (100nm) Cluster Size = 12, LUT Size = 4Cluster Size = 12, LUT Size = 6

15 Outline Introduction Understanding Power Consumption in FPGAs Architecture Evaluation and Power Optimization  Architecture Parameter Selection  Dual-Vdd/Dual-Vt FPGA Architecture Low Power Synthesis with Dual-Vdd Conclusion

16 Total Power along LUT and Cluster Size Changes Routing architecture: segmented wire with length of 4, and 50% tri-state buffers in routing switches

17 Routing Architecture Evaluation

18 Architecture of Low-power and High-performance ApplicationsBest FPGA architectureEnergy (E) Delay (t) E3tE3tEt 3 Low-power (E 3 t) Cluster size 10, LUT size 4, wire segment length 4, 25% buffered routing switches 0.96530.99040.89091.0080 High- performance (Et 3 ) Cluster size 12, LUT size 4, Wire segment length 4, 100% buffered routing switches 1.05020.88651.02680.7865 Arch. Parameter selection leads to 10% power/delay trade-off Uniform FPGA fabrics provide limited power-performance tradeoff Need to explore heterogeneous FPGA fabrics, e.g. dual-Vt and dual-Vdd fabrics

19 Outline Introduction Understanding Power Consumption in FPGAs Architecture Evaluation and Power Optimization  Architecture Parameter Selection  Dual-Vdd/Dual-Vt FPGA Architecture [Li, et al, FPGA’04] Low Power Synthesis with Dual-Vdd Conclusion

20 Dual-Vdd LUT Design Dual-Vdd technique makes use of the timing slack to reduce power  VddH devices on critical path performance  VddL devices on non-critical paths power  Assume uniform Vdd for one LUT Threshold voltage Vt should be adjusted carefully for different Vdd levels  To compensate delay increase  To avoid excessive leakage power increase

21 Vdd/Vt-Scaling for LUTs Three scaling schemes  Constant-Vt scaling  Fixed-Vdd/Vt-ratio scaling  Constant-leakage scaling Constant-leakage scaling obtains a good tradeoff useful for both single-Vdd scaling and dual-Vdd design

22 Dual-Vt LUT Design LUT is divided into two parts  Part I: configuration cells high Vt  Part II: MUX tree and input buffers normal Vt (decided by constant-leakage Vdd-scaling) Configuration SRAM cells  Content remains unchanged after configuration  Read/write delay is not related to FPGA performance Use high Vt ~40% of Vdd  Maintain signal integrity  Reduce SRAM leakage by 15X and LUT leakage by 2.4X  Increase configuration time by 13%

23 Pre-Defined Dual-Vt Fabric FPGA fabric arch-SVDT  Dual-Vt inside a LUT  A homogeneous fabric at logic block level with much reduced leakage power Traditional design flow in VPR can be applied Power saving  11.6% for combinational circuits  14.6% for sequential circuits Circuit arch-SVST (Single Vt) arch-SVDT (Dual Vt) power (watt)power saving alu40.07988.5% apex20.1089.3% apex40.053612.3% des0.23410.7% ex10100.17917.3% ex5p0.05911.6% misex30.07539.4% pdc0.25614.7% seq0.09279.4% spla0.18012.4% Avg.11.6% Table1 Combinational circuits circuit arch-SVST (Single Vt) arch-SVDT (Dual Vt) power (watt)power saving bigkey0.14812.3% clma0.63214.8% diffeq0.039119.7% dsip0.13414.5% elliptic0.14016.3% frisc0.19019.2% s2980.073613.4% s384170.30711.7% s384840.26110.2% tseng0.035114.0% Avg.14.6% Table2 Sequential circuits

24 Dual-Vdd FPGA Fabric Granularity: logic block (i.e., cluster of LUTs)  Smaller granularity => intuitively more power saving  But a larger implementation overhead Layout pattern: pre-defined dual-Vdd pattern  Row-based or interleaved pattern  Ratio of VddL/VddH blocks is 2:1 (benchmark profiling) Interconnect uses uniform VddH L-block: VddL H-block: VddH

25 Simple Design Flow for Dual-Vdd Fabric Based on traditional design flow, but with new steps Step I: LUT mapping (FlowMap) + P & R assuming uniform VddH (using VPR) Step II: Dual-Vdd assignment based on sensitivity Setp III: Timing driven P & R considering pre- defined dual-Vdd pattern (modified VPR)

26 Comparison Between Vdd-Scaling and Dual-Vdd For high clock frequency, dual Vdd achieves ~6% total power saving (~18% logic power saving) For low clock frequency, single-Vdd scaling is better Still a large gap between ideal dual-Vdd and real case  Ideal dual-Vdd is the result without layout pattern constraint circuit: alu4 0.03 0.04 0.05 0.06 0.07 0.08 0.09 65758595105115125 Max. Clock Frequency (MHz) Power (watt) arch-SVDT (Vdd Scaling) arch-DVDT(ideal case) arch-DVDT(pre-defined Vdd) 1.3v 1.0v 0.9v 1.3v/0.8v 1.0v/0.8v 0.9v/0.8v 1.5v 1.5/1.0v 1.3/1.0v 1.0/0.9v 1.5v/1.0v 1.3/0.9v

27 Vdd-Programmable Logic Block Power switches for Vdd selection and power gating One-bit control is needed for Vdd selection, but two-bit control power gating

28 Experimental Results with Vdd- Programmable Blocks Power v.s. performance Circuit: alu4 0.03 0.04 0.05 0.06 0.07 0.08 0.09 65758595105115125 clock frequency (MHz) total power (watt) arch-SV (Vdd scaling) arch-DV (configurable Vdd) arch-DV (ideal case) arch-DV (pre-defined Vdd) 1.3 v 1.0v 1.5v/1.0v 1.3v/0.8v 1.0v/0.8v 1.5v/1.0v 1.3v/0.9v 1.0v/0.8v 1.5v/0.8v 1.3v/0.8v 1.0v/0.9v 1.5v 0.9v/0.8v 1.0v/0.8v 1.3v/0.8v 1.5v/1.0v

29 Outline Introduction Understanding Power Consumption in FPGAs Architecture Evaluation and Power Optimization Low Power Synthesis Conclusions

30 Low Power Synthesis for Dual Vdd FPGAs FPGA architecture with dual-Vdds adds new layout constraints for synthesis tools Novel synthesis tools are required to support the architecture  Technology mapping [Chen, et al, FPGA’04]  Circuit clustering [Chen, et al, ISLPED’04]

31 Conclusions FPGA power consumption  Majority on programmable interconnects  Leakage is significant FPGA architecture optimization for power  Architecture parameter tuning has a limited impact  Using high Vt for configuration SRAM cells is helpful  Using programmable dual Vdd for logic blocks is helpful Power-efficient FPGA architectures introduce interesting CAD problems  Dual-Vdd mapping  Dual-Vdd clustering Up to 20% power saving reported using these algorithms


Download ppt "Architecture and Synthesis for Power-Efficient FPGAs Jason Cong University of California, Los Angeles Partially supported by NSF Grants."

Similar presentations


Ads by Google