Power Reduction for FPGA using Multiple Vdd/Vth


Power Reduction for FPGA using Multiple Vdd/Vth. Cecille Freeman. Monday, April 3, 2006.

References:
Fei Li, Yan Lin, Lei He. “Vdd programmability to reduce FPGA interconnect power,” in International Conference on Computer Aided Design (ICCAD), 2004, pp. 760-765.
Fei Li, Yan Lin, Lei He. “FPGA power reduction using configurable dual-Vdd,” in Proceedings of the Design Automation Conference, 2004, pp. 735-740.
Fei Li, Yan Lin, Lei He, Jason Cong. “Low-power FPGA using pre-defined dual-Vdd/dual-Vt fabrics,” in ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA), v. 12, 2004, pp. 42-50.

Outline: Introduction; Pre-defined dual Vdd; Configurable Dual Vdd; Dual Vt and dual Vdd structures; CAD tool flow; Results; Configurable Dual Vdd; Structure; CAD; Interconnect Dual Vdd.

Introduction: Power consumption. FPGAs are less power-efficient than ASICs. Reducing power loss is important if FPGAs are going to be used in embedded systems. Previous approaches mostly focus on changing the design implementation. This is the first “in-depth study” of dual Vdd/Vt techniques for FPGAs. The technique is fairly common in ASICs, because each ASIC designer can adjust the circuit as they see fit. In an FPGA, the design is constrained by what is available from the vendor.

Introduction: Power consumption. Power is lost to switching and to leakage. Leakage is dominant in deep-submicron (<100 nm) processes. Both leakage and switching power are reduced by reducing Vdd. Leakage is also reduced by increasing Vt. Programmable Vdd/Vt: 40-45% power reduction in ASIC. ASICs do better because they do not carry the overhead of programmability.

Introduction: Dynamic power. f = clock frequency; E = effective transition density; C = load capacitance; Vdd = supply voltage. Switching power is proportional to the square of the supply voltage.
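
The equation shown on this slide did not survive extraction. A standard form that matches the symbols listed above is given here as a reconstruction, not as the slide's exact notation. The factor of 1/2 is an assumption that depends on how transition density is defined.

```latex
P_{sw} = \tfrac{1}{2} \, f \, E \, C \, V_{dd}^{2}
```

As a worked illustration with hypothetical supply values, lowering Vdd from 1.3 V to 1.0 V at fixed f, E and C scales switching power by

```latex
\left( \frac{1.0}{1.3} \right)^{2} \approx 0.59
```

that is, roughly a 41% reduction in switching power, before accounting for the added delay.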

Introduction: Leakage power. Ilkg = leakage current; Vdd = supply voltage. Ilkg increases as Vt decreases.
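
The leakage equation is also missing from the extracted slide. With the two symbols defined above, the usual form is the product below, given as a reconstruction. The second line is a textbook approximation for the subthreshold component of leakage, not something stated on the slide; n (slope factor) and v_T (thermal voltage) are symbols introduced here for illustration.

```latex
P_{lkg} = I_{lkg} \, V_{dd}
\qquad
I_{sub} \propto e^{-V_t / (n \, v_T)}
```

The exponential dependence is why a modest increase in Vt can cut leakage by a large factor.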

Introduction: Dual Vdd theory. A lower supply voltage makes logic slower but reduces power loss. Not all paths in the circuit need to be equally fast. The critical path gets high Vdd for speed; non-critical paths get low Vdd for power. This makes use of timing slack.
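
To make "timing slack" concrete, here is a minimal sketch that computes per-node slack on a small directed acyclic graph and lists nodes that could tolerate a low-Vdd slowdown. The circuit, the delays, and the 1.5x low-Vdd delay penalty are hypothetical values chosen for illustration; the function names are not from the referenced papers.

```python
from collections import defaultdict


def compute_slack(delays, edges, clock_period):
    """Return {node: slack}, where slack = required time - arrival time."""
    succs, preds = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)

    # Topological order (Kahn's algorithm); the list grows while we scan it.
    indeg = {n: len(preds[n]) for n in delays}
    order = [n for n in delays if indeg[n] == 0]
    for n in order:
        for s in succs[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                order.append(s)

    # Forward pass: earliest time each node's output is valid.
    arrival = {}
    for n in order:
        arrival[n] = delays[n] + max((arrival[p] for p in preds[n]), default=0.0)

    # Backward pass: latest time each node's output may become valid.
    required = {}
    for n in reversed(order):
        required[n] = min(
            (required[s] - delays[s] for s in succs[n]), default=clock_period
        )
    return {n: required[n] - arrival[n] for n in order}


# Hypothetical 4-node circuit: paths a-b-d (delay 7) and a-c-d (delay 5).
delays = {"a": 2, "b": 3, "c": 1, "d": 2}
edges = [("a", "b"), ("b", "d"), ("a", "c"), ("c", "d")]
slack = compute_slack(delays, edges, clock_period=7)

LOW_VDD_PENALTY = 1.5  # assumed delay multiplier at low Vdd
low_vdd_candidates = [
    n for n in delays if slack[n] >= (LOW_VDD_PENALTY - 1) * delays[n]
]
print(slack, low_vdd_candidates)
```

Nodes on the critical path (a, b, d) have zero slack and stay at high Vdd; only c has enough slack to absorb the slowdown. Checking each node independently ignores the fact that nodes on the same path share slack, so a real flow has to re-run timing after each change.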

Predefined Dual Vdd/Vt: Design in 3 stages. (1) Determine a good Vdd/Vt scaling from a normal LUT design. (2) Dual Vt within each LUT. (3) Dual Vdd across the chip.

Predefined Dual Vdd/Vt: Single Vdd/Vt LUT (normal). Built from SRAM cells and a MUX tree. The SRAM holds the configuration bits. The MUX is attached to inverted and regular versions of the inputs. The bits in SRAM determine which minterms are OR’d.
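
The SRAM-plus-MUX-tree structure can be sketched behaviourally as follows. This is an illustrative model only: the function name and the bit ordering (first input is the least significant address bit) are assumptions, not the circuit from the referenced papers.

```python
def lut_eval(config_bits, inputs):
    """Evaluate a k-input LUT: 2**k SRAM bits read out through a MUX tree."""
    if len(config_bits) != 1 << len(inputs):
        raise ValueError("a k-input LUT needs exactly 2**k configuration bits")
    level = list(config_bits)
    # Each input drives one level of 2:1 multiplexers, halving the candidates.
    for sel in inputs:
        level = [level[i + 1] if sel else level[i] for i in range(0, len(level), 2)]
    return level[0]


# Configuration bits are simply the truth table, one bit per minterm.
XOR2 = [0, 1, 1, 0]
AND2 = [0, 0, 0, 1]
print(lut_eval(XOR2, (1, 0)), lut_eval(AND2, (1, 1)))
```

During operation the SRAM bits are only read, never written, which is what later allows the SRAM cells to use a higher Vt than the MUX tree.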

Predefined Dual Vdd/Vt: Single Vdd/Vt scaling. Scaling is applied across all LUTs. Switching power falls quadratically as the supply voltage is reduced, but the delay penalty grows large as the supply is reduced. Examined 3 scaling schemes: constant Vt, fixed Vdd/Vt ratio, and constant leakage power. What is currently done.

Predefined Dual Vdd/Vt Scaling Vdd to constant leakage is best

Predefined Dual Vdd/Vt: Dual Vt within a single LUT. SRAM cells can have a high Vt because they are configured at the start and are only read during operation (i.e., no switching delay). Increasing Vt increases the time taken to program the FPGA.

Predefined Dual Vdd/Vt

Predefined Dual Vdd/Vt: Vt of the SRAM is set to get a 15X SRAM leakage reduction; this increases configuration time by 13%. MUX (region II) Vdd is set using constant-leakage scaling. Vdd of the SRAM is set to be the same as the MUX (constant within a LUT).

Predefined Dual Vdd/Vt: High and low Vdd LUTs. Need a level converter. Need to determine how the high- and low-voltage LUTs will be placed on the chip. Need a tool to determine what should be in low and what should be in high, and how the placement and routing should be done.

Predefined Dual Vdd/Vt: Level converter. Basically 2 inverters with a level restore.

Predefined Dual Vdd/Vt: FPGA fabric, 2 choices.

Predefined Dual Vdd/Vt: CAD tool. Assignment of high/low LUTs is based on “power sensitivity”: the LUT that will cause the most power reduction when moved to low Vdd is changed. If timing constraints are still met, keep the change; otherwise change it back. Placement is done using simulated annealing, with an extra cost-function term for matching the high and low LUT assignment.
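
The try, check timing, keep-or-revert loop can be sketched as below. This is a one-pass simplification under stated assumptions: power sensitivity is a fixed per-block saving rather than being recomputed after each move, level-converter delay and power are ignored, and all names and numbers are hypothetical.

```python
def longest_path(delay, edges):
    """Critical-path delay of a DAG; dict order must be a topological order."""
    arrival = {}
    for n in delay:
        preds = [u for u, v in edges if v == n]
        arrival[n] = delay[n] + max((arrival[p] for p in preds), default=0.0)
    return max(arrival.values())


def assign_vdd(blocks, edges, target_delay):
    """blocks maps name -> (delay_at_high, delay_at_low, power_saving_if_low)."""
    level = {n: "H" for n in blocks}  # start with every block at high Vdd

    def circuit_delay():
        delay = {n: blocks[n][0] if level[n] == "H" else blocks[n][1] for n in blocks}
        return longest_path(delay, edges)

    # Visit blocks from largest to smallest power saving.
    for n in sorted(blocks, key=lambda b: blocks[b][2], reverse=True):
        level[n] = "L"                      # tentatively move to low Vdd
        if circuit_delay() > target_delay:  # timing violated: change back
            level[n] = "H"
    return level


# Hypothetical circuit, listed in topological order.
blocks = {
    "a": (2, 3.0, 5),
    "b": (3, 4.5, 8),
    "c": (1, 1.5, 2),
    "d": (2, 3.0, 4),
}
edges = [("a", "b"), ("b", "d"), ("a", "c"), ("c", "d")]
assignment = assign_vdd(blocks, edges, target_delay=7)
print(assignment)
```

In this example only c ends up at low Vdd, because every other block lies on the critical path a-b-d and any slowdown there breaks the timing target.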

Predefined Dual Vdd/Vt: Tested on 20 MCNC benchmarks. Dual Vt: 11.6% power reduction for combinational circuits, 14.6% for sequential. Dual Vdd/Vt: 13.6% combinational, 14.1% sequential. Not as much as expected: routing and placement issues because the layout is predefined. On average 75% of LUTs go to low Vdd. No significant difference between fabric layouts.

Configurable Dual Vdd/Vt: Pre-defined did not get good power reduction from dual Vdd because of routing and placement issues. Solution: make each LUT able to be either a high-Vdd or a low-Vdd LUT, so the extra placement constraint goes away.

Configurable Dual Vdd/Vt: Configurable LUT. Attached by PMOS transistors to both rails. SRAM configuration bits determine which rail supplies power. 3 possible configurations: VddL, VddH, power gated (both off). Configuration bits also determine whether the output goes through a level converter.
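
The three supply states can be modelled as a small decoder. This is a behavioural sketch only: the bit encoding (one logical enable per rail), the names, and the rule that a converter is needed when a low-Vdd block drives a high-Vdd block are assumptions made for illustration, not details taken from the slides.

```python
from enum import Enum


class Supply(Enum):
    VDD_H = "VddH"
    VDD_L = "VddL"
    GATED = "power gated"


def decode_supply(en_high: bool, en_low: bool) -> Supply:
    """Map two hypothetical rail-enable bits to the block's supply state."""
    if en_high and en_low:
        # Turning on both PMOS switches would connect the two rails together.
        raise ValueError("both rails enabled")
    if en_high:
        return Supply.VDD_H
    if en_low:
        return Supply.VDD_L
    return Supply.GATED  # both switches off


def needs_level_converter(driver: Supply, sink: Supply) -> bool:
    """Assumed rule: convert only when a low-Vdd output feeds a high-Vdd input."""
    return driver is Supply.VDD_L and sink is Supply.VDD_H


print(decode_supply(False, True), needs_level_converter(Supply.VDD_L, Supply.VDD_H))
```

Three states need two configuration bits per switch group, which is the SRAM overhead the following slides try to contain.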

Configurable Dual Vdd/Vt

Configurable Dual Vdd/Vt: Problem: area. Normally sleep transistors have high Vt, but this means they are larger. Instead use normal-Vt transistors for the switches. Normal Vt gives higher leakage, so use gate boosting: when a switch is off, apply a gate voltage one Vt higher than the Vdd at the source. Gate boosting is already used in Xilinx devices.

Configurable Dual Vdd/Vt: Problem: area. Apply switches at a larger granularity: clusters of 10 logic blocks share one switch configuration. Problem: leakage from the extra SRAM. The SRAM can have high Vt because it is not written during operation. Vt is set to give a 15X leakage reduction over normal, with a 13% increase in configuration time.

Configurable Dual Vdd/Vt: FPGA fabric. Compared a fabric in which all blocks are programmable to one with a mix of VddH, VddL and programmable blocks.

Configurable Dual Vdd/Vt: CAD tools. Same as for predefined, except that the matching cost in the placement algorithm now treats programmable blocks as assignable to either high or low LUTs.

Configurable Dual Vdd/Vt: Results. Compared to single-Vdd FPGAs with Vdd optimized for the same target clock frequency. Full supply programmability: logic power reduction of 35.5%, logic block area increased by 24%. Partial supply programmability (1/1/3 H/L/P): logic power reduction of 28.62%, logic block area increased by 14%. The logic area increase is not very significant when compared to the area of routing.

Configurable Interconnect: Global interconnect power is very high, and becomes more dominant as power reduction is applied to the logic blocks. Solution: make the interconnect programmable as well.

Configurable Interconnect: Only a small portion of the interconnect is ever used (avg 11.9% on their tests). It would be good to power gate the unused portion. 1 configuration bit: VddH, VddL. 2 configuration bits: VddH, VddL, power gated.

Configurable Interconnect: Configuration for routing switches and connection to logic block.

Configurable Interconnect: Power considerations for SRAM: additional SRAM means additional leakage power; the SRAM is programmed only once before use, so use the same high-Vt SRAM as for the configurable logic blocks. Delay considerations: longer delay through the routing switch; bound the delay increase to 6% by properly sizing the tri-state buffer.

Configurable Interconnect: CAD tools. Similar to the tools for configurable Vdd/Vt. Use only the fully programmable block fabric. No placement and routing constraints.

Configurable Interconnect: Results. One-bit configuration (no power gating) = 22.21% power reduction. Two-bit configuration (power gating) = 50.55% power reduction, 56.1% reduction to interconnect power. Power gating reduces FPGA interconnect power by 32%; many unused routing resources can be gated.

Summary: Using a dual-Vt LUT decreases power by ~13%. Predefined dual Vdd has very little effect on power because of routing. Programmable-Vdd logic cells reduce logic power by 28.6% (partial programmability) to 35.5% (full programmability). Fully configurable Vdd logic cells and interconnects with power gating reduce power by 50.55%. Tradeoffs: increase in area, increase in delay, increase in configuration time.

Future Work: Reduction of SRAM cells required for programmability. Design of a good power supply network for the chip.

Conclusions: Excellent power reduction overall. Excellent design if power reduction is a concern: no changes required to the design itself. Might introduce some timing issues because of extra delay through the chip. Might be expensive due to the extra area required on the chip.

Thanks, Questions?