Yan Lin, Fei Li and Lei He EE Department, UCLA

Presentation transcript:

Routing Track Duplication with Fine-Grained Power-Gating for FPGA Interconnect Power Reduction
Yan Lin, Fei Li and Lei He
EE Department, UCLA
Partially supported by NSF grant CCR-0306682. Address comments to lhe@ee.ucla.edu.

Outline
Review and Motivation
Interconnect Leakage Power Reduction using Power-gating
Interconnect Dynamic Power Reduction using Dual-Vdd
Conclusions and Ongoing Work

Power Limitation of FPGAs
Existing FPGAs are HIGHLY power inefficient (> 100X more power than an ASIC), e.g. [Kusse, ISLPED’98]
Power is likely the largest limitation for FPGAs

Design example       Vdd     Energy
Xilinx XC4003A       5v      4.2mW/MHz
Static CMOS ASIC     3.3v    5.5uW/MHz

It is well known that FPGAs are highly power inefficient compared to ASICs. Previous work has shown that the same circuit consumes more than 100X the power when implemented on an FPGA than when implemented in static CMOS. Power is therefore likely the largest limitation for FPGAs.

FPGA Power Reduction
Power-aware FPGA CAD algorithms for existing FPGA architectures
  CAD algorithms to minimize power-delay product [Lamoureux et al, ICCAD’03]
  Configuration inversion for leakage reduction [Anderson et al, FPGA’04]
Power-efficient FPGA circuits and architectures
  Dual-Vdd and Vdd-programmable FPGA logic blocks [Li et al, FPGA’04] [Li et al, DAC’04]
  Vdd-programmable FPGA interconnects [Li et al, ICCAD’04] [Anderson et al, ICCAD’04]

As far as FPGA power reduction is concerned, previous work has studied power-aware FPGA CAD algorithms that leave the existing FPGA architecture unchanged. A suite of CAD algorithms was proposed to minimize the power-delay product, and configuration inversion was proposed to reduce leakage. Other previous work designs power-efficient FPGA circuits and architectures; dual-Vdd and Vdd-programmable FPGAs were proposed in these papers. This paper focuses mainly on FPGA interconnect power reduction. In the next couple of slides, I will review the FPGA architecture background and the Vdd-programmable interconnects proposed in [Li et al, ICCAD’04].

Overall FPGA Structure
Cluster-based island-style FPGA structure
  Logic blocks are embedded into routing resources
  Wire segment connectivity is programmable

Here we show the overall structure of a cluster-based island-style FPGA. The logic blocks are surrounded by horizontal and vertical routing channels, which consist of wire segments. The wire segments can be connected to each other via a switch block at each intersection of a vertical and a horizontal routing channel. The input and output pins of a logic block can be programmed to connect to wire segments in the routing channels.

FPGA Routing Structure
Subset programmable switch block
  An incoming track can be connected to different outgoing tracks with the same track number
Programmable connection block

In the subset programmable switch block, each dashed line is a possible bidirectional connection, and "subset" means that a track connects only to tracks with the same track number. The programmable connection block is multiplexer-based: it selects one wire segment to connect to a logic block input pin. The buffer between a wire segment and the multiplexer is the connection switch.

Vdd-programmable Interconnects [Li et al, ICCAD’04]
Conventional routing switch
Vdd-programmable switch
  Vdd selection for a used switch
  Power-gating for an unused switch
Configurable Vdd-level conversion
  Avoids excessive leakage when a low-Vdd switch drives high-Vdd switches
Power transistor

Here we review the Vdd-programmable switch. Starting from the conventional routing switch, which is implemented with a tri-state buffer, two power transistors are inserted between the dual power supply rails and the buffer. Turning off one of the power transistors performs Vdd selection for a used switch, and turning off both performs power-gating for an unused switch. A configurable level converter is inserted in front of each interconnect switch to avoid excessive leakage when VddL drives VddH.

Limitation of Vdd-programmable Interconnects [Li et al, ICCAD’04]
Fine-grained Vdd-level converter insertion
  Area overhead: 54% area overhead for circuit s38584
  Leakage overhead: 36% leakage overhead for circuit s38584
SRAM cell overhead
  300% SRAM cell overhead for each switch
Area/SRAM-efficient low-power interconnects are needed

However, the fine-grained Vdd-level converter introduces large area and leakage overhead. Analysis shows that for circuit s38584, the area and leakage overhead are … respectively. Also, achieving Vdd programmability and configurable Vdd-level conversion costs 300% configuration SRAM cell overhead. The extra cells make the configuration signals harder to route, and SRAM is vulnerable to soft errors. Therefore, … is needed.

Outline
Review and Motivation
Interconnect Leakage Power Reduction using Power-gating
Interconnect Dynamic Power Reduction using Dual-Vdd
Conclusions and Ongoing Work

Low Utilization Rate of Interconnects
78.15% of total power is consumed by global interconnect [Li et al, DAC’04]
47% of global interconnect power is leakage
Why? Extremely low utilization rate (~12% w/ minimum array)

Circuit     # of total interconnect switches    # of unused interconnect switches    Utilization rate (%)
alu4        36478                               31224                                14.40
apex4       43741                               37703                                13.80
bigkey      63259                               54017                                9.87
clma        653181                              593343                               9.16
des         87877                               79932                                9.04
diffeq      42746                               36974                                13.50
dsip        75547                               70138                                7.16
elliptic    140296                              125800                               10.33
ex5p        45404                               39288                                13.47
frisc       238852                              216993                               9.15
Average                                                                              11.90

Previous work shows that global interconnect power becomes more significant (around 78.15% of total power) after programmable dual-Vdd is applied to the logic blocks, and that 47% of global interconnect power is leakage. Given these two factors, we can power-gate unused interconnect switches to reduce FPGA power. Power-gating pays off because the interconnect utilization rate is extremely low. (animation) We customize the FPGA chip size for each application and use the minimum array that just fits the application. Even so, the utilization rate is about 12%, and it will be even lower in the real world.
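
The utilization rates in the table follow directly from the two switch counts. A minimal sketch that recomputes three of the rows, assuming utilization is defined as used switches over total switches (the function and variable names are illustrative, not from the paper):

```python
# Switch counts copied from three rows of the table: (total, unused).
switch_counts = {
    "alu4": (36478, 31224),
    "clma": (653181, 593343),
    "dsip": (75547, 70138),
}


def utilization_rate(total, unused):
    """Fraction of interconnect switches that the mapped design uses."""
    return (total - unused) / total


for name, (total, unused) in switch_counts.items():
    print(f"{name}: {utilization_rate(total, unused):.2%}")
```

The remaining share, roughly 85% to 93% of the switches per circuit, is what power-gating can turn off.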

Interconnect Utilization Rate is Intrinsically Low
Programmable switch block
  Utilization is no more than 25%
Programmable connection block
  Only one switch is used (for 64 tracks)
Power-gating unused interconnects is necessary

In fact, the low utilization is a consequence of programmability and is not related to the application or the architecture. In a programmable switch block, a used connection drives only one direction, which gives 50%, and at most 3 of the 6 dashed lines are used, which brings the rate down to 25%. In a programmable connection block, only one switch out of 64, for example, can be used. Thus, it is intrinsic, and power-gating … is necessary.
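
The two bounds quoted on this slide are simple products of fractions. A small sketch of the arithmetic, using only the numbers given in the notes (the variable names are illustrative):

```python
from fractions import Fraction

# Switch block: a used connection drives one of two directions,
# and at most 3 of the 6 possible connections (dashed lines) are used.
one_direction = Fraction(1, 2)
used_connections = Fraction(3, 6)
switch_block_bound = one_direction * used_connections

# Connection block: one switch is selected out of 64 tracks (the slide's example).
tracks = 64
connection_block_bound = Fraction(1, tracks)

print(f"switch block: at most {float(switch_block_bound):.0%}")
print(f"connection block: at most {float(connection_block_bound):.2%}")
```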

Vdd-gateable Routing Switch
Conventional routing switch
Vdd-gateable routing switch
Only two states for a routing switch
  High Vdd
  Power-gating
Enables power-gating capability w/o extra SRAM cells
Power transistor

Here we show the circuit of a routing switch with power-gating capability, which we call a Vdd-gateable routing switch. Starting from the conventional routing switch, one power transistor is inserted. (animation) When the switch is used, both the power transistor and the pass transistor are turned on. When it is unused, both are turned off. The pass transistor M1 is kept to prevent a sneak leakage path. This enables power-gating w/o extra SRAM cells.

Vdd-Gateable Connection Block
Conventional connection block
Vdd-gateable connection block
Enables power-gating capability w/ only one extra SRAM cell for a connection block
  Only n+1 SRAM cells for 2^n connection switches
A low-leakage decoder is needed

Here we show … For each connection switch, the buffer is replaced with a Vdd-gateable switch. The multiplexer is replaced with a decoder, which selects one wire segment to connect to the logic block input and power-gates the other switches. One extra SRAM cell is added to disable the decoder and power-gate all switches when the whole block is unused. A low-leakage decoder is needed to avoid leakage overhead.
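
The SRAM count can be illustrated with a behavioral model of the decoder. This is a sketch under stated assumptions: n select bits address 2^n connection switches, one further bit enables the whole block, and the function names are hypothetical rather than taken from the paper.

```python
def sram_cells_needed(num_switches):
    """SRAM cells for a decoder-based connection block: n select bits + 1 enable bit."""
    select_bits = (num_switches - 1).bit_length()  # smallest n with 2**n >= num_switches
    return select_bits + 1


def decode_enables(select_bits, block_enable):
    """Return one power-gating enable per switch; at most one switch is on."""
    num_switches = 2 ** len(select_bits)
    if not block_enable:
        # Whole block unused: every switch is power-gated.
        return [False] * num_switches
    index = 0
    for bit in select_bits:  # most significant bit first
        index = index * 2 + int(bit)
    return [i == index for i in range(num_switches)]


print(sram_cells_needed(64))                        # 64 tracks -> 6 select bits + 1
print(decode_enables([True, False], block_enable=True))
print(decode_enables([True, False], block_enable=False))
```

For the 64-track example this gives 7 configuration cells for the block, where a per-switch enable would need one cell per switch.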

Power and Delay of Vdd-gateable Switch
Vdd-gateable switch compared to conventional switch
  Dynamic power is almost the same
  >300X leakage power reduction
  ~6% delay increase

Vdd     Routing switch delay (ns)                 Energy per switch (Joule)
        w/o power-gating    w/ power-gating       w/o power-gating    w/ power-gating
1.3v    5.90E-11            6.26E-11 (6%)         3.3E-14             3.25E-14
1.0v    6.99E-11            7.42E-11 (6.1%)       1.63E-14            1.65E-14

Here we show the power and delay of the Vdd-gateable switch. There is almost no dynamic power overhead, because the power transistor does not switch and its drain/source capacitance is not charged or discharged. The switch achieves a 300X leakage reduction at the cost of a 6% delay increase.

Power Reduction by Power-gating Unused Interconnects

            Single-Vdd (baseline)                         Total power saving
Circuit     Interconnect power (W)    Total power (W)     Vdd-programmable interconnects [Li et al, ICCAD04]    Vdd-gateable interconnects
alu4        0.0657                    0.0769              25.13%                                                29.09%
apex4       0.0437                    0.0500              21.83%                                                30.70%
bigkey      0.1044                    0.1375              33.38%                                                24.89%
clma        0.4918                    0.5450              23.42%                                                45.69%
des         0.1688                    0.2136              36.71%                                                31.79%
diffeq      0.0292                    0.0360              17.50%                                                45.20%
dsip        0.1003                    0.1280              34.34%                                                43.66%
Avg.        --                        --                  25.19%                                                38.18%

These results use cycle-accurate simulation. Vdd-gateable interconnects reduce total power by 38% on average. In contrast, Vdd-programmable interconnects reduce it by 25%: despite their flexibility in Vdd selection, they pay the Vdd-level converter overhead.

Outline
Review and motivation
Interconnect Leakage Power Reduction using Power-gating
Interconnect Dynamic Power Reduction using Dual-Vdd
  FPGA fabrics and algorithms
  Design flow and quantitative evaluation
Conclusions and Ongoing Work

Pre-Defined Dual-Vdd Routing Architecture
Reduce dynamic power with dual-Vdd by making use of timing slack
Partition the routing channel into VddH and VddL regions
  Vdd-gateable interconnect switches are used
  The ratio of VddH to VddL tracks is an architectural parameter

The dual-Vdd technique makes use of the timing slack in the circuit to minimize power. High Vdd (VddH) is applied to devices on the critical paths to maintain performance, while low Vdd (VddL) is applied to devices on non-critical paths to reduce power. FPGA applications usually have a large amount of surplus timing slack, so we can apply the dual-Vdd technique to the FPGA interconnect fabric and use the surplus slack to reduce interconnect dynamic power. Here we show the dual-Vdd routing structure. We partition the routing channel into two regions, use Vdd-gateable switches, and apply a different supply voltage in each region. The ratio of VddH to VddL tracks is a parameter. How do we determine this parameter?

Ratio of VddH to VddL Track
Determine the ratio using a dual-Vdd assignment profile, without considering the layout constraint
Sensitivity-based dual-Vdd assignment
  Assignment unit: a routing tree
  Power sensitivity: ΔP/ΔVdd
    Power difference for a routing tree between VddH and VddL
  Greedy algorithm, sensitivity-based
    Initial: uniform VddH assignment
    Procedure: assign VddL to the routing tree with the largest power sensitivity (but without increasing the critical delay)

The assignment is sensitivity-based and greedy.
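
The greedy procedure on this slide can be sketched as follows. This is a toy model, not the authors' implementation: it assumes each routing tree has a fixed power and delay at each Vdd level and that the critical delay is the largest sum of tree delays along a given list of paths. All names and numbers are made up for illustration.

```python
def greedy_dual_vdd(trees, paths):
    """Assign VddL to routing trees in order of power sensitivity.

    trees: name -> {"p_high", "p_low", "d_high", "d_low"} (power and delay per Vdd level)
    paths: list of lists of tree names; a path's delay is the sum over its trees.
    """
    def critical_delay(levels):
        return max(
            sum(trees[t]["d_low"] if levels[t] == "L" else trees[t]["d_high"] for t in path)
            for path in paths
        )

    levels = {name: "H" for name in trees}      # initial: uniform VddH
    target = critical_delay(levels)             # delay that must not increase

    # Largest power difference between VddH and VddL first.
    order = sorted(trees, key=lambda t: trees[t]["p_high"] - trees[t]["p_low"], reverse=True)
    for name in order:
        levels[name] = "L"
        if critical_delay(levels) > target:     # would slow the circuit: undo
            levels[name] = "H"
    return levels


# Hypothetical example: n1 is critical; n2 and n3 share a path with slack for only one of them.
trees = {
    "n1": {"p_high": 10, "p_low": 4, "d_high": 5, "d_low": 7},
    "n2": {"p_high": 8, "p_low": 4, "d_high": 2, "d_low": 3},
    "n3": {"p_high": 5, "p_low": 3, "d_high": 2, "d_low": 3},
}
paths = [["n1"], ["n2", "n3"]]
assignment = greedy_dual_vdd(trees, paths)
print(assignment)
```

In this toy model a rejected tree never becomes feasible later, because lowering further trees can only add delay, so a single pass in sensitivity order is enough. The fraction of trees that end up at VddL is the kind of profile used to choose the track ratio.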

Profile of Dual-Vdd Assignment
Assignment with no critical path delay increase (VddH:VddL = 1.5v:1.0v)

Circuit    # of routing trees    # of logic blocks    # of I/O blocks    VddL routing trees (%)    VddL logic blocks (%)
alu4       782                   162                  22                 49.74                     82.10
apex4      849                   134                  28                 35.45                     78.36
bigkey     1542                  294                  426                67.77                     85.03
clma       7995                  1358                 144                69.74                     89.84
s38417     5426                  982                  135                64.17                     80.05
seq        1138                  274                  76                 20.74                     61.62
spla       2091                  461                  122                54.52                     88.47
Avg.                                                                     54.54                     80.28

Set the ratio of VddH to VddL tracks to 1:1

About 54% of the routing trees are assigned low Vdd, so we use a 1:1 ratio.

Level Converter is NOT Needed
Wire segment can only be connected to another wire segment with the same track number via a subset switch block

With a subset switch block, a wire segment can only be connected to a segment with the same track number. Suppose we route from A to B: we either use high-Vdd track 0 or low-Vdd track 2. (animation) In other words, a wire segment can only be connected to a segment with the same Vdd level. Thus, no level converter is needed. (animation)
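
The argument can be checked mechanically. The sketch below assumes a 4-track channel whose lower half is VddH and upper half is VddL (consistent with the "track 0 or track 2" example, but still an assumption), and contrasts the subset pattern with a hypothetical pattern that shifts the track number by one. All names are illustrative.

```python
def track_vdd(track, channel_width):
    """Assumed 1:1 partition: lower half of the tracks is VddH, upper half is VddL."""
    return "VddH" if track < channel_width // 2 else "VddL"


def subset_pattern(track, channel_width):
    """Subset switch block: a track connects only to the same track number."""
    return [track]


def shifting_pattern(track, channel_width):
    """Hypothetical non-subset pattern, for contrast only: connect to the next track."""
    return [(track + 1) % channel_width]


def needs_level_converter(pattern, channel_width):
    """True if any switch-block connection joins tracks with different Vdd levels."""
    return any(
        track_vdd(src, channel_width) != track_vdd(dst, channel_width)
        for src in range(channel_width)
        for dst in pattern(src, channel_width)
    )


print(needs_level_converter(subset_pattern, 4))    # same track number, same Vdd
print(needs_level_converter(shifting_pattern, 4))  # track 1 (VddH) would drive track 2 (VddL)
```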


Layout Constraint Due to Dual-Vdd
Dual-Vdd introduces performance degradation due to the layout constraint
  Insufficient routing resources for Vdd-matched routing trees
  May introduce detours
Solutions
  Vdd-programmable interconnects [Li et al, ICCAD’04]
  Provide sufficient routing tracks for Vdd-matched routing trees
    Control leakage by power-gating unused interconnects

Dual-Vdd introduces performance degradation: routing resources are insufficient, and detours are introduced to match the Vdd type. Previous work solved this with Vdd-programmable interconnects. In this paper, we provide sufficient routing resources by increasing the channel width, and we control leakage using Vdd-gateable switches.

Design Flow for Dual-Vdd Interconnects
[Flow chart: Tech Mapped Netlist (Single-Vdd) and Arch Spec; Timing Driven Layout (Single-Vdd); Dual-Vdd Assignment for Routing Trees; then either Timing Driven Layout (Dual-Vdd) or Double Channel Width; Power-gating Unused Switches; Delay/Power Estimation using the Delay/Power Model (dual-Vdd); outputs: Delay, Power.]

Here is the design flow considering dual-Vdd:
Single-Vdd placement and routing
Dual-Vdd assignment
Dual-Vdd routing guided by the assignment
Power and delay evaluation
There is another design path when the channel is duplicated: it achieves effective Vdd-programmability for each routing tree and skips dual-Vdd routing.

Dual-Vdd Routing Algorithm
Based on the maze routing algorithm in VPR
Modify the cost function
  TotalCost(n): the cost of routing tree T through wire segment n to the target sink j
  PathCostDv(n): the cost of the path from the current partial routing tree to wire segment n
  ExpectedDv(n,j): the estimated cost from wire segment n to the target sink j
  Matched(T,n): boolean function describing the Vdd-matching status

The dual-Vdd routing algorithm is based on the maze routing used in VPR. We modify the original cost function to incorporate dual-Vdd, adding one boolean function, Matched, to penalize a routing tree that uses a wire segment with an un-matched Vdd type.
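
The slide's cost formula did not survive in this transcript, so the sketch below is NOT the paper's equation. It is one plausible shape for a maze-router cost with a Vdd-mismatch penalty: path cost plus a weighted expected cost, plus a fixed penalty when the segment's Vdd does not match the routing tree. The weight `alpha`, the value `mismatch_penalty`, and the additive form are all assumptions made for illustration.

```python
def total_cost(path_cost, expected_cost, matched, alpha=1.0, mismatch_penalty=10.0):
    """Hypothetical cost of extending routing tree T through wire segment n.

    path_cost:      stands in for PathCostDv(n)
    expected_cost:  stands in for ExpectedDv(n, j)
    matched:        stands in for Matched(T, n)
    alpha, mismatch_penalty: assumed tuning knobs, not values from the paper
    """
    cost = path_cost + alpha * expected_cost
    if not matched:
        cost += mismatch_penalty  # discourage segments with the wrong Vdd type
    return cost


# Two candidate segments: the first is cheaper but has the wrong Vdd type.
candidates = [
    ("seg_a", 4.0, 1.0, False),
    ("seg_b", 5.0, 2.0, True),
]
best = min(candidates, key=lambda c: total_cost(c[1], c[2], c[3]))
print(best[0])
```

With a penalty of this kind, the router takes a Vdd-matched detour whenever the extra path cost is smaller than the penalty, which is the detour behavior described under the layout constraint.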

Outline
Review and motivation
Interconnect Leakage Power Reduction using Power-gating
Interconnect Dynamic Power Reduction using Dual-Vdd
  FPGA fabrics and algorithms
  Quantitative evaluation
Conclusions and Ongoing Work

Comparison of Low Power Architectures
[Figure: power (watt) versus clock frequency (MHz) for circuit s38584. Curves and labeled points: arch-SV (1.5v, 1.3v, 1.0v, 0.9v); arch-PV (1.5v/0.8v, 1.3v/1.0v, 1.0v/0.8v, 0.9v/0.8v); arch-PV+PG (1.5v/0.8v, 1.3v/1.0v, 1.0v/0.8v, 0.9v/0.8v); arch-DV+PG (1.5W) (1.5v/0.8v, 1.3v/0.9v, 1.0v/0.8v, 0.9v/0.8v).]
Dual-Vdd interconnects with fine-grained power-gating
  May have performance degradation due to the layout constraint
  Can reduce more power than purely power-gating unused switches
  Achieve 9.78% interconnect dynamic power reduction and 38.68% total power saving with 1.5W channel width
W is the nominal routing channel width in a single-Vdd FPGA

These are power-performance tradeoff curves:
1. Single-Vdd scaling, which saves power by scaling down Vdd at the cost of performance loss
2. Programmable Vdd applied to the logic blocks (previous work)
3. Unused interconnects additionally power-gated
4. Dual-Vdd additionally introduced, with the routing channel width increased by 50%
Comparing the two low-power curves 3 and 4, we can see the following. At the maximum clock frequency, there is a performance loss due to the layout constraint. Dual-Vdd reduces power compared to purely power-gating. The power reduction (the gap between curves 3 and 4) decreases at lower clock frequency, which indicates that the timing slack is smaller at lower clock frequency.

Impact of Routing Channel Width
[Figure: power saving (left Y-axis, 30% to 50%; red curve) and normalized clock frequency (right Y-axis, 0.5 to 1; blue curve) versus channel width (X-axis, 1.0 to 2.0). Labeled power saving values: 34.86%, 38.68%, 45.00%; labeled clock frequency values: 0.743, 0.838, 0.955.]
We get the power reduction percentage at the maximum clock frequency achieved by dual-Vdd interconnects
Channel width increases from 1.0W to 2.0W
  Power saving increases from 34.86% to 45%
  Normalized clock frequency increases from 0.743 to 0.955

Here we show the impact of routing channel width. The X-axis is the channel width, the left Y-axis is the power saving, and the right Y-axis is the normalized clock frequency. The red curve is the power saving and the blue curve is the clock frequency. It is clear that increasing the channel width gives more power reduction and higher performance. This is due to sufficient routing resources, which increase the rate of Vdd-matched routing trees, giving more dynamic power reduction and similar leakage by power-gating unused switches.

Area Overhead of Vdd-gateable Interconnects
Device area is dominant

                     Single-Vdd (baseline)    Dual-Vdd w/ Power-gating (1.0W)    Dual-Vdd w/ Power-gating (1.5W)    Dual-Vdd w/ Power-gating (2.0W)    [Li et al, ICCAD’04]
Total FPGA area      7077044                  11092744                           15420197                           20249865                           22678225
Area overhead (%)    -                        57%                                118%                               186%                               220%

Area overhead is mainly due to the power transistors for power-gating capability
Track duplication with power-gating vs Vdd-programmable interconnects [Li et al, ICCAD’04]
  More power reduction (45% vs 25%) and less area overhead
  Mainly due to Vdd-level converter removal
High-Vdd interconnects with power-gating are BEST considering area

However, a larger channel width means a larger area overhead. Device area is dominant compared to wiring area. The table shows the geometric mean over the MCNC benchmarks and the area overhead. Compare the duplicated channel width with Vdd-programmable interconnects. There is less area overhead, because Vdd-programmable interconnects need two power transistors for Vdd-programmability plus a Vdd-level converter. There is more power reduction, because there is no Vdd-level converter. Considering the area and power tradeoff, single-Vdd with Vdd-gateable interconnects is best.
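
The overhead row can be reproduced from the area row. A minimal sketch, assuming overhead is measured relative to the single-Vdd baseline and rounded to the nearest percent (the dictionary keys are shorthand for the table's column headers):

```python
baseline_area = 7077044  # Single-Vdd (baseline)

# Total FPGA area per architecture, copied from the table.
areas = {
    "dual-Vdd + power-gating (1.0W)": 11092744,
    "dual-Vdd + power-gating (1.5W)": 15420197,
    "dual-Vdd + power-gating (2.0W)": 20249865,
    "Vdd-programmable [Li et al, ICCAD'04]": 22678225,
}

# Overhead relative to the baseline, in whole percent.
overhead_percent = {
    name: round((area / baseline_area - 1) * 100) for name, area in areas.items()
}

for name, percent in overhead_percent.items():
    print(f"{name}: {percent}%")
```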

Outline
Review and motivation
Interconnect Leakage Power Reduction using Power-gating
Interconnect Dynamic Power Reduction using Dual-Vdd
Conclusions and Ongoing Work

Conclusions and Ongoing Work
Developed power-gateable interconnects w/ virtually no extra SRAM cells
  Achieved 38.18% total power reduction using Vdd-gateable interconnects
  Achieved 24.78% interconnect dynamic power reduction and 45.00% total power reduction with duplicated (2W) channel width
Ongoing work
  Power-ground design to support dual-Vdd
  Optimal mix of Vdd-programmable and Vdd-gateable interconnects
  Architecture evaluation considering Vdd programmability [Lin et al, to appear in FPGA’05]