Power Modeling and Architecture Evaluation for FPGA with Novel Circuits for Vdd Programmability Yan Lin, Fei Li and Lei He EE Department, UCLA


Power Modeling and Architecture Evaluation for FPGA with Novel Circuits for Vdd Programmability Yan Lin, Fei Li and Lei He EE Department, UCLA Partially supported by NSF.

Overview
• FPGA architecture evaluation
  - Area and delay [Rose et al, JSSC’90]
  - Power [Poon et al, FPLA’02][Li et al, FPGA’03]
• Vdd programmability for power reduction
  - Concept in [FPGA’03]
  - Application to logic [FPGA’04][DAC’04]
  - Application to interconnects [ICCAD’04][Anderson et al, ICCAD’04]
• Novel circuits and architecture evaluation for FPGAs with Vdd programmability
  - Reduce power by 50% with 17% area and 3% delay increase

Outline
• Power modeling and architecture evaluation methodology
• FPGA circuits for Vdd programmability
• Architecture evaluation with Vdd programmability
• Conclusions and ongoing work

Framework: fpgaEva-LP
[Flow diagram. Inputs: benchmark circuits, Arch Spec. Blocks: Logic Optimization (SIS), Tech-Mapping (RASP), Timing-Driven Packing (T-VPack), Placement & Routing (VPR), Parasitic Extraction, Cycle-accurate Power Simulator. Outputs: Area, Delay, Power.]

FPGA Structure and Models
• Cluster-based island-style FPGA structure
  - 100% buffered interconnects, subset switch block
  - Input fc = 50%, output fc = 25%
• Area and delay models similar to [Betz-Rose-Marquardt]
  - But based on layout and SPICE for 100nm and below
• Mixed-level power model from [FPGA’03]
  - Dynamic power
    - Capacitive power: functional switching, glitch
    - Short-circuit power (depends on transition time)
  - Static power
    - Sub-threshold leakage
    - Reverse-biased leakage
    - Gate leakage

New Power Model in fpgaEva-LP2
• Short-circuit power ∝ switching time × switching power
• fpgaEva-LP used the average signal transition time
• fpgaEva-LP2 calculates the transition time for each buffer as α times the buffer delay
  - α is NOT a constant 2 as in the literature, due to input slew
  - α is pre-characterized by SPICE
  - buffer delay: < 0.012 ns | < 0.03 ns | > 0.03 ns
  - α: 24.47
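The per-buffer transition-time rule on this slide can be sketched as a small table lookup. This is only an illustration: the delay breakpoints (0.012 ns and 0.03 ns) come from the slide, but the per-range α values are not legible in this transcript, so the `PLACEHOLDER_ALPHAS` below are hypothetical stand-ins, as are all function names and the handling of delays that fall exactly on a breakpoint.

```python
from bisect import bisect_right

# Delay breakpoints (ns) as listed on the slide.
DELAY_BREAKPOINTS_NS = [0.012, 0.03]

# HYPOTHETICAL alpha values, one per delay range. The real values are
# pre-characterized by SPICE and are not recoverable from this transcript.
PLACEHOLDER_ALPHAS = [2.0, 3.0, 4.0]


def lookup_alpha(buffer_delay_ns, breakpoints=DELAY_BREAKPOINTS_NS,
                 alphas=PLACEHOLDER_ALPHAS):
    """Return the alpha of the delay range that contains buffer_delay_ns."""
    return alphas[bisect_right(breakpoints, buffer_delay_ns)]


def transition_time_ns(buffer_delay_ns):
    """Transition time modeled as alpha times the buffer delay."""
    return lookup_alpha(buffer_delay_ns) * buffer_delay_ns


if __name__ == "__main__":
    for delay in (0.010, 0.020, 0.050):
        print(delay, transition_time_ns(delay))
```

Swapping in the SPICE-characterized α values only requires replacing the placeholder list; the lookup itself does not change.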

Validation Using SPICE
• Validate by comparison for each power component
• High fidelity, with an average absolute error of 8%

Impact of Random Seeds in VPR
[Figure: FPGA energy (nJ/cycle) vs. critical path delay (ns) for one benchmark circuit under different VPR random seeds]
• 12% delay variation and 5% energy variation
• Min-delay solution among 10 runs is used
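The seed-selection rule above can be sketched in a few lines. The run tuples, function names, and numbers are hypothetical; in particular, measuring "variation" as the spread relative to the best run is an assumption, since the slide does not define it.

```python
def pick_min_delay_run(runs):
    """runs: list of (seed, delay_ns, energy_nj) tuples.

    Returns the run with the smallest critical path delay.
    """
    return min(runs, key=lambda run: run[1])


def relative_spread(values):
    """(max - min) / min, one possible definition of seed-to-seed variation."""
    return (max(values) - min(values)) / min(values)


if __name__ == "__main__":
    # Made-up results for three seeds, for illustration only.
    runs = [(1, 10.5, 2.00), (2, 9.8, 2.05), (3, 10.9, 1.98)]
    best = pick_min_delay_run(runs)
    print("selected run:", best)
    print("delay spread:", relative_spread([r[1] for r in runs]))
```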

Evaluation of Single-Vdd FPGAs
• Architectures explored
  - Cluster size N = {6, 8, 10, 12}
  - LUT size k = {3, 4, 5, 6, 7}
• Energy-delay (ED) dominant architectures
  - Architecture with smaller delay or less energy (compared to any other architecture)
• Relaxed ED dominant set may also be valuable
[Figure: total FPGA energy (nJ/cycle) vs. critical path delay (ns), one point per architecture, labeled by (N, k)]
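One way to read the ED-dominant definition is as a Pareto front over (delay, energy): an architecture is kept if no other architecture is at least as good in both metrics and strictly better in one. The sketch below follows that reading; the function name and the sample numbers are hypothetical and are not taken from the slides.

```python
def ed_dominant(points):
    """points: dict mapping (N, k) -> (delay_ns, energy_nj).

    Returns the architectures that no other architecture dominates.
    """
    front = []
    for arch, (delay, energy) in points.items():
        dominated = any(
            d2 <= delay and e2 <= energy and (d2 < delay or e2 < energy)
            for other, (d2, e2) in points.items()
            if other != arch
        )
        if not dominated:
            front.append(arch)
    return front


if __name__ == "__main__":
    # Made-up (delay, energy) values, for illustration only.
    sample = {(8, 7): (10.0, 3.0), (10, 4): (14.0, 1.5), (6, 3): (16.0, 2.5)}
    print(ed_dominant(sample))
```

A relaxed ED-dominant set could be obtained by allowing a small tolerance in the dominance test, which is left out here.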

Energy versus Delay
[Figure: total FPGA energy (nJ/cycle) vs. critical path delay (ns), one point per architecture, labeled by (N, k); annotation: "Current commercial architecture"]
• For 100nm ITRS technology
  - Min-energy arch (N, k) = (10, 4) or (8, 4)
  - Min-delay arch (N, k) = (8, 7)
  - 0.8x delay but 1.7x power

Outline
• Power modeling and evaluation methodology
• FPGA circuits for Vdd programmability
• Architecture evaluation with Vdd programmability
• Conclusions and ongoing work

Vdd-programmable FPGA [DAC’04][ICCAD’04]
• Vdd-programmable logic block
  - Vdd selection
  - Power-gating unused blocks

Vdd-programmable FPGA [FPGA’04][ICCAD’04]
• Vdd-programmable logic block
  - Vdd selection
  - Power-gating unused blocks
• Vdd-programmable switch
• Vdd-level conversion is needed when VddL drives VddH
  - To avoid excessive leakage

Vdd-programmable Routing Switch
• Conventional routing switch
• Vdd-programmable routing switch
  - Brute-force design [ICCAD’04]: two extra SRAM cells for each routing switch
  - New design: one extra SRAM cell, plus a NAND2 gate (minimum size, high-Vt transistors)

Vdd-Programmable Interconnect Connection Block
• New design
  - Only TWO extra SRAM cells for n connection switches
  - Control logic includes 2n NAND2 gates and a decoder
• Brute-force design [ICCAD’04]
  - 2n extra SRAM cells for n connection switches
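The SRAM counts quoted for the routing switch and the connection block can be tabulated with a short helper. The counts per design come from the slides; the function names and the example fan-in of 16 are hypothetical.

```python
def routing_switch_extra_sram(design):
    """Extra SRAM cells per Vdd-programmable routing switch."""
    return {"brute_force": 2, "new": 1}[design]


def connection_block_extra_sram(n_switches, design):
    """Extra SRAM cells for a connection block with n_switches switches."""
    if design == "brute_force":
        return 2 * n_switches  # two cells per connection switch
    if design == "new":
        return 2               # two cells shared by the whole block
    raise ValueError(f"unknown design: {design}")


if __name__ == "__main__":
    n = 16  # hypothetical number of connection switches
    for design in ("brute_force", "new"):
        print(design, connection_block_extra_sram(n, design))
```

The saving in SRAM cells is paid for with control logic (2n NAND2 gates and a decoder), which this count does not capture.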

Power and Delay
• Vdd-programmable switch uses
  - 4X PMOS power transistor for 7X routing switch
  - 1X PMOS power transistor for 4X connection switch
• Compared to conventional switch
  - 1000X less leakage power
• Connection box is 28% faster and has 18% less dynamic power
  - By moving the mux off the critical path of the connection box

Switch delay and energy per switch (Vdd = 1.3 V):
Type       | Delay (s), w/o power transistor | Delay (s), w/ power transistor | Energy (J), w/o power transistor | Energy (J), w/ power transistor
Routing    | 5.9E-11                         | 6.5E-11 (+11%)                 | 3.3E-14                          | 3.2E-14 (-2%)
Connection | 2.9E-10                         | 2.1E-10 (-28%)                 | 3.8E-14                          | 3.1E-14 (-18%)
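The percentages in the table can be cross-checked from the tabulated values. The sketch below recomputes the relative change for each entry; the helper name is hypothetical, and since the table values are rounded to two significant digits, the recomputed figures can differ from the quoted ones by about a percentage point.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


# (without power transistor, with power transistor, percentage quoted on the slide)
TABLE = {
    "routing delay": (5.9e-11, 6.5e-11, 11.0),
    "routing energy": (3.3e-14, 3.2e-14, -2.0),
    "connection delay": (2.9e-10, 2.1e-10, -28.0),
    "connection energy": (3.8e-14, 3.1e-14, -18.0),
}

if __name__ == "__main__":
    for name, (before, after, quoted) in TABLE.items():
        print(f"{name}: recomputed {pct_change(before, after):+.1f}%, quoted {quoted:+.0f}%")
```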

Vdd-gateable Routing Switch
• Vdd-gateable: two states
  - Normal Vdd or power-gating
• Enables power-gating capability w/o extra SRAM cells
  - Can be replaced by a tri-state buffer
[Figure: conventional routing switch vs. Vdd-gateable routing switch with power transistor]

Vdd-gateable Connection Block
• Enables power-gating capability w/ only one extra SRAM cell and a low-leakage decoder
[Figure: conventional connection block vs. Vdd-gateable connection block]

Outline
• Power modeling and evaluation methodology
• FPGA circuits for Vdd programmability
• Architecture evaluation with Vdd programmability
• Conclusions and ongoing work

FPGA Architecture Classes
Architecture Class | Logic Block           | Interconnect
Class0 (baseline)  | single-Vdd            | single-Vdd
Class1             | programmable dual-Vdd | programmable dual-Vdd, level converters in routing
Class2             | programmable dual-Vdd | VddH and Vdd-gateable
Class3             | programmable dual-Vdd | Class1, but no level converters in routing
• High-Vt is applied to configuration SRAM cells for all the classes

Vdd-level Converters
• Class3 removes the Vdd-level converters from the interconnects of Class1
  - With the constraint that no VddL drives VddH
• We developed a routing approach in which each routing tree has a single Vdd level
  - But trees with different Vdd levels can share the same wire track
• Alternative approaches:
  - Combined Vdd-level converter and buffer [Anderson et al, ICCAD’04]
  - Our new work [DAC’05] allows dual Vdd in a tree, with chip-level time slack budgeting for extra power reduction
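The constraint that no VddL stage drives a VddH stage can be expressed as a simple check over driver-to-load connections. This is a minimal sketch, not the routing method from the slides: the node names, the "L"/"H" encoding, and both helpers are hypothetical.

```python
def level_violations(edges, vdd):
    """Return the (driver, load) pairs where a VddL stage drives a VddH stage.

    edges: iterable of (driver, load) pairs.
    vdd:   dict mapping each node to "L" or "H".
    """
    return [(u, v) for u, v in edges if vdd[u] == "L" and vdd[v] == "H"]


def assign_tree_vdd(tree_nodes, level):
    """Give every switch in one routing tree the same Vdd level."""
    return {node: level for node in tree_nodes}


if __name__ == "__main__":
    tree_edges = [("s0", "s1"), ("s1", "s2")]
    uniform = assign_tree_vdd(["s0", "s1", "s2"], "L")
    print(level_violations(tree_edges, uniform))

    mixed = {"s0": "L", "s1": "H", "s2": "H"}
    print(level_violations(tree_edges, mixed))
```

With a single level per tree, the check cannot fail inside the tree, which is why level converters can be dropped from the routing under that restriction.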

Energy versus Delay
• ED-product reduction
  - 20% by Class1 (Vdd-programmable interconnects w/ level converters)
  - 45% by Class2 (Vdd-gateable interconnects)
  - 50% by Class3 (Class1 minus level converters)
• Performance degrades 3% due to Vdd programmability
[Figure: total FPGA energy/cycle (nJ) vs. critical path delay (ns) for Class 0 to Class 3, points labeled by (N, k); annotations: "LUT 4: Low Energy", "LUT 7: High Performance"]
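The energy-delay (ED) product used above is energy per cycle multiplied by critical path delay. The sketch below shows how a reduction is computed, and what energy ratio a given ED reduction implies for a given delay increase. Pairing the 45% figure with the 3% delay degradation assumes both refer to the same comparison, which the slide does not state; the function names are hypothetical.

```python
def ed_product_reduction(e_base, d_base, e_new, d_new):
    """Fractional reduction of the energy-delay product relative to a baseline."""
    return 1.0 - (e_new * d_new) / (e_base * d_base)


def implied_energy_ratio(ed_reduction, delay_increase):
    """Energy ratio (new / base) implied by an ED reduction and a delay increase."""
    return (1.0 - ed_reduction) / (1.0 + delay_increase)


if __name__ == "__main__":
    # ASSUMPTION: the 45% ED reduction and 3% delay increase describe the same pair.
    ratio = implied_energy_ratio(0.45, 0.03)
    print(f"implied energy ratio: {ratio:.3f}")
```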

Energy versus Area
[Figure: total FPGA energy/cycle (nJ) vs. total FPGA device area for Class0 to Class3, points labeled by (N, k); annotations: "Min-area", "Min-energy"]
• Average area overhead
  - 118% for Class1 (Vdd-programmable interconnects w/ level converters)
  - 17% for Class2 (Vdd-gateable interconnects)
  - 52% for Class3 (Vdd-programmable interconnects w/o level converters)
• Class2 is the best considering both energy and area

Energy Breakdown
• Class2 and Class3 dramatically reduce global interconnect leakage
• But Class1 fails due to leakage in Vdd-level converters

Share of total FPGA energy (nJ/cycle), FPGA architecture (N, k) = (12, 4):
Component                           | Class0 | Class1 | Class2 | Class3
Logic leakage energy                | 2.94%  | 2.70%  | 4.07%  | 4.40%
Logic dynamic energy                | 3.71%  | 3.04%  | 3.92%  | 4.32%
Local interconnect leakage energy   | 16.03% | 26.22% | 39.69% | 42.93%
Local interconnect dynamic energy   | 8.09%  | 7.43%  | 9.81%  | 10.81%
Global interconnect leakage energy  | 49.89% | 42.84% | 4.88%  | 5.85%
Global interconnect dynamic energy  | 19.33% | 17.77% | 37.62% | 31.70%

FPGA Area Overhead
Class2: Vdd-gateable interconnects + Vdd-programmable CLBs, (N, k) = (12, 4)

Group             | Component                    | Area overhead
Routing switches  | Power transistors            | 3.87%
Connection blocks | SRAMs                        | 0.60%
Connection blocks | Power transistors            | 4.96%
Connection blocks | Control                      | 4.82%
Logic blocks      | Vdd-level converters (CLBs)  | 1.80%
Logic blocks      | Power transistors & SRAMs (CLBs) | 1.39%

Group subtotals: routing switches 3.87%, connection blocks 10.38%, logic blocks 3.19%
• Area overhead: 17% = 9% for power transistors + 5% for control + 2% for SRAM
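The area-overhead figures on this slide can be checked by summing the components per group. The three group subtotals (3.87%, 10.38%, 3.19%) are stated on the slide; the pairing of each individual percentage with a legend entry is inferred from the order of labels in the transcript and should be treated as an assumption, as should the dictionary layout and function names.

```python
# Percent of baseline area. Component-to-value pairing is an ASSUMPTION
# inferred from label order; group subtotals are stated on the slide.
OVERHEAD_PCT = {
    "routing switches": {"power transistors": 3.87},
    "connection blocks": {
        "SRAMs": 0.60,
        "power transistors": 4.96,
        "control": 4.82,
    },
    "logic blocks": {
        "Vdd-level converters": 1.80,
        "power transistors & SRAMs": 1.39,
    },
}


def group_totals(overhead):
    """Sum the component overheads within each group."""
    return {group: round(sum(parts.values()), 2) for group, parts in overhead.items()}


def total_overhead(overhead):
    """Sum every component across all groups."""
    return round(sum(sum(parts.values()) for parts in overhead.values()), 2)


if __name__ == "__main__":
    print(group_totals(OVERHEAD_PCT))
    print(total_overhead(OVERHEAD_PCT))
```

Under this pairing the components add up to the stated subtotals, and the overall sum rounds to the 17% quoted for Class2.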

Conclusions and New Results
• Field programmability is needed for fine-grained dual-Vdd and Vdd-gating in FPGAs
• Vdd-gating offers a better area-power tradeoff than Vdd-selection
  - 45% energy-delay product reduction with 17% area overhead
• Architecture with Vdd programmability
  - LUT size 4: low energy and area
  - LUT size 7: best performance
• New results [DAC’05]
  - Time slack allocation for Vdd-programmable interconnects
  - Device and architecture co-optimization for 77% energy-delay reduction

References and Download
• All references and tools at
• Results in the slides have been updated compared to the paper in ISFPGA’05