156 / MAPLD 2005 Rollins 1 Reducing Energy in FPGA Multipliers Through Glitch Reduction Nathan Rollins and Michael J. Wirthlin Department of Electrical.

Presentation transcript:

156 / MAPLD 2005 Rollins 1 Reducing Energy in FPGA Multipliers Through Glitch Reduction Nathan Rollins and Michael J. Wirthlin Department of Electrical and Computer Engineering Brigham Young University Provo, UT This work was supported by the NASA Earth-Sun System Technology Office as sub-contract with USC-ISI

156 / MAPLD 2005 Rollins 2 FPGAs’ High Power Consumption. Flexibility and reprogrammability result in greater power consumption relative to ASICs. Static power is insignificant compared to dynamic power consumption. Dynamic power consumption: $P_{avg} = \frac{1}{2} \sum_{n \in nets} C_n \cdot f_n \cdot V^2$
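To make the dynamic power sum concrete, here is a minimal sketch that evaluates ½ Σ C_n·f_n·V² over a handful of nets. The net capacitances, switching rates, and the 1.5 V supply are made-up illustration values, not figures from the presentation, and the function name is hypothetical.

```python
def dynamic_power(nets, vdd):
    """Average dynamic power: 0.5 * sum(C_n * f_n) * V^2 over all nets."""
    return 0.5 * sum(c * f for c, f in nets) * vdd ** 2

# Hypothetical nets: (capacitance in farads, switching activity in transitions/second)
nets = [(2e-12, 50e6), (5e-12, 10e6), (1e-12, 100e6)]

p_avg = dynamic_power(nets, vdd=1.5)
print(f"Average dynamic power: {p_avg * 1e3:.3f} mW")
```

Because each net contributes C_n·f_n, lowering the switching activity of a net (for example by removing glitches) lowers its share of the total in direct proportion.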

156 / MAPLD 2005 Rollins 3 FPGAs’ High Power Consumption. The $f_n$ term represents the net switching activity. Some net switching activity is unproductive: glitches. A large amount of dynamic switching power is wasted in glitches. Goal: lower energy by reducing the amount of glitching.

156 / MAPLD 2005 Rollins 4-8 FPGA Glitching Example. [Figure, built up over slides 4 to 8: a 4-input LUT (LUT 4) with inputs A, B, C, D and output OUT; a glitch is marked on the output.] Glitching is caused by unequal logic and interconnect delays.
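The effect of unequal delays can be reproduced with a toy discrete-time simulation. This is only a sketch under simplifying assumptions: a 2-input XOR stands in for the 4-input LUT of the slides, both inputs switch from 0 to 1 at the same nominal time, and the delay values are arbitrary time steps chosen for illustration.

```python
def count_transitions(values):
    """Number of times a sampled signal changes value."""
    return sum(1 for a, b in zip(values, values[1:]) if a != b)

def simulate_xor(delay_a, delay_b, steps=10):
    """Both inputs switch 0 -> 1 at t=0 but reach the gate after their own delay."""
    out = []
    for t in range(steps):
        a = 1 if t >= delay_a else 0
        b = 1 if t >= delay_b else 0
        out.append(a ^ b)
    return out

balanced = simulate_xor(delay_a=3, delay_b=3)  # equal path delays
skewed = simulate_xor(delay_a=2, delay_b=5)    # unequal path delays

print("balanced:", balanced, "transitions:", count_transitions(balanced))
print("skewed:  ", skewed, "transitions:", count_transitions(skewed))
```

In both runs the output starts and ends at 0, so every output transition in the skewed case is a glitch: it consumes switching energy without changing the final result.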

156 / MAPLD 2005 Rollins 9 Power Classification. Design Static Power: scale the total static power of the device by the relative size of the circuit, i.e. Total Static Power × (Circuit LUTs / Total LUTs). Dynamic Glitching Power: the percentage of signal glitches among total transitions is used to divide dynamic power into dynamic glitching power and useful dynamic power. Useful Dynamic Power: the power consumed by the “useful” transitions of the circuit.
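The three-way classification can be sketched as a small function. This sketch assumes the design's static share is proportional to the fraction of device LUTs it occupies (total static power multiplied by Circuit LUTs / Total LUTs), and that dynamic power splits in the same ratio as glitch transitions to total transitions. The function name and all input values are hypothetical.

```python
def classify_power(total_static, circuit_luts, total_luts,
                   dynamic_power, glitch_transitions, total_transitions):
    """Split a design's power into static, glitch, and useful dynamic parts."""
    # Assumption: static power is apportioned by the share of LUTs the circuit uses.
    design_static = total_static * circuit_luts / total_luts
    # Fraction of all signal transitions that are glitches.
    glitch_fraction = glitch_transitions / total_transitions
    glitch = dynamic_power * glitch_fraction
    useful = dynamic_power - glitch
    return {"static": design_static, "glitch": glitch, "useful": useful}

# Hypothetical numbers: powers in watts, transitions as raw counts.
parts = classify_power(total_static=0.1, circuit_luts=500, total_luts=10000,
                       dynamic_power=0.2, glitch_transitions=300,
                       total_transitions=400)
for name, watts in parts.items():
    print(f"{name}: {watts * 1e3:.1f} mW")
```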

156 / MAPLD 2005 Rollins 10 Reduce Glitches with Pipelining. Pipelined designs have less logic and interconnect between registers. Pipelining causes long routes to be broken up. Pipelining in FPGAs can come at little additional cost.

156 / MAPLD 2005 Rollins 11 Pipelined Multiplier. Long carry chain paths of multiplier stages are ideal for pipelining. Pipelining was gradually inserted in multipliers of different bit widths: 4x4, 8x8, 16x16, and 32x32.

156 / MAPLD 2005 Rollins 12 Multiplier Power Classification

Multiplier   Dynamic Glitch Power   Useful Dynamic Power   Static Power
4-Bit        12.5%                  87.3%                  0.2%
8-Bit        46.6%                  53.2%                  0.2%
16-Bit       68.2%                  31.7%                  0.1%
32-Bit       75.9%                  24.1%                  0.0%

156 / MAPLD 2005 Rollins 13 Reduce Glitches with Pipelining Pipelining reduces glitching and lowers power

156 / MAPLD 2005 Rollins 14 Extreme Pipelining: Digit-Serial. In an FPGA, an NxN array multiplier can have N pipeline stages. A digit-serial multiplier provides pipelining at a smaller granularity. Digit-serial operations can increase throughput, but also increase latency. Digit sizes used for the digit-serial multiplier: 1, 2, 4, 8, 16, 32.
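As a rough illustration of why digit size trades hardware width against cycle count, the sketch below computes how many clock cycles it takes to stream one operand through a digit-serial datapath, one digit per cycle. The 32-bit operand width and the helper name are assumptions for illustration; the slides do not give the cycle counts of the actual multiplier, which may need additional cycles to produce the full product.

```python
import math

def digits_per_operand(width_bits, digit_size):
    """Clock cycles needed to stream one operand, digit_size bits per cycle."""
    return math.ceil(width_bits / digit_size)

# Digit sizes listed in the presentation, applied to an assumed 32-bit operand.
cycles = {d: digits_per_operand(32, d) for d in (1, 2, 4, 8, 16, 32)}
for digit_size, n_cycles in cycles.items():
    print(f"digit size {digit_size:2d}: {n_cycles:2d} cycles per operand")
```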

156 / MAPLD 2005 Rollins 15 Pipelined vs. Digit-Serial Multiplier: Total Power Consumption. The digit-serial multiplier has almost no glitching: dynamic glitching power accounts for < 1% of total power. [Plots: Array Multipliers; Digit-serial Multipliers]

156 / MAPLD 2005 Rollins 16 Operation Energy. Most studies focus on quantifying circuit design power only, but energy is often a more useful metric. Four metrics can be used for energy consumption: Energy per Operation, Energy Delay, Energy Throughput, and Energy Density.

156 / MAPLD 2005 Rollins 17 Pipelined vs. Digit-Serial Multiplier: Energy Per Operation. Quantifies the amount of energy required to complete a single operation (in nJ): $E_{op} = P \cdot t_{clk} \cdot n$ [Plots: Array Multipliers; Digit-serial Multipliers]

156 / MAPLD 2005 Rollins 18 Pipelined vs. Digit-Serial Multiplier: Energy Delay. Combines the energy efficiency and speed of an operator into a single parameter (in nJ·ns): $E_{delay} = P \cdot t_{clk} \cdot t_{min} \cdot n$ [Plots: Array Multipliers; Digit-serial Multipliers]

156 / MAPLD 2005 Rollins 19 Pipelined vs. Digit-Serial Multiplier: Energy Throughput. Operation-pipelined version of energy delay: $E_{thput} = P \cdot t_{clk} \cdot t_{min} \cdot \delta$ [Plots: Array Multipliers; Digit-serial Multipliers]

156 / MAPLD 2005 Rollins 20 Pipelined vs. Digit-Serial Multiplier: Energy Density. Normalizes the amount of energy used to perform a single operation to the logic resources used: $E_{density} = P \cdot t_{clk} / \mathrm{Area}$ [Plots: Array Multipliers; Digit-serial Multipliers]
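The four energy metrics from slides 17 to 20 can be computed side by side with a few lines of code. This is a sketch only: the transcript does not define n, t_min, or δ, so they are treated here as plain numeric inputs, and the function name, units, and sample values are assumptions for illustration.

```python
def energy_metrics(p, t_clk, t_min, n, delta, area):
    """Evaluate the four energy metrics exactly as written on the slides."""
    return {
        "energy_per_operation": p * t_clk * n,          # E_op = P * t_clk * n
        "energy_delay": p * t_clk * t_min * n,          # E_delay = P * t_clk * t_min * n
        "energy_throughput": p * t_clk * t_min * delta, # E_thput = P * t_clk * t_min * delta
        "energy_density": p * t_clk / area,             # E_density = P * t_clk / Area
    }

# Hypothetical design point: power in W, times in s, area in LUTs.
metrics = energy_metrics(p=0.5, t_clk=10e-9, t_min=8e-9, n=4, delta=1, area=200)
for name, value in metrics.items():
    print(f"{name}: {value:.3e}")
```

Evaluating all four for each candidate design makes it easier to see cases where the design with the lowest raw power is not the one with the lowest energy per operation.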

156 / MAPLD 2005 Rollins 21 Pipelined vs. Digit-Serial Multiplier: Clock Energy Increase. In contrast to an ASIC, there is very little or no increase in clock energy as pipeline depths or digit sizes are increased. [Plots: Array Multipliers; Digit-serial Multipliers]

156 / MAPLD 2005 Rollins 22 Conclusions and Future Work. Glitch power is often a significant percentage of total consumed power: up to 76% in an array multiplier. Reducing glitching is essential for low-power designs.

156 / MAPLD 2005 Rollins 23 Conclusions and Future Work. Pipelining is an effective way of reducing glitches. A digit-serial multiplier almost eliminates glitches. Reducing glitching by pipelining reduces power consumption: up to 96% in an array multiplier.

156 / MAPLD 2005 Rollins 24 Conclusions and Future Work. More information than just raw power consumption is required for effective low-power designs. Different energy metrics can provide this extra information. A high-level synthesis tool can use this information to produce low-power designs.