March 28, 2007: Glitch Reduction for Altera Stratix II devices. Tomasz S. Czajkowski, PhD Candidate, University of Toronto. Supervisor: Professor Stephen D. Brown

Presentation transcript:

Glitch Reduction for Altera Stratix II devices
Tomasz S. Czajkowski, PhD Candidate, University of Toronto
Supervisor: Professor Stephen D. Brown

Outline
- Motivation
- Power Model
- Glitch Reduction Algorithm
- Results
- Conclusion

Motivation
- Glitches: undesirable logic transitions that occur due to delay imbalance in the logic circuit
  - Waste power and do not provide any useful functionality
  - Can increase the average toggle rate of a net by as much as a factor of 2
- Glitches can be filtered out by strategically inserting negative edge triggered FFs

Glitches in FPGAs
- Due to unequal arrival times of signals at the inputs of LUTs
- Glitches can be propagated through LUTs
[Figure labels: 4LUT, Generated, Propagated]

Reducing Glitches
- Insert a negative edge triggered FF after a LUT that produces or propagates glitches
[Figure labels: 4LUT, Generated, clock, No glitches]

Alternatives
- Gated D-latch
  - Implement a gated D-latch in a LUT
  - Input signal is transparent during the latter half of the clock period
- Gated LUT
  - Gate the output of a LUT with the clock input using an AND or an OR gate
  - Similar effect as the gated D-latch
  - Can generate glitches too
- When implemented, a gated D-latch consumes 50% more power than a FF and double that of a gated LUT
  - Neither alternative is very effective

Background on Dynamic Power
- Average Net Dynamic Power Dissipation
  - P_avg is the average power
  - V is the supply voltage
  - f_clock is the clock frequency
  - s_i is the average per-cycle toggle rate of a net
  - C_i is the capacitance of a net
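
The symbols listed on this slide are those of the standard CMOS dynamic power expression. Assuming the slide uses that standard form (the sum over nets i and the factor of one half are assumptions, not taken from the transcript), it would read:

```latex
P_{avg} = \frac{1}{2} \, V^{2} \, f_{clock} \sum_{i} s_i \, C_i
```

Under this form, halving the toggle rate s_i of a net halves that net's contribution to P_avg, which is why removing glitches from high-capacitance nets matters most.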

Power Model
- Goal
  - To be able to compute the change in dynamic power dissipation in the logic elements affected by a negative edge triggered FF insertion
- Power dissipated by a LUT and a FF
- Toggle rate of logic signals (s_i)
- Net capacitance (C_i)

LUT Power
- The LUT itself dissipates a non-trivial amount of power when its inputs toggle
- We look at how the power dissipated by a LUT relates to the frequency of its output transitions

LUT Power Model

FF Power
- How much power would it cost to insert a FF into a circuit?
- What about the power cost of the alternatives to a FF?
  - Gated LUT
  - Gated D-latch

Clocked Element Power Comparison

Wire Properties
- Static Probability, P[y]: Probability that a wire assumes the logic value 1 in any given clock cycle.
- Transition Probability, P_t(y): The average number of state transitions, excluding glitches.
- Low to High Transition Probability, P[y’=1 | y=0]: Probability that a wire will change state to logic value 1, given that it is at a logic value 0 at present.
- High to Low Transition Probability, P[y’=0 | y=1]: Probability that a wire will change state to logic value 0, given that it is at a logic value 1 at present.
- Transition Density, D(y): The average number of logic value transitions per cycle. Includes glitches.
- Average Number of Glitches per cycle, D(y) - P_t(y): The average number of useless transitions per clock cycle.
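
As a rough illustration of these definitions, the sketch below measures them from a recorded waveform. The input format (one list of values per clock cycle, ending with the settled value), the function name `wire_properties`, and the choice to use the settled value for the static probability are all assumptions made for this example.

```python
from fractions import Fraction

def wire_properties(cycles, initial=0):
    """cycles: one list per clock cycle holding the values the wire takes
    during that cycle, in order, ending with its settled value (assumed format)."""
    settled_prev = initial
    ones = useful = total = 0
    for values in cycles:
        # Count every value change inside the cycle, glitches included.
        prev = settled_prev
        for v in values:
            if v != prev:
                total += 1
            prev = v
        settled = values[-1]
        ones += settled
        # A useful transition is a change of the settled value between cycles.
        if settled != settled_prev:
            useful += 1
        settled_prev = settled
    n = len(cycles)
    density = Fraction(total, n)
    p_t = Fraction(useful, n)
    return {"P": Fraction(ones, n), "Pt": p_t, "D": density, "glitches": density - p_t}
```

For instance, `wire_properties([[1], [1, 0, 1], [0], [0, 1, 0]])` describes a wire that glitches in the second and fourth cycles, so its transition density exceeds its transition probability.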

Examples of Wires
[Figure: waveforms labelled Clock, A, B, C, D]
Table columns: P[y]; P_t(y); P[y’=1 | y=0]; P[y’=0 | y=1]; D(y); D(y) - P_t(y)
Table values (some cells missing): ½ 1 1 1 1 0; ½ ½ ≈0.4 ½ 0; 1/8 ¼ 1 ¼ 0; ¼ 1 ½ ¼

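Example 1
[Figure labels: x_1, x_2, y]
- x_1: P[y] = ½, P_t(y) = ½, P[y’=1 | y=0] = ½, P[y’=0 | y=1] = ½, D(y) = ½
- x_2: P[y] = ½, P_t(y) = ½, P[y’=1 | y=0] = ½, P[y’=0 | y=1] = ½, D(y) = ½
Table columns: Initial state x_1 x_2; Final state x’_1 x’_2; # Transitions on y (Trans(x_1 x_2, x’_1 x’_2))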

Static Probability
- Let y = f(x_1, x_2) = x_1∙x_2

Probability of a Specific Transition
- Compute the probability of a specific transition by using the static probability and the 1 → 0 and 0 → 1 transition probabilities of each wire
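
A sketch of how this computation can be written out for a single wire x, using only the quantities defined under Wire Properties. Treating the LUT inputs as independent, so that a joint transition is the product of the per-wire terms, is an assumption of this sketch.

```latex
\begin{aligned}
P[x{:}\,0 \to 0] &= (1 - P[x])\,(1 - P[x'=1 \mid x=0]) \\
P[x{:}\,0 \to 1] &= (1 - P[x])\,P[x'=1 \mid x=0] \\
P[x{:}\,1 \to 0] &= P[x]\,P[x'=0 \mid x=1] \\
P[x{:}\,1 \to 1] &= P[x]\,(1 - P[x'=0 \mid x=1]) \\
P[x_1 x_2 \to x'_1 x'_2] &= P[x_1 \to x'_1]\;P[x_2 \to x'_2]
\end{aligned}
```

With the Example 1 inputs (every property equal to ½), each single-wire term is ¼, so any specific joint transition such as 01 → 10 has probability 1/16.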

Transition Probability
Table columns: Initial state x_1 x_2; Final state x’_1 x’_2; # Transitions on y (Trans(x_1 x_2, x’_1 x’_2))

Transition Density
Table columns: Initial state x_1 x_2; Final state x’_1 x’_2; # Transitions on y (Trans(x_1 x_2, x’_1 x’_2))

0→1 Transition Probability
Table columns: Initial state x_1 x_2; Final state x’_1 x’_2; # Transitions on y (Trans(x_1 x_2, x’_1 x’_2))

1→0 Transition Probability
Table columns: Initial state x_1 x_2; Final state x’_1 x’_2; # Transitions on y (Trans(x_1 x_2, x’_1 x’_2))

Properties of wire y in Example 1
- y: P[y] = ¼, P_t(y) = 3/8, P[y’=1 | y=0] = ¼, P[y’=0 | y=1] = ¾, D(y) = ½
[Figure labels: x_1, x_2, y]
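
The values for wire y can be reproduced by enumerating all initial and final input states of y = x_1∙x_2. The sketch below assumes independent inputs and that x_1 switches before x_2 within a cycle; the arrival order is an assumption, since the delays from the original figure are not available here. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

def and_gate(x1, x2):
    return x1 & x2

def transitions_on_y(start, end):
    """Count changes on y when x1 switches before x2 (assumed arrival order)."""
    waveform = [
        and_gate(start[0], start[1]),  # before any input changes
        and_gate(end[0], start[1]),    # x1 has switched, x2 has not
        and_gate(end[0], end[1]),      # both inputs have switched
    ]
    return sum(a != b for a, b in zip(waveform, waveform[1:]))

def example_1_properties():
    states = list(product((0, 1), repeat=2))
    p_y = p_t = density = rise = fall = Fraction(0)
    for start, end in product(states, states):
        # Each input is 0 or 1 with probability 1/2 and toggles with
        # probability 1/2, so every (start, end) pair has probability 1/16.
        prob = HALF ** 4
        y0, y1 = and_gate(*start), and_gate(*end)
        p_y += prob * y0
        p_t += prob * (y0 != y1)
        density += prob * transitions_on_y(start, end)
        rise += prob * (y0 == 0 and y1 == 1)
        fall += prob * (y0 == 1 and y1 == 0)
    return {
        "P[y]": p_y,
        "P_t(y)": p_t,
        "P[y'=1|y=0]": rise / (1 - p_y),
        "P[y'=0|y=1]": fall / p_y,
        "D(y)": density,
    }
```

Under these assumptions the only glitch occurs on the 01 → 10 input change, which adds two transitions with probability 1/16 and accounts for the difference between D(y) and P_t(y).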

Example 2
- x_3: P[y] = ½, P_t(y) = ½, P[y’=1 | y=0] = ½, P[y’=0 | y=1] = ½, D(y) = ½
- y: P[y] = ¼, P_t(y) = 3/8, P[y’=1 | y=0] = ¼, P[y’=0 | y=1] = ¾, D(y) = ½
[Figure labels: x_1, x_2, y, x_3, z; 3, 1, 4]

Computing Properties of wire z
- Same computations as in Example 1.
- Increase D(z) to account for glitches that occur on wire y (D_glitch(z)).
  - Do so only when x_3 remains at constant 1 for the duration of the clock cycle.
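
One way to read the two bullets above as a formula, offered as a sketch rather than the slide's own expression: it assumes every glitch transition on y reaches z whenever x_3 is held at 1, that x_3 is independent of y, and that no pulses are filtered out.

```latex
\begin{aligned}
D_{glitch}(z) &= \bigl(D(y) - P_t(y)\bigr)\; P[x_3 = 1]\;\bigl(1 - P[x'_3 = 0 \mid x_3 = 1]\bigr) \\
              &= \left(\tfrac{1}{2} - \tfrac{3}{8}\right) \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{32}
\end{aligned}
```

The numbers are the Example 2 values for y and x_3 listed above.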

Minimum Pulse Width
- When using the table to compute the number of transitions on a wire given the initial and final states of the LUT inputs, we can also compute the intermediate transitions and their durations
- Some intermediate pulses will be too short to cause a full logic change at the logic output
- The minimum pulse width depends on the target device used
- We remove those pulses from the computation
  - Any pulse with duration less than 0.25 ns is removed
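
A minimal sketch of such a filter, assuming the transitions of one wire within a cycle are given as a list of times in nanoseconds. The function name and the rule of cancelling both edges of a short pulse are illustrative choices, not the authors' implementation.

```python
MIN_PULSE_NS = 0.25  # threshold quoted on the slide

def filter_short_pulses(transition_times_ns, min_width_ns=MIN_PULSE_NS):
    """Drop pairs of edges that bound a pulse shorter than min_width_ns."""
    kept = []
    for t in sorted(transition_times_ns):
        if kept and t - kept[-1] < min_width_ns:
            # The pulse between the previous edge and this one is too short:
            # remove both edges so the wire never leaves its earlier value.
            kept.pop()
        else:
            kept.append(t)
    return kept
```

The length of the returned list is the number of transitions that would still be counted toward the transition density.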

Estimate Error

Particular Example: mux64_16bit

Particular Example: des_perf_opt

Particular Example: cf_fir_24_8_8

Particular Example: huffman

Net Capacitance
- We need to be able to estimate net capacitance to figure out the difference in dynamic power dissipation due to a change in the transition density of a net
- Relate net capacitance (unavailable directly) to net delay (available through the timing report)
  - Distinguish between nets of different fanout

Fanout 1 Net Capacitance

Fanout 2 Net Capacitance

Fanout 3 Net Capacitance

Fanout 4 Net Capacitance

Higher Fanout Net Capacitance
- In our benchmark set, fewer than 5% of the nets had fanout greater than 4
  - Clock net is excluded from calculation
- Approximate capacitance of a net with fanout n > 4 as:
- Not exact, but supports the fact that glitches on nets with high fanout are bad
  - Average estimate error of +22%

Algorithm
1. Scan all nets in a logic circuit to determine if negative edge FF insertion can be applied
2. Analyze the resulting set of nets to determine the benefit of applying the optimization to each net (determined by the cost function)
3. Apply the optimization to the net on which the most power could be saved
4. Repeat until no beneficial choices are found
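
The four steps above can be sketched as a greedy loop. In this sketch `cost_of` and `apply_insertion` are hypothetical callbacks standing in for the cost function and the netlist modification, and re-evaluating every remaining net on each pass is an assumption about how step 4 repeats.

```python
def reduce_glitches(candidate_nets, cost_of, apply_insertion):
    """Greedy negative edge FF insertion; callbacks are hypothetical."""
    inserted = []
    remaining = set(candidate_nets)  # step 1: nets where insertion is possible
    while remaining:
        # Step 2: evaluate each remaining net. Earlier insertions change the
        # glitch activity downstream, so costs are recomputed every pass.
        costs = {net: cost_of(net) for net in remaining}
        best = min(costs, key=costs.get)
        if costs[best] >= 0:
            break  # step 4: no beneficial choice is left
        apply_insertion(best)  # step 3: take the largest saving
        inserted.append(best)
        remaining.remove(best)
    return inserted
```

A negative cost means the insertion is expected to save power, matching the acceptance rule on the cost function slide.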

Cost Function
- Compute the change in power (∆P):
  ∆P = (cost of adding a FF) - (power saved on the modified net) - (power saved on nets and LUTs in the transitive fanout of the added FF)
- Compute the change in the minimum clock period (∆T)
  - Specify ∆T allowed (∆T_a)
- where u(x) is the step function
- Accept change when ∆C < 0
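
The expression for ∆C itself is not available here, so the sketch below only illustrates one plausible shape: ∆P plus a large penalty gated by the step function once ∆T exceeds ∆T_a. That combined form, the `penalty` constant, and the value of u(0) are assumptions, not the authors' formula.

```python
def step(x):
    """Unit step u(x): 1 for x > 0, otherwise 0 (the value at 0 is assumed)."""
    return 1.0 if x > 0 else 0.0

def delta_power(ff_cost, saved_on_net, saved_in_fanout):
    """Change in power: FF cost minus the two savings terms listed on the slide."""
    return ff_cost - saved_on_net - saved_in_fanout

def delta_cost(delta_p, delta_t, delta_t_allowed, penalty=1e9):
    # Assumed form: power change, plus a penalty large enough to reject the
    # insertion whenever the clock period grows by more than the allowance.
    return delta_p + penalty * step(delta_t - delta_t_allowed)

def accept(delta_p, delta_t, delta_t_allowed):
    return delta_cost(delta_p, delta_t, delta_t_allowed) < 0
```

With this shape an insertion is accepted only if it saves power and keeps the clock-period change within the allowance.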

Example
[Figure labels: LUT, Some logic network, LUT, FF]

Example: Inserted FF
[Figure labels: LUT, Some logic network, LUT, FF, Neg FF]

Example: Compute change in the # of glitches
[Figure labels: LUT, Some logic network, LUT, FF, Neg FF]

Example: Compute change in the # of glitches
[Figure labels: LUT, Some logic network, LUT, FF, Neg FF]

Example: Compute change in LUT power dissipation
[Figure labels: LUT, Some logic network, LUT, FF, Neg FF]

Experimental Results
- 8 benchmark circuits taken from the QUIP package
- Synthesize, place, route and analyze timing of a circuit using Quartus II 5.1
- Apply the algorithm to reduce glitches in a circuit
  - Aim to increase the minimum clock period by no more than 5%
- Perform timing analysis once the circuit has been modified
- Use ModelSIM-Altera 6.0c for simulation
  - Simulate a circuit both pre- and post-modification using the same clock frequency
- Use the PowerPlay Power Analyzer to estimate the average dynamic power dissipation of each circuit

Experimental Results
Table columns: Circuit name; Simulation Clock Frequency (MHz); Minimum Clock Period: Initial (ns), Final (ns), Change (%); Dynamic Power Dissipation: Initial (mW), Final (mW), Change (%)
Table rows: Barrel64*; mux64_16bit; fip_cordic_rca; oc_des_perf_opt; oc_video_compression_systems_huffman_enc; cf_fir_24_8_8; aes128_fast; rsacypher; Average

Observations (1)
- oc_des_perf_opt
  - Large number of XOR gates present
  - Removing glitches from one node removes a lot of glitches on the nodes in its transitive fanout (up to the next FF)
- mux64_16bit
  - The cost function determined that no net was a good candidate for optimization
  - Very few glitches were present in the circuit and the power they dissipate was not large enough to warrant the insertion of FFs

Observations (2)
- cf_fir_24_8_8
  - Overestimated toggle rate caused the algorithm to apply negative edge triggered FF insertion excessively
  - Need to include spatial correlation in the toggle rate model
- aes128_fast
  - Toggle rate is 50% higher than in oc_des_perf_opt
  - Most nets use local LAB connections, causing little power dissipation
  - Insertion of 173 FFs only achieved 1% power reduction
    - Saved mW in routing alone, because toggle rate on all affected wires was reduced by 50-70%
    - Added 24.6 mW due to FF insertion
    - Added 1.86 mW to the power dissipated by the clock network, because new LABs were connected to the clock network
    - Net win of 8.68 mW

Conclusion
- Negative edge triggered FF insertion can work well to reduce glitches in a circuit
- Unlike retiming, our approach only needs to ensure that exactly one negative edge triggered FF is on any given combinational path
  - Retiming may require the translation of more than a single FF to be valid

Future Work
- Better toggle rate prediction algorithm that includes spatial correlation
- Having FFs that can be negative edge triggered without using an additional LAB clock line would make the cost of this optimization lower
  - Silicon area cost vs. frequency of use trade-off

Acknowledgement
- We’d like to express our gratitude to Altera for funding this research
- We’d like to thank Altera Toronto in particular for dedicating some of their time to answer our questions and provide insight throughout the course of this work

Questions?