1 C. H. Ziesler et al., 2003
Energy Recovering Computers
Conrad H. Ziesler (1), Joohee Kim (1), Suhwan Kim (1, 2), Marios C. Papaefthymiou (1)
(1) Advanced Computer Architecture Laboratory, University of Michigan
(2) T. J. Watson Research Center, IBM Research Division

2 Conventional CMOS Operation
● Constant supply voltage
● One-way street to ground
● High voltage drops across conducting devices
● Energy dissipation grows with CV²
[Figure: RC circuit driven from a constant supply (Vdd, Vss), with waveforms of I_R, V_R, P_R, V_C, and E_R versus time]

3 Charge Recovery
● Time-varying voltage supply Vpc
● Charge recovered from load C
● Charge transfer through resistor is spread over available time
● Energy dissipation grows with (RC/T) CV²
[Figure: RC circuit driven from a time-varying supply Vpc, with waveforms of Vpc, I_R, V_R, P_R, V_C, and E_R versus time]
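The scaling laws on slides 2 and 3 can be compared numerically. The following is a minimal sketch, not taken from the presentation. It assumes the textbook forms E = (1/2)CV² for charging a capacitor from a constant supply and E ≈ (RC/T)CV² for a linear supply ramp of duration T much longer than RC. The constant factors and all component values (R, C, V, T) are illustrative assumptions; the slides only state what the dissipation grows with.

```python
def conventional_energy(c, v):
    """Joules dissipated when C is charged to v from a constant supply (assumed 1/2 C V^2)."""
    return 0.5 * c * v ** 2


def charge_recovery_energy(r, c, v, t):
    """Approximate joules dissipated when the supply ramps to v over time t (assumes t >> r*c)."""
    return (r * c / t) * c * v ** 2


# Hypothetical values chosen only for illustration.
R = 1e3      # ohms
C = 100e-15  # farads
V = 1.5      # volts
T = 5e-9     # seconds, 50x the RC time constant

e_conv = conventional_energy(C, V)
e_rec = charge_recovery_energy(R, C, V, T)
print(f"conventional:    {e_conv * 1e15:.1f} fJ")
print(f"charge recovery: {e_rec * 1e15:.1f} fJ")
print(f"ratio:           {e_conv / e_rec:.1f}x")
```

Under these assumptions, doubling T halves the charge-recovery figure, while the conventional figure does not depend on T at all.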

4 Quick Glance at History
● Physics (1970s)
– Logical reversibility of computation
– Connection to thermodynamics (“adiabatic” computing)
– No absolute minimum to energy dissipation, if computing is arbitrarily slow. (Does not have to be slow, however.)
● Engineering (1990s)
– Logic circuitry
– Energy recycling circuitry
– VLSI prototyping

5 The Design Space
Complex web of intertwined issues:
1. Which capacitance to recover from?
2. How to store/reuse recovered energy?
3. What circuits to do the recovery?

6 Sample Design Points from the '90s
● "The case for reversible computation", P. Solomon and D. Frank, SLPE'94
● "Asymptotically zero-energy split-level charge recovery logic", S. G. Younis and T. F. Knight, Jr., SLPE'94
● "Clock-powered CMOS: A hybrid adiabatic logic style for energy-efficient computing", N. Tzartzanis and W. C. Athas, ARVLSI'99
● And many, many others. Few real working chips, however.

7 Our Contributions
● Multiplier chip (2001): custom dynamic circuitry with fine-grain recovery from outputs of individual gates
● Discrete wavelet transform chip (2003): standard-cell ASIC design with recovery of clock power
● Guiding principles
– Simplicity
– Minimal control and hardware overhead
● Key attributes
– Single-phase sinusoidal waveform for power and synchronization
– Integrated power-clock generators with inductive elements
– Relatively high operating speeds (200 MHz+ in silicon)
– Higher energy efficiency than conventional counterparts

8 Multiplier Chip
● Minimalist approach
– Simple tools: Magic and SPICE
– Low-cost standard CMOS process: HP 0.5 µm, 40-pin DIP package, through MOSIS
● Operational chip demonstrates practicality of energy-recovering circuit design
– Non-trivial size (8-bit operands, on-chip clock, self-test)
– High throughput
– Low energy dissipation
● Suhwan Kim (while still at UM) and Conrad Ziesler received First Prize in the VLSI Design Contest, DAC 2001.

9 Low-Energy High-Speed Operation
● Working first silicon
● Simple yet effective experimental setup
● Correct operation verified at 130 MHz
● Energy-efficient operation validated by measurements

10 Correct Function
[Figure: signature output at 130 MHz]

11 Energy Comparison
[Figure: dissipation per cycle (pJ) versus frequency (MHz), 50 to 200 MHz, for the energy-recovery multiplier and for 2-, 4-, and 8-stage static CMOS designs, with supply voltages annotated on the data points and a 4x annotation]

12 Simulations vs. Measurements
● Good correspondence up to 100 MHz
● Simulations gave consistently higher dissipation
[Figure: discrepancy (%) versus frequency (MHz), 40 to 130 MHz]

13 Test Chip Overview
● Two multipliers with self-test per chip (minimum-size die)
● Integrated power-clock generator
● Resonant LC oscillator

14 Multiplier and Self-Test
● 9,048 devices in multiplier array, 2,806 devices in self-test circuitry
● Implemented entirely in energy-recovering dynamic logic family
[Figure: block diagram with input BILBO (self-test), multiplicand buffers, product array, result summation, result buffers, self-test control, and output BILBO (self-test)]

15 Energy-Recovering Dynamic Logic
● Single-phase sinusoidal power-clock
● Minimum-size low-swing evaluation tree
● Built-in state element
● Dual-rail noise-tolerant design
[Figure: NMOS NAND gate schematic showing sense amplifier, precharge diodes, current switches, evaluation tree, and current tail, with power clock, Vss, and bias connections]

16 Cascade Operation
● NMOS and PMOS gates are cascaded like domino logic.
● Animation shows tokens propagating through buffer chain.

17 Single-Phase Power-Clock Generator
● Zero-voltage switching LC-resonant clock generation
● External/bondwire inductor L
● Resistive/capacitive adiabatic load
● Compact: 170 x 115 µm
[Figure: generator schematic with ring oscillator, pulse generator, gate drive (blocks of 25, 19, and 10 transistors), and power switches S1 and S2 driving the power clock (PC) and load through inductor L; supplies Vdd and Vss, biases Vbp and Vbn]

18 Switch Timings
● Inductor current builds linearly when switches are on.
● Peak switch current is less than peak inductor current.
● Switch S1 is turned on at the positive voltage peak.
● Switch S2 is turned on at the negative voltage peak.
● Fixed “on-window” controlled by pulse generator.
[Figure: inductor current and output voltage waveforms]

19 Power-Clock Waveform
● Single-phase sinusoidal waveform at 140 MHz
● ~60 pF load, ~10 nH external inductor
● One DC supply (Vdd, Vss), two DC biases (NMOS, PMOS)
[Figure: waveforms of Vdd, Vss, PMOS bias, NMOS bias, and power-clock]

20 The Automation Question

21 Resonant Clock ASIC
● Compatible with ASIC flow
● Synthesized by Conrad Ziesler and Joohee Kim using in-house standard-cell library and commercial tools
● Energy-recovering clock tree and SRAM word/bit lines
● Low-cost bulk CMOS process: TSMC 0.25 µm, 108-pin PGA package, through MOSIS
● High frequency (300 MHz)
● Low voltage (1.0-1.5 V)

22 Chip Microphotograph

23 System Overview

24 The Energy Recovering Flip-Flop
● Clock signal: single-phase resonant sinusoid
● Probe activates state element only if next state differs from present state
● Low-voltage operation at high speeds
● Delay similar to conventional flip-flops
● Fully compatible with standard-cell ASIC flow
● 16 transistors, 84 µm²
[Figure: flip-flop schematic with probe and state element]

25 Flip-Flop Operation

26 Flip-Flop Power Characterization
● Order of magnitude difference between idle (D, Q constant) and active (D, Q changing) dissipation.
[Figure: flip-flop energy per cycle (fJ, axis from 1 to 1000) versus frequency (MHz)]

27 Resonant Clock Generator
● Resonate entire clock capacitance with small inductor
● Pump resonant system with NMOS switch at appropriate times
● NMOS switch only conducts incremental losses whenever “on”
[Figure: control, pre-driver, and driver stages feeding the NMOS switch]

28 Clock Generator Operation

29 Recovering vs. Conventional Hardware
● Synthesized dual-mode ASIC
– Conventional
– Energy recovery
● Dual-mode flip-flop cell
● Conventional clock tree with conventional flip-flop
● Resonant clock tree with energy-recovering flip-flop
● Direct comparison of dissipation at target throughput using identical hardware structures

30 ASIC Statistics
● Discrete wavelet transform (DWT)
● 3,897 gates, 413 flip-flops
● 15,571 transistors
● 400 µm x 900 µm
● 13.6 pF, 21 nH
● 300 MHz, 1.5 V
● 0.25 µm logic process
[Figure: layout with dual-mode DWT and clock generator labeled]
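As a consistency check on these statistics, the natural frequency of an ideal LC tank is f = 1/(2π√(LC)). The sketch below assumes that the 13.6 pF and 21 nH listed on this slide are the clock capacitance and the resonating inductance, which the slide does not state explicitly, and it ignores resistance and any other parasitics. The function name is illustrative.

```python
import math


def resonant_frequency_hz(inductance_h, capacitance_f):
    """Natural frequency of an ideal (lossless) LC tank, in hertz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


# Values from the ASIC Statistics slide; their roles are an assumption.
L_CLOCK = 21e-9     # henries
C_CLOCK = 13.6e-12  # farads

f_res = resonant_frequency_hz(L_CLOCK, C_CLOCK)
print(f"ideal LC resonance: {f_res / 1e6:.1f} MHz")
```

Under these assumptions the result lands within about 1% of the 300 MHz operating frequency listed on the slide.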

31 Simulation Results
● Total system dissipation
– Conventional mode includes clock tree
– Energy recovery mode includes on-chip clock generator
Per-cycle energy dissipation at 300 MHz, 1.5 V:
          Energy Recovery   Conventional
Idle      6.74 pJ           29.72 pJ
Active    68.47 pJ          78.28 pJ
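The savings implied by these simulated figures follow from simple division. The sketch below only re-derives ratios from the four per-cycle energies on this slide; the function and variable names are illustrative, not part of the original work.

```python
# Simulated per-cycle energy in pJ at 300 MHz, 1.5 V:
# (energy recovery, conventional), copied from the slide.
SIMULATED_PJ = {
    "idle": (6.74, 29.72),
    "active": (68.47, 78.28),
}


def savings_factor(recovery_pj, conventional_pj):
    """How many times less energy the energy-recovery mode dissipates."""
    return conventional_pj / recovery_pj


for mode, (recovery, conventional) in SIMULATED_PJ.items():
    factor = savings_factor(recovery, conventional)
    print(f"{mode:>6}: {factor:.2f}x savings")
```

The idle case shows by far the larger factor, which fits the design of a flip-flop that activates its state element only when the next state differs from the present one.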

32 Lab Setup

33 Correct Function
● 115 MHz
● 5 nH high-Q external inductor
[Figure: signature output (conventional) and signature output (energy recovery)]

34 Self-Resonance at 225 MHz
● No external inductor!
● Clock resonates using parasitic inductance of package
● Lossy parasitics limit power savings
● 200,000 samples at 200 ps intervals
[Figure: signature output (energy recovery) and buffered power clock, with time scales of 8 ns/div and 4 µs/div]

35 Preliminary Power Measurements
Power at 115 MHz, 1.5 V:
          Energy Recovery   Conventional
Idle      2.6 mW            4.3 mW
Active    8.9 mW            11.7 mW
● Total system measurements by monitoring DC supplies
– Conventional mode includes clock tree
– Energy recovery mode includes on-chip clock generator
● Not included: I/O supply, test circuitry, ring oscillator
● Energy recovery data is conservative (obtained when simultaneously driving two DWT cores, one always in idle)
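These measurements are reported as power, whereas the simulation slide reports energy per cycle. Dividing power by clock frequency converts one to the other (E = P/f). The sketch below applies that conversion to the four measured values and also reports energy recovery as a fraction of conventional; the helper name and units handling are illustrative assumptions.

```python
def energy_per_cycle_pj(power_mw, freq_mhz):
    """Convert average power (mW) at a clock frequency (MHz) to energy per cycle (pJ)."""
    # mW / MHz = nJ per cycle, so multiply by 1000 to get pJ.
    return power_mw / freq_mhz * 1e3


FREQ_MHZ = 115.0
# Measured power in mW: (energy recovery, conventional), copied from the slide.
MEASURED_MW = {
    "idle": (2.6, 4.3),
    "active": (8.9, 11.7),
}

for mode, (recovery, conventional) in MEASURED_MW.items():
    e_rec = energy_per_cycle_pj(recovery, FREQ_MHZ)
    e_conv = energy_per_cycle_pj(conventional, FREQ_MHZ)
    print(f"{mode:>6}: {e_rec:.1f} pJ vs {e_conv:.1f} pJ "
          f"({recovery / conventional:.0%} of conventional)")
```

Because both modes run at the same frequency, the ratio of energies equals the ratio of powers.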

36 Summary
● Working silicon that realizes the potential of energy recovery
– Power savings over conventional CMOS
– Operating speeds in the 100s of MHz
– Low control and hardware overhead
– Integrated power-clock generators
● Multiplier chip in 0.5 µm CMOS
– Fine-grain recovery at the level of individual gate output
– True single-phase energy-recovering dynamic logic
– 2x to 4x energy savings over static CMOS in the 50 to 200 MHz range
● DWT chip in 0.25 µm CMOS
– ASIC-compatible design flow
– Correct operation all the way up to self-resonance at 225 MHz
– Direct hardware comparison with dual-mode chip: 60% to 75% of conventional at 115 MHz (measurement), 1.2x to 4x savings at 300 MHz (simulation)

37 Ongoing Research
● Continue chip testing
– Break 300 MHz barrier by bonding chip onto PCB
– Test SRAM
● Experimentally assess EM noise
– Clock EM power is centered around the resonant frequency of the sinusoid
● Scaling
– How large a clock network with single-phase sinusoid?
– Skew/jitter?
● Ultra high speed (multiple GHz)
– How fast with current clocking scheme?
– Recovery circuitry with self-resonating clock network

38 Links and Acknowledgments
● Web site: http://www.eecs.umich.edu/acal/energyrecovery
● Energy recovery group: Suhwan Kim (emeritus), Conrad Ziesler, Joohee Kim, Juang-Ying Chueh, Visvesh Sathe
● Funded in part by U.S. Army Research Office and DARPA: DAAG-55-97-1-0250, DAAD-19-99-1-0340, N66001-02-C-8059

