1 Towards An Efficient Low Frequency Energy Recovery Dynamic Logic. Sujay Phadke, Advanced Computer Architecture Lab, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor. Advisor: Prof. Marios Papaefthymiou. September 28th, 2005

2 Outline: Power dissipation in conventional CMOS; Standard approaches to reduce power dissipation; Introduction to energy recovery circuits; Background: Boost Logic (operation, reported simulation results, pros and cons from an energy standpoint); Description of 3 new circuits designed; Comparison of different circuits (energy dissipation, power supply variation); Conclusion and future work

3 Power dissipation in conventional CMOS designs. Streaming applications: small amount of logic, large number of buffers; long wires – large capacitance C; driving this C wastes energy. Throughput-limited datapaths: strict requirement on throughput; longer latencies can be tolerated (DSP applications) [ATMEL76C120 78MHz]

4 Conventional approaches to reducing power: voltage scaling and pipelining. Voltage scaling can result in significant energy gains: lower dissipation, lower leakage. Limitations: limited by threshold voltages; V_th scaling limited by manufacturing processes; overhead of flip-flops; increasing the delay, limited scalability. [Figure: plot with axes Voltage (V) and Delay (ns); curves: Unpipelined, 2-stage pipeline, 6-stage pipeline]

5 Reduced voltage drivers and voltage converters. Limited by V_TH; delay in level conversion. Requirements for efficient operation: energy-efficient level conversion; no throughput impact due to level conversion delay. Point of diminishing returns! [Figure: voltage converter between a low-swing and a high-swing domain; labels: vdd, vddL, out]

6 Energy dissipation in CMOS. [Figure: DC source driving a load; labels: input, E = C·V·V = CV², E = C·V·0 = 0, (1/2)CV²] No energy recovered back into the supply; point of diminishing returns in scaling V_dd. Reducing V decreases E_diss, but eventually will make the devices go into the sub-threshold region. Delay increases exponentially as V is decreased
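
The energy bookkeeping on this slide can be spelled out as a short worked derivation. This is the standard textbook result for charging a capacitance C to V from a DC supply through a resistive switch, not something taken from the slides beyond the expressions they show; the subscripted names (E_supply, E_stored, E_diss) are labels introduced here for illustration.

```latex
% Charging C from 0 to V from a DC supply at constant voltage V:
\begin{align}
E_{\text{supply}} &= \int_0^{\infty} V\, i(t)\, dt = V \cdot Q = V \cdot (C V) = C V^{2} \\
E_{\text{stored}} &= \tfrac{1}{2} C V^{2} \\
E_{\text{diss,charge}} &= E_{\text{supply}} - E_{\text{stored}} = \tfrac{1}{2} C V^{2}
\end{align}
% Discharging C to ground (0 V): the charge CV is delivered at 0 V, so the
% supply gets nothing back, and the stored energy is lost in the pull-down:
\begin{align}
E_{\text{returned}} &= (C V) \cdot 0 = 0 \\
E_{\text{diss,discharge}} &= \tfrac{1}{2} C V^{2} \\
E_{\text{diss,cycle}} &= C V^{2}
\end{align}
```

Under this model the dissipation does not depend on the switch resistance, which is why conventional CMOS can only reduce it by lowering C or V.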

7 Energy Recovery Circuits. Switching energetics different from vanilla CMOS; DC supply replaced by an AC supply; energy required to swing the voltage on a node is much less than the energy stored; use of inductors to supply and recover charge; resonate current through inductors from power clock to load capacitance; energy recovery gates can be used as timing elements; latency overhead does not translate to a throughput penalty

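8 Energy recovery charging/discharging [Figure: source voltage V versus time t for four charging waveforms; surviving labels: N, "easy to generate", T, E_diss; the symbols relating them were lost in extraction]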
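
A minimal sketch of why gradual supply transitions dissipate less energy, assuming a simple series RC model driven by a linear voltage ramp of duration T. The model and its closed-form loss are the standard textbook result for ramp charging, not taken from the slides, and the values of R, C and V below are hypothetical.

```python
import math


def step_charging_loss(c, v):
    """Energy lost when C is charged to v by an abrupt DC step (any R)."""
    return 0.5 * c * v ** 2


def ramp_charging_loss(r, c, v, t_ramp):
    """Energy lost in R when the source ramps linearly from 0 to v in t_ramp.

    Closed form for a series RC circuit, including the small settling loss
    after the ramp ends. For t_ramp >> RC it approaches (RC / t_ramp) * C * v**2.
    """
    a = (r * c) / t_ramp  # ratio of RC time constant to ramp time
    return a * c * v ** 2 * (1.0 - a + a * math.exp(-1.0 / a))


# Hypothetical values, chosen only for illustration.
R = 1e3       # ohms
C = 100e-15   # farads
V = 1.2       # volts
TAU = R * C   # RC time constant in seconds

for multiple in (10, 100, 1000):
    ratio = ramp_charging_loss(R, C, V, multiple * TAU) / step_charging_loss(C, V)
    print(f"T = {multiple:4d} RC: ramp loss / step loss = {ratio:.4f}")
```

In this model the loss falls roughly in proportion to RC/T, so a slower power clock gives lower dissipation per transition.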

9 Energy Recovery: A Brief History. Reversible computing proposed as a method of achieving asymptotically zero energy computation; Early circuit design (inverter chains); Maksimovic, Oklobdzija (1-clock / 2-phase, 1.2 µm process, 40MHz); Dickinson and Denker (4 phase, 0.9 µm process, 250MHz); Athas et al. (Graphics Processor, 0.5 µm process, 15MHz); Kim et al. (True Single Phase Logic), 8-bit multiplier @ 140MHz (0.5 µm process); Fundamental requirement of gradual power clock transitions; Use of diodes to recover energy (delay and energy inefficient); Tracking power clock at its fastest transition only; PFET evaluation trees

10 Background. Type 1: Boost logic [Sathe: ISLPED ’05]: hybrid energy recovery family with high gate overdrive and voltage scaling; no diodes, data-independent capacitance; acts as a timing element, no throughput penalty; less sensitive to power supply variation compared to vanilla CMOS; differential outputs for data-independent capacitance seen by power clock; 65% energy saving compared to conventional voltage-scaled pipelined CMOS design; high energy dissipation at low frequencies (50MHz-200MHz). 0.13 µm process. Sim. post layout: up to 1.6GHz. Chip: 750MHz-1.3GHz

11 Structure and operation of Boost Logic [Figure: Boost Logic gate schematic; labels: Boost stage; evaluation; compl. eval; reduced-potential evaluation; energy recovery sense-amplification]

12 Energy Dissipation in Type 1 (Boost). Increasing crowbar at lower frequencies; energy dissipation keeps on increasing. How do we decrease this? [Figure labels: V_dd, 0, 1; "always a fight between weak pull-up and pull-down!"] Sim. with 32-bit RC adder, 0.13 µm

13 Circuit Configurations Investigated. Type 2: static CMOS in the evaluation stacks. Type 3: use of static CMOS stack and an inverter to create differential outputs with less area overhead. Type 4: a new domino CMOS logic in the evaluation stage and a modified energy recovery sense amplifier

14 Type 2 circuit: CMOS stacks in evaluation tree. Complementary CMOS stacks; differential outputs driven to full rails (V_dd' and V_ss'); reduces crowbar significantly. Sim. with 32-bit RC adder with clock generator, 0.13 µm

15 Type 2: Energy Dissipation. Significant area overhead (6N+10) compared to Type 1 (2N+10); limited fan-in; slow operation of PMOS

16 Type 3: CMOS stack with complementary inverter. Use inverter to create output differential; less energy dissipation at low frequencies; 3N+10 area overhead. Sim. with 32-bit RC adder with clock generator, 0.13 µm
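
The area expressions quoted for Types 1 to 3 (2N+10, 6N+10 and 3N+10) can be tabulated side by side. This is a small sketch under stated assumptions: the transcript does not define N or the unit of the count, so N is treated here as a generic size parameter of the evaluation logic, and the function name `area_overhead` is hypothetical.

```python
# Coefficients of N from the slides' area expressions; the constant term
# is +10 in all three cases. No expression is given for Type 4, so it is omitted.
AREA_SLOPE = {"Type 1": 2, "Type 2": 6, "Type 3": 3}


def area_overhead(circuit_type, n):
    """Evaluate the slides' area expression k*N + 10 for one circuit type."""
    return AREA_SLOPE[circuit_type] * n + 10


for n in (2, 4, 8):
    row = ", ".join(f"{name}: {area_overhead(name, n)}" for name in AREA_SLOPE)
    print(f"N = {n}: {row}")
```

For larger N the constant term matters less, and the Type 2 overhead approaches three times that of Type 1, while Type 3 approaches 1.5 times.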

17 Type 3: Limitations due to sub-threshold operation of inverter. Due to limited drive, the inverter operates in sub-threshold region; shrinks with increasing frequency, fanout; reliable operation (wrt. ∆V) only till ~50MHz; how can we increase the inverter drive? [Figure: waveforms at 10MHz and at 100MHz] Sim. with 32-bit RC adder with clock generator, 0.13 µm

18 Type 3: with low-threshold devices in the inverter stack. Improvement obtained for lower frequencies. Sensitive to: coupling noise; process variation. Operation not robust for f > 100MHz

19 A New Structure. Need to create a good differential voltage with minimum area overhead and energy dissipation; need to modify the “Boost” sense amplifier stage to make the output voltage differential independent of fan-out loading; need to have good tolerance for power supply variations

20 Type 4: Domino CMOS with transmission gates [Figure: circuit schematic; labels: evaluation; sense amplification; precharge n1, n2; transmission gate; proxy output lines (low C); enables low-swing pulldown; mask high C lines; equalization]

21 Operation: Evaluation/hold phase. Dual N-tree evaluates and pulls down one proxy output line; transmission gates transfer charge to low lines; no crowbar because headers are switched off; transistor M7 in the sense amplifier stage keeps equalized at approx. V_dd/2. [Figure labels: weak 0, weak 1]

22 Operation: Precharge/amplify phase. Outputs pulled to rails in a recovery fashion by the cross-coupled inverters; transmission gates isolate evaluate circuit from sense amp; transfer charge to; transistor M7 in the sense amplifier stage is cut off; n1 and n2 precharge high to V_dd'

23 Type 4: Simulation Results [Figure: simulated waveforms over successive phases: evaluate/hold, precharge/amplify, evaluate/hold] Sim. with 32-bit RC adder with clock generator, 0.13 µm

24 Type 4: Energy Dissipation. 32-bit adder simulations with clock generator; shows substantial energy savings wrt Type 1 (Boost); voltage differential independent of fan-out loading; works between 10MHz-200MHz. Sim. with 32-bit RC adder with clock generator, 0.13 µm

25 Energy Comparison of Different Topologies. Energy savings in Type 4 coming from: low-capacitance proxy output lines; small charge-up of internal nodes; isolation of eval. stage from sense amplifier; elimination of crowbar. 25%-65% reduction in energy over operating range of frequencies with small area overhead. [Figure: energy curves for Type 4, Type 1, Type 2, Type 3] Sim. with 32-bit RC adder with clock generator, 0.13 µm

26 Robustness to Variations in Power Supply. Delay variation is less than 5% for a 10% variation in power supply; Type 4 circuit seen to be relatively insensitive to power supply variation compared to CMOS

27 Conclusions and Future Work. Conclusions: design of 3 structures to improve energy recovery efficiency at low frequencies without use of diodes, multiple clock domains; a new domino style topology resulting in substantial energy savings with minimal area overhead; relatively insensitive to power supply variations. Future work: improve resonance of the Type 4 circuit; redesign of the clock generator to investigate potential power savings; performance of the circuit post-layout and comparisons; continuing investigations into other kinds of logic structures
