Published by Daniela Strickland. Modified over 9 years ago.
1
A 240ps 64b Carry-Lookahead Adder in 90nm CMOS
Faezeh Montazeri, fmontazeri@ece.ut.ac.ir
Advanced VLSI Course Presentation, University of Tehran, December 2006
Based on: "A 240ps 64b Carry-Lookahead Adder in 90nm CMOS," Sean Kao, Radu Zlatanovici, Borivoje Nikolić, University of California, Berkeley
2
What Is an Optimal Adder?
Optimal adder: minimum delay for a given energy, or minimum energy for a given delay.
(Figure: 64-bit adders on IEEE Xplore, 1995-2005) [1]
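The definition above is a Pareto condition: a design is energy-delay optimal if no other design is both faster and lower-energy. A minimal Python sketch that extracts that optimal set from a list of candidate designs; the function name, the design names, and the (delay, energy) values are hypothetical, in arbitrary units.

```python
from typing import List, Tuple

# (name, delay, energy); units are arbitrary in this sketch
Design = Tuple[str, float, float]


def pareto_front(designs: List[Design]) -> List[Design]:
    """Return the designs that no other design beats on both delay and energy.

    Sorting by delay (then energy) lets a single pass keep every design whose
    energy is strictly lower than that of all faster or equally fast designs.
    """
    front: List[Design] = []
    best_energy = float("inf")
    for name, delay, energy in sorted(designs, key=lambda d: (d[1], d[2])):
        if energy < best_energy:
            front.append((name, delay, energy))
            best_energy = energy
    return front


# Hypothetical candidate adders
candidates = [("A", 10.0, 5.0), ("B", 8.0, 7.0), ("C", 9.0, 8.0),
              ("D", 12.0, 4.0), ("E", 8.0, 9.0)]
print([name for name, _, _ in pareto_front(candidates)])  # ['B', 'A', 'D']
```

Designs C and E drop out because B is both faster than (or as fast as) them and lower in energy; the remaining points trace the energy-delay optimal curve.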
3
This Work
Target: a multi-issue 64-bit microprocessor environment
– Optimize a set of representative 64-bit adders in the energy-delay space
– Analyze the design tradeoffs
– Implement the optimal adder in 1.0 V, 90 nm GP CMOS
4
Outline
– Energy-delay optimization
– Design tradeoffs for 64-bit adders
– Test chip implementation
– Measured results
– Summary
5
Energy-Delay Optimization
Goal: obtain the energy-delay optimal adder.
CAD tool: optimizes custom digital circuits in the energy-delay space [3].
(Figure: energy vs. delay curves for a domino CLA adder and a static CLA adder) [1]
6
Circuit Optimization Framework
(Figure: block diagram. Inputs: models, netlist, and optimization goal. The optimization core pairs an optimizer (Matlab), which proposes design variables, with a static timer (C++), which returns delay and energy. Output: optimal design variables.) [1]
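The optimizer/timer loop can be illustrated with a toy model. This sketch is not the authors' tool: it assumes a hypothetical three-stage inverter chain whose "static timer" uses a simple logical-effort-style delay model (stage delay = fanout + parasitic delay, in units of tau) and takes energy as proportional to total switched capacitance. The "optimizer" is a brute-force search over a size grid, swapped in for the numerical optimizer used in the real framework. All names and parameter values are illustrative.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

# Hypothetical toy parameters (not from the paper)
C_IN = 1.0      # input capacitance of the first stage (fixed)
C_LOAD = 64.0   # capacitive load at the chain output
P_PAR = 1.0     # parasitic delay per stage, in units of tau


def static_timer(sizes: Sequence[float]) -> Tuple[float, float]:
    """Toy timer: delay and energy of an inverter chain whose intermediate
    stage input capacitances (the design variables) are `sizes`."""
    caps = [C_IN, *sizes, C_LOAD]
    # each stage's delay = electrical fanout + parasitic delay
    delay = sum(caps[i + 1] / caps[i] + P_PAR for i in range(len(caps) - 1))
    energy = sum(caps)  # switched capacitance, with V^2 normalized to 1
    return delay, energy


def optimize(e_max: float, grid: Sequence[float], n_vars: int = 2
             ) -> Optional[Tuple[float, float, Tuple[float, ...]]]:
    """Minimize delay subject to energy <= e_max by exhaustive grid search.
    Returns (delay, energy, sizes), or None if no point meets the budget."""
    best = None
    for sizes in product(grid, repeat=n_vars):
        delay, energy = static_timer(sizes)
        if energy <= e_max and (best is None or delay < best[0]):
            best = (delay, energy, sizes)
    return best


GRID = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
print(optimize(e_max=100.0, grid=GRID))  # (15.0, 85.0, (4.0, 16.0))
print(optimize(e_max=80.0, grid=GRID))   # (17.0, 75.0, (2.0, 8.0))
```

With a generous budget the search finds the equal-fanout-of-4 sizing; tightening the energy budget forces smaller gates and a longer delay, which is the tradeoff the framework explores.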
7
Adder Optimization Setup
Minimize delay subject to a maximum energy constraint. [1]
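Written out, this setup is a constrained minimization over the design variables. A minimal formulation, assuming the design variables are gate sizes W = (W_1, ..., W_N) with a minimum-size bound W_min (these symbols are illustrative, not taken from [1] or [3]):

```latex
\begin{aligned}
\min_{W}\quad & D(W) \\
\text{subject to}\quad & E(W) \le E_{\max}, \\
& W_i \ge W_{\min}, \quad i = 1, \dots, N
\end{aligned}
```

Sweeping E_max and recording the minimum delay at each budget traces the energy-delay optimal curve of one adder topology; comparing such curves across topologies is the basis of the comparisons on the following slides.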
8
CLA: Full Tree Comparison
– Radix-2: 6 stages, moderate branching
– Radix-4: 3 stages, larger branching
Radix-4 is closer to the optimum number of stages. [1]
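The stage counts follow from the tree depth: a full radix-r tree over 64 bits needs ⌈log_r 64⌉ levels, so 6 for radix 2 and 3 for radix 4. A minimal Python model of a full (Kogge-Stone-style) radix-r carry tree illustrates this. It is a behavioral sketch with a hypothetical function name, not a circuit description, and it uses conventional generate/propagate signals rather than Ling's.

```python
import random
from typing import Tuple


def full_tree_add(a: int, b: int, n: int = 64, radix: int = 2) -> Tuple[int, int]:
    """Add two n-bit numbers through a full radix-`radix` prefix carry tree.

    Returns (sum mod 2**n, number of tree levels). Every bit position keeps its
    own group generate at every level, i.e. the tree is full (not sparse).
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # bit propagate
    G, P = g[:], p[:]  # G[i], P[i]: group signals for bits i-span+1 .. i
    span, levels = 1, 0
    while span < n:
        new_G, new_P = [0] * n, [0] * n
        for i in range(n):
            gg, pp = G[i], P[i]
            # Merge up to radix-1 lower neighbouring groups, each `span` bits wide.
            for k in range(1, radix):
                j = i - k * span
                if j < 0:
                    break
                gg |= pp & G[j]
                pp &= P[j]
            new_G[i], new_P[i] = gg, pp
        G, P = new_G, new_P
        span *= radix
        levels += 1
    carries = [0] + G[:-1]  # carry into bit i = generate of bits 0..i-1 (no carry-in)
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i
    return total, levels


for radix in (2, 4):
    x, y = random.getrandbits(64), random.getrandbits(64)
    s, levels = full_tree_add(x, y, radix=radix)
    assert s == (x + y) % 2**64
    print(f"radix-{radix}: {levels} levels")  # radix-2: 6 levels; radix-4: 3 levels
```

Each radix-4 node merges four groups instead of two, which is the "larger branching" per stage that buys the shallower tree.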
9
CLA vs. Ling
– Conventional CLA: higher transistor stack in the first stage; simple sum precompute
– Ling CLA: lower stack in the first stage; more complex sum precompute; higher speed
[1] [2]
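Ling's tradeoff can be checked bit by bit. Using "+" for OR and juxtaposition for AND, let g_i = a_i b_i (generate), t_i = a_i + b_i (transmit), and c_i = g_i + t_i c_{i-1} (carry out of bit i). Ling's pseudo-carry is H_i = g_i + c_{i-1}; then c_i = t_i H_i and H_i = g_i + t_{i-1} H_{i-1}. Every product term of H_i drops the t_i factor that the same term of c_i carries (for 4 bits, H_3 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0 versus c_3 = g_3 + t_3 g_2 + t_3 t_2 g_1 + t_3 t_2 t_1 g_0), hence the lower first-stage stack. The cost moves to the sum: s_i = p_i XOR (t_{i-1} H_{i-1}), so each sum bit is precomputed for both values of H_{i-1} and then selected. A minimal behavioral sketch in Python; the function name is hypothetical and the notation follows common textbook treatments of [2], not the slides.

```python
import random


def ling_add(a: int, b: int, n: int = 64) -> int:
    """Return (a + b) mod 2**n computed via Ling pseudo-carries (no carry-in)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # generate: a_i AND b_i
    t = [((a >> i) | (b >> i)) & 1 for i in range(n)]  # transmit: a_i OR b_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # half sum: a_i XOR b_i

    # Pseudo-carry recurrence H_i = g_i + t_{i-1} H_{i-1}, with H_0 = g_0.
    # (Serial here for clarity; a Ling adder evaluates it with a carry tree.)
    H = [0] * n
    H[0] = g[0]
    for i in range(1, n):
        H[i] = g[i] | (t[i - 1] & H[i - 1])

    total = p[0]  # bit 0 sees no carry-in
    for i in range(1, n):
        # True carry into bit i is c_{i-1} = t_{i-1} H_{i-1}.
        # Precompute the sum for H_{i-1} = 1 and for H_{i-1} = 0, then select.
        sum_if_h1 = p[i] ^ t[i - 1]
        sum_if_h0 = p[i]
        total |= (sum_if_h1 if H[i - 1] else sum_if_h0) << i
    return total


x, y = random.getrandbits(64), random.getrandbits(64)
assert ling_add(x, y) == (x + y) % 2**64
```

The sum-select step is the "complex sum precompute" on the slide: it needs t_{i-1} as well as p_i, whereas a conventional CLA sum needs only p_i and the true carry.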
10
Full vs. Sparse Comparison
(Figure: energy-delay curves for full (FULL) and sparse-2 (SP2) carry trees, Ling and conventional CLA) [1]
11
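Full vs. Sparse Comparison
(Figure: energy-delay curves for full (FULL) and sparse-2 (SP2) carry trees, Ling and conventional CLA)
Effect of sparse-2 (SP2): radix-2 (R2) +, radix-4 (R4) + [1]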
12
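Full vs. Sparse Comparison
Sparseness benefits adders with large carry trees.
(Figure: energy-delay curves for full (FULL), sparse-2 (SP2) and sparse-4 (SP4) carry trees, Ling and conventional CLA)
Effect of sparseness: radix-2 (R2): SP2 +, SP4 +; radix-4 (R4): SP2 +, SP4 – [1]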
13
Optimal Adder
– Ling's equations
– Radix-4, sparse-2 domino carry tree
– Static sum precompute
Delay of the fastest adder: 7.3 FO4 [1]
14
Radix-4 Sparse-2 Carry Tree
Computes every other Ling pseudo-carry: H0, H2, H4 …
Each carry-tree output selects two sums. [1]
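The sparse-2 structure can be modeled behaviorally. With g_i = a_i b_i, t_i = a_i + b_i, and Ling's pseudo-carry H_i = g_i + t_{i-1} H_{i-1} (so the true carry is c_i = t_i H_i), applying the recurrence twice gives H_i = g_i + g_{i-1} + t_{i-1} t_{i-2} H_{i-2}. The even-indexed pseudo-carries therefore depend only on each other, and each even H_i selects between two precomputed 2-bit sum pairs for bits i+1 and i+2. This Python sketch is a hypothetical model, not the paper's netlist: it evaluates the even H_i serially instead of with a radix-4 tree, and only the selection structure is meant to mirror the slide.

```python
import random


def _pair_sum(p, g, t, i, c_i, n):
    """Sums of bits i+1 and i+2 (packed as 2 bits), given carry c_i out of bit i."""
    s1 = p[i + 1] ^ c_i
    if i + 2 >= n:  # top of the word: only bit i+1 exists
        return s1
    c_next = g[i + 1] | (t[i + 1] & c_i)  # ripple one bit inside the pair
    return s1 | ((p[i + 2] ^ c_next) << 1)


def sparse2_ling_add(a: int, b: int, n: int = 64) -> int:
    """(a + b) mod 2**n using only the even Ling pseudo-carries H_0, H_2, H_4, ..."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    t = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    # Even pseudo-carries: H_i = g_i + g_{i-1} + t_{i-1} t_{i-2} H_{i-2}, H_0 = g_0.
    H = {0: g[0]}
    for i in range(2, n, 2):
        H[i] = g[i] | g[i - 1] | (t[i - 1] & t[i - 2] & H[i - 2])

    total = p[0]  # bit 0 sees no carry-in
    for i in range(0, n - 1, 2):
        # Carry out of bit i is c_i = t_i H_i, so c_i is either 0 or t_i.
        pair_if_h0 = _pair_sum(p, g, t, i, 0, n)
        pair_if_h1 = _pair_sum(p, g, t, i, t[i], n)
        total |= (pair_if_h1 if H[i] else pair_if_h0) << (i + 1)
    return total


x, y = random.getrandbits(64), random.getrandbits(64)
assert sparse2_ling_add(x, y) == (x + y) % 2**64
```

Half as many pseudo-carries means a smaller carry tree; the extra work shifts into the two-way sum precompute, which matches the layout note that the sum precompute occupies the space freed by the sparse tree.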
15
Adder Core Block Diagram
– Critical paths implemented in clock-delayed domino
– Non-critical paths implemented in static logic
– At-speed BIST
[1]
16
Timing Diagram
– 20 ps margin on all edges; adjustable hard edges
– Delay spread places the precharge in the critical path
[1]
17
Layout Floorplan
– Bitslice height: 24 metal tracks
– Aligned clock lines
– Sum precompute occupies the space freed by the sparse carry tree
[1]
18
90 nm Test Chip
– Die: 1.7 mm × 1.6 mm
– 90 nm GP CMOS, 7 metal layers and 1 poly layer (7M 1P), standard-VT (SVT) transistors
– VDD = 1 V
– 8 adder cores + test circuitry. Core 1: this work; cores 2-8: supply noise measurements and supply grid experiments [4]
– Adder core size: 417 × 75 µm²
[1]
19
(Figure) [1]
20
Chip Packaging
Chip-on-board:
– Bond wires 60% shorter
– Cleaner supply
– 10 ps shorter delays
(Figure panels: Advance Program; Digest) [1]
21
Measured Results: Delay
Chip-on-board:
– VDD = 1 V: average 240 ps, fastest 226 ps
– VDD = 1.3 V: average 180 ps
Average delay: D_avg = 7.5 FO4 [1]
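The two delay figures imply an FO4 inverter delay for this process. Assuming the 7.5 FO4 figure normalizes the 240 ps average measured at VDD = 1 V:

```latex
t_{\mathrm{FO4}} \approx \frac{D_{\mathrm{avg}}}{7.5} = \frac{240\ \mathrm{ps}}{7.5} = 32\ \mathrm{ps}
```

By the same normalization, the 7.3 FO4 delay predicted by the optimizer for the fastest adder corresponds to roughly 7.3 × 32 ps ≈ 234 ps.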
22
Measured Results: Power
– VDD = 1 V: P_max = 260 mW
– VDD = 1.3 V: P_max = 606 mW
(Figure: power breakdown into adder core, clock generation, BIST, and leakage) [1]
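A rough consistency check on the two power numbers, assuming (this is not stated on the slide) that P_max is dominated by dynamic power, P ∝ C·VDD²·f, and that the clock frequency scales inversely with the measured average delay (240 ps at 1 V, 180 ps at 1.3 V):

```latex
\frac{P(1.3\,\mathrm{V})}{P(1.0\,\mathrm{V})} \approx \left(\frac{1.3}{1.0}\right)^{2} \cdot \frac{240\ \mathrm{ps}}{180\ \mathrm{ps}} = 1.69 \times 1.333 \approx 2.253
\quad\Rightarrow\quad 260\ \mathrm{mW} \times 2.253 \approx 586\ \mathrm{mW}
```

This estimate is about 3% below the measured 606 mW, so under these assumptions the simple C·VDD²·f model accounts for most of the increase.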
23
Conclusion
– 90 nm GP CMOS, 7M 1P, SVT transistors
– VDD = 1 V
– 8 adder cores + test circuitry
– Adder core size: 417 × 75 µm²
24
Summary
– Ling radix-4 sparse-2 domino carry tree
– 90 nm GP CMOS: 240 ps, 260 mW at 1 V
(Figure: 64-bit adders on IEEE Xplore, 1995-2005) [1]
25
References
[1] S. Kao, R. Zlatanovici, B. Nikolić, "A 240ps 64-bit Carry-Lookahead Adder in 90nm CMOS," ISSCC 2006, Feb. 2006.
[2] H. Ling, "High Speed Binary Adder," IBM J. Res. Dev., vol. 25, no. 3, pp. 156-166, May 1981.
[3] R. Zlatanovici, B. Nikolić, "Power-Performance Optimization for Custom Digital Circuits," Proc. PATMOS, pp. 404-414, Sept. 2005.
[4] V. Abramzon, E. Alon, M. Horowitz, Stanford University.