Published by Dora Johns. Modified over 5 years ago.
1
A New Design Approach for High-Throughput Arithmetic Circuits for Single-Flux-Quantum Microprocessors
Masamitsu Tanaka, Nagoya Univ., JSPS
Co-workers: Y. Yamanashi (2), Y. Kamiya (1), A. Akimoto (2), N. Irie (1), H. Park (2), A. Fujimaki (1), N. Yoshikawa (2), H. Terai (3), S. Yorozu (4)
(1) Nagoya Univ., (2) Yokohama National Univ., (3) NICT, (4) ISTEC-SRL
Acknowledgment: This work was supported by NEDO through ISTEC as Collaborative Research and Superconductors Network Device Project.
2
Introduction
Single-flux-quantum (SFQ) logic circuits use impulse-shaped voltage pulses as signals. Ultra-high throughput is achieved in applications with a unidirectional data flow.
Example: a crossbar switch demonstrated at up to 50 Gbps/ch*.
[Figure: chip photograph, scale bar 1 mm]
* Y. Kameda et al., IEEE Trans. Appl. Supercond., vol. 15, issue 1, pp. 6-10, 2005.
3
Problem in SFQ Circuits with Loop Paths
Microprocessors have very complex interconnects, including data loops in the datapath. These loops spoil the high-throughput nature of SFQ logic.
[Figure labels: 1 mm scale bar; ALU; typical bit-serial adder]
CORE1β v6 [see 3EY01]: 10,927 JJs, 3.3 mW, 4-stage pipeline, 1500 MOPS
4
Purpose of This Study
A simple approach is to reduce the number of junctions in the loops by optimizing the wiring and the physical pin alignment of the logic gates. However, the number of junctions that can be removed this way is limited.
We present a new design approach for high-throughput computation in complex SFQ arithmetic circuits.
5
Conventional Implementation
In the conventional implementation of a sequential logic circuit, a feedback loop is required. The state is stored in latches with destructive readout, and the feedback loop updates the state.
[Figure labels: inputs; decoder; output; state; latches (destructive readout gates); feedback loop to update the state]
6
Our Approach Based on State Transitions
In our design approach, we use nondestructive readout gates such as NDROs to store the state. Calculations are first decoded into state transitions. The circuit has no loops and can be fully pipelined.
[Figure labels: inputs; pre-decoder; decoded transitions; SFQ NDROs (nondestructive readout); state; post-decoder; output]
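The difference between the two storage elements can be sketched with a purely behavioural model. This is only an illustration: the class and method names (`DRO`, `NDRO`, `write`, `set`, `reset`, `read`) are hypothetical, and no junction-level timing or circuit detail is modelled.

```python
class DRO:
    """Destructive readout latch: a read returns the stored bit and clears it."""

    def __init__(self):
        self.bit = 0

    def write(self):
        self.bit = 1

    def read(self):
        out, self.bit = self.bit, 0  # reading empties the latch
        return out


class NDRO:
    """Nondestructive readout gate: set/reset change the state, read does not."""

    def __init__(self):
        self.bit = 0

    def set(self):
        self.bit = 1

    def reset(self):
        self.bit = 0

    def read(self):
        return self.bit  # state is left intact


dro, ndro = DRO(), NDRO()
dro.write()
ndro.set()
dro_reads = [dro.read(), dro.read()]     # second read finds the latch empty
ndro_reads = [ndro.read(), ndro.read()]  # state survives both reads
```

In this model, a state held in a `DRO` has to be written back after every read, which is the role of the feedback loop; a state held in an `NDRO` only needs a set or reset when it actually changes.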
7
Implementation of Bit-Serial Adder
We select the carry signal as the internal state, stored in an NDRO. The state transition is killing (k), propagating (p), or generating (g) the carry, according to inputs X and Y. The NDRO is controlled by the conditions k and g. (Nondestructive readout corresponds to p.)

X  Y  Carry to the next bit
0  0  killed (k)
0  1  propagated (p)
1  0  propagated (p)
1  1  generated (g)
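A minimal behavioural sketch of this bit-serial adder follows. It assumes LSB-first bit streams and models the NDRO as a plain variable that is reset on k, set on g, and left untouched on p. The function names (`decode`, `bit_serial_add`, `to_bits`, `from_bits`) are hypothetical and are not taken from the slides.

```python
def decode(x, y):
    """Pre-decoder: map the input bits to a carry transition 'k', 'p', or 'g'."""
    if x and y:
        return "g"
    if x or y:
        return "p"
    return "k"


def bit_serial_add(xs, ys, carry=0):
    """Add two LSB-first bit lists; return (sum_bits, final_carry)."""
    sums = []
    for x, y in zip(xs, ys):
        transition = decode(x, y)    # depends on the inputs only, not on the state
        sums.append(x ^ y ^ carry)   # post-decoder: read the stored carry
        if transition == "k":
            carry = 0                # reset the NDRO
        elif transition == "g":
            carry = 1                # set the NDRO
        # 'p': no set or reset, the stored carry is simply kept
    return sums, carry


def to_bits(n, width):
    return [(n >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


sums, cout = bit_serial_add(to_bits(11, 4), to_bits(6, 4))
result = from_bits(sums) + (cout << 4)  # 11 + 6
```

The point of the sketch is that `decode` never looks at the carry, so the decoding stage has no feedback path; only the set/reset of the stored carry depends on the previous bit position.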
8
Benefits of New Approach
We achieve high throughput because the loop path of the carry signal is eliminated.
The carry can easily be controlled externally by inserting confluence buffers just before the NDRO, without reducing the throughput.
This scheme is applicable to bit-slice adders. Group p and g are calculated as in carry-lookahead adders*, where i:j denotes bits i (least significant) through j and i < k ≤ j:
p_{i:j} = p_{i:k-1} · p_{k:j},  g_{i:j} = g_{k:j} + p_{k:j} · g_{i:k-1}.
Finally, k is obtained from p and g (k = p NOR g).
* P. Bunyk et al, IEEE Trans. Appl. Supercond., vol. 9, no. 2, pp , 1999.
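The bit-slice case can be sketched as follows. This is an illustration under stated assumptions: per-bit p is taken as X XOR Y and g as X AND Y, groups are merged with the usual carry-lookahead rule in which the higher group's generate takes priority and the lower group's generate passes only if the higher group propagates, and all function names are hypothetical.

```python
def combine(low, high):
    """Merge (p, g) of a lower bit group with (p, g) of the adjacent higher group."""
    p_lo, g_lo = low
    p_hi, g_hi = high
    return (p_lo & p_hi, g_hi | (p_hi & g_lo))


def slice_transition(x_bits, y_bits):
    """Return 'k', 'p', or 'g' for a whole slice of LSB-first bits."""
    p, g = 1, 0  # identity element: an empty group propagates the carry
    for x, y in zip(x_bits, y_bits):
        p, g = combine((p, g), (x ^ y, x & y))
    if g:
        return "g"
    if p:
        return "p"
    return "k"  # k = p NOR g


def to_bits(n, width):
    return [(n >> i) & 1 for i in range(width)]


# 9 = 1001b and 6 = 0110b: every bit position propagates
example = slice_transition(to_bits(9, 4), to_bits(6, 4))
```

With the slice-level transition in hand, the stored carry is updated once per slice exactly as in the bit-serial case: reset on k, set on g, unchanged on p.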
9
Demonstration of Bit-Serial Adder
[Figure labels: designed adder; shift registers; ladder oscillator]
Fabricated using the NEC 2.5 kA/cm² Nb standard process II.
10
Experimental Results
We confirmed correct operation up to 36 GHz. The throughput will not decrease even if we add circuitry to control the carry externally. The limit of the conventional approach is ~20 GHz.
[Figure: operating region, max. 36 GHz; x-axis: frequency [GHz]; y-axis: dc bias current [%] (normalized by the designed value)]
11
Summary
We have proposed a new design approach for high-throughput SFQ arithmetic circuits without loops.
We use NDROs to store the internal state. By translating the calculations into transitions of the state, we can eliminate the loops and achieve high throughput.
We have implemented a bit-serial adder using the new approach and demonstrated its operation up to 36 GHz.
High throughput is also expected even if we add functions such as a carry-propagation controller.
12
Thank You!
13
Implementation of 4-Bit-Slice Adder
[Figure labels: decode inputs (calculate p, g, k); update the state; output]
14
Implementation of 1-bit ALU with a Buffer
[Figure labels: Cin; Din; Cout; B; Dout; 80 µm; 80 µm]
Table (Op., Din, Cin, State Trans.): AND X reset 1 OR set XOR invert ADD (invert x2)
[Figure: test result of nondestructive resettable/settable TFF]