High Performance Asynchronous ASIC Back-End Design Flow Using Single-Track Full-Buffer Standard Cells Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel.


1 High Performance Asynchronous ASIC Back-End Design Flow Using Single-Track Full-Buffer Standard Cells Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems University of Southern California

2 USC Asynchronous CAD/VLSI Group. Key to High-Speed Async Design: completion detection demands 2-D pipelining. [Figure: a bundled-data pipeline (latches, datapath, control logic) next to a 2-D pipeline (pipeline stages, async. channels).]

3 Asynchronous Channels. [Figure: three sender/receiver channel types with their handshake waveforms: a 1-of-N channel (1-of-N data, acknowledge), a 1-of-N single-track channel, and a GasP bundled-data channel (single-rail data, latches, control channel; Req, Ack, data stable).]

4 GasP (Sutherland et al. '01): bundled-data pipeline using single-track control. Forward latency fw = 4, cycle time τ = 6 (in transitions); includes latch setup time and delay. [Figure: GasP control stage (self-resetting NAND, staticizer; L, R, A, B) sending a pulse to the data latches of the datapath.]

5 Precharge Half-Buffer (Lines '98): 2-D pipeline using 1-of-N delay-insensitive channels and QDI cells. Forward latency fw = 2, cycle time τ = 14+ (in transitions). [Figure: precharge half-buffer template (LCD, RCD, C; L, Le, R, Re, Pc, Eval) and the schematic for each output rail (NMOS transistor stack, Pc, Eval, Sx, Rx).]

6 Single-Track Asynchronous Pulsed Logic (Nyström '01): STAPL uses pulse generators to control the activation timing of the drivers. Forward latency fw = 2, cycle time τ = 10 (in transitions). [Figure: STAPL template (RCD, two pulse generators, reset; L, R, Re, S, R4, xv) and the schematic for a dual-rail output (NMOS transistor stack with inputs L0 1 … L0 n, L1 1 … L1 n; S0, S1, R0, R1).]

7 Single-Track Full-Buffer (Ferretti '02): small and fast. Forward latency fw = 2, cycle time τ = 6 (in transitions). [Figure: block diagram (SCD, RCD, reset; L, S, A, B, R), schematic for a dual-rail output (NMOS transistor stack with inputs L0 1 … L0 n, L1 1 … L1 n; S0, S1, R0, R1, A, B, C), and timing diagram of L, S, A, B, R.]

8 STFB: Tradeoff Speed for Robustness
Features of STFB:
- 3x faster than QDI and about half the size
- Smaller and faster than STAPL
- Smaller forward latency and fewer timing assumptions than GasP
[Figure: performance vs. robustness chart placing QDI (Lines, Caltech), STAPL (Nyström, Caltech), STFB (Ferretti, USC) and GasP (Sutherland, Sun).]

9 Motivation and Goals
Develop a methodology to design STFB-based asynchronous circuits using conventional CAD tools:
- Create an STFB standard-cell library
- Make the library publicly available
Design and fabricate a demonstration test chip.
Evaluate the results.
Ultimate goal: full-custom performance with ASIC design times.

10 Outline: STFB standard-cell design; back-end design flow; demonstration test chip; conclusions.

11 STFB Standard-Cell Design: transistor sizing. STFB channels are point-to-point (no forked wires); one size per cell in the library is adequate.

12 STFB Standard-Cell Design: transistor sizing. 2x min. size N-stack strength; 1:4-5 drive ratio (labels 2x, 8x); up to 1 mm long wire (L ≤ 1 mm). TSMC 0.25 µm; widths in µm, all lengths 0.24 µm. [Figure: STFB stage shown twice (L, Sx, Rx, A, B, C, SCD, RCD, NMOS transistor stack) with annotated widths 2.8, 10, 5 and Wn.]

13 STFB Standard-Cell Design: balanced response of SCD/RCD. Data-independent timing assumptions. SCD: balanced NAND (2x) with inputs S0, S1 and output A (annotated widths 2.8, 1.2). RCD: balanced NOR (1x) with inputs R0, R1 and output B (annotated widths 1.4, 1.2, 1.4). TSMC 0.25 µm; widths in µm, all lengths 0.24 µm.

14 STFB Standard-Cell Design: STFB_POUT sub-cell. Yields less load on B and a faster S reset. TSMC 0.25 µm; widths in µm, all lengths 0.24 µm. [Figure: STFB_POUT schematic (S, R, B, NR) with annotated widths 0.6, 2.8, 0.6, 10, 1.2, 1.4/0.6, 0.3 and labels "staticizer", "fights charge-sharing", "fast S reset", "fights leakage current"; STFB_POUT sub-cell layout.]

15 STFB Standard-Cell Design: reset transistors. Initial idea: 3-input NAND. 1-of-2 cell: 2-input NAND + inverter, which puts less load on S. 1-of-3 cell: two 2-input NANDs. TSMC 0.25 µm; widths in µm, all lengths 0.24 µm. [Figure: schematics of the three options (S0, S1, S2, A, A1, A2, /Reset, L0 1, L1 1, …) and layout of the reset transistors, reset inverter and NAND, taken from the STFB_XOR2 cell.]

16 STFB Standard-Cell Design: direct-path current analysis. Average direct-path current is similar to that of an inverter. [Figure: inverter (M1, M2, Vin, Vout) with its direct-path current Idp peaking once (Ipeak) as Vin crosses between Vtn and VDD-Vtp; STFB driver (M1, M2, Sx, A) with Idp showing two peaks (Ipeak1, Ipeak2) on the VA and VSx waveforms.]

17 Outline: STFB standard-cell design; back-end design flow; demonstration test chip; conclusions.

18 Standard-Cell Library Development (Ozdag '04): same tools and flow as synchronous. [Flow diagram elements: template specifications; standard cell specifications; cell specifications; symbol, schematic and functional views (Virtuoso, Emacs); simulation (Verilog, Hspice); layout (Virtuoso); LVS/DRC (Dracula/Diva); cell abstract (Envisia); asynchronous cell library (symbol, schematic, functional, layout, abstract).]

19 Asynchronous ASIC Design Flow (Ozdag '04): same tools and flow as synchronous. [Flow diagram elements: design specifications; schematic (Virtuoso); simulation (Verilog, Nanosim); place & route (Silicon Ensemble); chip assembly (Virtuoso); LVS/DRC (Dracula/Diva); chip fabrication; asynchronous cell library views (symbol, schematic, functional, abstract, layout).]

20 Cell Layout Example: STFB2_XOR2. Each cell comprises an entire STFB pipeline stage. [Figure: STFB2_XOR2 schematic and layout with inputs A0, A1, B0, B1 and /Reset, internal nodes S0, S1, A, B and C, the SCD and RCD detectors, and two STFB_POUT sub-cells driving outputs R0 and R1.]

21 Outline: STFB standard-cell design; back-end design flow; demonstration test chip; conclusions.

22 Prefix Adder (Goldovsky '99). Annotations: 3 + ⌈log₂ n⌉ and 2n + 1. Cells used: STFB2_FORK (fork stage), STFB2_BUFFER (buffer stage), STFB2_XOR2 (2-input xor stage), STFB3_AB_KPG and STFB3_AB_KPG2, STFB3_KPG2_KPG and STFB3_KPG2_KPG2, STFB3_KPGC_C and STFB3_KPGC_C2. [Figure: 8-bit example with inputs a0 … a7, b0 … b7 and c-1, and outputs s0 … s7 and c7.]
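The kill/propagate/generate (KPG) cells named on this slide suggest a parallel-prefix carry computation. The following is a minimal sketch of that idea, not the circuit from the talk: it assumes a Kogge-Stone-style prefix network (the Goldovsky '99 topology may differ), it reads "3 + ⌈log₂ n⌉" as the number of logic levels and "2n + 1" as the number of input channels (both readings are assumptions), and the names kpg, combine, prefix_add and stage_depth are hypothetical.

```python
from math import ceil, log2


def kpg(a_bit, b_bit):
    """Encode one bit position as kill ('K'), propagate ('P') or generate ('G')."""
    if a_bit and b_bit:
        return "G"
    if a_bit or b_bit:
        return "P"
    return "K"


def combine(hi, lo):
    """Merge the code of a more significant group with a less significant one."""
    return lo if hi == "P" else hi


def stage_depth(n):
    """Assumed reading of the slide annotation: 3 + ceil(log2 n) logic levels."""
    return 3 + ceil(log2(n))


def prefix_add(a, b, cin, n):
    """Add two n-bit integers with a log-depth KPG prefix network.

    Returns (sum, carry_out, levels), where levels counts the logic levels used.
    """
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    codes = [kpg(x, y) for x, y in zip(a_bits, b_bits)]  # level 1: (a, b) -> KPG
    levels = 1
    dist = 1
    while dist < n:  # ceil(log2 n) prefix levels (Kogge-Stone style, assumed)
        codes = [combine(codes[i], codes[i - dist]) if i >= dist else codes[i]
                 for i in range(n)]
        dist *= 2
        levels += 1
    # Resolve each group code against the carry-in: one more level.
    carries = [1 if c == "G" else 0 if c == "K" else cin for c in codes]
    levels += 1
    # Final XOR level: s_i = a_i xor b_i xor c_(i-1), with c_(-1) = cin.
    carry_in = [cin] + carries[:-1]
    total = 0
    for i in range(n):
        total |= (a_bits[i] ^ b_bits[i] ^ carry_in[i]) << i
    levels += 1
    return total, carries[-1], levels
```

For n = 64 this sketch uses 9 levels and 2n + 1 = 129 input channels, which is at least consistent with the 9-stage rings and the INPUTGEN129BY9 block name that appear elsewhere in the deck.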

23 64-bit Adder Block: Silicon Ensemble P&R. Flow: schematic (Virtuoso), then place & route (Silicon Ensemble).
- Floor plan: 129 rows, 70% area utilization
- Plan power: M4 and M5 power grid
- Pins and cell placement: input pins on the left (A 64, B 64 and C), output pins on the right (S 64 and C)
- Filler cell
- Routing

24 Input Generator Block: flexible and fast input generation. [Figure labels: address a0…a3 and data d0…d7; 12x STFB2_SRST (single-rail to single-track converter); 4 levels of STFB2_SPLIT; 8x8; 64x 9-stage rings feeding A (64) and B (64); carry-in 9-stage ring feeding Cin (1).]

25 Output Sampler Block: flexible and fast output sampler. Input: 64-bit sum + Cout (65 channels). [Figure labels: three stages, each with 65x STFB2_SPLIT, 65x STFB2_BUCKET (B) and a 30-stage ring; sampling ratios 1:10, 1:100, 1:1000. Ring pattern 1000000000 = 1, 10, …; = 1, 100, …; = 1, 1000, …. Ring pattern 001000000000001000000000000100 = 3, 13, …; = 43, 143, …; = 843, 1843, ….]

26 Simulation Results: loading (Nanosim). [Waveform labels: carry in; 3x B 64; 3x A 64; sampler: 10x4x4 = 160; Go!]

27 Simulation Results: running (Nanosim). [Waveform labels: Go!; sum; carry out.] Elapsed time 112.9 ns; 112.9/160 = 0.706 ns; 1/0.706 ns = 1.4 GHz.
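The throughput figure on this slide is plain arithmetic and can be reproduced in a few lines. This is a minimal sketch; it assumes that 160 is the number of results produced during the 112.9 ns window (the "10x4x4 = 160" label on the previous slide), and the function name throughput_ghz is hypothetical.

```python
def throughput_ghz(total_time_ns, samples):
    """Return (cycle time in ns, throughput in GHz) for a burst of samples."""
    cycle_ns = total_time_ns / samples
    return cycle_ns, 1.0 / cycle_ns  # 1 / ns is GHz


# Values from the slide: 112.9 ns for 160 results (assumed interpretation).
cycle_ns, ghz = throughput_ghz(112.9, 160)
print(f"cycle = {cycle_ns:.3f} ns, throughput = {ghz:.1f} GHz")
```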

28 Simulation Results
Conditions                  Iav     Latency   Throughput
TT, 25 °C, 2.5 V, 3.3 V     2.9 A   2.1 ns    1.4 GHz
SS, 120 °C, 2.2 V, 3.0 V    1.6 A   3.3 ns    890 MHz
FF, 0 °C, 2.7 V, 3.6 V      4.2 A   1.6 ns    1.9 GHz
SF, 25 °C, 2.5 V, 3.3 V     2.9 A   2.2 ns    1.4 GHz
FS, 25 °C, 2.5 V, 3.3 V     2.9 A   2.2 ns    1.4 GHz

29 Demonstration chip: top layout. TSMC 0.25 µm, MOSIS, Mar/22/04.
STFB 64-bit Adder (1963 µm x 1700 µm):
Block              Width     Area       Transistors   Current @ 1.4 GHz
INPUTGEN129BY9     801 µm    1.36 mm²   105k          1.3 A
ADDER64            663 µm    1.13 mm²   89k           1.3 A
SAMPLER65BY1000    499 µm    0.85 mm²   62k           0.3 A
Total              1963 µm   3.3 mm²    257k          2.9 A
Full chip: 3733 µm x 5483 µm, 20.5 mm², 132 pins; it also holds the QDI Sequential Decoder (Session VI, 10:30am, Thu, Apr/22).
Effort: ~6 man-months for the library, ~6 man-months for the design.
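The dimensions and areas quoted on this slide can be cross-checked against each other. The sketch below assumes that the three widths (801, 663 and 499 µm) belong to the three blocks in the order they are named and that all blocks share the 1700 µm height; the variable and function names are hypothetical.

```python
BLOCK_HEIGHT_UM = 1700  # assumed common height of the three blocks

# Assumed width-to-block assignment, taken in the order given on the slide.
blocks_um = {"INPUTGEN129BY9": 801, "ADDER64": 663, "SAMPLER65BY1000": 499}


def area_mm2(width_um, height_um):
    """Rectangle area in mm^2 from dimensions in micrometres."""
    return width_um * height_um / 1e6


areas = {name: area_mm2(w, BLOCK_HEIGHT_UM) for name, w in blocks_um.items()}
total_width_um = sum(blocks_um.values())
adder_area = area_mm2(total_width_um, BLOCK_HEIGHT_UM)
chip_area = area_mm2(3733, 5483)

for name, area in areas.items():
    print(f"{name}: {area:.2f} mm^2")
print(f"total width: {total_width_um} um, adder: {adder_area:.1f} mm^2, "
      f"chip: {chip_area:.1f} mm^2")
```

Under these assumptions the products round to the quoted 1.36, 1.13 and 0.85 mm², the widths sum to 1963 µm, and the full die comes to about 20.5 mm².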

30 Summary and Conclusions
Performance:
- STFB 2-D pipelining yields ultra-high performance
Design time:
- Back-end flow achieves ASIC design time
Availability:
- Cell library has been made freely available
Future work:
- Characterize and extend the library
- Static timing analysis and sign-off

31 Efharisto! (Thank you!)

32 STFB Standard-Cell Design: dynamic worst-case direct-path current analysis (STFB buffer pipeline at 2 GHz). Non-overlapping drive gives less direct-path current than an inverter. TSMC 0.25 µm; widths in µm, all lengths 0.24 µm. [Figure: chain of four STFB buffer stages (L, Sx, R, RCD, A) with a 1 mm wire.]

33 Input Generator Block. Cells used: STFB2_BITGEN (bit generator, BG; ports in, out, go), STFB2_MERGENC (non-conditional merge stage), STFB2_FORK (fork stage), STFB2_BUFFER (buffer stage), STFB2_XOR2 (2-input xor stage). [Figure: 9-stage ring with bit generators and example token values, producing the sequence 1,0,0,1,0,0….]

34 E2E2 Comparison: STFB vs. WCHB. The STFB buffer is ~3x more efficient than the WCHB buffer.

35 Demonstration chip: top layout. TSMC 0.25 µm, MOSIS, Mar/22/04. STFB 64-bit Adder, 1963 µm x 1700 µm, 3.3 mm², 257k transistors, 2.9 A @ 1.4 GHz. Blocks: INPUTGEN129BY9 (801 µm, 1.36 mm², 105k transistors, 1.3 A @ 1.4 GHz); ADDER64 (663 µm, 1.13 mm², 89k transistors, 1.3 A @ 1.4 GHz); SAMPLER65BY1000 (499 µm, 0.85 mm², 62k transistors, 0.3 A @ 1.4 GHz). Pins: 7 Vdd and 7 Gnd pins; 12 In/Out, 8 Input and 3 pad supply pins; 7 Vdd and 7 Gnd pins. Total: 51 pins.

36 Test chip design: top chip layout. TSMC 0.25 µm, MOSIS, Mar/22/04. Die: 3733 µm x 5483 µm, 20.5 mm², 132 pins. Contents: QDI Sequential Decoder (Session VI, 10:30am, Thu) and STFB 64-bit Adder.

