High Performance Asynchronous ASIC Back-End Design Flow Using Single-Track Full-Buffer Standard Cells Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel.

Presentation transcript:

High Performance Asynchronous ASIC Back-End Design Flow Using Single-Track Full-Buffer Standard Cells Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems University of Southern California

USC Asynchronous CAD/VLSI Group, slide 2: Key to High-Speed Async Design. Completion detection demands 2-D pipelining. [Figure: a bundled-data pipeline (latches, datapath, control logic) next to a 2-D pipeline (pipeline stages, async. channels).]

Slide 3: Asynchronous Channels. [Figure: three sender/receiver channel types. 1-of-N channel: 1-of-N data with an acknowledge (Ack). 1-of-N single-track channel: 1-of-N data and acknowledge. GasP bundled-data channel: single-rail data through latches plus a control channel, with a control/data waveform marked Data stable, Req and Ack.]
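As a rough illustration of the single-track channel named on this slide, the sketch below models one common single-track convention, which is an assumption here and not read off the slide: the sender may only raise a wire, and the receiver may only lower it, so one token costs two transitions on one wire. The class and function names are hypothetical.

```python
class SingleTrackWire:
    """One wire of a single-track channel (assumed convention:
    sender only drives high, receiver only drives low)."""

    def __init__(self):
        self.level = 0          # idle: wire is low
        self.transitions = 0    # count of level changes, for illustration

    def sender_raise(self):
        if self.level != 0:
            raise RuntimeError("sender must wait until the wire is low")
        self.level = 1
        self.transitions += 1

    def receiver_lower(self):
        if self.level != 1:
            raise RuntimeError("no token on this wire")
        self.level = 0
        self.transitions += 1


def send_token(wires, value):
    """Send a 1-of-N token by raising exactly one of the N wires."""
    if any(w.level for w in wires):
        raise RuntimeError("channel busy: previous token not yet consumed")
    wires[value].sender_raise()


def receive_token(wires):
    """Consume the token: find the single high wire and pull it low.
    Lowering the wire doubles as the acknowledge."""
    high = [i for i, w in enumerate(wires) if w.level]
    if len(high) != 1:
        raise RuntimeError("expected exactly one high wire")
    wires[high[0]].receiver_lower()
    return high[0]


if __name__ == "__main__":
    channel = [SingleTrackWire() for _ in range(2)]   # 1-of-2 (dual-rail)
    send_token(channel, 1)
    print(receive_token(channel))
```

In this model there is no separate acknowledge wire: the receiver pulling the data wire low is what tells the sender it may send again.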

Slide 4: GasP (Sutherland et al.’01). Bundled-data pipeline using single-track control. fw = 4, τ = 6 (forward latency and cycle time, in transitions); includes latch setup time and delay. [Figure: GasP control stage between L and R (nodes A, B) with a self-resetting NAND, a staticizer, and a pulse to the data latches in the datapath.]

Slide 5: Precharge Half-Buffer (Lines’98). 2-D pipeline using 1-of-N delay-insensitive channels and QDI cells. fw = 2, τ = 14+ (forward latency and cycle time, in transitions). [Figure: Precharge Half-Buffer template (L, R, Le, Re, LCD, RCD, C-element, Pc, Eval) and the schematic for each output rail (NMOS transistor stack, Pc, Eval, Sx, Rx).]

Slide 6: Single-Track Asynchronous Pulsed Logic (Nyström’01). STAPL uses pulse generators to control the activation timing of the drivers. fw = 2, τ = 10 (forward latency and cycle time, in transitions). [Figure: STAPL template (L, R, RCD, Re, two pulse generators, Reset, xv, R4) and the schematic for a dual-rail output (NMOS transistor stack with inputs L0_1 … L0_n and L1_1 … L1_n, nodes S0 and S1, outputs R0 and R1).]

Slide 7: Single-Track Full-Buffer (Ferretti’02). Small and fast: fw = 2, τ = 6 (forward latency and cycle time, in transitions). [Figure: block diagram (L, S, A, B, R, SCD, RCD, C-element, Reset), schematic for a dual-rail output (NMOS transistor stack with inputs L0_1 … L0_n and L1_1 … L1_n, nodes S0 and S1, outputs R0 and R1), and a timing diagram of L, S, A, B and R.]

Slide 8: STFB: Tradeoff Speed for Robustness. Features of STFB: 3x faster than QDI and about half the size; smaller and faster than STAPL; smaller forward latency and fewer timing assumptions than GasP. [Figure: performance versus robustness chart placing QDI (Lines, Caltech), STAPL (Nyström, Caltech), STFB (Ferretti, USC) and GasP (Sutherland, Sun).]
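The template slides each carry a pair of numbers, read here as forward latency and cycle time counted in transitions (that reading is an assumption, since the symbols are damaged in the transcript). A minimal sketch that tabulates those pairs and compares cycle times, under the further simplifying assumption that every transition takes the same time:

```python
# (forward latency, cycle time) in transitions, as read from the template slides.
# Assumption: the two numbers on each slide are forward latency and cycle time.
TEMPLATES = {
    "GasP": (4, 6),
    "PCHB": (2, 14),   # the slide says "14+", so 14 is a lower bound
    "STAPL": (2, 10),
    "STFB": (2, 6),
}


def cycle_ratio(slow, fast):
    """Ratio of cycle times in transitions: how many times shorter
    the cycle of `fast` is than the cycle of `slow`."""
    return TEMPLATES[slow][1] / TEMPLATES[fast][1]


if __name__ == "__main__":
    for name, (fw, cycle) in TEMPLATES.items():
        print(f"{name:6s} forward latency = {fw}, cycle time = {cycle}")
    print("PCHB/STFB cycle ratio:", round(cycle_ratio("PCHB", "STFB"), 2))
    print("STAPL/STFB cycle ratio:", round(cycle_ratio("STAPL", "STFB"), 2))
```

Transition counts alone give a ratio somewhat above 2 against the precharge half-buffer. The "3x faster than QDI" figure is the presenters' own claim and is not derived by this sketch, since real transitions do not all take equal time.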

Slide 9: Motivation and Goals. Develop a methodology to design STFB-based asynchronous circuits using conventional CAD tools: create an STFB standard-cell library and make the library publicly available. Design and fabricate a demonstration test chip. Evaluate the results. Ultimate goal: full-custom performance with ASIC design times.

Slide 10: Outline. STFB standard-cell design; back-end design flow; demonstration test chip; conclusions.

Slide 11: STFB Standard-Cell Design, transistor sizing. STFB channels are point to point (no forked wires). One size per cell in the library is adequate.

Slide 12: STFB Standard-Cell Design, transistor sizing. 2x min. size N-stack strength; 1:4-5 drive ratio (2x and 8x labels); wires up to 1 mm long (L ≤ 1 mm). TSMC 0.25 µm, widths in µm and all lengths 0.24 µm. [Figure: two connected STFB stages, each with an NMOS transistor stack (width Wn), Sx, Rx, A, B, C-element, SCD and RCD.]

Slide 13: STFB Standard-Cell Design, balanced-response SCD/RCD. Data-independent timing assumptions. TSMC 0.25 µm, widths in µm and all lengths 0.24 µm. [Figure: SCD as a balanced NAND (2x) of S0 and S1 producing A; RCD as a balanced NOR (1x) of the R rails producing B.]

Slide 14: STFB Standard-Cell Design, STFB_POUT sub-cell. Yields less load on B and faster S reset. TSMC 0.25 µm, widths in µm and all lengths 0.24 µm. [Figure: STFB_POUT schematic (S, R, B, NR) and sub-cell layout. Labels: staticizer; fights charge-sharing; fast S reset; fights leakage current.]

Slide 15: STFB Standard-Cell Design, reset transistors. 2-input NAND → less load on S. TSMC 0.25 µm, widths in µm and all lengths 0.24 µm. [Figure: three variants. Initial idea: 3-input NAND (A, S0, S1, /Reset). 1-of-2 cell: 2-input NAND + inverter (A1, A2, S0, S1, /Reset). 1-of-3 cell: two 2-input NANDs (A1, A2, S0, S1, S2, /Reset). Also shown: reset transistors, reset inverter and NAND layout, from the STFB_XOR2 cell.]

Slide 16: STFB Standard-Cell Design, direct-path current analysis. Average direct-path current is similar to that of an inverter. [Figure: an inverter (M1, M2, Vin, Vout) with Vin swinging between 0 V and VDD and a direct-path current Idp peaking at Ipeak while Vin is between Vtn and VDD-Vtp; and the STFB driver (M1, M2, nodes Sx and A) with waveforms V_A and V_Sx and direct-path current peaks Ipeak1 and Ipeak2.]

Slide 17: Outline. STFB standard-cell design; back-end design flow; demonstration test chip; conclusions.

Slide 18: Standard-Cell Library Development (Ozdag’04). Same tools and flow as synchronous. [Flow diagram boxes: cell specifications (template specifications, standard cell specifications); symbol, schematic and functional (Virtuoso, Emacs); simulation (Verilog, Hspice); layout (Virtuoso); LVS/DRC (Dracula/Diva); cell abstract (Envisia); asynchronous cell library with symbol, schematic, functional, layout and abstract views.]

Slide 19: Asynchronous ASIC Design Flow (Ozdag’04). Same tools and flow as synchronous. [Flow diagram boxes: design specifications; schematic (Virtuoso); simulation (Verilog, Nanosim); place & route (Silicon Ensemble); chip assembly (Virtuoso); LVS/DRC (Dracula/Diva); chip fabrication. The asynchronous cell library supplies the symbol, schematic, functional, layout and abstract views.]

Slide 20: Cell Layout Example: STFB2_XOR2. Each cell comprises an entire STFB pipeline stage. [Figure: schematic and layout of the cell with dual-rail inputs A0, A1, B0, B1, /Reset, NMOS stacks combining a0, a1, b0 and b1 onto S0 and S1, two STFB_POUT sub-cells driving R0 and R1, and SCD, RCD and a C-element producing A and B.]
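The logic function of a dual-rail XOR stage can be sketched independently of the circuit. The model below is only a functional illustration: it assumes the usual 1-of-2 encoding in which rail 0 high means logic 0 and rail 1 high means logic 1, and it ignores the handshake, reset and timing behaviour of the real cell. Function names are hypothetical.

```python
def dual_rail(bit):
    """Encode a bit as (rail0, rail1); exactly one rail is high."""
    return (1, 0) if bit == 0 else (0, 1)


def xor2_dual_rail(a, b):
    """Dual-rail XOR: each output rail is the OR of two 2-input AND terms,
    which would correspond to two series NMOS stacks per rail."""
    a0, a1 = a
    b0, b1 = b
    s0 = (a0 & b0) | (a1 & b1)   # inputs equal    -> result 0
    s1 = (a0 & b1) | (a1 & b0)   # inputs differ   -> result 1
    return (s0, s1)


if __name__ == "__main__":
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, xor2_dual_rail(dual_rail(x), dual_rail(y)))
```

Because exactly one rail of each input is high, exactly one of the four AND terms is true, so the output is again a valid 1-of-2 code.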

Slide 21: Outline. STFB standard-cell design; back-end design flow; demonstration test chip; conclusions.

Slide 22: Prefix Adder (Goldovsky’99). [Figure: 8-bit example with inputs a0 … a7, b0 … b7 and c−1, outputs s0 … s7 and c7. Dimensions marked on the figure: 3 + ⌈log2 n⌉ and 2n + 1. Cells used: STFB2_FORK (fork stage); STFB2_BUFFER (buffer stage); STFB2_XOR2 (2-input xor stage); STFB3_AB_KPG and STFB3_AB_KPG2; STFB3_KPG2_KPG and STFB3_KPG2_KPG2; STFB3_KPGC_C and STFB3_KPGC_C2.]
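The two expressions marked on the adder figure can be evaluated for the adder sizes mentioned in the deck. The sketch assumes they denote the number of pipeline stages (3 plus the ceiling of log2 n) and the number of input channels (2n + 1); that reading is an interpretation, though it agrees with the 9-stage rings and the INPUTGEN129BY9 block name that appear elsewhere in the slides. Function names are hypothetical.

```python
def ceil_log2(n):
    """Ceiling of log2(n) for n >= 1, using integer arithmetic only."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (n - 1).bit_length()


def adder_stages(n):
    """Assumed meaning of '3 + ceil(log2 n)': pipeline depth in stages."""
    return 3 + ceil_log2(n)


def adder_input_channels(n):
    """Assumed meaning of '2n + 1': n bits of A, n bits of B, one carry in."""
    return 2 * n + 1


if __name__ == "__main__":
    for n in (8, 64):
        print(n, adder_stages(n), adder_input_channels(n))
```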

Slide 23: 64-bit Adder Block, Silicon Ensemble P&R. Flow: schematic (Virtuoso), then place & route (Silicon Ensemble). Floor plan: 129 rows, 70% area utilization. Plan power: M4 and M5 power grid. Pins and cell placement: input pins on the left (A 64, B 64 and C), output pins on the right (S 64 and C). Filler cell. Routing.

Slide 24: Input Generator Block. Flexible and fast input generation. [Figure labels: address a0…a3 and data d0…d7; 12x STFB2_SRST (single-rail to single-track converter); 4 levels of STFB2_SPLIT (x8); 9-stage ring for carry in; 64x 9-stage ring; outputs A (64), B (64) and Cin (1).]

Slide 25: Output Sampler Block. Flexible and fast output sampler. [Figure: the bit sum + Cout (65 channels) passes through three repeated groups of 65x STFB2_SPLIT, 65x STFB2_BUCKET (B) and a 30-stage ring. Ratio labels: 1:10, 1:100, 1:. Sequence labels: = 1,10,…; = 1,100,…; = 1,1000,…; = 3,13,…; = 43,143,…; = 843,1843,…]

Slide 26: Simulation Results: Loading (Nanosim). [Waveform labels: Carry in; 3x B 64; 3x A 64; Sampler: 10x4x4 = 160; Go!]

Slide 27: Simulation Results: Running (Nanosim). [Waveform labels: Go!; Sum; Carry out; 112.9 ns.] 112.9/160 = 0.706 ns; 1/0.706 ns = 1.4 GHz.
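The throughput arithmetic on this slide can be reproduced in a few lines. The sketch takes the two numbers from the slides (a 112.9 ns run and the count of 160 from the loading slide) and treats 160 as the number of results produced in that window, which is an interpretation of the slide and not stated explicitly. Function names are hypothetical.

```python
def cycle_time_ns(total_ns, count):
    """Average time per result, in nanoseconds."""
    return total_ns / count


def throughput_ghz(total_ns, count):
    """Results per nanosecond, which is numerically the rate in GHz."""
    return count / total_ns


if __name__ == "__main__":
    total_ns = 112.9   # run length read from the waveform
    count = 160        # 10 x 4 x 4, from the loading slide
    print(f"cycle time = {cycle_time_ns(total_ns, count):.3f} ns")
    print(f"throughput = {throughput_ghz(total_ns, count):.2f} GHz")
```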

Slide 28: Simulation Results

Conditions                  I_av     Latency   Throughput
TT, 25 °C, 2.5 V, 3.3 V     2.9 A    2.1 ns    1.4 GHz
SS, 120 °C, 2.2 V, 3.0 V    1.6 A    3.3 ns    890 MHz
FF, 0 °C, 2.7 V, 3.6 V      4.2 A    1.6 ns    1.9 GHz
SF, 25 °C, 2.5 V, 3.3 V     2.9 A    2.2 ns    1.4 GHz
FS, 25 °C, 2.5 V, 3.3 V     2.9 A    2.2 ns    1.4 GHz

Slide 29: Demonstration chip, top layout. TSMC 0.25 µm, MOSIS, Mar/22/04.

Block             Width      Area        Transistors
INPUTGEN129BY9    801 µm     1.36 mm²    105k
ADDER64           663 µm     1.13 mm²    89k
SAMPLER65BY       499 µm     0.85 mm²    62k
Total             1963 µm    3.3 mm²     257k

Each block and the total also carry a throughput label in GHz. [Figure labels: QDI Sequential Decoder (Session VI, 10:30am, Thu, Apr/22); STFB 64-bit Adder; die 3733 µm by 5483 µm, 20.5 mm²; pins.] Effort: ~6 man-months for the library and ~6 man-months for the design.
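The block figures on this slide can be cross-checked against the totals. The sketch below assumes that the widths, areas and transistor counts appear in the same order as the block names (input generator, adder, sampler); that pairing is inferred from the flattened slide text, and the dictionary keys are hypothetical labels.

```python
# Per-block figures, paired with block names in order of appearance
# (assumption: the slide lists them in the same order).
BLOCKS = {
    "input_generator": {"width_um": 801, "area_mm2": 1.36, "ktransistors": 105},
    "adder64":         {"width_um": 663, "area_mm2": 1.13, "ktransistors": 89},
    "sampler":         {"width_um": 499, "area_mm2": 0.85, "ktransistors": 62},
}


def total(field):
    """Sum one field over the three blocks."""
    return sum(block[field] for block in BLOCKS.values())


def block_height_mm(name):
    """Implied block height: area divided by width."""
    block = BLOCKS[name]
    return block["area_mm2"] / (block["width_um"] / 1000.0)


def die_area_mm2(width_um, height_um):
    """Area of a rectangle given in micrometres, returned in mm^2."""
    return (width_um * height_um) / 1e6


if __name__ == "__main__":
    print("total width (um):", total("width_um"))
    print("total area (mm^2):", round(total("area_mm2"), 2))
    print("total transistors (k):", total("ktransistors"))
    for name in BLOCKS:
        print(name, "height (mm):", round(block_height_mm(name), 2))
    print("die area (mm^2):", round(die_area_mm2(3733, 5483), 1))
```

Under that pairing the three widths add up exactly to the stated total width, the implied block heights agree with one another, and the die dimensions multiply out to the stated die area after rounding.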

Slide 30: Summary and Conclusions. Performance: STFB 2-D pipelining yields ultra-high performance. Design time: the back-end flow achieves ASIC design time. Availability: the cell library has been made freely available. Future work: characterize and extend the library; static timing analysis and sign-off.

Slide 31: Efharisto! (Thank you!)

Slide 32: STFB Standard-Cell Design, dynamic worst-case direct-path current analysis (STFB buffer pipeline at 2 GHz). Non-overlapping drive gives less direct-path current than an inverter. TSMC 0.25 µm, widths in µm and all lengths 0.24 µm. [Figure: chain of four STFB buffers (L, Sx, R, RCD, A) with a 1 mm wire.]

Slide 33: Input Generator Block. [Figure: 9-stage ring with a bit generator (BG; ports out, in, go); sequence label ,0,0,1,0,0… Legend: STFB2_BITGEN (bit generator); STFB2_MERGENC (non-conditional merge stage); STFB2_FORK (fork stage); STFB2_BUFFER (buffer stage); STFB2_XOR2 (2-input xor stage).]

Slide 34: Eτ² Comparison, STFB vs. WCHB. The STFB buffer is ~3x more efficient than the WCHB buffer.

Slide 35: Demonstration chip, top layout. TSMC 0.25 µm, MOSIS, Mar/22/04.

Block             Width      Area        Transistors
INPUTGEN129BY9    801 µm     1.36 mm²    105k
ADDER64           663 µm     1.13 mm²    89k
SAMPLER65BY       499 µm     0.85 mm²    62k
Total             1963 µm    3.3 mm²     257k

Each block and the total also carry a throughput label in GHz. Pins: 7 Vdd and 7 Gnd pins; 12 In/Out, 8 Input and 3 pad supply pins; 7 Vdd and 7 Gnd pins. Total: 51 pins.

Slide 36: Test chip design, top chip layout. TSMC 0.25 µm, MOSIS, Mar/22/04. [Figure labels: QDI Sequential Decoder (Session VI, 10:30am, Thu); STFB 64-bit Adder; die 3733 µm by 5483 µm, 20.5 mm²; pins.]