Circuit Design for SRCMOS Asynchronous Wave Pipelines, Oliver Hauck

Presentation transcript:

Circuit Design for SRCMOS Asynchronous Wave Pipelines
Oliver Hauck
Integrated Circuits and Systems Lab, Departments of Computer Science and Electrical Engineering, Darmstadt University of Technology

2 Outline
- Pipelines: synchronous, asynchronous, wave pipelined, and asynchronous wave pipelined (AWP)
- Comparison: AWPs vs. sync, async, and sync wave pipes
- AWP circuit design
- Conclusion

3 Pipelining
- Pipelining is used as the premier technique to better exploit hardware and boost the performance of VLSI chips
- Clocking overhead presents a serious threat to deeply pipelined systems built on sub-micron CMOS processes running at GHz frequencies

4 General Framework for Pipelines
[Figure: logic blocks separated by latches/registers; signals Data, Clk]

5 Some Notations...

6 General Relations

7 Synchronous Pipeline
[Figure: logic blocks separated by latches/registers; signals Data, Clk]
- Throughput is determined by the longest logic path plus clock/register overhead
- Fine-grain pipelining allows high throughput at the cost of increased clock/register overhead
Negative side effects of gate-level pipelining:
- Increased latency, clock load/skew, power, area, and design time
- More area for clocking and registers than for logic
Implementation options:
- Register- vs. latch-based, explicit latches vs. latchless
- TSPC (true single-phase clocking) vs. local clocks derived from a global clock
- Static vs. dynamic, single-ended vs. dual-rail
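
To make the first two bullets concrete, here is a minimal first-order sketch. It assumes a block of logic with total delay D split into n equal stages, each paying a fixed clock/register overhead t_ovh; the function name and the example numbers are hypothetical, not taken from the slides.

```python
# Hypothetical first-order model: total logic delay d_logic split into
# n_stages equal stages, each paying a fixed clock/register overhead t_ovh
# (setup + clock-to-Q + skew margin). Times in ps.

def sync_pipeline(d_logic: float, n_stages: int, t_ovh: float) -> tuple[float, float]:
    """Return (cycle_time, latency) of an n-stage synchronous pipeline."""
    if n_stages < 1:
        raise ValueError("need at least one stage")
    cycle = d_logic / n_stages + t_ovh  # slowest stage plus overhead
    latency = n_stages * cycle          # equals d_logic + n_stages * t_ovh
    return cycle, latency


if __name__ == "__main__":
    D, T_OVH = 2000.0, 100.0  # assumed example values (ps)
    for n in (1, 2, 4, 8, 16, 20):
        cycle, latency = sync_pipeline(D, n, T_OVH)
        print(f"{n:2d} stages: cycle {cycle:6.1f} ps, "
              f"{1e3 / cycle:5.2f} GHz, latency {latency:6.1f} ps")
```

With these assumed numbers, going from 1 to 20 stages cuts the cycle from 2100 ps to 200 ps but raises latency from 2100 ps to 4000 ps, and throughput can never exceed 1/t_ovh (10 GHz here): this is the clock/register overhead the slide warns about.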

8 Asynchronous Pipeline
[Figure: Micropipeline (Sutherland 1989): logic stages with handshake control; signals Data, req_in, ack_in, req_out, ack_out]
- The synchronous clock is replaced by asynchronous handshaking
- Elastic operation: input and output rates may differ momentarily, and the pipeline buffers the difference
- Plug & play composability
- The load on the req and ack lines is distributed
- Used by Furber's group at the University of Manchester for AMULET1/2/3
- Operation is data dependent, which saves power during idle periods
- As with fine-grain synchronous pipelines, throughput can be high; however, the handshake causes high latency and backward stalls
Implementation options:
- 4-phase (level) vs. 2-phase (event) protocol
- Bundled data (matched delay) vs. completion detection
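
The 4-phase and 2-phase protocols differ in how many transitions each data item costs. A minimal sketch, assuming a trace is simply a list of event names (a hypothetical format chosen for illustration), checks that a trace obeys the protocol order and counts completed transfers:

```python
# Hypothetical trace format: a list of event names in the order they occur.
FOUR_PHASE = ("req+", "ack+", "req-", "ack-")  # return-to-zero (level) protocol
TWO_PHASE = ("req~", "ack~")                   # '~' = toggle (event) protocol


def follows_protocol(trace: list[str], cycle: tuple[str, ...]) -> bool:
    """True if the trace is a prefix of the endlessly repeated protocol cycle."""
    return all(ev == cycle[i % len(cycle)] for i, ev in enumerate(trace))


def items_transferred(trace: list[str], cycle: tuple[str, ...]) -> int:
    """Number of completed handshakes (one per data item) in a valid trace."""
    if not follows_protocol(trace, cycle):
        raise ValueError("protocol violation")
    return len(trace) // len(cycle)


if __name__ == "__main__":
    four = list(FOUR_PHASE) * 3
    two = list(TWO_PHASE) * 3
    print(items_transferred(four, FOUR_PHASE), len(four))  # 3 12
    print(items_transferred(two, TWO_PHASE), len(two))     # 3 6
```

A 4-phase handshake costs four transitions per item (including the return to zero), a 2-phase handshake only two, at the price of event-based rather than level-based control logic.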

9 Synchronous Wave Pipeline
[Figure: wave logic between latches/registers; signals Data, Clk]
- Several data waves are simultaneously active in the logic
- The logic has to minimize delay variations over P, T, V corners
- A global clock is used with constructive skew to adjust phases
- Wave pipelining potentially gives higher throughput than conventional pipelines, at decreased latency and with reduced clock load, area, and power
- However, tuning the logic and the delay elements is difficult

10 Wave Pipelining: A Short Outline
- Wave pipelining occurs when combinational logic is clocked faster than its latency would allow
- Several data waves are then active in the logic without being separated by storage elements
- Latency remains constant, and throughput is determined by delay differences rather than by absolute delay
- The requirement for delay-balanced logic and the complicated timing are the main hurdles
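
The claim that throughput depends on delay differences can be written as a first-order bound. The sketch below uses a simplified form of the constraint commonly quoted in the wave-pipelining literature, with register delays folded into d_max and d_min; the parameter names and example values are assumptions:

```python
import math


def min_wave_period(d_max, d_min, t_setup, t_hold, skew=0.0):
    """First-order lower bound on the clock period of a wave pipeline.

    The fastest path of the next wave (d_min) must not reach the output latch
    before the slowest path of the current wave (d_max) has been captured;
    uncontrolled clock skew is counted once for each of the two edges.
    """
    return (d_max - d_min) + t_setup + t_hold + 2.0 * skew


def waves_in_flight(d_max, period):
    """Upper estimate of how many data waves coexist in the logic."""
    return math.ceil(d_max / period)


if __name__ == "__main__":
    T = min_wave_period(d_max=2000.0, d_min=1800.0, t_setup=50.0, t_hold=50.0)
    print(T, waves_in_flight(2000.0, T))  # 300.0 7
```

For d_max = 2000 ps and d_min = 1800 ps with 50 ps setup and hold, the bound gives 300 ps instead of the roughly 2000 ps a single unpipelined block would need, with up to 7 waves in flight.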

11 Wave Pipelining: A Little History
- The technique stems from the 1960s and has had a reputation for being exotic ever since
- Wave pipelining was long dead before being revived by W. Burleson (U. Mass.), M. Flynn (Stanford U., PhDs by Wong, Klass, and Nowka), and C. Gray at NCSU
- Some working academic chips exist, mainly datapaths
- Some commercial memory is wave pipelined (e.g. the ULTRA-III cache), but, as far as we know, no logic

12 Asynchronous Wave Pipeline (AWP)
[Figure: wave logic between wave latches; Data path, with req_in passed to req_out through a matched delay]
- Data words are associated with events on the request line
- Several data waves and protocol events are simultaneously active in the logic and in the matched delay element, respectively
- The AWP is a special case of the synchronous wave pipeline, with the constructive skew set to the worst-case logic delay
- It is crucial that the delay element accurately tracks the delay behaviour of the logic over P, T, V corners
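
The last bullet amounts to a per-corner check. A minimal sketch with hypothetical delays for three PTV corners computes how much later than the slowest data the request event reaches the wave latch; the corner names, delays, and setup time are all assumed values:

```python
# Hypothetical per-corner delays in ps: (slowest logic path, matched delay).
CORNERS = {
    "fast": (700.0, 760.0),
    "typical": (1000.0, 1080.0),
    "slow": (1500.0, 1560.0),
}


def matched_delay_margins(corners: dict[str, tuple[float, float]],
                          t_setup: float) -> dict[str, float]:
    """Setup margin of the request event at the wave latch, per corner.

    A negative value means the latch would fire before the slowest data
    of the wave has settled.
    """
    return {name: d_match - (d_max + t_setup)
            for name, (d_max, d_match) in corners.items()}


if __name__ == "__main__":
    margins = matched_delay_margins(CORNERS, t_setup=40.0)
    print(margins)
    print("tracks over all corners:", all(m >= 0 for m in margins.values()))
```

All three assumed corners pass with 20 to 40 ps of margin; a delay element that slowed down less than the logic at the slow corner (e.g. 1520 ps instead of 1560 ps) would already violate setup.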

13 AWPs vs. Synchronous Pipelines
- No global clock; instead, a local clock (the request) is fed through the pipeline and obeys a simple asynchronous protocol, i.e. data is associated with an event on the request line
- Many pipeline registers are removed, so the requirements on the clock (request) are relaxed
- Synchronous pipelines can reach the throughput of AWPs only at excessive cost in area, power, and latency

14 AWPs vs. Asynchronous Pipelines
- AWPs deliberately sacrifice the ack and keep only the req to avoid protocol overhead
- AWPs are not elastic: data at the output has to be consumed
- AWPs eliminate hazards as a side effect of delay balancing
- AWPs have in common with other asynchronous methodologies: data-dependent operation (avoids redundant transitions), composability (though inelastic), and no global clock

15 AWPs vs. Synchronous Wave Pipelines
AWPs tackle two main difficulties of synchronous wave pipes:
- Replacing the constructive skew by the worst-case delay removes the double-sided timing constraint; i.e., in contrast to synchronous wave pipes, AWPs operate at any rate up to their maximum
- Using dynamic self-resetting logic controls delay variation and does not impact latency much
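
The difference between the two timing regimes can be checked numerically. The sketch below assumes a single wave-pipelined stage with `waves` cycles of latency, ideal clocks apart from the constructive skew, and, for the AWP, a matched delay of at least d_max + t_setup; all names and numbers are illustrative:

```python
import math


def sync_wave_period_range(d_max, d_min, skew, waves, t_setup, t_hold):
    """Valid clock periods (lo, hi) of a synchronous wave pipeline stage.

    Data launched at t = 0 is captured by the output latch edge at
    waves * T + skew (skew = constructive clock skew):
      setup: waves * T + skew >= d_max + t_setup
      hold:  waves * T + skew + t_hold <= T + d_min  (next wave launched at T)
    The stage works only for lo <= T <= hi.
    """
    lo = (d_max + t_setup - skew) / waves
    if waves == 1:
        # hold no longer depends on T
        hi = math.inf if skew + t_hold <= d_min else -math.inf
    else:
        hi = (d_min - t_hold - skew) / (waves - 1)
    return lo, hi


def awp_min_period(d_max, d_min, d_match, t_setup, t_hold):
    """Smallest valid request spacing of an AWP stage; there is no upper bound.

    The latch event is the request delayed by d_match, so it is tied to the
    launch of the same wave rather than to a global clock edge:
      setup: d_match >= d_max + t_setup  (independent of T)
      hold:  d_match + t_hold <= T + d_min
    """
    if d_match < d_max + t_setup:
        raise ValueError("matched delay shorter than worst-case logic delay")
    return d_match + t_hold - d_min
```

With d_max = 2000 ps, d_min = 1800 ps, 50 ps setup and hold, and 4 waves in flight, a skew of 800 ps leaves the synchronous stage a valid window of only 312.5 to about 316.7 ps, so slowing the clock to 400 ps already breaks it. The AWP stage with d_match = 2050 ps works for every request spacing of 300 ps or more.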

16 Wave Pipelining Combinational Logic
- Overall goal: keep the data wave coherent under all possible conditions (data, PTV)
- Desirable architecture features: most logic paths have the same depth, and fan-in/fan-out is the same everywhere
- First step: pad all short paths to the maximum length
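
Padding short paths is a levelization problem on the gate netlist. A minimal sketch, assuming every gate and every inserted buffer has the same delay and using a hypothetical netlist format (gate mapped to the list of gates driving it), computes how many buffers each connection needs:

```python
from graphlib import TopologicalSorter


def buffers_to_balance(fanin: dict[str, list[str]], outputs: list[str]):
    """Return (level per gate, buffers per edge) so that every path from a
    primary input to a primary output crosses the same number of stages.

    fanin maps each gate to its driver gates; primary inputs map to [].
    The pseudo-node "OUT" marks the common output boundary.
    """
    level = {}
    for g in TopologicalSorter(fanin).static_order():  # drivers come first
        drivers = fanin.get(g, [])
        level[g] = 0 if not drivers else 1 + max(level[d] for d in drivers)
    depth = max(level[o] for o in outputs)
    buffers = {}
    for g, drivers in fanin.items():
        for d in drivers:
            buffers[(d, g)] = level[g] - level[d] - 1  # stages skipped on this edge
    for o in outputs:
        buffers[(o, "OUT")] = depth - level[o]  # early outputs are padded too
    return level, buffers


if __name__ == "__main__":
    net = {"a": [], "b": [], "c": [], "g1": ["a", "b"], "g2": ["g1", "c"]}
    level, buf = buffers_to_balance(net, outputs=["g2", "g1"])
    print({e: n for e, n in buf.items() if n})
```

In the example, input c needs one buffer before g2, and the early output g1 needs one buffer so that both outputs leave at depth 2.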

17 Example: 64-bit Brent-Kung Parallel Adder
[Figure: adder columns labelled pg, PG, G, xor]
- Buffers provide the same depth on every logic path
- All gates in the same column must have the same delay
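
For reference, the adder's logic can be sketched behaviourally. This Python model is a textbook Brent-Kung prefix adder, not the author's netlist; reading the slide's column labels pg, PG, G, and xor as the bitwise generate/propagate, up-sweep, down-sweep, and sum stages is an assumption:

```python
def brent_kung_add(a: int, b: int, n: int = 64) -> tuple[int, int]:
    """Add two n-bit numbers with a Brent-Kung parallel-prefix carry tree.

    Returns (sum mod 2**n, carry_out); carry-in is 0 and n must be a power of two.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    # pg stage: bitwise generate and propagate
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G, P = g[:], p[:]

    def combine(i: int, j: int) -> None:
        # merge group (G, P)[i] with the adjacent lower group (G, P)[j]
        G[i], P[i] = G[i] | (P[i] & G[j]), P[i] & P[j]

    # up-sweep: group terms over spans of 2, 4, 8, ... bits
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            combine(i, i - d)
        d *= 2
    # down-sweep: fill in the prefixes the up-sweep skipped
    d = n // 4
    while d >= 1:
        for i in range(3 * d - 1, n, 2 * d):
            combine(i, i - d)
        d //= 2
    # xor stage: G[i] is now the carry out of bits i..0
    s = 0
    for i in range(n):
        carry_in = G[i - 1] if i > 0 else 0
        s |= (p[i] ^ carry_in) << i
    return s, G[n - 1]


if __name__ == "__main__":
    print(hex(brent_kung_add(0x1234, 0x0FFF, 16)[0]))  # 0x2233
```

The model only checks the arithmetic; padding every path to the same number of prefix levels, as the slide requires, is a separate netlist step.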

18 Circuits
- The logic style used has to minimize delay variation
- Earlier work focused on bipolar logic (ECL, CML), but CMOS is mainstream
- Static CMOS is not well suited for wave pipelining; fixing the problem results in more power and lower speed
- Pass-transistor logic gives slow, sloped edges, thereby introducing delay variation
- Dynamic logic is attractive, as only the output high transition is data dependent; the output pull-down is done by the precharge

19 Circuits (cont.)
- Using dynamic logic as in Burleson's Wave Domino jeopardizes the concept, as it needs fine-grain precharge
- What is needed is a dynamic logic family without precharge overhead: SRCMOS (self-resetting CMOS)
- Work done at IBM: the classic paper by Chappell et al., "A 2-ns Cycle, 3.8-ns Access 512-kb CMOS ECL SRAM with a Fully Pipelined Architecture," JSSC 26(11), 1991; or, more recently, Hwang et al., "Implementation of a Self-Resetting CMOS 64-Bit Parallel Adder with Enhanced Testability," JSSC 34(8), 1999

20 SRCMOS
[Figure: SRCMOS gate with N inputs driving an NMOS tree, and output]
- Distinguishing property of our SRCMOS circuits: the precharge feedback is fully local, and the NMOS trees are delay balanced

21 Operation of a 2-AND
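
Since the 2-input AND operation is only shown as waveforms, here is an idealized behavioural sketch of a self-resetting gate. It assumes hypothetical parameters t_eval (input-to-output delay) and t_reset (the local feedback delay that sets the output pulse width) and ignores charge sharing, keeper effects, and the first-pulse problem:

```python
from typing import Optional

Pulse = tuple[float, float]  # (rise time, fall time) in ps


def srcmos_and(inputs: list[Pulse], t_eval: float, t_reset: float,
               t_min_overlap: float = 0.0) -> Optional[Pulse]:
    """Idealized behaviour of a self-resetting dynamic AND gate.

    The series NMOS tree discharges the precharged node only while all input
    pulses overlap. The output then rises t_eval after the last input arrives,
    and the local precharge feedback pulls it back t_reset later, so the output
    pulse width does not depend on the input pulse widths.
    """
    start = max(rise for rise, _ in inputs)  # last input to rise completes the pull-down path
    end = min(fall for _, fall in inputs)    # first input to fall breaks it again
    if end <= start or end - start < t_min_overlap:
        return None                          # node stays precharged, output stays low
    out_rise = start + t_eval
    return out_rise, out_rise + t_reset


if __name__ == "__main__":
    print(srcmos_and([(0.0, 100.0), (20.0, 150.0)], t_eval=30.0, t_reset=80.0))  # (50.0, 130.0)
```

In this model the output pulse width is set by t_reset alone, which is how SRCMOS regenerates a fresh pulse at every gate output.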

22 Delay Balancing at Transistor Level
- The NMOS tree is designed so that the precharge node is pulled down by a constant number of series devices
- Short paths are padded with dummy devices
- Delay variation is minimal when exactly one path is on, i.e. wide-fanin ORs are hard to use
- Every output has to see the same load
- Lightly loaded outputs are given dummy capacitance

23 Example: Carry tree in a 64-bit adder

24 Gim Layout

25 Simulation of Gim cell
- The output pulses for the 4 possible input situations giving '1' at the output are tightly matched
- Note: in this case, Pxy = Gxy = 1 never occurs

26 First Pulse Problem

27 Miller Effect

28 64-bit Adder Output Waveforms
[Figure: adder output waveforms with the latching window marked]
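
The latching window is the interval in which a single latch event sees every output pulse of one wave. A minimal sketch, assuming the (rise, fall) times of all output bits that evaluate to '1' are known from simulation, and using hypothetical setup and hold times:

```python
def latching_window(pulses, t_setup, t_hold):
    """Common window in which one latch event captures every output pulse.

    pulses: (rise, fall) times of the output bits of one wave that evaluate
    to '1'. A latch event at time t is safe if every pulse is already high at
    t - t_setup and still high at t + t_hold.
    Returns (earliest, latest) latch time, or None if no window exists.
    """
    earliest = max(rise for rise, _ in pulses) + t_setup
    latest = min(fall for _, fall in pulses) - t_hold
    return (earliest, latest) if earliest <= latest else None


if __name__ == "__main__":
    # hypothetical pulse times in ps
    print(latching_window([(100, 180), (110, 200), (95, 170)], 10, 10))  # (120, 160)
```

The window shrinks as the pulse arrival spread grows, which is why the output pulses of all bits have to stay tightly aligned.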

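29 Transistor Sizing
[Figure: SRCMOS gate with N inputs and output, annotated with Wpd, Wkeeper, Wprecharge, Cdrive, Cload, Cfeedback]
Linear sizing:
$W_{pd} / C_{drive} = \text{const}$
$C_{drive} / (C_{load} + C_{feedback} + W_{keeper}) = \text{const}$
$C_{feedback} / W_{precharge} = \text{const}$
$W_{precharge} / C_{drive} = \text{const}$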
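
Because all four rules are constant ratios, every size scales linearly with the load. A minimal sketch solves them for one gate; the k_* constants are hypothetical design parameters, widths and capacitances are treated in one normalized unit, and the keeper ratio W_keeper / C_drive is an extra assumption since the slide gives no rule for it:

```python
from dataclasses import dataclass


@dataclass
class GateSizes:
    c_drive: float
    w_pd: float
    w_precharge: float
    c_feedback: float
    w_keeper: float


def linear_sizing(c_load, k_pd, k_drive, k_fb, k_pre, k_keep):
    """Solve the constant-ratio sizing rules for one SRCMOS gate.

      W_pd / C_drive                                = k_pd
      C_drive / (C_load + C_feedback + W_keeper)    = k_drive
      C_feedback / W_precharge                      = k_fb
      W_precharge / C_drive                         = k_pre
      W_keeper / C_drive                            = k_keep  (assumption)
    """
    # substitute C_feedback = k_fb * k_pre * C_drive and W_keeper = k_keep * C_drive
    denom = 1.0 - k_drive * (k_fb * k_pre + k_keep)
    if denom <= 0:
        raise ValueError("ratios admit no finite solution")
    c_drive = k_drive * c_load / denom
    w_pre = k_pre * c_drive
    return GateSizes(c_drive=c_drive, w_pd=k_pd * c_drive, w_precharge=w_pre,
                     c_feedback=k_fb * w_pre, w_keeper=k_keep * c_drive)


if __name__ == "__main__":
    for c_load in (10.0, 20.0):
        print(linear_sizing(c_load, k_pd=2.0, k_drive=0.25, k_fb=0.5, k_pre=0.4, k_keep=0.1))
```

Doubling C_load doubles every size, so a gate can be sized once and scaled to its load.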

30 Interconnect: Resistive Effects
- 0.9 µm × 900 µm MET2 line parasitics: C = 116 fF, R = 70 Ω
[Figure: responses for the wire models C only, RC only, R/2, R/2, and R/3, R/3, R/3]
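
A quick way to see when the wire resistance matters is the Elmore delay of the line split into equal RC segments. This sketch uses the slide's R and C values but assumes a simple ladder model (each segment is R/n in series followed by C/n to ground), which may not match the exact netlists simulated on the slide:

```python
R_WIRE = 70.0     # ohms, 0.9 um x 900 um MET2 (from the slide)
C_WIRE = 116e-15  # farads (from the slide)


def elmore_ladder(r_total, c_total, n_segments, r_driver=0.0):
    """Elmore delay (s) at the far end of a wire split into n equal RC segments.

    Capacitor k (1-based) is charged through the driver resistance plus the
    k upstream wire segments.
    """
    r_seg, c_seg = r_total / n_segments, c_total / n_segments
    return sum((r_driver + k * r_seg) * c_seg for k in range(1, n_segments + 1))


if __name__ == "__main__":
    for n in (1, 2, 3, 100):
        print(n, f"{elmore_ladder(R_WIRE, C_WIRE, n) * 1e12:.2f} ps")
```

The wire alone then contributes 8.12 ps as a single RC section, 6.09 ps with two segments, and about 5.41 ps with three, tending toward RC/2 = 4.06 ps for a fully distributed line; a driver resistance R_d adds R_d · 116 fF on top.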

31 Interconnect: Coupling Effects
- 2 adjacent MET2 lines coupled by C = 54 fF
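
The coupling capacitance matters because its effective value depends on what the neighbouring line does (the Miller effect). A minimal sketch, assuming equal slews on both lines and a hypothetical ground capacitance for the victim line:

```python
C_COUPLE = 54e-15  # F, coupling between two adjacent MET2 lines (from the slide)

MILLER_FACTOR = {
    "same direction": 0.0,  # neighbour switches with the victim: no charge moved through Cc
    "quiet": 1.0,           # neighbour held steady: Cc acts like a grounded capacitor
    "opposite": 2.0,        # neighbour switches against the victim: twice the voltage swing on Cc
}


def effective_load(c_ground, c_couple=C_COUPLE):
    """Effective switched capacitance of the victim for each neighbour activity."""
    return {case: c_ground + k * c_couple for case, k in MILLER_FACTOR.items()}


if __name__ == "__main__":
    for case, c in effective_load(60e-15).items():  # 60 fF ground cap is hypothetical
        print(f"{case:15s} {c * 1e15:6.1f} fF")
```

With the slide's 54 fF, the victim's load varies by 108 fF depending on the neighbour's activity, a data-dependent delay variation that a wave pipeline has to budget for.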

32 PTV Variations
- SRCMOS provides some robustness by generating fresh pulses at every gate output
- Pulsed operation reduces data dependency and coupling
- PTV noise is not critical when the drift is in the same direction across the die
- Critical are: temperature gradients, supply drop, and local variations
- What is needed: a rule of thumb such as "For process X, to be on the safe side, keep the area between two latches below Y mm²"
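
One way such a rule of thumb could be derived is sketched below. It assumes a linear on-die temperature gradient and a linear temperature coefficient of delay, and treats the region between two latches as a square; all values are hypothetical placeholders, not process data:

```python
def max_latch_spacing(margin, d_path, temp_coeff, gradient):
    """Largest distance (mm) between the logic and its matched delay element.

    Assumes delay rises linearly with temperature (temp_coeff, fraction per K)
    and a linear on-die temperature gradient (K per mm). The worst-case
    mismatch temp_coeff * gradient * distance * d_path must stay below margin.
    """
    return margin / (temp_coeff * gradient * d_path)


if __name__ == "__main__":
    # hypothetical: 50 ps margin, 2000 ps path, 0.2 %/K, 5 K/mm
    d = max_latch_spacing(50.0, 2000.0, 0.002, 5.0)
    print(f"max spacing {d:.2f} mm, area about {d * d:.2f} mm^2")
```

With these assumed numbers, the logic and its matched delay must stay within 2.5 mm of each other, i.e. roughly a 6.25 mm² square between two latches; supply drop and local variation would tighten the bound further.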

33 Conclusion
- AWPs were presented as an alternative approach to high-speed design; they show potential for GHz throughput without clocks
- AWPs avoid some problems of conventional wave pipes and of (a)synchronous systems
- 64-bit adder + test circuit and EC crypto layout in progress
- Not covered here: feedback + controllers
- To do: support for transistor sizing