Wagging Logic: Moore's Law will eventually fix it

Presentation transcript:

Wagging Logic: Moore's Law will eventually fix it. Charlie Brej, APT Group, University of Manchester. 14/07/2019, Group Talk.

Introduction: Quasi-Delay-Insensitive (QDI) approach; prove the high-performance potential. What is performance? Latency and throughput. Why is async better? Average-case performance; variability and data dependence; bit-level pipelining.

Forward Safe Guarding: ensure all wire pairs are cycled up and down (QDI). [Diagram with C-elements]

Behaviour: viewpoint of a single output; many inputs.

Behaviour: all or nothing; synchronises inputs together.
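The "all or nothing" behaviour is what a Muller C-element provides. A minimal sketch, assuming the slide refers to a standard C-element (the function name `c_element` and the three-input trace are illustrative, not taken from the talk):

```python
def c_element(inputs, previous):
    """Muller C-element: follow the inputs only when they all agree."""
    if all(inputs):
        return 1          # every input is high: output rises
    if not any(inputs):
        return 0          # every input is low: output falls
    return previous       # inputs disagree: hold the last output


out = 0
trace = []
# Inputs rise one by one, then fall one by one.
for inputs in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1), (0, 0, 0)]:
    out = c_element(inputs, out)
    trace.append(out)

print(trace)  # [0, 0, 0, 1, 1, 0]
```

The output only rises once the last input has risen and only falls once the last input has fallen, which is how the inputs end up synchronised.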

Why is it so slow? Delays: gate = 1, C-element = 2. Stage data propagation: X. Cycle time (times 2 for set and reset): forward guarding: 2X (a C-element for each gate); acknowledge propagation: 2X (a C-element for each fork; fork depth ~ gate depth). About eight times slower than worst case!
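The "eight times" figure can be reproduced from the delays listed on the slide. This is one reading of the arithmetic, assuming the forward-guarding and acknowledge terms simply add and the sum is paid once for set and once for reset (the function name `four_phase_cycle` is hypothetical):

```python
def four_phase_cycle(x, phases=2):
    """Cycle time of a guarded four-phase stage, in gate delays.

    x      -- stage data propagation delay (the slide's X)
    phases -- 2, because every wire is set and then reset
    """
    forward_guarding = 2 * x   # a C-element (delay 2) for each gate (delay 1)
    acknowledge = 2 * x        # a C-element per fork, fork depth ~ gate depth
    return phases * (forward_guarding + acknowledge)


x = 10
cycle = four_phase_cycle(x)
print(cycle / x)  # 8.0
```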

Why is four-phase so slow? Low latency, low throughput. Only 1/8th of the system is doing useful work; the rest is resetting/completing. [Figure: pipeline stages labelled "Workie" and "Sleepy", one Workie for every seven Sleepy]

Solutions: (1) Ultra/Hyper/Super pipelining: needs 8 times finer pipelining; impossible; each latch adds to the latency. (2) Faster completion detection: balanced trees of C-elements; arranging to suit arrival order; backward guarding; not even close to an 8x improvement.

Inspiration: Wagging Latches. Alternate latch read/write; capacity of two latches, depth of one latch.
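A behavioural sketch of the wagging-latch idea, assuming two latches in parallel with the writer and the reader each alternating between them. The class and method names are hypothetical and the model ignores handshaking detail:

```python
class WaggingLatch:
    """Two parallel latches; writes and reads alternate between them."""

    def __init__(self):
        self.slots = [None, None]
        self.full = [False, False]
        self.write_to = 0    # latch that takes the next write
        self.read_from = 0   # latch that supplies the next read

    def write(self, value):
        if self.full[self.write_to]:
            return False                 # target latch still holds data
        self.slots[self.write_to] = value
        self.full[self.write_to] = True
        self.write_to ^= 1               # wag to the other latch
        return True

    def read(self):
        if not self.full[self.read_from]:
            return None                  # nothing to read yet
        value = self.slots[self.read_from]
        self.full[self.read_from] = False
        self.read_from ^= 1              # wag to the other latch
        return value


w = WaggingLatch()
accepted = [w.write("a"), w.write("b"), w.write("c")]
first = w.read()
w.write("c")
rest = [w.read(), w.read()]
print(accepted, first, rest)  # [True, True, False] a ['b', 'c']
```

Two values can be held at once (capacity of two latches), yet each value passes through only one latch on its way out (depth of one latch), and order is preserved.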

Wagging Logic: apply the same method to the logic. Alternate the logic, allowing one copy to set while the other resets (precharges). [Figure labels: Set / Reset, then Reset / Set]
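The effect on throughput can be illustrated with a toy timing model. This is not the simulator used in the talk. It assumes work is handed to the copies in round-robin order, that a copy is unavailable while it resets, and that the next item can be issued as soon as the previous result is out. The function `completed` and its parameters are hypothetical:

```python
def completed(window, set_time, reset_time, copies):
    """Count results produced within `window` time units."""
    free_at = [0] * copies   # when each copy has finished resetting
    issue_time = 0           # earliest time the next item may start
    done = 0
    k = 0
    while True:
        copy = k % copies                     # round-robin choice of copy
        start = max(issue_time, free_at[copy])
        finish = start + set_time             # set phase produces the result
        if finish > window:
            break
        done += 1
        free_at[copy] = finish + reset_time   # copy must reset before reuse
        issue_time = finish
        k += 1
    return done


print(completed(1000, 10, 10, copies=1))  # 50
print(completed(1000, 10, 10, copies=2))  # 100
```

With one copy every result pays for both set and reset. With two copies the reset of one is hidden behind the set of the other, so the count doubles in this model.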

Wagging Logic: between wagging stages there is no need to wag and no need to synchronise. Wag only when communicating with non-wagging logic.

Non FIFO Example

Duplicate the Logic

Connect to Complementary

A Harder Example

Duplicate the Logic

Connect to Complementary

Triplicate the Logic

Connect to the next on the list
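One reading of "connect to complementary" and "connect to the next on the list" is that feedback from copy i goes to copy (i + 1) mod n, which for two copies is simply the other copy. The sketch below assumes that reading. The helper names are hypothetical and the Fibonacci loop only illustrates state travelling around the ring:

```python
def wag_connections(copies):
    """Map each copy to the copy that receives its feedback output."""
    return {i: (i + 1) % copies for i in range(copies)}


def fibonacci_ring(copies, steps):
    """Pass Fibonacci state around the ring; record (copy, value) pairs."""
    nxt = wag_connections(copies)
    state = (0, 1)
    active = 0
    produced = []
    for _ in range(steps):
        a, b = state
        produced.append((active, a))
        state = (b, a + b)       # result is handed on as the next input
        active = nxt[active]     # the next copy on the list takes over
    return produced


print(wag_connections(2))  # {0: 1, 1: 0}
print(wag_connections(3))  # {0: 1, 1: 2, 2: 0}
print(fibonacci_ring(3, 6))
# [(0, 0), (1, 1), (2, 1), (0, 2), (1, 3), (2, 5)]
```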

Other example

Proof of the pudding: simple gate-level simulation using my own simulator; delays: C-element = 2, gate = 1. Example circuits: Fibonacci sequence generators (vertically pipelined 64-bit ripple-carry adder; non-pipelined 8-bit ripple-carry adder); 16-input XOR. Backward and forward guarded. Relative measurements of speed, power and area over a 10,000 gate-delay simulation.

64-bit Fibonacci Performance. Synchronous worst case: 74.

8-bit Fibonacci Performance. Synchronous worst case: 500.

XOR Performance. Synchronous worst/best case: 1250 (8 gate delays); inc. flip-flop: 1000 (10 gate delays); inc. timing margins.
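The synchronous XOR figures are consistent with counting operations completed in the 10,000 gate-delay simulation window. That interpretation is an inference from the numbers, not something the slides state, and the helper name is hypothetical:

```python
WINDOW = 10_000  # simulation length in gate delays, from the earlier slide


def operations_in_window(cycle_gate_delays, window=WINDOW):
    """Whole operations that fit in the window at a fixed cycle time."""
    return window // cycle_gate_delays


print(operations_in_window(8))   # 1250
print(operations_in_window(10))  # 1000
```

Under the same assumption, the Fibonacci figures of 74 and 500 would correspond to cycle times of roughly 135 and 20 gate delays respectively.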

Power Consumption. Synchronous: 610.

Area

Future work: larger and more complex designs (small CPU; layout; silicon?). Improve completion time: current optimal wagging ~ 5, target ~ 3. Fully automated flow: Verilog input and output; partitioning.

Conclusions: matching and surpassing synchronous performance every time; DI logic for performance. Very expensive: 20 times more power; 5 times bigger (times wagging). Fastest logic on the planet! Discounting the increase in wire delays; assuming other things will be able to keep up.