1 Clockless Logic Montek Singh Tue, Mar 23, 2004.

2 Outline
- Classic static logic pipeline: Sutherland
- Classic dynamic logic pipeline: Williams/Horowitz

3 A Classic Asynchronous Dynamic Pipeline
Williams and Horowitz’s PS0 pipeline:
- Structure
- Operation
- Performance

4 A Classic Approach: PS0 Pipeline
Williams/Horowitz (Stanford U.) [ ]:
- successfully used in fabricated chips [Stanford ’87] [HAL ’90s]
- implemented using “dynamic logic”
[Figure: three-stage PS0 pipeline (Stage 1, Stage 2, Stage 3); each stage is a processing block plus a completion detector; data flows from Data in to Data out, and an ack signal is returned to the previous stage]

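5 PS0 Pipeline Stage
A PS0 stage consists of dynamic gates and a completion detector.
[Figure: dynamic gate with a pull-down network, a “keeper” and a precharge control input (PC), taking data inputs and producing data outputs; the processing block's outputs feed the completion detector, which generates ack]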

6 Dual-Rail Completion Detector
- Combines dual-rail signals
- Indicates when all bits are valid (or reset)
- OR together the 2 rails of each bit
- Merge the per-bit results using a “C-element”
[Figure: one OR gate per bit (bit 0, bit 1, ..., bit n) feeding a C-element whose output is Done]
C-element:
- if all inputs = 1, output → 1
- if all inputs = 0, output → 0
- else, maintain output value
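A minimal behavioral sketch of this completion detector in Python. It assumes the common dual-rail convention in which (0, 0) is the spacer (reset) code and exactly one high rail encodes a valid bit; which rail stands for logic 1 is an arbitrary choice here. The names `CElement`, `CompletionDetector` and `encode` are hypothetical, chosen for illustration only.

```python
from typing import List, Tuple

# Assumed dual-rail convention: (t, f) = (0, 0) is the spacer (reset) code,
# (1, 0) encodes logic 1, (0, 1) encodes logic 0; (1, 1) is illegal and not modeled.
DualRailBit = Tuple[int, int]


class CElement:
    """Muller C-element: the output changes only when all inputs agree."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, inputs: List[int]) -> int:
        if all(x == 1 for x in inputs):
            self.out = 1
        elif all(x == 0 for x in inputs):
            self.out = 0
        # otherwise: maintain the previous output value
        return self.out


class CompletionDetector:
    """OR the two rails of each bit, then merge the per-bit results with a C-element."""

    def __init__(self) -> None:
        self.c = CElement(initial=0)

    def done(self, word: List[DualRailBit]) -> int:
        per_bit = [t | f for (t, f) in word]   # 1 if this bit carries valid data
        return self.c.update(per_bit)


def encode(bits: List[int]) -> List[DualRailBit]:
    """Encode ordinary bits in the assumed dual-rail format."""
    return [(1, 0) if b else (0, 1) for b in bits]


SPACER: List[DualRailBit] = [(0, 0)] * 3

cd = CompletionDetector()
print(cd.done(SPACER))                     # 0: all bits reset
print(cd.done([(1, 0), (0, 0), (0, 0)]))   # 0: only bit 0 valid, output held
print(cd.done(encode([1, 0, 1])))          # 1: all bits valid
print(cd.done([(0, 0), (0, 1), (1, 0)]))   # 1: partially reset, output held
print(cd.done(SPACER))                     # 0: all bits reset again
```

The C-element's hold behavior is what keeps Done from glitching while bits arrive (or reset) one at a time.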

7 PS0 Protocol
- PRECHARGE N: when N+1 completes evaluation
  (delete data only after the next stage has copied it)
- EVALUATE N: when N+1 completes precharging
  (accept new data only after the next stage is emptied)
[Figure: event sequence across stages N, N+1, N+2: N evaluates, N+1 evaluates, N+2 evaluates, N+2 indicates “done”, N+1 precharges, N+1 indicates “done”]
- Evaluate → Precharge: 3 events
- Precharge → Evaluate: another 3 events
- Complete cycle: 6 events
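To see how the two protocol rules produce a 6-event cycle, here is a minimal unit-delay simulation in Python. It is a sketch under assumptions that are not part of the PS0 design itself: every evaluation, precharge and completion-detector transition takes exactly one time step, each stage is abstracted to the token it holds (`None` for a spacer), the left environment offers a new token whenever allowed, and the right environment acknowledges after one step. `simulate_ps0` and `eval_times` are hypothetical names.

```python
from typing import List, Optional

Snapshot = List[Optional[int]]  # token id held by each stage; None = precharged (spacer)


def simulate_ps0(n_stages: int, steps: int, sink_acks: bool = True) -> List[Snapshot]:
    """Unit-delay model of a linear PS0 pipeline (illustrative assumptions only).

    Each stage's precharge control is the completion-detector (CD) output of the
    next stage: CD = 1 -> precharge, CD = 0 -> evaluate.
    """
    state: Snapshot = [None] * n_stages
    cd = [0] * n_stages          # cd[i] = 1 once stage i's outputs are all valid
    src: Optional[int] = None    # token offered by the (hypothetical) left environment
    sink_cd = 0                  # "done" signal from the (hypothetical) right environment
    next_id = 0
    history: List[Snapshot] = []

    for _ in range(steps):
        # All next values are computed from the current ones (synchronous update).
        new_state = list(state)
        for i in range(n_stages):
            ack = cd[i + 1] if i + 1 < n_stages else sink_cd
            inp = state[i - 1] if i > 0 else src
            if ack == 1:
                new_state[i] = None   # PRECHARGE N: N+1 has completed evaluation
            elif state[i] is None and inp is not None:
                new_state[i] = inp    # EVALUATE N: N+1 has completed precharging
            # otherwise the dynamic gate holds its value (keeper)

        new_cd = [0 if s is None else 1 for s in state]

        if cd[0] == 1:
            new_src = None            # stage 0 has copied the token: withdraw it
        elif src is None:
            new_src, next_id = next_id, next_id + 1   # offer a fresh token
        else:
            new_src = src

        new_sink_cd = 1 if (sink_acks and state[-1] is not None) else 0

        state, cd, src, sink_cd = new_state, new_cd, new_src, new_sink_cd
        history.append(list(state))
    return history


def eval_times(history: List[Snapshot], stage: int) -> List[int]:
    """Time steps (1-based) at which `stage` evaluates a new token."""
    times: List[int] = []
    prev: Optional[int] = None
    for t, snap in enumerate(history, start=1):
        tok = snap[stage]
        if tok is not None and tok != prev:
            times.append(t)
        prev = tok
    return times


hist = simulate_ps0(n_stages=6, steps=60)
print(eval_times(hist, 2)[:4])   # [4, 10, 16, 22]: one new token every 6 events
```

Stage 2 is inspected because it is away from both environment interfaces; with unit delays it accepts a new token every 6 steps, one per event in the sequence above.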

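8 PS0 Performance
Cycle time: $T_{PS0} = 3\,t_{eval} + 2\,t_{CD} + t_{prech}$
where $t_{eval}$, $t_{CD}$ and $t_{prech}$ are a stage's evaluation, completion-detection and precharge delays.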
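The six protocol events (three evaluations, two completion detections, one precharge) add up directly to the PS0 cycle time 3·t_eval + 2·t_CD + t_prech. The small Python helper below makes that sum explicit; `ps0_cycle_events` and `ps0_cycle_time` are hypothetical names, and the delay values in the example are made-up placeholders, not measurements from any chip.

```python
from typing import List, Tuple


def ps0_cycle_events(t_eval: float, t_cd: float, t_prech: float) -> List[Tuple[str, float]]:
    """The six events separating two successive evaluations of stage N."""
    return [
        ("N evaluates", t_eval),
        ("N+1 evaluates", t_eval),
        ("N+2 evaluates", t_eval),
        ("N+2 completion detector indicates done", t_cd),
        ("N+1 precharges", t_prech),
        ("N+1 completion detector indicates done (reset)", t_cd),
    ]


def ps0_cycle_time(t_eval: float, t_cd: float, t_prech: float) -> float:
    """Sum of the six event delays: 3*t_eval + 2*t_cd + t_prech."""
    return sum(d for _, d in ps0_cycle_events(t_eval, t_cd, t_prech))


print(ps0_cycle_time(1, 1, 1))        # 6: unit delays give the 6-event cycle
print(ps0_cycle_time(100, 150, 120))  # 720 (hypothetical delays in ps)
```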

9 Summary: PS0 Pipelining
Datapaths are latch-free:
- dynamic gates themselves provide implicit latches
- +: chip area savings
- +: extremely low latency
Data items are kept separate by control:
- a stage deletes data only after the next stage has copied it
- a stage accepts new data only if the next stage is empty
⇒ distinct data items are always separated by “spacers”
Control is extremely simple: each controller is a single wire
- the completion detector directly controls the previous stage
- +: chip area savings
- +: low control overhead

10 Comparison to a Clocked Pipeline
How would you design the pipeline if you actually had a clock?
1. Replace handshaking with “magic clocking”:
   - each stage gets its own clock
   - successive clocks are slightly skewed
   ⇒ essentially a clocked simulation of asynchronous handshaking; needs multiple clock phases!
2. Use a single clock, but insert latches between stages:
   - latches are simple, level-sensitive
   - consecutive stages receive complementary clock signals
[Figure: latch between stages, clocked by Ck and Ck’]

11 Comparison … (contd.) Cycle Times?

12 Drawbacks of PS0 Pipelining
1. Poor throughput:
   - long cycle time: 6 events per cycle
   - data “tokens” are forced far apart in time
2. Limited storage capacity:
   - at most 50% of stages can hold distinct tokens
   - data tokens must be separated by at least one spacer
Our Research Goals:
- address both issues
- still maintain very low latency
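The 50% storage limit can be reproduced with a unit-delay model of the PS0 rules by stalling the right environment so that it never acknowledges. This Python sketch rests on hypothetical assumptions (one time step per evaluation, precharge and completion-detector transition; a left environment that always offers a new token), and `stalled_ps0` is an illustrative name.

```python
from typing import List, Optional


def stalled_ps0(n_stages: int, steps: int = 100) -> List[Optional[int]]:
    """Run a unit-delay PS0 model whose right environment never acknowledges.

    Returns the final token id held by each stage (None = spacer).
    """
    state: List[Optional[int]] = [None] * n_stages
    cd = [0] * n_stages            # completion-detector outputs, one step behind state
    src: Optional[int] = None      # token offered by the left environment
    next_id = 0
    for _ in range(steps):
        new_state = list(state)
        for i in range(n_stages):
            ack = cd[i + 1] if i + 1 < n_stages else 0   # stalled sink: never "done"
            inp = state[i - 1] if i > 0 else src
            if ack == 1:
                new_state[i] = None          # next stage holds data: precharge (delete)
            elif state[i] is None and inp is not None:
                new_state[i] = inp           # next stage is empty: evaluate (accept)
        new_cd = [0 if s is None else 1 for s in state]
        if cd[0] == 1:
            new_src = None                   # stage 0 has copied the token
        elif src is None:
            new_src, next_id = next_id, next_id + 1
        else:
            new_src = src
        state, cd, src = new_state, new_cd, new_src
    return state


final = stalled_ps0(6)
print(final)   # [None, 2, None, 1, None, 0]
distinct = {t for t in final if t is not None}
print(f"{len(distinct)} of {len(final)} stages hold distinct tokens")   # 3 of 6
```

Once the pipeline settles, every stage holding a token is followed by a spacer, because a stage whose successor holds data is forced to precharge.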