COMP Clockless Logic and Silicon Compilers Lecture 3

Slides:



Advertisements
Similar presentations
Self-Timed Logic Timing complexity growing in digital design -Wiring delays can dominate timing analysis (increasing interdependence between logical and.
Advertisements

Andrey Mokhov, Victor Khomenko Danil Sokolov, Alex Yakovlev Dual-Rail Control Logic for Enhanced Circuit Robustness.
Sumitha Ajith Saicharan Bandarupalli Mahesh Borgaonkar.
Reading1: An Introduction to Asynchronous Circuit Design Al Davis Steve Nowick University of Utah Columbia University.
Clockless Logic System-Level Specification and Synthesis Ack: Tiberiu Chelcea.
Give qualifications of instructors: DAP
ELEC 256 / Saif Zahir UBC / 2000 Timing Methodology Overview Set of rules for interconnecting components and clocks When followed, guarantee proper operation.
CS 151 Digital Systems Design Lecture 19 Sequential Circuits: Latches.
Avshalom Elyada, Ran GinosarPipeline Synchronization 1 A Unique and Successfully Implemented Approach to the Synchronization Problem Based on the article.
Delay/Phase Regeneration Circuits Crescenzo D’Alessandro, Andrey Mokhov, Alex Bystrov, Alex Yakovlev Microelectronics Systems Design Group School of EECE.
Decoupled Pipelines: Rationale, Analysis, and Evaluation Frederick A. Koopmans, Sanjay J. Patel Department of Computer Engineering University of Illinois.
1 Clockless Logic  Recap: Lookahead Pipelines  High-Capacity Pipelines.
RTL Hardware Design by P. Chu Chapter 161 Clock and Synchronization.
Digital Integrated Circuits A Design Perspective
© Ran Ginosar Lecture 3: Handshake Ckt Implementations 1 VLSI Architectures Lecture 3 S&F Ch. 5: Handshake Ckt Implementations.
Digital Integrated Circuits© Prentice Hall 1995 Timing ISSUES IN TIMING.
1 Clockless Logic Montek Singh Thu, Jan 13, 2004.
1 Clockless Logic Montek Singh Tue, Mar 23, 2004.
Advances in Designing Clockless Digital Systems Prof. Steven M. Nowick Department of Computer Science Columbia University New York,
© Ran GinosarAsynchronous Design and Synchronization 1 VLSI Architectures Lecture 2: Theoretical Aspects (S&F 2.5) Data Flow Structures.
1 Clockless Logic Montek Singh Tue, Mar 16, 2004.
ELEC 6200, Fall 07, Oct 24 Jiang: Async. Processor 1 Asynchronous Processor Design for ELEC 6200 by Wei Jiang.
Lab for Reliable Computing Generalized Latency-Insensitive Systems for Single-Clock and Multi-Clock Architectures Singh, M.; Theobald, M.; Design, Automation.
VHDL Coding Exercise 4: FIR Filter. Where to start? AlgorithmArchitecture RTL- Block diagram VHDL-Code Designspace Exploration Feedback Optimization.
1 Clockless Logic Montek Singh Tue, Mar 21, 2006.
State Machines Timing Computer Bus Computer Performance Instruction Set Architectures RISC / CISC Machines.
High-Throughput Asynchronous Pipelines for Fine-Grain Dynamic Datapaths Montek Singh and Steven Nowick Columbia University New York, USA
1 Application Specific Integrated Circuits. 2 What is an ASIC? An application-specific integrated circuit (ASIC) is an integrated circuit (IC) customized.
Chapter #6: Sequential Logic Design 6.2 Timing Methodologies
1 Clockless Computing Montek Singh Thu, Sep 13, 2007.
Avshalom Elyada, Ran GinosarPipeline Synchronization 1 Pipeline Synchronization Continued This second part is based on the recent article Bridging Clock.
Lecture 11 MOUSETRAP: Ultra-High-Speed Transition-Signaling Asynchronous Pipelines.
1 Recap: Lectures 5 & 6 Classic Pipeline Styles 1. Williams and Horowitz’s PS0 pipeline 2. Sutherland’s micropipelines.
1 Clockless Logic: Dynamic Logic Pipelines (contd.)  Drawbacks of Williams’ PS0 Pipelines  Lookahead Pipelines.
1 Combining the strengths of UMIST and The Victoria University of Manchester Asynchronous Signal Processing Systems Linda Brackenbury APT GROUP, Computer.
1 Presenter: Ming-Shiun Yang Sah, A., Balakrishnan, M., Panda, P.R. Design, Automation & Test in Europe Conference & Exhibition, DATE ‘09. A Generic.
Clockless Chips Date: October 26, Presented by:
1 Seminar on High-Speed Asynchronous Pipelines Montek Singh Thursdays 10-11, SN325.
Amitava Mitra Intel Corp., Bangalore, India William F. McLaughlin
MOUSETRAP Ultra-High-Speed Transition-Signaling Asynchronous Pipelines Montek Singh & Steven M. Nowick Department of Computer Science Columbia University,
1 H ardware D escription L anguages Modeling Digital Systems.
Paper review: High Speed Dynamic Asynchronous Pipeline: Self Precharging Style Name : Chi-Chuan Chuang Date : 2013/03/20.
SJSU SPRING 2011 PARALLEL COMPUTING Parallel Computing CS 147: Computer Architecture Instructor: Professor Sin-Min Lee Spring 2011 By: Alice Cotti.
CSE 494: Electronic Design Automation Lecture 2 VLSI Design, Physical Design Automation, Design Styles.
Modern VLSI Design 4e: Chapter 8 Copyright  2008 Wayne Wolf Topics Basics of register-transfer design: –data paths and controllers; –ASM charts. Pipelining.
1 Clockless Computing Montek Singh Thu, Sep 6, 2007  Review: Logic Gate Families  A classic asynchronous pipeline by Williams.
Parallel architecture Technique. Pipelining Processor Pipelining is a technique of decomposing a sequential process into sub-processes, with each sub-process.
Reading Assignment: Rabaey: Chapter 9
Implementing algorithms for advanced communication systems -- My bag of tricks Sridhar Rajagopal Electrical and Computer Engineering This work is supported.
Lecture 11: FPGA-Based System Design October 18, 2004 ECE 697F Reconfigurable Computing Lecture 11 FPGA-Based System Design.
1 Clockless Logic or How do I make hardware fast, power- efficient, less noisy, and easy-to-design? Montek Singh Tue, Jan 14, 2003.
FPGA-Based System Design: Chapter 6 Copyright  2004 Prentice Hall PTR Topics n Low power design. n Pipelining.
1 Practical Design and Performance Evaluation of Completion Detection Circuits Fu-Chiung Cheng Department of Computer Science Columbia University.
A New Class of High Performance FFTs Dr. J. Greg Nash Centar ( High Performance Embedded Computing (HPEC) Workshop.
ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN Dr. Shi Dept. of Electrical and Computer Engineering.
EE3A1 Computer Hardware and Digital Design Lecture 9 Pipelining.
1 Recap: Lecture 4 Logic Implementation Styles:  Static CMOS logic  Dynamic logic, or “domino” logic  Transmission gates, or “pass-transistor” logic.
RTL Hardware Design by P. Chu Chapter 9 – ECE420 (CSUN) Mirzaei 1 Sequential Circuit Design: Practice Shahnam Mirzaei, PhD Spring 2016 California State.
Clockless Chips Under the esteemed guidance of Romy Sinha Lecturer, REC Bhalki Presented by: Lokesh S. Woldoddy 3RB05CS122 Date:11 April 2009.
TOPIC : Introduction to Sequential Circuits UNIT 1: Modeling and Simulation Module 4 : Modeling Sequential Circuits.
1 Clockless Logic Montek Singh Thu, Mar 2, Review: Logic Gate Families  Static CMOS logic  Dynamic logic, or “domino” logic  Transmission gates,
Other Approaches.
Recap: Lecture 1 What is asynchronous design? Why do we want to study it? What is pipelining? How can it be used to design really fast hardware?
Registers and clocking issues
ARM implementation the design is divided into a data path section that is described in register transfer level (RTL) notation control section that is viewed.
Clockless Logic: Asynchronous Pipelines
Wagging Logic: Moore's Law will eventually fix it
Instructor: Michael Greenbaum
Clockless Computing Lecture 3
Lecture 3: Timing & Sequential Circuits
Presentation transcript:

COMP290-084 Clockless Logic and Silicon Compilers Lecture 3 Montek Singh Tue, Jan 24, 2006

Handshaking Example: Asynchronous Pipelines Pipelining basics Fine-grain pipelining Example Approach: MOUSETRAP pipelines

Background: Pipelining What is Pipelining?: Breaking up a complex operation on a stream of data into simpler sequential operations A “coarse-grain” pipeline (e.g. simple processor) A “fine-grain” pipeline (e.g. pipelined adder) fetch decode execute Storage elements (latches/registers) Performance Impact: + Throughput: significantly increased (#data items processed/second) – Latency: somewhat degraded (#seconds from input to output)

Focus of Asynchronous Community A Key Focus: Extremely fine-grain pipelines “gate-level” pipelining = use narrowest possible stages each stage consists of only a single level of logic gates some of the fastest existing digital pipelines to date Application areas: general-purpose microprocessors instruction pipelines: often 20-40 stages multimedia hardware (graphics accelerators, video DSP’s, …) naturally pipelined systems, throughput is critical; input “bursty” optical networking serializing/deserializing FIFO’s string matching? KMP style string matching: variable skip lengths

MOUSETRAP: Ultra-High-Speed Transition-Signaling Asynchronous Pipelines Singh and Nowick, Intl. Conf. on Computer Design (ICCD), September 2001

MOUSETRAP Pipelines Simple asynchronous implementation style, uses… standard logic implementation: Boolean gates, transparent latches simple control: 1 gate/pipeline stage MOUSETRAP uses a “capture protocol:” Latches … are normally transparent: before new data arrives become opaque: after data arrives (“capture” data) Control Signaling: transition-signaling = 2-phase simple protocol: req/ack = only 2 events per handshake (not 4) no “return-to-zero” each transition (up/down) signals a distinct operation Our Goal: very fast cycle time simple inter-stage communication

MOUSETRAP: A Basic FIFO Stages communicate using transition-signaling: Latch Controller 1 transition per data item! ackN-1 ackN En reqN doneN reqN+1 Data in Data out Data Latch Stage N-1 Stage N Stage N+1 2nd data item flowing through the pipeline 1st data item flowing through the pipeline 1st data item flowing through the pipeline

MOUSETRAP: A Basic FIFO (contd.) Latch controller (XNOR) acts as “protocol converter”: 2 distinct transitions (up or down)  pulsed latch enable Latch is re-enabled when next stage is “done” Latch is disabled when current stage is “done” Latch Controller 2 transitions per latch cycle ackN-1 ackN En reqN doneN reqN+1 Data in Data out Data Latch Stage N-1 Stage N Stage N+1

MOUSETRAP: FIFO Cycle Time reqN ackN-1 reqN+1 ackN Data Latch Latch Controller doneN Data in Data out Stage N Stage N-1 Stage N+1 En 3 Fast self-loop: N disables itself 2 1 2 N re-enabled to compute N computes N+1 computes Cycle Time =

Detailed Controller Operation Stage N’s Latch Controller ack from N+1 done from N to Latch One pulse per data item flowing through: down transition: caused by “done” of N up transition: caused by “done” of N+1

MOUSETRAP: Pipeline With Logic Simple Extension to FIFO: insert logic block + matching delay in each stage Latch Controller ackN-1 ackN reqN reqN+1 delay delay delay doneN logic logic logic Data Latch Stage N-1 Stage N Stage N+1 Logic Blocks: can use standard single-rail (non-hazard-free) “Bundled Data” Requirement: each “req” must arrive after data inputs valid and stable

Complex Pipelining: Forks & Joins Problems with Linear Pipelining: handles limited applications; real systems are more complex fork join Non-Linear Pipelining: has forks/joins Contribution: introduce efficient circuit structures Forks: distribute data + control to multiple destinations Joins: merge data + control from multiple sources Enabling technology for building complex async systems

Forks and Joins: Implementation req req2 Stage N C req1 ack req ack2 Stage N C ack1 Join: merge multiple requests Fork: merge multiple acknowledges

Performance, Timing and Optzn. MOUSETRAP with Logic: Stage Latency = Cycle Time =

Timing Analysis Main Timing Constraint: avoid “data overrun” Data must be safely “captured” by Stage N before new inputs arrive from Stage N-1 simple 1-sided timing constraint: fast latch disable Stage N’s “self-loop” faster than entire path through previous stage Stage N Data Latch Latch Controller doneN logic delay Stage N-1 reqN ackN-1 reqN+1 ackN

Experimental Results Simulations of FIFO’s: ~3 GHz (in 0.13u IBM process) Recent fabricated chip: GCD ~2 GHz simulated speed chips awaited