COMP290-084 Clockless Logic and Silicon Compilers Lecture 3


1 COMP290-084 Clockless Logic and Silicon Compilers Lecture 3
Montek Singh Tue, Jan 24, 2006

2 Handshaking Example: Asynchronous Pipelines
Pipelining basics
Fine-grain pipelining
Example approach: MOUSETRAP pipelines

3 Background: Pipelining
What is pipelining? Breaking up a complex operation on a stream of data into simpler sequential operations
  A “coarse-grain” pipeline (e.g. simple processor): fetch, decode, execute
  A “fine-grain” pipeline (e.g. pipelined adder)
  Storage elements (latches/registers) between stages
Performance impact:
  + Throughput: significantly increased (number of data items processed per second)
  - Latency: somewhat degraded (number of seconds from input to output)
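The throughput/latency trade-off above can be illustrated with a small calculation. The sketch below is a simplified model, not something taken from the lecture: it assumes every stage adds a fixed storage-element overhead and that the slowest stage sets the rate at which items can enter. The function names and all numbers are hypothetical.

```python
def unpipelined(op_delay_ns):
    """Return (latency_ns, items_per_ns) when the whole operation is one block."""
    return op_delay_ns, 1.0 / op_delay_ns


def pipelined(stage_delays_ns, latch_overhead_ns):
    """Return (latency_ns, items_per_ns) for a pipeline.

    Assumption: each stage costs its logic delay plus a fixed latch/register
    overhead, and the slowest stage limits how often a new item can enter.
    """
    per_stage = [d + latch_overhead_ns for d in stage_delays_ns]
    latency = sum(per_stage)            # one item traverses every stage
    throughput = 1.0 / max(per_stage)   # rate is set by the slowest stage
    return latency, throughput


# Hypothetical numbers: a 9 ns operation split into three 3 ns stages,
# with 0.5 ns of storage overhead per stage.
lat_u, thr_u = unpipelined(9.0)
lat_p, thr_p = pipelined([3.0, 3.0, 3.0], 0.5)
print(f"unpipelined: latency={lat_u} ns, throughput={thr_u:.3f} items/ns")
print(f"pipelined:   latency={lat_p} ns, throughput={thr_p:.3f} items/ns")
```

Under these assumptions the pipelined version takes slightly longer per item but accepts new items far more often, which is the "+ throughput, - latency" pattern on the slide.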

4 Focus of Asynchronous Community
A key focus: extremely fine-grain pipelines
  “gate-level” pipelining = use narrowest possible stages
  each stage consists of only a single level of logic gates
  some of the fastest existing digital pipelines to date
Application areas:
  general-purpose microprocessors
    instruction pipelines: often … stages
  multimedia hardware (graphics accelerators, video DSPs, …)
    naturally pipelined systems, throughput is critical; input “bursty”
  optical networking
    serializing/deserializing FIFOs
  string matching?
    KMP-style string matching: variable skip lengths

5 MOUSETRAP: Ultra-High-Speed Transition-Signaling Asynchronous Pipelines
Singh and Nowick, Intl. Conf. on Computer Design (ICCD), September 2001

6 MOUSETRAP Pipelines
Simple asynchronous implementation style, uses…
  standard logic implementation: Boolean gates, transparent latches
  simple control: 1 gate/pipeline stage
MOUSETRAP uses a “capture protocol”. Latches …
  are normally transparent: before new data arrives
  become opaque: after data arrives (“capture” data)
Control signaling: transition-signaling = 2-phase
  simple protocol: req/ack = only 2 events per handshake (not 4)
  no “return-to-zero”
  each transition (up/down) signals a distinct operation
Our goal: very fast cycle time
  simple inter-stage communication
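The "2 events per handshake (not 4)" point can be made concrete by listing the wire events each protocol produces. This is an illustrative sketch only; the function names and the event encoding (wire name, new level) are assumptions made for the example.

```python
def two_phase_events(n_items):
    """Transition signaling: every item costs one req toggle and one ack toggle."""
    req = ack = 0
    events = []
    for _ in range(n_items):
        req ^= 1                      # any transition on req means "new data"
        events.append(("req", req))
        ack ^= 1                      # any transition on ack means "data taken"
        events.append(("ack", ack))
    return events


def four_phase_events(n_items):
    """Return-to-zero signaling: both wires rise and then fall for every item."""
    events = []
    for _ in range(n_items):
        events += [("req", 1), ("ack", 1), ("req", 0), ("ack", 0)]
    return events


for name, fn in [("2-phase", two_phase_events), ("4-phase", four_phase_events)]:
    print(name, "events for 3 items:", len(fn(3)))
```

Note that two consecutive 2-phase handshakes leave the wires where they started, but each individual transition already carried one item's worth of information.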

7 MOUSETRAP: A Basic FIFO
Stages communicate using transition-signaling: 1 transition per data item!
[Figure: FIFO stages N-1, N, N+1, each a data latch with a latch controller; signals reqN, reqN+1, ackN-1, ackN, doneN, En; Data in, Data out. Annotations: 1st data item flowing through the pipeline; 2nd data item flowing through the pipeline.]

8 MOUSETRAP: A Basic FIFO (contd.)
Latch controller (XNOR) acts as “protocol converter”:
  2 distinct transitions (up or down) → pulsed latch enable
  Latch is re-enabled when next stage is “done”
  Latch is disabled when current stage is “done”
2 transitions per latch cycle
[Figure: FIFO stages N-1, N, N+1, each a data latch with a latch controller; signals reqN, reqN+1, ackN-1, ackN, doneN, En; Data in, Data out.]
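The "protocol converter" behaviour can be checked with a few lines of Boolean logic. The sketch below assumes the controller of stage N is an XNOR of "done from N" and "ack from N+1", as the slides describe; the function names and the 0/1 encoding are illustrative.

```python
def xnor(a, b):
    """1 when both inputs agree, 0 otherwise."""
    return int(a == b)


def latch_enable(done_n, ack_n):
    """En of stage N: 1 = latch transparent, 0 = latch opaque."""
    return xnor(done_n, ack_n)


def trace_one_item(done_n=0, ack_n=0):
    """Return the En values seen while one data item passes through stage N."""
    trace = [latch_enable(done_n, ack_n)]       # idle: latch transparent
    done_n ^= 1                                 # stage N is "done" (either edge)
    trace.append(latch_enable(done_n, ack_n))   # latch goes opaque
    ack_n ^= 1                                  # stage N+1 is "done" (either edge)
    trace.append(latch_enable(done_n, ack_n))   # latch transparent again
    return trace


print(trace_one_item(0, 0))  # item signalled by rising transitions
print(trace_one_item(1, 1))  # item signalled by falling transitions
```

Both calls produce the same enable pulse, which is the sense in which the XNOR converts up or down transitions into a single pulsed latch enable.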

9 MOUSETRAP: FIFO Cycle Time
[Figure: stages N-1, N, N+1 with data latch and latch controller; signals reqN, reqN+1, ackN-1, ackN, doneN, En; Data in, Data out. Annotated events, numbered 1, 2, 2, 3 in the figure: N computes; N+1 computes; fast self-loop: N disables itself; N re-enabled to compute.]
Cycle Time =
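One plausible reading of the annotated events is that a FIFO cycle is the sum of three steps: N computes (data passes N's latch), N+1 computes (data passes N+1's latch), and N is re-enabled (N's controller reopens its latch), with the self-loop running in parallel. The sketch below encodes only that reading; the parameter names and the delay values are hypothetical, not figures from the lecture.

```python
def fifo_cycle_time_ps(t_latch_n, t_latch_next, t_controller_enable):
    """Sum of the three annotated steps; all delays in picoseconds (assumed)."""
    return t_latch_n + t_latch_next + t_controller_enable


def throughput_ghz(cycle_time_ps):
    """Items per second in GHz for a given cycle time in picoseconds."""
    return 1000.0 / cycle_time_ps


# Hypothetical delays, chosen only to exercise the functions.
cycle = fifo_cycle_time_ps(t_latch_n=120, t_latch_next=120, t_controller_enable=93)
print(f"cycle = {cycle} ps, throughput = {throughput_ghz(cycle):.2f} GHz")
```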

10 Detailed Controller Operation
Stage N’s latch controller: inputs “done from N” and “ack from N+1”; output to latch
One pulse per data item flowing through:
  down transition: caused by “done” of N
  up transition: caused by “done” of N+1

11 MOUSETRAP: Pipeline With Logic
Simple extension to FIFO: insert logic block + matching delay in each stage
[Figure: stages N-1, N, N+1, each with data latch, latch controller, logic block and matching delay; signals reqN, reqN+1, ackN-1, ackN, doneN.]
Logic blocks: can use standard single-rail (non-hazard-free)
“Bundled data” requirement: each “req” must arrive after data inputs valid and stable
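The bundled-data requirement amounts to a one-line inequality on the matched delay. The sketch below assumes the request and the data leave the upstream latch at the same time, so the request's delay element must be at least as slow as the worst-case logic path; the function names, the margin parameter and the numbers are hypothetical.

```python
def min_matched_delay_ps(logic_worst_case_ps, margin_ps=0.0):
    """Smallest matched delay that keeps req behind the data (assumed model)."""
    return logic_worst_case_ps + margin_ps


def bundling_holds(matched_delay_ps, logic_worst_case_ps, margin_ps=0.0):
    """True if req arrives only after the data inputs are valid and stable."""
    return matched_delay_ps >= min_matched_delay_ps(logic_worst_case_ps, margin_ps)


# Hypothetical stage: 400 ps worst-case logic, 50 ps safety margin.
print(bundling_holds(matched_delay_ps=480, logic_worst_case_ps=400, margin_ps=50))
print(bundling_holds(matched_delay_ps=420, logic_worst_case_ps=400, margin_ps=50))
```

Because only the final, settled value of the logic matters when req arrives, glitches during evaluation are harmless, which is why the logic blocks need not be hazard-free.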

12 Complex Pipelining: Forks & Joins
Problems with linear pipelining: handles limited applications; real systems are more complex
Non-linear pipelining: has forks/joins
[Figure: a fork and a join]
Contribution: introduce efficient circuit structures
  Forks: distribute data + control to multiple destinations
  Joins: merge data + control from multiple sources
Enabling technology for building complex async systems

13 Forks and Joins: Implementation
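Join: merge multiple requests
[Figure: Stage N with a C-element (C) combining req1 and req2 into req; single ack.]
Fork: merge multiple acknowledges
[Figure: Stage N with a C-element (C) combining ack1 and ack2 into ack; single req.]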
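The "C" block in both figures is a C-element: its output changes only when all inputs agree, and otherwise holds its previous value. A minimal behavioural sketch follows; the class name and the two-input restriction are choices made for this example.

```python
class CElement:
    """Two-input Muller C-element: output follows the inputs only when they agree."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:          # both inputs have made their transition
            self.out = a
        return self.out     # otherwise the previous output is held


# Join: the merged req toggles only after both req1 and req2 have toggled.
join = CElement()
for req1, req2 in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    print(f"req1={req1} req2={req2} -> req={join.update(req1, req2)}")
```

The same element serves a fork by taking ack1 and ack2 as inputs, so the stage sees an acknowledge only after every destination has accepted the data.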

14 Performance, Timing and Optimization
MOUSETRAP with logic:
  Stage Latency =
  Cycle Time =
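One way to read the two quantities, assuming identical stages and the event sequence from the FIFO cycle-time slide (N computes, N+1 computes, N re-enabled): latency is the forward path through one stage, and the cycle adds the next stage's latch and the controller's re-enable delay. The symbols below are assumptions for this sketch, not necessarily the lecture's notation: $t_{\text{latch}}$ is the latch propagation delay, $t_{\text{logic}}$ the logic block delay, and $t_{\text{XNOR}}$ the controller delay for re-enabling the latch.

```latex
\begin{align}
  L_{\text{stage}} &= t_{\text{latch}} + t_{\text{logic}} \\
  T_{\text{cycle}} &= 2\,t_{\text{latch}} + t_{\text{logic}} + t_{\text{XNOR}}
\end{align}
```

Setting $t_{\text{logic}} = 0$ recovers the plain FIFO case, in which the cycle is two latch delays plus one controller delay.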

15 Timing Analysis
Main timing constraint: avoid “data overrun”
  Data must be safely “captured” by Stage N before new inputs arrive from Stage N-1
  simple 1-sided timing constraint: fast latch disable
  Stage N’s “self-loop” must be faster than the entire path through the previous stage
[Figure: Stage N-1 and Stage N, each with data latch, latch controller, logic and delay; signals reqN, reqN+1, ackN-1, ackN, doneN.]
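The one-sided constraint can be written as a slack check: the time Stage N needs to close its own latch must be less than the time new data needs to get through Stage N-1. The decomposition of the previous-stage path into controller, latch and logic delays, the hold-time term, and all names and numbers below are assumptions for illustration.

```python
def overrun_slack_ps(t_self_loop, t_prev_enable, t_prev_latch,
                     t_prev_logic=0.0, t_hold=0.0):
    """Positive slack means Stage N closes its latch before new data can arrive.

    t_self_loop:  Stage N's done -> controller -> latch-disable delay
    t_prev_*:     assumed pieces of the path through Stage N-1
    t_hold:       assumed latch hold time added to the self-loop side
    """
    prev_path = t_prev_enable + t_prev_latch + t_prev_logic
    return prev_path - (t_self_loop + t_hold)


def is_overrun_safe(**delays):
    """True if the constraint holds with strictly positive slack."""
    return overrun_slack_ps(**delays) > 0


# Hypothetical delays in picoseconds.
print(overrun_slack_ps(t_self_loop=60, t_prev_enable=60, t_prev_latch=80,
                       t_prev_logic=100, t_hold=20))
print(is_overrun_safe(t_self_loop=200, t_prev_enable=60, t_prev_latch=80))
```

Adding logic to the previous stage only lengthens its path, so under this model the constraint is hardest to meet in a plain FIFO.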

16 Experimental Results
Simulations of FIFOs: ~3 GHz (in 0.13 µm IBM process)
Recent fabricated chip: GCD
  ~2 GHz simulated speed
  chips awaited

