MOUSETRAP Ultra-High-Speed Transition-Signaling Asynchronous Pipelines Montek Singh & Steven M. Nowick Department of Computer Science Columbia University, New York, NY IEEE
Agenda Review Introduction MOUSETRAP Preliminary Experiment Results Conclusions
Review Synchronous pipelines: wave pipelining, clock-delayed domino, skew-tolerant domino, self-resetting circuits. Asynchronous pipelines: micropipeline, GasP, IPCMOS.
Benefits of asynchronous circuits: no clock-skew problem; low power consumption; higher speed (average case rather than worst case); fewer global timing issues; tolerance of variations in fabrication, temperature, etc.; low EMI and noise.
Low Power Consumption: on high-performance chips, clock power is a significant proportion of total power consumption. Gated clocks reduce the waste, but they make clock skew worse and incur some power cost of their own. All parts of a clocked circuit run at the same frequency.
Performance: a synchronous design must be toleranced for worst-case conditions (fabrication, temperature, voltage, data values, clock skew). Asynchronous circuits self-adjust to the operating and data conditions.
Introduction Asynchronous Design Styles. Protocol: level signaling (four-phase) or transition signaling (two-phase). Logic: bundled-data (e.g., single-rail) or self-timed (e.g., dual-rail).
Level signaling (four-phase): A sends data to B (A is the active party). Step 1: A puts data on the bus and sets req = 1. Step 2: B takes the data from the bus and sets ack = 1. Return-to-zero phase: Step 3: A sets req = 0. Step 4: B sets ack = 0.
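The four steps above can be sketched as a small trace generator. This is a minimal illustration, not part of the original slides: the names `four_phase_transfer` and `count_transitions` are hypothetical, and the model ignores all wire and gate delays.

```python
def four_phase_transfer(word):
    """Yield (step, req, ack, bus) for one return-to-zero transfer of `word`."""
    yield ("A drives bus, raises req", 1, 0, word)
    yield ("B takes data, raises ack", 1, 1, word)
    yield ("A lowers req", 0, 1, None)   # return-to-zero phase starts
    yield ("B lowers ack", 0, 0, None)   # both wires are back at 0


def count_transitions(trace, start=(0, 0)):
    """Count how many times req or ack changes value along the trace."""
    prev, total = start, 0
    for _, req, ack, _ in trace:
        total += (req != prev[0]) + (ack != prev[1])
        prev = (req, ack)
    return total


trace = list(four_phase_transfer(0xA5))
for step, req, ack, _ in trace:
    print(f"req={req} ack={ack}  {step}")
```

Each transfer costs four control-wire transitions, two of which only restore the idle state.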
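Transition signaling (two-phase): A sends data to B (A is the active party). Every transition on req or ack is an event, so there is no return-to-zero phase. Step 1: A puts data on the bus and sets req = 1. Step 2: B takes the data from the bus and sets ack = 1. Step 3: A puts the next data on the bus and sets req = 0. Step 4: B takes the data from the bus and sets ack = 0.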
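A minimal sketch of the two-phase protocol, under the assumption that a request is pending whenever req and ack differ. The function name `two_phase_transfers` is hypothetical and delays are not modeled.

```python
def two_phase_transfers(words):
    """Send each word with one req toggle and one ack toggle."""
    req = ack = 0
    received = []
    events = 0
    for word in words:
        bus = word            # sender places data on the bus
        req ^= 1              # any transition on req means "data valid"
        events += 1
        if req != ack:        # receiver sees a pending request
            received.append(bus)
            ack ^= 1          # any transition on ack means "data taken"
            events += 1
    return received, events


received, events = two_phase_transfers([10, 20, 30])
print(received, events)
```

Every transfer uses two control transitions instead of four, which is why the steps above move two data items in four steps.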
C-element: Z_next = AB + Z(A + B). When A = 1 and B = 1, Z_next = 1. When A = 0 and B = 0, Z_next = 0. Otherwise Z holds its previous value.
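The next-state equation of the C-element can be checked directly in code. This is a small sketch; the function name `c_element` and the input sequence are chosen here for illustration.

```python
def c_element(a, b, z):
    """Next state of a C-element: Z_next = AB + Z(A + B), on 0/1 values."""
    return (a & b) | (z & (a | b))


# Drive the inputs through a sequence and record the output after each change.
z = 0
outputs = []
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    z = c_element(a, b, z)
    outputs.append(z)
print(outputs)
```

The output changes only when both inputs agree, and holds its value while they differ.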
Micropipeline 4-phase latch FIFO req ack
Bundled-data
Self-timed: generates a completion-detection signal. Delay-insensitive (DI) coding, e.g., dual-rail coding (two phase coding): 00 -> invalid value (no data); 01 and 10 -> the two valid data values; 11 -> no use.
Self-timed (dual-rail coding)
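Completion detection on a dual-rail word can be sketched as follows. The rail assignment is an assumption made for this example only: each bit is a pair (t, f), with (1, 0) standing for logic 1, (0, 1) for logic 0, (0, 0) for "no data yet", and (1, 1) unused. The names `encode`, `is_complete`, and `decode` are hypothetical.

```python
SPACER = (0, 0)   # assumed "no data" code word for one bit


def encode(bit):
    """Encode one bit as an assumed (true_rail, false_rail) pair."""
    return (1, 0) if bit else (0, 1)


def is_complete(word):
    """Completion detection: every bit pair has exactly one rail high."""
    return all(t ^ f for t, f in word)


def decode(word):
    """Return the bit values once the whole word is valid."""
    if not is_complete(word):
        raise ValueError("word is not yet complete")
    return [t for t, _ in word]


full = [encode(b) for b in (1, 0, 1)]
partial = [encode(1), SPACER, encode(1)]   # middle bit has not arrived yet
print(is_complete(full), is_complete(partial))
```

The receiver needs no matched delay: it simply waits until `is_complete` holds for the whole word.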
Performance Comparison of Asynchronous Adders Mark A. Franklin & Tienyo Pan
Mousetrap Minimal-Overhead Ultrahigh-SpEed Transition-signaling Asynchronous Pipeline
MOUSETRAP-FIFO Latch delay is 110 ps XNOR delay is 65 ps
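A behavioral sketch of a MOUSETRAP FIFO may help here. The slide itself only names the latch and the XNOR; the wiring used below is an assumption based on the usual description of MOUSETRAP: each stage's latch is transparent while XNOR(done_N, ack_N) = 1, where done_N is the latched request bit and ack_N is the done bit of the next stage. The class `MousetrapFifo` and its methods are hypothetical, and the model is untimed (zero delay, settled to a fixed point after every event).

```python
class MousetrapFifo:
    def __init__(self, n_stages):
        self.done = [0] * n_stages      # latched request bit of each stage
        self.data = [None] * n_stages   # latched data word of each stage
        self.req_in = 0                 # request from the left environment
        self.data_in = None
        self.ack_in = 0                 # acknowledge from the right environment

    def _settle(self):
        """Re-evaluate all stages until nothing changes."""
        last = len(self.done) - 1
        changed = True
        while changed:
            changed = False
            for i in range(len(self.done)):
                req = self.req_in if i == 0 else self.done[i - 1]
                dat = self.data_in if i == 0 else self.data[i - 1]
                ack = self.ack_in if i == last else self.done[i + 1]
                enable = self.done[i] == ack          # XNOR(done, ack)
                if enable and self.done[i] != req:    # open latch sees a new event
                    self.done[i], self.data[i] = req, dat
                    changed = True

    def can_send(self):
        return self.req_in == self.done[0]   # previous request was captured

    def send(self, word):
        assert self.can_send()
        self.data_in = word
        self.req_in ^= 1                     # two-phase: toggle req
        self._settle()

    def can_receive(self):
        return self.done[-1] != self.ack_in  # last stage holds an unread item

    def receive(self):
        assert self.can_receive()
        word = self.data[-1]
        self.ack_in ^= 1                     # two-phase: toggle ack
        self._settle()
        return word


fifo = MousetrapFifo(3)
for item in ("a", "b", "c"):
    fifo.send(item)
items = [fifo.receive() for _ in range(3)]
print(items)
```

In this model a stage closes itself as soon as it captures an item and reopens when the next stage has captured the same item, which is the "mousetrap" behavior the name suggests.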
MOUSETRAP with logic (bundled data)
Bundled data Bundled data scheme: Req n must arrive at stage N after the data inputs to that stage have stabilized. Worst-case delay Allow circuits to have hazards
Delay Buffer Inverter chain A chain of transmission gates Duplicate the worst-case critical path More accurate delay More area-expensive
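As a rough illustration of sizing an inverter-chain delay buffer, the sketch below picks the smallest even number of inverters whose total delay covers the worst-case logic delay plus a safety margin. All numbers and the name `inverters_needed` are hypothetical; an even count is assumed so that the request keeps its polarity.

```python
import math


def inverters_needed(worst_case_ps, inverter_ps, margin=0.1):
    """Smallest even inverter count whose delay >= worst case plus margin."""
    target = worst_case_ps * (1.0 + margin)
    count = math.ceil(target / inverter_ps)
    if count % 2:          # keep the chain non-inverting
        count += 1
    return count


# Hypothetical values: 400 ps worst-case logic delay, 30 ps per inverter.
n = inverters_needed(400, 30)
print(n, n * 30)
```

A real design would take these delays from simulation across process corners; duplicating the critical path tracks the logic more closely at the cost of area.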
Timing-forward latency
Timing-Cycle time
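The timing figures can be turned into a quick estimate. The formulas below are assumptions, following the expressions usually given for MOUSETRAP: forward latency per stage is one latch delay plus the logic delay, and cycle time is two latch delays plus the logic delay plus one XNOR delay. The latch and XNOR delays are the ones quoted on the MOUSETRAP-FIFO slide (110 ps and 65 ps); the function names are hypothetical, and the numbers produced are an arithmetic illustration, not reported results.

```python
def forward_latency_ps(t_latch, t_logic=0.0):
    """Assumed per-stage forward latency: latch delay plus logic delay."""
    return t_latch + t_logic


def cycle_time_ps(t_latch, t_xnor, t_logic=0.0):
    """Assumed cycle time: two latch delays, logic delay, one XNOR delay."""
    return 2 * t_latch + t_logic + t_xnor


def throughput_ghz(cycle_ps):
    """Convert a cycle time in picoseconds to a rate in GHz."""
    return 1000.0 / cycle_ps


T_LATCH, T_XNOR = 110.0, 65.0      # ps, from the MOUSETRAP-FIFO slide
latency = forward_latency_ps(T_LATCH)
cycle = cycle_time_ps(T_LATCH, T_XNOR)
print(f"latency = {latency} ps, cycle = {cycle} ps, "
      f"rate = {throughput_ghz(cycle):.2f} GHz")
```

For a FIFO with no logic between stages, the assumed cycle time works out to 285 ps.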
Standard synchronous pipeline Forward latency Cycle time
MOUSETRAP-Setup time
MOUSETRAP-Hold time
Clocked-CMOS (C²MOS) logic
Benefits of C²MOS: smaller delay, smaller area, lower power consumption
MOUSETRAP-C²MOS Forward latency Cycle time
Handling wide datapaths Datapath partitioning Control kiting (buffer insertion)
Optimization Sliding door Change MOS's width (lower )
Non-Linear Pipeline-fork
Non-Linear Pipeline-join
Experiment. 0.25 μm TSMC, 2.5 V, 300 K: a pass-gate implementation of an XNOR/XOR; a standard 6-transistor pass-gate dynamic D-latch. 0.6 μm HP, 3.3 V, 300 K: a pass-gate implementation of an XNOR/XOR; clocked-CMOS style latch. 10-stage, 16-bit datapath; pre-layout simulation (HSPICE).
result
Conclusions: uses small and fast latches; low latch-controller overhead (XNOR); transition-signaling protocol (efficient and concurrent); no complex timing or design effort required; handles variable-speed environments (elasticity).
Comparison: IPCMOS (asynchronous interlocked pipelined CMOS), 3.3 to 4.5 GHz, IBM 0.18 μm, post-layout simulation.