Clockless Logic. Montek Singh. Thu, Mar 2, 2006.

1 Clockless Logic Montek Singh Thu, Mar 2, 2006

2 Review: Logic Gate Families
- Static CMOS logic
- Dynamic logic, or “domino” logic
- Transmission gates, or “pass-transistor” logic

3 Static CMOS logic
Advantages:
- output always strongly driven
  - pull-up and pull-down networks are fully complementary; exactly one of them is always “on”
  - good immunity from noise and leakage
- both inverting and non-inverting functions implementable
  - each gate is inverting
  - cascade two gates together to get non-inverting logic
Disadvantages:
- slow/big PMOS devices needed (in addition to NMOS)
  - greater chip area
  - higher power consumption
  - slower switching speed
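The complementary-network property can be illustrated with a small behavioural sketch. It is not taken from the slides: the function names (`static_gate`, `nand2`, `inv`, `and2`) are hypothetical, and the model only tracks which network conducts, not voltages or timing. It shows a 2-input NAND whose pull-up and pull-down networks are never on together, and a non-inverting AND obtained by cascading two inverting gates.

```python
from itertools import product


def static_gate(pull_up_on: bool, pull_down_on: bool) -> int:
    """Output of a static CMOS gate, given which network conducts."""
    # Fully complementary networks: exactly one of them is on at any time.
    assert pull_up_on != pull_down_on, "networks must be complementary"
    return 1 if pull_up_on else 0


def nand2(a: int, b: int) -> int:
    pull_down = bool(a and b)        # two NMOS in series to Gnd
    pull_up = (not a) or (not b)     # two PMOS in parallel to Vdd
    return static_gate(pull_up, pull_down)


def inv(a: int) -> int:
    return static_gate(not a, bool(a))


def and2(a: int, b: int) -> int:
    # Each gate is inverting, so cascade two gates for non-inverting logic.
    return inv(nand2(a, b))


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(a, b, "NAND:", nand2(a, b), "AND:", and2(a, b))
```

The assertion inside `static_gate` is the point of the sketch: for every input combination the output is driven by exactly one network, which is why the output is always strongly driven.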

4 Dynamic Logic, or “domino”
Key idea:
- only use NMOS devices to compute function
- use a single PMOS to reset
Advantages:
- significantly fewer transistors → smaller chip area
- higher speed, lower power
  - less “loading” on wires (drive fewer transistors)
- for async: no storage elements needed
Disadvantages:
- need extra control input to precharge
- logic is typically non-inverting only
- more vulnerable to noise and leakage effects

5 Dynamic Logic, or “domino” (contd.)
Gate has 2 phases:
- precharge (= reset): output reset to ‘0’
- evaluate: output computed → either stays ‘0’, or switches to ‘1’
Pull-up and pull-down must never both be simultaneously active:
- ensure that data inputs are reset while gate is precharging
- or, add a “footer” device
[Figure: dynamic gate. The control input PC drives the pull-up network, which controls “precharge”; the data inputs drive the pull-down network, which controls “evaluation”; the gate produces the data output.]
- PC = 0 (asserted) → precharge
- PC = 1 (de-asserted) → evaluate
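The two phases can be sketched as a behavioural model. This is an illustration under stated assumptions, not the circuit from the slide: it assumes a 2-input AND function, a footed gate, and the common domino arrangement of an internal dynamic node followed by an output inverter. The class name `DominoAndGate` is hypothetical.

```python
class DominoAndGate:
    """Behavioural model of a footed 2-input domino AND gate (assumed structure)."""

    def __init__(self) -> None:
        # Internal dynamic node; it is high after precharge.
        self.dynamic_node = 1

    @property
    def output(self) -> int:
        # Assumed output inverter: a precharged node gives output '0'.
        return 1 - self.dynamic_node

    def step(self, pc: int, a: int, b: int) -> int:
        if pc == 0:
            # PC asserted: the single PMOS recharges the node; the footer
            # keeps the pull-down path off regardless of the data inputs.
            self.dynamic_node = 1
        elif a and b:
            # Evaluate: the NMOS pull-down network conducts and discharges the node.
            self.dynamic_node = 0
        # Otherwise the node holds its previous value (helped by a keeper
        # in a real gate), so the output is monotonic within one evaluate phase.
        return self.output


if __name__ == "__main__":
    gate = DominoAndGate()
    for pc, a, b in [(0, 1, 1), (1, 1, 0), (1, 1, 1), (1, 0, 0), (0, 0, 0)]:
        print(f"PC={pc} a={a} b={b} -> out={gate.step(pc, a, b)}")
```

Note that the output stays at ‘1’ after the inputs drop during evaluation and only returns to ‘0’ on the next precharge. This holding behaviour is what later lets a dynamic stage act as its own storage element.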

6 Transmission Gates
Key idea:
- transistors used in a different configuration
- when switched on: instead of connecting output to Vdd or Gnd, they connect output to the input
Advantage:
- very efficient for implementing switches and multiplexers
Disadvantage:
- signal degradation unless both NFET and PFET passgates are used in a complementary configuration
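A rough first-order sketch of the degradation point follows. The supply and threshold values (`VDD`, `VT`) are made-up numbers for illustration, and the helper names are hypothetical. The model only captures the usual rule of thumb that an NFET alone passes a weak ‘1’, a PFET alone passes a weak ‘0’, and the complementary pair passes both levels fully.

```python
VDD = 1.0   # assumed supply voltage (illustrative value)
VT = 0.25   # assumed threshold voltage (illustrative value)


def nfet_only(vin: float) -> float:
    """NFET passgate: cannot pull the output above VDD - VT."""
    return min(vin, VDD - VT)


def pfet_only(vin: float) -> float:
    """PFET passgate: cannot pull the output below VT."""
    return max(vin, VT)


def complementary(vin: float) -> float:
    """NFET and PFET in parallel: each covers the other's weak level."""
    return vin


def mux2(select: int, in0: float, in1: float, pass_gate=complementary) -> float:
    """2:1 multiplexer: the selected passgate connects the output to its input."""
    return pass_gate(in1 if select else in0)


if __name__ == "__main__":
    print("NFET-only mux passing a '1':", mux2(1, 0.0, VDD, nfet_only))
    print("PFET-only mux passing a '0':", mux2(0, 0.0, VDD, pfet_only))
    print("Complementary mux passing a '1':", mux2(1, 0.0, VDD, complementary))
```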

7 Outline: Several Pipeline Styles
- Classic static logic pipeline: Sutherland
- Recent static logic pipeline: MOUSETRAP
- Classic dynamic logic pipeline: Williams/Horowitz’ PS0

8 A Classic Asynchronous Dynamic Pipeline
Williams and Horowitz’s PS0 pipeline:
- Structure
- Operation
- Performance

9 A Classic Approach: PS0 Pipeline
Williams/Horowitz (Stanford U.) [1986-91]:
- successfully used in fabricated chips [Stanford ’87] [HAL ’90s]
Implemented using “dynamic logic”
[Figure: three-stage pipeline (Stage 1, Stage 2, Stage 3) from Data in to Data out; each stage has a Processing Block and a Completion Detector, with “data” and “ack” signals between stages.]

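10 PS0 Pipeline Stage
A PS0 stage consists of dynamic gates and a completion detector:
[Figure: a Processing Block built from dynamic gates (PC input, pull-down network, “keeper”) takes the data inputs and produces the data outputs; a Completion Detector on the outputs generates ack.]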

11 Dual-Rail Completion Detector
- Combines dual-rail signals
- Indicates when all bits are valid (or reset)
- OR together 2 rails per bit
- Merge results using “C-element”
[Figure: bits 0, 1, ..., n each feed an OR gate; the OR outputs feed a C-element (C) whose output is Done.]
C-element:
- if all inputs = 1, output → 1
- if all inputs = 0, output → 0
- else, maintain output value
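The detector can be sketched behaviourally from the three C-element rules. The class names and the representation of a dual-rail bit as a `(true_rail, false_rail)` pair are assumptions made for this illustration; gate delays are ignored.

```python
from typing import Iterable, Sequence, Tuple


class CElement:
    """State-holding gate: follows its inputs only when they all agree."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, inputs: Sequence[int]) -> int:
        if all(v == 1 for v in inputs):
            self.out = 1
        elif all(v == 0 for v in inputs):
            self.out = 0
        # Mixed inputs: maintain the previous output value.
        return self.out


class CompletionDetector:
    """ORs the two rails of every bit, then merges the results with a C-element."""

    def __init__(self) -> None:
        self.c = CElement()

    def update(self, bits: Iterable[Tuple[int, int]]) -> int:
        # A bit is valid when exactly one rail is high, reset when both are low.
        per_bit_valid = [t | f for t, f in bits]
        return self.c.update(per_bit_valid)


if __name__ == "__main__":
    cd = CompletionDetector()
    print(cd.update([(0, 0), (0, 0)]))  # all bits reset
    print(cd.update([(1, 0), (0, 0)]))  # only bit 0 valid: Done stays low
    print(cd.update([(1, 0), (0, 1)]))  # all bits valid: Done rises
    print(cd.update([(0, 0), (0, 1)]))  # partly reset: Done holds
    print(cd.update([(0, 0), (0, 0)]))  # all bits reset: Done falls
```

The hold behaviour on mixed inputs is what makes Done change only once every bit has arrived, or once every bit has been reset.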

12 PS0 Protocol
PRECHARGE N: when N+1 completes evaluation
- delete data: after next stage has copied it
EVALUATE N: when N+1 completes precharging
- accept new data: after next stage is emptied
[Figure: stages N, N+1, N+2 with events numbered 1-6, labelled “evaluates” (three times), “indicates ‘done’” (twice) and “precharges” (once).]
- Evaluate → Precharge: 3 events
- Precharge → Evaluate: another 3 events
- Complete cycle: 6 events
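The two protocol rules can be exercised with a small unit-delay simulation. This is a sketch under several assumptions that are not in the slides: every stage update takes one time step, completion detection is instantaneous, the processing blocks simply pass data through, and the source and sink environments follow the same rules as a stage. The function name `simulate_ps0` is hypothetical.

```python
def simulate_ps0(tokens, n_stages=3, steps=40):
    """Return (tokens received at the output, per-step snapshots of the stages)."""
    pending = list(tokens)
    src = None                 # environment driving stage 0 (None = spacer)
    out = [None] * n_stages    # stage outputs: None = precharged, else a token
    sink_done = False          # environment's "done" seen by the last stage
    received, history = [], []

    for _ in range(steps):
        # done[i + 1] is the completion signal that controls stage i.
        done = [v is not None for v in out] + [sink_done]
        inputs = [src] + out[:-1]

        new_out = []
        for i in range(n_stages):
            if done[i + 1]:
                new_out.append(None)        # PRECHARGE N: N+1 completed evaluation
            elif out[i] is None:
                new_out.append(inputs[i])   # EVALUATE N: N+1 is empty, take data if any
            else:
                new_out.append(out[i])      # hold the evaluated value

        # Source: reset once stage 0 has copied the data, then offer the next token.
        if done[0]:
            new_src = None
        elif src is None and pending:
            new_src = pending.pop(0)
        else:
            new_src = src

        # Sink: record each token once, then acknowledge it.
        if out[-1] is not None and not sink_done:
            received.append(out[-1])
        sink_done = out[-1] is not None

        src, out = new_src, new_out
        history.append(tuple(out))

    return received, history


if __name__ == "__main__":
    got, trace = simulate_ps0(["A", "B", "C"])
    print("received:", got)
    for snapshot in trace[:12]:
        print(["." if v is None else v for v in snapshot])
```

In the printed trace a token may briefly occupy two adjacent stages while it is being copied forward, but two different tokens never sit in adjacent stages without a spacer between them.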

13 PS0 Performance
[Figure: timing diagram of the six events (1-6) in one cycle.]
Cycle Time =
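The expression itself is not preserved in this transcript. As a hedged reconstruction, the six events of the protocol can be read as three evaluations, two completion detections and one precharge, which would give a cycle time of 3·t_eval + 2·t_cd + t_prech. The symbol names and the assignment of events to delays below are assumptions made for this sketch, not text from the slide.

```python
def ps0_cycle_time(t_eval: float, t_prech: float, t_cd: float) -> float:
    """Cycle time of one stage under the assumed reading of the six events."""
    events = [
        t_eval,   # 1: stage N evaluates
        t_eval,   # 2: stage N+1 evaluates
        t_eval,   # 3: stage N+2 evaluates
        t_cd,     # 4: N+2's completion detector indicates "done"
        t_prech,  # 5: stage N+1 precharges
        t_cd,     # 6: N+1's completion detector indicates "done" (reset),
                  #    which lets stage N evaluate again
    ]
    return sum(events)


if __name__ == "__main__":
    # Illustrative delays in arbitrary units (not measured values).
    print(ps0_cycle_time(t_eval=2, t_prech=3, t_cd=1))
```

Under this reading, the forward latency per stage is a single evaluation delay while the cycle time spans six events, which is the gap between low latency and modest throughput that the later slides discuss.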

14 Summary: PS0 Pipelining
Datapaths are latch-free:
- dynamic gates themselves provide implicit latches
+: chip area savings
+: extremely low latency
Data items kept separate by control:
- stage deletes data: only after next stage has copied it
- stage accepts new data: only if next stage is empty
⇒ distinct data items always separated by “spacers”
Control is extremely simple: each controller = single wire
- completion detector directly controls previous stage
+: chip area savings
+: low control overhead

15 Comparison to a Clocked Pipeline
How would you design the pipeline if you actually had a clock?
1. Replace handshaking with “magic clocking”
   - each stage gets its own clock
   - successive clocks are slightly skewed
   - → essentially, clocked simulation of asynchronous handshaking!
   - need multiple clock phases!
2. Use a single clock, but insert latches between stages
   - latches are simple, level-sensitive
   - consecutive stages receive complementary clock signals
[Figure: pipeline with latches between stages, clocked by Ck and Ck’.]

16 Comparison … (contd.) Cycle Times?

17 Drawbacks of PS0 Pipelining
1. Poor throughput:
   - long cycle time: 6 events per cycle
   - data “tokens” are forced far apart in time
2. Limited storage capacity:
   - max only 50% of stages can hold distinct tokens
   - data tokens must be separated by at least one spacer
Our Research Goals: address both issues
- still maintain very low latency
