Published by Timoteo Ferretti. Modified over 4 years ago.
1
Wagging Logic: Moore's Law will eventually fix it
Charlie Brej, APT Group, University of Manchester
14/07/2019 Group Talk
2
Introduction
  Quasi-Delay-Insensitive (QDI) approach
    Prove the high performance potential
  What is performance?
    Latency
    Throughput
  Why is async better?
    Average case performance
    Variability and data-dependent delays
    Bit level pipelining
3
Forward Safe Guarding
  Ensure all wire pairs are cycled up and down
  QDI
  [Diagram: circuit with C-element (C) symbols]
4
Behaviour
  Viewpoint of a single output
  Many inputs
5
Behaviour
  All or nothing
  Synchronises inputs together
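The "all or nothing" behaviour is what a Muller C-element provides. The following is a minimal behavioural sketch, not taken from the slides: the class name `CElement` and its `update` method are hypothetical, and the model ignores delays entirely.

```python
class CElement:
    """Behavioural model of a Muller C-element (hypothetical helper).

    The output rises only when every input is 1, falls only when every
    input is 0, and otherwise holds its previous value.
    """

    def __init__(self, initial=0):
        self.output = initial

    def update(self, *inputs):
        if not inputs:
            raise ValueError("a C-element needs at least one input")
        if all(v == 1 for v in inputs):
            self.output = 1          # all inputs high: set
        elif all(v == 0 for v in inputs):
            self.output = 0          # all inputs low: reset
        # mixed inputs: hold the previous output
        return self.output


c = CElement()
# Inputs arrive one by one, then leave one by one.
steps = [(1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1), (0, 0, 1), (0, 0, 0)]
trace = [c.update(*s) for s in steps]
print(trace)  # [0, 0, 1, 1, 1, 0]
```

The output waits for the slowest input in both directions, which is how the element synchronises its inputs together.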
6
Why is it so slow?
  Delays:
    Gate: 1, C-element: 2
  Stage data propagation: X
  Cycle time (times 2 for set and reset):
    Forward guarding: 2X
      C-element for each gate
    Acknowledge propagation: 2X
      C-element for each fork (fork depth ~ gate depth)
  About eight times slower than worst case!
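One way to read the arithmetic on this slide is sketched below. It assumes that X counts gate levels in a stage and that the cycle is simply forward guarding plus acknowledge propagation, done once for set and once for reset; the function names are hypothetical.

```python
GATE_DELAY = 1        # from the slide
C_ELEMENT_DELAY = 2   # from the slide


def four_phase_cycle_time(x):
    """Cycle time of a guarded four-phase stage with x gate levels (assumed model)."""
    forward_guarding = C_ELEMENT_DELAY * x   # one C-element per gate: 2X
    acknowledge = C_ELEMENT_DELAY * x        # one C-element per fork level: 2X
    one_phase = forward_guarding + acknowledge
    return 2 * one_phase                     # times 2 for set and reset


def slowdown(x):
    """Ratio of the cycle time to plain data propagation through x gates."""
    return four_phase_cycle_time(x) / (GATE_DELAY * x)


print(four_phase_cycle_time(10))  # 80
print(slowdown(10))               # 8.0
```

Under these assumptions the ratio is 8 regardless of X, which matches the "about eight times slower" figure.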
7
Why is four-phase so slow?
  Low latency
  Low throughput
  Only 1/8th of the system doing useful work
  Rest is resetting/completing
  [Figure: row of pipeline stages, one labelled "Workie" for every seven labelled "Sleepy"]
8
Solutions
  Ultra/Hyper/Super Pipelining
    Need 8 times finer pipelining
    Impossible
    Each latch adds to the latency
  Faster completion detection
    Balanced treeing C-elements
    Arranging to suit arrival order
    Backward guarding
    Not even close to 8x improvement
9
Inspiration: Wagging Latches
  Alternate latch read/write
  Capacity of two latches
  Depth of one latch
10
Wagging Logic
  Apply same method to the logic
    Alternate logic allowing one to set while the other resets (precharges)
  [Figure: two copies of the logic alternating between Set and Reset]
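The effect of alternating copies on throughput can be illustrated with an idealised timing model. This is a sketch under stated assumptions, not the simulator used in the talk: tokens are steered round-robin, a copy is busy for `work` time units (set) and then `rest` units (reset and completion), a new token cannot start before the previous one has finished its set phase, and the wagging hardware itself costs nothing. The 1:7 split mirrors the "1/8th doing useful work" figure; the function name is hypothetical.

```python
def tokens_completed(n_copies, work, rest, horizon):
    """Count tokens whose set phase finishes within `horizon` time units."""
    free_at = [0] * n_copies   # time at which each copy has finished resetting
    prev_start = None
    done = 0
    k = 0
    while True:
        copy = k % n_copies                  # round-robin steering
        start = free_at[copy]
        if prev_start is not None:
            start = max(start, prev_start + work)   # keep tokens in order
        finish = start + work
        if finish > horizon:
            return done
        free_at[copy] = finish + rest        # copy now resets (precharges)
        prev_start = start
        done += 1
        k += 1


results = {n: tokens_completed(n, work=1, rest=7, horizon=80) for n in (1, 2, 4, 8, 16)}
print(results)  # {1: 10, 2: 20, 4: 40, 8: 80, 16: 80}
```

In this model throughput scales with the number of copies until the reset time is fully hidden, after which extra copies add nothing.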
11
Wagging Logic
  Between wagging stages
    No need to wag
    No need to synchronize
  Wag only when communicating with non-wagging logic
12
Non FIFO Example
13
Duplicate the Logic
14
Connect to Complementary
15
A Harder Example
16
Duplicate the Logic
17
Connect to Complementary
18
Triplicate the Logic
19
Connect to the next on the list
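The slide diagrams are not recoverable here, so the following is only one plausible reading of the connection rule: with n copies, the state produced by copy i is handed to copy (i + 1) mod n, which for n = 2 is the complementary copy. The sketch uses a Fibonacci loop as the state-carrying example; `next_copy`, `connections`, and `wagged_fibonacci` are hypothetical names.

```python
def next_copy(i, n):
    """Index of the copy that receives copy i's result (assumed rule)."""
    return (i + 1) % n


def connections(n):
    """List of (source copy, destination copy) links for n copies."""
    return [(i, next_copy(i, n)) for i in range(n)]


def wagged_fibonacci(n_copies, count):
    """Functional model: each value is produced by the next copy in the ring."""
    state = (0, 1)
    active = 0
    outputs = []
    turns = [0] * n_copies         # how many values each copy produced
    for _ in range(count):
        a, b = state
        outputs.append(a)
        state = (b, a + b)         # result handed on to the next copy
        turns[active] += 1
        active = next_copy(active, n_copies)
    return outputs, turns


print(connections(2))  # [(0, 1), (1, 0)]
print(connections(3))  # [(0, 1), (1, 2), (2, 0)]
print(wagged_fibonacci(3, 7))  # ([0, 1, 1, 2, 3, 5, 8], [3, 2, 2])
```

The produced sequence is the same for any number of copies; only the share of work per copy changes.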
20
Other example
21
Proof of the pudding
  Simple gate level simulation
    My own simulator
    Delays: C-element=2, Gate=1
  Example circuits
    Fibonacci sequence generators
      Vertically pipelined 64bit ripple carry adder
      Non-pipelined 8bit ripple carry adder
    16 input XOR
    Backward and Forward guarded
  Relative measurements of Speed, Power, Area
  10,000 gate delays simulation
22
64bit Fibonacci Performance
  Synchronous Worst Case: 74
23
8bit Fibonacci Performance
  Synchronous Worst Case: 500
24
XOR Performance
  Synchronous Worst/Best Case: 1250 (8 gate delays)
  Inc. Flip-Flop: 1000 (10 gate delays)
  Inc. Timing margins
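The two figures on this slide are consistent with the score being the number of operations completed in the 10,000 gate delay simulation. That interpretation is an inference rather than something the slides state, and the helper names below are hypothetical.

```python
SIMULATION_LENGTH = 10_000   # gate delays, from the simulation setup slide


def operations_completed(cycle_time):
    """Operations finished in the run, assuming one result per cycle."""
    return SIMULATION_LENGTH // cycle_time


def implied_cycle_time(score):
    """Cycle time (in gate delays) that a given score would correspond to."""
    return SIMULATION_LENGTH / score


print(operations_completed(8))    # 1250
print(operations_completed(10))   # 1000
print(implied_cycle_time(500))    # 20.0
```

Read the same way, the synchronous worst-case scores of 500 and 74 on the Fibonacci slides would correspond to cycles of 20 and roughly 135 gate delays.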
25
Power Consumption
  Synchronous: 610
26
Area
27
Future work
  Larger and more complex designs
    Small CPU
    Layout
    Silicon?
  Improve completion time
    Current optimal wagging ~ 5
    Target ~ 3
  Fully automated flow
    Verilog Input & Output
    Partitioning
28
Conclusions
  Matching and surpassing synchronous performance every time
    DI logic for performance
  Very Expensive
    20 times more power
    5 times bigger (times wagging)
  Fastest logic on the planet!
    Discounting increase in wire delays
    Assuming other things will be able to keep up