Elastic circuits Jordi Cortadella Universitat Politècnica de Catalunya, Barcelona EMicro 2013.

1 Elastic circuits Jordi Cortadella Universitat Politècnica de Catalunya, Barcelona EMicro 2013

2 Goals
Convince ourselves that:
– designing an asynchronous circuit is easy
– synchronous and asynchronous circuits are similar
– asynchronous circuits bring new advantages
Not a goal: covering exotic asynchronous schemes
Elasticity can also be synchronous

3 Clocking
Nvidia Kepler™ GK110: 28 nm, 7.1B transistors, 550 mm², 2688 CUDA cores, base clock 836 MHz, memory clock 6 GHz
– How to distribute the clock?
– How to determine the clock frequency?
– How to implement robust communications?
– How to reduce and manage energy?

5 Outline
– Synchronous and Source-synchronous circuits
– Completion detection
– Handshaking
– Performance analysis
– Why asynchronous?
– Design automation
– Synchronous elasticity
– Globally-asynchronous Locally-synchronous

6 Synchronous and Source-Synchronous

7 Synchronous circuit [Figure label: PLL]

8 Synchronous circuit
Two competing paths: the launching path and the capturing path
Launching path < Capturing path + Period
CLKtree + CL < CLKtree + Period
CL < Period (no clock skew)
[Figure labels: PLL, CL]
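The inequality on this slide can be checked numerically. The sketch below is only an illustration: the function name, the parameter names and the delay values (in picoseconds) are assumptions, not part of the presentation.

```python
def meets_setup(launch_clk_delay, logic_delay, capture_clk_delay, period):
    """Check: launching path < capturing path + period.

    launch_clk_delay / capture_clk_delay: clock-tree delay to the launching
    and capturing registers; logic_delay: combinational logic (CL) delay.
    All values share one time unit (picoseconds here, by assumption).
    """
    launching_path = launch_clk_delay + logic_delay
    capturing_path = capture_clk_delay
    return launching_path < capturing_path + period


# With no clock skew both clock-tree delays are equal and the check
# reduces to CL < Period.
print(meets_setup(300, 900, 300, 1000))   # True: 900 < 1000
print(meets_setup(300, 1100, 300, 1000))  # False: 1100 > 1000
```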

9 Source-synchronous
– No global clock required
– More tolerance to PVT variations
– Period > longest combinational path
– Good for acyclic pipelines
[Figure labels: CLK gen, matched delay, launching path, capturing path]

10 Source-synchronous with forks and joins: how to synchronize incoming events? [Figure label: CLK gen]

11 C element (Muller 1959)
A B | C
0 0 | 0
0 1 | C (holds previous value)
1 0 | C (holds previous value)

12 C element (Muller 1959)
A B | C
0 0 | 0
0 1 | C (holds previous value)
1 0 | C (holds previous value)
1 1 | 1
[Figure: implementation based on a majority gate (MAJ); many implementations exist]
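The truth table above can be turned into a small behavioural model. This is a sketch, not a circuit description: the class and function names are illustrative, and the majority form assumes the gate output is fed back as the third input, as the MAJ label on the slide suggests.

```python
class CElement:
    """Two-input Muller C element: the output follows the inputs when they
    agree and holds its previous value when they differ."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:
            self.out = a
        return self.out


def c_element_maj(a, b, c_prev):
    """Same behaviour written as a majority gate with output feedback."""
    return (a & b) | (a & c_prev) | (b & c_prev)


c = CElement()
print(c.update(1, 0))  # 0: inputs disagree, output holds
print(c.update(1, 1))  # 1: both inputs high
print(c.update(0, 1))  # 1: inputs disagree, output holds
print(c.update(0, 0))  # 0: both inputs low
```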

13 Completion detection

14 The fixed delay must be longer than the worst-case logic delay (plus variability). Q: could we detect, as soon as possible, when a computation has completed? [Figure labels: CLK gen, fixed delay]

15 Delay-insensitive codes: Dual Rail
Dual rail: every bit encoded with two signals
A.t A.f | A
0   0   | Spacer
0   1   | 0
1   0   | 1
1   1   | Not used
[Waveform: A takes the values 1, SP, 0, SP, 1, SP, 1, SP on rails A.t and A.f]
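A minimal sketch of the dual-rail code, assuming the usual convention that (A.t, A.f) = (1, 0) means 1 and (0, 1) means 0; the function names are illustrative.

```python
SPACER = (0, 0)  # (A.t, A.f) when no data is present


def encode(bit):
    """Encode one bit as the pair (A.t, A.f)."""
    return (1, 0) if bit else (0, 1)


def decode(rails):
    """Return 0 or 1 for a valid code word, None for the spacer."""
    if rails == SPACER:
        return None
    if rails == (1, 1):
        raise ValueError("(1, 1) is not a valid dual-rail code word")
    return 1 if rails == (1, 0) else 0


def wire_sequence(bits):
    """Interleave each code word with a spacer, as in the waveform."""
    sequence = []
    for bit in bits:
        sequence.append(encode(bit))
        sequence.append(SPACER)
    return sequence


# The waveform on the slide: 1, SP, 0, SP, 1, SP, 1, SP
print(wire_sequence([1, 0, 1, 1]))
```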

16 Dual Rail AND gate [Figure: truth table over A, B, C including the spacer (SP); rails A.t, A.f, B.t, B.f, C.t, C.f]

17 Dual Rail Inverter [Figure: truth table over A, Z including the spacer (SP); rails A.t, A.f, Z.t, Z.f]

18 Dual Rail AND/OR gate [Figure: the same gate structure over rails A.t, A.f, B.t, B.f, C.t, C.f, shown once as AND and once, with the rails swapped, as OR]
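The three gates can be sketched on (t, f) pairs. The equations below are the common simple (non-DIMS) realization and are an assumption about what the figures show; function names are illustrative.

```python
def dr_and(a, b):
    """Dual-rail AND: true rail needs both true rails, false rail needs
    either false rail."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t & b_t, a_f | b_f)


def dr_not(a):
    """Dual-rail inverter: swap the two rails, no logic needed."""
    a_t, a_f = a
    return (a_f, a_t)


def dr_or(a, b):
    """Dual-rail OR via De Morgan: the AND structure with rails swapped."""
    return dr_not(dr_and(dr_not(a), dr_not(b)))


ONE, ZERO, SP = (1, 0), (0, 1), (0, 0)
print(dr_and(ONE, ZERO))  # (0, 1) -> 0
print(dr_or(ONE, ZERO))   # (1, 0) -> 1
print(dr_and(SP, SP))     # (0, 0) -> spacer in, spacer out
print(dr_and(ZERO, SP))   # (0, 1) -> output valid before B arrives
```

The last line shows why this simple form needs care: the output can become valid while an input is still a spacer, so output validity alone does not prove that every input has arrived.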

19 Dual rail: completion detection [Figure: dual-rail logic feeding a completion detection tree whose C element produces done]

20 Multi-input C element [Figure: tree of C elements combining inputs a1 to a7 into output c]
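Completion detection can be sketched as follows: each dual-rail bit is valid when one of its rails is high, and a multi-input C element combines the per-bit validity signals into done. The model below is behavioural (one n-input element instead of a tree of two-input ones) and assumes inputs change monotonically within each phase; all names are illustrative.

```python
class MultiInputC:
    """Behavioural n-input C element: rises when all inputs are 1,
    falls when all inputs are 0, otherwise holds its value."""

    def __init__(self):
        self.out = 0

    def update(self, inputs):
        if all(inputs):
            self.out = 1
        elif not any(inputs):
            self.out = 0
        return self.out


def bit_is_valid(rails):
    """A dual-rail bit carries data when either rail is high."""
    t, f = rails
    return t | f


def done(detector, word):
    """Completion signal for a word given as a list of (t, f) pairs."""
    return detector.update([bit_is_valid(r) for r in word])


detector = MultiInputC()
spacer = [(0, 0), (0, 0), (0, 0)]
partial = [(1, 0), (0, 0), (0, 1)]
valid = [(1, 0), (0, 1), (0, 1)]
print(done(detector, spacer))   # 0
print(done(detector, partial))  # 0: one bit still missing
print(done(detector, valid))    # 1: every bit is valid
print(done(detector, partial))  # 1: holds until all bits are reset
print(done(detector, spacer))   # 0
```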

21 Dual rail: completion detection [Figure: dual-rail AND, OR, INV and AND gates; CLK gen]

22 Dual rail: completion detection [Figure: dual-rail AND, OR, INV and AND gates with C elements; CLK gen]

23 Dual rail: operation
For correct operation, all internal signals should be reset before the compute phase:
– Use a more complex implementation of dual-rail (e.g., DIMS), or
– Have internal completion detection, or
– Use timing assumptions
[Figure: the circuit alternates Reset and Compute phases; dual-rail AND, OR, INV and AND gates with C elements; CLK gen]

24 Other DI codes
There are many DI codes:
– k-out-of-n, Berger, Knuth, …
Example: 1-out-of-4
– 2 bits with 4 wires
– Same wire efficiency as DR
– Less power consuming
– Good for communication
– Bad for logic
Wires / Value table: 0000 = Spacer; others = not used
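The "less power consuming" claim can be illustrated by counting the wires that rise when a 2-bit value leaves the spacer. In this sketch the assignment of wire positions to values is an assumption (the table rows are not available here), and the function names are illustrative.

```python
def encode_1of4(value):
    """Encode a 2-bit value (0..3) on 4 wires with exactly one wire high."""
    if not 0 <= value <= 3:
        raise ValueError("value must fit in 2 bits")
    return tuple(1 if i == value else 0 for i in range(4))


def decode_1of4(wires):
    """Return the value, or None for the all-zero spacer."""
    if wires == (0, 0, 0, 0):
        return None
    if sum(wires) != 1:
        raise ValueError("not a 1-out-of-4 code word")
    return wires.index(1)


def encode_dual_rail_2bits(value):
    """Encode the same 2 bits on 4 wires as two (t, f) pairs."""
    b1, b0 = (value >> 1) & 1, value & 1
    return (b1, 1 - b1, b0, 1 - b0)


def rising_wires(code_word):
    """Wires that switch when going from the spacer to this code word."""
    return sum(code_word)


for v in range(4):
    print(v, rising_wires(encode_1of4(v)), rising_wires(encode_dual_rail_2bits(v)))
```

Both codes use 4 wires for 2 bits, but every 1-out-of-4 symbol raises one wire while every dual-rail symbol raises two.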

25 Single rail data vs. dual rail
Some back-of-the-envelope estimations:
                Single rail   Dual rail
Area            1             2
Delay           1             << 1
Static power    1             2
Dynamic power   < 0.2         2
Dual rail:
– Good for speed
– Large area
– High power consumption

26 Handshaking

27 Assume that the source module can provide data at any rate: when should the CLK generator send an event if the internal delays of the circuit are unknown? Solution: handshaking. [Figure labels: CLK gen, unknown delay]

28 Handshaking [Figure: "I have data" / "I want data"; signals Data, Request, Acknowledge]

29 Asynchronous elastic pipeline
– David Muller’s pipeline (late 50’s)
– Sutherland’s Micropipelines (Turing Award lecture, 1989)
[Figure labels: C elements, ReqIn, ReqOut, AckIn, AckOut]

30 Multiple inputs and outputs

31 Multiple inputs and outputs [Figure label: delay]

32 Multiple inputs and outputs [Figure labels: C elements, Req, Ack]

33 Channel-based communication
A channel contains data and handshake wires
[Figure: single-rail channel with Data, Req and Ack; dual-rail channel with Data and Ack]

34 Push/pull channels
– Push: the sender initiates the communication
– Pull: the receiver initiates the communication
[Figure: sender and receiver; push channel with single-rail Data, Req (push) and Ack; pull channel with single-rail Data, Ack and Req (pull)]

35 Four-phase protocol
– Valid data on the active edge of Req
– Req/Ack must return to zero before the next transfer
– Different variations of the 4-phase protocol exist
[Waveform: Req, Ack and Data (Data 1, Data 2, Data 3), with the data transfer marked]
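The four-phase (return-to-zero) sequence can be written out as an event trace. This is a sketch of one variant on a push channel: the event names are illustrative, and treating data as no longer guaranteed after Req falls is an assumption, since the slide notes that several variations exist.

```python
def four_phase_trace(items):
    """Return (event, req, ack, data) tuples for a sequence of transfers."""
    trace = []
    req = ack = 0
    for data in items:
        req = 1                      # sender: data is valid
        trace.append(("req+", req, ack, data))
        ack = 1                      # receiver: data has been taken
        trace.append(("ack+", req, ack, data))
        req = 0                      # sender returns to zero
        trace.append(("req-", req, ack, None))
        ack = 0                      # receiver returns to zero
        trace.append(("ack-", req, ack, None))
    return trace


for event in four_phase_trace(["Data 1", "Data 2"]):
    print(event)
```

Each transfer costs four signal transitions, and both wires are back at zero before the next one starts.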

36 Two-phase protocol
– Every edge is active
– It may require double-edge triggered flip-flops or pulse generators
[Waveform: Req, Ack and Data (Data 1, Data 2, Data 3), with the data transfer marked]
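In the two-phase protocol every transition carries meaning, so a transfer is a single toggle of Req answered by a single toggle of Ack. The trace below is a sketch with illustrative names.

```python
def two_phase_trace(items):
    """Return (event, req, ack, data) tuples; each wire toggles once per item."""
    trace = []
    req = ack = 0
    for data in items:
        req ^= 1                     # any edge on Req announces new data
        trace.append(("req", req, ack, data))
        ack ^= 1                     # any edge on Ack confirms reception
        trace.append(("ack", req, ack, data))
    return trace


def transfer_pending(req, ack):
    """A request is outstanding exactly when the two wires differ."""
    return req != ack


for event in two_phase_trace(["Data 1", "Data 2", "Data 3"]):
    print(event, transfer_pending(event[1], event[2]))
```

Two transitions per transfer instead of four, at the price of logic that reacts to both edges.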

37 How to memorize? 2-phase or 4-phase? [Figure: latches (L) around combinational logic, a delay and C elements]

38 How to memorize? 2-phase, with a pulse generator [Figure: latches (L) around combinational logic, a delay and C elements]

39 How to memorize? 4-phase [Figure: latches (L) around combinational logic, a delay and C elements]

40 Performance analysis

41 Ring oscillators
– Every ring requires an odd number of inverters
– The cycle period is determined by the slowest ring
– The cycle period is adapted to the operating conditions (temperature, voltage)
[Figure: rings through C elements]

43 Global Rings [Figure: ring of C elements]

44 Global Rings
Th = 1 / 6
References:
– Ramamoorthy and Ho, Performance evaluation of asynchronous concurrent systems with Petri nets, 1980
– T. Williams et al., A self-timed chip for division, 1987
– Greenstreet and Steiglitz, Bubbles can make self-timed pipelines fast, 1990
– Manohar and Martin, Slack elasticity in concurrent computing

45 Global Rings
Th = 2 / 6

46 Global Rings
Th = 3 / 6

47 Global Rings
Th = 1 / 6

48 Global Rings [Plot: throughput Th versus number of tokens, from 0 to N; Th peaks at 1/2 with N/2 tokens; token limited on one side of the peak, bubble limited on the other]
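The throughput values quoted on the ring slides (1/6, 2/6, 3/6 and the peak of 1/2 at N/2 tokens) fit a simple model in which every stage has unit forward and backward latency, so throughput is bounded by whichever is scarcer, tokens or bubbles. That latency model and the function name are assumptions made for this sketch.

```python
from fractions import Fraction


def ring_throughput(n_stages, n_tokens):
    """Throughput of a ring with n_stages stages holding n_tokens tokens,
    assuming unit forward and backward latency per stage."""
    if not 0 <= n_tokens <= n_stages:
        raise ValueError("token count must be between 0 and n_stages")
    bubbles = n_stages - n_tokens
    return Fraction(min(n_tokens, bubbles), n_stages)


# Few tokens: token limited. Many tokens: bubble limited.
for tokens in range(7):
    print(tokens, ring_throughput(6, tokens))
```

Under the same assumption, a ring with 5 tokens in 7 stages gives 2/7, and stretching it to 9 stages without adding tokens gives 4/9, which is the effect slack matching exploits.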

49 A latch-based view of synchronous circuits: Flip-flop = Master + Slave

50 Multiple Rings [Figure: rings annotated 2 / 4, 2 / 5 and 5 / 7. What is the throughput? It’s bubble limited: 2 / 7]

51 Slack matching
– We can add as many bubbles as we want (but not tokens!)
– Slack matching can be solved optimally in polynomial time
– Slack matching is conceptually equivalent to buffer (FIFO) sizing or recycling
[Figure: rings annotated 2 / 4, 2 / 5, 2 / 7 (?) and 4 / 9]

52 Performance analysis (Mean Cycle Ratio) [Figure: ring of C elements]

53 Latch-based design [Figure: latches L1, L2, L3, L4; launching path and capturing path]

54 Matched delays can be adjustable
Delays can be adjusted:
– At testing/boot time (to adjust to static variability)
– At runtime (to compensate dynamic variability)
[Figure: latches L1, L2, L3, L4 with delay selection]

55 Why asynchronous?

56 Exploiting elasticity [Figure labels: CLK, rigid clock, high performance, low energy]

57 Exploiting elasticity: voltage scaling [Plot: performance versus voltage; voltages 1 V, 0.9 V, 0.8 V, 0.7 V; frequencies 2 GHz, 1 GHz, 500 MHz; rigid clock compared with high-performance and low-energy operation]

58 Voltage scaling and power savings: -24%, -14% (3 ARM926 cores on the same die)

59 Tracking variability [Figure label: matched delay]

60 Tracking variability
Good correlation for:
– Process variability (systematic)
– Global voltage fluctuations
– Temperature
– Aging (partially)
[Plot: delay of the matched delay and of the critical paths at the best, typ and worst corners (multi-corner)]

61 Margins [Figure: cycle-period breakdown. Rigid clocks: gate and wire delays (typ), P, V, T, aging, PLL jitter, skew. Elastic clocks: gate and wire delays (typ), P, V, T, aging, skew. Margin reduction gives speed-up / power savings]

62 Clock elasticity [Figure labels: rigid clock (computation time, wasted time), elastic clock (computation time), cycle period]

63 Design Automation

64 Design automation paradigms
Synthesis of asynchronous controllers
– Logic synthesis from Petri nets or asynchronous FSMs
Syntax-directed translation
– Correct-by-construction composition of handshake components
De-synchronization
– Automatic transformation from synchronous to asynchronous

65 Synthesis of asynchronous controllers [Figure: VME bus controller between the bus (DSr, DSw, DTACK), the device (LDS, LDTACK) and the data transceiver (D); read cycle waveform over DSr, LDS, LDTACK, D, DTACK]

66 Synthesis of asynchronous controllers [Figure: Signal Transition Graph of the VME bus controller with transitions LDS+, LDTACK+, D+, DTACK+, DSr-, D-, DTACK-, LDS-, LDTACK-, DSr+]

67 Synthesis of asynchronous controllers [Figure: synthesized circuit over DTACK, D, DSr, LDS, LDTACK, next to the Signal Transition Graph] Cortadella et al., Petrify

68 Syntax-directed translation EMicro 2013Elastic circuits int = type [0..255] & gcd: main proc (in? chan > & out! chan int) begin x, y: var int | forever do in? > ; do x <> y then if x < y then y:=y-x else x:=x-y fi od ; out!x od end Sources: J. Kessels and A. Peeters. DESCALE: A Design Experiment for a Smart Card Application Consuming Low Energy, in Principles of Asynchronous Circuit Design, A Systems Perspective, Eds., J. Sparso and S. Furber, Kluwer Academic Publishers, P.A.Beerel, R.O. Ozdag and M. Ferretti. A Designer’s Guide to Asynchronous VLSI, Cambridge University Press,
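The program on this slide (partly garbled in the transcript) appears to receive a pair x, y on channel in, reduce it by repeated subtraction until x = y, and send x on channel out. A plain software rendering of that loop, with the channels replaced by a function call and the positivity check added as an assumption, could look like this:

```python
def gcd_by_subtraction(x, y):
    """Greatest common divisor by repeated subtraction.

    Mirrors the loop 'do x <> y then if x < y then y := y - x
    else x := x - y fi od'. Inputs must be positive, otherwise the
    loop would never terminate (this guard is not on the slide).
    """
    if x <= 0 or y <= 0:
        raise ValueError("both inputs must be positive")
    while x != y:
        if x < y:
            y -= x
        else:
            x -= y
    return x


print(gcd_by_subtraction(48, 18))    # 6
print(gcd_by_subtraction(255, 170))  # 85
```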

69 De-synchronization
– Strategy: substitute the clock tree by local clocks and handshakes
– Combinational logic and latches are not modified
– More tolerance to variability
– Similar area, less power and/or more speed
Cortadella, Kondratyev, Lavagno and Sotiriou. Desynchronization: Synthesis of asynchronous circuits from synchronous specifications. IEEE TCAD, Oct

70 Synchronous operation: transforming a synchronous circuit into an asynchronous one (automatically) [Figure label: CLK gen]

72 De-synchronization: transforming a synchronous circuit into an asynchronous one (automatically)

74 System-level de-synchronization [Figure label: CLK]

78 Synchronous elasticity

79 Different flavors of elasticity [Figure panels: Rigid, Elastic, Synchronous Elastic] Carloni et al., Latency-insensitive systems.

80 Asynchronous elasticity [Figure labels: req, ack]

81 Synchronous elasticity [Figure labels: valid, stop, ring oscillator, CLK, PLL]

82 Latch-based elasticity [Figure: sender and receiver connected by Data, Valid and Stop; valid (V) latches; enable (En)]

83 Elastic netlists [Figure: EB blocks with Fork, Join and Join / Fork controllers; enable signal to data latches]

84 Variable Latency Units [Figure: unit taking [0 - k] cycles with go, done and clear signals; V/S interface]
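The Valid/Stop pair can be read cycle by cycle. The sketch below uses the three channel states commonly used for synchronous elastic channels (idle, transfer, retry); the slide itself only shows the signal names, so the state names and helper functions are assumptions made for illustration.

```python
def channel_state(valid, stop):
    """Classify one clock cycle of a Valid/Stop channel."""
    if not valid:
        return "idle"       # sender has nothing to offer
    if stop:
        return "retry"      # data offered, receiver not ready: hold the data
    return "transfer"       # data offered and accepted in this cycle


def count_transfers(cycles):
    """Number of data items moved over a list of (valid, stop) samples."""
    return sum(1 for v, s in cycles if channel_state(v, s) == "transfer")


samples = [(1, 0), (1, 1), (1, 0), (0, 0), (0, 1), (1, 0)]
print([channel_state(v, s) for v, s in samples])
print(count_transfers(samples))  # 3
```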

85 Globally-asynchronous Locally-synchronous (GALS)

86 SoC design with GALS
– Most IPs are synchronous
– Different components may have different operating frequencies
– Some components have variable latencies (e.g., cache hit/miss latency)
– Multiple clock domains are essential
[Figure labels: DSP, P, Mem, Fast Bus, Slow Bus, Bridge, CDC, CLK1, CLK2, CLK3]

87 Multiple clock domains [Figure panels: single clock (CLK, mesochronous); rational clock frequencies (f1/f0, f2/f0, f3/f0 from CLK at f0); independent clocks (CLK0, CLK1, CLK2, CLK3); label: controllable skew]

88 Synchronous handshakes
– The arrival of data is unpredictable
– Handshakes solve the problem
[Figure: sender (CLK1) and receiver (CLK2) connected by Data, Valid and Ack]

89 The problem: metastability [Figure: flip-flops (D, Q) clocked by ΦT and ΦR; setup and hold window around ΦR; the captured value is marked "?"]

90 How long does it take to resolve metastability? [Figure: metastability] MTBF: Mean Time Between Failures

91 Classical synchronous solution
Symbols in the Mean Time Between Failures (MTBF) expression:
– f_Φ: frequency of the clock
– f_D: frequency of the data
– t_r: resolve time available
– W: metastability window
– τ: resolve time constant
Example:
# FFs | MTBF
1 FF  | 15 min
2 FF  | 9 days
3 FF  | 23 years
[Figure: chain of flip-flops (D, Q) between clock domains ΦT and ΦR]
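The formula itself did not survive in this transcript. A commonly cited form that uses exactly these symbols is MTBF = e^(t_r/τ) / (W · f_Φ · f_D); the sketch below assumes that form, and the numeric parameters are invented for illustration (they are not the ones behind the 15 min / 9 days / 23 years example).

```python
import math


def mtbf_seconds(t_r, tau, window, f_clk, f_data):
    """MTBF = exp(t_r / tau) / (W * f_clk * f_data), all in SI units."""
    return math.exp(t_r / tau) / (window * f_clk * f_data)


# Hypothetical values: tau = 50 ps, W = 100 ps, 1 GHz clock, 100 MHz data.
TAU, W, F_CLK, F_DATA = 50e-12, 100e-12, 1e9, 100e6
PERIOD = 1 / F_CLK

# Assumption: each extra synchronizer flip-flop adds roughly one clock
# period of resolve time, so MTBF grows exponentially with the FF count.
for n_ffs in (1, 2, 3):
    t_r = n_ffs * PERIOD / 2
    print(n_ffs, mtbf_seconds(t_r, TAU, W, F_CLK, F_DATA))
```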

92 Handshake with synchronizers
– Simple solution
– Throughput can be highly degraded: a long round trip for every transaction
[Figure: sender (CLK1) and receiver (CLK2) connected by Data, Valid and Ack]

93 Asynchronous FIFOs
– Ack is issued as soon as data has been delivered
– No impact on throughput (1 token/cycle)
– Min latency determined by the internal synchronizers
– Some tricky structures for the FIFO pointers (e.g. Gray encoding)
[Figure: circular buffer with FIFO control; Valid, Ack, Data; Clk In, Clk Out]
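Gray encoding is attractive for FIFO pointers that cross clock domains because consecutive values differ in a single bit, so a pointer sampled mid-change is off by at most one position. A minimal sketch of the conversion (function names are illustrative; a power-of-two FIFO depth is assumed so the wrap-around also changes one bit):

```python
def bin_to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_to_bin(g):
    """Inverse conversion: XOR together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


DEPTH = 8  # hypothetical FIFO depth, a power of two
for pointer in range(DEPTH):
    print(pointer, format(bin_to_gray(pointer), "03b"))
```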

94 SoC design with GALS
– Bridges for Clock Domain Crossing usually contain asynchronous FIFOs
– Latency cost only when interfacing with synchronous domains
– No latency penalty between asynchronous domains
[Figure labels: DSP, P, Mem, Fast Bus, Slow Bus, Bridge, CDC, CLK1, CLK2, CLK3]

95 Conclusions
Elasticity offers flexibility in time
– Modularity
– Dynamic adaptability
– Tolerance to variability
Better optimization of power/performance
Why isn’t it an important trend in circuit design?
– Lack of commercial EDA support (timing sign-off)
– Designers do not feel comfortable with “unpredictable” timing
– Other aspects: testing, verification, …
De-synchronization might be a viable solution

96 Bibliography
– Carmona, Cortadella, Kishinevsky and Taubin, Elastic Circuits, IEEE Trans. on CAD, Oct
– Beerel, Ozdag and Ferretti, A Designer’s Guide to Asynchronous VLSI, Cambridge
– Sparso and Furber, Principles of Asynchronous Circuit Design: A Systems Perspective, Kluwer
– Myers, Asynchronous Circuit Design, John Wiley & Sons, 2001

