Synthesis of synchronous elastic architectures. Jordi Cortadella (Universitat Politècnica de Catalunya), Mike Kishinevsky (Intel Corp.), Bill Grundmann (Intel Corp.)

1 Synthesis of synchronous elastic architectures. Jordi Cortadella (Universitat Politècnica de Catalunya), Mike Kishinevsky (Intel Corp.), Bill Grundmann (Intel Corp.)

2–4 Network of computing units: blocks B1, B2, and B3 between In and Out.

5 Latency-insensitive (elastic) system (In, Out, B1, B2, B3): each block advances one step only when all of its inputs are valid.
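A minimal token-level sketch of this firing rule, in Python. The topology (In → B1, B1 → B2, B1 and B2 → B3, B3 → Out), the block functions, and the `stall_prob` knob are hypothetical; channels are unbounded queues, so back-pressure is ignored here. Random stalls stand in for extra communication or computation latency.

```python
import random
from collections import deque


def simulate(inputs, stall_prob=0.0, seed=0, max_cycles=1000):
    """Each block fires only when every one of its input channels holds a token."""
    rng = random.Random(seed)

    def stalled():
        # Hypothetical model of arbitrary extra latency (wires or computation).
        return rng.random() < stall_prob

    src = deque(inputs)                                   # In -> B1
    b1_to_b2, b1_to_b3, b2_to_b3 = deque(), deque(), deque()
    out = []
    for _ in range(max_cycles):
        if not (src or b1_to_b2 or b1_to_b3 or b2_to_b3):
            break
        # Evaluate from the output backwards so a token crosses at most
        # one block per cycle.
        if b1_to_b3 and b2_to_b3 and not stalled():       # B3: join of B1, B2
            out.append(b1_to_b3.popleft() + b2_to_b3.popleft())
        if b1_to_b2 and not stalled():                    # B2
            b2_to_b3.append(10 * b1_to_b2.popleft())
        if src and not stalled():                         # B1: result forked
            x = src.popleft() + 1
            b1_to_b2.append(x)
            b1_to_b3.append(x)
    return out


print(simulate([1, 2, 3]))                            # [22, 33, 44]
print(simulate([1, 2, 3], stall_prob=0.6, seed=42))   # with random stalls
```

Both calls yield the same output sequence; only the number of cycles taken differs, which is the latency-insensitive behavior the slide describes.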

6 Why? Scalable; modular (plug & play); tolerant to variable latency in both communication and computation; not asynchronous, so existing design paradigms and CAD tools still apply.

7 Outline: the cost of elasticity; SELF, an elastic protocol (basic implementation for linear pipelines, general netlists with forks and joins, formal models and verification); synthesis of elastic architectures; related work.

8 Elastic block: core and control; Data, Valid, and Stop signals; CLK and gated clock. What's the cost of elasticity?

9 Communication channel: sender → receiver, Data. Long wires mean slow transmission.

10–11 Pipelined communication: sender → receiver, Data.

12 Pipelined communication: sender → receiver, Data. What if the sender does not always send valid data?

13–16 The Valid bit: sender → receiver, with Data and Valid.

17 The Valid bit: sender → receiver, with Data and Valid. What if the receiver is not always ready?

18 The Stop bit: sender → receiver, with Data, Valid, and Stop. Stop bits: 00000.

19 The Stop bit: Stop bits 11000.

20 The Stop bit: Stop bits 11100.

21 The Stop bit: Stop bits 11111. Back-pressure.

22 The Stop bit: Stop bits 10000. Long combinational path.

23–30 Carloni’s relay stations (double storage): a relay station with main and aux registers sits between the sender and the receiver, each a shell wrapped around a pearl; every link carries a Valid (V) / Stop (S) handshake.

31 Carloni’s relay stations (double storage): handshakes travel over short wires, but double storage is required.
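A small cycle-level sketch of why a registered (short-wire) Stop forces double storage. It assumes one buffer between a sender and a consumer, where the buffer raises Stop once it is full and the sender only sees that Stop one cycle later; `relay`, `capacity`, `consumer_ready`, and `n_items` are hypothetical names, and this is not Carloni's actual relay-station circuit.

```python
def relay(capacity, consumer_ready, n_items=100):
    """Sender -> buffer -> consumer with a registered Stop. Returns items delivered."""
    occupancy, stop_seen_by_sender = 0, False
    sent = delivered = 0
    for ready in consumer_ready:                 # one iteration per clock cycle
        send = sent < n_items and not stop_seen_by_sender
        drain = ready and occupancy > 0          # consumer takes the oldest item
        occupancy += int(send) - int(drain)
        assert occupancy <= capacity, "data lost: buffer overflow"
        sent += int(send)
        delivered += int(drain)
        # Stop is computed from this cycle's state but reaches the sender
        # only in the next cycle (short wire, registered handshake).
        stop_seen_by_sender = occupancy >= capacity
    return delivered


print(relay(1, [True] * 10))                     # 5
print(relay(2, [True] * 10))                     # 9
relay(2, [True, False, False, True] * 5)         # consumer stalls; no overflow
```

With the consumer always ready, one slot delivers only every other cycle, while two slots sustain one item per cycle after the first: the extra slot absorbs the item already in flight when Stop is raised.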

32 Proposal: an elastic protocol, SELF (Synchronous ELastic Flow). Simple and provably correct; no data-path overhead in area, latency, or energy; negligible control overhead; fine-grain elasticity.

33 Flip-flops vs. latches: sender → receiver, 1 cycle, flip-flop (FF) registers.

34–39 Flip-flops vs. latches: sender → receiver, 1 cycle, each flip-flop drawn as its pair of latches (H L H L).

40 Flip-flops vs. latches: flip-flops already have a double-storage capability, but …

41 Flip-flops vs. latches: not allowed in conventional FF-based design!

42 Flip-flops vs. latches (H L L H): let’s make the master/slave latches independent.

43 Flip-flops vs. latches (H L H L), ½ cycle: with independent master/slave latches, only half of the latches (H or L) can move tokens.

44 An elastic buffer keeps data while Stop is in flight (states W1R1, W2R1, W1R2, W2R2). This cannot be done with single-edge flip-flops without double pumping; instead, use the latches inside the master/slave flip-flop. Carloni’s relay station belongs to this class.

45 Shorthand notation (clock lines not shown) D Q clk En …

46 SELF (linear communication): sender → receiver through four V latches and four S latches with enables (En); Data, Valid, and Stop at both ends. Values shown: 1 1.

47–51 SELF: values shown 1 0.

52–56 SELF: values shown 0 0.

57–65 SELF: values shown 1 1.

66–73 SELF: values shown 1 0.

74 The protocol: sender → receiver over Data, Valid, and Stop. Idle cycle: Valid = 0.

75 The protocol: transfer cycle: Valid = 1 ∧ Stop = 0; the data D is transferred.

76 The protocol: retry cycle: Valid = 1 ∧ Stop = 1; the sender must hold D. Persistency: $\mathbf{G}\,[\,V \land S \land (\mathit{Data} = D) \Rightarrow \mathbf{X}\,(V \land \mathit{Data} = D)\,]$

77 The protocol: an example with retries and transfers (cycles in chronological order; * = don't care):
   Cycle: 1 2 3 4 5 6 7 8 9 10
   Data:  A * B C C C * D D *
   Valid: 1 0 1 1 1 1 0 1 1 0
   Stop:  0 0 0 1 1 0 0 1 0 0
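A Python sketch that classifies each cycle as idle, transfer, or retry and checks the persistency property on a finite trace. The function names are hypothetical; the trace is the slide 77 example listed in chronological order, with `*` for don't-care data.

```python
from enum import Enum


class Cycle(Enum):
    IDLE = "idle"          # Valid = 0
    TRANSFER = "transfer"  # Valid = 1, Stop = 0
    RETRY = "retry"        # Valid = 1, Stop = 1


def classify(valid: int, stop: int) -> Cycle:
    if not valid:
        return Cycle.IDLE
    return Cycle.RETRY if stop else Cycle.TRANSFER


def persistent(trace) -> bool:
    """G[V & S & Data=D -> X(V & Data=D)] over a finite list of (data, valid, stop)."""
    for (d, v, s), (d_next, v_next, _) in zip(trace, trace[1:]):
        if classify(v, s) is Cycle.RETRY and not (v_next and d_next == d):
            return False
    return True


def transferred(trace) -> list:
    """Data actually handed to the receiver: transfer cycles only."""
    return [d for d, v, s in trace if classify(v, s) is Cycle.TRANSFER]


trace = [("A", 1, 0), ("*", 0, 0), ("B", 1, 0), ("C", 1, 1), ("C", 1, 1),
         ("C", 1, 0), ("*", 0, 0), ("D", 1, 1), ("D", 1, 0), ("*", 0, 0)]
print(persistent(trace), transferred(trace))   # True ['A', 'B', 'C', 'D']
```

Each retried item is transferred exactly once, in order; a trace such as `[("C", 1, 1), ("B", 1, 0)]`, where the sender drops C after a retry, fails the check.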

78 Elastic Half Buffer (EHB): inputs $V_{i-1}$ and $S_i$, outputs $V_i$ and $S_{i-1}$, and an enable $En_i$ that drives the data latch.

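79 Join: input channels $(V_1, S_1)$ and $(V_2, S_2)$ merge into one output channel $(V, S)$; the slide builds it from EHBs plus control logic.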
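A sketch of the join's Valid/Stop control as Boolean functions, assuming the equations $V = V_1 \land V_2$ and $S_i = S \lor \lnot V$; the slide's gate-level circuit and its input EHBs are not reproduced. The loop exhaustively checks that an input is consumed exactly when the output transfers.

```python
from itertools import product


def join_control(v1: bool, v2: bool, s: bool):
    """Two-input join, control only. Returns (v, s1, s2).

    The output is valid only when both inputs are valid; each input is
    stopped unless the joint transfer happens, so neither input can be
    consumed on its own.
    """
    v = v1 and v2
    s1 = s or not v
    s2 = s or not v
    return v, s1, s2


# Exhaustive check: input i transfers exactly when the output transfers.
for v1, v2, s in product([False, True], repeat=3):
    v, s1, s2 = join_control(v1, v2, s)
    out_transfer = v and not s
    assert (v1 and not s1) == out_transfer
    assert (v2 and not s2) == out_transfer
```

Stopping an input whose partner is not yet valid is what keeps the two tokens aligned; asserting Stop on an invalid channel is harmless under the protocol.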

80 Lazy fork: one input channel $(V, S)$ feeds two output channels $(V_1, S_1)$ and $(V_2, S_2)$.
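A Boolean sketch of a lazy fork, assuming the equations $V_1 = V \land \lnot S_2$, $V_2 = V \land \lnot S_1$, $S = S_1 \lor S_2$ (our reconstruction, not taken from the slide). A branch is offered the token only when the other branch is ready, so both receive it in the same cycle or neither does.

```python
from itertools import product


def lazy_fork_control(v: bool, s1: bool, s2: bool):
    """Two-output lazy fork, control only. Returns (v1, v2, s)."""
    v1 = v and not s2
    v2 = v and not s1
    s = s1 or s2          # stop the input if any branch is stopped
    return v1, v2, s


# Exhaustive check: each branch transfers exactly when the input transfers.
for v, s1, s2 in product([False, True], repeat=3):
    v1, v2, s = lazy_fork_control(v, s1, s2)
    in_transfer = v and not s
    assert (v1 and not s1) == in_transfer
    assert (v2 and not s2) == in_transfer
```

The fork needs no state, but one stopped branch also blocks the other.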

81 Eager fork: one input channel $(V, S)$ feeds two output channels $(V_1, S_1)$ and $(V_2, S_2)$.
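A sequential sketch of an eager fork, assuming one state bit per branch (`done`, a hypothetical name) that records whether the branch already took the current token. Unlike the lazy fork, a ready branch takes the token immediately, and the input is released once every branch has it.

```python
class EagerFork:
    """N-output eager fork (hypothetical control, not the slide's exact gates)."""

    def __init__(self, n: int = 2):
        self.done = [False] * n   # done[i]: branch i already has the current token

    def clock(self, v: bool, stops: list):
        # Combinational outputs for this cycle.
        valids = [v and not d for d in self.done]
        # Stop the input while a branch that still needs the token is stopped.
        s = any(not d and st for d, st in zip(self.done, stops))
        took = [vi and not st for vi, st in zip(valids, stops)]
        # State update at the clock edge.
        if v and not s:
            self.done = [False] * len(self.done)   # all branches served
        else:
            self.done = [d or t for d, t in zip(self.done, took)]
        return valids, s, took


fork = EagerFork()
print(fork.clock(True, [False, True]))  # ([True, True], True, [True, False])
print(fork.clock(True, [True, True]))   # ([False, True], True, [False, False])
print(fork.clock(True, [True, False]))  # ([False, True], False, [False, True])
```

Branch 0 receives the token in the first cycle and branch 1 in the third; each gets it exactly once, and the input transfer completes only in the third cycle.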

82–84 Elastic combinational paths: elastic buffers (EB) connected through a fork, a join, a join/fork, and a wire. Slide 83 highlights the enable signal to the data latches.

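85 Elastic buffer: formal model. An unbounded buffer Buffer[0..∞] holds entries …, i, i+1, …, i+k between a read pointer rd and a write pointer wr; the input channel is (Din, Vin, Sin) and the output channel is (Dout, Vout, Sout). Initial state: rd = wr = 0. Invariant: wr ≥ rd.

86 Elastic buffer: formal model. Liveness properties (finite but unbounded latencies). Finite forward latency: $\mathbf{G}\,(rd \neq wr \Rightarrow \mathbf{F}\,V_{out})$. Finite backward latency: $\mathbf{G}\,(\lnot S_{out} \Rightarrow \mathbf{F}\,\lnot S_{in})$.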
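A simulation sketch of this unbounded abstract buffer with read/write pointers. The nondeterministic choices of the model are exposed as hypothetical `offer` and `accept` arguments, and the `retrying` flag is our addition to keep the output persistent after a retry. A finite simulation cannot check the liveness properties; that is the role of the model checker.

```python
class AbstractElasticBuffer:
    """Unbounded FIFO abstraction: slots indexed from 0, pointers rd and wr."""

    def __init__(self):
        self.buf = {}            # slot index -> data (unbounded)
        self.rd = 0
        self.wr = 0
        self.retrying = False    # Vout was stopped last cycle: must persist

    def clock(self, din, vin, sout, offer=True, accept=True):
        # Outputs for this cycle; Vout is forced after a retry (persistency).
        vout = (offer or self.retrying) and self.rd < self.wr
        dout = self.buf[self.rd] if vout else None
        sin = not accept
        if vin and not sin:      # input transfer: write at wr
            self.buf[self.wr] = din
            self.wr += 1
        if vout and not sout:    # output transfer: read at rd
            self.rd += 1
        self.retrying = vout and sout
        assert self.wr >= self.rd            # invariant: wr >= rd
        return dout, vout, sin


eb = AbstractElasticBuffer()
received = []
steps = [("A", True, False, False, True),   # write A; nothing offered yet
         ("B", True, True, True, True),     # offer A, receiver stops: retry
         (None, False, False, False, True), # A persists even with offer=False
         (None, False, False, True, True)]  # B delivered
for din, vin, sout, offer, accept in steps:
    dout, vout, _ = eb.clock(din, vin, sout, offer, accept)
    if vout and not sout:
        received.append(dout)
print(received)                             # ['A', 'B']
```

The received sequence preserves write order, which is the in-order transmission property listed among the verified properties.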

87 Formal verification: the implementation, with input channel (Din, Vin, Sin) and output channel (Dout, Vout, Sout), is checked against the abstract buffer model.

88 Formal verification. The abstract FSM model is suited to compositional verification. Implementations are verified by model checking with 1-bit abstractions of the datapath: LTL specs + NuSMV; the buffer is a refinement of the spec; in-order data transmission; correct synchronization of fork/join structures; absence of deadlocks.

89–91 Formal verification: a chain of two abstract models (NFSM), from (Din, Vin, Sin) to (Dout, Vout, Sout), is compared with a single abstract model (NFSM), assuming the same initial contents (e.g., empty).

92 Observational equivalence:
   Synchronous  D:  a b c d e f g h i j k …
   Elastic      D:  a a b b b c d e e f g g h i i i j k …
                En: 1 0 1 0 0 1 1 1 0 1 1 0 1 1 0 0 1 1 …
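One way to read this slide, sketched in Python: keeping only the elastic-trace cycles with En = 1 recovers the synchronous stream. Treating En = 1 as "a new value is present this cycle" is our interpretation, and only the prefixes shown on the slide are used (the trailing "…" is not modelled).

```python
def synchronous_view(data, enable):
    """Project an elastic trace onto the cycles where En = 1."""
    if len(data) != len(enable):
        raise ValueError("data and enable traces must have the same length")
    return [d for d, en in zip(data, enable) if en]


sync_d = "a b c d e f g h i j k".split()
elastic_d = "a a b b b c d e e f g g h i i i j k".split()
en = [int(bit) for bit in "1 0 1 0 0 1 1 1 0 1 1 0 1 1 0 0 1 1".split()]

assert synchronous_view(elastic_d, en) == sync_d
```

The elastic system takes 18 cycles for the 11 values the synchronous one produces in 11, yet an observer that looks only at enabled cycles sees the same sequence.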

93 Elasticization: from synchronous to elastic.

94 CLK

95 Pipeline with CLK: PC, IF/ID, ID/EX, EX/MEM, and MEM/WB registers, plus JOIN and FORK nodes.

96 Each register gets a Valid/Stop (V, S) pair; the control uses JOIN and FORK nodes (CLK).

97 All (V, S) pairs show 1 0.

98 As slide 97, with an additional 0 0.

99 Elastic control layer: all (V, S) pairs show 1 0; generation of gated clocks from CLK.

100 Variable-latency units: latency of 0 to k cycles, with a (V, S) handshake and go/done signals.

101 Variable-latency units. Telescopic units: 1 cycle for fast operations, 2 cycles for slow ones. Examples: short/long additions (carry propagation); A × 0, A / 1; dynamic changes in latency (fast when cold, slow when hot).
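A sketch of the "short/long additions" example as a telescopic adder: 1 cycle when the longest carry chain is short, 2 cycles otherwise. The 16-bit width and the threshold of 4 are hypothetical, and the chain length is computed exactly here for clarity, whereas hardware would rely on a much cheaper detector.

```python
import random


def longest_carry_chain(a: int, b: int, width: int) -> int:
    """Longest run of consecutive bit positions that produce a carry."""
    carry = run = longest = 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        carry = (x & y) | (carry & (x ^ y))   # ripple-carry recurrence
        run = run + 1 if carry else 0
        longest = max(longest, run)
    return longest


def telescopic_add(a: int, b: int, width: int = 16, threshold: int = 4):
    """Return (sum, cycles): 1 cycle if the carry chain fits, else 2."""
    total = (a + b) & ((1 << width) - 1)
    cycles = 1 if longest_carry_chain(a, b, width) <= threshold else 2
    return total, cycles


rng = random.Random(0)
ops = [(rng.getrandbits(16), rng.getrandbits(16)) for _ in range(10_000)]
avg = sum(telescopic_add(a, b)[1] for a, b in ops) / len(ops)
print(f"average cycles per addition: {avg:.2f}")
```

If most operands produce short carry chains, the average stays close to 1 cycle while the clock period only has to cover the fast path; that is the trade-off the next slide explores.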

102 Microarchitectural exploration: bubble insertion plus variable-latency units. These may improve performance (more bubbles, but a shorter cycle time) and reduce power (units designed for the most frequent input data). Exploration at fine granularity.

103 Some related work. Asynchronous design: micropipelines (Sutherland); rings (Williams, Sparsø); CHP and slack elasticity (Martin, Burns, Manohar et al.). Latency-insensitive design: Carloni and a few follow-ups (large overhead); wire pipelining: Svensson, Nookala, Casu, …. Interlocked pipelines (H. Jacobson et al.). De-synchronization: J. Cortadella et al.; V. Varshavsky. Synchronous implementations of CSP: J. O’Leary et al.; A. Peeters et al.

104 Summary. SELF: a specific protocol and implementation for elastic systems with very small buffering overhead. A compositional theory proves correctness (Krstic et al., FMCAD’06). A library of controllers has been designed and its correctness verified. Elasticization CAD is in progress. Bubbles and variable-latency units open new micro-architectural opportunities.

