
2 Introduction to asynchronous circuit design: specification and synthesis Jordi Cortadella, Universitat Politècnica de Catalunya, Spain Michael Kishinevsky, Intel Corporation, USA Alex Kondratyev, Theseus Logic, USA Luciano Lavagno, Università di Udine, Italy

3 Outline I: Introduction to basic concepts on asynchronous design II: Synthesis of control circuits from STGs III: Advanced topics on synthesis of control circuits from STGs IV: Synthesis from HDL and other synthesis paradigms Note: no references in the tutorial

4 Introduction to asynchronous circuit design: specification and synthesis Part I: Introduction to basic concepts on asynchronous circuit design

5 Outline What is an asynchronous circuit ? Asynchronous communication Asynchronous logic blocks Micropipelines Control specification and implementation Delay models Why asynchronous circuits ?

6 Synchronous circuit RRRRCL CLK Implicit synchronization

7 Asynchronous circuit RRRRCL Explicit synchronization: Req/Ack handshakes Req Ack

8 Synchronous communication
Clock edges determine the time instants at which data must be sampled
Data wires may glitch between clock edges (set-up/hold times must be satisfied)
Data are transmitted at a fixed rate (clock frequency)
[Figure: waveform of the bit stream 110010]

9 Dual rail
Two wires per bit
– “00” = spacer, “01” = 0, “10” = 1
n-bit data communication requires 2n wires
Each bit is self-timed
Other delay-insensitive codes exist
[Figure: dual-rail waveform alternating between code words and spacers]
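The encoding on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, and the wire order inside each pair is assumed to be (true rail, false rail), so that “10” reads as logic 1.

```python
# Dual-rail code from the slide: each bit uses two wires.
# Assumed wire order per pair: (t, f). "00" = spacer, "01" = 0, "10" = 1.
SPACER = (0, 0)

def encode_bit(bit):
    """Return the (t, f) wire pair for one data bit."""
    return (1, 0) if bit else (0, 1)

def encode_word(bits):
    """Encode an n-bit word on 2n wires (one pair per bit)."""
    return [encode_bit(b) for b in bits]

def is_complete(pairs):
    """A word is complete once every pair carries a data code word."""
    return all(p in ((0, 1), (1, 0)) for p in pairs)

def decode_word(pairs):
    """Decode a complete word; reject spacers and the unused code '11'."""
    if not is_complete(pairs):
        raise ValueError("not a complete dual-rail code word")
    return [1 if p == (1, 0) else 0 for p in pairs]

word = [1, 1, 0, 0, 1, 0]
wires = encode_word(word)
```

Because validity is visible on the data wires themselves (`is_complete`), the receiver needs no separate timing reference, which is what "each bit is self-timed" refers to.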

10 Bundled data
Validity signal
– Similar to an aperiodic local clock
n-bit data communication requires n+1 wires
Data wires may glitch when not valid
Signaling protocols
– level sensitive (latch)
– transition sensitive (register): 2-phase / 4-phase
[Figure: waveform of the bit stream 110010]

11 Example: memory read cycle Transition signaling, 4-phase Valid address Address Valid data Data AA DD

12 Example: memory read cycle Transition signaling, 2-phase Valid address Address Valid data Data AA DD

13 Outline What is an asynchronous circuit ? Asynchronous communication Asynchronous logic blocks Micropipelines Control specification and implementation Delay models Why asynchronous circuits ?

14 Asynchronous modules
Signaling protocol: reqin+ start+ [computation] done+ reqout+ ackout+ ackin+ reqin- start- [reset] done- reqout- ackout- ackin-
(more concurrency is also possible, e.g. by overlapping the return-to-zero phase of step i-1 with the evaluation phase of step i)
[Figure: module with a DATA PATH (Data IN, Data OUT) and a CONTROL block (req in, ack in, req out, ack out) connected by start and done]

15 Completion detection Cdone Completion detection tree

16 Asynchronous latches: C element
A B | Z+
0 0 | 0
0 1 | Z
1 0 | Z
1 1 | 1
[Figure: C element symbol (inputs A, B; output Z) and a transistor-level implementation between Vdd and Gnd]
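The truth table of the C element translates directly into a small behavioral model. The sketch below is illustrative only (names are chosen here, and delays are not modeled): the output follows the inputs when they agree and holds its value otherwise.

```python
def c_element(a, b, z):
    """Next output Z+ of a C element with inputs a, b and current output z."""
    if a == b:
        return a   # inputs agree: output takes their common value
    return z       # inputs disagree: output holds its previous value

class CElement:
    """Stateful wrapper that remembers the current output."""
    def __init__(self, z=0):
        self.z = z

    def step(self, a, b):
        self.z = c_element(a, b, self.z)
        return self.z

c = CElement()
trace = [c.step(a, b) for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]]
```

In the trace, the output rises only after both inputs have risen and falls only after both have fallen.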

17 Dual-rail logic A.t A.f B.t B.f C.t C.f Dual-rail AND gate Valid behavior for monotonic environment
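A common way to build a dual-rail AND gate is C.t = A.t AND B.t and C.f = A.f OR B.f. The transcript does not preserve the gate-level drawing, so the sketch below assumes this standard construction; the (t, f) pair order and the helper names are also assumptions.

```python
def encode_bit(bit):
    """(t, f) pair for a data bit; (0, 0) is the spacer."""
    return (1, 0) if bit else (0, 1)

def dual_rail_and(a, b):
    """Assumed dual-rail AND: true rail is an AND, false rail is an OR."""
    a_t, a_f = a
    b_t, b_f = b
    return (a_t & b_t, a_f | b_f)

SPACER = (0, 0)
results = {(x, y): dual_rail_and(encode_bit(x), encode_bit(y))
           for x in (0, 1) for y in (0, 1)}
```

With spacers on both inputs the output is a spacer, and a valid output never reverts while the inputs change monotonically from spacer to data, which matches the "monotonic environment" condition on the slide. In this construction the output can already become valid when only one input is 0.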

18 Differential cascode voltage switch logic start A.t B.t C.t A.fB.f C.f Z.tZ.f done 3-input AND/NAND gate

19 Bundled-data logic blocks delay startdone logic Conventional logic + matched delay

20 Micropipelines (Sutherland 89) LLLLlogic R in A out C C C C R out A in delay

21 Data-path / Control LLLLlogic R in R out CONTROL A in A out

22 Outline What is an asynchronous circuit ? Asynchronous communication Asynchronous logic blocks Micropipelines Control specification and implementation Delay models Why asynchronous circuits ?

23 Control specification A+ B+ A- B- A B A input B output

24 Control specification A+ B+ A- B- A B

25 Control specification A+ B- A- B+ A B

26 Control specification A+ C- A- C+ A C B+ B- B C

27 Control specification A+ C- A- C+ A C B+ B- B C

28 Control specification C C Ri Ro Ai Ao Ri+ Ao+ Ri- Ao- Ro+ Ai+ Ro- Ai- Ri Ro Ao Ai FIFO cntrl

29 A simple filter: specification
y := 0;
loop
  x := READ (IN);
  WRITE (OUT, (x+y)/2);
  y := x;
end loop
[Figure: filter block with input IN (handshake R in, A in) and output OUT (handshake R out, A out)]
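The functional behavior of this specification (ignoring handshakes) can be checked with a short Python sketch. The generator name and the sample values are invented for illustration, and real-valued division is assumed; a hardware data path would more likely use an integer shift.

```python
def averaging_filter(samples):
    """Yield (x + y) / 2 for each input x, where y is the previous input."""
    y = 0                      # y := 0
    for x in samples:          # x := READ(IN)
        yield (x + y) / 2      # WRITE(OUT, (x + y) / 2)
        y = x                  # y := x

out = list(averaging_filter([4, 8, 2, 2]))
```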

30 A simple filter: block diagram
[Figure: latches x and y, adder +, and a control block with handshake pairs (R in, A in), (R out, A out), (Rx, Ax), (Ry, Ay), (Ra, Aa); data flows from IN to OUT]
x and y are level-sensitive latches (transparent when R=1)
+ is a bundled-data adder (matched delay between Ra and Aa)
R in indicates the validity of IN
After A in+ the environment is allowed to change IN
(R out, A out) control a level-sensitive latch at the output

31 A simple filter: control spec.
[Figure: block diagram of the filter; the control block runs one four-phase handshake per channel]
R in+ A in+ R in- A in-
Rx+ Ax+ Rx- Ax-
Ry+ Ay+ Ry- Ay-
Ra+ Aa+ Ra- Aa-
R out+ A out+ R out- A out-

32 A simple filter: control impl.
[Figure: the handshake cycles of the control spec. and a circuit built around a C element connecting R in, A in, Rx, Ax, Ry, Ay, Ra, Aa, R out, A out]

33 Control: observable behavior
[Figure: STG over the events of R in, A in, Rx, Ax, Ry, Ay, Ra, Aa, R out, A out and the internal signal z, next to the C-element circuit with internal signal z]

34 Outline What is an asynchronous circuit ? Asynchronous communication Asynchronous logic blocks Micropipelines Control specification and implementation Delay models Why asynchronous circuits ?

35 Taking delays into account
[Figure: STG over x+, x-, y+, y-, z+, z- and a circuit with signals x, y, z and internal nodes x’, z’]
Delay assumptions: environment: 3 time units; gates: 1 time unit
events (time): x+ (3) → x’- (4) → y+ (5) → z+ (6) → z’- (7) → x- (9) → x’+ (10) → z- (12) → z’+ (13) → y- (14)

36 Taking delays into account
[Figure: same STG and circuit; annotations “very slow” and “failure !”]
Delay assumptions: unbounded delays
events (time): x+ (3) → x’- (4) → y+ (5) → z+ (6) → x- (9) → x’+ (10) → y- (11)

37 Gate vs wire delay models Gate delay model: delays in gates, no delays in wires Wire delay model: delays in gates and wires

38 Delay models for async. circuits
Bounded delays (BD): realistic for gates and wires.
– Technology mapping is easy, verification is difficult
Speed independent (SI): unbounded (pessimistic) delays for gates and “negligible” (optimistic) delays for wires.
– Technology mapping is more difficult, verification is easy
Delay insensitive (DI): unbounded (pessimistic) delays for gates and wires.
– DI class (built out of basic gates) is almost empty
Quasi-delay insensitive (QDI): delay insensitive except for critical wire forks (isochronic forks).
– Formally, it is the same as speed independent
– In practice, different synthesis strategies are used
[Figure: circuit classes BD, SI ≡ QDI, DI]

39 Motivation (designer’s view) Modularity –Plug-and-play interconnectivity Reusability –IPs with abstract timing behaviors High-performance –Average-case performance (no worst-case delay synchronization) –No clock skew (local timing assumptions instead) Many interfaces are asynchronous –Buses, networks,...

40 Motivation (technology aspects) Low power –Automatic clock gating Electromagnetic compatibility –No peak currents around clock edges Robustness –High immunity to technology and environment variations (in-die variations, temperature, power supply,...)

41 Problems Concurrent models for specification –CSP, Petri nets,... Difficult to design –Hazards, synchronization Complex timing analysis –Difficult to estimate performance Difficult to test –No way to stop the clock

42 But we have some success stories...
Philips
AMULET microprocessors
Sharp
Intel (RAPPID)
IBM (interlocked pipeline)
Start-up companies:
– Theseus Logic, Cogency, ADD...

43 Introduction to asynchronous circuit design: specification and synthesis Part II: Synthesis of control circuits from STGs

44 Outline Overview of the synthesis flow Specification State graph and next-state functions State encoding Implementability conditions Speed-independent circuit –Complex gates –C-element architecture

45 Design flow
Specification (STG) → [Reachability analysis] → State Graph → [State encoding] → SG with CSC → [Boolean minimization] → Next-state functions → [Logic decomposition] → Decomposed functions → [Technology mapping] → Gate netlist

46 x y z x+ x- y+ y- z+ z- Signal Transition Graph (STG) x y z

47 x y z x+ x- y+ y- z+ z-

48 x+ x- y+ y- z+ z- xyz 000 x+ 100 y+ z+ y+ 101 110 111 x- 001 011 y+ z- 010 y-

49 xyz 000 x+ 100 y+ z+ y+ 101 110 111 x- 001 011 y+ z- 010 y- Next-state functions
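The next-state value of a signal in a state follows from the state graph: a signal at 0 goes to 1 only if its rising event is enabled, and a signal at 1 stays at 1 unless its falling event is enabled. The sketch below applies this rule to the xyz example. The arcs in `SG` are a reconstruction from the state codes and event labels that survive in the transcript, so treat them as an assumption; the function names are invented here.

```python
SIGNALS = "xyz"

# Reconstructed state graph (assumption): state code -> {event: successor}.
SG = {
    "000": {"x+": "100"},
    "100": {"y+": "110", "z+": "101"},
    "110": {"z+": "111"},
    "101": {"y+": "111", "x-": "001"},
    "111": {"x-": "011"},
    "001": {"y+": "011"},
    "011": {"z-": "010"},
    "010": {"y-": "000"},
}

def next_state_value(state, signal):
    """Value the signal is heading to in this state."""
    current = state[SIGNALS.index(signal)]
    enabled = SG[state]
    if current == "0":
        return 1 if signal + "+" in enabled else 0
    return 0 if signal + "-" in enabled else 1

def truth_table(signal):
    """Next-state function of a signal, tabulated over reachable states."""
    return {s: next_state_value(s, signal) for s in sorted(SG)}

y_table = truth_table("y")
```

For this reconstructed graph, the table for y coincides with x OR z on every reachable state; Boolean minimization would start from such a table.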

50 x z y

51 Outline Overview of the synthesis flow Specification State graph and next-state functions State encoding Implementability conditions Speed-independent circuit –Complex gates –C-element architecture

52 Design flow
Specification (STG) → [Reachability analysis] → State Graph → [State encoding] → SG with CSC → [Boolean minimization] → Next-state functions → [Logic decomposition] → Decomposed functions → [Technology mapping] → Gate netlist

53 VME bus Device LDS LDTACK D DSr DSw DTACK VME Bus Controller Data Transceiver Bus DSr LDS LDTACK D DTACK Read Cycle

54 STG for the READ cycle LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ LDS LDTACK D DSr DTACK VME Bus Controller

55 Choice: Read and Write cycles DSr+ LDS+ LDTACK+ D+ DTACK+ DSr- D- LDS- LDTACK-DTACK- DSw+ D+ LDS+ LDTACK+ D- DTACK+ DSw- LDS- LDTACK-DTACK-

56 Choice: Read and Write cycles DTACK- DSr+ LDS+ LDTACK+ D+ DTACK+ DSr- D- LDS- LDTACK- DSw+ D+ LDS+ LDTACK+ D- DTACK+ DSw- LDS- LDTACK-DTACK-

57 Choice: Read and Write cycles DTACK- DSr+ LDS+ LDTACK+ D+ DTACK+ DSr- D- LDS- LDTACK- DSw+ D+ LDS+ LDTACK+ D- DTACK+ DSw- LDS- LDTACK-DTACK-

58 Choice: Read and Write cycles DTACK- DSr+ LDS+ LDTACK+ D+ DTACK+ DSr- D- LDS- LDTACK- DSw+ D+ LDS+ LDTACK+ D- DTACK+ DSw- LDS- LDTACK-DTACK-

59 Circuit synthesis Goal: –Derive a hazard-free circuit under a given delay model and mode of operation

60 Outline Overview of the synthesis flow Specification State graph and next-state functions State encoding Implementability conditions Speed-independent circuit –Complex gates –C-element architecture

61 Design flow
Specification (STG) → [Reachability analysis] → State Graph → [State encoding] → SG with CSC → [Boolean minimization] → Next-state functions → [Logic decomposition] → Decomposed functions → [Technology mapping] → Gate netlist

62 STG for the READ cycle LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ LDS LDTACK D DSr DTACK VME Bus Controller

63 Binary encoding of signals DSr+ DTACK- LDS- LDTACK- D- DSr-DTACK+ D+ LDTACK+ LDS+

64 Binary encoding of signals DSr+ DTACK- LDS- LDTACK- D- DSr-DTACK+ D+ LDTACK+ LDS+ 10000 10010 10110 0111001110 01100 00110 10110 (DSr, DTACK, LDTACK, LDS, D)

65 QR (LDS+) QR (LDS-) Excitation / Quiescent Regions ER (LDS+) ER (LDS-) LDS- LDS+ LDS-

66 Next-state function 0  1 LDS- LDS+ LDS- 1  0 0  0 1  1 10110

67 Karnaugh map for LDS DTACK DSr D LDTACK 00011110 00 01 11 10 DTACK DSr D LDTACK 00011110 00 01 11 10 LDS = 0 LDS = 1 01-0 000000/1? 1 111 - - - --- ---- - ---- ---

68 Outline Overview of the synthesis flow Specification State graph and next-state functions State encoding Implementability conditions Speed-independent circuit –Complex gates –C-element architecture

69 Design flow
Specification (STG) → [Reachability analysis] → State Graph → [State encoding] → SG with CSC → [Boolean minimization] → Next-state functions → [Logic decomposition] → Decomposed functions → [Technology mapping] → Gate netlist

70 Concurrency reduction LDS- LDS+ LDS- 10110 DSr+

71 Concurrency reduction LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+

72 State encoding conflicts LDS- LDTACK- LDTACK+ LDS+ 10110
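The conflict at code 10110 can be reproduced by playing the token game on the read-cycle STG and grouping reachable states by their binary code. The sketch below is an illustration under several assumptions: the arc list is a reconstruction of the read-cycle STG (the transcript keeps only the event names), DSr and LDTACK are taken as inputs, the initial state is all zeros, and all identifiers are invented here.

```python
from collections import deque

SIGNALS = ["DSr", "DTACK", "LDTACK", "LDS", "D"]   # code order from the slides
INPUTS = {"DSr", "LDTACK"}                          # assumed input signals

# Assumed causality arcs of the read cycle; each arc acts as a place.
ARCS = [
    ("DSr+", "LDS+"), ("LDS+", "LDTACK+"), ("LDTACK+", "D+"),
    ("D+", "DTACK+"), ("DTACK+", "DSr-"), ("DSr-", "D-"),
    ("D-", "DTACK-"), ("D-", "LDS-"), ("DTACK-", "DSr+"),
    ("LDS-", "LDTACK-"), ("LDTACK-", "LDS+"),
]
INITIAL = (frozenset({("DTACK-", "DSr+"), ("LDTACK-", "LDS+")}), "00000")

def enabled(marking):
    """Events whose incoming arcs all carry a token."""
    events = {t for _, t in ARCS}
    return sorted(t for t in events
                  if all(a in marking for a in ARCS if a[1] == t))

def fire(marking, code, event):
    """Move tokens across the event and update the signal's bit."""
    consumed = {a for a in ARCS if a[1] == event}
    produced = {a for a in ARCS if a[0] == event}
    i = SIGNALS.index(event[:-1])
    bit = "1" if event.endswith("+") else "0"
    return frozenset((marking - consumed) | produced), code[:i] + bit + code[i + 1:]

def state_graph():
    """Breadth-first reachability analysis over (marking, code) pairs."""
    seen, queue = {INITIAL}, deque([INITIAL])
    while queue:
        marking, code = queue.popleft()
        for event in enabled(marking):
            nxt = fire(marking, code, event)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def csc_conflicts(states):
    """Codes shared by states that enable different output events."""
    by_code = {}
    for marking, code in states:
        outputs = frozenset(e for e in enabled(marking) if e[:-1] not in INPUTS)
        by_code.setdefault(code, set()).add(outputs)
    return {code: outs for code, outs in by_code.items() if len(outs) > 1}

states = state_graph()
conflicts = csc_conflicts(states)
```

Under these assumptions, two different states share the code 10110: one enables D+ and the other enables LDS-, so the next-state functions of D and LDS cannot be defined from the signal values alone.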

73 Signal Insertion LDS- LDTACK- D- DSr- LDTACK+ LDS+ CSC- CSC+ 101101 101100

74 Outline Overview of the synthesis flow Specification State graph and next-state functions State encoding Implementability conditions Speed-independent circuit –Complex gates –C-element architecture

75 Design flow
Specification (STG) → [Reachability analysis] → State Graph → [State encoding] → SG with CSC → [Boolean minimization] → Next-state functions → [Logic decomposition] → Decomposed functions → [Technology mapping] → Gate netlist

76 Complex-gate implementation Under what conditions does a hazard-free implementation exist?

77 Implementability conditions
Consistency
– Rising and falling transitions of each signal alternate in any trace
Complete state coding (CSC)
– Next-state functions correctly defined
Persistency
– No event can be disabled by another event (unless they are both inputs)

78 Implementability conditions
Consistency + CSC + persistency ⇒ there exists a speed-independent circuit that implements the behavior of the STG (under the assumption that any Boolean function can be implemented with one complex gate)

79 Persistency 100000001 a- c+ b+b+ b+b+ a c b a c b is this a pulse ? Speed independence  glitch-free output behavior under any delay

80 Speed-independent implementations How can the implementability conditions –Consistency –Complete state coding –Persistency be satisfied? Standard circuit architectures: –Complex (hazard-free) gates –C elements with monotonic covers –“Standard” gates and latches

81 a+ b+ c+ d+ a- b- d- a+ c-a- 0000 1000 1100 0100 0110 0111 1111 10111011 00111001 0001 a+ b+ c+ a- b- c- a+ c- a- d- d+

82 0000 1000 1100 0100 0110 0111 1111 10111011 00111001 0001 a+ b+ c+ a- b- c- a+ c- a- d- d+ ab cd 00011110 00 01 11 10 1 11 11 1 0 0000 ER(d+) ER(d-)

83 ab cd 00011110 00 01 11 10 1 11 11 1 0 0000 0000 1000 1100 0100 0110 0111 1111 10111011 00111001 0001 a+ b+ c+ a- b- c- a+ c- a- d- d+ Complex gate

84 Implementation with C elements
[Figure: C element with set input S, reset input R and output z]
… → S+ → z+ → S- → R+ → z- → R- → …
S (set) and R (reset) must be mutually exclusive
S must cover ER(z+) and must not intersect ER(z-) ∪ QR(z-)
R must cover ER(z-) and must not intersect ER(z+) ∪ QR(z+)

85 ab cd 00011110 00 01 11 10 1 11 11 1 0 0000 0000 1000 1100 0100 0110 0111 1111 10111011 00111001 0001 a+ b+ c+ a- b- c- a+ c- a- d- d+ C S R d

86 0000 1000 1100 0100 0110 0111 1111 10111011 00111001 0001 a+ b+ c+ a- b- c- a+ c- a- d- d+ C S R d but...

87 [Figure: state graph of the example, with signal d implemented by a C element with set S and reset R]
Assume that R = a’c’ has an unbounded delay
Starting from state 0000 (R=1 and S=0): a+ ; R- ; b+ ; a- ; c+ ; S+ ; d+ ; R+ disabled (potential glitch)

88 ab cd 00011110 00 01 11 10 1 11 11 1 0 0000 0000 1000 1100 0100 0110 0111 1111 10111011 00111001 0001 a+ b+ c+ a- b- c- a+ c- a- d- d+ C S R d Monotonic covers

89 C-based implementations C S R d C d a b c a b c d weak generalized C element (gC)

90 Synthesis exercise y- z-w- y+x+ z+ x- w+ 1011 0111 0011 1001 1000 1010 0001 00000101 00100100 0110 y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ Derive circuits for signals x and z (complex gates and monotonic covers)

91 Synthesis exercise 1011 0111 0011 1001 1000 1010 0001 00000101 00100100 0110 y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ wx yz 00011110 00 01 11 10 - - - - Signal x 1 0 1 1 1 1 1 0 0 0 0 0

92 Synthesis exercise 1011 0111 0011 1001 1000 1010 0001 00000101 00100100 0110 y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ wx yz 00011110 00 01 11 10 - - - - Signal z 1 0 0 0 0 1 1 1 0 0 0 0

93 Introduction to asynchronous circuit design: specification and synthesis Part III: Advanced topics on synthesis of control circuits from STGs

94 Outline Logic decomposition –Hazard-free decomposition –Signal insertion –Technology mapping Optimization based on timing information –Relative timing –Timing assumptions and constraints –Automatic generation of timing assumptions

95 Design flow
Specification (STG) → [Reachability analysis] → State Graph → [State encoding] → SG with CSC → [Boolean minimization] → Next-state functions → [Logic decomposition] → Decomposed functions → [Technology mapping] → Gate netlist

96 No Hazards a b c x 0 abcx 1000 1100 b+ 0100 a- 0110 c+ 1 1 0 0 1 1 0 1 0 1 0 0

97 Decomposition May Lead to Hazards abcx 1000 1100 b+ 0100 a- 0110 c+ a b z c x 1 0 0 0 0 1000 1100 0100 0110 1 1 0 0 0 1 1 1 0 0 0 1 1 0 0 0 1 1 1 1 0 1 0 1 0

98 Decomposition Acknowledgement Generating candidates Hazard-free signal insertion –Event insertion –Signal insertion

99 Global acknowledgement a b c z a b d y d-b+d+y+a-y-c+d- c-d+z-b-z+c+a+c-

100 a b c z a b d y How about 2-input gates ? d-b+d+y+a-y-c+d- c-d+z-b-z+c+a+c-

101 a b c z a b d y d-b+d+y+a-y-c+d- c-d+z-b-z+c+a+c- How about 2-input gates ?

102 a b c z a b d y 0 0 d-b+d+y+a-y-c+d- c-d+z-b-z+c+a+c- How about 2-input gates ?

103 a b c z a b d y d-b+d+y+a-y-c+d- c-d+z-b-z+c+a+c- How about 2-input gates ?

104 c z d y a b d-b+d+y+a-y-c+d- c-d+z-b-z+c+a+c- How about 2-input gates ?

105 Strategy for logic decomposition
Each decomposition defines a new internal signal
Method: insert new internal signals such that
– After resynthesis, some large gates are decomposed
– The new specification is hazard-free
Generate candidates for decomposition using standard logic factorization techniques:
– Algebraic factorization
– Boolean factorization (Boolean relations)

106 y- z-w- y+x+ z+ x- w+ Decomposition example 10011011 1000 1010 0001 00000101 00100100 01100111 0011 y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ wxyz

107 yz=1 yz=0 10011011 1000 1010 0001 00000101 00100100 01100111 0011 y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ 10011011 1000 1010 0001 00000101 00100100 01100111 0011 y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ C C x y x y w z x y z y z w z w z y

108 y- z-w- y+x+ z+ x- w+ s- s+ s- s+ s- s=1 s=0 10011011 1000 1010 0111 0011 y+ x- w+ z+ z- 0001 00000101 00100100 0110 x+ w- z- y+ x+ 1001 1000 1010 y+ z- 0111 y-

109 s- s+ s- s=1 s=0 10011011 1000 1010 0111 0011 y+ x- w+ z+ z- 0001 00000101 00100100 0110 x+ w- z- y+ x+ 1001 1000 1010 y+ z- 0111 C C x y x y w z x y z w z w z y s y-

110 C C x y x y w z x y z y z w z w z y yz=1yz=0 10011011 1000 1010 0001 00000101 00100100 01100111 0011 y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ 1011 1000 1010 0001 00000101 00100100 01100111 0011 y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ 1001

111 s- s+ s=1 s=0 1001 1011 0111 0011 x- w+ z+ 0001 00000101 00100100 0110 x+ w- z- y+ x+ 1001 1000 1010 y+ z- 0111 y- z-w- y+x+ z+ x- w+ s- s+ z- is delayed by the new transition s- !

112 C C x y x y w z x y z w z w z yyyyyyy s- s+ s=1 s=0 1001 1011 0111 0011 x- w+ z+ 0001 00000101 00100100 0110 x+ w- z- y+ x+ 1001 1000 1010 y+ z- 0111 y-

113 F C Sr D Decomposition (Algebraic, Boolean relations) Hazard-free ? (Event insertion) NO YES C C C C Sr D D

114 F C D Hazard-free ? (Event insertion) NO YES C C Sr D until no more progress Decomposition (Algebraic, Boolean relations)

115 Signal insertion for function F State Graph F=0F=1 Insertion by input borders F- F+

116 Event insertion a b ER(x) c

117 Event insertion a b ER(x) c x x x x b SR(x) a

118 Properties to preserve a a b b a a b b a a b b x a a b b a a b b b a a b b x x a is persistent a is disabled by b = hazards

119 Boolean decomposition
[Figure: block F with inputs x1,…,xn and output f, decomposed into block H (inputs x1,…,xn, outputs h1,…,hm) feeding block G (output f)]
f = F(x1,…,xn)
f = G(H(x1,…,xn))
Our problem: Given F and G, find H

120 [Figure: C element with inputs h1, h2 and output f]
state | f | next(f) | (h1,h2)
s1 | 0 | 0 | (0,-) (-,0)
s2 | 0 | 1 | (1,1)
s3 | 1 | 0 | (0,0)
s4 | 1 | 1 | (-,1) (1,-)
dc | - | - | (-,-)
This is a Boolean Relation
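The rows of this relation can be derived mechanically: for each pair (f, next(f)), collect every input pair (h1, h2) for which a C element moves from f to next(f). The following sketch does that by enumeration; the function names are illustrative, not taken from any tool.

```python
from itertools import product

def c_element(h1, h2, f):
    """Next output of a C element with inputs h1, h2 and current output f."""
    return h1 if h1 == h2 else f

def allowed_inputs(f, next_f):
    """All (h1, h2) pairs that take the output from f to next_f."""
    return sorted((h1, h2) for h1, h2 in product((0, 1), repeat=2)
                  if c_element(h1, h2, f) == next_f)

relation = {(f, n): allowed_inputs(f, n) for f, n in product((0, 1), repeat=2)}
```

Each state may be served by several input pairs, which is why the mapping from states to (h1, h2) is a relation and not a function: the decomposition step is free to pick any allowed pair in each state.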

121 y- a+c- d- a- c+ a+ y+ a- c- d+ c+ y a c d F Rs y R S

122 y- a+c- d- a- c+ a+ y+ a- c- d+ c+ y a c d Rs y a c d c d

123 y- a+c- d- a- c+ a+ y+ a- c- d+ c+ y a c d Rs ya

124 y- a+c- d- a- c+ a+ y+ a- c- d+ c+ y a c d Rs ya D d c

125 Technology mapping
Merging small gates into larger gates introduces no new hazards
Standard synchronous techniques can be applied, e.g. BDD-based Boolean matching
Handles sequential gates and combinational feedbacks
Due to hazards there is no guarantee of finding a correct mapping (some gates cannot be decomposed)
Timing-aware decomposition can be applied in these rare cases

126 Design flow
Specification (STG) → [Reachability analysis] → State Graph → [State encoding] → SG with CSC → [Boolean minimization] → Next-state functions → [Logic decomposition] → Decomposed functions → [Technology mapping] → Gate netlist

127 Timing assumptions in design flow
Speed-independent: wire delays after a fork are smaller than fan-out gate delays
Burst-mode: circuit stabilizes between two changes at the inputs
Timed circuits: absolute bounds on gate / environment delays are known a priori (before physical design)

128 Relative Timing Circuits
Assumptions: “a before b”
– for concurrent events: reduces reachable state space
– for ordered events: permits early enabling
– both increase the don’t care space for logic synthesis => simplified logic (better area and timing)
“Assume - if useful - guarantee” approach: assumptions are used by the tool to derive a circuit and the required timing constraints that must be met in the physical design flow
Applied to the design of the Revolving Asynchronous Pentium Processor(TM) Instruction Decoder (K.Stevens, S.Rotem et al. Intel Corporation)

129 Speed-independent C-element Relative Timing Asynchronous Circuits a- before b- Timing assumption (on environment): a b c RT C-element: faster,smaller; correct only under timing constraint: a- before b- a b c

130 State Graph (Read cycle) DSr+ DTACK- LDS- LDTACK- D- DSr-DTACK+ D+ LDTACK+ LDS+

131 Lazy Transition Systems ER (LDS+) ER (LDS-) LDS- LDS+ LDS- DTACK- FR (LDS-) Event LDS- is lazy: firing = subset of enabling

132 Timing assumptions
(a before b) for concurrent events: concurrency reduction for firing and enabling
(a before b) for ordered events: early enabling
(a simultaneous to b wrt c) for triples of events: combination of the above
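The effect of an "a before b" assumption on concurrent events can be illustrated with a toy model: a state is the set of events fired so far, and it is reachable only if every ordering assumption is respected. This is a deliberately simplified sketch, not the lazy-transition-system machinery of the tutorial; the function and parameter names are invented here.

```python
from itertools import combinations

def reachable_cuts(events, before):
    """Sets of fired events consistent with (a, b) pairs meaning 'a before b'."""
    cuts = []
    for r in range(len(events) + 1):
        for subset in combinations(events, r):
            fired = set(subset)
            # b may have fired only if a has fired as well
            if all(a in fired for a, b in before if b in fired):
                cuts.append(fired)
    return cuts

untimed = reachable_cuts(["a", "b"], [])
timed = reachable_cuts(["a", "b"], [("a", "b")])
```

Without the assumption, two concurrent events produce four states; with "a before b" the state in which only b has fired disappears and can be used as a don't care during logic minimization.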

133 Speed-independent Netlist LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK csc map

134 Adding timing assumptions (I) LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK csc map LDTACK- before DSr+ FAST SLOW

135 Adding timing assumptions (I) DTACK D DSr LDS LDTACK csc map LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ LDTACK- before DSr+

136 State space domain LDTACK- before DSr+ LDTACK- DSr+

137 State space domain LDTACK- before DSr+ LDTACK- DSr+

138 State space domain LDTACK- before DSr+ LDTACK- DSr+ Two more unreachable states

139 Boolean domain DTACK DSr D LDTACK 00011110 00 01 11 10 DTACK DSr D LDTACK 00011110 00 01 11 10 LDS = 0 LDS = 1 01-0 000000/1? 1 111 - - - --- ---- - ---- ---

140 Boolean domain DTACK DSr D LDTACK 00011110 00 01 11 10 DTACK DSr D LDTACK 00011110 00 01 11 10 LDS = 0 LDS = 1 01-0 00-001 1 111 - - - --- ---- - ---- --- One more DC vector for all signalsOne state conflict is removed

141 Netlist with one constraint LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK csc map

142 Netlist with one constraint LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK LDTACK- before DSr+ TIMING CONSTRAINT

143 Timing assumptions
(a before b) for concurrent events: concurrency reduction for firing and enabling
(a before b) for ordered events: early enabling
(a simultaneous to b wrt c) for triples of events: combination of the above

144 Ordered events: early enabling a c b a a c b a b b c c F G Logic for gate c may change

145 Adding timing assumptions (II) LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK D- before LDS-

146 State space domain LDS- D- Reachable space is unchanged For LDS- enabling can be changed in one state D- before LDS- Potential enabling for LDS- DSr-

147 Boolean domain DTACK DSr D LDTACK 00011110 00 01 11 10 DTACK DSr D LDTACK 00011110 00 01 11 10 LDS = 0 LDS = 1 01-0 00-001 1 111 - - - --- ---- - ---- ---

148 Boolean domain DTACK DSr D LDTACK 00011110 00 01 11 10 DTACK DSr D LDTACK 00011110 00 01 11 10 LDS = 0 LDS = 1 01-0 00-001 1 11 - - - - --- ---- - ---- --- One more DC vector for one signal: LDS If used: LDS = DSr, otherwise: LDS = DSr + D

149 Before early enabling LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK

150 Netlist with two constraints LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ LDTACK- before DSr+ and D- before LDS- TIMING CONSTRAINTS DTACK D DSr LDS LDTACK Both timing assumptions are used for optimization and become constraints

151 Deriving automatic timing assumptions
Rule I (out of 6): a, b are non-input events
– Untimed ordering: (a||b) and (a enabled before b), but not vice versa
– Derived assumption: a fires before b
– Justification: the delay of one gate can be made shorter than the delay of two (or more) gates: del(a) < del(c)+del(b)
[Figure: state-graph fragments over events a, b, c]

152 Deriving automatic timing assumptions
Rule I (out of 6), same conditions as on the previous slide
– Effect I: a state becomes DC for all signals

153 Deriving automatic timing assumptions
Rule I (out of 6), same conditions as on the previous slide
– Effect II: another state becomes local DC for the signal of event b

154 Backannotation of Timing Constraints Timed circuits require post-verification Can synthesis tools help ? –Report the least stringent set of timing constraints required for the correctness of the circuit –Not all initial timing assumptions may be required Petrify reports a set of constraints for order of firing that guarantee the circuit correctness

155 Timing constraints generation a b c d e d d e e b b c c d a Assumptions: d before b and c before e and a before d

156 Timing constraints generation a b c d e Assumptions: d before b and c before e and a before d d d e e b b c c d a

157 Timing constraints generation a b c d e Assumptions: d before b and c before e and a before d d d e e b b c c Correct behavior d a

158 Timing constraints generation a b c d e Assumptions: d before b and c before e and a before d d d e e b b c c 1 2 Incorrect behavior d a

159 Covering incorrect behavior a b c d e Assumptions: d before b and c before e and a before d d d e e b b c c 1 24 3 {1, 3} d before b {1} d before c d a 5 {2, 4} c before e Other possible constraints remove states from assumption domain => invalid

160 Covering incorrect behavior a b c d e Assumptions: d before b and c before e and a before d d d e e b b c c 1 24 3 {1} d before c d a 5 {2, 4} c before e Constraints for the minimal cost solution: d before c and c before e

161 Timing aware state encoding Solve only state conflicts reachable in the RT assumptions domain Generate automatic timing assumptions for inserted state signals => state signals can be implemented as RT logic State variables inserted concurrently with I/O events => latency and cycle time reduction

162 Value of Relative Timing
RT circuits provide up to 2-3x (1.3-2x) delay and area reduction with respect to SI circuits synthesized without (with) concurrency reduction
Automatic generation of timing assumptions => foundation for automatic synthesis of RT circuits with area/performance comparable to or better than manual design
Back-annotation of timing constraints => minimal required timing information for the back-end tools
Timing-aware state encoding allows significant area/performance optimization

163 Specification (STG + user assumptions) Lazy State Graph Lazy SG with CSC Next-state functions Decomposed functions Gate netlist Reachability analysis Timing-aware state encoding Boolean minimization Logic decomposition Technology mapping Design Flow with Timing Required Timing Constraints Automatic Timing Assumptions

164 FIFO example FIFO li lo ro ri li- li+ lo+ lo- ro+ ro- ri+ ri-

165 Speed-Independent Implementation without concurrency reduction 3 state signals are required

166 SI implementation with concurrency reduction li lo ro ri x li- li+ lo+ lo- ro+ ro- ri+ ri- x+ x- + gC + -

167 RT implementation li lo ro ri x li- li+ lo+ lo- ro+ ro- ri+ ri- x+ x- OR li- li+ lo+ lo- ro+ ro- ri+ ri- x+ x-

168 RT implementation li lo ro ri x li- li+ lo+ lo- ro+ ro- ri+ ri- x+ x- OR li- li+ lo+ lo- ro+ ro- ri+ ri- x+ x- To satisfy the constraint: Delay(x- ) < Delay (ri+ ) and Delay(lo+) + Delay(x- ) < Delay(ro+ ) + Delay (ri+ ) All constraints are either satisfied by default or easy to satisfy by sizing
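The two inequalities on this slide are easy to check once delay estimates are available. The sketch below simply evaluates them; the dictionary layout and the numeric delays are hypothetical and not taken from the tutorial.

```python
def rt_constraints_met(delay):
    """Check both constraints of the RT FIFO for a map event -> delay."""
    c1 = delay["x-"] < delay["ri+"]
    c2 = delay["lo+"] + delay["x-"] < delay["ro+"] + delay["ri+"]
    return c1 and c2

# Hypothetical delays in arbitrary time units
ok = rt_constraints_met({"x-": 1, "ri+": 3, "lo+": 1, "ro+": 1})
bad = rt_constraints_met({"x-": 4, "ri+": 3, "lo+": 1, "ro+": 1})
```

Since ri+ is produced by the environment after a round trip, a single internal gate delay for x- is usually much shorter, which is in line with the remark that the constraints are satisfied by default or by sizing.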

169 Introduction to asynchronous circuit design: specification and synthesis Part IV: Synthesis from HDL Other synthesis paradigms

170 Outline Synthesis from standard HDL (Verilog) [L. Lavagno et al Async00] –Subset for asynchronous specification –Data-path/control partitioning –Circuit architecture. Control generation Synthesis from asynchronous HDL (CSP, Tangram) –CSP for control generation [A. Martin et al, Caltech] –Tangram for silicon compilation [K. van Berkel et al, Philips] Control synthesis using FSMs [K. Yun, S. Nowick] –Burst-mode machines –Comparison with STGs Disclaimer: this is NOT a comprehensive review

171 Motivation
Language-based design is a key enabler of the success of synchronous logic
Use HDL as a single language for
– specification
– logic simulation and debugging
– synthesis
– post-layout simulation
HDL must support multiple levels of abstraction

172 Control-data partitioning Splitting of asynchronous control and synchronous data path Automated insertion of bundling delays CONTROL UNIT DATA PATH delay request acknowledge

173 Design flow Control/data splitting STG (control) HDL specification Synthesizable HDL (data) Synthesis (petrify) Timing analysis (Synopsys) HDL implementation Synthesis (Synopsys) Logic implementation Delay insertion Logic delays

174 Asynchronous Verilog subset by example
always
begin
  wait(start);
  R = SMP * 3;
  RES = SMP * 4 + R;
  if (RES[7] == 1)
    RES = 0;
  else
  begin
    if (RES[6] == 1)
      RES = 1;
  end
  done = 1;
  wait(!start);
  done = 0;
end
[Figure: data path with registers R and RES and input SMP, and a control unit (C.U.) with start and done]
begin-end for sequencing, fork-join for concurrency, if-else for input choice
Only a structured mix of sequencing, concurrency and choice can be specified

175 Controller design flow Trace Expressions Circuit Petri Net Transformations Reductions Synthesis HDL Syntax-directed translation

176 Trace expressions: example ( a || ( b ; c) ) || (d e) || ;  a bc de

177 Reduction Example a fb c dg h e d;  a; ( b || f ) c g; h;  e

178 Transformation: concurrency reduction a fb c d ; || a bcdf ; ; Concurrency in TE: b and f have a common parallel father

179 a fb c d f and b are ordered ; || a bcdf ; ; ; Transformation: concurrency reduction

180 Synthesis Place-based encoding ( based on a David-cell approach) Transformations to improve area and performance Structural methods to derive a circuit [Pastor et al.] Transactions on CAD, Nov’98

181 Place-based encoding p1 p2 p3 p4 t1 t2 p3+ p1- p2- p4+ p3- t1 t2 p1+ p2+ p4- 1100 0010 0001 ER(t1) = 111- ER(t2) = --11

182 Synthesis example: VME bus p2+ ldtack+ p8-p11- lds+ p1+ D+ p3+ p1- p2- p4+ dtack+ p3- p5+ dsr- p4- p9+p6+ D-p5- p10+p7+ lds-dtack- p9-p6- p11+ ldtack-p8+ dsr+ p10- p7- LDTACK+ D+ DTACK+ DSr- D- DTACK-LDS- LDTACK-DSr+ LDS+ Place encoding

183 VME bus spec after transforms p2+ ldtack+ p8-p11- lds+ p1+ D+ p3+ p1- p2- p4+ dtack+ p3- p5+ dsr- p4- p9+p6+ D-p5- p10+p7+ lds-dtack- p9-p6- p11+ ldtack-p8+ dsr+ p10- p7- ldtack+ lds+d+ dtack+ dsr-p9+ d- lds-dtack- p9-ldtack- dsr+ Reductions Transforms

184 Deriving Next state function x+ z+ z- y- x- y+ p1 p2 p3 p4 p5 p6 p7 Next-state function of signal y ? 000 1-0 1-1 0-1 -0- -1- 010

185 Deriving Next State function x+ z+ z- y- x- y+ p1 p2 p3 p4 p5 p6 p7 Next-state function of signal y ? 000 1-0 1-1 0-1 10- 11- -11 010 y = x + z

186 Conclusion
Initial prototype of automated flow without state explosion for ASIC design
– From HDLs (control / data splitting)
– Existing tools for data-path synthesis
– Direct synthesis guarantees implementation (HDL → Petri net, Petri-net-based encoding)
– Synthesis of large controllers by efficient spec models (Free-choice Petri nets + trace expressions)
– Exploration of the design space (optimization) by property-preserving transformations
– Logic synthesis by structural methods
Quality of design often acceptable
Timing post-optimization can be applied

187 Synthesis from asynchronous HDL CSP based languages CSP = communicating sequential processes [T. Hoare] Two synthesis techniques –based on program transformations [Caltech] –based on direct compilation [Philips] Tools are more mature than for asynchronous synthesis from standard HDL Complete shift in design methodology is required

188 Using CSP for control generation After li goes high do full handshake at the right, then complete handshake at the left and iterate. li+ro+ri+ro-ri-lo+li-lo- ro ri li lo Q element *[[li];ro+;[ri];ro-;[not ri];lo+;[not li];lo-] “;” = sequencing operator ro+ = ro goes high; ro- = ro goes low [li] = wait until li is high; [not li] = wait until li is low CSP: STG:
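The correspondence between the CSP loop and the STG on this slide can be sketched in a few lines of Python. This is only an illustration, not part of the tutorial: the function name `csp_to_trace` is hypothetical, and it assumes a purely sequential loop body in which every wait `[s]` or `[not s]` is satisfied by exactly one input transition (`s+` or `s-`).

```python
def csp_to_trace(body):
    """Expand one iteration of a sequential CSP loop body into STG events.

    Assumption (not from the slides): each wait [s] maps to the input
    event s+, and each wait [not s] maps to the input event s-.
    """
    trace = []
    for step in body.split(";"):
        step = step.strip()
        if step.startswith("[") and step.endswith("]"):
            guard = step[1:-1].strip()
            if guard.startswith("not "):
                trace.append(guard[4:].strip() + "-")  # wait until low
            else:
                trace.append(guard + "+")              # wait until high
        else:
            trace.append(step)                         # output action, e.g. ro+
    return trace


q_element = "[li];ro+;[ri];ro-;[not ri];lo+;[not li];lo-"
print(" ".join(csp_to_trace(q_element)))
```

For the Q element the printed sequence is the STG cycle shown on the slide: li+ ro+ ri+ ro- ri- lo+ li- lo-.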

189 Using CSP for control generation *[[li];ro+;[ri];ro-;[not ri];lo+;[not li];lo-] Conflict: ro+ and ro- are not mutually exclusive (since ri+ and li+ are not) Eliminate conflict by state signal insertion (= CSC) CSP: Production rules: li -> ro+; ri -> ro- not ri -> lo+; not li -> lo- ri li ro weak

190 Conflict elimination *[[li];ro+;[ri];x+;[x];ro-;[not ri];lo+;[not li];x-;[not x];lo-] CSP: Production rules: not x and li -> ro+; x or not li -> ro- x and not ri -> lo+; not x or ri -> lo- ri -> x+; not li -> x- FF x not x li lo ri ro
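The effect of inserting the state signal x can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' tool: the environment is assumed to follow a 4-phase handshake on both sides (li rises when lo is low, falls when lo is high; ri follows ro), all signals start at 0, and the names `FIXED_RULES`, `NAIVE_RULES`, `conflicts`, and `simulate` are hypothetical.

```python
# Each signal maps to (pull-up guard, pull-down guard), taken from the slides.
FIXED_RULES = {
    "ro": (lambda s: not s["x"] and s["li"], lambda s: s["x"] or not s["li"]),
    "lo": (lambda s: s["x"] and not s["ri"], lambda s: not s["x"] or s["ri"]),
    "x": (lambda s: s["ri"], lambda s: not s["li"]),
}
# Rules before state-signal insertion (previous slide).
NAIVE_RULES = {
    "ro": (lambda s: s["li"], lambda s: s["ri"]),
    "lo": (lambda s: not s["ri"], lambda s: not s["li"]),
}


def conflicts(rules, s):
    """Signals whose pull-up and pull-down guards are both true in state s."""
    return [sig for sig, (up, down) in rules.items() if up(s) and down(s)]


def enabled_events(rules, s):
    """Events that can fire in state s (assumed 4-phase environment first)."""
    events = []
    if not s["li"] and not s["lo"]:
        events.append(("li", 1))
    if s["li"] and s["lo"]:
        events.append(("li", 0))
    if not s["ri"] and s["ro"]:
        events.append(("ri", 1))
    if s["ri"] and not s["ro"]:
        events.append(("ri", 0))
    for sig, (up, down) in rules.items():
        if up(s) and not s[sig]:
            events.append((sig, 1))
        if down(s) and s[sig]:
            events.append((sig, 0))
    return events


def simulate(rules, steps):
    """Fire the first enabled event at each step, checking for conflicts."""
    s = dict.fromkeys(["li", "ri", "ro", "lo", "x"], 0)
    trace = []
    for _ in range(steps):
        assert not conflicts(rules, s), "pull-up and pull-down both enabled"
        sig, val = enabled_events(rules, s)[0]
        s[sig] = val
        trace.append(sig + ("+" if val else "-"))
    return trace


print(" ".join(simulate(FIXED_RULES, 10)))
print(conflicts(NAIVE_RULES, {"li": 1, "ri": 1, "ro": 1, "lo": 0, "x": 0}))
```

Under these assumptions the first line printed is li+ ro+ ri+ x+ ro- ri- lo+ li- x- lo-, which is one iteration of the transformed CSP program, and the second line reports `ro` as conflicting once li and ri are both high, which is the conflict the state signal removes.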

191 Conclusions Generating circuits from a CSP control program is similar to STG synthesis One can be reduced to the other The particular technique may vary. Direct CSP program transformations can be (and were) used instead of methods based on state space generation See reference list for more details

192 Buffer example in Tangram (a?byte & b!byte) begin x0: var byte | forever do a?x0 ; b!x0 od end Buffer * x a b T ; T a b passive port active port Each circle mapped to a netlist Data path Q element
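The behaviour of the buffer program, `forever do a?x0 ; b!x0 od`, can be mimicked in Python to show how the sequencing operator orders the two communications. This is a behavioural sketch only: the function `one_place_buffer` is hypothetical, the `forever` loop is cut down to a finite input list, and nothing here reflects how Tangram maps the program to handshake circuits.

```python
def one_place_buffer(a_values):
    """Behavioural model of: forever do a?x0 ; b!x0 od

    Assumption: a finite list stands in for the passive input port a,
    and the returned list stands in for the active output port b.
    """
    outputs = []
    log = []
    for value in a_values:
        x0 = value                 # a?x0 : receive a byte into variable x0
        log.append(("a?", x0))
        outputs.append(x0)         # b!x0 : send x0, only after a? completed
        log.append(("b!", x0))
    return outputs, log


outputs, log = one_place_buffer([0x12, 0x34])
print(outputs)
print(log)
```

The log alternates strictly between `a?` and `b!`, which is what the `;` component in the handshake-circuit picture enforces.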

193 Summary Tangram program is partitioned into data path and control Data path is implemented as dual or single rail Control is mapped to composition of standard elements (“;” “||” etc) Each standard element is mapped to a circuit Post-optimization is done Composing islands of control elements and re-synthesis with STG can give more aggressive optimization Philips made a few chips using Tangram, including a product: 8051 micro-controller in low-power pager Muna (25 wks battery life from one AAA battery) Similar approach used in Balsa (Manchester Univ., public domain)

194 Burst mode FSM s1 s2 s3 s4 b-/x- a+b+/y+ a-/x+y- c+/y- c-/y+ Close to synchronous FSMs with binary encoded I/O Work in bursts: –Input transitions fire –Output transitions fire –State signals change Mostly limited to fundamental mode: next input burst cannot arrive before stabilization at the outputs
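The burst discipline described above (collect a full input burst, then fire the output burst and change state) can be sketched as a small interpreter. The bursts below are the ones on the slide, but which states they connect is an assumption, since the arcs of the figure are not recoverable from the transcript; the class name `BurstModeMachine` and the table `TABLE` are hypothetical. Timing is not modelled: fundamental mode is taken for granted, so each input arrives only after the previous output burst has settled.

```python
class BurstModeMachine:
    """Interpreter for a burst-mode specification.

    transitions: state -> list of (input_burst, output_burst, next_state),
    with bursts given as frozensets of signal transitions such as "a+".
    """

    def __init__(self, transitions, start):
        self.transitions = transitions
        self.state = start
        self.pending = set()   # input transitions seen since the last burst

    def input(self, event):
        """Feed one input transition; return the output burst it completes,
        or an empty set if the current input burst is still incomplete."""
        self.pending.add(event)
        for in_burst, out_burst, next_state in self.transitions[self.state]:
            if self.pending == in_burst:
                self.state = next_state
                self.pending = set()
                return out_burst
        return frozenset()


# Assumed connectivity of s1..s4 (hypothetical); bursts taken from the slide.
TABLE = {
    "s1": [(frozenset({"a+", "b+"}), frozenset({"y+"}), "s2")],
    "s2": [(frozenset({"a-"}), frozenset({"x+", "y-"}), "s3"),
           (frozenset({"c+"}), frozenset({"y-"}), "s4")],
    "s3": [(frozenset({"b-"}), frozenset({"x-"}), "s1")],
    "s4": [(frozenset({"c-"}), frozenset({"y+"}), "s2")],
}

machine = BurstModeMachine(TABLE, "s1")
for event in ["a+", "b+", "c+", "c-", "a-", "b-"]:
    print(event, "->", sorted(machine.input(event)), machine.state)
```

In this run `a+` alone produces no output; the output `y+` fires only once `b+` has also arrived, which is the defining property of an input burst.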

195 Extended Burst mode s1 s2 s3 s4 b-/x- a+b*/y+ a-/x+y- c+/y- c-/y+ Directed don’t cares (b*): some concurrency is allowed for input transitions that do not influence an output burst Conditional guards = “if b=1 then …”

196 Synthesis of XBM Next state and output functions free of functional and logic hazards Sequential feedbacks should not introduce new hazards State assignment –one state of the BM spec to one layer of Karnaugh map –compatible layers are merged –layers are compatible if merging does not introduce CSC violations or hazards –Layers are encoded using race free encoding

197 XBM and STG s1 s2 s3 s4 b-/x- a+b*/y+ a-/x+y- c+/y- c-/y+ x- a+ y+ b+ eps c- a- c+ y- y+ x+ y- b-

198 Summary Specification: XBM is subclass of STGs Synthesis: techniques are extensions of synchronous state assignment and logic minimization Timing: –environment is limited to fundamental mode (difficult for pipelined and highly concurrent systems) –internals are delay insensitive See reference list for details

199 Summary Specification: Signal Transition Graph (formalized timing diagram) Synthesis: –state encoding –Boolean function derivation –algebraic and Boolean sequential decomposition –technology mapping Timing: –delay model implies timing constraints –exploiting timing assumptions leads to minimization and generates further assumptions Future work: –integrated flow –testing

