Asynchronous Circuit Compilation. Dr. Doug Edwards (Southampton: Oct 99)

1 Asynchronous Circuit Compilation. Dr. Doug Edwards, doug@cs.man.ac.uk (Southampton: Oct 99)

2 Overview
- Asynchronous circuits
- Advantages
- Asynchronous design paradigms
- Syntax-directed compilation
  - Handshake circuits
- Balsa
- Datapath compilation
- Design example: DMA controller

3 Asynchronous (self-timed) Basics
- Synchronous circuits: a global clock separates system states
  - A time-domain view of system activity
- Asynchronous circuits: input changes separate system states
  - A sequence (trace) domain view of system activity

4 Why Asynchronous?
- Low power
  - Data-driven: power is only used to do useful work
  - Zero power when idle, with instant restart
- Low EMI
  - In a clocked circuit, all noise is correlated
  - Async circuits have “distributed” switching activity, leading to uncorrelated EMI

5 Why Asynchronous?
- No clock distribution problems
- Composability/modularity
  - Facilitates IP reuse
- Average-case performance
  - Exploits the fact that the worst case often occurs infrequently

6 Timing Models
- Delay Insensitive (DI): delays in circuits and wires are arbitrary
- Quasi-Delay Insensitive (QDI): similar to DI, but assuming isochronic forks
- Speed Independent (SI): wires have no delays; gate delays are arbitrary
- Bounded Delay: single-sided timing constraints

7 Asynchronous Design Paradigms
- AFSMs: for fast controllers etc.
  - Traditionally hard: hazards, races, state assignment problems
  - Research has led to new techniques
    - STG/Petri net based SI circuits
    - Burst-mode circuits
- Macromodule-like: for larger systems
  - Micropipeline approach, handshake circuits

8 Asynchronous Control
- With no clock, some other means is required to co-ordinate control flow
- Use a request/acknowledge handshake
[Diagram: Sender with Req and Ack signals]

9 Signalling Protocols
- req and ack are abstractions: a signalling protocol is layered on top of them
- Two common protocols
  - 2-phase (transition signalling, NRZ)
  - 4-phase (return-to-zero signalling)

10 Data Validity Models
- Self-timed
  - The validity of the data is encoded within the data itself (redundant coding)
  - e.g. dual rail: each data bit requires two wires; 00 -> no data, 01 -> ‘0’, 10 -> ‘1’
- Bundled data
  - Conventional datapath
  - Validity is assured by imposing timing constraints
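A minimal Python sketch of the dual-rail code listed on this slide. The function names are hypothetical, and the two wires of each bit are simply written as a two-character string, since the slide does not say which wire is which.

```python
# Dual-rail code from the slide: "00" = no data, "01" = logic 0, "10" = logic 1.
# "11" is not a legal codeword. Wire pairs are modelled as 2-character strings.

def encode_bit(bit):
    """Return the dual-rail wire pair for a single bit."""
    return "10" if bit else "01"

def encode_word(value, width):
    """Encode an integer as a list of wire pairs, least significant bit first."""
    return [encode_bit((value >> i) & 1) for i in range(width)]

def is_complete(pairs):
    """Completion detection: every bit position carries a valid codeword."""
    return all(p in ("01", "10") for p in pairs)

def is_empty(pairs):
    """True when every bit position shows the 'no data' codeword."""
    return all(p == "00" for p in pairs)

def decode_word(pairs):
    """Recover the integer; refuse to decode while any bit is still 'no data'."""
    if not is_complete(pairs):
        raise ValueError("data not valid yet")
    return sum(1 << i for i, p in enumerate(pairs) if p == "10")
```

The receiver needs no separate validity signal: it can tell from the wires alone (`is_complete`) when a whole word has arrived, at the cost of two wires per bit.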

11 2-phase Protocol
- Events are transitions
[Timing diagram: Req, Ack and data “valid” intervals, with two transactions marked]

12 4-phase Protocol
- Signals are returned to their initial state after each transaction
  - Several possible interleavings of the signal transitions
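The two protocols can be compared by listing the wire events they produce. This is a sketch only: it assumes the simplest interleaving (each req followed directly by its ack) and the function names are hypothetical.

```python
def two_phase(transfers):
    """2-phase: every transition is an event, so one transfer costs
    one req transition and one ack transition."""
    req = ack = 0
    trace = []
    for _ in range(transfers):
        req ^= 1                      # toggle request
        trace.append(("req", req))
        ack ^= 1                      # toggle acknowledge
        trace.append(("ack", ack))
    return trace

def four_phase(transfers):
    """4-phase: both wires return to zero after each transfer."""
    trace = []
    for _ in range(transfers):
        trace += [("req", 1), ("ack", 1), ("req", 0), ("ack", 0)]
    return trace
```

Under these assumptions `four_phase(1)` and `two_phase(2)` produce the same wire activity; the protocols differ in how many transfers that activity represents.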

13 Comparison of Approaches
- 2-phase/4-phase
  - 2-phase is conceptually simpler (once an event mind-set is adopted)
  - 2-phase circuits are slower and more complex
  - Think 2-phase, build 4-phase
- Bundled-data/dual-rail
  - Current orthodoxy: bundled data is faster, lower power and smaller in area, with a tolerancing task no worse than for a clocked design

14 Current Approach
- QDI control
- Bounded-delay (bundled-data) datapath
- 4-phase signalling
Amulet3i

15 Asynchronous HDLs
- Conventional programming languages lack 3 necessary constructs:
  - communication
  - parallelism/concurrency
  - sharing (of hardware)
- Conventional HDLs lack:
  - adequate fine-grain concurrency
  - channel-based communication primitives

16 Asynchronous HDLs - 2
- Tangram, Balsa
  - CSP-based + data types + …
  - Based on underlying formal semantics
    - guarantees correct composition rules
    - easier composition than in sync circuits???
  - Transparent compilation
    - each production rule in the language translates to an intermediate handshake circuit
    - allows the designer to infer circuit costs and performance from the program

17 Handshake Circuits - 1
- Circuits communicate along channels
- Channels connect ports at the circuit interface
- Ports have:
  - Type
  - Direction
  - Sense

18 Handshake Circuits - 2
- Port type determines the number of data wires
  - No data wires == control-only port!
- Port direction is input, output or control only
- Port sense
  - Active: initiates transfers
  - Passive: responds to requests

19 Micropipeline-Style Circuits, Push Circuits: circuit waits for data
[Diagram: circuit (cct) with a passive input port and an active output port, each with req, ack and data signals]

20 Micropipeline-Style Circuits, Push Circuits: data arrives

21 Micropipeline-Style Circuits, Push Circuits: data validity signalled

22 Micropipeline-Style Circuits, Push Circuits: circuit accepts data

23 Micropipeline-Style Circuits, Push Circuits: circuit signals data taken

24 Micropipeline-Style Circuits, Push Circuits: circuit outputs data

25 Micropipeline-Style Circuits, Push Circuits: circuit signals validity

26 Micropipeline-Style Circuits, Push Circuits: receiver takes data

27 Micropipeline-Style Circuits
- 4-phase protocol not detailed
- Previous circuit decoupled input and output
  - Implies a latch inside the handshake circuit
- An alternative is for the input handshake to enclose the output handshake

28 Enclosed Handshake, Push Circuits: data arrives
[Diagram: circuit (cct) with req, ack and data signals on its input and output ports]

29 Enclosed Handshake, Push Circuits: data validity signalled

30 Enclosed Handshake, Push Circuits: circuit accepts data

31 Enclosed Handshake, Push Circuits: circuit outputs data

32 Enclosed Handshake, Push Circuits: circuit signals validity

33 Enclosed Handshake, Push Circuits: receiver takes data

34 Enclosed Handshake, Push Circuits: input handshake completes. No latch required
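The decoupled circuit (slides 19 to 26) and the enclosed handshake (slides 28 to 34) differ only in the order of events. The following Python sketch records that order; it is an illustration under simplifying assumptions (each handshake is reduced to a "req" and an "ack" event, and all names are hypothetical), not a model of the actual circuits.

```python
def receiver(log):
    """Environment on the output side: accepts whatever it is offered."""
    def take(data):
        log.append("out req")
        log.append("out ack")
    return take

def decoupled_stage(log, downstream):
    """Input handshake completes before the output handshake starts,
    so the data must be held somewhere in between: a latch."""
    latch = []
    def stage(data):
        log.append("in req")
        latch.append(data)          # capture the data
        log.append("in ack")        # sender is released here
        downstream(latch.pop())
    return stage

def enclosed_stage(log, downstream):
    """Output handshake happens inside the input handshake; the sender
    keeps the data valid throughout, so no latch is needed."""
    def stage(data):
        log.append("in req")
        downstream(data)
        log.append("in ack")        # sender is released only now
    return stage

decoupled_log, enclosed_log = [], []
decoupled_stage(decoupled_log, receiver(decoupled_log))(42)
enclosed_stage(enclosed_log, receiver(enclosed_log))(42)
```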

35 Tangram Style Circuits, Pull Circuits: active ported circuits / control driven
[Diagram: circuit (cct) with an active input port; req, ack and data signals on both ports]

36 Tangram Style Circuits, Pull Circuits: circuit demands data

37 Tangram Style Circuits, Pull Circuits: data is sent on demand

38 Tangram Style Circuits, Pull Circuits: data is accepted and can then be released

39 Balsa
- Language for synthesising large async circuits and systems
- CSP/OCCAM background
- Tangram-like
  - Based on Tangram compilation function
  - Compiles to a small (but expanding) set of handshake circuits
- Origins: ESPRIT EXACT project

40 Balsa Language Features
- Data types based on sequences of bits
  - Arrays and records are bit-based
  - Element extraction is by array slicing
  - Strict data typing
- Structural iteration
- Arrayed channels
- Parameterised and recursive functions

41 Balsa Language Features
- Enclosed selection semantics
  - Allows passive ported circuits
  - Allows push (micropipeline-style) circuits
  - Allows unbuffered (latch-free) circuits
  - Can be considered a restricted form of Burns’ probe construct

42 Balsa Source

43 Example: Single Place Buffer

    import [balsa.types.basic]      -- library mechanism
    public                          -- visibility
    type word is 16 bits            -- type declaration
    -- procedure definition; i and o are the channel declarations
    procedure buffer (input i : word; output o : word) is
      local variable x : word       -- implies latch
    begin
      loop                          -- repeat forever
        i -> x ;   -- Input communication: read input channel into local variable x (";" is sequential operation)
        o <- x     -- Output communication: output local variable x to output channel
      end          -- closes loop
    end            -- closes procedure body

44 Buffer Handshake Circuit (single-place buffer)
[Diagram: activation channel, repeater (#), sequencer (;), variable x and two transferrers (T) between ports i and o]

45 Buffer Handshake Circuit: repeater is activated

46 Buffer Handshake Circuit: sequencer handshakes to left transferrer

47 Buffer Handshake Circuit: transferrer requests data from environment

48 Buffer Handshake Circuit: data transferred to variable x

49 Buffer Handshake Circuit: variable handshake completes

50 Buffer Handshake Circuit: transferrer handshake completes to environment

51 Buffer Handshake Circuit: transferrer handshake completes

52 Buffer Handshake Circuit: sequencer handshakes to right transferrer

53 Buffer Handshake Circuit: transferrer reads variable

54 Buffer Handshake Circuit: transferrer outputs to environment

55 Buffer Handshake Circuit: handshakes complete

56 Buffer Handshake Circuit: sequencer completes its input handshake

57 Buffer Handshake Circuit: repeater initiates another transfer, etc.
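The walk-through on slides 44 to 57 can be mimicked in Python by treating a handshake on an active port as a function call that returns when the handshake completes. This is a behavioural sketch only. All names are hypothetical, and the repeater is given a bound (`times`) so that the simulation terminates, whereas the real component repeats forever.

```python
class Variable:
    """Storage element with a write port and a read port."""
    def __init__(self):
        self.value = None
    def write(self, value):
        self.value = value
    def read(self):
        return self.value

def transferrer(pull, push):
    """On activation: pull a value from one side, then push it to the other."""
    def activate():
        push(pull())
    return activate

def sequencer(first, second):
    """On activation: handshake on the left child, then on the right child."""
    def activate():
        first()
        second()
    return activate

def repeater(child, times):
    """On activation: handshake on the child repeatedly (bounded here)."""
    def activate():
        for _ in range(times):
            child()
    return activate

def run_buffer(inputs):
    """Single-place buffer: loop i -> x ; o <- x end, fed from a list."""
    source = iter(inputs)
    outputs = []
    x = Variable()
    read_i = transferrer(lambda: next(source), x.write)   # i -> x
    write_o = transferrer(x.read, outputs.append)         # o <- x
    buffer = repeater(sequencer(read_i, write_o), len(inputs))
    buffer()                                              # activation channel
    return outputs
```

For example, `run_buffer([3, 1, 4])` passes each value through the variable `x` in turn and returns them in the same order.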

58 Example: Single Place Buffer

    import [balsa.types.basic]
    public
    type word is 16 bits
    procedure buffer (input i : word; output o : word) is
      local variable x : word
    begin
      loop
        i -> x ;   -- Input communication
        o <- x     -- Output communication
      end
    end

59 Example: 2-place buffer

    import [balsa.types.basic]
    import [buffer1a]               -- reuse component
    public
    type word is 16 bits
    procedure buffer2c (input i : word; output o : word) is
      local channel c : word        -- internal channel connects two 1-place buffers
    begin
      -- parallel composition; buffers connected by common signal name
      buffer (i, c) || buffer (c, o)
    end

60 2-place Buffer Handshake Circuit
[Diagram: two buffer (B) components between ports i and o; labels: par component, passivator, channel c]

61 2-place Buffer Handshake Circuit
[Diagram: each buffer expanded into repeater (#), sequencer (;), variable x and two transferrers (T); labels: par component, passivator, channel c]

62 Peephole Optimisation
- Composition of handshake circuits leads to inefficiencies at circuit boundaries
- Straightforward peephole optimizations

63 2-place Buffer Handshake Circuit
[Diagram: as slide 61]

64 Optimized 2-place Buffer Circuit
[Diagram: the two buffers (#, ;, x, T) between ports i and o; label: control-only]

65 The Repeater
- “Formal” definition
  $\mathrm{REP}(a^{\circ}, b^{\bullet}) = (a^{\circ} : \#[b^{\bullet}])$
  - $\bullet$ denotes active port
  - $\circ$ denotes passive port
  - $\#$ denotes repeat
  - $:$ denotes handshake enclosure

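66 The Repeater
- “Formal” definition
  $\mathrm{REP}(a^{\circ}, b^{\bullet}) = (a^{\circ} : \#[b^{\bullet}])$
  $= (a^{\circ}\uparrow : \#[b^{\bullet}\uparrow ; b^{\bullet}\downarrow])$
  $= (a_r\uparrow : \#[b_r\uparrow ; b_a\uparrow ; b_r\downarrow ; b_a\downarrow])$
[Diagram: repeater with signals a_r, a_a, b_r and b_a]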
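The four-phase expansion of the repeater can be read as an event trace. The sketch below assumes the usual four-phase reading (request up, acknowledge up, request down, acknowledge down on b) and that the activation on a is never acknowledged, because the repetition never ends. The event names such as `a_r+` are a hypothetical notation for "a_r rises".

```python
from itertools import islice

def repeater_trace():
    """Yield the wire events of a repeater, one at a time, forever."""
    yield "a_r+"            # activation request arrives on passive port a
    while True:             # #[ ... ]: repeat forever
        yield "b_r+"        # request on active port b
        yield "b_a+"        # environment acknowledges
        yield "b_r-"        # return to zero
        yield "b_a-"

# Take a finite prefix of the infinite trace for inspection.
prefix = list(islice(repeater_trace(), 9))
```

Note that `a_a+` never appears in the trace: the handshake on a encloses an infinite loop.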

67 The Transferrer
- Several implementations
  - Simplest: wire-only
[Diagram: wire-only transferrer with signals a_r, a_a, b_r, b_a, c_r, c_a and data[n]]

68 Balsa Toolkit - 1
- balsa-c: the compiler for the language
- breeze2dot: produces a PostScript plot of the generated handshake circuits
- breezecost: reports the cost of the compiled circuit in arbitrary units

69 Balsa Toolkit - 2
- breeze2lard: the interface to the LARD simulation environment
  - Balsa source is translated to LARD
  - A simple test harness is generated
- balsa-md: an automatic makefile generation facility
- balsa-mgr: a GUI project manager

70 Mod-16 Counter (all even)

71 Bundled-Data Datapaths
- Problems
  - Random standard cell layout
    - mixed control + datapath
  - Timing analysis required
  - Robustness of design reduced
- Possible solutions
  - DI codes
  - Hybrid bundled + DI
  - Simpler timing analysis

72 DI Codes
- Dual rail (used in 1st Tangram system)
  - Can use standard cell approach without timing analysis
    - no need to distinguish between control and data
  - Abandoned in favour of bundled data
    - area cost in extra wires
    - area and time cost in completion detection
  - Tangram/Balsa generates push-pull pipelines with expensive synchronization

73 Generic Pipeline
- Passivators join compiled procedures
[Diagram: two compiled procedures (B) between ports i and o; labels: passivator, channel c]

74 Passivator Implementation
- Bundled data
- Dual rail
[Diagrams: passivator circuits built from C-gates; labels: a_r, a_a, b_r, b_a, data[n], d_0, d_1, d_{n-1}, n-wide C-gate, n bits wide]

75 DI Code Synchronizations
- Expensive
  - Need a C-element synchronisation tree
- A partial solution (not always possible/desirable) is:
  - Transform to push-style datapath
    - (not possible in Tangram, only in Balsa)
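To illustrate why the synchronisation is expensive, here is a behavioural Python sketch of a Muller C-element and a tree built from 2-input C-elements. The class names are hypothetical and the model ignores delays; it only shows the logical behaviour and the gate count.

```python
class CElement:
    """Muller C-element: the output follows the inputs when they all
    agree and holds its previous value otherwise."""
    def __init__(self):
        self.out = 0
    def update(self, *inputs):
        if all(inputs):
            self.out = 1
        elif not any(inputs):
            self.out = 0
        return self.out

class CTree:
    """Tree of 2-input C-elements combining n signals into one."""
    def __init__(self, n):
        self.levels = []
        width = n
        while width > 1:
            width = (width + 1) // 2
            self.levels.append([CElement() for _ in range(width)])
    def update(self, inputs):
        signals = list(inputs)
        for level in self.levels:
            # Each gate combines a pair of signals from the level below.
            signals = [gate.update(*signals[2 * k: 2 * k + 2])
                       for k, gate in enumerate(level)]
        return signals[0]
    def gate_count(self):
        return sum(len(level) for level in self.levels)
```

For an 8-bit synchronisation the tree needs 7 two-input C-elements and is three levels deep, and its output rises only when every input has risen and falls only when every input has fallen.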

76 Push Pipeline
[Diagram: two compiled procedures (B) between ports i and o on channel c; labels: passive input port, connector (wires-only)]

77 Hybrid Solutions
- Use DI coding within a bundled datapath framework
  - e.g. use dual-rail carry signals within a conventional adder
    - early completion easily detected
- Average-case performance
- Only applicable to a few datapath operations
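The adder example can be sketched in Python. The model below is an assumption made for illustration, not the design from the talk: a ripple-carry adder in which every stage takes one unit of delay, and in which a stage whose two operand bits are equal (both 1: generate, both 0: kill) knows its carry-out without waiting for its carry-in. Function names are hypothetical.

```python
def carry_settle_times(a, b, width):
    """Time, in stage delays, at which each carry-out becomes valid."""
    times = []
    t_in = 0                          # carry-in to bit 0 is valid at time 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai == bi:
            t_out = 1                 # generate or kill: decided locally
        else:
            t_out = t_in + 1          # propagate: must wait for carry-in
        times.append(t_out)
        t_in = t_out
    return times

def completion_time(a, b, width):
    """With dual-rail carries, completion is detected when the last carry settles."""
    return max(carry_settle_times(a, b, width))

def average_completion_time(width):
    """Mean completion time over all operand pairs of the given width."""
    total = sum(completion_time(a, b, width)
                for a in range(1 << width) for b in range(1 << width))
    return total / (1 << (2 * width))
```

Under this model the worst case equals the word width (a carry that ripples through every stage), while many operand pairs complete much sooner, which is where the average-case gain comes from.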

78 Simpler Timing Analysis
- Separate control and datapath
  - Generate regular, compiled datapath
    - area improvement over standard cell (because of regular layout)
    - generate matched delay paths (cf. self-timed PLAs)
  - Must be able to recognize datapath
    - difficult: control often contains datapath-like elements
    - e.g. start at variables and work backwards...

79 Datapath meets Control
- Example: Balsa case statement
[Diagram: data “n” bits wide; true/complement lines (dual-rail expansion); 1-hot encoding]

80 Case Component
- Input from datapath
  - Dual rail simplifies internal logic
- Expansions parameterisable
- “encode” component is similar
  - Opposite of case, with true/false expansion
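A Python sketch of the decoding step in the case component, assuming each data bit is expanded into a true line and a complement line and that each 1-hot output is the AND of one line per bit. The function names are hypothetical and the real component's internal logic is not given on the slides.

```python
def dual_rail_expand(value, width):
    """Return a (true, complement) line pair per bit, least significant bit first."""
    return [((value >> i) & 1, 1 - ((value >> i) & 1)) for i in range(width)]

def one_hot(rails):
    """Output k is the AND of the true line of each bit set in k and the
    complement line of each bit clear in k."""
    width = len(rails)
    outputs = []
    for k in range(1 << width):
        lines = [rails[i][0] if (k >> i) & 1 else rails[i][1]
                 for i in range(width)]
        outputs.append(int(all(lines)))
    return outputs
```

With both lines of every bit low (no data), no output is selected, so the decoder stays idle until the expanded input becomes valid.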

81 Simpler Timing Analysis
- Tool support required
  - Use existing (non-Balsa) tools if possible
  - Automatically add matched paths/delays to synthesised datapaths
- Design own cells where appropriate
  - e.g. hybrid stages

82 Future Work
- Provide support for DI, hybrid and datapath-compiled datapaths
  - Even with datapath compilation, some datapath would still be standard cell
    - e.g. instruction decoder (control heavy)
    - datapath in control
  - Cost of connecting separate blocks in layout
- Test design required (datapath heavy)

83 Tool Enhancement
- balsa-c
  - Support for attribution to select compilation mechanisms/optimisation schemes
- breeze2lard
  - New models
- balsa-netlist
  - New tech-mapping descriptions
  - Interface to datapath compilers

84 AMULET3i
- Asynchronous macrocell
  - ARM-compatible processor core
  - Full custom RAM
  - Compiled ROM
  - Balsa-compiled DMA controller
  - Test I/F, synchronous and off-chip bus bridges
- Synchronous peripherals
  - Designed by commercial partner...

85 AMULET3 System
[Block diagram: CPU / RAM, ROM, DMAC, Periph1, Sync bridge, MARBLE, SOCB]

86 DMA Local RAM Access
[Block diagram: CPU / RAM, ROM, DMAC, Periph1, Sync bridge, MARBLE, SOCB]

87 DMA Peripheral Accesses
[Block diagram: CPU / RAM, ROM, DMAC, Periph1, Sync bridge, MARBLE, SOCB, with DMA requests marked]

88 Requirements / Specification
- 16 clients, 32 channels
- 3 channel types: complicated register structure
- Programmable client -> channel (1 -> many) mapping
- Support synchronous requests
- Transfers mostly between synchronous clients

89 Controller Structure

90 Two Controller Descriptions
- Sequential (previous slides)
  - Very simple control flow
  - Requires two passes through register bank
  - Slow! Only memory decoupling helps
- Parallel (next slides)
  - Decouple TE actions from memory R/W with a new unit: Transfer Interface
  - Interrupt the register bank on end of transfer

91 “Parallel” Design

92 The Design
- 919 lines of Balsa describing register bank control, TE and TI
- Custom register banks and Synchronous Peripheral Interface
- Miscellaneous glue standard cells
  - Register bank controllers
  - MARBLE interfaces
- Compass Design Automation CAD

93 Implementation Technology
- 0.35 µm, 3LM CMOS
- Standard cells from ARM Ltd.
- Locally designed complex gates and asynchronous elements/gates
- Automated standard cell P&R
- Only “essential” and simple gate-level optimisation (by hand)

94 Design Partitioning: Marble BUS, outside of DMA controller

95 Design Partitioning: Balsa synthesised standard cells

96 Design Partitioning: Custom “regular” layout

97 Design Partitioning: Hand designed standard cells

98 DMA Controller Floor-Plan

