Communicating in Systems with Heterogeneous Timing. Alex Yakovlev, Asynchronous Systems Laboratory, University of Newcastle upon Tyne. Edinburgh, 11 Jan. 2001


1 AY-Jan Communicating in Systems with Heterogeneous Timing. Alex Yakovlev, Asynchronous Systems Laboratory, University of Newcastle upon Tyne. Edinburgh, 11 Jan. 2001

2 AY-Jan Objectives
To study a range of asynchronous communication mechanisms (ACMs) that can be used in constructing (distributed and concurrent) systems with heterogeneous timing
To develop hardware implementations for ACMs, using self-timed circuits, for potential use in Systems-on-a-Chip (SOCs) and embedded (miniature, low-power and EMC) applications
Work is done within COMFORT, a collaborative EPSRC research project with King’s College London.

3 AY-Jan Heterogeneously Timed Nets (hets) [Figure: net with nodes A1, A2, A3, A4 and C1, C2, C3]

4 AY-Jan Hets [Same figure, highlighting: time/event/data-driven data processing elements (active)]

5 AY-Jan Hets [Same figure, highlighting: data communication elements (passive), i.e. ACMs]

6 AY-Jan Previous work
Real-time networks and the MASCOT approach:
– from RSRE/Phillips (67), BAe/Simpson (86)
– for software systems
– high time heterogeneity but relatively low speed
Globally-Asynchronous-Locally-Synchronous (GALS):
– Chapiro (84), Muttersbach (00), Ginosar (00)
– for VLSI circuits
– high speed but very limited time heterogeneity (mesochronous or source-synchronous)

7 AY-Jan Interaction between system parts [Figure: parts A and B interact through a communication mechanism (e.g. shared memory)]

8 AY-Jan Terminology on timing
Temporal relationship between parts A and B in a system can be:
– (Globally, locally for A/B) clocked = synchronous on (global, local for A/B) clock
– Self-timed = synchronous on handshakes and/or by some time constraints, e.g. I/O and fundamental modes
– (Mutually) asynchronous = NOT synchronous (on global clock or on handshakes); hence asynchronous is neither self-timed nor globally clocked

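9 AY-Jan Globally clocked [Figure: A and B communicate through a communication mechanism (e.g. shared memory); both are driven by a global clock]

10 AY-Jan Self-timed (via handshake) [Figure: A and B communicate through a communication mechanism (e.g. shared memory) using Req/Ack handshake(s), possibly with a bounded buffer in between]

11 AY-Jan Fully Asynchronous [Figure: A and B communicate through a communication mechanism (e.g. shared memory) that acts as a temporal firewall between the timing for A and the timing for B]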

12 AY-Jan Evolution of timing (1)
Globally clocked systems:
Good: deterministic and predictable for real-time, safety-critical systems
Bad: prone to clock skew; bad for power consumption and EMC (indiscriminate data-crunching)

13 AY-Jan Evolution of timing (2)
Self-timed systems (with micropipelines and handshakes):
Good: no skew problems; good for power and EMC if data-driven
Bad: temporal non-determinism and lockable handshakes, hence bad for real-time

14 AY-Jan Evolution of timing (3)
Fully or partially asynchronous systems:
Good: distributed and heterogeneous clocking; real-time applied locally and fully predictable; self-timing can be applied where possible for power saving and EMC
Bad: potential loss of information where full asynchrony (e.g. due to real-time) is applied

15 AY-Jan Asynchronous Communication Mechanisms (ACMs) [Figure: Writer -> ACM -> Reader] Level of asynchrony is defined by WRITE and READ rules

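16 AY-Jan Classification of ACMs
Hugo Simpson’s classification:
– Destructive write (write cannot be held up), destructive read (read can be held up): Signal (event data)
– Destructive write (write cannot be held up), non-destructive read (read cannot be held up): Pool (reference data)
– Non-destructive write (write can be held up), destructive read (read can be held up): Channel (message data)
– Non-destructive write (write can be held up), non-destructive read (read cannot be held up): Constant (configuration data)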

17 AY-Jan Difficulty with Simpson’s classification
Destructive/Non-destructive does not intuitively imply a temporal, Wait/No-wait division, but what is meant is that:
– Destructive (non-destructive) write cannot (can) wait
– Destructive (non-destructive) read can (cannot) wait
There is symmetry (duality) between Pool and Channel but no symmetry between Signal and Constant, because Constant allows a ‘constructive’ write only once, yet ‘constructive’ writes are also allowed by Signal

18 AY-Jan Petri net capture of Simpson’s protocols [Figure: four Petri nets (Signal, Pool, Channel, Constant), each over places "empty" and "full" with transitions labelled destr/non-destr write and destr/non-destr read; the constructive writes are marked]

19 AY-Jan Another interpretation [Figure: four Petri nets (Signal, Pool, Channel, Command) over places "unread" and "read", with transitions write, over-write, read and re-read] Constant is a special case of Command

20 AY-Jan Another interpretation [Same figure, highlighting: Busy Writer]

21 AY-Jan Another interpretation [Same figure, highlighting: Lazy Writer]

22 AY-Jan Another interpretation [Same figure, highlighting: Busy Reader]

23 AY-Jan Another interpretation [Same figure, highlighting: Lazy Reader]

24 AY-Jan Another classification of ACMs
Lazy read = read only previously unread data (read can be held up)
Busy read = may re-read data already read (read cannot be held up)
Busy write = may over-write unread data (write cannot be held up)
Lazy write = write only if the previous data has been read (write can be held up)
– BW-LR: Signal (event data)
– BW-BR: Pool (reference data)
– LW-LR: Channel (message data)
– LW-BR: Command (configuration data)
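The busy/lazy classification can be sketched as a single capacity-1 model with two flags. This is only an illustration under the idealised assumption that accesses are atomic; the class name `Acm` and the methods `try_write` and `try_read` are hypothetical, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Acm:
    """Capacity-1 ACM with atomic accesses (idealised, illustrative model)."""
    busy_write: bool      # True: writer may over-write unread data, never waits
    busy_read: bool       # True: reader may re-read old data, never waits
    item: object = None   # the single data item held by the ACM
    unread: bool = False  # data state: True = unread, False = read

    def try_write(self, value):
        """Return True if the write happened, False if the writer must wait."""
        if self.unread and not self.busy_write:
            return False              # lazy write: held up until the item is read
        self.item, self.unread = value, True
        return True

    def try_read(self):
        """Return (True, item) on success, (False, None) if the reader must wait."""
        if not self.unread and not self.busy_read:
            return False, None        # lazy read: held up until new data arrives
        self.unread = False
        return True, self.item


def signal():  return Acm(busy_write=True,  busy_read=False)   # BW-LR
def pool():    return Acm(busy_write=True,  busy_read=True)    # BW-BR
def channel(): return Acm(busy_write=False, busy_read=False)   # LW-LR
def command(): return Acm(busy_write=False, busy_read=True)    # LW-BR
```

For example, `signal()` accepts two writes in a row and then delivers only the second item, whereas `channel()` refuses the second write until the first item has been read.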

25 AY-Jan Signal vs Pool [Figure: a Pool connects Real time 1 (busy domain) and Real time 2 (busy domain); a Signal connects Real time (busy domain) and Data-driven (lazy domain)] Low Power!

26 AY-Jan Problems with the above Petri net definitions
These Petri nets assumed:
– Data capacity (max value of the data state of the ACM) equals 1. This can be easily generalised to any finite n>0 for Channel, defined as an n-place buffer with a wide range of known hardware implementations; do we semantically need other ACMs with n>1?
– Write and Read accesses are held up only by the data state of the ACM and not by the Read and Write operations themselves, which are treated as atomic and taking no time; in reality they are not atomic and should be assumed to take arbitrary time

27 AY-Jan Breaking the atomicity [Figure: Petri nets of Signal with atomic access (write, over-write, read; places unread, read) and Signal with non-atomic access (added places not-in-writing, in writing, reading)]

28 AY-Jan Breaking the atomicity [Same figure, with places in reading and not-in-reading] Read may be held up by write being in progress … but not write by reading!

29 AY-Jan But … [Figure: Signal with non-atomic access] What if Reading begins just before Writing? Problem with data integrity if only one data slot (one data token) is available
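The integrity problem can be made concrete with one hand-picked interleaving. This is a minimal sketch, assuming a data item made of two words that are accessed one at a time; the function name and the "old"/"new" values are illustrative only.

```python
def one_slot_interleaving():
    """Deterministic interleaving of a two-word write with an overlapping read."""
    slot = ["old", "old"]   # one data slot holding a two-word item
    seen = []
    seen.append(slot[0])    # reader starts first and takes word 0
    slot[0] = "new"         # writer, never held up, over-writes word 0 ...
    slot[1] = "new"         # ... and word 1 while the read is still in progress
    seen.append(slot[1])    # reader finishes with word 1
    return seen


print(one_slot_interleaving())  # ['old', 'new']: a mix of two different items
```

The reader ends up with half of the old item and half of the new one, which is the kind of partial read that a single slot cannot rule out when the writer is never held up.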

30 AY-Jan Required Properties of Signal (1)
1. Data states and their updating:
– Signal’s capacity is 1 (at any time, it has either 0 or 1 unread data items)
– At the end of write access, Signal’s state is set to unread (1)
– At the end of read access, Signal’s state is set to read (0)

31 AY-Jan Required Properties of Signal (2)
2. Conditional asynchrony for the reader:
– Read access may start only when Signal’s data state is unread (1) and no write access is in progress
– Read access can be arbitrarily long
3. Unconditional asynchrony for the writer:
– Write must be allowed to start and complete access at any time, regardless of Signal’s data state and the status of read access.

32 AY-Jan Required Properties of Signal (3)
4. Data coherence:
– Any item of data that is read from Signal must not have been changed since it was written (i.e. no writing or reading in part)
5. Data freshness:
– Any read access must obtain the data item designated as the current unread item in Signal, i.e. the data item made available by the latest completed write access

33 AY-Jan Data slots and Signal
“Data slot” is a unique portion of the shared memory which may contain one item of data of arbitrary (but bounded) size
Signal cannot be implemented using one slot only and still satisfy all of the above properties
Let us construct a Signal with TWO data slots
First a formal specification, a State Graph (or Transition System), must be built

34 AY-Jan Formal spec of Signal [Figure: automaton for Signal with actions Write slot 0 (wr0), Write slot 1 (wr1), Read slot 0 (rd0), Read slot 1 (rd1)] Problem: construct a maximally permissive automaton, on the alphabet {wr0,wr1,rd0,rd1}, satisfying the required properties of the Signal ACM

35 AY-Jan State Graph constraints
1. Data states, their updates and asynchrony: [Diagrams over a state s and the event pairs wri/rdj, wri/wrj, rdi/rdj, rdi/wrj]
2. Data coherence: [Diagram over s, wri and rdj, allowed only if i<>j]
A wr action is enabled in every state

36 AY-Jan State Graph constraints
3. Data freshness (slot swapping): [Diagram over s, wri, rdi, rdj, wrj]
4. No “re-try loops” (persistency in reading): [Diagram in if/then form over wri, wrj, rdi, rdj and states s, s’, with the side conditions i<>j and "there is no rdi on this path"]

37 AY-Jan State Graph for 2-slot Signal [Figure: state graph with states s0 to s5 (s0 is the init state) and arcs labelled wr0, wr1, rd0, rd1]

38 AY-Jan How to implement 2-slot Signal? [Figure: the same state graph] In order to implement Signal we must distribute states and events between the elements of the implementation architecture. For that we must first separate states using a behavioural model of the implementation

39 AY-Jan Implementation architecture
The following structure must be kept in mind: [Figure: Writer and Reader reach the data slots through data access (wr0, wr1, rd0, rd1) and the Signal control through control access (handshakes Wreq/Wack and Rreq/Rack)]
In a hardware implementation of Signal control, latches and logic will be used to generate signals corresponding to the steering events wri and rdi, events on the handshakes with writer and reader, and some internal events

40 AY-Jan Behavioural model for Signal
Petri nets can be used as a behavioural model (algorithm) for Signal:
– A 1-safe Petri net can be synthesised from a finite Transition System using the theory of regions (Ehrenfeucht, Rozenberg et al.)
– A 1-safe Petri net can be implemented in a self-timed circuit using either direct translation techniques or logic synthesis from Signal Transition Graphs (Yakovlev, Koelmans 98)

41 AY-Jan State Graph refinement [Figure: state graph with states s0 to s5 (s0 is the init state) and arcs labelled wr0, wr1, rd0, rd1]
This Transition System cannot be synthesised into a 1-safe Petri net with unique event labelling: it violates some separation conditions and so requires refinement.
There is also arbitration (conflict relation) between rdi and wrj events, yet in a physical implementation one cannot disable output actions

42 AY-Jan State Graph refinement [Figure: refined state graph over wr0, wr1, rd0, rd1 with internal events] Now arbitration is between internal events while wri and rdj are persistent

43 AY-Jan Distributing states b/w Write and Read parts [Figure: refined state graph partitioned into Write superstates and Write elementary states, with the resulting Write part]

44 AY-Jan Distributing states b/w Write and Read parts [Figure: refined state graph partitioned into Read superstates and Read elementary states, with the resulting Read part]

45 AY-Jan Completing the Petri net model [Figure: Petri net combining the Write part and the Read part]

46 AY-Jan Introducing binary control variables [Figure: the Petri net extended with places w=0, w=1, r=0, r=1 and transitions w+, w-, r+, r-]
‘w’ encodes the slot being accessed for writing
‘r’ encodes the slot being accessed for reading

47 AY-Jan Towards circuit implementation [Figure: block diagram with Data-in, Slot 0, Slot 1 and Data-out; a Write part (Wreq/Wack, steering wr0 and wr1) and a Read part (Rreq/Rack, steering rd0 and rd1); control variables w and r accessed by set/reset and test]

48 AY-Jan Direct translation of PNs to circuits [Figure: places p1 and p2 mapped to latches with a controlled operation between them; latch values p1=(1), p2=(0)]

49 AY-Jan Direct translation of PNs to circuits [Same circuit while the token moves: p2 goes 0->1 and p1 goes 1->0]

50 AY-Jan Direct translation of PNs to circuits [Same circuit, annotated with the signal transitions of the complete move]

51 AY-Jan Direct translation of PNs to circuits
This method associates places with latches (flip-flops), so the state memory (marking) of the PN is directly mimicked in the circuit’s state memory
Transitions are associated with controlled actions (e.g. activations of data path units or lower-level control blocks, by using handshake protocols)
Modelling discrepancy (be careful!):
– in Petri nets removal of a token from pre-places and adding tokens in post-places is instantaneous (i.e. no intermediate states)
– in circuits the “move of a token” has a duration and there is an intermediate state
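The modelling discrepancy can be sketched by comparing an atomic firing with a two-step latch update. This is a minimal sketch, assuming that the latch of the post-place is set before the latch of the pre-place is reset; the function names `fire` and `circuit_move` are hypothetical.

```python
def fire(marking, pre, post):
    """Atomic Petri net firing: pre-places emptied and post-places filled in one step."""
    assert all(marking[p] for p in pre), "transition not enabled"
    new = dict(marking)
    for p in pre:
        new[p] = 0
    for p in post:
        new[p] = 1
    return new


def circuit_move(marking, pre, post):
    """Latch-based move (assumed order: set post-place latches, then reset pre-place latches)."""
    trace = [dict(marking)]
    step = dict(marking)
    for p in post:
        step[p] = 1
    trace.append(dict(step))   # intermediate state: both latches hold 1
    for p in pre:
        step[p] = 0
    trace.append(dict(step))   # final state equals the atomic firing result
    return trace


m = {"p1": 1, "p2": 0}
print(fire(m, ["p1"], ["p2"]))
print(circuit_move(m, ["p1"], ["p2"]))
```

The middle entry of the trace has no counterpart in the Petri net semantics, which is why the circuit has to be designed so that this intermediate state is harmless.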

52 AY-Jan Translation in brief This method has been used for designing the control of a token ring adaptor [Yakovlev, Varshavsky, Marakhovsky, Semenov, IEEE Conf. on Asynchronous Design Methodologies, London, 1995]

53 AY-Jan Refining the Write part [Figure: Write part of the Petri net (wr0, wr1; w=0, w=1, w+, w-; tests r=0 and r=1) refined into a net with places 1, 2 and 3]

54 AY-Jan Control circuit for Write part [Figure: control circuit derived from the refined net with places 1, 2 and 3]

55 AY-Jan Implementing David cells (1) [Figures: speed-independent version; “aggressive” relative timing version]

56 AY-Jan Implementing David cells (2) [Figure: circuit] This is a peephole-optimised solution for two David cells (places 1 and 3) and the interface to the handshake with the Writer

57 AY-Jan Implementing ‘sync’ blocks [Figure: sync block with signals r, ck1, r_0 and r_1]

58 AY-Jan Simulation using Cadence toolkit [Figure: waveforms annotated with: metastability inside mutex, Write response time, input of sync, output of sync]

59 AY-Jan Cycle times (ns) for 0.6 micron [Table: columns type, Write (without set-reset of w; with set-reset of w), Read (no waiting for Write); rows Speed-independent, With Relative Timing; values missing]

60 AY-Jan Improving performance [Figure: state graph with states s0 to s5 (s0 is the init state) and arcs labelled wr0, wr1, rd0, rd1]
In case of repetitive writing (of, e.g., slot 1), read access may have to wait for the completion of write just because of a timing clash on the same slot, and not because of absence of new data in the ACM (the original aim of Signal)
This problem cannot be resolved within the TWO-slot ACM because of coherence violation. Can we do it with an extra slot?

61 AY-Jan Towards 3-slot Signal
Idea: after writing a slot (e.g. 2) for the first time, the writer alternates between 3 and 2
The reader can then, after finishing reading slot 1, read slot 2 or 3, whichever is free


64 AY-Jan 3-slot Signal refined
Control variables: r (read), w (write), l (last)
[Figure: state-graph fragment annotated with control-variable updates such as w(2->1), l(3->2), r(3->2)]
Algorithm:
Write part: write slot w; l:=w; w:=differ(l,r)
Read part: if (r<>l) r:=l else wait; read slot r
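The write and read parts can be sketched sequentially as follows. This is an illustration only: the initial values of `w`, `l` and `r`, the tie-break inside `differ` (lowest free slot) and the class name are assumptions, and each call is treated as atomic, which the hardware does not guarantee by itself.

```python
def differ(l, r, slots=(1, 2, 3)):
    """Return a slot different from both l and r (lowest one: an assumed tie-break)."""
    return next(s for s in slots if s != l and s != r)


class ThreeSlotSignal:
    def __init__(self):
        self.d = {1: None, 2: None, 3: None}   # the three data slots
        # Assumed initial state: r == l means "no unread data".
        self.w, self.l, self.r = 2, 1, 1

    def write(self, item):
        """Write part: never held up."""
        self.d[self.w] = item                  # write slot w
        self.l = self.w                        # l := w
        self.w = differ(self.l, self.r)        # w := differ(l, r)

    def try_read(self):
        """Read part: returns (False, None) where the slide says 'wait'."""
        if self.r == self.l:
            return False, None
        self.r = self.l                        # r := l
        return True, self.d[self.r]            # read slot r


sig = ThreeSlotSignal()
sig.write("a")
sig.write("b")
print(sig.try_read())   # (True, 'b'): only the latest item is delivered
print(sig.try_read())   # (False, None): nothing unread, the reader waits
```

With the reader parked on slot 1, the two writes go to slots 2 and 3 in turn, matching the alternation described for the 3-slot idea.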

65 AY-Jan 3-slot Pool
In Pool we must have: Read asynchrony
Algorithm (r-read, w-write, l-last):
Write part: write slot w; l:=w; w:=differ(l,r)
Read part: r:=l; read slot r

66 AY-Jan Three-slot algorithm (due to Hugo Simpson)
Writer:
wr: d[n]:=input
w0: l:=n
w1: n:=differ(l,r)
Reader:
r0: r:=l
rd: output:=d[r]
n (next), l (last), r (read) are 3-valued variables

67 AY-Jan Three-slot algorithm differ:
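The writer and reader statements of the three-slot algorithm can be replayed over all short interleavings to check that the two sides never access the same slot at the same time. This is a sketch under stated assumptions: the control statements w0, w1 and r0 are atomic, each data access is split into a begin and an end step, the initial values of n, l, r are chosen arbitrarily, and `differ` returns the lowest free slot index.

```python
from itertools import product


def differ(l, r):
    """Pick a slot index different from both l and r (lowest one: an assumed tie-break)."""
    return next(s for s in range(3) if s != l and s != r)


def coherent(schedule):
    """Replay one interleaving; return False if both sides touch one slot at once."""
    n, l, r = 1, 0, 0            # assumed initial values of next, last, read
    w_pc = r_pc = 0              # program counters of writer and reader
    writing = reading = None     # slot currently being accessed, if any
    for who in schedule:
        if who == "W":
            if w_pc == 0:
                writing = n              # wr begins: d[n] := input
            elif w_pc == 1:
                writing = None           # wr ends
            elif w_pc == 2:
                l = n                    # w0
            else:
                n = differ(l, r)         # w1
            w_pc = (w_pc + 1) % 4
        else:
            if r_pc == 0:
                r = l                    # r0
            elif r_pc == 1:
                reading = r              # rd begins: output := d[r]
            else:
                reading = None           # rd ends
            r_pc = (r_pc + 1) % 3
        if writing is not None and writing == reading:
            return False
    return True


# Every schedule of 12 steps, each step taken by the writer (W) or the reader (R).
all_ok = all(coherent(s) for s in product("WR", repeat=12))
print(all_ok)
```

The check covers all 4096 schedules of that length. It says nothing about what happens if w0, w1 or r0 themselves overlap, which is presumably the case the arbitration in the control circuit is there to handle.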

68-78 AY-Jan Three-slot Pool [Animation, 11 frames: Writer and Reader stepping through the data slots, with the next, last and read pointers moving between slots]

79 AY-Jan 3-slot ACM design [Figure: block diagram with write control, mutex and read control; differ logic and registers n, l, r; mutex signals Rw0/Gw0 and Rr0/Gr0; req/ack handshakes for w0, w1 and r0]

80 AY-Jan 3-slot ACM design [Same block diagram]

81 AY-Jan Differ and register logic [Figure: differ logic with inputs l1 to l3 and r1 to r3, and register with outputs n1 to n3, controlled by w1-req and w1-ack]

82 AY-Jan 3-slot ACM design [Same block diagram]

83 AY-Jan Write control circuit: STG

84 AY-Jan Write control ckt: from Petrify

85 AY-Jan Four-slot Pool [Figure: Writer and Reader access four data slots d[0,0], d[0,1], d[1,0], d[1,1] using the control variables next, last, read, s[0], s[1], v[0], v[1]]

86 AY-Jan Four-slot Pool algorithm (H. Simpson)
Writer:
wr: d[n,¬s[n]]:=input
w0: s[n]:=¬s[n]
w1: l:=n || n:=¬r
Reader:
r0: r:=l
r1: v:=s
rd: output:=d[r,v[r]]
n (next), l (last), r (read) are binary variables
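The statements above translate almost line for line into a sequential sketch. The class name, the initial values of the control variables and the `initial` argument are assumptions made for illustration; calling `write` and `read` one after another does not exercise the concurrent behaviour that the algorithm is designed for.

```python
class FourSlotPool:
    def __init__(self, initial=None):
        # d[pair][slot]: two pairs of two data slots
        self.d = [[initial, initial], [initial, initial]]
        self.s = [0, 0]                     # s[p]: slot of pair p holding its latest item
        self.v = [0, 0]                     # reader's private copy of s
        self.n, self.l, self.r = 1, 0, 0    # assumed initial next, last, read

    def write(self, item):
        n = self.n
        self.d[n][1 - self.s[n]] = item     # wr: d[n, not s[n]] := input
        self.s[n] = 1 - self.s[n]           # w0: s[n] := not s[n]
        self.l, self.n = n, 1 - self.r      # w1: l := n || n := not r

    def read(self):
        self.r = self.l                     # r0: r := l
        self.v = list(self.s)               # r1: v := s
        return self.d[self.r][self.v[self.r]]   # rd: output := d[r, v[r]]


pool = FourSlotPool()
pool.write("a")
print(pool.read())   # a
print(pool.read())   # a (busy read: the same item may be re-read)
pool.write("b")
pool.write("c")
print(pool.read())   # c (busy write: "b" was over-written unread)
```

Neither method contains a wait, which reflects the BW-BR nature of Pool: both sides always run to completion.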

87 AY-Jan 3-slot vs 4-slot performance [Table: time for control statements; columns: statements, 3-slot min time (ns), 4-slot min time (ns); rows: w0+w1, r0+(r1); values missing]

88 AY-Jan Are we in the end fully asynchronous?
Circuit implementations involve the use of latches, which may go metastable. Metastability always implies a trade-off, in terms of noise, between data-domain and time-domain error.
In a “truly busy (real-time)” environment, where the ack signal is not used, the corresponding process (e.g. the writer) must allow for a small interval (3-4 ns for 0.6 µm CMOS), sufficient for metastability to be resolved with a probability of practically 1.
Our h/w solutions for “busy” domains aim at maximising the “wait-free” aspect of communication but theoretically cannot fully eliminate mutual dependency between processes (hidden within the ACM control variable circuits).
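The phrase "probability of practically 1" can be read through the usual first-order model of metastability resolution, which the slide does not state: the chance that a latch is still unresolved decays exponentially with the time allowed. The time constant \tau below is a purely hypothetical value chosen for illustration, not a figure measured in this work.

```latex
% First-order model (an assumption, not taken from the slides):
% probability that a metastable latch is still unresolved after time t
P_{\text{unresolved}}(t) \approx e^{-t/\tau}

% Illustration with a hypothetical \tau = 0.1\,\text{ns} and t = 3\,\text{ns}:
P_{\text{unresolved}}(3\,\text{ns}) \approx e^{-30} \approx 9.4 \times 10^{-14}
```

Under this model the residual probability is never exactly zero, which is consistent with the remark that mutual dependency between the processes cannot be fully eliminated.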

89 AY-Jan Concluding remarks
Constructing ACMs to interface sub-systems with different time and energy requirements, and implementing them in high-speed hardware, proves feasible.
Application of hets in control or image processing (e.g. via neural networks) is needed to fully assess their potential for future application-specific SOCs
More work is needed on mathematical modelling of hets and on developing an extensive parametrised library of ACM circuits.

90 AY-Jan VLSI design layout (chip fab’ed in June 2000 via EUROPRACTICE) [Figure: chip layout with the 4-slot Pool ACM labelled]

91 AY-Jan slot ACM part Tested (physically) correct (details on testing in 9th Async UK Forum paper)

92 AY-Jan Acknowledgements and References
Members of the COMFORT team:
– At KCL: Tony Davies, Ian Clark, David Fraser, Sergio Velastin
– At NCL: Fei Xia, David Kinniment, Albert Koelmans, Delong Shang, Alex Bystrov
BAe colleagues: Hugo Simpson and Eric Campbell
Project COMFORT web site:
Work supported by EPSRC and EU (ACiD-WG), and reported and published at Async2000, AINT’2000, Async2001 etc.

