1 A Modular Synchronizing FIFO for NoCs Vainbaum Yuri.


2 A Modular Synchronizing FIFO for NoCs. Paper presented at NOCS 2009. Authors: Tarik Ono (Sun Microsystems) and Mark Greenstreet (University of British Columbia).

3 Motivation and Purpose of the Synchronizing FIFO. [Figure: a Network-on-Chip spanning timing domains 1, 2 and 3, with synchronizing FIFOs between them.] Multiple clock domains in a NoC require many FIFOs.

4 Synchronizing FIFO Targets. Design targets for the FIFO: it can be built using standard cells, for easy integration into the CAD flow; it has a modular design with a choice of clockless or clocked interfaces; and its modular, simple architecture reduces NoC design time.

5 Talk Outline: FIFO overview; FIFO blocks (clockless put and get interfaces, clocked put and get interfaces, full-empty control and data store); FIFO latency and throughput; implementation results.

6 FIFO Overview: Operation. [Figure: a three-stage FIFO with a put interface connected to a sender in timing domain A and a get interface connected to a receiver in timing domain B.] The FIFO consists of a number of stages. The sender communicates with the put interface, and the receiver with the get interface. Tokens determine which FIFO stage is used for the next put and the next get operation.
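The token mechanism can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the authors' design: one put token and one get token each advance to the next stage after a successful operation, data stays in the stage it was written to, and the class and method names (TokenRingFIFO, put, get) are hypothetical.

```python
class TokenRingFIFO:
    """Behavioral model of a token-based FIFO (hypothetical names, not the paper's RTL)."""

    def __init__(self, stages):
        self.data = [None] * stages   # one data store per stage
        self.full = [False] * stages  # one full/empty flag per stage
        self.put_token = 0            # stage that accepts the next put
        self.get_token = 0            # stage that serves the next get

    def put(self, value):
        """Write into the stage holding the put token; fail if that stage is full."""
        i = self.put_token
        if self.full[i]:
            return False
        self.data[i] = value
        self.full[i] = True
        self.put_token = (i + 1) % len(self.data)  # pass the put token on
        return True

    def get(self):
        """Read from the stage holding the get token; return None if it is empty."""
        i = self.get_token
        if not self.full[i]:
            return None
        value = self.data[i]
        self.full[i] = False
        self.get_token = (i + 1) % len(self.data)  # pass the get token on
        return value


fifo = TokenRingFIFO(3)
for item in ("D", "D1", "D2"):
    fifo.put(item)
print(fifo.put("D3"))  # False: all three stages are full
print(fifo.get())      # D
print(fifo.put("D3"))  # True: stage 0 is free again
```

Because only the tokens move, the put and get sides never touch the same stage unless the FIFO is full or empty, which is what lets each side run in its own timing domain.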

7 FIFO Overview: Structure. [Figure: a three-stage FIFO between a sender in timing domain A and a receiver in timing domain B.] Each FIFO stage has a put interface cell, a get interface cell, a full-empty control and a data store.

8 FIFO Overview: Modular Design. [Figure: a three-stage FIFO with a clocked put interface on the sender side (clocked domain A) and a clockless get interface on the receiver side (clockless NoC).] Mix-and-match interfaces.

9 FIFO Overview: Modular Design. [Figure: a three-stage FIFO with a clocked put interface in a fast clocked domain A and a clocked get interface in a slow clocked domain B, using a 1-flop and a 3-flop synchronizer.] Mix-and-match interfaces: different synchronization time lengths can be used, depending on clock frequency, and changing the FIFO size does not affect the individual FIFO stage.

10 Full-Empty Control and Data Store. The data store consists of latches, enabled when write is high. The full-empty control consists of an SR latch: on a write, its output (the full signal) is set high; on a read, it is set low. The same blocks are used for clocked and clockless interfaces.
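A per-stage sketch of these two blocks, assuming idealized zero-delay logic; the class names SRLatch and DataLatch are hypothetical. The data store is a level-sensitive latch that is transparent while write is high, and the full-empty control is an SR latch whose output is the full signal.

```python
class SRLatch:
    """Set-reset latch: set drives Q high, reset drives Q low, neither holds Q."""

    def __init__(self):
        self.q = False  # start empty

    def update(self, s, r):
        if s and r:
            raise ValueError("S and R asserted together")  # forbidden input
        if s:
            self.q = True
        elif r:
            self.q = False
        return self.q


class DataLatch:
    """Level-sensitive latch: transparent while enable is high, holds otherwise."""

    def __init__(self):
        self.q = None

    def update(self, enable, d):
        if enable:
            self.q = d
        return self.q


store, full_empty = DataLatch(), SRLatch()
store.update(enable=True, d=0xA5)   # write high: data latch is transparent
full_empty.update(s=True, r=False)  # the write also sets the full flag
store.update(enable=False, d=0x00)  # write low: latch holds 0xA5
print(hex(store.q), full_empty.q)   # 0xa5 True
full_empty.update(s=False, r=True)  # a read resets the flag to empty
print(full_empty.q)                 # False
```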

11 asP* FIFO. asP* (Asynchronous Symmetric Persistent Pulse Protocol) can be implemented with standard cells, performs well, and does not require C-elements. The asP* handshaking protocol is chosen as the baseline for the FIFO design.

12 asP* FIFO Simulation: initial state. The SR latches keep track of the empty/full status of each stage, and the AND gates coordinate data transfer between stages.

13–14 asP* FIFO Simulation. Data arrives and req rises. SR latch EFi is set to indicate that latch Li holds valid data.

15 asP* FIFO Simulation. Data propagates through L1.

16 asP* FIFO Simulation. SR latch EF1 is set.

17 asP* FIFO Simulation. Enabling latch L2: when stage i-1 is full and stage i is empty, the AND gate goes high, loading the data into Li.

18–19 asP* FIFO Simulation. Clearing the EF1 latch: once the AND gate has loaded the data into Li, SR latch EFi-1 is cleared to indicate that latch Li-1 is now empty.

20–21 asP* FIFO Simulation. Data is available at output data_R; Req_R goes high as the data arrives at the last stage.

22–25 asP* FIFO Simulation. The next data item (D1) enters the FIFO. In fact, it can enter as soon as ack_L falls, indicating that the first item has been written.

26–29 asP* FIFO Simulation. The next data item (D2) enters the FIFO.

30 asP* FIFO Simulation. The FIFO is full: the sender presents D3, but there is no acknowledge until the next item is read out.

31–32 asP* FIFO Simulation. Ack_R rises and the data is read out.

33–37 asP* FIFO Simulation. Data propagates into the empty space, stage by stage.

38–39 asP* FIFO Simulation. Now D3 can enter the FIFO.

40 asP* FIFO Simulation. The sender lowers Req_L.
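The data movement in the simulation slides can be reproduced with a simplified, untimed model. It assumes that stage i loads from stage i-1 whenever EFi-1 is set and EFi is clear, and it ignores the pulse-width timing issues discussed next; asp_step is a hypothetical helper, and index 0 is the input stage L1.

```python
def asp_step(ef, latches, req_in=False, data_in=None, ack_out=False):
    """One evaluation step of an n-stage asP*-style FIFO (simplified, untimed).

    ef[i] models SR latch EF(i+1): True means latch L(i+1) holds valid data.
    Returns True if the input item was accepted (acknowledged) in this step.
    """
    n = len(ef)
    if ack_out and ef[-1]:
        ef[-1] = False                   # receiver took the item from the last stage
    # Walk from the output end so each item moves at most one stage per step.
    for i in range(n - 1, 0, -1):
        if ef[i - 1] and not ef[i]:      # AND gate: previous stage full, this one empty
            latches[i] = latches[i - 1]  # load this latch from the previous one
            ef[i] = True                 # set this stage's EF latch
            ef[i - 1] = False            # clear the previous EF latch: it is now empty
    accepted = req_in and not ef[0]
    if accepted:
        latches[0] = data_in
        ef[0] = True
    return accepted


ef, latches = [False] * 3, [None] * 3
for item in ["D", "D1", "D2"]:
    assert asp_step(ef, latches, req_in=True, data_in=item)
print(latches)                                            # ['D2', 'D1', 'D']
print(asp_step(ef, latches, req_in=True, data_in="D3"))   # False: FIFO full
out = latches[-1]                                         # receiver reads D
print(asp_step(ef, latches, req_in=True, data_in="D3", ack_out=True))  # True
print(out, latches)                                       # D ['D3', 'D2', 'D1']
```

The trace follows the slides: three items fill the FIFO, D3 is held off while it is full, and one read lets the remaining items shift forward so D3 can enter.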

41 asP* FIFO: Timing Issue. [Figure: timing diagram annotated with T[En→Q] and T_AND.]

42 asP* FIFO: Timing Issue. [Figure: timing diagram annotated with MinResetPulseWidth[R→Q] and T_AND.]

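43 3-Stage Clockless FIFO. [Figure: write port and read port.] The write port has a write request and a signal that rises if the write succeeded. The read port has a signal that rises if data is available at the output, and an input on which the receiver acknowledges receipt of the data.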

44 Stage of the Clockless FIFO. [Figure annotations: latches that load the data, written when the cell is empty; a tri-state buffer; logic that transfers tokens.]

45 asP* Put Interface Cell: signal from the sender (fanout to all stages).

46 asP* Put Interface Cell: signal to the sender (fan-in from all stages).

47 asP* Put Interface Cell: signal to the data store and full-empty control.

48 asP* Put Interface Cell: signal from the full-empty control.

49 asP* Put Interface Cell: signal from the previous stage and signal to the next stage.

50 asP* Put Interface Cell: set low in all but one cell.

51–57 asP* Put Interface Cell (animation steps; figure only).

58 asP* Get Interface Cell: signal from the receiver.

59 asP* Get Interface Cell: signal to the receiver.

60 asP* Get Interface Cell: signal to the data store and full-empty control.

61 asP* Get Interface Cell: signal from the full-empty control.

62 asP* Get Interface Cell: signal to all stages.

63 asP* Get Interface Cell: simulation (figure only).

64 Full-Empty Cell. Keeps track of whether the cell is empty or full. It is set by a write operation from the put interface and reset by a read operation from the get interface. An AND gate ensures that Set and Reset are mutually exclusive, which avoids races and simplifies timing.
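One way AND gating can guarantee mutual exclusion is sketched below. The exact gating is an assumption for illustration (the slide does not give it): set is qualified by "not full" and reset by "full", so a write to a full cell or a read of an empty cell has no effect, and set and reset can never be high together. full_empty_next is a hypothetical helper.

```python
from itertools import product


def full_empty_next(full, write, read):
    """Next state of the full-empty SR latch with AND-gated set/reset inputs.

    Assumed gating (illustrative): set = write AND NOT full, reset = read AND full.
    Since `full` cannot be both true and false, set and reset are never both high.
    """
    set_ = write and not full
    reset = read and full
    assert not (set_ and reset)  # mutual exclusion holds for every input combination
    if set_:
        return True
    if reset:
        return False
    return full


# Exhaustive check over all 8 input combinations.
for full, write, read in product([False, True], repeat=3):
    print(full, write, read, "->", full_empty_next(full, write, read))
```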

65 Timing Requirements for the FIFO. The minimum low time of req_put must be at least as large as the minimum clock pulse width of the flip-flops in the put interfaces. The minimum high time of req_put must be at least as large as the minimum pulse width of the set signal of the SR latch in the empty/full controller. The minimum high time of got_data must be at least as large as the minimum pulse width of the set signal of the SR latch.
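The three requirements can be expressed as a simple design-rule check. The field names and the example numbers below are hypothetical placeholders, not values from the paper; the comparisons follow the slide's wording.

```python
from dataclasses import dataclass


@dataclass
class PulseTimings:
    """Pulse widths in picoseconds (field names are hypothetical)."""
    req_put_low_min: float     # minimum low time of req_put
    req_put_high_min: float    # minimum high time of req_put
    got_data_high_min: float   # minimum high time of got_data
    ff_min_clock_pulse: float  # minimum clock pulse width of the put-interface FFs
    sr_min_set_pulse: float    # minimum set pulse width of the SR latch


def timing_violations(t: PulseTimings) -> list[str]:
    """Return a description of each requirement that is not met."""
    checks = [
        (t.req_put_low_min >= t.ff_min_clock_pulse,
         "req_put low time < FF minimum clock pulse width"),
        (t.req_put_high_min >= t.sr_min_set_pulse,
         "req_put high time < SR latch minimum set pulse width"),
        (t.got_data_high_min >= t.sr_min_set_pulse,
         "got_data high time < SR latch minimum set pulse width"),
    ]
    return [msg for ok, msg in checks if not ok]


ok = PulseTimings(60, 50, 50, 40, 45)   # illustrative numbers only
bad = PulseTimings(30, 50, 40, 40, 45)
print(timing_violations(ok))   # []
print(timing_violations(bad))  # two violations: req_put low time and got_data high time
```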

66 Protocol Converters. asP* is simple and efficient, but its timing constraints make it unsuitable for long interconnects. LEDR is delay-insensitive and therefore better suited to long interconnects. Converters for other protocols are also possible.

67 LEDR Protocol: Brief Overview. Dual-rail encoding: two wires per bit, which makes the encoding delay-insensitive. Level encoding: the data rail holds the actual data value, and the parity rail holds a parity value. Alternating-phase protocol: the encoding phase alternates between even and odd. LEDR encoding (data rail, parity rail) for each bit value: even phase, 0 → (0, 0) and 1 → (1, 1); odd phase, 0 → (0, 1) and 1 → (1, 0).

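68 LEDR Signaling. [Figure: data-rail and parity-rail waveforms for the bit sequence 0, 1, 0, 0, 1, 1, 1, with each item labeled even or odd.] The data rail carries the bit value in both phases; the parity rail is chosen so that the phase alternates with each data item. Exactly one wire makes a transition for each new data item.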
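The encoding table and the one-transition property can be checked with a short sketch, assuming the first item is sent in the even phase. With phase 0 for even and 1 for odd, the parity rail equals data XOR phase, which reproduces the even/odd table of the LEDR overview slide. The function names are hypothetical.

```python
def ledr_encode(bits, start_phase=0):
    """Encode a bit sequence as LEDR (data, parity) pairs.

    Phase 0 = even, 1 = odd; the phase alternates with each data item.
    Parity rail = data XOR phase.
    """
    wires = []
    phase = start_phase
    for b in bits:
        wires.append((b, b ^ phase))
        phase ^= 1
    return wires


def ledr_decode(wires):
    """The data rail carries the bit value directly in both phases."""
    return [data for data, _parity in wires]


def transitions(a, b):
    """Number of wires that toggle between two consecutive (data, parity) states."""
    return sum(x != y for x, y in zip(a, b))


bits = [0, 1, 0, 0, 1, 1, 1]  # bit sequence from the slide's waveform
enc = ledr_encode(bits)
print(enc)  # [(0, 0), (1, 0), (0, 0), (0, 1), (1, 1), (1, 0), (1, 1)]
print([transitions(a, b) for a, b in zip(enc, enc[1:])])  # [1, 1, 1, 1, 1, 1]
```

The property follows from parity = data XOR phase: if the bit repeats, only the parity rail toggles (the phase changed); if the bit changes, data and phase both flip, so the parity rail stays put and only the data rail toggles.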

69 LEDR Completion Detector. [Figure: a 1-bit LEDR completion detector and an N-bit LEDR completion detector.]
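Functionally, a 1-bit detector recovers the phase of one bit as data XOR parity, and an N-bit detector reports completion once every bit has reached the same expected phase. The sketch below shows only this logical function, not the gate-level circuit on the slide; the names are hypothetical.

```python
def ledr_phase(data, parity):
    """1-bit detector: phase of one LEDR bit (0 = even, 1 = odd)."""
    return data ^ parity


def ledr_complete(word, expected_phase):
    """N-bit detector: True once every bit of the word is in the expected phase."""
    return all(ledr_phase(d, p) == expected_phase for d, p in word)


# Previous word was even; the new word is sent in the odd phase.
partial = [(0, 1), (1, 1), (1, 0)]  # bit 1 has not yet switched to the odd phase
done = [(0, 1), (1, 0), (1, 0)]     # all three bits are now odd
print(ledr_complete(partial, 1), ledr_complete(done, 1))  # False True
```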

70 LEDR-to-asP* Converter. The converter uses a completion detector per bit, an even-parity detector and an odd-parity detector. Data is stored once all data bits [1:n] have changed.
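The converter's behavior can be sketched at the word level, assuming it starts by waiting for an even-phase word (as the example slide assumes) and that a non-None return stands for one asP* request carrying the stored data. LedrToAspConverter and sample are hypothetical names; the real converter is a circuit, not a polling loop.

```python
class LedrToAspConverter:
    """Behavioral sketch: store an LEDR word once all bits enter the expected phase,
    then hand it on as one asP*-style request (names are hypothetical)."""

    def __init__(self, width, start_phase=0):
        self.width = width
        self.expected = start_phase  # phase of the next word to accept (0 = even)

    def sample(self, word):
        """Return the data bits when a complete new word is seen, else None."""
        assert len(word) == self.width
        if all((d ^ p) == self.expected for d, p in word):  # even/odd parity detector
            self.expected ^= 1            # the next word arrives in the other phase
            return [d for d, _ in word]   # data rail carries the bit value directly
        return None


conv = LedrToAspConverter(width=2)
print(conv.sample([(1, 1), (0, 1)]))  # None: bit 1 is still in the old (odd) phase
print(conv.sample([(1, 1), (0, 0)]))  # [1, 0]: all bits even, data stored
print(conv.sample([(1, 1), (0, 0)]))  # None: no new word yet
print(conv.sample([(1, 0), (1, 0)]))  # [1, 1]: all bits odd, next word stored
```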

71 LEDR-to-asP* Converter. In this example, assume the even-parity phase. [Figure: converter simulation.]

72 LEDR-to-asP* Converter (simulation, continued; figure only).

73 asP*-to-LEDR Converter (figure only).

74 asP*-to-LEDR Converter (simulation; figure only).

75 Clocked FIFOs. The design goal is to provide all flavors of synchronization converters: synchronous-to-asynchronous, asynchronous-to-synchronous and synchronous-to-synchronous. The async-to-sync and sync-to-async FIFOs are obtained by combining an asynchronous put interface with a synchronous get interface, and vice versa. The synchronous-to-synchronous FIFO is detailed in the following slides.

76 3-Stage Clocked FIFO. [Figure annotations: a signal that indicates that data can be put into the FIFO; logic that ensures fully synchronous behavior.]

77 FIFO Stage with Clocked RX and TX (figure only).

78 Clocked Put Interface Cell. [Figure annotations: signal to the sender, signals from the sender, synchronizer.] The state (full or empty) of each FIFO stage is synchronized, with one 1-bit synchronizer per FIFO stage interface. The delay is asymmetric.
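The per-stage 1-bit synchronizer can be sketched as a shift register clocked in the receiving domain. This assumes an ideal flop model with no metastability, so the input simply appears at the output after n clock edges; FlopSynchronizer is a hypothetical name, and the asymmetric delay mentioned on the slide is not modeled.

```python
from collections import deque


class FlopSynchronizer:
    """n-flop synchronizer modeled as a shift register (no metastability modeled)."""

    def __init__(self, n_flops, init=False):
        # maxlen makes appendleft() discard the oldest value at the right end.
        self.flops = deque([init] * n_flops, maxlen=n_flops)

    def clock(self, async_in):
        """Apply one clock edge: shift in the input, return the synchronized output."""
        self.flops.appendleft(async_in)
        return self.flops[-1]


# Put side observing a stage that the get side has just emptied.
sync = FlopSynchronizer(2)
empty_from_get_side = True
seen = [sync.clock(empty_from_get_side) for _ in range(3)]
print(seen)  # [False, True, True]: visible after the 2nd put-clock edge
```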

79–87 Clocked Put Interface Cell (animation steps; figure only).

88 Clocked Put Interface Cell. The clocked get interface cell is analogous.

89 Example of a 1.5-Cycle Synchronizer. [Figure: synchronizer with signals IN, OUT and Async_OUT.]

90 Synchronizer MTBF for different synchronizers and clock speeds, 90 nm technology. [Figure: MTBF plot; τ is the metastability resolution time constant.]
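The trends in such a plot follow from the standard first-order synchronizer model, MTBF = e^(t_r/τ) / (T_w · f_clk · f_data), where t_r is the time available for resolution and T_w is the flip-flop's metastability window. The sketch below is that textbook model, not the paper's characterization: it assumes an n-flop synchronizer leaves n clock periods minus a fixed overhead for resolution, and every default parameter value is an illustrative placeholder.

```python
import math


def synchronizer_mtbf(t_resolve, tau, t_w, f_clk, f_data):
    """First-order synchronizer MTBF in seconds: exp(t_r / tau) / (T_w * f_clk * f_data).

    t_resolve: time available for metastability to resolve (s)
    tau:       metastability resolution time constant (s)
    t_w:       metastability window of the flip-flop (s)
    f_clk:     receiving clock frequency (Hz)
    f_data:    rate of input transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (t_w * f_clk * f_data)


def n_flop_mtbf(n_flops, f_clk, tau=20e-12, t_w=20e-12, f_data=100e6, overhead=100e-12):
    """Assume n clock periods minus a fixed overhead are available to resolve.
    All default values are illustrative placeholders, not values from the paper."""
    t_resolve = n_flops / f_clk - overhead
    return synchronizer_mtbf(t_resolve, tau, t_w, f_clk, f_data)


for f_clk in (1e9, 2e9):
    for n in (1, 2, 3):
        print(f"{f_clk / 1e9:.0f} GHz, {n} flop(s): MTBF = {n_flop_mtbf(n, f_clk):.3e} s")
```

Each extra flop multiplies the MTBF by e^(T_clk/τ), while a faster clock shrinks T_clk; this is why a fast clock domain may need a longer synchronizer than a slow one for the same reliability.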

91 FIFO Latency and Throughput. Latency: the minimum time data spends in the FIFO; it is independent of FIFO length. Throughput: the maximum number of data transfers per unit time; it depends on FIFO length.

92 FIFO Throughput. Throughput is limited by the slower of the put and get interfaces. Put interface delay: the minimum time between two successive FIFO writes. Get interface delay: the minimum time between two successive FIFO reads.

93 Clocked FIFO Throughput Simulation. Scenario: a 2-cycle synchronizer, with equal put and get frequencies and zero phase shift. Results: the FIFO does not allow a write every clock cycle, and it must be increased to 6 stages to do so. In general, a FIFO with equal put and get frequencies and an n-cycle synchronizer needs 2(n+1) stages to support maximum throughput.
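The 2(n+1) rule can be illustrated with a cycle-level model. The model builds in one assumption rather than deriving it: each one-way crossing of a stage's full/empty state costs n + 1 cycles (one to update the state, n in the synchronizer), so a written stage is unavailable for 2(n+1) cycles before the put side sees it empty again. write_rate and min_stages_for_full_rate are hypothetical helpers.

```python
def write_rate(stages, n_sync):
    """Fraction of put-clock cycles in which a write succeeds.

    Assumptions (illustrative): equal put/get clocks with zero phase shift, a sender
    that always has data, and a receiver that reads as soon as it sees a full stage.
    """
    round_trip = 2 * (n_sync + 1)  # written -> seen full and read -> seen empty
    cycles = 100 * round_trip      # whole number of round trips
    free_at = [0] * stages         # first cycle in which each stage may be rewritten
    ptr, writes = 0, 0             # put-token position, successful writes
    for t in range(cycles):
        if free_at[ptr] <= t:      # the stage holding the put token is seen empty
            free_at[ptr] = t + round_trip
            ptr = (ptr + 1) % stages
            writes += 1
    return writes / cycles


def min_stages_for_full_rate(n_sync):
    """Smallest FIFO length that accepts a write on every cycle."""
    stages = 1
    while write_rate(stages, n_sync) < 1.0:
        stages += 1
    return stages


print(write_rate(3, 2))             # 0.5: a 3-stage FIFO writes every other cycle
print(min_stages_for_full_rate(2))  # 6 = 2 * (2 + 1)
```

With fewer than 2(n+1) stages, the put token returns to a stage before its "empty" status has made the round trip, so writes stall; unequal clock frequencies or a nonzero phase shift change the round trip and hence the minimum length.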

94 asP* FIFO Latency. [Figure: latency components: write latency, read latency, receiver latency and full-empty control.]

95 asP* FIFO Latency: Clockless. Latency is measured from the rising edge of req_put to the rising edge of data_valid (220 ps), plus from the rising edge of got_data to the empty cell status (140 ps), for a total of 360 ps. Throughput is limited by the slower of the get and put interfaces; the evaluated maximum is 1.95 GHz. Power is 5.27 mW at 1.95 GHz.

96 asP* FIFO Latency: Clocked. Latency is measured from the rising edge of clk_put to the rising edge of clk_get with valid data (it does not depend on FIFO length), plus t_sync (173 ps). Using a 6-stage FIFO doubles the throughput. The 6-stage FIFO running at 1.28 GHz consumes 4.91 mW.

97 Clocked FIFO Latency. Measured from the clk_put edge that latches data into the FIFO until the clk_get edge that notifies the receiver that data is available.

98 Clocked FIFO Throughput. Throughput is determined by the slower of the put and get interfaces. There is a minimum FIFO length required to support maximum throughput; it depends on the synchronization latencies, the ratio of the put and get clock speeds, and the phase relationship of the put and get clocks.

99 Conclusions. The talk presented a synchronizing FIFO that can be built using standard cells, has simple interfaces, and has a modular design in which the following properties can be chosen independently: the type of put and get interface, the synchronization time length, and the FIFO size.

100 References
T. Ono and M. Greenstreet. A modular synchronizing FIFO for NoCs. In Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip, 2009.
M. E. Dean, T. E. Williams, and D. L. Dill. Efficient self-timing with level-encoded 2-phase dual-rail (LEDR). In Advanced Research in VLSI (ARVLSI), pages 55–70. MIT Press, 1991.
C. E. Molnar, I. W. Jones, W. S. Coates, and J. K. Lexau. A FIFO ring performance experiment. In Proceedings of the Third International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 279–289, Eindhoven, April 1997.
I. E. Sutherland. Micropipelines. Communications of the ACM, 32(6):720–738, June 1989. Turing Award lecture.

