Low-Latency Interfaces for Mixed-Timing Domains [in DAC-01]
Tiberiu Chelcea, Steven M. Nowick
Department of Computer Science, Columbia University


1 Low-Latency Interfaces for Mixed-Timing Domains [in DAC-01]
Tiberiu Chelcea, Steven M. Nowick
Department of Computer Science, Columbia University
{tibi,nowick}@cs.columbia.edu

2 Introduction
Key Trend in VLSI systems: systems-on-a-chip (SoC)
Two fundamental challenges:
- mixed-timing domains
- long interconnect delays
Our Goal: design of efficient interface circuits
Desirable Features:
- arbitrarily robust
- low-latency, high-throughput
- modularity, scalability
Few satisfactory solutions to date…

3 Timing Issues in SoC Design
[Figure: two cases. (a) single-clock: Domain #1 and Domain #2 joined by a long interconnect. (b) mixed-timing domains: Domain #1 (sync or async) and Domain #2 (sync or async), also shown with a long interconnect.]

4 Timing Issues in SoC Design (cont.)
Solution: provide interface circuits
[Figure: the same two cases as on the previous slide, annotated with the interface circuit used in each.]
- (a) single-clock, long interconnect: Carloni et al., “relay stations”
- (b) mixed-timing domains: NEW: “mixed-timing FIFO’s”
- (b) mixed-timing domains, long interconnect: NEW: “mixed-timing relay stations”

5 Contributions
Complete set of mixed-timing interface circuits:
- sync-sync, async-sync, sync-async, async-async
Features:
- Arbitrary Robustness: with respect to synchronization failures
- High-Throughput:
  - in steady-state operation: no synchronization overhead
- Low-Latency: “fast restart”
  - in empty FIFO: only synchronization overhead
- Reusability:
  - each interface partitioned into reusable sub-components
Two Contributions:
- Mixed-Timing FIFO’s
- Mixed-Timing Relay Stations

6 Contribution #1: Mixed-Timing FIFO’s
Addresses issue of interfacing mixed-timing domains
Features: token ring architecture
- circular array of identical cells
- shared buses: data + control
- data: “immobile” once enqueued
- distributed control: allows concurrent put/get operations
2 circulating tokens: define tail & head of queue
Potential benefits:
- low latency
- low power
- scalability
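
The token-ring idea on this slide can be illustrated with a small behavioral sketch. This is only a software analogy under stated assumptions, not the authors' circuit: the class name `TokenRingFifo`, its methods, and the per-cell `full` flag are hypothetical, and clocking, shared buses, and synchronization are left out. It shows the two properties the slide lists: data stays in the cell where it was enqueued, and the put and get tokens advance independently around the ring.

```python
class TokenRingFifo:
    """Behavioral sketch of a token-ring FIFO (hypothetical model, no clocks)."""

    def __init__(self, capacity):
        self.data = [None] * capacity    # one data register per cell
        self.full = [False] * capacity   # per-cell status bit
        self.put_token = 0               # cell holding the put token (tail)
        self.get_token = 0               # cell holding the get token (head)

    def put(self, item):
        cell = self.put_token
        if self.full[cell]:
            return False                 # tail cell still occupied: cannot enqueue
        self.data[cell] = item           # data is written once and never moved
        self.full[cell] = True
        self.put_token = (cell + 1) % len(self.data)  # pass the put token on
        return True

    def get(self):
        cell = self.get_token
        if not self.full[cell]:
            return None                  # head cell empty: nothing to dequeue
        item = self.data[cell]
        self.full[cell] = False
        self.get_token = (cell + 1) % len(self.data)  # pass the get token on
        return item
```

Because each operation touches only the cell that holds its token, a put and a get can proceed at the same time as long as the two tokens sit in different cells.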

7 Contribution #2: Mixed-Timing Relay Stations
Addresses issue of long interconnect delays
“Latency-Insensitive Protocols”: safely tolerate long interconnect delays between systems
Prior Contribution: introduce “relay stations”
- single-clock domains (Carloni et al., ICCAD-99)
Our Contribution: introduce “mixed-timing relay stations”
- mixed-clock (sync-sync)
- async-sync
First proposed solutions to date…

8 Related Work
Single-Clock Domains: handling clock discrepancies
- clock skew and jitter (Kol98, Greenstreet95)
- long interconnect delays (Carloni99)
Mixed-Timing Domains: 3 common approaches
- Use “Wrapper Logic”:
  - add logic layer to synchronize data/control (Seitz80, Seizovic94)
  - drawback: long latencies in communication
- Modify Receiver’s Clock:
  - stretchable and pausible clocks (Chapiro84, Yun96, Bormann97, Sjogren/Myers97)
  - drawback: penalties in restarting clock

9 Related Work: Closer Approaches
Mixed-Timing Domains (cont.):
- Interface Circuits: Mixed-Clock FIFO’s (Intel, Jex et al. 1997)
  - drawback: significant area overhead = synchronizer for each cell
Our approach: mixed-clock FIFO’s
- only 2 synchronizers for entire FIFO

10 Outline
Mixed-Clock Interfaces
- FIFO
- Relay Station
Async-Sync Interfaces
- FIFO
- Relay Station
Results
Conclusions

11 Mixed-Clock FIFO: Block Level
Synchronous put interface:
- req_put: initiates put operations
- data_put: bus for data items
- full: indicates when FIFO full
- CLK_put: controls put operations
Synchronous get interface:
- req_get: initiates get operations
- data_get: bus for data items
- valid_get: indicates data items validity (always 1 in this design)
- empty: indicates when FIFO empty
- CLK_get: controls get operations

12

13 Mixed-Clock FIFO: Steady-State Simulation
[Figure: FIFO architecture with Put Controller, Get Controller, Full Detector, Empty Detector, and a ring of cells with TAIL and HEAD marked; signals full, req_put, data_put, CLK_put, CLK_get, data_get, req_get, valid_get, empty.]
Steady state: FIFO neither full, nor empty
- Sender starts a put operation
- FIFO not full: Put Controller enables a put operation
- At the end of clock cycle: cell enqueues data

14 Mixed-Clock FIFO: Steady-State Simulation
[Figure: same architecture.]
- TAIL cell passes the put token

15 Mixed-Clock FIFO: Steady-State Simulation
[Figure: same architecture.]
- Get Operation at the HEAD cell

16 Mixed-Clock FIFO: Steady-State Simulation
[Figure: same architecture.]
Steady state operation:
- Puts and Gets “reasonably spaced”: zero probability of synchronization failure
- Zero synchronization overhead

17 Mixed-Clock FIFO: Steady-State Simulation
[Figure: same architecture, showing the TAIL and HEAD positions.]

18 Mixed-Clock FIFO: Full Scenario
[Figure: FIFO architecture (Put Controller, Get Controller, Full Detector, Empty Detector, cells with HEAD and TAIL marked; signals full, req_put, data_put, CLK_put, CLK_get, data_get, req_get, valid_get, empty).]
- FIFO FULL: put interface stalled

19 Mixed-Clock FIFO: Full Scenario
[Figure: same architecture, HEAD and TAIL marked.]

20 Mixed-Clock FIFO: Full Scenario
[Figure: same architecture, HEAD and TAIL marked.]
- FIFO NOT FULL

21 Mixed-Clock FIFO: Full Scenario
[Figure: same architecture, HEAD and TAIL marked.]

22 Mixed-Clock FIFO: Cell Implementation
[Figure: FIFO cell with data register (REG, enable En), SR latch, and token inputs/outputs ptok_in, ptok_out, gtok_in, gtok_out.]
Components: Synchronous Put Part, Synchronous Get Part, Data Validity Controller (reusable)
Synchronous Put Part (CLK_put, en_put, req_put, data_put):
- en_put: enables a put operation
- data_put: data item in
- req_put: validity bit in
Synchronous Get Part (CLK_get, en_get, valid, data_get):
- en_get: enables a get operation
- data_get: data item out
- valid: validity bit out
Status Bits:
- f_i: cell FULL
- e_i: cell EMPTY

23 Mixed-Clock FIFO: Architecture
[Figure: FIFO architecture with Put Controller, Get Controller, Full Detector, Empty Detector; signals full, req_put, data_put, CLK_put, CLK_get, data_get, req_get, valid_get, empty. Annotation: FIFO not full.]

24 Synchronization Issues
Challenge: interfaces are highly-concurrent
- Global “FIFO state”: controlled by 2 different clocks
Problem #1: Metastability
- Each FIFO interface needs clean state signals
Solution: Synchronize “full” & “empty” signals
- “full” with CLK_put
- “empty” with CLK_get
Add 2 (or more) synchronizing latches to each signal
Observable “full”/“empty” safely approximate true FIFO state

25 Synchronization Issues (cont.)
Problem #2: FIFO now may underflow/overflow!
- synchronizing latches add extra latency
Solution: Modify definitions of “full” and “empty”
- New FULL: 0 or 1 empty cells left
- New EMPTY: 0 or 1 full cells left
New Full Detector:
[Figure: cell-empty bits e_0 … e_3 compared in adjacent pairs; the result passes through synchronizing latches clocked by CLK_put to produce full.]
- FIFO not full = two consecutive empty cells exist
- full = NO two consecutive empty cells
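
The modified “full” definition can be written out as a short sketch. This is a software illustration under assumptions, not the circuit itself: the names `new_full` and `Synchronizer` are hypothetical, the ring is assumed to have at least two cells, and the synchronizer is modeled as a plain shift register with no metastability behavior.

```python
def new_full(e):
    """e[i] is True when cell i is empty; cells are arranged in a ring.

    Returns True when there are no two consecutive empty cells,
    i.e. when 0 or 1 empty cells are left.
    """
    n = len(e)
    two_consecutive_empty = any(e[i] and e[(i + 1) % n] for i in range(n))
    return not two_consecutive_empty


class Synchronizer:
    """Chain of latches clocked by the receiving domain (idealized model)."""

    def __init__(self, stages=2, initial=False):
        self.latches = [initial] * stages

    def clock(self, raw):
        # Shift the raw detector output in; return the last latch's value.
        self.latches = [raw] + self.latches[:-1]
        return self.latches[-1]
```

With two stages, a change in the raw detector output becomes observable only on the second clock edge. That delay is the extra latency the slide refers to, and it is why “full” is declared while one empty cell still remains.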

26 Synchronization Issues (cont.)
Problem #3: Potential for deadlock
Scenario: suppose only 1 data item in quiescent FIFO
- FIFO still considered “empty” (new definition)
- Get interface: cannot dequeue data item!
Solution: bi-modal “empty detector”, combines:
- “New empty” detector (0 or 1 data items)
- “True empty” detector (0 data items)
Two results folded into single global “empty” signal

27 Synchronization Issues: Avoiding Deadlock
[Figure: two detectors over the cell-full bits f_0 … f_3, each synchronized with CLK_get, producing ne and oe; together with req_get and en_get they are combined into the global empty signal.]
Bi-modal empty detection: select either ne or oe
- ne: detects “new empty” (0 or 1 full cells)
- oe: detects “true empty” (0 full cells)
- combine into global “empty”
Reconfigure whenever active get interface
- When reconfigured, use “ne”: FIFO active, avoids underflow
- When NOT reconfigured, use “oe”: FIFO quiescent, avoids deadlock
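
A minimal sketch of the bi-modal selection, assuming the choice between ne and oe can be abstracted as a single flag. The function names and the `get_active` parameter are hypothetical, and the synchronizing latches and the exact reconfiguration timing of the real detector are not modeled.

```python
def new_empty(f):
    """ne: True when no two consecutive cells are full (0 or 1 data items).

    f[i] is True when cell i is full; cells are arranged in a ring.
    """
    n = len(f)
    return not any(f[i] and f[(i + 1) % n] for i in range(n))


def true_empty(f):
    """oe: True only when no cell holds a data item."""
    return not any(f)


def global_empty(f, get_active):
    """Use ne while the get interface is active, oe while it is quiescent."""
    return new_empty(f) if get_active else true_empty(f)
```

With exactly one data item in the FIFO, `global_empty` reports empty while gets are in flight (conservative, so no underflow) but reports not empty once the interface is quiescent, so the last item can still be dequeued.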

28 Mixed-Clock FIFO: Architecture
[Figure: FIFO architecture with Put Controller, Get Controller, Full Detector, Empty Detector; signals full, req_put, data_put, CLK_put, CLK_get, data_get, req_get, valid_get, empty. Annotation: FIFO not full.]

29 Put/Get Controllers
Put Controller (signals: req_put, full, en_put):
- enables put operation
- disabled when FIFO full
Get Controller (signals: req_get, empty, valid, en_get, valid_get):
- enables get operation
- indicates when data valid
- disabled when FIFO empty
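
One plausible reading of these two controllers as Boolean functions is sketched below. The gating is an assumption inferred from the bullet descriptions and signal names on the slide, not a gate-level description taken from it, and the function names are hypothetical.

```python
def put_controller(req_put, full):
    """Assumed logic: enable a put only when requested and the FIFO is not full."""
    en_put = req_put and not full
    return en_put


def get_controller(req_get, empty, valid):
    """Assumed logic: enable a get only when requested and the FIFO is not empty.

    valid_get tells the receiver that the dequeued item is valid.
    """
    en_get = req_get and not empty
    valid_get = en_get and valid
    return en_get, valid_get
```

Here `full` and `empty` stand for the synchronized detector outputs, so each controller only sees signals that are stable in its own clock domain.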

30 Outline
Mixed-Clock Interfaces
- FIFO
- Relay Station
Async-Sync Interfaces
- FIFO
- Relay Station
Results
Conclusions

31 Relay Stations: Overview
Proposed by Carloni et al. (ICCAD’99)
[Figure: System 1 and System 2 on a common CLK, connected through a chain of relay stations (RS).]
- Without relay stations: system 1 sends “data items” to system 2; delay > 1 cycle
- With relay stations: system 1 now sends “data packets” to system 2; delay = 1 cycle
Data Packet = data item + validity bit
“stop” control = stopIn + stopOut
- apply counter-pressure
- result: stall communication
Steady State: pass data on every cycle (either valid or invalid)
Problem: Works only for single-clock systems!

32 Relay Stations: Implementation
[Figure: registers MR and AR, an output mux with switch, and a Control block; ports packetIn, packetOut, stopIn, stopOut.]
In normal operation:
- packetIn copied to MR and forwarded on packetOut
When stopped (stopIn = 1):
- stopOut raised on the next clock edge
- extra packet copied to AR
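
The stall behavior can be sketched as a cycle-level model. This is an illustrative simplification under assumptions: the class `RelayStation` and its `clock` method are hypothetical names, the sender is assumed to hold its packet while it observes stopOut = 1, and on restart the sketch moves the AR contents into MR instead of steering an output mux.

```python
class RelayStation:
    """Cycle-level sketch of a single-clock relay station (hypothetical model)."""

    def __init__(self):
        self.mr = None          # MR register: packet currently on packetOut
        self.ar = None          # AR register: holds the one extra packet on a stall
        self.stop_out = False   # registered back-pressure to the sender

    def clock(self, packet_in, stop_in):
        """Apply one clock edge; return (packetOut, stopOut) after the edge."""
        if not self.stop_out:
            if not stop_in:
                self.mr = packet_in     # normal operation: copy and forward
            else:
                self.ar = packet_in     # extra packet already in flight
                self.stop_out = True    # stopOut raised on this edge
        elif not stop_in:
            self.mr = self.ar           # restart: deliver the saved packet
            self.ar = None
            self.stop_out = False
        # Stalled with stop_in still high: hold MR and AR unchanged.
        return self.mr, self.stop_out
```

Two registers are enough because the sender learns about the stall one cycle late, so exactly one additional packet can arrive after stopIn is asserted.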

33 Relay Station vs. Mixed-Clock FIFO
Relay Station (signals: validIn, dataIn, stopOut; validOut, dataOut, stopIn):
- Steady state: always pass data
- Data items: both valid & invalid
- Stopping mechanism: stopIn & stopOut
Mixed-Clock FIFO (signals: req_put, dataIn, full; req_get, dataOut, empty):
- Steady state: only pass data when requested
- Data items: only valid data
- Stopping mechanism: none (only full/empty)

34 Mixed-Clock Relay Stations (MCRS)
[Figure: System 1 (CLK1) and System 2 (CLK2) connected through relay stations (RS), with an MCRS at the clock boundary.]
Mixed-Clock Relay Station derived from the Mixed-Clock FIFO: change ONLY Put and Get Controllers
- Mixed-Clock FIFO interface: full, req_put, data_put, CLK_put; empty, req_get, valid_get, data_get, CLK_get
- Mixed-Clock Relay Station interface (NEW): packetIn (valid_put, data_put), stopOut, CLK1; packetOut (valid_get, data_get), stopIn, CLK2

35 Mixed-Clock Relay Station: Implementation
Mixed-Clock Relay Station vs. Mixed-Clock FIFO
Identical:
- FIFO cells
- Full/Empty detectors (...or can simplify)
Only modify: Put & Get Controllers
- Put Controller (signals: validIn, full, en_put, plus an output marked “to cells”): always enqueue data (unless full)
- Get Controller (signals: stopIn, empty, valid, en_get, validOut)
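
A sketch of how the two modified controllers might look as Boolean functions. The equations are assumptions inferred from the signal names and from the note “always enqueue data (unless full)”. In particular, driving stopOut from full and forwarding validIn to the cells as the validity bit are guesses, and the function names are hypothetical.

```python
def rs_put_controller(valid_in, full):
    """Assumed relay-station put logic: no request signal, enqueue every cycle.

    Returns (en_put, stop_out, validity_bit).
    """
    en_put = not full           # always enqueue data unless full
    stop_out = full             # assumed: back-pressure mirrors the full signal
    validity_bit = valid_in     # assumed: stored in the cell alongside the data
    return en_put, stop_out, validity_bit


def rs_get_controller(stop_in, empty, valid):
    """Assumed relay-station get logic: dequeue every cycle unless stopped.

    Returns (en_get, valid_out).
    """
    en_get = (not stop_in) and (not empty)
    valid_out = en_get and valid
    return en_get, valid_out
```

Compared with the FIFO controllers, the request inputs disappear: a relay station passes a packet on every cycle, and only stopIn or the full/empty conditions hold it back.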

36 Outline
Mixed-Clock Interfaces
- FIFO
- Relay Station
Async-Sync Interfaces
- FIFO
- Relay Station
Results
Conclusions

37 Async-Sync FIFO: Block Level
[Figure: Mixed-Clock FIFO (full, req_put, data_put, CLK_put; req_get, valid_get, empty, data_get, CLK_get) shown next to Async-Sync FIFO (put_req, put_ack, put_data on the async-domain side; req_get, valid_get, empty, data_get, CLK_get on the sync-domain side).]
Asynchronous put interface: uses handshaking communication
- put_req: request operation
- put_ack: acknowledge completion
- no “full” signal
Synchronous get interface: no change

38 Async-Sync FIFO: Architecture
[Figure: ring of cells with Get Controller and Empty Detector; signals put_req, put_ack, put_data, CLK_get, data_get, req_get, valid_get, empty.]
- Get interface: exactly as in Mixed-Clock FIFO
- Asynchronous put interface: no Full Detector or Put Controller
- When FIFO full, acknowledgement withheld until safe to perform the put operation

39 Async-Sync FIFO: Cell Implementation
[Figure: cell with data register (REG, enable En) and blocks labeled C, +, OPT, DV; signals put_req, put_data, put_ack, we, we1, f_i, e_i, gtok_in, gtok_out, CLK_get, en_get, get_data.]
- Asynchronous Put Part: reusable from async FIFO (Async00)
- Synchronous Get Part: reusable (from mixed-clock FIFO)
- Data Validity Controller: new

40 Async-Sync Relay Stations (ASRS)
[Figure: System 1 (async) and System 2 (sync) connected through ARS blocks, an ASRS, and RS blocks; CLK2 on the synchronous side. Annotations: Micropipeline, optional.]

41 Outline
Mixed-Clock Interfaces
- FIFO
- Relay Station
Async-Sync Interfaces
- FIFO
- Relay Station
Results
Conclusions

42 Results
Each circuit implemented:
- using both academic and industry tools
  - MINIMALIST: Burst-Mode controllers [Nowick et al. ‘99]
  - PETRIFY: Petri-Net controllers [Cortadella et al. ‘97]
Pre-layout simulations: 0.6 µm HP CMOS technology
Experiments:
- various FIFO capacities (4/8/16 cells)
- various data widths (8/16 bits)

43 Results: Latency
Experimental Setup:
- 8-bit data items
- various FIFO capacities (4, 8, 16)
Latency = time from enqueuing a data item into an empty FIFO to dequeuing it
For each design, latency is not uniquely defined, so Min and Max are reported:
Design            4-place         8-place         16-place
                  Min     Max     Min     Max     Min     Max
Mixed-Clock       5.43    6.34    5.79    6.64    6.14    7.17
Async-Sync        5.53    6.45    6.13    7.17    6.47    7.51
Mixed-Clock RS    5.48    6.41    6.05    7.02    6.23    7.28
Async-Sync RS     5.61    6.35    6.18    7.13    6.57    7.62

44 Results: Maximum Operating Rate
Units: synchronous interfaces in MegaHertz, asynchronous interfaces in MegaOps/sec
Design            4-place         8-place         16-place
                  Put     Get     Put     Get     Put     Get
Mixed-Clock       565     549     544     523     505     484
Async-Sync        421     549     379     523     357     484
Mixed-Clock RS    580     539     550     517     509     475
Async-Sync RS     421     539     379     517     357     475
Put vs. Get rates:
- sync put faster than sync get
- async put slower than sync get

45 Conclusions
Introduced several new low-latency interface circuits
Address 2 major issues in SoC design:
- Mixed-timing domains
  - mixed-clock FIFO
  - async-sync FIFO
- Long interconnect delays
  - mixed-clock relay station
  - async-sync relay station
Other designs implemented and simulated:
- Sync-Async FIFO + Relay Station
- Async-Async FIFO + Relay Station
Reusable components: mix & match to build circuits
Provide useful set of interface circuits for SoC design

