1 A Modular Synchronizing FIFO for NoCs Vainbaum Yuri.

Presentation transcript:

1 A Modular Synchronizing FIFO for NoCs Vainbaum Yuri

2 A Modular Synchronizing FIFO for NoCs
Paper presented at NOC-2009
Authors: Tarik Ono (Sun Microsystems), Mark Greenstreet (University of British Columbia)

3 Motivation & Purpose of Synchronizing FIFO
[Figure: Timing Domains 1, 2 and 3, each attached to the Network-on-Chip through a Synchronizing FIFO]
Multiple clock domains in a NoC require many FIFOs

4 Synchronizing FIFO Targets
Design targets for the FIFO:
- FIFO can be built using standard cells
- Easy integration into the CAD flow
- Modular FIFO design with a choice of clockless or clocked interfaces
- Modular, simple architecture reduces NoC design time

5 Talk Outline
- FIFO Overview
- FIFO Blocks
  - Clockless Put and Get Interface
  - Clocked Put and Get Interface
  - Full-Empty Control and Data Store
- FIFO Latency and Throughput
- Implementation Results

6 FIFO Overview: Operation
[Figure: Sender (Timing Domain A), Put Interface, stages 1 to 3, Get Interface, Receiver (Timing Domain B)]
- A FIFO consists of a number of stages
- Sender communicates with the Put Interface, Receiver with the Get Interface
- Tokens determine the FIFO stage for the next put and get operation
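
The token idea on this slide can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class name `TokenRingFifo`, its methods, and the wrap-around of the tokens from the last stage back to the first are illustrative choices, not details taken from the slides.

```python
class TokenRingFifo:
    """Behavioral model: data stays in place, tokens select the active stage."""

    def __init__(self, num_stages):
        self.data = [None] * num_stages    # data store of each stage
        self.full = [False] * num_stages   # full/empty status of each stage
        self.put_token = 0                 # stage that accepts the next put
        self.get_token = 0                 # stage that serves the next get

    def put(self, item):
        """Write into the stage holding the put token; fail if it is full."""
        i = self.put_token
        if self.full[i]:
            return False
        self.data[i] = item
        self.full[i] = True
        self.put_token = (i + 1) % len(self.data)  # pass the token on (assumed ring)
        return True

    def get(self):
        """Read from the stage holding the get token; None if it is empty."""
        i = self.get_token
        if not self.full[i]:
            return None
        item = self.data[i]
        self.full[i] = False
        self.get_token = (i + 1) % len(self.data)
        return item
```

In this model a put fails only when the stage under the put token is still full, which for an in-order FIFO means every stage is occupied.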

7 FIFO Overview: Structure
[Figure: stages 1 to 3 between Sender (Timing Domain A) and Receiver (Timing Domain B)]
Each FIFO stage has a
- Put Interface Cell
- Get Interface Cell
- Full-Empty Control
- Data Store

8 FIFO Overview: Modular Design
[Figure: stages 1 to 3, each with a Put Interface Cell, Get Interface Cell, Full-Empty Control and Data Store, between a Sender in Clocked Domain A and a Receiver in a clockless NoC; CLOCKED PUT INTERFACE, CLOCKLESS GET INTERFACE]
- Mix-and-match interfaces

9 FIFO Overview: Modular Design
[Figure: stages 1 to 3 between a Sender in Fast Clocked Domain A and a Receiver in Slow Clocked Domain B; CLOCKED PUT INTERFACE, CLOCKED GET INTERFACE; labels: 1 flop synchronizer, 3 flop synchronizer]
- Mix-and-match interfaces
- Can use different synchronization time lengths, depending on clock frequency
- Changing the FIFO size doesn't affect an individual FIFO stage

10 Full-Empty Control and Data Store
- Data Store consists of latches
  - enabled when write is high
- Same blocks for clocked or clockless interfaces
- Full-Empty Control consists of an SR latch
  - on write, set output (full signal) high
  - on read, set output low
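
A minimal behavioral sketch of these two blocks follows. The class and method names are hypothetical, and holding the state when write and read are asserted together is an assumption made for the model, not something stated on the slide.

```python
class FullEmptyControl:
    """SR latch model: set on write, reset on read; output is the full signal."""

    def __init__(self):
        self.full = False

    def update(self, write, read):
        if write and not read:
            self.full = True       # set
        elif read and not write:
            self.full = False      # reset
        # both or neither asserted: hold the current state (assumption)
        return self.full


class DataStore:
    """Level-sensitive latch model: transparent while write is high."""

    def __init__(self):
        self.q = None

    def update(self, d, write):
        if write:
            self.q = d             # latch enabled, output follows input
        return self.q              # otherwise the stored value is held
```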

11 asP* FIFO
- asP*: Asynchronous Symmetric Persistent Pulse Protocol
- Standard cells
- Good performance
- Doesn't require C-elements
- asP* handshaking protocol is chosen as the baseline for the FIFO design

12 asP* FIFO simulation: initial state
- SR latches keep track of empty/full status
- AND gates coordinate data transfer between stages

13 asP* FIFO simulation: data arrives, req rises
- SR latch EFi is set to indicate that latch Li holds valid data

14 asP* FIFO simulation: data arrives, req rises
- SR latch EFi is set to indicate that latch Li holds valid data

15 asP* FIFO simulation: data propagates through L

16 asP* FIFO simulation: SR latch EF1 is set

17 asP* FIFO simulation: enabling L2 latch
- When stage i-1 is full and stage i is empty, the AND gate goes high, loading data into Li

18 asP* FIFO simulation: clearing EF1 latch
- When stage i-1 is full and stage i is empty, the AND gate goes high, loading data into Li
- SR latch EFi-1 is cleared to indicate that latch Li-1 is now empty

19 asP* FIFO simulation [figure only]

20 asP* FIFO simulation: data available at output data_R
- Req_R goes high as data arrives at the last stage

21 asP* FIFO simulation [figure only]

22 asP* FIFO simulation: next data enters FIFO
- Actually it can enter just after ack_L falls, indicating that the first data is written

23 asP* FIFO simulation [figure only]

24 asP* FIFO simulation [figure only]

25 asP* FIFO simulation [figure only]

26 asP* FIFO simulation: next data enters FIFO

27 asP* FIFO simulation [figure only]

28 asP* FIFO simulation [figure only]

29 asP* FIFO simulation [figure only]

30 asP* FIFO simulation: FIFO full
- No acknowledge until the next read-out

31 asP* FIFO simulation: Ack_R rises, data read out

32 asP* FIFO simulation [figure only]

33 asP* FIFO simulation: data propagates to empty space

34 asP* FIFO simulation [figure only]

35 asP* Put Interface Cell: data propagates to empty space

36 asP* FIFO simulation [figure only]

37 asP* FIFO simulation [figure only]

38 asP* FIFO simulation: now D3 can enter FIFO

39 asP* FIFO simulation [figure only]

40 asP* FIFO simulation: sender lowers Req_L
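
The walkthrough above can be approximated in software. The sketch below is an untimed behavioral model, assuming one full/empty flag and one data latch per stage; the class `AspRippleFifo` and its methods are hypothetical names, and pulse widths and gate delays are not modeled.

```python
class AspRippleFifo:
    """Untimed sketch: EF_i flags plus L_i latches, data ripples toward the output."""

    def __init__(self, num_stages):
        self.full = [False] * num_stages   # EF_i: full/empty status per stage
        self.data = [None] * num_stages    # L_i: data latch per stage

    def write(self, item):
        """Sender request: acknowledged only if the first stage is empty."""
        if self.full[0]:
            return False                   # no acknowledge while the input is blocked
        self.data[0] = item
        self.full[0] = True
        return True

    def read(self):
        """Receiver takes the item in the last stage, or None if it is empty."""
        if not self.full[-1]:
            return None
        self.full[-1] = False
        return self.data[-1]

    def ripple(self):
        """One pass of the per-stage AND condition; True if any item moved."""
        moved = False
        # Scan from the output end so each item advances at most one stage per pass.
        for i in range(len(self.full) - 1, 0, -1):
            if self.full[i - 1] and not self.full[i]:
                self.data[i] = self.data[i - 1]   # latch L_i loads the data
                self.full[i] = True               # EF_i is set
                self.full[i - 1] = False          # EF_(i-1) is cleared
                moved = True
        return moved

    def settle(self):
        """Repeat ripple passes until no stage can move."""
        while self.ripple():
            pass
```

With three stages, three writes (each followed by `settle()`) fill the FIFO, a fourth write is refused, and one `read()` followed by `settle()` frees the first stage again.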

41 asP* FIFO - Timing Issue: T[En->Q] < T[...->Q] + T_AND

42 asP* FIFO - Timing Issue: MinResetPulseWidth[R->Q] < T[...->Q] + T_AND

43 3-stage clockless FIFO
[Figure: FIFO with a Write Port and a Read Port; signal callouts:]
- Write request
- Rises if write succeeded
- Rises if data available at output
- Receiver acknowledges receipt of data

44 Stage of clockless FIFO
[Figure callouts:]
- Latches to load data
- Written when cell is empty
- Tri-state buffer
- Transfers tokens

45 asP* Put Interface Cell: signal from Sender (fanout to all stages)

46 asP* Put Interface Cell: signal to Sender (fanin from all stages)

47 asP* Put Interface Cell: signal to Data Store and Full-Empty Control

48 asP* Put Interface Cell: signal from Full-Empty Control

49 asP* Put Interface Cell: signal from previous stage, signal to next stage

50 asP* Put Interface Cell: sets in all but one cell to low

51 asP* Put Interface Cell

52 asP* Put Interface Cell

53 asP* Put Interface Cell

54 asP* Put Interface Cell

55 asP* Put Interface Cell

56 asP* Put Interface Cell

57 asP* Put Interface Cell

58 asP* Get Interface Cell: signal from Receiver

59 asP* Get Interface Cell: signal to Receiver

60 asP* Get Interface Cell: signal to Data Store and Full-Empty Control

61 asP* Get Interface Cell: signal from Full-Empty Control

62 asP* Get Interface Cell: signal to all stages

63 asP* Get Interface Cell: simulation

64 Full-Empty Cell
- Keeps track of whether the cell is empty or full
- Set by a write operation from the put interface
- Reset by a read operation from the get interface
- AND gate ensures mutual exclusion (MUTEX) of Set and Reset
  - Avoids races
  - Simplifies timing

65 Timing requirements for FIFO
- The minimum low time for req_put must be at least as large as the minimum clock pulse width for the FFs in the put interfaces.
- The minimum high time for req_put must be at least as large as the minimum pulse width for the set signal of the SR latch in the empty/full controller.
- The minimum high time for got_data must be at least as large as the minimum pulse width for the set signal of the SR latch.
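
The three requirements above can be written as simple checks. This is a sketch only: the function name, the parameter names, and the idea of returning a list of violated constraints are illustrative assumptions, and any numbers passed in would come from the cell library actually used.

```python
def check_fifo_timing(req_put_low, req_put_high, got_data_high,
                      ff_min_clock_pulse, sr_min_set_pulse):
    """Return the names of violated constraints (empty list if all hold).

    All arguments are times in the same unit, for example picoseconds.
    """
    violations = []
    # req_put low time vs. minimum clock pulse width of the put-interface FFs
    if req_put_low < ff_min_clock_pulse:
        violations.append("req_put low time")
    # req_put high time vs. minimum set pulse width of the SR latch
    if req_put_high < sr_min_set_pulse:
        violations.append("req_put high time")
    # got_data high time vs. minimum set pulse width of the SR latch
    if got_data_high < sr_min_set_pulse:
        violations.append("got_data high time")
    return violations
```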

66 Protocol converters
- asP* is simple and efficient
- But: timing constraints make it unsuitable for long interconnect
- LEDR is delay-insensitive and better suited for long interconnect
- Other converters are possible

67 LEDR protocol: brief overview
- Dual-rail encoding: two wires per bit, delay-insensitive
- "Level-encoding":
  - Data rail: holds the actual data value
  - Parity rail: holds the parity value
- Alternating-phase protocol: encoding parity alternates between odd and even
[Figure: LEDR encoding table giving data rail and parity rail values for bit values 0 and 1 in the Even and Odd phases]

68 LEDR signaling
[Figure: data and parity rail waveforms over alternating even and odd phases]
- Data rail: carries the bit value in both phases
- Parity rail: phase alternates with each data item
- Exactly one wire transition for each new data item
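
The single-transition property can be checked with a short sketch. It assumes the usual LEDR convention that the phase of a bit is the XOR of its two rails, with the even phase encoded as 0 (rails equal); that numbering and the function names are assumptions, since the encoding table itself did not survive in this transcript.

```python
EVEN, ODD = 0, 1  # assumed numbering: phase = data_rail XOR parity_rail


def ledr_encode(bit, phase):
    """Return (data_rail, parity_rail) for one bit sent in the given phase."""
    return bit, bit ^ phase


def ledr_decode(data_rail, parity_rail):
    """Return (bit, phase) recovered from the two rails."""
    return data_rail, data_rail ^ parity_rail


def ledr_stream(bits, first_phase=EVEN):
    """Encode a bit sequence, alternating the phase with each data item."""
    phase = first_phase
    rails = []
    for bit in bits:
        rails.append(ledr_encode(bit, phase))
        phase ^= 1
    return rails


def transitions(prev, cur):
    """Number of rails that change between two consecutive code words."""
    return sum(a != b for a, b in zip(prev, cur))
```

If the new bit equals the old one, only the parity rail toggles; if it differs, only the data rail toggles. Either way `transitions` returns 1 for consecutive items.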

69 LEDR completion detector
[Figures: 1-bit LEDR completion detector; N-bit LEDR completion detector]

70 LEDR-to-asP* converter
[Figure: LEDR-to-asP* converter; callouts:]
- Completion detector per bit
- Even parity detector
- Odd parity detector
- Store data when all data[1:n] bits have changed
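
A behavioral sketch of the "store data when all bits have changed" rule follows. It assumes that the phase of each bit is the XOR of its data and parity rails and that the rails idle in phase 0 so the first word arrives in phase 1; the class `LedrToAspConverter` and its interface are hypothetical and do not describe the gate-level circuit on the slide.

```python
def bit_phase(data_rail, parity_rail):
    """Per-bit completion detection: the phase is the XOR of the two rails."""
    return data_rail ^ parity_rail


def word_complete(word, expected_phase):
    """True once every (data, parity) pair has reached the expected phase."""
    return all(bit_phase(d, p) == expected_phase for d, p in word)


class LedrToAspConverter:
    """Produces one data word (one request pulse) per complete LEDR word."""

    def __init__(self, first_phase=1):
        self.expected_phase = first_phase   # assumed: rails idle in phase 0

    def sample(self, word):
        """Return the data bits when the word is complete, else None."""
        if not word_complete(word, self.expected_phase):
            return None                     # some bits have not changed yet
        self.expected_phase ^= 1            # next word uses the other phase
        return [d for d, _ in word]
```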

71 LEDR-to-asP* converter: in this example, assume even parity phase

72 LEDR-to-asP* converter [figure only]

73 asP*-to-LEDR converter

74 asP*-to-LEDR converter [figure only]

75 Clocked FIFOs
- Design goal is to provide all flavors of synchronization converters:
  - Synchronous-to-Asynchronous
  - Asynchronous-to-Synchronous
  - Synchronous-to-Synchronous
- Async-to-Sync and Sync-to-Async are obtained by combining an async put interface with a sync get interface, and vice versa
- Synchronous-to-Synchronous is detailed in the next slides

76 3-Stage Clocked FIFO
[Figure callouts:]
- Indicates that data can be put into the FIFO
- Ensures fully synchronous behavior

77 FIFO stage with clocked RX and TX

78 Clocked Put Interface Cell
[Figure callouts: signal to sender; signals from sender; synchronizer]
- State (full or empty) of the FIFO stage is synchronized
- One 1-bit synchronizer per FIFO stage interface
- Asymmetric delay

79 Clocked Put Interface Cell

80 Clocked Put Interface Cell

81 Clocked Put Interface Cell

82 Clocked Put Interface Cell

83 Clocked Put Interface Cell

84 Clocked Put Interface Cell

85 Clocked Put Interface Cell

86 Clocked Put Interface Cell

87 Clocked Put Interface Cell

88 Clocked Put Interface Cell: the clocked get interface cell is analogous

89 Example of a 1.5-cycle synchronizer
[Figure: synchronizer with signals IN, OUT and Async_OUT]

90 Synchronizer: MTBF for different synchronizers and clock speeds
- 90 nm technology
- τ: metastability resolving constant
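
For orientation, the role of τ can be seen in the commonly used synchronizer model MTBF = exp(t_res / τ) / (T_w · f_clk · f_data). This formula is a standard textbook model and is not taken from the slide; the function and parameter names are hypothetical, and any values passed in are placeholders rather than the presentation's 90 nm data.

```python
import math


def synchronizer_mtbf(t_resolve, tau, t_window, f_clock, f_data):
    """Mean time between failures in seconds under the standard model.

    t_resolve: time allowed for metastability to resolve (s)
    tau:       metastability resolving constant (s)
    t_window:  metastability window of the flip-flop (s)
    f_clock:   sampling clock frequency (Hz)
    f_data:    rate of asynchronous input transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clock * f_data)
```

Each additional τ of resolving time multiplies the MTBF by e, which is why adding synchronizer flops or lowering the clock frequency improves it so quickly.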

91 FIFO latency and throughput
- Latency
  - minimum time data spends in the FIFO
  - independent of FIFO length
- Throughput
  - maximum number of data transfers per unit time
  - depends on FIFO length

92 FIFO throughput
- Throughput is limited by the slower of the put and get interfaces
- Put interface delay: minimum time between two successive FIFO writes
- Get interface delay: minimum time between two successive FIFO reads
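
Stated as a formula, the bound on this slide is throughput = 1 / max(put delay, get delay). A one-function sketch, with hypothetical names and delays given in seconds:

```python
def max_throughput(put_delay_s, get_delay_s):
    """Upper bound on transfers per second, set by the slower interface."""
    return 1.0 / max(put_delay_s, get_delay_s)
```

For example, a put interface that needs 0.5 ns between writes paired with a get interface that needs 1 ns between reads is bounded at about 1e9 transfers per second.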

93 Clocked FIFO throughput simulation
- Simulation scenario
  - 2-cycle synchronizer
  - Same put and get frequency with zero phase shift
- Throughput results
  - Does not allow a write every clock cycle
  - Need to increase the FIFO to 6 stages
- A FIFO with equal put and get frequencies and an n-cycle synchronizer needs 2*(n+1) stages to support maximum throughput
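
The sizing rule on this slide can be evaluated directly. The function name is hypothetical, and the rule applies only under the slide's condition of equal put and get frequencies.

```python
def min_stages_for_max_throughput(sync_cycles):
    """Stages needed for one transfer per cycle with an n-cycle synchronizer,
    assuming equal put and get clock frequencies (rule from the slide)."""
    return 2 * (sync_cycles + 1)
```

For the simulated 2-cycle synchronizer this gives 2 * (2 + 1) = 6 stages, matching the simulation result above.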

94 asP* FIFO latency
[Figure callouts: write latency; read latency; receiver latency; Full-Empty Control]

95 asP* FIFO latency: clockless
- Latency: from rising req_put to rising data_valid (220 ps), plus from rising got_data to empty cell status (140 ps), for a total of 360 ps
- Throughput is limited by the slower of the get and put interfaces; the evaluated maximum is 1.95 GHz
- Power: 5.27 mW at 1.95 GHz

96 asP* FIFO latency: clocked
- Latency: measured from rising clk_put to rising clk_get with valid data (does not depend on FIFO length), plus tsync (173 ps)
- Throughput gain when using a 6-stage FIFO is 2x
- A 6-stage FIFO running at 1.28 GHz consumes 4.91 mW
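
The reported operating points can be turned into rough derived figures. The sketch below only does arithmetic on the numbers quoted on slides 95 and 96; dividing power by frequency to get an average energy per cycle is a derived estimate, not a result from the presentation, and the function name is hypothetical.

```python
def energy_per_cycle_joules(power_watts, frequency_hz):
    """Average energy per cycle at a given operating point (P / f)."""
    return power_watts / frequency_hz


# Clockless FIFO latency: 220 ps + 140 ps
clockless_latency_ps = 220 + 140

# Clockless: 5.27 mW at 1.95 GHz; clocked 6-stage: 4.91 mW at 1.28 GHz
clockless_energy = energy_per_cycle_joules(5.27e-3, 1.95e9)   # about 2.70 pJ
clocked_energy = energy_per_cycle_joules(4.91e-3, 1.28e9)     # about 3.84 pJ
```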

97 Clocked FIFO latency Measured from clk_put edge that latches data in FIFO until clk_get edge that notifies receiver of available data

98 Clocked FIFO throughput
- Throughput is determined by the slower of the put and get interfaces
- There is a minimum required FIFO length to support maximum throughput
- Minimum FIFO length depends on
  - synchronization latencies
  - ratio of put and get clock speeds
  - phase relationship of put and get clocks

99 Conclusions
Presented a synchronizing FIFO that
- can be built using standard cells
- has a modular design; the following properties can be chosen independently:
  - type of put and get interface
  - synchronization time length
  - FIFO size
- has simple interfaces

100 References
- T. Ono, M. Greenstreet. A modular synchronizing FIFO for NoCs. Proceedings of the 3rd ACM/IEEE International Symposium on Networks-on-Chip.
- M. E. Dean, T. E. Williams, and D. L. Dill. Efficient self-timing with level-encoded 2-phase dual-rail (LEDR). MIT Press.
- C. E. Molnar, I. W. Jones, W. S. Coates, and J. K. Lexau. A FIFO ring performance experiment. In Advanced Research in Asynchronous Circuits and Systems, Proceedings of the Third International Symposium on, pages 279–289, Eindhoven, Apr.
- I. E. Sutherland. Micropipelines. Commun. ACM, 32(6):720–738, June. Turing Award lecture.
- Mark Dean, Ted Williams and David Dill. "Efficient Self-Timing with Level-Encoded 2-Phase Dual Rail (LEDR)". ARVLSI, 1991.