Synchronization Ideas
Charles E. Dike, Intel Corporation



2. Introduction
Tutorial: share some ideas about synchronization and metastability.
Introduce a NEW, IMPROVED theory of metastability.

3. Why and where synchronize?
Reduce latency between independent clock domains:
– asynchronous domain to synchronous clock;
– synchronous clock to an independent synchronous clock.
Benefit: higher performance in critical circuits.
[Figure: blocks labeled "Asynchronous Circuit", "Pausable Clock at 1.8 GHz", "Synchronous Clock at 3.0 GHz", and "Synchronous Clock at 1.5 GHz"]

4. Design Direction
[Figure: repeated MEM/FPU/ALU blocks across generations: 80s towards 100 MHz, 90s towards 1 GHz, 00s multi-GHz; label "VALUE ADDED"]

5. Chip Area Networks
[Figure: late 00s, multi-GHz]

6. I believe…
We must be able to synchronize all domains to a PLL-controlled clock.
On-chip interconnect will be asynchronous (GALS).
We need to minimize latency.
There will be two basic synchronizer uses: near-neighbor links and the chip network.

7. Topics of Discussion
Generic synchronizer of the type used in the TeraFlops computer
Simple synchronizer of the type used in the StrongARM processor
The Myrinet pipeline synchronization scheme
Latest understanding of metastability

8. Generic Synchronizer
Handles self-timed-to-synchronous interfaces and vice versa.
Supports synchronous-to-synchronous interfaces.
Can handle streaming data.
Adaptable to any speed range.
Possibly usable over the chip network.

9. Two-flop synchronizer
[Figure: D flip-flops #1 and #2 in series, both clocked by CLK; signal VALID]
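
A minimal cycle-level sketch of the two-flop synchronizer, in Python purely for illustration. It assumes an idealized flop #1 that always settles to a clean 0 or 1 within one clock period (real hardware only makes that likely, which is what the later MTBF discussion quantifies). The class and function names are hypothetical. It shows the latency cost: an input change becomes visible at the output only on the second clock edge.

```python
class TwoFlopSynchronizer:
    """Cycle-level model of two D flip-flops in series (#1 -> #2).

    Assumption: flop #1 always resolves within one clock period, so only
    logic values 0/1 are modeled. Flop #2 hides that resolution time from
    downstream logic at the cost of one extra cycle of latency.
    """

    def __init__(self, reset_value: int = 0) -> None:
        self.ff1 = reset_value  # first stage: the one that can go metastable in hardware
        self.ff2 = reset_value  # second stage: the synchronized output

    def clock_edge(self, async_in: int) -> int:
        # Both flops capture on the same edge: ff2 takes ff1's previous value.
        self.ff2, self.ff1 = self.ff1, async_in
        return self.ff2


def run(inputs):
    """Apply one input sample per clock edge and collect the outputs."""
    sync = TwoFlopSynchronizer()
    return [sync.clock_edge(x) for x in inputs]


if __name__ == "__main__":
    print(run([1, 1, 1, 0, 0, 0]))  # [0, 1, 1, 1, 0, 0]
```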

10. Single-latch synchronizer
[Figure: SENDER CLOCK (CLK1) and RECEIVER CLOCK (CLK2) flip-flops around an SR latch; signals REQ, ACK, Write Valid, Read Valid, LATCH OUTPUT]

11. Multi-latch synchronizer
[Figure: two copies of the single-latch structure (CLK1/CLK2 flip-flops, SR latch, REQ, ACK, Write Valid, Read Valid)]

12. General Case
[Figure: FIFO control with WRITE POINTER, READ POINTER, STATUS REGISTER, SYNCHRONIZERS, LATENCY PADDING, EMPTY, EMPTY SYNC, and FULL; signals Write Clock, Write Enable, Read Clock, EN]

13. Empty case
[Figure: EMPTY-flag logic with WRITE POINTER, READ POINTER, STATUS REGISTER, and a SYNCHRONIZER of resettable D flip-flops (DQ R, EN); signals Write Pointer a/b, Read Pointer a/b, Read Clock, Write Clock, Write Enable]
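
The slides show pointers crossing clock domains through synchronizers but do not say how the pointers are encoded. The sketch below swaps in a common technique, Gray-coded pointers: a Gray pointer changes one bit per increment, so a value sampled mid-transition is off by at most one position. DEPTH, PTR_BITS, and all function names are assumptions, not taken from the original design.

```python
DEPTH = 8                       # hypothetical FIFO depth (power of two)
PTR_BITS = 4                    # log2(DEPTH) + 1 wrap bit to tell full from empty
MASK = (1 << PTR_BITS) - 1


def bin_to_gray(b: int) -> int:
    """Binary -> Gray: adjacent values differ in exactly one bit."""
    return b ^ (b >> 1)


def gray_to_bin(g: int) -> int:
    """Gray -> binary by XOR-folding all right shifts of g."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def is_empty(rd_ptr: int, wr_gray_synced: int) -> bool:
    """Read-clock side: empty when the read pointer equals the write
    pointer as seen through the read-clock synchronizer (possibly stale)."""
    return bin_to_gray(rd_ptr & MASK) == wr_gray_synced


def is_full(wr_ptr: int, rd_gray_synced: int) -> bool:
    """Write-clock side: full when the write pointer is exactly DEPTH
    entries ahead of the synchronized read pointer (modulo the wrap)."""
    rd_ptr = gray_to_bin(rd_gray_synced)
    return ((wr_ptr - rd_ptr) & MASK) == DEPTH
```

Because the synchronized pointer lags the true one by the synchronizer latency, EMPTY can stay asserted for a few read-clock cycles after data has been written, and FULL can stay asserted after space has freed up. Both errors are conservative: the FIFO never reads an empty slot or overwrites a full one.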



16. Simple Synchronizer
Constrained by the frequency ratio.
Supports synchronous-to-synchronous interfaces.
Does it support asynchronous-to-synchronous? Yes, with restrictions.
Possibly usable in local-neighbor synchronizers.

17. Simple Synchronizer
[Figure: four D flip-flops (A, A1, A2, A3; nodes w, x, y, z) with a divide-by-2 stage, SLOW CLK and FAST CLK inputs, and output SYNC; flops marked * are MI]
* MI = Metastable Immune
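
The figure's wiring is not recoverable from the transcript. As an assumption, one common way to derive a ratio-based SYNC signal from these ingredients is sketched here: the slow clock toggles a divide-by-2 flop (A), the fast clock samples that toggle through a short chain (A1, A2, A3), and an XOR of the last two stages yields a one-fast-cycle SYNC pulse per slow-clock edge. The names mirror the figure labels, but the connections, the integer clock ratio, and the function name are hypothetical.

```python
def sync_pulses(ratio: int, n_fast_cycles: int) -> list[int]:
    """Return SYNC (one sample per fast-clock edge) for a slow clock whose
    rising edge arrives just before every `ratio`-th fast edge.

    Hypothetical wiring: A toggles on each slow edge (divide-by-2);
    A1 -> A2 -> A3 sample it on the fast clock; SYNC = A2 XOR A3.
    """
    if ratio < 2:
        raise ValueError("slow clock must be at least 2x slower")
    a = 0                # divide-by-2 flop in the slow domain
    a1 = a2 = a3 = 0     # fast-domain sampling chain
    sync = []
    for cycle in range(n_fast_cycles):
        if cycle % ratio == 0:
            a ^= 1                       # slow-clock rising edge toggles A
        a3, a2, a1 = a2, a1, a           # all fast flops clock together
        sync.append(a2 ^ a3)             # each toggle -> one-cycle SYNC pulse
    return sync
```

For example, with a 4:1 ratio, SYNC pulses once every four fast cycles at a fixed offset after each slow edge, which the fast domain could use to time transfers. The model ignores metastability in the first sampling flop, which in hardware is exactly where it would occur.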

18. Timing 1
[Figure: the slide 17 circuit with waveforms FAST CLOCK, SLOW CLOCK, A, A1, A2, A3, SYNC]

19. Timing 2
[Figure: the slide 17 circuit with waveforms FAST CLOCK, SYNC, SLOW CLOCK, CHEATER CLOCK]

20. Timing 3
[Figure: the slide 17 circuit with waveforms FAST CLOCK, SYNC, SLOW CLOCK, CHEATER CLOCK]

21. Timing 4
[Figure: variants of the divide-by-2 circuit with SYNC and MI flops; waveforms FAST CLOCK, SYNC, SLOW CLOCK, SLOW CLOCK#]

22. Transfers
[Figure: FAST TO SLOW TRANSFER and SLOW TO FAST TRANSFER, each a pair of D flip-flops with SYNC between FAST CLOCK and SLOW CLOCK; waveforms FAST CLOCK, SYNC, SLOW CLOCK, CHEATER CLOCK]


24. Pipeline Synchronizer
Supports synchronous-to-synchronous interfaces.
Supports asynchronous-to-synchronous and vice versa.
Possibly usable in local-neighbor synchronizers.
Essentially a distributed FIFO and synchronizer.

25. Pipeline Synchronizer
[Figure: a chain of FIFO stages, each with input ports Ri, Ai, Di and output ports Ro, Ao, Do, and a synchronizer S at each stage]

26. ME element
[Figure: mutual-exclusion (ME) element with inputs R0, R1 and outputs A0, A1, together with a synchronizer S; signals X, REQ]
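
The slide names an ME (mutual exclusion) element with requests R0, R1 and grants A0, A1. The sketch below is a behavioral model of such an element as generally understood in asynchronous design, not of the specific circuit on the slide; how it connects to the synchronizer S is not recoverable. The hypothetical `tie_winner` argument stands in for the outcome of the internal metastability resolution, which in silicon can take an unbounded (though usually short) time.

```python
class MutexElement:
    """Behavioral model of a two-input mutual exclusion (ME) element.

    R0/R1 are requests, A0/A1 are grants. At most one grant is ever high,
    and a grant is held until its own request is withdrawn.
    """

    def __init__(self) -> None:
        self.a0 = 0
        self.a1 = 0

    def step(self, r0: int, r1: int, tie_winner: int = 0) -> tuple[int, int]:
        # Release any grant whose request has been withdrawn.
        if not r0:
            self.a0 = 0
        if not r1:
            self.a1 = 0
        # Grant only when no grant is currently outstanding.
        if not self.a0 and not self.a1:
            if r0 and r1:
                # Simultaneous requests: hardware resolves this via a
                # metastability filter; here tie_winner decides.
                self.a0, self.a1 = (1, 0) if tie_winner == 0 else (0, 1)
            elif r0:
                self.a0 = 1
            elif r1:
                self.a1 = 1
        return self.a0, self.a1
```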

27. FIFO element
[Figure: FIFO stage with ports Ri, Ai, Di, Ro, Ao, Do; control built around an element labeled C linking Ri/Ai to Ro/Ao, plus a data path]
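
Reading the C in the figure as a Muller C-element, the standard control element in Sutherland's micropipelines (listed in the references), here is a minimal behavioral model; the class name is hypothetical. The output changes only when both inputs agree and otherwise holds its value, which is what lets a FIFO stage wait for both a new request and a free downstream stage.

```python
class CElement:
    """Muller C-element: the output follows the inputs when they agree
    and holds its previous value when they disagree."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, a: int, b: int) -> int:
        if a == b:          # both inputs agree: output takes their value
            self.out = a
        return self.out     # inputs disagree: output is held
```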

28. Async to sync
[Figure: pipeline of FIFO stages (Ri, Ai, Di, Ro, Ao, Do) with synchronizers S, spanning an asynchronous side and a synchronous side]

29. Sync to async
[Figure: the mirrored arrangement of FIFO stages (Ri, Ai, Di, Ro, Ao, Do) with synchronizers S, spanning a synchronous side and an asynchronous side]

30. Points to ponder #1
All synchronizing interfaces have one thing in common: a latching element that holds data while metastabilities are being resolved. There is no way to avoid the latency required to resolve metastabilities. To minimize that latency, the characteristics of the latching element can be improved. We will be required to understand and use this knowledge. This is the future of digital design.


32. Role of the Synchronizing Flop
Reorients incoming information to a clock edge.
Its performance determines the system failure rate or latency.

33. Real Life
There is no magic bullet.
There is a lot of misinformation about metastability around.
To date, many circuits have been overdesigned, through planning and luck.
Whenever a circuit fails because its frequency is too high, the ultimate cause of failure is metastability.
There is no way to synchronize a signal faster than about the time it takes to pass a signal through six static gates.

34. Metastability is…
[Figure: latch with SET and RESET inputs, internal nodes NODE A and NODE B, and output OUT]

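35. Technical terms
T_w (window size): the likelihood of entering a metastable state, expressed in units of time.
τ (tau): the time constant governing how fast metastability resolves, in units of time.
MTBF (mean time between failures): $\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$, where f_d is the data transition rate, f_c the clock frequency, and t the time allowed for resolution.
Thermal noise: $V_n^2 = 4kT/C$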
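
The basic MTBF formula can be evaluated directly. This sketch assumes τ = 20 ps (slide 50) and T_w = 15 ps (slide 55), plus hypothetical rates f_c = 1 GHz and f_d = 100 MHz; the function name is illustrative. The point it makes is the exponential: each additional τ of resolution time multiplies MTBF by e.

```python
import math


def mtbf_seconds(t: float, tau: float, t_w: float, f_d: float, f_c: float) -> float:
    """Basic synchronizer MTBF = exp(t / tau) / (T_w * f_d * f_c).

    t   : time allowed for metastability to resolve (s)
    tau : resolution time constant (s)
    t_w : metastability window (s)
    f_d : data transition rate (Hz)
    f_c : clock frequency (Hz)
    """
    return math.exp(t / tau) / (t_w * f_d * f_c)


if __name__ == "__main__":
    TAU = 20e-12   # from slide 50
    T_W = 15e-12   # from slide 55
    F_C = 1e9      # hypothetical 1 GHz receiving clock
    F_D = 100e6    # hypothetical 100 MHz data transition rate
    for t_ps in (200, 400, 600, 800):
        years = mtbf_seconds(t_ps * 1e-12, TAU, T_W, F_D, F_C) / (365.25 * 24 * 3600)
        print(f"t = {t_ps} ps -> MTBF = {years:.3g} years")
```

Under these assumptions, one typical static gate delay of about 5τ (slide 61) multiplies MTBF by e^5, roughly 150.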

36. Simple jamb latch
[Figure: jamb latch with DATA, CLOCK, and RESET inputs, internal nodes NODE A and NODE B, and output OUT; plot of propagation delay versus time of data after clock]

37. Simple jamb latch
[Figure: the same jamb latch and delay plot, with the curve annotated "~RC time constant"]

38. Rough Histogram
[Figure: propagation delay versus time of data after clock, on a linear and a log scale; the window T_w is marked, and the slope of the log-scale curve is τ]
$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$

39. Why is the theory a problem?
It assumes a uniform distribution of data about the clock.
– What happens when data always violates the setup/hold window?
It is not detailed enough:
– it doesn't consider a deterministic region;
– it doesn't account for thermal noise.
People tend to extrapolate the theory improperly.
$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$

40. Overview of refined theory
Not everything past a normal propagation delay is a metastable event.
The T_w window can't be improved by input edge rates.
T_w has a complex, load-dependent relationship to τ.
The MTBF formula needs to be modified because the distribution of data about the clock input is non-uniform.

41. Schematic

42. Simulation of a typical latching device

43. Test case
[Figure: test setup with labels DQ R (resettable flip-flop), PC, DELAY, PULSE GENERATOR #1, PULSE GENERATOR #2, TRIGGER INPUT, and TEK B OSCILLOSCOPE]

44. Measuring real data advancing time

45. Histogram
[Figure: measured histogram versus time, with the inflection point marked; annotation 0.6 mV/0.1 ps]


47. Measured versus Basic
[Figure: propagation delay versus time of data after clock (log scale), comparing measurement with the basic model; T_w marked, slope τ; annotation 0.6 mV/0.1 ps]
$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$

48. τ simulated…
[Figure: simulation setup with a voltage-controlled switch (R1 = 100 Ω or R1 = 100 MΩ) and a battery]

49. Tau Simulated
$\tau = \frac{|t_1 - t_2|}{\ln(V_2/V_1)}$
where V_1 is the voltage at time t_1 and V_2 the voltage at time t_2.
[Figure: latch outputs at nodes 1 and 2 versus time (ns, volts); semilog plot of the difference between the latch outputs]
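
The τ extraction assumes the difference between the latch outputs grows as $V(t) = V_1 e^{(t - t_1)/\tau}$. Below is a minimal sketch of the two-point formula, plus a least-squares fit over several points (the slope of ln V versus t is 1/τ, which is what a semilog plot displays). Function names are hypothetical.

```python
import math


def tau_two_point(t1: float, v1: float, t2: float, v2: float) -> float:
    """tau = (t2 - t1) / ln(V2 / V1) for an exponentially diverging latch."""
    if not (t2 > t1 and v2 > v1 > 0):
        raise ValueError("expect t2 > t1 and a growing, positive voltage")
    return (t2 - t1) / math.log(v2 / v1)


def tau_fit(times, volts) -> float:
    """Least-squares slope of ln(V) versus t; tau is its reciprocal."""
    logs = [math.log(v) for v in volts]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    num = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    den = sum((t - mean_t) ** 2 for t in times)
    return den / num  # 1 / slope
```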

50. $V_n^2 = 4kT/C = 4kTBR$
with k = 1.38 × 10^-23 J/K, B = 1/τ = 5 × 10^10 Hz, R ≈ 400 Ω, T = 300 K, τ = 20 ps, giving V_n ≈ 0.6 mV.
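
A quick check of the slide's arithmetic, using the same values (the slide rounds k to 1.38 × 10^-23 J/K). It also confirms the identity 4kT/C = 4kTBR when B = 1/τ and C = τ/R, which here is 50 fF. The function name is hypothetical.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def thermal_noise_vrms(temp_k: float, bandwidth_hz: float, r_ohm: float) -> float:
    """RMS thermal noise voltage V_n = sqrt(4 k T B R)."""
    return math.sqrt(4 * K_BOLTZMANN * temp_k * bandwidth_hz * r_ohm)


TAU = 20e-12   # s
R = 400.0      # ohms
T = 300.0      # kelvin

vn = thermal_noise_vrms(T, 1.0 / TAU, R)     # B = 1/tau = 5e10 Hz
c = TAU / R                                  # C = tau / R = 50 fF
vn_from_c = math.sqrt(4 * K_BOLTZMANN * T / c)

print(f"V_n = {vn * 1e3:.2f} mV")
```

It prints about 0.58 mV, matching the slide's ~0.6 mV.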

51–55. Putting it all together (panels A–E)
[Figure, built up over five slides: time of data after clock on a log axis in picoseconds, from 1.80 ns down to 0.18 fs. Panel A marks the normal region; B adds a deterministic region; C adds a voltage axis (1.80 V down to 1.80 µV) and the thermal noise point; D marks T = 19 ps separating the deterministic region from true metastability; E adds T_w = 15 ps]

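56. MTBF variants
Simple case: $\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$
Worst case: $\mathrm{MTBF} = \frac{e^{(t - t_{\mathrm{deter}})/\tau}}{T_w f_d f_c}$
Expected: $\mathrm{MTBF} = \frac{e^{(t - 0.5\,t_{\mathrm{deter}})/\tau}}{T_w f_d f_c}$
where t_deter is the extent of the deterministic region.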
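
The three variants differ only in how much of the deterministic region is subtracted from the resolution time, so the worst case sits a factor e^{t_deter/τ} below the simple case. The sketch assumes t_deter = 19 ps, reading slide 54's "T = 19 ps" as the edge of the deterministic region (an interpretation, not stated on the slide), τ = 20 ps, T_w = 15 ps, and hypothetical rates f_c = 1 GHz and f_d = 100 MHz. The function name is illustrative.

```python
import math


def mtbf_variants(t: float, tau: float, t_w: float, f_d: float, f_c: float,
                  t_deter: float) -> dict[str, float]:
    """MTBF under the simple, expected, and worst-case models (seconds)."""
    base = t_w * f_d * f_c
    return {
        "simple": math.exp(t / tau) / base,
        "expected": math.exp((t - 0.5 * t_deter) / tau) / base,
        "worst": math.exp((t - t_deter) / tau) / base,
    }


if __name__ == "__main__":
    # Hypothetical: 600 ps allowed for resolution; t_deter = 19 ps (assumed).
    m = mtbf_variants(600e-12, 20e-12, 15e-12, 100e6, 1e9, 19e-12)
    for name, seconds in m.items():
        print(f"{name:>8}: {seconds:.3g} s")
```

With τ = 20 ps, a 19 ps deterministic region costs less than a factor of 3 in MTBF, which is small next to the factor of e gained per extra τ.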

57. Points to ponder #2
Jakov Seizovic postulated a “malicious” asynchronous signal: no matter how we position the sampling window, and no matter how small we make it, the asynchronous transition will appear in that window. This case has to be assumed when interfacing to a signal with an unknown probability distribution. We know something about just how malicious a signal can be.

58. Exploring

59. Worst-case bound

60. Not a worst-case bound
[Figure: labels "< 0.1 ps", "Uniform distribution", "12 ps jitter"]

61. Final comments
With the proper synchronizing device, it may be possible to synchronize a signal within a single clock cycle. The constraints are:
– You need about 35τ to push the MTBF out to about one century.
– Each typical static gate delay is equivalent to about 5τ in a properly designed synchronizing flop.
– The metastability MTBF of a device should probably be an order of magnitude better than its mechanical MTBF.
– You must assume a “malicious” input to the synchronizer. Even so, this adds only about 5τ to the delay.
– Standard flop designs are generally very poor synchronizers. Use a jamb structure; it has the best transconductance.
– You should never require more than two synchronizing flops in series.
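
The 35τ figure can be checked by inverting the MTBF formula for t. The sketch assumes τ = 20 ps and T_w = 15 ps from the slides, hypothetical rates f_c = 1 GHz and f_d = 100 MHz, and a 100-year target; function and constant names are illustrative.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
TAU_PER_GATE = 5.0  # slide 61: one static gate delay is about 5 tau


def resolution_time(target_mtbf_s: float, tau: float, t_w: float,
                    f_d: float, f_c: float) -> float:
    """Invert MTBF = exp(t / tau) / (T_w f_d f_c) to get the required t."""
    return tau * math.log(target_mtbf_s * t_w * f_d * f_c)


if __name__ == "__main__":
    tau = 20e-12
    # Hypothetical operating point: 100-year target, 100 MHz data, 1 GHz clock.
    t = resolution_time(100 * SECONDS_PER_YEAR, tau, 15e-12, 100e6, 1e9)
    n_tau = t / tau
    print(f"{n_tau:.1f} tau, about {n_tau / TAU_PER_GATE:.1f} gate delays")
```

Under these assumptions it gives about 36τ, a little over seven gate delays at 5τ per gate, close to the slide's "about 35τ". Because t grows only logarithmically with the target MTBF, a tenfold stricter target adds just ln 10 ≈ 2.3τ.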

62. Conclusion
There are several ways to communicate between independent domains.
I believe more asynchronous domains will appear embedded within synchronous designs.
– Latency must be reduced to maximize the use of asynchronous designs.
– This is a burden that asynchronous designers must bear.
– We need to know the limitations of synchronization and metastability.
Chip area networks are coming, and they will open up opportunities for asynchronous design.

63. References
T. Sakurai, “Optimization of CMOS Arbiter and Synchronizer Circuits with Submicrometer MOSFET’s,” IEEE J. Solid-State Circuits, vol. 23, no. 4, pp. …, Aug. ….
L. Kleeman and A. Cantoni, “Metastable Behavior in Digital Systems,” IEEE Design & Test of Computers, pp. 4-19, Dec. ….
I. E. Sutherland, “Micropipelines,” Turing Award Lecture, Communications of the ACM, 32(6), pp. …, ….
J. N. Seizovic, “Pipeline Synchronization,” Proc. Int’l Symp. Advanced Research in Asynchronous Circuits and Systems, CS Press, ….
C. Dike and E. Burton, “Miller and Noise Effects in a Synchronizing Flip-Flop,” IEEE J. Solid-State Circuits, vol. 34, no. 6, pp. …, June ….
A. Van der Ziel, Noise in Measurements. New York: Wiley, 1976.

64. Overview of present theory
Everything past a normal propagation delay is considered a metastable event.
A deterministic region doesn't exist.
T_w has no fixed relationship to τ.
The MTBF formula assumes a uniform distribution of data about the clock input:
$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$