Advanced Digital Design GALS Design A. Steininger Vienna University of Technology.

1 Advanced Digital Design GALS Design A. Steininger Vienna University of Technology

2 Outline
Lecture "Advanced Digital Design" © A. Steininger & M. Delvai / TU Vienna
- Synchronizers
  - trends
  - characterization
- Global synchrony & clock distribution
- The GALS approach
  - communication
  - synchronization
- Muller C-Element, Mutex & Arbiter
- data driven clock & pausible clock

3 Synchronizer Rules
- never synchronize more than one signal (rail)
  - danger of data inconsistency
  - degradation of MTBU (mean time between upsets) by number of signals
  - for a wider bus, use one signal for handshaking
- never introduce a fork before the end of the synchronizer
- estimate the MTBU of your solution
  - too low an MTBU leads to failures
  - too many stages introduce unnecessary delay
- there is definitely no magic solution to eliminate the potential for metastability, but it can be made arbitrarily improbable
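
The trade-off behind these rules can be made concrete with a small calculation. The sketch below assumes the commonly used exponential model MTBU = exp(t_res / tau_C) / (T0 * f_clk * f_dat) and that the upset rates of independent synchronizers simply add. All parameter values and the function names `mtbu` and `mtbu_parallel` are hypothetical, chosen for illustration only.

```python
import math


def mtbu(t_res, tau_c, t0, f_clk, f_dat):
    """Mean time between upsets in seconds (assumed exponential model)."""
    return math.exp(t_res / tau_c) / (t0 * f_clk * f_dat)


def mtbu_parallel(single_mtbu, n_signals):
    """Assumption: upset rates of n independent synchronizers add up."""
    return single_mtbu / n_signals


# Hypothetical parameters, for illustration only.
TAU_C = 50e-12   # resolution time constant (s)
T0 = 100e-12     # metastability window parameter (s)
F_CLK = 500e6    # sampling clock frequency (Hz)
F_DAT = 50e6     # data transition rate (Hz)
T_CLK = 1 / F_CLK

for periods in (1, 2, 3):
    # Simplification: every additional stage buys about one clock period
    # of resolution time and costs one clock period of latency.
    t_res = periods * T_CLK
    single = mtbu(t_res, TAU_C, T0, F_CLK, F_DAT)
    bus = mtbu_parallel(single, 32)
    print(f"t_res = {periods} period(s): MTBU {single:.2e} s, "
          f"32 synchronized rails {bus:.2e} s")
```

Under these assumptions each extra period of resolution time multiplies the MTBU by exp(T_clk / tau_C), whereas synchronizing more rails only divides it linearly. Still, a single handshake signal avoids both the division and the data inconsistency.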

4 Even/Odd Synchronizer
- works for two periodic clocks only
- avoids the performance penalty of synchronizers
- largely eliminates the potential for metastability
- for details see [Dally & Tell, The Even/Odd Synchronizer, ASYNC 2010]

5 Synchronizer Trends
- need for more synchronizers
  - more function units being integrated on a chip
  - more standardized frequencies
  - higher communication demands
- need for more synchronizer stages
  - increasing PVT variations => larger safety margins
  - synchronizer parameters become worse: τ_C used to scale proportionally to the (FO4) propagation delay for decades; below 45 nm the scaling is worse
- synchronizers tend to cause a considerable performance loss in the future

6 Characterizing Metastability
- know (= assume) the exponential MTBU relation: $\mathrm{MTBU} = \frac{1}{\lambda_{dat}\, f_{clk}\, T_0}\, e^{t_{res}/\tau_C}$
- measure MTBU over t_res
- draw a semilog plot => straight line
- find the parameters:
  - slope → τ_C
  - offset → T_0
- need a very good setup for the measurements! (assumptions made…)
[Figure: semilog plot of MTBU over t_res (ns) for λ_dat = 2 MHz, f_clk = 10 MHz]
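
The parameter extraction can be sketched as a straight-line fit of ln(MTBU) over t_res. The sketch assumes the exponential model MTBU = exp(t_res / tau_C) / (T0 * f_clk * f_dat), so the slope of the line is 1 / tau_C and the intercept is -ln(T0 * f_clk * f_dat). The "measured" points are synthetic, generated from hypothetical true parameters, and `fit_mtbu` is an illustrative name.

```python
import math


def fit_mtbu(points, f_clk, f_dat):
    """Least-squares line through (t_res, ln MTBU); returns (tau_c, t0)."""
    n = len(points)
    xs = [t for t, _ in points]
    ys = [math.log(m) for _, m in points]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    tau_c = 1.0 / slope                           # slope = 1 / tau_C
    t0 = math.exp(-intercept) / (f_clk * f_dat)   # intercept = -ln(T0*f_clk*f_dat)
    return tau_c, t0


F_CLK = 10e6   # clock frequency from the slide (Hz)
F_DAT = 2e6    # data rate from the slide (Hz)

# Hypothetical "true" device parameters used to synthesize measurements.
TRUE_TAU = 0.2e-9
TRUE_T0 = 0.1e-9

points = [(t, math.exp(t / TRUE_TAU) / (TRUE_T0 * F_CLK * F_DAT))
          for t in (1e-9, 2e-9, 3e-9, 4e-9, 5e-9)]

tau_c, t0 = fit_mtbu(points, F_CLK, F_DAT)
print(f"tau_C = {tau_c:.3e} s, T0 = {t0:.3e} s")
```

With real measurements the points scatter around the line, and T_0 is sensitive to the intercept because it enters through an exponential. This is one reason the measurement setup has to be very good.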

7 Measuring Metastability
[Figure: measurement setup with MS producer, DUT (flip-flop with D, Q and clk), MS detector and counter; source: Altera]

8 MS Producer
Aims:
- create as many MS events as possible in a short time
- produce a uniform distribution of phase relations OR a well-controlled and reproducible phase
Implementation options:
- two independent clocks
- one clock source, controllable relative delay between clock and data path
  - variable delay element
  - feedback control

9 MS Detector
Aims:
- detect metastable output of the DUT
Problem: how to define MS?
- late transition detection
- intermediate voltage detection
- output proximity detection
Implementation options (late transition detection):
- sample the DUT output with FF1 after t_res
- compare with a reference FF2 having "infinite" t_res
- mismatch indicates metastability
- many sources of error!

10 Types of Synchrony
- synchronous
  - identical frequency, constant phase relation
  - classical synchronous system driven by one clock source
- mesochronous = multisynchronous
  - identical frequency (no accumulating drift) but unknown, possibly varying (bounded) phase relationship
  - example: different PLLs driven by the same source
- plesiochronous
  - same nominal clock frequency, mutual (low) drift
  - independent clock sources with the same nominal frequency
- heterochronous = multisynchronous
  - clocks totally unrelated
  - independent clock sources with different nominal frequencies

11 Global Synchrony?
Problem 1: Clock distribution
- low-skew clock distribution becomes difficult for large chips and high frequencies
- clock networks consume a considerable share of the power
Problem 2: Clock selection
- SoC contains many IPs, each specified for its own frequency
- specific frequencies are required for some functions (e.g., interface standards)
- dynamic local changes due to voltage & frequency scaling, clock & power gating

12 Clock Distribution
[Figure: timing diagram of the synchronous approach, clock skew case 1: setup violation; signals TRG src, TRG snk, delays t_CO and t_pd, data valid windows]

13 Clock Distribution
[Figure: timing diagram of the synchronous approach, clock skew case 2: hold violation; signals TRG src, TRG snk, delays t_CO and t_pd, data valid windows]
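
The two skew cases can be checked numerically. The sketch below uses the standard setup and hold inequalities for a register-to-register path. The setup time `t_su`, the hold time `t_h` and all numbers (in picoseconds) are assumptions added for illustration; only t_CO and t_pd appear on the slides.

```python
def check_timing(t_clk, t_co, t_pd, t_su, t_h, skew):
    """Return (setup_ok, hold_ok) for one register-to-register path.

    skew > 0 means the sink clock edge arrives later than the source edge.
    All times share the same unit (picoseconds here).
    """
    arrival = t_co + t_pd                     # data valid after source edge
    setup_ok = arrival + t_su <= t_clk + skew  # stable before next sink edge
    hold_ok = arrival >= t_h + skew            # not overwritten too early
    return setup_ok, hold_ok


# Hypothetical path: 1 ns clock, 100 ps clock-to-output, 700 ps logic delay.
PATH = dict(t_clk=1000, t_co=100, t_pd=700, t_su=100, t_h=50)

for skew in (0, -200, 900):
    setup_ok, hold_ok = check_timing(skew=skew, **PATH)
    print(f"skew {skew:5d} ps: setup ok = {setup_ok}, hold ok = {hold_ok}")
```

In this model a sink clock that arrives too early eats into the setup margin, while a sink clock that arrives too late lets new data race through and violates hold.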

14 Clock Distribution
[Figure: timing diagram of the asynchronous approach with REQ delay; signals TRG src, TRG snk, REQ, ACK, completion detection, delays t_CO and t_pd, data valid windows]

15 Clock Distribution
[Figure: timing diagram of the asynchronous approach with ACK delay; signals TRG src, TRG snk, REQ, ACK, completion detection, delays t_CO and t_pd, data valid windows]

16 Clock Distribution
[Figure: timing diagram of the asynchronous approach with data delay; signals TRG src, TRG snk, REQ, ACK, completion detection, delays t_CO and t_pd, data valid windows]

17 The GALS Approach
- SoC is clearly structured into IPs anyway
  - run each at its desired individual frequency => synchronous islands
  - efficient, well understood
- communication between IPs
  - has to bridge clock boundaries
  - may run over larger distances
  - => asynchronous paradigm (handshake-based) better suited for composition
Globally Asynchronous Locally Synchronous (GALS): first mentioned in the PhD thesis by Chapiro, Stanford, 1984

18 A GALS Example
[Figure: SoC with four clock domains: CPU 2 GHz, PCI-IF 533 MHz, DSP 2.7 GHz, USB-IF 24 MHz]

19 Communication in GALS
- Shared Memory: producer writes to memory, consumer reads from there
  - pro: control flow stays independent
  - shared single-port memory
  - true dual-port memory
- Direct Messages (data words): move data word from producer's output register to consumer's input register
  - non-buffered / buffered (FIFO queues)
  - clock fixed, data-driven or pausible

20 Shared Memory
- decoupling of clock domains by memory acting as a third party => high area overhead => unusual
- memory must be asynchronous, otherwise the direct message model applies (producer => memory and memory => consumer)
- for single-port memory, arbitration is required
  - arbitration problem (unbounded delay…)
  - one side may block the other at the arbiter
- for multiport memory, problems are confined to accesses to the same cell
  - busy flag may become metastable
  - blocking still possible for one specific address

21 Shared Memory
- perfect decoupling of data path
- potential metastability problems at arbitration logic
- potential blocking through arbitration
[Figure: CPU (2 GHz) and DSP (2.7 GHz) accessing shared memory cell 0xff14 through arbitration]

22 Direct Messages
- clock domain boundary is between producer's output register and consumer's input register
- in general a synchronizer is needed at consumer's input
  - definitely for a conventional (fixed) clock
  - can be avoided by data-driven / pausible clocking
- control flows of producer and consumer are strongly coupled: not maintaining the input/output register blocks the other party
- buffers/queues/FIFOs can
  - mitigate, but not avoid this problem (full/empty)
  - compensate variations in the data rate on both sides, but not different average data rates
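
The last point, that a FIFO absorbs rate variations but not a mismatch in average rate, can be illustrated with a small step-based simulation. This is a minimal sketch with a deliberately simple model: one write attempt and one read attempt per step, and a blocked attempt is only counted, not retried. The function name `run_fifo` and the traffic patterns are hypothetical.

```python
def run_fifo(writes, reads, depth):
    """Count blocked producer writes and blocked consumer reads.

    writes/reads: sequences of 0/1 flags, one write and one read attempt
    per step (the write is handled first). depth: FIFO capacity in words.
    """
    level = 0
    producer_blocked = 0
    consumer_blocked = 0
    for w, r in zip(writes, reads):
        if w:
            if level < depth:
                level += 1
            else:
                producer_blocked += 1   # FIFO full: producer stalls
        if r:
            if level > 0:
                level -= 1
            else:
                consumer_blocked += 1   # FIFO empty: consumer stalls
    return producer_blocked, consumer_blocked


# Same average rate (1 word per 2 steps), but the producer is bursty.
bursty = [1, 1, 1, 1, 0, 0, 0, 0] * 10
steady = [1, 0, 1, 0, 1, 0, 1, 0] * 10
print("equal average rates:", run_fifo(bursty, steady, depth=4))

# Producer permanently twice as fast as the consumer.
fast = [1] * 80
print("mismatched rates, depth 4: ", run_fifo(fast, steady, depth=4))
print("mismatched rates, depth 16:", run_fifo(fast, steady, depth=16))
```

A deeper FIFO only postpones the moment the faster side starts to block. It does not remove the coupling of the two control flows.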

23 Direct Messages
- data moving over clock domain boundary
- metastability problems
- => need to insert handshake
- …with synchronizers
- and (optional) buffers
[Figure: CPU (2 GHz) sends data word 0xff14 to DSP (2.7 GHz) through synchronizers (S)]

24 Arbiter: Principle
- purpose: manage concurrent requests to a shared resource
- method:
  - handle pairs of request_in / grant_out
  - requests may arrive in any order
  - arbiter must activate only one grant_out at a time (respond to the first requester): Mutual Exclusion (MUTEX)
- problem: resolve concurrent requests => metastability problem
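
The required behaviour can be captured in a purely behavioural model. The sketch below is an assumption-laden abstraction, not a circuit: time is ignored, and the metastable resolution of truly simultaneous requests is replaced by a random choice. The function name `mutex_step` and the encoding of the grant owner (None, 1 or 2) are hypothetical.

```python
import random


def mutex_step(r1, r2, owner, rng=random):
    """Return the new grant owner (None, 1 or 2) of a two-way MUTEX.

    r1, r2: current request inputs. owner: requester holding the grant.
    """
    # A granted requester keeps its grant until it withdraws its request.
    if owner == 1 and r1:
        return 1
    if owner == 2 and r2:
        return 2
    # Resource is free: grant whoever is requesting.
    if r1 and r2:
        # Simultaneous requests: in hardware the latch may go metastable
        # and resolves to either side after an unbounded time. Modelled
        # here as a random choice (assumption).
        return rng.choice((1, 2))
    if r1:
        return 1
    if r2:
        return 2
    return None


owner = None
for r1, r2 in [(True, False), (True, True), (False, True), (False, False)]:
    owner = mutex_step(r1, r2, owner)
    print(f"R1={r1!s:5} R2={r2!s:5} -> grant: {owner}")
```

The model never activates both grants at once. What it hides is exactly the hard part: the time the real circuit needs to decide a tie is not bounded.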

25 Arbiter: Circuit
[Figure: MUTEX element built from an SR latch (inputs R1, R2; internal outputs G1', G2') followed by a "metastability filter", e.g., a high-threshold inverter, producing G1 and G2; plot of V_out,FF over t with levels V_th,inv and V_meta; from D. J. Kinniment, "Synchronization and Arbitration in Digital Systems", Wiley]

26 Arbiter: Operation
[Figure: MUTEX circuit (R1, R2, G1', G2', G1, G2) with plot of V_out,FF over t (levels V_th,inv and V_meta) and waveforms of R1, G1, R2, G2]

27 Muller C-Element
IF a = b THEN y = a ELSE hold y
[Figure: C-element symbols (inputs a, b; output y) and an implementation with an RS latch (set, reset)]
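
The rule "IF a = b THEN y = a ELSE hold y" translates directly into a tiny state-holding model. This is a behavioural sketch only, with no timing and no circuit detail. The class name `CElement` and the initial output value 0 are assumptions.

```python
class CElement:
    """Behavioural Muller C-element: the output follows the inputs only
    when they agree, otherwise it keeps its previous value."""

    def __init__(self, y=0):
        self.y = y  # stored output (initial value is an assumption)

    def update(self, a, b):
        if a == b:
            self.y = a
        return self.y


c = CElement()
trace = [c.update(a, b) for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)
```

In the trace the output rises only after both inputs are 1 and falls only after both are 0. This is what makes the element useful for joining handshake signals.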

28 Muller C-Element: Circuit
[Figure: C-element circuit implementations by Sutherland, Martin and van Berkel]

29 Data-Driven Clocking
Principle:
- as soon as new data arrive => start clocking
- determine the number k of clock cycles required to process the new data
- stop clocking after k cycles, wait for next data
Properties:
- need to switch clock on and off => beware of spurious clock pulses!
- no metastability problem: data are stable as soon as the consumer clock starts
- potential for power saving
- useful for very specific applications only (no pipe!)
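
The principle can be sketched as a function that turns data arrivals into bursts of clock edges. The model is a simplification under stated assumptions: the first rising edge follows the arrival after one half period, and data that arrive during a running burst wait until it has finished. The name `clock_edges` and the time values are hypothetical.

```python
def clock_edges(arrivals, half_period):
    """Rising-edge times of a data-driven clock.

    arrivals: list of (arrival_time, k) pairs in arrival order, where k is
    the number of clock cycles needed to process that data item.
    """
    edges = []
    free_at = 0  # time at which the previous burst has finished
    for t_arrive, k in arrivals:
        start = max(t_arrive, free_at)   # clock is idle until data arrive
        for i in range(k):
            edges.append(start + half_period + i * 2 * half_period)
        free_at = start + k * 2 * half_period
    return edges


# Two data items: the first needs 3 cycles, the second needs 2 cycles.
print(clock_edges([(0, 3), (100, 2)], half_period=5))
```

Between the bursts no edges are generated at all, which is where the power saving comes from.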

30 Data-Driven Clock: Circuit
[Figure: circuit generating CLK out; the CLK half period is determined by the delay element]

31 Data-Driven Clock: Circuit
- transition on REQ answered by transition on CLK out
- min CLK half period determined by the delay element
- metastability?
[Figure: Muller C-element (C) with delay element, signals REQ, ACK and CLK out, with waveforms]

32 Pausible Clocking
Principle:
- producer requests consumer's clock to pause
- data are provided to the input register during idle time
- consumer's clock may resume
  - free running ("pausible clock")
  - with one cycle only ("stoppable clock")
Properties:
- need to switch clock on and off => beware of spurious clock pulses! => beware of clock tree delays!
- producer controls consumer's clock (blocking!)
- applications must be able to cope with a paused clock
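
The effect of pausing on the consumer's clock can be sketched with a simple edge-time model. It assumes a free-running ("pausible") clock whose next rising edge is postponed while the producer holds the clock paused. Arbitration and clock tree delays are ignored. The name `pausible_clock_edges` and the pause intervals are hypothetical.

```python
def pausible_clock_edges(n_edges, half_period, pauses):
    """Rising-edge times of a pausible clock.

    pauses: list of non-overlapping (start, end) intervals during which
    the producer keeps the consumer's clock paused.
    """
    edges = []
    t = half_period  # first rising edge of the free-running oscillator
    for _ in range(n_edges):
        for start, end in sorted(pauses):
            if start <= t < end:
                t = end   # edge is held back until the pause is released
        edges.append(t)
        t += 2 * half_period
    return edges


# 10-unit clock period, producer pauses the clock from t=20 to t=40.
print(pausible_clock_edges(5, half_period=5, pauses=[(20, 40)]))
```

The edge that would have fallen into the pause is stretched rather than cut short, so no spurious pulse appears in this idealized model. The consumer simply loses time, which is the blocking effect named in the properties.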

33 Pausible Clock: Circuit
- inverter generates next REQ from ACK
- self-oscillation
[Figure: Muller C-element (C) with delay element and inverter, signals REQ, ACK and CLK out, with waveforms]

34 Pausible Clock: Circuit
- external unit can safely stop CLK by activating REQ'
- … and gets ACK' as a response
- metastability?
[Figure: pausible clock circuit with Muller C-element (C), delay element and arbiter (Arb), signals REQ', ACK' and CLK out, with waveforms]

35 Pausible Clock: Circuit
- for more external sources, arbiters can be added and "anded" before the Muller C-Element
- the two inverters can be eliminated by using a Muller C-Element with inverting output
[Figure: pausible clock circuit with Muller C-element (C), delay element and one arbiter (Arb) per external source, signals REQ1/ACK1 to REQn/ACKn and CLK out]

