Adapting Synchronizers to the Effects of On-Chip Variability. David Kinniment, Alex Yakovlev, Jun Zhou, Gordon Russell. Presented by Dmitry Verbitsky.

Overview –Introduction –Effects of On-chip Variability on Synchronizer Performance –Proposed Adaptation Schemes –Conclusions

Inter-Clock Domain Communication Data transfer between different clock domains must be performed carefully. An incoming data change near the receiver clock edge causes metastability, which may lead to a functional failure due to non-deterministic propagation delay –Either the set-up or the hold time is not satisfied

Metastability Resolution Real life: the FF will finally resolve into one of the stable states –Thanks to thermal noise –Thanks to the next clock transition

Synchronization Failure (1) Metastability is not a singular problem at the sampling time; it spreads through your circuit, causing total failure!

Synchronization Failure (2) A long delay due to metastability (M/S) causes a violation of the cycle time. Failures are due to a new M/S event or incorrect function

MTBF Mean Time Between Failures. Given metastability at t = 0, probability of metastability at t > 0 = $e^{-t/\tau}$. Failure: still metastable by the next clock –Failure = p(enter m.s.) × p(still m.s. after T) –Rate(failure) = Rate(enter m.s.) × p(still m.s. after T) = $W \times F_c \times F_d \times e^{-T/\tau}$ –MTBF = 1/Rate(failure) = $e^{T/\tau} / (W \times F_c \times F_d)$ –τ: resolution time constant of the synchronizer (W: metastability window, Fc: clock frequency, Fd: data rate)
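A minimal sketch of the MTBF relation on this slide, written as a Python function. The function name and the numeric values in the example call are illustrative assumptions, not figures from the presentation.

```python
import math

def mtbf_seconds(t_sync, tau, window, f_clock, f_data):
    """MTBF = e^(T/tau) / (W * Fc * Fd), all quantities in SI units."""
    # Rate of entering metastability times the probability of still
    # being metastable after the synchronization time T.
    failure_rate = window * f_clock * f_data * math.exp(-t_sync / tau)
    return 1.0 / failure_rate

# Hypothetical operating point, chosen only to exercise the function.
example_mtbf = mtbf_seconds(t_sync=3.5e-9, tau=100e-12,
                            window=100e-12, f_clock=10e6, f_data=5e6)
```

Because T appears in the exponent, each extra τ of synchronization time multiplies the MTBF by e, which is why small changes in τ matter so much in the slides that follow.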

Sources of device variability –Random dopant fluctuations (RDF) –Line-edge/line-width roughness (LER/LWR) –Oxide thickness variations (OTV)

LER/LWR

RDF influence on Vt (90 nm NMOS)

LER/LWR influence on σ(Vt)

Threshold Voltage Effects on τ and MTBF

Effects of On-chip Variability on Synchronizer Performance (1) Process variation: at 180 nm σ = 4%, so we can expect that one synchronizer out of 1000 may have a 12% worse value of τ. At 45 nm this value will reach 50%. M. Garg et al., ISCAS 2005, May 2005 & ITRS 2005

Temperature and Supply Voltage Effects on Synchronization For a CMOS device in the saturation region: when the synchronizer operates at a low supply voltage, the decrease in drain current lengthens the delay and extends the time constant.

Drain Current vs Temperature Near the ZTC (zero temperature coefficient) point, the temperature dependence is minimized

Carrier Mobility vs Temperature Mobility increases when temperature decreases. Higher mobility increases the current. At a high supply voltage (Vdd > ZTC and Vdd >> Vth) the drain current is dominated by the carrier mobility, and hence decreases as temperature rises

Threshold Voltage vs Temperature Vth increases when temperature decreases. Higher Vth decreases the current. When Vdd approaches Vth (Vdd < ZTC), Vth has a stronger effect on the drain current, and as a result the current grows as temperature rises

Drain Current in Saturation Region –Vth: threshold voltage –μ: carrier mobility
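The equation on this slide did not survive extraction. As a reminder of how the two listed quantities enter the current, a commonly used first-order model is the long-channel square law below. This is an assumption for illustration: the presenters may have used a different form (for example an alpha-power law), and the symbols $C_{ox}$, $W/L$ and $V_{GS}$ do not appear in the transcript.

```latex
I_D \approx \frac{1}{2}\,\mu(T)\,C_{ox}\,\frac{W}{L}\,\bigl(V_{GS} - V_{th}(T)\bigr)^{2}
```

In this form both $\mu$ and $V_{th}$ fall as temperature rises. The first effect lowers $I_D$ and the second raises it, which is consistent with the high-Vdd and low-Vdd behaviour described on the surrounding slides.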

High Vdd

Low Vdd

Gate Delay vs Vdd, T

τ vs Vdd, T

Effects of On-chip Variability on Synchronizer Performance (2) Voltage and temperature variations: a disproportionate effect is observed. As a result, a 50% reduction in power supply voltage may cause over 100% increase in τ. Simulation results of a Jamb latch at 90 nm

Synchronizer Selection Scheme (1) Problem: –Technology: 90 nm –τ: 11 ps –σ: 8% In the worst case we have to allow for a τ that is 3.09σ above nominal, or 13.72 ps, to ensure that the probability of a synchronizer having a τ worse than this is 0.001. This adds to the delay of all synchronizers on the chip and therefore affects the system performance.
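The worst-case figure on this slide can be reproduced with a short sketch, assuming τ is normally distributed with a standard deviation of 8% of its nominal value. The function name is illustrative.

```python
from statistics import NormalDist

def worst_case_tau(tau_nominal, sigma_fraction, tail_probability):
    """tau that is exceeded with the given one-sided tail probability."""
    # Number of standard deviations that leaves tail_probability above it;
    # for 0.001 this is about 3.09.
    z = NormalDist().inv_cdf(1.0 - tail_probability)
    return tau_nominal * (1.0 + z * sigma_fraction)

tau_wc_ps = worst_case_tau(tau_nominal=11.0, sigma_fraction=0.08,
                           tail_probability=0.001)
```

With the slide's numbers this comes out at roughly 13.72 ps.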

Synchronizer Selection Scheme (2) Solution 1: –Make the width of all transistors in the synchronizer N times larger (say N = 4) –Assuming this reduces most of the process variations, the deviation is now: σ/√N = 8%/2 = 4% –The worst case becomes 12.36 ps, but the power is increased by 4 times. –Increasing transistor size cannot reduce all kinds of process variations, so the actual σ will be more than 4%.
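A sketch of the Solution 1 arithmetic, assuming (as the slide does) that widening the transistors N times scales the deviation by 1/√N and the power by N. The names and the fixed 3.09σ margin are taken from the previous slide; everything else is illustrative.

```python
import math

Z_ONE_IN_1000 = 3.09  # margin used on the previous slide for p = 0.001

def upsized_worst_case(tau_nominal, sigma_fraction, width_factor):
    """Worst-case tau and relative power after widening by width_factor."""
    # Assumed scaling: deviation shrinks with the square root of the width.
    sigma_scaled = sigma_fraction / math.sqrt(width_factor)
    tau_wc = tau_nominal * (1.0 + Z_ONE_IN_1000 * sigma_scaled)
    relative_power = width_factor
    return tau_wc, relative_power

tau_wc_ps, power_ratio = upsized_worst_case(11.0, 0.08, 4)
```

This gives about 12.36 ps at 4 times the power, the trade-off that the redundant-synchronizer approach is meant to avoid.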

Synchronizer Selection Scheme (3) Solution 2: –Make N standard size synchronizers, measure their τ on chip, and select the best one. –After the selection, all the others are powered down, as is the measurement circuitry. –Power during operation is therefore the same as for a single small synchronizer, but the performance is improved.

Synchronizer Selection Scheme (3) Example of the N = 4 case –The probability of one synchronizer having τ worse than 11.81 ps is approximately 0.178 –The probability of all 4 synchronizers having τ worse than this is approximately 0.178^4 ≈ 0.001
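The N = 4 example can be checked numerically. The sketch below assumes τ is normally distributed around 11 ps with σ = 8% (from the problem slide) and that the four synchronizers vary independently; the function name is illustrative.

```python
from statistics import NormalDist

def prob_tau_worse(tau_limit, tau_nominal, sigma_fraction):
    """Probability that a single synchronizer has tau above tau_limit."""
    dist = NormalDist(mu=tau_nominal, sigma=tau_nominal * sigma_fraction)
    return 1.0 - dist.cdf(tau_limit)

p_one = prob_tau_worse(tau_limit=11.81, tau_nominal=11.0, sigma_fraction=0.08)
# Selection only fails if every one of the four candidates is worse.
p_all_four = p_one ** 4
```

Selecting the best of four standard-size synchronizers therefore meets the same 0.001 target at 11.81 ps that Solution 1 reaches only at 12.36 ps.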

Synchronizer Selection Scheme (4) –Solution 2 achieves a better τ than Solution 1 –Solution 2 deals with all kinds of process variations (Solution 1 doesn’t deal with oxide thickness)

Synchronization Time Adjustment Scheme (1) Problem: –PVT variations cause a 50% worse value of τ –To achieve the required MTBF, the synchronization times of all the synchronizers have to be extended to over 1.5 times their original values –The extended synchronization time may be wasted

Synchronization Time Adjustment Scheme (2) Solution: –Adjust the synchronization time of each synchronizer according to the actual PVT and data rate variations, improving system performance provided that the required MTBF is met
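Why a 50% worse τ forces a 1.5 times longer synchronization time follows from solving the MTBF relation for T. The sketch below does that; the function name and all numeric values are hypothetical and serve only to show the proportionality.

```python
import math

def required_sync_time(tau, required_mtbf, window, f_clock, f_data):
    """Solve MTBF = e^(T/tau) / (W * Fc * Fd) for T."""
    return tau * math.log(required_mtbf * window * f_clock * f_data)

# Hypothetical numbers: same MTBF target, nominal tau vs a 50% worse tau.
t_nominal = required_sync_time(100e-12, 1e4, 100e-12, 10e6, 5e6)
t_degraded = required_sync_time(150e-12, 1e4, 100e-12, 10e6, 5e6)
```

For a fixed MTBF target and fixed rates, T scales linearly with τ, so a design fixed for the worst case wastes that margin on every synchronizer whose actual τ is better.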

On-Chip Measurement of Failure Rates (1)

On-Chip Measurement of Failure Rates (2)

Calculation of τ from Measured Failure Rates

Calculation of MTBF from Measured Failure Rates

Synchronizer Selection Scheme Architecture –N redundant synchronizers, from which the best one is to be selected –Measurement circuitry shared by the N synchronizers –Values from Counter2 are stored in a FIFO for comparison
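A behavioural sketch of the selection step, assuming each candidate is observed for the same number of clock cycles and that the one with the fewest recorded failures is kept. The function names and the failure counts are made up for illustration; the slides do not give the comparison logic in detail.

```python
def select_best(failure_counts):
    """Index of the synchronizer with the fewest recorded failures."""
    return min(range(len(failure_counts)), key=failure_counts.__getitem__)

def power_enables(failure_counts):
    """One enable flag per synchronizer: only the selected one stays on."""
    best = select_best(failure_counts)
    return [index == best for index in range(len(failure_counts))]

# Hypothetical Counter2 readings for N = 4 candidates.
counts = [120, 95, 143, 101]
enables = power_enables(counts)
```

Comparing raw counts is enough here because only the ranking matters, which matches the later remark that the selection scheme does not need the cycle counter.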

Synchronizer Adjustment Scheme Architecture –VDL (variable delay line): controls the synchronization time of the synchronizer –Registers: hold the delay setting of the VDL –Comparator: compares the calculated MTBF with the user-required MTBF –Memory: stores the calculation results for later use, and user-supplied data such as clock frequencies for the calculation
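One way to picture how these blocks interact is a loop that steps the delay line until the estimated MTBF meets the requirement. This is a sketch under assumptions: the slides do not state the search strategy, `estimate_mtbf` stands in for the on-chip measurement and calculation, and the model and numbers used in the example are hypothetical.

```python
import math

def adjust_sync_time(estimate_mtbf, required_mtbf, step, max_steps):
    """Smallest VDL setting whose estimated MTBF meets the requirement."""
    for setting in range(max_steps + 1):
        t_sync = setting * step
        if estimate_mtbf(t_sync) >= required_mtbf:  # comparator
            return setting                           # value held in the register
    return None  # requirement cannot be met within the delay range

# Stand-in estimator using the MTBF formula with hypothetical parameters.
def model_mtbf(t_sync, tau=100e-12, window=100e-12, f_clock=10e6, f_data=5e6):
    return math.exp(t_sync / tau) / (window * f_clock * f_data)

setting = adjust_sync_time(model_mtbf, required_mtbf=1e4,
                           step=100e-12, max_steps=64)
```

A finer delay step lets the loop stop closer to the minimum synchronization time that still meets the target, which is where the resolution of the delay line matters.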

FPGA Implementation
Scheme | On-Chip Overhead | Off-Chip Overhead
Synchronizer Selection Scheme | 9 flip-flops and 6 gates per synchronizer | 34 flip-flops and 110 gates
Synchronization Time Adjustment Scheme | 33 flip-flops and 104 gates per synchronizer | 436 flip-flops and 732 gates

Failure Detector Detects failures at 2 different sampling times of the synchronizer output. The synchronizer is clocked by the SCLK signal, which is generated from the local clock signal CLK. Synchronization time = |rising_edge(SCLK) − rising_edge(CLK)|. T2 − T1 = 100 ps in the FPGA implementation

Failure Counters (1)

Failure Counters (2) Count the number of failures detected at different sampling times. Counters 1 and 2 count the failures at the sampling times SCLK+T1 and SCLK+T2. Counter3 counts the number of clock cycles. For the synchronizer selection scheme Counter3 is not needed, so the hardware overhead can be further reduced
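A measured MTBF can be derived from these counters as observation time divided by failures. The sketch below assumes Counter3 counts cycles of a clock with known frequency; the function name and the example readings are hypothetical.

```python
import math

def mtbf_from_counts(cycle_count, failure_count, f_clock):
    """Measured MTBF in seconds: elapsed time divided by failures seen."""
    if failure_count == 0:
        return math.inf  # no failure observed within the window
    elapsed_seconds = cycle_count / f_clock
    return elapsed_seconds / failure_count

# Hypothetical readings: 10^7 cycles at 10 MHz, 50 failures at SCLK+T1.
mtbf_t1 = mtbf_from_counts(cycle_count=10_000_000, failure_count=50,
                           f_clock=10e6)
```

When two synchronizers are compared over the same window the elapsed time cancels, which is consistent with Counter3 being unnecessary for the selection scheme.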

Synchronizer Selection Circuit 4 p-type transistors are used to switch the power for the four synchronizers. After the best one is selected, the other three are powered down, as is the selection circuitry

Variable Delay Line (1) Usually implemented by transistor-level circuits. In an FPGA it can only be implemented as inverter chains; the inverters, in turn, are implemented by LUTs. LUT delay + wire delay > 1 ns on Spartan3. A smaller incremental delay can be achieved by using the connection delay difference on the FPGA

Variable Delay Line (2) Careful placing of internal XOR gates can give an incremental delay, namely the difference between the connection delays of two neighboring paths, down to 100 ps. With the VDL implemented on chip, an incremental delay of 1 ps can easily be achieved

Implementation of τ and MTBF Calculation A = MTBF2; B = MTBF1; E = T2 − T1; G = T3 − T1; I = Counter3_output; F = τ
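The diagram for this slide is lost, but the legend is consistent with the relations that follow from MTBF ∝ e^(T/τ): two MTBF measurements at T1 and T2 give τ, and τ then gives the MTBF at a third time T3. The sketch below reuses the slide's letters as comments; the function names and the check values are assumptions.

```python
import math

def tau_from_mtbf(mtbf1, mtbf2, t1, t2):
    """F = E / ln(A / B), with A = MTBF2, B = MTBF1, E = T2 - T1."""
    return (t2 - t1) / math.log(mtbf2 / mtbf1)

def mtbf_at(mtbf1, tau, t1, t3):
    """MTBF3 = B * e^(G / F), with G = T3 - T1."""
    return mtbf1 * math.exp((t3 - t1) / tau)

# Self-check with synthetic data generated from a known tau (hypothetical).
true_tau, t1, t2, t3 = 120e-12, 1.0e-9, 1.1e-9, 3.5e-9
b = math.exp(t1 / true_tau)   # MTBF1 up to a constant factor
a = math.exp(t2 / true_tau)   # MTBF2 up to the same factor
tau_est = tau_from_mtbf(b, a, t1, t2)
mtbf3 = mtbf_at(b, tau_est, t1, t3)
```

The calculation needs a division and a logarithm, which is what the next slides on the divider and the log lookup tables implement.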

Division Implementation The divider is pipelined to achieve high performance and low area. Divisor and dividend inputs are multiplexed to make it reusable. A control counter counts the number of clock cycles used for division. A register stores the divider output for later division steps

Log Calculation Implementation(1)

Log Calculation Implementation (2) Uses lookup tables. Because the values can be very large, it is impossible to build a full log LUT. Different resolutions can be used for calculating different values (high resolution for small values, low resolution for larger ones). 3 LUTs are used to provide an accuracy of 2 decimals, which leads to an error of 1% in the calculated MTBF
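A software sketch of the multi-resolution lookup idea: three tables, each covering one decade, with a finer input step for small values and entries rounded to 2 decimals. The decade boundaries and step sizes are assumptions chosen for illustration and are not the table layout used in the actual design.

```python
import math

# (lower bound, upper bound, input step): finer steps for smaller inputs.
RANGES = [(1.0, 10.0, 0.01), (10.0, 100.0, 0.1), (100.0, 1000.0, 1.0)]

def build_tables():
    """Precompute ln(x) at each grid point, stored with 2 decimals."""
    tables = []
    for low, high, step in RANGES:
        count = round((high - low) / step)
        tables.append([round(math.log(low + i * step), 2)
                       for i in range(count)])
    return tables

TABLES = build_tables()

def lut_log(x):
    """Approximate ln(x) by truncating x to the grid of the matching table."""
    for (low, high, step), table in zip(RANGES, TABLES):
        if low <= x < high:
            index = min(int((x - low) / step), len(table) - 1)
            return table[index]
    raise ValueError("input outside the tabulated range")
```

Keeping the relative step roughly constant across decades keeps the absolute error of the logarithm roughly constant, so each table stays small while covering a wide input range.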

Hardware Saving 80% of the on-chip overhead is due to the VDL implementation on FPGA. When implemented on chip using transistors, the overhead will be reduced by 50%. The off-chip part can also be reduced by lowering the calculation accuracy

Application of 2 Schemes (1) The synchronizer selection scheme is aimed at improving synchronizer performance subject to process variation. It only needs to operate once, when setting up the chip, since the process variation is fixed when the chip is fabricated. After the selection, power consumption is the same as that of a single synchronizer, because all redundant modules are powered down. The scheme has a small overhead and can be put entirely on chip

Application of 2 Schemes (2) The synchronization time adjustment scheme is used to deal with PVT and data rate variations. It consumes a relatively large amount of power and hardware. It only needs to operate once when dealing with process or fixed Vdd variations, and can be powered down afterwards. When dealing with frequent data rate or power variations, the scheme needs to be put entirely on chip and operate frequently. Power consumption can be reduced by reducing hardware complexity and adjustment rate

Test Results (1) Measured MTBF vs Data Rate –Synchronization time = 3.5 ns –Clock frequency = 10 MHz MTBF decreases as the data rate increases, as expected

Test Results (2) Measured MTBF vs Synchronization Time –Data rate = 5 MHz –Clock frequency = 10 MHz MTBF increases as the synchronization time increases, as expected

Test Results (3) Measured τ vs Vdd. τ increases as Vdd decreases, as expected

Conclusions Two adaptation schemes have been proposed to reduce the effects of on-chip variability on synchronizers. Both were implemented on a Xilinx Spartan3 FPGA. The synchronizer selection scheme deals with process variations, has a small overhead and can be put entirely on chip. The synchronization time adjustment scheme deals with PVT and data rate variations. It has a relatively large overhead, which can be reduced by lowering the calculation accuracy of MTBF.

References
J. Zhou, D. J. Kinniment, G. Russell, and A. Yakovlev, “Adapting Synchronizers to the Effects of On Chip Variability”, 14th IEEE International Symposium on Asynchronous Circuits and Systems, pp. 39-47, 2008.
Michael Kayam, Ran Ginosar, and Charles E. Dike, “Symmetric Boost Synchronizer for Robust Low Voltage, Low Temperature Operation”, Technical Report, Jan. 2007.
R. Dobkin, “vSync HDK customers presentation”.
D. J. Kinniment, A. Bystrov, and A. V. Yakovlev, “Synchronization Circuit Performance”, IEEE Journal of Solid-State Circuits, 37(2), pp. 202-209, 2002.

The End Questions?

Rules for Normally Distributed Data If a data distribution is approximately normal, then: –68% of the data values are within 1σ of the mean –95% of the data values are within 2σ of the mean –99.7% of the data values are within 3σ of the mean
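These percentages, and the 3.09σ margin used in the selection scheme slides, can be checked against the standard normal distribution. The function name below is illustrative.

```python
from statistics import NormalDist

def fraction_within(k_sigma):
    """Fraction of a normal population within k_sigma of the mean."""
    std_normal = NormalDist()
    return std_normal.cdf(k_sigma) - std_normal.cdf(-k_sigma)

fractions = {k: fraction_within(k) for k in (1, 2, 3)}
# One-sided tail beyond 3.09 sigma, the margin used for the 1-in-1000 case.
tail_beyond_3_09 = 1.0 - NormalDist().cdf(3.09)
```

The one-sided tail is the relevant quantity for the synchronizer case, because only a τ that is larger than nominal hurts the MTBF.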
