Recent Challenges


1 Recent Challenges

2 Soft Errors
Scaling:
• SEU (single-event upset):
  − Ionizing radiation corrupts stored data
• Cause:
  − Radioactive impurities in device packages
  − Recently: cosmic radiation
• Scaling worsens SEU:
  1. Voltage scaling + reduced node capacitances
     → lower the charge threshold necessary to corrupt the data
  2. Greater level of integration
     → increases the likelihood that soft errors will affect the device

3 SEU
Sources:
• Configuration memory
• Flip-flops
• Memory blocks
• Combinational circuits (transient error → permanent)

4

5 SEU in Configuration Memory
SEU in configuration bits (SRAM-based):
• In Virtex FPGAs, ~91% of the bits sensitive to soft errors are configuration bits
  − Flash- or antifuse-based FPGAs do not suffer from this
• Any change to the configuration memory may alter the functionality
• The error persists until the FPGA is reprogrammed

6 SEU Mitigation Techniques
Mitigation techniques:
1. Circuit and technology level:
   − Adding metal capacitors to nodes in the memory → increases the amount of charge necessary to cause an SEU
2. System level:
   − Ensures that the system can detect an error and recover from it.
   − Devices regularly verify their configuration memory by comparing the current values with the desired configuration state using cyclic redundancy checks (Altera Stratix III)
3. User level:
   a) TMR (triple modular redundancy):
      − Replicating a design three times and voting among the outputs
   − Reducing the design's sensitivity to soft errors by careful selection of the resources used
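The two mechanisms named above (CRC checks on configuration memory and TMR voting) can be illustrated with a minimal Python sketch. It is not vendor code: the function names, the two-byte "configuration", and the use of CRC-32 from `zlib` are assumptions chosen only for illustration.

```python
import zlib

def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over the outputs of three design replicas."""
    return (a & b) | (a & c) | (b & c)

def config_is_intact(current: bytes, golden_crc: int) -> bool:
    """Compare the CRC-32 of the current configuration with the stored value."""
    return zlib.crc32(current) == golden_crc

# Hypothetical golden configuration and its reference checksum.
golden = bytes([0b10110010, 0b01001101])
golden_crc = zlib.crc32(golden)

# Model an SEU as a single flipped configuration bit.
upset = bytearray(golden)
upset[0] ^= 0b00000100

voted = tmr_vote(0b1010, 0b1010, 0b0010)  # third replica is corrupted
print(bin(voted))
print(config_is_intact(golden, golden_crc))
print(config_is_intact(bytes(upset), golden_crc))
```

The voter masks an error in any single replica, while the CRC check only detects the upset; recovery would still require rewriting the configuration.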

7 Circuit Level [Ebrahimi]:
• Reduce the number of SRAM cells in a switch box (6 → 5) (6 → 4)

8 Circuit Level [Ebrahimi]:
• Reduce the number of SRAM cells in a switch box (6 → 5) (6 → 4)

9 User Design Level
Care bits [Golshan07]:
• Only a subset of the configuration bits can affect the design when hit by an SEU.
Example: resource A is used for net A.
• The A-B SRAM bit is not a care bit if B is not used by other nets.
• The A-C SRAM bit is a care bit (a change to '1' hurts net A).
• The A-D SRAM bit is not a care bit (w.r.t. net A) if D is not used.

10 User Design Level
Soft Error Routing Problem [Golshan07]:
• Given a routing graph and a set of multi-terminal nets, route each net with the least care-cost, where care-cost is the number of routing care bits.
Experiments:
• 14% reduction in the number of care bits
  − ~80% of soft errors in the FPGA occur in configuration memory [Kuon07]
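As a rough illustration of the care-cost objective, the sketch below routes a single two-terminal net with Dijkstra's algorithm, using the number of care bits on each edge as the edge weight. This is a simplification and not the algorithm of [Golshan07]: the graph, the care-bit counts, and the function name are hypothetical, and multi-terminal nets are not handled.

```python
import heapq

def least_care_cost_route(graph, source, sink):
    """Dijkstra search where each edge weight is the care bits that edge adds."""
    best = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == sink:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, care_bits in graph.get(node, {}).items():
            new_cost = cost + care_bits
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if sink not in best:
        return None, float("inf")
    # Walk the predecessor links back from the sink to recover the route.
    path = [sink]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], best[sink]

# Hypothetical routing graph: graph[u][v] = care bits added by using edge u-v.
graph = {
    "src": {"a": 3, "b": 1},
    "a": {"dst": 1},
    "b": {"c": 1},
    "c": {"dst": 1},
}
route, care_cost = least_care_cost_route(graph, "src", "dst")
print(route, care_cost)
```

In this example the longer route through `b` and `c` is preferred because it touches fewer care bits than the shorter route through `a`.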

11 Recent Challenges
Process Variation

12 Process Variation Sources
[Figure: Wafer X, Wafer Y; source: IBM, Intel and TSMC]

13 Variation
[Figure panels: variation over the years; variation from the mean value. ILD: inter-layer dielectric]
• Gate oxides are so thin that a change of one atom can cause a 25 percent difference in substrate current.
  − EE Times (04/11/2006)

14 Statistical Description
• The underlying deterministic and random contributions are lumped into a single "random" statistical description.
• For devices on one wafer, the distribution (mean and variance) of L can differ from that of devices within a single die.

15 Inter-die vs. Intra-die Variations
[Figure: L_eff variation, intra-die (spatial correlation) vs. inter-die (global correlation); figures courtesy of IBM, Intel and TSMC]

16 Impact of Variation
Importance of variation:
• Timing violations
  → Yield loss

17 Impact of Variation
Process variations can cause
• up to 2000% variation in leakage current and
• 30% variation in frequency in 180nm CMOS
  − Borkar, S., Karnik, T., Narendra, S., Tschanz, J., Keshavarzi, A., De, V. Parameter Variations and Impact on Circuits and Microarchitecture. In Proc. of DAC (2003).

18 Impact of Variation
[Figure: die-to-die frequency variation]

19 Variation in FPGA
Binning:
• Historically: most variation was between dies
  → FPGA manufacturers test the speed of each FPGA after manufacturing and bin each device according to its speed.
  − Higher speeds: more expensive
  − Unacceptable leakage power: discard the device
• More recently: significant within-die variation
  − Cannot be leveraged in the same manner
  − Operating speeds must be reduced to maintain functionality
  − 90nm: speed reduction of 5.7%
  − 22nm: speed reduction of 22.4%
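A small sketch of the binning idea and of the within-die derating follows. The speed grades, the leakage limit, the 500 MHz nominal frequency, and both function names are hypothetical; only the 5.7% and 22.4% reductions come from the slide.

```python
def bin_device(fmax_mhz, leakage_ma, speed_grades=(400, 450, 500), leakage_limit_ma=50.0):
    """Return the fastest speed grade the device meets, or None if it is discarded."""
    if leakage_ma > leakage_limit_ma:
        return None  # unacceptable leakage power: discard the device
    met = [grade for grade in speed_grades if fmax_mhz >= grade]
    return max(met) if met else None

def derate(nominal_mhz, reduction_percent):
    """Rated speed after reducing the nominal speed to absorb within-die variation."""
    return nominal_mhz * (1.0 - reduction_percent / 100.0)

print(bin_device(470, 10))   # fast enough for the middle grade
print(bin_device(520, 80))   # fast but too leaky
print(derate(500.0, 5.7))    # 90nm figure from the slide
print(derate(500.0, 22.4))   # 22nm figure from the slide
```

Binning sorts whole devices, so it helps only with die-to-die variation; within-die variation has to be absorbed by the derating applied to every device.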

20 Solutions
Architectural solution:
1. Select the logic block architecture parameters to minimize this variation
   − LUT size is particularly important [Wong05]
   − LUT size = 4: highest leakage yield
   − LUT size = 7: highest timing yield
   − LUT size = 5: maximum combined leakage and timing yield
2. Adaptively compensate for any variation through body biasing [Nabaa06]:
   − Slow blocks: apply a body bias → decrease V_t → increase the block's speed
   − Fast blocks: increase threshold voltage → reduce leakage power
• Experiments:
  → Area penalty: 1%–2%
  → Delay variability reduction: 30%
  → Leakage variability reduction: 78%

21 Solutions
CAD level:
1. Statistical static timing analysis (SSTA) in FPGA CAD tools
   → Improves delays by avoiding the margins that are necessary for traditional STA
2. Testing multiple logically equivalent configurations of the FPGA to find one that is functional at the desired speed [Sedcole07]
3. Generating critical paths that will be more robust in the face of variation [Matsumoto07]
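To see where the margin saving of SSTA comes from, the sketch below compares a worst-case bound (every stage at mean + 3σ) with a statistical bound for one path. It assumes independent Gaussian stage delays, so variances add; real SSTA tools also handle correlation and the max operation across paths. The 16-stage path and its numbers are hypothetical.

```python
import math

def worst_case_sta(means, sigmas, k=3.0):
    """Traditional margin: every stage is assumed to sit at mean + k*sigma."""
    return sum(m + k * s for m, s in zip(means, sigmas))

def ssta_bound(means, sigmas, k=3.0):
    """Statistical bound for independent Gaussian stage delays (variances add)."""
    return sum(means) + k * math.sqrt(sum(s * s for s in sigmas))

# Hypothetical path: 16 stages, 1.0 ns mean delay, 0.1 ns sigma per stage.
means = [1.0] * 16
sigmas = [0.1] * 16

worst = worst_case_sta(means, sigmas)
stat = ssta_bound(means, sigmas)
print(worst, stat, worst - stat)
```

Under these assumptions the margin grows with the number of stages in the worst-case view but only with its square root in the statistical view.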

22 Inter-die vs. Intra-die Variations
P_0 = nominal design value
ΔP_intradie = intra-die variation (within a given chip)
ΔP_interdie = inter-die variation (from one chip to another)
ΔP_e = remaining "random" or unexplained variation
• P: a structural or electrical parameter, e.g.
  − W,
  − t_ox,
  − V_th,
  − channel mobility,
  − coupling capacitances,
  − line resistances.
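The terms listed above are commonly combined additively, P = P_0 + ΔP_interdie + ΔP_intradie + ΔP_e; the equation itself is missing from this transcript, so the additive form is an assumption here. The sketch below samples one die under that assumption. It also treats the intra-die term as independent per device, which ignores spatial correlation, and all sigma values are hypothetical.

```python
import random

def sample_die(p0, sigma_inter, sigma_intra, sigma_e, n_devices, rng):
    """Sample parameter P for every device on one die (additive model assumed)."""
    d_inter = rng.gauss(0.0, sigma_inter)  # one draw shared by the whole die
    devices = []
    for _ in range(n_devices):
        d_intra = rng.gauss(0.0, sigma_intra)  # differs from device to device
        d_e = rng.gauss(0.0, sigma_e)          # residual unexplained term
        devices.append(p0 + d_inter + d_intra + d_e)
    return d_inter, devices

rng = random.Random(0)
# Hypothetical numbers: P is a threshold voltage in volts.
shift, vth = sample_die(0.30, 0.02, 0.01, 0.005, n_devices=5, rng=rng)
print(shift, vth)
```

Every device on the die shares the same inter-die shift, which is why inter-die variation moves the whole die while intra-die variation spreads devices around that shifted mean.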

23 Corner Analysis
PRCA (Process Corner Analysis):
• Takes
  1. nominal values of process parameters
  2. and a delta for each parameter by which it varies.
• Finds
  − performance as max and min values.
Pros:
• Simple
Cons:
• Conservative
• Inaccurate
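The procedure above can be sketched by evaluating a performance function at every combination of nominal ± delta. The RC delay model, the parameter names, and their values are hypothetical.

```python
from itertools import product

def corner_analysis(perf, nominal, delta):
    """Evaluate perf at every (nominal - delta, nominal + delta) corner."""
    names = sorted(nominal)
    values = []
    for signs in product((-1, 1), repeat=len(names)):
        point = {n: nominal[n] + s * delta[n] for n, s in zip(names, signs)}
        values.append(perf(point))
    return min(values), max(values)

def rc_delay(p):
    """Hypothetical performance metric: delay proportional to R * C."""
    return p["R"] * p["C"]

nominal = {"R": 10.0, "C": 2.0}
delta = {"R": 1.0, "C": 0.5}
best, worst = corner_analysis(rc_delay, nominal, delta)
print(best, worst)
```

The number of corners doubles with each parameter, and the reported range assumes all parameters reach their extremes together, which is where the conservatism comes from.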

24 Corner Analysis
PRCA shortcoming:
• Process corners are assumed to coincide with performance corners.
  − Fact: the best-case corner may depend not on P_min or P_max of a particular interconnect parameter but on a value within that range.
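A toy example of this shortcoming: with a non-monotonic delay model, both process corners give the same delay while the true best case lies inside the range. The model (a resistance term falling with wire width plus a capacitance term rising with it) and its coefficients are hypothetical.

```python
def wire_delay(width, a=4.0, b=1.0):
    """Hypothetical model: resistance term a/width plus capacitance term b*width."""
    return a / width + b * width

w_min, w_max = 1.0, 4.0

# Corner analysis only looks at the two ends of the range.
corner_best = min(wire_delay(w_min), wire_delay(w_max))

# A sweep over the range finds the actual best case.
sweep = [w_min + (w_max - w_min) * i / 300 for i in range(301)]
best_width = min(sweep, key=wire_delay)
sweep_best = wire_delay(best_width)

print(corner_best, sweep_best, best_width)
```

Here the corners miss the minimum-delay point entirely, so a corner-based best case would be pessimistic for this parameter.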

25 SSTA

26 Solutions
CAD level:
2. Testing multiple logically equivalent configurations of the FPGA to find one that is functional at the desired speed [Sedcole07]
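The selection step can be sketched as follows, assuming per-resource delays for one particular die are already known (in practice each configuration would be tested on the device itself). The configuration names, resource names, and delays in picoseconds are hypothetical, and this is not the procedure from the cited papers.

```python
def pick_working_configuration(configs, resource_delay_ps, target_ps):
    """Return the first configuration whose critical path meets the target delay."""
    for name, path in configs.items():
        delay = sum(resource_delay_ps[r] for r in path)
        if delay <= target_ps:
            return name, delay
    return None, None

# Measured delays on one die; within-die variation makes lut_a slow here.
resource_delay_ps = {"lut_a": 620, "lut_b": 480, "wire_1": 300, "wire_2": 310}

# Logically equivalent configurations that use different physical resources.
configs = {
    "config_0": ["lut_a", "wire_1"],
    "config_1": ["lut_b", "wire_2"],
}

chosen, delay_ps = pick_working_configuration(configs, resource_delay_ps, target_ps=800)
print(chosen, delay_ps)
```

The first configuration fails on this die because it lands on a slow resource, while the equivalent second configuration avoids it and meets the target.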

27 References
[Kuon07] Kuon, Tessier, “FPGA Architecture: Survey and Challenges,” Foundations and Trends in Electronic Design Automation, Vol. 2, No. 2 (2007) 135–253.
[Lin07] Yan Lin and Lei He, “Device and Architecture Concurrent Optimization for FPGA Transient Soft Error Rate,” ICCAD 2007.
[Golshan07] S. Golshan and E. Bozorgzadeh, “Single-event-upset (SEU) awareness in FPGA routing,” in DAC ’07:
[Xilinx]
[Altera]
[Wong05] H.-Y. Wong, L. Cheng, Y. Lin, and L. He, “FPGA device and architecture evaluation considering process variations,” in ICCAD,
[Nabaa06] G. Nabaa, N. Azizi, and F. N. Najm, “An adaptive FPGA architecture with process variation compensation and reduced leakage,” DAC, 2006.

28 References
[Sedcole07] P. Sedcole and P. Y. K. Cheung, “Parametric yield in FPGAs due to within-die delay variations: A quantitative analysis,” in FPGA, 2007.

29 References
[Matsumoto07] Y. Matsumoto, M. Hioki, T. Kawanami, T. Tsutsumi, T. Nakagawa, T. Sekigawa, and H. Koike, “Performance and yield enhancement of FPGAs with within-die variation using multiple configurations,” in FPGA 2007.

