5 SEU in Configuration Memory
SEU in configuration bits (SRAM-based FPGAs):
− In Virtex FPGAs, ~91% of the bits sensitive to soft errors are configuration bits
− Flash- and antifuse-based FPGAs do not suffer from this problem
− Any change to the configuration memory may alter the functionality of the design
− The error persists until the FPGA is reprogrammed
6 SEU Mitigation Techniques
Mitigation techniques:
1. Circuit and technology level:
− Adding metal capacitors to nodes in the memory increases the amount of charge needed to cause an SEU
2. System level:
− Ensures that the system can detect an upset and recover from it
− Devices regularly verify their configuration memory by comparing the current values with the desired configuration state using cyclic redundancy checks (Altera Stratix III)
3. User level:
a) TMR (triple modular redundancy):
− Replicate the design three times and vote among the outputs
b) Reduce the sensitivity of the design to soft errors by careful selection of the resources used
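The two ideas above that are easiest to show in software are TMR voting and CRC-based configuration checking. The following is a minimal Python sketch, not a vendor implementation: the function names `majority_vote` and `configuration_is_intact` are hypothetical, the "configuration frame" is just a byte string, and `zlib.crc32` stands in for whatever CRC a real device uses.

```python
import zlib

def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of the outputs of three design replicas."""
    return (a & b) | (a & c) | (b & c)

def configuration_is_intact(frame: bytes, golden_crc: int) -> bool:
    """Compare the CRC of a read-back frame with the stored reference CRC."""
    return zlib.crc32(frame) == golden_crc

# TMR: one replica suffers a bit flip; the voter masks it.
good_output = 0b1011
upset_output = good_output ^ 0b0100  # single flipped bit
voted = majority_vote(good_output, upset_output, good_output)

# CRC check: a single flipped configuration bit changes the CRC.
golden_frame = bytes([0x3C, 0xA5, 0x00, 0xFF])   # hypothetical frame contents
golden_crc = zlib.crc32(golden_frame)
corrupted_frame = bytes([0x3C, 0xA5, 0x01, 0xFF])
```

Note that the voter only masks the error; the upset configuration bit stays wrong until the configuration memory is rewritten.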
7 Circuit Level [Ebrahimi]: Reduce the number of SRAM cells in a switch box (6 → 5), (6 → 4)
9 User Design Level
Care bits [Golshan07]:
− Only a subset of the configuration bits can affect the design when hit by an SEU.
− Example: resource A is used for net A.
− The A-B SRAM bit is not a care bit if B is not used by other nets.
− The A-C SRAM bit is a care bit (a change to ‘1’ hurts net A).
− The A-D SRAM bit is not a care bit (w.r.t. net A) if D is not used.
10 User Design Level
Soft Error Routing Problem [Golshan07]:
− Given a routing graph and a set of multi-terminal nets, route each net with the least care-cost, where the care-cost is the number of routing care bits.
Experiments:
− 14% reduction in the number of care bits
− ~80% of soft errors in the FPGA occur in the configuration memory [Kuon07]
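As an illustration of routing by care-cost, the sketch below runs Dijkstra's algorithm on a small routing graph whose edge weights are care-bit counts. It is a simplification and not the algorithm of [Golshan07]: it routes a single two-terminal net, and the graph, node names, and per-edge care-bit counts are made up for the example.

```python
import heapq

def least_care_cost_path(graph, source, sink):
    """Dijkstra's algorithm where each edge weight is its care-bit count."""
    best = {source: 0}
    parent = {}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == sink:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, care_bits in graph.get(node, {}).items():
            new_cost = cost + care_bits
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                parent[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if sink not in best:
        return None, float("inf")
    # Walk the parent links back from the sink to recover the route.
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return path[::-1], best[sink]

# Hypothetical routing graph: node -> {neighbor: care bits added by that hop}.
routing_graph = {
    "src": {"w1": 3, "w2": 1},
    "w1": {"dst": 1},
    "w2": {"w3": 1},
    "w3": {"dst": 1},
}
path, care_cost = least_care_cost_path(routing_graph, "src", "dst")
```

Here the shorter route through `w1` costs 4 care bits, so the router prefers the longer route through `w2` and `w3`, which costs 3.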
12 Process Variation Sources
[Figure: measured variation across Wafer X and Wafer Y; courtesy of IBM, Intel and TSMC]
13 Variation
[Figure panels: variation over the years; variation from the mean value. ILD: inter-layer dielectric]
− Gate oxides are so thin that a change of one atom can cause a 25 percent difference in substrate current (EE Times, 04/11/2006).
14 Statistical Description
− The underlying deterministic and random contributions are lumped into a single “random” statistical description.
− The distribution (mean and variance) of L for devices across one wafer can differ from the distribution for devices within a single die.
15 Inter-die vs. Intra-die Variations
[Figures, courtesy of IBM, Intel and TSMC: intra-die spatial correlation and inter-die global correlation of L_eff]
16 Impact of Variation
Importance of variation:
− Timing violations
− Yield loss
17 Impact of Variation
Process variations can cause up to 2000% variation in leakage current and 30% variation in frequency in 180nm CMOS.
− Borkar, S., Karnik, T., Narendra, S., Tschanz, J., Keshavarzi, A., De, V. Parameter Variations and Impact on Circuits and Microarchitecture. In Proc. of DAC (2003), 338-342.
18 Impact of Variation
[Figure: die-to-die frequency variation]
19 Variation in FPGA
Binning:
Historically, most of the variation was between dies:
− FPGA manufacturers test the speed of each FPGA after manufacturing and bin each device according to its speed.
− Higher speeds: more expensive
− Unacceptable leakage power: discard the device
More recently, there is significant within-die variation:
− It cannot be leveraged in the same manner
− Operating speeds must be reduced to maintain functionality
− 90nm: speed reduction of 5.7%
− 22nm: speed reduction of 22.4%
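The binning flow described above can be sketched in a few lines of Python. This is only an illustration: the function name `assign_speed_bin`, the bin names, the frequency floors, and the leakage limit are hypothetical and do not correspond to any vendor's speed grades.

```python
def assign_speed_bin(fmax_mhz, leakage_ma, bin_floors_mhz, leakage_limit_ma):
    """Return the fastest bin the device qualifies for, or None if it is discarded."""
    if leakage_ma > leakage_limit_ma:
        return None  # unacceptable leakage power: discard the device
    # Try the fastest (most expensive) bin first.
    ordered = sorted(bin_floors_mhz.items(), key=lambda item: item[1], reverse=True)
    for grade, floor_mhz in ordered:
        if fmax_mhz >= floor_mhz:
            return grade
    return None  # slower than the slowest bin

# Hypothetical bin floors and leakage limit.
BIN_FLOORS_MHZ = {"fast": 500.0, "medium": 450.0, "slow": 400.0}
LEAKAGE_LIMIT_MA = 30.0

# (measured maximum frequency in MHz, measured leakage in mA) per device
measured = [(520.0, 12.0), (460.0, 18.0), (530.0, 45.0), (390.0, 10.0)]
bins = [assign_speed_bin(f, i, BIN_FLOORS_MHZ, LEAKAGE_LIMIT_MA) for f, i in measured]
```

Binning of this kind assigns one grade per device, which is why it captures die-to-die variation but cannot exploit variation within a die.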
20 Solutions
Architectural solutions:
1. Select the logic block architecture parameters to minimize the impact of variation
− LUT size is particularly important [Wong05]
− LUT size = 4: highest leakage yield
− LUT size = 7: highest timing yield
− LUT size = 5: maximum combined leakage and timing yield
2. Adaptively compensate for any variation through body biasing [Nabaa06]:
− Slow blocks: set a body bias that decreases V_t → increases the block’s speed
− Fast blocks: increase the threshold voltage → reduces leakage power
Experiments:
− Area penalty: 1%–2%
− Delay variability reduction: 30%
− Leakage variability reduction: 78%
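The body-biasing policy can be written as a simple per-block decision. The sketch below is an assumption-laden illustration, not the circuit of [Nabaa06]: the function name `choose_body_bias`, the 5% tolerance band, and the labels "forward" (lower V_t) and "reverse" (higher V_t) are chosen here for the example.

```python
def choose_body_bias(block_delay_ns, target_delay_ns, tolerance=0.05):
    """Pick a body-bias setting for one logic block from its measured delay."""
    if block_delay_ns > target_delay_ns * (1 + tolerance):
        return "forward"  # slow block: lower the threshold voltage to speed it up
    if block_delay_ns < target_delay_ns * (1 - tolerance):
        return "reverse"  # fast block: raise the threshold voltage to cut leakage
    return "nominal"      # within tolerance: leave the block alone

# Hypothetical measured delays (ns) for three blocks against a 1.0 ns target.
block_delays_ns = {"blk0": 1.20, "blk1": 0.80, "blk2": 1.00}
settings = {name: choose_body_bias(d, 1.0) for name, d in block_delays_ns.items()}
```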
21 Solutions
CAD level:
1. Statistical static timing analysis (SSTA) in FPGA CAD tools
− Improves delays by avoiding the margins that traditional STA requires
2. Testing multiple logically equivalent configurations of the FPGA to find one that is functional at the desired speed [Sedcole07]
3. Generating critical paths that are more robust in the face of variation [Matsumoto07]
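A small worked example shows where the SSTA margin saving comes from. The sketch assumes, for illustration only, that every stage delay on a path is an independent Gaussian and that both analyses guard against a 3-sigma deviation; real SSTA must also handle correlated delays and the max operation at path merges. The function names and the stage values are hypothetical.

```python
import math

def corner_delay(stages, k=3.0):
    """Traditional STA: every stage is at its own k-sigma worst case at once."""
    return sum(mean + k * sigma for mean, sigma in stages)

def statistical_delay(stages, k=3.0):
    """Statistical view: means add, and independent variances add."""
    total_mean = sum(mean for mean, _ in stages)
    total_sigma = math.sqrt(sum(sigma ** 2 for _, sigma in stages))
    return total_mean + k * total_sigma

# Hypothetical path: four stages, each 1.0 ns mean with 0.1 ns standard deviation.
stages = [(1.0, 0.1)] * 4
worst_case_ns = corner_delay(stages)        # 4 * (1.0 + 0.3), about 5.2 ns
statistical_ns = statistical_delay(stages)  # 4.0 + 3 * 0.2, about 4.6 ns
```

Under these assumptions the path can be signed off roughly 0.6 ns faster, because it is very unlikely that all four stages are slow at the same time.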
22 Inter-die vs. Intra-die Variations
P_0 = nominal design value
ΔP_intradie = intra-die variation (within a given chip)
ΔP_interdie = inter-die variation (from one chip to another)
ΔP_e = remaining “random” or unexplained variation
P: a structural or electrical parameter, e.g.
− W
− t_ox
− V_th
− channel mobility
− coupling capacitances
− line resistances
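The terms above are commonly combined additively, P = P_0 + ΔP_interdie + ΔP_intradie + ΔP_e; the slide lists the terms without the sum, so the additive form is an assumption here. The Monte Carlo sketch below samples a parameter under that model. It further assumes zero-mean Gaussian components and ignores the spatial correlation of intra-die variation; the function name and all numeric values are hypothetical.

```python
import random

def sample_parameter(nominal, sigma_inter, sigma_intra, sigma_random,
                     n_dies, n_devices, seed=0):
    """Sample P = P_0 + dP_interdie + dP_intradie + dP_e for devices on several dies."""
    rng = random.Random(seed)
    dies = []
    for _ in range(n_dies):
        d_inter = rng.gauss(0.0, sigma_inter)  # one shift shared by the whole die
        devices = []
        for _ in range(n_devices):
            d_intra = rng.gauss(0.0, sigma_intra)  # differs per device within the die
            d_e = rng.gauss(0.0, sigma_random)     # remaining unexplained part
            devices.append(nominal + d_inter + d_intra + d_e)
        dies.append(devices)
    return dies

# Hypothetical example: threshold voltage of 0.30 V, sigmas in volts.
vth_samples = sample_parameter(0.30, 0.02, 0.01, 0.005, n_dies=5, n_devices=100)
die_means = [sum(devices) / len(devices) for devices in vth_samples]
```

Comparing `die_means` against the spread inside each die separates the die-to-die shift from the within-die spread.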
23 Corner Analysis
PRCA (Process Corner Analysis):
Takes
1. nominal values of the process parameters
2. a delta for each parameter by which it varies
Finds
− performance as max and min values
Pros: simple
Cons: conservative, inaccurate
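The PRCA procedure amounts to evaluating the performance function at every combination of nominal ± delta. The sketch below shows this with `itertools.product`; the function name `corner_analysis`, the toy RC-delay model, and the parameter values are assumptions made for the example.

```python
from itertools import product

def corner_analysis(performance, nominal, delta):
    """Evaluate performance at all 2^n process corners; return (min, max)."""
    names = list(nominal)
    values = []
    for signs in product((-1, 1), repeat=len(names)):
        corner = {name: nominal[name] + sign * delta[name]
                  for name, sign in zip(names, signs)}
        values.append(performance(**corner))
    return min(values), max(values)

def rc_delay(r_ohm, c_pf):
    """Toy performance model: delay proportional to R times C."""
    return r_ohm * c_pf

# Hypothetical nominal values and deltas.
best, worst = corner_analysis(
    rc_delay,
    nominal={"r_ohm": 100.0, "c_pf": 2.0},
    delta={"r_ohm": 10.0, "c_pf": 0.5},
)
```

The number of corners doubles with every parameter, and the result assumes all parameters sit at their extremes simultaneously, which is why the method is simple but conservative.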
24 Corner Analysis
PRCA shortcoming:
− Process corners are believed to coincide with performance corners.
− Fact: the best-case corner may not depend on P_min or P_max for a particular interconnect parameter but on a value within that range.
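A toy example makes the shortcoming concrete. Assume, purely for illustration, a wire whose delay has one term that falls with width and one that rises with width; the model `wire_delay` and its coefficients are hypothetical. Checking only the two process corners misses the best case, which lies inside the range.

```python
def wire_delay(width):
    """Toy model: a resistive term falls with width, a capacitive term rises."""
    return 4.0 / width + 1.0 * width

w_min, w_max = 1.0, 4.0

# Corner analysis looks only at the two extremes of the parameter.
corner_best = min(wire_delay(w_min), wire_delay(w_max))

# A sweep over the whole range finds a better (smaller) delay in the interior.
sweep = [w_min + i * (w_max - w_min) / 300 for i in range(301)]
true_best = min(wire_delay(w) for w in sweep)
best_width = min(sweep, key=wire_delay)
```

Both corners give a delay of 5.0, while the sweep finds a delay of about 4.0 near a width of 2.0, so the performance corner does not coincide with either process corner.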
27 References
[Kuon07] Kuon, Tessier, “FPGA Architecture: Survey and Challenges,” Foundations and Trends in Electronic Design Automation, Vol. 2, No. 2 (2007) 135–253.
[Lin07] Y. Lin and L. He, “Device and Architecture Concurrent Optimization for FPGA Transient Soft Error Rate,” in ICCAD, 2007.
[Golshan07] S. Golshan and E. Bozorgzadeh, “Single-event-upset (SEU) awareness in FPGA routing,” in DAC ’07.
[Xilinx] www.xilinx.com
[Altera] www.altera.com
[Wong05] H.-Y. Wong, L. Cheng, Y. Lin, and L. He, “FPGA device and architecture evaluation considering process variations,” in ICCAD, 2005.
[Nabaa06] G. Nabaa, N. Azizi, and F. N. Najm, “An adaptive FPGA architecture with process variation compensation and reduced leakage,” in DAC, 2006.
[Sedcole07] P. Sedcole and P. Y. K. Cheung, “Parametric yield in FPGAs due to within-die delay variations: A quantitative analysis,” in FPGA, 2007.
[Matsumoto07] Y. Matsumoto, M. Hioki, T. Kawanami, T. Tsutsumi, T. Nakagawa, T. Sekigawa, and H. Koike, “Performance and yield enhancement of FPGAs with within-die variation using multiple configurations,” in FPGA, 2007.