Online Timing Variation Tolerance for Digital Integrated Circuits Guihai Yan & Xiaowei Li State Key Laboratory of Computer Architecture, Institute of Computing.


1 Online Timing Variation Tolerance for Digital Integrated Circuits Guihai Yan & Xiaowei Li State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS)

2 Sources of timing variation PVT variation Dynamic: Voltage & Temperature fluctuations Static: Process variation Aging degradation NBTI, PBTI, TDDB Soft errors (in non-regular logic) SEU & SET

3 Process variation Systematic: sub-wavelength lithography (what you get is not what you want) Random: random dopant fluctuations, Vth variation Maximum frequency differs by 20% [Teodorescu, ISCA08] P variation is time-independent: a DC component

4 Temperature variation Application-specific Slow-varying Milliseconds Typical thermal constant : 2ms [Donald, ISCA06] T variation is slow-varying, Low-frequency components

5 Voltage variation Fast-changing Inductive noise, a.k.a. the L(di/dt) problem IR-drop Why is it harder to keep a constant voltage level? Example: Power budget: 100 W, Working voltage: 1 V, Current: 100 A. To keep voltage fluctuation within ±5%, R_PDN < 0.5 mOhm PDN hierarchy model V variation is fast-changing: high-frequency components
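The arithmetic behind the R_PDN bound on this slide can be checked with a minimal sketch. It assumes a purely resistive (IR-drop) budget and ignores the inductive L(di/dt) term; the function name `max_pdn_resistance` and its parameters are illustrative, not from the talk.

```python
def max_pdn_resistance(power_w, vdd_v, tolerance):
    """Largest PDN resistance (ohms) that keeps the IR drop within tolerance * Vdd.

    Assumption: only the resistive drop is modelled; inductive noise is ignored.
    """
    current_a = power_w / vdd_v          # I = P / V  -> 100 W / 1 V = 100 A
    allowed_drop_v = tolerance * vdd_v   # 5% of 1 V = 50 mV
    return allowed_drop_v / current_a    # R = dV / I


# Numbers from the slide: 100 W budget, 1 V supply, +/-5% fluctuation allowed.
r_max = max_pdn_resistance(power_w=100.0, vdd_v=1.0, tolerance=0.05)
print(f"R_PDN must stay below {r_max * 1e3:.2f} mOhm")
```

With the slide's numbers the allowed drop is 50 mV at 100 A, which gives the 0.5 mOhm limit quoted above.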

6 Aging degradation Aging mechanisms NBTI (PMOS) PBTI (NMOS) TDDB 20% degradation 10 years Lifetime Useful time Infant mortality Aging Failure rate

7 Soft errors SEU (Single Event Upset): unintentional bit-flip in storage cells SET (Single Event Transient): transient voltage pulse propagating in combinational logic

8 Outline TEA-TM Timing emergency-aware thread migration PVT variations co-optimization SVFD Stability violation based fault detection On-line fault detection via timing sensing Delay fault, aging delay, soft errors MicroFix Margin-reducing with timing sensing Application to DVFS ReviveNet Aging-delay tolerance

9 TEA-TM: Timing Emergency-Aware Thread Migration Focus on the essential timing issue Process variation, voltage variation, and temperature variation all contribute to timing variation. Their effects do not necessarily aggregate; in some cases they cancel each other out. Hence, they are complementary.

10 Some terms Timing emergency (TE) Emergency level (EL): density of TE Define: EL = # of TE per 100 million cycles [Figure labels: delay vs. time with a timing emergency threshold. Violent vs. mild: slow corner vs. fast corner (process), large vs. small fluctuation (voltage), hot vs. cool (temperature)]
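The EL definition above amounts to normalizing an emergency count to a 100-million-cycle window. A minimal sketch, with a hypothetical function name and hypothetical sample counts:

```python
def emergency_level(num_emergencies, cycles, window=100_000_000):
    """Timing emergencies normalized to a 100-million-cycle window (EL)."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return num_emergencies * window / cycles


# Hypothetical sample: 50 emergencies observed over 200 million cycles.
print(emergency_level(50, 200_000_000))
```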

11 How do PVT variations complement each other? Observation in time domain Core1: T mild, V mild: large margin, low EL Core2: T violent, V violent: little margin, high EL What if we exchange the threads on Core1 and Core2? [Figure labels: delay vs. time against the threshold for (T violent, V violent), (T mild, V mild), (T mild, V violent), and (T violent, V mild); emergency; excessive headroom; mild + violent]

12 Frequency domain analysis Migrate threads Graft V component

13 Frequency domain analysis (cont.) Relative frequency spectrum deviations on a 2GHz quad-core processor. P: 0-100Hz, T: 100Hz-1MHz, V: 1MHz-250MHz. Potential: Core3 and Core4 are mild. Strategy: exchange threads on Core1 and Core4, and on Core2 and Core3.
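The exchange strategy on this slide pairs the most violent core with the mildest one. The sketch below is a deliberately simplified stand-in that ranks cores by emergency level and pairs extremes. It is not the FFT-like heuristic of TEA-TM, and the EL values are hypothetical.

```python
def pair_cores_for_exchange(el_by_core):
    """Pair the highest-EL core with the lowest-EL core, then the next pair.

    Assumption: a single EL number per core is enough to rank "violent"
    versus "mild"; the real scheme analyses delay-sensor data per band.
    """
    ranked = sorted(el_by_core, key=el_by_core.get, reverse=True)
    pairs = []
    while len(ranked) >= 2:
        most_violent = ranked.pop(0)   # highest remaining EL
        mildest = ranked.pop()         # lowest remaining EL
        pairs.append((most_violent, mildest))
    return pairs


# Hypothetical ELs in which Core3 and Core4 are the mild cores.
levels = {"Core1": 900, "Core2": 700, "Core3": 120, "Core4": 80}
print(pair_cores_for_exchange(levels))
```

For these sample values the pairing comes out as Core1 with Core4 and Core2 with Core3, matching the exchange described on the slide.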

14 TEA-TM Summary Analyzing the complementary effect from both the time and frequency domains Presenting a delay sensor-based scheme (TEA-TM) to exploit the complementary effect Simple, cost-efficient FFT-like heuristic Leveraging the Core-Level Complementary Effects of PVT Variations to Reduce Timing Emergencies in Multi-Core Processors, Guihai Yan, Xiaoyao Liang, Yinhe Han, Xiaowei Li, In the Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA'10), Saint-Malo, France, pp.485-496, Jun. 2010. Throughput: 30% Fairness: 80%

15 Stability Violation Stable Period vs. Variable Period Stability Violation: Signal transitions occur in Stable Period.
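The definition above can be illustrated in software, although the actual checker is a circuit. This sketch assumes the stable period occupies the last `stable_len` time units of every clock cycle; that placement, the function name, and the sample timestamps are assumptions made for illustration only.

```python
def stability_violations(transitions, period, stable_len):
    """Return the transition times that fall inside the stable period.

    Assumption: each cycle is [k*period, (k+1)*period) and its stable
    period is the final stable_len units; the rest is the variable period.
    Times are integers (e.g. picoseconds) to avoid rounding issues.
    """
    violations = []
    for t in transitions:
        phase = t % period                  # position inside the cycle
        if phase >= period - stable_len:    # landed in the stable period
            violations.append(t)
    return violations


# Hypothetical trace: 1000 ps clock, stable period = last 200 ps of a cycle.
trace = [150, 640, 1850, 2300, 2990]
print(stability_violations(trace, period=1000, stable_len=200))
```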

16 In what situations would SVs occur? Delay faults resulting from: delay defects (introduced in manufacturing processes); aging (wearout) induced performance degradation. [Figure labels: setup time; setup time violation due to delay fault] Thus, stability violations caused by delay faults do not differ much from setup time violations. But can soft errors be modeled by SV? YES!

17 How do soft errors cause SV? SEU: Si violates the stability requirement! SET: So violates the stability requirement! Notice: ONLY the SVs occurring in the vulnerable window (within which the flip-flops are updated) could cause failures.

18 The next problem is: how to detect stability violations? Low-cost stability checker

19 Some Results Implementation: SVFD-protected FPU Using 65nm PTM, HSPICE simulation A Unified Online Fault Detection Scheme via Checking of Stability Violation, Guihai Yan, Yinhe Han, Xiaowei Li, IEEE/ACM Design, Automation and Test in Europe (DATE'09), pp.496-501, 2009. SVFD: A Versatile Online Fault Detection Scheme via Checking of Stability Violation, Guihai Yan, Yinhe Han, Xiaowei Li, IEEE Transactions on Very Large Scale Integration Systems (T-VLSI), 19(9), Sep. 2011.

20 Besides fault detection, what else can we do with SVFD? Dynamic margin reduction: MicroFix, an application to DVFS Aging tolerance: ReviveNet, fine-grained aging delay tolerance

21 Dynamic margin reduction Timing sensors setup

22 Operational Principles

23 Fine-grained margin exploited Generous Flip-flop (GFF) Forward Adaptable Flip-flop (FAFF) Backward Adaptable Flip-flop (BAFF) Unadaptable Flip-flop (UAFF) Localized timing imbalance

24 Case study results Applied to an FPU 32nm PTM models TH=0.2~0.3 is an optimal choice! Efficiency Improvement: 35% EDP MicroFix: Using Timing Interpolation and Delay Sensors for Power Reduction, Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei Li, ACM Transactions on Design Automation of Electronic Systems (TODAES), 16(2), 1-21, 2011. MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency, Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei Li, ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED'09), pp.395-400, 2009.

25 Localized Aging Tolerance The chance for aging adaptation We have a chance to act before it's too late

26 Nudge for timing margin Dynamic time borrowing Path-grained, NOT stage-grained

27 Aging sensors setup Coarse-grained detection

28 Trial-based adaptation Adaptation latency is non-critical Trial until success Fine-grained adaptation
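The "trial until success" idea can be sketched as a simple loop: a coarse-grained sensor raises an alarm for a group, and candidates in that group are adapted one at a time until the alarm clears. The function name, the candidate list, and the `alarm_clears` callback are hypothetical placeholders, not the ReviveNet implementation.

```python
def adapt_by_trial(candidates, alarm_clears):
    """Try candidates in order until one clears the aging alarm.

    Returns (chosen_candidate, number_of_trials), or (None, trials) if no
    candidate helps. Sequential trials are acceptable here because the
    slide notes that adaptation latency is non-critical.
    """
    trials = 0
    for candidate in candidates:
        trials += 1
        if alarm_clears(candidate):   # hypothetical: apply adaptation, re-check sensor
            return candidate, trials
    return None, trials


# Hypothetical group of three flip-flops where adapting "ff1" fixes the alarm.
group = ["ff0", "ff1", "ff2"]
print(adapt_by_trial(group, lambda ff: ff == "ff1"))
```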

29 Implementation False-alarm filter Sharing filters to reduce overhead ReviveNet: A Self-adaptive Architecture for Improving Lifetime Reliability via Localized Timing Adaptation Guihai Yan, Yinhe Han, Xiaowei Li, IEEE Transactions on Computers (TC), 60(9), Sep. 2011.

30 Conclusion Dynamic timing variation is increasingly critical Online timing variation detection and tolerance is a promising approach to dynamic variation Application-specific timing variation MicroFix for DVFS ReviveNet for aging tolerance Holistic solution can be more cost-effective TEA-TM Architectural optimization for Circuit symptom

31 Publications (chronological order)
1. Guihai Yan, Yinhe Han, Xiaowei Li, ReviveNet: A Self-adaptive Architecture for Improving Lifetime Reliability via Localized Timing Adaptation, IEEE Transactions on Computers (TC), Vol.60, No.9, pp.1219-1232, Sep. 2011.
2. Guihai Yan, Yinhe Han, Xiaowei Li, SVFD: A Versatile Online Fault Detection Scheme via Checking of Stability Violation, IEEE Transactions on Very Large Scale Integration Systems (T-VLSI), Vol.19, No.9, pp.1627-1640, Sep. 2011.
3. Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei Li, MicroFix: Using Timing Interpolation and Delay Sensors for Power Reduction, ACM Transactions on Design Automation of Electronic Systems (TODAES), Vol.16, No.2, pp.1-21, Mar. 2011.
4. Jianbo Dong, Lei Zhang, Yinhe Han, Guihai Yan, Xiaowei Li, Performance-asymmetry-aware Scheduling for Chip Multiprocessors with Static Core Coupling, Journal of Systems Architecture, Vol.56, pp.534-542, 2010.
5. Guihai Yan, Xiaoyao Liang, Yinhe Han, Xiaowei Li, Leveraging the Core-Level Complementary Effects of PVT Variations to Reduce Timing Emergencies in Multi-Core Processors, In the Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA'10), Saint-Malo, France, pp.485-496, Jun. 2010.
6. Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei Li, MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency, ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED'09), pp.395-400, 2009.
7. Song Jin, Yinhe Han, Lei Zhang, Huawei Li, Xiaowei Li and Guihai Yan, M-IVC: Using Multiple Input Vectors to Minimize Aging-induced Delay, Proc. of IEEE Asian Test Symposium (ATS'09), 2009.
8. Guihai Yan, Yinhe Han, Xiaowei Li, A Unified Online Fault Detection Scheme via Checking of Stability Violation, IEEE/ACM Design, Automation and Test in Europe (DATE'09), pp.496-501, 2009.
9. Guihai Yan, Yinhe Han, Xiaowei Li, Hui Liu, BAT: Performance-Driven Crosstalk Mitigation Based on Bus-grouping Asynchronous Transmission, IEICE Transactions on Electronics, Vol.E91-C, No.10, pp.1690-1697, Oct. 2008.

32 Book Chapters Fault Tolerance Designs for Digital Integrated Circuits: Tolerating defects/faults, parameter variations, and soft errors (in Chinese), Beijing, Science Press, 2011. ISBN 978-7-03-030576-3.

33 When I've done a program…

