
1 Self-calibrated Online Wearout Detection
University of Michigan, Electrical Engineering and Computer Science
Authors: Jason Blome, Shuguang Feng, Shantanu Gupta, Scott Mahlke
MICRO-40, December 3, 2007

2 Motivation
- “Designing Reliable Systems from Unreliable Components…” - Shekhar Borkar (Intel)
- [Srinivasan, DSN’04] [Borkar, MICRO’05]
- Figure annotations: more failures to come; failures will be wearout induced

3 Current Approaches
- Traditional
  - Design margins
  - Burn-in
- Detection: based on replication of computation
  - TMR (Tandem/HP NonStop servers)
  - DIVA (Bower, MICRO’05)
- Prediction: utilizes precise analytical models and/or sensors
  - Canary circuits (Sentinel Silicon, Ridgetop)
  - RAMP (Srinivasan, UIUC/IBM)
Slide annotations: Costly, Static, Impractical

4 Wearout Mechanisms
- Many failure mechanisms have been shown to be progressive
  - Hot carrier injection (HCI)
  - Electromigration (EM)
  - Oxide breakdown (OBD)
  - Negative bias temperature instability (NBTI)

5 Objective
- Propose a failure prediction technique that exploits the progressive nature of wearout
- Monitor impact on path delays
Prediction:
- Monitors evolution of wearout
- Proactive: enables failure avoidance/mitigation
- Continuous feedback
- False negatives and positives
Detection:
- Identifies existing fault
- Reactive: enables failure recovery
- End-of-life feedback
- False negatives

6 Oxide Breakdown (OBD)
- Accumulation of defects leads to a conductive path
Figure: percolation model [Stathis, JAP’06]

7 OBD HSPICE Model
- Post-breakdown leakage modeling [Rodriguez, Stathis, Linder, IRPS’03] [BSIM4.6.0, ’06]

8 Characterization Testbench
- 90nm standard cell library
Figure labels: t_circuit, t_cell

9 Impact on Propagation Delay

10 Delay Profiling Unit (DPU)
Figure labels: uArch module, input signal, latency sampling

11 TRIX Analysis
- The magnitude of the divergence between TRIX global and TRIX local reflects the amount of degradation

12 TRIX Analysis Details
- Exponential Moving Average (EMA)
- Triple-smoothed Exponential Moving Average (TRIX)
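The equations on this slide did not survive in the transcript. As a minimal sketch, the code below uses the standard EMA recurrence, y[t] = alpha * x[t] + (1 - alpha) * y[t-1], and follows the slide's wording by treating TRIX as an EMA applied three times. In financial analysis, TRIX usually means the rate of change of that triple-smoothed series, and that variant is not shown here. The function names, the smoothing factor `alpha`, the seeding rule, and the sample values are all assumptions made for illustration.

```python
from typing import Iterable, List


def ema(samples: Iterable[float], alpha: float) -> List[float]:
    """Exponential moving average: y[t] = alpha * x[t] + (1 - alpha) * y[t-1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out: List[float] = []
    for x in samples:
        # Assumption: seed the average with the first sample.
        prev = out[-1] if out else x
        out.append(alpha * x + (1.0 - alpha) * prev)
    return out


def trix(samples: Iterable[float], alpha: float) -> List[float]:
    """Triple-smoothed EMA: the EMA applied three times in sequence."""
    return ema(ema(ema(samples, alpha), alpha), alpha)


# Hypothetical latency samples (percent of nominal delay) with one outlier.
latencies = [100.0, 100.0, 130.0, 100.0, 100.0]
single = ema(latencies, alpha=0.5)
triple = trix(latencies, alpha=0.5)
print(single)  # [100.0, 100.0, 115.0, 107.5, 103.75]
print(triple)  # [100.0, 100.0, 103.75, 105.625, 105.625]
```

In this example the outlier lifts the single EMA to 115.0 but the triple-smoothed value only to 105.625. The extra smoothing is useful on a noisy latency profile, and its cost is a slower response to genuine change.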

13 Noisy Latency Profile
Figure axes: percent nominal delay (%) versus increasing age

14 DPU with TRIX Hardware
Figure labels: input signal, latency sampling, TRIX_l calculation, TRIX_g calculation, prediction
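The pipeline in this figure (latency sampling, then local and global TRIX calculations, then a prediction) can be sketched in software as a streaming computation. This is an illustration under stated assumptions, not the hardware design from the talk. It assumes TRIX_l uses a larger smoothing factor so it follows recent latency, TRIX_g uses a smaller one so it holds a long-term baseline, and a warning is raised when their difference exceeds a threshold. The class name, function name, both `alpha` values, the threshold, and the latency traces are hypothetical.

```python
from typing import List, Optional, Sequence


class TrixTracker:
    """Streaming triple-smoothed EMA that keeps three running stages."""

    def __init__(self, alpha: float) -> None:
        self.alpha = alpha
        self.stages: Optional[List[float]] = None

    def update(self, x: float) -> float:
        if self.stages is None:
            # Assumption: seed all three stages with the first sample.
            self.stages = [x, x, x]
        else:
            value = x
            for i in range(3):
                # Each stage smooths the output of the stage before it.
                self.stages[i] = self.alpha * value + (1.0 - self.alpha) * self.stages[i]
                value = self.stages[i]
        return self.stages[2]


def first_warning(latencies: Sequence[float],
                  alpha_local: float = 0.5,
                  alpha_global: float = 0.05,
                  threshold: float = 2.0) -> Optional[int]:
    """Return the index of the first sample where TRIX_l - TRIX_g exceeds
    the threshold, or None if the two never diverge that far."""
    local = TrixTracker(alpha_local)
    global_ = TrixTracker(alpha_global)
    for i, x in enumerate(latencies):
        divergence = local.update(x) - global_.update(x)
        if divergence > threshold:
            return i
    return None


# Hypothetical traces in percent of nominal delay.
healthy = [100.0] * 50
aging = [100.0] * 20 + [100.0 + 0.5 * k for k in range(1, 41)]
print(first_warning(healthy))
print(first_warning(aging))
```

With these illustrative parameters a flat trace never raises a warning, while a steadily rising trace does once the fast average pulls away from the slow one. A single large outlier can also cross the threshold at these settings, so the smoothing factors and threshold would need tuning against a real latency profile.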

15 Wearout Detection Unit (WDU)
Figure labels: latency sampling, TRIX_l calculation, TRIX_g calculation, prediction

16 Evaluation Framework
Figure labels:
- OR1200 Verilog
- Synthesis and Place and Route
- Timing, Power, and Temperature Simulations
- MediaBench Suite
- 90nm Library
- Fully Synthesized, P&R, OR1200 Core
- Monte Carlo Simulator
- OBD Wearout Model
- HSPICE Simulations
- Gate-level Processor Simulator
- Workload Simulator
- Wearout Simulator

17 WDU Accuracy

18 WDU Overhead

19 WDU Overhead

20 Long-term Vision
- Introspective Reliability Management (IRM)
  - Intelligent reliability management directed by on-chip sensor feedback
- Prospective sensors
  - Delay (WDU)
  - Leakage/Vt
  - Temperature

21 Introspective Reliability Management

22 Conclusions
- Many progressive wearout phenomena impact device-level performance
- It’s possible to characterize this impact and anticipate failures
- WDU performance
  - Failure predicted within 20% of end of life (tunable)
  - Area overhead < 3% (hybrid)
- Low-level sensors can be used to enable intelligent reliability management

23 Questions?

