1 IVF: Characterizing the Vulnerability of Microprocessor Structures to Intermittent Faults
  Songjun Pan 1,2, Yu Hu 1, and Xiaowei Li 1
  1 Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences
  2 Graduate University of Chinese Academy of Sciences

2 Outline
  Background and Related Work
  IVF Computing Methodology
  Experimental Results
  Conclusions

3 Background
  Intermittent faults are emerging as a major source of failures in microprocessors [DSN’02]
  [Figure: failure rate versus lifetime across the infant mortality, useful life, and wear-out stages; for the deep submicron era the curve is annotated with defect escape, soft errors, faster aging, and intermittent faults]

4 Intermittent Faults
  Description
    Occur frequently and irregularly for a period of time
    Caused by loose connections, manufacturing residuals, process variation, or in-progress wear-out, combined with voltage and temperature fluctuations
  Characteristics
    Occur in bursts at the same location
    Removed if the offending circuit is replaced
    Activated or deactivated by PVT (process, voltage, and temperature) variations

5 Protecting the Microprocessor
  Information redundancy techniques
    Parity and error-correcting codes
      – High area overhead
      – High power consumption
  Hardware redundancy techniques
    Dual modular redundancy / triple modular redundancy
      – 100%~200% area overhead
  Software redundancy techniques
    Redundant multi-threading
      – 10%~30% performance overhead
  Conventional protection methods ensure high reliability but also incur high overhead

6 Trade-off Between Reliability and Overhead
  Key observation: not all faults lead to external program failures
    A fault in the branch predictor: doesn’t matter at all
    A fault in the program counter: almost always matters
  Which bits matter? ACE bits vs. un-ACE bits
    Architecturally Correct Execution (ACE) bit [MICRO’03]
    ACE bit: a bit that, if changed, leads to an externally visible error
  Reliability evaluation
    Protect the most vulnerable structures

7 Related Metrics
  Mean Time To Failure (MTTF) / Mean Time Between Repair (MTBR)
  Masking effect
  Structure utilization
  Soft error vulnerability analysis
    Architectural Vulnerability Factor (AVF) [MICRO’03]
    Program Vulnerability Factor (PVF) [HPCA’09]
  Hard fault vulnerability analysis
    Hard-Faults AVF (H-AVF) [SIGMETRICS’06]
  Vulnerability to intermittent faults is rarely considered because of their varied causes and behaviors
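
For context on the AVF metric cited on this slide: AVF is commonly computed as the fraction of a structure's bit-cycles that hold ACE bits. A minimal Python sketch of that ratio follows. The function name and the interval format are illustrative assumptions, not taken from the slides.

```python
def avf(ace_intervals, num_bits, total_cycles):
    """Fraction of a structure's bit-cycles that are ACE.

    ace_intervals: iterable of (bit_index, start_cycle, end_cycle) with
    half-open [start, end) ranges during which that bit is ACE.
    (Hypothetical input format chosen for this sketch.)
    """
    ace_bit_cycles = sum(end - start for _bit, start, end in ace_intervals)
    return ace_bit_cycles / (num_bits * total_cycles)


# Two bits observed for 100 cycles: bit 0 is ACE for 50 cycles, bit 1 for 10.
example = avf([(0, 0, 50), (1, 10, 20)], num_bits=2, total_cycles=100)
```

Here 60 of the 200 bit-cycles are ACE, so the sketch yields 0.3.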

8 Our Contributions
  Propose a metric, the Intermittent Vulnerability Factor (IVF), to characterize vulnerability to intermittent faults
    IVF definition: a structure’s IVF is the probability that an intermittent fault in that structure causes an externally visible error
  Present IVF computing algorithms for the reorder buffer and register file
  Compute IVF under different fault configurations

9 Intermittent Fault Models
  [Diagram: causes and mechanisms mapped to fault models at the logic level]
  Causes and mechanisms: solder joint, inductive noise, electromigration, crosstalk, soft breakdown, variation of metal R&C, fluctuation of leakage current, manufacturing residues, timing violations, oxide breakdown, intermittent contacts
  Locations named in the diagram: cell, memory, buses, interconnection lines, power supply
  Fault models at the logic level: intermittent stuck-at, intermittent short, intermittent open, intermittent pulse, intermittent delay, intermittent indetermination

10 Intermittent Stuck-at Faults
  Intermittent stuck-at faults
    Intermittently change the correct value to logic one or logic zero
    Vulnerable structures: storage structures such as memory and the register file
  Key parameters
    Burst length / active time / inactive time
    Faults have an adverse effect during the active time
  [Figure: timeline of a fault burst, marking burst length, active time, and inactive time]
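
The stuck-at behaviour described on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions: a burst is taken to begin with an active period and then alternate active and inactive periods until the burst length is exhausted, and all function and parameter names are hypothetical.

```python
def fault_is_active(cycle, start, burst_length, active_time, inactive_time):
    """True if the fault is asserted at `cycle`.

    Assumption for this sketch: the burst starts at `start`, lasts
    `burst_length` cycles, and alternates active and inactive periods,
    beginning with an active one.
    """
    offset = cycle - start
    if offset < 0 or offset >= burst_length:
        return False  # outside the burst: the fault has no effect
    return offset % (active_time + inactive_time) < active_time


def read_bit(stored_bit, cycle, stuck_value, **burst):
    """Value observed when reading a bit under an intermittent stuck-at fault."""
    return stuck_value if fault_is_active(cycle, **burst) else stored_bit


# Hypothetical burst: starts at cycle 100, 40 cycles long, 5 on / 5 off.
burst = dict(start=100, burst_length=40, active_time=5, inactive_time=5)
```

With this model a stored 0 reads back as 1 only while a stuck-at-1 fault is in an active period; a stored 1 is unaffected by the same fault, which is one way a fault can be masked.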

11 IVF Computing
  Determine whether an intermittent fault affects program execution
    Analyze ACE bits / critical time
  Set the three key parameters: burst length, active time, and inactive time
    Burst length: randomly generated from [10T, 30T]
    Duty cycle: 50%
    Start time: randomly generated
  Compute IVFs for the reorder buffer and register file
  [Figure: timeline of a fault burst, marking burst length, active time, and inactive time]
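
One way to generate a fault burst from the parameters listed on this slide is sketched below in Python. The slide does not define the unit T, so it is passed in as `t_unit`. The names, the choice of cycles as the time unit, and the convention that the duty cycle equals active time divided by (active time + inactive time) are assumptions of this sketch, not the authors' implementation.

```python
import random


def generate_burst(total_cycles, t_unit, active_time, duty_cycle=0.5, rng=None):
    """Draw one intermittent-fault burst and list its active intervals.

    Assumptions of this sketch: burst length is uniform over the integers in
    [10*t_unit, 30*t_unit]; the start cycle is uniform over the simulation;
    the burst begins with an active period.
    """
    rng = rng or random.Random()
    burst_length = rng.randint(10 * t_unit, 30 * t_unit)  # inclusive bounds
    start = rng.randrange(total_cycles)
    period = round(active_time / duty_cycle)  # active + inactive time
    end = start + burst_length
    intervals = []
    t = start
    while t < end:
        # Clip the last active period to the end of the burst.
        intervals.append((t, min(t + active_time, end)))
        t += period
    return {"start": start, "burst_length": burst_length,
            "active_intervals": intervals}


# Hypothetical configuration: T = 10 cycles, 5-cycle active time, seeded RNG.
burst = generate_burst(total_cycles=100_000, t_unit=10, active_time=5,
                       rng=random.Random(0))
```

The returned intervals are half-open `[start, end)` cycle ranges. A burst that begins near the end of the simulation may extend past `total_cycles`; a caller would need to clip it.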

12 IVF Computing – Reorder Buffer
  ACE bit analysis
  [Figure: ACE bits plotted over bit (X), entry (Y), and cycle (Z) axes, with a planar representation over time showing an example intermittent fault, its active and inactive times, and the labels B1, B2, and B3]
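
The idea of combining ACE bit analysis with a fault's active times can be illustrated with the Python sketch below. It is a simplification and not the authors' algorithm: it counts a fault as harmful whenever any active period overlaps a time range in which the affected bit is ACE, and it ignores the case where the stuck value equals the stored value. All names and the data layout are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a = [a0, a1) and b = [b0, b1) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def fault_hits_ace(active_intervals, ace_intervals):
    """True if any active period of the fault overlaps any ACE time range."""
    return any(overlaps(a, e) for a in active_intervals for e in ace_intervals)


def estimate_ivf(fault_samples, ace_by_bit):
    """Fraction of sampled faults that overlap ACE time of the bit they hit.

    fault_samples: list of (bit_id, active_intervals) pairs.
    ace_by_bit: dict mapping bit_id to a list of ACE intervals.
    """
    hits = sum(fault_hits_ace(active, ace_by_bit.get(bit, ()))
               for bit, active in fault_samples)
    return hits / len(fault_samples)


ace_by_bit = {0: [(10, 20)], 1: [(0, 5), (30, 40)]}
samples = [
    (0, [(0, 5), (12, 14)]),  # second active period falls in ACE time
    (0, [(20, 25)]),          # starts exactly as ACE time ends: no overlap
    (1, [(4, 6)]),            # overlaps the ACE range (0, 5)
    (2, [(0, 100)]),          # bit 2 is never ACE
]
ivf = estimate_ivf(samples, ace_by_bit)
```

Two of the four sampled faults overlap ACE time, so this toy estimate is 0.5.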

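13 IVF Computing – Register File
  Critical time analysis
  [Figure: timeline of register version n, between versions n-1 and n+1, marking allocation, write (W), reads R1, R2, …, R_last, and deallocation; one critical-time interval and two non-critical intervals are labeled, along with example faults F1, F2, and F3]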
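
A critical-time check for one register version might look like the Python sketch below. The slide's figure does not survive in this transcript, so the sketch assumes that the critical time runs from the write (W) to the last read (R_last), and that the periods before the write and after the last read are non-critical. That reading, and every name used here, is an assumption.

```python
def critical_window(write_cycle, read_cycles):
    """Half-open cycle range [write, last read) assumed to be critical.

    Returns None when the value is never read, since a corrupted value that
    nobody reads cannot propagate.
    """
    if not read_cycles:
        return None
    return (write_cycle, max(read_cycles))


def fault_in_critical_time(active_intervals, window):
    """True if any active period of the fault overlaps the critical window."""
    if window is None:
        return False
    lo, hi = window
    return any(start < hi and lo < end for start, end in active_intervals)


# Hypothetical register version: written at cycle 120, read three times.
window = critical_window(write_cycle=120, read_cycles=[130, 145, 160])
```

Under this assumption a fault active only before cycle 120 or only after cycle 160 would be treated as harmless for this register version.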

14 Experimental Setup
  Simulated processor configuration
    Execution-driven simulator: Sim-Alpha
    Reorder buffer / register file: 80/80 entries
    4 integer ALUs, 2 integer multipliers, 2 floating-point ALUs
    Hybrid branch predictor: 4K global + 2-level 1K local + 4K choice
    64KB 2-way L1 data cache, 2MB direct-mapped L2 cache
  Workload
    SPEC2000 integer benchmark suite
    Simulate 100M instructions with SimPoint

15 IVF vs AVF
  Reorder buffer
    IVF varies significantly across benchmarks
    Longer burst length, higher IVF
    IVF is much higher than AVF

16 Different Fault Configurations
  Reorder buffer
    IVF varies little across burst-length configurations
    IVF varies significantly with active time

17 IVF at Entry Level
  Register file
    IVF varies across entries
    Architectural registers are more vulnerable
  [Figure labels: architectural registers, renaming registers]

18 Implications
  Quantitatively guide reliability design at an early design stage and evaluate system reliability
  Harden selected structures/entries for high reliability while minimizing overhead
    Razor [MICRO’03]
    Parshield [DSN’07]
  Easily extended to analyze other structures (issue queue, load/store queue, and cache)

19 Conclusions
  Propose a methodology to characterize the vulnerability of microprocessor structures to intermittent faults
  Compute IVF for the reorder buffer and register file
  IVF varies significantly across and within structures, which motivates protecting the most vulnerable structures to improve system reliability

20 Thank You for Your Attention
  Questions?
