Daya S Khudia, Griffin Wright and Scott Mahlke

Slides:



Advertisements
Similar presentations
NC STATE UNIVERSITY 1 Assertion-Based Microarchitecture Design for Improved Fault Tolerance Vimal K. Reddy Ahmed S. Al-Zawawi, Eric Rotenberg Center for.
Advertisements

IMPACT Second Generation EPIC Architecture Wen-mei Hwu IMPACT Second Generation EPIC Architecture Wen-mei Hwu Department of Electrical and Computer Engineering.
Anshul Kumar, CSE IITD CSL718 : VLIW - Software Driven ILP Hardware Support for Exposing ILP at Compile Time 3rd Apr, 2006.
Quantitative Analysis of Control Flow Checking Mechanisms for Soft Errors Aviral Shrivastava, Abhishek Rhisheekesan, Reiley Jeyapaul, and Carole-Jean Wu.
University of Michigan Electrical Engineering and Computer Science University of Michigan Electrical Engineering and Computer Science University of Michigan.
University of Michigan Electrical Engineering and Computer Science 1 A Distributed Control Path Architecture for VLIW Processors Hongtao Zhong, Kevin Fan,
UPC Microarchitectural Techniques to Exploit Repetitive Computations and Values Carlos Molina Clemente LECTURA DE TESIS, (Barcelona,14 de Diciembre de.
This project and the research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/ ] under.
Using Hardware Vulnerability Factors to Enhance AVF Analysis Vilas Sridharan RAS Architecture and Strategy AMD, Inc. International Symposium on Computer.
NC STATE UNIVERSITY ASPLOS-XII Understanding Prediction-Based Partial Redundant Threading for Low-Overhead, High-Coverage Fault Tolerance Vimal Reddy Sailashri.
Limits on ILP. Achieving Parallelism Techniques – Scoreboarding / Tomasulo’s Algorithm – Pipelining – Speculation – Branch Prediction But how much more.
University of Michigan Electrical Engineering and Computer Science University of Michigan Electrical Engineering and Computer Science 1 Shoestring: Probabilistic.
CS 7810 Lecture 25 DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design T. Austin Proceedings of MICRO-32 November 1999.
Mitigating the Performance Degradation due to Faults in Non-Architectural Structures Constantinos Kourouyiannis Veerle Desmet Nikolas Ladas Yiannakis Sazeides.
Efficient and Flexible Architectural Support for Dynamic Monitoring YUANYUAN ZHOU, PIN ZHOU, FENG QIN, WEI LIU, & JOSEP TORRELLAS UIUC.
Transient Fault Tolerance via Dynamic Process-Level Redundancy Alex Shye, Vijay Janapa Reddi, Tipp Moseley and Daniel A. Connors University of Colorado.
LIFT: A Low-Overhead Practical Information Flow Tracking System for Detecting Security Attacks Feng Qin, Cheng Wang, Zhenmin Li, Ho-seop Kim, Yuanyuan.
PathExpander: Architectural Support for Increasing the Path Coverage of Dynamic Bug Detection S. Lu, P. Zhou, W. Liu, Y. Zhou, J. Torrellas University.
Cost-Efficient Soft Error Protection for Embedded Microprocessors
Slipstream Processors by Pujan Joshi1 Pujan Joshi May 6 th, 2008 Slipstream Processors Improving both Performance and Fault Tolerance.
Architectural and Compiler Techniques for Energy Reduction in High-Performance Microprocessors Nikolaos Bellas, Ibrahim N. Hajj, Fellow, IEEE, Constantine.
Software-Based Online Detection of Hardware Defects: Mechanisms, Architectural Support, and Evaluation Kypros Constantinides University of Michigan Onur.
Presenter: Jyun-Yan Li Multiplexed redundant execution: A technique for efficient fault tolerance in chip multiprocessors Pramod Subramanyan, Virendra.
Assuring Application-level Correctness Against Soft Errors Jason Cong and Karthik Gururaj.
Understanding the Propagation of Hard Errors to Software and Implications for Resilient System Design M. Li, P. Ramachandra, S.K. Sahoo, S.V. Adve, V.S.
Eliminating Silent Data Corruptions caused by Soft-Errors Siva Hari, Sarita Adve, Helia Naeimi, Pradeep Ramachandran, University of Illinois at Urbana-Champaign,
Implicitly-Multithreaded Processors Il Park and Babak Falsafi and T. N. Vijaykumar Presented by: Ashay Rane Published in: SIGARCH Computer Architecture.
University of Michigan Electrical Engineering and Computer Science University of Michigan Electrical Engineering and Computer Science 1 Encore: Low-Cost,
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 20 Slide 1 Critical systems development 3.
Implementing Precise Interrupts in Pipelined Processors James E. Smith Andrew R.Pleszkun Presented By: Ravikumar Source:
ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) Availability Copyright 2004 Daniel J. Sorin Duke University.
Relyzer: Exploiting Application-level Fault Equivalence to Analyze Application Resiliency to Transient Faults Siva Hari 1, Sarita Adve 1, Helia Naeimi.
University of Michigan Electrical Engineering and Computer Science University of Michigan Electrical Engineering and Computer Science 1 Bundled Execution.
Transmeta’s New Processor Another way to design CPU By Wu Cheng
Using Loop Invariants to Detect Transient Faults in the Data Caches Seung Woo Son, Sri Hari Krishna Narayanan and Mahmut T. Kandemir Microsystems Design.
11 Online Computing and Predicting Architectural Vulnerability Factor of Microprocessor Structures Songjun Pan Yu Hu Xiaowei Li {pansongjun, huyu,
Harnessing Soft Computation for Low-Budget Fault Tolerance Daya S Khudia Scott Mahlke Advanced Computer Architecture Laboratory University of Michigan,
Methodology to Compute Architectural Vulnerability Factors Chris Weaver 1, 2 Shubhendu S. Mukherjee 1 Joel Emer 1 Steven K. Reinhardt 1, 2 Todd Austin.
A Binary Agent Technology for COTS Software Integrity Anant Agarwal Richard Schooler InCert Software.
Low-cost Program-level Detectors for Reducing Silent Data Corruptions Siva Hari †, Sarita Adve †, and Helia Naeimi ‡ † University of Illinois at Urbana-Champaign,
Evaluating the Fault Tolerance Capabilities of Embedded Systems via BDM M. Rebaudengo, M. Sonza Reorda Politecnico di Torino Dipartimento di Automatica.
University of Michigan Electrical Engineering and Computer Science University of Michigan Electrical Engineering and Computer Science Efficient Soft Error.
GangES: Gang Error Simulation for Hardware Resiliency Evaluation Siva Hari 1, Radha Venkatagiri 2, Sarita Adve 2, Helia Naeimi 3 1 NVIDIA Research, 2 University.
University of Michigan Electrical Engineering and Computer Science Dynamic Voltage/Frequency Scaling in Loop Accelerators using BLADES Ganesh Dasika 1,
University of Michigan Electrical Engineering and Computer Science 1 Low Cost Control Flow Protection Using Abstract Control Signatures Daya S Khudia and.
Memory Protection through Dynamic Access Control Kun Zhang, Tao Zhang and Santosh Pande College of Computing Georgia Institute of Technology.
MAPLD 2005/213Kakarla & Katkoori Partial Evaluation Based Redundancy for SEU Mitigation in Combinational Circuits MAPLD 2005 Sujana Kakarla Srinivas Katkoori.
Improving Multi-Core Performance Using Mixed-Cell Cache Architecture
Raghuraman Balasubramanian Karthikeyan Sankaralingam
Ioannis E. Venetis Department of Computer Engineering and Informatics
Microarchitecture.
Computer Structure Multi-Threading
nZDC: A compiler technique for near-Zero silent Data Corruption
5.2 Eleven Advanced Optimizations of Cache Performance
Maintaining Data Integrity in Programmable Logic in Atmospheric Environments through Error Detection Joel Seely Technical Marketing Manager Military &
UnSync: A Soft Error Resilient Redundant Multicore Architecture
DynaMOS: Dynamic Schedule Migration for Heterogeneous Cores
Defending against malicious hardware
Hwisoo So. , Moslem Didehban#, Yohan Ko
Lecture 8: ILP and Speculation Contd. Chapter 2, Sections 2. 6, 2
Soft Error Detection for Iterative Applications Using Offline Training
15-740/ Computer Architecture Lecture 5: Precise Exceptions
NEMESIS: A Software Approach for Computing in Presence of Soft Errors
Henk Corporaal TUEindhoven 2011
Morgan Kaufmann Publishers Memory Hierarchy: Virtual Memory
InCheck: An In-application Recovery Scheme for Soft Errors
Mengjia Yan† , Jiho Choi† , Dimitrios Skarlatos,
Sampoorani, Sivakumar and Joshua
Software Techniques for Soft Error Resilience
Lois Orosa, Rodolfo Azevedo and Onur Mutlu
Presentation transcript:

Reducing the Cost of Protection against Soft Errors using Profile-Based Analysis Daya S Khudia, Griffin Wright and Scott Mahlke Computer Science and Engineering University of Michigan Ann Arbor

Soft error rate (SER) One failure per DAY per chip Past Present Future Aggressive voltage scaling (near-threshold computing) At high error rates, mainstream systems can experience unacceptable soft errors There is a need for mainstream solutions One failure per DAY per 100 chips One failure per MONTH per 100 chips 1 FIT means that the mean time before an error occurs is a billion device hours. IBM targets 114 FITs. Which would require a mean time before an SEU causes an undetected error of roughly 1000 years. [Feng’10, Shivakumar’02]

Traditional Architectural Solutions n-Modular Redundancy Increasing Fault Coverage Redundant Multi-threading Invariant Checking Instruction Duplication Performance Overhead and Fault Coverage Target Shoestring Symptom-based Traditional dual/triple – modular redundancy Run on separate hardware and compare results Mission-critical reliability w/ high hardware costs [IBM Z-series, HP NonStop] Redundant execution in a single-threaded context compiler interleaves original and redundant instructions “tunable” coverage [SWIFT, EDDI] Utilize multiple threads (temporal) instead of separate hardware (spatial) Retain high coverage but sacrifice performance costs to save area [AR-SMT, Reunion] Bridge the gap between symptom-based schemes and instruction duplication “reliability for the masses” sacrifice a little on coverage to maintain very low costs Relies on anomalous behavior to identify faults extremely cheap decent coverage [RESTORE, SWAT] Perform selective checking software invariants critical μarch structures [ARGUS, DIVA, Reddy:DSN`08] Spatial or temporal redundant execution has long been a cornerstone for detecting soft errors, with hardware DMR (dual-modular redundancy) and TMR (triplemodular redundancy) being the solutions of choice for missioncritical systems. The basic idea was to use the processor’s extra SMT contexts to run two copies of the same thread, a leading thread and a trailing thread. The leading thread places its results in a buffer, and the trailing thread verifies these results and commits the executed instructions. DIVA is less expansive alternative to full duplication utilizing a small checker core to monitor computations performed by a larger core. Reddy: Simple checks on uarch structures to ensure they are working correctly. Increasing Overheads (area, power, performance, etc.)

Fault coverage from different sources of masking (75% to 92%) Proposed Solution Increasing Fault Coverage Target solution Provide affordable reliability for commodity systems on a cheap budget Exploit cheap symptom-based fault detection Judiciously apply sw-level instruction duplication to improve coverage Cache misses Branch mispredicts Hardware exceptions Fault coverage from different sources of masking (75% to 92%) Target solution Symptom-based detection Instruction duplication-based detection Increasing Overheads (performance)

Contributions Exploit synergy between symptoms and SW-duplication A software solution to generate intelligent code to detect soft errors No user annotations are required Profile based intelligence in the analysis Memory profiling and Edge profiling Some instructions are more important than others Value profiling for generating more software symptoms Exploit statistical invariance

Outline Background System level overview Intelligent duplication Profiling Techniques Experimental evaluation Conclusions and future work

System Overview Compilation Runtime Instrument and get profile data for edge-profiling memory profiling for alias analysis value profiling Profiling Compilation Intelligent Duplication Analyze program structure Load profile data and perform selective duplication Operating System Physical Hardware Trigger Lightweight recovery based on selective symptoms (hardware exceptions) comparison fail in duplicated code Runtime One such solution is safetynet.

Compilation Code analysis and intelligent duplication Application binary Application source code intermediate representation (IR) analyses and optimizations Code analysis and intelligent duplication (IR to IR) Code generation Classification Analysis Duplication Code analysis and intelligent duplication can be used with code written in various languages Is independent of target machine For our Low-cost solution, we target an ARM backend

Baseline Classification and Analysis High value instructions: Likely to produce corrupted output if they consume corrupted input Symptom-generating instructions: Likely to produce symptom if they consume corrupted inputs Safe Instructions: Normally covered by symptom generating instructions Safe addr = addr1 + 8 ld1 Symp-Gen ld1 = load addr op1 op1 = ld1 + 1 EDDI => 2002 SWIFT => 2005 --- High Value call printf (---, op1, ---) Refs: Shoestring, EDDI , SWIFT

Instruction Duplication Recursively duplicate instructions starting from operands of high-value instructions Stop if Already duplicated Safe No more producers --- --- High value original instrs == duplicated cmps and branches Symptom generating Recovery or continue execution Safe

Baseline Coverage and Overhead Goal: Reduce overhead without affecting fault coverage 90% of fault coverage at 40.50% overhead

Profile Based Duplication Memory Profiling Silent stores Need not be protected because expected to write the same value again Get load/store alias information Edge Profiling Do not protect an infrequently executed instruction by duplicating frequently executing instructions Value Profiling Use the statistical invariance to generate more software symptoms By incorporating dynamic behavior, perform intelligent duplication

Refined Duplication Process --- --- Dst1 st1 Dst1 = Dincr + 1 dt1 = incr + 1 lib calls original instrs cmp duplicated Store dt1, addr T cmps and branches br Dependence through memory F ld1 Trigger recovery ld1 = load addr Consider lib calls Dependence through memory Memory is protected by other means Dop1 Dop1 = ld1 + 1 Control flow Data flow op1 op1 = ld1 + 1 cmp Only consider lib calls --- T F br call printf (---, op1, ---) Trigger recovery

Memory Profiling: Silent Stores A store is silent if it writes the already existing value at memory location A large fraction of silent stores exist and can be exploited for intelligent code duplication

Exploiting Silent Stores --- --- Dst1 st1 Dst1 = Dincr + 1 dt1 = incr + 1 silent store original instrs Silent stores are expected to write the same value again soon cmp duplicated store dt1, addr T cmps and branches br F ld1 Trigger recovery ld1 = load addr Dop1 Dop1 = ld1 + 1 As long as the last store is not wrong, this optimization wouldn’t cause any harm. Control flow Data flow op1 op1 = ld1 + 1 cmp --- T F br call printf (---, op1, ---) Trigger recovery

Edge Profiling: Recursive Duplication Break Do not protect a non frequently executed instruction by duplicating a frequently executed instruction BB0: Control flow Data flow 2000 20 BB1: BB2: 10 10 BB3: BB4:

Value Profiling Most of the times, result is 0 Most of the times, result is 71 add xor What are the most frequently value produced by an instruction? If a value is produced more than 99% of the times, use that value for symptom generation

Creating Software Symptoms If in a chain of instruction, last instruction produces the same value Insert the comparison with the frequently generated value --- --- op Generates the same value very frequently

Software Symptom Generation original instrs duplicated cmps and branches --- --- --- --- Produces 0 more than 99% cmp Dop2 op2 Dop2 = Dop3 * Dop4 op2 = op3 * op4 br F Trigger recovery T Dop1 More efficient to check for symptoms. Dop1 = Dop2 + 1 op1 op1 = op2 + 1 cmp --- T call printf (---, op1, ---) F br Trigger recovery

Evaluation Methodology Program analysis, Profiling and intelligent duplication Implemented as compiler pass in the LLVM compiler Input sensitivity of profiling train input of SPECINT2K for training ref input for the actual fault injection runs Statistical fault injection (SFI) experiments GEM5 simulator in ARM syscall emulation mode Random (single) bit flip in physical register file Simulated entire benchmarks after fault injection Log files analyzed for results classification Symptoms Silent stores profiling implemented as an extension of Loop Aware Memory Profiling (LAMP) Value Profiling implemented as instrumentation pass in LLVM are unusual behavior. With value profiling, we are providing a mechanism for generating symptoms.

Fault Injection Outcome Classification Masked No corruption in the program output SWDetects Detected by duplication Covered by symptoms Produces a symptom such as page fault in 1000 cycles of fault injection Failures Fail status on program termination or program did not terminate in 20 Hrs. SDCs (Silent Data Corruptions) Fault injections which results in user visible corruptions SDC are the most catastrophic types of error and will affect the system’s data integ

Performance Overhead Over 40% reduction in overhead

Fault Coverage

Conclusions Proposed software-only low overhead fault detection technique yields intelligent and efficiently duplicated code Profile based analysis Silent stores need not be protected Don’t protect infrequently executed instructions by duplicating frequently executed instructions Value profiling can be used to generate software symptoms Intelligent duplication yields Over 40% reduction in overhead Statistical insignificant difference in fault coverage

Future Work Use of out-of-order model for fault injection campaign More microarchitectural injection sites than just register file TLB, LSQ, ROB etc. Selective usage of control flow signatures Protect the control flow edges which are vulnerable to faults Analyze the faults that corrupt program output Why weren’t these detected? Is there a way to detect these? Alternatives to instruction duplication Judicious use of software based symptoms

Thank You!