Spring 2008 CSE 591 Compilers for Embedded Systems Aviral Shrivastava Department of Computer Science and Engineering Arizona State University.

Lecture 4: Soft Errors Software Techniques

Outline
□ Soft Errors Recap
□ Process Technology and Packaging Solutions
□ Gate-level and Circuit-level Solutions
□ Microarchitectural Solutions
  □ Single-core
  □ Multi-threaded
□ Software Solutions
□ Multi Bit Upsets (MBUs)
□ Single Event Latchup

Razor
□ Originally proposed to tolerate process variations and reduce power
□ Shadow latch clocked with a delayed clock
□ If the values held in the main and shadow latches differ, raise an error
□ How can it be used to detect soft errors?

Multi-issue Processors
□ Superscalar
  □ Executes instructions from the same thread
□ Multi-threading
  □ Executes instructions from a single thread in any one cycle, but can switch between threads across cycles
□ Simultaneous Multithreading
  □ Issues instructions from different threads in the same cycle
[Figure: issue-slot diagrams for Superscalar, Multithreading, and Simultaneous Multithreading]

SMT Solutions
□ SRT: Simultaneous Redundant Threading
  □ Duplicate a thread, and run both copies on the same core as a leading thread and a trailing thread
  □ Threads maintain their own contexts, including the register file
  □ Threads should not diverge when there are no faults
□ Memory interface
  □ Only the leading thread can read from memory
    □ It puts a copy in the load value queue (LVQ); the trailing thread reads from there
  □ The leading thread writes store values to the store buffer (STB)
  □ Only the trailing thread can write to memory, after checking the value in the STB
□ Branch interface
  □ The leading thread writes branch outcomes to the branch outcome queue (BOQ)
  □ The trailing thread therefore has perfect branch prediction
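The memory interface above can be sketched as a toy model. This is only an illustration under stated assumptions: the class name `SRTMemoryInterface`, its method names, and the choice to store the address alongside each value are hypothetical and are not taken from the slides.

```python
from collections import deque


class SRTMemoryInterface:
    """Toy model of the SRT load value queue (LVQ) and store buffer (STB)."""

    def __init__(self, memory):
        self.memory = memory   # dict: address -> value
        self.lvq = deque()     # loads already performed by the leading thread
        self.stb = deque()     # leading-thread stores awaiting verification

    def leading_load(self, addr):
        value = self.memory[addr]        # only the leading thread reads memory
        self.lvq.append((addr, value))   # keep a copy for the trailing thread
        return value

    def trailing_load(self, addr):
        lead_addr, value = self.lvq.popleft()
        if lead_addr != addr:            # assumption: addresses are compared too
            raise RuntimeError("fault detected: load address mismatch")
        return value

    def leading_store(self, addr, value):
        self.stb.append((addr, value))   # buffered, not yet visible in memory

    def trailing_store(self, addr, value):
        if self.stb.popleft() != (addr, value):
            raise RuntimeError("fault detected: store mismatch")
        self.memory[addr] = value        # commit only after the check passes
```

A store becomes visible in `memory` only once the trailing thread has produced the same address and value, which is the property the slide describes.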

SMT Solutions: PER
□ Trailing thread competes for resources during high-ILP phases
  □ STB fills up, causing leading-thread stalls
□ PER: Partial Explicit Redundancy
  □ Leading thread uses all resources during high-ILP phases
    □ SEM: Single Execution Mode
  □ Trailing thread executes during low-ILP phases
    □ REM: Redundant Execution Mode
  □ In REM state, check all instructions
□ Need a resume point for the trailing thread
  □ Maintain state (LVQ, STB, RF, etc.)
  □ Proportional to slack size

SMT Solutions: IRTR
□ IR: Instruction Reuse
  □ Do not execute an instruction if it has already executed with the same inputs
  □ Keep a reuse buffer
□ IRTR: Implicit Redundancy Through Reuse
  □ Check the new result against the previous value to detect soft errors
  □ If they match, continue and overwrite the value in the buffer
  □ If they mismatch, raise a flag
  □ Used during high-ILP regions
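A minimal sketch of the reuse-buffer check, assuming the buffer is keyed by program counter and operand values. The class `ReuseBuffer`, the method `check`, and the three status strings are hypothetical names chosen for illustration.

```python
class ReuseBuffer:
    """Toy reuse buffer: remembers the last result per (pc, operands) pair."""

    def __init__(self):
        self.entries = {}   # (pc, operands) -> previously computed result

    def check(self, pc, operands, result):
        """Compare a fresh result with the buffered one.

        Returns "miss" when there is nothing to compare against, "match" when
        the values agree, and "mismatch" when a soft error is suspected.
        """
        key = (pc, tuple(operands))
        if key not in self.entries:
            self.entries[key] = result
            return "miss"
        if self.entries[key] == result:
            self.entries[key] = result   # overwrite, as on the slide
            return "match"
        return "mismatch"                # raise the flag; keep the old value
```

For example, checking the same instruction twice with result 5 gives "miss" and then "match"; a third check with result 4 gives "mismatch".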

Watchdog Processor & Control Flow Checking
□ Watchdog processor
  □ Simple processor that receives signals from the main processor
  □ Checks that the signals arrive in the correct order
    □ S3 should not come after S1
□ Watchdog program can be generated automatically
  □ Formal techniques for correctness
□ Asynchronous communication between the main processor and the watchdog processor
[Figure: main processor, memory, and watchdog processor; basic blocks BB1, BB2, and BB3 send signals S1, S2, and S3]
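The order check performed by the watchdog can be sketched as a small state machine. The successor table below (S1 then S2 then S3, looping back to S1) is an assumed control flow for the three basic blocks in the figure, and the `Watchdog` class is a hypothetical name.

```python
class Watchdog:
    """Checks that signals from the main processor arrive in a legal order."""

    def __init__(self, allowed_successors):
        # dict: signal -> set of signals that may legally follow it
        self.allowed = allowed_successors
        self.last = None

    def receive(self, signal):
        """Return True if the signal is legal after the previous one."""
        if self.last is not None and signal not in self.allowed.get(self.last, set()):
            return False                 # control-flow error detected
        self.last = signal
        return True


# Assumed control flow: BB1 -> BB2 -> BB3 -> BB1
ORDER = {"S1": {"S2"}, "S2": {"S3"}, "S3": {"S1"}}
```

With this table, the sequence S1, S2, S3 is accepted, while S3 arriving directly after S1 is rejected.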

EDDI (Error Detection by Duplicated Instructions)
□ Duplicate instructions
□ Validation instructions
  □ Store and branch are sync points
  □ Check store and branch operands
□ Memory penalty
  □ Load/store from duplicated locations
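A minimal sketch of the idea, written at source level rather than at the instruction level where EDDI actually operates. Every computation is done twice on separate copies, and the operands are compared at the store, which acts as the sync point. The function names and the `inject_fault` switch are hypothetical.

```python
class FaultDetected(Exception):
    """Raised when the master and shadow copies disagree."""


def checked_store(memory, shadow_memory, addr, value, shadow_value):
    # Validation step: compare the store operands before committing.
    if value != shadow_value:
        raise FaultDetected(f"mismatch at store to {addr!r}")
    memory[addr] = value
    shadow_memory[addr] = shadow_value   # duplicated location (memory penalty)


def axpy_eddi(a, x, y, memory, shadow_memory, inject_fault=False):
    t = a * x            # master instruction
    t_dup = a * x        # duplicated instruction on shadow registers
    r = t + y
    r_dup = t_dup + y
    if inject_fault:
        r ^= 1 << 3      # simulate a bit flip in the master register
    checked_store(memory, shadow_memory, "out", r, r_dup)
    return r
```

Without an injected fault the store commits to both memories; with one, `FaultDetected` is raised before memory is modified.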

EDDI+CFCSS (Control Flow Checking by Software Signatures)
□ At the beginning of each node, perform G = G xor d
  □ d2 = s1 xor s2, then G = s1 xor (s1 xor s2) = s2
□ If two source nodes jump to the same destination node, then the two source nodes must have the same signature
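The signature update can be sketched as follows. The signature values and the three-node graph (B1 to B2 to B3) are assumptions chosen for illustration; only the update rule G = G xor d comes from the slide.

```python
class ControlFlowError(Exception):
    """Raised when the run-time signature does not match the node signature."""


# Assumed compile-time signatures for a chain B1 -> B2 -> B3
SIG = {"B1": 0b0001, "B2": 0b0010, "B3": 0b0011}

# d for each node is (signature of its predecessor) xor (its own signature)
DIFF = {"B2": SIG["B1"] ^ SIG["B2"], "B3": SIG["B2"] ^ SIG["B3"]}


def enter_node(G, node):
    """Update the run-time signature G on entry to `node` and check it."""
    G ^= DIFF[node]
    if G != SIG[node]:
        raise ControlFlowError(f"illegal jump into {node}")
    return G
```

Starting from G = s1, entering B2 yields s2. An erroneous jump from B1 straight to B3 leaves G different from s3, so the check fails.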

CFCSS + SWIFT (Software Implemented Fault Tolerance)
□ If two source nodes jump to the same destination node, then the two source nodes must have the same signature
□ To lift this restriction, use an additional path-dependent value D
  □ B1 → B5: D = 0, then G = s1 xor d5 xor 0 = s5
  □ B3 → B5: D = s1 xor s3, then G = s3 xor (s1 xor s5) xor (s1 xor s3) = s5
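The two cases above can be checked numerically. In this sketch the signature values are arbitrary assumptions, B5 is reachable from both B1 and B3, and d5 is defined relative to B1, so d5 = s1 xor s5 as in the slide.

```python
# Assumed signatures (any distinct values work)
s1, s2, s3, s5 = 0b001, 0b010, 0b011, 0b101

d5 = s1 ^ s5                     # static difference, computed against B1

# Path-dependent adjustment, set in each predecessor before it jumps to B5
D_FROM = {"B1": 0, "B3": s1 ^ s3}


def enter_b5(G, D):
    """Run-time signature update on entry to B5: G = G xor d5 xor D."""
    return G ^ d5 ^ D


assert enter_b5(s1, D_FROM["B1"]) == s5   # legal: B1 -> B5
assert enter_b5(s3, D_FROM["B3"]) == s5   # legal: B3 -> B5
assert enter_b5(s2, 0) != s5              # illegal: B2 -> B5 is caught
```

The extra xor with D cancels the difference between s3 and s1, so both legal predecessors arrive at s5 without needing identical signatures.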

ED⁴I: Error Detection by Diverse Data and Duplicated Instructions
The simplest way to detect Byzantine faults is to run the same program on multiple processors and compare the results. ED⁴I is Byzantine fault detection for uniprocessors. It must take into account both temporary and permanent faults. Re-executing with the same inputs does not guard against permanent faults. Overhead = 100%.

Key Idea
Let's feed two different sets of data into the program and then compare the results. Key insight: if the program uses only arithmetic operations, we can alter the input by multiplying all input numbers by a constant. The modified output will then be (real output) * (the constant). Thus, we can verify that the two computations succeeded, AND the two computations will be affected by errors differently.

New Program
If we alter the input to the program, we must alter the program to work with the modified input. The transformation is given the constant k (called the "diversity factor") and creates the "k-factor diverse program". The new program has the same control flow graph as the old program, but all of its variables are k-multiples of the original ones.

Transformations
□ If k < 0, flip inequality comparisons (> ↔ <, ≥ ↔ ≤)
□ All constants in the code get multiplied by k
□ Addition and subtraction of variables are unchanged
□ Multiplication: $v_1 v_2 \cdots v_n \rightarrow (v_1 v_2 \cdots v_n) / k^{n-1}$
□ Division: $v_1 / v_2 \rightarrow (v_1 / v_2) \cdot k$
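A small worked example of these rules. The function `original` is a made-up program, and `transformed` is its hand-written k-factor diverse version; in practice the transformation would be applied automatically by a compiler pass.

```python
def original(a, b):
    if a > b:
        return a * b + 7
    return a - b


def transformed(ka, kb, k):
    """k-factor diverse version of `original`; inputs are already scaled by k."""
    # Rule: flip the inequality when k is negative.
    greater = ka > kb if k > 0 else ka < kb
    if greater:
        # Rule: a product of two variables is divided by k^(2-1) = k,
        # and the constant 7 is multiplied by k.
        return (ka * kb) // k + 7 * k
    # Rule: subtraction is unchanged.
    return ka - kb


def run_with_check(a, b, k=-2):
    """Run both versions and compare; the diverse result must equal k * result."""
    result = original(a, b)
    diverse = transformed(k * a, k * b, k)
    if diverse != k * result:
        raise RuntimeError("fault detected: results disagree")
    return result
```

For a = 5, b = 3 and k = -2, the original returns 22 and the diverse program returns -44, which is k times the original result. The integer division is exact because the scaled product always carries a factor of k squared.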

Fault Detection & Data Integrity
For functional unit $h_i$ (such as the adder), fault $f$, and diversity factor $k$:
$X_i$ = the set of inputs to $h_i$
$E_i$ = the subset of $X_i$ containing the inputs that produce an erroneous output due to the fault
$E'_i$ = the subset of $E_i$ that escapes detection
$C_i(k)$ = probability of catching an error in $h_i$
$D_i(k)$ = probability of missing no errors in $h_i$

Choosing the value of k
For some functional units we can derive $C_i(k)$ and $D_i(k)$ analytically for each k. This is too hard in general, so try out a range of k values empirically to determine $C_i(k)$ and $D_i(k)$. Functional units considered:
□ Bus signal (12-bit)
□ 12-bit carry look-ahead adder
□ 12-bit multipliers and dividers

Analytical Computation of AVF (Architectural Vulnerability Factor)
□ Iteration Space (IS)
  □ L-dimensional integer vector space
  □ L: number of loop levels
  □ Each point in IS represents an iteration
  □ Data dependences exist
  □ Fully ordered in time
□ Array Space (AS)
  □ M-dimensional integer vector space
  □ M: number of array dimensions
  □ Every point represents an element of the array
Example loop nest:
    for (i = 0; i < N1; i++)
      for (j = 0; j < N2; j++)
        a[i][j] = a[i][j-1] + a[i-1][j] + a[i][j+1];

Analytical Computation of AVF
□ Access Function (AF) of a reference
  □ Mapping from IS to AS
  □ Tells when the elements of the array are accessed by a reference
□ References access different parts of the Array Space
  □ Divide the Array Space into regions in which every element is accessed by the same subset of references
□ Array Interval (AI): subset of AS that the reference accesses
  □ Every element is accessed by the same set of references

Analytical Computation of AVF: Iteration Intervals for an Array Interval
□ Each reference accesses the elements of an array interval at iterations given by the AF (Access Function)
□ Iteration Interval (II) is the AF restricted to the Array Interval
  □ Gives a formula for the access time of each element in the II
□ Vulnerability can be computed as a formula on the II
  □ Time from a read or write to the next read (r/w → r)
  □ A reference either reads or writes (not both)
□ Need to time-order the points in the II
  □ Break it into Iteration Segments, which can be ordered
  □ Strict order, or point-wise ordered
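The "time from r/w to r" rule can be illustrated on an explicit access trace for one array element. This is a trace-based sketch, not the closed-form analysis the slides describe, and the function name `vulnerable_time` and the trace format are assumptions.

```python
def vulnerable_time(accesses):
    """Sum the intervals that end in a read for a single array element.

    `accesses` is a list of (time, kind) pairs, where kind is "r" or "w".
    An interval that ends in a write is not vulnerable, because any error
    in the old value is overwritten before it can be consumed.
    """
    total = 0
    prev_time = None
    for time, kind in sorted(accesses):
        if kind == "r" and prev_time is not None:
            total += time - prev_time
        prev_time = time
    return total


trace = [(0, "w"), (4, "r"), (6, "r"), (10, "w"), (11, "r")]
```

For this trace the vulnerable intervals are 0 to 4, 4 to 6, and 10 to 11, for a total of 7 time units; the interval from 6 to 10 ends in a write and does not count.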

Multiple-bit Upsets (MBUs)
□ Error rate ~ 1/100th of the SEU (single event upset) rate
□ Hamming Code
  □ 1-bit error correction, 2-bit error detection
□ Reed-Solomon Codes
  □ RS(n,k) with s-bit symbols
  □ s: each symbol is s bits
  □ n: total number of symbols per codeword, n = 2^s - 1
  □ k: number of data symbols
  □ Number of parity symbols = 2t = n-k
  □ Can correct errors in t symbols, where t = (n-k)/2
  □ RS(255, 223) with 8-bit symbols
    □ Can correct 16 symbol errors in each codeword (255 symbols)
□ Other multi-bit error detection and correction schemes
  □ LDPC
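The Reed-Solomon parameters can be computed directly, counting n and k in symbols as is standard for these codes. The helper name `rs_parameters` and the returned dictionary keys are hypothetical.

```python
def rs_parameters(n, k, s):
    """Derive basic properties of an RS(n, k) code with s-bit symbols."""
    if n > 2**s - 1:
        raise ValueError("codeword length exceeds 2^s - 1 symbols")
    if k >= n:
        raise ValueError("k must be smaller than n")
    parity_symbols = n - k               # equals 2t
    return {
        "parity_symbols": parity_symbols,
        "correctable_symbols": parity_symbols // 2,   # t = (n - k) / 2
        "codeword_bits": n * s,
        "data_bits": k * s,
    }


params = rs_parameters(255, 223, 8)
```

For RS(255, 223) with 8-bit symbols this gives 32 parity symbols and t = 16 correctable symbols per 255-symbol codeword.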

[Figure: decision flow for a particle strike on a state bit (e.g., in the register file). Decision nodes: Is the bit read? Does the bit have error protection? Is the error only detected (e.g., parity with no recovery) or can it be corrected (e.g., ECC)? Does the bit matter? Outcomes: benign fault, no error; no error (corrected); detected but unrecoverable error (DUE); silent data corruption (SDC). Copyright 2005, M. Tahoori]

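Interleaving bits
□ Interleaving converts
  □ a spatial multi-bit error → multiple single-bit errors
[Figure: a row of bits in which a strike (X) hits several adjacent bits; without interleaving the adjacent bits are covered by a single ECC code, with interleaving they are covered by different ECC codes]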
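A minimal sketch of why interleaving helps, assuming physical bit i belongs to codeword i mod `ways`. The function name `flips_per_codeword` and the parameter names are hypothetical.

```python
from collections import Counter


def flips_per_codeword(burst_start, burst_length, ways):
    """Count how many flipped bits land in each ECC codeword.

    A strike flips `burst_length` physically adjacent bits starting at
    position `burst_start`. With `ways`-way interleaving, physical bit i
    is assumed to belong to codeword i % ways.
    """
    return Counter((burst_start + i) % ways for i in range(burst_length))


no_interleaving = flips_per_codeword(5, 3, ways=1)
four_way = flips_per_codeword(5, 3, ways=4)
```

With `ways=1` all three flips fall in one codeword, which a single-error-correcting code cannot fix. With `ways=4` each affected codeword sees exactly one flip, so each can be corrected independently.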

Temporal Double Bit Errors
□ SECDED ECC (single error correction, double error detection)
  □ can detect a double bit error, but cannot correct it
□ If errors accumulate
  □ a single bit correctable error becomes a double bit detectable error
[Figure: two separate strikes on different bits, one at cycle 100 and one at cycle 1,000,000]

Solutions for Temporal Double Bit Errors
□ Natural effects
  □ Whenever a processor reads a cache block, the single bit error can be corrected
  □ Check for errors when cache blocks are replaced from the cache
□ More powerful ECC
  □ SECDED ECC requires 8 bits per 64 bits
    □ 7 bits for single bit correction
    □ 8th bit for double bit detection
    □ Overhead = 8/64 = 12.5% (about 13%)
  □ ECC with two bit correction requires 12 bits per 64 bits
    □ Overhead = 12/64 = 18.75% (about 19%)
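The SECDED figure can be reproduced from the Hamming bound: r check bits suffice for single-error correction of m data bits when 2^r >= m + r + 1, and one more parity bit adds double-error detection. The function names below are hypothetical, and the 12-bit figure for two-bit correction is taken from the slide rather than derived.

```python
def secded_check_bits(data_bits):
    """Check bits for a SECDED code: Hamming bits plus one overall parity bit."""
    r = 1
    while 2**r < data_bits + r + 1:   # Hamming bound for single-error correction
        r += 1
    return r + 1                      # extra bit for double-error detection


def overhead(check_bits, data_bits):
    """Storage overhead as a fraction of the data bits."""
    return check_bits / data_bits


secded_bits = secded_check_bits(64)
secded_overhead = overhead(secded_bits, 64)
two_bit_overhead = overhead(12, 64)   # 12 bits per 64, as stated on the slide
```

For 64 data bits this yields 8 check bits, an overhead of 0.125, against 0.1875 for the two-bit-correcting code.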

Scrubbing
□ Periodically read memory and correct all single bit errors
□ Prevents accumulation of temporal double bit errors
□ Standard technique in main memories (DRAMs)

Single Event Latchup
□ SEL: Single Event Latchup
  □ Parasitic circuit elements form a silicon controlled rectifier (SCR)
□ Potentially destructive
  □ The device current may destroy the device if it is not current limited and removed "in time"
□ Removal of power to the device is required in all non-catastrophic SEL conditions in order to recover device operations
□ SEL probability increases with temperature!