Exploiting HW+SW Partitioning for Reliable Embedded Systems Part 2.


1 vargas@computer.org Exploiting HW+SW Partitioning for Reliable Embedded Systems Part 2

2 Summary
1. Introduction: targeting the problem
2. The Possible Solution
   2.1. SW-Based Fault Detection Mechanisms
   2.2. Migrating SW-Based Fault Detection Mechanisms into HW
3. Experimental Evaluation
4. Final Considerations

3 1. Introduction: targeting the problem
The increasing number of computer-based critical applications raises questions about which techniques can guarantee sufficient reliability while keeping design and manufacturing costs reasonable.

4 1. Introduction: targeting the problem
Techniques commonly used (on-chip and system level) are stand-alone solutions:
- Fault-tolerance techniques (HW, SW, time, or information domains)
- Duplication/voter, TMR
- Layout-driven fault avoidance
- Watchdogs
- Consistency checks
- Capability checks
- Re-computation
- EDAC

5 1. Introduction: targeting the problem
Each of these stand-alone techniques impacts the design in terms of performance, weight, size/volume, power consumption, and reliability.


7 1. Introduction: targeting the problem
HW techniques, disadvantages: high area overhead; high development/fabrication cost.
SW techniques, disadvantages: significant performance degradation; memory overhead.

8 2. The Possible Solution
Developing a hybrid methodology (HW+SW redundancy) able to detect errors at run time in microprocessor-based SoCs may offer a very good cost/benefit ratio.

9 2. The Possible Solution
Returns:
- Minimization of area overhead and fabrication/development costs (benefits of SW-based redundancy techniques)
- Improvement of performance and minimization of memory overhead (benefits of HW-based redundancy techniques)
In summary: minimize fabrication cost and performance degradation while improving reliability.
Target faults: control-flow errors and data-handling errors.

10 2. The Possible Solution
The hybrid methodology (HW+SW redundancy) combines:
- an I-IP (infrastructure IP) core architecture
- software-based techniques

11 2. The Possible Solution
HW+SW SoC fault-tolerant architecture.
[Figure: a SoC in which the µP IP, memory IP, custom IP, and I/O port share a bus; the WDT/I-IP observes the bus and drives a mismatch signal.]
- The WDT/I-IP computes control-flow signatures at run time and stores them, together with data read from memory.
- The memory stores a hardened program.
- The WDT/I-IP observes the information flow traveling on the bus.

12 2. The Possible Solution: SW-Based Fault Detection Mechanisms
- Faults affecting data: Cerberus (Matteo et al.)
- Faults affecting control: ECCA (Matteo et al.), CFCSS (McCluskey et al.), ECI (Miremadi et al.)

13 2. The Possible Solution: SW-Based Fault Detection Mechanisms
Faults affecting data: Cerberus (Matteo et al.). Code modification for errors affecting data:

Original code:
    a = b;
Modified code:
    a0 = b0;
    a1 = b1;
    if (b0 != b1)
        error();

Original code:
    a = b + c;
Modified code:
    a0 = b0 + c0;
    a1 = b1 + c1;
    if ((b0 != b1) || (c0 != c1))
        error();

14 2. The Possible Solution: SW-Based Fault Detection Mechanisms
Faults affecting data: Cerberus (Matteo et al.). Code transformation for errors affecting procedure parameters:

Original code:
    res = search(a);
    ...
    int search(int p)
    {
        int q;
        ...
        q = p + 1;
        ...
        return(1);
    }

Modified code:
    search(a0, a1, &res0, &res1);
    ...
    void search(int p0, int p1, int *r0, int *r1)
    {
        int q0, q1;
        ...
        q0 = p0 + 1;
        q1 = p1 + 1;
        if (p0 != p1)
            error();
        ...
        *r0 = 1;
        *r1 = 1;
        return;
    }

15 2. The Possible Solution: SW-Based Fault Detection Mechanisms
Faults affecting control: ECCA (Enhanced Control-Flow Checking using Assertions; Matteo et al.). Example of detecting errors that cause illegal branches:

Original code:
    /* Basic block beginning */
    ...
    /* Basic block end */

Modified code:
    /* Basic block beginning #371 */
    ecf = 371;
    ...
    if (ecf != 371)
        error();
    /* Basic block end */

16 2. The Possible Solution: SW-Based Fault Detection Mechanisms
Faults affecting control: ECCA (Enhanced Control-Flow Checking using Assertions; Matteo et al.). Code transformation for a test statement:

Original code:
    if (condition)
    {   /* Block A */
        ...
    }
    else
    {   /* Block B */
        ...
    }

Modified code:
    if (condition)
    {   /* Block A */
        if (!condition)
            error();
        ...
    }
    else
    {   /* Block B */
        if (condition)
            error();
        ...
    }

17 2. The Possible Solution: SW-Based Fault Detection Mechanisms
Faults affecting control: ECCA (Enhanced Control-Flow Checking using Assertions; Matteo et al.). In summary, to harden a given program, this approach introduces the following assertions into each basic block $v_j$:
- Test assertion: checks the signature on entry to $v_j$, verifying that the previously executed block $v_i$ belongs to $pred(v_j)$.
- Set assertion: updates the signature, setting it to the value $B_j$ associated with $v_j$: $B_j = (B_i \oplus M_1) \oplus M_2$.

18 2. The Possible Solution: SW-Based Fault Detection Mechanisms
Faults affecting control: ECCA (Enhanced Control-Flow Checking using Assertions; Matteo et al.).

    01: while (k1 < DIM)
    02: {
    03:     if (B != M1 && B != M2)
    04:         error();    /* error detected */
    05:     A1 = matrixA1[i1][k1];
    06:     B1 = matrixB1[k1][j1];
    07:     C1 += A1 * B1;
    08:     matrixC1[i1][j1] = C1;
    09:     k1++;
    10:     B = (B ^ M1) ^ M2;    /* B_j = (B_i ^ M1) ^ M2 */
    11: }

19 2. The Possible Solution: SW-Based Fault Detection Mechanisms
Faults affecting control: CFCSS (Control-Flow Checking by Software Signatures; McCluskey et al.). Principle: modification of a basic block.

20 2. The Possible Solution: SW-Based Fault Detection Mechanisms
Faults affecting control: CFCSS (McCluskey et al.). The approach consists of six steps:
1) Divide the program into basic blocks. A basic block is a maximal set of ordered instructions whose execution begins at the first instruction and terminates at the last one. There is no branching instruction in a basic block except possibly the last one. A basic block terminates either at an instruction branching to another basic block or at an instruction receiving transfer of control flow (CF) from two or more places in the program.
   Notation: (a) $V = \{v_i : i = 1, 2, \ldots, n\}$: set of vertices denoting basic blocks. (b) $E$: set of edges denoting possible CF between basic blocks.
2) Construct a graph of the program according to the instruction flow (each node represents a basic block). A program can be represented by a program graph $P = \{V, E\}$, where the edges $br_{i,j}$ are not necessarily explicit branch instructions; they also represent fall-through execution paths, jumps, subroutine calls, and returns. Fig. 2.5 is an example.
3) Arbitrarily assign a signature $s_i$ to each node (at compile time).
4) Compute the signature difference between the source and destination blocks: $d_j = s_i \oplus s_j$ for an edge from $v_i$ to $v_j$.
5) Compute the new run-time signature $G$ for each node (at execution time).
6) Compare both signatures.

21 2. The Possible Solution: SW-Based Fault Detection Mechanisms
Faults affecting control: CFCSS (McCluskey et al.).
[Figure: sequence of instructions and its program graph.]
Detection of an illegal branch. General form of the run-time signature update:
$$G = f(G, d_i) = G \oplus d_i$$
Legal branch from $v_1$ to $v_2$:
$$G_2 = f(G_1, d_2) = G_1 \oplus d_2 = s_1 \oplus (s_1 \oplus s_2) = s_2$$
Illegal branch from $v_1$ to $v_4$ (the legal predecessor of $v_4$ is $v_3$):
$$G_4 = f(G_1, d_4) = G_1 \oplus d_4 = G_1 \oplus (s_3 \oplus s_4) = s_1 \oplus s_3 \oplus s_4 \neq s_4$$

22 2. The Possible Solution: SW-Based Fault Detection Mechanisms
Faults affecting control: CFCSS (McCluskey et al.). Detection of an illegal branch: a numerical example.

23 2. The Possible Solution: SW-Based Fault Detection Mechanisms
Faults affecting control: CFCSS (McCluskey et al.). Branch fan-in nodes, first case: nodes $v_1$ and $v_3$ have the same signature.

24 2. The Possible Solution: SW-Based Fault Detection Mechanisms
Faults affecting control: CFCSS (McCluskey et al.). Branch fan-in nodes, second case: nodes $v_1$ and $v_3$ have different signatures, so an adjusting signature D is needed.

25 2. The Possible Solution: SW-Based Fault Detection Mechanisms
Faults affecting control: CFCSS (McCluskey et al.). Nodes $v_1$ and $v_3$ have different signatures: adjusting signature D. With $d_5 = s_1 \oplus s_5$, block $v_1$ sets $D_1 = 000$ and block $v_3$ sets $D_3 = s_1 \oplus s_3$:
$$G_5 = f(G_1, d_5, D_1) = G_1 \oplus d_5 \oplus D_1 = s_1 \oplus (s_1 \oplus s_5) \oplus 000 = s_5$$
$$G_5 = f(G_3, d_5, D_3) = G_3 \oplus d_5 \oplus D_3 = s_3 \oplus (s_1 \oplus s_5) \oplus (s_1 \oplus s_3) = s_5$$

26 2. The Possible Solution: SW-Based Fault Detection Mechanisms
Faults affecting control: ECI (Error Capturing Instructions; Miremadi et al.). Trap instructions are inserted in the program area, in the data area, and in the unused area of the memory, i.e., in memory locations that the CPU does not use during normal execution. Thus, the execution of an ECI indicates that a control-flow error has occurred. The task of an ECI is to initiate a recovery process.

27 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
- The WDT/I-IP works in symbiosis with the processor, which is not modified.
- The WDT/I-IP continuously spies on the information flow on the bus, which it uses to test and update signatures.
- If a mismatch is detected, the WDT/I-IP raises a mismatch signal.

28 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
Piece of code for control-flow fault detection (ECCA partitioning):

    01: while (k1 < DIM)
    02: {
    03:     IIPtest(BB1);
    04:     IIPtest(BB2);
    05:     A1 = matrixA1[i1][k1];
    06:     B1 = matrixB1[k1][j1];
    07:     C1 += A1 * B1;
    08:     matrixC1[i1][j1] = C1;
    09:     k1++;
    10:     IIPset(BB2);
    11: }

Lines 03, 04, and 10 replace the SW-only assertions:

    03: if (B != M1 && B != M2)
    04:     error();    /* error detected */
    10: B = (B ^ M1) ^ M2;

29 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
WDT/I-IP architecture, three modules:
- Bus interface logic: detects signatures passing on the bus (address, data).
- CAM memory: stores flow signatures.
- Consistency check logic: compares flow signatures and drives the mismatch signal.

30 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
WDT/I-IP architecture (signal level):
- WDT/I-IP top level: Clk, Reset, Instruction_in, Ram_data_in, Ram_address_in; output: Mismatch signal.
- Module 1, Bus Interface Logic: Clk, Reset, Instruction_in, Ram_data_in, Ram_address_in, Data_memory_in, Data_memory_out, Adr_memory_out, Ctrl_rw_out, En_compare_out, Data_1_out, Data_2_out.
- Module 2, CAM Memory: Clk, Reset, Data_memory_out, Data_memory_in, Adr_memory_in, Ctrl_rw_in.
- Module 3, Consistency Check Logic: Clk, Reset, En_compare_out, Data_1_out, Data_2_out, Mismatch signal.

31 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
Consider now that the microprocessor-based SoC runs under an operating system: the application code then accounts for only a fraction of the total time during system operation!

32 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
Critical applications need operating systems (OS) that guarantee correct and safe behavior despite the occurrence of errors. Faults can affect OS calls as well as the OS kernel: how does the system react to invalid or corrupted values handled by the kernel?

33 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
HW-SW partitioning for fault detection in complex systems.
[Figure: SoC in which the µProcessor and the WDT/I-IP share the address + data bus; other blocks: application, memory (operating system) with driver, memory (application code + data), status register, error indication.]

34 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
HW-SW partitioning for fault detection in complex systems.
[Figure: the same SoC annotated with its HW and SW parts.]
- SW part: application; operating system (µCLinux, µC/OS-II) with its driver.
- HW part: WDT/I-IP in programmable logic.
- µProcessor: e.g., DragonBall, ARM, Pentium, 8086, 68K.
- A communication channel links the SW and HW parts.

35 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
[Figure: MC68VZ328 block diagram. Blocks: CGM & power control, real-time clock, in-circuit emulation, interrupt controller, memory controller, bootstrap mode, 8/16-bit 68000 bus interface, FLX68000 static CPU, 16-bit timers (2), 8-bit PWM1, 16-bit PWM2, SPI 1, SPI 2, UART 1 and UART 2 (IrDA 1.0), LCD controller, GPIO ports, 68000 internal bus.]
Status information: special function pins (CPU space).

36 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
Status information

37 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
Status information, 68000 die. Special function pins (CPU space): FC2, FC1, FC0 (function code output).

    FC2 FC1 FC0 | Processor cycle type
     0   0   0  | Undefined, reserved
     0   0   1  | User data
     0   1   0  | User program
     0   1   1  | Undefined, reserved
     1   0   0  | Undefined, reserved
     1   0   1  | Supervisor data
     1   1   0  | Supervisor program
     1   1   1  | CPU space (interrupt acknowledge)

38 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
Status information, 68010-68030 dies: pins A16-A19. On these processors, FC2 = FC1 = FC0 = 1 also indicates CPU operations other than interrupt acknowledge cycles (e.g., coprocessor communication). The specific CPU space is then indicated on pins A16-A19, if properly decoded.

39 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
Status information, 68000 die. Interrupt control pins: IPL2, IPL1, IPL0 (interrupt priority level). IPL2 IPL1 IPL0 range from 000 (lowest priority) to 111 (highest priority), in increasing order: 000, 001, 010, 011, 100, 101, 110, 111.

40 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
Status information, Pentium die. Event-ticking pins (ETPs): PM0, PM1. The ETPs are associated with model-specific registers (MSRs) used to monitor events such as the number of cache misses, committed instructions, interrupts executed, and taken branches. MSRs: counters CTR0 and CTR1, programmed through the Control and Event Select Register (CESR).

41 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
Status information. Instructions used to program counters CTR0 and CTR1 through the CESR: WRMSR and RDMSR. Both instructions are privileged: they can only be executed at CPL 0 (current privilege level 0); at any other privilege level they raise a general-protection exception.

42 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
Status information, DragonBall core. Event-ticking pins (ETPs): d_i, s_u.
- d_i: "0" = data; "1" = instruction; "z" = undefined.
- s_u: "0" = supervisor mode; "1" = user mode; "z" = undefined.
These pins were added to the processor core to serve as an interface with the I-IP (watchdog).

43 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
Event-ticking pins (ETPs): d_i, s_u. Status information.

44 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
The error-detection coverage of the µC/OS-II operating system has been measured, and the OS critical data structures that should be improved have been identified, in order to increase the final robustness of µC/OS-II.
Juan Pardo, 2004, Fault Tolerant Systems Group, Polytechnic University of Valencia, Spain.

45 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
µC/OS-II operating system. It was selected because it has been widely used for several years, in particular for embedded applications.
- First version (µC/OS): 1992.
- Applications: industrial robots, motor control, medical instruments, etc.
- It is 99% compliant with the Motor Industry Software Reliability Association (MISRA) C Coding Standards.
- All Modified Condition/Decision Coverage (MC/DC) code in µC/OS-II has been removed, improving code quality for RTCA/EUROCAE DO-178B Level A certified environments for avionics applications.

46 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
µC/OS-II characteristics:
- Portable: µC/OS-II is written in highly portable ANSI C, with the target-microprocessor-specific code written in assembly language.
- ROMable: it was designed for embedded applications; with the proper tool chain (C compiler, assembler, and linker/locator), µC/OS-II can be embedded as part of a product.
- Scalable: an application can use only the services it needs, which reduces the amount of memory (both RAM and ROM) required. Scalability is achieved through conditional compilation (full version: 8 KB).
- Preemptive: µC/OS-II is a fully preemptive real-time kernel, so it always runs the highest-priority task that is ready.
- Multitasking: µC/OS-II can manage up to 64 tasks. The current version reserves 8 of them for system use, leaving up to 56 tasks for the application. Each task has a unique priority, which means that µC/OS-II cannot do round-robin scheduling.

47 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
µC/OS-II characteristics (continued):
- Deterministic: the execution time of all µC/OS-II functions and services is deterministic, so it is always known how long µC/OS-II takes to execute a function or service. Furthermore, the execution time of µC/OS-II services does not depend on the number of tasks running in the application.
- Task stacks: each task requires its own stack; µC/OS-II allows each task to have a different stack size, which reduces the amount of RAM the application needs.
- Services: mailboxes, queues, semaphores, fixed-size memory partitions, time-related functions, etc.
- Interrupt management: interrupts can suspend the execution of a task. If a higher-priority task is awakened as a result of an interrupt, it runs as soon as all nested interrupts complete. Interrupts can be nested up to 255 levels deep.
- Robust and reliable: µC/OS-II is based on µC/OS, which has been used in hundreds of commercial applications since 1992.

48 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
Workload design. Characteristics: a worst-case application, i.e., one that makes maximum use of system calls. System calls: synchronization, semaphores, memory, queues, messages, task handling, timing management, etc.

49 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
Workload design: addition of consistency checks.
- The system workload runs continuously and consists of a series of tasks executing the application.
- Consistency checks are added to the application code and to the kernel to detect faults and invalid values at kernel calls, in order to improve system robustness.
- The WDT/I-IP is the monitor.

50 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
Workload design (example µC/OS-II application):

    // 1. Define necessary configuration constants for uC/OS-II
    #define OS_MAX_EVENTS     2
    #define OS_MAX_TASKS      20
    #define OS_MAX_QS         0
    #define OS_Q_EN           0
    #define OS_MBOX_EN        0
    #define OS_TICKS_PER_SEC  32

    // 2. Define necessary stack configuration constants
    #define STACK_CNT_512  1              // initial program stack
    #define STACK_CNT_1K   OS_MAX_TASKS   // task stacks

    // 3. This ensures that the above definitions are used
    #use "ucos2.lib"

    void RandomNumberTask(void *pdata);

    // Declare semaphore global so all tasks have access
    OS_EVENT* RandomSem;

    void main()
    {
        int i;

        // Initialize OS internals
        OSInit();

        // Initializing tasks: create each of the system tasks
        for (i = 0; i < OS_MAX_TASKS; i++) {
            OSTaskCreate(RandomNumberTask, NULL, 1024, i);
        }

        // Semaphore to control access to random number generator
        RandomSem = OSSemCreate(1);

        // 4. Set number of system ticks per second
        OSSetTicksPerSec(OS_TICKS_PER_SEC);

        // Starting tasks: begin multitasking
        OSStart();
    }

    void RandomNumberTask(void *pdata)
    {
        // Declare as auto to ensure reentrancy.
        auto OS_TCB data;
        auto INT8U err;
        auto INT16U RNum;

        OSTaskQuery(OS_PRIO_SELF, &data);
        while (1) {
            // rand() is not reentrant, so access must be controlled via a semaphore.
            OSSemPend(RandomSem, 0, &err);   // OS call: task waits for signal
            RNum = (int)(rand() * 100);
            OSSemPost(RandomSem);            // OS call: task sends a signal
            printf("Task%02d's random #: %d\n", data.OSTCBPrio, RNum);
            // Wait 3 seconds in order to view output from each task.
            OSTimeDlySec(3);
        }
    }

51 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
Workload design. Set an indication of the instant when the processor enters supervisor mode (OS_ENTER_CRITICAL) and when it leaves this mode (OS_EXIT_CRITICAL). The signaling is done by writing a marker byte to a specific memory address (0x0100).

    OS_ENTER_CRITICAL
    /* Code implemented for GNU GAS (m68k) */
    asm volatile ("movea.l #0x0100, %%a0 \n\t"   /* load the marker address 0x0100 into a0 */
                  "move.b  #11, (%%a0)"          /* write the byte 11 to address 0x0100 */
                  : : : "a0", "memory");
    ...
    asm volatile ("movea.l #0x0100, %%a0 \n\t"   /* load the marker address 0x0100 into a0 */
                  "move.b  #00, (%%a0)"          /* write the byte 00 to address 0x0100 */
                  : : : "a0", "memory");
    OS_EXIT_CRITICAL

52 2. The Possible Solution: Migrating SW-Based Fault Detection Mechanisms into HW
Workload design. System calls performed by pend and post through semaphore, mailbox, and queue; a consistency check (marker write) is inserted into the kernel's OSSemPend:

    /*
     *************************************************************
     *                    PEND ON SEMAPHORE
     *************************************************************
     */
    UBYTE OSSemPend(OS_SEM *psem, UWORD timeout)
    {
        UBYTE x, y, bitx, bity;

        OS_ENTER_CRITICAL();
        /* Consistency check: code implemented for GNU GAS (m68k) */
        asm volatile ("movea.l #0x0100, %%a0 \n\t"   /* load the marker address 0x0100 into a0 */
                      "move.b  #4, (%%a0)"           /* write the byte 4 to address 0x0100 */
                      : : : "a0", "memory");
        /* End */
        if (psem->OSSemCnt-- > 0) {
            OS_EXIT_CRITICAL();
            return (OS_NO_ERR);
        } else {
            OSTCBCur->OSTCBStat |= OS_STAT_SEM;
            OSTCBCur->OSTCBDly   = timeout;
            y    = OSTCBCur->OSTCBPrio >> 3;
            x    = OSTCBCur->OSTCBPrio & 0x07;
            bity = OSMapTbl[y];
            bitx = OSMapTbl[x];
            if ((OSRdyTbl[y] &= ~bitx) == 0)
                OSRdyGrp &= ~bity;
            psem->OSSemTbl[y] |= bitx;
            psem->OSSemGrp    |= bity;
            OS_EXIT_CRITICAL();
            OSSched();
            OS_ENTER_CRITICAL();
            if (OSTCBCur->OSTCBStat & OS_STAT_SEM) {
                if ((psem->OSSemTbl[y] &= ~bitx) == 0) {
                    psem->OSSemGrp &= ~bity;
                }
                OSTCBCur->OSTCBStat = OS_STAT_RDY;
                OS_EXIT_CRITICAL();
                return (OS_TIMEOUT);
            } else {
                OS_EXIT_CRITICAL();
                return (OS_NO_ERR);
            }
        }
    }

53 3. Experimental Evaluation
Matteo Sonza Reorda, 2002-05, Fault Tolerant Systems Group, Politecnico di Torino.
An Intel 8051-based SoC was considered. PANDORA I-IP: VHDL (~1500 lines).

54 3. Experimental Evaluation
Fault-detection capabilities were evaluated through HW-based fault-injection experiments (FPGA environment). Four benchmarks were considered: matrix multiplication, elliptical filter, FIR filter, and Viterbi algorithm.

55 3. Experimental Evaluation
Detection capabilities for transient faults (30,000 injected bit-flips): percentage of wrong answers, i.e., faults that escape detection.

    Program | Plain [%] (Orig.) | Pandora [%] (SWIP, HW+SW) | ECCA [%] (SW sol.) | CFCSS [%] (SW sol.)
    Matrix  |  9.78             | 0.18                      | 0.99               |  4.88
    Ellipf  | 20.83             | 0                         | 2.38               | 14.29
    FIR     |  5.64             | 0                         | 2.12               |  4.49
    Viterbi | 21.06             | 4.89                      | 6.33               | 17.48

56 3. Experimental Evaluation
Memory overhead: additional code lines required to implement the hybrid technique.
[Chart: Orig. vs. SWIP (HW+SW) vs. SW solution]

57 3. Experimental Evaluation
Execution-time overhead.
[Chart: Orig. vs. SWIP (HW+SW) vs. SW solution]

58 3. Experimental Evaluation
Area overhead: PANDORA size: 992 gates; 8051 size: 30,480 gates. PANDORA therefore introduces an area overhead of about 3.25% (992 / 30,480). The relative area overhead is expected to decrease as processor size increases.

59 4. Final Considerations
Developing a hybrid methodology (HW+SW redundancy) able to detect errors at run time in microprocessor-based SoCs may offer a very good cost/benefit ratio.

60 4. Final Considerations
Returns:
- Minimization of area overhead and fabrication/development costs (benefits of SW-based redundancy techniques)
- Improvement of performance and minimization of memory overhead (benefits of HW-based redundancy techniques)
In summary: minimize fabrication cost and performance degradation while improving reliability.
Target faults: control-flow errors and data-handling errors.

61 4. Final Considerations
The hybrid methodology (HW+SW redundancy) combines:
- an I-IP core architecture
- software-based techniques

62 4. Final Considerations
A system architecture co-implemented in HW+SW detects faults in control flow and application data. Its main characteristics:
- SW-embedded structures at the application-code level.
- Partial migration of the SW-embedded structures into HW: a dedicated I-IP monitors the application processor like a watchdog.
- A communication channel between the HW and SW entities: a driver embedded in the OS kernel, plus dedicated signals connecting the I-IP to the application processor.

