Lecture 11 Test and Fault Tolerance Ingo Sander


1 Lecture 11 Test and Fault Tolerance Ingo Sander ingo@kth.se

2 Test of Embedded Systems

3 Program design and analysis (IL2206 Embedded Systems, October 18, 2015)
- Verification costs are a significant part of the overall design costs.
- For large designs, verification can account for up to 50% of the total design costs.
- Simulation and test are the predominant verification methods in industry...
- ... but industry has a strong interest in incorporating formal methods into the verification flow.

4 Goals
- Make sure the software works as intended.
- We will concentrate on functional testing; performance testing is harder.
- What tests are required to adequately test the program? What is “adequate”?
- Exhaustively testing a complete program is almost never practical, because programs are too complex.
© 2000 Wolf (Morgan Kaufmann)

5 Test Environment
- Provide the program with inputs.
- Execute the program.
- Compare the outputs to the expected results.
[Figure: the test environment feeds inputs to the system under test and receives its outputs.]

6 Types of software testing
- Black-box: tests are generated without knowledge of program internals.
- Clear-box (white-box): tests are generated from the program structure.
© 2000 Wolf (Morgan Kaufmann)

7 Clear-box testing
- Generate tests based on the structure of the program.
- Is a given block of code executed when we think it should be executed?
- Does a variable receive the value we think it should get?
© 2000 Wolf (Morgan Kaufmann)

8 Controllability and observability
- Controllability: we must be able to cause a particular internal condition to occur.
- Observability: we must be able to see the effects of a state from the outside.
© 2000 Wolf (Morgan Kaufmann)

9 Example: FIR filter
Code:
    for (firout = 0.0, j = 0; j < N; j++)
        firout += buff[j] * c[j];
    if (firout > 100.0) firout = 100.0;
    if (firout < -100.0) firout = -100.0;
- Controllability: to test the range checks for firout, we must first load the circular buffer.
- Observability: how do we observe the values of buff and firout?
© 2000 Wolf (Morgan Kaufmann)
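
One way to make the range checks both controllable and observable is to wrap the loop from the slide in a function whose inputs and result are visible to a test. The following is a minimal sketch under stated assumptions: the function name `fir_clamped`, the tap count `N = 4`, and the test vectors are chosen for illustration and do not come from the lecture.

```c
#include <assert.h>

#define N 4  /* number of taps; illustrative value, not from the lecture */

/* The FIR computation and range checks from the slide, wrapped in a
   function so that a test can control buff[] and observe firout. */
double fir_clamped(const double buff[N], const double c[N])
{
    double firout = 0.0;
    for (int j = 0; j < N; j++)
        firout += buff[j] * c[j];
    if (firout > 100.0)  firout = 100.0;
    if (firout < -100.0) firout = -100.0;
    return firout;
}

/* Hypothetical test: load the buffer so that each range check fires once
   and one case passes through unclamped. */
void test_fir_range_checks(void)
{
    const double c[N]    = {1.0, 1.0, 1.0, 1.0};
    const double high[N] = {50.0, 50.0, 50.0, 50.0};      /* sum =  200 */
    const double low[N]  = {-50.0, -50.0, -50.0, -50.0};  /* sum = -200 */
    const double mid[N]  = {1.0, 2.0, 3.0, 4.0};          /* sum =   10 */

    assert(fir_clamped(high, c) == 100.0);   /* upper check taken */
    assert(fir_clamped(low, c) == -100.0);   /* lower check taken */
    assert(fir_clamped(mid, c) == 10.0);     /* neither check taken */
}
```

In a real embedded program the buffer is usually filled sample by sample, so the test would have to feed enough input samples to reach the same buffer contents.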

10 Example: FIR filter
How do we observe correct operation?
1. Set the system into a defined state: input $k-1$ zeros, then input a 1.
2. Observe the output. Expected output: $c_k$.
3. Input $k-1$ zeros. Expected outputs: $c_{k-1}, c_{k-2}, \dots, c_1$.
[Figure: FIR filter structure built from delay elements (D), multipliers (*), and adders (+); the taps $x_k, x_{k-1}, \dots, x_2, x_1$ are weighted by $c_k, c_{k-1}, \dots, c_2, c_1$ and summed to give $y_k$.]
$y_k = c_k x_k + c_{k-1} x_{k-1} + \dots + c_1 x_1$
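
The three steps above amount to an impulse-response test: a single 1 travels through the delay line and exposes one coefficient per step. A minimal sketch in C follows. The type `fir_t`, the functions `fir_init` and `fir_step`, the tap count `K`, and the coefficient values are hypothetical; index 0 of the arrays corresponds to the newest sample, so `c[0]` plays the role of $c_k$ on the slide.

```c
#include <assert.h>

#define K 4  /* number of taps; illustrative value */

typedef struct {
    double x[K];  /* delay line, x[0] is the newest sample (x_k on the slide) */
    double c[K];  /* coefficients, c[0] multiplies the newest sample (c_k)    */
} fir_t;

void fir_init(fir_t *f, const double c[K])
{
    for (int j = 0; j < K; j++) {
        f->x[j] = 0.0;
        f->c[j] = c[j];
    }
}

/* Shift one input sample into the delay line and return the new output. */
double fir_step(fir_t *f, double in)
{
    for (int j = K - 1; j > 0; j--)
        f->x[j] = f->x[j - 1];
    f->x[0] = in;

    double y = 0.0;
    for (int j = 0; j < K; j++)
        y += f->c[j] * f->x[j];
    return y;
}

void test_fir_impulse(void)
{
    const double c[K] = {0.5, 0.25, -1.0, 2.0};
    fir_t f;
    fir_init(&f, c);

    /* Step 1: defined state, K-1 zeros followed by a single 1. */
    for (int j = 0; j < K - 1; j++)
        assert(fir_step(&f, 0.0) == 0.0);
    /* Step 2: the 1 sits at the newest tap, so the output is c[0]. */
    assert(fir_step(&f, 1.0) == c[0]);
    /* Step 3: K-1 more zeros move the 1 past the remaining taps. */
    for (int j = 1; j < K; j++)
        assert(fir_step(&f, 0.0) == c[j]);
}
```

Exact floating-point comparison is acceptable here only because every product is a coefficient times exactly 0 or 1; a test with arbitrary inputs would need a tolerance.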

11 Path-based testing
- Clear-box testing generally tests selected program paths:
  - control the program to exercise a path;
  - observe the program to determine whether the path was properly executed.
- A test may check whether a location on the path was reached (control) or whether a variable on the path was set (data).
© 2000 Wolf (Morgan Kaufmann)

12 Example: choosing paths
Two possible criteria for selecting a set of paths:
- Execute every statement at least once.
- Execute every direction of a branch at least once.
The two criteria are equivalent for structured programs, but not for programs with gotos.
© 2000 Wolf (Morgan Kaufmann)

13 Path example
[Figure: control-flow graph contrasting a set of paths that covers all statements with a set that covers all branches.]
© 2000 Wolf (Morgan Kaufmann)

14 Branch testing strategy
- Exercise the elements of a conditional, not just one true and one false case.
- Devise a test for every simple condition in a Boolean expression.
© 2000 Wolf (Morgan Kaufmann)

15 Example: branch testing
Meant to write:
    if (a || (b >= c)) { printf("OK\n"); }
Actually wrote:
    if (a && (b >= c)) { printf("OK\n"); }
Branch testing strategy:
- One test is a = F, (b >= c) = T, for example a=0, b=3, c=2.
- For this test the intended and the actual code produce different answers.
© 2000 Wolf (Morgan Kaufmann)
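
The effect of the chosen test can be checked directly by evaluating both conditions. The sketch below is illustrative only; the function names `intended` and `actual` are hypothetical wrappers around the two expressions from the slide.

```c
#include <assert.h>
#include <stdbool.h>

/* The condition the programmer meant to write. */
bool intended(bool a, int b, int c) { return a || (b >= c); }

/* The condition that was actually written. */
bool actual(bool a, int b, int c)   { return a && (b >= c); }

void test_branch_conditions(void)
{
    /* One "true" case and one "false" case are not enough:
       both versions agree on these inputs, so the bug stays hidden. */
    assert(intended(true, 3, 2)  == actual(true, 3, 2));   /* both true  */
    assert(intended(false, 1, 2) == actual(false, 1, 2));  /* both false */

    /* The test from the slide, a = F and (b >= c) = T, separates them. */
    assert(intended(false, 3, 2) == true);
    assert(actual(false, 3, 2)   == false);
}
```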

16 Another branch testing example
Meant to write:
    if ((x == good_pointer) && (x->field1 == 3)) ...
Actually wrote:
    if ((x = good_pointer) && (x->field1 == 3)) ...
Branch testing strategy:
- If we use only the field1 value to exercise the branch, we may miss the pointer problem.
© 2000 Wolf (Morgan Kaufmann)

17 Tools for Code Coverage
- Tools exist that analyze to what extent the code is executed.
- gcov, which is distributed with gcc, is a tool for measuring code coverage.
- Code coverage tools can significantly improve the tests of embedded software, because they make it obvious which parts of the code are never executed during a test.

18 Code Coverage Tool gcov
Example code:
    #include <stdio.h>

    int main (void) {
      int i;
      for (i = 1; i < 10; i++)
        {
          if (i % 3 == 0)
            printf ("%d can be divided by 3\n", i);
          if (i % 11 == 0)
            printf ("%d can be divided by 11\n", i);
        }
      return 0;
    }

19 Running gcov
    > gcc -fprofile-arcs -ftest-coverage gcov.c
    > a.out
    3 can be divided by 3
    6 can be divided by 3
    9 can be divided by 3
    > gcov gcov.c
    File 'gcov.c'
    Lines executed:85.71% of 7
    gcov.c:creating 'gcov.c.gcov'

20 gcov Output
The first column is the execution count; "-" marks lines without executable code and "#####" marks lines that were never executed.
            -:    1:#include <stdio.h>
            -:    2:
            1:    3:int main (void) {
            -:    4:  int i;
           10:    5:  for (i = 1; i < 10; i++)
            -:    6:    {
            9:    7:      if (i % 3 == 0)
            3:    8:        printf ("%d can be divided by 3\n", i);
            9:    9:      if (i % 11 == 0)
        #####:   10:        printf ("%d can be divided by 11\n", i);
            -:   11:    }
            1:   12:  return 0;
            -:   13:}

21 Data flow testing
- Def-use analysis: match variable definitions (assignments) and uses.
- Example:
      x = 5;            (def)
      ...
      if (x > 0) ...    (use)
- Does the assignment reach the use?
© 2000 Wolf (Morgan Kaufmann)

22 Black-box testing
- Black-box tests are made from the specifications, not the code.
- Black-box testing complements clear-box testing.
- It may test unusual cases better.
© 2000 Wolf (Morgan Kaufmann)

23 Types of black-box tests
- Specified inputs/outputs: select inputs from the specification and determine the required outputs.
- Random: generate random tests and determine the appropriate output.
- Regression: reuse tests from previous versions of the system.
© 2000 Wolf (Morgan Kaufmann)

24 Evaluating tests
- It is very important to evaluate your tests.
- Keep track of the bugs found.
- Introduce a new test procedure for every bug found.
- Error injection: add bugs to a copy of the code and run the tests on the modified code.
- Error injection can be used to measure fault coverage.
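
Error injection can be sketched in a few lines: take a correct function, create copies that each contain one deliberate bug, and count how many of the copies a test suite rejects. Everything in the following C sketch is hypothetical (the clamp function, the three injected bugs, and the two test suites); it only illustrates how fault coverage could be measured.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*clamp_fn)(int);
typedef int (*suite_fn)(clamp_fn);  /* returns 1 if all tests pass */

/* Reference implementation: clamp v to the range [-100, 100]. */
int clamp_ok(int v) { if (v > 100) return 100; if (v < -100) return -100; return v; }

/* Copies of the code with one injected bug each. */
int clamp_m1(int v) { if (v > 100) return 100; return v; }                             /* lower check removed   */
int clamp_m2(int v) { if (v > 100) return -100; if (v < -100) return -100; return v; } /* wrong upper limit     */
int clamp_m3(int v) { if (v > 100) return 100; if (v < -100) return 100; return v; }   /* wrong lower limit     */

/* Two test suites of different quality. */
int suite_weak(clamp_fn f)   { return f(0) == 0 && f(200) == 100; }
int suite_better(clamp_fn f) { return suite_weak(f) && f(-200) == -100; }

/* Number of injected bugs that make the given suite fail. */
int detected_faults(suite_fn suite)
{
    clamp_fn mutants[] = { clamp_m1, clamp_m2, clamp_m3 };
    int detected = 0;
    for (size_t i = 0; i < sizeof mutants / sizeof mutants[0]; i++)
        if (!suite(mutants[i]))
            detected++;
    return detected;
}
```

With these assumptions, `suite_weak` detects one of the three injected bugs and `suite_better` detects all three, which gives a rough, relative measure of how much the extra test case improves the suite.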

25 Formal Verification
- An alternative to testing is formal verification.
- Example: model checking
  - A formal model of the system is created, for example a state machine.
  - The properties that the system shall satisfy are specified formally, for example: it should never happen that both traffic lights signal “Green”.
  - A tool checks that all properties are fulfilled for all input and state combinations; otherwise a counterexample is generated.
- Only small systems can be verified (state explosion problem).
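
The idea behind model checking can be illustrated with an explicit search over all reachable states of a small state machine. The C sketch below is a toy under stated assumptions, not a real model checker: the traffic-light controller, its single request input, the state encoding, and the deliberately faulty variant `buggy_next` are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum { RED, YELLOW, GREEN };                 /* colour of one light       */
enum { NUM_STATES = 9, NUM_INPUTS = 2 };     /* 3 x 3 colours, 1-bit input */
#define STATE(ns, ew) ((ns) * 3 + (ew))      /* north-south, east-west     */

typedef int (*next_fn)(int state, int request);

/* Hypothetical correct controller: a green light turns yellow on request. */
int correct_next(int s, int request)
{
    switch (s) {
    case STATE(GREEN, RED):  return request ? STATE(YELLOW, RED) : s;
    case STATE(YELLOW, RED): return STATE(RED, GREEN);
    case STATE(RED, GREEN):  return request ? STATE(RED, YELLOW) : s;
    case STATE(RED, YELLOW): return STATE(GREEN, RED);
    default:                 return STATE(RED, RED);
    }
}

/* Same controller with an injected bug: east-west turns green too early. */
int buggy_next(int s, int request)
{
    if (s == STATE(GREEN, RED) && request)
        return STATE(GREEN, GREEN);
    return correct_next(s, request);
}

/* Explore every state reachable from 'initial' under all inputs.
   Returns -1 if "never both green" holds, otherwise the violating state. */
int check_never_both_green(next_fn next, int initial)
{
    bool visited[NUM_STATES] = { false };
    int worklist[NUM_STATES];   /* each state is pushed at most once */
    int top = 0;

    visited[initial] = true;
    worklist[top++] = initial;
    while (top > 0) {
        int s = worklist[--top];
        if (s == STATE(GREEN, GREEN))
            return s;
        for (int in = 0; in < NUM_INPUTS; in++) {
            int t = next(s, in);
            if (!visited[t]) {
                visited[t] = true;
                worklist[top++] = t;
            }
        }
    }
    return -1;
}
```

A real tool would also report the input sequence that leads to the violating state. The `visited` array has one entry per state, which hints at why the approach runs into the state explosion problem for larger systems.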

26 Summary
- Test and verification are very important in embedded system design.
- Good tests have to be planned.
- It is difficult to cover all test cases.
- Tests should be evaluated so that they can be improved.
- Formal verification is a very promising alternative for critical parts!

27 Fault Tolerance Ref: E. McCluskey and S. Mitra, “Fault Tolerance”

28 Fault Tolerance
Fault tolerance is the ability of a system to continue correct operation after the occurrence of hardware or software failures or operator errors.
Fault tolerance includes
- detection of system malfunction,
- identification of faulty units,
- recovery of the system from failure.

29 Reliability Requirements
Reliability requirements vary for different kinds of embedded systems:
- Low-cost systems shall operate for a reasonable time and may then fail (calculator, cell phone); repair is often uneconomical.
- Safety-critical systems must have very high reliability (nuclear power plants, automotive control); the probability of an error in an aircraft computer system is less than $10^{-9}$ per hour.

30 Failures
- Any deviation from the expected behaviour is a failure.
- Failures that cause the system to stop or crash are much easier to detect than failures that occasionally degrade system performance.

31 Failures
- A permanent failure is a failure that is always present:
  - incorrect hardware or software functions.
- A temporary failure is a failure that is not always present during operation:
  - transient failures (externally induced signal perturbation, power-supply disturbances);
  - intermittent failures (a weak system component produces incorrect outputs under certain operating conditions).

32 Sources of Failures
- Incorrect or incomplete specification
  - interfaces not clearly defined
- Incorrect design (bugs)
  - memory allocation
  - management of data structures
  - communication between processes
- Insufficiently careful verification process
  - not all possible scenarios are tested or verified

33 Error
- An error is the occurrence of incorrect data or control signals.
- A failure that occurs in a system
  - may cause an error, or
  - may not cause an error, if the failure does not affect system operation.

34 Fault Model
A fault model represents the effect of a failure by means of the change produced in the system signals.
The usefulness of a fault model can be judged by
- its effectiveness in failure detection,
- the accuracy with which it represents the effects of failures,
- the tractability of the design tools that use the fault model.

35 Example: Single Stuck-at Fault Model
- The single stuck-at fault model is used to test hardware circuits.
  - It is very efficient in detecting defective chips.
  - It is used to determine a minimal set of test vectors.
- Properties
  - It assumes a single fault.
  - One signal in the system is stuck at the value 0 or 1.
  - The failure is observed at the output.

36 Example: Single Stuck-at Fault Model
Which test vectors are needed to test an AND gate according to the single stuck-at model?
[Figure: AND gate with inputs A and B and output Y.]

37 Example: Single Stuck-at Fault Model
Six faults are possible: s-a-0(A), s-a-1(A), s-a-0(B), s-a-1(B), s-a-0(Y), s-a-1(Y).
s-a-1(A) denotes a stuck-at-1 fault in A.
[Figure: AND gate with inputs A and B and output Y; input A is stuck at 1.]

38 Example: Single Stuck-at Fault Model
Faults can only be observed at the output!
[Figure: AND gate with inputs A and B and output Y.]

    A B Y | A s-a-0 | A s-a-1 | B s-a-0 | B s-a-1 | Y s-a-0 | Y s-a-1
    0 0 0 |         |         |         |         |         |    x
    0 1 0 |         |    x    |         |         |         |    x
    1 0 0 |         |         |         |    x    |         |    x
    1 1 1 |    x    |         |    x    |         |    x    |

    x = fault can be observed at the output

Three test vectors (ABY = {010, 100, 111}) are needed instead of four, a reduction of 25%!
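
The fault table can be reproduced by simulating the AND gate once without a fault and once with each of the six stuck-at faults, and comparing the outputs. The following C sketch is one possible way to do this; the names `fault_t`, `and_gate`, `detects`, and `faults_covered` are hypothetical, and test vectors are encoded as two-bit integers AB.

```c
#include <assert.h>

typedef enum {
    NO_FAULT, A_SA0, A_SA1, B_SA0, B_SA1, Y_SA0, Y_SA1, NUM_FAULTS
} fault_t;

/* AND gate with an optional single stuck-at fault on A, B, or Y. */
int and_gate(int a, int b, fault_t f)
{
    if (f == A_SA0) a = 0;
    if (f == A_SA1) a = 1;
    if (f == B_SA0) b = 0;
    if (f == B_SA1) b = 1;
    int y = a & b;
    if (f == Y_SA0) y = 0;
    if (f == Y_SA1) y = 1;
    return y;
}

/* A vector detects a fault if the faulty output differs from the good one. */
int detects(int a, int b, fault_t f)
{
    return and_gate(a, b, f) != and_gate(a, b, NO_FAULT);
}

/* Number of the six faults detected by at least one vector of the set.
   Each vector is a two-bit integer: bit 1 is A, bit 0 is B. */
int faults_covered(const int vectors[], int n)
{
    int covered = 0;
    for (int f = A_SA0; f < NUM_FAULTS; f++) {
        for (int i = 0; i < n; i++) {
            int a = (vectors[i] >> 1) & 1;
            int b = vectors[i] & 1;
            if (detects(a, b, (fault_t)f)) {
                covered++;
                break;
            }
        }
    }
    return covered;
}
```

For example, the set {01, 10, 11} (integers 1, 2, 3) covers all six faults, whereas {00, 11} covers only four, because s-a-1(A) and s-a-1(B) stay hidden.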

39 Fault Models
- The single stuck-at model has been very successful in hardware design.
- More complicated fault models exist.
- It is difficult to develop fault models for software:
  - there is no consensus about the effectiveness of software fault models.

40 Reliability Metrics
There are many different metrics for reliability, depending on the character of the system.
- Reliability of a system at time t is the probability that the system will produce correct output up to time t.
- Availability of a system at time t is the probability that the system is operational at time t.

41 Reliability Metrics
- Safety of a system at time t is the probability that the system either will be operating correctly or will fail in a “safe” manner.
- Performability of a system at time t is the probability that the system is operating correctly, or at a reduced throughput greater than or equal to a given value.
- Maintainability M(t) is the probability that it takes t units of time to restore a failed system to normal operation.

42 Metrics for Testability
There are also measures of testability, which is the ease with which the system can be tested.
- Testability is difficult to quantify.
- Important factors:
  - test pattern generation cost
  - test application cost
  - observability of state information
  - controllability (production of an internal signal)

43 Measurement of reliability
- Test a large number N of components.
- At time t:
  - G(t) is the number of correctly operating components;
  - F(t) is the number of components that have failed.
- Reliability: R(t) = G(t)/N

44 Measurement of reliability
Other important metrics are related to the reliability metrics presented so far:
- Mean Time To Failure (MTTF)
- Mean Time To Repair (MTTR)
Together these metrics can be used to calculate further metrics:
- Average availability = MTTF / (MTTF + MTTR)
- Mean Time Between Failures: MTBF = MTTF + MTTR
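
As a worked example of the two formulas, the C sketch below computes average availability and MTBF. The function names and the numbers (MTTF = 990 hours, MTTR = 10 hours) are assumptions chosen for illustration, not values from the lecture.

```c
/* Average availability: fraction of time the system is operational. */
double average_availability(double mttf, double mttr)
{
    return mttf / (mttf + mttr);
}

/* Mean time between failures: one failure-free period plus one repair. */
double mean_time_between_failures(double mttf, double mttr)
{
    return mttf + mttr;
}
```

With the assumed values, MTBF = 990 h + 10 h = 1000 h and the average availability is 990 / 1000 = 0.99, so the system is operational 99% of the time.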

45 Bathtub Curve
For hardware systems the bathtub curve illustrates the reliability of typical systems.

46 Fault Avoidance
Reliability can be improved during the design process by
- robust design techniques,
- design validation techniques,
- reliability verification techniques,
- thorough production techniques.
Fault avoidance techniques are very costly.

47 System Failure Response
A system can respond to a failure in different ways:
- Error on output: acceptable in non-critical applications (digital watch, games).
- Errors masked: outputs are correct even when a fault occurs (flight control).
- Fault secure: the output is correct, or an error is indicated if the output is incorrect (banking, telephony, networking).
- Fail safe: the output is correct or at a “safe value” (“red” light for traffic control).

48 Error Masking
Triple modular redundancy
- The critical component is tripled.
- Additional majority voting logic is added.
[Figure: three replicas R of the component feed a 2-out-of-3 majority voter.]
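
The 2-out-of-3 voter can be written as a bitwise majority function: each output bit takes the value on which at least two of the three replicas agree. The C sketch below is illustrative; the function name `tmr_vote` and the 32-bit word width are assumptions.

```c
#include <stdint.h>

/* Bitwise 2-out-of-3 majority vote over the outputs of three replicas.
   An output bit is 1 exactly when at least two input bits are 1. */
uint32_t tmr_vote(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (a & c) | (b & c);
}
```

If one replica delivers a wrong word, the other two outvote it and the error is masked. If two replicas fail in the same bit position, the voter passes the wrong value on, and the voter itself remains a single point of failure in this simple form.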

50 Error Masking: Software Techniques
- N-Version Programming
  - Several versions of the program are written independently.
  - Voting is used.
- Recovery Blocks
  - Several versions of the program are written independently.
  - Only one program is run and monitored.
  - If an error is detected, an alternate program is run.
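
A recovery block can be sketched as a primary routine, an acceptance test that monitors its result, and an alternate routine that runs only when the acceptance test fails. The C example below uses an integer square root purely for illustration: all function names are hypothetical, and the primary version contains a deliberately injected fault so that the alternate is actually exercised.

```c
#include <assert.h>

/* Acceptance test: r is the integer square root of n
   exactly when r*r <= n < (r+1)*(r+1). */
int isqrt_acceptable(unsigned n, unsigned r)
{
    unsigned long long lo = (unsigned long long)r * r;
    unsigned long long hi = ((unsigned long long)r + 1) * ((unsigned long long)r + 1);
    return lo <= n && n < hi;
}

/* Primary version with an injected fault: the search stops at 10. */
unsigned isqrt_primary(unsigned n)
{
    unsigned r = 0;
    while (r < 10 && (r + 1) * (r + 1) <= n)
        r++;
    return r;
}

/* Independently written alternate version without the limit. */
unsigned isqrt_alternate(unsigned n)
{
    unsigned long long r = 0;
    while ((r + 1) * (r + 1) <= n)
        r++;
    return (unsigned)r;
}

/* Recovery block: run the primary, check its result, fall back to the
   alternate if the check fails. Returns 0 on success, -1 if both fail. */
int isqrt_recovery_block(unsigned n, unsigned *result, int *used_alternate)
{
    *used_alternate = 0;
    *result = isqrt_primary(n);
    if (isqrt_acceptable(n, *result))
        return 0;

    *used_alternate = 1;
    *result = isqrt_alternate(n);
    return isqrt_acceptable(n, *result) ? 0 : -1;
}
```

This function has no state to roll back. A recovery block around stateful code would additionally have to restore a checkpoint before running the alternate. In N-version programming, by contrast, all versions would run and a voter would choose the result.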

51 Repair Techniques
When there is a failure in a system,
- the failure must be detected and isolated
  - Built-In Self-Test: additional functionality tests whether the system operates correctly and identifies faulty parts;
- the system must respond to the error;
- the system must be repaired
  - self-repair techniques (space missions);
  - exact diagnosis of the fault and a report to maintenance personnel.

52 Summary
- Reliability comes at a cost.
- Redundancy is required (extra components that are not needed for the pure functionality).
- There is a trade-off between reliability and design costs.
- Often a very low failure probability is good enough; the system does not have to be completely fault-free!

