
1 OS II: Dependability & Trust - SWIFI-based OS Evaluations
Dependable Embedded Systems & SW Group, www.deeds.informatik.tu-darmstadt.de
Prof. Neeraj Suri, Stefan Winter, Dept. of Computer Science, TU Darmstadt, Germany

2 Fault Detection: Software Testing
 So far: Verification & Validation
  Testing techniques: static vs. dynamic, black-box vs. white-box
 Last time: Fault Injection (FI): applications, techniques, some FI tools
 Today: Testing (SWIFI) of operating systems
  WHERE: Error propagation in OSs [Johansson’05]
  WHAT: Error selection for testing [Johansson’07]
  WHEN: Injection trigger selection [Johansson’07]
 Next lecture: Profiling the OS extensions (state change @ runtime)

3 FI Recap
Fault Injection (FI) is the process of either inserting bugs into your system or exposing your system to operational perturbations.
 FI applications for dependable system development
  Defect count estimation (fault seeding)
  Test suite evaluation (mutation testing)
  Security testing
  Experimental dependability evaluations
 FI techniques
  Physical FI
  HW FI
  Simulated FI
  SWIFI

4 FI Recap (cont.)
 Where to apply the change (location, abstraction/system level)
 What to inject (what should be injected/corrupted?)
 Which trigger to use (event, instruction, timeout, exception, …?)
 When to inject (on first/second/… trigger event)
 How often to inject (Heisenbugs/Bohrbugs)
 …
 What to record & interpret? For what purpose?
 How is the system loaded at the time of the injection?
  Applications running and their load (workload)
  System resources
  Real → realistic → synthetic workload

5 Outline for today’s lecture
 Drivers: a major dependability issue in commodity OSs
 An error propagation view
 FI-based robustness evaluations of the kernel
  Black box assumption
  Fault representativeness vs. failure relevance
 Design and implementation issues of a suitable FI framework
  Fault modeling
  Failure modeling
  Workloads

6 The problem: Drivers!
 Device drivers are
  Numerous: 250 installed (100 active) drivers in XP/Vista
  Large & complex: 70% of the Linux code base
  Immature: 25 new / 100 revised Vista driver versions every day
  Access rights: kernel-mode operation in monolithic OSs
 Device drivers are the dominant cause of OS failures despite sustained testing efforts
[Charts: causes of WinXP outages, causes of Win2k outages]

7 The problem (cont.)
 Problem statement: Driver failures lead to OS API failures
 Mitigation approaches
  1. Harden OS robustness
  2. Improve driver reliability

8 The problem (cont.)
[Figures: the problem in terms of error propagation; the effect of testing in terms of error propagation; the effect of robustness hardening in terms of error propagation]

9 Issues with the driver testing approach
 What if the driver is not the root cause?
 What if we cannot remove defects (e.g. commercial OSs)?

10 Issues with the hardening approach
 What if we cannot remove robustness vulnerabilities?
More issues with the hardening approach in next week’s lecture …

11 FI-based robustness evaluations
 Fault containment wrappers are expensive
  Additional code is an additional source of bugs
  Runtime overhead for error checks
 Where should we add fault containment wrappers?
  Where errors with critical effects are likely to occur
  Where propagation is likely
  Where critical errors propagate
 How do we know where which errors propagate?
  Propagation analysis (cf. PROPANE)

12 Robustness Evaluations
[Figure: error propagation among components A–F, with effects ordered from benign to increasingly bad]

13 Robustness Evaluations
 Experimental technique to ascertain “vulnerabilities”
  Identify (potential) sources, error propagation & hot spots, etc.
  Estimate their “effects” on applications
 Component enhancement with “wrappers”, e.g. if (X > 100 && Y < 30) Exception();
 Aspects
  Location of wrappers
  Metrics for error propagation profiles
  Experimental analysis

14 System Model
[Figure: layered stack of Applications, Operating System, and Drivers; the ? marks the OS-driver interface under study]

15 Device Driver
 Model the interfaces (defined in C)
  Exported: functions provided by the driver (ds x.1 … ds x.m)
  Imported: functions used by the driver (os x.1 … os x.n)
[Figure: driver X between the OS, via its exported and imported interfaces, and the hardware]

16 Metrics
Three metrics for profiling:
 1. Propagation: how errors flow through the OS
 2. Exposure: which OS services are affected
 3. Diffusion: which drivers are the sources
 Impact analysis: metrics, case study (WinCE), results

17 Service Error Permeability
1. Service Error Permeability
 Measures one driver’s influence on one OS service
 Used to study service-driver relations

18 OS Service Error Exposure
2. OS Service Error Exposure
 An application uses certain services
 How are these services influenced by driver errors?
 Used to compare services

19 Driver Error Diffusion
3. Driver Error Diffusion
 Which driver affects the system the most?
 Used to compare drivers
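The defining formulas on these slides were images and did not survive transcription. A plausible formalization, consistent with the descriptions above and stated in my own notation (P_{i,x} for service s_i and driver D_x is an assumption, not necessarily the original symbols):

  \text{Service error permeability:}\quad P_{i,x} = \Pr\big[\text{error observable at OS service } s_i \mid \text{error in driver } D_x\big]

  \text{OS service error exposure:}\quad E_i = \sum_{x} P_{i,x} \quad \text{(one service, summed over drivers)}

  \text{Driver error diffusion:}\quad D_x = \sum_{i} P_{i,x} \quad \text{(one driver, summed over services)}

In experiments, P_{i,x} is estimated as the fraction of injections into driver D_x whose effects are observed at service s_i; as sums of such fractions, exposure and diffusion can exceed 1.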

20 Case Study: Windows CE
 Targeted drivers: serial, Ethernet
 FI at the interface
 Data-level errors
 Effects on OS services
 4 test applications
[Figure: test setup with test app, OS, drivers, and the target driver; a host-side manager controls an interceptor placed between the OS and the target driver]
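A minimal sketch of the interceptor idea in C, assuming a driver that imports a hypothetical OS service os_read(); the symbol names and the host-controlled flag are illustrative, not the actual WinCE harness:

  #include <stdint.h>
  #include <stdio.h>

  /* The real OS service the driver imports. */
  extern int os_read(void *buf, uint32_t len);

  /* Armed by the host-side test manager before the experiment. */
  static int injection_armed = 1;

  /* Interceptor with the same signature; the driver is bound to this
   * symbol instead of the original service. */
  int os_read_intercept(void *buf, uint32_t len)
  {
      if (injection_armed) {
          injection_armed = 0;               /* inject on first occurrence only */
          len ^= (uint32_t)1 << 15;          /* corrupt one parameter value */
          printf("injected into os_read\n"); /* log for failure analysis */
      }
      return os_read(buf, len);              /* forward the corrupted call */
  }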

21 Error Model
 Data-level errors in the OS-driver interface
 Wrong values, based on the C type: boundary values, special values, offsets
 Transient: injected on first occurrence

22 Impact Analysis
 Impact ascertained via failure mode analysis
 Failure classes:
  Class NF: No visible effect
  Class 1: Error, no violation
  Class 2: Error, violation
  Class 3: OS crash/hang

23 Error Model
Data types and #cases per type:
 Integers: int 7, unsigned int 5, long 7, unsigned long 5, short 7, unsigned short 5, LARGE_INTEGER 7
 Void *: void 3
 Chars: char 7, unsigned char 5, wchar_t 5
 Boolean: bool 1
 Enums: multiple, #cases = #identifiers
 Structs: multiple, 1
Cases for int (new value): 1: previous – 1; 2: previous + 1; 3: 1; 4: 0; 5: …; 6: INT_MIN; 7: INT_MAX
Example of an injected interface function:
LONG RegQueryValueEx([in] HKEY hKey, [in] LPCWSTR lpValueName, [in] LPDWORD lpReserved, [out] LPDWORD lpType, [out] LPBYTE lpData, [in/out] LPDWORD lpcbData);

24 Service Error Permeability
 Ethernet driver
  42 imported svcs
  12 exported svcs
 Most Class 1
 3 crashes (Class 3)

25 OS Service Error Exposure
 Serial driver
  50 imported svcs
  10 exported svcs
 Clustering of failures

26 Driver Error Diffusion
 Higher diffusion for Ethernet
 Most Class NF
 Failures at boot-up

                     Ethernet     Serial
 #Experiments        414          411
 #Injections         228          187
 #Class NF           330 (80%)    377 (92%)
 #Class 1            80 (19%)     25 (7%)
 #Class 2            1            9
 #Class 3            3            0
 Diffusion (Class 1) 0.616        0.460
 Diffusion (Class 2) 0.002        0.022
 Diffusion (Class 3) 0.007        0

27 Error Models: “What to Inject?”
 FI’s effectiveness depends on the chosen error model being (a) representative of actual errors, and (b) effective at triggering “vulnerabilities”
 Comparative evaluation of the “effectiveness” of different error models
  Fewest injections?
  Most failures?
  Best “coverage”?
 Propose a composite error model for enhancing FI effectiveness

28 Chosen Drivers & Error Models
Error models: data-type (DT), bit-flips (BF), fuzzing (FZ)

 Driver          Description    #Injection cases (DT / BF / FZ)
 cerfio_serial   Serial port    397 / 2362 / 1410
 91C111          Ethernet       255 / 1722 / 1050
 atadisk         CompactFlash   294 / 1658 / 1035

29 Error Models – Data-Type (DT) Errors
int foo(int a, int b) {…}
int ret = foo(0x45a209f1, 0x00000000);

30 Error Models – Data-Type (DT) Errors
int foo(int a, int b) {…}
int ret = foo(0x45a209f1, 0x00000000);
Cases (new value): 1: previous – 1; 2: previous + 1; 3: 1; 4: 0; 5: …; 6: INT_MIN (0x80000000); 7: INT_MAX

31 Error Models – Data-Type (DT) Errors
int foo(int a, int b) {…}
int ret = foo(0x80000000, 0x00000000);
 Varied #cases depending on the data type
 Requires tracking of the types for correct injection
 Complex implementation, but scales well

32 Error Models – Data-Type (DT) Errors
 Data type    C-Type           #Cases
 Integers     int              7
              unsigned int     5
              long             7
              unsigned long    5
              short            7
              unsigned short   5
              LARGE_INTEGER    7
 Misc.        * void           3
              HKEY             6
              struct { … }     multiple
              Strings          4
 Characters   char             7
              unsigned char    5
              wchar_t          5
 Boolean      bool             1
 Enums        multiple cases
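A minimal sketch, in C, of how the DT cases for the int type could be enumerated; the function name is hypothetical, and the value of case 5, lost in transcription, is left unfilled:

  #include <limits.h>

  /* Returns the value to inject for DT case n (1..7), given the
   * parameter's previous (correct) value. */
  int dt_int_case(int n, int prev)
  {
      switch (n) {
      case 1: return prev - 1;
      case 2: return prev + 1;
      case 3: return 1;
      case 4: return 0;
      case 6: return INT_MIN;    /* 0x80000000 on 32-bit targets */
      case 7: return INT_MAX;
      default: return prev;      /* case 5 not recovered from the slides */
      }
  }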

33 Error Models – Bit-Flip (BF) Errors
int foo(int a, int b) {…}
int ret = foo(0x45a209f1, 0x00000000);

34 Error Models – Bit-Flip (BF) Errors
int foo(int a, int b) {…}
int ret = foo(0x45a209f1, 0x00000000);
01000101101000100000100111110001

35 Error Models – Bit-Flip (BF) Errors
int foo(int a, int b) {…}
int ret = foo(0x45a209f1, 0x00000000);
01000101101000100000100111110001
01000101101000101000100111110001

36 Error Models – Bit-Flip (BF) Errors
int foo(int a, int b) {…}
int ret = foo(0x45a289f1, 0x00000000);
01000101101000101000100111110001
 Typically 32 cases per parameter
 Easy to implement
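A minimal bit-flip sketch in C, one case per bit of a 32-bit parameter (the helper name is illustrative):

  #include <stdint.h>

  /* Flip a single bit (0 = LSB, 31 = MSB) of a 32-bit parameter;
   * a BF campaign runs one experiment per bit, 32 in total. */
  uint32_t flip_bit(uint32_t value, unsigned bit)
  {
      return value ^ (UINT32_C(1) << bit);
  }

  /* Example: flip_bit(0x45a209f1, 15) == 0x45a289f1, as on the slide. */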

37 Error Models – Fuzzing (FZ) Errors
int foo(int a, int b) {…}
int ret = foo(0x45a209f1, 0x00000000);

38 Error Models – Fuzzing (FZ) Errors
int foo(int a, int b) {…}
int ret = foo(0x45a209f1, 0x00000000);
0x17af34c2

39 Error Models – Fuzzing (FZ) Errors
int foo(int a, int b) {…}
int ret = foo(0x17af34c2, 0x00000000);
 Selectable number of cases
 Simple implementation
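A minimal fuzzing sketch in C; the choice of PRNG and the number of cases are up to the tester (a real campaign would seed the generator for reproducibility):

  #include <stdint.h>
  #include <stdlib.h>

  /* Produce one random replacement value for a 32-bit parameter.
   * rand() may deliver fewer than 32 random bits, so combine two draws. */
  uint32_t fuzz_value(void)
  {
      return ((uint32_t)rand() << 16) ^ (uint32_t)rand();
  }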

40 Comparison
Compare error models on:
 Number of failures
 Effectiveness
 Experimentation time
 Identifying services
 Error propagation

41 Failure Classes & Driver Diffusion
 Failure Class   Description
 No Failure      No observable effect
 Class 1         Error propagated, but still satisfied the OS service specification
 Class 2         Error propagated and violated the service specification
 Class 3         The OS hung or crashed

42 Failure Classes & Driver Diffusion
 Failure Class   Description
 No Failure      No observable effect
 Class 1         Error propagated, but still satisfied the OS service specification
 Class 2         Error propagated and violated the service specification
 Class 3         The OS hung or crashed
Driver diffusion: a measure of a driver’s ability to spread errors.
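The diffusion formula on this slide was an image; a formalization consistent with the permeability-based metrics introduced earlier (again my own notation) is a per-failure-class sum over the driver's services:

  D_x^{(c)} = \sum_{i} P_{i,x}^{(c)}, \qquad P_{i,x}^{(c)} = \frac{\#\{\text{class-}c\text{ failures when injecting into } s_i \text{ of } D_x\}}{\#\{\text{injections into } s_i \text{ of } D_x\}}

Being a sum over services, D_x^{(c)} can exceed 1, which matches the values reported two slides below.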

43 Number of Failures (Class 3)
[Chart: number of Class 3 failures per driver and error model]

44 Failure Classes & Driver Diffusion
Driver diffusion (Class 3):
 Driver          DT     BF     FZ
 cerfio_serial   1.50   1.05   1.56
 91C111          0.73   0.98   0.69
 atadisk         0.63   1.86   0.29

45 Experimentation Time
 Driver          Error model   Exec. time (h:min)
 cerfio_serial   DT            5:15
                 BF            38:14
                 FZ            20:44
 91C111          DT            1:56
                 BF            17:20
                 FZ            7:48
 atadisk         DT            2:56
                 BF            20:51
                 FZ            11:55

46 Identifying Services (Class 3)
 Which OS services can cause Class 3 failures?
 Which error model identifies most services (coverage)?
 Is some model consistently better/worse?
 Can we combine models?
[Table: 18 OS services vs. DT/BF/FZ, an X marking each model that identified the service; services 9, 10, 11, 14, and 16 were identified by all three models, the rest by only one or two]

47 Identifying Services (Class 3 + 2)
 Same questions as before, with Class 2 failures included
[Table: as on the previous slide, with O marks added for services additionally identified via Class 2 failures (e.g. services 1, 2, 3, 7, and 12)]

48 Bit-Flips: Sensitivity to Bit Position?
[Chart: failures per flipped bit position, from LSB to MSB]

49 Bit-Flips: Bit Position Profile
[Chart: cumulative #services identified as a function of bit position]

50 Fuzzing – Number of Injections?
[Chart: services identified as a function of the number of fuzzing injections]

51 Composite Error Model
 Let’s take the best of bit-flips and fuzzing
  Bit-flips: bits 0–9 and 31
  Fuzzing: 10 cases
 ~50% fewer injections
 Identifies the same service set
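A sketch in C of the composite case list for one 32-bit parameter, directly encoding the selection above (bit-flips on bits 0–9 and 31, plus 10 fuzzing cases); the function name is illustrative:

  #include <stdint.h>
  #include <stdlib.h>

  /* 11 bit-flip cases + 10 fuzzing cases = 21 cases per parameter,
   * roughly half of plain BF's 32. */
  uint32_t composite_case(unsigned n, uint32_t prev)
  {
      if (n < 10)                           /* cases 0..9: flip bits 0-9 */
          return prev ^ (UINT32_C(1) << n);
      if (n == 10)                          /* case 10: flip bit 31 */
          return prev ^ (UINT32_C(1) << 31);
      if (n < 21)                           /* cases 11..20: fuzzed values */
          return ((uint32_t)rand() << 16) ^ (uint32_t)rand();
      return prev;                          /* out of range: no injection */
  }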

52 Composite Error Model – Results
[Chart: results of the composite model compared with BF and FZ]

53 Summary
 Comparison across three well-established error models + the composite model (CM): data-type (DT), bit-flips (BF), fuzzing (FZ)
[Table: Model (DT/BF/FZ/CM) vs. Implementation / Coverage / Execution, ratings shown graphically on the slides]

54 Summary
 DT: implementation requires tracking data types; requires few experiments

55 Summary
 BF: found the most Class 3 failures; requires many experiments

56 Summary
 FZ: finds additional services

57 Summary
 CM: profiling gives combined BF & FZ with high coverage

58 Summary
 Outlook
  When to do the injection?
  More drivers, OSs, models?

59 On the Impact of Injection Triggers for OS Robustness Evaluation
Andréas Johansson, Neeraj Suri, Department of Computer Science, Technische Universität Darmstadt, Germany. DEEDS: Dependable Embedded Systems & SW Group, www.deeds.informatik.tu-darmstadt.de
Brendan Murphy, Microsoft Research, Cambridge, UK
Presented at ISSRE 2007

60 Operating System Robustness
 The operating system is a key operational element, used in virtually all environments → robustness matters!
 Drivers are a major source of failures [1][2]
[1] Ganapathi et al., LISA’06
[2] Chou et al., SOSP’01

61 Operating System Robustness
 External faults  Robustness
 Drivers  Interfaces
 Experimental
  Fault injection
  Run-time
  OS-driver interface
  No source code
 Goal
  Identify services with robustness issues
  Identify drivers spreading errors
[Figure: Applications / Drivers / OS stack]

62 Operating System Robustness
 The issues behind FI-based OS robustness evaluation
  Where to inject? [3]
  What to inject? [4]
  When to inject? [today]
 Outline
  Problem definition
  Call strings and call blocks
  System and error model
  Experimental setup and method
  Results
[3] Johansson et al., DSN’05
[4] Johansson et al., DSN’07

63 Fault Injection
 Target: the OS-driver interface
 Each call is a potential injection point
 Problem: too many calls
  First occurrence only?
  Sample (uniformly?)
[Figure: timeline of service invocations]

64 Fault Injection
 Observation: calls are not made randomly
  Repeating sequences of calls
 Idea: select calls based on “operations”
  Identify subsequences, select services

65 Call Strings & Call Blocks
 Call string: the list of tokens (invocations) to a specific driver
 Call block: a subsequence of a call string
  May be repeating
  Corresponds to a higher-level “operation”
  Used as trigger for injection
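A minimal sketch in C of one way to spot a repeating call block in a recorded call string; the token encoding and the tracer that produces it are assumptions:

  #include <stddef.h>
  #include <string.h>

  /* Return the length of the shortest block that repeats immediately at
   * position pos of the call string, i.e. tokens[pos .. pos+len) equals
   * tokens[pos+len .. pos+2*len); 0 if no repetition starts there. */
  size_t repeating_block_len(const int *tokens, size_t n, size_t pos)
  {
      for (size_t len = 1; pos + 2 * len <= n; len++)
          if (memcmp(&tokens[pos], &tokens[pos + len],
                     len * sizeof tokens[0]) == 0)
              return len;    /* e.g. the (747){23} pattern on slide 71 */
      return 0;
  }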

66 System and Error Model
 Error model: bit-flips
  Shown to be effective
  Simple to implement
 Injection into function parameter values

67 Experimental Process
 1. Execute workload
 2. Record call string
 3. Extract call blocks
 4. Select service targets (1 per call block)
 5. Define triggers, based on tracking call blocks
 6. Perform injections
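A sketch, again in C, of what such a trigger could look like: the interceptor feeds each observed call token into the trigger, which fires the injection when the k-th occurrence of the chosen call block completes. The matching is deliberately simplified (no proper fallback on partial matches):

  #include <stddef.h>

  struct cb_trigger {
      const int *block;   /* token sequence of the chosen call block */
      size_t len;         /* number of tokens in the block */
      size_t idx;         /* next expected token within the block */
      unsigned seen;      /* completed occurrences so far */
      unsigned fire_on;   /* fire on this occurrence (the "when") */
  };

  /* Feed one intercepted call token; returns 1 when the injection fires. */
  int cb_trigger_step(struct cb_trigger *t, int token)
  {
      t->idx = (token == t->block[t->idx]) ? t->idx + 1
                                           : (token == t->block[0] ? 1 : 0);
      if (t->idx == t->len) {              /* one full call block observed */
          t->idx = 0;
          if (++t->seen == t->fire_on)
              return 1;
      }
      return 0;
  }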

68 Injection Setup
 Target OS: Windows CE .NET
 Target HW: XScale 255

69 Failure Classes
 Failure Class   Description
 No Failure      No observable effect
 Class 1         Error propagated, but still satisfied the OS service specification
 Class 2         Error propagated and violated the service specification
 Class 3         The OS hung or crashed

70 Selected Drivers
 Serial port driver
 Ethernet card driver
 Workload/driver phases:
[Figure: workload and driver phases over time]

71 Serial Driver Call String and Call Blocks
 Call string: D02775(747){23}732775(747){23}23
 Phases: init, working, clean up

72 Ethernet Driver Call String and Call Blocks
[Figure: Ethernet driver call string with identified call blocks]

73 Driver Profiles
 Driver invocation patterns differ
 Impact on call-block injection efficiency
[Charts: invocation profiles for the serial and Ethernet drivers]

74 Serial Driver Results
[Chart: serial driver injection results per failure class]

75 Serial Driver Service Identification
[Table: services identified per trigger (first occurrence FO and call-block triggers δ, α, β1, γ1, ω1, β2, γ2, ω2); x marks the triggers that identified each service]
Services: CreateThread, DisableThreadLibraryCalls, EventModify, FreeLibrary, HalTranslateBusAddress, InitializeCriticalSection, InterlockedDecrement, LoadLibrary, LocalAlloc, memcpy, memset, SetProcPermissions, TransBusAddrToStatic

76 Ethernet Driver Results
 Trigger        Serial: #Injections / #C3   Ethernet: #Injections / #C3
 First Occ.     2436 / 8                    1820 / 12
 Call Blocks    8408 / 13                   2356 / 12

77 Summary
 Where, What & When?
  New timing model for interface fault injection
  Faults in device drivers
  Based on call strings & call blocks
 Results
  Significant difference: more services identified
  Driver dependent → driver profiling matters
  More injections (8408 vs. 2436 for the serial driver)
  Focus on init/clean-up phases?

78 Discussion & Outlook
 Call block identification
  Scalability?
  New data structures (suffix trees)
 Call block selection
  Working phase vs. init/clean-up
  Determinism & concurrency
 Workload selection
 Error models

