1 OS II: Dependability & Trust – SWIFI-based OS Evaluations
Prof. Neeraj Suri, Stefan Winter
Dependable Embedded Systems & SW Group (DEEDS), www.deeds.informatik.tu-darmstadt.de
Dept. of Computer Science, TU Darmstadt, Germany
2 Fault Detection: Software Testing
So far: Verification & Validation; testing techniques: static vs. dynamic, black-box vs. white-box
Last time: Fault Injection (FI) – applications, techniques, some FI tools
Today: Testing (SWIFI) of operating systems
- WHERE: Error propagation in OSs [Johansson'05]
- WHAT: Error selection for testing [Johansson'07]
- WHEN: Injection trigger selection [Johansson'07]
Next lecture: Profiling OS extensions (state change at runtime)
3 FI Recap
Fault Injection (FI) is the process of either inserting bugs into a system or exposing it to operational perturbations.
FI applications for dependable system development:
- Defect count estimation (fault seeding)
- Test suite evaluation (mutation testing)
- Security testing
- Experimental dependability evaluations
FI techniques:
- Physical FI / HW FI
- Simulated FI
- SWIFI
4 FI Recap (cont.)
- Where to apply the change (location, abstraction/system level)?
- What to inject (what should be injected/corrupted)?
- Which trigger to use (event, instruction, timeout, exception, ...)?
- When to inject (on the first/second/... trigger event)?
- How often to inject (Heisenbugs vs. Bohrbugs)?
- What to record and interpret, and for what purpose?
- How is the system loaded at the time of the injection?
  - Applications running and their load (workload)
  - System resources
  - Real vs. realistic vs. synthetic workload
5 Outline for today's lecture
- Drivers: a major dependability issue in commodity OSs
  - An error propagation view
- FI-based robustness evaluations of the kernel
  - Black-box assumption
  - Fault representativeness vs. failure relevance
- Design and implementation issues of a suitable FI framework
  - Fault modeling
  - Failure modeling
  - Workloads
6 The problem: Drivers!
Device drivers are:
- Numerous: 250 installed (100 active) drivers in XP/Vista
- Large & complex: 70% of the Linux code base
- Immature: every day 25 new / 100 revised Vista driver versions
- Privileged: kernel-mode operation in monolithic OSs
Device drivers are the dominant cause of OS failures despite sustained testing efforts.
(charts: causes of WinXP outages; causes of Win2k outages)
7 The problem (cont.)
Problem statement: driver failures lead to OS API failures.
Mitigation approaches:
1. Harden OS robustness
2. Improve driver reliability
8 The problem (cont.)
- The problem in terms of error propagation
- The effect of testing in terms of error propagation
- The effect of robustness hardening in terms of error propagation
9 Issues with the driver testing approach
- What if the driver is not the root cause?
- What if we cannot remove defects (e.g., in commercial OSs)?
10 Issues with the hardening approach
- What if we cannot remove robustness vulnerabilities?
More issues with the hardening approach in next week's lecture...
11 FI-based robustness evaluations
Fault containment wrappers are expensive:
- Additional code is an additional source of bugs
- Runtime overhead for error checks
Where should we add fault containment wrappers? Where errors with critical effects are likely to occur:
- Where propagation is likely
- Where critical errors propagate
How do we know where which errors propagate? Propagation analysis (cf. PROPANE).
12 Robustness Evaluations
(figure: components A-F ordered from benign to increasingly bad error propagation effects)
13 Robustness Evaluations
Experimental technique to ascertain "vulnerabilities":
- Identify (potential) sources, error propagation, hot spots, etc.
- Estimate their "effects" on applications
- Enhance components with "wrappers", e.g.: if (X > 100 && Y < 30) then Exception();
Aspects:
- Location of wrappers
- Metrics for error propagation profiles
- Experimental analysis
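The wrapper idea amounts to a parameter guard at the component interface. A minimal sketch in C, using the slide's example condition as the known-bad input region (the function names and the error-return convention here are illustrative, not from any specific framework):

```c
#include <stdio.h>

/* Fault containment wrapper: validate parameters before forwarding the
   call to the wrapped component. The bounds come from the slide's
   example condition (X > 100 && Y < 30). */
int wrapped_service(int x, int y, int (*real_service)(int, int)) {
    if (x > 100 && y < 30) {              /* known-bad input region */
        fprintf(stderr, "wrapper: rejecting x=%d y=%d\n", x, y);
        return -1;                        /* signal an error instead of letting it propagate */
    }
    return real_service(x, y);
}
```

This illustrates both costs named on the slide: the check itself is extra code that can be buggy, and it runs on every call.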
14 System Model
(figure: applications on top of the operating system, drivers below)
15 Device Driver
Model the interfaces (defined in C):
- Exported: functions provided by the driver (ds x.1 ... ds x.m)
- Imported: functions used by the driver (os x.1 ... os x.n)
(figure: driver X between the OS and the hardware, with exported and imported interfaces)
16 Metrics
Three metrics for profiling:
1. Propagation: how errors flow through the OS
2. Exposure: which OS services are affected
3. Diffusion: which drivers are the sources
Impact analysis: metrics, case study (Windows CE), results
17 Service Error Permeability
1. Service Error Permeability: measures one driver's influence on one OS service. Used to study service-driver relations.
18 OS Service Error Exposure
2. OS Service Error Exposure: an application uses certain services; how are these services influenced by driver errors? Used to compare services.
19 Driver Error Diffusion
3. Driver Error Diffusion: which driver affects the system the most? Used to compare drivers.
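The formulas for these three metrics did not survive the slide export; the following is a reconstruction along the lines of Johansson & Suri (DSN'05), with notation chosen by us. Writing $P_k^i$ for the service error permeability of OS service $S_i$ with respect to driver $D_k$:

```latex
P_k^i \approx \Pr\bigl[\text{an error in driver } D_k \text{ propagates to service } S_i\bigr]
```

Exposure aggregates over drivers, diffusion over services:

```latex
E_i = \sum_k P_k^i \quad \text{(OS service error exposure of } S_i\text{)}, \qquad
D_k = \sum_i P_k^i \quad \text{(error diffusion of driver } D_k\text{)}
```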
20 Case Study: Windows CE
Targeted drivers: serial, Ethernet
FI at the interface: data-level errors
Effects on OS services: 4 test applications
(figure: test app, OS, and target drivers on the device; driver manager and interceptor on the host)
21 Error Model
Data-level errors in the OS-driver interface:
- Wrong values, based on the C type: boundary values, special values, offsets
- Transient: injected on the first occurrence only
22 Impact Analysis
Impact ascertained via failure mode analysis. Failure classes:
- Class NF: no visible effect
- Class 1: error, no specification violation
- Class 2: error, specification violation
- Class 3: OS crash/hang
23 Error Model
Error     C type           #Cases
Integers  int              7
          unsigned int     5
          long             7
          unsigned long    5
          short            7
          unsigned short   5
          LARGE_INTEGER    7
Void *    void             3
Chars     char             7
          unsigned char    5
          wchar_t          5
Boolean   bool             1
Enums     multiple         #identifiers
Structs   multiple         1

Case  New value
1     previous - 1
2     previous + 1
3     1
4     0
5     -1
6     INT_MIN
7     INT_MAX

Example interface function:
LONG RegQueryValueEx([in] HKEY hKey, [in] LPCWSTR lpValueName, [in] LPDWORD lpReserved, [out] LPDWORD lpType, [out] LPBYTE lpData, [in/out] LPDWORD lpcbData);
24 Service Error Permeability – Ethernet driver
- 42 imported services, 12 exported services
- Most failures are Class 1; 3 crashes (Class 3)
25 OS Service Error Exposure – Serial driver
- 50 imported services, 10 exported services
- Clustering of failures
26 Driver Error Diffusion
- Higher diffusion for Ethernet
- Most injections are Class NF
- Failures occur at boot-up

                      Ethernet     Serial
#Experiments          414          411
#Injections           228          187
#Class NF             330 (80%)    377 (92%)
#Class 1              80 (19%)     25 (7%)
#Class 2              1            9
#Class 3              3            0
Diffusion (Class 1)   0.616        0.460
Diffusion (Class 2)   0.002        0.022
Diffusion (Class 3)   0.007        0
27 Error Models: "What to Inject?"
FI's effectiveness depends on the chosen error model being (a) representative of actual errors and (b) effective at triggering "vulnerabilities".
Comparative evaluation of the "effectiveness" of different error models: fewest injections? most failures? best "coverage"?
Goal: propose a composite error model to enhance FI effectiveness.
28 Chosen Drivers & Error Models
Error models: data-type (DT), bit-flips (BF), fuzzing (FZ)

Driver         Description    #Injection cases
                              DT    BF     FZ
cerfio_serial  Serial port    397   2362   1410
91C111         Ethernet       255   1722   1050
atadisk        CompactFlash   294   1658   1035
29 Error Models – Data-Type (DT) Errors
int foo(int a, int b) { ... }
int ret = foo(0x45a209f1, 0x00000000);
30 Error Models – Data-Type (DT) Errors
Case  New value
1     previous - 1
2     previous + 1
3     1
4     0
5     -1
6     INT_MIN (0x80000000)
7     INT_MAX

int foo(int a, int b) { ... }
int ret = foo(0x45a209f1, 0x00000000);
31 Error Models – Data-Type (DT) Errors
int foo(int a, int b) { ... }
int ret = foo(0x80000000, 0x00000000);
- Varied number of cases depending on the data type
- Requires tracking of the types for correct injection
- Complex implementation, but scales well
32 Error Models – Data-Type (DT) Errors
Data type   C type           #Cases
Integers    int              7
            unsigned int     5
            long             7
            unsigned long    5
            short            7
            unsigned short   5
            LARGE_INTEGER    7
Misc. *     void             3
            HKEY             6
            struct { ... }   multiple
Strings                      4
Characters  char             7
            unsigned char    5
            wchar_t          5
Boolean     bool             1
Enums                        multiple cases
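The seven DT cases for a plain `int` parameter can be generated as in the following sketch. Note that case 5 is blank in the slide's table; we assume the usual boundary value -1 here, and the function name is ours:

```c
#include <limits.h>

/* Data-type (DT) injection cases for a C `int` parameter:
   previous-1, previous+1, 1, 0, -1 (assumed), INT_MIN, INT_MAX. */
#define DT_INT_CASES 7

int dt_int_case(int previous, int case_no) {
    switch (case_no) {
    case 1: return previous - 1;
    case 2: return previous + 1;
    case 3: return 1;
    case 4: return 0;
    case 5: return -1;       /* assumption: blank in the original table */
    case 6: return INT_MIN;
    case 7: return INT_MAX;
    default: return previous; /* unknown case: leave the value unchanged */
    }
}
```

Each C type in the table above would get its own such case set, which is why DT injection requires tracking parameter types at the interface.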
33 Error Models – Bit-Flip (BF) Errors
int foo(int a, int b) { ... }
int ret = foo(0x45a209f1, 0x00000000);
34 Error Models – Bit-Flip (BF) Errors
int foo(int a, int b) { ... }
int ret = foo(0x45a209f1, 0x00000000);
0x45a209f1 = 01000101101000100000100111110001 (binary)
35 Error Models – Bit-Flip (BF) Errors
int foo(int a, int b) { ... }
int ret = foo(0x45a209f1, 0x00000000);
Original:    01000101101000100000100111110001
Bit flipped: 01000101101000101000100111110001
36 Error Models – Bit-Flip (BF) Errors
int foo(int a, int b) { ... }
int ret = foo(0x45a289f1, 0x00000000);   // = 01000101101000101000100111110001
- Typically 32 cases per parameter
- Easy to implement
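The BF model really is a one-liner: each injection XORs exactly one bit of the 32-bit parameter value. A minimal sketch (the function name is ours):

```c
#include <stdint.h>

/* Flip bit `pos` (0 = LSB, 31 = MSB) of a 32-bit parameter value.
   Iterating pos over 0..31 yields the 32 BF cases per parameter. */
uint32_t bf_flip(uint32_t value, unsigned pos) {
    return value ^ (UINT32_C(1) << pos);
}
```

For the slide's example, flipping bit 15 of 0x45a209f1 gives 0x45a289f1, the value injected above.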
37 Error Models – Fuzzing (FZ) Errors
int foo(int a, int b) { ... }
int ret = foo(0x45a209f1, 0x00000000);
38 Error Models – Fuzzing (FZ) Errors
int foo(int a, int b) { ... }
int ret = foo(0x45a209f1, 0x00000000);
Random replacement value: 0x17af34c2
39 Error Models – Fuzzing (FZ) Errors
int foo(int a, int b) { ... }
int ret = foo(0x17af34c2, 0x00000000);
- Selective number of cases
- Simple implementation
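FZ simply replaces the parameter with a pseudo-random value; seeding once per campaign keeps the injection set replayable. A sketch using the C standard library (the use of rand()/srand() and the two-draw composition are our choices, not from the slides):

```c
#include <stdint.h>
#include <stdlib.h>

/* Draw one fuzz value for a 32-bit parameter. rand() may yield fewer
   than 32 random bits, so combine two 16-bit draws. */
uint32_t fz_value(void) {
    return ((uint32_t)(rand() & 0xffff) << 16) | (uint32_t)(rand() & 0xffff);
}

/* Seed once per experiment campaign so the same injection set can be
   replayed when a failure needs to be reproduced. */
void fz_seed(unsigned seed) { srand(seed); }
```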
40 Comparison
Compare the error models on:
- Number of failures (effectiveness)
- Experimentation time
- Identifying services
- Error propagation
41 Failure Classes & Driver Diffusion
Failure Class  Description
No Failure     No observable effect
Class 1        Error propagated, but still satisfied the OS service specification
Class 2        Error propagated and violated the service specification
Class 3        The OS hung or crashed
42 Failure Classes & Driver Diffusion (cont.)
Driver diffusion: a measure of a driver's ability to spread errors.
43 Number of Failures (Class 3)
44 Failure Classes & Driver Diffusion
Driver Diffusion (Class 3):
Driver         DT     BF     FZ
cerfio_serial  1.50   1.05   1.56
91C111         0.73   0.98   0.69
atadisk        0.63   1.86   0.29
45 Experimentation Time
Driver         Error Model  Exec. time (h:min)
cerfio_serial  DT           5:15
               BF           38:14
               FZ           20:44
91C111         DT           1:56
               BF           17:20
               FZ           7:48
atadisk        DT           2:56
               BF           20:51
               FZ           11:55
46 Identifying Services (Class 3)
- Which OS services can cause Class 3 failures?
- Which error model identifies the most services (coverage)?
- Is some model consistently better/worse?
- Can we combine models?
Service  DT/BF/FZ marks
1        X
2        XX
3        X
4        XX
5        X
6        XX
7        XX
8        XX
9        XXX
10       XXX
11       XXX
12       X
13       X
14       XXX
15       X
16       XXX
17       X
18       X
47 Identifying Services (Class 3 + 2)
- Which OS services can cause Class 3 failures?
- Which error model identifies the most services (coverage)?
- Is some model consistently better/worse?
- Can we combine models?
Service  DT/BF/FZ marks
1        OXO
2        XXO
3        XO
4        XX
5        X
6        XX
7        XXO
8        XX
9        XXX
10       XXX
11       XXX
12       OX
13       X
14       XXX
15       X
16       XXX
17       X
18       X
48 Bit-Flips: Sensitivity to Bit Position?
(chart: failures per bit position, LSB to MSB)
49 Bit-Flips: Bit Position Profile Cumulative #services identified
50 Fuzzing – Number of injections?
51 Composite Error Model
Take the best of bit-flips and fuzzing:
- Bit-flips: bits 0-9 and 31
- Fuzzing: 10 cases
- ~50% fewer injections
- Identifies the same service set
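The composite model can be sketched as an injection-set generator: for each 32-bit parameter it emits the 11 selected bit-flips (bits 0-9 and 31) plus 10 fuzz values, i.e. 21 cases instead of 42 for full BF plus FZ. The generator layout and names below are ours:

```c
#include <stdint.h>
#include <stdlib.h>
#include <stddef.h>

#define CM_CASES 21  /* 11 bit-flips (bits 0-9 and 31) + 10 fuzz values */

/* Fill `out` (at least CM_CASES entries) with the composite injection
   set for one 32-bit parameter value; returns the number of cases. */
size_t cm_cases(uint32_t value, uint32_t out[CM_CASES]) {
    size_t n = 0;
    for (unsigned bit = 0; bit <= 9; bit++)        /* low-order bit-flips */
        out[n++] = value ^ (UINT32_C(1) << bit);
    out[n++] = value ^ (UINT32_C(1) << 31);        /* MSB flip */
    for (int i = 0; i < 10; i++)                   /* 10 fuzz cases */
        out[n++] = ((uint32_t)(rand() & 0xffff) << 16)
                 | (uint32_t)(rand() & 0xffff);
    return n;
}
```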
52 Composite Error Model – Results
53 Summary
Comparison across three well-established error models plus the composite model (CM): data-type (DT), bit-flips (BF), fuzzing (FZ).
Model  Implementation                 Coverage                                              Execution
DT     requires tracking data types   -                                                     requires few experiments
BF     -                              found the most Class 3 failures                       requires many experiments
FZ     -                              finds additional services                             -
CM     -                              profiling gives combined BF & FZ with high coverage   -
Outlook: when to do the injection? More drivers, OSs, models?
59 On the Impact of Injection Triggers for OS Robustness Evaluation
Andréas Johansson, Neeraj Suri – Department of Computer Science, Technische Universität Darmstadt, Germany
DEEDS: Dependable Embedded Systems & SW Group, www.deeds.informatik.tu-darmstadt.de
Brendan Murphy – Microsoft Research, Cambridge, UK
Presented at ISSRE 2007
60 Operating System Robustness
The operating system is a key operational element, used in virtually all environments – robustness matters!
Drivers are a major source of failures [1][2]
[1] Ganapathi et al., LISA'06
[2] Chou et al., SOSP'01
61 Operating System Robustness
Robustness against external faults: drivers and their interfaces.
Approach: experimental fault injection at run time, at the OS-driver interface, without source code.
Goals:
- Identify services with robustness issues
- Identify drivers spreading errors
(figure: applications / OS / drivers stack)
62 Operating System Robustness
The issues behind FI-based OS robustness evaluation:
- Where to inject? [3]
- What to inject? [4]
- When to inject? [today]
Outline: problem definition; call strings and call blocks; system and error model; experimental setup and method; results
[3] Johansson et al., DSN'05
[4] Johansson et al., DSN'07
63 Fault Injection
Target: the OS-driver interface; each call is a potential injection point.
Problem: too many calls.
Options: inject on the first occurrence only, or sample (uniformly?) from the service invocations.
64 Fault Injection
Observation: calls are not made randomly; repeating sequences of calls occur.
Idea: select calls based on "operations" – identify subsequences, then select services.
65 Call Strings & Call Blocks
Call string: the list of tokens (invocations) to a specific driver.
Call block: a subsequence of a call string; may repeat; corresponds to a higher-level "operation"; used as the trigger for injection.
66 System and Error Model
Error model: bit-flips – shown to be effective, simple to implement.
Injection: function parameter values.
67 Experimental Process
1. Execute workload
2. Record call string
3. Extract call blocks
4. Select service targets (1 per call block)
5. Define triggers, based on tracking call blocks
6. Perform injections
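The trigger-tracking step can be sketched as a counter that fires when the target service is invoked for the n-th time once its call block is being tracked. This tracks a single target per experiment; the struct layout and names are ours, not from the paper's tool:

```c
#include <stdbool.h>

/* Injection trigger: fire on the n-th invocation of `target_service`
   observed while tracking the enclosing call block. */
struct trigger {
    int target_service;   /* token id of the service call to corrupt */
    int fire_on;          /* which occurrence triggers the injection */
    int seen;             /* occurrences observed so far */
};

/* Called from the interceptor on every OS-driver interface call;
   returns true exactly once, on the chosen occurrence. */
bool trigger_check(struct trigger *t, int service_token) {
    if (service_token != t->target_service)
        return false;
    return ++t->seen == t->fire_on;
}
```

On the true return, the interceptor would apply the bit-flip to the call's parameters before forwarding it.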
68 Injection Setup Target OS: Windows CE.Net Target HW: XScale 255
69 Failure Classes
Failure Class  Description
No Failure     No observable effect
Class 1        Error propagated, but still satisfied the OS service specification
Class 2        Error propagated and violated the service specification
Class 3        The OS hung or crashed
70 Selected Drivers
- Serial port driver
- Ethernet card driver
(figure: workload/driver phases)
71 Serial Driver Call String and Call Blocks
Call string: D02775(747){23}732775(747){23}23
Phases: init, working, clean-up
72 Ethernet Driver Call String and Call Blocks
73 Driver Profiles
- Driver invocation patterns differ
- This impacts the efficiency of call-block injection
(charts: serial and Ethernet invocation profiles)
74 Serial Driver Results
75 Serial Driver Service Identification
Call blocks: FO (first occurrence), δ, α, β1, γ1, ω1, β2, γ2, ω2
CreateThread               xxx
DisableThreadLibraryCalls  xx
EventModify                xx
FreeLibrary                xx
HalTranslateBusAddress     x
InitializeCriticalSection  x
InterlockedDecrement       x
LoadLibrary                xx
LocalAlloc                 xx
memcpy                     xxx
memset                     xxx
SetProcPermissions         xxx
TransBusAddrToStatic       x
76 Ethernet Driver Results
Trigger       Serial                 Ethernet
              #Injections  #C3      #Injections  #C3
First Occ.    2436         8        1820         12
Call Blocks   8408         13       2356         12
77 Summary
Where, what & when? A new timing model for interface fault injection: faults in device drivers, based on call strings & call blocks.
Results:
- Significant difference: more services identified, driver-dependent
- Driver profiling matters
- More injections (2436 vs. 8408)
- Focus on init/clean-up phases?
78 Discussion & Outlook
- Call block identification: scalability? new data structures (suffix trees)
- Call block selection: working phase vs. init/clean-up
- Determinism & concurrency
- Workload selection
- Error models