ABCSG - Dependable Systems - 01/06/2006 1 ABCSG Dependable Systems.


1 ABCSG Dependable Systems

2 Agenda
- Dependable Computing
  - Basic concepts
    - Definitions
    - Attributes
    - Threats
  - Means to attain dependability
    - Fault prevention
    - Fault removal
    - Fault forecasting
    - Fault tolerance -> Branch into techniques -> Branch into Coordinated Atomic Actions

3 Dependable Computing - Definition
- Ability to deliver service that can justifiably be trusted
or
- Ability of a system to avoid service failures that are more frequent or more severe than is acceptable

4 Dependable Computing - Attributes

5 Dependable Computing - Threats
- Anything that can influence the system in such a way that it falls outside the definition of dependable
  - Development phase
    - Physical world
    - Human developers
    - Development tools
    - Production and test facilities
  - Use phase
    - Physical world
    - Administrators
    - Users of services
    - Providers of services
    - Infrastructure
    - Intruders

6 Means - Fault prevention
- A failure is the result of an error
- An error is the result of a fault
=> Prevent faults = prevent failures
- Basically we all know how (right?)
  - Information hiding
  - Modularization
  - Strongly typed languages
  - ...

7 Means - Fault removal
- During development
  - (also test fault tolerance by fault injection)
- During use
  - Corrective maintenance
  - Preventive maintenance
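Testing fault tolerance by fault injection can be sketched as follows. This is a minimal illustration, not something taken from the slides: the names `inject_faults`, `InjectedFault`, and `call_with_retries` are hypothetical, and a simple retry loop stands in for whatever tolerance mechanism is actually under test.

```python
class InjectedFault(Exception):
    """Raised by the injector to simulate a failing dependency (hypothetical name)."""


def inject_faults(func, fail_first):
    """Wrap func so that its first `fail_first` calls raise InjectedFault."""
    calls = {"count": 0}

    def wrapper(*args, **kwargs):
        calls["count"] += 1
        if calls["count"] <= fail_first:
            raise InjectedFault(f"injected fault on call {calls['count']}")
        return func(*args, **kwargs)

    return wrapper


def call_with_retries(func, attempts):
    """The mechanism under test: call func up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except InjectedFault as exc:
            last_error = exc  # remember the fault and try again
    raise last_error


# Two injected faults are tolerated by three attempts.
flaky = inject_faults(lambda: "ok", fail_first=2)
outcome = call_with_retries(flaky, attempts=3)
```

Injecting one fault more than the mechanism can absorb (here `fail_first=3` with `attempts=3`) lets the test confirm that the failure is reported rather than hidden.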

8 Means - Fault forecasting
- Performing an evaluation of the system behavior with respect to fault occurrence or activation
- Qualitative evaluation
  - Identify the failure modes or the event combinations that would lead to system failure
- Quantitative evaluation
  - Identify in terms of probabilities the extent to which some of the attributes of dependability are satisfied
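As a small illustration of quantitative evaluation, the sketch below computes steady-state availability from mean time to failure (MTTF) and mean time to repair (MTTR), and combines component probabilities for series and redundant (parallel) arrangements. These are standard textbook formulas rather than content from the slides, the function names are hypothetical, and the series and parallel formulas assume that components fail independently.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability: the fraction of time the system is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def series(*parts):
    """Every part must work, so the probabilities multiply (independence assumed)."""
    result = 1.0
    for p in parts:
        result *= p
    return result


def parallel(*parts):
    """At least one part must work: 1 minus the chance that all fail (independence assumed)."""
    all_fail = 1.0
    for p in parts:
        all_fail *= 1.0 - p
    return 1.0 - all_fail


single = availability(mttf_hours=99, mttr_hours=1)  # up 99 hours out of every 100
chained = series(single, single)                    # two such parts in a chain
redundant = parallel(single, single)                # two such parts as replicas
```

With these numbers the chain is less available than a single part, while the replicated pair is more available, which is the quantitative argument for redundancy.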

9 Means - Fault tolerance
- Fault prevention includes human activities and is thus imperfect => We need fault removal
- Fault removal includes human activities and is thus imperfect => We need fault forecasting
- Fault forecasting includes human activities and is thus imperfect => We need fault tolerance
- Fault tolerance includes human activities and is thus imperfect => Systems will fail
... but a combination of all the aforementioned techniques can best lead to dependable computing
... so let's have a look at fault tolerance

10 Fault tolerance
- Recall that fault tolerance is one of the means to attain dependable systems
- Terminology and key concept
  - Fault -> Error -> Failure
  - Failure semantics
  - Redundancy
- Techniques
  - Sequential
  - Independent concurrent systems
  - Competitive concurrent systems
  - Cooperative concurrent systems
  - Hybrid systems

11 Fault tolerance - Terminology and key concept
- A failure is the observation of an erroneous system state
- An error is an erroneous system state, which might lead to a failure
- A fault is a system defect, which might lead to an error
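The three terms can be made concrete with a toy example. The `Thermostat` class below is hypothetical and contains a deliberate defect; it is only meant to show where the fault, the error, and the failure each appear.

```python
class Thermostat:
    """Hypothetical controller with a deliberately planted defect."""

    def __init__(self):
        self.last_reading = None

    def record(self, celsius):
        # FAULT: the defect. The sign of the reading is dropped.
        # It stays dormant as long as readings are non-negative.
        self.last_reading = abs(celsius)

    def heater_on(self, threshold=5):
        # The service a user observes.
        return self.last_reading < threshold


t = Thermostat()

t.record(20)               # fault not activated: the state is correct
warm_state = t.last_reading
warm_answer = t.heater_on()

t.record(-10)              # fault activated: the state is now wrong (ERROR)
cold_state = t.last_reading
cold_answer = t.heater_on()  # wrong answer observed by the user (FAILURE)
```

After `record(-10)` the stored state is erroneous, but nobody has seen it yet. The failure only happens when `heater_on()` is called and reports that no heating is needed at -10 degrees.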

12 Fault tolerance - Terminology and key concept
English
- A failure is a consequence of an error that is the consequence of a fault
  Fault => Error => Failure
Danish (the single word "fejl" covers fault, error, and failure)
- A fejl is the consequence of a fejl that is the consequence of a fejl
  Fejl => Fejl => Fejl
  (Think about that one for a moment)

13 Fault tolerance - Terminology and key concept
- There is room to act between the occurrence of an error and the failure it may lead to
- Redundancy is the key concept

14 Fault tolerance - Sequential systems
- Recovery blocks - redundant algorithms
- Retry blocks - redundant data
Acceptance test examines the system state to verify that the behavior is acceptable
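Both techniques can be sketched in a few lines. The code below is a simplified illustration under stated assumptions, not the slides' own implementation: all function names are hypothetical, passing a fresh copy of the input stands in for restoring a checkpoint, and "redundant data" is interpreted as re-expressing the input in an equivalent form before running the same algorithm again.

```python
def recovery_block(alternates, acceptance_test, data):
    """Redundant algorithms: try each alternate until one result is accepted."""
    for algorithm in alternates:
        try:
            # A fresh copy per alternate stands in for restoring a checkpoint.
            result = algorithm(list(data))
        except Exception:
            continue  # a crash counts as a rejected alternate
        if acceptance_test(data, result):
            return result
    raise RuntimeError("recovery block: every alternate was rejected")


def retry_block(algorithm, acceptance_test, data, re_expressions):
    """Redundant data: rerun one algorithm on re-expressed versions of the input."""
    identity = lambda d: d
    for express in [identity, *re_expressions]:
        try:
            result = algorithm(express(list(data)))
        except Exception:
            continue
        if acceptance_test(data, result):
            return result
    raise RuntimeError("retry block: every re-expression was rejected")


def looks_sorted(original, result):
    """Acceptance test: a cheap plausibility check, not a full proof of correctness."""
    same_length = len(result) == len(original)
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    return same_length and ordered


def fragile_sort(items):
    # Deliberate fault: input whose first element is the maximum comes back unsorted.
    if items and items[0] == max(items):
        return items
    return sorted(items)


def reverse(items):
    """Re-expression: for sorting, a reordered list is an equivalent input."""
    return items[::-1]


by_recovery = recovery_block([fragile_sort, sorted], looks_sorted, [3, 1, 2])
by_retry = retry_block(fragile_sort, looks_sorted, [3, 1, 2], [reverse])
```

In both calls the first attempt returns `[3, 1, 2]`, the acceptance test rejects it, and the second attempt (a different algorithm in one case, different data in the other) produces the accepted result.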

15 Fault tolerance - Independent concurrent systems
- N-Version programming - The parallel version of recovery blocks
- N-Copy programming - The parallel version of retry blocks
The decision mechanism must decide if one of the results can be considered correct... and this is not an easy task!
- Difficulties: multiple correct results, floating point precision...
- Voters: exact majority voter, mean voter, consensus voter, etc.
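A minimal sketch of N-version programming with two of the voters named above follows. It is an illustration only: the function names are hypothetical, a thread pool stands in for truly independent execution, and the "versions" are trivial lambdas rather than independently developed programs.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def run_versions(versions, data):
    """Run every version on the same input in parallel; drop versions that crash."""
    results = []
    with ThreadPoolExecutor(max_workers=len(versions)) as pool:
        futures = [pool.submit(version, data) for version in versions]
        for future in futures:
            try:
                results.append(future.result())
            except Exception:
                pass  # a crashed version contributes no vote
    return results


def exact_majority_voter(results, n_versions):
    """Accept a value only if more than half of all versions agree on it exactly."""
    if not results:
        raise RuntimeError("no version produced a result")
    value, votes = Counter(results).most_common(1)[0]
    if 2 * votes > n_versions:
        return value
    raise RuntimeError("no exact majority among the results")


def mean_voter(results):
    """Average the results; suited to numeric outputs that differ slightly."""
    if not results:
        raise RuntimeError("no version produced a result")
    return mean(results)


# Three versions of "square a number"; the third contains a deliberate fault.
versions = [lambda x: x * x, lambda x: x ** 2, lambda x: x + x]
results = run_versions(versions, 3)
agreed = exact_majority_voter(results, n_versions=len(versions))
```

The floating point difficulty shows up as soon as versions compute the same quantity in different ways: `0.1 + 0.2` and `0.3` are not equal as floats, so an exact majority voter sees two different answers, whereas a mean voter (or one that compares within a tolerance) would still produce a usable result.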

16 Fault tolerance - Competitive concurrent systems
- Two or more processes are not aware of each other, but share some resources
- They want to live in their own environment, and a fault in one process should not affect the other processes
- Transactions
  - Atomicity / Consistency / Isolation / Durability
  - Provide backward error recovery
  - Together with exception handling, transactions can be used to provide forward error recovery
  - In self-checking transactional objects, methods are decorated with a precondition and a postcondition
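Backward error recovery with a transaction can be sketched as a checkpoint that is restored when the body fails. The example below is a single-process illustration with hypothetical names (`transaction`, `transfer`); it shows atomicity and rollback only, and makes no attempt at isolation between concurrent processes or at durability.

```python
import copy
from contextlib import contextmanager


@contextmanager
def transaction(state):
    """Restore `state` to its checkpoint if the body raises (backward recovery)."""
    checkpoint = copy.deepcopy(state)
    try:
        yield state
    except Exception:
        state.clear()
        state.update(checkpoint)  # roll back to the state at entry
        raise


def transfer(accounts, src, dst, amount):
    """Move money between accounts; either both updates happen or neither does."""
    with transaction(accounts):
        accounts[src] -= amount
        accounts[dst] += amount
        # Postcondition check: a negative balance aborts the transaction.
        if accounts[src] < 0:
            raise ValueError("insufficient funds")


accounts = {"a": 100, "b": 0}
transfer(accounts, "a", "b", 30)          # commits
after_commit = dict(accounts)

try:
    transfer(accounts, "a", "b", 500)     # aborts and rolls back
    aborted = False
except ValueError:
    aborted = True
after_abort = dict(accounts)
```

Because the exception is re-raised after the rollback, the caller can still handle it, which is the hook where exception handling could add forward error recovery on top of the transaction.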

17 Fault tolerance - Cooperative concurrent systems
- Several processes cooperate in executing a common job, and they are aware of each other
- Conversation
  - Works like a transaction involving several processes
  - It's an isolated environment for the participating processes; they are not allowed to communicate outside the conversation (information smuggling)
  - Ultimately everybody commits or rolls back to the state from the beginning of the conversation - backward error recovery
- Atomic actions
  - A conversation, but with the ability to do forward error recovery
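The all-or-nothing outcome of a conversation can be sketched as follows. This is a heavily simplified model with hypothetical names: participants run one after another instead of concurrently, each holds its state in a dictionary, and isolation from the outside world is assumed rather than enforced.

```python
import copy
from dataclasses import dataclass
from typing import Callable


@dataclass
class Participant:
    name: str
    state: dict
    work: Callable[[dict], None]              # mutates the participant's state
    acceptance_test: Callable[[dict], bool]   # checks the state at the exit


def conversation(participants):
    """Commit only if every participant passes its test; otherwise all roll back."""
    # Entry: every participant checkpoints its state (the recovery line).
    checkpoints = [copy.deepcopy(p.state) for p in participants]
    try:
        for p in participants:
            p.work(p.state)
        accepted = all(p.acceptance_test(p.state) for p in participants)
    except Exception:
        accepted = False
    if not accepted:
        # Backward error recovery: everybody returns to the entry state.
        for p, saved in zip(participants, checkpoints):
            p.state.clear()
            p.state.update(saved)
    return accepted


def send(state):
    state["sent"] += 1


def receive_badly(state):
    state["received"] = -1  # deliberate fault in one participant


producer = Participant("producer", {"sent": 0}, send, lambda s: s["sent"] >= 0)
consumer = Participant("consumer", {"received": 0}, receive_badly,
                       lambda s: s["received"] >= 0)
committed = conversation([producer, consumer])
```

The producer did nothing wrong, yet its state is rolled back as well, because the conversation commits or rolls back as a unit.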

18 Fault tolerance - Hybrid systems
- Models that support both competitive and cooperative concurrency
- Coordinated atomic actions
  - An atomic action, but with the possibility for the participants to access external objects
  - Atomic actions to control cooperative concurrency and coordinated error recovery
  - Transactions to control competitive concurrency and to maintain the consistency of the shared resources in case of failures

19 Coordinated Atomic Actions
... must wait for another day, I think time is up!

