Pontus Boström and Marina Waldén Åbo Akademi University/ TUCS Development of Fault Tolerant Grid Applications Using Distributed B.


1 Pontus Boström and Marina Waldén Åbo Akademi University/ TUCS Development of Fault Tolerant Grid Applications Using Distributed B

2 Motivation
- Grids have become widespread in organizations
  - Handle large amounts of information
  - Manage computational resources
- Difficult to implement "correct" Grid applications
- Formal methods are useful for ensuring correctness of specifications
  - Specifications can be difficult to implement
  - The specification language should take into account the features of the underlying platform
- Fault tolerance is also important for correctness due to the nature of the grid environment

3 Grids
- Used for large-scale distributed systems
  - Scientific computing, e.g., in physics and engineering
  - Business applications
- Share information and computational resources over organizational boundaries
- Loosely coupled systems
  - SOA
  - Client-server architecture

4 Grid services
- Services in a grid environment that can be accessed by clients
- Similar to remote objects in CORBA and RMI
- Remote procedures used for communication
[Figure: a client and a grid service on a host]

5 Grid Services
- Based on Web Services
  - XML
  - SOAP
  - WSDL
- Extends Web services with
  - Potentially transient services containing state
  - Service data
  - Notifications
- Globus Toolkit middleware
[Figure: a client and a grid service connected by a remote procedure call and a notification]

6 The architecture of a grid application
[Figure labels: OS, Globus Toolkit, Client, Grid service, RPC, Notification, Host 1, Host 2]

7 Faults
- Remote procedure calls can fail in four different ways
  1. The server grid service instance has crashed before the call
  2. The network connection fails when calling a remote procedure
  3. The server instance fails during the call
  4. The network connection fails when returning the result
- A notification can fail to arrive for two reasons
  1. The sending grid service crashed before sending the notification
  2. The network connection fails during sending
- The client crashes when using a server grid service
  - The server grid service becomes an orphan
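The four failure points of a remote procedure call can be sketched as a small simulation. This is an illustrative Python model, not Globus Toolkit code: `Fault`, `RemoteCallFailed`, `CounterService` and `remote_call` are hypothetical names, and treating a server failure during the call as happening before any state change is a simplifying assumption.

```python
from enum import Enum


class Fault(Enum):
    NONE = 0
    SERVER_DOWN_BEFORE_CALL = 1   # fault 1 on the slide
    NETWORK_FAILS_ON_REQUEST = 2  # fault 2
    SERVER_FAILS_DURING_CALL = 3  # fault 3
    NETWORK_FAILS_ON_REPLY = 4    # fault 4


class RemoteCallFailed(Exception):
    """One exception type for all faults: the caller cannot tell them apart."""


class CounterService:
    """Hypothetical stateful service used only for this illustration."""

    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count


def remote_call(service, fault):
    """Simulate one remote call to service.increment() under a given fault."""
    if fault in (Fault.SERVER_DOWN_BEFORE_CALL, Fault.NETWORK_FAILS_ON_REQUEST):
        raise RemoteCallFailed()  # the procedure never starts
    if fault is Fault.SERVER_FAILS_DURING_CALL:
        # Assumption: the crash happens before any state change.
        raise RemoteCallFailed()
    result = service.increment()  # the procedure runs to completion
    if fault is Fault.NETWORK_FAILS_ON_REPLY:
        raise RemoteCallFailed()  # state changed, but the result is lost
    return result
```

In this model the caller sees the same exception in all four cases, although only the last one leaves the server state changed.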

8 Fault tolerance using GT
- No support for advanced fault tolerance mechanisms such as replication or checkpointing
- An exception is raised in the caller when a call to a remote procedure fails
  - It is not easy to know what caused the exception
- A lost notification can be discovered with timers in the client
- The most difficult error to handle is the removal of orphan grid service instances

9 Orphan control
- Need to remove orphan grid service instances, since they waste resources
- New remote procedure isAlive
- Timer in both client and server grid service

10 Orphan control
- Need to remove orphan grid service instances, since they waste resources
- New remote procedure isAlive
- Timer in both client and server grid service
  - An exception is raised in the client when a call to isAlive fails
  - The server grid service is deleted when a timeout occurs in it
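The isAlive scheme can be sketched with a logical clock instead of real timers. This is a minimal Python model under assumptions: the class names, the ping period and the timeout values are all hypothetical, and marking the instance as no longer in use when the call fails is one possible client reaction.

```python
class InstanceGone(Exception):
    """Raised in the client when a call to isAlive fails."""


class ServiceInstance:
    def __init__(self, timeout, now=0):
        self.timeout = timeout
        self.last_contact = now
        self.deleted = False

    def is_alive(self, now):
        if self.deleted:
            raise InstanceGone("instance has been deleted")
        self.last_contact = now  # restart the server-side timer
        return True

    def tick(self, now):
        # Server-side timer: delete the instance once the client has been
        # silent for longer than the timeout (it is then an orphan).
        if not self.deleted and now - self.last_contact > self.timeout:
            self.deleted = True


class Client:
    def __init__(self, instance, period, now=0):
        self.instance = instance
        self.period = period
        self.last_ping = now
        self.crashed = False
        self.in_use = True

    def tick(self, now):
        # Client-side timer: call isAlive once every `period` time units.
        if self.crashed or not self.in_use:
            return
        if now - self.last_ping >= self.period:
            self.last_ping = now
            try:
                self.instance.is_alive(now)
            except InstanceGone:
                self.in_use = False  # stop using the failed instance


def simulate(crash_at, end, period=2, timeout=5):
    """Return the time at which the instance is deleted, or None."""
    instance = ServiceInstance(timeout)
    client = Client(instance, period)
    for now in range(end + 1):
        if crash_at is not None and now >= crash_at:
            client.crashed = True
        client.tick(now)
        instance.tick(now)
        if instance.deleted:
            return now
    return None
```

With a healthy client the instance is never deleted. Once the client crashes, the server-side timer expires and the orphan is removed.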

11 Event B
- Extension of the B Method
- Developed by J. R. Abrial
- Based on Action Systems by Back and Kurki-Suonio
- Related to B Action Systems

    SYSTEM C
    VARIABLES x
    INVARIANT Inv_C
    INITIALISATION x := x0
    EVENTS
      C_Evt1 = ANY u WHERE G1(u,x) THEN S1 END;
      C_Evt2 = SELECT G2 THEN S2 END;
    END
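The execution model behind such a system (repeatedly pick any enabled event, run its body, and keep the invariant true) can be sketched in a few lines of Python. This is an informal illustration, not the Event B semantics as defined by its proof obligations; the `run` function and the example events are hypothetical.

```python
import random


def run(state, events, invariant, rng, max_steps=1000):
    """Execute guarded events until none is enabled; return the event trace.

    events is a list of (name, guard, action) triples operating on `state`.
    """
    assert invariant(state), "initialisation must establish the invariant"
    trace = []
    for _ in range(max_steps):
        enabled = [(name, action) for name, guard, action in events if guard(state)]
        if not enabled:
            break  # no event is enabled: the system has terminated
        name, action = rng.choice(enabled)  # nondeterministic choice
        action(state)
        assert invariant(state), f"invariant broken by {name}"
        trace.append(name)
    return trace


def inc(key):
    """Build an action that increments one variable of the state."""
    def action(state):
        state[key] += 1
    return action


# Hypothetical example system with two variables and two events.
EVENTS = [
    ("Evt1", lambda s: s["x"] < 3, inc("x")),
    ("Evt2", lambda s: s["y"] < 2, inc("y")),
]


def INVARIANT(s):
    return 0 <= s["x"] <= 3 and 0 <= s["y"] <= 2


if __name__ == "__main__":
    state = {"x": 0, "y": 0}
    print(run(state, EVENTS, INVARIANT, random.Random()), state)
```

The order of events in the trace varies between runs, but every run ends in the same final state because each event is enabled a fixed number of times.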

12 Formal development of Grid applications
- We would like to have a formal method suitable for developing fault tolerant grid applications
- Difficult to create implementable specifications of grid applications in Event B
  - No grid communication mechanisms such as remote procedures and notifications
  - No fault tolerance mechanisms
  - Difficult to implement due to synchronization issues and atomicity of events
- We need to extend Event B with constructs for
  - Specifying grid services
  - Remote procedure calls and notifications
  - Fault tolerance
- Extensions should be introduced in a manner that simplifies implementation

13 Distributed B
- Provides two new types of B machines
  - GRIDSERVICE
  - GRID_REFINEMENT
- Take into account grid specific features
  - Remote procedures
  - Notifications
  - Timeouts due to lost notifications
  - Exceptions due to failed calls to isAlive
- Enables us to prove properties about the entire system
- Are translated to ordinary B for verification
  - New constructs get their semantics from the translation
  - Automatic generation of proof obligations
- Enable automatic or semi-automatic translation of the specification to a programming language

14 Grid service machine
- Abstract specification of a grid service
- A grid service machine is a template that clients obtain instances of
  - Compare to classes in OO
- Remote procedures
  - Ordinary B procedures called from a client
- Events
  - Executed independently of a client
- Notifications
  - Sent when all events have become disabled
[Figure: a grid service with remote procedure Proc(p); events with guard J1 and body T1, and guard J2 and body T2; a notification sent when neither J1 nor J2 holds, guaranteeing Q]

15 Grid refinement machine (1)
- A client that uses grid service machine instances
- Refines GRIDSERVICE, ordinary SYSTEM or REFINEMENT
- Clause for enabling dynamic management of grid service machine instances
  - Instances are used as variables
  - When a failed instance is discovered it is marked as no longer in use and deleted from the application
- Clause for refining remote procedures
- Clause for refining events

16 Grid refinement machine (2)
- Special substitution used in events for making remote procedure calls
  - Enables the exceptions for failed calls to be handled
- Special events that consist of two parts for handling notifications
  - First part enabled when a notification has been sent from a grid service
  - Second part enabled when a timeout occurs
  - Executed once for each notification/timeout
- Special event for handling failed calls to isAlive
  - Enabled for each grid service instance in use
  - Non-deterministically models failures of instances

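17 The behaviour of grid components
[Figure: a grid service (remote procedure Proc(p); events with guard J1 and body T1, and guard J2 and body T2; a notification sent when neither J1 nor J2 holds, guaranteeing Q) next to a grid refinement (events with guard G1 and body S1, and guard G2 and body S2; notification handler NotifHandler with G3 and S3)]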

18 Grid service machine

    GRIDSERVICE A
    VARIABLES y
    INVARIANT Inv_A
    INITIALISATION y := y0
    REMOTE_PROCEDURES
      Proc(p) = PRE P(p) THEN T END
    EVENTS
      A_Evt1 = ANY u WHERE J1(u,y) THEN T1 END;
      A_Evt2 = SELECT J2 THEN T2 END;
    NOTIFICATIONS
      Notif = GUARANTEES Q END
    END
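The interplay of the three clauses can be sketched as a small Python class: a client calls the remote procedure, the service then runs its events on its own, and a notification is sent once every event is disabled. This is an informal model of the behaviour described on the slides, not a translation produced by Distributed B; the concrete state, guard and guarantee are hypothetical.

```python
class GridService:
    """Toy grid service: one remote procedure, one event, one notification."""

    def __init__(self):
        self.work = 0            # state variable (plays the role of y)
        self.done = 0
        self.notifications = []  # notifications sent so far

    def proc(self, p):
        """Remote procedure, called by a client. Precondition: p >= 0."""
        if p < 0:
            raise ValueError("precondition violated")
        self.work = p

    def _do_unit(self):
        """Event body: process one unit of work."""
        self.work -= 1
        self.done += 1

    def _events(self):
        # (guard, action) pairs; the event is enabled while work remains.
        return [(lambda: self.work > 0, self._do_unit)]

    def run_events(self):
        """Run events until all are disabled, then send the notification."""
        while True:
            enabled = [action for guard, action in self._events() if guard()]
            if not enabled:
                break
            enabled[0]()
        # The notification guarantees a condition on the state (here: no work left).
        assert self.work == 0
        self.notifications.append("Notif")
```

A client would call `proc(3)`, after which `run_events()` executes the event three times and records one notification.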

19 Grid refinement machine (1)

    GRID_REFINEMENT C2
    REFINES C1
    REFERENCES A
    VARIABLES z, x, a_inst
    INVARIANT a_inst : A & Inv_C’
    INITIALISATION x := x0 || z := z0 || a_inst :: A
    EVENTS
      C_Evt1 = SELECT G1’ THEN CALL a_inst.Proc(x) EXCEPTION S_E END || S1’ END;
      C_Evt2 = SELECT G2’ THEN S2’ END
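In an implementation, the CALL substitution with its EXCEPTION branch maps naturally onto a try/except around the remote call. The Python sketch below is only an analogy under assumptions: `RemoteCallFailed`, `Instance` and `Client` are hypothetical, the parallel substitution of the event is omitted, and dropping the instance reference is just one possible body for the exception branch.

```python
class RemoteCallFailed(Exception):
    """Raised in the caller when a remote procedure call fails."""


class Instance:
    """Hypothetical stand-in for a grid service instance."""

    def __init__(self, alive=True):
        self.alive = alive

    def proc(self, x):
        if not self.alive:
            raise RemoteCallFailed()
        return 2 * x  # arbitrary result, for illustration only


class Client:
    def __init__(self, instance):
        self.a_inst = instance  # instance used as a variable
        self.x = 1
        self.z = None

    def c_evt1(self):
        """One event: guarded remote call with an exception branch."""
        if self.a_inst is None:  # guard: an instance must be in use
            return False
        try:
            self.z = self.a_inst.proc(self.x)  # CALL a_inst.Proc(x)
        except RemoteCallFailed:
            self.a_inst = None  # exception branch: stop using the instance
        return True
```

After a failed call the guard no longer holds, so the event cannot fire again on the dead instance.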

20 Grid refinement machine (2)

    NOTIFICATION_HANDLERS
      NotifHandler = NOTIFICATION Notif SOURCE v:A THEN S3 TIMEOUT S_T END
    IS_ALIVE_HANDLERS
      IAHandler = SOURCE v:A THEN S_IA END
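A notification handler with a notification part and a timeout part can be sketched as follows. This Python model uses a logical clock, and it reads "executed once" as: exactly one of the two parts runs for each expected notification. That reading, and all names in the code, are assumptions made for illustration.

```python
class NotificationHandler:
    """Two-part handler: one part for the notification, one for the timeout."""

    def __init__(self, deadline, on_notification, on_timeout):
        self.deadline = deadline
        self.on_notification = on_notification  # first part
        self.on_timeout = on_timeout            # second part
        self.pending = True  # still waiting for the notification

    def notify(self, source):
        """Called when the notification from `source` arrives."""
        if self.pending:
            self.pending = False
            self.on_notification(source)

    def tick(self, now, source):
        """Called as time advances; fires the timeout part at the deadline."""
        if self.pending and now >= self.deadline:
            self.pending = False
            self.on_timeout(source)
```

If the notification arrives before the deadline, later ticks do nothing. If the deadline passes first, the timeout part runs and a late notification is ignored.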

21 Conclusions
- Enables construction of correct fault tolerant grid applications
  - Automatic generation of proof obligations
  - Implementable architecture by construction
- These Event B extensions can also use other middleware for distributed systems

