Presentation is loading. Please wait.

Presentation is loading. Please wait.

8/25/2005IEEE PacRim 20051 The Check-Pointed and Error-Recoverable MPI Java of AgentTeamwork Grid Computing Middleware Munehiro Fukuda and Zhiji Huang.

Similar presentations


Presentation on theme: "8/25/2005IEEE PacRim 20051 The Check-Pointed and Error-Recoverable MPI Java of AgentTeamwork Grid Computing Middleware Munehiro Fukuda and Zhiji Huang."— Presentation transcript:

1 8/25/2005IEEE PacRim 20051 The Check-Pointed and Error-Recoverable MPI Java of AgentTeamwork Grid Computing Middleware Munehiro Fukuda and Zhiji Huang Computing & Software Systems, University of Washington, Bothell Funded by

2 8/25/2005 IEEE PacRim 2005 2 Background Target applications in most grid-computing systems  Communication takes place at the beginning and the end of each sub task.  A crashed sub task will be simply repeated.  Example: Master-worker and parameter-sweep models Fault tolerance  FT-MPI or MPI in Legion/Avaki The system will not recover lost messages. Users must specify variables to save and add a function called from MPI.Init( ) upon a job resumption.  Condor MW Messages between the master and each slave will be saved. No inter-slave communication will be check-pointed.  Rock/Rack Socket buffers will be saved at application level. A process must be mobile-aware to keep track of its communication counterpart.

3 8/25/2005 IEEE PacRim 2005 3 Objective More programming models  Not restricted to master slave or parameter sweep  Targeting heartbeat, pipeline, and collective- communication-oriented applications Process resumption in its middle  Resuming a process from the last checkpoint.  Allowing process migration for performance improvement Error-recovery support from sockets to MPI  Facilitating check-pointed error-recoverable Java socket.  Implementing mpiJava API with our fault-tolerant socket.

4 8/25/2005 IEEE PacRim 2005 4 System Overview Bookkeeper Agent FTP Server User A User B User B snapshot snapshots User program wrapper Snapshot Methods GridTCP User program wrapper Snapshot Methods GridTCP User program wrapper Snapshot Methods GridTCP snapshot User A’s Process User A’s Process User B’s Process TCP Communication Commander Agent Sentinel Agent Resource Agent Sentinel Agent Resource Agent Results

5 8/25/2005 IEEE PacRim 2005 5 Execution Layer Operating systems UWAgents mobile agent execution platform Commander, resource, sentinel, and bookkeeper agents User program wrapper GridTcpJava socket mpiJava-AmpiJava-S mpiJava API Java user applications User program wrapper GridTcp mpiJava-AmpiJava-S

6 8/25/2005 IEEE PacRim 2005 6 Programming Interface public class MyApplication { public GridIpEntry ipEntry[]; // used by the GridTcp socket library public int funcId; // used by the user program wrapper public GridTcp tcp; // the GridTcp error-recoverable socket public int nprocess; // #processors public int myRank; // processor id ( or mpi rank) public int func_0( String args[] ) { // constructor MPJ.Init( args, ipEntry, tcp ); // invoke mpiJava-A.....; // more statements to be inserted return 1; // calls func_1( ) } public int func_1( ) { // called from func_0 if ( MPJ.COMM_WORLD.Rank( ) == 0 ) MPJ.COMM_WORLD.Send(... ); else MPJ.COMM_WORLD.Recv(... );.....; // more statements to be inserted return 2; // calls func_2( ) } public int func_2( ) { // called from func_2, the last function.....; // more statements to be inserted MPJ.finalize( ); // stops mpiJava-A return -2; // application terminated }

7 8/25/2005 IEEE PacRim 2005 7 MPJ Package MPJ Init( ), Rank( ), Size( ), and Finalize( ) Communicator All communication functions: Send( ), Recv( ), Gather( ), Reduce( ), etc. JavaComm GridComm DataType MPJMessage Op etc mpiJava-S: uses java sockets and server sockets. mpiJava-A: uses GridTcp sockets. MPJ.INT, MPJ.LONG, etc. getStatus( ), getMessage( ), etc. Operate( ) Other utilities  InputStream for each rank  OutputStream for each rank  User a permanent 64K buffer for serialization  Emulate collective communication sending the same data to each OutputStream, which deteriorates performance

8 8/25/2005 IEEE PacRim 2005 8 GridTcp – Check-Pointed Connection n1.uwb.edu n3.uwb.edu n2.uwb.edu TCP user program rankip 1n1.uwb.edu 2n2.uwb.edu outgoing backup incoming User Program Wrapper Snapshot maintenance TCP user program n2.uwb.edu2 n1.uwb.edu1 iprank incoming ougoing backup User Program Wrapper n3.uwb.edu user program n3.uwb.edu2 n1.uwb.edu1 iprank incoming ougoing backup User Program Wrapper TCP Outgoing packets saved in a backup queue All packets serialized in a backup file every check pointing Upon a migration  Packets de-serialized from a backup file  Backup packets restored in outgoing queue  IP table updated

9 8/25/2005 IEEE PacRim 2005 9 GridTcp – Over-Gateway Connection user program rankdestgateway 0mnode0- 1medusa- 2uw1-320medusa 3uw1-320-00medusa User Program Wrapper user program rankdestgateway 0mnode0- 1medusa- 2uw1-320- 3uw1-320-00Uw1-320 User Program Wrapper user program rankdestgateway 0mnode0medusa 1 - 2uw1-320- 3uw1-320-00- User Program Wrapper user program rankdestgateway 0mnode0uw1-320 1medusauw1-320 2 - 3uw1-320-00- User Program Wrapper mnode0 (rank 0) medusa.uwb.edu (rank 1) uw1-320.uwb.edu (rank 2) uw1-320-00 (rank 3)  RIP-like connection  Restriction: each node name must be unique.

10 8/25/2005 IEEE PacRim 2005 10 User Program Wrapper statement_1; statement_2; statement_3; statement_4; statement_5; statement_6; statement_7; statement_8; statement_9; int fid = 1; while( fid == -2) { switch( func_id ) { case 0: fid = func_0( ); case 1: fid = func_1( ); case 2: fid = func_2( ); } check_point( ) { // save this object // including func_id // into a file } check_point( ); func_0( ) { statement_1; statement_2; statement_3; return 1; } func_1( ) { statement_4; statement_5; statement_6; return 2; } func_2( ) { statement_7; statement_8; statement_9; return -2; } User Program Wrapper Source Code Preprocessed

11 8/25/2005 IEEE PacRim 2005 11 Preproccesser and Drawback No recursions Useless source line numbers indicated upon errors Still need of explicit snapshot points. statement_1; statement_2; statement_3; check_point( ); while (…) { statement_4; if (…) { statement_5; check_point( ); statement_6; } else statement_7; statement_8; } check_point( ); int func_0( ) { statement_1; statement_2; statement_3; return 1; } int func_1( ) { while(…) { statement_4; if (…) { statement_5; return 2; } else statement_7; statement_8; } int func_2( ) { statement_6; statement_8; while(…) { statement_4; if (…) { statement_5; return 2; } else statement_7; statement8; } Source Code Preprocessed Code Before check_point( ) in if-clause After check_point( ) in if-clause Preprocessed

12 8/25/2005 IEEE PacRim 2005 12 user program n2.uwb.edu2 n1.uwb.edu1 iprank outgoing backup incoming TCP User Program Wrapper MPI Connection MPI Job Coordination UWPlace (UWAgent Execution Platform) Sentinel Agent Main Thread SendSnapshot Thread TCPError ThreadReceiveMsg Thread snapshot Bookkeeper Agent snapshot Resumed Sentinel Agent Restart message (a new rank/ip pair) n3.uwb.edu

13 8/25/2005 IEEE PacRim 2005 13 MPJ.Send and Recv Performance

14 8/25/2005 IEEE PacRim 2005 14 MPJ.Bcast Performance - Doubles

15 8/25/2005 IEEE PacRim 2005 15 Conclusions Raw bandwidth  mpiJava-S comes to about 95-100% of maximum Java performance.  mpiJava-A (with check-pointing and error recovery) incurs 20-60% overhead, but still overtakes mpiJava with bigger data segments. Serialization  When dealing with primitives or objects that need serialization, a 25-50% overhead is incurred. Memory issues related to mpiJavaA  Due to snapshots created every func_n call. Next work  Performance and memory-usage improvement  Preprocessor implementation


Download ppt "8/25/2005IEEE PacRim 20051 The Check-Pointed and Error-Recoverable MPI Java of AgentTeamwork Grid Computing Middleware Munehiro Fukuda and Zhiji Huang."

Similar presentations


Ads by Google