Clustering the Reliable File Transfer Service
Jim Basney and Patrick Duda
NCSA, University of Illinois
TeraGrid '07, June 6, 2007


1 Clustering the Reliable File Transfer Service
Jim Basney and Patrick Duda
NCSA, University of Illinois
TeraGrid '07, June 6, 2007
This material is based upon work supported by the National Science Foundation under Grant No. 0426972.

2 Goal
Provide a highly available Reliable File Transfer (RFT) Service
- Tolerate server failures
  - Hardware/software faults and resource exhaustion
- Continue to handle incoming requests
- Continue to make forward progress on file transfers in the queue

3 Globus Toolkit Reliable File Transfer Service
[Diagram labels: RFT, Client, GridFTP]

4 RFT and GridFTP Clustering
[Diagram labels: GridFTP control, RFT, GridFTP data, RFT]

5 Clustering Approach
[Diagram labels: RFT, Load Balancer, HA DBMS]

6 RFT State Management
[Diagram labels: Web Service Container, RFT, Delegation Service, Client, DBMS]

7 RFT DB Tables
Request: ID, Termination Time, Started Flag, Max Attempts, Delegated EPR, Container ID, Start Time
Transfer: ID, Request ID, Source URL, Destination URL, Status, Attempts, Retry Time
Restart: Transfer ID, Restart Marker, Last Update Time
[Legend: Added Fields]
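The three tables on this slide can be written out as a small schema. This is only a sketch: the column names are taken from the slide, but the snake_case spelling, the column types, the foreign-key relationships, and the use of SQLite (the talk's evaluation used MySQL) are all assumptions made for illustration, not the actual RFT schema.

```python
import sqlite3

# Hypothetical schema: names come from the slide; types and keys are assumed.
SCHEMA = """
CREATE TABLE request (
    id               INTEGER PRIMARY KEY,
    termination_time TEXT,
    started_flag     INTEGER,
    max_attempts     INTEGER,
    delegated_epr    TEXT,
    container_id     TEXT,
    start_time       TEXT
);
CREATE TABLE transfer (
    id              INTEGER PRIMARY KEY,
    request_id      INTEGER REFERENCES request(id),
    source_url      TEXT,
    destination_url TEXT,
    status          TEXT,
    attempts        INTEGER,
    retry_time      TEXT
);
CREATE TABLE restart (
    transfer_id      INTEGER REFERENCES transfer(id),
    restart_marker   TEXT,
    last_update_time TEXT
);
"""


def create_rft_tables(conn):
    """Create the three illustrative tables on an open SQLite connection."""
    conn.executescript(SCHEMA)


def column_names(conn, table):
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
```

Calling `create_rft_tables(sqlite3.connect(":memory:"))` builds the tables in a throwaway in-memory database, which is enough to experiment with queries over the fields listed above.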

8 New Tables
Tables: Delegation Service, Persistent Subscription
Fields as transcribed: Resource ID Caller DN Local Name Termination Time Listener Certificate Container ID Consumer Producer Policy Precondition Selector Topic Security Descriptor …

9 RFT Fail-Over
Based on time-outs
Periodically query database for pending requests with no recent activity
- Stalled requests could be caused by RFT service crash, hardware failure, RFT service overload, etc.
- If found, obtain DB write lock, query again, claim stalled requests, and release lock
Configuration values:
- Query interval (default: 30 seconds)
- Recent interval (default: 60 seconds)
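The fail-over steps on this slide (check for stalled requests, take the write lock, query again, claim, release) can be sketched as follows. This is a minimal illustration, not the RFT implementation: the single simplified `request` table, its column names, the `'Pending'` status value, and the use of SQLite's `BEGIN IMMEDIATE` as the "DB write lock" are all assumptions. Only the two default intervals come from the slide.

```python
import sqlite3

QUERY_INTERVAL = 30   # seconds between scans (default from the slide)
RECENT_INTERVAL = 60  # seconds of inactivity before a request counts as stalled

# Hypothetical query: pending requests whose last activity is older than the cutoff.
STALLED_SQL = ("SELECT id FROM request "
               "WHERE status = 'Pending' AND last_update < ? ORDER BY id")


def make_db():
    """Create an in-memory database with a simplified, assumed request table."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions
    conn.execute("CREATE TABLE request ("
                 "id INTEGER PRIMARY KEY, container_id TEXT, "
                 "status TEXT, last_update REAL)")
    return conn


def claim_stalled(conn, my_container_id, now, recent_interval=RECENT_INTERVAL):
    """Claim stalled requests for this container; return the claimed ids."""
    cutoff = now - recent_interval
    # Step 1: cheap check without holding the write lock.
    if not conn.execute(STALLED_SQL, (cutoff,)).fetchall():
        return []
    # Step 2: take the write lock, then query again so that requests already
    # claimed by another instance since step 1 are not claimed twice.
    conn.execute("BEGIN IMMEDIATE")
    try:
        ids = [row[0] for row in conn.execute(STALLED_SQL, (cutoff,))]
        for request_id in ids:
            conn.execute(
                "UPDATE request SET container_id = ?, last_update = ? WHERE id = ?",
                (my_container_id, now, request_id))
        conn.execute("COMMIT")  # Step 3: committing releases the lock.
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return ids
```

A service instance would call `claim_stalled` once per query interval. With the slide's defaults, and assuming scans run at a fixed interval and finish quickly, a stalled request would be picked up roughly 60 to 90 seconds after its last recorded activity.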

10 Evaluation Environment
Dedicated 12-node Linux cluster
- Red Hat Enterprise Linux AS Release 3
- Switched Gigabit Ethernet
- 2 GB RAM
- Dual 2 GHz Intel Xeon CPUs
  - 512 KB cache
Globus Toolkit 4.0.3
MySQL Standard 5.0.27

11 Evaluation
Correctness / Effectiveness
- Submitted multiple RFT requests of different sizes to 12 RFT instances
- Verified fail-over and notification functionality
Performance
- Evaluate overhead of shared DBMS
- Stress test: transfer many small files

12 [Figure annotations: web services container stopped; fail-over; 60 second fail-over interval]

14 [Figure data labels: 4%, 6%, 10%, 14%, 22%, 43%, 57%, 82%, 95%]

15 Related Work
HAND: Highly Available Dynamic Deployment Infrastructure for GT4
- Migrate services between containers to maintain availability during planned outages
- Does not address management of persistent service state or fail-over for unplanned outages
myGrid
- DBMS persistence of WS-ResourceProperties in Apache WSRF
- Points to a general-purpose approach for DBMS-based persistence of stateful WSRF services

16 Conclusion
Clustering RFT provides load-balancing and fail-over with acceptable performance for small clusters
Clustering is a promising approach for application to other grid services

17 Future Work
Correctly handle replay of FTP deletes
Implement credentialRefreshListener
Evaluate use of different DBMS solutions
Investigate GT4 DBMS persistence in general
Investigate use of WS-Naming

18 Thanks!
Questions? Comments?
This material is based upon work supported by the National Science Foundation under Grant No. 0426972.
Performance experiments were conducted on computers at the Technology Research, Education, and Commercialization Center (TRECC), a program of the University of Illinois at Urbana-Champaign, funded by the Office of Naval Research and administered by the National Center for Supercomputing Applications.
We thank Tom Roney for his assistance with the TRECC cluster. We also thank Ravi Madduri from the Globus project for answering our questions about RFT.

