Globus Toolkit 4: Current Status and Futures Stuart Martin Argonne National Lab.

1 Globus Toolkit 4: Current Status and Futures Stuart Martin smartin@mcs.anl.gov Argonne National Lab

2 Globus is Service-Oriented Infrastructure Technology
• Software for service-oriented infrastructure
  – Service-enable new & existing resources
  – E.g., GRAM on computer, GridFTP on storage system, custom application service
  – Uniform abstractions & mechanisms
• Tools to build applications that exploit service-oriented infrastructure
  – Registries, security, data management, …
• Open source & open standards
  – Each empowers the other
  – E.g., monitoring across different protocols is hard
• Enabler of a rich tool & service ecosystem

3 Globus Toolkit V4.0
• Major release on April 29th, 2005
• Precious fifteen months spent on design, development, and testing
  – 1.8M lines of code
  – Major contributions from five institutions
  – Hundreds of millions of service calls executed over weeks of continuous operation
• Significant improvements over GT3 code base in all dimensions

4 Our Goals for GT4
• Usability, reliability, scalability, …
  – Web service components have quality equal or superior to pre-WS components
  – Documentation at acceptable quality level
• Consistency with latest standards (WS-*, WSRF, WS-N, etc.) and Apache platform
  – WS-I Basic (Security) Profile compliant
• New components, platforms, languages
  – And links to larger Globus ecosystem


6 GT4 Documentation is Much Improved!

7 GRAM
• Common WS interface to schedulers
  – Unix, Condor, LSF, PBS, SGE, …
• More generally: interface for process execution management
  – Lay down execution environment
  – Stage data
  – Monitor & manage lifecycle
  – Kill it, clean up
• A basis for application-driven provisioning
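
The process-execution steps listed on this slide (lay down the environment, stage data, monitor the lifecycle, kill, clean up) can be pictured as a small state machine. The sketch below is illustrative only: the state names and the `JobLifecycle` class are hypothetical and are not GRAM's actual job states or API.

```python
from enum import Enum, auto


class JobState(Enum):
    # Hypothetical states mirroring the slide's bullets, not GRAM's real state names.
    PENDING = auto()
    ENV_READY = auto()    # execution environment laid down
    STAGED = auto()       # input data staged
    RUNNING = auto()      # job handed to the local scheduler
    DONE = auto()         # finished or killed
    CLEANED_UP = auto()   # temporary files and state removed


# Allowed transitions; a kill may move any unfinished job straight to DONE.
TRANSITIONS = {
    JobState.PENDING: {JobState.ENV_READY, JobState.DONE},
    JobState.ENV_READY: {JobState.STAGED, JobState.DONE},
    JobState.STAGED: {JobState.RUNNING, JobState.DONE},
    JobState.RUNNING: {JobState.DONE},
    JobState.DONE: {JobState.CLEANED_UP},
    JobState.CLEANED_UP: set(),
}


class JobLifecycle:
    def __init__(self):
        self.state = JobState.PENDING
        self.history = [self.state]

    def advance(self, new_state):
        """Move to new_state, rejecting transitions the table does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
        self.history.append(new_state)

    def kill(self):
        """Stop an unfinished job, then clean up after it."""
        if self.state not in (JobState.DONE, JobState.CLEANED_UP):
            self.advance(JobState.DONE)
        if self.state is JobState.DONE:
            self.advance(JobState.CLEANED_UP)
```

Keeping the transitions in one table makes the "kill it, clean up" path explicit: cleanup is only reachable once the job has stopped, whichever state it was killed from.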

8 GT4 WS GRAM
• 2nd-generation WS implementation: optimized for performance, stability, scalability
• Streamlined critical path
  – Use only what you need
• Flexible credential management
  – Credential cache & delegation service
• GridFTP & RFT used for data operations
  – Data staging & streaming output
  – Eliminates redundant GASS code
• Single and multi-job support

9 GT4 GRAM Architecture
[Diagram: a client calls job functions on, and delegates credentials to, the service host(s) and compute element(s). The GT4 Java container hosts the GRAM services, Delegation, and RFT. GRAM reaches the local scheduler and user job through sudo and the GRAM adapter (local job control), and receives job events from the SEG. RFT issues file transfer requests over the FTP control channel to GridFTP on the compute element and on the remote storage element(s), which exchange FTP data.]

10 GT4 GRAM Architecture (same diagram)
Same delegated credential can be: made available to the user application

11 GT4 GRAM Architecture (same diagram)
Same delegated credential can be: used to authenticate with RFT

12 GT4 GRAM Architecture (same diagram)
Same delegated credential can be: used to authenticate with GridFTP

13 Our Performance Goals
“GRAM should add little to no overhead compared to an underlying batch system”
  – Submit as many jobs to GRAM as is possible to the underlying scheduler
    > Goal: 10,000 jobs to a batch scheduler
    > Goal: efficiently fill the process table for fork scheduler
  – Submit/process jobs as fast to GRAM as is possible to the underlying scheduler
    > Goal: 1 per second
• So far, so good

14 Design Decisions
• Efforts and features towards the goal
  – Allow job brokers the freedom to optimize
    > E.g. Condor-G is smarter than globusrun
    > Protocol steps made optional and shareable
  – Reduced cost for GRAM service on host
    > Single WSRF host environment
    > Better job status monitoring mechanisms
      • Scheduler Event Generator (SEG)
  – More scalable/reliable file handling
    > GridFTP and RFT instead of globus-url-copy
    > Removal of non-scalable GASS caching
• The plan is working
  – GT4 WS GRAM performs much better than GT3

15 4.0 Performance
• Throughput
  – Test: simple job to fork scheduler (/bin/date); no staging, streaming, or cleanup
  – ~77 jobs/minute sustained
  – ~60 jobs/minute with delegation
• Long-running test
  – Ran 500,000+ sequential jobs over 23 days
  – These included staging, delegation, fork job manager
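
To relate these figures to the "1 per second" goal on the performance-goals slide, the arithmetic can be sketched as below. The variable names are illustrative. The job count for the long-running test is the slide's lower bound (500,000+), so the per-job time is an upper bound on the average.

```python
GOAL_JOBS_PER_SEC = 1.0  # goal quoted on the performance-goals slide


def per_second(jobs_per_minute):
    """Convert a jobs/minute rate to jobs/second."""
    return jobs_per_minute / 60.0


sustained = per_second(77)        # fork scheduler, no delegation
with_delegation = per_second(60)  # fork scheduler, with delegation

# Long-running test: at least 500,000 sequential jobs in 23 days.
seconds_in_test = 23 * 24 * 60 * 60
avg_seconds_per_job = seconds_in_test / 500_000

print(f"sustained:        {sustained:.2f} jobs/s (goal {GOAL_JOBS_PER_SEC:.2f})")
print(f"with delegation:  {with_delegation:.2f} jobs/s")
print(f"long-running avg: {avg_seconds_per_job:.2f} s per job at most")
```

This gives about 1.28 jobs/s sustained and 1.00 jobs/s with delegation, both at or above the goal. The long-running test averages at most about 3.97 s per job, which is not directly comparable because those jobs ran sequentially and included staging and delegation.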

16 4.0 Performance (2)
• Concurrency
  – Job submits to Condor scheduler (long-running sleep job); no staging, streaming, or cleanup; no delegation
  – Current limit is 32,000 jobs due to a Linux directory limit
    > Using multiple sub-directories will resolve this; look for this in 4.2
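
One common way to spread entries over multiple sub-directories is to hash each job ID into a fixed number of buckets. The sketch below shows that general technique only. The function name, the root path, and the fan-out of 256 are hypothetical, and this is not a description of how GT 4.2 implements the fix.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_job_dir(root, job_id, fanout=256):
    """Map a job ID to root/<bucket>/<job_id> so no single directory holds every job.

    The bucket is derived from a hash of the ID, so the same ID always maps
    to the same directory and IDs spread roughly evenly over the buckets.
    """
    digest = hashlib.sha1(job_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % fanout
    return PurePosixPath(root) / f"{bucket:03d}" / job_id


# Hypothetical root directory, for illustration only.
print(sharded_job_dir("/var/gram/jobs", "job-000001"))
```

With 256 buckets, 32,000 jobs average about 125 entries per sub-directory, and the top-level directory holds only the 256 buckets.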

17 Condor-G
• Job submission to WS GRAM
• Provides credential management to GT delegation service
• 1000+ job workflow runs performed
  – Thanks to Jens Voeckler, Gaurang Mehta, and Jaime Frey!
• Still some kinks
  – Refreshing delegated cred too often
  – Occasional client-side job delay of > 5 minutes

18 Command line programs
• globusrun-ws
  – Submit a single or multi job
  – Delegate, stream stdout if requested
• globus-credential-delegate
  – Delegate a credential to a remote GT container
  – Same cred can be used for many GRAM or RFT jobs
• wsrf-destroy
  – Remove/destroy a credential
• wsrf-query
  – Query for compute resource information
• globus-job-run-ws (coming soon)
  – Submit simple jobs without writing an XML JDD

19 Short Term Priorities: WS GRAM
• Make WS GRAM a “Reliable” service (4.0.x)
  – Additional controls to limit resource consumption
  – Out Of Memory (OOM) is not allowed!
• Continue to improve performance
• WS GRAM version of globus-job-run/submit (4.0.x)
• Improved information collection for jobs (4.2)
  – Nodes allocated by scheduler
  – Scheduler job ID
  – Rusage type info
• Implement GGF JSDL once finalized

20 GridFTP in GT4
• 100% Globus code
  – No licensing issues
  – Stable, extensible
• IPv6 support
• XIO for different transports
• Striping → multi-Gb/sec wide area transport
• Pluggable
  – Front-end: e.g., future WS control channel
  – Back-end: e.g., HPSS, cluster file systems
  – Transfer: e.g., UDP, NetBLT transport

21 Reliable File Transfer: Third Party Transfer
[Diagram: an RFT client exchanges SOAP messages and (optional) notifications with the RFT service, which drives two GridFTP servers. Each server is drawn with a protocol interpreter, master DSI, slave DSI, IPC link, IPC receiver, and data channels.]
• Fire-and-forget transfer
• Web services interface
• Many files & directories
• Integrated failure recovery

22 RFT Performance Stats
• Current maximum request size is approx. 20,000 entries with a default 64 MB heap size
• Infinite transfer, LAN
  – ~120,000 transfers (servers were killed by mistake)
  – Was a good test: found a corner case where postgres was not able to perform ~3 update queries/sec and was using up CPU
• Infinite transfer, WAN
  – ~67,000 transfers (killed for the same reason as above)
• Sloan Digital Sky Survey DR3 archive move
  – 900+K files, 6 TB
  – Killed the transfer several times for recoverability testing
  – No human intervention has been required to date
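
Given the roughly 20,000-entry ceiling per request, a client moving a larger set of files would need to split its transfer list into several requests. A minimal sketch of that splitting follows. The function name is hypothetical and no RFT client API is shown or implied.

```python
MAX_ENTRIES_PER_REQUEST = 20_000  # approximate limit quoted for a default 64 MB heap


def split_into_requests(transfers, max_entries=MAX_ENTRIES_PER_REQUEST):
    """Yield consecutive slices of at most max_entries items from transfers.

    transfers is any sliceable sequence, e.g. a list of
    (source_url, destination_url) pairs.
    """
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")
    for start in range(0, len(transfers), max_entries):
        yield transfers[start:start + max_entries]


# A 900,000-file move, the lower bound quoted for the SDSS DR3 archive,
# represented here by a range of file indices rather than real URLs.
requests = list(split_into_requests(range(900_000)))
print(len(requests), "requests of up to", MAX_ENTRIES_PER_REQUEST, "entries")
```

At that limit, 900,000 files need 45 requests. Whether the SDSS move was actually submitted this way is not stated on the slide.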

23 Short-Term Priorities: Data Management
• Concurrency in globus-url-copy
• Priorities in RFT
• Data replication service
• Enhance policy support in data services
• Physical file name creation service
• Scalable & distributed metadata manager
• OGSA-DAI will become a core component

24 GT4 Monitoring & Discovery
[Diagram: several GT4 containers, each with an Index, hosting GRAM, RFT, and user services; GridFTP is attached through an adapter. Labels: Registration & WSRF/WSN access; Custom protocols for non-WSRF entities; Clients (e.g., WebMDS); Automated registration in container; WS-ServiceGroup.]

25 MDS4 Extensibility
• Aggregator framework provides
  – Registration management
  – Collection of information from Grid resources
  – Plug-in interface for data access, collection, query, …
• WebMDS framework provides for customized display
  – XSLT transformations
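
The three aggregator responsibilities named on this slide (registration management, collection, and a plug-in interface for data access) can be illustrated with a toy model. Everything in the sketch is hypothetical: the `Aggregator` class, its method names, and the assumption that registrations carry a lifetime and lapse unless renewed. It is not the MDS4 API.

```python
import time


class Aggregator:
    """Toy aggregator: sources register for a limited lifetime and are polled on demand."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._sources = {}  # name -> (callable returning a dict, expiry time)

    def register(self, name, source, lifetime_s):
        """Register or renew a plug-in source; it is dropped after lifetime_s seconds."""
        self._sources[name] = (source, self._clock() + lifetime_s)

    def collect(self):
        """Drop expired registrations, then query every remaining source."""
        now = self._clock()
        self._sources = {
            name: (source, expiry)
            for name, (source, expiry) in self._sources.items()
            if expiry > now
        }
        return {name: source() for name, (source, _) in self._sources.items()}


# Usage with made-up sources and values:
agg = Aggregator()
agg.register("gram", lambda: {"queued_jobs": 3}, lifetime_s=600)
agg.register("rft", lambda: {"active_transfers": 1}, lifetime_s=600)
print(agg.collect())
```

Because each source is just a callable, a new information provider can be plugged in without changing the aggregator itself, which is the point of the plug-in interface described above.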

26 With a standard deployment, a project can…
• Discover needed data from services in order to make job submission or replica selection decisions by querying the VO-wide Index
• Evaluate the status of Grid services by looking at the VO-wide WebMDS setup
• Be notified when disks are full or other error conditions happen by being on the list of administrators
• Individual projects can examine the state of the resources and services of interest to them

27 Short-Term Priorities: Information Services
• Many more information sources, including gateways to other systems
• Automated configuration of monitoring
• Specialized monitoring displays
• Performance optimization of registry
• Archiver service
• Helper tools to streamline integration of new information sources

28 2005 and Beyond
• We have a solid Web services base
• We now want to build, on that base, an open source service-oriented infrastructure
  – Virtualization
  – New services for provisioning, data management, security, VO management
  – End-user tools for application development
  – Etc., etc.

29 Next Step Plans
• Support!
• Actively working with user groups to make sure their deployments are stable
• Move everyone from GT2 and GT3 to GT4
• Continue to improve documentation
  – Goal: every support question gets put into the docs

30 THANKS! Questions? Stuart Martin smartin@mcs.anl.gov Argonne National Lab

