INFSO-RI
Service Oriented Architectures - Introduction. Managing Grid Resources with Condor Web Services
Corina Stratan

Enabling Grids for E-sciencE INFSO-RI GridInitiative, University “Politehnica” of Bucharest July

Outline
- Service Oriented Architecture – Introduction
- Web Services – Standards and protocols
- Condor
- Condor BirdBath – Web Service interface for Condor

SOA – Service Oriented Architectures
Current requirements in the IT industry:
- Collaboration
- Sharing of data and resources
- Structuring large applications in small blocks that can be reused (services)
SOA definition from OASIS: "A paradigm for organizing and utilizing distributed capabilities that may be under the control of different ownership domains. It provides a uniform means to offer, discover, interact with and use capabilities to produce desired effects consistent with measurable preconditions and expectations."
Service Oriented Architecture components:
- Service Providers
- Service Requestors
- Service Brokers

SOA Characteristics
- Interoperability among different platforms and programming languages
- The functional blocks are encapsulated in components that function as services
- Separation of the interface from the implementation
- For complex applications, the control of the execution flow (workflow) is separated from the services
- The services can be added/removed dynamically
- The correspondence between interfaces and implementation is defined in configuration files and can be adapted

Web Services – Brief Overview
- Provide an implementation for the service oriented architecture
- Software components that provide services over the Internet and communicate through standardized XML messages
- A newer form of Remote Procedure Call, with the advantage of standardization
- Designed to be used by applications, not by humans
- Self-describing (public interfaces specified with the aid of WSDL)
- Can be published in “Yellow Pages”-like registries

Web Services Protocol Stack
- Service Transport (HTTP, FTP, …): transports messages between applications
- XML Messaging (XML-RPC, SOAP): encodes messages in the XML format
- Service Description (WSDL): describes the public interface of the service
- Service Discovery (UDDI): common registry for web services

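To make the XML Messaging layer concrete, here is a minimal Python sketch that builds a SOAP 1.1 request envelope with the standard library. The operation name `getJobStatus`, the parameter `clusterId`, and the namespace `urn:example:jobs` are hypothetical and do not come from any real service; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (standard).
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace for an illustrative service.
EXAMPLE_NS = "urn:example:jobs"


def build_soap_request(operation, params):
    """Return a SOAP 1.1 envelope (as text) that calls `operation` with `params`."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The operation element and its parameters live in the service namespace.
    call = ET.SubElement(body, f"{{{EXAMPLE_NS}}}{operation}")
    for name, value in params.items():
        child = ET.SubElement(call, f"{{{EXAMPLE_NS}}}{name}")
        child.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical call: ask an imaginary service about cluster 14.
message = build_soap_request("getJobStatus", {"clusterId": 14})
print(message)
```

In a real exchange this text would travel as the body of an HTTP POST (the Service Transport layer), and the element names would be taken from the WSDL description of the service.
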
Web Services Scenario
[Diagram: Web Services Registry, Service Provider, Service Requestor; arrows labeled WSDL and SOAP]
1 - Service published to the registry
2 - The requestor obtains the public interface of the service
3 - The requestor invokes the service and receives the result

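The three steps of the scenario can be sketched in a few lines of Python. This is an in-process toy, not UDDI, WSDL or SOAP: the `Registry` class, the `adder` service, and the dictionary used as an "interface description" are all hypothetical stand-ins chosen for illustration.

```python
class Registry:
    """Toy stand-in for a web services registry."""

    def __init__(self):
        self._entries = {}

    def publish(self, name, interface, endpoint):
        # Step 1: the provider publishes the service and its public interface.
        self._entries[name] = (interface, endpoint)

    def lookup(self, name):
        # Step 2: the requestor obtains the public interface (and endpoint).
        return self._entries[name]


def add(a, b):
    """The provider's implementation, hidden behind the interface."""
    return a + b


registry = Registry()
registry.publish("adder", {"operation": "add", "params": ["a", "b"]}, add)

interface, endpoint = registry.lookup("adder")

# Step 3: the requestor invokes the service using only the interface description.
arguments = {"a": 2, "b": 3}
result = endpoint(*(arguments[p] for p in interface["params"]))
print(result)
```
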
Software Platforms for Web Services
- 1998-1999: Microsoft starts developing the SOAP standard
- 2000: IBM extends the SOAP specifications (SOAP 1.1)
- 2001: the development of the SOAP and WSDL standards is coordinated by W3C
Software platforms:
- Microsoft: web services support included in the .NET platform
- IBM: developed the IBM-SOAP platform, which was subsequently donated to Apache and became Apache SOAP
- Apache SOAP was followed by Apache Axis
- IBM WebSphere contains a web services engine

Axis – Apache Extensible Interaction System
- Open-source framework for web services programming
- Based on the JAX-RPC/JAX-WS APIs
- Includes a simple stand-alone web services server
- Can be integrated in application servers like Apache Tomcat
- Provides tools for generating WSDL descriptions from Java classes, and for generating Java classes from WSDL descriptions (java2wsdl, wsdl2java)

Why Web Services?
- Standardization
- Interoperability
- Open standards and protocols
- Based on HTTP: traffic usually passes through firewalls without special configuration
- Loosely coupled components, an approach specific to distributed computing

Condor - Introduction
- CONDOR: workload management system for computationally intensive jobs (High Throughput Computing)
- Available since 1984
- Functionality is similar to that of classical batch queuing systems (PBS, LSF, ...)
- Adds new concepts: opportunistic computing, resource classification

Resource Management in Condor
- Opportunistic computing: Condor can utilize non-dedicated machines, taking advantage of the time when the owner does not work on them
- The owner of each machine can establish the policy for running jobs on the machine, e.g.:
  - If the keyboard/mouse have not been used for 15 min
  - During the nights or the weekends
  - Only jobs that require less than x MB of RAM
- Condor motto: "Leave the owner in control, regardless of the cost."

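The example policies above can be read as a boolean expression over the machine state and the job. The Python sketch below is only an illustration of that idea, not Condor's policy language: the way the three conditions are combined, the night hours (20:00 to 07:00), and the 512 MB limit standing in for "x" are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MachineState:
    keyboard_idle_minutes: float  # time since the last keyboard/mouse activity
    now: datetime                 # current local time on the machine


def owner_allows(state, job_memory_mb, max_job_memory_mb=512):
    """Return True if this hypothetical owner policy lets a job start."""
    idle_long_enough = state.keyboard_idle_minutes >= 15
    # Assumed definition of "night": 20:00 to 07:00.
    night = state.now.hour >= 20 or state.now.hour < 7
    weekend = state.now.weekday() >= 5  # Saturday = 5, Sunday = 6
    small_enough = job_memory_mb < max_job_memory_mb
    # Assumed combination: any availability window, plus the memory limit.
    return (idle_long_enough or night or weekend) and small_enough


# Wednesday at noon, owner typed 5 minutes ago: the job must wait.
busy = MachineState(keyboard_idle_minutes=5, now=datetime(2024, 1, 3, 12, 0))
print(owner_allows(busy, job_memory_mb=256))
```
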
Checkpoints
- The jobs’ memory images can be saved periodically in checkpoint files
- Better fault tolerance
- Checkpoints can be used to move a job to another machine without restarting it from the beginning
- The checkpoint files may be stored on the execution machine or on a dedicated server

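The idea of resuming from a checkpoint can be illustrated with a small Python sketch. Note the swap: Condor saves the memory image of the whole process, while this toy saves only an application-level state dictionary with `pickle`. The function `run_job`, its parameters, and the file name are hypothetical.

```python
import os
import pickle
import tempfile


def run_job(total_steps, checkpoint_path, stop_after=None, checkpoint_every=10):
    """Sum 0..total_steps-1, saving progress periodically to checkpoint_path.

    Returns the sum, or None if the run was interrupted after `stop_after` steps.
    """
    state = {"step": 0, "acc": 0}
    # Resume from the last checkpoint if one exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as f:
            state = pickle.load(f)
    executed = 0
    while state["step"] < total_steps:
        if stop_after is not None and executed >= stop_after:
            return None  # simulated eviction: work since the last checkpoint is lost
        state["acc"] += state["step"]
        state["step"] += 1
        executed += 1
        if state["step"] % checkpoint_every == 0:
            with open(checkpoint_path, "wb") as f:
                pickle.dump(state, f)
    return state["acc"]


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "job.ckpt")
    first = run_job(100, path, stop_after=35)  # interrupted; last checkpoint at step 30
    second = run_job(100, path)                # resumes from step 30, not from 0
    print(first, second)
```

Because the checkpoint is an ordinary file, the second call could just as well run on a different machine that has access to it, which is the migration case described on the slide.
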
Resource classification - ClassAds
- Classified Ads = mechanism that allows a machine to publish the available resources and a job submitter to publish the requirements for execution
- Typical attributes for a machine: number of CPUs, CPU types and benchmark results, the amount of available memory/disk space, the restrictions for running jobs
- Typical attributes for a job: the needed architecture and operating system, the minimum/maximum amount of memory, the estimated execution time

Machine ClassAd - Example
Name = "wn1.rogrid.pub.ro"
Machine = "wn1.rogrid.pub.ro"
CpuBusy = ((LoadAvg - CondorLoadAvg) >= )
COLLECTOR_HOST_STRING = "lcfg.rogrid.pub.ro"
CondorVersion = "$CondorVersion: Jun $"
VirtualMachineID = 1
Disk =
LoadAvg =
Memory = 496
Cpus = 1
StartdIpAddr = " "
Arch = "INTEL"
OpSys = "LINUX"
KFlops =
TotalLoadAvg =
TotalCondorLoadAvg =
State = "Unclaimed"
Start = TRUE
Requirements = START
…

Job ClassAd - Example
MyType = "Job"
ClusterId = 1511
Owner = "cmssoft"
ExitBySignal = FALSE
Cmd = "/home/cmssoft/.globus/.gass_cache/local/md5/b7/5f0d6443e430b2e8b56324a66524aa/md5/8d/d25385ef07f36cf8cb05e2290bbcc1/data"
WantCheckpoint = FALSE
Out = "/home/cmssoft/.globus/job/tier2b.cacr.caltech.edu/ /stdout"
Err = "/home/cmssoft/.globus/job/tier2b.cacr.caltech.edu/ /stderr"
ShouldTransferFiles = "NO"
DiskUsage = 8
Requirements = (OpSys == "LINUX" && Arch == "INTEL") && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) && (TARGET.FileSystemDomain == MY.FileSystemDomain)
Args = "/raid1/OSG-app/cmssoft /raid1/OSG-app/cmssoft/cms /raid1/OSG-app/cmssoft/cmsi OSCAR_3_9_5 CIT_CMS_PG slc3_ia32_gcc323 install_cms_project_1"
…

Matchmaking
[Diagram: Agent, Matchmaker, Resource]
1. The agent sends the job ClassAd and the resource sends the machine ClassAd to the matchmaker
2. The matchmaker runs the matchmaking algorithm
3. The matchmaker notifies the matched agent and resource
4. The agent sends a resource request to the resource

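A minimal Python sketch of two-sided matchmaking, assuming ads are plain dictionaries and requirements are Python functions rather than ClassAd expressions. Attribute names follow the example ads; the `Disk` and `ImageSize` values and the second machine are hypothetical, and the `FileSystemDomain` clause of the job requirements is left out for brevity.

```python
def job_requirements(my, target):
    """Job-side requirements, modeled on the example job ClassAd."""
    return (
        target["OpSys"] == "LINUX"
        and target["Arch"] == "INTEL"
        and target["Disk"] >= my["DiskUsage"]
        and target["Memory"] * 1024 >= my["ImageSize"]
    )


def machine_requirements(my, target):
    """Machine-side requirements: the example machine ad uses Requirements = START."""
    return my["Start"]


def matchmake(job, machines):
    """Return the names of machines where both sides' requirements hold."""
    return [
        m["Name"]
        for m in machines
        if job_requirements(job, m) and machine_requirements(m, job)
    ]


machines = [
    # Disk value is hypothetical (it is missing from the example ad).
    {"Name": "wn1.rogrid.pub.ro", "Arch": "INTEL", "OpSys": "LINUX",
     "Memory": 496, "Disk": 100000, "Start": True},
    # Hypothetical second machine with a different operating system.
    {"Name": "other.example.org", "Arch": "INTEL", "OpSys": "SOLARIS",
     "Memory": 2048, "Disk": 100000, "Start": True},
]
# ImageSize is hypothetical.
job = {"Owner": "cmssoft", "DiskUsage": 8, "ImageSize": 20000}

matches = matchmake(job, machines)
print(matches)
```

A match requires both sides to agree, which is how the machine owner's policy and the job's needs are enforced at the same time.
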
The Condor Pool
- Condor Pool = cluster of machines + the jobs executing on them
- Machines in a Condor pool:
  - central manager (unique)
  - execute machine
  - submit machine
- A machine can have multiple roles

Condor Components
- Condor Master: manages all the Condor daemons that are running on a machine
- Condor Startd: runs on the execute machines; starts the jobs and ensures that the owner’s policy is respected
- Condor Schedd: runs on submit machines; keeps a queue with the submitted jobs and publishes their ClassAds
- Condor Collector: collects information about all the resources from the pool
- Condor Negotiator: executes the matchmaking algorithm, taking into account the users’ priorities

Condor Universes
- Specify the execution environment for jobs
- Possible universes:
  - Standard
  - Vanilla
  - MPI
  - PVM
  - Globus
  - Java

Standard & Vanilla
- Standard is the typical universe; Vanilla is the “minimal” universe
- Standard: jobs running in the Standard universe can be checkpointed and migrated, but the programs must be linked with a Condor library
- Vanilla: any executable program can be run in the Vanilla universe, but checkpointing and migration are not possible

Condor Shadow
- Process that is started on the submit machine, in parallel with the actual execution of the job on the execute machine
- Used to access the environment variables and the local files
- Each system call is redirected to the shadow
- Note: because of the shadow processes, even a submit machine may become heavily loaded

Job execution (command line)
- Create a job description file
- Use the condor_submit command:
condor_submit calcule.cmd
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 14.

Job Description File - Example
executable = calcule.condor
universe = standard
output = calcule.out
error = calcule.err
log = calcule.log
arguments =
should_transfer_files = NO
queue 1

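When many similar jobs have to be submitted, a description file like the one above can be generated from a script. The Python sketch below is one possible way to do it; the helper `render_submit_description` is hypothetical, and the `arguments` line is left out because its value is not given in the example.

```python
def render_submit_description(settings, queue_count=1):
    """Render `key = value` lines followed by a `queue N` statement."""
    lines = [f"{key} = {value}" for key, value in settings.items()]
    lines.append(f"queue {queue_count}")
    return "\n".join(lines) + "\n"


# Values copied from the example job description file.
settings = {
    "executable": "calcule.condor",
    "universe": "standard",
    "output": "calcule.out",
    "error": "calcule.err",
    "log": "calcule.log",
    "should_transfer_files": "NO",
}

description = render_submit_description(settings)
print(description)
```

The resulting text can be written to a file such as calcule.cmd and passed to condor_submit.
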
The condor_status command
condor_status
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
lcfg.rogrid.p LINUX INTEL Unclaimed Idle :36:31
wn1.rogrid.pu LINUX INTEL Unclaimed Idle :18:30
Machines Owner Claimed Unclaimed Matched Preempting
INTEL/LINUX
Total
monalisa]$ condor_status -direct tier2b.cacr.caltech.edu
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
LINUX INTEL Owner Idle :29:00
LINUX INTEL Owner Idle :29:00
LINUX INTEL Owner Idle :29:00
LINUX INTEL Owner Idle :29:00
Machines Owner Claimed Unclaimed Matched Preempting
INTEL/LINUX
Total

The condor_q command
monalisa]$ condor_q
-- Submitter: tier2b.cacr.caltech.edu : : tier2b.cacr.caltech.edu
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
uscms01 7/8 16: :22:07 I sleep
uscms01 7/11 14: :29:20 I tcsh -c id
cmssoft 7/15 08: :02:38 R data /raid1/OSG-ap
sdss 7/18 12: :52:48 R runRemoteJob.sh
sdss 7/18 15: :18:50 R runRemoteJob.sh 04
5 jobs; 2 idle, 3 running, 0 held

Other Condor Features
- Condor DAGMan: support for executing multiple jobs with dependencies (workflows)
- Condor Quill: builds and supports a mirror database of a Condor queue
- Condor-G: task broker that can be used as a front-end to a grid
[Diagram: example job graph with nodes A, B, C, D, E]

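Executing jobs with dependencies comes down to running each job only after all of its parents have finished. The Python sketch below computes such an order with the standard `graphlib` module (Python 3.9 or later). It is not DAGMan itself, and the edges between the nodes A to E are hypothetical, since the original diagram's arrows are not available.

```python
from graphlib import TopologicalSorter


def execution_waves(parents):
    """Group jobs into waves; every job in a wave has all its parents done."""
    sorter = TopologicalSorter(parents)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose parents have all finished
        waves.append(ready)
        sorter.done(*ready)                 # pretend the whole wave ran successfully
    return waves


# Hypothetical dependencies: each job maps to the set of jobs it waits for.
parents = {
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
    "E": {"D"},
}

waves = execution_waves(parents)
print(waves)
```

Jobs in the same wave have no dependencies on each other, so a scheduler is free to run them in parallel.
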
Condor & Web Services
- Condor BirdBath: adds web service interfaces to the Condor daemons
- Web services allow external applications to interact with Condor (job delegation model)
- Implementation based on Apache Axis
- To use BirdBath for a Condor server:
  - Modify the Condor configuration file
- To create a BirdBath client:
  - Set the environment variables
  - Use the wsdl2java tool to generate helper classes
  - Write the client code

Services provided by BirdBath
- For the Condor scheduler (schedd daemon):
  - Job submission
  - Data transfer and management
  - Job monitoring
  - Queue management operations
  - Transactions
- For the Condor collector:
  - Querying the available ClassAds

Resources
- Condor web site: http://cs.wisc.edu/condor
- Condor BirdBath page: http://cs.wisc.edu/condor/birdbath
- Condor Web Services tutorial: http://www-128.ibm.com/developerworks/edu/gr-dw-gr-wscondor.html?S_TACT=105AGX07&S_CMP=HP
- Top Ten FAQs for Web Services: http:// /webservicefaqs.html
- Collection of publicly available web services: http://

Proposal for GridInitiative Activities
- Install Condor
- Enable the Condor WebServices interface
- Create and test a client for Condor WebServices
- Model job workflows with Condor DAGMan