1 Bridging Clouds with CernVM: ATLAS/PanDA example Wenjing Wu 2010-8-27
2 Outline: ATLAS computing model (PanDA); extending the ATLAS computing model to use Cloud computing resources; challenges; solution; work done.
3 PanDA: the Production and Distributed Analysis system for the ATLAS Experiment. Workflow: (1) Jobs are submitted to the PanDA server. (2) Pilots are submitted to worker nodes. (3) Each pilot checks its environment and fetches a job from the PanDA server. (4) After the job is done, the pilot uploads the output files to the Storage Element and registers them in the Logical File Catalog. (5) The pilot updates the job status on the PanDA server. (6) The PanDA server manages the final data transfer.
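The pilot workflow on this slide can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: `ToyPandaServer`, `run_pilot`, the state names, and the dictionaries standing in for the Storage Element and the Logical File Catalog are all hypothetical and are not part of the real PanDA code or API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPandaServer:
    """In-memory stand-in for the PanDA server (hypothetical, not the real API)."""
    queue: list = field(default_factory=list)
    status: dict = field(default_factory=dict)

    def submit(self, job_id, payload):
        # Step 1: a job is submitted to the server.
        self.queue.append((job_id, payload))
        self.status[job_id] = "activated"  # illustrative state name

    def fetch_job(self):
        # Step 3: a pilot asks for work; None means the queue is empty.
        if not self.queue:
            return None
        job_id, payload = self.queue.pop(0)
        self.status[job_id] = "running"
        return job_id, payload

    def update_status(self, job_id, state):
        # Step 5: the pilot reports the final job state.
        self.status[job_id] = state


def run_pilot(server, storage, catalog, environment_ok=True):
    """Steps 3 to 5 as seen from one pilot on a worker node."""
    if not environment_ok:
        return None  # environment check failed, so no job is fetched
    job = server.fetch_job()
    if job is None:
        return None
    job_id, payload = job
    try:
        output = payload()                 # run the job
        file_name = f"{job_id}.out"        # hypothetical naming scheme
        storage[file_name] = output        # step 4: upload to the Storage Element
        catalog[job_id] = file_name        # step 4: register in the file catalog
        server.update_status(job_id, "finished")
    except Exception:
        server.update_status(job_id, "failed")
    return job_id
```

In this sketch the pilot itself talks to the storage and catalog, which is exactly the part that requires credentials in the real system.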
4 Extending the ATLAS computing model to use Cloud Computing resources. What are Clouds (in today's common terms)? (a) Virtualized computing resources provided by academic and commercial institutions (e.g. CERN lxcloud, Amazon EC2). (b) Resources provided by users participating in volunteer computing projects (e.g. BOINC). The goal: run ATLAS production jobs on Cloud Computing resources.
5 Challenges! Transparency: users and production operators should not notice the difference. The whole set of Cloud resources should appear to the PanDA server as just another Grid site. Credentials (which are essential for the functioning of the PanDA pilot) cannot be brought into the ‘untrusted’ environment (e.g. onto the machines of the volunteers).
6 Solve the challenge using CernVM. CernVM provides a lightweight virtual machine image containing the applications of the LHC experiments. The application software is distributed through an HTTP-based content delivery network and is cached locally. CernVM also provides Co-Pilot: a framework for the delivery and execution of the workload on remote virtual machines.
7 CernVM Co-Pilot Integration! Components in the diagram: Co-Pilot Client, Co-Pilot Job Manager, Co-Pilot Storage Manager, Storage Element, Logical File Catalog. Cloud resources are provided through VMs running the Co-Pilot Agent. Workflow: (1) A PanDA job is submitted. (2) A Co-Pilot job is submitted. (3) The Agent gets a Co-Pilot job, which launches the PanDA pilot. (4) The pilot fetches a PanDA job and runs it. (5) After the job has finished, the output is uploaded to temporary storage. (6) The output files are uploaded and registered. (7) The final job status is updated on the PanDA server.
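The key idea of this slide, that the agent side never holds a credential while the final upload, registration, and status update happen on the trusted side, can be illustrated with a toy model. Everything here is an assumption made for illustration: the class and method names, the `se://` location string, and the dictionaries standing in for the temporary storage, Storage Element, Logical File Catalog, and PanDA server are hypothetical and do not reflect the real Co-Pilot protocol.

```python
from collections import deque


class JobManager:
    """Toy Co-Pilot Job Manager: hands queued jobs to agents."""

    def __init__(self):
        self._jobs = deque()

    def submit(self, job_id, payload):
        # Step 2: the Co-Pilot client submits a Co-Pilot job.
        self._jobs.append((job_id, payload))

    def get_job(self):
        # Step 3: an agent asks for work; None means nothing is queued.
        return self._jobs.popleft() if self._jobs else None


class StorageManager:
    """Toy Co-Pilot Storage Manager: the only holder of the credential."""

    def __init__(self, credential):
        self._credential = credential
        self.temporary_storage = {}
        self.storage_element = {}
        self.file_catalog = {}
        self.panda_status = {}

    def receive_output(self, job_id, data):
        # Step 5: agents may write here without any credential.
        self.temporary_storage[job_id] = data

    def finalize(self, job_id):
        # Steps 6 and 7: performed on the trusted side only.
        if not self._credential:
            raise PermissionError("no Grid credential available")
        data = self.temporary_storage.pop(job_id, None)
        if data is None:
            self.panda_status[job_id] = "failed"
        else:
            self.storage_element[job_id] = data
            self.file_catalog[job_id] = f"se://{job_id}"  # made-up location string
            self.panda_status[job_id] = "finished"
        return self.panda_status[job_id]


def agent_cycle(job_manager, storage_manager):
    """Untrusted side: runs one job and never touches a credential."""
    job = job_manager.get_job()
    if job is None:
        return None
    job_id, payload = job
    # Step 4 (run the job), then step 5 (upload to temporary storage).
    storage_manager.receive_output(job_id, payload())
    return job_id
```

A short usage example: after `agent_cycle(jm, sm)` returns a job ID, nothing has been registered yet; only a later call to `sm.finalize(job_id)` in the trusted environment writes to the storage element and updates the status.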
8 Work Done (1): Set up the CERNVM site (part of the ATLAS Grid infrastructure). It is a dynamic virtual cluster formed by virtual machines running CernVM Co-Pilot Agents, is configured according to ATLAS computing conventions, and appears to the ATLAS Grid central services as a Tier 2 site.
9 Work Done (2): Adaptation of the PanDA Pilot: added support for the heterogeneous structure of the software repository, and for saving job output metadata and job status files. Development of the Co-Pilot Storage Manager: a component running in the trusted environment and acting as a proxy between Co-Pilot Agents and PanDA Grid services.
12 Solve the challenge using CernVM. CernVM Co-Pilot helps to run ATLAS PanDA jobs in a non-credentialed computing environment. CernVM Co-Pilot components: Co-Pilot Client: submits jobs to the Co-Pilot Job Manager. Co-Pilot Server, consisting of the Co-Pilot Job Manager (dispatches jobs to Co-Pilot Agents) and the Co-Pilot Storage Manager (uploads and registers output files and changes the job status, using the credentials). Co-Pilot Agent: runs the jobs on non-credentialed compute nodes.
13 Ingredients. CernVM: provides an ultralight image for different hypervisors; ATLAS software is distributed by CVMFS and cached locally. Co-Pilot: the Co-Pilot Agent is distributed with the CernVM image; Co-Pilot schedules jobs to CernVM virtual clusters.
14 Co-Pilot Storage Manager. How does the Co-Pilot SM (Storage Manager) work? (1) The SM receives a “JobDone” message from a Co-Pilot Agent (the JobID is included). (2) The SM calls Co-Pilot_Data_Mover, which extracts the metadata of the job output from the pilot log, uploads the files to the designated SE, and registers them in the designated LFC catalog. (3) The SM verifies the status of the file upload and registration. (4) The SM calls Co-Pilot_Job_Status_Updater, which updates the job status on the PanDA server (finished or failed). Both Co-Pilot_Data_Mover and Co-Pilot_Job_Status_Updater are Python scripts using libraries from the pilot source code.
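The Storage Manager sequence on this slide (receive "JobDone", move data, verify, update status) might look roughly like the following. This is a sketch, not the actual scripts: the function names only mirror the slide, the `output_file=` log line format is invented for the example, and plain dictionaries stand in for the temporary storage, the SE, the LFC, and the PanDA server.

```python
def data_mover(job_id, pilot_log, temp_files, storage_element, file_catalog):
    """Hypothetical stand-in for Co-Pilot_Data_Mover."""
    # Extract output file names from the pilot log (assumed line format).
    names = [
        line.split("=", 1)[1].strip()
        for line in pilot_log.splitlines()
        if line.startswith("output_file=")
    ]
    for name in names:
        if name in temp_files:
            storage_element[name] = temp_files[name]  # upload to the SE
            file_catalog[name] = job_id               # register in the LFC
    return names


def job_status_updater(job_id, ok, panda_status):
    """Hypothetical stand-in for Co-Pilot_Job_Status_Updater."""
    panda_status[job_id] = "finished" if ok else "failed"
    return panda_status[job_id]


def handle_job_done(message, pilot_log, temp_files,
                    storage_element, file_catalog, panda_status):
    """What the Storage Manager does on receiving a JobDone message."""
    job_id = message["job_id"]
    names = data_mover(job_id, pilot_log, temp_files,
                       storage_element, file_catalog)
    # Verify that every announced file was both uploaded and registered.
    ok = bool(names) and all(
        name in storage_element and file_catalog.get(name) == job_id
        for name in names
    )
    return job_status_updater(job_id, ok, panda_status)
```

Separating the verification from the data mover keeps the status update dependent on what actually landed in storage and in the catalog, rather than on what the log claims.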