1 Bridging Clouds with CernVM: ATLAS/PanDA example Wenjing Wu 2010-8-27.

Presentation transcript:

1 Bridging Clouds with CernVM: ATLAS/PanDA example Wenjing Wu

2 Outline
ATLAS computing model (PanDA)
Extending the ATLAS computing model to use Cloud computing resources
Challenges
Solution
Work done

3 PanDA - the Production and Distributed Analysis system for the ATLAS Experiment
1. Users submit jobs to the PanDA server
2. Pilots are submitted to worker nodes
3. The pilot checks the environment and fetches jobs from the PanDA server
4. After the job is done, the pilot uploads the output files to the Storage Element and registers them in the Logical File Catalog
5. The pilot updates the job status on the PanDA server
6. The PanDA server manages the final data transfer
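The pilot lifecycle on slide 3 can be sketched as a small loop. This is a toy model of the flow only; the class and method names (FakePanDAServer, run_pilot, and so on) are illustrative stand-ins, not the real PanDA pilot API.

```python
# Toy sketch of the pilot lifecycle: fetch a job, run it, store and
# register the output, then report the final status back to the server.
# All names here are hypothetical stand-ins for the real PanDA components.

class FakePanDAServer:
    """In-memory stand-in for the PanDA server."""
    def __init__(self, jobs):
        self.jobs = list(jobs)    # pending (job_id, payload) pairs
        self.status = {}          # job_id -> last reported status

    def fetch_job(self):          # step 3: the pilot fetches a job
        return self.jobs.pop(0) if self.jobs else None

    def update_status(self, job_id, state):   # step 5: the pilot reports back
        self.status[job_id] = state


class FakeStorage:
    """Stand-in for the Storage Element plus Logical File Catalog pair."""
    def __init__(self):
        self.files = []

    def upload_and_register(self, output):    # step 4: upload and register
        self.files.append(output)


def run_pilot(server, storage):
    """One pilot iteration: fetch a job, run it, store output, report status."""
    job = server.fetch_job()
    if job is None:
        return "no-job"
    job_id, payload = job
    output = payload()                         # run the payload
    storage.upload_and_register((job_id, output))
    server.update_status(job_id, "finished")
    return "finished"
```

Running one iteration against the stubs shows the whole chain: the job is executed, its output lands in storage, and the server sees the final status.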

4 Extending the ATLAS computing model to use Cloud Computing resources
What are Clouds (in today's common terms)?
Virtualized computing resources provided by academic and commercial institutions (e.g. CERN lxcloud, Amazon EC2)
Resources provided by users participating in volunteer computing projects (e.g. BOINC)
The goal: run ATLAS production jobs on Cloud Computing resources.

5 Challenges!
Transparency: users and production operators should not notice the difference
The whole set of Cloud resources should appear to the PanDA server as just another Grid site
Credentials (which are essential for the functioning of the PanDA pilot) cannot be brought into the 'untrusted' environment (e.g. onto the machines of volunteers)

6 Solving the challenge using CernVM
CernVM:
Provides a lightweight virtual machine image containing the applications of the LHC experiments
The application software is distributed through an HTTP-based content delivery network and is cached locally
Provides Co-Pilot: a framework for the delivery and execution of workloads on remote virtual machines
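The "distributed over HTTP and cached locally" idea behind CVMFS-style software delivery can be illustrated with a tiny cache model. This is only a sketch of the caching principle, not the CVMFS client; the names and behaviour are assumptions for illustration.

```python
# Toy model of CVMFS-style delivery: a file is fetched over the network
# once, then all later accesses are served from the local cache.
# CachingRepository and its methods are illustrative, not the CVMFS API.

class CachingRepository:
    def __init__(self, fetch):
        self.fetch = fetch        # function path -> content (the "HTTP" side)
        self.cache = {}           # local cache: path -> content
        self.remote_hits = 0      # how often we had to go to the network

    def open(self, path):
        if path not in self.cache:
            self.cache[path] = self.fetch(path)   # first access: download
            self.remote_hits += 1
        return self.cache[path]                   # later accesses: local cache
```

Repeated access to the same software file costs only one network fetch, which is why a lightweight VM image plus an HTTP-cached repository is enough to run the full experiment software.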

7 CernVM Co-Pilot Integration
Cloud resources are provided through VMs running the Co-Pilot Agent.
1. The user submits a PanDA job
2. The Co-Pilot Client submits a Co-Pilot job to the Co-Pilot Job Manager
3. The Agent gets a Co-Pilot job, which launches the PanDA pilot
4. The pilot fetches the PanDA job and runs it
5. The Agent uploads the output to temporary storage after the job finishes
6. The Co-Pilot Storage Manager uploads the output files to the Storage Element and registers them in the Logical File Catalog
7. The Storage Manager updates the final job status on the PanDA server
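The untrusted half of the flow above (steps 2-5) can be sketched as a queue between the trusted Job Manager and an Agent that holds no credentials. The class names mirror the slide's components, but their bodies are hypothetical simplifications.

```python
# Toy sketch of the Co-Pilot hand-off: the Job Manager (trusted side)
# queues jobs; the Agent (untrusted side) runs them and only stages output
# to temporary storage, signalling "JobDone" instead of touching the Grid.

from collections import deque

class CoPilotJobManager:
    """Trusted side: holds Co-Pilot jobs submitted by the client."""
    def __init__(self):
        self.queue = deque()

    def submit(self, job):            # step 2: a Co-Pilot job is submitted
        self.queue.append(job)

    def next_job(self):               # step 3: an agent asks for work
        return self.queue.popleft() if self.queue else None


class CoPilotAgent:
    """Untrusted side: runs jobs without holding any Grid credentials."""
    def __init__(self, manager, temp_storage):
        self.manager = manager
        self.temp_storage = temp_storage

    def work_once(self):
        job = self.manager.next_job()
        if job is None:
            return None
        result = job()                      # step 4: run the PanDA payload
        self.temp_storage.append(result)    # step 5: stage output temporarily
        return "JobDone"                    # hand off to the Storage Manager
```

The key design point is visible in the sketch: the Agent never uploads to the SE or talks to PanDA directly, so no credential ever crosses into the untrusted environment.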

8 Work Done (1): setup of the CERNVM site (part of the ATLAS Grid infrastructure)
It is a dynamic virtual cluster formed by virtual machines running CernVM Co-Pilot Agents
It is configured according to ATLAS computing conventions
It appears to the ATLAS Grid central services as a Tier 2 site

9 Work Done (2)
Adaptation of the PanDA Pilot:
Added support for the heterogeneous structure of the software repository
Added support for saving job output metadata and job status files
Development of the Co-Pilot Storage Manager:
A component running in the trusted environment and acting as a proxy between Co-Pilot agents and the PanDA Grid services

10

11 Thanks!

12 Solving the challenge using CernVM
CernVM Co-Pilot helps run ATLAS PanDA jobs in a non-credentialed computing environment.
CernVM Co-Pilot components:
Co-Pilot Client: submits jobs to the Co-Pilot Job Manager
Co-Pilot Server:
Co-Pilot Job Manager: dispatches jobs to Co-Pilot Agents
Co-Pilot Storage Manager: uploads and registers output files and changes the job status using credentials
Co-Pilot Agent: runs the jobs on non-credentialed compute nodes

13 Ingredients
CernVM:
Provides an ultralight image for different hypervisors
ATLAS software is distributed by CVMFS and cached locally
Co-Pilot:
The Co-Pilot Agent is distributed with the CernVM image
Schedules jobs to CernVM virtual clusters

14 Co-Pilot Storage Manager
How the Co-Pilot SM (Storage Manager) works:
It receives a "JobDone" message from the Co-Pilot agent (the JobID is included)
The SM calls Co-Pilot_Data_Mover, which extracts the metadata of the job output from the pilot log, uploads the files to the designated SE, and registers them in the designated LFC catalog
The SM verifies the status of the file upload and registration
The SM calls Co-Pilot_Job_Status_Updater, which updates the job status on the PanDA server (finished or failed)
Both Co-Pilot_Data_Mover and Co-Pilot_Job_Status_Updater are Python scripts using libraries from the pilot source code
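The Storage Manager steps above can be sketched end to end. The helper name data_mover echoes Co-Pilot_Data_Mover from the slide, but its body (and the "OUT:" log convention) is an illustrative assumption, not the real pilot log format.

```python
# Toy sketch of the Storage Manager's "JobDone" handling: extract output
# metadata from the pilot log, upload/register the files, verify, and
# report the final status. Log format and helper names are hypothetical.

def data_mover(pilot_log):
    """Extract output file names from the pilot log (here: lines tagged 'OUT: ')."""
    prefix = "OUT: "
    return [line[len(prefix):] for line in pilot_log if line.startswith(prefix)]

def handle_job_done(job_id, pilot_log, storage, catalog, panda_status):
    """Process a 'JobDone' message: move data, verify, update the PanDA status."""
    files = data_mover(pilot_log)
    for f in files:
        storage.add(f)            # upload to the designated SE
        catalog.add(f)            # register in the designated LFC
    # verify upload and registration before reporting success
    ok = all(f in storage and f in catalog for f in files)
    panda_status[job_id] = "finished" if ok else "failed"
    return panda_status[job_id]
```

Only after the verification step succeeds does the job get marked "finished" on the PanDA side, which is what lets the untrusted Agent stay out of the credentialed upload path entirely.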