Virtualised Worker Nodes Where are we? What next? Tony Cass GDB 2012 12/12/12.

CERN.ch
The HEPiX virtualisation working group
• The HEPiX virtualisation working group was formed to facilitate the instantiation of user-generated virtual machine images at HEPiX (and WLCG) sites.
• Users were expressing such a wish in 2008/9, but sites were worried about issues such as uncontrolled root access and the maintenance of the traceability logs required by Grid security policies.
(Comment: this, at least, is still an issue.)

Image endorsement
• The HEPiX VWG developed a policy that introduced the concept of image endorsers: people who would guarantee that generated images could be used safely at sites.
• Amongst other things, such images would
  – have no embedded user credentials, and
  – allow sites to contextualise the images to enable the required logging and make other necessary customisations.
    » Sites agree, however, not to modify the software environment of the image.
• Sites are free to trust (or not) specific image endorsers but, if they do trust someone in this role, it is expected that any images endorsed by this person can be used at that site without the need for inspection or manual approval.
• The HEPiX VWG policy became the basis of an approved JSPG policy document, "Policy Trusted Virtual Machines".
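The endorsement model splits responsibility: the endorser checks an image's properties once, and a site only decides whom it trusts. A minimal Python sketch of that split follows; the `Image` and `EndorsedImage` types, their fields and the endorser names are hypothetical illustrations, not part of the HEPiX or JSPG documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """Hypothetical description of a user-generated VM image."""
    name: str
    embedded_credentials: bool  # policy: must be False
    contextualisable: bool      # policy: site can add logging/customisations


@dataclass(frozen=True)
class EndorsedImage:
    image: Image
    endorser: str  # identity of the person vouching for the image


def endorse(image: Image, endorser: str) -> EndorsedImage:
    """Endorser side: refuse images that break the policy conditions."""
    if image.embedded_credentials:
        raise ValueError(f"{image.name}: contains embedded user credentials")
    if not image.contextualisable:
        raise ValueError(f"{image.name}: sites cannot contextualise it")
    return EndorsedImage(image, endorser)


def site_accepts(endorsed: EndorsedImage, trusted_endorsers: set) -> bool:
    """Site side: trust is per endorser, so no per-image inspection happens."""
    return endorsed.endorser in trusted_endorsers


if __name__ == "__main__":
    img = Image("analysis-sl6", embedded_credentials=False, contextualisable=True)
    endorsed = endorse(img, "alice")
    print(site_accepts(endorsed, {"alice", "bob"}))  # True
    print(site_accepts(endorsed, {"bob"}))           # False
```

Note that `site_accepts` deliberately looks only at the endorser: this mirrors the expectation that images from a trusted endorser need no manual approval.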

Current Status
• The endorsement policy is agreed.
• Technical arrangements have been defined for
  – image contextualisation
    » these are compatible with EC2/OpenNebula/OpenStack
  – exchange of information between the site infrastructure and a running virtual machine
    » e.g. remaining lifetime, that the virtual machine can be terminated, …
• A framework for image endorsers to publish and distribute images has been developed.
  – This has been integrated with StratusLab's marketplace at LAL and is being integrated with OpenStack Glance at CERN.
• CernVM images are compatible with the HEPiX VWG policies
  – and there has been a security review of the underlying technology.
Many thanks to Owen, Michel, Belmiro & Ulrich.
(Comment: the HEPiX VWG model (and software) has been endorsed by the EGI federated cloud task force.)
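One way a pilot inside the VM could learn its remaining lifetime is to read a value the site publishes into the guest. The sketch below assumes a layout modelled on the HEPiX machine-features convention: a directory named by the `MACHINEFEATURES` environment variable containing a `shutdowntime` file that holds a Unix timestamp. Treat those names as assumptions about the interface, not a statement of what the working group specified.

```python
import os
import time
from pathlib import Path
from typing import Optional


def remaining_lifetime(features_dir: Optional[str] = None,
                       now: Optional[float] = None) -> Optional[float]:
    """Return seconds until the site plans to stop this VM, or None if unknown.

    Assumed interface: <features_dir>/shutdowntime holds a Unix timestamp;
    features_dir defaults to the MACHINEFEATURES environment variable.
    """
    directory = features_dir or os.environ.get("MACHINEFEATURES")
    if not directory:
        return None  # site publishes nothing: lifetime unknown
    try:
        shutdown_at = float((Path(directory) / "shutdowntime").read_text().strip())
    except (OSError, ValueError):
        return None  # missing or malformed file
    current = time.time() if now is None else now
    return max(0.0, shutdown_at - current)


if __name__ == "__main__":
    left = remaining_lifetime()
    print("unknown" if left is None else f"{left:.0f} s left")
```

Returning `None` rather than guessing lets the caller pick a conservative default, for example accepting only short payloads.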

Job done then. What now?

CERN IT Department CH-1211 Genève 23 Switzerland
How this could be used
[Diagram: Central Task Queue; Sites A, B and C; Shared Image Repository (VMIC); User; VO service; instance requests; commercial cloud; payload pull; image maintainer; cloud bursting. Slide courtesy of Ulrich Schwickerath.]
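The diagram describes a pull model: VMs started at sites, or burst to a commercial cloud, do not have jobs pushed to them but pull payloads from a central task queue. Below is a minimal in-memory sketch of that pattern, assuming each payload carries an estimated run time and each VM knows its remaining lifetime. `Payload`, `CentralTaskQueue` and `run_vm` are hypothetical names; a real VO service would be a networked, authenticated component.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Payload:
    job_id: str
    expected_seconds: float  # assumed to be declared by the user/VO


@dataclass
class CentralTaskQueue:
    pending: Deque[Payload] = field(default_factory=deque)

    def submit(self, payload: Payload) -> None:
        self.pending.append(payload)

    def pull(self, remaining_seconds: float) -> Optional[Payload]:
        """Hand out the oldest payload that fits in the caller's remaining lifetime."""
        for index, payload in enumerate(self.pending):
            if payload.expected_seconds <= remaining_seconds:
                del self.pending[index]  # safe: we return immediately
                return payload
        return None


def run_vm(queue: CentralTaskQueue, lifetime_seconds: float) -> List[str]:
    """Simulate one VM pulling payloads until nothing else fits."""
    done = []
    remaining = lifetime_seconds
    while (payload := queue.pull(remaining)) is not None:
        remaining -= payload.expected_seconds  # pretend the payload ran
        done.append(payload.job_id)
    return done


if __name__ == "__main__":
    q = CentralTaskQueue()
    for job_id, secs in [("a", 3600), ("b", 7200), ("c", 1800)]:
        q.submit(Payload(job_id, secs))
    print(run_vm(q, 6000))                # ['a', 'c']
    print([p.job_id for p in q.pending])  # ['b']
```

Because the VM pulls work, the same image runs unchanged whether it was started at Site A or in a commercial cloud; only the lifetime it reports differs.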

A Vision for Virtualisation in WLCG, Tony Cass, WLCG GDB, 9/9/9

Goals
• Enable experiments/users to choose the environment for job execution.
• Ensure sites have control/traceability over resource usage.

Approach
• Step-by-step: build on
  – established successes
  – established trust
• But keep the end goal in view.
• Prepare for this now with
  – technical agreements/developments
  – user behaviour (especially explicit statement of resource requirements)

Approach
• Five steps
• Steps 1-3
  – realistic
  – relatively uncontroversial(?)
  – achievable by end-2010?
• Steps 4 & 5
  – kite-flying
  – probably controversial
  – interesting

Step 1
• Users can choose between virtual images created at sites.
• Not really any different from now; could be rephrased as "sites provide virtual machines for job execution, not real hardware".
• Key issue is (full) understanding of resource requirements
  – OS type, memory, (range of) #cores, ...
(Comment: Not done. Sites may be using virtual machines but this is transparent to users. And I'm not sure we're any nearer a negotiation on core needs. Let's just forget this step now.)
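An explicit statement of resource requirements is what would let a site match a job to a VM. Here is a sketch, assuming requirements are expressed as an OS type, a memory floor and a core range, and that a site offers a fixed set of VM "flavours"; all names and numbers are illustrative, not from any site's actual configuration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Requirements:
    os_type: str    # e.g. "SL6"
    memory_mb: int  # minimum memory
    min_cores: int  # a core *range* leaves the site room to choose
    max_cores: int


@dataclass(frozen=True)
class Flavour:
    name: str
    os_type: str
    memory_mb: int
    cores: int


def choose_flavour(req: Requirements,
                   flavours: Iterable[Flavour]) -> Optional[Flavour]:
    """Return the smallest matching flavour (fewest cores, then least memory)."""
    matches = [f for f in flavours
               if f.os_type == req.os_type
               and f.memory_mb >= req.memory_mb
               and req.min_cores <= f.cores <= req.max_cores]
    return min(matches, key=lambda f: (f.cores, f.memory_mb), default=None)


if __name__ == "__main__":
    offered = [Flavour("tiny", "SL6", 1024, 1),
               Flavour("medium", "SL6", 4096, 2),
               Flavour("large", "SL6", 16384, 8),
               Flavour("legacy", "SL5", 2048, 1)]
    job = Requirements("SL6", memory_mb=2000, min_cores=1, max_cores=4)
    print(choose_flavour(job, offered).name)  # medium
```

Picking the smallest match is one possible site policy; a site could equally prefer whatever flavour currently has free capacity, which is exactly why a core range is more useful than a single number.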

Step 2
• Distribution of virtual machine images between sites (or from CERN...).
  – Image limited to minimalist operating system (SL4/5/6...)
• Requires
  – transparent process for image generation guaranteeing content
  – mechanism for sites to hook into local monitoring and batch scheduling
  – trusted and verifiable method of image distribution
(Comment: HEPiX. Not done, but could be.)
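For a "trusted and verifiable method of image distribution", one common building block is to compare a downloaded image against a digest published in a list whose authenticity is established separately (for instance, a list signed by the endorser). The sketch below shows only the digest check; the choice of SHA-512 and the function names are assumptions, and verifying the list's own signature is not covered.

```python
import hashlib
import hmac
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte images need not fit in memory."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(path: Path, expected_sha512: str) -> bool:
    """True if the downloaded image matches the digest from a trusted list."""
    return hmac.compare_digest(sha512_of(path), expected_sha512.strip().lower())
```

A site would run `verify_image` after download and before registering the image with its local cloud or batch system, discarding the file on a mismatch.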

Step 3
• Distributed virtual image includes experiment software environment
  – so users can choose ATLAS version X on OS Y.
• Requires "transparent process for image generation" to be extended to include experiment software.
  – Snapshot of experiment build servers at CERN?
• Removes need for pilot jobs to verify (or create) correct environment.
(Comment: CVMFS delivers this.)

What about CernVM?
• Instantiation of CernVM machines is being discussed between IT and PH teams; could be an option at CERN.
• But the scalability and verifiability of CernVM distribution for widespread use as a remote batch image are far from evident.
  – Not excluded, but more likely after successful experience with static images.
(Comment: We took too long (not) testing static images! This works…)

Step 4
• Distributed virtual image includes a client to connect directly to the experiment pilot job framework (Dirac, PanDA).
• Initially with virtual machine images instantiated according to jobs arriving at sites.
• Later, sites instantiate virtual machines according to observed load and local policy
  – Lots of busy ATLAS machines? Start more...
• Requires some way for pilot job frameworks to know the (remaining) lifetime of the virtual machine.
  – VM unlikely to be updated (security patches...), so lifetime will be limited.
(Comment: client from cvmfs. Let's work on this together now.)
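The "lots of busy ATLAS machines? Start more..." idea can be stated as a small site-side policy function. A sketch, assuming the site tracks per-VO counts of running and busy VMs and caps each VO with a local share; the threshold, step size and names are hypothetical tuning choices, not an agreed WLCG mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoState:
    running: int    # VMs currently alive for this VO
    busy: int       # of those, VMs currently executing a payload
    max_share: int  # local policy: cap on concurrent VMs for this VO


def vms_to_start(state: VoState, busy_threshold: float = 0.9,
                 step: int = 5, probe: int = 1) -> int:
    """Number of new VMs to start for one VO under a simple load-driven policy."""
    headroom = max(0, state.max_share - state.running)
    if state.running == 0:
        # No running VMs means no evidence of demand yet: start a small probe.
        return min(probe, headroom)
    if state.busy / state.running >= busy_threshold:
        return min(step, headroom)  # nearly all busy: grow, within the cap
    return 0  # idle capacity exists: let it absorb new payloads first


if __name__ == "__main__":
    print(vms_to_start(VoState(running=20, busy=19, max_share=22)))   # 2
    print(vms_to_start(VoState(running=20, busy=10, max_share=100)))  # 0
```

Because the VMs are not patched, a real policy would also need to retire them at the end of their limited lifetime; that side is not shown here.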

Step 4 issues
• Moving credentials into VM images
• What role for pilot factories?
• Can we avoid queues of virtual machine instantiation requests at sites?
• How to streamline (minimise…) communications between sites and experiments?
• ……

Let the discussion begin!

Step 5
• Experiment pilot job frameworks replaced by commercial/public domain schedulers.
  – Virtual LSF cluster for ATLAS
  – Virtual SGE cluster for CMS
  – ...
(Comment: SLURM today?)