Virtual machines ALICE

2 Experience and use cases
Services at CERN
Worker nodes at sites
– CNAF
– GSI
Site services (VoBoxes)
– GSI
– Subatech
– FZK

3 Services at CERN
3 hosts, 32 cores, 64GB RAM, 28 fast disks
– Ubuntu & VirtualBox
25 guests
– Build servers: (SLC5 32 & 64bit, Ubuntu) * (AliEn, AliROOT)
– Experimental build servers for development and testing
– 12 other service machines
Vanilla operating systems everywhere

4 (Software distribution)
SLC5 libc-compatible binaries
– Packaged with all dependencies
Largest set to install: 433MB
– 43MB : AliEn + system dependencies
– 270MB : AliROOT
– 120MB : ROOT
Delivered to the WNs either as
– Shared file system (NFS) - unpacked
– Torrent - tarballs
Same binaries available to both jobs and users
– Very easy to switch from local to PROOF to Grid
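The slides do not show the actual job wrapper; the following is only a minimal sketch of how a worker node could resolve the packages from either delivery channel. All paths and package names are illustrative assumptions, not the real deployment layout.

    #!/usr/bin/env python
    # Minimal sketch (not the actual AliEn wrapper): pick up the ALICE software
    # either from a shared NFS area (already unpacked) or from a torrent-delivered
    # tarball, then expose it to the job.
    import os
    import tarfile

    NFS_BASE = "/nfs/alice/packages"          # assumed shared-file-system mount point
    CACHE = "/tmp/alice-packages"             # assumed local unpack cache for torrents
    PACKAGES = ["AliEn", "ROOT", "AliROOT"]   # ~43MB + ~120MB + ~270MB = 433MB total

    def locate(package):
        """Return a directory with the unpacked package, preferring the shared FS."""
        nfs_dir = os.path.join(NFS_BASE, package)
        if os.path.isdir(nfs_dir):            # NFS case: binaries are already unpacked
            return nfs_dir
        local_dir = os.path.join(CACHE, package)
        if not os.path.isdir(local_dir):      # torrent case: unpack the fetched tarball
            tarball = os.path.join(CACHE, package + ".tar.gz")
            with tarfile.open(tarball) as tar:
                tar.extractall(CACHE)
        return local_dir

    if __name__ == "__main__":
        for pkg in PACKAGES:
            path = locate(pkg)
            # The same binaries serve local, PROOF and Grid use; only the
            # environment (PATH, LD_LIBRARY_PATH) changes between modes.
            os.environ["PATH"] = os.path.join(path, "bin") + ":" + os.environ.get("PATH", "")
            print(pkg, "->", path)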

5 Services at CERN
Excellent solution for compacting rack space (power, investment)
– Good environment for building, testing, prototyping
Applies only to non-demanding services
– Friendly enough to coexist with the rest of the guests on the same host
– Some hiccups
Unreliable network throughput
Time flow issues with SLC default kernels under VirtualBox

6 Services at CERN
A large number of concurrent VMs requires good-quality hardware
– The entire storage plane in two high-end servers had to be replaced to support the current number of guests
– The same load would be there for processes running on the physical host

7 Worker nodes at sites - CNAF
Francesco Noferini
Virtual machines used for WNs and services
– Not for storage, due to stability issues with GPFS
KVM: 1 VM / core, up to 16 VMs per machine
Happy with the network performance
– However, ping between the two VoBoxes: rtt min/avg/max/mdev = 9.166/11.290/15.837/1.360 ms
Policies define the OS flavor for each experiment (role) and the appropriate VM images are started
– Though recycling them when possible

8 Worker nodes at sites - issues
SpecINT conversion factor cannot be determined
$ cat /proc/cpuinfo
cpu family: 6
model: 6
model name: QEMU Virtual CPU version
stepping: 3
cpu Mhz:
cache size: 32 KB
(family 6, model 6 corresponds to a Pentium II Mendocino, 300 to 500 MHz)
Correct accounting not possible
Even if a conversion factor were published, could it be trusted? (CPU time in particular)
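A rough sketch of why the conversion factor cannot be derived on such a node: the guest only sees the generic "QEMU Virtual CPU" model, so any lookup keyed on the model name fails. The benchmark table below is invented for illustration, not a real database.

    # Illustrative sketch, not part of the accounting system.
    SPECINT_PER_CORE = {
        "Intel(R) Xeon(R) CPU E5520": 10.2,   # hypothetical value
        "Intel(R) Xeon(R) CPU L5420": 8.1,    # hypothetical value
    }

    def specint_factor(cpuinfo_path="/proc/cpuinfo"):
        model_name = None
        with open(cpuinfo_path) as cpuinfo:
            for line in cpuinfo:
                if line.lower().startswith("model name"):
                    model_name = line.split(":", 1)[1].strip()
                    break
        if model_name is None or "QEMU" in model_name:
            # The guest sees only the emulated CPU, so no meaningful factor exists.
            raise LookupError("cannot determine SpecINT factor for %r" % model_name)
        return SPECINT_PER_CORE[model_name]

    if __name__ == "__main__":
        try:
            print("SpecINT/core:", specint_factor())
        except LookupError as err:
            print("accounting problem:", err)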

9 Worker nodes at sites – issues
Fixed number of machines with fixed resources => inflexible limits on the jobs
– Jobs which could otherwise succeed are killed for overshooting their resource limits (even by a small fraction)
– Not possible to dynamically adjust the VM resources with regard to the available resources on the host
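To make the point concrete, a toy sketch of the two policies; the numbers are invented and only illustrate why a small overshoot kills a job that the host could easily have absorbed.

    # Sketch of the problem described above, not a recommendation.
    VM_MEMORY_LIMIT_MB = 2000       # fixed limit set when the VM was created
    HOST_FREE_MEMORY_MB = 12000     # what the hypervisor could actually spare

    def fixed_limit_policy(job_rss_mb):
        # The inflexible behaviour of a VM with static resources.
        return "kill" if job_rss_mb > VM_MEMORY_LIMIT_MB else "keep"

    def elastic_policy(job_rss_mb, slack_mb=HOST_FREE_MEMORY_MB):
        # What dynamic adjustment of VM resources would allow (not possible here).
        return "keep" if job_rss_mb <= VM_MEMORY_LIMIT_MB + slack_mb else "kill"

    print(fixed_limit_policy(2050))   # kill: overshoot by 2.5%, job lost
    print(elastic_policy(2050))       # keep: the host could have absorbed it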

10 Site services – GSI experience
Mykhaylo Zynovyev, Victor Penso
AliEn and gLite services have been running stably on VMs since 2006
Tools: Xen/KVM, OpenNebula, Chef, Lustre, Torque
Tests with ALICE analysis trains running on VMs have shown acceptable performance
– Trains are CPU bound
– Data is accessed from Lustre
– Performance overhead is within 10%

11 Site services – GSI plans
Provide users with infrastructure to submit and manage private virtual analysis clusters on demand
– To be used with PoD (http://pod.gsi.de/) for interactive processing
– To be used with a job scheduler for batch processing
Make use of the OCCI (Open Cloud Computing Interface) API for VM management tools
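For orientation, a rough sketch of what an OCCI compute-creation request looks like in the text/occi HTTP rendering that such VM management tools expose. The endpoint, credentials and attribute values are placeholders; the real deployment at GSI may differ.

    # Hypothetical OCCI call; endpoint and values are assumptions for illustration.
    import urllib.request

    ENDPOINT = "https://occi.example.gsi.de:3000/compute/"   # placeholder OCCI endpoint

    request = urllib.request.Request(ENDPOINT, data=b"", method="POST")
    request.add_header("Content-Type", "text/occi")
    request.add_header(
        "Category",
        'compute; scheme="http://schemas.ogf.org/occi/infrastructure#"; class="kind"',
    )
    request.add_header(
        "X-OCCI-Attribute",
        'occi.compute.cores=4, occi.compute.memory=8.0, occi.core.title="analysis-node-01"',
    )

    with urllib.request.urlopen(request) as response:
        # On success the server returns the Location of the newly created VM resource.
        print(response.headers.get("Location"))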

12 Site services - Subatech
Jean-Michel Barbet
2 VMware servers mounting the same SAN storage
– Allowing hot-migration of running machines, high availability even when upgrading hosts
10 guests (LCG & CREAM CEs, BDII, DPM head node, Quattor, PBS, AliEn VoBox)
– Disk servers and MySQL run only on physical hosts, due to poor performance on VMs

13 Site services - Subatech
Very easy to clone and test a new environment; snapshots serve as backup before every significant change
"Pick and choose" policy on what to run on VMs and what on physical hosts

14 Site services - FZK
Artem Trunov
Experimenting with the xrootd redirector
– KVM, 2 hosts, 2 guests, shared GPFS filesystem for the VM images
– Light service, so no problems expected

15 Bottom line
We like virtual machines a lot
– Vanilla virtualization is an excellent tool for building diverse (OS) build and test systems
– Careful selection of hardware is needed to avoid overloads
– The technology is used where applicable – not a 'universal solution to all problems'
Site services and WNs – transparency is a must
– We'd rather not know that something is running on a VM

16 Bottom line
Site services and WNs
– Multitude of adopted virtualization platforms (each with its positive and negative sides)
– Mastering storage from a VM is still an open issue, especially for data servers
– Generally accepted for services that are not I/O demanding; not for databases either
– The adoption of virtualization technology is very uneven and does not depend on the site size (T1 / T2)