ELASTIC LSF EXTENSION AT CNAF

Credits: Vincenzo Ciaschini, Stefano Dal Pra, Andrea Chierici, Vladimir Sapunenko, Tommaso Boccali

Idea
- Try to extend CNAF's LSF batch system to opportunistic and commercial resources.
- Opportunistic: Tier3 Bologna, with OpenStack resources outside CNAF's firewall.
- Commercial: tests with Aruba (the second largest Italian cloud provider).
- This talk covers only the commercial case.

Aruba
- ~20 MW installed in Italy; O(50) MW installed in France, Germany, UK, CZ.
- Not in the number-crunching business: mostly a DB/web/etc. provider for big clients.
- Centers are large in terms of CPU installed, quite modest in storage (at or below a CMS Tier-1).
- Uses VMware vSphere as the virtualization engine.
- We are working with the center in Arezzo (~150 km from CNAF), which is O(6 MW) and connected to commercial network providers via 4x20 Gbit/s links.

Key points
- We do not have to worry about job lifetime: vSphere can reduce the "virtual clock" of a machine to O(100 MHz) (the sketch after this list illustrates how such a limit is expressed through the vSphere API).
- No job dies; jobs just become very slow and no sockets get closed.
- RAM and local VM disk are not a problem: the host machines are very well provisioned.
- At the moment there is no fee for networking; I can only guess this holds up to the point where we disturb the real activities.
- CPU usage by "real customers" is O(10%); we try to use the rest and accept being clocked down whenever a real customer needs CPU cycles.
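For illustration only: the throttling above is applied on the provider's side, but the same kind of per-VM CPU cap can be expressed through the standard vSphere API. The sketch below uses pyVmomi with a hypothetical vCenter endpoint, credentials and VM name; it is not part of our setup, just a way to show what "reducing the virtual clock to O(100 MHz)" amounts to.

```python
# pyVmomi sketch (hypothetical host/credentials/VM name): set a hard CPU
# limit of ~100 MHz on a worker VM, so that running jobs slow down instead
# of being killed.  In our case this throttling is done by the provider,
# not by us; the code only illustrates the mechanism.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def set_cpu_limit_mhz(vm, limit_mhz):
    """Reconfigure a VM with a hard CPU limit expressed in MHz."""
    alloc = vim.ResourceAllocationInfo()
    alloc.limit = limit_mhz           # -1 would mean "unlimited"
    spec = vim.vm.ConfigSpec(cpuAllocation=alloc)
    return vm.ReconfigVM_Task(spec=spec)

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.local", user="admin",
                  pwd="secret", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        if vm.name == "elastic-wn-001":   # hypothetical worker VM name
            set_cpu_limit_mhz(vm, 100)    # throttle to O(100 MHz)
finally:
    Disconnect(si)
```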

Elastic expansion of CNAF
- Each machine gets a private IP on Aruba's network.
- We set up a "tun" kernel tunnel towards CNAF's CEs and ARGUS; at CNAF there is the receiver of this tunnel, which also assigns tunnel IPs to the Aruba machines.
- We host in Aruba:
  - a squid for Frontier/CVMFS caching;
  - a GPFS AFM client: a special GPFS client which tolerates high-latency connections and caches the results (read-only), used to serve the LSF environment.
- Each WN uses the same image used at CNAF (extracted from Quattor), with a special configuration pointing at the services above:
  - CVMFS configuration;
  - a special SITECONF for CMS.
  (See the contextualization sketch after this list.)
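As a concrete sketch of the WN contextualization step mentioned above (all addresses, repository lists and quota values here are hypothetical placeholders, not our production configuration): the essential part is pointing CVMFS, and therefore also Frontier through the same squid, at the Aruba-local cache and reloading the client.

```python
#!/usr/bin/env python
# Hypothetical contextualization snippet for an Aruba WN: point CVMFS at
# the site-local squid and reload the client.  The squid address, the
# repository list and the cache quota are placeholders.
import subprocess

LOCAL_SQUID = "http://10.0.0.10:3128"   # Aruba-side squid (hypothetical IP)
CVMFS_LOCAL = "/etc/cvmfs/default.local"

def write_cvmfs_config():
    lines = [
        'CVMFS_REPOSITORIES="cms.cern.ch,grid.cern.ch"',
        'CVMFS_HTTP_PROXY="{0}"'.format(LOCAL_SQUID),
        'CVMFS_QUOTA_LIMIT=20000',       # MB of cache on the local VM disk
    ]
    with open(CVMFS_LOCAL, "w") as f:
        f.write("\n".join(lines) + "\n")

if __name__ == "__main__":
    write_cvmfs_config()
    # cvmfs_config is the standard CVMFS client helper script
    subprocess.check_call(["cvmfs_config", "reload"])
```

The CMS SITECONF change is analogous: the Frontier proxy list in the site-local configuration is made to point at the same local squid.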

[Architecture diagram: worker VMs at Aruba (running pilot/glideinWMS/Condor/cmsRun, with the local SQUID and AFM services) connected through the tunnel to CNAF (CE, LSF master, ARGUS, conditions via Frontier, CVMFS); data is read via Xrootd and stage-out goes via SRM to the rest of the world.]

So …
- It works: we have finished testing all the pieces and now we would like to ramp it up.
- For the moment everything is "free"; the economic model is still to be understood/discussed.
- What simplifies the picture a lot:
  - no need to kill/restart machines: slowing down for seconds seems quite acceptable for our jobs;
  - we are not paying for the network (that is their model for all customers, not just in this phase), although it is still unclear which fraction of the 80 Gbit/s we could try to use.
- If all the rest makes sense, we could bring a private GARR link there (just speculation at the moment).
- They offered the possibility to host machines at their site ($=?); we are thinking of placing an O(200 TB) xrootd caching proxy there if the other tests go well.
- About the AWS test: what we did seems easily replicable on AWS (indeed, it is already working on OpenStack), and we would like to have the chance to test it there too.