HTCondor Private Cloud Integration
Andrew Lahiff, STFC Rutherford Appleton Laboratory
European HTCondor Site Admins Meeting 2014

Clouds & batch systems Some sites might be (or be thinking about) –Moving some/all services to an internal private cloud –Setting up a private cloud in parallel to their batch system –Have access to a private cloud Problem: partitioned resources –Worker nodes (batch system) –Hypervisors (cloud) Likely to be a common situation at sites providing both grid & cloud resources

Clouds & batch systems Ideal situation: completely dynamic –If batch system busy but cloud not busy Expand batch system into the cloud –If cloud busy but batch system not busy Expand size of cloud, reduce amount of batch system resources cloudbatch cloudbatch

Clouds & batch systems Could just manually create virtual WNs –But it would be better to dynamically create them as they are needed Need some method of creating virtual WNs –Could use existing products glideinWMS Cloud Scheduler –Write your own A few examples on github –Cloud autoscaling service –Try to use existing HTCondor functionality –…

Using HTCondor
Advertise appropriate offline ClassAd(s) to the collector
– The hostname used is a random string
– In our use case these represent types of VMs rather than specific machines, e.g. for VO-specific VMs, have an offline ClassAd for each type of VM
condor_rooster
– Enable this daemon
– Configure it to run an appropriate command to instantiate a VM (see the sketch below)
  – The HTCondor pool password is inserted into the VM
  – Volatile disks are created on the hypervisor's local disk for the job scratch area
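
As a concrete illustration, a minimal sketch of both pieces follows; the ad attributes, resource values and the instantiate-vm script are illustrative assumptions, not the exact RAL setup. An offline machine ad is an ordinary startd ClassAd with Offline = True, pushed to the collector with condor_advertise:

    MyType = "Machine"
    Name = "atlas-vm-f3a9c1"
    Machine = "atlas-vm-f3a9c1"
    State = "Unclaimed"
    Activity = "Idle"
    Offline = True
    Unhibernate = MachineLastMatchTime =!= undefined
    Cpus = 8
    Memory = 16384
    OpSys = "LINUX"
    Arch = "X86_64"

    $ condor_advertise UPDATE_STARTD_AD offline-atlas-vm.ad

Here "atlas-vm-f3a9c1" is the random-string hostname, and one such ad would be advertised per VM type. On the central manager, condor_rooster is then enabled and pointed at a script that instantiates a VM instead of waking a physical machine (the matched offline ad is written to the command's standard input, so the script can tell which VM type to start):

    # Enable condor_rooster on the central manager
    DAEMON_LIST = $(DAEMON_LIST) ROOSTER
    # How often to check for matched offline ads (seconds)
    ROOSTER_INTERVAL = 60
    # Which offline ads may be "woken" (the documented default)
    ROOSTER_UNHIBERNATE = Offline && Unhibernate
    # Hypothetical script that asks the cloud to start a VM of the
    # matched type, rather than waking a physical machine
    ROOSTER_WAKEUP_CMD = "/usr/local/bin/instantiate-vm"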

Using HTCondor
When there are idle jobs:
– The negotiator can match jobs to the offline ClassAd
  – Configured so that online machines are preferred to offline ones (see the sketch below)
– The condor_rooster daemon notices this match and instantiates a VM
  – The image used is set up as a worker node, with HTCondor installed
– The VM starts up & joins the HTCondor pool
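
The slides don't show how the "prefer online machines" policy is expressed; one plausible way (a sketch, not necessarily RAL's configuration) is via the negotiator's pre-job rank, which is consulted before the job's own Rank:

    # Rank real (online) slots above offline ads, so an offline ad is
    # only matched when no online slot is available for the job
    NEGOTIATOR_PRE_JOB_RANK = 1000000 * (Offline =!= True)

With this, a match against an offline ad happens only as a last resort, which is exactly the event condor_rooster watches for.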

Provisioning worker nodes
[Diagram: the central manager runs the condor_collector, condor_negotiator and condor_rooster; the ARC/CREAM CEs run condor_schedds; physical worker nodes and virtual worker nodes each run a condor_startd]

VM lifetime
Using short-lived VMs
– Only accept jobs for a limited time period before shutting down
HTCondor on the worker node controls everything (see the sketch below)
– START expression
  – New jobs are allowed to start only for a limited time period after the VM was instantiated
  – New jobs are allowed to start only if the VM is healthy (startd cron healthcheck)
– HIBERNATE expression
  – The VM is shut down after the machine has been idle for too long
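
A minimal sketch of this policy in the worker-node VM's configuration; the healthcheck script, the NODE_IS_HEALTHY attribute and the time limits are illustrative assumptions:

    # Startd cron: periodically run a (hypothetical) healthcheck script
    # that prints an attribute such as "NODE_IS_HEALTHY = True" into
    # the machine ad
    STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) HEALTHCHECK
    STARTD_CRON_HEALTHCHECK_EXECUTABLE = /usr/local/bin/healthcheck
    STARTD_CRON_HEALTHCHECK_PERIOD = 300
    STARTD_CRON_HEALTHCHECK_MODE = Periodic

    # Start new jobs only during the first 4 hours of the startd's life
    # (a proxy for time since the VM was instantiated), and only while
    # the VM is reported healthy
    START = ((time() - DaemonStartTime) < (4 * 3600)) && (NODE_IS_HEALTHY =?= True)

    # Power the machine off once it has been idle for 30 minutes;
    # HIBERNATE is only evaluated when HIBERNATE_CHECK_INTERVAL > 0
    HIBERNATE_CHECK_INTERVAL = 300
    HIBERNATE = ifThenElse((Activity == "Idle") && \
                           ((time() - EnteredCurrentActivity) > 1800), \
                           "SHUTDOWN", "NONE")

For a cloud VM, "SHUTDOWN" means the instance powers itself off, after which the cloud layer can remove it.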