GlideinWMS - Parag Mhashilkar, Department Meeting, August 07, 2013


Contents
- Evolution of Computing Grids
- Why GlideinWMS?
- GlideinWMS Architecture
- GlideinWMS & Cloud
- Demo: From a new VM to running a job successfully

Evolution of Computing Grids
No Computers → Primitive Computers → Super Computers/Mainframes → Batch Computing → Personal Computers → Local Computing Clusters → Computing Grids → ...

Why GlideinWMS?

Local Computing Facilities (accessible, but LIMITED RESOURCES)
- Familiar setup & interface
- Homogeneous resources (Condor batch system)
- In case of problems, assistance is easily accessible
- Local clusters may be limited & busy when you need them

Grid Computing Facilities (WILD WILD WEST, but virtually infinite resources)
- Different administrative boundaries
- Heterogeneous resources
- Some sites are maintained better than others
- Large number of opportunistic computing cycles available for use

Computing Clouds (similar to grid sites; well maintained, but NOT FREE)

GlideinWMS: a pilot-based WMS that creates on demand a dynamically sized overlay Condor batch system on grid & cloud resources, to address the complex needs of VOs in running application workflows.

GlideinWMS Architecture

Components
- Glidein Factory & WMS Pool
- VO Frontend
- Condor Central Manager & Scheduler

GlideinWMS in Action
- User submits a job
- VO Frontend periodically queries the Condor pool and requests the Factory to submit glideins
- Factory looks up the requests and submits glideins to the WMS Pool
- Glidein starts running on a worker node at a grid site
- Glidein performs the required validation and, on success, starts a Condor startd
- Condor startd reports to the collector
- Job runs on this resource as any other Condor batch job
- On job completion, the glidein exits and relinquishes the worker node

[Architecture diagram: VO Infrastructure (Condor Scheduler holding the job, VO Frontend, Condor Central Manager), the Glidein Factory & WMS Pool, and a grid-site worker node running a glidein with a Condor startd; together they form an on-demand, dynamically sized overlay Condor batch system]
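To make the first step concrete: from the user's side, submitting into the overlay looks exactly like submitting to any local Condor pool. A minimal sketch of an HTCondor submit description follows; the file names are hypothetical, not taken from the slides.

  # hello.sub: a minimal vanilla-universe job (all file names hypothetical)
  universe   = vanilla
  executable = hello.sh
  output     = hello.out
  error      = hello.err
  log        = hello.log
  queue

The user runs "condor_submit hello.sub"; the job simply idles in the queue until a glidein-provided startd joins the pool and the central manager matches the job to it.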

GlideinWMS: Grid Sites vs. Clouds

Grid Site
1. Factory tells the glidein how to connect back to the user pool via the Condor JDF
2. Glidein: a Condor-G job that runs on a worker node
3. Glidein shuts down after its predetermined max lifetime, or after being inactive for a while
4. The grid site admin manages the worker node, i.e. the installed software stack & root access to the worker node

Cloud (EC2 interface, supported by Condor v8+)
1. Factory tells the glidein how to connect back to the user pool via the Condor JDF
2. Glidein: a service that runs in a VM launched by Condor as a job using EC2; requires the glidein RPMs to be installed on the VM image
3. Glidein shuts down after its predetermined max lifetime, or after being inactive for a while; glidein shutdown triggers VM shutdown
4. The cloud provider facilitates hosting the VM image provided by the VO. The VO manages the image, i.e. the installed software stack & root access to the worker node. Based on policy, the cloud provider can also be the VM maintainer/administrator

[Architecture diagram: as on the previous slide, plus a Cloud (OpenStack) virtual machine running the glidein service and a Condor startd alongside the grid-site worker node]
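The "Condor-G job" and "EC2" mechanics above can be sketched as HTCondor submit descriptions. This is an illustration only: the gatekeeper name, endpoint URL, AMI id, and file paths are placeholders, and in practice the factory generates these files itself.

  # Grid site: the glidein is a grid-universe (Condor-G) job
  universe      = grid
  grid_resource = gt5 gatekeeper.example.edu/jobmanager-condor
  executable    = glidein_startup.sh
  queue

  # Cloud: the whole VM is the job; the glidein RPMs are baked into the image
  universe              = grid
  grid_resource         = ec2 https://cloud.example.com:8773/services/Cloud
  executable            = glidein-vm              # for EC2 jobs this is just a label
  ec2_access_key_id     = /path/to/access_key     # placeholder path
  ec2_secret_access_key = /path/to/secret_key     # placeholder path
  ec2_ami_id            = ami-00000000            # VO-provided image
  ec2_instance_type     = m1.small
  ec2_user_data_file    = glidein_userdata        # how to reach the user pool (item 1)
  queue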

Simplifying Grids & Clouds for Users

From the user's point of view:
- Users interface with a Condor batch system
- Users with existing Condor-ready jobs can submit them to the grid sites and clouds with no or minimal changes
- GlideinWMS shields the user from interfacing directly with the grid sites and clouds
- The glidein validates the node before running a user job, reducing the failure rate of user jobs

From the VO's point of view:
- Can prioritize jobs from different users
- Operates the VO Frontend service & the Condor pool (and optionally a Glidein Factory + WMS Pool)
- Can use existing Glidein Factories operated by OSG

Users focus on science while the operations team supports the operations!

[Architecture diagram: same components as on the previous slides]
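Since "no or minimal changes" is the point, here is what the user actually types; the commands are standard HTCondor, and the submit file is the hypothetical one from the architecture slide.

  $ condor_submit hello.sub   # same command and file as on a local pool
  $ condor_q                  # the job idles until a glidein slot appears
  $ condor_status             # glidein startds show up as ordinary slots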

Working Live Demo: Oxymoron?

- Launch a FermiCloud VM
- Set up the required user accounts
  - Factory user, Frontend user, Condor user
- Set up the required directory structure
  - HTTPD, monitoring and staging area
- Install & configure GlideinWMS on the VM
  - Install VDT, HTTPD, m2crypto, javascriptrrd
  - Install & configure the GlideinWMS services
- Start the GlideinWMS services
- Submit a job
  - Have GlideinWMS submit a glidein
  - Job runs on the dynamically created resource
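For reference, the demo steps map roughly onto the shell session below. This is a sketch, not a transcript of the live demo: the account names, paths, and package names are assumptions, as are the gwms-factory/gwms-frontend service names from the 2013-era GlideinWMS RPM layout.

  # Service accounts (names are illustrative)
  useradd gfactory; useradd frontend; useradd condor

  # Directory structure: httpd-served monitoring and staging areas (paths are illustrative)
  mkdir -p /var/lib/gwms-factory/web-area /var/lib/gwms-frontend/web-area

  # Dependencies, then the GlideinWMS services (package names assumed)
  yum install -y httpd m2crypto javascriptrrd
  yum install -y glideinwms-factory glideinwms-vofrontend

  # Configure the services, then start everything
  service httpd start
  service condor start
  service gwms-factory start
  service gwms-frontend start

  # Submit a job; the frontend requests a glidein from the factory,
  # and the job runs on the dynamically created resource
  condor_submit hello.sub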