The OSG Resource Selection Service (ReSS), Gabriele Garzoglio. Presented by Don Petravick for Gabriele Garzoglio, Mar 28, 2007.


The OSG Resource Selection Service (ReSS)
Don Petravick for Gabriele Garzoglio, Computing Division, Fermilab
ISGC 2007, Mar 28, 2007

Overview
– The ReSS Project (collaboration, architecture, …)
– ReSS Validation and Testing
– Project Status and Plan
– ReSS Deployment

The ReSS Project
The Resource Selection Service implements cluster-level workload management on OSG. The project started in Sep 2005.
Sponsors:
– DZero contribution to the PPDG Common Project
– FNAL-CD
Collaboration of the sponsors with:
– OSG (TG-MIG, ITB, VDT, USCMS)
– CEMon gLite Project (PD-INFN)
– FermiGrid
– Glue Schema Group

Motivations
– Implement a lightweight cluster selector for push-based job handling services
– Enable users to express requirements on the resources in the job description
– Enable users to refer to abstract characteristics of the resources in the job description
– Provide soft registration for clusters
– Use the standard characterization of resources via the Glue Schema

Technology
ReSS bases its central services on the Condor matchmaking service:
– Users of Condor-G naturally integrate their scheduler servers with ReSS.
– The Condor information collector manages resource soft registration.
Resource characteristics are handled at sites by the gLite CE Monitor Service (CEMon):
– CEMon registers with the central ReSS services at startup.
– CEMon gathers information at sites running the Generic Information Providers (GIP).
– GIP expresses resource information via the Glue Schema model.
– CEMon converts the information from GIP into old classad format. Other supported formats: XML, LDIF, new classad.
– CEMon publishes information using web services interfaces.
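The conversion CEMon performs (GIP output in the Glue Schema model, rendered as old classad text) can be pictured with the minimal Python sketch below. It is not CEMon's code: the function names are hypothetical, and the sketch assumes a flat dict of Glue attributes whose string values contain no embedded double quotes. It only follows the layout visible in the resource description of the selection example: quoted strings, bare numbers, and TRUE/FALSE booleans.

```python
def to_old_classad_value(value):
    """Render one attribute value in old classad syntax (hypothetical helper)."""
    # bool is checked first because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value)
    # Assumption of this sketch: no embedded quotes, so no escaping rules are needed
    if '"' in text:
        raise ValueError(f"embedded double quote not supported: {text!r}")
    return f'"{text}"'


def to_old_classad(glue_attrs):
    """Turn a flat dict of Glue attributes into 'Name = value' classad lines."""
    lines = ['MyType = "Machine"', 'TargetType = "Job"']
    lines += [f"{name} = {to_old_classad_value(v)}" for name, v in glue_attrs.items()]
    return "\n".join(lines)


print(to_old_classad({
    "GlueCEName": "dzero",
    "GlueCEStateFreeCPUs": 0,
    "GlueHostNetworkAdapterOutboundIP": True,
}))
```

Checking for bool before int matters in Python; otherwise a True value would be printed as 1 rather than TRUE.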

Architecture
Central services: the Condor Match Maker and the Info Gatherer.
– At each site (Gate1, Gate2, Gate3), a CE runs the job-managers for its CLUSTER; GIP on the CE collects resource information, which CEMon publishes to the Info Gatherer.
– The Info Gatherer is the interface adapter between CEMon and Condor: it passes the resource information on to the Condor Match Maker as classads.
– The user's Condor Scheduler asks the Match Maker "What gate?", gets an answer (e.g. Gate 3), and sends the job to that gate's job-managers.
– The Condor Scheduler is maintained by the user (not part of ReSS).
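Soft registration means a gate stays known to the central services only while its CEMon keeps publishing; nothing has to deregister a gate that goes away. The toy Python registry below illustrates that idea only. The class name and the 60-second lifetime are hypothetical choices for the sketch, and it is not how the Condor collector is implemented.

```python
import time


class SoftStateRegistry:
    """Toy soft-state registry (hypothetical): an ad is live only while it is refreshed."""

    def __init__(self, lifetime_s, clock=time.monotonic):
        self.lifetime_s = lifetime_s
        self.clock = clock
        self._ads = {}  # gate name -> (classad dict, time of last update)

    def update(self, gate, ad):
        # Every periodic push from a site (re)registers the gate and resets its timer.
        self._ads[gate] = (ad, self.clock())

    def live_ads(self):
        now = self.clock()
        # Gates that stopped pushing for longer than the lifetime silently disappear.
        self._ads = {g: (ad, t) for g, (ad, t) in self._ads.items()
                     if now - t <= self.lifetime_s}
        return {g: ad for g, (ad, _) in self._ads.items()}


# Usage with a fake clock so the expiry is deterministic.
fake_now = [0.0]
registry = SoftStateRegistry(lifetime_s=60, clock=lambda: fake_now[0])
registry.update("Gate1", {"GlueCEName": "dzero"})
registry.update("Gate2", {"GlueCEName": "cms"})
fake_now[0] = 50.0
registry.update("Gate1", {"GlueCEName": "dzero"})  # Gate1 keeps pushing, Gate2 does not
fake_now[0] = 100.0
print(sorted(registry.live_ads()))  # ['Gate1']
```

Injecting the clock keeps the example deterministic; a real deployment would use wall-clock time and a lifetime tuned to the publishing interval.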

Resource Selection Example
Job description (the globusscheduler line refers to an abstract resource characteristic; the requirements line expresses the resource requirements):

    universe = globus
    globusscheduler = $$(GlueCEInfoContactString)
    requirements = TARGET.GlueCEAccessControlBaseRule == "VO:DZero"
    executable = /bin/hostname
    arguments = -f
    queue

Resource description (one classad as published for a CE):

    MyType = "Machine"
    Name = "antaeus.hpcc.ttu.edu:2119/jobmanager-lsf-dzero "
    Requirements = (CurMatches < 10)
    ReSSVersion = "1.0.6"
    TargetType = "Job"
    GlueSiteName = "TTU-ANTAEUS"
    GlueSiteUniqueID = "antaeus.hpcc.ttu.edu"
    GlueCEName = "dzero"
    GlueCEUniqueID = "antaeus.hpcc.ttu.edu:2119/jobmanager-lsf-dzero"
    GlueCEInfoContactString = "antaeus.hpcc.ttu.edu:2119/jobmanager-lsf"
    GlueCEAccessControlBaseRule = "VO:dzero"
    GlueCEHostingCluster = "antaeus.hpcc.ttu.edu"
    GlueCEInfoApplicationDir = "/mnt/lustre/antaeus/apps"
    GlueCEInfoDataDir = "/mnt/hep/osg"
    GlueCEInfoDefaultSE = "sigmorgh.hpcc.ttu.edu"
    GlueCEInfoLRMSType = "lsf"
    GlueCEPolicyMaxCPUTime = 6000
    GlueCEStateStatus = "Production"
    GlueCEStateFreeCPUs = 0
    GlueCEStateRunningJobs = 0
    GlueCEStateTotalJobs = 0
    GlueCEStateWaitingJobs = 0
    GlueClusterName = "antaeus.hpcc.ttu.edu"
    GlueSubClusterWNTmpDir = "/tmp"
    GlueHostApplicationSoftwareRunTimeEnvironment = "MountPoints,VO-cms-CMSSW_1_2_3"
    GlueHostMainMemoryRAMSize = 512
    GlueHostNetworkAdapterInboundIP = FALSE
    GlueHostNetworkAdapterOutboundIP = TRUE
    GlueHostOperatingSystemName = "CentOS"
    GlueHostProcessorClockSpeed = 1000
    GlueSchemaVersionMajor = 1
    …
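The example relies on two pieces of Condor matchmaking: the job's requirements and the resource's Requirements must both hold, and $$(GlueCEInfoContactString) is filled in from the ad that matched. The Python sketch below mimics only those two steps for this one example. The predicates are hard-coded rather than parsed from classad expressions, the helper names are hypothetical, the first resource ad is invented for contrast, and taking the first match ignores ranking. It also reflects the fact that the classad == operator compares strings case-insensitively, which is why "VO:DZero" in the job matches "VO:dzero" in the resource ad.

```python
import re


def classad_eq(a, b):
    """Approximate classad '==': strings compare case-insensitively."""
    if isinstance(a, str) and isinstance(b, str):
        return a.lower() == b.lower()
    return a == b


def job_requirements(resource):
    # requirements = TARGET.GlueCEAccessControlBaseRule == "VO:DZero"
    return classad_eq(resource.get("GlueCEAccessControlBaseRule"), "VO:DZero")


def resource_requirements(resource):
    # Requirements = (CurMatches < 10), evaluated on the resource side
    return resource.get("CurMatches", 0) < 10


def match(resources):
    """Return the first ad satisfying both sides (real matchmaking also ranks)."""
    for ad in resources:
        if job_requirements(ad) and resource_requirements(ad):
            return ad
    return None


def expand_dollar_dollar(template, ad):
    """Substitute $$(Attr) with the matched ad's value, as for globusscheduler."""
    return re.sub(r"\$\$\((\w+)\)", lambda m: str(ad[m.group(1)]), template)


resources = [
    {"GlueCEAccessControlBaseRule": "VO:cms", "CurMatches": 0,
     "GlueCEInfoContactString": "other.example.org:2119/jobmanager-pbs"},
    {"GlueCEAccessControlBaseRule": "VO:dzero", "CurMatches": 3,
     "GlueCEInfoContactString": "antaeus.hpcc.ttu.edu:2119/jobmanager-lsf"},
]
chosen = match(resources)
print(expand_dollar_dollar("$$(GlueCEInfoContactString)", chosen))
```

Because the resource side has its own Requirements, a CE that already has 10 current matches drops out even when the job would accept it.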


Glue Schema to old classad Mapping
The Glue Schema describes resources as a tree: a Site with a Cluster, SubClusters (SubCluster1, SubCluster2), CEs (CE1, CE2), and the VOs they serve (VO1, VO2, VO3, …). ReSS maps this tree into a set of flat classads, one for every possible combination of (Cluster, SubCluster, CE, VO):
– Site, Cluster, SubCluster1, CE1, VO1 → classad
– Site, Cluster, SubCluster2, CE1, VO1 → classad
– Site, Cluster, SubCluster1, CE1, VO2 → classad
– Site, Cluster, SubCluster2, CE1, VO2 → classad
– Site, Cluster, SubCluster1, CE2, VO1 → classad
– Site, Cluster, SubCluster2, CE2, VO1 → classad
– …
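The flattening amounts to a Cartesian product. The Python sketch below assumes, hypothetically, that each CE lists the VOs it serves, and writes the VO into GlueCEAccessControlBaseRule in the "VO:name" form used by the example resource ad. The function name and the choice of attributes are illustrative, not CEMon's actual output.

```python
from itertools import product


def flatten_glue(site, cluster, subclusters, vos_by_ce):
    """One flat classad-like dict per (Cluster, SubCluster, CE, VO) combination."""
    return [
        {
            "GlueSiteName": site,
            "GlueClusterName": cluster,
            "GlueSubClusterName": subcluster,
            "GlueCEName": ce,
            "GlueCEAccessControlBaseRule": f"VO:{vo}",
        }
        for subcluster, (ce, vos) in product(subclusters, vos_by_ce.items())
        for vo in vos
    ]


ads = flatten_glue("Site", "Cluster", ["SubCluster1", "SubCluster2"],
                   {"CE1": ["VO1", "VO2"], "CE2": ["VO1", "VO3"]})
print(len(ads))  # 2 subclusters x (2 + 2) CE/VO pairs = 8
```

The number of ads grows as the number of subclusters times the total number of CE/VO pairs, so a single site serving many VOs can publish many classads.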

Impact of CEMon on the OSG CE
We studied CEMon resource requirements (load, memory, …) on typical OSG CEs.
– CEMon pushes information periodically.
We compared CEMon resource requirements with those of MDS-2 by running:
– CEMon alone (invokes GIP)
– GRIS alone (invokes GIP), queried at a high rate (scenario with many LCG brokers)
– GIP manually
– CEMon and GRIS together
Conclusions:
– Running CEMon alone does not generate more load than running GRIS alone or running CEMon and GRIS together.
– CEMon uses less CPU than a GRIS that is queried continuously (0.8% vs. 24%). On the other hand, CEMon uses more memory (4.7% vs. 0.5%).
More info at ceSelection/CEMonPerformanceEvaluation
[Plot: typical load average vs. time (sec), running CEMon alone compared with the background (spikes due to GridCat probe)]

US CMS evaluates WMSs
Condor-G test with manual resource selection (no ReSS):
– Submit 10k sleep jobs to 4 schedulers
– Jobs last 0.5–6 hours
– Jobs can run at 4 Grid sites with ~2000 slots
When Grid sites are stable, Condor-G is scalable and reliable.
Study by Igor Sfiligoi & Burt Holzman, US CMS / FNAL, 03/07. More info at view/ResourceSelection/ReSS EvaluationByUSCMS
[Plot: one scheduler's view of jobs submitted, idle, running, completed, and failed vs. time]

ReSS Scalability
Condor-G + ReSS scalability test:
– Submit 10k sleep jobs to 4 schedulers
– 1 Grid site with ~2000 slots; multiple classads from VOs for the site
Result: same scalability as Condor-G.
– The Condor Match Maker scales up to 6k classads.
[Plot: queued and running jobs vs. time]

ReSS Reliability
Same reliability as Condor-G when grid sites are stable.
– Failures are mainly due to Condor-G / GRAM communication problems.
– Failed jobs can be automatically resubmitted / re-matched (not tested here).
[Plot: succeeded 20K jobs, failed 130 jobs; note: plotting artifact]

Project Status and Plans
– Development is mostly done
   – We may still add storage elements (SE) to the resource selection process
– ReSS is now the resource selector of FermiGrid
– Assisting deployment of ReSS (CEMon) on production OSG sites
– Using ReSS on SAM-Grid / OSG for DZero data reprocessing at the available sites
– Working with OSG VOs to facilitate ReSS usage
– Integrating ReSS with the GlideIn Factory
– Moving the project to maintenance

ReSS Deployment on OSG

Conclusions
– ReSS is a lightweight Resource Selection Service for push-based job handling systems.
– ReSS is deployed on OSG and used by FermiGrid.
More info at Selection/