SUMS (STAR Unified Meta Scheduler)
SUMS is a highly modular meta-scheduler currently in use by STAR at its large data processing sites (e.g. RCF / PDSF). It is also used by other organizations such as Stony Brook University and as a back end to some PHENIX GUI applications. STAR has been using SUMS for 3.5 years now on both production and simulation jobs and, more importantly, as a tool for user submission of requests (jobs). The functions of SUMS:
–Run processes on large datasets (many files) that may be distributed across many nodes, clusters, sites and batch systems.
–Resolve the user's abstract request into an actual set of jobs that can be run on a farm (or farms).
–Resolve requests for datasets. This is done using a catalog plug-in which resolves the user's request for a dataset into LFNs or PFNs.
–Write scripts and submit them to the batch system(s).
–Embed resource-handling information for the batch system to use.
–Group and split work in the most efficient way possible.

Who contributes to SUMS research, development and administration
PPDG – funding
Jerome Lauret and Levente Hajdu – development and administration of SUMS at BNL
Lidia Didenko – testing for grid readiness
David Alexander and Paul Hamill (Tech-X Corp.) – RDL deployment and prototype client and web service
Eric Hjort, Iwona Sakrejda, Doug Olson – administration of SUMS at PDSF
Valeri Fine – job tracking
Andrey Y. Shevel – administration of SUMS at Stony Brook University
Gabriele Carcassi – development and administration of SUMS
Efstratios Efstathiadis – queue monitoring, research
And others

Benefits of using SUMS over submitting directly
No knowledge of scripting is required for splitting and submitting jobs.
No knowledge of how to use the batch system is needed.
Datasets are resolved and chopped for the user.
The user is completely shielded from the complications of using the distributed file system.
Safety measures are in place to prevent users from bringing down the batch system by overusing resources.

(Diagram: the RCF queue dispatching jobs to nodes hosting the STARDATA02, STARDATA05 and STARDATA24 distributed disks.)

(Diagram: queued and running jobs on the STARDATA24, STARDATA02 and STARDATA05 virtual resources – SD24: 102 units total, SD02: 800 units total, SD05: 2040 units total – with each job labelled by its SD02/SD05/SD24 unit counts.)

Variables generated on the fly for users
$JOBID – a unique ID given to every job that SUMS will ever run.
–Example: 62338C856E6B2B0ABF F94CEA3_0
$PROCESSID – the number of that job within the request, numbered 0, 1, 2, … n.
$SCRATCH – an area on the local system that users can use for temporary files (temp space).
–Example: /tmp/$USER/$JOBID
$FILELIST – the location of the subset of data that SUMS has chopped from the dataset for processing by a given job.
Others
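A minimal sketch (the macro name and result directory below are hypothetical, not taken from the slides) of how these variables can appear inside a job's command block; SUMS makes them available when each job runs:

  <command>
    mkdir -p $SCRATCH
    cd $SCRATCH
    root4star -q -b myMacro.C\(\"$FILELIST\",\"$SCRATCH/out_$PROCESSID.root\"\)
    cp $SCRATCH/out_$PROCESSID.root /star/u/$USER/results/
  </command>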

JDL job XSD tree view (diagram)

Job Parameters
Required
–Command – the command(s) to be run on the files
–stdout
Optional
–Name
–stderr
–maxFilesPerProcess (max files per job)
–minFilesPerProcess (min files per job)
–minMemory
–maxMemory
–simulateSubmission
–filesPerHour
–minWallTime
–maxWallTime
–fileListSyntax
–fromScratch
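A minimal sketch of how these parameters typically appear in a SUMS job description file (attribute names follow the list above; the values, macro name and paths are illustrative assumptions, not from the slides):

  <job name="MuDstAnalysis" maxFilesPerProcess="200" minFilesPerProcess="50"
       filesPerHour="25" simulateSubmission="false" fileListSyntax="paths">
    <command>root4star -q -b myMacro.C\(\"$FILELIST\"\)</command>
    <stdout URL="file:/star/u/$USER/logs/$JOBID.out"/>
    <stderr URL="file:/star/u/$USER/logs/$JOBID.err"/>
    <input URL="catalog:star.bnl.gov?production=P04ik,filetype=daq_reco_MuDst" nFiles="all"/>
  </job>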

Sample job
root4star -q -b /star/macro/runMuHeavyMaker.C\(\"$SCRATCH/heavy.MuDst.root\",\"$FILELIST\"\)
<input URL="catalog:star.bnl.gov?production=P04ik,trgsetupname=proHigh,filetype=daq_reco_MuDst,tpc=1,ftpc=1,sanity=1" nFiles="all"/>
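A complete description built around a command and input like this is saved to an XML file and handed to the SUMS command line (at BNL this is typically something like star-submit myJob.xml); SUMS then resolves the catalog query, chops the resulting file list and submits the jobs.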

Configuring SUMS
SUMS uses Java's standard XML de-serialization for its configuration. Over the years we have found this to be the ideal balance between ease of use and the power to define complex systems abstractly.
Pre-initialized scheduler objects are defined by the administrator. One configuration file can hold many different instances of the same object. By default the user is given the default objects, or they can specify other objects that have been customized for the special needs of their jobs.
Objects include:
–JobInitializer
–Policy
–Queue
–Dispatcher
–Application
–Statistics recorder
–Others

JobInitializer
The job initializer is the module through which the user submits their job. JobInitializers currently available:
–Local command line
–Command line (web service) – tested, still in beta
–GUI (web service) – tested, still in beta

Dispatchers
A scheduler plug-in module that implements the dispatcher interface and converts job objects into “real” jobs actually submitted to the batch system. Currently available dispatchers:
–BOSS
–Condor
–Condor-G
–Local (new)
–LSF
–PBS
–SGE (new, but heavily tested by PDSF)

Virtual Queues
Defines a “place” (queue, pool, meta-queue, service, etc.) that a job can be submitted to, and the properties of that place. Each virtual queue points to one dispatcher object.

Virtual Queues

A typical queue configuration (the XML markup did not survive the transcript): localQueue / star_cas_dd / rcas.rcf.bnl.gov / LSF / local
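A rough, hypothetical reconstruction of such a queue definition (the element names are illustrative only, not the actual SUMS configuration schema; the values are those from the slide):

  <queue id="localQueue">
    <name>star_cas_dd</name>              <!-- batch queue name -->
    <cluster>rcas.rcf.bnl.gov</cluster>   <!-- cluster the queue belongs to -->
    <batchSystem>LSF</batchSystem>        <!-- dispatcher handling this queue -->
    <accessType>local</accessType>        <!-- local (non-grid) submission -->
  </queue>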

Policies
Resolves requests for datasets. Chops the dataset and creates jobs to work on each piece:
–tries to split in the most optimal way;
–groups files based on where they have to be processed, in the case of files on distributed disk;
–the size of each sub-dataset is based on the user's min and max dataset size requirements and on the time limits of the queue, calculated from files per hour if the user supplies this parameter.
Breaks the request into jobs. Assigns job objects to queue objects using an algorithm unique to each policy class.
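For example (purely illustrative numbers): if a queue allows at most 5 hours of wall time and the user declares filesPerHour="20", the policy aims for roughly 100 files per job and then clamps that value between minFilesPerProcess and maxFilesPerProcess.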

Policies

Example of a custom policy used by the STAR resonance group. The algorithm for deciding where jobs go is “PassivePolicy”; the queues used are NSFlocalQueueObj, NFSQueueObj and HBT_group_Queue.

Policies
PassivePolicy – a simplistic policy that allows the administrator to set the order in which queues will be tried. The order is set by a property of the queue called “search order priority”. If two or more queues have the same search order priority they are tried in round-robin fashion.
ClusterAssignmentByMonitorPolicy – the first “monitoring policy” ever tested. It detects the load of each cluster and then uses an equation to determine what percentage of jobs should go to that cluster.
AssignmentByQueueMonitorPolicy – a “monitoring policy” that works at the queue level; its performance is better than ClusterAssignmentByMonitorPolicy. It monitors the waiting time and throughput of each queue, using a plug-in developed for MonALISA, to determine the best (fastest) queue to submit to. Unlike other schedulers that attempt to model every single variable, this policy uses only a handful of variables that reflect the state of possibly hundreds or thousands of factors.
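For example (assuming lower values are tried first): if queues A, B and C have search order priorities 1, 2 and 2, PassivePolicy fills A first and then alternates between B and C in round-robin fashion.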

Passive Policy / Monitoring Policy (diagrams)

Reports, Logs and Statistics
Log and statistics collection is optional; the user's report file is always generated.
Reports
–Reports are put in the user's directory; they give the user information about the internal workings of SUMS.
–They report information about every job that was processed.
–The user decides when to delete these.
Logs
–Hold information in a central area more relevant to the administrator, for diagnosing problems.
–The administrator decides when to delete these.
Statistics
–General information about how many people are using SUMS and what options they are using.

Job tracking / monitoring / crash recovery
Dispatchers in SUMS currently provide 3 functions:
–submit job(s)
–get status of job(s)
–kill job(s)

Job tracking / monitoring / crash recovery
To implement this in the simplest, most care-free way possible, it was decided that no central database should be used to store this information; the information is given to the users directly. The benefits are:
–No databases need to be set up on sites running SUMS. This automatically eliminates all security and administration considerations.
–The user decides when they no longer need this data, since the data now lives in the user's file system as a file generated by SUMS.

RDL
Request Definition Language – an XML-based language under development by STAR, in collaboration with other scientific groups and private industry, for describing not only one job but many jobs and the relationships between them, geared towards web services with advanced GUI clients.

RDL
Terminology for the layers of abstraction is not very well established, and all-inclusive definitions are hard to come by. Note: these are only guidelines.
Abstract / meta / composite request – defines a group of requests performing a common task. The order in which they run may be important; the output of one request may be the input to another request in the same meta-request. Example: make a new dataset by running a program; when it is done, sum the output and render a histogram.
Request or meta-job – defines a group of [0 to many] jobs that have a common function and can be run simultaneously. Example: take a dataset and run an application on it.
Physical job – the unit of work the batch system deals with. Example: take a dataset and run an application on it.
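A purely illustrative sketch of this layering (element and attribute names are invented for illustration; they are not the actual RDL schema), following the histogram example above:

  <compositeRequest name="makeDatasetAndHistogram">
    <!-- first request: produce a new dataset by running a program -->
    <request name="produceDataset"> ... </request>
    <!-- second request: when the first is done, sum the output and render a histogram -->
    <request name="summarize" dependsOn="produceDataset"> ... </request>
  </compositeRequest>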

RDL vs. JDL
(Comparison table – rows: submitting on a grid landscape; supports submission of multiple jobs; supports submission of multiple requests; separates task and application; supports workflow; XML format. Columns: RDL, JDL.)