
1 The STAR Unified Meta-Scheduler (SUMS): a front end around evolving technologies for user analysis and data production. Jérôme Lauret, Gabriele Carcassi, Levente Hajdu, Efstratios Efstathiadis, Lidia Didenko, Valeri Fine, Iwona Sakrejda, Doug Olson

2 Outline
- Project overview: the STAR experiment, the problem, the solution
- Design and architecture: basic principles, building blocks, add-on (usage tracking), usage, Grid experience
- Schedulers: key features, MonALISA policy
- Contributions: GUI, dispatchers
- Future work & conclusion

3 Project overview

4 The STAR Experiment
- The Solenoidal Tracker At RHIC (http://www.star.bnl.gov/) is an experiment located at BNL (USA)
- A collaboration of 546 people spanning 12 countries
- A PByte-scale experiment overall (raw data, reconstructed events, simulation) with a large number of files (several million)
- Run 4 alone (2003-2004) produced 200 TB of raw data; expecting 200 TB of reconstructed data and 40 TB of MuDST (1 pass)
- Rich set of data analysis and simulation problems
- Files copied to Tier 1 sites using SRM tools (see Track 4, 344)

5 The problem
- Ongoing analysis: past and new data sets are constantly analyzed
- Data are spread across many sites and storage types, some on distributed disk local to each machine and not easily accessible
- Evolving technologies: distributed computing (re)shapes itself as we make progress (Condor-G, portals, meta-schedulers, Web Services, Grid Services, ...), and batch technologies themselves evolve
- Users have to adapt within a productive environment and an ever-growing scientific program; that may be fine for a new experiment, not for a running one

6 Solution
- Allow users to pursue their scientific endeavor without disruption: make use of current/available resources and ensure the same productivity (subjective without metrics)
- Develop a front end shielding the user from technology details and changes: a job-concept abstraction
- Attract users to migrate to the new framework & Grid => data management, file relocation => Catalog
- Design a tool/framework that allows for evolution: changing the underlying technology should NOT mean a change in the user's daily routine
- The framework should allow for testing ideas, plugging in new components (dispatchers for Local Resource Managers = LRMS), and moving users to distributed computing with no extraneous knowledge required

7 And so SUMS was born ...
- Project started in 2002 with a light developer team (averaging ~1.0 FTE); surrounding activities have enriched the project and spawned new activities and collaborations (monitoring, U-JDL, resource-brokering studies, ...)
- Historically a STAR project; design and prototype responsibility taken by WSU. Project enhanced and brought to the user community (Gabriele Carcassi); current development & design (Levente Hajdu)
- Entirely written in Java: portable, modular, class-based design; project management, auto-documentation, ...

8 Design / Architecture: an open approach

9 Basic principles
- Users do NOT write shell scripts and submit series of tag=value pairs
- Instead, they write an XML job description (the U-JDL) describing their "intent" to work on files, a dataset, collections, etc.
- They do not have to know where those files are located (LFNs or collections may be converted to PFNs)
- They do not have to handle the gory details of resource management (bsub -R ...)
- They do not need to think about where their job will best fit; their input to SUMS consists of rate or range indications
- Following a prescribed schema, they submit with:
% star-submit MyJob.xml
% star-submit-template -template MyTemplateJob.xml -entities jobname=test,year=2004
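
As a sketch, a U-JDL request of this sort might look like the following. The slides do not show the actual schema, so apart from the fromScratch/toURL attributes mentioned on the Grid-experience slide, the element and attribute names here are illustrative:

    <?xml version="1.0" encoding="utf-8"?>
    <!-- Hypothetical U-JDL sketch: names are illustrative, not the exact STAR schema -->
    <job name="test" maxFilesPerProcess="100">
        <!-- the user's intent: run this command over each sub-job's file list -->
        <command>root4star -q -b myAnalysis.C</command>
        <!-- a logical dataset query; SUMS resolves it to physical file names -->
        <input URL="catalog:star.bnl.gov?production=P04,filetype=MuDst" nFiles="all"/>
        <!-- bring the results back from the scratch area (globus-url-copy on the Grid) -->
        <output fromScratch="*.root" toURL="file:/star/u/test/results/"/>
    </job>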

10 What it does ... [diagram: the user's input (U-JDL) flows through a Policy, which hands sub-jobs to a dispatcher]

11 Architecture / building blocks
- The main boxes are Java classes; the framework chooses which blocks to use depending on user options (% ... -policy XXX)
- The interfaces between blocks are identical
- Implementations of the Policy class are the heart of SUMS (decision making, planning, resource brokering, ...)
- Extendable and adaptable; a minimal sketch of the idea follows
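
A minimal Java sketch of that pluggable-block idea, assuming hypothetical class and method names (the slides name the Policy class but not its signature):

    import java.util.List;

    // Hypothetical sketch: every Policy implementation receives the user's request
    // plus the configured queues and decides which queue gets which sub-job.
    interface Policy {
        List<Assignment> assign(JobRequest request, List<Queue> queues);
    }

    // Identical interface for all dispatchers, so blocks are interchangeable.
    interface Dispatcher {
        void dispatch(Job job);  // submit one sub-job to the underlying LRMS/DRMS
    }

    record JobRequest(String xmlFile) {}
    record Job(String fileList) {}
    record Assignment(Job job, Queue queue) {}
    class Queue { String name; Dispatcher dispatcher; }

Because the blocks share interfaces, swapping policies is a command-line option (-policy XXX), not a code change.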

12 Job Initializer: the XML is validated and request objects are created ...
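
A sketch of that step using the standard JAXP validation API; the class name and layout are mine, not the actual SUMS source:

    import java.io.File;
    import javax.xml.XMLConstants;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Validator;

    public class JobInitializerSketch {
        // Validate the user's U-JDL file against the prescribed XML schema.
        public static void validate(File jobXml, File schemaXsd) throws Exception {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(schemaXsd);
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(jobXml)); // throws SAXException if invalid
            // ... on success, build the request objects the Policy will consume
        }
    }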

13 Queues
- The queue concept is "open": a queue can be an LRMS queue (PBS, LSF, SGE, ...), a pool or a DRMS (Condor, Condor-G, ...), a Web or Grid Service ... anything for which a dispatcher can be written
- The object container is defined by a name (which may be logical) and is associated to a dispatcher (it holds a pointer to a dispatcher object); the LSFDispatcher uses logical name = queue name
- It has resource requirements: CPU-time limits, memory limits, the type of storage it can access, storage limits
- Base rule: these can be undefined (-1), which the Policy must expect; see the sketch below
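
A sketch of that container; the field names are hypothetical, but the -1 = undefined convention is the slide's base rule:

    // Hypothetical queue container: a (possibly logical) name, a pointer to its
    // dispatcher, and resource limits where -1 means "undefined".
    class QueueSketch {
        String name;             // for LSFDispatcher, logical name = LSF queue name
        Object dispatcher;       // LSF, PBS, SGE, Condor(-G), a Web/Grid service, ...
        long cpuTimeLimitSec = -1;
        long memoryLimitMB   = -1;
        long storageLimitGB  = -1;
        String storageType;      // type of storage the queue can access

        // A limit of -1 is undefined and therefore imposes no constraint;
        // policies must be written to expect that.
        boolean fits(long cpuSec, long memMB) {
            return (cpuTimeLimitSec == -1 || cpuSec <= cpuTimeLimitSec)
                && (memoryLimitMB   == -1 || memMB  <= memoryLimitMB);
        }
    }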

14 Policies
- Policies integrate pre-defined queues (serialized XML as local configuration); a policy can make use of as many queues as necessary
- Queues may have a type (LSF, PBS, Condor, ...) and a scope (local, distributed, ...), which lets SUMS decide which one to take depending on the resource-brokering decision
- Queues can be given an initial weight (used for ordering if weight = priority, for example) and a weight increment
- Complex policies may order queues however necessary (your choice); the default is ordering by weight (priority), as sketched below
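
The default ordering, and one reading of the weight increment (an assumption on my part; the slide does not say exactly when the increment is applied), might look like:

    import java.util.Comparator;
    import java.util.List;

    class WeightOrderingSketch {
        static class Q { String name; int weight; int weightIncrement; }

        // Default policy order: highest weight (priority) first.
        static void orderByWeight(List<Q> queues) {
            queues.sort(Comparator.comparingInt((Q q) -> q.weight).reversed());
        }

        // Assumed semantics of the weight increment: after each dispatch to a
        // queue, bump its weight so the ordering evolves with usage.
        static void afterDispatch(Q queue) {
            queue.weight += queue.weightIncrement;
        }
    }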

15 Policy note: job splitting
- The <input> element can take several forms. Transition formats: PFN, PFN (wildcard); locally distributed PFN support; list support; dataset and metadata support; ... LFN support is on the way
- Preferred STAR usage: map metadata/collections or LFNs to PFNs, then dispatch jobs. BUT THERE ARE TWO WAYS:
- PFNs converted (URL syntax does not end up in the final lists; applications work as usual)
- Lists formatted and passed to applications as URLs; applications need to sort the URLs out (example: rootd-style URLs passed as-is)
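
Whatever form the input takes, once it is resolved to a list of PFNs the split itself can be as simple as chunking that list, one sub-job per chunk. A minimal sketch (the real policies presumably also weigh storage locality, which this ignores):

    import java.util.ArrayList;
    import java.util.List;

    class SplitSketch {
        // Split a resolved PFN list into sub-jobs of at most maxFiles each.
        static List<List<String>> split(List<String> pfns, int maxFiles) {
            List<List<String>> jobs = new ArrayList<>();
            for (int i = 0; i < pfns.size(); i += maxFiles) {
                jobs.add(new ArrayList<>(pfns.subList(i, Math.min(i + maxFiles, pfns.size()))));
            }
            return jobs;
        }
    }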

16 Dispatchers: the high-level dispatcher redirects to PBS, LSF, SGE, Condor, Condor-G, BOSS, ...
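
A sketch of that redirect: one dispatcher per back end, each turning the abstract job into that system's submit command. The bsub/qsub strings are generic LRMS usage, not SUMS's actual output:

    // Hypothetical sketch: the high-level dispatcher only selects the concrete
    // implementation; each implementation knows its own submission syntax.
    interface DispatcherSketch {
        String submitCommand(String queue, String script);
    }

    class LsfDispatcherSketch implements DispatcherSketch {
        public String submitCommand(String queue, String script) {
            return "bsub -q " + queue + " " + script;  // plus -R resource strings, etc.
        }
    }

    class PbsDispatcherSketch implements DispatcherSketch {
        public String submitCommand(String queue, String script) {
            return "qsub -q " + queue + " " + script;
        }
    }
    // ... and likewise for SGE, Condor, Condor-G, BOSS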

17 Add-on: usage monitoring
- We needed usage feedback: monitoring users' usage allows for a better-targeted tool, and focus can be put on the most used/preferred features (CS fantasy trimmed down)
- Serves the user community better: eliminates divergence and re-focuses effort (practicality first, SciFi later ...); ensures equity of usage; helps re-focus tutorials & documentation
- JSP based (Tomcat) with a MySQL back end; all options and usage are recorded

18 Examples of useful information: we implemented two ways of accessing locally distributed files - is either actually used? We added an SGE dispatcher a few weeks ago ... Which storage type is most used may very well be a $$ / accessibility question

19 Example II-a [plot: job submissions at PDSF and BNL, ~4500 jobs/day, with peaks at 20k]

20 Example II-b: a pessimistic graph, an integral count over time. It shows that after first usage, users keep using SUMS. NB: the drop from the beginning of the summer reflects vacation time, conference time, and a lack of new data (not the best period for a SUMS advertisement, but informative nonetheless). See more statistics at http://www.star.bnl.gov/STAR/comp/Grid/scheduler/

21 Physicist usage
- As far as we know, 85% of active users use SUMS
- Publications selected / confirmed as 100% SUMS-based analyses:
- J. Gonzales, Pseudorapidity Asymmetry and Centrality Dependence of Charged Hadron Spectra in d+Au Collisions at sqrt(s_NN) = 200 GeV, nucl-ex/0408016 (submitted to PRC)
- L. S. Barnby, QM 2004 Proceedings, J. Phys. G: Nucl. Part. Phys. 30 S1121-S1124
- T. Henry, Full jet reconstruction in d+Au and p+p collisions at RHIC, J. Phys. G: Nucl. Phys. 30(8) S1287
- J. S. Lange, Review of the search for heavy flavor (c, b quarks) production in leptonic decay channels in Au+Au collisions at sqrt(s_NN) = 200 GeV at the STAR experiment at RHIC, Proceedings 19th Winter Workshop on Nuclear Dynamics (2003), nucl-ex/0306005
- A. Tang, Anisotropy at RHIC: the first and the fourth harmonic
- See http://www.star.bnl.gov/central/publications/ (7 papers / analyses submitted in the past 3 months)

22 Grid experience
- Using SUMS for Grid job submission is possible, modulo RSL extensions: tags MUST specify paths as relative paths ("bla.root", "blop/test.dat", ...), and the fromScratch / toURL attributes are designed to bring the files back (via globus-url-copy)
- The Grid experience has been a challenge: cryptic messages; we had a problem with a globus error 74 and no clue what it was for months (no Grid help desk, no knowledge-base index); it turned out to be a firewall issue causing bursts of massive job deaths
- Nonetheless, 1/4 of the Run 4 simulation production was made on the Grid: 100,000 events generated, analysis ongoing
- Success rate: ~85% when all goes well, ~60% when lots of jobs are submitted (the issue above)
- Planning to run on larger-scale platforms, Grid3+ and/or OSG-0, with (hopefully) better ways to track errors/problems

23 Schedulers

24 Schedulers
- Can a user front end to other LRMS/DRMS be called a "scheduler"?
- Is using local resources within the same paradigm as globally distributed resources?

             | Traditional (LRMS)                    | Distributed (DRMS)
  Job        | Mostly serialized                     | Possibly following a workflow
  Data       | File based                            | Data sets, collections, ...
  Scheduling | One LRMS used                         | Many; issues are consistency, QoS, unified information (from/to)
  AAA        | Handled by the LRMS                   | VO based; ownership is itself an issue
  Resources  | Dedicated or local-policy managed     | Common; no global policies, but agreements
             | (priority, usage throttle, ...)       | or statements of understanding

25 Schedulers
- Key features for a scheduler: keep global accounting; base scheduling decisions on resource availability, respect for local policies, fairshare (cluster autonomy), advance reservation, best use of resources, network and data cache, data availability, ...; job migration (moving jobs to/from a trusted cluster); spanning and workflow; human-readable messages; ...
- Scheduling algorithms can be complex: attempts to predict ("weather services") have proven difficult; dedicated global accounting and standard messages are possible; mixing LRMS and DRMS capabilities (user autonomy) is not common; a complex algorithm takes so many parameters into account ...
- Empirical approach: inspect queue behavior, send jobs, see how the queue reacts ... re-adjust. A self-sustained system that adapts to network/resource/load changes??

26 Empirical approach (?): the LSF Monitoring Policy
- Information is fed by agents to MonALISA (ML) and recovered by a SUMS module
- Scheduling decisions are made based on load and "queue" or "pool" response time
- A self-sustained system (no need for percentage-based submission branching); hopefully no need for a complex algorithm; responds as resources, priorities, and bandwidth adjust
- Results / details in Efstratios Efstathiadis's presentation, Track 4 - 393
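
As a sketch, the empirical decision can reduce to "send the next job to the pool that has been responding fastest". The MonALISA client feed is stubbed here as plain measurements; the names are illustrative:

    import java.util.Comparator;
    import java.util.List;

    class EmpiricalPolicySketch {
        // One measurement per queue/pool, as recovered from the MonALISA agents.
        record PoolState(String name, double load, double responseTimeSec) {}

        // Pick the pool with the best observed response time; load breaks ties.
        // The feedback loop self-adjusts as resources, priorities, and bandwidth
        // change, so no percentage-based submission branching is needed.
        static PoolState pick(List<PoolState> pools) {
            return pools.stream()
                    .min(Comparator.comparingDouble(PoolState::responseTimeSec)
                            .thenComparingDouble(PoolState::load))
                    .orElseThrow();
        }
    }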

27 Contributions
- The RHIC/PHENIX collaboration has tested and is using SUMS. Contributions included the addition of dispatchers (PBS, BOSS) - Andrey Shevel; development includes the creation of a GUI front end for end users - Mike Reuter
- Job tracking and monitoring: SUMS allows dispatching to ANY queue; BOSS (from CMS) is a possible solution as "a" dispatcher, implemented / contributed by Andrey Shevel (PHENIX/SUNY-SB) - Track 5, 86 (BODE tracking)

28 Future work
- High-level user JDL work started with a document on RDL (PPDG-39)
- Motivation: the current U-JDL is simple enough but has its limitations; extensions to new resource requirements are possible but inelegant; the U-JDL considers most (but not all) data sets; it lacks the concepts of tasks and sandboxes; workflow diagrams only implement AND (sequential) - OR, conditional branching, etc. are needed
- SBIR with Tech-X (David Alexander); deliverables: an enhanced and complete U-JDL (AJHDL) and a WSDL for creating a Grid Service
- Reviewed most available high-level JDLs: Job Submission Description Language (JSDL, GGF); Analysis Job Description Language (AJDL, ATLAS); User Request Description Language (URDL, PPDG-39 / JLab/STAR); Job Description Language (JDL, DataGrid); Job Description Language (JDL, JLab); ...

29 Future work
- We promised our users the U-JDL will not change; as far as they are concerned, it won't (XSLT schema transformation), but those using AJHDL will have access to more features
- We are working on job tracking
- We are working on the concept of a Meta-Log (application-level monitoring), an area that seems to be forgotten - Valeri Fine, Poster 480

30 Conclusions
- SUMS is NOT a batch system, and NOT a toy (real needs, real use, real physics)
- SUMS is: a front end to local and distributed RMS, acting like a client to multiple heterogeneous RMS; a flexible, open, object-oriented framework with plug-and-play features; a good environment for further developing standards (such as a high-level JDL) and for scaling other components (the MonALISA work, in immediate use)
- Used in STAR for real physics (see the usage and publication list); used for distributed / Grid simulation job submission; used successfully by other experiments
- A means to transition active users to distributed computing and to recover under-used resources ...

