EGEE is a project funded by the European Union under contract IST-2003-508833 Job Description Language - more control over your Job Assaf Gottlieb University.

Slides:



Advertisements
Similar presentations
Workload Management David Colling Imperial College London.
Advertisements

EGEE is a project funded by the European Union under contract IST EGEE Tutorial Turin, January Hands on Job Services.
EU 2nd Year Review – Jan – Title – n° 1 WP1 Speaker name (Speaker function and WP ) Presentation address e.g.
Workload management Owen Maroney, Imperial College London (with a little help from David Colling)
INFSO-RI Enabling Grids for E-sciencE Workload Management System and Job Description Language.
FESR Consorzio COMETA - Progetto PI2S2 The gLite Workload Management System Annamaria Muoio INFN Catania Italy
The Grid Constantinos Kourouyiannis Ξ Architecture Group.
Job Submission The European DataGrid Project Team
E-infrastructure shared between Europe and Latin America 12th EELA Tutorial for Users and System Administrators Architecture of the gLite.
INFSO-RI Enabling Grids for E-sciencE EGEE Middleware The Resource Broker EGEE project members.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Job Submission Fokke Dijkstra RuG/SARA Grid.
The EDG Workload Management System – n° 1 The EDG Workload Management System.
Basic Grid Job Submission Alessandra Forti 28 March 2006.
Grid Infrastructure.
INFSO-RI Enabling Grids for E-sciencE Grid Infrastructure & Related Projects Eddie Aronovich Tel-Aviv University, School of CS
FESR Consorzio COMETA - Progetto PI2S2 Using MPI to run parallel jobs on the Grid Marcello Iacono Manno Consorzio COMETA
Job Submission The European DataGrid Project Team
EGEE Summer School Grid Systems – 3-8 July Job submission into the LHC Grid (Job Management + JDL) EGEE is funded by the European Union under.
EGEE is a project funded by the European Union under contract IST Job Submission José R. Valverde EGEE NA4 Biomed Applications CNB/CSIC EMBnet/CNB.
Computational grids and grids projects DSS,
DataGrid is a project funded by the European Union CHEP 2003 – March 2003 – M. Sgaravatto – n° 1 The EU DataGrid Workload Management System: towards.
Enabling Grids for E-sciencE Workload Management System on gLite middleware Matthieu Reichstadt CNRS/IN2P3 ACGRID School, Hanoi (Vietnam)
M. Sgaravatto – n° 1 The EDG Workload Management System: release 2 Massimo Sgaravatto INFN Padova - DataGrid WP1
DataGrid WP1 Massimo Sgaravatto INFN Padova. WP1 (Grid Workload Management) Objective of the first DataGrid workpackage is (according to the project "Technical.
Nadia LAJILI User Interface User Interface 4 Février 2002.
INFSO-RI Enabling Grids for E-sciencE Workload Management System Mike Mineter
1 Esther Montes Prado CIEMAT 10th EELA Tutorial Madrid, Hands-on on WMS (Review and Summary)
- Distributed Analysis (07may02 - USA Grid SW BNL) Distributed Processing Craig E. Tull HCG/NERSC/LBNL (US) ATLAS Grid Software.
Job Submission The European DataGrid Project Team
EGEE is a project funded by the European Union under contract IST Middleware components in EGEE Mike Mineter NeSC Training team
June 24-25, 2008 Regional Grid Training, University of Belgrade, Serbia Introduction to gLite gLite Basic Services Antun Balaž SCL, Institute of Physics.
EGEE-III INFSO-RI Enabling Grids for E-sciencE Feb. 06, Introduction to High Performance and Grid Computing Faculty of Sciences,
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Job Submission Fokke Dijkstra RuG/SARA Grid.
Architecture of the gLite WMS (Workload Management System) Hands-on Paola Celio Universita’ Roma TRE INFN Roma TRE Sevilla Septembre 2007.
EGEE is a project funded by the European Union under contract IST EGEE Tutorial Turin, January Job Services Emidio.
M. Sgaravatto – n° 1 Overview of release 2 of the EDG WP1 Workload Management System deployed in the INFN production Grid Massimo Sgaravatto INFN Padova.
E-infrastructure shared between Europe and Latin America 1 Workload Management System-WMS Luciano Diaz Universidad Nacional Autónoma de México - UNAM Mexico.
INFSO-RI Enabling Grids for E-sciencE Αthanasia Asiki Computing Systems Laboratory, National Technical.
Enabling Grids for E-sciencE Workload Management System on gLite middleware - commands Matthieu Reichstadt CNRS/IN2P3 ACGRID School, Hanoi.
High-Performance Computing Lab Overview: Job Submission in EDG & Globus November 2002 Wei Xing.
EGEE-0 / LCG-2 middleware Practical.
EGEE is a project funded by the European Union under contract IST WS-Based Advance Reservation and Co-allocation Architecture Proposal T.Ferrari,
INFSO-RI Enabling Grids for E-sciencE Job Submission Tutorial (material from INFN Catania)
Workload Management System Jason Shih WLCG T2 Asia Workshop Dec 2, 2006: TIFR.
Induction: General components of Grid middleware and User Interfaces –April 26-28, General components of Grid middleware and User Interfaces Roberto.
INFSO-RI Enabling Grids for E-sciencE Job Description Language (JDL) Giuseppe La Rocca INFN First gLite tutorial on GILDA Catania,
INFSO-RI Enabling Grids for E-sciencE GILDA Praticals Giuseppe La Rocca INFN – Catania gLite Tutorial at the EGEE User Forum CERN.
EGEE is a project funded by the European Union under contract IST Job Description Language – How to control your Job Nadav Grossaug IsraGrid.
EGEE 3 rd conference - Athens – 20/04/2005 CREAM JDL vs JSDL Massimo Sgaravatto INFN - Padova.
Job Submission The European DataGrid Project Team
Biomed tutorial 1 Enabling Grids for E-sciencE INFSO-RI EGEE is a project funded by the European Union under contract IST JDL Flavia.
User Interface UI TP: UI User Interface installation & configuration.
LCG2 Tutorial Viet Tran Institute of Informatics Slovakia.
EGEE is a project funded by the European Union under contract IST GENIUS and GILDA Guy Warner NeSC Training Team Induction to Grid Computing.
GRID commands lines Original presentation from David Bouvet CC/IN2P3/CNRS.
Introduction to Computing Element HsiKai Wang Academia Sinica Grid Computing Center, Taiwan.
FESR Consorzio COMETA - Progetto PI2S2 Using MPI to run parallel jobs on the Grid Marcello Iacono Manno Consorzio Cometa
Introduction to Job Description Language (JDL) Alessandro Costa INAF Catania Corso di Calcolo Parallelo Grid Computing Catania - ITALY September.
Enabling Grids for E-sciencE Work Load Management & Simple Job Submission Practical Shu-Ting Liao APROC, ASGC EGEE Tutorial.
Architecture of the gLite WMS
Workload Management System on gLite middleware
Workload Management System ( WMS )
EGEE tutorial, Job Description Language - more control over your Job Assaf Gottlieb Tel-Aviv University EGEE is a project.
Job Submission in the DataGrid Workload Management System
Introduction to Grid Technology
Workload Management System
Job Description Language
5. Job Submission Grid Computing.
The gLite Workload Management System
Job Description Language (JDL)
Presentation transcript:

EGEE is a project funded by the European Union under contract IST Job Description Language - more control over your Job Assaf Gottlieb University of Tel-Aviv EGEE tutorial, Ra’anana,

EGEE tutorial, Ra’anana, Outline Introduction Job submission services – what is really going on inside... JDL syntax

EGEE tutorial, Ra’anana, The use of jobs for running applications Jobs are the way users execute applications on the grid. Information to be specified when a job has to be submitted:  Job characteristics  Job requirements and preferences on the computing resources Also including software dependencies  Job data requirements Information specified using a Job Description Language (JDL)  Based upon Condor’s CLASSified ADvertisement language (ClassAd) Fully extensible language A ClassAd is a sequence of attributes separated by semi-colon (;).

EGEE tutorial, Ra’anana, User Interface (UI) User Interface (UI):The place where users logon to the Grid Computing Element (CE) Computing Element (CE): A batch queue on a farm of computers where the user Job gets executed Storage Element (SE) Storage Element (SE): A storage server where Grid files are stored (read/write/copy) or replicated. Resource Broker (RB) Resource Broker (RB): Matches the user requirements with the available resources on the Grid Catalogues(MDS/RLS) Catalogues(MDS/RLS): A storage server where Grid files are stored (read/write/copy) or replicated. RLS How does it work? Main components

EGEE tutorial, Ra’anana, EGEE/LCG Workload Management System Workload Management System (WMS) The user interacts with Grid via a Workload Management System (WMS) The Goal of WMS is the distributed scheduling and resource management in a Grid environment. What does it allow Grid users to do?  To submit their jobs  To execute them on the “best resources” The WMS tries to optimize the usage of resources  To get information about their status  To retrieve their output

EGEE tutorial, Ra’anana, Job Submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status

EGEE tutorial, Ra’anana, Job Submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status UI: allows users to access the functionalities of the WMS (via command line, GUI, C++ and Java APIs) submitted Job Status

EGEE tutorial, Ra’anana, Job Submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status submitted Job Status edg-job-submit myjob.jdl Myjob.jdl JobType = “Normal”; Executable = "$(CMS)/exe/sum.exe"; InputSandbox = {"/home/user/WP1testC","/home/file*”, "/home/user/DATA/*"}; OutputSandbox = {“sim.err”, “test.out”, “sim.log"}; Requirements = other. GlueHostOperatingSystemName == “linux" && other.GlueCEPolicyMaxWallClockTime > 10000; Rank = other.GlueCEStateFreeCPUs; Job Description Language (JDL) to specify job characteristics and requirements

EGEE tutorial, Ra’anana, Job Submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status Job Status RB storage waiting submitted Input Sandbox files Job NS: network daemon responsible for accepting incoming requests

EGEE tutorial, Ra’anana, Job Submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status Job Status RB storage waiting submitted WM: responsible to take the appropriate actions to satisfy the request

EGEE tutorial, Ra’anana, Job Submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status Job Status RB storage waiting submitted Match- Maker/ Broker Where must this job be executed ?

EGEE tutorial, Ra’anana, Job Submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status Job Status RB storage waiting submitted Match- Maker/ Broker Matchmaker: responsible to find the “best” CE where to submit a job

EGEE tutorial, Ra’anana, Job Submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status Job Status RB storage waiting submitted Match- Maker/ Broker Where are (which SEs) the needed data ? What is the status of the Grid ?

EGEE tutorial, Ra’anana, Job Submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status Job Status RB storage waiting submitted Match- Maker/ Broker CE choice

EGEE tutorial, Ra’anana, Job Submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status Job Status RB storage waiting submitted Job Adapter JA: responsible for the final “touches” to the job before performing submission (e.g. creation of wrapper script, etc.)

EGEE tutorial, Ra’anana, Job Submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status Job Status RB storage JC: responsible for the actual job management operations (done via CondorG) submitted waiting ready

EGEE tutorial, Ra’anana, Job Submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status Job Status RB storage Job Input Sandbox files submitted waiting ready scheduled

EGEE tutorial, Ra’anana, Job Submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node Job Status RB storage submitted waiting ready scheduled running “Grid enabled” data transfers/ accesses Job

EGEE tutorial, Ra’anana, Job Submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node Job Status RB storage Output Sandbox files submitted waiting ready scheduled running done

EGEE tutorial, Ra’anana, Job Submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node Job Status RB storage submitted waiting ready scheduled running done edg-job-get-output

EGEE tutorial, Ra’anana, Job Submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node Job Status RB storage submitted waiting ready scheduled running done Output Sandbox files cleared

EGEE tutorial, Ra’anana, Job monitoring UI Log Monitor Logging & Bookkeeping Network Server Job Contr. - CondorG Workload Manager Computing Element RB node LM: parses CondorG log file (where CondorG logs info about jobs) and notifies LB LB: receives and stores job events; processes corresponding job status Log of job events edg-job-status edg-job-get-logging-info Job status

EGEE tutorial, Ra’anana, A typical job workflowReplicaCatalogue UI JDL Logging & Book-keeping ResourceBroker Job Submission Service StorageElement ComputeElement InformationService Job Status DataSets info Author. &Authen. Job Submit Event Job Query Job Status Input “sandbox” Input “sandbox” + Broker Info Globus RSL Output “sandbox” Job Status Publish grid-proxy-init Expanded JDL SE & CE info

EGEE tutorial, Ra’anana, Essential JDL - syntax An attribute is a pair (key, value), where value can be a Boolean, an Integer, a list of strings,....  = ; In case of literal string for values:  if a string itself contains double quotes, they must be escaped with a backslash Arguments = " \"Hello\" 10";  the character “'” cannot be specified in the JDL  special characters such as &, |, >, < are only allowed if specified inside a quoted string if preceded by triple \ –Arguments = " -f file1\\\&file2 " ; Comments must be preceded by a sharp character (#) or have to follow the C++ syntax The JDL is sensitive to blank characters and tabs  they should not follow the semicolon (;) at the end of a line

EGEE tutorial, Ra’anana, Essential JDL The supported attributes are grouped in two categories:  Job Attributes Define the job itself  Resources Taken into account by the RB for carrying out the matchmaking algorithm (to choose the “best” resource where to submit the job) Computing Resource –Used to build expressions of Requirements and/or Rank attributes by the user –Have to be prefixed with “other.” Data and Storage resources –Input data to process, SE where to store output data, protocols spoken by application when accessing SEs

EGEE tutorial, Ra’anana, Essential JDL (contd.) At least one has to specify the following attributes:  the name of the executable  the files where to write the standard output and standard error of the job (recommended, not mandatory)  the arguments to the executable, if needed  the files that must be transferred from UI to WN and viceversa [ Executable = “ls -al”; StdError = “stderr.log”; StdOutput = “stdout.log”; OutputSandbox = {“stderr.log”, “stdout.log”}; ]

EGEE tutorial, Ra’anana, Job Description Language: relevant attributes JobType  Normal (simple, sequential job), Interactive, MPICH, Checkpointable  Or combination of them Executable (mandatory)  The command name Arguments (optional)  Job command line arguments StdInput, StdOutput, StdError (optional)  Standard input/output/error of the job Environment (optional)  List of environment settings InputSandbox (optional)  List of files on the UI local disk needed by the job for running  The listed files will automatically staged to the remote resource OutputSandbox (optional)  List of files, generated by the job, which have to be retrieved VirtualOrganisation (optional)  A different way to specify the VO of the user

EGEE tutorial, Ra’anana, Requirements  Other possible requirements values are below reported: other.GlueCEInfoLRMSType == “PBS” && other.GlueCEInfoTotalCPUs > 1 (the resource has to use PBS as the LRMS and whose WNs have at least two CPUs) Member(“CMSIM-133”, other.GlueHostApplicationSoftwareRunTimeEnvironment) (a particular experiment software has to run on the resource and this information is published on the resource environment) –The Member operator tests if its first argument is a member of its second argument RegExp(“cern.ch”, other.GlueCEUniqueId) (the job has to run on the CEs in the domain cern.ch) (other.GlueHostNetworkAdapterOutboundIP == true) && Member(“VO-alice-Alien”, other.GlueHostApplicationSoftwareRunTimeEnvironment) && Member(“VO-alice- Alien-v4-01-Rev-01”, other.GlueHostApplicationSoftwareRunTimeEnvironment) && (other.GlueCEPolicyMaxWallClockTime > 86000) (the resource must have some packages installed VO-alice-Alien and VO-alice-Alien-v4-01-Rev-01 and the job has to run for more than seconds) Job Description Language: relevant attributes

EGEE tutorial, Ra’anana, Rank  Expresses preference (how to rank resources that have already met the Requirements expression)  It is expressed as a floating-point number  The CE with the highest rank is the one selected  Specified using GLUE attributes of resources published in the Information Service  If not specified, default value defined in the UI configuration file is considered Default: - other.GlueCEStateEstimatedResponseTime (the lowest estimated traversal time) Default: other.GlueCEStateFreeCPUs (the highest number of free CPUs)  Other possible rank value is below reported: (other.GlueCEStateWaitingJobs == 0 ? other.GlueCEStateFreeCPUs : - other.GlueCEStateWaitingJobs) (the number of waiting jobs is used if this number is not null and the rank decreases as the number of waiting jobs gets higher; if there are not waiting jobs, the number of free CPUs is used) Job Description Language: relevant attributes

EGEE tutorial, Ra’anana, Example of JDL file [ JobType = “Normal”; Executable = "$(CMS)/exe/sum.exe"; InputSandbox = {"/home/user/WP1testC","/home/file*”, "/home/user/DATA/*"}; OutputSandbox = {“sim.err”, “test.out”, “sim.log"}; Requirements = (other. GlueHostOperatingSystemName == “linux") && (other.GlueCEPolicyMaxWallClockTime > 10000); Rank = other.GlueCEStateFreeCPUs; ]

EGEE tutorial, Ra’anana, Job Submission edg-job-submit [–r ] [-c ] [-vo ] [-o ] -r the job is submitted directly to the computing element identified by -c the configuration file is pointed by the UI instead of the standard configuration file -vo the Virtual Organisation (if user is not happy with the one specified in the UI configuration file) -o the generated edg_jobId is written in the Useful for other commands, e.g.: edg-job-status –i (or edg_jobId) -i the status information about edg_jobId contained in the are displayed

EGEE tutorial, Ra’anana, Possible job states Flag Meaning SUBMITTEDsubmission logged in the LB WAITjob match making for resources READYjob being sent to executing CE SCHEDULEDjob scheduled in the CE queue manager RUNNINGjob executing on a WN of the selected CE queue DONEjob terminated without grid errors CLEAREDjob output retrieved ABORTjob aborted by middleware, check reason

EGEE tutorial, Ra’anana, Other (most relevant) UI commands edg-job-list-match  Lists resources matching a job description  Performs the matchmaking without submitting the job edg-job-cancel  Cancels a given job edg-job-status  Displays the status of the job edg-job-get-output  Returns the job-output (the OutputSandbox files) to the user edg-job-get-logging-info  Displays logging information about submitted jobs (all the events “pushed” by the various components of the WMS)  Very useful for debug purposes

EGEE tutorial, Ra’anana, Let’s try it out! (But questions first…)

EGEE tutorial, Ra’anana, Additional material: Relevant Glue Attributes State (objectclass GlueCEState)  GlueCEStateRunningJobs: number of running jobs  GlueCEStateWaitingJobs: number of jobs not running  GlueCEStateTotalJobs: total number of jobs (running + waiting)  GlueCEStateStatus: queue status: queueing (jobs are accepted but not run), production (jobs are accepted and run), closed (jobs are neither accepted nor run), draining (jobs are not accepted but those in the queue are run)  GlueCEStateWorstResponseTime: worst possible time between the submission of a job and the start of its execution  GlueCEStateEstimatedResponseTime: estimated time between the submission of a job and the start of its execution  GlueCEStateFreeCPUs: number of CPUs available to the scheduler

EGEE tutorial, Ra’anana, Relevant Glue Attributes (contd.) Policy (objectclass GlueCEPolicy)  GlueCEPolicyMaxWallClockTime: maximum wall clock time available to jobs submitted to the CE, in seconds  GlueCEPolicyMaxCPUTime: maximum CPU time available to jobs submitted to the CE, in seconds  GlueCEPolicyMaxTotalJobs: maximum allowed total number of jobs in the queue  GlueCEPolicyMaxRunningJobs: maximum allowed number of running jobs in the queue  GlueCEPolicyPriority: information about the service priority

EGEE tutorial, Ra’anana, Relevant Glue Attributes (contd.) Architecture (objectclass GlueHostArchitecture)  GlueHostArchitecturePlatformType: platform description  GlueHostArchitectureSMPSize: number of CPUs Processor (objectclass GlueHostProcessor)  GlueHostProcessorVendor: name of the CPU vendor  GlueHostProcessorModel: name of the CPU model  GlueHostProcessorVersion: version of the CPU  GlueHostProcessorOtherProcessorDescription: other description for the CPU  […]

EGEE tutorial, Ra’anana, Relevant Glue Attributes (contd.) Application software (objectclass GlueHostApplicationSoftware)  GlueHostApplicationSoftwareRunTimeEnvironment: list of software installed on this host Main memory (objectclass GlueHostMainMemory)  GlueHostMainMemoryRAMSize: physical RAM  GlueHostMainMemoryVirtualSize: size of the configured virtual memory Benchmark (objectclass GlueHostBenchmark)  GlueHostBenchmarkSI00: SpecInt2000 benchmark  GlueHostBenchmarkSF00: SpecFloat2000 benchmark Network adapter (objectclass GlueHostNetworkAdapter)  […]  GlueHostNetworkAdapterOutboundIP: permission for outbound connectivity  GlueHostNetworkAdapterInboundIP: permission for inbound connectivity

EGEE tutorial, Ra’anana, Rank Possible rank values are below reported:  -other.GlueCEStateEstimatedResponseTime (the lowest estimated traversal time)  other.GlueCEStateFreeCPUs (the highest number of free CPUs) Bad idea: number of free CPU published per QUEUE, not per VO  (other.GlueCEStateWaitingJobs == 0 ? other.GlueCEStateFreeCPUs : -other.GlueCEStateWaitingJobs) (the number of waiting jobs is used if this number is not null and the rank decreases as the number of waiting jobs gets higher; if there are not waiting jobs, the number of free CPUs is used)