EDG - WP1 (Grid Work Scheduling) Status and plans Massimo Sgaravatto INFN Padova.

Presentation transcript:

EDG - WP1 (Grid Work Scheduling) Status and plans Massimo Sgaravatto INFN Padova

WP1 Workload Management System
- Ability to submit a job (described via the Condor ClassAd-based Job Description Language, or JDL) to the DataGrid testbed from any user machine
- The WP1 client allows the user to monitor and control (terminate) the job, and to transfer a "small" amount of data between the client machine and the executing machine
- WP1's Resource Broker (RB) chooses an appropriate computing resource for the job, based on the constraints specified in the JDL:
  - where the submitting user has proper authorization
  - that matches the characteristics specified in the job ClassAd (architecture, computing power, application environment, etc.)
  - where the specified input data (and possibly the chosen output SE) are determined to be "close enough" by the appropriate resource administrators
- Throughout this process, WP1's Logging and Bookkeeping services maintain a "state machine" view of each job
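The "state machine" view kept by the Logging and Bookkeeping services can be pictured with a small sketch. The slide does not list the states or the allowed transitions, so the state names, the transition table and the `JobRecord` class below are illustrative assumptions, not the actual L&B implementation.

```python
# Illustrative state names and transitions (assumptions, not taken from the slides).
TRANSITIONS = {
    "SUBMITTED": {"WAITING", "ABORTED"},
    "WAITING": {"READY", "ABORTED"},
    "READY": {"SCHEDULED", "ABORTED"},
    "SCHEDULED": {"RUNNING", "ABORTED"},
    "RUNNING": {"DONE", "ABORTED"},
    "DONE": set(),
    "ABORTED": set(),
}


class JobRecord:
    """Hypothetical bookkeeping record: current state plus full history."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.state = "SUBMITTED"
        self.history = ["SUBMITTED"]

    def log_event(self, new_state):
        # Reject events that do not correspond to an allowed transition.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)


record = JobRecord("job-1")
for event in ("WAITING", "READY", "SCHEDULED", "RUNNING", "DONE"):
    record.log_event(event)
print(record.history)
```

Keeping the whole history rather than only the latest state is one way a client could answer both "where is my job now" and "what happened to it" queries.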

dg-job-submit myjob.jdl

myjob.jdl:
  Executable = "$(CMS)/exe/sum.exe";
  InputData = "LF:testbed ";
  ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
  DataAccessProtocol = "gridftp";
  InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
  OutputSandbox = {"sim.err", "test.out", "sim.log"};
  Requirements = other.Architecture == "INTEL" && other.OpSys == "LINUX Red Hat 6.2";
  Rank = other.FreeCPUs;
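A minimal sketch of how the Requirements and Rank expressions in this JDL could drive the choice of a computing resource: Requirements acts as a filter over candidate CEs, and Rank orders the survivors (higher is preferred, as in Condor matchmaking). The CE list, its names and its values are invented for illustration, and the two expressions are hand-translated into Python functions rather than parsed from the JDL.

```python
# Hypothetical CE descriptions. Attribute names follow the JDL example;
# the CE names and values are invented for illustration.
CES = [
    {"Name": "ce-a", "Architecture": "INTEL", "OpSys": "LINUX Red Hat 6.2", "FreeCPUs": 4},
    {"Name": "ce-b", "Architecture": "INTEL", "OpSys": "LINUX Red Hat 6.2", "FreeCPUs": 12},
    {"Name": "ce-c", "Architecture": "SPARC", "OpSys": "Solaris", "FreeCPUs": 64},
]


def requirements(other):
    # Requirements = other.Architecture == "INTEL" && other.OpSys == "LINUX Red Hat 6.2";
    return other["Architecture"] == "INTEL" and other["OpSys"] == "LINUX Red Hat 6.2"


def rank(other):
    # Rank = other.FreeCPUs;
    return other["FreeCPUs"]


def match(ces, requirements, rank):
    """Return the highest-ranked CE that satisfies the requirements, or None."""
    candidates = [ce for ce in ces if requirements(ce)]
    if not candidates:
        return None
    return max(candidates, key=rank)


best = match(CES, requirements, rank)
print(best["Name"])
```

Here `ce-c` has the most free CPUs but is filtered out by Requirements, so the rank only decides between `ce-a` and `ce-b`. Authorization and data "closeness" checks, which the real broker also applies, are left out of this sketch.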

WP1 current activities
- Support and bug fixes for the EDG 1.2 release
- Addressing RB malfunctions under heavy load
  - Crisis threshold raised from ~300 to ~600 simultaneous jobs
- Other problems (in the underlying Globus services affecting WP1 software)
  - Globus GASS cache problems (under stress conditions)
  - Problems with MDS, and therefore also in the II: it gets stuck when a remote GRIS can't be contacted
    - Possible patch under investigation
- Working on year 2 new functionalities

New functionalities in release 1.2
- Automatic proxy renewal
  - User credential renewed in RB/JSS and CE securely, without user interaction
  - Use of Globus MyProxy
  - Interim (working!) solution
  - "Cleaner" solution later, when mechanisms to forward the "fresh" proxy to the jobmanager are available in the standard Globus distribution (GRAM 1.6?) and exploited in CondorG
- Automatic job resubmission
  - If a job fails because of a "Grid problem" (e.g. Globus GASS cache problems, gridftp transfer failure, etc.), the job is rescheduled (possibly on a different CE) and resubmitted
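The automatic resubmission behaviour can be sketched as a retry loop that reacts only to infrastructure failures and prefers a CE that has not been tried yet. `GridError`, `run_with_resubmission`, the `submit` callback and the `max_attempts` limit are all hypothetical names and parameters introduced for this illustration; the slides do not describe how WP1 implements the feature.

```python
class GridError(Exception):
    """Hypothetical marker for a "Grid problem" (as opposed to an application error)."""


def run_with_resubmission(job, ces, submit, max_attempts=3):
    """Try the job on successive CEs until one attempt succeeds.

    `submit(job, ce)` is a caller-supplied function that returns a result or
    raises GridError. Any other exception is treated as an application error
    and propagates immediately, without resubmission.
    """
    tried = []
    for _ in range(max_attempts):
        # Prefer a CE that has not been tried; fall back to all CEs otherwise.
        remaining = [ce for ce in ces if ce not in tried] or list(ces)
        ce = remaining[0]
        tried.append(ce)
        try:
            return ce, submit(job, ce)
        except GridError:
            continue  # reschedule, possibly on a different CE
    raise RuntimeError(f"job failed after {max_attempts} attempts on {tried}")
```

A caller would pass its own `submit` function; for example, one that raises `GridError` on a GASS cache failure would cause the job to move on to the next CE in the list.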

New functionalities in release 1.3 (end of June)
- APIs for the applications
  - C++ as a first step
  - Java bindings later, if needed by applications
- Ability to submit MPI jobs
  - Starting by considering MPI jobs within a single CE
  - MPICH implementation with LSF and PBS

Year 2 activities
- Working on review of architecture
  - Increase reliability and flexibility
    - Single-threaded, one-shot RB service, plugged into CondorG
    - RB offers only matchmaking functionality: it is just responsible for finding the best CEs
    - Simplification (e.g. minimize duplication of persistent information by relying on the CondorG queue)
  - Support new functionalities
  - Favor interoperability with other Grid frameworks, by allowing WP1 modules (i.e. the RB) to be used also "outside" the WP1 WMS
    - The RB can be called, for example, by CondorG
    - Coordination between EDG WP1 and PPDG to define common guidelines

Other year 2 activities
- Support for interactive jobs
  - Jobs running on some CE worker node where a channel to the submitting (UI) node is available for the standard streams (PROOF-like applications)
  - Working on a solution based on Condor Bypass
- Support for job dependencies
  - Integration of Condor DAGMan
  - "Lazy" scheduling: a job (node) is bound to a resource (by the RB) just before that job is submitted
  - Ongoing discussions with the Condor team to agree on a common recipe for the ClassAd representation of DAGs
- Integration of the EDG WP2 Query Optimisation Service
  - Helps the RB find the best CE based on data location
  - Triggers input data transfer
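"Lazy" scheduling of a job DAG can be illustrated as follows: the resource for a node is chosen only when all of its dependencies have completed, immediately before submission, instead of binding every node up front. This is a sketch under assumptions; `run_dag`, `choose_ce` and `submit` are hypothetical names, jobs are run one after another for simplicity, and the standard-library `graphlib` module (Python 3.9+) stands in for DAGMan.

```python
from graphlib import TopologicalSorter


def run_dag(deps, choose_ce, submit):
    """Run the nodes of a DAG in dependency order with late resource binding.

    `deps` maps each node to the set of nodes it depends on.
    `choose_ce(node)` plays the role of the RB and is called only when the
    node is ready to run; `submit(node, ce)` runs the node to completion.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    log = []
    while sorter.is_active():
        for node in sorter.get_ready():
            ce = choose_ce(node)  # binding happens here, just before submission
            submit(node, ce)
            log.append((node, ce))
            sorter.done(node)
    return log


# Hypothetical three-node workflow: two simulations feeding one merge step.
deps = {"merge": {"simA", "simB"}, "simA": set(), "simB": set()}
log = run_dag(deps, choose_ce=lambda node: "ce-for-" + node, submit=lambda node, ce: None)
print(log)
```

Binding late means the choice for `merge` can take into account the state of the Grid after `simA` and `simB` have finished, for instance where their output ended up.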

Other year 2 activities
- Support for "trivial" job checkpointing
  - The user defines what constitutes a state of their job (pairs) and can save a state
  - Computation can be restarted from a previously saved state
- Support for job partitioning
  - Use of job checkpointing and DAGMan mechanisms
  - The original job is partitioned into sub-jobs, which can be executed in parallel
  - At the end, each sub-job must save a final state, which is then retrieved by a job aggregator responsible for collecting the results of the sub-jobs and producing the overall output
- Integration of advance reservation/co-allocation
  - Globus GARA-based approach
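A minimal sketch of "trivial" checkpointing, assuming the job state is a set of name/value pairs that the application itself chooses to save. The file-based storage, the JSON encoding and the function names (`save_state`, `load_state`, `process`) are assumptions made for illustration and are not the WP1 checkpointing API.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Persist the state atomically so a crash never leaves a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_state(path):
    """Return the last saved state, or None if no state has been saved."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def process(n_events, path, crash_at=None):
    """Sum event indices, saving a state after each event.

    `crash_at` simulates a failure just before the given event is processed.
    """
    state = load_state(path) or {"next_event": 0, "total": 0}
    for i in range(state["next_event"], n_events):
        if crash_at is not None and i == crash_at:
            raise RuntimeError("simulated failure")
        state["total"] += i
        state["next_event"] = i + 1
        save_state(path, state)
    return state["total"]


with tempfile.TemporaryDirectory() as workdir:
    state_file = os.path.join(workdir, "job.state")
    try:
        process(10, state_file, crash_at=4)
    except RuntimeError:
        pass
    print(load_state(state_file))   # state saved before the failure
    print(process(10, state_file))  # restart resumes from the saved state
```

The restarted run picks up at the first unprocessed event, so its result is the same as that of an uninterrupted run.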

Other year 2 activities
- Grid Accounting
  - Based upon a computational economy model
  - Users pay in order to execute their jobs on the resources, and the owners of the resources earn credits by executing the users' jobs
  - Aim: a nearly stable equilibrium able to satisfy the needs of both resource "producers" and "consumers"
  - Aim: credit job resource usage to the resource owner(s) after execution
- GUI
  - Python-based GUI already implemented
  - Java components in the works
- Possible use of EDG WP3 R-GMA for L&B services
  - Tests and integration ongoing
  - Discussions with WP3 to obtain some missing pieces
- Improving error reporting
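The computational economy idea (users pay, resource owners earn) can be sketched as a simple ledger that moves credits from the user to the resource owner once a job has run. The `Ledger` class, the integer credit unit and the per-CPU-second pricing are assumptions chosen for illustration; the slides do not specify the pricing model.

```python
from collections import defaultdict


class Ledger:
    """Hypothetical credit ledger shared by users and resource owners."""

    def __init__(self):
        self.balance = defaultdict(int)

    def deposit(self, account, amount):
        self.balance[account] += amount

    def charge_job(self, user, owner, cpu_seconds, price_per_cpu_second):
        """Transfer the cost of a finished job from the user to the owner."""
        cost = cpu_seconds * price_per_cpu_second
        if self.balance[user] < cost:
            raise ValueError(f"{user} cannot pay {cost} credits")
        self.balance[user] -= cost
        self.balance[owner] += cost
        return cost


ledger = Ledger()
ledger.deposit("alice", 100)
ledger.charge_job("alice", "site-owner", cpu_seconds=30, price_per_cpu_second=2)
print(dict(ledger.balance))
```

Because every charge is a transfer, the total number of credits in the system stays constant, which is one simple precondition for the "nearly stable equilibrium" the slide mentions.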

Other year 2 activities
- Matchmaking in the RB considering also SE characteristics (besides CE)
  - Gangmatching (matchmaking between multiple [>2] entities)
  - Ongoing discussions with the Condor team to decide how to proceed
- Use of the new Glue IS Schema
  - Goal: same IS schema between HENP US and EU Grid projects
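Gangmatching extends two-party matchmaking to more than two entities. The sketch below matches a job against (CE, SE) pairs at once: the CE must satisfy the job, the SE must hold the input data, and the SE must be "close" to the CE. All attribute names (`CloseSEs`, `Files`, and so on) and the sample data are hypothetical, and the exhaustive search over pairs is only the simplest possible strategy, not the Condor gangmatching algorithm.

```python
from itertools import product


def gangmatch(job, ces, ses):
    """Return the (CE, SE) pair with the most free CPUs that satisfies the job."""
    best = None
    for ce, se in product(ces, ses):
        if ce["Architecture"] != job["Architecture"]:
            continue  # CE does not satisfy the job
        if job["InputData"] not in se["Files"]:
            continue  # SE does not hold the input data
        if se["Name"] not in ce["CloseSEs"]:
            continue  # SE is not close to this CE
        if best is None or ce["FreeCPUs"] > best[0]["FreeCPUs"]:
            best = (ce, se)
    return best


# Invented sample data for illustration.
job = {"Architecture": "INTEL", "InputData": "run42.dat"}
ces = [
    {"Name": "ce-a", "Architecture": "INTEL", "FreeCPUs": 20, "CloseSEs": ["se-x"]},
    {"Name": "ce-b", "Architecture": "INTEL", "FreeCPUs": 5, "CloseSEs": ["se-y"]},
]
ses = [
    {"Name": "se-x", "Files": ["other.dat"]},
    {"Name": "se-y", "Files": ["run42.dat"]},
]
ce, se = gangmatch(job, ces, ses)
print(ce["Name"], se["Name"])
```

In this example `ce-a` has more free CPUs, but the only SE close to it lacks the input file, so the pair (`ce-b`, `se-y`) is selected. A CE-only match would have picked `ce-a`.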

Other info