Grid Workload Management Massimo Sgaravatto INFN Padova.


Grid Workload Management WP
- Goal: define and implement a suitable architecture for distributed scheduling and resource management in a GRID environment
  - Large heterogeneous environment
  - Large numbers (thousands) of independent users
- Many challenging issues:
  - Optimizing the choice of execution location based on the availability of data, computation and network resources
  - Uniform interface to possibly different local resource management systems under different administrative domains
  - Priorities, policies on resource usage
  - Reliability, scalability, …

Approach
- We need much more experience with the various grid issues
- The application requirements are not completely defined yet
  - They will evolve as more familiarity with the grid model is acquired
- → Fast prototyping instead of a classic top-down approach

Current activities
- Report on current technology on Grid scheduling and resource management
  - Globus resource management
  - Condor
  - Survey on Grid scheduling systems
- Focus on the implementation of a first prototype workload management system
  - This part will be plugged together with the other parts implemented by the other WPs to form the project month 9 (September) deliverable
- Grid accounting

Functionalities foreseen for the 1st release
- First version of job description language (JDL)
- First version of resource broker
- Job submission service
- First version of bookkeeping and logging services
- First user interface

Block diagram of the currently foreseen components of the workload management system
- Not a real architecture
- Functional interactions among the various components
- Dependencies on “external” functionalities

Job Description Language (JDL)
- First release of the job description language (JDL), used when the job is submitted to specify the job characteristics (application, input data set id, resources [required and preferable], …)
- A document describing the syntax and semantics of a “prototype” JDL, based on Condor ClassAds, was prepared
- Ready to collect feedback from applications
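As an illustration of what such a job description might carry, the following Python sketch models a job as a set of attributes and renders it in a ClassAd-like bracketed form. The attribute names (Executable, InputData, Requirements, Rank), the values and the exact rendering are assumptions made for this example; they are not the syntax defined in the prototype JDL document.

```python
# Attributes whose values are expressions and are therefore left unquoted
# (an assumption of this sketch, not a rule taken from the JDL document).
EXPRESSION_ATTRS = {"Requirements", "Rank"}

# Hypothetical job description: all names and values are illustrative.
job = {
    "Executable": "simulate.sh",
    "InputData": "dataset-001",
    # Required resources: the job cannot run where this is false.
    "Requirements": 'other.OpSys == "LINUX" && other.FreeCPUs >= 1',
    # Preferable resources: higher values are preferred.
    "Rank": "other.FreeCPUs",
}


def render_classad_like(attrs):
    """Render a dict of attributes as a ClassAd-like text block."""
    lines = ["["]
    for name, value in attrs.items():
        rendered = value if name in EXPRESSION_ATTRS else f'"{value}"'
        lines.append(f"  {name} = {rendered};")
    lines.append("]")
    return "\n".join(lines)


jdl_text = render_classad_like(job)
```

Printing `jdl_text` gives one attribute per line between square brackets, which is enough to show how required and preferable resources can be kept apart as two separate expressions.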

Resource Broker
- First version of the resource broker, which chooses the computing resources (queues or “single” nodes) where jobs are submitted, considering:
  - Access policies (grid-mapfiles in the Globus based prototype)
  - Characteristics and status of resources
  - Availability of input data set
  - Availability of the required run time/application environments
- Resources required specified in the JDL
- Resources required published in an Information Space (Globus GIS in the first prototype) + Replica Catalog
- Ongoing implementation based on the Condor matchmaking library (see Salvatore’s presentation)
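The selection criteria listed for the broker can be sketched as a filter followed by a ranking. This is a plain Python illustration, not the Condor matchmaking library; the `Resource` and `Job` fields, the sample sites and the "most free CPUs wins" ranking are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Resource:
    name: str
    authorized_users: Set[str]   # stands in for the access policy
    free_cpus: int               # stands in for resource status
    datasets: Set[str]           # input data sets available locally
    environments: Set[str]       # installed run time environments


@dataclass
class Job:
    owner: str
    input_dataset: str
    environment: str
    min_free_cpus: int = 1


def matches(job: Job, res: Resource) -> bool:
    """True if the resource satisfies every requirement of the job."""
    return (
        job.owner in res.authorized_users
        and res.free_cpus >= job.min_free_cpus
        and job.input_dataset in res.datasets
        and job.environment in res.environments
    )


def choose_resource(job: Job, resources: List[Resource]) -> Optional[Resource]:
    """Pick the matching resource with the most free CPUs, or None."""
    candidates = [r for r in resources if matches(job, r)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.free_cpus)


resources = [
    Resource("site1", {"alice"}, 4, {"dataset-001"}, {"sim-env"}),
    Resource("site2", {"alice", "bob"}, 8, {"dataset-001"}, {"sim-env"}),
    Resource("site3", {"bob"}, 16, {"dataset-002"}, {"sim-env"}),
]
chosen = choose_resource(Job("alice", "dataset-001", "sim-env"), resources)
```

Here `site3` is excluded by both the access policy and the missing data set, and `site2` is preferred over `site1` because it has more free CPUs.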

Information Service
- All the information needed by the broker published in one Grid Information Space (Globus GIS/MDS for the first release)
- New MDS 2 alpha release soon available
  - Should address some of the existing shortcomings
- Necessary to implement plug-in modules:
  - Index (for a first level query, to identify a set of candidate resources)
  - Information providers (to publish needed information about resources)
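The split between an index and information providers suggests a two-step lookup: a cheap first-level query to find candidates, then a detailed query only for those. The sketch below shows that idea with in-memory dictionaries; it does not use the Globus GIS/MDS interfaces, and the resource names, attributes and function names are hypothetical.

```python
# Hypothetical index: coarse, rarely changing data about each resource.
INDEX = {
    "site1-ce": {"lrms": "condor"},
    "site2-ce": {"lrms": "lsf"},
    "site3-ce": {"lrms": "pbs"},
    "site4-ce": {"lrms": "lsf"},
}

# Hypothetical information providers: one callable per resource that
# returns its current, detailed status when asked.
PROVIDERS = {
    "site1-ce": lambda: {"free_cpus": 4},
    "site2-ce": lambda: {"free_cpus": 0},
    "site3-ce": lambda: {"free_cpus": 12},
    "site4-ce": lambda: {"free_cpus": 7},
}


def first_level_query(index, lrms):
    """Use the index to identify a set of candidate resources."""
    return sorted(name for name, info in index.items() if info["lrms"] == lrms)


def query_details(candidates, providers):
    """Ask the information provider of each candidate for its status."""
    return {name: providers[name]() for name in candidates}


candidates = first_level_query(INDEX, "lsf")
details = query_details(candidates, PROVIDERS)
```

Only the two LSF resources are contacted for details, which is the point of keeping the first-level query in a separate index.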

Job submission service
- Job submission service based (for the first release) on:
  - Globus GRAM
  - Condor-G on top of Globus GRAM (to implement a reliable job submission service)
- Globus GRAM
  - Comprehensive evaluation already done (collaboration with the “Evaluation of the Globus toolkit” WP)
  - Globus GRAM as uniform interface to different underlying resource management systems (LSF, Condor, PBS)
  - GRAM reporter (GRAM – GIS interaction)
  - RSL
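GRAM requests are expressed in RSL, a conjunction of `(attribute=value)` relations. As a minimal sketch, the helper below assembles a simplified RSL-style string from a few job parameters. The function name `build_rsl` is hypothetical, only three attributes are handled, and quoting and multi-valued attributes are deliberately left out, so this should be read as an illustration rather than a complete RSL generator.

```python
def build_rsl(executable, count=1, queue=None):
    """Build a simplified RSL-style request string.

    Values are assumed to contain no spaces or special characters;
    this sketch performs no quoting or validation.
    """
    relations = [("executable", executable), ("count", str(count))]
    if queue is not None:
        relations.append(("queue", queue))
    # "&" introduces a conjunction of relations.
    return "&" + "".join(f"({name}={value})" for name, value in relations)


rsl = build_rsl("/bin/hostname", count=2, queue="short")
```

The same string would be handed to GRAM regardless of whether the underlying system is LSF, Condor or PBS, which is what makes it usable as a uniform interface.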

Job submission service
- Condor-G
  - First prototype implementation already tested
    - Promising, but many problems to fix
  - New Condor-G implementation under testing
    - Many problems fixed, but other issues are still open
  - Another new Condor-G implementation hopefully to be released in a few weeks
    - Exploitation of a new persistent Globus jobmanager
- Actively following the developments of Globus GRAM and Condor-G, and implementing the required customizations

Bookkeeping & Logging
- Job monitoring and control
  - Job status
  - Used resources
  - Start time
  - End time
  - …
- Record of significant events occurring in the workload management system
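One way to picture the two services is a per-job record holding the bookkeeping fields, plus an append-only list of logged events. The sketch below is an assumption about how such a record could look; the class name, status values and field names are hypothetical and not taken from the WP1 design.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class JobRecord:
    """Bookkeeping data for one job, plus its logged events."""
    job_id: str
    status: str = "SUBMITTED"
    resource: Optional[str] = None
    start_time: Optional[float] = None
    end_time: Optional[float] = None
    events: List[Tuple[float, str]] = field(default_factory=list)

    def log_event(self, timestamp: float, description: str) -> None:
        """Append a significant event to the job's log."""
        self.events.append((timestamp, description))

    def mark_running(self, timestamp: float, resource: str) -> None:
        self.status = "RUNNING"
        self.resource = resource
        self.start_time = timestamp
        self.log_event(timestamp, f"started on {resource}")

    def mark_done(self, timestamp: float) -> None:
        self.status = "DONE"
        self.end_time = timestamp
        self.log_event(timestamp, "finished")


record = JobRecord("job-42")
record.log_event(100.0, "accepted by broker")
record.mark_running(105.0, "site2")
record.mark_done(165.0)
elapsed = record.end_time - record.start_time
```

Timestamps are passed in explicitly so the example stays deterministic; a real service would take them from the system clock.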

User interface
- Command-line, for job management operations:
  - List of resources “suitable” to run a job
  - Job submission (with the possibility to specify where to submit the job, or leaving this choice to the broker)
  - Job status monitoring
  - Job removal
  - Access to bookkeeping info for the job
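The operations listed above map naturally onto subcommands of a single command-line tool. The following sketch uses Python's standard `argparse` module to show that shape; the program name `gridjob`, the subcommand names and the `--resource` option are hypothetical and do not correspond to the actual WP1 user interface commands.

```python
import argparse


def build_parser():
    """Build a parser with one subcommand per job management operation."""
    parser = argparse.ArgumentParser(prog="gridjob")
    sub = parser.add_subparsers(dest="command", required=True)

    list_match = sub.add_parser("list-match", help="list suitable resources")
    list_match.add_argument("jdl_file")

    submit = sub.add_parser("submit", help="submit a job")
    submit.add_argument("jdl_file")
    # If omitted, the choice of resource is left to the broker.
    submit.add_argument("--resource", default=None)

    for name, text in [
        ("status", "show job status"),
        ("cancel", "remove a job"),
        ("logging-info", "show bookkeeping info for a job"),
    ]:
        cmd = sub.add_parser(name, help=text)
        cmd.add_argument("job_id")

    return parser


parser = build_parser()
by_broker = parser.parse_args(["submit", "job.jdl"])
by_user = parser.parse_args(["submit", "job.jdl", "--resource", "site2"])
status = parser.parse_args(["status", "job-42"])
```

Leaving `--resource` unset models the case where the broker chooses the execution location; setting it models the user's explicit choice.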

Workload management system (1st prototype)
[Diagram. Labels: Submit jobs (using JDL [ClassAds]); Broker; Resource Discovery; GIS + Replica Catalog; Information on characteristics and status of local resources; Other info; Job submission service; Condor-G; Site1, Site2, Site3, each with a Globus GRAM in front of a Local Resource Management System (Condor, LSF, PBS); Farms]
- Globus GRAM as uniform interface to different local resource management systems
- Condor-G able to provide a reliable/crash-proof job submission service
- Broker chooses the Globus resources to which the jobs must be submitted

Grid Accounting
- New problem
- Working systems (even prototype implementations) don’t exist yet
- Economy-based model for Grid accounting?
- See Stefano’s presentation

Deliverables foreseen in the INFN-GRID proposal
- D2.1.1 Technical assessment about Globus and Condor, interactions and usage (5/2001)
  - Done
- D2.1.2 First resource broker implementation for high throughput applications (7/2001)
  - The resource broker should be easily customizable for high throughput applications
  - Usable after M9 release

Deliverables foreseen in the INFN-GRID proposal
- D2.1.3 Comparison of different local resource managers (10/2001)
  - Condor, LSF, PBS
  - Farms with these resource management systems already in place and instrumented with the Globus software
- D2.1.4 Study of the three workload systems and implementation of the workload system for Monte Carlo productions (12/2001)
  - Should be achievable