Enabling Grids for E-sciencE SGE J. Lopez, A. Simon, E. Freire, G. Borges, K. M. Sephton All Hands Meeting Dublin, Ireland 12 Dec 2007 Batch system support.


Enabling Grids for E-sciencE
SGE: Batch system support in EGEE-III
J. Lopez, A. Simon, E. Freire, G. Borges, K. M. Sephton
All Hands Meeting, Dublin, Ireland, 12 Dec 2007

Slide 2: Outline
- Priorities and sharing experience
- Who is Who
- SGE in EGEE-III
- New YAIM functions
- Batch systems stress tests
- Special configurations
- To come...
- References

Slide 3: Who is Who
London e-Science Centre, Imperial College
- Keith Sephton: principal maintainer of the SGE Information Provider; latest IP version (0.9, for the lcg-CE) in certification
LIP
- Gonçalo Borges: principal SGE YAIM functions developer; SGE RPM packager

Slide 4: Who is Who
RAL and CESGA
- RAL APEL team / Pablo Rey (CESGA): APEL SGE developers; certified version of glite-apel-sge
CESGA
- Javier Lopez, Esteban Freire, Alvaro Simon: SGE JobManager maintainers and developers; SGE certification testbed; SGE stress testbed

Slide 5: SGE in EGEE-III
SGE roadmap
- SGE Patch #1474 released; certification testbed in progress
- Patch originally designed for SL3; tested on SL4 with only minor issues encountered
- New fixed version developed by Gonçalo Borges
SGE will be in the PreProduction system
- Pre-deployment tests: which sites?
- SGE released to PPS sites

Slide 6: SGE in EGEE-III
SGE will be in production sites
- At this moment only a few sites are running SGE, without support
- EGEE support for SGE at production sites
- SGE YAIM functions in the production repositories

Slide 7: New YAIM functions
Configuring a CE with an SGE batch system server:
  /opt/glite/yaim/bin/yaim -c -s site-info.def -n lcg-CE -n SGE_server -n SGE_utils
- The SGE server can be on another box: SGE_SERVER=$CE_HOST
- The glite-yaim-sge-server package provides SGE_server, which creates the configuration files needed to deploy an SGE QMASTER batch server
- The glite-yaim-sge-utils package provides SGE_utils, whose functions configure the SGE Information Provider, the SGE JobManager, and the APEL SGE parser

Slide 8: New YAIM functions
Configuring a WN with SGE:
  /opt/glite/yaim/bin/yaim -c -s site-info.def -n WN -n SGE_client
- The glite-yaim-sge-client package provides SGE_client, which enables a gLite-WN to work as an SGE execution node

Slide 9: Batch systems comparison (LRMS, pros and cons)

Condor
- Pros: CPU harvesting; special ClassAds language; dynamic checkpointing and migration; mechanisms for a Globus interface
- Cons: not optimal for parallel applications; complex configuration

Torque/Maui
- Pros: very well known because it comes from PBS (Torque = PBS + bug fixes); good integration of parallel libraries; flexible job scheduling policies (fair share, backfilling, resource reservations); very good support in gLite
- Cons: two separate products, hence two separate configurations; no user-friendly GUI for configuration and management; uncertain software development; poor documentation

LSF
- Pros: flexible job scheduling policies; advanced resource management; checkpointing and job migration, load balancing; good graphical interfaces to monitor cluster functionality
- Cons: expensive commercial product

Slide 10: Batch systems stress tests
Grid Engine: an open source job management system developed by Sun
- Queues are located on server nodes and have attributes that characterize the properties of the different servers
- A user may request certain execution features at submission time: memory, execution speed, available software licences, etc.
- Submitted jobs wait in a holding area where their requirements/priorities are determined; a job only runs if there are queues (servers) matching its requests
- N1 Grid Engine: the commercial version, including support from Sun
Some important features:
- Extensive operating system support
- Flexible scheduling policies: priority; urgency; ticket-based (share-based, functional, override)
- Subordinate queues
- Array jobs
- Interactive jobs (qlogin)
- Complex resource attributes
- Shadow master hosts (high availability)
- Accounting and Reporting Console (ARCo)
- Tight integration of parallel libraries
- Calendars for fluctuating resources
- Checkpointing and migration
- DRMAA 1.0
- Transfer-queue Over Globus (TOG)
- Intuitive graphical interface, used by users to manage jobs and by admins to configure and monitor their cluster
- Good documentation: Administrator's Guide, User's Guide, mailing lists, wiki, blogs
- Enterprise-grade scalability: 10,000 nodes per master (promised)
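As an illustrative sketch of requesting execution features at submission time (mem_free and h_rt are standard SGE complex attributes; job.sh and the resource values are placeholders, not from the slides):

```shell
# Request a node with at least 2 GB free memory and a 1-hour hard runtime limit.
qsub -l mem_free=2G,h_rt=01:00:00 job.sh

# Start an interactive session on a node matching the request.
qlogin -l mem_free=1G
```

The scheduler holds the job until a queue whose attributes match the -l request becomes available, as described above.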

Slide 11: Batch systems stress tests
SGE stress testbed
- Based on the Torque/Maui tests
- Good response with a large number of jobs
- The SGE daemons show good memory behaviour
- By default SGE does not accept more than 100 queued jobs; the SGE configuration should be changed to avoid this:
  qconf -msconf   (scheduler configuration)
    maxujobs 7000
    max_functional_jobs_to_schedule 7000
  qconf -mconf    (global cluster configuration)
    finished_jobs 7000
    max_u_jobs 7000
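A minimal sketch of verifying the raised limits, assuming qconf is run on an SGE admin host (the show variants -ssconf/-sconf are the read-only counterparts of the -msconf/-mconf editors above):

```shell
# Scheduler configuration: maxujobs, max_functional_jobs_to_schedule.
qconf -ssconf | grep -E 'maxujobs|max_functional_jobs_to_schedule'

# Global cluster configuration: finished_jobs, max_u_jobs.
qconf -sconf | grep -E 'finished_jobs|max_u_jobs'
```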

Slide 12: Batch systems stress tests (SGE stress testbed results, figure)

Slide 13: Batch systems stress tests (figure)
LRMS memory management after stress tests over a long period of time.

Slide 14: Special configurations
SGE production configuration at CESGA
- The CESGA production site runs two batch systems at the same CE: Torque/Maui and SGE (the latter only available for the CESGA VO)
- The SGE server is on another machine (SVGD): the CE submits jobs to SVGD through our SGE JobManager (qsub), and the SVGD nodes are on another network (private IPs)
- To send a job, the desired batch system must be specified in the JDL:
  Requirements = other.GlueCEUniqueID == "ce2.egee.cesga.es:2119/jobmanager-lcgsge-GRID"
  Requirements = other.GlueCEUniqueID == "ce2.egee.cesga.es:2119/jobmanager-lcgpbs-cesga"
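A sketch of a complete submission from a gLite UI; everything except the Requirements expression (the executable, sandbox entries, and file names) is a hypothetical minimal example, not taken from the slides:

```shell
# Write a minimal JDL that pins the job to the SGE CE via its GlueCEUniqueID.
cat > sge-job.jdl <<'EOF'
Executable    = "/bin/hostname";
StdOutput     = "std.out";
StdError      = "std.err";
OutputSandbox = {"std.out", "std.err"};
Requirements  = other.GlueCEUniqueID == "ce2.egee.cesga.es:2119/jobmanager-lcgsge-GRID";
EOF

# Submit through the gLite WMS (-a: automatic proxy delegation).
glite-wms-job-submit -a sge-job.jdl
```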

Slide 15: Special configurations
CESGA production schema (diagram)
- The UI submits to the CE, which runs both the Torque/Maui pbs_server and the SGE JobManager
- The SGE JobManager sends a qsub to the SGE batch server on SVGD (sge_qmaster, sge_schedd), named in $SGE_ROOT/default/common/act_qmaster
- SVGD worker nodes (private IPs): sge_execd, TAR-WN
- Torque/Maui worker nodes (public IPs): pbs_mom, gLite-WN 3.1

Slide 16: To come...
- SGE for the CREAM CE: a new SGE BLAHP parser
- GridICE sensor for SGE: a new GridICE sensor for the SGE daemons, like the existing one for PBS

Slide 17: References
- SGE stress testbed
- Grid Engine
- N1 Grid Engine
- SGE wiki page

Slide 18: Thank you!