EGEE is a project funded by the European Union under contract IST-2003-508833 Workload management system testing : - what exists - further testing planned.

Slides:



Advertisements
Similar presentations
Workload Management David Colling Imperial College London.
Advertisements

EU 2nd Year Review – Jan – Title – n° 1 WP1 Speaker name (Speaker function and WP ) Presentation address e.g.
Workload management Owen Maroney, Imperial College London (with a little help from David Colling)
Job Submission The European DataGrid Project Team
Development of test suites for the certification of EGEE-II Grid middleware Task 2: The development of testing procedures focused on special details of.
INFSO-RI Enabling Grids for E-sciencE EGEE Middleware The Resource Broker EGEE project members.
GRID Workload Management System Massimo Sgaravatto INFN Padova.
EGEE is a project funded by the European Union under contract IST JRA1 Testing Activity: Status and Plans Leanne Guy EGEE Middleware Testing.
5 November 2001F Harris GridPP Edinburgh 1 WP8 status for validating Testbed1 and middleware F Harris(LHCb/Oxford)
Glite I/O Storm Testing in EDG-LCG Framework Elena Slabospitskaya, Vadim Petukhov, (IHEP, Russia) Gilbert Grosdidier, (CNRC, France) NEC'2005, Sept 16.
Workload Management WP Status and next steps Massimo Sgaravatto INFN Padova.
INFSO-RI Enabling Grids for E-sciencE Logging and Bookkeeping and Job Provenance Services Ludek Matyska (CESNET) on behalf of the.
EGEE is a project funded by the European Union under contract IST Testing processes Leanne Guy Testing activity manager JRA1 All hands meeting,
FP6−2004−Infrastructures−6-SSA IPv6 and Grid Middleware: the EUChinaGRID experience Gabriella Paolini – GARR Valentino.
Grid Workload Management & Condor Massimo Sgaravatto INFN Padova.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks gLite IPv6 compliance project tests Further.
DataGrid WP1 Massimo Sgaravatto INFN Padova. WP1 (Grid Workload Management) Objective of the first DataGrid workpackage is (according to the project "Technical.
Nadia LAJILI User Interface User Interface 4 Février 2002.
LCG Middleware Testing in 2005 and Future Plans E.Slabospitskaya, IHEP, Russia CERN-Russia Joint Working Group on LHC Computing March, 6, 2006.
Grid Workload Management Massimo Sgaravatto INFN Padova.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Using gLite API Vladimir Dimitrov IPP-BAS “gLite middleware Application Developers.
First attempt for validating/testing Testbed 1 Globus and middleware services WP6 Meeting, December 2001 Flavia Donno, Marco Serra for IT and WPs.
GLite – An Outsider’s View Stephen Burke RAL. January 31 st 2005gLite overview Introduction A personal view of the current situation –Asked to be provocative!
DataGRID WPMM, Geneve, 17th June 2002 Testbed Software Test Group work status for 1.2 release Andrea Formica on behalf of Test Group.
JRA Execution Plan 13 January JRA1 Execution Plan Frédéric Hemmer EGEE Middleware Manager EGEE is proposed as a project funded by the European.
LCG EGEE is a project funded by the European Union under contract IST LCG PEB, 7 th June 2004 Prototype Middleware Status Update Frédéric Hemmer.
EGEE is a project funded by the European Union under contract IST Tools survey status, first experiences with the prototype Diana Bosio EGEE.
1 Grid2Win: porting of gLite middleware to Windows Dario Russo INFN Catania
Glite. Architecture Applications have access both to Higher-level Grid Services and to Foundation Grid Middleware Higher-Level Grid Services are supposed.
FP6−2004−Infrastructures−6-SSA E-infrastructure shared between Europe and Latin America Grid2Win: Porting of gLite middleware to.
Workload Management System Jason Shih WLCG T2 Asia Workshop Dec 2, 2006: TIFR.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Grid2Win : gLite for Microsoft Windows Roberto.
Testing and integrating the WLCG/EGEE middleware in the LHC computing Simone Campana, Alessandro Di Girolamo, Elisa Lanciotti, Nicolò Magini, Patricia.
INFSO-RI Enabling Grids for E-sciencE /10/20054th EGEE Conference - Pisa1 gLite Configuration and Deployment Models JRA1 Integration.
EDG - WP1 (Grid Work Scheduling) Status and plans Massimo Sgaravatto INFN Padova.
Grid Workload Management (WP 1) Massimo Sgaravatto INFN Padova.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Practical using WMProxy advanced job submission.
WP1 WMS release 2: status and open issues Massimo Sgaravatto INFN Padova.
EGEE 3 rd conference - Athens – 20/04/2005 CREAM JDL vs JSDL Massimo Sgaravatto INFN - Padova.
WP1 Status and plans Francesco Prelz, Massimo Sgaravatto 4 th EDG Project Conference Paris, March 6 th, 2002.
INFSO-RI Enabling Grids for E-sciencE gLite Test and Certification Effort Nick Thackray CERN.
JRA1 Testing Current Status Leanne Guy Testing Coordination Meeting, 13 th September 2004 EGEE is a project funded by the European.
Tests at Saclay D. Calvet, A. Formica, Z. Georgette, I. Mandjavidze, P. Micout DAPNIA/SEDI, CEA Saclay Gif-sur-Yvette Cedex.
EGEE is a project funded by the European Union under contract IST LCG open issues Massimo Sgaravatto INFN Padova JRA1 IT-CZ cluster meeting,
EGEE is a project funded by the European Union under contract IST Issues from current Experience SA1 Feedback to JRA1 A. Pacheco PIC Barcelona.
II EGEE conference Den Haag November, ROC-CIC status in Italy
Consorzio COMETA - Progetto PI2S2 UNIONE EUROPEA Grid2Win : gLite for Microsoft Windows Elisa Ingrà - INFN.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Job Management Claudio Grandi.
INFSO-RI Enabling Grids for E-sciencE Padova site report Massimo Sgaravatto On behalf of the JRA1 IT-CZ Padova group.
EGEE is a project funded by the European Union under contract IST Datamat Status Report F. Pacini Datamat S.p.a. Milan, IT-CZ JRA1 meeting,
Enabling Grids for E-sciencE Work Load Management & Simple Job Submission Practical Shu-Ting Liao APROC, ASGC EGEE Tutorial.
INFSO-RI Enabling Grids for E-sciencE Padova site report Massimo Sgaravatto On behalf of the JRA1 IT-CZ Padova group.
EGEE is a project funded by the European Union under contract IST Padova report Massimo Sgaravatto On behalf of the INFN Padova JRA1 Group.
INFSO-RI Enabling Grids for E-sciencE CREAM, WMS integration and possible deployment scenarios Massimo Sgaravatto – INFN Padova.
Practical using C++ WMProxy API advanced job submission
Grid2Win Porting of gLite middleware to Windows XP platform
CEMon
WP1 WMS release 2: status and open issues
Andreas Unterkircher CERN Grid Deployment
Workload Management System ( WMS )
Grid2Win: Porting of gLite middleware to Windows XP platform
Introduction to Grid Technology
Grid2Win: Porting of gLite middleware to Windows XP platform
Testing for patch certification
Short update on the latest gLite status
Leanne Guy EGEE JRA1 Test Team Manager
CMS report from FNAL demo week Marco Verlato (INFN-Padova)
Porting LCG to IA64 Andreas Unterkircher CERN openlab May 2004
Presentation transcript:

EGEE is a project funded by the European Union under contract IST Workload management system testing : - what exists - further testing planned Mario Reale CERN JRA-1 middleware testing team JRA1 all-hands Padova, November 15-17,

JRA1 all-hands meeting, November17 th, Contents Some initial general remarks Current WMS components under test Tests done and main outcome  documentation  bugs & problems  installation and configuration tools Plans to test WMS components in future :  What to test further  Tools

JRA1 all-hands meeting, November17 th, some initial remarks - Initially ( 2 months ago) following « webwritings » instructions - tarballs, modified executables from spread sources, cut & paste ad-hoc procedures…. [ i.e. the real original prototype taste ] - non standard paths and locations - Now : - S/W versioning OK - build system fully in place - pure RPMs installations - « 2 click away » installations for a given Iteam build: -./glite-ce-install.sh -Downloads and installs all required RPMs - $GLITE_LOCATION/etc/ /config/scripts/glite-ce-config -Configures all services on the machine and starts daemons - Still a few minor points by hand – config scripts still need improvement

JRA1 all-hands meeting, November17 th, WMS functional components UI WN

JRA1 all-hands meeting, November17 th, Workload software components Python wms client UI RGMA client gI/o client Globus clients UI WMS CE LB WN 1 WN 2 WN 3 WN 4 Network Server Workload Manager Local Logger Job Controller Condor-C master schedd collector negotiator startd launcher advertiser Globus gatekeeper Condor-C master schedd BLAHPD LRMS ( PBS serv,sched LSF serv ) Grid FTP Proxy renewd Log Monitor PBS mom PBS mom PBS mom PBS mom logd interlogger bookkeeping srv

JRA1 all-hands meeting, November17 th, WMS components tested so far CE - installed,configured,tested with release I CEmon - not yet tested. WMS - installed,configured,tested with release I LB - installed,configured,tested with release I WN - although an LSF recipe exists, tested only as PBS client. LSF via Quattor ASAP ( already available on the prototype tb)

JRA1 all-hands meeting, November17 th, Documentation No official gLite documentation available yet Some recipes available on the web, referring to manual installation available from JRA-1 WMS website Updated doc on all supported classAds /JDL is missing yes ( or just 1:1 with edg? - ) Some first documentation on API recently made available

JRA1 all-hands meeting, November17 th, Computing Element Documentation  The LSF based web recipe available at wm/lsfnode_install.shtml requires some updates, PBS and Torque not available Installation and configuration  Post-installation script available – needs quite some refinement Bugs and Issues  LCAS voms plugins related problems [ 5454 ] - SOLVED Gridsite recompiled ( X509_supported_extension unresolved) Gridsite need to be rebuild using the globus libs instead of -I /usr/include/openssl  LCMAPs / LCMAPs directories for plugins - SOLVED Concerns:  Gridsite support or newer Gridsite (1.1.x) – if not already solved

JRA1 all-hands meeting, November17 th, Workload Management System node Bugs and Issues  Bugs in old blahph versions : SOLVED  Wrong boost library dynamically linked – ( wm, ns daemons not starting ) [ bug # 5364 ] SOLVED  Cannot locate condor_schedd : PENDING [ bug 5525 ] Concerns and Problems - The Condor problem is currently a serious one, preventing further functional tests. contact to developers activated - Some instability in the network server observed on the prototype testbed

JRA1 all-hands meeting, November17 th, “Houston, we have a small problem here..” [lxb1409] ~/prototipo > glite-job-status ************************************************************* BOOKKEEPING INFORMATION: Status info for the Job : Current Status: Done (Failed) Exit code: 0 Status Reason: Got a job held event, reason: Error locating schedd Destination: lxb1422.cern.ch:2119/blah-pbs-short Submitted: Thu Nov 11 00:18: CEST *************************************************************

JRA1 all-hands meeting, November17 th, LB server, Worker Node Documentation  Manual install guide available for LB server  No doc on how to install and configure Worker Nodes Bugs and Issues  The bkserver start script is missing “&” – shell gets blocked  Careful working PBS configuration is quite painful and demanding

JRA1 all-hands meeting, November17 th, First basic Test Cases Defined first 9 basic tests for the Workload System  Valid proxy  Hello World  100 simple jobs without resubmission  100 simple jobs with resubmission  Very large input / output data files ( 2 jobs with a 1 GB file in input/output sandbox – compute checksums and compare)  100 long lasting jobs, no resubmission, to check myproxy’s behaviour  Basic matchmaking : require a specific input file  Basic matchmaking : require a specific RunTimeEnvironment  Structured matchmaking : use many different classads matching at least on CE and InputData set

JRA1 all-hands meeting, November17 th, Many existing Test Code sources LCG certification test suite ( Gilbert Grodidier & LCG certification team )  General LCG certification test suite  / EDG WP8 perl suite ( Jean-Jacques Blaising )  Extracts info from the InfoSys on grid structure  Submits jobs simulating a generic HEP application and retrieves output Many different existing scripts inherited from the EDG era / WP8  Bash  Python  Perl EDG site certification test suite INFN EDG TSTG install and config test suite EDG TSTG test suite

JRA1 all-hands meeting, November17 th, LCG certification and testing suite Very comprehensive Perl OO test suite implemented for the LCG certification  Ranging from configuration tests, functionality, stress tests  Very effectively completed by a presentation logic to display all results on the web in a “click-and-get-further-information” approach  First basic tests already ported to GLITE and executed Example : testing the gatekeeper on the CE:  > cd /afs/cern.ch/user/g/grodid/public/grid/LCG  > source /afs/infn.it/project/datamat/ui/setup_ui.csh  > grid-proxy-init -valid 100:0  > setenv CRON_JOB_OLD yes  > opt/edg/bin/MainScript --forcingVO=egee --TList=DNS  > opt/edg/bin/MainScript --forcingVO=egee --TList=CEGate

JRA1 all-hands meeting, November17 th, LCG certification testsuite snapshot

JRA1 all-hands meeting, November17 th, Further Tests to be implemented Input data files on multiple SEs Stress Tests Very structured match making Full coverage of main bugs regression tests Further level of functionality :  DAGs and dependencies,  MPI & interactive jobs,  checkpointing,  GUI ( when provided ) Exploit different scheduling policies  Make clear how to proceed to test it : user side / wms configuration

JRA1 all-hands meeting, November17 th, Next steps LSF deployment in the testing testbed  Currently only have pbs Complete Integration of the missing components  InfoSys : RGMA, WMS Purchaser, CEmon  DM : Brokerinfo-like functionality  GAS developments (unified access point) and accessing services through it Further testing and refinement of post-installation scripts with iTeam

JRA1 all-hands meeting, November17 th, Most relevant bugs to be closed asap Condor-c [ 5525 ]  Most jobs are currently aborted Gridsite [ 5454 ]  A quick and dirty workaround hack exist. This problem should anyhow by now already have been solved in CV It needs to be tested. Missing bits and pieces of code, provided by no RPM [ 5099 ]  rpm -q --whatprovides /opt/glite/globus-conf/grid-functions file /opt/glite/globus-conf/grid-functions is not owned by any package

JRA1 all-hands meeting, November17 th, Conclusions The workload system as a whole is getting there Some workload components still affected by critical pending bugs The development of a comprehensive integrated test suite on WMS is in late : systematic, comprehensive development and re-writing/porting of test suites just started. feedback provided in many forms both to the developers and the iTEAM Also the adoption of Quattor will ease the deployment of the releases on the testing testbed