Probes Requirement Review OTAG-08 03/05/2011
RT ticket for the MPI test changes (NGI_IT): https://rt.egi.eu/rt/Ticket/Display.html?id=1154

Slide 2: Requirements that can be directly passed to EMI
● Changes to the MPI test (NGI_IT)
   ● Add information on the MPI flavor for which the checks are executed.
   ● In the EMI 1.0 release, the CREAM CE will handle the WholeNodes, HostNumber and SMPGranularity attributes (as required by the EGEE MPI WG). Consequently, the JDL for org.sam.mpi.CE should be modified to take these new attributes into account, so that MPI jobs are submitted properly.
● Changes to the LB probe (NGI_PL)
   ● A probe that tests the core LB functionality is needed; checking port accessibility is not sufficient. Both the standard interface (listening on port 9000) and the web service should be checked. The probe could call some API functions to make sure the service is neither dead nor overloaded. A response-time limit should be set and checked.
● Changes to the VOMS probe (NGI_PL)
   ● A probe that tests the core VOMS functionality is needed. Checking port accessibility alone is not enough: the probe should, for example, try to obtain a proxy from the server.
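Both the LB and VOMS requests boil down to the same probe pattern: run a real client operation against the service instead of opening a socket, and judge the result by its exit status and by how long it took. The sketch below is a hypothetical helper, not an existing SAM probe; it assumes the probe shells out to some client command and uses the standard Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name and the 10 s / 30 s limits are illustrative assumptions.

```python
import subprocess
import time

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_functional_check(cmd, warn_s=10.0, crit_s=30.0):
    """Hypothetical helper: run a client command that exercises the service
    (not just its port) and classify the outcome by exit status and
    response time. Returns a (nagios_state, message) tuple.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=crit_s)
    except subprocess.TimeoutExpired:
        # No answer within the hard limit: treat the service as dead or overloaded.
        return CRITICAL, f"no response within {crit_s:g}s"
    except OSError as exc:
        # The client tool could not be started: a probe problem, not a service problem.
        return UNKNOWN, f"cannot run {cmd[0]}: {exc}"
    elapsed = time.monotonic() - start
    if proc.returncode != 0:
        lines = proc.stderr.strip().splitlines()
        detail = lines[-1] if lines else f"exit status {proc.returncode}"
        return CRITICAL, f"functional test failed: {detail}"
    if elapsed > warn_s:
        # The service answered, but more slowly than the soft limit allows.
        return WARNING, f"slow response: {elapsed:.1f}s (limit {warn_s:g}s)"
    return OK, f"functional test passed in {elapsed:.1f}s"
```

For the VOMS case, the command could be something like `["voms-proxy-init", "--voms", "dteam"]`, run with a robot certificate; targeting one specific VOMS server would additionally need a matching client (vomses) configuration. For the LB case, the command would be whichever LB client call the probe author chooses for the standard and web service interfaces.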

Slide 3: Requirements to be discussed (1/2)
● Direct submission to CREAM CE (NGI_IT)
   ● We strongly support the deployment of direct job submission to the CREAM CE, as described in the metric description page.
   ● These probes are already deployed and just need to be included in the NGI profile.
● Monitor WMS status (NGI_FR)
   ● A new probe is needed to monitor the status of the WMS. Use case: a WMS failure is often detected only because all CE probes for all sites fail. There should be a probe for the higher-level service.
   ● In fact, there are already probes that check the WMS:
   ● A better approach would be to check the CEs directly rather than via the WMS.
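The WMS use case describes a recognisable pattern: when every CE test routed through one WMS fails at every site, the WMS is the more likely culprit. A minimal sketch of that inference, assuming the monitoring system can tell which WMS each CE test went through; the function name, the tuple layout, the host names and the `min_sites` threshold are all hypothetical.

```python
from collections import defaultdict


def suspect_wms(results, min_sites=3):
    """Flag WMS instances whose CE job-submission probes failed everywhere.

    results: iterable of (site, wms_host, passed) tuples, one per CE probe
    result routed through that WMS.
    Returns the sorted list of WMS hosts for which no probe passed and
    failures were seen at `min_sites` or more distinct sites.
    """
    sites_seen = defaultdict(set)
    any_passed = defaultdict(bool)
    for site, wms, passed in results:
        sites_seen[wms].add(site)
        any_passed[wms] = any_passed[wms] or passed
    return sorted(
        wms for wms, sites in sites_seen.items()
        if not any_passed[wms] and len(sites) >= min_sites
    )
```

This is only a heuristic on top of existing results; the alternative raised on the slide, testing CEs directly instead of via the WMS, avoids the ambiguity altogether.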

Slide 4: Requirements to be discussed (2/2)
● Modification to the GGUS ticket Nagios probe (NGI_FR)
   ● The probe was developed long ago. It is useful as a reminder to site admins that tickets are open for their sites, but there is a long list of complaints about it:
      - GGUS already sends reminders for open tickets;
      - the probe is attached to site_bdii, which causes problems with ARC sites;
      - if a ticket is rerouted, the site still gets the alarms.
   ● We propose to switch this probe off.

Slide 5: Further input needed from the submitter (1/2)
● GLEXEC tests only on CEs supporting glexec (NGI_IT)
   ● Can be closed after interaction with NGI_IT.
● Fix to the certificate-lifetime probe (NGI_PL)
   ● The probe should not report "expired certificate" when it is unable to access the service.
   ● Which probe reports this message? If the service is unavailable, the probe reports UNKNOWN.
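The certificate-lifetime issue comes down to keeping two failure modes apart: "could not reach the service" must map to UNKNOWN, and only a certificate actually past its notAfter date should be reported as expired. A minimal sketch, assuming the certificate's notAfter field has already been fetched by some caller-supplied function that raises `OSError` when the service is unreachable; the function name and the 15/7-day thresholds are illustrative assumptions. Only `ssl.cert_time_to_seconds` is a real library call here.

```python
import ssl
import time

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_cert_lifetime(fetch_not_after, warn_days=15, crit_days=7, now=None):
    """Classify a service certificate by remaining lifetime.

    fetch_not_after: hypothetical callable returning the certificate's
    notAfter field in OpenSSL text form (e.g. "Jun  1 12:00:00 2011 GMT");
    it is expected to raise OSError when the service cannot be reached.
    """
    try:
        not_after = fetch_not_after()
    except OSError as exc:
        # Service unreachable: the lifetime is unknown, so do not claim "expired".
        return UNKNOWN, f"cannot retrieve certificate: {exc}"
    try:
        expires = ssl.cert_time_to_seconds(not_after)
    except ValueError:
        return UNKNOWN, f"unparseable notAfter value: {not_after!r}"
    now = time.time() if now is None else now
    days_left = (expires - now) / 86400
    if days_left < 0:
        return CRITICAL, f"certificate expired {-days_left:.0f} day(s) ago"
    if days_left < crit_days:
        return CRITICAL, f"certificate expires in {days_left:.1f} day(s)"
    if days_left < warn_days:
        return WARNING, f"certificate expires in {days_left:.1f} day(s)"
    return OK, f"certificate valid for {days_left:.0f} more day(s)"
```

Fetching is deliberately left to the caller: a TLS handshake that verifies the peer would itself fail on an expired certificate, so a real probe has to make sure such a failure is not folded into the "unreachable" branch.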

Slide 6: Further input needed from the submitter (2/2)
● Probes to test VO application presence (NGI_CH)
   ● To ensure that a site supports a specific application, dedicated probes should test the site, and the Nagios test framework should support this scenario as much as possible. The application-specific probes should also be taken into account by brokers, so that jobs are no longer sent to sites that have a problem.
   ● We are not sure this can be passed to EMI as a requirement: each VO has to provide such probes itself, and SAM provides a mechanism to set up VO SAM instances. For WMSes, FCR (Freedom of Choice for Resources) already takes SAM results into account, or at least it did in the past; I am not sure what its status is now. I suggest asking the FCR developers directly through GGUS.
● Change rep-WN default SE when needed (NGI_IT && NGI_FR) / Independent CE and SE tests
   ● An error on the close SE should not put the CE in error.
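The last request on this slide amounts to a change in how a CE's overall status is aggregated: results from tests that exercise the close SE should count towards the SE, not the CE. A minimal sketch, assuming per-CE results are available as a metric-to-state mapping; the metric names (`CE-JobSubmit`, `WN-Rep`), the function name and the severity ordering (UNKNOWN ranked between WARNING and CRITICAL) are illustrative assumptions.

```python
# Aggregation order for Nagios states, best to worst (an assumed convention).
SEVERITY = {"OK": 0, "WARNING": 1, "UNKNOWN": 2, "CRITICAL": 3}


def ce_status(results, se_dependent):
    """Aggregate one CE's probe results without letting close-SE tests count.

    results: dict mapping metric name -> Nagios state string for one CE.
    se_dependent: set of metric names that exercise the close SE; their
    results belong to the SE and are ignored here.
    """
    relevant = [state for metric, state in results.items() if metric not in se_dependent]
    if not relevant:
        return "UNKNOWN"
    return max(relevant, key=lambda state: SEVERITY[state])
```

With `{"CE-JobSubmit": "OK", "WN-Rep": "CRITICAL"}` and `WN-Rep` marked as SE-dependent, the CE stays OK while the replication failure is still available to the SE's own status.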

Slide 7: General Requirements (1/2)
● Easy access to the code of probes (NGI_CH)
   ● Currently, all probes distributed with SAM are available in the following two repositories:
   ● We are working on documentation pointers for each probe, which will be circulated soon. Once EMI takes over the probes, I believe they will store them in standard repositories.
● Detailed error reporting (NGI_CH)
   ● OK, we will pass it on, but some examples should be given in order to refine the requirement.
● Local and remote probes (NGI_CH)
   ● This should be clarified with use cases before it is passed to EMI.
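One possible reading of "detailed error reporting", offered only as an example of the kind of refinement the slide asks for, is to use the multi-line output that Nagios plugins can return: a one-line status summary, optional performance data after a `|`, and further lines carrying step-by-step detail. The helper below is hypothetical; it just assembles such output, and it replaces `|` in messages because Nagios treats that character as the start of performance data.

```python
STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def _safe(text):
    # "|" separates text from performance data in plugin output, so keep it out of messages.
    return text.replace("|", "/")


def format_plugin_output(state, summary, details=(), perfdata=None):
    """Build plugin output: a one-line status summary (with optional
    performance data after "|"), followed by one line per detail."""
    first = f"{STATE_NAMES[state]}: {_safe(summary)}"
    if perfdata:
        first += " | " + " ".join(f"{label}={value}" for label, value in perfdata.items())
    return "\n".join([first, *(_safe(line) for line in details)])
```

For example, `format_plugin_output(2, "job submission failed", ["submit: OK", "status: Aborted"], {"time": "12.3s"})` yields a CRITICAL summary line with `time=12.3s` as performance data, followed by the two detail lines.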

Slide 8: General Requirements (2/2)
● Enabling SNMP in grid monitoring (NGI_BA)
   ● We (University of Banja Luka, Faculty of Electrical Engineering) are willing to provide effort to enable the integration of grid monitoring data into standard network management systems (NMS), not just Nagios, via an SNMP "bridge". The bridge collects data from BDII, SAM, etc., processes it, and presents it in a suitable form to the NMS via SNMP (in essence, extending existing SNMP agents via the AgentX protocol). We have already done this for our own limited needs, so that we can use the centralised monitoring system we use for everything else, including the network and the lower-level services the grid depends on (NFS, DNS, etc.). We feel it would be a good approach to enable others to use a similar setup. In short, we are willing to do the work but would need input from other NGIs.
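Whatever transport is used, the core of such a bridge is mapping grid probe results onto an ordered OID space that an SNMP agent can answer GET and GETNEXT requests against. The sketch below covers only that data layer; the AgentX transport itself (where a subagent registers with a master agent such as Net-SNMP's snmpd) is not shown. The OID prefix is a placeholder, not a registered number, and the table layout, class name and sample rows are hypothetical.

```python
import bisect

# Placeholder OID prefix for illustration only; a real bridge would use an
# enterprise number registered to the operating institution.
PREFIX = (1, 3, 6, 1, 4, 1, 99999, 1)

STATE_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


class MetricTable:
    """Expose grid probe results as a small SNMP-style table.

    Column 1 holds "site/service/metric" strings, column 2 the numeric
    Nagios state; both are indexed by row number starting at 1.
    """

    def __init__(self, rows):
        # rows: iterable of (site, service, metric, state) tuples, e.g. from SAM/BDII.
        self._values = {}
        for index, (site, service, metric, state) in enumerate(rows, start=1):
            self._values[PREFIX + (1, index)] = f"{site}/{service}/{metric}"
            self._values[PREFIX + (2, index)] = STATE_CODES[state]
        # Python tuple ordering matches SNMP lexicographic OID ordering.
        self._oids = sorted(self._values)

    def get(self, oid):
        """Exact-match lookup (SNMP GET); None if the OID does not exist."""
        return self._values.get(tuple(oid))

    def get_next(self, oid):
        """Return (oid, value) for the first OID after `oid` (SNMP GETNEXT), or None."""
        pos = bisect.bisect_right(self._oids, tuple(oid))
        if pos == len(self._oids):
            return None
        next_oid = self._oids[pos]
        return next_oid, self._values[next_oid]
```

Walking the table from `PREFIX` with repeated `get_next` calls visits column 1 for every row and then column 2, which is the order an `snmpwalk` of a conventional SNMP table would produce.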