1 Probes Requirement Review OTAG-08 03/05/2011

2 Requirements that can be directly passed to EMI
● Changes to the MPI test (NGI_IT) https://rt.egi.eu/rt/Ticket/Display.html?id=1154
  ● Add information on the MPI flavor where checks are executed. CREAM CE will handle the attributes WholeNodes, HostNumber and SMPGranularity (as required by the EGEE MPI WG) in the EMI 1.0 release. Consequently, the JDL for org.sam.mpi.CE should be modified to take these new attributes into account so that MPI jobs are submitted properly.
● Changes to the LB probe (NGI_PL) https://rt.egi.eu/rt/Ticket/Display.html?id=1170
  ● A probe that tests the core LB functionality is needed. Checking port accessibility is not sufficient. Both the standard interface (listening on port 9000) and the web service should be checked. The probe could call some functions from the API to make sure the service is not dead or overloaded. A response time limit should be set and checked.
● Changes to the VOMS probe (NGI_PL) https://rt.egi.eu/rt/Ticket/Display.html?id=1169
  ● A probe that tests the VOMS core functionality is needed. The probe cannot just check port accessibility. It should, for example, try to obtain a proxy from the server.
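The LB and VOMS requests both ask for a functional call rather than a port check, and the LB request adds a response time limit. A minimal sketch of that pattern in Python follows. The `check` callable is hypothetical and stands in for whatever real LB or VOMS operation a probe would perform; mapping a slow response to WARNING is an assumption, not something the tickets specify.

```python
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_timed_check(check, limit_seconds, clock=time.monotonic):
    """Run a functional check and map the outcome to a Nagios status.

    `check` is a hypothetical callable that performs a real service
    operation (for example an API query) and raises on failure.
    Returns a (status, message) tuple.
    """
    start = clock()
    try:
        check()
    except Exception as exc:
        # The functional call itself failed: the service is not usable.
        return CRITICAL, f"CRITICAL - check failed: {exc}"
    elapsed = clock() - start
    if elapsed > limit_seconds:
        # Assumption: a slow but successful answer is reported as WARNING.
        return WARNING, (
            f"WARNING - responded in {elapsed:.2f}s "
            f"(limit {limit_seconds:.2f}s)"
        )
    return OK, f"OK - responded in {elapsed:.2f}s"
```

The `clock` parameter exists only so the timing logic can be exercised without a live service; a real probe would leave it at its default and exit with the returned status code.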

3 Requirements to be discussed (1/2)
● Direct submission to CREAM CE (NGI_IT) https://rt.egi.eu/rt/Ticket/Display.html?id=1156
  ● We strongly support the deployment of the CREAM CE direct job submission as described in the metric description page https://tomtools.cern.ch/confluence/display/SAMDOC/CREAMCE+DJS
  ● These probes are already deployed and just need to be included in the NGI profile.
● Monitor WMS status (NGI_FR) https://rt.egi.eu/rt/Ticket/Display.html?id=1195
  ● A new probe is needed to monitor the status of WMS. Use case: when a WMS fails, it is often detected only because all CE probes for all sites fail. There should be a probe for the higher-level service.
  ● There are already probes that check WMS: https://tomtools.cern.ch/confluence/display/SAMDOC/WMS
  ● It would actually be better to check CEs directly and not via WMS.

4 Requirements to be discussed (2/2)
● Modification to GGUS ticket Nagios probe (NGI_FR) https://rt.egi.eu/rt/Ticket/Display.html?id=1192
  ● Developed long ago. It is useful for site admins to know whether tickets are open for their sites, as a sort of reminder, but there is a long list of complaints about this probe:
    - GGUS already sends reminders for open tickets
    - it is attached to site_bdii and creates problems with ARC sites
    - if the ticket is rerouted, the site still gets the alarms
  ● We propose to switch this off.

5 Further input needed from the submitter (1/2)
● GLEXEC tests only on CE supporting glexec (NGI_IT) https://rt.egi.eu/rt/Ticket/Display.html?id=1153
  ● Can be closed after interaction with NGI_IT.
● Fix to certificate-lifetime probe (NGI_PL) https://rt.egi.eu/rt/Ticket/Display.html?id=1171
  ● The probe should not report "expired certificate" when unable to access the service.
  ● Which probe reports this message? If the service is unavailable, the probe will report UNKNOWN.
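The behaviour requested for the certificate-lifetime probe can be illustrated with a short Python sketch: an unreachable service yields UNKNOWN, and only a certificate that was actually retrieved can be reported as expired. The `fetch_not_after` callable and the 14-day warning window are assumptions made for this example and do not describe the existing probe.

```python
from datetime import datetime, timedelta, timezone

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def certificate_status(fetch_not_after, now, warn_days=14):
    """Classify a host certificate by its remaining lifetime.

    `fetch_not_after` is a hypothetical callable that returns the
    certificate expiry as a timezone-aware datetime and raises OSError
    when the service cannot be reached.
    """
    try:
        not_after = fetch_not_after()
    except OSError as exc:
        # No certificate was seen, so nothing can be said about expiry.
        return UNKNOWN, f"UNKNOWN - cannot reach service: {exc}"
    remaining = not_after - now
    if remaining <= timedelta(0):
        return CRITICAL, "CRITICAL - certificate expired"
    if remaining <= timedelta(days=warn_days):
        return WARNING, f"WARNING - certificate expires in {remaining.days} days"
    return OK, f"OK - certificate valid for {remaining.days} days"
```

Keeping the connection failure in its own branch is the whole point of the requirement: the "expired" message is reachable only after an expiry date has been read.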

6 Further input needed from the submitter (2/2)
● Probes to test VO application presence (NGI_CH) https://rt.egi.eu/rt/Ticket/Display.html?id=1209
  ● To ensure that a site supports a specific application, specific probes should test the site. This scenario should be supported as much as possible by the Nagios test framework. The application-specific probes should also be taken into account by brokers, so that eventually no more jobs are sent to sites that have a problem.
  ● Not sure that we can pass this as a requirement to EMI. This is something that each VO has to provide itself. SAM provides a mechanism to set up VO SAM instances. Regarding WMSes, there is something called FCR which already takes into account results coming from SAM. At least this was the case before; I'm not sure what the status is now. I suggest asking the FCR developers through GGUS directly.
● Change rep-WN default SE when needed (NGI_IT && NGI_FR): https://rt.egi.eu/rt/Ticket/Display.html?id=1155 / 1193
  ● Independent CE and SE tests
  ● An error on the closeSE shouldn't put an error on the CE
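One possible reading of the "independent CE and SE tests" request is that each outcome is reported against the service that produced it. The Python sketch below shows that attribution; the function name, its parameters and the host names in the usage note are hypothetical and do not describe how the rep-WN test is currently implemented.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def attribute_results(ce_host, close_se_host, job_ran, replication_ok=None):
    """Report job and replication outcomes against separate services.

    The CE status depends only on whether the job ran; the close SE
    status depends only on the replication step. If the job never ran,
    replication was not exercised, so the SE status is UNKNOWN.
    """
    results = {ce_host: OK if job_ran else CRITICAL}
    if not job_ran or replication_ok is None:
        results[close_se_host] = UNKNOWN
    else:
        results[close_se_host] = OK if replication_ok else CRITICAL
    return results
```

For instance, `attribute_results("ce.example.org", "se.example.org", True, False)` leaves the CE at OK and marks only the SE as CRITICAL.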

7 General Requirements (1/2)
● Easy access to code of probes (NGI_CH) https://rt.egi.eu/rt/Ticket/Display.html?id=1206
  ● Currently all probes distributed with SAM are available in the following two repositories:
    ● http://svnweb.cern.ch/guest/sam/trunk/probes/src/
    ● https://www.sysadmin.hep.ac.uk/svn/grid-monitoring/trunk/probe/hr.srce
  ● We are working on documentation pointers for each probe, which will be circulated soon. Once EMI takes over the probes, I believe they will store them in standard repositories.
● Detailed Error Reporting (NGI_CH) https://rt.egi.eu/rt/Ticket/Display.html?id=1205
  ● OK, we'll pass it, but some examples should be given in order to refine the requirement.
● Local and remote probes (NGI_CH) https://rt.egi.eu/rt/Ticket/Display.html?id=1207
  ● Better to clarify with use cases before passing to EMI.

8 General Requirements (2/2)
● Enabling SNMP in grid monitoring (NGI_BA) https://rt.egi.eu/rt/Ticket/Display.html?id=1164
  ● We (University of Banja Luka, Faculty of Electrical Engineering) are willing to provide effort to enable integration of grid monitoring data into standard NMS systems (not just Nagios) via an SNMP "bridge" that collects the data from BDII, SAM, etc., processes it and presents it in a suitable manner to the NMS via SNMP (in essence extending existing SNMP agents via the AgentX protocol). We have already done this for our limited needs in order to use a centralised monitoring system, which we use for all other monitoring including network and lower-level services the grid depends on (NFS, DNS, etc.), but we feel it would be a good approach to enable others to use a similar setup. In essence, we are willing to do the work but would need input from other NGIs.

