Maria Alandes Pradillo, CERN Training on GLUE 2 information validation EGI Technical Forum September 2013.



Overview
- Part one
  - Introduction
  - glue-validator libraries
  - Command line options
  - Output formats
  - Error messages
- Part two
  - Current status of GLUE 2 validation
  - Future validation process
  - Long term goals
- Part three
  - Most common GLUE 2 errors and how to fix them
EGI TF Madrid, September 2013

Part one – What glue-validator is and how to use it

Introduction
- glue-validator is a command line tool written in Python
- It can validate against GLUE 1.3, GLUE 2.0 and the EGI profile for GLUE 2.0
[Diagram: the Validator with the GLUE 1.3, GLUE 2.0 and EGI profile GLUE 2.0 descriptions (each with its data types), Entry Test, EGIProfileTest and Known Issues]

Where to get glue-validator
- EMI/UMD repositories
- EPEL repositories
  - Obsolete version right now!
  - To be updated in the upcoming weeks
- Midmon server
  - Limited to site validation
- Access to CERN AFS?
  - Latest version installed in the malandes public area

glue-validator libraries
- The data library contains a description of the GLUE schema:
  - Object classes
  - Attributes (type, single- or multi-valued, mandatory or not)
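As a rough illustration of what such a schema description makes possible, the sketch below stores, per object class, whether each attribute is mandatory and single-valued, and checks one entry against it. The names `AttributeSpec`, `SCHEMA` and `check_entry`, and the two attribute rows, are hypothetical and are not taken from the actual data library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeSpec:
    """What a schema description records about one attribute."""
    type_name: str        # label of the expected type (illustrative)
    single_valued: bool   # True if at most one value may be published
    mandatory: bool       # True if the attribute must be present


# Hypothetical, heavily simplified description of a single object class.
SCHEMA = {
    "GLUE2Share": {
        "GLUE2ShareID": AttributeSpec("string", single_valued=True, mandatory=True),
        "GLUE2EntityOtherInfo": AttributeSpec("string", single_valued=False, mandatory=False),
    }
}


def check_entry(object_class, entry):
    """Return a list of problems for an entry given as {attribute: [values]}."""
    problems = []
    for name, spec in SCHEMA[object_class].items():
        values = entry.get(name, [])
        if spec.mandatory and not values:
            problems.append(f"{name}: mandatory attribute is missing")
        if spec.single_valued and len(values) > 1:
            problems.append(f"{name}: single-valued attribute has {len(values)} values")
    return problems
```

Keeping the description as plain data means the same checking loop can serve several schema versions by swapping the table.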

glue-validator libraries
- The type library contains a description of the types as defined by the GLUE schema
  - Enumerations are defined within the OGF GLUE working group
  - New values will be added as needed
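A type description of this kind can be pictured as a table of small checker functions, one per type name. The sketch below is an assumption about the general shape only: `TYPE_CHECKS`, `is_valid` and the helper names are hypothetical, and the enumeration shown is an illustrative set that would need to follow whatever the OGF GLUE working group publishes.

```python
import re

# Illustrative enumeration values; kept as a set so new values can be added as needed.
KNOWN_SERVING_STATES = {"closed", "draining", "production", "queueing"}


def is_uint32(value):
    """True if the string is a non-negative decimal integer that fits in 32 bits."""
    return re.fullmatch(r"[0-9]+", value) is not None and int(value) < 2**32


def is_serving_state(value):
    """True if the string is one of the known enumeration values."""
    return value in KNOWN_SERVING_STATES


# Hypothetical lookup table from type name to checker.
TYPE_CHECKS = {
    "UInt32": is_uint32,
    "ServingState_t": is_serving_state,
}


def is_valid(type_name, value):
    """Check a published string value against the named type."""
    return TYPE_CHECKS[type_name](value)
```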

glue-validator libraries
- There are different libraries containing the actual tests:
  - EntryTest: general tests for all attributes
  - EGIProfileTest: specific tests per attribute

glue-validator libraries
- The KnownIssues library contains a list of tests for GLUE attributes that are wrongly published due to known issues in the middleware

Scope of this training
- Validation against the EGI profile for GLUE 2
  - It specifies how the information schema should be used in EGI
  - How information should be interpreted
  - What uses are likely
  - How information may be validated
[Diagram: the Validator with the EGI profile GLUE 2.0 description and its data types, Entry Test, EGIProfileTest and Known Issues]

Command Line Options

Command Line Options
- Very similar to ldapsearch
    glue-validator -H hostname -p port -b binding
    ldapsearch -x -LLL -h hostname -p port -b binding
- By default, validation is against the EGI profile for GLUE 2.0
- Some interesting options
  - Verbosity (default is 1)
    - Levels 0 and 1 currently behave the same → to be fixed

Command Line Options
- Some more interesting options
  - Exclude known issues
    - This is a very useful option for sites
    - Avoids running tests that are known to fail due to bugs in the info providers
    - This option will always be used in production
  - Timeout
    - Useful when validating top BDIIs
  - Separator
    - Useful to manipulate detailed output
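The effect of excluding known issues can be sketched as skipping any test whose name appears in a known-issues list, so that it is never run at all. Everything in this example is hypothetical: the test functions, the registry `ALL_TESTS`, the `KNOWN_ISSUES` set and `run_tests` only illustrate the idea and do not mirror glue-validator internals.

```python
def check_has_id(entry):
    """Passes when the entry publishes an ID (illustrative test)."""
    return bool(entry.get("GLUE2ShareID"))


def check_other_info(entry):
    """Passes when every OtherInfo value looks like key=value (illustrative test)."""
    return all("=" in value for value in entry.get("GLUE2EntityOtherInfo", []))


# Hypothetical registry of tests, keyed by test name.
ALL_TESTS = {
    "has_id": check_has_id,
    "other_info_format": check_other_info,
}

# Hypothetical list of tests known to fail because of a middleware bug.
KNOWN_ISSUES = {"other_info_format"}


def run_tests(entry, exclude_known_issues=False):
    """Return the names of failed tests, optionally skipping known issues."""
    failed = []
    for name, test in ALL_TESTS.items():
        if exclude_known_issues and name in KNOWN_ISSUES:
            continue  # the test is skipped entirely, not run and ignored
        if not test(entry):
            failed.append(name)
    return failed
```

With an entry such as `{"GLUE2ShareID": ["s1"], "GLUE2EntityOtherInfo": ["Share"]}`, the second test fails when run normally and is simply absent from the result when known issues are excluded.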

Output format
- Nagios output with different verbosity levels
- Level 0 and 1
  - Number of errors, warnings and info messages
      CRITICAL - errors 9, warnings 483, info 1825 | errors=9;warnings=483;info=1825
- Level 2
  - Details per message type
      CRITICAL - errors 9, warnings 480, info 1825 | errors=9;warnings=480;info=1825
      Summary per type of error, warning and info message:
      E002 - Obsolete entry (GLUE2EntityValidity): 9
      I012 - Unknown VO name in share (GLUE2EntityOtherInfo): 21
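For anyone post-processing this output, the following sketch parses the status line and the per-type summary rows exactly as they appear on the slide. The function names are hypothetical, and the parsing assumes the layout shown above (counters after the `|`, separated by semicolons); it has not been checked against other versions of the tool.

```python
import re

# One per-type summary row, e.g. "E002 - Obsolete entry (GLUE2EntityValidity): 9"
SUMMARY_ROW = re.compile(r"^([A-Z]\d{3}) - (.+) \((\w+)\): (\d+)$")


def parse_status_line(line):
    """Split the first output line into (status, {counter: value})."""
    status, _, rest = line.partition(" - ")
    _, _, counters = rest.partition("|")
    counts = {}
    for item in counters.strip().split(";"):
        key, _, value = item.partition("=")
        counts[key] = int(value)
    return status.strip(), counts


def parse_summary_row(row):
    """Parse one per-type summary row; return None if it does not match."""
    match = SUMMARY_ROW.match(row.strip())
    if match is None:
        return None
    code, description, attribute, count = match.groups()
    return {
        "code": code,
        "description": description,
        "attribute": attribute,
        "count": int(count),
    }
```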

Output format
- Level 3
  - Affected DN, attribute and published value
      I012 Description: Unknown VO name in share
      I012 Affected DN: GLUE2ManagerID=ce207.cern.ch_ComputingElement_Manager GLUE2ServiceID=ce207.cern.ch_ComputingElement GLUE2GroupID=resource GLUE2DomainID=CERN-PROD o=glue
      I012 Affected attribute: GLUE2EntityOtherInfo: Share
      I012 Published value: na48
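The level 3 details lend themselves to being grouped into one record per message. The sketch below assumes one field per line in the form `CODE Field: value`, with each message starting at its `Description` line, as in the slide example; `parse_messages` is a hypothetical helper, and the real output (in particular with a custom separator) may be laid out differently.

```python
def parse_messages(text):
    """Group 'CODE Field: value' lines into one dict per message."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        code, _, rest = line.partition(" ")
        # Split on the first ": " only, so values may themselves contain ": ".
        field, separator, value = rest.partition(": ")
        if not separator:
            continue  # not a detail line
        if field == "Description":
            records.append({"code": code})  # a new message starts here
        if records:
            records[-1][field] = value
    return records


sample = """\
I012 Description: Unknown VO name in share
I012 Affected DN: GLUE2GroupID=resource GLUE2DomainID=CERN-PROD o=glue
I012 Affected attribute: GLUE2EntityOtherInfo: Share
I012 Published value: na48
"""

for record in parse_messages(sample):
    print(record["code"], record["Affected attribute"], record["Published value"])
```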

Error messages
- Three types of messages
  - ERROR: values that are definitely invalid
  - WARNING: values that are likely, but not certain, to be wrong
  - INFO: values that may be valid but that are unknown or seem wrong to glue-validator
- Only ERROR messages will raise a CRITICAL error in Nagios
- Twiki giving more details on each error
  - Tips on how to fix the error
    - Bug in the information provider
    - Misconfiguration of the site
  - Whether there are any known issues
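The rule that only ERROR messages raise CRITICAL can be written down in a few lines. This is a sketch, not the tool's code: `nagios_result` is a hypothetical name, the exit codes are the standard Nagios plugin values (0 for OK, 2 for CRITICAL), and reporting OK when there are only warnings or info messages is an assumption, since the slides do not say what status is returned in that case.

```python
NAGIOS_OK = 0        # standard Nagios plugin exit code for OK
NAGIOS_CRITICAL = 2  # standard Nagios plugin exit code for CRITICAL


def nagios_result(errors, warnings, info):
    """Return (exit_code, status_line) from the three message counters."""
    if errors > 0:
        label, code = "CRITICAL", NAGIOS_CRITICAL
    else:
        # Assumption: warnings and info messages alone leave the status at OK.
        label, code = "OK", NAGIOS_OK
    line = (
        f"{label} - errors {errors}, warnings {warnings}, info {info}"
        f" | errors={errors};warnings={warnings};info={info}"
    )
    return code, line
```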

Error messages
- Easily identify the error number
- Guidelines on what to do to get rid of the error
- Whether there are any known bugs affecting the attribute publication

Some examples
- Remember to export the validator libraries in PYTHONPATH
    export PYTHONPATH=$PYTHONPATH:/afs/cern.ch/user/m/malandes/public/glue-validator/usr/lib/python2.4/site-packages/
- Site validation
    glue-validator -H prod-bdii -p <port> -b o=glue
    glue-validator -H prod-bdii -p <port> -b o=glue -v 2
    glue-validator -H prod-bdii -p <port> -b o=glue -v 3
    glue-validator -H prod-bdii -p <port> -b o=glue -v 3 -r " "
    glue-validator -H lcg-bdii -p <port> -b GLUE2DomainID=CERN-PROD,GLUE2GroupID=grid,o=glue

Some examples
- Resource validation
    glue-validator -H prod-bdii -p <port> -b GLUE2GroupID=resource,o=glue
    glue-validator -H prod-bdii -p <port> -b "o=glue '(objectClass=GLUE2ComputingService)'"
    glue-validator -H prod-bdii -p <port> -b GLUE2ServiceID=ce206.cern.ch_ComputingElement,GLUE2GroupID=resource,GLUE2DomainID=CERN-PROD,o=glue
- Top BDII validation
  - Do you really want to do this?
    glue-validator -H lcg-bdii -p <port> -b o=glue

Example with verbosity level 2

Example with verbosity level 3

Example with separator

Part two – How to improve things with glue-validator

Current status of GLUE validation
- Monthly reports since March 2013
  - Only for WLCG sites for practical reasons
  - Manual review of the glue-validator results
  - Ticketing sites
- This approach helped tune glue-validator
  - And has already improved the overall quality!
- Some improvements so far, but…
  - This approach is not sustainable

Future validation process
- Deploy glue-validator as a Nagios probe
  - Automatic and stable validation process
  - glue-validator already deployed in Midmon
  - As soon as the probe is validated it will become a production probe
  - Sites will get tickets from the ROD team for critical errors that are not fixed after 24h
- glue-validator will also be used in the EGI middleware acceptance tests

Long term goals
- Integration of glue-validator in the resource BDII
  - Enforce early validation in the development stage
  - Requires agreement and coordination with product teams
  - Change of current way of working
  - Is it better to publish nothing than something wrong?

Part three – Most common GLUE 2 errors and how to fix them

Common errors
- Operating System Information
  - Operating system names and versions: he_OS_name
  - Easy to fix in YAIM:
    - CE_OS → GLUE2ExecutionEnvironmentOSName
    - CE_OS_RELEASE → GLUE2ExecutionEnvironmentOSVersion

Common errors
- Batch system attributes
  - In many places default values are published
    - That is OK as long as this is what you want!
  - Configuring the batch system seems to be a complex task
    - Some guidelines here: ng#GluePolicy_GLUE2ComputingShare_a
  - Many GLUE attributes depend on the batch system configuration!

Common errors
- VO, WLCG and Grid Infrastructure names
  - VO names:
  - WLCG names:
  - Grid Infrastructure names: formation
- What to do if you still want to publish a value that does not exist in any of the above?
  - Please, let us know!

Pending known issues
- Storage-related errors are done
- Computing-related errors still to be evaluated
  - Marked as "?" in the Error Twiki
  - EMonitoring#444444_waiting_jobs
- List of known issues may be modified
  - A final version ready for validation will include these ones as well!

Feedback
- glue-validator not yet used in production
- Feedback for the tests
  - Are the tests useful?
  - Are they reporting properly?
  - This all may have an impact on the GLUE 2 profile too!
- Feedback for the error messages
  - Are the tips useful?
- Feedback for the known issues
  - Are there any more known issues to be added?
  - If the sites have nothing to do for a certain error, it should be a known issue!
- And feedback for anything else! (bugs, usability, etc.)
- Please use GGUS

Useful links
- glue-validator guide
- glue-validator code
- EGI profile for GLUE
- Error messages
- GLUE 2 validation monitoring