Wednesday 9 March 2016
CIC Portal/COD Activities
Hélène Cordier, IN2P3/CNRS Computing Centre, Lyon, France

Contents
- CIC Portal usage: who/how
- Latest release
- Portal characteristics
- On-going developments
- CIC portal overview for COD
- Statistics and results
- Working groups
- Zoom on failover

09/03/2016, The 8th IEEE/ACM International Conference on Grid Computing (Grid 2007)
Use tools
Each actor can use a set of operational tools (provided, integrated or interfaced).
[Figure: actors (regional center, site, user, operator, VO manager) using the CIC portal tools to communicate; track, report, diagnose and follow up problems; manage static information about their VO; report on site activity, submit tests and configure.]

What do people connect to the CIC portal for?
[Figure: average connections, December 2004 to December 2007]

Connections and process

Tasks handled by the CIC portal development team
[Figures: tasks handled between October 2006 and February 2007, and between February 2007 and January 2008]

Latest Release

Latest changes in the last 6 months
- Latest technical changes:
  - authentication is now based on the full certificate DN instead of the CN
- Work on VO ID cards:
  - changes in the database schema for VO/VOMS information
  - improved VO ID card interface
  - integration of the YAIM VO Configurator into the CIC portal
  - downloadable XML dump of VO ID card information
- Scheduled downtimes procedure
- Integration of the regional first-line support dashboard (prototype with CE)
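Moving authentication from the certificate CN to the full DN matters because a CN is not unique: two different people under different organisations can both be "CN=John Smith". The minimal sketch below assumes DNs in the OpenSSL slash-separated form and uses a hypothetical in-memory user table. It shows a CN-keyed lookup returning ambiguous matches where a DN-keyed lookup identifies exactly one user. It is an illustration, not the portal's actual code.

```python
from typing import Optional

def parse_dn(dn: str) -> list[tuple[str, str]]:
    """Split an OpenSSL-style DN such as '/C=FR/O=CNRS/CN=Jane Doe' into (key, value) pairs.

    Simplification: assumes no '/' inside attribute values.
    """
    pairs = []
    for part in dn.strip("/").split("/"):
        key, _, value = part.partition("=")
        pairs.append((key, value))
    return pairs

def common_name(dn: str) -> str:
    """Return the last CN component of the DN (raises IndexError if there is none)."""
    return [v for k, v in parse_dn(dn) if k == "CN"][-1]

# Hypothetical registered users: two different people share the same CN.
registered = {
    "/C=FR/O=CNRS/OU=CC-IN2P3/CN=John Smith": "operator",
    "/C=UK/O=eScience/OU=RAL/CN=John Smith": "vo-manager",
}

def lookup_by_cn(dn: str) -> list[str]:
    """CN-based lookup: may return several candidate roles, which is ambiguous."""
    cn = common_name(dn)
    return [role for d, role in registered.items() if common_name(d) == cn]

def lookup_by_dn(dn: str) -> Optional[str]:
    """DN-based lookup: the full subject identifies at most one registered user."""
    return registered.get(dn)
```

For the RAL user, `lookup_by_cn` returns both roles while `lookup_by_dn` returns only `"vo-manager"`.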

On-going developments

What is left for the next release (March 2008)
- Adapt to new components released into production, e.g. the YAIM tool.
- Develop a new version of the report, taking into account the feedback received.
- Follow the SAM migration to GridView on the CIC portal side (idle).
- Internal tasks: quick fixes/bug fixes, documentation, background clean-up work, code optimization and prospective work for EGEE-III.

09/03/2016, ARM Meeting, EGEE'07, Budapest
COD activity

A tool for Grid operators: the COD dashboard
[Figure: at the start of EGEE, the operator had many entry points (ticketing system, site information, monitoring tools #1 to #n, mail client); now the dashboard provides a single entry point to the same ticketing system, site information, monitoring tools and mail sender.]
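The point of the single entry point is to merge alarms from several monitoring tools into one per-site view, so the operator no longer visits each tool separately. Here is a minimal sketch of that merge. It assumes each monitoring source has already been reduced to a hypothetical mapping from site name to failing tests, and that open tickets are a separate hypothetical mapping from site to ticket number.

```python
from collections import defaultdict

def merge_alarms(sources: dict[str, dict[str, list[str]]],
                 tickets: dict[str, int]) -> dict[str, dict]:
    """Build one per-site view from several monitoring tools.

    sources maps a tool name to {site: [failing tests]}; tickets maps a site to
    its open ticket number. Both layouts are hypothetical, for illustration only.
    """
    view: dict[str, dict] = defaultdict(lambda: {"alarms": [], "ticket": None})
    for tool, per_site in sources.items():
        for site, failures in per_site.items():
            for test in failures:
                # Prefix each alarm with its tool so the operator knows where it came from.
                view[site]["alarms"].append(f"{tool}:{test}")
    for site, ticket in tickets.items():
        view[site]["ticket"] = ticket
    return dict(view)
```

For example, a site failing one SAM test and one GStat check shows up once, with both alarms and its ticket side by side.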

Interaction with EGEE services
[Figure: the Operations Portal at IN2P3-CC (Lyon, France) interacts with GGUS at FZK (Karlsruhe, Germany) over SOAP to create, update and view tickets; with GStat at ASGC (Taipei, Taiwan) over HTTP for the GIIS status per site; with SAM at CERN (Geneva, Switzerland) through an XSQL-based service for test results on nodes; and with GOC-DB through SQL queries for site information and scheduled downtimes. Sites (Site1 to Site4) are shown with their ticket status (ticket #14, ticket #32, ticket #28, no ticket).]
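One practical use of combining GOC-DB site information with test results is to avoid opening tickets for sites that are down on purpose. This minimal sketch assumes scheduled downtimes have already been fetched as hypothetical (site, start, end) records. It does not model the actual SQL, SOAP or XSQL interfaces.

```python
from datetime import datetime, timezone

# Hypothetical downtime record: (site, start, end), e.g. as read from GOC-DB.
Downtime = tuple[str, datetime, datetime]

def in_scheduled_downtime(site: str, when: datetime, downtimes: list[Downtime]) -> bool:
    """True if `site` has a declared downtime covering the instant `when`."""
    return any(s == site and start <= when < end for s, start, end in downtimes)

def sites_needing_tickets(failing_sites: set[str], when: datetime,
                          downtimes: list[Downtime]) -> list[str]:
    """Failing sites that are NOT in scheduled downtime, i.e. candidates for a ticket."""
    return sorted(site for site in failing_sites
                  if not in_scheduled_downtime(site, when, downtimes))
```

Suppose Site3 declared a downtime from 08:00 to 18:00 UTC. A check at 10:00 then proposes a ticket only for Site2, while a check at 20:00 proposes tickets for both sites.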

Statistics and results

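Statistics
[Figures: percentage of opened tickets per service type (CE, SE, SRM, R-GMA, sBDII) for October, November and December; solution time in hours for October to December, for COD tickets, GGUS tickets assigned to ROCs, and all support units (SU).]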
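Both kinds of statistics on this slide reduce to simple aggregations over ticket records: the share of opened tickets per service type, and the mean solution time per month. The minimal sketch below uses a hypothetical record layout (`service`, `month`, `hours`). The toy data in the usage note is made up and is not the real COD data.

```python
from collections import Counter, defaultdict
from statistics import mean

def share_per_service(tickets: list[dict]) -> dict[str, float]:
    """Percentage of opened tickets per service type (CE, SE, SRM, ...)."""
    counts = Counter(t["service"] for t in tickets)
    total = sum(counts.values())
    return {svc: 100.0 * n / total for svc, n in counts.items()}

def mean_solution_hours_per_month(tickets: list[dict]) -> dict[str, float]:
    """Mean time to solution, in hours, for each month."""
    hours_by_month = defaultdict(list)
    for t in tickets:
        hours_by_month[t["month"]].append(t["hours"])
    return {month: mean(hours) for month, hours in hours_by_month.items()}
```

Take four toy tickets: two CE tickets in October (10 h and 30 h), one SE ticket and one sBDII ticket in November (20 h and 4 h). They give CE 50%, SE 25% and sBDII 25%, with mean solution times of 20 h for October and 12 h for November.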

Duties and Working groups

COD Duties
- Rotation of 10 federations/teams: each team is on duty one week in five.
- Quarterly face-to-face meetings to update tools and procedures and to harmonize working habits.
- 10 federations over 18 months in EGEE-I.
- Working groups have been running for over 18 months now.
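If "1/5 weeks" means each of the 10 federations is on duty one week in five, then two teams must be on shift every week. This minimal round-robin sketch assumes, which the slide does not state, that the two weekly teams act as lead and backup. It swaps the roles on every other cycle, and the team names are hypothetical.

```python
def rotation(teams: list[str], weeks: int) -> list[tuple[str, str]]:
    """Return a (lead, backup) pair for each week.

    With 2*k teams, every team is on duty exactly once per k-week cycle;
    lead and backup roles are swapped on alternate cycles.
    """
    if len(teams) % 2:
        raise ValueError("need an even number of teams for two-team shifts")
    half = len(teams) // 2
    schedule = []
    for week in range(weeks):
        i = week % half
        lead, backup = teams[i], teams[i + half]
        if (week // half) % 2:  # odd cycle: swap roles
            lead, backup = backup, lead
        schedule.append((lead, backup))
    return schedule
```

With 10 teams the cycle is 5 weeks long. Week 0 pairs team0 (lead) with team5 (backup), and week 5 pairs them again with the roles reversed.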

There is more to it...
Working groups with a straightforward mandate:
- GStat: TW
- SAM: CERN
- SAMAP: CE
topped by
- Tools for Improvement for COD (TIC): CE (EGEE'07)

Working groups mandate
- Integration of the existing tools (CIC, FR): an integration platform for all COD tools to ease the daily operational job.
- Improvement of best practices (DE-CH): identify, raise and analyse with COD how to achieve homogeneous operations.
- Release of updated documentation (OPM, SE): documentation under constant evolution.
- Set-up of failover mechanisms for grid core services (SWE): what is done at federation level, what is done at project level (needs help from J. Shiers' group), what could be done from an operational point of view, and what is needed at ROC/site level from a middleware point of view.
- Set-up of a high-availability strategy for the operational tools used by CODs (FAILOVER, IT).

Failover working group: zoom on failover for operational tools

EGEE Failover: purpose
Propose, implement and document failover procedures for the collaboration, management and monitoring tools used in the EGEE/WLCG Grid.
- The solution is based on DNS and consists of:
  - mapping the service name to one or more destinations;
  - updating this mapping whenever a failure is detected.
Reference: Geographical failover for the EGEE-WLCG Grid collaboration tools, CHEP 2007, Victoria BC, Canada (September 2007).
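The DNS-based approach keeps one stable service name and changes only the destination it points to. The sketch below models that logic in memory with three parts: a table mapping each service alias to an ordered list of candidate hosts, a pluggable health check, and a function that re-points an alias to the first healthy alternative. All hostnames and class names are hypothetical. The sketch does not talk to a real name server. In practice the new mapping would be pushed to DNS, for example as a dynamic update, with a short TTL so that clients pick up the change quickly.

```python
from typing import Callable

class FailoverTable:
    """In-memory model of DNS aliases, each pointing at one of several destinations."""

    def __init__(self, candidates: dict[str, list[str]]):
        # service alias -> ordered list of destinations (preferred first)
        self.candidates = candidates
        self.current = {alias: hosts[0] for alias, hosts in candidates.items()}

    def resolve(self, alias: str) -> str:
        """Return the destination the alias currently points to."""
        return self.current[alias]

    def check_and_switch(self, is_healthy: Callable[[str], bool]) -> dict[str, str]:
        """Re-point every alias whose current destination fails the health check.

        Returns the changes made, as {alias: new_destination}.
        """
        changes = {}
        for alias, hosts in self.candidates.items():
            if is_healthy(self.current[alias]):
                continue
            for host in hosts:
                if host != self.current[alias] and is_healthy(host):
                    self.current[alias] = host
                    changes[alias] = host
                    break
        return changes
```

When the Lyon instance fails its health check, `cic.example.org` is re-pointed to the backup host and clients keep using the same name.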

How the system works: DNS switch

COD work aspects to keep in EGEE-III
- Dedication: working groups are recognized within federations as providers of expertise, and by federations as the channel that brings their needs to central operations.
- Collaboration: up to now, each federation has found a way to contribute actively to improving its COD working environment, when not proactively leading a working group. Also, every person, tool developer or expert recognized as being of "global interest", even though outside the COD scope, has been happily integrated into this "closed community", e.g. SAMAP, and the TIC scope, which covers monitoring this aspect, for example with the Nagios prototype.
- Flexibility: the groups are meant to evolve together with their mandate over time and as new needs arise, e.g. high availability of core grid services, EGI.
- Anticipation: e.g. the strategy of the Operational Failover Working Group.
- Experimentation: e.g. regionalisation of tools and the future modular "NGI dashboards", to extend the CE first-line support experience.

COD work aspects to evolve in EGEE-III
Mandate and assessment of the COD activity
- Integration of NDGF/NE as a COD team; other teams?
- Catch-all and global operations centre:
  - which core services are to be monitored centrally, how to monitor them, and how to switch properly to a backup;
  - how to aggregate local data, and which local data would be concerned.
- Assess metrics in order to identify the most problematic middleware components and recurrently unreliable sites.
- Operational tools reliability assessment: the ENOC test as a starting point?
- Strengthen the need for HA/failover of operational tools and grid core services.
Vision of the long-term evolution of the COD tools
- One set of tools per federation, plus aggregation?
- Which set of tools is to be regionalized? SAM, GOC DB, COD? What else?
- How are they going to interact? This calls for a global schema, now.
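One way to make "recurrently unreliable sites" measurable is to flag sites whose availability falls below a threshold in several recent periods. This is a minimal sketch. The 80% threshold, the 3-period window and the per-site availability series are all hypothetical choices, not agreed COD metrics.

```python
def recurrently_unreliable(availability: dict[str, list[float]],
                           threshold: float = 0.80,
                           window: int = 3,
                           min_failures: int = 2) -> list[str]:
    """Return sites below `threshold` in at least `min_failures` of the last `window` periods.

    availability maps a site to a chronological list of availability fractions (0.0 to 1.0).
    """
    flagged = []
    for site, series in availability.items():
        recent = series[-window:]
        misses = sum(1 for a in recent if a < threshold)
        if misses >= min_failures:
            flagged.append(site)
    return sorted(flagged)
```

A site that dips below the threshold once is not flagged. A site that misses it twice in the last three periods is.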

COD work aspects to evolve in EGEE-III
Leverage "project-labelled" tools so that operational use cases do not remain "pending". Ensure that:
- development strategy and priorities are coherent:
  - data workflow: synchronisation of GOCDB/BDII/SAM/COD;
  - development strategy: depends on the strategy for the long-term evolution of the COD tools;
  - priority decision workflow: who drives the priority of requests on "project-labelled" tools for operational use cases, and how, so that they do not remain "pending", e.g.
    - critical test monitoring/accounting or ARC CE;
    - CA update procedure;
    - need for SAM failover...
- staffing is adequate for proper reactivity, not only for bug fixes.
Interoperability/interoperations (item to be followed up)
- OSG: rather informal for the moment, but users do have problems NOW, and sites relay their users' problems (cf. GGUS ticket).
- NDGF: is there existing critical test monitoring, and what are the consequences for operational procedures?
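The "data workflow" item is about keeping the same sites consistent across GOCDB, the BDII, SAM and the COD dashboard. This minimal consistency check assumes each source has already been reduced to a hypothetical set of site names. It reports, for each source, the sites known elsewhere but missing from that source. All sources are treated symmetrically here. Treating GOCDB as the authoritative reference would be a different, equally plausible design.

```python
def missing_sites(sources: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each source, return the sites present in some other source but absent from it.

    Sources with nothing missing are omitted from the result.
    """
    all_sites: set[str] = set().union(*sources.values())
    return {name: all_sites - sites
            for name, sites in sources.items()
            if all_sites - sites}
```

If Site3 is registered in GOCDB and tested by SAM but not published in the BDII, the report contains a single entry: the BDII is missing Site3.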

Conclusions and References
Where, how and when do we address these topics? Some can be addressed here or thought through at COD meetings; some are relevant to the OCC/ROCs first, and the COD working groups can then make suggestions/recommendations.
References:
- CIC portal: a Collaborative and Scalable Integration Platform for High Availability Grid Operations, Grid 2007 (IEEE), Austin TX, United States (September 2007).
- Geographical failover for the EGEE-WLCG Grid collaboration tools, CHEP 2007, Victoria BC, Canada (September 2007).