Wednesday 9 March 2016. CIC Portal/COD Activities. Hélène Cordier, IN2P3/CNRS Computing Centre, Lyon, France.


1 Wednesday 9 March 2016: CIC Portal/COD Activities, Hélène Cordier, IN2P3/CNRS Computing Centre, Lyon, France

2 Contents
- CIC Portal usage: who/how
- Latest release
- Portal characteristics
- On-going developments
- CIC portal overview for COD
- Statistics and results
- Working groups
- Zoom on failover

3 The 8th IEEE/ACM International Conference on Grid Computing (Grid 2007), 09/03/2016
Use tools: each actor (regional center, site, user, operator, VO manager) can use a set of operational tools (provided, integrated or interfaced) through the CIC Portal to:
- communicate;
- track, report, diagnose and follow up problems;
- manage static information about their VO;
- report on site activity, submit tests, configure.

4 What do people connect to the CIC portal for? (Chart: average connections, December 2004 to December 2007)

5 Connections and process

6 Tasks handled by the CIC portal development team (charts): October 2006 to February 2007; February 2007 to January 2008

7 Contents (as in slide 2)

8 Latest changes in the last 6 months
- Technical changes:
  - authentication is now based on the full certificate DN instead of the CN
- Work on VO ID cards:
  - changes in the database schema for VO/VOMS information
  - improved VO ID card interface
  - integration of the YAIM VO Configurator into the CIC portal
  - downloadable XML dump of the VO ID card information
- Scheduled downtimes procedure
- Integration of the regional first-line support dashboard (prototype with CE)
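Slide 8 notes that authentication now uses the full certificate DN instead of the CN. A minimal sketch of why this matters, assuming OpenSSL-style slash-separated DNs: two different people can share a CN, so an account table keyed on the CN silently merges them, while one keyed on the DN does not. The DNs and the `parse_dn`/`common_name` helpers are hypothetical illustrations, not the portal's code.

```python
# Minimal sketch (hypothetical helpers and DNs): keying users on the full DN
# instead of the CN avoids merging distinct people who share a common name.
# Simplification: values containing "/" are not handled.

def parse_dn(dn: str) -> list[tuple[str, str]]:
    """Split an OpenSSL-style slash-separated DN into (attribute, value) pairs."""
    pairs = []
    for part in dn.strip("/").split("/"):
        key, _, value = part.partition("=")
        pairs.append((key, value))
    return pairs


def common_name(dn: str) -> str:
    """Return the last CN component of a DN (empty string if there is none)."""
    cns = [value for key, value in parse_dn(dn) if key == "CN"]
    return cns[-1] if cns else ""


# Two different people who happen to share the same CN.
certs = [
    "/C=FR/O=CNRS/OU=CC-IN2P3/CN=Jean Martin",
    "/C=CH/O=CERN/OU=IT/CN=Jean Martin",
]

by_cn = {common_name(dn): dn for dn in certs}  # second entry overwrites the first
by_dn = {dn: common_name(dn) for dn in certs}  # both identities are kept

print(len(by_cn), len(by_dn))  # 1 2
```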

9 On-going developments (contents as in slide 2)

10 What is left for the next release (March)
- 2159: adapt to new components released into production, e.g. the YAIM tool.
- 1559: develop a new version of the report, taking into account the feedback received.
- 1920: follow the SAM migration to GridView on the CIC portal side (status: IDLE).
- Internal tasks: quick fixes/bug fixes, documentation, background clean-up work, code optimization and prospective work for EGEE-III.
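Slide 8 lists a downloadable XML dump of VO ID card information and the integration of the YAIM VO Configurator, and this slide plans further adaptation to YAIM. The sketch below shows one way such a dump could be turned into YAIM-style VO variables. The XML element and attribute names are invented for illustration (the real dump schema may differ), and the `VO_<VO>_VOMSES` / `VO_<VO>_VOMS_CA_DN` naming and the upper-casing rule are assumptions to check against the YAIM documentation.

```python
# Hypothetical sketch: convert a VO ID card XML dump into YAIM-style variables.
# ASSUMPTIONS: the XML layout (<vo_cards>/<vo>/<voms>) is invented, and the
# VO_<VO>_... variable names follow the site-info.def style but are not verified here.
import xml.etree.ElementTree as ET

SAMPLE_DUMP = """
<vo_cards>
  <vo name="dteam">
    <voms host="voms.example.org" port="15004"
          dn="/C=XX/O=Example/CN=voms.example.org"
          ca_dn="/C=XX/O=Example/CN=Example CA"/>
  </vo>
</vo_cards>
"""


def yaim_var_prefix(vo_name: str) -> str:
    """Build VO_<NAME>_ : upper-case, '.' and '-' replaced by '_' (assumed convention)."""
    return "VO_" + vo_name.upper().replace(".", "_").replace("-", "_") + "_"


def to_yaim(xml_text: str) -> str:
    """Emit one VOMSES and one VOMS_CA_DN line per VO found in the dump."""
    lines = []
    root = ET.fromstring(xml_text.strip())
    for vo in root.findall("vo"):
        name = vo.get("name")
        prefix = yaim_var_prefix(name)
        vomses, ca_dns = [], []
        for voms in vo.findall("voms"):
            # 'vo host port host_dn vo' entry for each VOMS server
            vomses.append(f"'{name} {voms.get('host')} {voms.get('port')} {voms.get('dn')} {name}'")
            ca_dns.append(f"'{voms.get('ca_dn')}'")
        lines.append(f'{prefix}VOMSES="{" ".join(vomses)}"')
        lines.append(f'{prefix}VOMS_CA_DN="{" ".join(ca_dns)}"')
    return "\n".join(lines)


print(to_yaim(SAMPLE_DUMP))
```

Generating configuration from a single authoritative source (the VO ID card) rather than maintaining it by hand at each site is the design point; the exact output format would have to follow the YAIM version in production.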

11 ARM Meeting, EGEE'07, Budapest, 09/03/2016: COD activity (contents as in slide 2)

12 A tool for grid operators: the COD dashboard
- Start of EGEE, many entry points: the operator works separately with the ticketing system, site information, monitoring tools #1 to #n and a mail client.
- Now, a single entry point: the dashboard brings together the ticketing system, site information, monitoring tools #1 to #n and a mail sender.
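The "single entry point" on slide 12 is essentially an aggregation step: alarms from several monitoring tools and the ticket state are merged into one per-site view. A minimal sketch, assuming a simple dictionary layout; the source names, site names and structure are hypothetical, not the dashboard's internal model.

```python
# Minimal sketch of a single-entry-point view: merge per-site alarms from several
# monitoring sources with the ticket state. All names and layouts are hypothetical.
from collections import defaultdict


def build_dashboard(alarm_sources: dict[str, dict[str, list[str]]],
                    tickets: dict[str, int]) -> dict[str, dict]:
    """alarm_sources: {source: {site: [alarm, ...]}}; tickets: {site: ticket_id}."""
    view: dict[str, dict] = defaultdict(lambda: {"alarms": [], "ticket": None})
    for source, per_site in alarm_sources.items():
        for site, alarms in per_site.items():
            # Prefix each alarm with its origin so the operator knows where it came from
            view[site]["alarms"].extend(f"{source}: {a}" for a in alarms)
    for site, ticket_id in tickets.items():
        view[site]["ticket"] = ticket_id
    return dict(view)


dashboard = build_dashboard(
    {"monitoring-1": {"SITE-A": ["CE job submission failed"]},
     "monitoring-2": {"SITE-A": ["SE unreachable"], "SITE-B": ["BDII timeout"]}},
    {"SITE-A": 14},
)
for site, entry in sorted(dashboard.items()):
    print(site, entry["ticket"], entry["alarms"])
```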

13 Interaction with EGEE services
- GGUS (FZK, Karlsruhe, Germany): create and update tickets via SOAP; view tickets.
- GStat (ASGC, Taipei, Taiwan): GIIS status per site via HTTP.
- SAM (CERN, Geneva, Switzerland): test results on nodes via an XSQL-based service.
- GOC-DB: site information and scheduled downtimes via SQL queries.
- Operations portal (IN2P3-CC, Lyon, France): shows the status of each site (Site1 to Site4) with its associated ticket, e.g. ticket #14, #32, #28, or no ticket.
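Slide 13 shows the portal reading scheduled downtimes from GOC-DB. One plausible use, sketched here as an assumption rather than the portal's documented behaviour, is to avoid raising an operator alarm for a site that declared a downtime covering the failure. The data layout is invented and is not the GOC-DB schema.

```python
# Hypothetical sketch: filter failing sites against declared scheduled downtimes
# before raising alarms. The downtime layout is invented, not the GOC-DB schema.
from datetime import datetime, timezone


def in_scheduled_downtime(site: str, when: datetime,
                          downtimes: dict[str, list[tuple[datetime, datetime]]]) -> bool:
    """True if `when` falls inside a [start, end) downtime declared for `site`."""
    return any(start <= when < end for start, end in downtimes.get(site, []))


def alarms_to_raise(failures: list[str], when: datetime,
                    downtimes: dict[str, list[tuple[datetime, datetime]]]) -> list[str]:
    """Keep only failing sites that are NOT in a declared downtime."""
    return [site for site in failures if not in_scheduled_downtime(site, when, downtimes)]


utc = timezone.utc
downtimes = {
    "SITE-A": [(datetime(2007, 12, 3, 8, tzinfo=utc), datetime(2007, 12, 3, 18, tzinfo=utc))],
}
now = datetime(2007, 12, 3, 12, tzinfo=utc)
print(alarms_to_raise(["SITE-A", "SITE-B"], now, downtimes))  # ['SITE-B']
```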

14 Outline (contents as in slide 2)

15 Statistics

Share of opened tickets by component (%):
Month      CE  SE  SRM  RGMA  sBDII
October    39  15  14   11    6
November   34  14  18   6     10
December   29  18  21   9     8

Solution time (hours):
                                Oct  Nov  Dec
COD tickets                     269  268  228
GGUS tickets assigned to ROCs   277  281  307
All support units (SU)          364  427  709
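As a quick check on the solution-time table of slide 15, the three-month means can be computed directly from its figures. This is an unweighted mean of the monthly values: the slide gives no ticket counts, so it is not a ticket-weighted average.

```python
# Unweighted three-month mean of the monthly solution times (hours) from slide 15.
# No ticket counts are given, so this is NOT a ticket-weighted average.
solution_hours = {
    "COD tickets": [269, 268, 228],
    "GGUS tickets assigned to ROCs": [277, 281, 307],
    "All support units": [364, 427, 709],
}

means = {category: sum(months) / len(months) for category, months in solution_hours.items()}
for category, mean in means.items():
    print(f"{category}: {mean:.1f} h")
# COD tickets: 255.0 h
# GGUS tickets assigned to ROCs: 288.3 h
# All support units: 500.0 h
```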

16 Contents as in slide 2, with the item "Working groups" renamed "Duties and Working groups"

17 COD duties
- Rotation of 10 federations/teams, each on duty one week in five.
- Quarterly face-to-face meetings to update tools and procedures and to harmonize working habits.
- 10 federations over 18 months in EGEE-I.
- Working groups for over 18 months now.
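The "1/5 weeks" ratio on slide 17, with 10 federations, implies two teams on duty each week. The sketch assumes this is a lead team plus a backup team, which the slide does not state, and generates a simple round-robin schedule; team names are placeholders.

```python
# Hypothetical sketch of the COD duty rotation. ASSUMPTION: two teams are on duty
# each week (e.g. lead + backup), so 10 teams each return once every 5 weeks.
def rotation(teams: list[str], weeks: int, per_week: int = 2) -> list[tuple[int, list[str]]]:
    """Return (week number, teams on duty) pairs using a simple round-robin."""
    schedule = []
    for week in range(weeks):
        start = (week * per_week) % len(teams)
        on_duty = [teams[(start + i) % len(teams)] for i in range(per_week)]
        schedule.append((week + 1, on_duty))
    return schedule


teams = [f"team-{i}" for i in range(1, 11)]  # placeholder names for the 10 federations
for week, on_duty in rotation(teams, 6):
    print(week, on_duty)
```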

18 There is more to it...
Working groups with a straightforward mandate:
- GSTAT (TW)
- SAM (CERN)
- SAMAP (CE)
topped by:
- Tools for Improvement for COD, TIC (CE) (EGEE'07)

19 Working groups mandate
- Integration of the existing tools (CIC, FR): an integration platform for all COD tools to ease the daily operational job.
- Improvement of best practices (DE-CH): identify, raise and analyse with COD how to achieve homogeneous operations.
- Release of updated documentation (OPM, SE): documentation under constant evolution.
- Set-up of failover mechanisms for grid core services (SWE): what is done at the federation level, what is done at the project level (help needed from J. Shiers' group), what could be done (operational point of view) and what is needed at the ROC/site level (middleware point of view).
- Set-up of a high-availability strategy for the operational tools used by CODs (FAILOVER, IT).

20 Failover working group (contents as in slide 2, with the last item reading "Zoom on Failover for Operational Tools")

21 EGEE Failover: purpose
Propose, implement and document failover procedures for the collaboration, management and monitoring tools used in the EGEE/WLCG grid.
- The solution is based on DNS and consists of:
  - mapping the service name to one or more destinations;
  - updating this mapping whenever a failure is detected.
Reference: Geographical failover for the EGEE-WLCG Grid collaboration tools, CHEP 2007, Victoria, BC, Canada (September 2007).
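The two steps on slide 21 (map a service name to destinations, update the mapping on failure) can be sketched as a pure function over health-check results. Host names and the health-check callable are hypothetical; keeping the previous mapping when every candidate fails is a design choice of this sketch, not necessarily of the EGEE setup.

```python
# Minimal sketch of DNS-style failover logic: each service name maps to the subset
# of its candidate destinations that pass a health check. Names are hypothetical.
from typing import Callable


def update_mapping(mapping: dict[str, list[str]],
                   candidates: dict[str, list[str]],
                   is_healthy: Callable[[str], bool]) -> dict[str, list[str]]:
    """Return a new mapping pointing each service name at its healthy destinations.

    If no candidate is healthy, the previous mapping is kept instead of
    publishing an empty record.
    """
    new_mapping = {}
    for service, hosts in candidates.items():
        healthy = [h for h in hosts if is_healthy(h)]
        new_mapping[service] = healthy or mapping.get(service, [])
    return new_mapping


candidates = {"portal.example.org": ["primary.example.org", "backup.example.org"]}
current = {"portal.example.org": ["primary.example.org"]}
down = {"primary.example.org"}

result = update_mapping(current, candidates, lambda h: h not in down)
print(result)  # {'portal.example.org': ['backup.example.org']}
```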

22 How the system works: DNS switch
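Slide 22 describes the DNS switch itself. A sketch of one way to perform it, assuming a BIND server that accepts dynamic updates: generate an `nsupdate` command script that repoints a service alias (CNAME) to the destination chosen by the failover logic. Server, zone and host names are placeholders, and the short TTL is an assumption so clients pick up the switch quickly; whether the EGEE tools used exactly this mechanism is not stated here.

```python
# Hypothetical sketch of a DNS switch: build an nsupdate script that repoints a
# CNAME alias. Server/zone/host names are placeholders; TTL of 60 s is assumed.
def nsupdate_script(server: str, zone: str, alias: str, target: str, ttl: int = 60) -> str:
    def fqdn(name: str) -> str:
        # Write fully qualified names with a trailing dot to avoid ambiguity
        return name if name.endswith(".") else name + "."

    return "\n".join([
        f"server {server}",
        f"zone {fqdn(zone)}",
        f"update delete {fqdn(alias)} CNAME",              # drop the old alias target
        f"update add {fqdn(alias)} {ttl} CNAME {fqdn(target)}",
        "send",                                             # submit the update
    ]) + "\n"


script = nsupdate_script("ns1.example.org", "example.org",
                         "portal.example.org", "backup.example.org")
print(script)
```

The resulting text could be fed to `nsupdate` (for example with a TSIG key file via `-k`); the delete-then-add pair in one `send` keeps the change atomic from the server's point of view.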


25 COD work aspects to keep in EGEE-III
- Dedication: working groups are recognized within the federations for providing expertise, and by the federations as a channel for bringing their needs to central operations.
- Collaboration: so far, each federation has found a way to contribute actively to improving its COD work environment, when not proactively leading a working group. Also, every person, tool developer or expert recognized as being of "global interest", even when outside the COD scope, has been welcomed into this "closed community", e.g. SAMAP, or the TIC scope to monitor this aspect with a Nagios prototype.
- Flexibility: the groups are meant to evolve, together with their mandate, over time and as new needs arise, e.g. high availability of core grid services, EGI.
- Anticipation: e.g. the strategy of the Operational Failover Working Group.
- Experimentation: e.g. regionalisation of tools and the future modular "NGI dashboards", to widen the CE first-line support experience.

26 COD work aspects to evolve in EGEE-III
- Mandate and assessment of the COD activity:
  - integration of NDGF/NE as a COD team; other teams?
  - catch-all and global operations center: which core services should be monitored centrally, how to monitor them and how to switch properly to a backup; how to aggregate local data, and which local data would be concerned;
  - metrics to identify the most problematic middleware components and recurrently unreliable sites;
  - reliability assessment of the operational tools (the ENOC test as a starting point?);
  - strengthen the need for HA/failover of operational tools and grid core services.
- Vision of the long-term evolution of the COD tools: one set of tools per federation plus aggregation? Which tools should be regionalized: SAM, GOC DB, COD, anything else? How will they interact? A global schema is needed now.
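Slide 26 calls for metrics to identify the most problematic middleware components and recurrently unreliable sites. A minimal sketch, assuming ticket records of the form (site, component, ISO week): rank components by ticket count, and flag sites that received tickets in at least a threshold number of distinct weeks. The records, the threshold and the notion of "recurrent" are all assumptions for illustration.

```python
# Hypothetical sketch of COD metrics: rank components by ticket count and flag
# sites ticketed in many distinct weeks. Records and threshold are invented.
from collections import Counter

tickets = [  # (site, component, ISO week)
    ("SITE-A", "CE", 48), ("SITE-A", "SE", 49), ("SITE-A", "CE", 50),
    ("SITE-B", "SRM", 49), ("SITE-C", "CE", 50),
]


def problematic_components(records):
    """Components sorted by number of tickets, most frequent first."""
    return Counter(component for _, component, _ in records).most_common()


def recurrently_unreliable(records, min_weeks=3):
    """Sites that received tickets in at least `min_weeks` distinct weeks."""
    weeks_per_site: dict[str, set[int]] = {}
    for site, _, week in records:
        weeks_per_site.setdefault(site, set()).add(week)
    return sorted(site for site, weeks in weeks_per_site.items() if len(weeks) >= min_weeks)


print(problematic_components(tickets))  # [('CE', 3), ('SE', 1), ('SRM', 1)]
print(recurrently_unreliable(tickets))  # ['SITE-A']
```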

27 COD work aspects to evolve in EGEE-III
- Leverage "project-labeled" tools so that operational use cases do not remain "pending":
  - make sure that development strategy and priorities are coherent:
    - data workflow: synchronization of GOCDB/BDII/SAM/COD;
    - development strategy: depends on the strategy for the long-term evolution of the COD tools;
    - priority decision workflow: who drives the priority of requests to "project-labeled" tools for operational use cases, and how, so that they do not remain "pending", e.g. critical test monitoring/accounting or ARC CE, the CA update procedure, the need for SAM failover...
  - make sure that staffing is adequate for proper reactivity, not only for bug fixes.
- Interoperability/interoperations (item to be followed up):
  - OSG: rather informal for the moment, but users do have problems now, and sites relay their users' problems, cf. GGUS ticket 31037;
  - NDGF: is there existing critical test monitoring, and what are the consequences for operational procedures?
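The data-workflow item on slide 27 (synchronization of GOCDB/BDII/SAM/COD) suggests a simple consistency check: compare the site lists each source knows about and report what is missing where. The site sets are made up, and how each list would actually be fetched from the real services is left out of this sketch.

```python
# Hypothetical sketch of a cross-source consistency check: report, for each
# source, the sites known elsewhere but missing from it. Site sets are made up.
def missing_sites(sources: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each source to the sites it lacks (sources with no gaps are omitted)."""
    all_sites = set().union(*sources.values())
    return {name: all_sites - sites for name, sites in sources.items() if all_sites - sites}


sources = {
    "GOCDB": {"SITE-A", "SITE-B", "SITE-C"},
    "BDII": {"SITE-A", "SITE-B"},
    "SAM": {"SITE-A", "SITE-B", "SITE-C"},
    "COD": {"SITE-A", "SITE-C"},
}
for source, missing in sorted(missing_sites(sources).items()):
    print(source, "is missing", sorted(missing))
# BDII is missing ['SITE-C']
# COD is missing ['SITE-B']
```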

28 Conclusions and references
Where, how and when do we address these topics? Some can be addressed here or considered at COD meetings; some are relevant to the OCC/ROCs first, and the COD working groups can then make suggestions and recommendations.
References:
- CIC portal: a Collaborative and Scalable Integration Platform for High Availability Grid Operations. Grid 2007 (IEEE), Austin, TX, United States (September 2007).
- Geographical failover for the EGEE-WLCG Grid collaboration tools. CHEP 2007, Victoria, BC, Canada (September 2007).

