Helene Cordier, CNRS-IN2P3 Villeurbanne, France


1 COD-16 Transition meeting, EGEE-III
Helene Cordier, CNRS-IN2P3 Villeurbanne, France

2 Contents
What has been achieved in EGEE-II
Mandate for EGEE-III
Definition of the 3 poles
Main goals and objectives of this meeting
Pole 1: Regionalisation of the service: what (service level) / how (operations model)
Pole 2: Assessment and improvement of the service: problem follow-up / procedures / metrics
Pole 3: COD tools
Operational model -> Procedures and tools -> Sustainability and reliability at the international level
"Representative" here means that we need to know who will be active and who will not.

3 Mandate
The COD as we understand it today will not exist at the end of EGEE-III; consequently, all current ROCs are to run a regional COD service by the end of EGEE-III.
The mandate of the COD is mainly to prepare the evolution of the current operations model towards a more "sustainable" one in two years' time, i.e. to move from the current centralized COD model to a regional COD (r-COD) model for all federations.
Our goal is to set up this distributed model as smoothly and seamlessly as possible; hence the need for communication and sharing of expertise between regional CODs, and for building the operations model together.
Operational model -> Procedures and tools -> Sustainability and reliability at the international level
Sustainability: r-COD model
Smooth and seamless: the current work is still done -> everybody stays on board unless proven otherwise
Reliability / availability: c-COD model

4 From 6 working groups to 3 poles
1st line support: CE, NE, TWN, SWE
Best practices and procedures: DE-CH, SEE
COD tools: FR, IT, SWE, CE
[Diagram labels: 1st line support; Best practices; COD tools; Ops. manual; Procedure; Failover / HA ops tools; CIC integration; TIC / HA m/w]
The effort comes from the same people who are doing, or have been doing, the COD work, not from somebody out of the blue.
Integration of the OAT specifications into the evolution of the COD dashboard according to COD proposals: make regional instances -> "catch-all" and elaborate -> "thin layer COD" instance.
[Diagram labels: Weekly operations; Regional teams; Operation model; SA1 coordination; Ops tools; OAT]

5 Main Goals / Objectives
Define what the operations model would be
Validate the internal organization
  3 poles with enough "representative participation" and "active" leaders, i.e. start with 4 federations (e.g. pole 1 and its steering committee)
  e.g. Pole 1: active leaders should come from 2 federations
  Each pole has a task list and is staffed accordingly
  3 poles with a list of tasks, including some of the current EGEE-II tasks / COD-15 action list -> easy to follow up
Identify the need for tools / staffing
Define precisely the interfaces with external bodies
Logistics:
  Bi-monthly phone conferences per pole in between meetings
  Face-to-face (F2F) meetings, ideally following the SA1 coordination meetings
  Composition / specific needs
  Next meeting: EGEE'08
Representation from all federations means that the outcome has some value to the EGEE-III project and the structure beyond.
Some of the current EGEE-II tasks: people have to sort out what and how.

6 Because there is now more than one COD ...
COD: distributed teams doing monitoring shifts and ensuring that critical test failures at sites are attended to, i.e. at a minimum: communication schema + grid expertise stored in procedures and wiki.
First-line support: COD service for sites within a federation. The current model of regionalization for operations is:
  Alarms for sites belonging to the AP region and younger than 24h do not appear on the regular COD dashboard.
  1st line supporters are allowed to switch these alarms off, mask them, and create tickets from them.
  1st line supporters are allowed to pass information through the site notepad.
  Access to all other information is read-only.
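The regionalization rule described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual COD dashboard logic: the `Alarm` fields, the function names and the action names are hypothetical, and "younger than 24h" is read as an alarm age strictly below 24 hours.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: the 24h window from the slide, applied to the age of the alarm.
REGIONAL_WINDOW = timedelta(hours=24)

# Hypothetical action names for what 1st line supporters may do;
# everything else is treated as read-only.
FIRST_LINE_ACTIONS = frozenset(
    {"switch_off_alarm", "mask_alarm", "create_ticket", "write_site_notepad"}
)


@dataclass
class Alarm:
    site: str
    region: str          # hypothetical field, e.g. "AP"
    raised_at: datetime


def shown_on_central_dashboard(alarm, now, regional_regions):
    """Return True if the alarm should appear on the regular COD dashboard.

    Alarms from a region running 1st line support are hidden from the
    central dashboard while they are younger than 24h.
    """
    age = now - alarm.raised_at
    handled_regionally = alarm.region in regional_regions and age < REGIONAL_WINDOW
    return not handled_regionally


def first_line_may(action):
    """Return True if a 1st line supporter is allowed to perform the action."""
    return action in FIRST_LINE_ACTIONS
```

With `regional_regions={"AP"}`, a three-hour-old alarm at an AP site stays off the central dashboard, while the same alarm after 30 hours, or any alarm from a region without 1st line support, is shown centrally.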

7 Glossary as of today
r-COD: ultimate model of 1st line support, with maximum autonomy regarding alarms and operational tickets assigned to sites in the region.
c-COD: small team (how small can it be?) coordinating the r-CODs; catch-all for monitoring; covers the need for an escalation process / grid experts / reporting / grid integrity.
Maximum autonomy: to be defined.
Operational tickets: which ones stay now, could stay, or should stay at the regional level -> c-CODs

8 1st line support forum – CE/TW, SWE/NE
Current model: alarms younger than 24h
Planning and specificities of federations / questionnaire
Recommendations for the r-COD service for federations that join
How to improve the model: open questions
  Go further in the regional model?
  Boundary between 1st line support and c-COD?
  What would the final c-COD team do? (e.g. need for escalation process / grid experts / reporting / grid integrity)
  Knowledge base and collaborative tools
  Impact of VO-specific tests on the operations model?
  Pooling (mutualization) of the work of both teams at the regional level

9 1st line support forum – cont'd CE/TW, SWE/NE
How to improve the model: open questions
  Knowledge base and collaborative tools
  Impact of VO-specific tests on the operations model?
  Pooling (mutualization) of the work of both teams at the regional level
Regionalisation of the COD service
  What it takes for it to still be successful on May 1st 2010
  What the c-COD operations model is going to be -> tomorrow
How to improve the process of the COD service -> hints
  Run SAMAP
  Set aside the impact of network glitches
  Take core service failures into account

10 Best practices & procedures: DE-CH – SEE and ???
Best practices: drive the evolution of COD
Best practices reflected in procedures:
  Advisory committee to ensure uniformity and regulation before new critical tests are set up
  Interface to weekly operations meetings and ROC, incl. item 196: ask CODs if they can check in their cycle of work, from GDA on Monday 09/06/08
  ROC: accounting tests passing critical
Procedure release
  PPS:
  VO aspects: VO registration
  Site aspects: EGEE-II SA1 SLD; site registration; downtime -> procedure / how-to
  General: new CA release
  COD activity: Operational Procedure Manual, official version 1.6 to be released anytime now
  [COD activity: describing the working groups and referencing the activity report]
  [COD dashboard: how-to]

11 Best practices & procedures – cont'd DE-CH – SEE and ???
Ensuring that the present COD work is not disrupted:
  Follow-up of operational use cases / operational tools
  GGUS report
Work assessment so that the central service is operational:
  Metrics for COD activity in EGEE-III
  Handover improvement
  Alarm masking rules and weighting mechanism – SEE
Operational Procedure Manual: cf. Clemens' talk and MSA1.2

12 COD tools – CE-IT-FR-SWE
TIC
  Failover mechanisms for GSTAT, SAMAP, CIC and GOC
  Follow-up of HA of ops tools / ENOC and core service m/w / TCG -- forum / OCC
COD dashboard
  Evolution towards N regional instances + 1 specifically dedicated to the c-COD, incl. alarm weighting mechanism
Requests for change on monitoring tools through OAT
  Follow-up of requirements on OAT
  Follow-up of requirements on the separate tools
Automation
  Automation is the "how" and does not tell us where we want to go
  Automation provides the specifications of the new information source for the COD dashboard
  All automation is in turn to be treated with caution, including the automated opening of tickets

13 Pole 1: EGEE-II – Now
[Diagram labels: COD range: all federations; Sites; COD range: some federations; Sites; r-COD; RC]
Open questions, to be addressed tomorrow morning or later today in inter-pole-1 discussions or in plenary:
-- What is required to build an r-COD acceptable to EGEE-III, i.e. the minimum level of service? We have to define this.
-- What is the final model of the COD becoming c-COD going to be?
-- It could be minimal, but we need to make sure that the model is sustainable, so that on May 1st 2010 it still works.
i.e. work out how to imagine the operations model:
  What is the percentage of tickets passing to the c-COD after 24h, and why? Is it because they take too long to solve, or because they are relevant at the project level?
  -> What do the r-CODs think a c-COD is needed for? Escalation process / grid experts / reporting / grid integrity / security
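The question about the percentage of tickets passing to the c-COD after 24h, and why, could be answered from ticket records along the following lines. This is a minimal sketch, assuming each ticket is a dictionary with a hypothetical `escalated` flag and a hypothetical `reason` label; neither the field names nor the reason labels come from GGUS or from the COD dashboard.

```python
from collections import Counter


def escalation_summary(tickets):
    """Return (percentage of tickets escalated to the c-COD, count per reason).

    Each ticket is a dict with the hypothetical keys "escalated" (bool)
    and "reason" (str, or None when the ticket stayed at the regional level).
    """
    total = len(tickets)
    escalated = [t for t in tickets if t["escalated"]]
    percent = 100.0 * len(escalated) / total if total else 0.0
    reasons = Counter(t["reason"] for t in escalated)
    return percent, dict(reasons)


# Made-up sample data, for illustration only.
sample_tickets = [
    {"id": 1, "escalated": False, "reason": None},
    {"id": 2, "escalated": True, "reason": "too_long_to_solve"},
    {"id": 3, "escalated": True, "reason": "project_level"},
    {"id": 4, "escalated": False, "reason": None},
]
percent, reasons = escalation_summary(sample_tickets)
```

Splitting the escalated tickets by reason separates the two cases raised on the slide: tickets that simply took too long to solve at the regional level, and tickets that are relevant at the project level.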

14 Now: COD range: some federations
[Diagram labels: Sites; r-COD; RC]

15 Now – End of PY2: c-COD range: some federations
[Diagram labels: Sites; r-COD; RC; c-COD; ...; r-COD; RC]
Open questions to be addressed tomorrow:
-- What is the number of federations involved? E.g. CERN; even if we aim at a full set of r-CODs, it may very well be that this is not the case on May 1st 2010.
How will the specific needs of the r-CODs / LHC impact the relations between r-COD and c-COD? The c-COD needs to be aware of the monitoring of this.

16 Main Goals / Objectives
Define what the operations model would be
Validate the internal organization
  3 poles with enough "representative participation" and "active" leaders, i.e. start with 4 federations (e.g. pole 1 and its steering committee)
  e.g. Pole 1: active leaders should come from 2 federations
  Each pole has a task list and is staffed accordingly
  3 poles with a list of tasks, including some of the current EGEE-II tasks / COD-15 action list -> easy to follow up
Identify the need for tools / staffing
Define precisely the interfaces with external bodies
  Collaborative tools / mailing lists
  Staffing needs / rota basis or lead team duties
Representation from all federations means that the outcome has some value to the EGEE-III project and the structure beyond.
Some of the current EGEE-II tasks: people have to sort out what and how.
Homogeneous involvement in the poles is needed.

