EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks COD-16 (Transition to EGEE-III) Report to.

Presentation transcript:

EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE
EGEE and gLite are registered trademarks
COD-16 (Transition to EGEE-III): Report to SA1-Italy
Slides taken from COD-16 agenda and adapted
Alessandro Cavalli, INFN-CNAF
SA1-Italy phone conference

Mandate
COD as we understand it today will not exist at the end of EGEE-III; consequently, all current ROCs are to run a regional COD service by the end of EGEE-III.
The mandate of the COD is mainly to prepare the evolution of the current operations model towards a more "sustainable" one in two years' time, i.e. to move from the current centralized COD model to a regional COD (r-COD) model for all federations.
Regional CODs need to communicate, share expertise and build the operations model together.
Re-organization of COD working groups in EGEE-III

COD working groups (poles)
What has been achieved in EGEE-II
Mandate for EGEE-III
Definition of the 3 poles
– Description of the three poles:
– Pole 1: Regionalisation of the service
– Pole 2: Best Practices, Assessment and Improvement of the service
– Pole 3: COD tools

From 6 working groups to 3 poles
Regionalization, 1st line support:
– Setup of regional teams
– Operational model
Best Practices and Procedures:
– Best practice definition
– Operations Manual update
COD tools:
– Operational tools
– CIC portal integration
– TIC (tools improvements, SAM sensor improvements, …)
– Tools/MW failover
– Integration with OAT mandate (newborn SA1 Operations Automation Team)

Pole 1: Terminology
– typical COD: team doing monitoring shifts for the entire infrastructure in EGEE-II; that's what we do now
– r-COD: regional COD, team dealing with alarms and tickets for its own region, communicating with other r-CODs, c-COD etc.; planned for EGEE-III
– 1st line support: team helping site admins in the region with technical matters (see EGEE-III WBS TSA 1.2.3)
– c-COD: central COD, small team, aka "thin layer", coordinating r-CODs, making sure the r-CODs are "converging" in terms of procedures etc., planning evolution towards EGI

The Model
r-COD duties
– handle alarms
  ▪ in the first 24h: time for the site to react, place for 1st line support to act
  ▪ having 1st line support team experts is valuable; recommend to regions
  ▪ notification about alarms can also be sent to sites; obligatory for sites
– open tickets
  ▪ after 24h; should be assigned to sites directly, not to e.g. 1st line, as it is the site's responsibility to solve the problem and contact 1st line if needed
  ▪ creating tickets mandatory after 24h? Yes, as having tickets increased availability. Needs to be written into procedures. Problem with automation: avoid a big number of tickets, fake-alarm tickets etc.
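The 24h rule for alarms could be sketched roughly as follows. This is only an illustration of the proposal on the slide, not code from any COD tool: the function name, the action labels and the treatment of the exact 24h boundary are assumptions.

```python
from datetime import datetime, timedelta

# Assumption: the 24h window quoted on the slide; the real value would
# come from the agreed procedures, not from this sketch.
FIRST_LINE_WINDOW = timedelta(hours=24)


def alarm_action(raised_at, now, resolved=False):
    """Return a (hypothetical) action label for an alarm seen by the r-COD."""
    if resolved:
        # The site or 1st line support fixed the problem: nothing to open.
        return "close"
    if now - raised_at < FIRST_LINE_WINDOW:
        # Inside the window: the site is notified and 1st line support may act.
        return "notify_site"
    # Window elapsed (boundary counted as elapsed here, an assumption):
    # a ticket is opened and assigned to the site directly.
    return "open_ticket_for_site"


if __name__ == "__main__":
    raised = datetime(2008, 6, 1, 9, 0)
    for hours in (3, 30):
        print(hours, alarm_action(raised, raised + timedelta(hours=hours)))
```

Keeping the decision in one pure function makes it easy to add a filter for fake alarms later, which the slide names as the open problem with automation.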

The Model (cont.)
r-COD duties (cont.)
– handle tickets
  ▪ proposal: give the r-COD time to handle the ticket (1 week, 2 weeks; take the time from the current model), then escalate to c-COD
  ▪ escalate only in case of sites not responding; escalation to c-COD can be done automatically by the tool
– escalate problems to other regions (which problems, what tool?)
  ▪ negative example: core service failure; should be handled by the appropriate region. If not, it is a general problem: send GGUS ticket
  ▪ SAM problem with CERN's RB: send GGUS ticket assigned to CERN
  ▪ no other cases have been identified...
– escalate problems to c-COD (which problems, what tool?)
  ▪ site not responding for a certain time
  ▪ site suspension requests
– should r-COD be able to suspend a site?
  ▪ ROC is asked for site suspension
  ▪ we should retain the decision at a higher level: if there is a VO having a crucial service, they will be interested in site suspension
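The escalation paths discussed on this slide can be written down as a small routing function. It is a sketch under assumptions: the one-week window is just one of the two values floated on the slide, and the function name, parameters and returned labels are hypothetical rather than taken from an existing tool.

```python
from datetime import timedelta

# Assumption: the slide proposes "1 week, 2 weeks"; one week is used here.
R_COD_WINDOW = timedelta(weeks=1)


def route_ticket(age, site_responding,
                 core_service_elsewhere=False,
                 suspension_requested=False):
    """Decide who handles a ticket next (labels are illustrative only)."""
    if core_service_elsewhere:
        # E.g. a SAM problem with CERN's RB: a GGUS ticket goes to the
        # region that runs the service.
        return "ggus_ticket_to_owning_region"
    if suspension_requested:
        # Suspension decisions are kept at a higher level than the r-COD.
        return "escalate_to_c_cod"
    if not site_responding and age >= R_COD_WINDOW:
        # Only unresponsive sites are escalated, and only after the window.
        return "escalate_to_c_cod"
    return "handled_by_r_cod"


if __name__ == "__main__":
    print(route_ticket(timedelta(days=2), site_responding=False))
    print(route_ticket(timedelta(days=10), site_responding=False))
```

Because the rule has no side effects, the same function could be called by a dashboard to trigger the automatic escalation mentioned above.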

The Model (cont.)
1st line support duties
– what is the basic set of responsibilities?
  ▪ focus on the outcome of their work, not on how they interact with sites
    o knowledge base
    o experience sharing between teams
– a tool for experience sharing
  ▪ web forum: problem is assigned to an expert
    o site admin sends a support request, searches the forum
    o possibility for the expert to change the problem topic, add keywords/tags, organize in the right way to facilitate usage by others
    o also use GGUS knowledge base
  ▪ forum shall be centralized
    o we use the same middleware
    o expertise sharing between 1st line support teams
  ▪ how to organize that, who will set it up? -> ask COD tools?

The Model (cont.)
c-COD responsibility
– present issues to Operations Meetings
  ▪ site suspension
  ▪ these will mainly be issues from problems in regions -> shouldn't they go through the ROC?
  ▪ can't it be done by r-COD?
    o no need for developing tools
    o problem: r-COD may want to hide such situations, internal pressure etc.
– dealing with problems raised by r-COD that were not solved in the specified time
  ▪ originated by COD and assigned to some Support Unit (not site)
– dealing with actions assigned to COD by others (e.g. ROC managers etc.)

Roadmap for all federations
Model of gradual, smooth transition from "typical COD" to r-CODs
– Some federations want to avoid doing r-COD and typical COD in parallel
– Proposal: "plug out" a federation from the "central rota" while starting r-COD
  ▪ the federation will no longer have to look at the other sites
  ▪ but nobody else has to look at the sites of the federation
  ▪ who is doing c-COD duties? new r-CODs? do we need a tool for c-COD?
    o requirement on COD tools: do not display r-COD tickets in the typical COD portal, etc.
    o the rota for federations remaining in central COD will change!
  ▪ transition period: still looked at by the others? Tickets are created twice, lazy, not doing... fuzzy

Pole 3: failover
GOCDB failover is in progress
Oracle DB replicated at CNAF
– with Oracle Materialized Views, read-only replica
Web GOCDB replicated at ITWM (DE)
– updated daily with yum from the RAL repo
Tested ITWM GOCDB web with CNAF DB:
– the web gives a blank page, error message in the logs
– reason: only data replicated, Oracle procedures are not replicated
– defined how to make the proper replication
Failover scenario:
– when/how/what to switch
– new idea: LDAP DB connection method:
  ▪ provides the possibility to instruct the GOCDB customers to use the master or replica DB
  ▪ LDAP entries contain the Oracle DB contact details: updating them can route access to the backup DB for GOCDB (and possibly other operational tools)
  ▪ LDAP has a master/slave feature for failover
  ▪ check if an LDAP server can be installed at RAL (with backup at another site)
Frequency of GOCDB update
– CNAF DB is only 9 MB. If it's really so, it can be refreshed more than daily, probably very frequently
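The LDAP idea amounts to one level of indirection: clients look up the database contact details in a directory entry, and an operator switches everyone over by editing that entry. A minimal sketch of the lookup logic follows. A plain dictionary stands in for the LDAP entries so that no particular LDAP client library is assumed; the attribute names (`active`, `master`, `replica`) and the host strings are hypothetical.

```python
# Stand-in for the LDAP entries; hosts and attribute names are invented
# for illustration and do not describe the real GOCDB setup.
DIRECTORY = {
    "gocdb": {
        "active": "master",
        "master": "db-master.example.org:1521/GOCDB",
        "replica": "db-replica.example.org:1521/GOCDB",  # read-only copy
    }
}


def resolve_db(directory, tool="gocdb"):
    """Return the contact string that clients of `tool` should use now."""
    entry = directory[tool]
    return entry[entry["active"]]


def fail_over(directory, tool="gocdb", target="replica"):
    """Return a new directory in which `tool` points at `target`."""
    entry = dict(directory[tool])
    if target == "active" or target not in entry:
        raise KeyError(f"unknown target {target!r} for {tool!r}")
    entry["active"] = target
    return {**directory, tool: entry}


if __name__ == "__main__":
    print(resolve_db(DIRECTORY))
    print(resolve_db(fail_over(DIRECTORY)))
```

Since the replica described above is read-only, a switch like this would keep GOCDB readable during an outage but would not by itself allow updates.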