Operation team at CCIN2P3, Suzanne Poulat

1 Operation team at CCIN2P3
Suzanne Poulat - suzanne@in2p3.fr

2 Overview
• Operation team
• Organisation
• Operation’s role
• Services outside working hours
• Tools
• Monitored services
• Examples

3 Operation team
• Two groups: Support and Operation
• Support (9 persons):
  - general user support
  - dedicated persons for LHC experiments
  - help desk (Xhelp)
  - opening CC to collaborations and other sciences
• Operation: details follow

4 Organisation
• Ten persons in the group:
  - two for Grid coordination
  - four for Operation
  - four operators working in shifts to cover 08:00 AM to 09:00 PM, 7 days a week
• On a weekly basis:
  - one person for operation (often 1.5)
  - the others work on development, monitoring or administrative tasks

5 Operation’s role
• Check the availability of all services (storage, CPU, …)
• Optimize service usage
• Ensure that the commitments of CCIN2P3 to the experiments and Grid VOs are respected
• Organize the scheduled shutdowns
• Coordinate actions during unscheduled downtimes
• Monitor and manage the tape libraries
• Create and manage accounts and AFS space
• Organize the « on duty » service

6 Services - Out of working hours
• On-site night security guard from 6 PM to 8 AM and on weekends
  - no computing actions: alerting and messaging
• 1 on-duty engineer (evenings, weekends)
  - corrective actions if possible (documentation, training)
  - else calls an expert … if available
• Weekend: 1 operator on site (10 AM to 5 PM)
  - first low-level action
  - else calls the on-duty engineer
• Result is a « best effort » coverage
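The out-of-hours coverage on this slide can be read as a simple escalation chain. The sketch below is only an illustration of that reading, not a tool used at the centre: the function name `out_of_hours_chain` is hypothetical, working hours are assumed to be 8 AM to 6 PM on weekdays (the complement of the guard's hours), and the on-duty engineer's "evenings, weekends" window is assumed to match the guard's.

```python
from datetime import datetime


def out_of_hours_chain(when: datetime) -> list[str]:
    """Return who handles an alert at `when`, in escalation order.

    An empty list means normal working hours (assumed 8 AM to 6 PM on weekdays).
    """
    weekend = when.weekday() >= 5  # Monday is 0, so 5 and 6 are Saturday and Sunday
    night = when.hour >= 18 or when.hour < 8  # guard hours from the slide: 6 PM to 8 AM
    if not (weekend or night):
        return []

    chain = ["security guard: alerting and messaging, no computing actions"]
    # The weekend operator is on site from 10 AM to 5 PM and takes the first low-level action.
    if weekend and 10 <= when.hour < 17:
        chain.append("weekend operator: first low-level action")
    # Assumption: the engineer's "evenings, weekends" window matches the guard's window.
    chain.append("on-duty engineer: corrective action if possible")
    chain.append("expert: called if available (best effort)")
    return chain


if __name__ == "__main__":
    # A Wednesday evening, then a Saturday late morning.
    for moment in (datetime(2024, 1, 3, 22, 0), datetime(2024, 1, 6, 11, 0)):
        print(moment, out_of_hours_chain(moment))
```

The last step is what makes the coverage "best effort": an expert is called only if one happens to be available.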

7 Tools
• Monitoring tool: NGOP -> Nagios
• Remote Logging Service: RLS
• Mails
• Tickets from local and grid users: Xhelp, interfaced with GGUS at CC
• Web pages on the current state of services
• Wiki for documentation, recipes, shutdowns, post-mortem analysis
• Log of the daily production: ELog
• Ticket web page for tape and drive incidents (~50 incidents per month: 10 drives, 40 tapes, of which 2 with loss of data)
• Scripts to analyse faulty tapes

8 Monitored services
• BQS
• Storage: HPSS, dCache, AFS
• Grid: CE, SRM, TOP BDII
• Databases
• Others: tape libraries, Saphir (privileges and location of services)
• Workers and all servers

9 Nagios

10 SMURF

11 Anastasie – Running jobs

12 Xhelp

13 Xhelp (2)
~320 tickets per month = 10 to 20 tickets per day
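The daily range quoted on this slide can be checked with quick arithmetic. The slide does not say whether "per day" means calendar days or working days, so this sketch assumes a 30-day month and roughly 22 working days and computes both.

```python
tickets_per_month = 320  # figure from the slide

# Assumptions, not from the slide: 30 calendar days and about 22 working days per month.
per_calendar_day = tickets_per_month / 30
per_working_day = tickets_per_month / 22

print(f"per calendar day: {per_calendar_day:.1f}")  # 10.7
print(f"per working day:  {per_working_day:.1f}")   # 14.5
```

Both averages fall inside the quoted range of 10 to 20 tickets per day.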

14 Xhelp (3)

15 Implementations
• Wiki Operation
• Nagios monitoring
• Ovax
• Users database interface
• Incidents robotique (tape robot incidents)
• On duty tools

16 Questions?

