EGEE/LCG Operation Workshop


1 EGEE/LCG Operation Workshop
24th-26th May 2005
A report on operation support, open issues and statistics
Marco Verlato, INFN – Sezione di Padova
EGEE is a project funded by the European Union under contract IST

2 Outline
- History since last Operation Workshop
- EGEE User Support Overview
- Grid.it helpdesk usage report
- ROC Integration
- Open issues
EGEE/LCG Operation Workshop – May 24-26,

3 History
- Nov. 04: Outcome from User Support Task Force, Grid.it support infrastructure and pilot interface to/from GGUS presented at 1st EGEE/LCG Operation Workshop
- Nov. 04: Pilot GGUS-Grid.it helpdesk interface live demo at EGEE-2 Conference
- Dec. 04: Grid.it helpdesk code and interface documentation made available to other ROCs
- Jan. 05: E(gee, or xecutive) Support Committee kick-off at FZK, WP definition and mandate
- Mar. 05: Support on Duty start, GGUS / Grid.it / CIC-on-duty helpdesks fully interfaced
- May 05: GGUS enhanced, SE and SW helpdesks interfaced

4 EGEE User Support: requirements
- Support requests range from:
  - Grid Services and Sites faults
  - Problems with installation/configuration
  - “How do I …?”
  - Problems with applications
  - Bugs
  - Requirements for extra features
- Users may be Site Admins, VO application users, VO managers, …; they all prefer a single point of contact for Grid problems
- User Support / Operation Support / VO Support are different but with a lot of overlap
- Different sets of experts and levels of support

5 EGEE User Support: infrastructure
The ROCs, VOs and the other project-wide groups such as the Core Infrastructure Center (CIC), middleware groups (JRA), and network groups (NA), will be connected via a central integration platform provided by GGUS. This central helpdesk keeps track of all service requests and assigns them to the appropriate support groups. In this way, formal communication between all support groups is possible. To enable this, each group has to build only one interface between its internal support structure and the central GGUS application.
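
The hub-and-spoke idea described on this slide can be illustrated with a minimal sketch. It is not the GGUS implementation: the class `CentralHelpdesk`, the `Ticket` fields and the unit name used in the example are assumptions made up for illustration. It only shows why one interface per group is enough when every request passes through a central platform.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Ticket:
    ticket_id: int
    description: str
    responsible_unit: str  # illustrative unit name, e.g. "ROC_Italy"


@dataclass
class CentralHelpdesk:
    """Hypothetical stand-in for the central integration platform."""
    interfaces: Dict[str, Callable[[Ticket], None]] = field(default_factory=dict)
    log: List[Ticket] = field(default_factory=list)

    def register(self, unit: str, deliver: Callable[[Ticket], None]) -> None:
        # Each support group plugs in exactly one interface to the hub.
        self.interfaces[unit] = deliver

    def submit(self, ticket: Ticket) -> None:
        # The hub records every request, then forwards it to the assigned group.
        self.log.append(ticket)
        if ticket.responsible_unit not in self.interfaces:
            raise LookupError(f"no interface registered for {ticket.responsible_unit}")
        self.interfaces[ticket.responsible_unit](ticket)


# Usage: a local helpdesk is modelled here as a plain list acting as an inbox.
roc_italy_inbox: List[Ticket] = []
hub = CentralHelpdesk()
hub.register("ROC_Italy", roc_italy_inbox.append)
hub.submit(Ticket(1, "Job submission fails at a site", "ROC_Italy"))
```

With N groups this needs N interfaces to the hub, instead of one interface per pair of groups.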

6 EGEE User Support: interfaces
Using the local Helpdesk Systems in conjunction with a central integration platform at GGUS
[Diagram labels: Resource Center 1 (RC) ... Resource Center N (RC); Local User Support Application; Regional Operations Center (ROC); Third level support: Generic deployment, Grid Middleware; VO support; CIC; The User; Report Problem; Use the Webview; Interface; Central GGUS Application]

7 EGEE User Support: Responsible Units
- First Level Support
  - GGUS team
  - SOD (ROC experts rotation)
- Second Level Support
  - CIC-on-duty
  - ROC Helpdesks: ROC_Asia/Pacific, ROC_CE, ROC_CERN, ROC_France, ROC_GER/CH, ROC_Italy, ROC_North, ROC_Russia, ROC_SE, ROC_SW, ROC_UK/Ireland
  - VOSupport (atlas, magic, biomed, compass, babar, cdf, alice, lhcb, cms, d0)
- Third Level Support (filled with experts provided by ROCs)
  - Grid Deployment: Castor, Generic Deployment, Manual Installation, Pre-production system, VO management/VOMS
  - Grid Middleware: d-Cache, Data Management, GLUE, GridICE, Information System/GIP/BDII, R-GMA, Security Management, Workload Management

8 The Grid-It portal
31 RCs, ~1400 CPUs, ~120 TB, 21 VOs, +DAG +MPI +DGAS

9 Deployment Status

10 Services and Sites Monitoring

11 Grid-it Helpdesk

12 Trouble Ticketing System
- The trouble ticketing system is based on the OneOrZero Helpdesk tool:
  - coded in PHP, using MySQL, customizable, free
  - to be replaced with the Xoops / xHelp tool soon
- Access allowed to registered members approved by administrators:
  - End-users: they create the tickets describing problems or suggestions
  - Supporters: fix the problems, or redirect them somewhere else
  - Site Managers: act as supporters for a given RC, and exchange tickets with Operatives for operational issues
  - Operatives: people of the ROC/CIC Central Management Team, Release & Deployment Team and the Ticketing System Team itself; they exchange tickets with Site Managers and Supporters

13 ROC Support Units
~40 people + site managers

14 Weekly shifts
- 4 people a day, rotating weekly
- 8.30-19.30 working hours
- 11x5 coverage
- Mainly busy with Operations

15 Usage Report
- Statistics for last 6 months of operations
- ~25 tickets a week on average

16 Usage Report
[Chart categories: Grid services, Operative teams, Grid sites, VO applications]

17 Interface to GGUS http://infnforge.cnaf.infn.it/eticketimp/
- First interface between Grid.it Helpdesk and GGUS ready since November 04 and in ‘production’ since March 05
- Based on Web Services at GGUS side, several advantages:
  - sample code available for PHP / Perl and other computing languages
  - very fast: service requests/sec on the GGUS Servers
  - easy to adapt
- Based on at Grid.it side (importing tool)
- XML exchange format
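
As a rough illustration of what an XML exchange format between two helpdesks involves, the sketch below serializes a ticket to XML on one side and parses it back on the other. The slides do not show the actual schema, so the element names (`ticket`, `id`, `origin`, `status`, `subject`) and the sample values are assumptions chosen only for this example.

```python
import xml.etree.ElementTree as ET


def ticket_to_xml(ticket: dict) -> str:
    # Element names are invented for illustration; the real schema is not in the slides.
    root = ET.Element("ticket")
    for key, value in ticket.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_ticket(payload: str) -> dict:
    # Importing side: turn each child element back into a field.
    root = ET.fromstring(payload)
    return {child.tag: child.text for child in root}


outgoing = {
    "id": "42",
    "origin": "Grid.it",
    "status": "open",
    "subject": "BDII not responding",
}
payload = ticket_to_xml(outgoing)
incoming = xml_to_ticket(payload)
```

A flat list of fields keeps the importing tool simple: each helpdesk only has to map its own ticket columns to and from the agreed element names.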

18 Interface to GGUS

19 GGUSROC Basic Workflow
[Diagram labels: GGUS System; ROC Helpdesk; XML Mail; Web services; GGUS/SOD; Web Portal; CIC-on-duty; CIC Interface; SUPP Unit CMT; SUPP Unit X; SUPP Unit Y; Ticket assignment; Ticket solved notification]

20 ROC Integration
- All ROCs were asked to create/enable their Support Structure to be integrated with GGUS:
  - providing a contact to their helpdesk system
  - providing a well defined structure behind their helpdesk system
  - providing a list of experts committed to VO support and 3rd level support, filling the corresponding GGUS Responsible Units
- Some ROCs set up a helpdesk system interfaced to GGUS following the Grid.it example using OneOrZero:
  - SE: ready in production since April 25th
  - SW: ready in production since May
  - RU: work started in April, plan to be in production by end of July

21 ROC Integration: SE

22 ROC Integration: SE

23 ROC Integration: SW
Links to: Home, FAQ, Ticket, Documents, Repositories, Training

24 ROC Integration: SW
username/password needed

25 ROC Integration: Russia

26 ROC Integration: Russia

27 ROC Integration
- Some ROCs had different helpdesks inside their federation:
  - CE & NE: helpdesk based on RT open to local users since April, plan to be interfaced to GGUS by end of May, support structure and responsibilities defined within their ROC, tickets expected to be answered in a reasonable time
  - FR: home developed helpdesk, interface to GGUS by end of May
  - D-CH: helpdesk based on Remedy, interface to GGUS ready by June 8th
  - UK-I: helpdesk based on Footprint, plan to be interfaced to GGUS by end of July
- All ROCs will have their Support System ready and interfaced to GGUS by end of July

28 ROC Integration: some numbers
- Even if most ROC helpdesks are not yet interfaced to GGUS, ROC support units are reached with mailing lists
- Table columns: ROC, # tickets, # open, oldest (some cells were lost in the transcript, so rows are listed as they survive):
  - CE 29 2 1 day
  - France 3 1 month
  - GER-CH 33
  - Italy 54 5 days
  - NE 10 5
  - Russia 13 2 months
  - SE 36 4
  - SW 31 12
  - UK-I 58 15
  - TOTAL 309 60
- Statistics available since mid-March
- More than 90% coming from CIC-on-duty
- CIC-on-duty rate: ~50 tickets/week
- 1st Level rates: GGUS ~20 tickets/week, SOD ~4 tickets/week

29 Open issues
- Distributed EGEE User/Operation Support Infrastructure is progressing, but:
  - tickets must be solved within an acceptable timeframe, otherwise we’ll not attract users
  - real responsive people have to be behind the Support Units
  - urgent to define and control the Service Level Agreements (SLAs):
    - escalate problems if they are not solved within the defined SLAs
    - measuring and reporting regarding the SLAs
  - document Processes and Workflows within the whole infrastructure
  - enhance 1st Level Support / SOD: most of the people at ROCs involved in deploying / troubleshooting the Grid can more easily solve tickets without addressing them to other RUs, shortening the resolution time
- Integration effort useless if at the end we are not able to provide a reliable service
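
The SLA points above (escalate overdue tickets, measure and report) could be checked mechanically along the following lines. This is a sketch under stated assumptions: the slides define no SLA values, so the time limits in `SLA_LIMITS`, the ticket fields and the function names are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical SLA limits per support level; the slides do not define any values.
SLA_LIMITS = {
    1: timedelta(hours=8),
    2: timedelta(days=2),
    3: timedelta(days=5),
}


def needs_escalation(opened_at: datetime, support_level: int, now: datetime) -> bool:
    # A ticket is overdue once its age exceeds the limit for its support level.
    return now - opened_at > SLA_LIMITS[support_level]


def sla_report(tickets: list, now: datetime) -> dict:
    # Minimal measuring/reporting step: count open tickets and list overdue ones.
    overdue = [
        t["id"] for t in tickets
        if needs_escalation(t["opened_at"], t["level"], now)
    ]
    return {"total": len(tickets), "overdue": overdue}


now = datetime(2005, 5, 24, 12, 0)
tickets = [
    {"id": 1, "opened_at": datetime(2005, 5, 24, 9, 0), "level": 1},  # 3 hours old
    {"id": 2, "opened_at": datetime(2005, 5, 20, 9, 0), "level": 2},  # over 4 days old
]
report = sla_report(tickets, now)
```

Running such a check on a schedule would give both the escalation trigger and the raw numbers for periodic SLA reporting.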

