
1 A three years thorough review of a project’s NOC: the EGEE Network Operating Centre (ENOC)
Guillaume Cessieux (CNRS/IN2P3-CC, EGEE-SA2)
Xavier Jeannin (CNRS/UREC, EGEE-SA2)
TNC 2009, Málaga, 2009-06-08
EGEE-III INFSO-RI-222667, Enabling Grids for E-sciencE, www.eu-egee.org. EGEE and gLite are registered trademarks.

2 Outline
Enabling Grids for E-sciencE TNC 2009 – Málaga – 2009-06-08 GCX
• EGEE in a very small nutshell
  – Overview
  – Network involved
  – Networking support activity in EGEE
• ENOC
  – Concept
  – History
  – Detailed implementation
• Achievements & review
• Current areas of work

3 The EGEE project
• Enabling Grids for E-sciencE (EGEE)
  – The largest multi-disciplinary Grid infrastructure in the world
  – Brings together more than 140 institutions
  – Produces a reliable and scalable computing resource
• Key figures: ~300 sites (50 countries), >80,000 CPU cores, >20 petabytes, >14,000 users, >370,000 jobs/day
• Networks are the key underlying layer

4 Networks involved
• Multi-domain shared network infrastructure
  – Delivered by more than 30 NRENs & GÉANT2
  – Including non-European networks (CA, RU, SN, TW, US...)
• A dedicated network: the LHCOPN
  – Large Hadron Collider Optical Private Network
  – 10 Gb/s lightpaths terminating at the sites

5 EGEE networking support
• Networking support: Support Activity 2 (SA2)
  – “Small” activity (~1.5% of the overall project budget, ~7 FTEs)
  – Provides a single interface between the Grid and the networks
[Diagram labels: Grid, Networks, SA2, ENOC]

6 The ENOC
• Why a project’s NOC?
  – Embed network operations in Grid operations at project level
      ▪ Scheduled network downtimes, incident reports, bandwidth issues...
  – Single convenient operational interface between networks & Grid
• ENOC: EGEE Network Operation Centre
  – Including all its required dependencies
      ▪ Monitoring, troubleshooting, operational tools...
• But a very particular “NOC”
  – The EGEE project neither owned nor managed any network devices
  – More of a “workflow facilitator”
      ▪ Manpower allocated: ~2 FTEs, not 24x7x365

7 History
• 2004-2006 – EGEE-I
  – Survey and feasibility investigation
  – Processes defined, prototyped and validated
• 2006-2008 – EGEE-II
  – First raw implementation from scratch
      ▪ Topology database, ticket handling and analysis...
• 2008-2010 – EGEE-III
  – SA2 now focused around the ENOC
      ▪ Particularly on tools (troubleshooting, ticket exchange...)
      ▪ Maturing processes and tools with lessons learnt from EGEE-II

8 Foreseen operational process
[Diagram labels: Site A, Site B; NREN A, NREN B, NREN C; NOC A, NOC B; ENOC; Users; GGUS (Grid TTS); Grid operations; Grid; Networks; numbered steps 1 to 3]

9 Tools - Overview
[Diagram labels: 19 NRENs + GÉANT2 (~2500 e-mails/month, ~800 tickets/month); translate, homogenize, sort; trouble tickets DB; network topology DB; maps; impact assessment, filtering; sharing; GGUS; public dashboard; internal tools; statistics; dashboard]

10 Tools - Tickets handling (1/2)
• Tickets are the only operational information widely available from network providers
• Ticket homogeniser
  – Per-NREN templates define the matching criteria
      ▪ Regexp based: match location, start date, ticket ID, etc.
• See the related poster at TNC 2009: “Grid Management: Architecture analysis of a trouble ticket normalization and delivery service”
[Figure labels: GRNET, RedIRIS, Templates]
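The regexp-based templates mentioned on this slide could look roughly like the following sketch. It is not the ENOC code: the NREN names, the e-mail layouts and every pattern are invented for illustration, and `homogenise` is a hypothetical helper name.

```python
import re
from typing import Optional

# Hypothetical per-NREN templates: each maps a normalised field name to a
# regular expression with exactly one capture group. The e-mail layouts
# these patterns assume are invented for this sketch.
TEMPLATES = {
    "NREN-A": {
        "ticket_id": re.compile(r"^Ticket\s*#\s*(\d+)", re.MULTILINE),
        "location": re.compile(r"^Location:\s*(.+)$", re.MULTILINE),
        "start": re.compile(r"^Start:\s*(\S+ \S+)", re.MULTILINE),
    },
    "NREN-B": {
        "ticket_id": re.compile(r"\[ID=(\w+)\]"),
        "location": re.compile(r"affected PoP:\s*(.+)$",
                               re.MULTILINE | re.IGNORECASE),
        "start": re.compile(r"begins at\s*(\S+)", re.IGNORECASE),
    },
}


def homogenise(nren: str, body: str) -> Optional[dict]:
    """Apply the sender's template to a plain-text ticket e-mail.

    Returns a dict with the same keys for every NREN, or None when no
    template exists for the sender. Fields that do not match are None.
    """
    template = TEMPLATES.get(nren)
    if template is None:
        return None
    ticket = {"nren": nren}
    for field, pattern in template.items():
        match = pattern.search(body)
        ticket[field] = match.group(1).strip() if match else None
    return ticket
```

Calling `homogenise("NREN-A", mail_text)` on a matching e-mail yields one uniform record per ticket, whatever the provider's layout. A field left as `None` signals that the provider's template needs attention.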

11 Tools - Tickets handling (2/2)
• Now successfully handling a huge workflow
  – For 19 NRENs: ~2500 e-mails/month representing ~800 tickets
      ▪ Very low trash ratio: ~5%
  – ~200 tickets open at the same time
  – In our database: 102k e-mails, 27k tickets, 1 GB of data

12 Tools – Topology database (1/2)
• Provide a logical view of the network
  – Avoid going to too low a level
      ▪ Network providers did not want to expose their topology
      ▪ Useless and might be a burden to maintain
• Schema was hard to define
  – Another too complex...

13 Tools – Topology database (2/2)
• Initially filled automatically from automated traceroute analysis (~ DNS domain name matching)
  – Then reviewed by hand using graphical tools
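One way to picture the DNS domain name matching step is the sketch below: each traceroute hop's reverse-DNS name is mapped to a network domain by suffix, and consecutive hops in the same domain are collapsed into one logical element. The suffix table, the host names and the function names are all hypothetical. The slides do not describe the actual algorithm.

```python
from typing import List, Optional

# Hypothetical mapping from DNS suffixes to network domains.
DOMAIN_SUFFIXES = {
    "nren-a.example": "NREN-A",
    "nren-b.example": "NREN-B",
    "backbone.example": "BACKBONE",
}


def domain_of(hostname: str) -> Optional[str]:
    """Return the network domain owning a hop, or None if unknown."""
    name = hostname.lower().rstrip(".")
    for suffix, domain in DOMAIN_SUFFIXES.items():
        if name == suffix or name.endswith("." + suffix):
            return domain
    return None


def logical_path(hops: List[str]) -> List[str]:
    """Collapse a hop-by-hop traceroute into a domain-level path.

    Hops with no known domain are skipped, and runs of hops inside the
    same domain become a single entry, so no router-level detail is kept.
    """
    path: List[str] = []
    for hop in hops:
        domain = domain_of(hop)
        if domain is not None and (not path or path[-1] != domain):
            path.append(domain)
    return path
```

Keeping only the domain-level path matches the stated goal of a logical view that does not expose providers' internal topology. Unknown hops are silently dropped here, which is where a manual review would come in.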

14 Tools – Impact computation
• Really the tricky part
  – How to filter, from all tickets received, those impacting the Grid?
• Automatic impact computation attempted
  – Match each ticket’s locations against our adapted topology database
      ▪ If a node is affected, assume all linked sites are too
  – Store the impact and map the ticket onto the node
  – Suspected ratio of tickets impacting the Grid: ~15%
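The matching rule on this slide (an affected node implies that all linked sites are affected) can be sketched as a simple lookup. The topology content, the site names and the function names below are hypothetical, and location names are assumed to be already normalised by the ticket homogeniser.

```python
from typing import Dict, Iterable, Set

# Hypothetical logical topology: network node -> Grid sites linked to it.
TOPOLOGY: Dict[str, Set[str]] = {
    "lyon": {"SITE-A", "SITE-B"},
    "madrid": {"SITE-C"},
    "athens": set(),  # known node with no Grid site behind it
}


def impacted_sites(ticket_locations: Iterable[str],
                   topology: Dict[str, Set[str]] = TOPOLOGY) -> Set[str]:
    """Return every Grid site linked to a node named in the ticket.

    Pessimistic rule from the slide: if a node is affected, all sites
    linked to it are assumed to be affected.
    """
    sites: Set[str] = set()
    for location in ticket_locations:
        sites |= topology.get(location.strip().lower(), set())
    return sites


def impacts_grid(ticket_locations: Iterable[str],
                 topology: Dict[str, Set[str]] = TOPOLOGY) -> bool:
    """Filter criterion: keep only tickets touching at least one site."""
    return bool(impacted_sites(ticket_locations, topology))
```

With this rule a ticket is kept as soon as one of its locations maps to a node with Grid sites behind it. Locations missing from the topology contribute nothing, so poor location data in a ticket leads directly to missed impacts.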

15 Tools – Operational database
• Topology database + operational information = operational database
  – Stores the network outages that are impacting the Grid

16 Tools - Monitoring
• Connectivity tests: home-made DownCollector
  – TCP tests on all Grid nodes (~2000) from a central point
      ▪ Results aggregated per site
  – Impact localisation using stored network checkpoints
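A central TCP test with per-site aggregation can be sketched as follows. This is only an illustration of the idea, not DownCollector itself: the node list format, the status labels and the function names are assumptions, and checkpoint-based impact localisation is not covered.

```python
import socket
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def aggregate_per_site(
    nodes: Iterable[Tuple[str, str, int]],
    probe: Callable[[str, int], bool] = tcp_reachable,
) -> Dict[str, Tuple[int, int]]:
    """Probe every (site, host, port) entry and count results per site.

    Returns {site: (reachable_nodes, total_nodes)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for site, host, port in nodes:
        counts[site][1] += 1
        if probe(host, port):
            counts[site][0] += 1
    return {site: (up, total) for site, (up, total) in counts.items()}


def site_status(up: int, total: int) -> str:
    """Hypothetical three-level summary of a site's connectivity."""
    if up == total:
        return "ok"
    if up == 0:
        return "unreachable"
    return "degraded"
```

The `probe` parameter exists so that the aggregation logic can be exercised without touching the network. In real use, the default `tcp_reachable` would be called for each node from the central point.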

17 2008 breakdown from DownCollector
Average assessment from DownCollector for the year 2008 on EGEE certified Grid sites (~300):
• Network troubles are not concentrated on a few sites
• More than half of the connectivity problems detected are on-site
• 80% of off-site network troubles are solved within 30 minutes
  – Only ~45/month last longer

18 Achievements around the ENOC
• DownCollector - https://ccenoc.in2p3.fr/DownCollector/
  – Reached 3 GB of monthly traffic (web + Nagios querying)
• ASPDrawer doing BGP monitoring of the LHCOPN
  – Useful service assessment for 2008, official for 2009
• Trouble ticket exchange standard
  – Work around database (topology, tickets, impacts)
  – Normalisation of network trouble tickets ready to be implemented
  – Rendering on web interfaces
• Approaches strongly driven by automation
  – Reasonable effort to run and maintain things!

19 Review (1/3)
• Information acquired from network trouble tickets is not formalised or accurate enough
  – Plain-text e-mail tickets are a plague to analyse
  – Even when matched correctly, the meaning is often unsatisfactory
      ▪ Impact on services not computed
      ▪ Only targeted at the local community, meaningless at project level
      ▪ Naming conventions linked to a topology database somewhere?
• This really prevents a successful, reliable automatic impact assessment
  – And it is hard to get network providers to improve that...
      ▪ What about at least homogenising interfaces within NRENs?

20 Review (2/3)
• Disclosure of network trouble tickets is a big issue
  – How can they be shared?
      ▪ What about a centralised knowledge database of network issues?
• Few or wrong inquiries from the Grid
  – Middleware still outputs some very misleading error messages
• Lack of a place for global exchanges with NRENs
  – The EGEE Technical Network Liaison Committee (TNLC) was set up
      ▪ But attendance was often reduced to EGEE partners...

21 Review (3/3)
• Lack of serious network monitoring is really embarrassing
  – Technical complexity due to the scale and... viewpoints
  – NOC not fed, no history, no quality assessment...
      ▪ Connectivity tests are good but not enough
  – Good convergence toward perfSONAR solutions
      ▪ Some extra time needed to mature and be deployed widely enough
• Networks are really working fine
  – This was not expected to such an extent

22 Current areas of work (1/2)
• e2e troubleshooting service: perfSONAR lite TSS
  – perfSONAR-PS based, with a central web interface
  – On-demand measurements only
• Standard trouble ticket exchange
  – Data models and software are now available
      ▪ RFC draft was submitted (2009-05)
  – But what is the benefit for NRENs of delivering standard trouble tickets?
      ▪ This might really slow down adoption...
• Trouble ticket impact matching
  – Correlate tickets with monitoring data
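The slide gives no detail on how tickets would be correlated with monitoring data. One plausible reading, sketched below purely as an assumption, is to check whether a ticket's time window overlaps an outage observed by connectivity tests on a site the ticket is thought to impact. The record fields and function names are hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def overlaps(start_a: datetime, end_a: datetime,
             start_b: datetime, end_b: datetime) -> bool:
    """True if the two time intervals share at least one instant."""
    return start_a < end_b and start_b < end_a


def correlate(tickets: List[dict], outages: List[dict]) -> Dict[str, List[dict]]:
    """Map each ticket ID to the observed outages that support it.

    Assumed record shapes (hypothetical):
      ticket: {"id", "site", "start", "end"}
      outage: {"site", "start", "end"}
    A ticket with an empty list was not confirmed by monitoring.
    """
    matches: Dict[str, List[dict]] = {}
    for ticket in tickets:
        matches[ticket["id"]] = [
            outage for outage in outages
            if outage["site"] == ticket["site"]
            and overlaps(ticket["start"], ticket["end"],
                         outage["start"], outage["end"])
        ]
    return matches
```

Tickets with no matching outage would be candidates for a lower impact rating, and outages with no matching ticket would point to problems the providers never announced.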

23 Current areas of work (2/2)
• What are the traffic patterns related to the Grid?
  – Full perfSONAR monitoring of Tier-1/Tier-2 sites in Spain by RedIRIS
• LHCOPN
  – SA2 is leading the design and implementation of a federated model
• Also tasks not fully related to the ENOC: SLA, IPv6, etc.

24 Conclusion (1/2)
• Networks “seem” to be working really fine
  – Not so many complex multi-domain issues
  – Current strategy is still: if it is down, just wait for it to come back
• A lot of work was performed to set up the ENOC
  – Simple ideas revealed technical challenges
• Unfortunately, our requirements are constraints for network providers
  – No clear benefit for them in following us
      ▪ Slowing down happy «collaboration»
  – Local user community versus worldwide project...

25 Conclusion (2/2)
• Mixed results around the ENOC
  – Technical success but insufficiently used
      ▪ Now strong competition from local support structures
  – Blockers: lack of some key requirements
• Near future
  – European Grid Initiative (EGI) Network Support Centre (ENSC)
  – No active role in network operations expected any longer
  – Focused on underlying network tasks at project level
      ▪ Monitoring, advanced network services, quality assessment
• Such project-wide issues might become common
  – Abstract all network providers at project level

26 Including a joint network session with TERENA NRENs & Grid workshop and EGEE SA2

27 Thank you! Questions? http://www.eu-egee.org/

28 Acknowledgements
• EGEE SA2 team
  – Main partners
      ▪ CERTH, CNRS, DANTE, DFN, GARR, GRNET, NTUA, RedIRIS, RRC-KI
• IN2P3-CC network team
• CNRS UREC

