Operation and Support at INFN-GRID. WorkShop 2007 sul Calcolo e Reti dell'INFN, Rimini, 7-11 May 2007. Enabling Grids for E-sciencE. Daniele Cesini – INFN-CNAF.


1 Operation and Support at INFN-GRID
WorkShop 2007 sul Calcolo e Reti dell'INFN, Rimini, 7-11 May 2007
Enabling Grids for E-sciencE
Daniele Cesini – INFN-CNAF
Alessandro Paolini – INFN-CNAF
Paolo Veronesi – INFN-CNAF
On behalf of the Italian ROC

2 Enabling Grids for E-sciencE Daniele Cesini - WorkShop 2007 sul Calcolo e Reti dell'INFN - Rimini 7-11 May 2007
INFNGRID Overview

3 Supported Sites
40 sites supported:
– 31 INFN sites
– 9 non-INFN sites
Total resources:
– About 4600 CPUs
– About 1000 TB disk storage (+ about 700 TB tape)

4 Supported VOs
40 VOs supported:
– 4 LHC (ALICE, ATLAS, CMS, LHCB)
– 3 cert (DTEAM, OPS, INFNGRID)
– 8 regional (BIO, COMPCHEM, ENEA, INAF, INGV, THEOPHYS, VIRGO)
– 1 catch-all VO: GRIDIT
– 23 other VOs
Recently a new regional VO was enabled: COMPASSIT

5 Components of the production Grid
The Grid is not only CPUs and storage. Other elements are just as fundamental for running, managing and monitoring the grid:
– Middleware
– Grid services
– Monitoring tools
– Accounting tools
– Management and control infrastructure
– Users

6 GRID Management
Grid management is performed by the Italian Regional Operation Center (ROC). Its main activities are:
– Production and testing of the INFNGRID release
– Deployment of the release to the sites, support to local administrators and site certification
– Deployment of the release on the central grid services
– Maintenance of grid services
– Periodic checks of the status of resources and services
– Accounting of resource usage
– Support at the Italian level to site managers and users
– Support at the European level to site managers and users
– Introduction of new Italian sites
– Introduction of new regional VOs
The IT-ROC is involved in many other activities not directly related to the production infrastructure (i.e. PreProduction, PreView and Certification Testbeds), but these are not described in this talk

7 The Italian Regional Operation Center (ROC)
One of 10 existing ROCs in EGEE
Operations Coordination Centre (OCC)
– Management, oversight of all operational and support activities
Regional Operations Centres (ROC)
– Providing the core of the support infrastructure, each supporting a number of resource centres within its region
Grid Operator on Duty
Grid User Support (GGUS)
– At FZK, coordination and management of user support, single point of contact for users

8 Middleware
INFNGRID RELEASE

9 INFNGRID Release
The middleware (m/w) installed on INFNGRID nodes is a customization of the gLite m/w used in the LCG/EGEE community.
The customized INFNGRID release is packaged by the INFN release team (grid-release infn.it).
The ROC is responsible for the deployment of the release.
At the moment INFNGRID-3.0-Update20 (based on gLite 3.0 Update 20) is deployed.
[Figure: release timeline, 2003-2008, showing LCG 1.0, LCG 2.0 and gLite 3.0 alongside INFN-GRID 1.0, 2.0 and 3.0 across the LCG, EGEE and EGEE II project phases]

10 INFN-GRID customizations: why?
VOs not supported by EGEE: define configuration parameters once (e.g. VO servers, pool accounts, VOMS certificates, ...) to reduce the risk of misconfiguration
MPI (requested by non-HEP sciences), additional GridICE configuration (to monitor WNs), read-only AFS (a CDF requirement), ...
Deploy additional middleware in a non-intrusive way:
– VOMS, since Nov. 2004 (now in EGEE)
– DGAS (DataGrid Accounting System)
– NetworkMonitor (monitors network connection metrics)

11 INFN-GRID customizations
Additional VOs (~20)
GridICE on almost all profiles (including WN)
Preconfigured support for MPI
– WNs without shared home directories but with ssh host-based authentication
DGAS: accounting
– New profile (HLR server) + additional packages on CE and WN
NME (Network Monitor Element)
Collaboration with CNAF-T1 for Quattor
UI “PnP”
– UI installable without administrator privileges
NTP
AFS (read-only) on WN (needed by the CDF VO)

12 Middleware Updates deployment
– Since the introduction of gLite 3.0 there have been no more big release changes from EGEE, but a series of smaller, frequent updates (about weekly)
– The INFNGRID release was updated accordingly
Steps:
– gLite Update announcement
– INFNGRID release alignment to the announced update (ig-metapackages, ig-yaim)
– Local testing
– IT-ROC deployment
gLite Updates:
17/10/2006 - gLite Update 06
20/10/2006 - gLite Update 07
24/10/2006 - gLite Update 08
14/11/2006 - gLite Update 09
11/12/2006 - gLite Update 10
19/12/2006 - gLite Update 11
22/01/2007 - gLite Update 12
05/02/2007 - gLite Update 13
19/02/2007 - gLite Update 14
26/02/2007 - gLite Update 15
...
INFNGRID Updates:
27/10/2006 - INFNGRID Update 06/07/08 (+ new dgas, gridice packages)
15/11/2006 - INFNGRID Update 09
19/12/2006 - INFNGRID Update 10/11
29/01/2007 - INFNGRID Update 12
14/02/2007 - INFNGRID Update 13
20/02/2007 - INFNGRID Update 14
27/02/2007 - INFNGRID Update 15
...
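
The two date lists on this slide make it possible to estimate how long the INFNGRID release took to align with each gLite update. The following is a minimal Python sketch under one assumption: a combined INFNGRID release such as "Update 06/07/08" is taken to cover each of the gLite updates it names. The variable and function names are illustrative and not part of any INFNGRID tool.

```python
from datetime import date

# gLite update number -> announcement date, copied from the slide.
GLITE_UPDATES = {
    6: date(2006, 10, 17), 7: date(2006, 10, 20), 8: date(2006, 10, 24),
    9: date(2006, 11, 14), 10: date(2006, 12, 11), 11: date(2006, 12, 19),
    12: date(2007, 1, 22), 13: date(2007, 2, 5), 14: date(2007, 2, 19),
    15: date(2007, 2, 26),
}

# gLite update numbers covered by one INFNGRID release -> its release date.
# Assumption: "Update 06/07/08" covers gLite updates 06, 07 and 08.
INFNGRID_UPDATES = {
    (6, 7, 8): date(2006, 10, 27),
    (9,): date(2006, 11, 15),
    (10, 11): date(2006, 12, 19),
    (12,): date(2007, 1, 29),
    (13,): date(2007, 2, 14),
    (14,): date(2007, 2, 20),
    (15,): date(2007, 2, 27),
}


def alignment_lag_days():
    """Days between each gLite update and the INFNGRID release covering it."""
    lags = {}
    for covered, released in INFNGRID_UPDATES.items():
        for number in covered:
            lags[number] = (released - GLITE_UPDATES[number]).days
    return lags


lags = alignment_lag_days()
for number in sorted(lags):
    print(f"gLite Update {number:02d}: aligned after {lags[number]} day(s)")
```

With the dates listed on the slide, every lag computed this way falls between 0 and 10 days.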

13 INFNGRID Services Overview

14 The new general web portal

15 The old technical web portal

16 General Purpose Services

17 General purpose services – VOMS servers

18 VOMSes Stats
VOMS: number of users per VO
VO        Users
argo      17
bio       44
compchem  31
enea      8
eumed     56
euchina   35
gridit    89
inaf      25
infngrid  178
ingv      12
libi      10
pamela    16
planck    16
theophys  20
virgo     9
Cdf       1133
Egrid     28
Top users (about 85% of total proxies):
– CDF (~50k proxies/month)
– EUMED (~500 proxies/month)
– PAMELA (~500 proxies/month)
– EUCHINA (~400 proxies/month)
– INFNGRID (test purposes, ~200 proxies/month)
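
As a quick check on the per-VO user counts above, the short Python sketch below totals them and ranks the largest VOs. The numbers are copied from the slide and the VO names are lower-cased for uniformity; the dictionary and helper names are illustrative only. Note that these are registered users, which is a different quantity from the proxies-per-month figures on the same slide.

```python
# Registered users per VO, copied from the slide.
VOMS_USERS = {
    "argo": 17, "bio": 44, "compchem": 31, "enea": 8, "eumed": 56,
    "euchina": 35, "gridit": 89, "inaf": 25, "infngrid": 178, "ingv": 12,
    "libi": 10, "pamela": 16, "planck": 16, "theophys": 20, "virgo": 9,
    "cdf": 1133, "egrid": 28,
}

total_users = sum(VOMS_USERS.values())


def share_percent(vo):
    """Percentage of all registered users that belong to the given VO."""
    return 100.0 * VOMS_USERS[vo] / total_users


# The three VOs with the most registered users, largest first.
top_three = sorted(VOMS_USERS.items(), key=lambda item: item[1], reverse=True)[:3]

print(f"Total registered users: {total_users}")
for vo, users in top_three:
    print(f"{vo}: {users} users ({share_percent(vo):.1f}%)")
```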

19 General purpose services – HLRs
Accounting: Home Location Register

20 VOs Dedicated Services
New DEVEL-INFNGRID-3.1 WMS and LB are coming soon into production as VO-dedicated services (atlas, cms, cdf, lhcb)
VO-specific services previously run by the INFNGRID Certification Testbed have now moved to production → DEVEL RELEASE
A total of 18 VO-dedicated services, which will become 25 with the introduction of the 3.1 WMS and LB

21 FTS channels and VOs
Installed and fully managed via Quattor-Yaim
3 hosts as frontend, 1 backend Oracle cluster
Not only LHC VOs:
– PAMELA
– VIRGO
Full standard T1-T1 + T1-T2 + STAR channels:
– 51 channel agents
– 7 VO agents
(A prototype of a) monitoring tool is available:
– Agent and Tomcat log files are parsed and saved in a MySQL DB
– Web interface: http://argus.cnaf.infn.it/fts/index-FTS.php
Support:
– Dedicated department team for tickets
– Mailing list: fts-support cnaf.infn.it

22 FTS transfer overview

23 Monitoring and Accounting
Monitoring and accounting tools used by the ROC

24 Monitoring
GridICE: http://gridice4.cnaf.infn.it:50080/gridice/site
Developed by INFN
Several servers with different scopes are installed and maintained by the IT-ROC
More details in the monitoring talk...

25 Monitoring
GSTAT: http://goc.grid.sinica.edu.tw/gstat//Italy.html
Developed outside INFN
A GSTAT server is maintained by the IT-ROC
GSTAT queries the Information System every 5 minutes
The sites and nodes checked are those registered in the GOC DB
Inconsistencies in the published information, or the absence of a service that a site should publish, are reported as errors
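
The kind of check described on this slide, comparing what is registered in the GOC DB with what a site actually publishes, can be illustrated with a small set difference. This is only a sketch of the idea and not GSTAT's code: the site names, the service labels and the function name are all hypothetical, and real GSTAT checks are considerably richer.

```python
def missing_services(registered, published):
    """Return (site, service) pairs that are registered but not published.

    `registered` maps site name -> services the site should publish.
    `published` maps site name -> services found in the information system.
    """
    errors = []
    for site in sorted(registered):
        absent = set(registered[site]) - set(published.get(site, ()))
        for service in sorted(absent):
            errors.append((site, service))
    return errors


# Hypothetical example data: SITE-A stopped publishing its SE,
# SITE-B publishes nothing at all.
REGISTERED = {"SITE-A": {"CE", "SE"}, "SITE-B": {"CE"}}
PUBLISHED = {"SITE-A": {"CE"}}

errors = missing_services(REGISTERED, PUBLISHED)
for site, service in errors:
    print(f"ERROR: {site} does not publish its registered {service}")
```

A periodic monitor would rerun such a comparison at every polling interval (every 5 minutes, according to the slide) and flag any pair it returns.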

26 Monitoring
SAM: https://lcg-sam.cern.ch:8443/sam/sam.py
SAM-ADMIN: https://cic2.gridops.org/samadmin/
SAM is the official CERN-EGEE testing tool; tests are performed by jobs submitted to sites.
Submission is triggered by an admin web interface.
A mirror of the web interface is hosted at CNAF and maintained by the IT-ROC.

27 Accounting
ROCRep && HLRMON:
http://grid-it.cnaf.infn.it/rocrep/index.php
http://grid-it.cnaf.infn.it/hlrmon/index.php
(Data about all VOs, all sites, T1 excluded)
Web interface to obtain aggregated Grid usage data. Two versions exist:
1) Data taken from the GridICE DB
2) Data taken from the DGAS HLR DB – a new interface is being released
More details in the accounting talk...

28 Accounting
GOC ACCOUNTING SYSTEM: http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.php
Data from the HLR server are fed into the GOC accounting system through the dgas2apel tool; more details in the accounting talk...

29 Users and Sites Support

30 Support
The IT-ROC offers a number of grid services and checks that they operate correctly. But that is not all.
The IT-ROC also continuously monitors the status of the sites inside the ROC itself and, in case of problems, helps site managers or users to find a solution.
As a parallel activity, the IT-ROC is also involved in the monitoring and support of the entire EGEE infrastructure (TPM and COD)
– The same support given to INFNGRID users and sites is given to the LCG/EGEE Grid, on a round-robin basis among the ROCs

31 Users and sites support
The main tools for supporting users are the ticketing systems:
– EGEE makes use of the GGUS (Global Grid User Support) ticketing system
– Each ROC uses different tools, interfaced to GGUS in a bidirectional way
By means of Web services, it is possible to:
– Transfer tickets from the global to the regional system
– Transfer tickets from the regional to the global system
Once tickets are logged, they are assigned to the proper support unit, either in GGUS or in the regional systems
The IT-ROC ticketing system is based on XOOPS/xHelp

32 IT-ROC Control Shifts
About 20 supporters carry out the monitoring activity, organized in 2 shifts per day, from Monday to Friday, with 2 persons per shift. At the end of each shift a report is produced.
During the shift the supporters:
– Check the Grid status and try to discover problems before the users do
– In case of problems, open tickets to the department concerned in order to find a solution, suggesting a possible solution if they are able to
– Perform site certification during the deployment phases
– Check the status of tickets and urge experts or site managers to provide answers and solutions
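
The shift arithmetic on this slide (2 shifts per day, Monday to Friday, 2 persons per shift) gives 2 x 5 x 2 = 20 person-shifts per week, which matches the size of the supporter pool. The Python sketch below builds one possible weekly rota under the assumption of exactly 20 supporters assigned in a fixed rotating order; the supporter labels and the assignment policy are hypothetical, since the slide does not say how shifts are actually allocated.

```python
from itertools import cycle

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]
SHIFTS_PER_DAY = 2      # from the slide
PEOPLE_PER_SHIFT = 2    # from the slide


def weekly_rota(supporters):
    """Assign supporters to (day, shift) slots in a fixed rotating order."""
    pool = cycle(supporters)
    rota = {}
    for day in DAYS:
        for shift in range(1, SHIFTS_PER_DAY + 1):
            rota[(day, shift)] = [next(pool) for _ in range(PEOPLE_PER_SHIFT)]
    return rota


# Hypothetical labels; the slide only says "about 20 supporters".
supporters = [f"supporter-{i:02d}" for i in range(1, 21)]
rota = weekly_rota(supporters)
person_shifts = sum(len(people) for people in rota.values())

print(f"Person-shifts per week: {person_shifts}")
print("Monday, shift 1:", rota[("Mon", 1)])
```

Under these assumptions each supporter covers exactly one shift per week.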

33 IT-ROC Shifts ISSUES
The ROC monitoring is oriented to the infrastructure and not to the VOs
The active monitoring done via test jobs (i.e. the SAM tool) uses 3 VOs dedicated to infrastructure testing (dteam, ops and infngrid), which in general have higher priority on sites → as a side effect, VO-specific problems are not observed. Passive controls (i.e. GSTAT and GridICE) are not affected by this problem.
The infrastructure tests can pass while users still experience problems.
The current control shift organization seems to be insufficient for the VOs' needs, and the LHC VOs are already performing their own tests (VO dashboards) to deal with this situation.

34 IT-ROC Shifts ISSUES
Both the Italian and the European experience in Grid monitoring show that the infrastructure-oriented monitoring needs to be integrated with more VO-specific monitoring → but in INFNGRID alone we have about 40 VOs!
Collaboration between the ROC and the people involved in the VO dashboards is desirable, at least to define a set of controls that are important for the VOs but not yet performed by the ROC

35 TPM and COD
The Italian ROC is also involved in the monitoring and support of the entire LCG/EGEE infrastructure. It participates in the TPM and COD activities.
TPM (Ticket Process Manager): responsible for assigning tickets correctly in the central GGUS system. When a ticket is logged it is automatically assigned to the TPM group, which routes it to the proper support unit or, if able, proposes a solution. The whole ticket life cycle is under the control of the TPM, which can modify the ticket at any time to urge an answer or a solution. Each ROC performs a one-week shift in a round-robin cycle.
COD (CIC On Duty): the same monitoring done for the INFNGRID infrastructure is done for the EGEE infrastructure, using the same tools (i.e. GSTAT, SAM, GridICE, GGUS) and some COD-specific tools (i.e. the COD dashboard)
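
The one-week round-robin TPM shift can be expressed as a simple modulo over the list of ROCs. The sketch below assumes that all 10 ROCs mentioned earlier in the talk take part and that the order is fixed; the ROC labels, the starting point and the function name are hypothetical, since the slides give neither the actual order nor the calendar.

```python
def roc_on_tpm_duty(week_index, rocs):
    """Return the ROC holding the TPM shift in the given week (0-based)."""
    return rocs[week_index % len(rocs)]


# Hypothetical labels for the 10 ROCs; the real names and order are not given.
ROCS = [f"ROC-{i}" for i in range(1, 11)]

for week in range(12):
    print(f"week {week}: {roc_on_tpm_duty(week, ROCS)}")
```

Under these assumptions each ROC is on TPM duty once every 10 weeks.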

36 Managing procedures

37 Introducing a new site
Before joining INFNGRID, a site has to accept several rules, described in a Memorandum of Understanding (MoU). The COLG (Grid Local Coordinator) reads and signs it, and faxes the document to INFN-CNAF.
Moreover, all sites must provide this email alias: grid-prod@. This alias will be used to report problems and will be added to the site managers' mailing list. It should of course include all site managers of the grid site.
The IT-ROC registers the site and site managers in the GOC DB, and creates a supporter-operative group in the XOOPS ticketing system.
Site managers have to register themselves in XOOPS, so that they can be assigned to their supporter-operative groups; each site manager also has to register in the test VOs infngrid and dteam.
Site managers install the middleware, following the instructions distributed by the Release Team (http://grid-it.cnaf.infn.it/ Installation section). When finished, they run some preliminary tests (http://grid-it.cnaf.infn.it/ --> Test&Cert --> Fry) and then request the ROC certification (http://grid-it.cnaf.infn.it/index.php?id=cmtreport&type=1).
The IT-ROC logs a ticket to communicate with site managers during the certification.
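
The procedure above can be tracked as an ordered checklist. The Python sketch below assumes the steps are carried out strictly in the order in which the slide lists them, which the slide does not state explicitly; the step wording is a paraphrase and the function name is illustrative.

```python
# Steps paraphrased from the slide, in the order they are described.
SITE_STEPS = [
    "MoU signed by the COLG and faxed to INFN-CNAF",
    "grid-prod e-mail alias provided",
    "site and site managers registered in the GOC DB",
    "supporter-operative group created in XOOPS",
    "site managers registered in XOOPS and in the infngrid and dteam VOs",
    "middleware installed",
    "preliminary tests run",
    "ROC certification requested",
]


def next_step(completed):
    """Return the first step not yet completed, or None when all are done."""
    for step in SITE_STEPS:
        if step not in completed:
            return step
    return None


done = set(SITE_STEPS[:5])
print("Next step:", next_step(done))
```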

38 Introducing a new VO
When an experiment asks to join the grid as a new VO, a formal request is necessary, followed by some technical steps.
Formal part:
– Agree the needed resources and the financial contribution between the experiment and the INFNGRID Executive Board (EB)
– Identify the experiment software and verify that it will work in the Grid environment
– Verify the support that it will receive at the INFN-GRID production sites
– Communicate to the IT-ROC the names of the VO managers, the software managers, and the persons responsible for resources and for support of the experiment software
– Specify the software requirements, the kind of jobs and the final storage destination (CASTOR, SE, experiment disk server)

39 Introducing a new VO
Once the Executive Board (EB) has approved the experiment request, the technical part begins:
– The IT-ROC creates the VOMS server for the VO
– The IT-ROC creates the VO support group on the ticketing system
– VO managers fill in the VO identity card on the CIC portal
– The IT-ROC announces the new VO to the sites

40 Useful links…
Italian grid project: http://grid.infn.it/
Italian production grid: http://grid-it.cnaf.infn.it/
SAM: https://lcg-sam.cern.ch:8443/sam/sam.py
CIC Portal: http://cic.gridops.org/
GSTAT: http://goc.grid.sinica.edu.tw/goc/
GridICE: http://gridice4.cnaf.infn.it:50080/gridice/site/site.php
GOC Accounting: http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.php

41 Many thanks to…
All EGEE-SA1 shifters
All INFNGrid site managers
INFNGrid release team
DGAS devel team (Guarise A., Patania G., Piro R.)
INFN-T1 staff (Chierici A., Italiano A., Lo Re’ G.)
Bonacorsi Daniele; Cavalli Alessandro; Fattibene Enrico; Gaido Luciano; Misurelli Giuseppe; Pagano Alfredo; Paolini Alessandro; Selmi Matteo; Veronesi Paolo; Vistoli Cristina

