1 Enabling Grids for E-sciencE The INFN GRID Marco Verlato (INFN-Padova) EELA WP2 E-infrastructure Workshop Rio de Janeiro, 20-23 August 2007

2 Enabling Grids for E-sciencE 2 Outline A little history INFNGRID Overview INFNGRID Release INFNGRID Services From developers to production… Monitoring and Accounting Users and Sites Support Management procedures

3 Enabling Grids for E-sciencE 3 The INFN GRID project The first national project (Feb. 2000) aiming to develop grid technology and the new e-infrastructure needed to meet the computing requirements of LHC (and e-Science). e-Infrastructure = Internet + new Web and Grid services on top of a physical layer of network, computing, supercomputing and storage resources, made properly available in a shared fashion by the new Grid services. Since then many Italian and EU projects have made this a reality, and many scientific sectors in Italy, Europe and worldwide now base their research activities on the Grid. INFN Grid continues to be the national container used by INFN to reach its goals, coordinating all the activities: –In national, European and international Grid projects –In the standardization processes of the Open Grid Forum (OGF) –In the definition of EU policies in the ICT sector of Research Infrastructures –Through its managerial structure: Executive Board, Technical Board…

4 Enabling Grids for E-sciencE 4 The INFN GRID portal http://grid.infn.it

5 Enabling Grids for E-sciencE 5 The strategy Clear and stable objectives: development of the technology and of the infrastructure needed for LHC computing, but of general value. Variable instruments: use of projects and external funds (from EU, MIUR...) to reach the goal. Coordination among all the projects (Executive Board): –Grid middleware & infrastructure needed by INFN and LHC within a number of core European and international projects, often coordinated by CERN  DataGrid, DataTAG, EGEE, EGEE-II, WLCG –Often fostered by INFN itself. International collaboration with US Globus and Condor for the middleware, and with Grid projects like the Open Science Grid and the Open Grid Forum, in order to reach global interoperability among the developed services and the adoption of international standards. National pioneering development of the m/w and of the national infrastructure in the areas not covered by EU projects, via national projects like Grid.it, LIBI, EGG… Strong contribution to political committees: e-Infrastructure Reflection Group (eIRG -> ESFRI), EU concertation meetings and the involved units of the Commission (F2 and F3) to establish activity programmes (Calls).

6 Enabling Grids for E-sciencE 6 Some history… LHC  EGEE Grid 1999 – MONARC project –early discussions on how to organise distributed computing for LHC 2000 – growing interest in grid technology –the HEP community was the driver in launching the DataGrid project 2001-2004 – EU DataGrid / EU DataTAG projects –middleware & testbed for an operational grid 2002-2005 – LHC Computing Grid (LCG) –deploying the results of DataGrid to provide a production facility for the LHC experiments 2004-2006 – EU EGEE project phase 1 –starts from the LCG grid –shared production infrastructure –expanding to other communities and sciences 2006-2008 – EU EGEE-II –building on phase 1 –expanding applications and communities … … and in the future – a worldwide grid infrastructure? –Interoperating and co-operating infrastructures?

7 Enabling Grids for E-sciencE 7 Other FP6 activities of INFN GRID in Europe/1 To guarantee the evolution of Open Source Grid middleware towards international standards –OMII-Europe …and its availability through an effective repository –ETICS To contribute to informatics R&D activities –CoreGRID To coordinate the EGEE extension in the world –EUMedGrid –EU-IndiaGrid –EUChinaGrid –EELA

8 Enabling Grids for E-sciencE 8 Other FP6 activities of INFN GRID in Europe/2 To promote EGEE to new scientific communities –GRIDCC (real-time applications and instrument control) –BioinfoGRID (bioinformatics; coordinated by CNR) –LIBI (MIUR, bioinformatics in Italy) –Cyclops (civil protection) To contribute to e-IRG, the e-Infrastructure Reflection Group born in Rome in December 2003 –Initiative of the Italian Presidency on “eInfrastructures (Internet and Grids) – The new foundation for knowledge-based Societies”, an event organised by MIUR, INFN and the EU Commission –Representatives in e-IRG appointed by the EU science ministers –Policies and roadmap for e-Infrastructure development in the EU To coordinate participation in the Open Grid Forum (ex GGF)

9 Enabling Grids for E-sciencE 9 INFN GRID / FP6 active projects

10 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 10 FP7: guarantee sustainability The future of Grids in FP7 after 2008 –EGEE proposed to the European Parliament to set up a European Grid Initiative (EGI) in order to:  Guarantee long-term support & development of the European e-Infrastructure based on EGEE, DEISA and the national Grid projects funded by the National Grid Initiatives (NGIs)  Provide a coordination framework at EU level, as done for the research networks by Géant, DANTE and national networks like GARR. The Commission asked that a plan for a long-term sustainable Grid infrastructure (EGI + EGEE-III, …) be included among the goals of EGEE-II (other than DANTE + Géant 1-2). The building of EGI at EU level and of a National Grid Initiative at national level is among the main goals of FP7.

11 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 11 The future of INFNGRID: IGI Grid.it, the 3+1-year national project funded by MIUR with 12 M€ (2002-05), ended in 2006. The future: the Italian Grid Infrastructure (IGI) association –The EU (eIRG, ESFRI) requires the fusion of the different pieces of national Grids into a single national organisation (NGI) acting as the unique interface to the EU --> IGI for Italy –Substantial consensus for the creation of IGI, for a common governance of the Italian e-Infrastructure, from all the public bodies involved: INFN Grid, S-PACI, ENEA Grid, CNR, INAF, the national supercomputing centres CINECA, CILEA and CASPUR, and the new “nuovi PON” consortia –Under evaluation with MIUR: the evolution of GARR towards a more general body managing all the components of the infrastructure: network, Grid, digital libraries… Crucial for INFN in 2007-2008 will be managing the transition from INFN Grid to IGI in such a way as to preserve, and if possible enhance, the organisational level that allowed Italy to reach world leadership and become a leading partner of EGI.

12 Enabling Grids for E-sciencE 12 INFNGRID Overview

13 Enabling Grids for E-sciencE 13 Supported Sites 40 sites supported: 31 INFN sites, 9 non-INFN sites. Total resources: about 4600 CPUs, about 1000 TB of disk storage (+ about 700 TB of tape).

14 Enabling Grids for E-sciencE 14 Supported VOs 40 VOs supported: 4 LHC (ALICE, ATLAS, CMS, LHCB), 3 certification (DTEAM, OPS, INFNGRID), 8 regional (BIO, COMPCHEM, ENEA, INAF, INGV, THEOPHYS, VIRGO), 1 catch-all VO (GRIDIT), 23 other VOs. Recently a new regional VO was enabled: COMPASSIT.

15 Enabling Grids for E-sciencE 15 Components of the production Grid The Grid is not only CPUs and storage. Other elements are just as fundamental for running, managing and monitoring the grid: middleware, grid services, monitoring tools, accounting tools, the management and control infrastructure, and users.

16 Enabling Grids for E-sciencE 16 GRID Management Grid management is performed by the Italian Regional Operation Centre (ROC). Its main activities are: production and testing of the INFNGRID release; deployment of the release to the sites, support to local administrators and site certification; deployment of the release onto the central grid services; maintenance of the grid services; periodic checks of resource and service status; accounting of resource usage; support to site managers and users at the Italian level; support to site managers and users at the European level; introduction of new Italian sites; introduction of new regional VOs. The IT-ROC is involved in many other activities not directly related to the production infrastructure, e.g. the PreProduction, Preview and Certification testbeds.

17 Enabling Grids for E-sciencE 17 The Italian Regional Operation Centre (ROC) One of the 10 existing ROCs in EGEE. Operations Coordination Centre (OCC) –management and oversight of all operational and support activities. Regional Operations Centres (ROCs) –providing the core of the support infrastructure, each supporting a number of resource centres within its region. Grid Operator on Duty. Grid User Support (GGUS) –at FZK, coordination and management of user support, single point of contact for users.

18 Enabling Grids for E-sciencE 18 Middleware: INFNGRID Release

19 Enabling Grids for E-sciencE 19 INFNGRID Release The m/w installed on INFNGRID nodes is a customization of the gLite m/w used in the LCG/EGEE community. The customized INFNGRID release is packaged by the INFN release team (grid-release@infn.it). The ROC is responsible for the deployment of the release. At the moment INFNGRID-3.0-Update28 (based on gLite 3.0 Update 28) is deployed. [Timeline figure: LCG 1.0 / INFN-GRID 1.0, LCG 2.0 / 2.0, gLite 3.0 / 3.0 releases across 2003-2008, under the LCG, EGEE and EGEE-II projects]

20 Enabling Grids for E-sciencE 20 INFNGRID customizations: why? VOs not supported by EGEE: define configuration parameters once (e.g. VO servers, pool accounts, VOMS certificates, ...) to reduce misconfiguration risks. MPI (requested by non-HEP sciences), additional GridICE configuration (monitoring of WNs), AFS read-only (a CDF requirement), ... Deploy additional middleware in a non-intrusive way: VOMS since Nov. 2004, now in EGEE; DGAS (DataGrid Accounting System); NetworkMonitor (monitors network connection metrics).

21 Enabling Grids for E-sciencE 21 INFNGRID customizations Additional VOs (~20). GridICE on almost all profiles (including WN). Preconfigured support for MPI: –WNs without a shared home directory; home directories synchronized using scp with host-based authentication (see the sketch below). DGAS accounting: –new profile (HLR server) + additional packages on the CE. NME (Network Monitor Element). Collaboration with CNAF-T1 on Quattor. UI “PnP” –a UI installable without administrator privileges. NTP. AFS (read-only) on WNs (needed by the CDF VO).
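The scp-based home synchronization for MPI can be pictured with a minimal sketch like the one below. The helper name, directory layout and host names are illustrative assumptions, not the actual yaim implementation; it relies only on what the slide states, i.e. host-based SSH authentication between the CE and the WNs.

```python
import subprocess

def sync_job_home(user, job_dir, worker_nodes):
    """Copy a user's job directory to every WN of an MPI job via scp.

    Relies on host-based SSH authentication (no password prompt), since
    INFNGRID WNs have no shared home. Hypothetical helper, for illustration.
    """
    for wn in worker_nodes:
        subprocess.run(
            ["scp", "-r", job_dir, "%s@%s:%s" % (user, wn, job_dir)],
            check=True,  # fail loudly if any copy breaks
        )

# Example with hypothetical hosts: replicate an MPI job's working directory
# sync_job_home("atlas001", "/home/atlas001/job42",
#               ["wn-01.example.infn.it", "wn-02.example.infn.it"])
```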

22 Enabling Grids for E-sciencE 22 Packages and metapackages The packages are distributed in repositories available via HTTP. For each EGEE release there are 2 repositories collecting different types of packages: –Middleware http://glitesoft.cern.ch/EGEE/gLite/APT/R3.0/rhel30/ –Security http://linuxsoft.cern.ch/LCG-CAs/current/ The INFNGRID customizations add a third repository: –http://grid-it.cnaf.infn.it/apt/ig_sl3-i386

23 Enabling Grids for E-sciencE 23 Metapackage management process 1: starting from the EGEE lists, update the INFNGRID lists (maintained in an SVN repository) 2: once the lists are OK, generate a first version of the INFNGRID metapackages to test them 3: install and/or upgrade the metapackages on the release testbed 4: if there are errors, correct them and go to 2 5: publish the new metapackages on the official repositories so they are available to everybody (the control flow is sketched below).
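The five steps form a simple build-test-publish loop. The sketch below captures only that control flow; all the arguments are placeholder callables/data standing in for the real SVN, apt and testbed machinery.

```python
def release_cycle(lists, build, test, fix, publish):
    """Five-step metapackage cycle: build metapackages from the package
    lists, install them on the release testbed, fix and retry on errors,
    publish when clean. All arguments are placeholders."""
    while True:
        metapackages = build(lists)    # step 2: generate a candidate set
        errors = test(metapackages)    # step 3: install/upgrade on the testbed
        if not errors:
            break                      # no errors: leave the loop
        lists = fix(lists, errors)     # step 4: correct and go back to step 2
    publish(metapackages)              # step 5: push to the official repos
```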

24 Enabling Grids for E-sciencE 24 Metapackage management Our metapackages are supersets of the EGEE ones: –INFNGRID metapackage = EGEE metapackage + INFNGRID additional rpms. EGEE distributes its metapackages at –http://glitesoft.cern.ch/EGEE/gLite/APT/R3.0/rhel30 Flat rpm lists are available at: –http://glite.web.cern.ch/glite/packages/R3.0/deployment We maintain a customized copy of the lists and resync them easily: –https://forge.cnaf.infn.it/plugins/scmsvn/viewcvs.php/trunk/ig-metapackages/tools/getglists?rev=1888&root=igrelease&view=log Using another tool (bmpl) we can generate all artifacts starting from the lists: –“our” (INFNGRID) customized metapackages  http://grid-it.cnaf.infn.it/apt/ig_sl3-i386 –HTML files with the lists of the packages (one list per profile)  http://grid-it.cnaf.infn.it/?packages –Quattor template lists:  http://grid-it.cnaf.infn.it/?quattor
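The superset rule itself fits in a few lines. A minimal sketch, assuming plain-text files with one rpm name per line (the real tooling is getglists/bmpl; the file names are illustrative):

```python
def build_infngrid_list(egee_list_path, additions_path, output_path):
    """Merge the EGEE flat rpm list for a profile with the INFNGRID
    additions, dropping duplicates while preserving order."""
    seen, merged = set(), []
    for path in (egee_list_path, additions_path):
        with open(path) as f:
            for line in f:
                rpm = line.strip()
                if rpm and not rpm.startswith("#") and rpm not in seen:
                    seen.add(rpm)
                    merged.append(rpm)
    with open(output_path, "w") as f:
        f.write("\n".join(merged) + "\n")

# Hypothetical file names, one rpm per line:
# build_infngrid_list("glite-WN.list", "ig-WN-additions.list", "ig-WN.list")
```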

25 Enabling Grids for E-sciencE 25 ig-yaim The ig-yaim package is an extension of glite-yaim. It provides: –additional functions, or functions that override existing ones; both are stored in functions/local instead of functions/ –e.g. to configure NTP, AFS, the LCMAPS gridmapfile/groupmapfile, ... More pool accounts => ig-users.def instead of users.def. More configuration parameters => ig-site-info.def instead of site-info.def. –Both packages (glite-yaim, ig-yaim) are needed!!
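The override rule — anything in functions/local wins over the stock functions/ directory — can be modelled as a simple precedence lookup. A sketch only: real yaim sources these files as bash functions, and the directory paths below are assumptions.

```python
import os

def resolve_yaim_functions(glite_dir, local_dir):
    """Map each yaim function name to the file that defines it, letting
    ig-yaim's functions/local override glite-yaim's functions/."""
    table = {}
    for d in (glite_dir, local_dir):   # later directory wins
        for name in sorted(os.listdir(d)):
            table[name] = os.path.join(d, name)
    return table

# Assumed install paths, for illustration only:
# funcs = resolve_yaim_functions("/opt/glite/yaim/functions",
#                                "/opt/glite/yaim/functions/local")
# funcs["config_ntp"] then points at the INFNGRID version if one exists
```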

26 Enabling Grids for E-sciencE 26 Documentation Documentation is published with each release –release notes, upgrade and installation guides:  http://grid-it.cnaf.infn.it/?siteinstall  http://grid-it.cnaf.infn.it/?siteupgrade  http://grid-it.cnaf.infn.it/?releasenotes written in LaTeX and published in HTML, PDF and TXT. Additional information about updates and various notes is also published in wiki pages: –https://grid-it.cnaf.infn.it/checklist/modules/dokuwiki/doku.php?id=rel:updates –https://grid-it.cnaf.infn.it/checklist/modules/dokuwiki/doku.php?id=rel:hlr_server_installation_and_configuration Everything is available to site managers on a central repository.

27 Enabling Grids for E-sciencE 27 Updates deployment Since the introduction of gLite 3.0 there have been no more big release changes from EGEE, but a series of smaller, frequent updates (roughly weekly); the INFNGRID release was updated accordingly. Steps: gLite update announcement – INFNGRID release alignment to the announced update (ig-metapackages, ig-yaim) – local testing – IT-ROC deployment.
gLite updates: 17/10/2006 Update 06, 20/10/2006 Update 07, 24/10/2006 Update 08, 14/11/2006 Update 09, 11/12/2006 Update 10, 19/12/2006 Update 11, 22/01/2007 Update 12, 05/02/2007 Update 13, 19/02/2007 Update 14, 26/02/2007 Update 15, …
INFNGRID updates: 27/10/2006 Update 06/07/08 (+ new dgas, gridice packages), 15/11/2006 Update 09, 19/12/2006 Update 10/11, 29/01/2007 Update 12, 14/02/2007 Update 13, 20/02/2007 Update 14, 27/02/2007 Update 15, …

28 Enabling Grids for E-sciencE 28 INFNGRID Services Overview

29 Enabling Grids for E-sciencE 29 The general web portal

30 Enabling Grids for E-sciencE 30 The technical web portal

31 Enabling Grids for E-sciencE 31 General Purpose Services

32 Enabling Grids for E-sciencE 32 General purpose services – VOMS servers

33 Enabling Grids for E-sciencE 33 VOMS stats
Number of users per VO: argo 17, bio 44, compchem 31, enea 8, eumed 56, euchina 35, gridit 89, inaf 25, infngrid 178, ingv 12, libi 10, pamela 16, planck 16, theophys 20, virgo 9, cdf 1133, egrid 28.
Top users (about 85% of total proxies): CDF (~50k proxies/month), EUMED (~500 proxies/month), PAMELA (~500 proxies/month), EUCHINA (~400 proxies/month), INFNGRID (test purposes, ~200 proxies/month).

34 Enabling Grids for E-sciencE 34 General purpose services – HLRs Accounting: Home Location Register. DGAS (Distributed Grid Accounting System) is used to account for the jobs running on the farms (grid and non-grid jobs). 12 first-level HLRs are distributed; 1 experimental second-level HLR aggregates data from the first level. DGAS2Apel is used to send job records to the GOC for all sites.
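The first-to-second-level aggregation can be pictured as a toy reduction over usage records. The record fields below are schematic stand-ins for real DGAS usage records, not its actual schema:

```python
from collections import defaultdict

def aggregate_hlr(records):
    """Roll job records collected by first-level HLRs up into per-(site, VO)
    totals, the kind of view a second-level HLR aggregates."""
    totals = defaultdict(float)
    for rec in records:
        totals[(rec["site"], rec["vo"])] += rec["cpu_hours"]
    return dict(totals)

# Two first-level feeds merged together (illustrative values):
level1 = [{"site": "INFN-PADOVA", "vo": "atlas", "cpu_hours": 12.5},
          {"site": "INFN-PADOVA", "vo": "atlas", "cpu_hours": 3.0},
          {"site": "INFN-BARI",   "vo": "cms",   "cpu_hours": 7.2}]
print(aggregate_hlr(level1))
# {('INFN-PADOVA', 'atlas'): 15.5, ('INFN-BARI', 'cms'): 7.2}
```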

35 Enabling Grids for E-sciencE 35 VO-Dedicated Services New DEVEL-INFNGRID-3.1 WMS and LB instances are coming soon as VO-dedicated services in production (atlas, cms, cdf, lhcb). VO-specific services previously run by the INFNGRID Certification Testbed have now moved to production  DEVEL RELEASE. A total of 18 VO-dedicated services, which will become 25 with the introduction of the 3.1 WMS and LB.

36 Enabling Grids for E-sciencE 36 FTS channels and VOs Installed and fully managed via Quattor/yaim; 3 hosts as frontends, 1 backend Oracle cluster. Not only LHC VOs: –PAMELA –VIRGO Full standard T1-T1 + T1-T2 + STAR channels: –51 channel agents –7 VO agents. (A prototype of) a monitoring tool is available: –agent and Tomcat log files are parsed and saved into a MySQL DB –web interface: http://argus.cnaf.infn.it/fts/index-FTS.php Support: –dedicated department team for tickets –mailing list: fts-support@cnaf.infn.it

37 Enabling Grids for E-sciencE 37 FTS transfer overview

38 Enabling Grids for E-sciencE 38 Testbeds: m/w flow from developers to production in EGEE and INFNGRID

39 Enabling Grids for E-sciencE 39 TESTBEDS [Flow diagram: JRA1 developers → SA3 (certification, CERN) → SA1 PPS (pre-production) → SA1 EGEE PS (production); in parallel, the JRA1/SA1 Preview testbed and the INFN certification testbed feed the INFNGRID release team and the SA1 INFNGRID production and DEVEL production services, with VOs giving feedback at each stage]

40 Enabling Grids for E-sciencE 40 Pre-Production Service (PPS) in EGEE AIM: the last step of m/w testing before deployment at production scale. INPUT: CERN certification (SA3). SCOPE: EGEE SA1, about 30 sites spread all over Europe (1 in Taiwan). COORDINATION: CERN. USERS ALLOWED: all the LHC VOs, diligent, switch and 2 PPS fake VOs. CONTACTS: project-eu-egee-pre-production-service@cern.ch http://egee-pre-production-service.web.cern.ch/egee-pre-production-service/ ACTIVITIES: the main activity is the testing of installation procedures and basic functionality of releases/patches, done by site managers. There is limited m/w testing done by users: this is the main PPS issue!

41 Enabling Grids for E-sciencE 41 Pre-Production Service (PPS) in EGEE PPS is run like the Production Service: –SAM tests –tickets from COD –GOC DB registration –etc.

42 Enabling Grids for E-sciencE 42 Italian participation in PPS 3 INFN sites: CNAF, PADOVA, BARI. 2 DILIGENT sites: CNR, ESRIN. [Diagram: the PPS nodes at each site (cert-*, prep-*, pccms2, vgridba5, …) with their production farms of 150, 68 and 150 slots, linked to all other PPS sites outside INFN] CNAF: 2 CEs with access to the production farm, 1 SE, 1 mon box + central services (VOMS, UI, BDII, WMS, FTS, LFC, APT repo); people: D.Cesini, M.Selmi, D.Dongiovanni. PADOVA: 2 CEs with access to the production farm, 1 SE, 1 mon box; people: M.Verlato, S.Bertocco. BARI: 1 CE with access to the production farm, 1 SE; people: G.Donvito.

43 Enabling Grids for E-sciencE 43 Preview Testbed Now an official EGEE activity, requested by JRA1 to expose to users those components not yet considered by the CERN (SA3) certification. The aim is to get feedback from end users and site managers. It is a distributed testbed deployed on a few European sites. A joint SA1-JRA1 effort is needed in order not to dedicate people at 100% of their time to this activity, as acknowledged by the TCG and PMB. COORDINATOR: JRA1 (Claudio Grandi). USERS ALLOWED: JRA1/Preview people and all interested users. CURRENT ACTIVITIES: CREAM, gLexec, gPBox. CONTACTS: project-eu-egee-middleware-preview@cern.ch https://twiki.cern.ch/twiki/bin/view/EGEE/EGEEgLitePreviewNowTesting

44 Enabling Grids for E-sciencE 44 Italian participation in the Preview Testbed 3 INFN sites: –CNAF (D.Cesini, D.Dongiovani) –PADOVA (M.Sgaravatto, M.Verlato, S.Bertocco) –ROMA1 (A.Barchiesi). H/W resources are partly taken from the INFN certification testbed and partly from the JRA1 testbed. [Diagram: preview nodes at CNAF, PADOVA and ROMA1 (cert-*, cream-*, pre-*, rm1-* hosts), linked to all other Preview sites outside INFN; physical nodes run virtual services] Preview services deployed in Italy: PADOVA: 1 CREAM CE + 5 WNs. CNAF: 1 WMS 3.1, 1 BDII, 1 gLite CE + 1 WN, 1 UI, 1 DPM SE (for gPBox); 1 WMS 3.1 + 2 gLite CEs + 1 LCG CE + 3 WNs + 2 gPBox servers. ROMA1: 1 CE + 1 WN for gPBox tests (to be installed). Virtual machines are used at CNAF to optimize h/w resources.

45 Enabling Grids for E-sciencE 45 CERN Certification (SA3) An EGEE activity run by SA3 – the official EGEE certification testbed that releases gLite m/w to PPS and to production. ACTIVITY: test and certify all gLite components, release packaging. COORDINATION: CERN. INFN sites involved: CNAF (A.Italiano), MILANO (E.Molinari), PADOVA (A.Gianelle). Italian activities: testing of information providers, DGAS, WMS. Services provided: 1 LSF CE + 1 batch-system server on a dedicated machine + 1 DGAS HLR + 1 site BDII + 2 WNs; all services are located at CNAF. [Diagram: the INFN part of the SA3 CERN certification testbed at CNAF (wmstest-ce-02 … wmstest-ce-08)] Recently the responsibility for WMS testing passed from CERN to INFN – the main focus of SA3-Italia.

46 Enabling Grids for E-sciencE 46 INFNGRID Certification Testbed A distributed testbed deployed on a few Italian sites, where EGEE m/w with the INFNGRID customizations and INFNGRID grid products are installed for testing purposes by a selected number of end users and grid managers before being released. It is NOT an official EGEE activity and should not be confused with the CERN certification testbed run by the SA3 EGEE activity. Most of the servers have migrated to the Preview Testbed. SITES and PEOPLE: CNAF (D.Cesini, D.Dongiovani), PADOVA (S.DallaFina, C.Aifitimiei, M.Verlato), TORINO (R.Brunetti, G.Patania, F.Nebiolo), ROMA1 (A.Barchiesi). CONTACTS: cert-release@infn.it http://grid-it.cnaf.infn.it/certification

47 Enabling Grids for E-sciencE 47 INFNGRID Certification Testbed – ACTIVITIES / 1 WMS (CNAF)  No more time to perform detailed tests as in the first phase of the certification testbed (https://grid-it.cnaf.infn.it/certification/?INFN_Grid_Certification_Testbed:WMS%2BLB_TEST)  Provide resources to VOs and developers and maintain patched and experimental WMSes. Experimental WMS 3.0: 1 ATLAS WMS, 1 ATLAS LB, 1 CMS WMS + LB, 1 CDF WMS + LB, 1 LHCB WMS + LB. WMSes for developers: 2 WMS + LB. The experimental WMSes were heavily used in the last period because they were more stable than those officially released, due to the long time needed for patches to reach the PS (bad support from certification, production usage statistics altered)  recently tagged as INFNGRID DEVEL PRODUCTION services (see next slide).  Support to JRA1 for the installation of WMS 3.1 in the development testbed.

48 Enabling Grids for E-sciencE 48 INFNGRID Certification Testbed – ACTIVITIES / 2 DGAS CERTIFICATION (TORINO): –4 physical servers, virtualized in a very dynamic way. DEVEL RELEASE (PADOVA/CNAF): –to speed up the flow of patches into the services used by the VOs, it does not follow the normal m/w certification process –based on the official INFNGRID release (3.0) –wiki page on how to transform a normal INFNGRID release into a DEVEL one: http://agenda.cnaf.infn.it/materialDisplay.py?contribId=4&materialId=0&confId=18 –apt repository to maintain control of what goes into the DEVEL release –1 WMS server at CNAF –announced via mail after testing at CNAF –cannot come with all the guarantees of normally certified m/w.

49 Enabling Grids for E-sciencE 49 INFNGRID Certification Testbed – ACTIVITIES / 3 RELEASE INFNGRID CERTIFICATION (PADOVA): –20 virtual machines on 5 physical servers –http://igrelease.forge.cnaf.infn.it StoRM – some resources provided: –3 physical servers. SERVER VIRTUALIZATION (all sites).

50 Enabling Grids for E-sciencE 50 INFNGRID Certification Testbed – testbed snapshot [Diagram: snapshot of the certification nodes at CNAF, PADOVA, ROMA1 and TORINO – experimental/patched WMSes passed to DEVEL production or used by JRA1, virtualization tests, resources provided to StoRM tests, the DGAS test CE at Torino, and the release INFNGRID testbed (5 physical servers × 4 VMs = 20 VMs)]

51 Enabling Grids for E-sciencE 51 INFNGRID Certification Testbed – VIRTUAL GRID ‘NEW’ Create a self-contained grid using old T1 h/w resources, dedicated to WMS tests: –total control of what is installed –no interference with the production grid (altered statistics, site managers complaining about stuck jobs, no wasting of production CPU). [Diagram: a WMS + LB + BDII under the developers' control in front of several virtual sites, each a CE + WN pair] 37 physical boxes available per rack (2 racks available), dual PIII 1.4 GHz, 2 GB RAM; these boxes are dedicated to the virtual sites, while services can be installed on more powerful machines. The exact deployment is under study, probably 1 LCG CE and 1 WN per physical box. A virtual-site prototype is already installed on a couple of boxes. We are investigating the performance that can be reached with this kind of hardware/deployment.

52 Enabling Grids for E-sciencE 52 Monitoring and Accounting Tools used by the ROC

53 Enabling Grids for E-sciencE 53 Monitoring GridICE: http://gridice4.cnaf.infn.it:50080/gridice/site Developed by INFN. Several servers with different scopes are installed and maintained by the IT-ROC.

54 Enabling Grids for E-sciencE 54 Monitoring GSTAT: http://goc.grid.sinica.edu.tw/gstat//Italy.html Developed outside INFN; a GSTAT server is maintained by the IT-ROC. GSTAT queries the Information System every 5 minutes. The sites and nodes checked are those registered in the GOC DB. Inconsistencies in the published information, or the absence of a service that a site should publish, are reported as errors.
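In outline, GSTAT's check is a periodic diff between what a site publishes and what the GOC DB says it should publish. A schematic sketch; the real lookups (LDAP queries against the BDII, GOC DB lookups) are abstracted into injected callables:

```python
import time

def missing_services(published, expected):
    """Services registered in the GOC DB but absent from the site's
    Information System output; both arguments are sets of service names."""
    return sorted(expected - published)

def monitor(sites, query_is, query_gocdb, interval=300):
    """Poll every `interval` seconds (GSTAT uses 5 minutes) and report
    errors. query_is/query_gocdb are placeholder callables returning the
    set of services a site publishes / should publish."""
    while True:
        for site in sites:
            gap = missing_services(query_is(site), query_gocdb(site))
            if gap:
                print("ERROR %s: missing services %s" % (site, gap))
        time.sleep(interval)
```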

55 Enabling Grids for E-sciencE 55 Monitoring SAM: https://lcg-sam.cern.ch:8443/sam/sam.py SAM-ADMIN: https://cic2.gridops.org/samadmin/ SAM is the official CERN-EGEE testing tool; tests are performed by jobs submitted to the sites. Submission is triggered by an admin web interface. A mirror of the web interface is hosted at CNAF and maintained by the IT-ROC.

56 Enabling Grids for E-sciencE 56 Accounting ROCRep & HLRMON: http://grid-it.cnaf.infn.it/rocrep/index.php http://grid-it.cnaf.infn.it/hlrmon/index.php (data about all VOs and all sites, T1 excluded) Web interfaces to obtain aggregated Grid usage data. Two versions exist: 1) data taken from the GridICE DB 2) data taken from the DGAS HLR DB – a new interface is being released.

57 Enabling Grids for E-sciencE 57 Accounting GOC ACCOUNTING SYSTEM: http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.php Data from the HLR servers are fed into the GOC system through the dgas2apel tool.

58 Enabling Grids for E-sciencE 58 Users and Sites Support

59 Enabling Grids for E-sciencE 59 Support The IT-ROC offers a number of grid services and controls their correct operation. But not only… The IT-ROC also continuously monitors the status of the sites inside the ROC itself and, in case of problems, helps site managers or users to find a solution. As a parallel activity, the IT-ROC is also involved in the monitoring and support of the entire EGEE infrastructure (TPM and COD) – the same support given to INFNGRID users and sites is given to the LCG/EGEE Grid, in a round-robin manner among the ROCs.

60 Enabling Grids for E-sciencE 60 Users and sites support The main tools for supporting users are the ticketing systems. EGEE makes use of the GGUS (Global Grid User Support) ticketing system (www.ggus.org). Each ROC uses different tools, interfaced to GGUS in a bidirectional way; by means of web services it is possible to transfer tickets from the global to the regional system and from the regional to the global system (a sketch of this exchange follows). Once tickets are logged, they are assigned to a proper support unit either in GGUS or in the regional systems. The IT-ROC ticketing system is based on XOOPS/xHelp.
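The bidirectional exchange can be reduced to two translation functions between the GGUS schema and the regional one. All field names and the routing rule below are illustrative assumptions, not the actual GGUS web-service interface:

```python
def ggus_to_regional(ticket):
    """Translate an incoming GGUS ticket into the regional (XOOPS/xHelp)
    schema and route it to a support unit."""
    return {
        "external_id": ticket["ggus_id"],   # keep the global reference
        "subject": ticket["subject"],
        "support_unit": ticket.get("affected_site", "IT-ROC-first-line"),
    }

def regional_to_ggus(ticket, solution=None):
    """Push a regional update (reassignment or solution) back to GGUS."""
    update = {"ggus_id": ticket["external_id"],
              "status": "solved" if solution else "reassigned"}
    if solution:
        update["solution"] = solution
    return update
```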

61 Enabling Grids for E-sciencE 61 Interface to GGUS [Diagram: a ticket enters the GGUS web portal; GGUS/TPM assigns it to a ROC helpdesk through the ROC interface; the helpdesk dispatches it to one of its support units (SU-1 … SU-N); the solved or re-assigned ticket flows back to GGUS]

62 Enabling Grids for E-sciencE 62 Interface to GGUS A new ticket arrives from GGUS; we assign the ticket to the site it concerns.

63 Enabling Grids for E-sciencE 63 Interface to GGUS The site reassigns the ticket to GGUS… …and adds a response

64 Enabling Grids for E-sciencE 64 IT-ROC Control Shifts About 20 supporters perform a monitoring activity composed of 2 shifts per day, Monday to Friday, with 2 persons per shift. At the end of each shift a report is produced. During the shift the supporters: check the Grid status and try to discover problems before the users do; in case of problems, open tickets to the department concerned in order to find a solution, suggesting a possible solution where they can; perform site certification during the deployment phases; check the status of tickets and urge experts or site managers to answer and solve them.

65 Enabling Grids for E-sciencE 65 IT-ROC Shift ISSUES The ROC monitoring is oriented to the infrastructure, not to the VOs. The active monitoring done via test jobs (i.e. the SAM tool) uses 3 VOs dedicated to infrastructure testing – dteam, ops and infngrid – which in general have higher priority on the sites  the side effect is that VO-specific problems are not observed. Passive controls (i.e. GSTAT and GridICE) are not affected by this problem. The infrastructure tests can be OK while users still experience problems. The current control-shift organization seems insufficient for the VOs' needs, and the LHC VOs are already performing their own tests (VO dashboards) to deal with this situation.

66 Enabling Grids for E-sciencE 66 IT-ROC Shift ISSUES Both the Italian and the European experience in Grid monitoring show that it is necessary to complement the infrastructure-oriented monitoring with more VO-specific monitoring  but in INFNGRID alone we have about 40 VOs!! Collaboration between the ROC and the people involved in the VO dashboards is desirable, at least to define a set of controls that are important for the VOs but not yet performed by the ROC.

67 Enabling Grids for E-sciencE 67 TPM and COD The Italian ROC is also involved in the monitoring and support of the entire LCG/EGEE infrastructure, participating in the TPM and COD activities. TPM (Ticket Process Manager): responsible for correct ticket assignment in the central GGUS system. When a ticket is logged, it is automatically assigned to the TPM group, which routes the ticket to the proper support unit or, if able, proposes a solution. The whole ticket life is under the control of the TPM, which can modify the ticket at any time, urging for an answer or solution. Each ROC performs a one-week shift in a round-robin cycle. COD (CIC On Duty): the same monitoring done for the INFNGRID infrastructure is done for the EGEE infrastructure, using the same tools (i.e. GSTAT, SAM, GridICE, GGUS) and some COD-specific tools (i.e. the COD dashboard).

68 Enabling Grids for E-sciencE 68 Management procedures

69 Enabling Grids for E-sciencE 69 Introducing a new site Before joining INFNGRID, a site has to accept several rules, described in a Memorandum of Understanding (MoU). The COLG (Grid Local Coordinator) reads and signs it and faxes the document to INFN-CNAF. Moreover, all sites must provide this email alias: grid-prod@. The alias will be used to report problems and will be added to the site managers' mailing list; of course it should include all site managers of the grid site. The IT-ROC registers the site and the site managers in the GOC DB and creates a supporter-operative group in the XOOPS ticketing system. Site managers have to register themselves in XOOPS, so they can be assigned to their supporter-operative groups; each site manager also has to register with the test VOs infngrid and dteam. Site managers install the middleware following the instructions distributed by the Release Team (http://grid-it.cnaf.infn.it/, Installation section). When finished, they run some preliminary tests (http://grid-it.cnaf.infn.it/ --> Test&Cert --> Fry) and then request the ROC certification (http://grid-it.cnaf.infn.it/index.php?id=cmtreport&type=1). The IT-ROC logs a ticket to communicate with the site managers during the certification.

70 Enabling Grids for E-sciencE 70 MoU for sites Every site has to: provide computing and storage resources – farm dimensions (at least 10 CPUs) and storage capacity are agreed with each site; guarantee sufficient manpower to manage the site: at least 2 persons; manage the site resources efficiently: middleware installation and upgrades, patch application, configuration changes as requested by the CMT, within the maximum time stated for each operation; answer tickets within 24 hours (T2 sites) or 48 hours (other sites), Monday to Friday (see the deadline sketch below); check its own status from time to time; guarantee continuity of site management and support, also during holiday periods; participate in SA1/Production-Grid phone conferences and meetings and compile the weekly pre-report; keep the information in the GOC DB up to date; enable the test VOs (ops, dteam and infngrid) with a higher priority than the other VOs. Non-fulfilment noticed by the ROC is referred to the biweekly INFNGRID phone conference, then to the COLG, and eventually to the EB.
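The response-time rule is concrete enough to compute. A minimal sketch of one reasonable reading of it (whole weekend days are skipped; holidays are ignored):

```python
from datetime import datetime, timedelta

def response_deadline(opened, is_t2):
    """MoU response deadline: 24 working hours for T2 sites, 48 for the
    others, counting Monday-Friday only."""
    remaining = timedelta(hours=24 if is_t2 else 48)
    t = opened
    while remaining > timedelta(0):
        t += timedelta(hours=1)
        if t.weekday() < 5:          # Mon-Fri hours consume the budget
            remaining -= timedelta(hours=1)
    return t

# A ticket opened at a T2 site on Friday 24/08/2007 at 16:00
# is due on Monday at 16:00:
print(response_deadline(datetime(2007, 8, 24, 16, 0), is_t2=True))
```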

71 Enabling Grids for E-sciencE 71 Introducing a new VO When an experiment asks to enter the grid as a new VO, a formal request followed by some technical steps is necessary. Formal part: the needed resources and the economic contribution are agreed between the experiment and the INFN GRID Executive Board (EB); pick out the experiment software and verify that it will work in the Grid environment; verify the support it will receive at the various INFN GRID production sites; communicate to the IT-ROC the names of the VO managers, software managers, and the persons responsible for the resources and for the support of the experiment software; state the software requirements, the kind of jobs, and the final destination of the storage (CASTOR, SE, experiment disk server).

72 Enabling Grids for E-sciencE 72 Introducing a new VO Once the Executive Board (EB) has approved the experiment's request, the technical part begins: the IT-ROC creates the VO's VOMS server; the IT-ROC creates the VO support group in the ticketing system; the VO managers fill in the VO identity card on the CIC portal; the IT-ROC announces the new VO to the sites.

73 Enabling Grids for E-sciencE 73 Useful links… INFN GRID project: http://grid.infn.it/ Italian Production grid: http://grid-it.cnaf.infn.it/ SAM: https://lcg-sam.cern.ch:8443/sam/sam.py CIC Portal: http://cic.gridops.org/ GSTAT: http://goc.grid.sinica.edu.tw/goc/ GridICE: http://gridice4.cnaf.infn.it:50080/gridice/site/site.php GOC Accounting: http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.php THANK YOU

