INFN GRID Production Infrastructure Status and operation organization Cristina Vistoli Cnaf GDB Bologna, 11/10/2005.


1 INFN GRID Production Infrastructure Status and operation organization Cristina Vistoli Cnaf GDB Bologna, 11/10/2005

2 INFNGRID-2.6.0 deployment status
Deployment status summary:
- 25 sites are registered in the GOCDB:
  - 23 already updated to INFNGRID-2.6.0
  - 2 with upgrade to be defined (ESA-ESRIN and ROMA1-CMS)
- 14 additional sites are not yet in the GOCDB (but are under regular control by the Italian ROC):
  - 10 already updated
  - 2 under certification (INFN-Parma and ENEA-INFO)
  - 1 upgrade in progress (INFN-Genova)
  - 1 upgrade to be defined (NAPOLI-VIRGO)
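The counts on this slide can be cross-checked with a few lines of Python. This is only an illustrative tally: the dictionary keys are labels made up here, and the numbers are the ones quoted on the slide.

```python
# Site counts as quoted on the slide; the key names are illustrative labels.
registered = {"updated": 23, "upgrade_to_be_defined": 2}
unregistered = {
    "updated": 10,
    "under_certification": 2,
    "upgrade_in_progress": 1,
    "upgrade_to_be_defined": 1,
}

registered_total = sum(registered.values())      # sites in the GOCDB
unregistered_total = sum(unregistered.values())  # sites not yet in the GOCDB
all_sites = registered_total + unregistered_total
updated = registered["updated"] + unregistered["updated"]

print(f"registered in GOCDB: {registered_total}")
print(f"not yet in GOCDB:    {unregistered_total}")
print(f"already updated:     {updated} of {all_sites}")
```

The two subtotals match the 25 and 14 quoted on the slide, and 33 of the 39 sites overall are already updated.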

3 INFNGRID-2.6.0 deployment status: resources

4 INFNGRID-2.6.0 deployment status: services

5 INFNGRID-2.6.0 deployment status: services

6 INFNGRID-2.6.0 deployment status: services

7 INFNGRID-2.6.0 features
It is essentially LCG-2.6.0 with some additional features:
- Features/customizations already present in the previous releases:
  - new Network Monitor profile
  - improved support for LSF and MPI
  - support for additional VOs (managed via LDAP VO server): babar, zeus
  - support for additional VOs (managed via VOMS server): infngrid, cdf, gridit, compchem, planck, bio, enea, theophys, ingv, inaf, virgo, argo
  - support for MPI jobs via home synchronisation with scp with host-based authentication
  - DGAS (DataGrid Accounting System)
- New customizations:
  - support for argo VO

8 Production Infrastructure: Resources (figure; values shown: 844 and 438)

9 DGAS
- Usage records are collected by DGAS and stored in two different Home Location Registries (HLR), one for the resources and one for the VOs.
- Data are collected for jobs submitted via the IT Resource Brokers.
- Accounting data will be provided to APEL.
- A prototype web interface (DGAS web monitor) has been developed to get accounting data from the HLR with various levels of aggregation and 3 views (user, resource and VO). Access to data is controlled by means of certificates and ACLs.
- A DGAS functional test to check whether DGAS is working on a resource has been developed and is currently under test.
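The three views offered by the web monitor amount to grouping the same usage records by a different key. The sketch below illustrates that idea only: the record fields (`user`, `resource`, `vo`, `cpu_hours`), the host names and the function `aggregate` are assumptions made for this example, not the DGAS schema or API.

```python
from collections import defaultdict

# Hypothetical usage records; field names and values are assumptions,
# not the actual DGAS record format.
records = [
    {"user": "alice", "resource": "ce01.example.org", "vo": "cdf", "cpu_hours": 4.0},
    {"user": "bob", "resource": "ce01.example.org", "vo": "cdf", "cpu_hours": 2.0},
    {"user": "alice", "resource": "ce02.example.org", "vo": "infngrid", "cpu_hours": 1.5},
]

VIEWS = ("user", "resource", "vo")


def aggregate(records, view):
    """Sum CPU hours per distinct value of the chosen view."""
    if view not in VIEWS:
        raise ValueError(f"unknown view: {view!r}")
    totals = defaultdict(float)
    for record in records:
        totals[record[view]] += record["cpu_hours"]
    return dict(totals)


for view in VIEWS:
    print(view, aggregate(records, view))
```

A real deployment would additionally filter the records according to the requester's certificate and the ACLs mentioned above; that step is left out of this sketch.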

10 DGAS Web monitor (VO view)

11 DGAS: Web monitor (resource view)

12 Pre-production activities
The CNAF site is already part of the PPS. Two more sites (Bari and Padova) will join the PPS infrastructure soon.

13 Pre-production activities
In addition to the standard PPS activities, we want to test the functionality, stability and performance of the gLite WMS interfaced to a production BDII. If the tests are satisfactory, a gLite WMS could be deployed as a core service in addition to the LCG Resource Brokers.

14 Certification services
INFN Grid Certification Testbed
- to test and certify the Grid software developed inside the INFN: gLite and LCG
- to certify the installation of new INFN-GRID releases
- Five sites: INFN-TORINO, INFN-PADOVA, INFN-CNAF, INFN-ROMA1 and INFN-BARI
- The activity is carried out in close collaboration with the INFN-LCG-EGEE development teams, the EGEE Pre Production Service, ECGI and the Experiment task forces
- http://grid-it.cnaf.infn.it/certification/

15 CNAF CERTIFICATION / PRE-PRODUCTION (diagram)
Hosts and services shown:
- cert-mon-it (1.2 R-GMA server with Registry/Schema)
- cert-rb-02 (WMS+LB)
- cert-rls-01 (gLite 1.2 FireMan Cat.)
- glite-rb-00 (1.4 WMS+LB)
- pre-ui-01 (gLite 1.1 UI)
- cert-voms-01 (gLite 1.3 VOMS Server)
- cert-voms-02 (gLite 1.1 VOMS Server)
- cert-ui-01 (gLite 1.2 with bulk UI)
- cert-rb-01 (1.2 WMS+LB)
- cert-mon (gLite 1.2 R-GMA Server)
- devrb (rb)
- devui (ui)
- cert-rb-03 (gLite 1.4 WMS+LB)
- cert-pbox-01 (PBOX server)
- cert-bdii-01 (LCG-2.6.0 BDII)
- +3 servers dedicated to STORM tests
Other diagram labels: LCG-2.6.0 Site, gLite-1.3 Site, gLite-1.2 Site, APT Repository, ALL PPS, Release Creation/Test, Cert Sites, EGEE Production BDII, Services for PBOX TESTS

16 PADOVA CERTIFICATION / PRE-PRODUCTION (diagram)
Hosts and services shown:
- cert-mon (1.3 R-GMA server)
- cert-ui-01 (gLite 1.4 UI + Bulk)
- cert-rb-01 (gLite-1.4 WMS+LB)
- pre-ce-01
- pre-wn-01, pre-wn-02, pre-wn-03
- pre-se-01
Other diagram labels: gLite-1.3 Site (twice), Cert Sites

17 BARI CERTIFICATION
- pccms7, alicegrid1, alicegrid4
- pccms10 (gLite 1.4 UI)
Diagram label: gLite-1.2 Site
ROMA1 CERTIFICATION
- grid-cert-01 (gLite 1.3 UI)
- grid-cert-02 (gLite 1.3 CE/WN)
- + 3 servers dedicated to storage tests
TORINO CERTIFICATION
- grid007 (gLite 1.4 UI/RB)
- grid006 (gLite 1.4 CE/WN/RGMA)

18 Release and documentation
- Documentation: site installation guide, release notes, ...
- Software repository
- Site management guide
- FRY is a tool developed by the Release and Documentation group of the SA1 Italian ROC to quickly run a set of basic tests on all the grid elements (CE, SE, RB, WN, ...). The idea is to increase the speed and reliability of the release certification phase by running a "standard" set of tests that automatically detect configuration/setup problems (daemons, permissions and ownership of some directories, ...). http://grid-it.cnaf.infn.it/index.php?sitetest&type=1
- DGAS checklist [new]: the DGAS developers produced this document to check whether the DGAS configuration is OK
- UiPNP
- Installation of LCG 2.6 on IA64
http://www.spaci.it/egee/content.php?loc=docs&pg=default.php
http://grid-it.cnaf.infn.it/index.php?siteman&type=1
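To make the kind of check mentioned for FRY concrete (permissions and ownership of directories), here is a minimal Python sketch. It is not FRY's code: the function name `check_directory`, its parameters and the example path and values are assumptions chosen for illustration.

```python
import os
import stat


def check_directory(path, expected_mode, expected_uid=None):
    """Return a list of problems found for `path`; an empty list means OK."""
    if not os.path.isdir(path):
        return [f"{path}: missing or not a directory"]

    problems = []
    info = os.stat(path)

    # Compare only the permission bits (e.g. 0o755), not the file-type bits.
    mode = stat.S_IMODE(info.st_mode)
    if mode != expected_mode:
        problems.append(f"{path}: mode {mode:o}, expected {expected_mode:o}")

    # The ownership check is optional: skipped when no uid is given.
    if expected_uid is not None and info.st_uid != expected_uid:
        problems.append(f"{path}: owner uid {info.st_uid}, expected {expected_uid}")

    return problems


if __name__ == "__main__":
    # Hypothetical example: check /tmp against the usual sticky, world-writable mode.
    for line in check_directory("/tmp", 0o1777) or ["/tmp: OK"]:
        print(line)
```

Returning a list of messages rather than a single boolean lets one run report every misconfiguration found on a node, which fits the goal of a fast, repeatable "standard" test set.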

19 Release and documentation

20 Central Management Team Site Certification
The CMT is responsible for certification: checking the functionality of a site before it joins the production grid. In particular it checks:
- GIIS information consistency
- Local job submission (LRMS)
- Grid submission with Globus (globus-job-run)
- Grid submission with the Resource Broker
- ReplicaManager functionality
To certify a site the CMT uses dedicated grid services:
- RB: gridit-cert-rb.cnaf.infn.it
- BDII: gridit-cert-rb.cnaf.infn.it
In this way we avoid having an uncertified site in the production grid. The same grid services should be used for test activities.
The procedure is described in the following document: CMT's site certification procedure [PDF]
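The checklist on this slide can be thought of as a sequence of independent checks that must all pass before a site is admitted. The sketch below shows only that control flow. The function `certify_site`, the site name and the placeholder checks are hypothetical: the real checks (GIIS queries, globus-job-run, Resource Broker submission, and so on) are described in the CMT procedure document and are not reproduced here.

```python
def certify_site(site, checks):
    """Run every check against `site`.

    Returns (certified, results), where `results` maps each check name to
    True/False and `certified` is True only if there is at least one check
    and all of them passed. A check that raises counts as failed.
    """
    results = {}
    for name, check in checks:
        try:
            results[name] = bool(check(site))
        except Exception:
            results[name] = False
    certified = bool(results) and all(results.values())
    return certified, results


# Placeholder checks standing in for the real tests; each takes the site name.
example_checks = [
    ("GIIS information consistency", lambda site: True),
    ("local job submission (LRMS)", lambda site: True),
    ("grid submission with Globus", lambda site: True),
    ("grid submission with the Resource Broker", lambda site: False),
    ("ReplicaManager functionality", lambda site: True),
]

certified, results = certify_site("site.example.org", example_checks)
for name, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}  {name}")
print("certified" if certified else "not certified")
```

All checks are run even after a failure, so the site manager gets the full list of problems in one pass. Whether the actual procedure stops at the first failure is not stated on the slide.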

21 Supported VO

22 VOMS proxy
VO        28-30 Apr    May   June   July    Aug    Sep   01-09 Oct   total
argo              0      0      0      0      0      3           0       3
bio               0      0     81     62      8     53           9     213
cdf              31    808   1029    868    867    777         243    4623
compchem          0     35      9      2      4     78           7     135
enea              0      0      1     37      9    139          43     229
gridit            0      0     41     48     45    110          24     268
inaf              0      0      6      5      3     20           0      34
infngrid          9    298    274    177    151    409          69    1387
ingv              0      0     13     18     12                  4      59
planck            8      0      1     11      0                  0      31
theophys          0      0      7      0      6      5           4      22
virgo             0      0     31     13      3     10           3      60
total            48   1141   1493   1241   1108   1627         406    7064

23 Job status 10/oct/2005 23.25

24

25 Job report 26/9 -10/10

26

27 Support
First level support: Italian ROC shift
- The Italian ROC provides geographically based local front-line support to Virtual Organizations, users and Resource Centres
- Provided through daily shifts
- Check list to be covered during the shift
- Periodic (every 15 days) phone conference between ROC/CIC teams and site managers
- ROC report to GDA
Shift example, weekly based:
Second level support: CIC on Duty
- Weekly shift
- CIC tools

28 Support system
Problems Communication:
- ROC on Duty and site managers
- Site managers to Central Management Team and vice versa
- Site certification during installation/upgrade
- GGUS to ROC

29 Ticket statistics
- starting date: August 2005
- 272 total
- 64 from GGUS (COD and user)

30 Application Testing

31 Number of jobs per VO since 18/7/2005 in INFNGrid

