INFN-GRID Production Grid. Cristina Vistoli, INFN-CNAF Bologna. INFN-Grid Workshop, 25-27 October 2004, Bari.


1. INFN-GRID Production Grid
Cristina Vistoli, INFN-CNAF Bologna
INFN-Grid Workshop, 25-27 October 2004, Bari

2. Summary
- INFN-GRID: release, resources, services, supported VOs;
- Basic tests before joining the grid;
- Certification and periodic test activity;
- Calendar and Ticketing System;
- Certification queue;
- GridAT (Grid Application Test).

3. INFN-GRID Release
- INFN-GRID is a customized release of LCG:
  - All resources are fully managed via LCFGng;
  - INFN-GRID does not support installing the middleware without LCFGng;
- The INFN-GRID 2.2.0 release is based on the official LCG-2.2.0 and is 100% compatible with it.

4. INFN-GRID Release
- Main differences between LCG 2.2.0 and INFN-GRID 2.2.0:
  - Added support for DAG jobs;
  - Added support for AFS on the WorkerNodes;
  - Added support for MPI jobs via home synchronisation with ssh;
  - Documented installation of WNs on a private network;
- Added full-function VOMS support:
  - The INFNGRID and CDF VOs are managed entirely via the VOMS server.

5. INFN-GRID: Resources and supported VOs
[Table of resources per site; footnote: (**) Hyperthreaded]

6. CPU versus VO

7. INFN-GRID: Production Grid service
- Service resources are open to all supported VOs;
- RB-BDII scope: Italian Grid;
- New: Resource Broker/UI with DAG: prod-rb-01.pd.infn.it

8. EGEE/LCG: Production Grid services
- Service resources are open to all VOs supported by INFN-GRID and EGEE/LCG;
- RB: egee-rb-01.cnaf.infn.it, which supports the BIOMED VO;
- RB-BDII scope: all European resources;
- EGEE/LCG RB/UI with DAG.

9. Upgrade/Installation activity
- Testing whether "the grid is working" is not easy;
- Certification activity in INFN-GRID can be classified into four levels:
  - Local tests by the local resource center managers;
  - Certification tests by the CMT;
  - Monitoring tests by the CMT;
  - Certification on demand, performed by both the CMT and the Application Teams.

10. Basic site tests (1/2)
- The local resource center manager can run these tests right after an installation or upgrade, or later when users report trouble or our periodic test activity finds a problem:
  - All nodes:
    - Check that every node mounts the LCFGng RPM repository from the LCFGng server;
  - CE/SE:
    - Verify the file access permissions, and check the validity and the subject of the host certificate;
  - CE:
    - Check that the local scheduler works locally;
  - SE:
    - The SE storage area should contain one directory for each supported VO, with the appropriate permissions and owners;
  - WN:
    - WNs should have pool accounts for each supported VO.
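The SE and WN checks above can be sketched in a few lines of Python. This is only an illustration, not the tooling used by INFN-GRID: the function names, the lowercase VO directory names and the assumption that pool accounts are named as the VO name followed by digits (for example cdf001) are all hypothetical.

```python
import os
import re
import tempfile


def missing_vo_dirs(storage_root, vos):
    """Return the VOs that have no directory directly under storage_root."""
    return [vo for vo in vos
            if not os.path.isdir(os.path.join(storage_root, vo))]


def pool_account_counts(account_names, vos):
    """Count accounts named <vo><digits> for each VO.

    The <vo><digits> naming scheme is an assumption made for this sketch.
    """
    counts = {vo: 0 for vo in vos}
    for name in account_names:
        for vo in vos:
            if re.fullmatch(re.escape(vo) + r"\d+", name):
                counts[vo] += 1
    return counts


if __name__ == "__main__":
    # Build a throwaway "storage area" holding a directory for one VO only.
    root = tempfile.mkdtemp()
    os.mkdir(os.path.join(root, "infngrid"))
    print("VOs without a storage directory:",
          missing_vo_dirs(root, ["infngrid", "cdf"]))
    print("Pool accounts per VO:",
          pool_account_counts(["cdf001", "cdf002", "root"], ["infngrid", "cdf"]))
```

On a real node the account names would come from the system account database, and the permission and owner checks would be added on top of the directory test.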

11. Certification activity: TEST ZONE
- The Central Management Team (CMT) is responsible for certifying resource centers: it checks the functionality of a site before the site joins the production grid.
- Although all certification jobs are VO-independent, the INFNGRID VO is used to run them;
- In particular, the following are checked:
  - GIIS information consistency;
  - Local job submission (LRMS);
  - Grid submission with Globus (globus-job-run);
  - Grid submission with the Resource Broker;
  - ReplicaManager functionality;
  - MPI functionality.
- To certify a site, the CMT uses dedicated grid services:
  - RB & BDII: gridit-cert-rb.cnaf.infn.it
- This keeps uncertified sites out of the production grid services.
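The certification checklist on this slide can be modelled as an ordered set of named checks, where a site counts as certified only if every check passes. The sketch below is a hedged illustration: the check names and the certify function are hypothetical, and each check is a placeholder callable rather than a real call to globus-job-run or to the Resource Broker.

```python
from typing import Callable, Dict, List, Tuple

# Order follows the checklist on the slide; the identifiers are hypothetical.
CHECK_ORDER = [
    "giis_information",
    "local_submission",
    "globus_submission",
    "resource_broker_submission",
    "replica_manager",
    "mpi",
]


def certify(checks: Dict[str, Callable[[], bool]]) -> Tuple[bool, List[str]]:
    """Run every check in order and return (certified, failed check names).

    A check that is missing from `checks` is treated as failed.
    """
    failed = []
    for name in CHECK_ORDER:
        check = checks.get(name)
        if check is None or not check():
            failed.append(name)
    return (not failed, failed)


if __name__ == "__main__":
    # Placeholder checks: everything passes except the MPI test.
    checks = {name: (lambda: True) for name in CHECK_ORDER}
    checks["mpi"] = lambda: False
    print(certify(checks))
```

Running all checks instead of stopping at the first failure is a design choice of this sketch: it gives the site manager the complete list of problems in one pass.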

12. Periodic test
- The CMT and system managers can post notices about their resources on the web by entering a “Downtime advice”.
- The Calendar shows a snapshot of the Production Service status.
- We periodically submit certification jobs to the sites to find problems proactively, before users run into them.

13. Ticketing system
- The INFN-GRID ticketing system is used:
  - by users, to ask questions or report problems;
  - by system managers, to communicate about common grid tasks (e.g. upgrading to a new grid release);
  - by the CMT, to notify a system manager of a problem.
- Support Groups are “helper” groups that exist to resolve the obvious problems arising as the grid grows:
  - Support Grid Services Group (RB, RLS, VOMS, GridICE, etc.);
  - Support VO Services Group (one per VO);
  - Support VO Applications Group (one per VO);
  - Support Site Group (one per site).
- Operative Groups:
  - Operative Central Management Team (CMT);
  - Operative Release & Deployment Team.
- Ticket workflow:
  - Users -> create a ticket;
  - Supporters/Operatives -> open the ticket;
  - Users and/or Supporters/Operatives -> update an open ticket;
  - Supporters/Operatives -> close the ticket.
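The ticket workflow at the end of this slide reads like a small state machine with role-based permissions. The following sketch encodes it literally; the state names, the role names and the step function are assumptions introduced for illustration and do not describe the real ticketing software.

```python
# (current state, action) -> (roles allowed to act, next state).
# A state of None means the ticket does not exist yet.
TRANSITIONS = {
    (None, "create"): ({"user"}, "created"),
    ("created", "open"): ({"supporter", "operative"}, "open"),
    ("open", "update"): ({"user", "supporter", "operative"}, "open"),
    ("open", "close"): ({"supporter", "operative"}, "closed"),
}


def step(state, action, role):
    """Apply `action` performed by `role` and return the new ticket state."""
    try:
        allowed_roles, next_state = TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"action {action!r} is not valid in state {state!r}")
    if role not in allowed_roles:
        raise PermissionError(f"role {role!r} may not {action} this ticket")
    return next_state


if __name__ == "__main__":
    state = step(None, "create", "user")
    state = step(state, "open", "supporter")
    state = step(state, "update", "user")
    state = step(state, "close", "operative")
    print(state)
```

Keeping the rules in a single table makes it easy to see that a user can never close a ticket and that a closed ticket accepts no further actions.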

14. Why a “cert” queue?
- A CE can appear in many BDIIs with different purposes (EGEE, LCG, VO-specific);
- After a site upgrade, as soon as the queues were opened, many jobs arrived from everywhere at a site that was still uncertified (and insecure), making its full certification impossible.
- To avoid this, every site joining INFN-GRID has a cert queue (with both PBS and LSF):
  - High-priority queue;
  - Open only to the INFNGRID VO;
  - With a low maximum CPU time (10 minutes);
  - After a site installation or upgrade, only the cert queue is opened;
  - After the certification tests by the CMT, all other queues are opened.
- In addition, all periodic test jobs that the ROC submits to the cert queue always have a higher priority than other jobs.
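The cert queue rules can be expressed as a small admission policy. The sketch below is a simplified model and not a PBS or LSF configuration: the Queue class, the queue names and the accepts_job function are hypothetical, and a real batch system would enforce the CPU limit at run time rather than at submission.

```python
from dataclasses import dataclass

CERT_MAX_CPU_MINUTES = 10  # "low max cpu time" stated on the slide


@dataclass(frozen=True)
class Queue:
    name: str
    allowed_vos: frozenset
    is_cert: bool = False


def queue_is_open(queue, site_certified):
    """The cert queue is always open; other queues open only after certification."""
    return queue.is_cert or site_certified


def accepts_job(queue, vo, cpu_minutes, site_certified):
    """Return True if a job from `vo` needing `cpu_minutes` may enter `queue`."""
    if not queue_is_open(queue, site_certified):
        return False
    if vo not in queue.allowed_vos:
        return False
    if queue.is_cert and cpu_minutes > CERT_MAX_CPU_MINUTES:
        return False
    return True


if __name__ == "__main__":
    cert = Queue("cert", frozenset({"infngrid"}), is_cert=True)
    production = Queue("long", frozenset({"infngrid", "cdf"}))
    # Right after an upgrade: only certification jobs get through.
    print(accepts_job(cert, "infngrid", 5, site_certified=False))
    print(accepts_job(production, "cdf", 5, site_certified=False))
```

The model makes the intent of the slide explicit: until the CMT certifies the site, the only jobs that can run are short INFNGRID jobs on the cert queue.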

15. BDII - ROC setup
- All sites certified by the ROC team using the test zone are added to the INFN-GRID production BDII, which is accessible via web.
- Each ROC should create, manage and publish via web the BDII configuration of its region:
  - Similar to http://grid-it.cnaf.infn.it/fileadmin/bdii/gridit-bdii-update.conf
- The ROC is ‘authoritative’ for its BDII: it holds the master copy of the CEs and SEs of its region;
  - Operations related to the ROC's resource centers are reflected in the BDII content (scheduled downtime, planned upgrade, site certification failure).
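One way to picture how ROC operations are "reflected in the BDII content" is a filter that decides which sites are published. This sketch assumes that a site with a scheduled downtime, a planned upgrade or a certification failure is simply left out of the published list; the slide does not say exactly how the BDII content changes, and the flag names and function are hypothetical. It does not parse the real BDII configuration file.

```python
# Status flags that keep a site out of the published list (assumed behaviour).
EXCLUDING_FLAGS = frozenset({
    "scheduled_downtime",
    "planned_upgrade",
    "certification_failure",
})


def production_bdii_sites(site_flags):
    """Return the sorted names of sites that carry no excluding flag.

    site_flags maps a site name to an iterable of status flags.
    """
    return sorted(
        site for site, flags in site_flags.items()
        if not (set(flags) & EXCLUDING_FLAGS)
    )


if __name__ == "__main__":
    sites = {
        "site-a": [],
        "site-b": ["scheduled_downtime"],
        "site-c": ["certification_failure"],
    }
    print(production_bdii_sites(sites))
```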

16. GridAT - Grid Application Test
GridAT's main goal is to provide a general and flexible framework for VO application tests in a grid system. It allows a grid site to be tested from the VO viewpoint. Results are stored in a central database and can be browsed on a web page, so GridAT will also be used for certification and test activity.

17. Ongoing activities
- Support system: integration into EGEE and coverage of distributed support;
- Evolution of GridICE for job monitoring, application monitoring and SLA monitoring; configuring notifications is urgent;
- Integration of DGAS into INFN-GRID: administration of the accounting system;
- Porting of INFN-GRID to SL: new installation and configuration system;
- Operation support for the EGEE/LCG infrastructure, in ‘rotation’ among IT/CERN/UK/FR;
- Training: basic and advanced courses;
- Extension of the infrastructure to non-INFN sites: Spaci, Enea, etc.;
- Policy administration;
- Pre-production service to define the migration plan to gLite;
- Middleware certification testbed;
- Operational requirements for the middleware.

18. Open issues
- Interaction with the VOs and the users;
- Interaction with EGEE/LCG NA4, JRAx, etc.;
- Resource allocation policy.

19. Useful links
- INFN Production Grid
  - http://grid-it.cnaf.infn.it/
- INFN GridICE
  - http://grid-it.cnaf.infn.it/index.php?grisview&type=1
- INFN test and certification
  - http://grid-it.cnaf.infn.it/index.php?sitetest&type=1
- INFN Support
  - http://grid-it.cnaf.infn.it/index.php?id=51&type=1
- Contact
  - grid-manager@infn.it
  - Grid-release@infn.it
  - Ticket for operational issues

