DataGRID Testbed Enlargement, EDG Retreat, Chavannes, August 2002, Fabio HERNANDEZ


1 DataGRID Testbed Enlargement
EDG Retreat, Chavannes, August 2002
Fabio HERNANDEZ, fabio@in2p3.fr

2 Outline
• Problem statement
• Current process
• Acceptance process overview
• What should a testbed site provide?

3 Problem Statement
• Goal: provide a grid infrastructure with enough computing resources to allow applications to work in a suitable testing environment
• Constraints:
  - In order to make useful tests, applications need some quality of "service"
  - 37 sites expressed interest in joining the testbed, although not all of them will join at the same time; 5 core sites are already in
  - We want to provide a grid environment with as many resources as possible, so we would like all the sites willing to provide resources to do so

4 Current Status
• No central management for deciding which site can join, and when
• Joining the testbed means being registered in the appropriate information index (II) of each available resource broker (RB)
  - You only need to officially join if your site is a service provider
  - But you can test without being registered, and this is good (more on this later)
• The RB and II are the central points of the testbed
  - People operating the resource broker are the enablers

5 Current Status (cont.)
• Currently, when you as a site administrator say you are ready, your site is registered in the II
  - This relies on a trust relationship between the RB operators and the site administrators
  - May not scale to ~40 sites
• We are building a grid composed of several (not centrally managed) sites and a central point of failure
  - A failure in a single site can severely disturb the operation of the whole grid
• How can we minimize the chances of failure of this central point?
• We need some kind of light certification/acceptance process before a site can join the applications testbed
  - The aim is to help new sites join in a controlled way

6 Acceptance Process Overview
• This process is intended for sites providing computing or storage services, i.e. having CEs or SEs
  - UI-only sites are probably not an issue
• In a nutshell, the idea is to let the candidate site test its installation before user jobs can be scheduled to this site
  - Each EDG component can be installed and tested separately
• Process overview (mainly for CE/WN and SE)
  - Step 1: install/configure the component and perform local tests
    Verify security infrastructure, Globus layer, file transfer, firewalls, etc.
  - Step 2: for CE, run the acceptance test suite using the RB of the applications testbed
    Force the scheduling of the test jobs to the candidate site by using option --resource of command dg-job-submit
    Contact the ITeam to perform this step
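Step 2 above can be sketched as a small helper that assembles the submission command for a test job pinned to the candidate CE. This is only an illustration: the JDL file name and the CE identifier are hypothetical placeholders, and the exact argument order accepted by dg-job-submit should be checked against the installed release. The helper only builds the command line; it does not run it.

```python
import shlex


def build_submit_command(jdl_path, ce_id):
    """Return the argv list for a test job forced onto one candidate CE.

    Only the command name and the --resource option come from the slides;
    the JDL path and the CE identifier are supplied by the caller.
    """
    if not ce_id:
        raise ValueError("a candidate CE identifier is required")
    return ["dg-job-submit", "--resource", ce_id, jdl_path]


# Hypothetical values, for illustration only.
cmd = build_submit_command("acceptance-test.jdl", "ce.candidate.example.org")
print(shlex.join(cmd))
```

Keeping the command as a list until the last moment avoids quoting problems if the JDL path contains spaces.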

7 Acceptance Process Overview (cont.)
• Process overview (cont.)
  - Step 3: add the RunTimeEnvironment flag EDG-TEST to the candidate site information system and request registration in the applications testbed II
    From this step on, the candidate site can be used by the RB, but it is tagged as "Test"
  - Step 4: the ITeam/Test group runs the acceptance test suite; iterate until it passes
  - Step 5: when certification is over, change the flag from EDG-TEST to EDG-CERTIFIED
• Two possibilities for the RB not to schedule user jobs to this new site during the certification process:
  - Every testbed user systematically specifies the requirement EDG-CERTIFIED, or
  - The RB is configured so that the matchmaking includes only CEs having the tag EDG-CERTIFIED
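The effect of the two tags on matchmaking can be illustrated with a toy model. This is a minimal sketch, not the resource broker's actual matchmaking: the `ComputingElement` class, the `eligible_ces` and `certify` functions, and the site names are all hypothetical. The `certified_only` switch stands for the second possibility (RB-side configuration); passing the tag in `job_requirements` stands for the first (each user asks for it).

```python
from dataclasses import dataclass, field

TEST = "EDG-TEST"
CERTIFIED = "EDG-CERTIFIED"


@dataclass
class ComputingElement:
    name: str
    runtime_env: set = field(default_factory=set)


def eligible_ces(ces, job_requirements=(), certified_only=False):
    """Names of CEs whose published tags cover all required tags."""
    required = set(job_requirements)
    if certified_only:
        # RB-side policy: every job implicitly requires the certified tag.
        required.add(CERTIFIED)
    return [ce.name for ce in ces if required <= ce.runtime_env]


def certify(ce):
    """Step 5: replace the test tag with the certified tag."""
    ce.runtime_env.discard(TEST)
    ce.runtime_env.add(CERTIFIED)


core = ComputingElement("core-site", {CERTIFIED})
candidate = ComputingElement("candidate-site", {TEST})
ces = [core, candidate]

user_view = eligible_ces(ces, job_requirements=[CERTIFIED])
test_view = eligible_ces(ces, job_requirements=[TEST])
rb_view = eligible_ces(ces, certified_only=True)

certify(candidate)
after_certification = eligible_ces(ces, certified_only=True)
```

In this model an untagged user job would still match the candidate site, which is why the first possibility depends on every user specifying the requirement.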

8 Acceptance Process Overview (cont.)
• Testing of an SE is a little more complicated
  - File replication tests need the intervention of (at least) two SEs: unless you install two SEs on your site (which is possible), you need to be registered on another site's SE
• Although some automatic procedures can be used to verify the installation/configuration, this should not replace a good understanding by the site administrator of the interactions between all the components
• Even if your site is registered in the II, you as a site administrator have several means to make sure that user jobs are not scheduled for execution at your site
  - Grid-mapfile, RunTimeEnvironment definition, …

9 What should a testbed site provide?
• Close coordination between the RB operators and the site administrators is needed
• RB operators must be able to remove a site exhibiting abnormal behavior (hosts unreachable, jobs crashing, …)
  - This may not be easy if the RB uses the EDG MDS hierarchy
  - A manual configuration is probably best suited for this purpose, but we need people responsible for this (and probably other) operation tasks
• A site administrator must be able to remove his own site for maintenance purposes
  - Be careful with jobs already scheduled for execution at your site
• Each site provides a clearly identified person/team responsible for installation, configuration, operation and user support
  - Contact information: individual and collective e-mail addresses are already in place for core sites
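The caution about already-scheduled jobs can be made concrete with a small sketch. It assumes a manually maintained set of active sites and a mapping from job identifiers to the site each job was scheduled to; both structures, the function names and the sample data are hypothetical and do not describe any EDG tool. Withdrawing a site stops new scheduling to it and reports the jobs that should be allowed to finish before maintenance starts.

```python
def pending_jobs(site, scheduled_jobs):
    """Sorted identifiers of jobs already scheduled to the given site."""
    return sorted(job for job, s in scheduled_jobs.items() if s == site)


def withdraw_site(active_sites, site, scheduled_jobs):
    """Remove a site from the active set; return jobs still to wait for."""
    active_sites.discard(site)  # no effect if the site is already absent
    return pending_jobs(site, scheduled_jobs)


# Hypothetical data, for illustration only.
active = {"site-a", "site-b"}
jobs = {"job-1": "site-a", "job-2": "site-b", "job-3": "site-a"}

still_running = withdraw_site(active, "site-a", jobs)
```

The same check applies whether the removal is requested by the site administrator or by the RB operators.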

10 Questions/Comments

