Andrew McNab - Manchester HEP - 17 September 2002

Putting Existing Farms on the Testbed
Manchester DZero/Atlas and BaBar farms are available via the Testbed. Done with a handful of modifications to the Testbed site and to the existing farms. This talk describes what we did and how you can do it too...
Farms at Manchester HEP
–BaBar: 80 * 0.8 GHz
–GridFarm: 16 * 1.0 GHz
–DZero/Atlas: 60 * 1.5 GHz
The problem
We want to make existing farms available on the Testbed, but we don't want to massively reconfigure/reinstall the farms:
–they're in production so need to be kept stable
–they are already configured the way their owners need
We might want to keep reinstalling as EDG software is updated.
–this is labour intensive unless we install from scratch with the LCFG install
–we don't want to have to make many manual changes to the CE etc every time we install/upgrade
A solution that has been mentioned several times is to have a standard EDG Testbed Site as a front end to the Existing Farm. So we want to find the minimal set of changes to the Farm and the Testbed Site that will put the Farm on the Testbed.
Standard Testbed Site
[Diagram: LCFG server, CE, SE and Worker Nodes; /home on the CE, /flatfiles on the SE; PBS Server on the CE, PBS on each WN]
–All elements installed from the LCFG server
–The Computing Element shares /home directories by NFS
–The Storage Element shares /flatfiles with data by NFS
–The PBS Server on the CE talks to PBS on the Worker Nodes.
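The NFS sharing described above could be expressed with /etc/exports entries along these lines. This is a sketch only: the worker-node hostname pattern and the export options are illustrative assumptions, not the actual Manchester configuration.

```shell
# On the CE: export the home directories to the worker nodes
# (hostname pattern is hypothetical)
/home        wn*.hep.example.ac.uk(rw,no_root_squash)

# On the SE: export the data area to the worker nodes
/flatfiles   wn*.hep.example.ac.uk(rw)
```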
What we want
[Diagram: the Grid Farm / Testbed Site (LCFG, SE, CE with its PBS Server, WNs running PBS) alongside the BaBar or DZero/Atlas Farm with its own PBS Server and PBS Nodes; the CE submits jobs to the farm's PBS Server by remote qsub]
Reconfigure Existing Farm
The PBS Server must allow access from the CE, but only for the right users.
–Add the CE to the list of valid job submission clients (e.g. in hosts.equiv)
–Create a special queue (bfq or dfq) for Testbed jobs.
–Limit queues so the desired pool of accounts (e.g. atlas001 etc) can submit jobs to the bfq/dfq, but other queues/pools are forbidden.
The PBS Nodes need access to the pool accounts, the home directories on the CE, and the /flatfiles area on the SE.
–If already using the NFS automounter, it is easy to add /home on the CE and /flatfiles on the SE (e.g. as /nfs/gf-home and /nfs/gf-flatfiles)
–Add the pool accounts to /etc/passwd (or NIS)
–Make symbolic links in /home to the automounted CE /home directories.
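The PBS Server changes above might look something like the following. This is a hedged sketch: the CE hostname, the queue name dfq, and the ACL user list are illustrative assumptions, not the exact Manchester settings.

```shell
# On the farm's PBS server: accept job submission from the Testbed CE
# (hostname is a hypothetical example)
echo "gf18.hep.man.ac.uk" >> /etc/hosts.equiv

# Create the Testbed queue and restrict it to the pool accounts,
# using PBS's qmgr interface (user list is illustrative)
qmgr <<'EOF'
create queue dfq
set queue dfq queue_type = Execution
set queue dfq enabled = True
set queue dfq started = True
set queue dfq acl_user_enable = True
set queue dfq acl_users = atlas001,atlas002,atlas003
EOF
```

With acl_user_enable set, PBS rejects submissions to dfq from any account outside the listed pool, which is how "other queues/pools forbidden" is enforced on the queue itself.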
Software on PBS Nodes
For current EDG job submissions to work, the globus-url-copy RPMs need to be installed on the PBS Nodes.
–PBS Nodes currently need to make outgoing GridFTP connections to the Resource Broker.
–GridFTP is possible with NAT, but difficult.
Other middleware RPMs will be needed if also intending to manipulate the SE and RC during jobs.
For use with the EDG Testbed, the relevant application RPMs should also be installed.
Changes to Testbed Site
We have attempted to minimise changes:
–easier to document and support
–easier to maintain as the EDG software changes
Basic philosophy: modify the EDG scripts to make remote qsub and qstat calls to the PBS Server machines on the farms.
Only 3 scripts on the CE need editing:
–/opt/globus/libexec/globus-script-pbs-queue
–/opt/edg/info/mds/sbin/skel/ce-globus.skel
–/opt/edg/info/mds/bin/ce-pbs
Create a grid-mapfile and ce-static.ldif for each queue. Include the farm queue and PBS nodes in the LCFG site-cfg.h
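The essence of those script edits can be sketched as follows, using PBS's queue@server destination syntax to submit and query remotely. This is an illustration only: the variable names and the farm server hostname are assumptions, and the real scripts contain much more context around these lines.

```shell
# Hypothetical sketch of the change inside globus-script-pbs-queue:
# name the farm's queue and PBS server explicitly instead of
# submitting to the local server (dfq and the hostname are invented).
FARM_SERVER=d0pbs.hep.man.ac.uk     # assumed farm PBS server name

# local submission (original form):  qsub "$pbs_script"
# remote submission (modified form):
qsub -q dfq@"$FARM_SERVER" "$pbs_script"

# likewise ce-pbs and the jobmanager status path query the
# remote server rather than the local one:
qstat -f "$jobid@$FARM_SERVER"
```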
New behaviour
–The modified ce-pbs queries the PBS Server using remote qstat, and publishes an edited grid-mapfile listing only the right users.
–Jobs can be submitted using the Resource Broker, based on the published information.
–When a job is received by the CE, globus-script-pbs-queue submits it to the remote PBS Server.
–The EDG Globus jobmanager on the CE monitors the job status via remote qstat and transmits it to Logging as normal.
–The job runs on a PBS Node with access to the pool account's /home.
–The job completes and returns files to the RB via GridFTP.
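Publishing an "edited grid-mapfile listing only the right users" can be as simple as filtering on the mapped account pool. This is a hypothetical sketch: the DNs, file paths and the .atlas pool name are invented for illustration.

```shell
# Build a sample grid-mapfile (DNs are invented)
cat > /tmp/grid-mapfile <<'EOF'
"/C=UK/O=eScience/CN=alice" .atlas
"/C=UK/O=eScience/CN=bob" .babar
"/C=UK/O=eScience/CN=carol" .atlas
EOF

# Publish only the entries mapped to the .atlas pool for the dfq queue
grep '\.atlas$' /tmp/grid-mapfile > /tmp/grid-mapfile-dfq
cat /tmp/grid-mapfile-dfq
```

Only the two .atlas entries survive into the per-queue grid-mapfile, so the information published for dfq advertises exactly the users its account pool accepts.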
Example logs
Three jobmanagers are visible to the GridPP MDS and RB:
–gf18.hep.man.ac.uk:2119/jobmanager-pbs-gfq (Grid Farm/Testbed)
–gf18.hep.man.ac.uk:2119/jobmanager-pbs-dfq (DZero/Atlas farm)
–gf18.hep.man.ac.uk:2119/jobmanager-pbs-bfq (BaBar farm)
Each queue has a different operating system, grid-mapfile list of users, etc.
We can submit a job to the RB and have it matchmake the requirements
–including dynamic properties like free nodes
The example log shows submitting a job from a UI at RAL via the RB at IC, which decides which farm at Manchester matches and sends the job there.
Applying this to other sites
This recipe is being written up.
With the current EDG release, the PBS Nodes need outgoing direct internet access (not NAT).
You need to be able to make minor changes to PBS Server permissions, NFS mounts etc as described.
You should have some (3?) dedicated Testbed machines, or add this to an existing GridPP/EDG Testbed setup.
–We use Microdirect.co.uk boxes at 1.5 GHz/256 MB/40 GB for £250 per box.
If you don't use an EDG-supported batch system (PBS etc), you need to modify the ce-pbs and globus-script-pbs-* scripts to use your job submission commands.
Summary
It's not at all difficult to access existing PBS farms via an EDG Testbed site:
–include the CE + SE in the NFS and PBS configuration of the farm
–include the pool accounts in the farm's passwd file
–enforce security by account pools
Only a handful of files on the Testbed CE need to be modified.
It should be relatively straightforward to apply this to other batch queue systems even if you don't use PBS.
We've demonstrated putting our 150 * ~1 GHz nodes on the current Testbed and submitting jobs via the GridPP RB. You can too.