BaBar MC production
[Diagram: BaBar MC production software at VU (Amsterdam University); a lot of computers on the EDG testbed (NIKHEF); jobs go out and results come back.]
The simple question: How can we run BaBar software on EDG grid sites?
Introduction of Parrot
[Diagram: BaBar MC production software at VU (Amsterdam University); a lot of computers on the EDG testbed (NIKHEF); Parrot and Chirp added to the picture; jobs go out and results come back.]
We need transparent access to the Objectivity database (requires local file access)
Parrot functionality
[Diagram: the Parrot Virtual File System]
Application: BaBar MC production (Ptrace trap)
Parrot Virtual File System (POSIX interface), with a local cache
Drivers: HTTP, FTP, RFIO, NeST, Chirp
I/O styles
– Whole file I/O (get/put)
– Partial file I/O (open, close, read, write, lseek)
Remote services: HTTP server, FTP server, RFIO server, NeST server, Chirp server, Condor proxy and Condor shadow (secure remote RPC)
Annotations: traditional I/O services; integration with Castor; allocation and mgmt; full UNIX semantics; integration with Condor
Notes: not yet x509; optimize
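To make the idea of one POSIX-style namespace over several protocols concrete, here is a minimal sketch of how a path could be routed to a driver and an I/O style. It is not Parrot's code. The `/<protocol>/<host>/<path>` layout, the names `Route` and `route`, and the grouping of HTTP and FTP as whole-file and RFIO, NeST and Chirp as partial-file are assumptions read off the diagram labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    protocol: str  # driver that would handle the request, or "local"
    host: str      # remote server named in the path ("" for local files)
    path: str      # path as seen by that server
    io_mode: str   # "whole-file" (get/put) or "partial-file" (open/read/lseek...)


# Assumed grouping, taken from the order of the diagram labels.
WHOLE_FILE = {"http", "ftp"}
PARTIAL_FILE = {"rfio", "nest", "chirp"}


def route(path: str) -> Route:
    """Split '/<protocol>/<host>/<rest>' and choose an I/O style.

    Anything that does not match a known protocol prefix is treated as an
    ordinary local path.
    """
    parts = path.split("/", 3)
    if len(parts) == 4 and parts[0] == "" and parts[2]:
        protocol, host, rest = parts[1], parts[2], "/" + parts[3]
        if protocol in WHOLE_FILE:
            return Route(protocol, host, rest, "whole-file")
        if protocol in PARTIAL_FILE:
            return Route(protocol, host, rest, "partial-file")
    return Route("local", "", path, "partial-file")
```

With this sketch, `route("/chirp/server.example.org/db/boot")` selects the Chirp driver with partial-file I/O, while `route("/tmp/scratch")` stays local. The host name is a placeholder.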
The introduction of GCB
[Diagram: VU (Amsterdam University) with some computers; EDG testbed (NIKHEF) with a lot of computers on a private network; components: BaBar MC production software, NFS, Parrot, Chirp, Condor-G, GCB relay; jobs go out and results come back.]
GCB functionality
[Diagram: GCB server, Central Manager and hosts labelled A, B and P; private network behind NAT; persistent connection to the GCB server; relay.]
The introduction of GlideIn
[Diagram: VU (Amsterdam University) with some computers on a private network; EDG testbed (NIKHEF) with a lot of computers on a private network; components: BaBar MC production software, NFS, Parrot, Chirp, Condor-G job, GlideIn, GCB relay, PBS job manager, queue, batch job; jobs go out and results come back.]
– 72 hour jobs
– Can’t wait for queues
GlideIn functionality
Overview of complete setup
[Diagram: VU (Amsterdam University) with some computers on a private network; EDG testbed (NIKHEF) with a lot of computers on a private network; components: BaBar MC production software, NFS, Parrot, Chirp, Condor-G job, GlideIn, GCB relay, PBS job manager, queue, batch job; jobs go out and results come back.]
– 72 hour jobs
– Can’t wait for queues
Leave only the components
[Diagram: VU (Amsterdam University) with some computers; EDG testbed (NIKHEF) with a lot of computers on a private network; components: BaBar MC production software, NFS, Parrot, Chirp, GCB, GlideIn, PBS job manager, queue.]
The interesting dependencies
[Diagram: VU (Amsterdam University) with some computers; EDG testbed (NIKHEF) with a lot of computers on a private network; components: BaBar MC production software, NFS, Parrot, Chirp, GCB, GlideIn, PBS job manager, queue.]
Different MDS scheme
Objectivity database
– LOCK server sockets
– NFS problems
– UID / hostname checks
NAT box
– Dropping UDP packets
– Timeout of 2 minutes on inactive sockets and inactive file I/O
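A 2-minute timeout on inactive sockets means any connection that stays silent for longer is at risk of being dropped. One generic way to stay under such a limit is TCP keepalive, sketched below. This is an illustration under stated assumptions, not the mechanism used in this setup: the helper names `keepalive_interval` and `enable_keepalive` and the safety factor of 4 are made up for the example, and the `TCP_KEEP*` options are platform-dependent, so they are guarded.

```python
import socket

NAT_IDLE_TIMEOUT_S = 120  # "Timeout 2 minutes" from the slide


def keepalive_interval(idle_timeout_s: int, safety_factor: int = 4) -> int:
    """Probe several times per timeout window so a single lost probe is harmless."""
    return max(1, idle_timeout_s // safety_factor)


def enable_keepalive(sock: socket.socket,
                     idle_timeout_s: int = NAT_IDLE_TIMEOUT_S) -> int:
    """Turn on TCP keepalive for sock and return the probe interval in seconds."""
    interval = keepalive_interval(idle_timeout_s)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning options are not available on every platform,
    # so each one is only set if the socket module exposes it.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, interval)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)
    return interval
```

An application-level heartbeat would serve the same purpose where socket options cannot be changed, for example when the sockets belong to a closed-source library.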
Consequences
Different MDS scheme
– Implemented EDG scheme for GlideIn
Objectivity
– A lot of debugging
– Made Parrot mimic hostname and uid
– Tricked Objectivity into using standard NFS libraries
Aggressive NAT box
– Changed GCB to use TCP instead of UDP
– Used Parrot to keep sockets alive
– Parrot recovers file I/O when the TCP connection is lost
We are the first to run Objectivity cross-domain
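The idea of recovering file I/O after a lost TCP connection can be sketched as a reader that remembers its own offset, reopens the remote file on failure, and resumes where it left off. This is only an illustration of the pattern, not Parrot's implementation. The class name `RecoveringReader`, the `opener` callback, and the retry limit are hypothetical.

```python
from typing import BinaryIO, Callable


class RecoveringReader:
    """Read from a remote file, reopening it if the connection drops."""

    def __init__(self, opener: Callable[[], BinaryIO], max_retries: int = 3):
        self._opener = opener          # hypothetical: returns a fresh file-like object
        self._max_retries = max_retries
        self._offset = 0               # position tracked on our side, not the server's
        self._file = opener()

    def read(self, size: int) -> bytes:
        for attempt in range(self._max_retries + 1):
            try:
                # Always seek first, so a freshly reopened file resumes
                # at the last position that was read successfully.
                self._file.seek(self._offset)
                data = self._file.read(size)
                self._offset += len(data)
                return data
            except ConnectionError:
                if attempt == self._max_retries:
                    raise
                self._file = self._opener()
        raise AssertionError("unreachable")
```

Because the offset is kept by the reader and only advanced after a successful read, the application never sees the reconnect. It simply gets the next bytes, which is what a program unaware of the network needs.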
Performance
[Plot: events and time (minutes); curves for production on local machine and production on EDG testbed.]
– Application initializes 10 times slower
– Production 3 times slower
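How the two slowdown factors combine depends on how much of a job is spent initializing. The sketch below works through that arithmetic. Only the factors 10 and 3 come from the slide. The function names and the local timings used in the example are hypothetical and are not measurements.

```python
INIT_SLOWDOWN = 10.0        # "Application initializes 10 times slower"
PRODUCTION_SLOWDOWN = 3.0   # "Production 3 times slower"


def grid_runtime(local_init_min: float, local_production_min: float) -> float:
    """Estimated runtime on the grid, in minutes, from local timings."""
    return (INIT_SLOWDOWN * local_init_min
            + PRODUCTION_SLOWDOWN * local_production_min)


def overall_slowdown(local_init_min: float, local_production_min: float) -> float:
    """Ratio of estimated grid runtime to local runtime for one job."""
    local_total = local_init_min + local_production_min
    if local_total <= 0:
        raise ValueError("local runtime must be positive")
    return grid_runtime(local_init_min, local_production_min) / local_total
```

For a hypothetical job with 10 minutes of initialization and 90 minutes of production locally, the estimate is (10 × 10 + 3 × 90) / 100 = 3.7. The longer the production phase is relative to initialization, the closer the overall factor gets to 3.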
Possible improvements
[Diagram: VU (Amsterdam University) with some computers; EDG testbed (NIKHEF) with a lot of computers on a private network; components: BaBar MC production software, NFS, Parrot, Chirp, GCB, GlideIn, PBS job manager, queue.]
Parrot: caching
– On a per-directory basis
– Requires debugging
Create a more sophisticated tool to acquire resources
– Resource planning, distribution, etc.
– Maybe something fancy already exists?
Move chirp servers to private nodes
[Diagram: VU (Amsterdam University) with some computers; EDG testbed (NIKHEF) with a lot of computers on a private network; components: BaBar MC production software, NFS, Parrot, Chirp, GCB, GlideIn, PBS job manager, queue.]
Use Condor/GCB machinery for the chirp server
– Solves security issues
– Allows the chirp server to be on private nodes
– Requires a new chirp-condor implementation
Move GCB to head node
[Diagram: VU (Amsterdam University) with some computers; EDG testbed (NIKHEF) with a lot of computers on a private network; components: BaBar MC production software, NFS, Parrot, Chirp, GCB, GlideIn, PBS job manager, queue.]
Move GCB to the same machine as the Central Manager
– Solution required for port conflicts
– Temporary solution: move the CM to a private node
Use EDG data storage
[Diagram: VU (Amsterdam University) with some computers; EDG testbed (NIKHEF) with a lot of computers on a private network; components: BaBar MC production software, NFS, Parrot, Chirp, GCB, GlideIn, PBS job manager, queue, EDG data storage.]
Write events to EDG data storage (gsiFTP)
– Requires debugging
Use more sites
[Diagram: VU (Amsterdam University) with some computers; EDG testbed (NIKHEF) with a lot of computers on a private network; another testbed with a lot of computers on its own private network; components: BaBar MC production software, NFS, Parrot, Chirp, GCB, GlideIn, PBS job manager, queue, EDG data storage.]
Let GCB manage several private networks at the same time
– Requires a solution for conflicting private addresses
Conclusions
It works
– BaBar MC production runs successfully on the NIKHEF EDG testbed
– All this experimental software actually works when used together
It looks easy
– Our GRID setup is complicated, but...
– Parrot hides problems related to local file access
– GCB hides problems related to network configurations
– GlideIn hides complications with resource gathering
– The user can just submit his/her jobs to a local batch system
There is some work to do
– Performance could be better
    – Initialization 10 times slower
    – Production 3 times slower
– Caching and (semi-)local event storage should improve this
– Usability could be improved
    – GlideIn should have a tool to acquire resources
    – Several improvements proposed for GCB/Parrot
The improvements are done at the level of the “grid” tools
– The user benefits without rewriting code