BaBar MC production

BaBar MC production
[Diagram: the BaBar MC production software at the VU (Amsterdam University) sends jobs to a lot of computers on the EDG testbed (NIKHEF) and collects the results]
The simple question: How can we run BaBar software on EDG grid sites?

Introduction of Parrot
[Diagram: the same setup, with Parrot and Chirp added between the BaBar MC production software at the VU (Amsterdam University) and the computers on the EDG testbed (NIKHEF)]
We need transparent access to the Objectivity database, which requires local file access.

Parrot functionality
The Parrot Virtual File System presents a POSIX interface to the application (implemented via a ptrace trap), keeps a local cache, and talks to several back ends:
- HTTP server / FTP server: traditional I/O services, whole-file I/O (get/put)
- RFIO server: integration with Castor
- NeST server: storage allocation and management
- Chirp server: partial file I/O (open, close, read, write, lseek) with full UNIX semantics and secure remote RPC; integration with Condor through the Condor proxy and Condor shadow
Not yet: x509; optimization.
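As a rough illustration of the dispatch idea only (the real Parrot intercepts system calls with ptrace and is written in C; none of the names below are Parrot's), a path-prefix router might look like this in Python:

    # Toy sketch of Parrot-style path dispatch (hypothetical, not Parrot's code or API):
    # paths such as /http/host/file are routed to a protocol driver, anything else
    # falls through to the local file system.
    import io
    import urllib.request

    def open_http(host, path):
        # Whole-file transfer: fetch the object once, expose it as a file-like buffer.
        data = urllib.request.urlopen("http://%s/%s" % (host, path)).read()
        return io.BytesIO(data)

    DRIVERS = {"http": open_http}   # the real Parrot also speaks ftp, rfio, nest, chirp

    def virtual_open(path):
        parts = path.lstrip("/").split("/", 2)
        if len(parts) == 3 and parts[0] in DRIVERS:
            scheme, host, rest = parts
            return DRIVERS[scheme](host, rest)
        return open(path, "rb")      # ordinary local file

    if __name__ == "__main__":
        with virtual_open("/http/example.org/") as f:
            print(f.read(80))

Parrot itself does this transparently for unmodified binaries, which is what lets the unchanged BaBar software see remote files as if they were local.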

The introduction of GCB
[Diagram: the BaBar MC production software at the VU (Amsterdam University) now submits jobs with Condor-G; a GCB relay bridges into the private network of the EDG testbed (NIKHEF); Parrot, Chirp and NFS provide the file access; jobs and results flow between some computers at the VU and a lot of computers at NIKHEF]

GCB functionality
[Diagram: GCB server, Central Manager, and machines A, B and P; the machine inside the private network sits behind NAT and keeps a persistent connection to the GCB server, which acts as a relay for connections to and from it]
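The essential trick is that the machine behind NAT opens a persistent outbound connection to the broker, so traffic from outside can be relayed back over a connection the NAT box already knows about. The following is a heavily simplified, hypothetical single-connection relay in Python, not GCB itself (ports and structure are invented for the example):

    # Minimal relay sketch illustrating the GCB idea (not GCB's implementation):
    # the node behind NAT connects OUT to this broker and keeps the connection open;
    # one external client then connects, and the broker shuttles bytes both ways.
    import socket
    import threading

    PRIVATE_PORT = 9001   # the private node registers here (outbound from its side)
    PUBLIC_PORT = 9002    # external parties connect here

    def pump(src, dst):
        # Copy bytes from one socket to the other until either side closes.
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)

    def main():
        reg = socket.socket()
        reg.bind(("", PRIVATE_PORT))
        reg.listen(1)
        pub = socket.socket()
        pub.bind(("", PUBLIC_PORT))
        pub.listen(1)
        inside, _ = reg.accept()     # persistent connection from the private network
        outside, _ = pub.accept()    # one external client, e.g. the Central Manager
        threading.Thread(target=pump, args=(inside, outside), daemon=True).start()
        pump(outside, inside)

    if __name__ == "__main__":
        main()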

The introduction of GlideIn
[Diagram: the EDG testbed (NIKHEF) is reached through a PBS job manager (72 hour jobs, can't wait for queues); the BaBar MC production software at the VU (Amsterdam University) uses Condor-G to submit a GlideIn as a batch job into the PBS queue; through the GCB relay and the private network, jobs and results then flow between some computers at the VU and a lot of computers at NIKHEF, with Parrot, Chirp and NFS providing file access]

GlideIn functionality
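In essence a GlideIn is a pilot job: an ordinary batch job submitted to the foreign batch system (here PBS, via Condor-G) that starts Condor daemons on the worker node, so the node temporarily joins our own Condor pool and our real jobs are scheduled onto it. The toy sketch below shows only the pull-based pattern; the task server URL and helper names are invented, and this is not how Condor implements glideins:

    # Toy pilot illustrating the GlideIn pattern (hypothetical, not Condor's glidein code).
    # The batch system sees one generic job; once it starts on a worker node it keeps
    # pulling real work items from our own server until none are left, then exits.
    import subprocess
    import urllib.request

    TASK_SERVER = "http://head-node.example:8080"   # hypothetical task service

    def fetch_task():
        try:
            with urllib.request.urlopen(TASK_SERVER + "/next-task", timeout=30) as r:
                cmd = r.read().decode().strip()
                return cmd or None
        except OSError:
            return None                # server unreachable or queue empty

    def main():
        while True:
            cmd = fetch_task()
            if cmd is None:
                break                  # release the batch slot
            subprocess.run(cmd, shell=True, check=False)

    if __name__ == "__main__":
        main()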

Overview of complete setup
[Diagram: the complete chain: the BaBar MC production software at the VU (Amsterdam University) with NFS; Condor-G submits a GlideIn as a batch job into the queue of the PBS job manager at the EDG testbed (NIKHEF) (72 hour jobs, can't wait for queues); the GCB relay bridges the private network; jobs and results flow between some computers at the VU and a lot of computers at NIKHEF, with Parrot and Chirp providing file access]

Leave only the components
[Diagram: only the components remain: BaBar MC production software, NFS, queue, PBS job manager, GlideIn, GCB, Parrot, Chirp and the private network, with some computers at the VU (Amsterdam University) and a lot of computers on the EDG testbed (NIKHEF)]

The interesting dependencies
[Diagram: the same components, with the problem areas marked]
- Different MDS scheme
- Objectivity database: LOCK server sockets, NFS problems, UID / hostname checks
- NAT box: dropping UDP packages, 2 minute timeout on inactive sockets and inactive file I/O

Consequences
- Different MDS scheme: implemented the EDG scheme for GlideIn
- Objectivity: a lot of debugging; made Parrot mimic hostname and uid; tricked Objectivity into using standard NFS libraries
- Aggressive NAT box: changed GCB to use TCP instead of UDP; used Parrot to keep sockets alive; Parrot recovers file I/O when the TCP connection is lost
We are the first to run Objectivity cross-domain.
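For the NAT problem, the fix amounts to making sure connections never look idle to the NAT box, whose state times out after about two minutes. The sketch below shows the generic technique using standard TCP keepalive options on Linux; the interval values are illustrative, and this is not the actual Parrot/GCB code:

    # Keep an otherwise idle TCP connection alive through an aggressive NAT box by
    # having the kernel send keepalive probes well inside its ~2 minute idle timeout.
    # The TCP_KEEP* options below are Linux-specific; the values are illustrative only.
    import socket

    def keepalive_connection(host, port):
        s = socket.create_connection((host, port))
        s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)    # idle time before probing
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 30)   # interval between probes
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)      # probes before giving up
        return s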

Performance
[Plot: events produced versus time (minutes), for production on a local machine and production on the EDG testbed]
The application initializes 10 times slower and production runs 3 times slower on the EDG testbed.

Possible improvements
[Diagram: the same components as before]
- Parrot: caching on a per-directory basis (requires debugging); see the sketch below
- Create a more sophisticated tool to acquire resources: resource planning, distribution, etc. Maybe something fancy already exists?
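A per-directory cache could look roughly like the sketch below: only files under explicitly listed directories are copied into a local cache tree and reused on later accesses. Everything here (names, paths, the fetch helper) is hypothetical, not Parrot's cache:

    # Hypothetical sketch of caching enabled on a per-directory basis: only files under
    # the listed directories are copied into a local cache tree and served from there
    # on later accesses. fetch_remote() is a stand-in for the real protocol (chirp/ftp/...).
    import os
    import shutil

    CACHE_ROOT = "/tmp/parrot-cache"                  # illustrative location
    CACHED_DIRS = ("/chirp/head-node/conditions/",)   # hypothetical directory list

    def fetch_remote(remote_path, local_path):
        shutil.copyfile(remote_path, local_path)      # placeholder for the real transfer

    def cached_open(remote_path):
        if not remote_path.startswith(CACHED_DIRS):
            return open(remote_path, "rb")            # uncached: go remote every time
        local_path = os.path.join(CACHE_ROOT, remote_path.lstrip("/"))
        if not os.path.exists(local_path):
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            fetch_remote(remote_path, local_path)
        return open(local_path, "rb")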

Move chirp servers to private nodes
[Diagram: the same components as before]
- Use the Condor/GCB machinery for the chirp server
- Solves security issues
- Allows the chirp server to be on private nodes
- Requires a new chirp-condor implementation

Move GCB to the head node
[Diagram: the same components as before]
- Move GCB to the same machine as the Central Manager
- A solution is required for port conflicts
- Temporary solution: move the Central Manager to a private node

Use EDG data storage
[Diagram: the same components, plus EDG data storage]
- Write events to EDG data storage (gsiFTP)
- Requires debugging

Use more sites
[Diagram: the same components, plus another testbed with its own private network and a lot of computers, and EDG data storage]
- Let GCB manage several private networks at the same time
- Requires a solution for conflicting private addresses

Conclusions
It works
- BaBar MC production runs successfully on the NIKHEF EDG testbed
- All this experimental software actually works when used together
It looks easy
- Our GRID setup is complicated, but...
- Parrot hides the problems related to local file access
- GCB hides the problems related to network configuration
- GlideIn hides the complications of resource gathering
- The user can just submit his/her jobs to a local batch system
There is some work to do
- Performance could be better: initialization is 10 times slower and production 3 times slower
- Caching and (semi-)local event storage should improve this
- Usability could be improved: GlideIn needs a tool to acquire resources, and several improvements have been proposed for GCB/Parrot
The improvements are done at the level of the "grid" tools, so the user benefits without rewriting code.