1 VO Box discussion, ATLAS. NIKHEF, 24-25 January 2006. Miguel Branco (miguel.branco@cern.ch)

2 Outline
– Usage of and motivation for the VO Box
  – Tier0 (Service Challenge) use case
– An improvement
– Conclusion

3 Usage of VO Box
The ATLAS data management system (ATLAS Data Management = DQ) requires a set of VO services to be deployed per storage:
– Currently hosted on each site's VO Box
VO services implemented by DQ:
– Interact with Grid middleware: FTS / srmcp, LRC, SRM
– Allow insertion, monitoring and cancellation of data transfer requests to each storage
– Perform per-file and per-dataset validation
– Interact with central ATLAS-specific dataset catalogues
– Implement the ATLAS data flow as defined by the computing model

4 Usage of VO Box
A typical VO Box installation contains a set of services associated with one (or more) storages.
DQ triggers transfers from the destination side:
– Either VO services at the site poll the central catalogues for new or updated dataset transfer requests,
– ... or ...
– transfers (per file or dataset) are requested directly to the VO service at the site (a minimal sketch of this second mode follows below)
VO services are not replacements for existing Grid middleware; they complement its functionality.
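A minimal sketch of the second mode, where a client directly asks the VO service at the destination site to move a dataset. The URL, endpoint path and form fields are illustrative assumptions, not the actual DQ request-handling interface; in the real deployment a secure request would be a GSI-authenticated POST to the mod_gridsite port (8443) rather than a plain HTTP call.

# Hypothetical client-side request to a site's VO service (assumed endpoint).
import urllib.parse
import urllib.request

VOBOX_URL = "http://vobox.example-site.org/dq/requests"  # placeholder, not a real DQ URL

def request_dataset_transfer(dataset_name, source_site):
    """Insert a dataset transfer request at the destination VO Box (illustrative)."""
    form = urllib.parse.urlencode({
        "dataset": dataset_name,
        "source": source_site,
    }).encode()
    # data= makes this an HTTP POST; the real service would also check the caller's grid credentials.
    with urllib.request.urlopen(VOBOX_URL, data=form) as reply:
        return reply.read().decode()

if __name__ == "__main__":
    print(request_dataset_transfer("example.dataset.name", "CERN"))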

5 Usage of VO Box
DQ components:
– "Request Handling": Apache-hosted services where end-users/production managers can directly insert file/dataset transfer requests
– "Fetcher": a task polling the central ATLAS dataset subscription catalogue to check whether any new or updated dataset transfer request has been scheduled for the storage
– "Agents": implement a state machine with a backend database containing the queue of file transfer requests (a sketch of the Fetcher loop follows below)
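A minimal sketch of the Fetcher idea, under assumptions: the catalogue client, the local queue and their method names are hypothetical stand-ins, not the real DQ code or the actual subscription-catalogue API.

# Illustrative Fetcher: poll the central subscription catalogue, enqueue new work locally.
import time

class InMemoryQueue:
    """Stand-in for the MySQL-backed request queue on the VO Box."""
    def __init__(self):
        self.requests = {}
    def contains(self, dataset):
        return dataset in self.requests
    def add(self, dataset, source, state="PENDING"):
        self.requests[dataset] = {"source": source, "state": state}

def fetch_new_subscriptions(catalogue, queue, storage_name):
    """Poll the central catalogue and enqueue any new dataset subscription for this storage."""
    for sub in catalogue.list_subscriptions(site=storage_name):  # hypothetical catalogue API
        if not queue.contains(sub["dataset"]):
            queue.add(sub["dataset"], sub["source"])

def run_fetcher(catalogue, queue, storage_name, poll_interval=60):
    """Periodic polling loop; on the VO Box this kind of task runs from cron."""
    while True:
        fetch_new_subscriptions(catalogue, queue, storage_name)
        time.sleep(poll_interval)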

6 Usage of VO Box
ATLAS deploys onto the VO Box:
– A service container (Apache + mod_python)
– The security infrastructure (mod_gridsite + MyProxy client)
– A persistent database (MySQL)
– A set of 'agents' acting on the requests
  – Currently run as cron jobs
  – Using the vobox-* utilities for handling security: the proxy cache (see the sketch below)
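A hedged sketch of how an agent started from cron might pick up the proxy maintained by the vobox-* utilities before calling grid clients. The proxy-cache path is an assumption for illustration; only the X509_USER_PROXY convention is standard for grid clients.

# Illustrative cron-driven agent: point grid clients at the cached VO proxy.
import os

ASSUMED_PROXY_CACHE = "/opt/vobox/atlas/proxy_repository/current.proxy"  # assumed location

def grid_environment():
    """Return an environment where grid clients (FTS, LFC, SRM) pick up the cached proxy."""
    env = dict(os.environ)
    env["X509_USER_PROXY"] = ASSUMED_PROXY_CACHE
    return env

def run_agent_once():
    env = grid_environment()
    # Placeholder: the real agent would read pending requests from the local
    # MySQL queue and invoke FTS / SRM clients using this environment.
    print("Agent would run with X509_USER_PROXY =", env["X509_USER_PROXY"])

if __name__ == "__main__":
    run_agent_once()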

7 Usage of VO Box
[Architecture diagram: the VO Box hosts the Apache front-end services (Request Handling), the Fetcher and the Transfer Agents around a MySQL database; the Transfer Agents talk to the local site services (FTS, LRC, SRM), while the Fetcher and the DQ client tools connect to the ATLAS central dataset catalogues or other VO services.]

8 Motivation for VO Box
FTS: a huge leap, but difficult to use in isolation.
Use case: ATLAS wants to move data from site A to site B. "Insert an FTS request"?
– What about intermediary sites (hops)?
– What prevents multiple similar (or identical) transfer requests from being inserted multiple times?
– What prevents a similar (or identical) set of files from being requested to 'stage to disk' many times over? (A sketch of request de-duplication follows below.)
Big lesson from DC-2/Rome: putting grid 'business logic' into job scripts at the worker node is inefficient:
– very difficult to control (e.g. stop), difficult to monitor, difficult to "bulk" requests to the Grid m/w (FTS, LFC), ...
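An illustrative sketch of one way a VO service can prevent duplicate transfer or stage requests: a queue table with a uniqueness constraint. sqlite3 is used here only to keep the example self-contained; the slides describe a MySQL backend, and the schema below is an assumption.

# De-duplicating request queue (illustrative schema).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transfer_queue (
        lfn         TEXT NOT NULL,   -- logical file name
        destination TEXT NOT NULL,   -- destination storage
        state       TEXT NOT NULL DEFAULT 'PENDING',
        UNIQUE (lfn, destination)    -- the same file is queued at most once per destination
    )
""")

def enqueue(lfn, destination):
    """Insert a request unless an identical one is already queued."""
    try:
        conn.execute("INSERT INTO transfer_queue (lfn, destination) VALUES (?, ?)",
                     (lfn, destination))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate request, ignored

print(enqueue("lfn:/atlas/example/file1.root", "NIKHEF"))  # True
print(enqueue("lfn:/atlas/example/file1.root", "NIKHEF"))  # False (duplicate)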

9 Motivation for VO Box
ATLAS has an LRC per site/storage:
– A remote connection to the LFC takes ~1.4 s
– A local connection to the LFC takes ~0.4 s
– The LFC did not cope well with too many parallel sessions
All of the above is an issue; it was a bottleneck during SC3.
Non-LCG sites: FTS vs. srmcp
– Missing FTS/FPS functionality
The VO services at each site are the ones contacting the site middleware services:
– They allow bulk requests: a single stage request for many files, as opposed to e.g. having each job do its own SRM request (see the batching sketch below)
Currently:
– WNs do not contact the VO Box directly, but this may be reconsidered; jobs would use the normal Apache front-end
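A minimal sketch of the bulk-request idea: instead of each job issuing its own SRM stage call, the VO service collects pending SURLs and issues one request per batch. submit_bulk_stage is a hypothetical stand-in for the real SRM/FTS client call, which is not shown in the slides.

# Batch pending file requests into bulk stage calls (illustrative).
from itertools import islice

def submit_bulk_stage(surls):
    # Placeholder: the real implementation would call the SRM prepare-to-get
    # interface (or FTS) once for the whole list.
    print("staging %d files in one request" % len(surls))

def stage_in_batches(pending_surls, batch_size=100):
    """Group pending file requests into bulk stage calls."""
    it = iter(pending_surls)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        submit_bulk_stage(batch)

if __name__ == "__main__":
    surls = ["srm://se.example.org/atlas/file%04d" % i for i in range(250)]
    stage_in_batches(surls)  # 3 bulk requests instead of 250 per-file calls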

10 Motivation for VO Box
VO services have helped improve:
– Quota management
– Space cleanup
– LRC / SE integrity
– (Real-time) monitoring
– Error handling
– Dynamic data management
by building upon baseline services such as SRM and existing FTS functionality.

11 Data handling scenario for the Tier0 exercise
(Scenario covering transfers up to the Tier1s.)
Files are processed at CERN and attached to ATLAS datasets:
– 10 Tier1s: 85k files/day transferred to the Tier1s
Datasets are subscribed at sites; subscriptions may be cancelled and changed to other sites (dynamically, e.g. due to site errors).
VO services poll for new dataset subscriptions and insert FTS requests to transfer the data:
– Each VO service then handles its requests until completion, allowing real-time monitoring and cancellation points
– Not running centrally! The Tier0 alone generates 85k requests/day (see the rough rate estimate below)
– Compared to the DC2/Rome model (a single central DB) this is a more scalable and manageable approach
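A back-of-the-envelope estimate of the per-site load implied by the 85k files/day figure quoted above; the even split across the 10 Tier1s is an assumption made only for illustration.

# Rough rate estimate for the Tier0 exercise numbers.
files_per_day = 85000
tier1s = 10
seconds_per_day = 24 * 3600

per_site_per_day = files_per_day / tier1s            # 8,500 files/day per Tier1
per_site_rate = per_site_per_day / seconds_per_day   # ~0.1 files/s per Tier1
total_rate = files_per_day / seconds_per_day         # ~1 file/s overall

print("per Tier1: %.0f files/day (~one file every %.0f s)"
      % (per_site_per_day, 1 / per_site_rate))
print("overall:   %.2f files/s" % total_rate)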

12 Security overview
Current scenario:
– Apache + mod_gridsite: 2 ports open, 8000 (insecure: HTTP GET) and 8443 (GSI security: HTTP POST)
– vobox-* proxy cache; users must go to MyProxy first
– Transfer Agents: set up the security environment (from the vobox-* proxy cache) and trigger requests to grid services
– MySQL database: must be reachable from the 'Transfer Agents' and 'Request Handling' only
– Logging (see the sketch below):
  – Apache (hosting the services visible to the outside world) logs requests (requester IP, GSI user DN)
  – The Fetcher logs all contacts to the central ATLAS dataset catalogues
  – The Request Handling DB logs all requests (including the user DN)
  – Transfer Agents: a single log file; the agents depend only on the local MySQL DB and act as clients to the grid middleware
    – But they depend on the proxy cache (e.g. similarly to the RB)
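A hedged sketch of the front-end request logging described here: record the requester's IP and GSI user DN for every request. The variable names (REMOTE_ADDR, SSL_CLIENT_S_DN) are the usual CGI-style names exported by Apache/mod_ssl and are an assumption; mod_gridsite may expose the credential under its own GRST_* variables.

# Audit logging of front-end requests (illustrative).
import logging
import time

logging.basicConfig(filename="request_handling.log", level=logging.INFO)

def log_request(cgi_env, action):
    """Write one audit line per front-end request."""
    requester_ip = cgi_env.get("REMOTE_ADDR", "unknown")
    user_dn = cgi_env.get("SSL_CLIENT_S_DN", "anonymous")
    logging.info("%s ip=%s dn=%s action=%s",
                 time.strftime("%Y-%m-%dT%H:%M:%S"), requester_ip, user_dn, action)

# Example usage with a fake environment (e.g. as passed by a mod_python handler):
log_request({"REMOTE_ADDR": "192.0.2.10",
             "SSL_CLIENT_S_DN": "/DC=ch/DC=cern/OU=Users/CN=example"},
            action="insert dataset transfer request")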

13 Suggested small improvement
Maintain the VO Box principle: the ability for the VO to run its own services.
– If the services are hosted away from the site, performance suffers
– ATLAS experimented with "central" VO Boxes at CERN during SC3, hosted on an LSF cluster
Consider adding more generic middleware to the VO Box:
– Apache, mod_python, mod_gridsite
– vobox-* utilities
– MySQL
– LCG UI
We have suggested in many forums the need for a standard model to develop, deploy and manage VO services.

14 Conclusion 1/3
It is naïve to assume the 'grid' middleware will handle all scenarios and all usage patterns under all conditions.
– Experiments need the flexibility to use the middleware in a way that is efficient, according to the experiment's needs, and secure!
– It is difficult (impossible?) to find a generic usage pattern across all experiments, simply because experiments are different
  – E.g. ATLAS has its own dataset and metadata catalogues and its own monitoring
– The turnaround time for developing or improving Grid middleware is clearly a concern now!
  – ATLAS commissioning

15 Conclusion 2/3
Are the services running on the VO Box generic?
– Some of them, yes (e.g. the 'FTS' babysitters); those should go into the m/w
– But not all of them are: some depend on e.g. the ATLAS dataset definition, the datablock definition and the ATLAS metadata model
An important point: even if they were generic, there is no uniform (and secure) way to deploy and use them.
– The RB has different security handling from FTS (regarding MyProxy usage)... and these are LCG services!

16 Conclusion 3/3
The VO Box is a work-around as it stands:
– It is just a box close to the site services on which to run our VO services
– Some of its services may eventually be provided by the grid m/w
Work on dynamic, secure service containers would be welcomed as a long-term solution.
Developers should focus on baseline services and let each experiment handle the more complex usage scenarios with their own VO services.

