OSG Public Storage Project Summary
Ted Hesselroth
October 5, 2010
Fermilab
Provenance
2006: Acquire capability to allocate storage to VOs
2007: SRM 2.2 – space reservation
– Intensely tested, debugged, and documented by OSG Storage
– Space reservation cleanup tool
2008: Partial adoption
– Used by ATLAS, not used by CMS
– Difficult to set up in dCache
– Not supported in Bestman Gateway
2009: Increased use of opportunistic storage
2010: Blueprint meeting – renewed request for a storage appliance
– Requirements doc signed off
– Design doc
Feedback from VOs
Difficult to use:
– A large number of steps must be done by a user in order to run jobs using public storage.
– Access to storage may not be available as advertised in the BDII.
– There are difficulties in moving and tracking large numbers of files when they are treated as independent entities.
These suggest requirements beyond those taken from the Blueprint meeting.
Outcome of Requirements Process
OSG Production Coordinator
– Grant allocations to VOs. Resize, extend, rescind.
– Clean up expired allocations.
VO Administrator
– Request allocations. Resize, extend, rescind.
– Make suballocations for users (and datasets).
– Run access checker tool.
– Clean up expired allocations.
VO Member
– Read, write, delete (copy, and list) files. Includes registration update (and allocation enforcement).
– Define datasets.
– Replicate, delete datasets.
Site Administrator
– Help clean up allocations.
– Set limits on number of files, concurrent connections.
Constraints on the Design
No alteration of Storage Elements
– Access continues to be through current clients.
No centralized OSG service
– Software to be operated by the VO.
Accommodate usage outside the service
– Use of traditional means will not have an adverse effect.
Design Summary
Database will store info on:
– Allocations
– Replicas
– Logical Namespace
– Monitoring
– Registered Users
Database will have a web-service front end
– Invocation through wrapper scripts on the command line
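The five information stores above can be pictured as database tables. The sketch below uses Python's stdlib sqlite3 as a stand-in for the Postgres back end; every table and column name here is hypothetical, for illustration only, and not taken from the design document.

```python
import sqlite3

# In-memory database standing in for the service's Postgres back end.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE allocation (          -- space granted to a VO on a resource
    id INTEGER PRIMARY KEY,
    vo TEXT, resource TEXT,
    size_gb INTEGER, used_gb INTEGER DEFAULT 0,
    expires TEXT);
CREATE TABLE namespace (           -- logical paths, independent of location
    path TEXT PRIMARY KEY,
    dataset TEXT);                 -- optional dataset handle tag
CREATE TABLE replica (             -- physical copies of logical files
    path TEXT, resource TEXT, size_gb INTEGER);
CREATE TABLE monitoring (          -- latest probe result per storage area
    resource TEXT, vo TEXT, accessible INTEGER, checked TEXT);
CREATE TABLE registered_user (dn TEXT PRIMARY KEY, vo TEXT);
""")

# One registered allocation and one file written into it.
con.execute("INSERT INTO allocation VALUES "
            "(1, 'LIGO', 'se.example.edu', 500, 0, '2011-04-01')")
con.execute("INSERT INTO namespace VALUES ('/ligo/run1/a.dat', NULL)")
con.execute("INSERT INTO replica VALUES "
            "('/ligo/run1/a.dat', 'se.example.edu', 2)")
```

The command-line wrapper scripts would reach these tables only through the web-service front end, never directly.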
Site Registration and Monitoring
VO Administrator installs software (database, front end, scripts).
OSG Production Coordinator uses a tool to register storage areas.
– Discovery tool shows total and used space, VOs authorized.
– Public storage areas are registered in the Allocation database.
VO Administrator runs the monitoring tool.
– Discovers storage areas for which the VO is authorized.
– Checks access with probes.
– Results saved in the Monitoring database.
Making Allocations
VOA makes a request to the PC asking for space.
– May optionally specify a storage resource.
PC checks allocations and selects a storage resource.
– Tool queries the Allocation and Monitoring databases and shows resources, accessible to the VO, that can meet the allocation parameters.
– A new allocation object is made in the database.
– All other VOs' Allocation databases are updated.
PC informs the VOA of the new allocation.
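The PC's selection step amounts to a join of the Allocation and Monitoring databases: list resources with enough unallocated space whose probes the VO has recently passed. A minimal sketch, again using SQLite in Python as a stand-in; the schema, host names, and VO name are illustrative assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE storage_area (resource TEXT, total_gb INTEGER, allocated_gb INTEGER);
CREATE TABLE monitoring (resource TEXT, vo TEXT, accessible INTEGER);
""")
con.executemany("INSERT INTO storage_area VALUES (?,?,?)",
                [("se1.example.edu", 1000, 900),
                 ("se2.example.edu", 1000, 200)])
con.executemany("INSERT INTO monitoring VALUES (?,?,?)",
                [("se1.example.edu", "Engage", 1),
                 ("se2.example.edu", "Engage", 1)])

def candidates(vo, request_gb):
    """Resources that can hold the requested allocation and that
    the VO's access probes have reached."""
    rows = con.execute("""
        SELECT s.resource FROM storage_area s
        JOIN monitoring m ON m.resource = s.resource
        WHERE m.vo = ? AND m.accessible = 1
          AND s.total_gb - s.allocated_gb >= ?""", (vo, request_gb))
    return [r[0] for r in rows]

print(candidates("Engage", 500))   # only se2 has 500 GB unallocated
```

Creating the allocation would then insert a new row and propagate it to the other VOs' Allocation databases.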
Using Allocations – Write
User invokes a script which does all of the following:
– A local file is specified as the source.
– A logical full path is specified as the destination.
– An allocation is selected
– that has sufficient space,
– that has fewer than the maximum number of files,
– that is not expired,
– that is currently accessible to the VO.
– The destination URL is composed.
– VDT client tools are used to write the file.
– The allocation is updated.
– The file's logical path is registered in the Namespace catalog.
– The file replica is registered in the Replica catalog.
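The whole write sequence can be sketched as one function. This is a toy in-memory version under assumed names: the real service keeps this state in Postgres, and the transfer itself (stubbed here as a comment) would be performed by the VDT client tools.

```python
# Toy in-memory state; the real service keeps this in the database.
allocations = [
    {"id": 1, "resource": "se1.example.edu", "free_gb": 1, "files": 10,
     "max_files": 1000, "expires": "2011-04-01", "accessible": True},
    {"id": 2, "resource": "se2.example.edu", "free_gb": 200, "files": 10,
     "max_files": 1000, "expires": "2011-04-01", "accessible": True},
]
namespace, replicas = {}, []

def write_file(local_path, logical_path, size_gb, today="2010-11-01"):
    # Select an allocation meeting all four conditions from the slide.
    usable = (a for a in allocations
              if a["free_gb"] >= size_gb
              and a["files"] < a["max_files"]
              and a["expires"] > today
              and a["accessible"])
    alloc = next(usable, None)
    if alloc is None:
        raise RuntimeError("no usable allocation")
    dest_url = "srm://%s%s" % (alloc["resource"], logical_path)  # compose URL
    # ... VDT client tools would transfer local_path to dest_url here ...
    alloc["free_gb"] -= size_gb                         # update the allocation
    alloc["files"] += 1
    namespace[logical_path] = {}                        # register logical path
    replicas.append((logical_path, alloc["resource"]))  # register the replica
    return dest_url

print(write_file("/tmp/a.dat", "/engage/run1/a.dat", 5))
# allocation 1 lacks space, so se2 is chosen
```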
Defining and Using Datasets
Allow operations on sets of files.
– Especially useful for copying and deleting.
– Could have a hook into classads to trigger processing after upload.
In the Namespace catalog:
– A file or directory is tagged with a dataset handle.
– A file belongs to a dataset if it or one of its ancestors has a tag.
In the Replica catalog:
– There are dataset replica objects.
– Each has a list of the member files that are replicated on that storage resource.
In transfer operations:
– The file list is composed from the dataset replica.
– Resource selection is done on the basis of total size.
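The ancestor-tag rule above means dataset membership is resolved by walking up the logical namespace until a tagged path is found. A small sketch, with hypothetical paths and dataset handles:

```python
import posixpath

# Hypothetical Namespace-catalog tags: logical path -> dataset handle.
tags = {"/engage/run1": "run1-data",
        "/engage/run2/special.dat": "oddballs"}

def dataset_of(path):
    """Walk up from the file toward the root; the nearest tagged
    ancestor (or the file itself) determines the dataset."""
    while True:
        if path in tags:
            return tags[path]
        parent = posixpath.dirname(path)
        if parent == path:          # reached the root without a tag
            return None
        path = parent

print(dataset_of("/engage/run1/a.dat"))        # run1-data (via ancestor)
print(dataset_of("/engage/run2/special.dat"))  # oddballs (tagged directly)
print(dataset_of("/engage/run3/b.dat"))        # None
```

Tagging a directory therefore pulls every file under it into the dataset without touching the per-file records.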
Defining Suballocations
Similar procedure to defining allocations
– But done by the VOA, at the request of a user.
– Selection is made from the VO's allocations.
Can have a suballocation for a dataset
– Similar to a space reservation.
Space accounting tracks both the suballocation and its parent.
– The suballocation is counted against the parent when it is made.
– Writes and deletes update the suballocation's remaining space.
Users can clean up their own expired suballocations.
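The accounting rule above is the interesting part: the parent is debited once, up front, when the suballocation is made, while subsequent writes and deletes touch only the suballocation's own remaining space. A sketch under assumed class and field names:

```python
class Allocation:
    def __init__(self, size_gb):
        self.remaining_gb = size_gb
        self.suballocations = []

    def suballocate(self, size_gb):
        # The whole suballocation is counted against the parent when made.
        if size_gb > self.remaining_gb:
            raise ValueError("parent allocation too small")
        self.remaining_gb -= size_gb
        sub = Allocation(size_gb)
        self.suballocations.append(sub)
        return sub

    def write(self, size_gb):
        # Writes update only this allocation's own remaining space.
        if size_gb > self.remaining_gb:
            raise ValueError("over quota")
        self.remaining_gb -= size_gb

    def delete(self, size_gb):
        self.remaining_gb += size_gb

parent = Allocation(100)
sub = parent.suballocate(30)
sub.write(10)
print(parent.remaining_gb, sub.remaining_gb)   # 70 20
```

Because the parent was debited when the suballocation was created, cleaning up an expired suballocation simply credits its unused space back to the parent.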
Implementation
Database
– SQL for table creation, update, query
– Postgres
Web Service
– Simple HTTP-based
– Message-level security
– Access using curl in wrapper scripts
– Possibly use Bestman for read, write, delete, copy, ls
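A wrapper script's job is essentially to compose a curl invocation against the service's front end. The helper below only builds the command string rather than executing it; the endpoint URL, operation names, parameters, and proxy path are all illustrative assumptions, since the real interface is defined by the front end.

```python
import shlex

def make_curl_command(endpoint, operation, params,
                      proxy="/tmp/x509up_u1000"):
    """Compose the curl command a wrapper script would run.

    All names here are hypothetical: the actual endpoint layout and
    operation vocabulary come from the service's web-service front end.
    """
    url = "%s/%s" % (endpoint.rstrip("/"), operation)
    args = ["curl", "--silent",
            "--cert", proxy,      # grid proxy presented for authentication
            "--data", "&".join("%s=%s" % kv for kv in sorted(params.items())),
            url]
    return " ".join(shlex.quote(a) for a in args)

print(make_curl_command("https://vo-host.example.edu:8443/storage",
                        "allocate", {"vo": "Engage", "size_gb": "500"}))
```

Keeping the wrappers this thin is what lets the same database operations be reached from the command line without any client-side library beyond curl.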
Code Assets
From the OSG Discovery Tool:
– Discovery of storage areas
– Wrapper scripts for Java clients
– Maven-build capability in the VDT
From the Pigeon Tool:
– Monitoring capability
From Bestman:
– SRM reference implementation
– Close support
From MCAS:
– RESTful web service
– Development methodology
– Fitnesse test suite
From OSG Storage:
– Postgres install script from the VDT installation package for dCache
– Java methods for WSS fast message-level security
Development Strategy
Agile methodology
– Stakeholders involved from the beginning
– Demonstrate new features every two weeks
– Stakeholders test and give feedback
– Frequent developer meetings for progress and short-term planning
Continuous integration
– Test with every commit
– Packaging is part of the build
– Nightly build available to stakeholders
– Continuous documentation
Timeline
October
– Detailed implementation planning.
– Infrastructure setup: twiki, issue tracker, code repository, build system, continuous integration and test system.
– Kickoff meeting.
November
– SQL for database creation. Start SQL for updates and queries. Shell wrappers for SQL.
– Pigeon integration. Installation scripts for testers.
December
– Finish SQL for updates and queries. Database performance testing/tuning.
– Set up web-service code: stubs, queuing mechanism, Java SQL configuration.
January
– Web service installation script. Java SQL wrappers. Java methods for functions. SRM methods for transfer commands.
February
– GSI authentication for the web service. Performance testing of the web service.
– End-to-end load testing. Reports capability.
March
– VDT packaging.
– ITB testing.
– Post-facto registration and cleanup. RPMs for the Operations Toolkit.
Unknowns
Post-facto accounting
– We have reason to believe it can be done through Gratia, but we have not tested this.
Performance tuning
– May be required, depending on the results of tests.
Requirements creep
– Stakeholders may emphasize or deemphasize various elements, or request additional features.
Software Not Used
– iRODS
– SRM space reservation
– Existing transfer services
– AliEn
– REDDNET
To provide storage space for non-owner VOs, sites generally allocate an untended common storage area authorized for several OSG VOs. While some VOs have availed themselves of those resources, the experience of the Engage and OSG Storage groups is that there are a number of barriers to its effective use. A large number of steps must be done by a user in order to run jobs using public storage. Access to storage may not be available as advertised in the BDII. There are difficulties in moving and tracking large numbers of files when they are treated as independent entities. A single VO may use up all the public storage on a site, preventing other VOs from having access. Without reportable information on the state and use of public storage, it is difficult to present its value to sites and other VOs.
The OSG Production Coordinator, OSG Storage, and the Engage and LIGO VOs have recognized a need for software to manage space allocations and data transfer for public storage on the Open Science Grid. OSG Storage produced and circulated a requirements document describing its capabilities, which was approved by the stakeholders after a comment period.
OSG Storage surveyed existing software and designed a service to meet the requirements. The service is to be deployed by VOs which make use of public storage, and allows VOs to track their use of allocations on public storage areas which are assigned to them by the Production Coordinator. The service also maintains a catalog of the VO's files, to allow cleanup of expired allocations and to support storage operations on sets of files. Finally, a monitoring component is included so that allocation and resource selection can be done for storage areas that are accessible to the VO.
We estimate it will take one FTE six months to write the service. This includes code and integration for the user interfaces and database operations, a VO test installation package, a VDT-compatible build and installation method, and documentation. We should have the participation of the OSG Production Coordinator and VO representatives throughout the development process, to exercise features as they become available and provide feedback on user experience, software performance, and documentation quality. This requires about 5% effort per participant, and a deployment resource for one instance of the service per VO. We would not need the participation of site administrators until near the end of the development process; an estimated one-half day would be asked of volunteers at that point. This assumes that the OSG-owned Storage Elements on Gridworks will be available as test endpoints. One non-virtual node should be provided for the developers' test instance of the new service.
On the specifics of what needs to be done, the design uses a database back end and anticipates wide-area access through a GSI-authenticated web-service front end. We would need to write SQL scripts to create, update, and query the database tables, and Java wrappers for access through the web service. For the front end we have the option of the Bestman reference implementation or a lightweight HTTP endpoint with message-level security. Scripts for installation, startup, and the command-line interface need to be written. The Pigeon access checker would be used for monitoring; it would need an add-on to write to the database. There is a good start on the documentation, as a twiki page (https://twiki.grid.iu.edu/bin/view/Storage/OSGPublicStorage) has absorbed much of the requirements and design information.
We will build upon software assets acquired in the past. From the current work: vetted requirements and a thorough database schema and operations design. From the OSG Discovery Tool: discovery of storage areas, wrapper scripts for Java clients, and a Maven-build capability in the VDT. From the Pigeon Tool: monitoring capability. From Bestman: the SRM reference implementation and close support. From MCAS: a RESTful web service, a development methodology, and the Fitnesse test suite. From OSG Storage: a Postgres install script from the VDT installation package for dCache. Also from the current work: Java methods for WSS fast message-level security.
Unknowns are as follows. We have a requirement to do post-facto accounting; we have reason to believe it can be done through Gratia, but we have not tested this. Performance tuning may be required, depending on the results of tests. While the requirements have been approved on paper, experience with the software may cause stakeholders to emphasize or deemphasize various elements, or to request additional features.