
1 Status of SRM 2.2 implementations and deployment
29th January 2007
Flavia Donno, Maarten Litmaath, IT/GD, CERN

2 Overview
- Study of SRM 2.2 specification
- Tests executed
- Plan
- Status of SRM clients
- GLUE schema
- Grid Storage System Deployment (GSSD)
- Deployment plan
- Conclusions

3 Study of SRM 2.2 specification
- In September 2006 there were very different interpretations of the spec.
- 3 releases of the specification: July, September, December.
- A study of the spec (state/activity diagrams) identified many behaviours not defined by the spec.
- A list of about 50 points was compiled in September 2006.
- Many issues have been solved. The last 30 points were discussed and agreed during the WLCG Workshop; the implementation for those will be ready in June 2007.
- The study of the specifications, the discussions and the testing of the open issues have helped ensure consistency between SRM implementations.
  https://twiki.cern.ch/twiki/bin/view/SRMDev/IssuesInTheSpecifications

4 Tests executed
- S2 test suite testing availability of endpoints, basic functionality, use cases and boundary conditions, interoperability, exhaustive and stress tests:
  - Availability: ping and full put cycle (putting and retrieving a file; see the sketch after this list)
  - Basic: basic functionality, checking only return codes and passing all basic input parameters
  - Use cases: testing boundary conditions, exceptions, and real use cases extracted from the middleware clients and experiment applications
  - Interoperability: servers acting as clients, cross-copy operations
  - Exhaustive: checking for long strings, strange characters in input arguments, missing mandatory or optional arguments; output parsed
  - Stress: parallel tests for stressing the systems, multiple requests, concurrent colliding requests, space exhaustion, etc.
- S2 tests run as a cron job 3-4 times per day.
- In parallel, manual tests from GFAL/lcg-utils, FTS, and the DPM test suite.
- Tests from LBNL are similar to the S2 basic tests but written in Java:
  - http://sdm.lbl.gov/srm-tester/v22-progress.html
  - http://sdm.lbl.gov/srm-tester/v22daily.html
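To make the availability check above concrete, here is a minimal Python sketch of its logic: an SRM ping followed by a full put/get cycle with cleanup. The SrmClient class is a hypothetical stand-in for a real SRM 2.2 binding (the S2 suite and the LBNL tester implement this against the actual WSDL); everything except the SRM method names it mimics is illustrative.

```python
# Sketch of the "availability" test: srmPing, then put a file, read it back,
# and clean up. The in-memory "store" stands in for a real storage element.
import os


class SrmClient:
    """Hypothetical SRM 2.2 client; a real one would speak SOAP to the endpoint."""

    def __init__(self, endpoint):
        self.endpoint = endpoint
        self._store = {}  # stand-in for the storage element

    def srm_ping(self):
        return True  # a real client would call srmPing and check the reply

    def put(self, surl, data):
        self._store[surl] = data  # stands in for srmPrepareToPut + transfer + srmPutDone

    def get(self, surl):
        return self._store[surl]  # stands in for srmPrepareToGet + transfer

    def rm(self, surl):
        self._store.pop(surl, None)  # stands in for srmRm


def availability_check(endpoint):
    """Return True if the endpoint answers a ping and survives a put/get cycle."""
    client = SrmClient(endpoint)
    if not client.srm_ping():
        return False
    surl = endpoint + "/availability-test/probe.txt"
    payload = os.urandom(64)
    try:
        client.put(surl, payload)
        return client.get(surl) == payload  # the file must come back unchanged
    finally:
        client.rm(surl)  # always clean up the probe file


if __name__ == "__main__":
    # Hypothetical endpoint name, for illustration only.
    print(availability_check("srm://dpm.example.cern.ch:8446/dpm/cern.ch/home/dteam"))
```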

5 Tests executed
- For now only availability, basic, use case and interoperability tests are executed on a regular basis.
- Results published daily on a web page. History results also available:
  - http://cern.ch/grid-deployment/flavia/avail
  - http://cern.ch/grid-deployment/flavia/basic
  - http://cern.ch/grid-deployment/flavia/usecase
  - http://cern.ch/grid-deployment/flavia/cross
  (s2_logs = daily results, history = test archive)
- Results of failed and successful tests are reported daily to developers to signal issues.
- Test results and issues are discussed on the srm-tester and srm-devel lists:
  - https://hpcrdm.lbl.gov/mailman/listinfo/srmtester
  - http://listserv.fnal.gov/archives/srm-devel.html

6 Tests executed
[Slide shows the basic-test results matrix, annotated: "MoU SRM methods needed by the end of 2007, expected by the end of summer" and "Needed now only for dCache!"]

7 Tests executed
- The basic tests are rather simple: for an asynchronous function a foreseen return value is enough to pass! We need other checks to make sure the function is correctly implemented (see the polling sketch after this list).
- Therefore, the use case and boundary condition tests are much more meaningful, as these give a proper indication of the status of an implementation.
- Daily, the output of the test suite is checked for both red and green boxes. A detailed report is sent to the developers. A web page is also kept up to date with the open problems:
  https://twiki.cern.ch/twiki/bin/view/SRMDev/ImplementationsProblems
- The test page has links with details about the failures and the full output of the tests, including the exact sequence of actions that determine the failures.
  http://cern.ch/grid-deployment/flavia/basic/s2_logs/22DPMCERN/index.html
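The point about asynchronous functions deserves a concrete illustration: srmPrepareToPut legitimately returns a queued status, so a basic test that only checks the first return code passes even if the request never completes. A meaningful test has to poll the status method until a terminal state. In this minimal sketch the method and status-code names follow the SRM 2.2 spec, while the stub client and all other names are assumptions for illustration.

```python
# Poll an asynchronous SRM put request to a terminal state instead of
# trusting the initial return code, as the basic tests do.
import time

TERMINAL = {"SRM_SUCCESS", "SRM_FAILURE", "SRM_ABORTED"}


def put_and_wait(client, surl, timeout_s=60.0, poll_s=2.0):
    """Submit an asynchronous put and poll until it reaches a terminal state."""
    token, status = client.srmPrepareToPut([surl])
    # A basic test would stop here: status is typically SRM_REQUEST_QUEUED.
    deadline = time.time() + timeout_s
    while status not in TERMINAL:
        if time.time() > deadline:
            raise TimeoutError("request %s still %s after %ss" % (token, status, timeout_s))
        time.sleep(poll_s)
        status = client.srmStatusOfPutRequest(token)
    return status == "SRM_SUCCESS"


class StubClient:
    """Toy endpoint that stays in progress for one poll, then succeeds."""

    def __init__(self):
        self._polls = 0

    def srmPrepareToPut(self, surls):
        return "req-1", "SRM_REQUEST_QUEUED"

    def srmStatusOfPutRequest(self, token):
        self._polls += 1
        return "SRM_SUCCESS" if self._polls >= 2 else "SRM_REQUEST_INPROGRESS"


if __name__ == "__main__":
    print(put_and_wait(StubClient(), "srm://se.example.ch/path/f.dat", poll_s=0.1))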

8 Tests executed
[Slide shows the daily result tables for the Availability, UseCase and Interoperability/Cross Copy tests.]

9 Tests executed
- We monitored the progress of the implementations, including progress over time, in order to detect problematic implementations.
- After a first instability period, almost all implementations started to converge around the 11th of December. Rather good stability and response to basic tests.
- Around the 15th of December 2006 there was a WSDL update. Many implementations introduced fixes as well, for instance to reflect the decisions taken about the open issues in the SRM spec.
- Some problems were found in the test suite itself and fixed. The use case test suite still does not fully reflect the last decisions taken on the SRM standard (discussed during the WLCG workshop).
- Now the failures for the basic tests are close to zero for all implementations. Red boxes are understood and are being addressed by the developers.

10 Tests executed: status of the implementations
- DPM has been rather stable and in good state for the entire period of testing. A few issues were found (a few testing conditions caused crashes in the server) and fixed. All MoU methods implemented besides Copy (not needed at this stage).
- DRM and StoRM: good interaction. At the moment all MoU methods implemented (Copy in PULL mode not available in StoRM). Implementations rather stable. Some communication issues with DRM need investigation.
- dCache: very good improvements in the basic tests in the last weeks. Implementation is rather stable. All MoU methods have been implemented (including Copy, which is absolutely needed for dCache) except ExtendFileLifeTime. Timur has promised to implement it as soon as he gets back to the US.
- CASTOR: the implementation has been rather unstable. The problems have been identified in transferring the requests to the back-end server. Therefore, it has been difficult to fully test the implementation of the SRM interface with the basic test suite. A major effort is taking place to fix these problems.

11 Plan
- Plan for Q1 of 2007:
- Phase 1: from 16 Dec 2006 until end of January 2007:
  - Availability and basic tests.
  - Collect and analyze results, update the page with the status of endpoints: https://twiki.cern.ch/twiki/bin/view/SRMDev/ImplementationsProblems
  - Plot results per implementation: number of failures/number of tests executed for all SRM MoU methods.
  - Report results to the WLCG MB.
- Phase 2: from beginning until end of February 2007:
  - Perform tests on use cases (GFAL/lcg-utils/FTS/experiment specific), boundary conditions and the open issues in the spec that have been agreed on.
  - Plot results as for phase 1 and report to the WLCG MB.
- Phase 3: from 1 March until "satisfaction"/end of March 2007:
  - Add more SRM 2.2 endpoints (some Tier-1s?)
  - Stress testing.
  - Plot results as for phase 2 and report to the WLCG MB.
- This plan has been discussed during the WLCG workshop. The developers have agreed to work on this as a matter of priority.

12 Status of SRM clients
- FTS
  - SRM client code has been unit-tested and integrated into FTS.
  - Tested against DPM, dCache and StoRM. CASTOR and DRM tests started.
  - Release to the development testbed expected the first week of February.
  - Experiments could do tests on the dedicated UI set up for this purpose.
- GFAL/lcg-utils
  - New rpms expected on the test UI the first week of February.
  - New patch for gLite release certification at the same time.
  - Still using the old schema (see next slide).

13 GLUE Schema
- GLUE 1.3 available: http://glueschema.forge.cnaf.infn.it/Spec/V13
- Not everything originally proposed, only the important changes.
- LDAP implementation done by Sergio Andreozzi. Deployment preview for now.
- Information providers started by Laurence Field. First deployment by the end of February.
- Clients need to adapt to the new schema (see the query sketch below).
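As context for the client adaptation above, here is a minimal sketch of how a client might discover SRM endpoints and versions from a BDII publishing the GLUE schema over LDAP (which is how the clients learn the SRM type, see slide 16). Attribute names follow GLUE 1.x (GlueService*); the BDII host is a placeholder and the exact service-type values published for SRM vary, so treat the filter as an assumption to be checked against the live schema.

```python
# Query a BDII (GLUE 1.x over LDAP) for SRM service endpoints and versions.
import ldap  # python-ldap

BDII_URI = "ldap://lcg-bdii.example.cern.ch:2170"  # hypothetical top-level BDII
BASE_DN = "o=grid"


def find_srm_services():
    conn = ldap.initialize(BDII_URI)
    # BDIIs accept anonymous reads, so no bind credentials are needed.
    results = conn.search_s(
        BASE_DN,
        ldap.SCOPE_SUBTREE,
        "(&(objectClass=GlueService)(GlueServiceType=srm*))",  # assumed filter
        ["GlueServiceEndpoint", "GlueServiceVersion"],
    )
    for dn, attrs in results:
        endpoint = attrs.get("GlueServiceEndpoint", [b"?"])[0].decode()
        version = attrs.get("GlueServiceVersion", [b"?"])[0].decode()
        print("%-60s SRM %s" % (endpoint, version))


if __name__ == "__main__":
    find_srm_services()
```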

14 Grid Storage System Deployment (GSSD)
- Working group launched by the GDB to coordinate SRM 2.2 deployment for Tier-1s and Tier-2s: https://twiki.cern.ch/twiki/bin/view/LCG/GSSD
- Mailing list: storage-class-wg@cern.ch
- People involved: developers, site admins, experiments.
- Use pre-GDB meetings for discussions: Tier-1s presenting their setup and reporting problems. Some Tier-2s.
- Interesting outcome from the dCache workshop at DESY: sites reporting configuration and problems with their installations.
- Collected requirements from the experiments.

15 Deployment plan
- (A) Collecting requirements from the experiments: more details with respect to what is described in the TDR. The idea is to understand how to specifically configure a Tier-1 or a Tier-2: storage classes (quality of storage), disk caches (how many, how big and with which access), storage transition patterns, etc.
  - Started. Questionnaire sent to experiments. Good input collected from LHCb and ATLAS.
- (B) Understand current Tier-1 setups: requirements?
  - Started.
- (C) Getting hints from developers: manual/guidelines?
  - Started.
- (D) Selecting production sites as guinea pigs and starting tests with experiments.
  - Beginning of March 2007 - July 2007.
- (E) Assisting experiments and sites during tests (monitoring tools, guidelines in case of failures, cleaning up, etc.). Define mini SC milestones.
  - March - July 2007.
- (F) Accommodate new needs, not initially foreseen, if necessary.
- (G) Have production SRM 2.2 fully functional (MoU) by September 2007.

16 Deployment plan: client perspective and backup plan
- FTS, lcg-utils, GFAL
  - SRM v1.1 will be the default until SRM 2.2 is deployed in production and has proven stable.
  - Site admins have to run both SRM v1.1 and SRM v2.2 until the SRM v2.2 installation is stable.
  - The SRM type is retrieved from the information system.
  - In case 2 versions are found for the same endpoint, SRM v2.2 is chosen only if a space token (storage quality) is specified. Otherwise SRM v1.1 is the default (see the sketch below).
  - FTS can be configured per channel on the version to use; policies can also be specified ("always use SRM 2.2", "use SRM 2.2 if space token specified", ...).
===>>> It is possible and foreseen to run in mixed mode with SRM v1.1 and SRM v2.2, until SRM v2.2 is proven stable for all implementations.
===>>> Backup plan: continue to run SRM v1.1 as done up to now. Introduce SRM v2.2 endpoints if ready and proven stable.
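The version-selection rule on this slide reduces to a few lines of logic. This is a minimal sketch of that rule, not the actual FTS/GFAL code; the function and its arguments are illustrative, and the space-token value in the usage example is made up.

```python
# Pick the SRM version for an endpoint following the mixed-mode rule:
# with both versions published, use v2.2 only when a space token is given.
def choose_srm_version(published_versions, space_token=None):
    """published_versions: versions the information system lists for the endpoint."""
    if "2.2" in published_versions and "1.1" in published_versions:
        # Mixed-mode endpoint: a space token (storage quality) forces v2.2,
        # since v1.1 has no notion of space tokens.
        return "2.2" if space_token else "1.1"
    if "2.2" in published_versions:
        return "2.2"
    return "1.1"  # backup plan: SRM v1.1 keeps working as before


if __name__ == "__main__":
    print(choose_srm_version({"1.1", "2.2"}))                            # -> 1.1
    print(choose_srm_version({"1.1", "2.2"}, space_token="ATLAS_TAPE"))  # -> 2.2
```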

17 Conclusions
- Much clearer description of the SRM specifications. All ambiguous behaviours made explicit. A few issues left out for SRM v3, since they do not affect the SRM MoU.
- Well established and agreed methodology to check the status of the implementations. Boundary conditions and use cases from the upper-layer middleware and experiment applications will be the focus of next month's work. Monthly reports and problem escalation to the WLCG MB.
- A clear plan has been put in place in order to converge.
- We are still not where we planned to be. However, we have a firm commitment from the developers to work on the SRM implementation as a matter of priority.
- Working with sites and experiments for the deployment of SRM 2.2 and Storage Classes. Specific guidelines for Tier-1 and Tier-2 sites are being compiled.
- It is not unreasonable to expect SRM 2.2 in production by September 2007.
- A backup plan foresees the use of a mixed environment of SRM v1 and v2, where the upper-layer middleware takes care of hiding the details from the users.

