WP8 Status – Stephen Burke – 30th January 2003 WP8 Status Stephen Burke (RAL) (with thanks to Frank Harris)


1 WP8 Status
Stephen Burke (RAL) (with thanks to Frank Harris)
30th January 2003

2 Outline
- Overview of objectives for the 2nd project year, and the corresponding achievements
- Ongoing work on use cases
- Evaluations by Loose Cannons
- Data Challenge work with Atlas and CMS
- Comments on the key points of work in the other experiments
- The organisation for D8.3 ‘Testbed assessment for HEP applications’
- The planning for the 3rd project year, and some associated issues

3 Objectives for the 2nd project year, and the corresponding achievements
OBJECTIVES
- Use and exploitation of Testbed1
- Validation of releases + feedback
- Participation in the ATF, and the elaboration of use cases
- Design of a common middleware layer for WP8 experiments
- Use of EDG middleware in experiment Data Challenges (DCs)
ACHIEVEMENTS (in the same order as the objectives)
- All experiments have used the applications testbed. Babar and D0 have joined the 4 LHC experiments, and NA48 will soon join.
- Both LCs and the experiments have given continual feedback to middleware from both generic and experiment-specific evaluations.
- The ATF is very active and executes regular ‘scenario playing’ reviews. Use case documents have been produced and will develop in the context of EDG/LCG.
- This (the design of a common middleware layer) has moved into the LCG project.
- Atlas and then CMS have achieved significant pioneering work in the use of EDG middleware for DCs, and have produced detailed evaluations.

4 Ongoing work on use cases
- ‘Common Use Cases for A HEP Common Application Layer’ (HEPCAL) (document produced for LCG by a WG chaired and largely manned by WP8 people)
  - General (authorisation, login, browse resources): 4 use cases
  - Data Management (metadata and data operations): 19 use cases
  - Job Management (submission, control, monitoring, errors, resource estimation, job splitting…): 16 use cases
  - VO Management (resource reservation, user rights, software publishing…): 4 use cases
- EDG 1.4.3 satisfies use cases for a basic system (authorisation/authentication, data handling, job submission). EDG 2 will satisfy more advanced requirements, e.g. data handling (metadata) and HEP data transformation. There are other areas for discussion, e.g. virtual data, experiment s/w publishing.
- This work is to continue within EDG and LCG
- In the ATF, regular scenario playing for use cases to check existing and future design
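As a quick cross-check of the counts quoted on this slide, the four HEPCAL categories add up to 43 use cases. A minimal Python tally, where the dictionary name is purely illustrative and not taken from the HEPCAL document:

```python
# Use-case counts per HEPCAL category, as quoted on the slide.
# The variable name is an illustrative assumption.
hepcal_use_cases = {
    "General": 4,
    "Data Management": 19,
    "Job Management": 16,
    "VO Management": 4,
}

total = sum(hepcal_use_cases.values())
print(f"Total HEPCAL use cases: {total}")

# Share of each category, to show where most of the requirements sit.
for category, count in hepcal_use_cases.items():
    print(f"{category:16s} {count:3d}  ({count / total:.0%})")
```

Data and job management together account for 35 of the 43 use cases.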

5 Evaluations by Loose Cannons
- The Loose Cannons have been involved in
  - Functionality and stress testing
  - Middleware debugging campaigns
  - Configuration and testing of Storage Elements and Virtual Organisations
  - Data Challenges of the ATLAS and CMS experiments
  - Integration Team and Architectural Task Force

6 Data Challenge work with Atlas
- Purpose of the evaluation
  - Verify the use of EDG middleware for Atlas Data Challenges (DCs)
  - Verify the portability of Atlas simulation code to a grid environment
- Specific Goals
  - Compare results with those obtained without the Grid
  - Make a prioritised list of recommendations to EDG for bug fixes and future developments in an evaluation report
- Organization
  - Joint Atlas/EDG/LCG effort
- Resources used (and functions)
  - Sites: CERN, RAL, Lyon, Nikhef, CNAF + Karlsruhe
  - Several UIs: Milan, CERN, Cambridge
  - RB: CERN
  - RC: Originally shared with CMS. Later a separate one at CNAF.

7 Atlas evaluations (August and Dec/Jan) (detailed paper in preparation)
- RESULTS
  - Atlas software was used in the EDG Grid environment
  - Several hundred simulation jobs of length 4-24 hours were executed, and data was replicated using grid tools
  - Results of simulation agreed with ‘non-Grid’ runs
- OBSERVATIONS
  - Good interaction with EDG middleware providers and with WP6/8
  - With a very big effort it was possible to run the jobs
  - Showed up bugs and performance limitations (fixed or to be fixed in TB 2)
    - WP1: Many ‘long jobs’ failed (now much better)
    - WP2: Replication tools were difficult to use and not reliable
    - WP3: Information Service based on MDS gave poor performance (affected WP1)
    - WP4: We need to separate out application and system software installations
  - We need the TB2 release for use in large-scale data challenges
- RECOMMENDATIONS (see combined ATLAS/CMS recommendations…)

8 Data Challenge work with CMS
- Purpose of the “stress test”
  - Verify the use of EDG middleware for CMS Production
  - Verify the portability of the CMS Production environment to a grid environment
- Specific Goals
  - Aim for as many simulated events as possible for physics, with 1000’s of ‘short’ event generation and ‘long’ detector simulation jobs using the full production system
  - Measure performance, efficiencies and reasons for job failures
  - Aim for a stable system by bug fixing and the reconfiguration of components
- Organization
  - This was a joint effort involving CMS, EDG, EDT and LCG people
- Resources used (and functions)
  - Sites: CERN, RAL, Lyon, Nikhef, CNAF + Legnaro, Padova, Ecol. Poly., IC
  - UIs: CNAF, Padova, Ecol. Poly., IC
  - RBs: CNAF (CMS), CNAF (shared), CERN (CMS), IC (CMS+Babar)
  - RC: Originally shared with Atlas. Later a separate one at CNAF.

9 CMS production components interfaced to EDG middleware
[Diagram not reproduced in this transcript. Recoverable labels follow.]
- Components: UI (IMPALA/BOSS), BOSS DB, RefDB, Workload Management System, Replica Manager, CEs and WNs with CMS software, SEs
- Flow labels: JDL, parameters, data registration, job output filtering, runtime monitoring, input data location
- Legend: push data or info; pull info
- CMS production tools on UI: job creation, job submission and monitoring
- CMS software (rpm-based) installed on CEs/WNs
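The slide names JDL as the format in which jobs reach the Workload Management System. As a rough illustration of what such a description looks like, the Python sketch below renders a small ClassAd-style JDL text from a dictionary. The helper `render_jdl`, the script name and the file names are hypothetical, the attributes shown are only a small subset of commonly used JDL attributes, and nothing here is meant to reproduce what the IMPALA/BOSS tools actually generate.

```python
def render_jdl(attributes):
    """Render a dict of attributes as ClassAd-style JDL text (illustrative helper)."""

    def fmt(value):
        # Lists become brace-delimited lists; everything else a quoted string.
        if isinstance(value, (list, tuple)):
            return "{" + ", ".join(fmt(v) for v in value) + "}"
        return '"' + str(value) + '"'

    return "\n".join(f"{name} = {fmt(value)};" for name, value in attributes.items())


# Hypothetical job: the wrapper script and file names are assumptions.
job = {
    "Executable": "run_simulation.sh",
    "Arguments": "--run 1",
    "StdOutput": "job.out",
    "StdError": "job.err",
    "InputSandbox": ["run_simulation.sh"],
    "OutputSandbox": ["job.out", "job.err"],
}

jdl_text = render_jdl(job)
print(jdl_text)
```

In this sketch the resulting text would be saved to a file on the UI and handed to the job submission command; the requirements, ranking and input data attributes that a real production job would need are deliberately left out.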

10 CMS use of the EDG TB (some statistics)
[Figure not reproduced in this transcript. Recoverable labels: CEs, SEs; axes: Nb. of evts, time.]
Events production within EDG as part of the official CMS production: http://cmsdoc.cern.ch/cms/production/www/html/general/index.html

11 CMS/EDG Summary of Stress Test
[Figure not reproduced in this transcript. Panel labels: Short jobs; Long jobs; After Stress Test, Jan 03.]

12 Main results, observations and recommendations from CMS work (detailed doc in preparation)
- RESULTS
  - Could distribute and run CMS s/w in the EDG environment
  - Generated ~250K events for physics with ~10000 jobs in a 3 week period
- OBSERVATIONS
  - Were able to quickly add new sites to provide extra resources
  - Fast turnaround in bug fixing and installing new software
  - Job efficiency has grown from ~60% to currently more than 80% (much better for short jobs (secs) than long jobs (hours))
  - Test was labour intensive (since software was developing and the overall system was fragile)
    - WP1: At the start there were serious problems with long jobs, recently improved
    - WP2: Replication tools were difficult to use and not reliable, and the performance of the Replica Catalogue was unsatisfactory
    - WP3: Limitations in the Information System based on MDS: performed poorly with increasing query rate
    - System sensitive to hardware faults and site/system mis-configuration
    - User tools for fault diagnosis are limited
  - Testbed 2 should fix the major problems, providing a system suitable for full integration in distributed production
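To put the headline numbers in perspective, the approximate totals quoted above can be turned into average rates. The Python sketch below uses only the rounded figures from the slide (~250K events, ~10000 jobs, 3 weeks). The averages are indicative only, since they hide the mix of short generation jobs and long simulation jobs, and the variable names are illustrative.

```python
# Approximate figures quoted on the slide.
events = 250_000  # "~250K events"
jobs = 10_000     # "~10000 jobs"
days = 3 * 7      # "3 week period"

# Simple averages over the whole stress test.
events_per_job = events / jobs
jobs_per_day = jobs / days
events_per_day = events / days

print(f"Average events per job: {events_per_job:.0f}")
print(f"Average jobs per day:   {jobs_per_day:.0f}")
print(f"Average events per day: {events_per_day:.0f}")
```

On these rounded inputs the test averaged about 25 events per job and a little under 500 jobs per day.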

13 Joint recommendations from Atlas/CMS work
- There are essential developments needed in
  - Data Management (robustness and functionality)
  - Information Systems (robustness and scalability)
  - Workload Management (scalability for high rates, batch submissions, output file specification)
  - Mass Storage Support (gridified support due in Version 2)
- We must maintain and strengthen joint Experiment/EDG work in the evaluation of system components AND the architecture (both will need to evolve; GRID developments are R/D)
  - Once middleware providers have done their ‘unit tests’ the applications must work with them in the areas of:
    - Performance evaluation for the user with increasing rates of job submission and data handling, and an expanding TB configuration
    - Streamlining procedures for feedback to middleware providers
- EDG should provide site validation and monitoring procedures
- EDG should provide good user tools for fault detection and diagnosis (what is the job status? why did it fail? …)

14 Some key points of work in the other experiments
- ALICE
  - Developed scripts for the installation of ALICE software on EDG CEs
  - Developed a web interface to automatically submit jobs to the testbed and evaluate its "efficiency" (currently in use)
  - Current development of the AliEn/EDG interface:
    - Able to send jobs to EDG via AliEn
    - Completing the tests for registering/accessing data on/from both catalogues (AliEn and EDG), which is required for interoperability
- LHCb
  - Consolidation of basic job submission capability (demonstrated at the EU review, and at the opening of the National e-Science Centre, Edinburgh)
  - Made RPMs for the LHCb environment
  - Included DataGrid in the new LHCb distributed production system (DIRAC) and demonstrated that short DataGrid jobs can be submitted and managed via DIRAC

15 Key points of work in the other experiments (continued)
- BaBar
  - Deployment of the BaBar VO:
    - VO and RC at Manchester, RB at IC
    - CE/SE/WN at SLAC, IN2P3, RAL and Ferrara
  - Deployment and adaptation of EDG software at SLAC (the EDG scripts had to be modified for the WN inside the Internet Free Zone)
  - Successfully tested BaBar analysis and simulation jobs within the EDG framework
  - Next step is to run real full-scale analysis on the Grid
- D0
  - A D0 Replica Catalogue and VO server have been set up at Nikhef
  - A 124 CPU farm at Nikhef has been successfully used with EDG s/w
  - D0 support was added to the official EDG release (several sites now support D0 jobs and have installed the RPMs)
  - Will try the newer releases (and true Grid production) when RH 7.2 support appears

16 The key content for D8.3 ‘Testbed assessment for HEP applications’
- ‘Datagrid as an HEP production environment’
  - Detailed evaluations of the Atlas and CMS Task Forces
  - Evaluations by other LHC experiments (Alice, LHCb)
  - Evaluations from non-LHC experiments (Babar, D0)
- Mapping of evaluations to the ‘common use cases’
  - General use cases
  - Data management
  - Job management
  - VO management
- A summary of lessons learned for future EDG development, and a statement of priorities for the experiments

17 Planning for the 3rd project year, and associated issues
- PLANNING
  - Continue work with experiments using the Task Force model for Data Challenges
  - Complete D8.3 for end March 2003 (based on 1.4.3)
  - Continue architecture work in the ATF, and participate in LCG use case/architecture activities
  - Evaluate Testbed 2 software, and port to experiment software environments for use in the data challenges
  - Complete D8.4 by Dec 2003 (based on Testbed 2)
- SOME IMPORTANT ISSUES
  - WP8 will work increasingly with experiments rather than doing generic testing, which will be taken up by the WP6 Testing Group
  - We must relate EDG/WP8 work to the use by experiments of the forthcoming LCG Prototype, in terms of software, hardware and user support
  - We must organise detailed test sessions involving experiments and the providers of middleware for information systems, data management and mass storage handling in the context of moving to Testbed 2
  - We look for improved diagnostic information from middleware in case of problems

