DataGrid is a project funded by the European Commission EDG Conference, Barcelona, May 12-15 2003 under contract IST-2000-25182 Technical Status of the.


1 Technical Status of the Project
Bob Jones (DataGrid Deputy Project Manager, WP 12)
EDG Conference, Barcelona, May 12-15 2003
DataGrid is a project funded by the European Commission under contract IST-2000-25182
https://edms.cern.ch/document/384120

2 Talk Outline
- WP Management Responsibilities
- Applications status
- Testbeds status
- EDG 2.0 release status
- Release procedure review
- Quality Group
- Architecture Group
- Summary

3 WP Management Responsibilities
- WP1: Francesco Prelz (Deputy: Massimo Sgaravatto)
- WP2: Peter Kunszt (Deputy: Gavin McCance)
- WP3: Steve Fisher (Deputy: Laurence Field)
- WP4: Maite Barroso Lopez (Deputy: German Cancio Melia)
- WP5: John Gordon (Deputy: Jens Jensen)
- WP6: Francois Etienne (Deputy: Charles Loomis)
- WP7: Franck Bonnassieux (Deputy: Peter Clarke)
- WP8: Frank Harris (Deputy: one per experiment)
- WP9: Luigi Fusco (Deputy: Julian Linford)
- WP10: Vincent Breton (Deputy: Johan Montagnat)
- WP11: Maurizio Lancia (Deputy: Mauro Draoli)
- WP12: Fabrizio Gagliardi (Deputy: Bob Jones)
Technical coordinator: Erwin Laure

4 Applications Status
- Intense usage of the application testbed in 2002 and early 2003 by applications to produce data for their deliverables (D8.3, D9.3, D10.3)
  - WP8: 5 HEP experiments have used the testbed
    - ATLAS and CMS task forces very active and successful
      - Several hundred ATLAS simulation jobs of length 4-24 hours were executed and data was replicated using grid tools
      - CMS generated ~250K events for physics with ~10,000 jobs in a 3-week period
    - Since the project review, ALICE and LHCb have been generating physics events
  - WP9: EarthObs level-1 and level-2 data processing and storage performed
  - WP10: Four biomedical groups able to deploy their applications
  - First Earth Obs site joined the testbed (Biomedical on-going)
- Application Working Group (AWG)
  - Re-established, chaired by Vincent Breton (WP10); held its first meeting on March 20th in Amsterdam
  - Working to produce a report describing the joint prioritized list of requirements on the basis of Deliverables 8.3, 9.3, 10.3 (PM30)
  - See following slides for an overview of feedback
  - AWG will give a plenary talk on Thursday

5 Joint recommendations from Atlas/CMS work
- There are essential developments (see EDG 2.0) needed in
  - Data Management (robustness and functionality)
  - Information Systems (robustness and scalability)
  - Workload Management (scalability for high rates, batch submissions, output file specification)
  - Mass Storage Support (gridified support due in EDG 2.0)
- We must maintain and strengthen joint Experiment/EDG work in the evaluation of system components AND the architecture (both will need to evolve; GRID developments are R&D)
  - Once middleware providers have done their 'unit tests', the applications must work with them in the areas of:
    - Performance evaluation for the user with increasing rates of job submission and data handling, and an expanding TB configuration
    - Streamlining procedures for feedback to middleware providers
- EDG should provide site validation and monitoring procedures
- EDG should provide good user tools for fault detection and diagnosis (what is the job status? why did it fail?)
Source: F. Harris, WP8 report, EDG review, Feb 2003

6 Objectives and achievements in Y2
- EDG Testbed evaluation
  - Data Replication
    - Level-1 orbit data distributed in 5 SEs (CERN, CNAF, LYON, NIKHEF, RAL)
    - 4,700 files of 15 MB each (70.5 GB) replicated; over 10,000 entries in the EO RC
  - Data Processing
    - Tests were made submitting jobs to process Level-1 data in batches of 1, 10, 20, 50 and 100 orbits at a time
    - Results strongly depend on Testbed stability; thousands of jobs submitted
      - Sometimes the TB performed very well (ALL jobs completed successfully)
      - Other times varying success rates were obtained (e.g. 10%, 30%, 80% of jobs successful)
  - Level-2 Products
    - Resulting products stored in CloseSEs; 4,205 orbits successfully processed
    - Physical locations on SEs registered in the Replica Catalog using WP2 middleware
    - Product metadata stored in EO Spitfire catalogues
  - Level-2 Product Retrieval and Validation
    - Integration of Testbed and EO components to produce an "end-to-end" GOME Processing and Validation chain has been successfully demonstrated
- Significant feedback to Developers, ITEAM & ATF (full details in D9.3 report)
Source: L. Fusco, WP9 report, EDG review, Feb 2003
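The replication volume quoted on this slide can be checked with a line of arithmetic. The sketch below assumes the slide's units are megabytes and gigabytes in the decimal sense (1 GB = 1000 MB), which the slide does not state; the function name is purely illustrative.

```python
# Figures quoted on the slide.
FILE_COUNT = 4_700
FILE_SIZE_MB = 15


def replicated_volume_gb(file_count: int, file_size_mb: float) -> float:
    """Total replicated volume in GB, assuming 1 GB = 1000 MB (an assumption)."""
    return file_count * file_size_mb / 1000


print(replicated_volume_gb(FILE_COUNT, FILE_SIZE_MB))  # 70.5
```

Under that assumption, 4,700 files of 15 MB each give the 70.5 GB reported.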

7 Difficulties in addressing the key challenges
- ……
- Scalability/Robustness problems on testbed1
  - Replica Manager scalability issues
  - Scalability issues when stressing the Resource Broker
  - Testbed instability related to the information service
- Need to make a very large scale test this year
  - Competition for testbed resources: HEP applications are often the highest priority
- Support for MPI-based job submission is vital for some applications
- Programmable APIs to access middleware services
- Support by the Storage Element for medical databases
- Ports to RH7 and 8 needed
Strong expectations on EDG 2.0 components
Source: V. Breton, WP10 report, EDG review, Feb 2003

8 Application Testbed Status
- Application testbed
  - Many incremental improvements between EDG 1.2 and the currently deployed EDG 1.4.x
  - Steady increase in the size of the testbed, up to a peak of approx. 1000 CPUs at 15 sites
  - Current limitations on usage:
    - Job submission: maximum of 50 concurrent jobs per RB
    - Storage: use no more than 50% of the shared storage at any particular site
    - Sandboxes: input and output sandboxes should only be used for small amounts of data
  - Reduced CERN participation
    - Services smoothly migrated to other sites; deployment expertise is growing!
  - The EDG 1.4.x software is frozen
    - The testbed is supported and security patches are deployed, but effort has been concentrated on producing EDG 2.0
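The two numeric usage limits on this slide can be expressed as a simple check. This is only an illustrative sketch: the `Usage` record and `limit_violations` function are hypothetical names, not part of any EDG tool, and the slide gives no threshold for sandbox size, so that limit is left out.

```python
from dataclasses import dataclass

# Limits quoted on the slide.
MAX_CONCURRENT_JOBS_PER_RB = 50
MAX_SHARED_STORAGE_FRACTION = 0.5


@dataclass
class Usage:
    """Hypothetical snapshot of one user's load on one RB and one site."""
    concurrent_jobs: int
    storage_used_gb: float
    shared_storage_gb: float


def limit_violations(usage: Usage) -> list[str]:
    """Return the names of the slide's limits that this usage exceeds."""
    violations = []
    if usage.concurrent_jobs > MAX_CONCURRENT_JOBS_PER_RB:
        violations.append("job submission")
    if usage.storage_used_gb > MAX_SHARED_STORAGE_FRACTION * usage.shared_storage_gb:
        violations.append("storage")
    return violations


print(limit_violations(Usage(60, 400.0, 1000.0)))  # ['job submission']
```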

9 Other Testbeds Status
- Development testbed
  - Being used for the integration of EDG 2.0
- Certification testbed (LCG)
  - CERN site being used for testing EDG 2.0
- Dissemination testbed
  - Running EDG 1.4
  - Created for dissemination demos and tutorials
  - See presentation and demo on Wednesday (open day)
Declared intention is to make a joint EDG/LCG production testbed using EDG 2.0
  - The status is currently pending on the deployment of EDG 2.0 and LCG planning
  - The plans will be clarified during this week

10 Current Use of Testbeds
[Diagram: flow of software from the WPs to the users through the stages Unit Test, Build, Integration, Certification and Production. Labels recovered from the diagram:]
- WPs: individual WP tests on WP-specific machines (*); WPs add unit-tested code to the CVS repository
- Build system: runs nightly build and automatic tests; tagged package
- Integration Team: integration and overall release tests on the Development testbed (*), ~15 CPUs; tagged releases, release candidates
- Test Group and application representatives: Grid certification and application certification on the Certification testbed (**), ~40 CPUs; tagged release selected for certification; certified releases
- Users: Production testbed (*), ~1000 CPUs; certified release selected for deployment; certified public release for use by applications
- Bugzilla anomaly reports; fix problems
- Availability labels: office hours, 24x7
(*) Current infrastructure
(**) With LCG
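The stage sequence in this diagram can be read as a small state machine. The sketch below is an interpretation, not the project's tooling: the stage names come from the slide, but the position of Integration between Build and Certification, and the rule that a failure sends the software back to the WPs' unit-test stage ("Fix problems"), are assumptions.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Stage names from the slide; their ordering here is an assumption."""
    UNIT_TEST = 1
    BUILD = 2
    INTEGRATION = 3
    CERTIFICATION = 4
    PRODUCTION = 5


def next_stage(stage: Stage, tests_passed: bool) -> Stage:
    """Advance one stage on success; on failure return to the WPs (assumed rule)."""
    if not tests_passed:
        return Stage.UNIT_TEST
    if stage is Stage.PRODUCTION:
        return Stage.PRODUCTION  # already deployed, nothing further to promote to
    return Stage(stage + 1)


# A release that passes every stage reaches production in four promotions.
stage = Stage.UNIT_TEST
for _ in range(4):
    stage = next_stage(stage, tests_passed=True)
print(stage.name)  # PRODUCTION
```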

11 EDG 2.0 release
- Intended content based on feedback from the application groups, addressing highest priority issues
  - Port to RH 7.3, Condor 6.4 & Globus 2.2 delivered via VDT
    - Fixes bugs and means we can be compatible with other grid projects
  - New data mgmt tools
    - Overcome scalability issues with basic replica catalog
  - New information service
    - Overcome scalability & performance issues found with MDS 2
  - Inter-operability with US grids via use of GLUE schema
  - Improved fabric mgmt & network monitoring tools
  - More scalable and reliable resource brokering and job submission chain
  - MPI support and job check-pointing
  - New common security features across most of the middleware
  - New backward compatible fine-grained authentication mechanism (VOMS)
  - New storage element service providing basic, consistent interface to disk and mass storage systems
The successful integration and deployment of a stable EDG 2.0 is the highest priority for the project
Progress on release plan constantly updated: https://edms.cern.ch/document/333297
Reviewed at the weekly WP mgrs meeting and followed up at the ITeam meeting

12 EDG 2.0 release progress I
- Port EDG 1.4 to RH 7.3 & LCFGng: Jan 2003
(2nd project review: Feb 4 & 5)
- Port EDG 1.4 to Globus 2.2 & Condor 6.4: Feb 2003
  - Globus & Condor representatives present at CERN
  - First use of VDT packaging of Condor & Globus
  - Issues with GSI and the Replica Catalog meant we could not run tests to verify that the port was successfully completed at the end of this slot
  - Old Data Mgmt tools dropped and GSI issues investigated in parallel
- Start integration of new EDG software: March 2003
Good support was received from VDT, but due to time constraints we had to start integrating new software without having a tagged and fully working release running on RH7 with GT2.2 & Condor 6.4

13 EDG 2.0 release progress II
- Dates at which new software was introduced
  - R-GMA: March 3
  - RLS: March 10
  - Storage Element: March 17
  - Network Cost function: March 24
  - Reptor: March 31
  - RMS: April 7
  - Update GLUE schema: April 14
  - Resource Broker: April 24
- Updates to VDT and fixes for problems found in earlier slots continued in parallel
- Tagged intermediate versions given to LCG for testing
  - March 18, April 7 and April 11
- Software now frozen; only bug fixes to be added
  - VOMS and accounting will be included after a stable EDG 2.0 has been deployed
All software received, but installation, configuration and cross-WP integration issues are currently blocking deployment and testing activities
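The introduction dates on this slide imply a slot length for each component. The sketch below computes the gap in days between consecutive introductions; it assumes all dates fall in 2003 (the slide gives only day and month), and the function name is illustrative.

```python
from datetime import date

# Dates from the slide; the year 2003 is an assumption based on the conference date.
INTRODUCTION_DATES = {
    "R-GMA": date(2003, 3, 3),
    "RLS": date(2003, 3, 10),
    "Storage Element": date(2003, 3, 17),
    "Network Cost function": date(2003, 3, 24),
    "Reptor": date(2003, 3, 31),
    "RMS": date(2003, 4, 7),
    "Update GLUE schema": date(2003, 4, 14),
    "Resource Broker": date(2003, 4, 24),
}


def days_since_previous(dates: dict[str, date]) -> list[tuple[str, int]]:
    """For each component after the first, days elapsed since the previous introduction."""
    ordered = sorted(dates.items(), key=lambda item: item[1])
    return [
        (later_name, (later_date - earlier_date).days)
        for (_, earlier_date), (later_name, later_date) in zip(ordered, ordered[1:])
    ]


for name, gap in days_since_previous(INTRODUCTION_DATES):
    print(f"{name}: {gap} days after the previous component")
```

Under that assumption every component arrived one week after the previous one, except the Resource Broker, which came ten days after the GLUE schema update.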

14 Release Procedure Review I
- We were not able to finish each slot with a working version of the software
  - This was due to different reasons, some of which are shown below
  - Clearly, waiting an unknown amount of time for a fix while trying to keep to the schedule has put a lot of pressure on the plan
- Autobuild now used
  - All software for EDG 2.0 is in CVS
  - Nightly builds and build-on-demand facilities heavily used
    - Any service failures cause a delay in verifying fixes
  - Far fewer build-related issues reported by ITeam
  - Support for this essential service is being increased by WP6
- Bugzilla now used more systematically
  - Shown to work between ITeam, LCG and VDT
  - Need to be more strict: only make changes for registered bugs and nothing else

15 Release Procedure Review II
- Deadlines and entrance criteria for allocated slots not always respected
  - Many configuration and installation problems found by ITeam and LCG
    - Make more use of the test LCFGng set-up
    - Doing a full installation (not just an upgrade) for candidate tags will catch more faults
  - Software and documentation not always complete
  - Unit testing not sufficient: less than defined in the WP-specific test plans
  - More effort needs to be dedicated to integration & deployment activities by the mware WPs
  - Still need to do more for auto-install and auto-testing
- Scheduled release plan
  - Very tight schedule with no contingency
  - Tried to allow for public holidays during the period
    - But some national holidays were not taken into account
Some procedure changes have already been made, but more will be introduced based on experience and feedback gathered at this conference

16 Quality Group
- The Quality Group (QAG) was created in August 2002 with a Quality representative (QAR) from each WP. The QAR ensures the measures are applied inside his/her WP. Chaired by Gabriel Zaquine.
- http://www.eu-datagrid.org/QAG/
- The Quality Group has produced an EDG developers' guide document
  - The document gives an overview of the tools available and conventions to be followed for software development within EDG:
    - Packaging, code management, automatic build system, environment, interfaces and APIs, documentation
    - Test and validation process, integration procedure, style and naming conventions
  - http://edms.cern.ch/document/358824
- Work on EDG 2.0 shows that the conventions are not yet being followed by everyone
All developers must read this document and ensure their software complies

17 Architecture Group
- ATF has been working to clarify the details of the interactions and interfaces of EDG 2.0
  - Continues to meet on a monthly basis: http://agenda.cern.ch/displayLevel.php?fid=3l148
  - Work driven by use cases provided by the application representatives
  - A document describing the architecture for EDG 2.0 has been produced: https://edms.cern.ch/file/368971/
- ATF has been further empowered to "own" the external interfaces
  - Intended to avoid discrepancies between the interface details agreed by ATF and those found in the software delivered by the mware WPs
    - Baseline document with interface definitions now in preparation
Mware WPs: please make sure ATF have the APIs for your external interfaces

18 Summary
- The re-focussing of the project on production has improved the manner in which we build and support the software
- The application testbed has reached the highest level of maturity that can be achieved using the currently available grid middleware and supporting manpower
- We have well-defined guidelines and procedures for the most important aspects of our work
  - The hard point is to follow them, but we must do this systematically
- Points to be addressed at this conference
  - Do everything possible to ensure a stable EDG 2.0 is deployed this month
  - Plan the large-scale deployment and support of EDG 2.0 through the summer
  - Further improve our software process in preparation for September
  - Clarify future usage and support of the testbeds

