BaBar Status Report. Chris Brew, GridPP16, QMUL, 28/06/2006.

1 BaBar Status Report
Chris Brew, GridPP16, QMUL, 28/06/2006

2 Outline
Three BaBar Grid projects:
–Monte Carlo (Simulation) Production
–Skimming
–User Analysis: easyGrid, bbrbsub
Overall experience with the Grid
Conclusion

3 Usual Guff
BaBar is a running experiment, situated at SLAC near San Francisco
e+e− collider tuned to investigate CP violation in B physics
Started taking data in 1999/2000; currently has 350 fb⁻¹ of data
Projected to have 1000 fb⁻¹ by end of 2008

4 Data Flow
[Diagram. Sites: Tier 0 (SLAC), Tier 1 (RAL), Large Tier 2s, Tier 2s. Stages: Simulation Production, Skimming, Analysis, Merging]

5 Simulation Production
Running at M/Cr, RAL, RALPP and B'ham
–Tests at Lancs, Oxford + others
–Still working to add other BaBar sites
–Limited by need to install Objy DB at each site
Stable running: 500,000,000 events produced, 12% of worldwide total
New R-GMA based job monitor: status query down from 45 minutes to 5 minutes
Recent hiatus due to bugs found in BaBar simulation code, which caused a global halt; production has recently restarted
C. Brew, G. Castelli


7 Skimming
New Grid project: process real and simulated data to select ~200 subsamples, defined by the BaBar physics analysis working groups
–Much quicker to run over a skim than over the full data sample
–Skimming includes physics analysis code and saves the results, so CPU time spent in skimming is regained many times over
Plan is to run at one or more large T2s
If we can get this into production we should be able to recover some of the UK's Common Fund rebate that we've lost due to lack of T1 resources
GridPP has funded three months of effort from Will Roethel to further this work
G. Castelli, W. Roethel, C. Brew

8 Status of Skimming
Prepare code to be installed on grid: Done
Modify BaBar framework to read data out of dCache and RFIO: Working, starting load and stability testing
Develop tools for copying and managing data on Storage Elements: Under development (PhEDEx?)
Integration with BaBar Task Management software:
–Task DB Creation: Done
–Task List Creation: Works
–Job Creation: Works
–Local Job Submission: Works
–Grid Job Submission: Works
–Job Monitoring: In progress, should be able to reuse code from SP Tools
–Job Recovery
–Job Output Checking: In progress
–Data Merging: Not Started

9 User Analysis (easyGrid)
Prototype running on the Manchester testbed (80 CPUs) since Nov 2005 without problems
Real analysis with real data by real users who know nothing about the grid
No errors in easyGrid job submission
No errors on the grid testbed, thanks to installation, configuration and improvements
J. Werner

10 Many problems encountered moving from Testbed to Production Grid Resources
–Errors in RB, CE, etc.: 10% of the time, with a submission rate of less than 4 jobs/second
–Errors in BDII, SE, dCache: SE fails 40% of jobs (with fewer than 100 jobs in parallel)
–When the SE works, performance is terrible (approx. 8 times longer to run the same software)
–Lack of response to problems from site admins
Serious issue for a typical user analysis, which is about 2000 jobs of 8 CPU hours each
Product development will be resumed when resources are available and reliable
Meanwhile, the easyGrid prototype and M/Cr testbed will continue to serve users
For more information:
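
To give a feel for what the figures on this slide mean for one analysis, here is a rough back-of-envelope sketch in Python. It is not from the talk. It assumes, purely for illustration, that the 40% SE failure rate applies independently to every attempt, that failed jobs are resubmitted until they succeed, and that the 8x slowdown applies to every job. The function name `analysis_cost` is hypothetical.

```python
def analysis_cost(jobs=2000, hours_per_job=8.0, se_failure_rate=0.40, slowdown=8.0):
    """Rough cost of one user analysis, using the figures quoted on the slide.

    Assumptions (not from the talk): failures are independent per attempt,
    failed jobs are retried until they succeed, and the slowdown is uniform.
    """
    # CPU hours if every job ran once at normal speed.
    nominal_hours = jobs * hours_per_job
    # Jobs expected to fail on the first pass.
    first_pass_failures = jobs * se_failure_rate
    # With independent retries, the expected attempts per job are 1 / (1 - p).
    expected_attempts = jobs / (1.0 - se_failure_rate)
    # Run time if every job takes `slowdown` times longer than normal.
    slowed_hours = nominal_hours * slowdown
    return {
        "nominal_hours": nominal_hours,
        "first_pass_failures": first_pass_failures,
        "expected_attempts": expected_attempts,
        "slowed_hours": slowed_hours,
    }


if __name__ == "__main__":
    for name, value in analysis_cost().items():
        print(f"{name}: {value:.0f}")
```

Under these assumptions a 16,000 CPU-hour analysis loses about 800 jobs on the first pass, and the slowdown alone multiplies its run time eightfold.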

11 User Analysis (bbrbsub)
Integration of Simple Job Manager + bbrbsub with Grid submission
Take the tools already used by analysis users to submit jobs at RAL
Transparently add RAL -> RAL grid submission
Add RAL -> M/Cr and M/Cr -> RAL submission capabilities
Add RAL -> RALPP and M/Cr -> RALPP
Gradually build up full grid functionality
–Application transport and configuration
–Automatic output recovery
–Job to data matching
G. Castelli
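
As an illustration of how the submission routes listed on this slide could combine with job-to-data matching, here is a minimal Python sketch. It does not show how bbrbsub works. The route table simply mirrors the site pairs named on the slide, while the function `choose_site`, the replica catalogue and the dataset name are hypothetical.

```python
# Submission routes named on the slide, as (submitting site, executing site) pairs.
ALLOWED_ROUTES = {
    ("RAL", "RAL"),
    ("RAL", "M/Cr"),
    ("M/Cr", "RAL"),
    ("RAL", "RALPP"),
    ("M/Cr", "RALPP"),
}


def choose_site(submit_site, dataset, replicas):
    """Return the first site that holds `dataset` and is reachable from
    `submit_site`, or None if no such site exists.

    `replicas` is a hypothetical catalogue mapping dataset name to the
    list of sites that hold a copy.
    """
    for site in replicas.get(dataset, []):
        if (submit_site, site) in ALLOWED_ROUTES:
            return site
    return None


if __name__ == "__main__":
    catalogue = {"example-skim": ["M/Cr", "RALPP"]}  # illustrative only
    print(choose_site("RAL", "example-skim", catalogue))
    print(choose_site("M/Cr", "example-skim", catalogue))
```

Keeping the routes in a plain set makes the staged rollout on the slide a matter of adding pairs as each new route is commissioned.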

12 Overall Grid Experience
Grid is still not reliable (worst test run):
RAL to RAL Successful Job Rate
Grid: <50%
PBS: >99%
SP running seems to indicate that the Grid isn't getting more reliable and may be getting less so; long-term efficiency is stuck around 80%:
–RB problems (we have the capability to use multiple RBs, but efficiency drops because of lack of failover)
–Central LFC problems
–BDII problems: sites drop in and out of the BDII
–SE problems: files randomly fail to upload or download
We could previously run for 1-2 weeks at a time with minimal intervention; it now seems to need daily (or more frequent) interventions
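
The failover that the slide says is missing when using multiple RBs can be sketched as follows. This is a generic pattern in Python, not the BaBar or LCG submission code. The function `submit_with_failover` and the `submit` callable passed to it are hypothetical stand-ins for whatever client actually talks to a Resource Broker.

```python
def submit_with_failover(job, brokers, submit):
    """Try each Resource Broker in turn and return (broker, job_id) from the
    first one that accepts the job.

    `submit(broker, job)` is a hypothetical callable that returns a job id on
    success and raises an exception on failure.
    """
    errors = []
    for broker in brokers:
        try:
            return broker, submit(broker, job)
        except Exception as exc:  # a real client would catch a narrower error type
            errors.append((broker, exc))
    raise RuntimeError(f"all brokers failed: {errors}")


if __name__ == "__main__":
    def fake_submit(broker, job):
        # Stand-in for a real submission call: the first broker is "down".
        if broker == "rb1":
            raise ConnectionError("rb1 unavailable")
        return f"{broker}-{job}"

    print(submit_with_failover("job42", ["rb1", "rb2"], fake_submit))
```

Without a loop of this kind, a job sent to a failed RB is simply lost, which is one way to read the efficiency drop the slide describes.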

13 Conclusions
BaBar has made good progress on moving its three main offline compute-intensive processes to the Grid
Monte Carlo generation is in production; significant progress has been made in skimming and user analysis
There are many things we like about the grid
We are adapting the BaBar software framework to integrate better with the grid: the dependence on Objectivity will be removed, and we are adding the ability to read data directly from Storage Elements
However, reliability and ease of use are still big issues
