DDM-Panda Issues Kaushik De University of Texas At Arlington DDM Workshop, BNL September 29, 2006.

1 DDM-Panda Issues Kaushik De University of Texas At Arlington DDM Workshop, BNL September 29, 2006

2 First – a Reminder
- Panda was designed to minimally depend on robustness of external middleware
- This does not apply to DDM – Panda fully depends on and takes advantage of all DQ2 capabilities
- Panda was the first ATLAS executor to use DQ2 for production – 6 months before the LCG (still not fully done)
- As you saw from Torre’s talk – Panda subscribes to thousands of datasets weekly, and BNL catalog holds more than a million records – leader in DQ2 deployment and use
- Panda-DDM is often used as an example of success in ATLAS – keep this up!

3 Some Open Issues
- Deployment – need to do better (we have 19 installations)
  - You have already heard from many speakers – no more to say
- Data and catalog consistency and cleanup – next few slides
- Use output file callback in Panda
- Performance and monitoring issues
- Alexei reminded us – also need a script to clean up obsolete datasets

4 Data Transfer Robustness
- Need more robustness in DQ2 to recover from failures
  - Examples in Alexei’s talk (but Panda usually had fewer problems)
  - Important Panda issue: DQ2 should never give up on subscriptions
  - But don’t kill site services because of retries – tricky balancing act!
  - Force (email) human intervention if impossible to transfer file
  - … this is normal hardening process – will continue
- In the meantime, production must continue
  - Need to increase production rate by factor of 10 by summer 2007
  - In addition, there will always be some unavoidable error conditions
  - We also need to do site cleanup of SE (cache turnover)
  - Also, delete old temporary Panda datasets (safely): cron script
- So – some post-DQ2 cleanup will always be necessary
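The balancing act on this slide (never give up on a subscription, do not overload site services with retries, and pull in a human when a file cannot be transferred) can be illustrated with a capped exponential backoff. This is only a minimal sketch, not DQ2 or Panda code: `backoff_delays`, `retry_transfer`, the `transfer` and `notify` callables, and all default values are hypothetical names and numbers chosen for illustration.

```python
import time


def backoff_delays(base=60.0, factor=2.0, cap=3600.0):
    """Yield an endless sequence of retry delays (seconds), never above `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def retry_transfer(transfer, notify, escalate_after=5, sleep=time.sleep):
    """Retry `transfer` until it succeeds; never give up.

    transfer: hypothetical callable returning True on success.
    notify:   hypothetical callable (e.g. sends an email) invoked once,
              after `escalate_after` consecutive failures.
    Returns the number of attempts that were needed.
    """
    attempts = 0
    for delay in backoff_delays():
        attempts += 1
        if transfer():
            return attempts
        if attempts == escalate_after:
            notify(attempts)  # ask for human intervention, but keep retrying
        sleep(delay)  # the capped delay keeps load on site services bounded
```

The cap bounds how often a permanently failing transfer is retried, while the single notification avoids flooding operators with mail; both thresholds would need tuning against real site behaviour.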

5 Proposal for DDM cleanup
- Check and repair consistency of site catalogs
  - Script 1: re-register in local LRC all files found on local SE that are registered in DQ2 central catalog, but not at BNL T1
    - Marco is working on this script, based on scripts written by Patrick and Wensheng – need to run as cron job at every site when stable
  - Script 2: move old missing files to BNL periodically
    - Cron job run by Wensheng – need to define ‘old’
  - Script 3: safely clean up SE space when getting full
    - Wensheng’s script works well – sites should take over running it
- Keep log of all post-DQ2 repairs – feed back to developers so that DQ2 can improve based on real experience (feed back into monitoring?)
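The logic of Script 1 and Script 3 can be sketched with plain set operations and an oldest-first selection. This is not the code written by Marco, Patrick, or Wensheng. It assumes one reading of the slide: a file needs re-registration when it is on the local SE and in the DQ2 central catalog, but is neither at BNL nor already in the local LRC. The function names, the `(name, size, last_access)` tuples, and the 90%/70% watermarks are all hypothetical.

```python
def files_to_reregister(on_se, in_central, in_lrc, at_bnl):
    """Script 1 sketch: files on the local SE and known to the central
    catalog, but absent from BNL and missing from the local LRC."""
    candidates = (set(on_se) & set(in_central)) - set(at_bnl)
    return sorted(candidates - set(in_lrc))


def select_for_deletion(files, used, capacity, high=0.9, low=0.7,
                        protected=frozenset()):
    """Script 3 sketch: when usage exceeds the `high` watermark, pick the
    least recently accessed unprotected files until usage drops to `low`.

    files: iterable of (name, size, last_access) tuples (hypothetical layout).
    Returns the list of file names selected for deletion.
    """
    if used <= high * capacity:
        return []  # SE is not getting full; nothing to do
    target = low * capacity
    selected = []
    for name, size, _ in sorted(files, key=lambda f: f[2]):
        if used <= target:
            break
        if name in protected:
            continue  # never delete data the site must keep
        selected.append(name)
        used -= size
    return selected
```

Returning the selection instead of deleting directly lets a site log every repair before acting on it, in line with the suggestion to keep a record of post-DQ2 repairs.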

6 Site Responsibilities
- Sites, sites, sites!
- Important difference between OSG sites and LCG sites – site managers have always been proactive within U.S. DDM
  - Probably reflected in T1/T2 test results!
- Need to keep this up – sites should check DDM monitor daily: http://panda.atlascomp.org/?dash=prod&redirect=pandamon
- Site is responsible for maintaining local storage element and keeping various services up and running
- Sites should protect data in storage elements
- Some of our recent DDM problems have been site specific – need DDM operations to help (and often to fix mistakes)

7 Output Callbacks
- Converging on a solution – latest proposal by Torre
  - Add new Panda job state - ‘transferring’
  - Enable callback for output subscription blocks
  - Panda will change ‘transferring’ -> ‘finished’ when callback received
- Pros:
  - Better tracking of output file transfers through Panda
  - Production team can identify and report problems
- Cons:
  - Jobs may remain un-finished, even though file is available at T2 (physicist can get file through DQ2 – but job not in finished state)
  - Panda queue may grow very large
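The proposed state change can be pictured as a small state machine. The sketch below is illustrative only and is not Panda code: the `Job` class, its method names, and the ‘running’ state are assumptions; only ‘transferring’ and ‘finished’ come from the slide.

```python
class Job:
    """Hypothetical job record tracking outstanding output blocks."""

    def __init__(self, job_id, output_blocks):
        self.job_id = job_id
        self.state = "running"  # assumed initial state, not from the slide
        self.pending_blocks = set(output_blocks)

    def payload_done(self):
        """Payload finished: wait for output transfers if any are pending."""
        self.state = "transferring" if self.pending_blocks else "finished"

    def on_callback(self, block):
        """Callback for one output subscription block that has arrived."""
        self.pending_blocks.discard(block)
        if self.state == "transferring" and not self.pending_blocks:
            self.state = "finished"


def stuck_jobs(jobs):
    """Jobs still waiting for a callback; this is the queue that may grow."""
    return [job.job_id for job in jobs if job.state == "transferring"]
```

A job whose callback never arrives stays in ‘transferring’ indefinitely, which is the drawback listed under Cons; a listing such as `stuck_jobs` is one way the production team could identify and report those cases.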

8 Live Examples
http://panda.atlascomp.org/?dash=prod&reload=yes

