Summary of Bookkeeping discussions at RAL Workshop
Tim Adye, Rutherford Appleton Laboratory
Kanga Phone Meeting, 22nd January 2003
RAL Workshop
- Two half-day parallel sessions
  - Monday afternoon: presentations from Adil, Jean-Yves, Andy, Alessandra, Gregory, Alessio, and myself
  - Tuesday afternoon: discussion
    - Joined by the other parallel (event store) at the end
- See presentations here: Computing/Distributed/workshops/Jan2003/
- I summarise the Tuesday discussion session
  - Andy took the minutes, so these notes are just my own memory/interpretation
  - Andy should send out notes tomorrow
CMWG2 recommendations
- Many CMWG2 recommendations. One was that we develop a general framework for dataset management
  - Persuasively presented by Gregory
  - Generic enough to be of interest to other experiments?
  - We should try to work with others (and recruit effort!), but BaBar should lead (due to our shorter timescales)
- Hopefully this can be built “on top of” SkimTools.
Technical decisions
- Will start a new SkimTools package, borrowing code from the old one.
- Decided to support only Oracle and MySQL, but encourage people to maintain ODBC compliance wherever possible.
- Stick to Perl wherever possible.
Planning decisions
- Identified 3.5 FTE (~0.5 FTE x 7 people): Alessandra, Douglas, Jacek, Antonio, Martino, Paul Jackson, Tim
- Two-stage plan (stages can go in parallel):
  - (Stage 0: immediately required changes to existing SkimTools)
  - Stage 1: new SkimTools to handle immediate requirements of the new model and user requests
    - Come up with use cases in each area:
      - Alessandra: skimData
      - Tim: data distribution
      - …
  - Stage 2: CMWG2’s dataset management framework
File size considerations (1)
- It would be very useful to try to maintain reasonably large file sizes
  - More efficient for analysis job access
  - Simpler for archiving
- Archiving: mass-store systems (HPSS etc.) have problems with
  - too many files: catalogue problems
  - files that are too small: overhead per GB is larger
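The two archiving problems above can be put in rough numbers. The following is a minimal sketch only: the 1 TB sample size and the fixed cost of 1 second per file are illustrative assumptions, not figures from the workshop, and the function names are hypothetical.

```python
# Illustrative arithmetic only: the per-file cost below is an assumed
# placeholder, not a measured HPSS number.
MB = 10**6
GB = 10**9
TB = 10**12


def files_needed(total_bytes, file_size_bytes):
    """Number of files (catalogue entries) needed to hold total_bytes."""
    return -(-total_bytes // file_size_bytes)  # ceiling division


def overhead_per_gb(file_size_bytes, per_file_overhead_s):
    """Fixed per-file cost, amortised over 1 GB of stored data."""
    files_per_gb = GB / file_size_bytes
    return files_per_gb * per_file_overhead_s


for size in (2 * MB, 200 * MB):
    print(size // MB, "MB files:",
          files_needed(TB, size), "files per TB,",
          overhead_per_gb(size, 1.0), "s overhead per GB")
```

Under these assumptions, moving from 2 MB to 200 MB files cuts both the catalogue entry count and the per-GB overhead by a factor of 100.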
File size considerations (2)
- Figure of merit: ~200 MB
- If many files are smaller than this, then we would need to start blobbing files together (e.g. with tar) for HPSS
  - This is not trivial to manage
- Should be able to merge runs for SP and skims
- Most OPR output files should be > 200 MB
  - Teela agreed to make a ballpark estimate to check this
- Hope to hold off implementing mass-store blobbing until needed
  - System must allow for the possibility of introducing it later
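If blobbing does become necessary, the grouping step might look something like the sketch below. It is not a design that was agreed at the workshop: the function name `plan_blobs`, the greedy in-order grouping, and the sample file names and sizes are all assumptions made for illustration. Only the ~200 MB threshold comes from the slide.

```python
TARGET_BYTES = 200 * 10**6  # ~200 MB figure of merit from the slide


def plan_blobs(files, target=TARGET_BYTES):
    """Plan which small files to bundle together before archiving.

    files  -- list of (name, size_in_bytes) tuples
    target -- minimum size a file or blob should reach

    Returns (standalone, blobs): names of files already large enough,
    and a list of name-lists, one per planned blob.
    """
    standalone = [name for name, size in files if size >= target]
    blobs, current, current_size = [], [], 0
    for name, size in files:
        if size >= target:
            continue  # large enough to archive on its own
        current.append(name)
        current_size += size
        if current_size >= target:
            blobs.append(current)  # blob has reached the target size
            current, current_size = [], 0
    if current:
        blobs.append(current)  # last blob may still be below the target
    return standalone, blobs


# Hypothetical file names and sizes, for illustration only.
sample = [("run1.root", 250 * 10**6), ("run2.root", 120 * 10**6),
          ("run3.root", 90 * 10**6), ("run4.root", 50 * 10**6)]
standalone, blobs = plan_blobs(sample)
print(standalone)
print(blobs)
```

Each planned blob could then be written with tar. The bookkeeping would also have to record which blob holds each file, which is part of why this is not trivial to manage.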