Dave Newbold, University of Bristol, GridPP Middleware Meeting: ‘Real World’ issues from DC04. DC04: Trying to operate the CMS computing system at 25Hz for.


1 Dave Newbold, University of Bristol, GridPP Middleware Meeting
‘Real World’ issues from DC04
DC04: Trying to operate the CMS computing system at 25 Hz for one month
  We are three days in!
  We are using components that are ready NOW
    Even if it’s not politically correct
    Often using several different approaches for comparison
This talk: concentrates on data management issues
  ‘Real World’ issues that have come up during DC04 preparation
  Stuff that is not (yet) well covered by the available tools
I know that…
  Some issues may be application problems, not middleware ones
  Some issues may be covered by components under development
  Some issues may be self-inflicted injuries

2 Directed data transfer
Data management ‘type I’: replica management
  The (automatic?) movement of data products to where they are needed; managing relevant system and application metadata
  Best-effort optimisation of data location in response to dynamic workload needs
  Well covered by current and future middleware
Data transfer ‘type II’: bulk data management
  The predictable straight(ish) ‘production line’ of data flow
    Detector -> DAQ -> Buffer -> Reco farm -> T1 -> MSS -> calib -> …
  Requirements are different to replica management
    Robustness and reliability paramount (raw data is the ‘crown jewels’)
    Throughput is very important: ‘best effort’ is not good enough
    Not explicitly addressed by current middleware products
  Data distribution is explicitly ‘directed’ by policy
    ‘Seeds’ the replica management system from the Tier-1s

3 Directed data transfer
Our current solution
  Cooperating system of simple ‘agents’ at Tier-0 and Tier-1
  They communicate only through a shared (Oracle) DB
  They have little or no state: it’s all held in the central DB
  Could this be useful as generic middleware?
Other related issues:
  Lack of a single consistent interface to MSS (in Europe and US) makes life difficult (being addressed?)
  There are very many failure modes in the data management system that we must think of…
  Would be good to factorise out the problems of failing storage components by having the MSS ‘remap’ our data when required
    Predict at least one disk failure per day somewhere in DC04
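The agent pattern described on this slide (stateless agents that coordinate only through a shared database) might look roughly like the sketch below. It is not the DC04 implementation: the table layout, the state names, and every function name are assumptions made for illustration, and an in-memory SQLite database stands in for the shared Oracle DB.

```python
import sqlite3


def make_db():
    """Create the shared table. SQLite here is only a stand-in for the central DB."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE transfers ("
        " guid TEXT PRIMARY KEY,"
        " destination TEXT NOT NULL,"
        " state TEXT NOT NULL DEFAULT 'new',"
        " owner TEXT)"
    )
    return db


def publish(db, guid, destination):
    """Tier-0 side (hypothetical): announce that a file is ready for one Tier-1."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO transfers (guid, destination) VALUES (?, ?)",
            (guid, destination),
        )


def claim_next(db, agent, destination):
    """Tier-1 side (hypothetical): claim one pending file, or return None."""
    with db:
        row = db.execute(
            "SELECT guid FROM transfers"
            " WHERE destination = ? AND state = 'new'"
            " ORDER BY guid LIMIT 1",
            (destination,),
        ).fetchone()
        if row is None:
            return None
        # The state check in the WHERE clause guards against a competing agent
        # having claimed the same row in the meantime.
        cur = db.execute(
            "UPDATE transfers SET state = 'claimed', owner = ?"
            " WHERE guid = ? AND state = 'new'",
            (agent, row[0]),
        )
        return row[0] if cur.rowcount == 1 else None


def mark_done(db, guid):
    """Record a finished transfer; the agent itself keeps no memory of it."""
    with db:
        db.execute(
            "UPDATE transfers SET state = 'done'"
            " WHERE guid = ? AND state = 'claimed'",
            (guid,),
        )


def state_of(db, guid):
    """Look up the current state of one file."""
    row = db.execute(
        "SELECT state FROM transfers WHERE guid = ?", (guid,)
    ).fetchone()
    return row[0] if row else None
```

Because all state lives in the table, an agent that crashes can simply be restarted and will resume from whatever the database says is still pending.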

4 Data transfer tools
Need low-level transfer tools that:
  Log what is going on! (We have ad-hoc solutions here for DC04)
  Adjust policy automatically for optimum throughput according to network conditions
  Fail gracefully when something is wrong at an end-point
  Play nice with firewalls, etc.
  NB: performance is not currently the problem, but the tools are…
Checksumming
  We would like a system that performs fast file-level checksum of data ON THE DISK
    No, TCP checksum does not catch all errors
    Silent disk problems, filesystem errors, NFS problems, etc.
  Checksumming data from MSS after-the-fact is very difficult
Would also like:
  Some SIMPLE means of distributed, authenticated, atomic, reliable message-passing between agents over the Grid
  With a command-line level API for scripting
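As a rough illustration of the file-level checksum asked for above, the sketch below reads a file back from disk in fixed-size chunks and compares digests between a source and its copy. The function names, the choice of SHA-256, and the chunk size are assumptions for illustration, not anything specified in the talk.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Digest the bytes actually stored on disk, one chunk at a time."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF,
        # so memory use stays bounded regardless of file size.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_path, copy_path, algorithm="sha256"):
    """Return True when both files hash to the same value."""
    return file_checksum(source_path, algorithm) == file_checksum(copy_path, algorithm)
```

Computing the digest from the stored file, rather than from the stream during transfer, is what lets a check like this notice corruption introduced after the bytes left the network.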

5 Other issues…
Small files!
  They seem to be inevitable, but play havoc with efficiency:
    Huge lists of files in catalogues
    Not dealt with efficiently by MSS, transfer tools, etc.
  Basic unit of information management: data produced by one MC, reco, filter job during its run (with unique GUID)
  Do not want to make jobs too long… (too much state in the system)
  Can aggregation help? Perhaps, but we need the tools
Metadata
  Currently a ‘hot topic’?
  How to handle efficient distribution of system- and user-level metadata?
  Which metadata are immutable after creation? Which need to be distributed widely?
  How to handle schema extension on per-user basis?
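One way aggregation of small files could work is sketched below: many small per-job outputs are packed into a single archive, with a manifest that keeps each original GUID addressable. The use of tar, the manifest format, and the function names are assumptions for illustration only; the talk does not name a tool.

```python
import io
import tarfile


def aggregate(members, archive_path):
    """Pack small payloads into one uncompressed tar file.

    members maps a GUID string to that file's bytes. Returns a manifest of
    (guid, size) pairs so a catalogue could track one archive instead of many files.
    """
    manifest = []
    with tarfile.open(archive_path, "w") as archive:
        for guid in sorted(members):
            payload = members[guid]
            info = tarfile.TarInfo(name=guid)
            info.size = len(payload)  # addfile() reads exactly this many bytes
            archive.addfile(info, io.BytesIO(payload))
            manifest.append((guid, len(payload)))
    return manifest


def extract_one(archive_path, guid):
    """Read a single member back by GUID without unpacking the whole archive."""
    with tarfile.open(archive_path, "r") as archive:
        handle = archive.extractfile(guid)  # raises KeyError for an unknown GUID
        return handle.read()
```

The trade-off is that the catalogue and the MSS see one large file, while any tool that needs an individual job output must know which archive holds it.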

