1
DCC Workshop Input from Computing Coordination
Simone Campana, Torre Wenaus – ATLAS Week, 13/03/17
2
Introduction
The mandate for this talk: “We thought that this could be a blue-skies talk looking at Run 3 and beyond: what does and doesn’t scale, event streaming services, etc. How could we reimagine data management and production systems interacting in the future? What granularity will we need to control data processing for HPC workflows?” … and so this is (not) what I will talk about.
3
Introduction
We are defining the roadmap for a Run-4 computing model. We favour an adiabatic approach, where new concepts and components are prototyped and evaluated in the context of the existing ecosystem. Such components can go to production much earlier than Run 4, or be abandoned. This is how we think we should do R&D.
Data Curation and Characterization (DCC) plays a central role in the model we are defining.
4
Axioms (https://en.wikipedia.org/wiki/Axiom)
The granularity of our data processing is the event: we process one event at a time. We organize events in files because that is what file systems support, files in datasets because it is practical, and datasets in containers to characterize data.
Most of our data is “cold”: we write it once and access it O(10) times, peaked in time. Treating all data as equally “hot” has a cost; treating all metadata as “hot” has a cost.
Most of our data is reproducible: RAW data is not, but all the rest is. However, “very complicated, labor intensive, organizationally expensive, error prone to reproduce” == “irreproducible”. Treating all data as irreproducible has a cost.
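The hot/cold and reproducible/irreproducible axioms above imply a tiering policy. The sketch below illustrates one such policy as a minimal, hypothetical example; the dataset names, the 90-day access counter, the threshold of 10 accesses, and the tier names are all assumptions for illustration, not the ATLAS policy.

```python
from dataclasses import dataclass

@dataclass
class DatasetInfo:
    name: str
    accesses_last_90d: int   # hypothetical access counter
    reproducible: bool       # False only for RAW data

def storage_tier(ds: DatasetInfo, hot_threshold: int = 10) -> str:
    """Pick a storage tier. Illustrative policy only."""
    if not ds.reproducible:
        return "tape+disk"   # irreproducible (RAW): custodial copies
    if ds.accesses_last_90d >= hot_threshold:
        return "disk"        # hot: keep on low-latency storage
    return "tape"            # cold: archive; reproduce or recall on demand

print(storage_tier(DatasetInfo("data17.RAW", 2, reproducible=False)))  # tape+disk
print(storage_tier(DatasetInfo("mc16.AOD", 25, reproducible=True)))    # disk
print(storage_tier(DatasetInfo("mc16.DAOD", 1, reproducible=True)))    # tape
```

The point of the axiom is visible in the code: only the `reproducible=False` branch forces expensive custodial treatment; everything else can be tiered by observed access pattern.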
5
Events
The event service already works with fine-grained data at the event (range) level. Presently this fine granularity is short-lived. DCC could provide a scalable, flexible, extensible means of recording such fine-grained information (quasi) persistently.
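To make “recording fine-grained information (quasi) persistently” concrete, here is a minimal sketch of an event-range record and a bookkeeping ledger. The field names (`dataset`, `filename`, `first_event`, `last_event`) and the `RangeLedger` class are hypothetical, chosen only to illustrate the granularity involved.

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class EventRange:
    """Fine-grained unit of processing; fields are illustrative."""
    dataset: str
    filename: str
    first_event: int
    last_event: int

class RangeLedger:
    """Quasi-persistent record of processed ranges, with arbitrary
    bookkeeping (status, site, timestamp) attached per range."""
    def __init__(self):
        self._records = {}

    def record(self, r: EventRange, **info):
        self._records.setdefault(r, {}).update(info, ts=time.time())

    def lookup(self, r: EventRange) -> dict:
        return dict(self._records.get(r, {}))

ledger = RangeLedger()
r = EventRange("mc16.AOD", "file_001.root", 1, 100)
ledger.record(r, status="done", site="CERN-T0")
print(ledger.lookup(r)["status"])  # done
```

A frozen dataclass is hashable, so the range itself can serve as the key; in a production system the ledger backend would of course be a persistent store rather than an in-memory dict.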
6
Event Streaming and Caching
The Event Streaming Service would complement the Event Service, delivering data asynchronously for processing at fine granularity: events or event collections. A first implementation could provide client-side prefetch. In a more sophisticated scenario, a central intelligence mediates the dialog between the WFMS and DDM systems, complementing the data transfer capabilities of DDM with a server-side data access system. This can be complemented with hierarchical caching based on data meta-information and access-pattern statistics.
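The client-side prefetch idea can be sketched with a small producer/consumer loop: a background thread stages the next event ranges while the current one is being processed, so transfer latency overlaps with computation. Everything here (the `fetch` stand-in, the prefetch depth of 2) is a hypothetical illustration of the pattern, not the Event Streaming Service design.

```python
import queue
import threading
import time

def fetch(range_id):
    """Stand-in for an asynchronous fetch of one event range."""
    time.sleep(0.01)  # simulated transfer latency
    return f"events-{range_id}"

def prefetching_reader(range_ids, depth=2):
    """Yield event ranges while a background thread fetches ahead."""
    q = queue.Queue(maxsize=depth)

    def producer():
        for rid in range_ids:
            q.put(fetch(rid))  # blocks once `depth` ranges are staged
        q.put(None)            # sentinel: no more data

    threading.Thread(target=producer, daemon=True).start()
    while (chunk := q.get()) is not None:
        yield chunk

processed = list(prefetching_reader(range(3)))
print(processed)  # ['events-0', 'events-1', 'events-2']
```

The bounded queue is the key design choice: it caps how much data is staged on the client at once, which is what distinguishes streaming delivery from simply downloading whole files up front.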
7
The DCC “whiteboard”
Supplements the capability to “annotate” meta-information.
“Data in play” whiteboard: at any given time, a subset of datasets/collections is “in play”, in use in the system. As this data is manipulated, replicated, etc., knowledge about it could be dynamically cached in a whiteboard at the event-collection level. The whiteboard can flexibly receive arbitrary information associated with particular event collections (e.g. via tags); information can be auto-generated by the system or added by users.
Usage example: physics collections in use by an analysis group could be annotated as such in the whiteboard, with consequent special treatment.
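The whiteboard described above (arbitrary tags and key/value annotations keyed by event collection, from users or automated agents) can be sketched in a few lines. The class name, methods, and the `"coll:123"` identifier are all hypothetical; the point is only the flexible, schema-free annotation model.

```python
from collections import defaultdict

class Whiteboard:
    """Annotations keyed by event-collection ID: free-form tags plus
    arbitrary key/value pairs, added by users or automated agents."""
    def __init__(self):
        self._tags = defaultdict(set)
        self._info = defaultdict(dict)

    def tag(self, collection_id, *tags):
        self._tags[collection_id].update(tags)

    def annotate(self, collection_id, **info):
        self._info[collection_id].update(info)

    def with_tag(self, tag):
        """All collections carrying a given tag, e.g. for special treatment."""
        return sorted(c for c, t in self._tags.items() if tag in t)

wb = Whiteboard()
wb.tag("coll:123", "SUSY-analysis", "in-play")
wb.annotate("coll:123", pinned=True, owner="susy-group")
print(wb.with_tag("in-play"))  # ['coll:123']
```

The usage example from the slide maps directly onto this: tagging a collection `"SUSY-analysis"` is what would later trigger special treatment (e.g. pinning replicas on disk) by a policy engine reading the whiteboard.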
8
The DCC “whiteboard”
“Processing in play” whiteboard: tasks currently (or recently) in active processing could also benefit from a whiteboard, e.g. for monitoring purposes. Others could also add annotations and information to task whiteboard entries.
A “request-level” whiteboard would also have its uses:
It could cross-correlate prodsys requests with spreadsheet entries managed by PMG/MC prod (via a programmatic API) to automate the refreshing of information on submission and processing status.
Requests associated with a particular analysis group, paper or CP group could be tagged as such, with the tagging propagated through the downstream processing and data products.
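The cross-correlation idea above amounts to a join on a shared request ID. The sketch below shows the shape of it with invented records: the field names (`req_id`, `campaign`, `group`, `done_pct`) and the data are purely hypothetical, standing in for prodsys request state and PMG spreadsheet rows.

```python
# Hypothetical prodsys request records and PMG spreadsheet rows,
# sharing a request ID the whiteboard can join on.
requests = [
    {"req_id": 101, "status": "running", "done_pct": 60},
    {"req_id": 102, "status": "finished", "done_pct": 100},
]
spreadsheet = {
    101: {"campaign": "MC16", "group": "SUSY"},
    102: {"campaign": "MC16", "group": "Higgs"},
}

def merged_view(requests, spreadsheet):
    """Join on req_id: spreadsheet rows gain live processing status,
    so status refreshes need no manual copying."""
    return {
        r["req_id"]: {**spreadsheet.get(r["req_id"], {}),
                      "status": r["status"], "done_pct": r["done_pct"]}
        for r in requests
    }

view = merged_view(requests, spreadsheet)
print(view[101]["group"], view[101]["done_pct"])  # SUSY 60
```

A tag like the analysis-group name in the merged record is also what would be propagated downstream: every task and dataset derived from request 101 could inherit `group: SUSY`.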
9
The DCC “whiteboard”
The definition of “in play” is flexible; annotations can be made persistent after the entity they refer to is no longer “in play”. If we accept the concept of hierarchical storage, we have to accept the concept of hierarchical meta-storage (different latency for different levels). How many whiteboards do we need, which technology, which architecture? You do not expect me to answer all that, right? The “whiteboard” approach is an R&D/exploratory approach: an easy means of adding and accessing information, to play with possible uses and quickly prototype an idea.
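“Hierarchical meta-storage with different latency for different levels” is the same pattern as hierarchical storage applied to metadata. A minimal sketch, assuming a small hot in-memory cache in front of a higher-latency cold store; the class name, the FIFO eviction, and the capacity are illustrative choices, not a proposed design.

```python
class TieredMetaStore:
    """Two-level metadata lookup: small hot cache in front of a cold
    store with higher latency. Policy and names are illustrative."""
    def __init__(self, cold, hot_capacity=2):
        self.cold = cold   # e.g. a database; here just a dict
        self.hot = {}      # in-memory cache: "in play" metadata
        self.capacity = hot_capacity

    def get(self, key):
        if key in self.hot:        # hot hit: low latency
            return self.hot[key]
        value = self.cold[key]     # cold hit: pay the latency once
        if len(self.hot) >= self.capacity:
            self.hot.pop(next(iter(self.hot)))  # naive FIFO eviction
        self.hot[key] = value      # promote to the hot tier
        return value

store = TieredMetaStore({"ds1": {"tags": ["in-play"]}, "ds2": {"tags": []}})
print(store.get("ds1")["tags"])  # ['in-play']
print("ds1" in store.hot)        # True: promoted after first access
```

Eviction from the hot tier is exactly the point where an annotation stops being “in play” yet can remain persistent in the cold store.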
10
More ideas and use cases
DCC is an essential piece in ensuring reproducibility, and needs to be implemented consistently across components. DCC is also a great opportunity to reduce complexity: simple decisions like “what to call a campaign: MC16c or MC17” are today forced upon us by complexity (information being sometimes dispersed, eventually consistent, hard-coded).