Virtual Data in ATLAS. David Adams, BNL. May 5, 2002, US ATLAS core/grid software meeting.


1 Virtual Data in ATLAS
David Adams, BNL
May 5, 2002
US ATLAS core/grid software meeting

2 David Adams ATLAS May 5, 2002 Virtual Data in ATLAS US ATLAS core/grid mtg
Contents
- Warning
- Definitions
- Purpose
- Event data granularity
  - EDO
  - Sharing category
  - File
  - Event list
  - Dataset
  - ADB event collection
- ADB differences
- Event data space
- ATLAS data model
- Event data history
- EDO VDS
- SC or event VDS
- File VDS
- Event list VDS
- Dataset VDS
- Conclusions

3 Warning
Starting point
- The following is intended as a starting point for discussion.
Sources
- Opinions expressed are my own.
- I don't know of any ATLAS policies or conventions for virtual data.
- There is ATLAS work in progress to use the GriPhyN virtual data model for DC1.

4 Definitions
Virtual data
- Data which may be brought into existence using its associated history or prehistory
History
- Record of how data was produced
Prehistory
- Prescription for creating data

5 Definitions (cont.)
GriPhyN virtual data system (VDS)
- Unit of data: so far, a file
- Transformation: takes data units as input and produces more data units; so far, an executable with formal parameters
- Derivation: an application of a transformation
How do we map ATLAS onto this model? See the following slides.
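To make the three GriPhyN terms concrete, here is a minimal sketch, assuming the file is the unit of data as on the slide: a transformation is an executable with formal input and output parameters, and a derivation binds those parameters to actual files. The class names, `reco.exe` and the file names are hypothetical and are not the GriPhyN VDS interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transformation:
    """An executable together with its formal parameters (hypothetical model)."""
    executable: str
    inputs: tuple[str, ...]    # formal input parameter names
    outputs: tuple[str, ...]   # formal output parameter names


@dataclass
class Derivation:
    """One application of a transformation: formal parameters bound to actual files."""
    transformation: Transformation
    bindings: dict[str, str]   # formal parameter name -> file name

    def __post_init__(self) -> None:
        formal = set(self.transformation.inputs) | set(self.transformation.outputs)
        if set(self.bindings) != formal:
            raise ValueError(f"bindings must cover exactly {sorted(formal)}")

    def input_files(self) -> list[str]:
        return [self.bindings[p] for p in self.transformation.inputs]

    def output_files(self) -> list[str]:
        return [self.bindings[p] for p in self.transformation.outputs]


reco = Transformation("reco.exe", inputs=("raw",), outputs=("esd",))
d = Derivation(reco, {"raw": "run1.raw", "esd": "run1.esd"})
print(d.input_files(), d.output_files())  # ['run1.raw'] ['run1.esd']
```

Keeping the transformation immutable reflects that one transformation is reused by many derivations, while each derivation carries its own bindings.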

6 Purpose
Record keeping
- History provides a record of how data was produced (event by event and collectively)
On-demand generation
- If data does not exist or is not easily accessible:
  - History can be used to regenerate data
  - Prehistory can be used to generate data
Production
- Prehistory can be used to configure production

7 Event data granularity
ATLAS levels of data granularity:
- Physics object (e.g. track, jet or electron)
- EDO (event data object)
- Sharing category
- Event
- File
- Event list
- Dataset

8 EDO
Definition
- An EDO is a collection of physics objects
  - Typically homogeneous
  - May add some collective data such as total transverse energy
- An algorithm takes one or more EDOs as input and produces one (or more) as output
  - Reminiscent of VDS
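A minimal sketch of the EDO idea, assuming a simple in-memory representation: an EDO holds a homogeneous list of physics objects plus derived collective data (here total transverse energy), and an algorithm maps input EDOs to an output EDO. `PhysicsObject`, `EDO`, `jet_finder` and the 20 GeV threshold are hypothetical stand-ins, not Athena classes or algorithms.

```python
from dataclasses import dataclass


@dataclass
class PhysicsObject:
    kind: str    # e.g. "jet", "track", "cluster"
    et: float    # transverse energy in GeV


@dataclass
class EDO:
    """Event data object: a typically homogeneous collection of physics objects."""
    type_key: str
    event_id: int
    objects: list[PhysicsObject]

    @property
    def total_et(self) -> float:
        # Collective data computed from the whole collection
        return sum(o.et for o in self.objects)


def jet_finder(clusters: EDO, threshold: float = 20.0) -> EDO:
    """Toy algorithm: one EDO in, one EDO out, same event ID."""
    jets = [PhysicsObject("jet", c.et) for c in clusters.objects if c.et > threshold]
    return EDO("Jets", clusters.event_id, jets)


clusters = EDO("Clusters", 42, [PhysicsObject("cluster", e) for e in (5.0, 25.0, 40.0)])
jets = jet_finder(clusters)
print(len(jets.objects), jets.total_et)  # 2 65.0
```

The input/output shape of `jet_finder` is what makes the comparison with a VDS transformation natural.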

9 Sharing category
Definition
- Collection of related EDOs with the same event ID
  - E.g. tracking data or high-level physics objects
- No sharing of EDOs between categories?
- A sharing category is not shared between files
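The two properties on this slide can be written as a consistency check. This is a sketch under the assumption that the open question ("no sharing of EDOs between categories?") is answered yes; EDOs are represented by hypothetical `(event_id, type_key)` pairs.

```python
def check_sharing_categories(categories: dict[str, list[tuple[int, str]]]) -> None:
    """categories maps category name -> list of (event_id, type_key) EDOs.

    Raises ValueError if a category mixes event IDs or if an EDO
    appears in more than one category (assumed rule, see slide).
    """
    owner: dict[tuple[int, str], str] = {}
    for name, edos in categories.items():
        event_ids = {event_id for event_id, _ in edos}
        if len(event_ids) > 1:
            raise ValueError(f"category {name!r} mixes event IDs {sorted(event_ids)}")
        for edo in edos:
            if edo in owner:
                raise ValueError(f"EDO {edo} shared by {owner[edo]!r} and {name!r}")
            owner[edo] = name


# Passes: one event, disjoint categories
check_sharing_categories({
    "Tracking": [(7, "Tracks"), (7, "Vertices")],
    "Physics": [(7, "Jets")],
})
```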

10 Event
Warning
- "Event" may mean a beam crossing or a subset of its associated data (an event view)
HES event view
- Arbitrary collection of EDOs associated with the same event ID
- Scope defined by context
  - E.g. file or transient data store
  - All data (including all versions) is probably not useful
- Typically (always?) includes all contents of a well-defined set of sharing categories
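Following the last point, an event view can be assembled from whole sharing categories of a single event. The sketch assumes a hypothetical store keyed by `(event ID, category)`, so every EDO in the view shares the event ID by construction; the store layout and names are illustrative, not the HES interface.

```python
# Hypothetical transient store: (event ID, sharing category) -> {type-key: EDO}
Store = dict[tuple[int, str], dict[str, object]]


def event_view(store: Store, event_id: int, categories: list[str]) -> dict[str, object]:
    """Assemble an event view from whole sharing categories of one event."""
    view: dict[str, object] = {}
    for category in categories:
        for type_key, edo in store[(event_id, category)].items():
            if type_key in view:
                raise ValueError(f"type-key {type_key!r} appears in two categories")
            view[type_key] = edo
    return view


store: Store = {
    (7, "Tracking"): {"Tracks": "tracks-7", "Vertices": "vertices-7"},
    (7, "Physics"): {"Jets": "jets-7"},
    (8, "Tracking"): {"Tracks": "tracks-8", "Vertices": "vertices-8"},
}
view = event_view(store, 7, ["Tracking", "Physics"])
print(sorted(view))  # ['Jets', 'Tracks', 'Vertices']
```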

11 File
Current HES definition
- Holds EDOs for a specified set of event IDs
- Holds the same set of sharing categories for each event
- A sharing category or EDO may be held by value or by reference
- An EDO may be a replica
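A sketch of the file rules above, assuming entries are held per `(event ID, sharing category)`: each entry is either the data itself or a reference to another file, every event must carry the same categories, and the references determine which other files are needed to read it. All names (`ByValue`, `ByReference`, `EventFile`, `aod1`, `esd1`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ByValue:
    payload: object          # the data itself


@dataclass(frozen=True)
class ByReference:
    file_name: str           # another file that holds the data


Entry = Union[ByValue, ByReference]


@dataclass
class EventFile:
    name: str
    categories: frozenset[str]               # same set for every event
    contents: dict[tuple[int, str], Entry]   # (event ID, category) -> entry

    def event_ids(self) -> list[int]:
        return sorted({event_id for event_id, _ in self.contents})

    def validate(self) -> None:
        for event_id in self.event_ids():
            held = {cat for eid, cat in self.contents if eid == event_id}
            if held != self.categories:
                raise ValueError(f"event {event_id} holds {sorted(held)}")

    def files_needed(self) -> set[str]:
        """This file plus every file it references."""
        refs = {e.file_name for e in self.contents.values() if isinstance(e, ByReference)}
        return {self.name} | refs


aod = EventFile(
    "aod1",
    frozenset({"Tracking", "Physics"}),
    {
        (1, "Tracking"): ByReference("esd1"),
        (1, "Physics"): ByValue("jets-1"),
        (2, "Tracking"): ByReference("esd1"),
        (2, "Physics"): ByValue("jets-2"),
    },
)
aod.validate()
print(sorted(aod.files_needed()))  # ['aod1', 'esd1']
```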

12 File (cont.)

13 File (cont.)
Future HES definition
- Add history for each EDO
- Option to hold only the history (regeneration)
- Include non-event data
  - E.g. replicas of shared history objects such as algorithms
- Option to hold only the instructions for building (prehistory)?
- Drop PCs (sharing categories)?

14 Event list
Definition
- Collection of IDs for events satisfying physics selection criteria
  - E.g. two or more jets, one lepton and missing ET, all with energy or momentum thresholds
- Data versions on which the selections were based
- Collective properties
  - Integrated luminosity
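The example selection can be sketched as a predicate plus an event-list record that keeps the data versions it read and a collective property. The thresholds, the reading of "one lepton" as exactly one, and all names are assumptions for illustration; the integrated luminosity is passed in as a property of the input sample rather than computed.

```python
from dataclasses import dataclass


@dataclass
class EventSummary:
    event_id: int
    jet_pts: list[float]      # GeV
    lepton_pts: list[float]   # GeV
    missing_et: float         # GeV


def passes(ev: EventSummary, jet_min=30.0, lepton_min=20.0, met_min=40.0) -> bool:
    """Hypothetical cuts: at least two jets, exactly one lepton, missing ET above threshold."""
    jets = [pt for pt in ev.jet_pts if pt > jet_min]
    leptons = [pt for pt in ev.lepton_pts if pt > lepton_min]
    return len(jets) >= 2 and len(leptons) == 1 and ev.missing_et > met_min


@dataclass
class EventList:
    event_ids: list[int]
    data_versions: dict[str, str]    # type-key -> version the selection read
    integrated_luminosity: float     # collective property of the sample, pb^-1


def make_event_list(events, data_versions, integrated_luminosity) -> EventList:
    selected = [ev.event_id for ev in events if passes(ev)]
    return EventList(selected, data_versions, integrated_luminosity)


events = [
    EventSummary(1, [45.0, 38.0], [25.0], 60.0),        # passes
    EventSummary(2, [45.0, 12.0], [25.0], 60.0),        # only one jet above threshold
    EventSummary(3, [80.0, 50.0], [30.0, 22.0], 90.0),  # two leptons
]
elist = make_event_list(events, {"Jets": "v2", "Electrons": "v1"}, 10.0)
print(elist.event_ids)  # [1]
```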

15 Dataset
Purpose
- To identify the data (and hence the files) that must be gathered for a job to run
Definition
- Event list
- Restriction on content (EDO type-keys)
  - E.g. only summary data or tracking data
- Versions of these EDOs
  - Require consistency with the selection versions?
- File collection(s) holding these EDOs
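The stated purpose, working out which files a job needs, follows directly from the four parts of the definition. This sketch assumes a hypothetical catalogue mapping `(event ID, type-key, version)` to a file name; it is not an ATLAS catalogue API.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    event_ids: list[int]                          # from the event list
    type_keys: list[str]                          # content restriction
    versions: dict[str, str]                      # type-key -> required version
    catalogue: dict[tuple[int, str, str], str]    # (event, type-key, version) -> file

    def files_to_stage(self) -> set[str]:
        files: set[str] = set()
        for event_id in self.event_ids:
            for key in self.type_keys:
                location = (event_id, key, self.versions[key])
                if location not in self.catalogue:
                    raise LookupError(f"no file holds {location}")
                files.add(self.catalogue[location])
        return files


catalogue = {
    (1, "Tracks", "v3"): "trk-A",
    (2, "Tracks", "v3"): "trk-B",
    (1, "Jets", "v2"): "phys-A",
    (2, "Jets", "v2"): "phys-A",
    (1, "Tracks", "v2"): "trk-old",
}
ds = Dataset([1, 2], ["Tracks"], {"Tracks": "v3"}, catalogue)
print(sorted(ds.files_to_stage()))  # ['trk-A', 'trk-B']
```

Restricting content to tracking data skips `phys-A`, and pinning the version skips `trk-old`, which is how the restriction and version parts shrink what must be staged.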

16 ADB differences
Event
- An ADB event generally holds copies of, or references to, all the event data used in its construction
- It is not possible to combine views of an event
  - E.g. tracks from one and jets from another
  - Advantage is enforced consistency
  - Disadvantage is limited flexibility
Event collection
- The ADB event collection lies between the event list and the dataset defined here

17 Event data space

18 ATLAS data model

19 Event data history
Current ATLAS model is object oriented
- The history object for each EDO references:
  - the EDO
  - the parents of the EDO
  - the algorithm history object
  - the job history object
- See figure
- Contains the complete history only if the ancestor history objects are present
  - Regeneration is not possible if these are gone
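The last point can be demonstrated by walking the parent links: the history is complete only if every ancestor history object can be found. The sketch assumes history objects live in a plain dictionary keyed by EDO identifier, with strings standing in for the algorithm and job history objects; all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class History:
    edo: str                   # identifier of the EDO this history describes
    parents: tuple[str, ...]   # identifiers of the parent EDOs
    algorithm: str             # stands in for the algorithm history object
    job: str                   # stands in for the job history object


def missing_ancestors(edo: str, store: dict[str, History]) -> set[str]:
    """Return the EDO ids along the ancestry whose history objects are absent."""
    missing: set[str] = set()
    seen: set[str] = set()
    stack = [edo]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        hist = store.get(current)
        if hist is None:
            missing.add(current)   # chain is broken here; cannot see further back
            continue
        stack.extend(hist.parents)
    return missing


store = {
    "raw-7": History("raw-7", (), "none", "daq"),
    "esd-7": History("esd-7", ("raw-7",), "reco-alg", "job-12"),
    "aod-7": History("aod-7", ("esd-7",), "aod-alg", "job-30"),
}
print(missing_ancestors("aod-7", store))  # set()
del store["esd-7"]
print(missing_ancestors("aod-7", store))  # {'esd-7'}
```

Once `esd-7`'s history is gone, the walk cannot even reach `raw-7`, which is why regeneration fails in the current model.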

20 Event data history (cont.)

21 Event data history
Modify to add prehistory
- Enable regeneration from a single event history object
- Replace the algorithm history with an algorithm history DAG (directed acyclic graph)
  - Include links to ancestor algorithm history objects
  - Requires opening the history objects of the parent EDOs (unless these were written as part of the same job)
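With an algorithm history DAG available from a single history object, regeneration reduces to running the target's ancestor algorithms in dependency order. A minimal sketch, assuming the DAG maps each algorithm to the algorithms whose output it consumed (the algorithm names are made up); it uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical algorithm history DAG: algorithm -> algorithms it depends on
algorithm_dag = {
    "JetFinder": {"CaloClustering"},
    "ElectronID": {"CaloClustering", "Tracking"},
    "CaloClustering": {"RawDecoding"},
    "Tracking": {"RawDecoding"},
    "RawDecoding": set(),
}


def regeneration_order(dag: dict[str, set[str]], target: str) -> list[str]:
    """Ancestors of target (and target itself) in an order that respects dependencies."""
    needed: set[str] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        if node not in needed:
            needed.add(node)
            stack.extend(dag[node])
    subgraph = {node: dag[node] for node in needed}
    return list(TopologicalSorter(subgraph).static_order())


print(regeneration_order(algorithm_dag, "JetFinder"))
# ['RawDecoding', 'CaloClustering', 'JetFinder']
```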

22 EDO VDS
ATLAS data unit
- Having identified the ATLAS levels of granularity, we need to select which one(s) define our VDS unit of data
- In the original GriPhyN design, the file was chosen
  - This is being generalized
- The natural choice for us is the EDO
  - Smallest unit of processing (?)
  - Smallest unit of replication (?)

23 EDO VDS (cont.)
EDO transformation
- A transformation is an Athena algorithm specified by:
  - parameters in jobOptions
  - the algorithm version
- An Athena executable typically performs multiple transformations
  - Algorithm DAG
- Input type-keys are implicit (buried in the code)
- This is prehistory data
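Since a transformation is fixed by the algorithm, its version and its jobOptions parameters, one way to recognise identical transformations is a stable key over those three items. The hashing scheme below is a hypothetical illustration, not an Athena or GriPhyN convention, and the property names `ConeSize` and `EtMin` are made up.

```python
import hashlib
import json


def transformation_key(algorithm: str, version: str, job_options: dict) -> str:
    """Stable identifier for an EDO transformation (hypothetical scheme).

    Same algorithm, version and jobOptions parameters give the same key,
    regardless of the order in which the parameters were written.
    """
    record = {"algorithm": algorithm, "version": version, "options": job_options}
    blob = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode()).hexdigest()[:16]


k1 = transformation_key("JetFinder", "1.2", {"ConeSize": 0.7, "EtMin": 20})
k2 = transformation_key("JetFinder", "1.2", {"EtMin": 20, "ConeSize": 0.7})
k3 = transformation_key("JetFinder", "1.3", {"ConeSize": 0.7, "EtMin": 20})
print(k1 == k2, k1 == k3)  # True False
```

Input type-keys are deliberately absent from the key, matching the slide's point that they are implicit in the algorithm code.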

24 EDO VDS (cont.)
EDO derivation
- Specify the input data (event view)
  - Event ID
  - Input EDO instances (not just type-keys)
  - Use parent EDO histories to extend algorithm DAGs back to the raw data
- Job-specific information (which CPU, resources consumed, …)
- Combined with the transformation data, these give the history for each produced EDO
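The split between transformation (shared prehistory) and derivation (per event and per job) can be sketched as two records combined into one history entry per produced EDO. Field names such as `host` and `cpu_seconds` are hypothetical choices for the job-specific information.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EDOTransformation:
    """Prehistory: shared by every event processed with this configuration."""
    algorithm: str
    version: str
    job_options: tuple[tuple[str, str], ...]   # (property, value) pairs


@dataclass(frozen=True)
class EDODerivation:
    """Per-event and per-job record."""
    event_id: int
    input_instances: tuple[str, ...]   # specific EDO instances, not just type-keys
    host: str
    cpu_seconds: float


def edo_history(output_instance: str, t: EDOTransformation, d: EDODerivation) -> dict:
    """Combine transformation and derivation into the history of one produced EDO."""
    return {"output": output_instance, "transformation": asdict(t), "derivation": asdict(d)}


jet_t = EDOTransformation("JetFinder", "1.2", (("ConeSize", "0.7"),))
h1 = edo_history("jets-41", jet_t, EDODerivation(41, ("clusters-41",), "node07", 0.8))
h2 = edo_history("jets-42", jet_t, EDODerivation(42, ("clusters-42",), "node07", 1.1))
print(h1["transformation"] == h2["transformation"])  # True
print(h2["derivation"]["input_instances"])           # ('clusters-42',)
```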

25 SC or event VDS
- A sharing category or event (view) is a collection of EDOs
- Transformations and derivations can be expressed by merging those for the constituent EDOs
- Algorithm DAGs can often be merged into a single (connected) DAG
  - The transformation (derivation) should express which EDOs are kept (present)
- It is sensible to speak of an SC or event VDS
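Merging the algorithm DAGs of the constituent EDOs is a union of their edges, and "often connected" can be checked by ignoring edge direction. A sketch with made-up algorithm names, using the same dependency-map representation as an algorithm history DAG (node -> set of parents).

```python
def merge_dags(*dags: dict[str, set[str]]) -> dict[str, set[str]]:
    """Union of several DAGs given as node -> set of parent nodes."""
    merged: dict[str, set[str]] = {}
    for dag in dags:
        for node, parents in dag.items():
            merged.setdefault(node, set()).update(parents)
            for parent in parents:
                merged.setdefault(parent, set())
    return merged


def is_connected(dag: dict[str, set[str]]) -> bool:
    """True if the DAG forms a single piece when edge direction is ignored."""
    neighbours: dict[str, set[str]] = {node: set() for node in dag}
    for node, parents in dag.items():
        for parent in parents:
            neighbours[node].add(parent)
            neighbours.setdefault(parent, set()).add(node)
    if not neighbours:
        return True
    start = next(iter(neighbours))
    seen, stack = {start}, [start]
    while stack:
        for other in neighbours[stack.pop()]:
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return len(seen) == len(neighbours)


tracks_dag = {"Tracking": {"RawDecoding"}}
jets_dag = {"JetFinder": {"CaloClustering"}, "CaloClustering": {"RawDecoding"}}
truth_dag = {"TruthJets": {"Generator"}}
print(is_connected(merge_dags(tracks_dag, jets_dag)))             # True
print(is_connected(merge_dags(tracks_dag, jets_dag, truth_dag)))  # False
```

The second case shows why the slide says "often": EDOs with unrelated origins (here a hypothetical generator-level chain) do not share a common ancestor.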

26 File VDS (cont.)
A file is also a collection of EDOs, but:
- Do its events have a common transformation?
  - Same algorithm histories
- Do its events have a common derivation?
  - Same job
  - Same input file algorithm DAGs
- Probably not, which implies VDS is less useful for files
- However, it is likely useful to keep track of the transformations and derivations used in each file
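The two questions, and the suggested bookkeeping, amount to grouping a file's events by transformation and by derivation: a single group means the file could be treated as one VDS unit, several groups give the per-file record the last point recommends. The event records and labels below are hypothetical.

```python
from collections import defaultdict


def group_events(events: list[dict], field_name: str) -> dict[str, list[int]]:
    """Map each distinct value of field_name to the event IDs that have it."""
    groups: defaultdict[str, list[int]] = defaultdict(list)
    for ev in events:
        groups[ev[field_name]].append(ev["event_id"])
    return dict(groups)


file_events = [
    {"event_id": 1, "transformation": "T-a", "derivation": "job-1"},
    {"event_id": 2, "transformation": "T-a", "derivation": "job-1"},
    {"event_id": 3, "transformation": "T-a", "derivation": "job-2"},
    {"event_id": 4, "transformation": "T-b", "derivation": "job-2"},
]
by_t = group_events(file_events, "transformation")
by_d = group_events(file_events, "derivation")
print(len(by_t) == 1, len(by_d) == 1)  # False False
print(by_t)  # {'T-a': [1, 2, 3], 'T-b': [4]}
```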

27 Event list VDS (cont.)
Event list transformation
- Is a selection algorithm applied to each event
- Includes specification of the content (EDO type-keys) on which the selection is based
- Might include a restriction on EDO versions
Derivation
- Recorded at the event level; specifies:
  - EDO instances (normally from a dataset)
  - Job parameters (CPU, …)
- Meaningful in the context of a dataset
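A sketch of an event-list transformation applied within a dataset: the transformation declares the type-keys it reads and optional version restrictions, and the per-event derivation records which EDO instances were actually used. The dataset layout, `EDOInstance` and the `HighMET` selection are hypothetical; job parameters are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class EDOInstance:
    instance_id: str
    version: str
    value: float             # stands in for the EDO content


@dataclass
class SelectionTransformation:
    name: str
    type_keys: list[str]                                        # content the selection reads
    required_versions: dict[str, str] = field(default_factory=dict)


def apply_selection(t: SelectionTransformation,
                    predicate: Callable[[dict[str, float]], bool],
                    dataset: dict[int, dict[str, EDOInstance]]):
    """Return (selected event IDs, per-event derivation: EDO instances used)."""
    selected: list[int] = []
    derivations: dict[int, list[str]] = {}
    for event_id in sorted(dataset):
        inputs = {k: dataset[event_id][k] for k in t.type_keys}
        for k, inst in inputs.items():
            wanted = t.required_versions.get(k)
            if wanted is not None and inst.version != wanted:
                raise ValueError(f"event {event_id}: {k} is {inst.version}, need {wanted}")
        derivations[event_id] = [inputs[k].instance_id for k in t.type_keys]
        if predicate({k: inst.value for k, inst in inputs.items()}):
            selected.append(event_id)
    return selected, derivations


dataset = {
    1: {"MissingET": EDOInstance("met-1", "v2", 55.0)},
    2: {"MissingET": EDOInstance("met-2", "v2", 10.0)},
}
high_met = SelectionTransformation("HighMET", ["MissingET"], {"MissingET": "v2"})
selected, derivations = apply_selection(high_met, lambda x: x["MissingET"] > 40.0, dataset)
print(selected, derivations)  # [1] {1: ['met-1'], 2: ['met-2']}
```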

28 Dataset VDS (cont.)
Dataset transformations include:
- Algorithm DAG
- Event selection
- Event merge (new but trivial)
Dataset derivation includes:
- Input datasets
- Distributed job description
  - Full specification (e.g. CPU for a given EDO) probably requires examining EDO histories
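The "new but trivial" event merge can be sketched as a union of event IDs across input datasets with the same content restriction, with the input dataset names kept as the derivation. Requiring identical type-keys is an assumption made here for simplicity, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    name: str
    event_ids: frozenset[int]
    type_keys: frozenset[str]
    inputs: tuple[str, ...] = ()   # derivation: names of the input datasets


def merge_datasets(name: str, *datasets: DatasetRecord) -> DatasetRecord:
    """Event merge: union of events; content restrictions must agree (assumed rule)."""
    keys = {d.type_keys for d in datasets}
    if len(keys) != 1:
        raise ValueError("input datasets must share one content restriction")
    events = frozenset().union(*(d.event_ids for d in datasets))
    return DatasetRecord(name, events, keys.pop(), tuple(d.name for d in datasets))


a = DatasetRecord("sel-a", frozenset({1, 2, 3}), frozenset({"Tracks"}))
b = DatasetRecord("sel-b", frozenset({3, 4}), frozenset({"Tracks"}))
merged = merge_datasets("merged", a, b)
print(sorted(merged.event_ids), merged.inputs)  # [1, 2, 3, 4] ('sel-a', 'sel-b')
```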

29 Conclusions
- Much work to do; this is a first pass.
- The most useful data units for VDS are:
  - EDO, for tracking data at the event level and for data regeneration
  - Dataset, for staging, tracking production and shared event selection
- What about files?

