Virtual Data in ATLAS
David Adams, BNL
May 5, 2002
US ATLAS core/grid software meeting



Contents
  Warning
  Definitions
  Purpose
  Event data granularity
    – EDO
    – Sharing category
    – File
    – Event list
    – Dataset
    – ADB event collection
  ADB differences
  Event data space
  ATLAS data model
  Event data history
  EDO VDS
  SC or event VDS
  File VDS
  Event list
  Dataset VDS
  Conclusions

Warning
  Starting point
    The following is intended as a starting point for discussion
  Sources
    Opinions expressed are my own
    I don't know of any ATLAS policies or conventions for virtual data
    There is ATLAS work in progress to use the GriPhyN virtual data model for DC1

Definitions
  Virtual data
    Data which may be brought into existence using associated history or prehistory
  History
    Record of how data was produced
  Prehistory
    Prescription for creating data

Definitions (cont)
  GriPhyN virtual data system (VDS)
    Unit of data
      – so far, a file
    Transformation takes data units as input and produces more data units
      – so far, an executable with formal parameters
    Derivation is an application of a transformation
  How do we map ATLAS onto this model?
    – See following…
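
One way to make the three VDS terms concrete is the small Python sketch below. It models a transformation as an executable with formal parameters and a derivation as a binding of those parameters to data units (file names). The class names, field names and the example executable `reco` are illustrative assumptions and are not taken from the GriPhyN software.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Transformation:
    """An executable with formal parameters (hypothetical model)."""
    executable: str
    formal_inputs: Tuple[str, ...]
    formal_outputs: Tuple[str, ...]


@dataclass
class Derivation:
    """An application of a transformation to concrete data units."""
    transformation: Transformation
    bindings: Dict[str, str]  # formal parameter name -> data unit (so far, a file name)

    def __post_init__(self) -> None:
        # Every formal parameter must be bound to a data unit.
        formals = set(self.transformation.formal_inputs) | set(
            self.transformation.formal_outputs
        )
        missing = formals - set(self.bindings)
        if missing:
            raise ValueError(f"unbound formal parameters: {sorted(missing)}")

    def inputs(self) -> List[str]:
        """Data units consumed by this derivation."""
        return [self.bindings[name] for name in self.transformation.formal_inputs]

    def outputs(self) -> List[str]:
        """Data units produced by this derivation."""
        return [self.bindings[name] for name in self.transformation.formal_outputs]


# Hypothetical usage: one reconstruction step applied to one raw file.
reco = Transformation("reco", ("raw",), ("tracks",))
step = Derivation(reco, {"raw": "run1.raw", "tracks": "run1.trk"})
```

Keeping the transformation separate from the derivation means one transformation record can be reused by many derivations, which matches the "application of a transformation" wording above.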

Purpose
  Record keeping
    History provides a record of how data was produced (event-by-event and collectively)
  On-demand generation
    If data does not exist or is not easily accessible
      – History can be used to regenerate data
      – Prehistory can be used to generate data
  Production
    Prehistory can be used to configure production
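
The on-demand generation case can be sketched as a lookup that falls back first to history and then to prehistory. This is a minimal illustration only: the function name `materialize`, the dictionary-based store and the use of plain callables as recipes are all assumptions made for the example, not part of any ATLAS or GriPhyN interface.

```python
from typing import Callable, Dict

Recipe = Callable[[], bytes]  # hypothetical stand-in for a history or prehistory record


def materialize(
    key: str,
    store: Dict[str, bytes],
    history: Dict[str, Recipe],
    prehistory: Dict[str, Recipe],
) -> bytes:
    """Return the data for `key`, creating it on demand if it is virtual."""
    if key in store:
        # The data already exists; nothing needs to be generated.
        return store[key]
    # Prefer history (regeneration), then prehistory (first-time generation).
    recipe = history.get(key) or prehistory.get(key)
    if recipe is None:
        raise KeyError(f"no data, history or prehistory for {key!r}")
    store[key] = recipe()
    return store[key]
```

In this sketch the caller does not need to know whether the data was stored, regenerated or generated for the first time, which is the property that makes the data "virtual".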

Event data granularity
  ATLAS levels of data granularity
    Physics object (e.g. track, jet or electron)
    EDO – event data object
    Sharing category
    Event
    File
    Event list
    Dataset

EDO
  Definition
    EDO is a collection of physics objects
      – Typically homogeneous
      – May add some collective data such as total transverse energy
    An algorithm takes one or more EDOs as input and produces one (or more) as output
      – Reminiscent of VDS

Sharing Category
  Definition
    Collection of related EDOs with the same event ID
      – E.g. tracking data or high-level physics objects
    No sharing of EDOs between categories?
    Sharing category is not shared between files

Event
  Warning
    Event may mean beam crossing or subset of associated data (event view)
  HES event view
    Arbitrary collection of EDOs associated with the same event ID
    Scope defined by context
      – E.g. file or transient data store
      – All data (including versions) probably not useful
    Typically (always?) includes all contents of a well-defined set of sharing categories

File
  Current HES definition
    Holds EDOs for a specified set of event IDs
    Holds the same set of sharing categories for each event
    Sharing category or EDO may be held by value or reference
    EDO may be a replica

File (cont)
  [Slide content not captured in transcript]

File (cont)
  Future HES definition
    Add history for each EDO
    Option to only hold history (regeneration)
    Include non-event data
      – E.g. replicas of shared history objects such as algorithms
    Option to hold only instructions for building (prehistory)?
    Drop PCs (sharing categories)?

Event list
  Definition
    Collection of IDs for events satisfying physics selection criteria
      – E.g. 2 or more jets, one lepton, missing ET, all with energy or momentum thresholds
    Data versions on which selections were based
    Collective properties
      – Integrated luminosity

Dataset
  Purpose
    To identify the data (and hence the files) that must be gathered for a job to run
  Definition
    Event list
    Restriction on content (EDO type-keys)
      – E.g. only summary data or tracking data
    Versions of these EDOs
      – Require consistency with selection versions?
    File collection(s) holding these EDOs
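
The stated purpose of a dataset, identifying the files a job must gather, can be illustrated with the following sketch. It assumes a simplified dataset made of an event list, a set of EDO type-keys and a catalog of what each file holds; EDO versions are left out for brevity. All names here (`Dataset`, `files_needed`, the file names) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple


@dataclass
class Dataset:
    """Simplified dataset: event list + content restriction + file collection."""
    event_ids: FrozenSet[int]   # the event list
    type_keys: FrozenSet[str]   # restriction on content (EDO type-keys)
    # File name -> set of (event ID, type-key) pairs held by that file.
    file_contents: Dict[str, Set[Tuple[int, str]]]

    def files_needed(self) -> Set[str]:
        """Files holding at least one EDO that belongs to this dataset."""
        wanted = {(event, key) for event in self.event_ids for key in self.type_keys}
        return {
            name for name, held in self.file_contents.items() if held & wanted
        }


# Hypothetical usage: only tracking data for events 1 and 2 is requested.
dataset = Dataset(
    event_ids=frozenset({1, 2}),
    type_keys=frozenset({"tracks"}),
    file_contents={
        "a.root": {(1, "tracks"), (1, "jets")},
        "b.root": {(2, "tracks")},
        "c.root": {(3, "tracks")},
        "d.root": {(2, "jets")},
    },
)
```

Here `dataset.files_needed()` selects `a.root` and `b.root`: the other two files hold either an unselected event or an unrequested type-key.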

ADB differences
  Event
    ADB event generally holds copies or references to all the event data used in its construction
    Not possible to combine views of an event
      – E.g. tracks from one and jets from another
      – Advantage is enforced consistency
      – Disadvantage is limited flexibility
  Event collection
    ADB event collection is between the event list and dataset defined here

Event data space
  [Slide content not captured in transcript]

ATLAS data model
  [Slide content not captured in transcript]

Event data history
  Current ATLAS model is object oriented
  History object for each EDO references
    – EDO
    – Parents of EDO
    – Algorithm history object
    – Job history object
  See figure
  Contains complete history only if ancestor history objects are present
  Regeneration not possible if these are gone
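
The dependence on ancestor history objects can be illustrated with a short sketch that walks the parent links and reports whether every ancestor's history record is still available. The `EdoHistory` class and the string identifiers are hypothetical; the sketch also assumes that raw-data EDOs carry a history record with no parents.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class EdoHistory:
    """Hypothetical history object: references the EDO, its parents,
    and the algorithm and job history objects."""
    edo_id: str
    parent_ids: Tuple[str, ...]
    algorithm_history_id: str
    job_history_id: str


def has_complete_history(edo_id: str, histories: Dict[str, EdoHistory]) -> bool:
    """True if the history of `edo_id` and of all its ancestors is present."""
    pending = [edo_id]
    seen = set()
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        record = histories.get(current)
        if record is None:
            # An ancestor history object is gone: regeneration is not possible.
            return False
        pending.extend(record.parent_ids)
    return True
```

For a chain raw → clusters → tracks, removing the `clusters` record makes `has_complete_history("tracks", ...)` return `False`, even though the record for `tracks` itself is still present.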

Event data history (cont)
  [Figure not captured in transcript]

Event data history
  Modify to add prehistory
  Enable regeneration from a single event history object
  Replace algorithm history with algorithm history DAG (directed acyclic graph)
    – Include links to ancestor algorithm history objects
    – Requires opening the history objects of the parent EDOs (unless these were written as part of the same job)

EDO VDS
  ATLAS data unit
    Having identified the ATLAS levels of granularity, we need to select which one(s) to use to define our VDS unit of data
    In the original GriPhyN design, the file was chosen
      – This is being generalized
    Natural choice for us is the EDO
      – Smallest unit of processing (?)
      – Smallest unit of replication (?)

EDO VDS (cont)
  EDO transformation
    Transformation is an Athena algorithm specified by
      – Parameters in jobOptions
      – Algorithm version
    Athena executable typically performs multiple transformations
      – Algorithm DAG
    Input type-keys are implicit (buried in the code)
    This is prehistory data

EDO VDS (cont)
  EDO derivation
    Specify input data (event view)
      – Event ID
      – Input EDO instances (not just type-key)
      – Use parent EDO histories to extend algorithm DAGs back to the raw data
    Job-specific (which CPU, resources consumed,…)
    Combined with transformation data, these give the history for each produced EDO

SC or event VDS
  Sharing category or event (view) is a collection of EDOs
  Transformations and derivations can be expressed by merging those for the constituent EDOs
  Algorithm DAGs can often be merged into a single (connected) DAG
    – Transformation (derivation) should express which EDOs are kept (present)
  Sensible to speak of SC or event VDS
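
Merging the algorithm DAGs of the constituent EDOs, and checking whether the result is a single connected DAG, might look like the sketch below. A DAG is represented here as a mapping from each algorithm to the set of algorithms whose output it consumes; this representation and the algorithm names in the usage example are assumptions made for illustration.

```python
from typing import Dict, Iterable, Set

Dag = Dict[str, Set[str]]  # algorithm -> algorithms it depends on


def merge_dags(dags: Iterable[Dag]) -> Dag:
    """Union of nodes and edges of several algorithm DAGs."""
    merged: Dag = {}
    for dag in dags:
        for node, parents in dag.items():
            merged.setdefault(node, set()).update(parents)
            for parent in parents:
                # Make sure every referenced algorithm appears as a node.
                merged.setdefault(parent, set())
    return merged


def is_connected(dag: Dag) -> bool:
    """True if the DAG, viewed as an undirected graph, has one component."""
    if not dag:
        return True
    neighbours: Dict[str, Set[str]] = {node: set(parents) for node, parents in dag.items()}
    for node, parents in dag.items():
        for parent in parents:
            neighbours.setdefault(parent, set()).add(node)
    start = next(iter(neighbours))
    seen = {start}
    stack = [start]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(neighbours)


# Hypothetical usage: two EDOs whose algorithms share a clustering step.
tracks_dag = {"TrackFit": {"Clustering"}}
jets_dag = {"JetFinder": {"Clustering"}}
merged = merge_dags([tracks_dag, jets_dag])
```

The two example DAGs merge into one connected DAG because both depend on the same upstream algorithm. Adding a DAG with no shared algorithms leaves the merged result disconnected, which corresponds to the "often" qualifier above.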

File VDS (cont)
  File is also a collection of EDOs, but
    Do events have a common transformation?
      – Same algorithm histories
    Do events have a common derivation?
      – Same job
      – Same input file algorithm DAGs
  Probably not, which implies VDS is less useful for files
  However, it is likely useful to keep track of the transformations and derivations used in each file

Event list VDS (cont)
  Event list
  Transformation
    Is a selection algorithm applied to each event
    Includes specification of the content (EDO type-keys) on which the selection is based
    Might include restriction on EDO versions
  Derivation
    Recorded at the event level; specifies
      – EDO instances (normally from dataset)
      – Job parameters (CPU, …)
    Meaningful in the context of a dataset
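
An event-list transformation of this kind can be sketched as a predicate applied to each event, with the type-keys it is allowed to read declared up front. The flat dictionary used to stand in for an event's EDOs, the names `EventList` and `select`, and the example thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Event = Dict[str, float]  # type-key -> summary value (stand-in for EDOs)


@dataclass
class EventList:
    event_ids: List[int]
    type_keys: Tuple[str, ...]  # content on which the selection is based


def select(
    events: Dict[int, Event],
    type_keys: Sequence[str],
    predicate: Callable[[Event], bool],
) -> EventList:
    """Apply a selection to each event, exposing only the declared content."""
    selected = []
    for event_id, event in sorted(events.items()):
        # The selection sees only the declared type-keys.
        view = {key: event[key] for key in type_keys}
        if predicate(view):
            selected.append(event_id)
    return EventList(selected, tuple(type_keys))


# Hypothetical usage: at least two jets and missing ET above a threshold.
events = {
    1: {"njets": 3, "met": 50.0, "ntracks": 12},
    2: {"njets": 1, "met": 80.0, "ntracks": 7},
}
result = select(events, ("njets", "met"), lambda v: v["njets"] >= 2 and v["met"] > 40.0)
```

Recording `type_keys` alongside the selected IDs keeps the event list tied to the content it was derived from, in line with the transformation description above.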

Dataset VDS (cont)
  Dataset transformations include
    Algorithm DAG
    Event selection
    Event merge (new but trivial)
  Dataset derivation includes
    Input datasets
    Distributed job description
      – Full specification (e.g. CPU for a given EDO) probably requires examining EDO histories

Conclusions
  Much work to do. This is a first pass.
  Most useful data units for VDS are
    – EDO, for tracking data at the event level and data regeneration
    – Dataset, for staging and tracking production and shared event selection
  What about files?