An RTAG View of Event Collections, and Early Implementations David Malon ATLAS Database Group LHC Persistence Workshop 5 June 2002.

Slides:



Advertisements
Similar presentations
An Object/Relational Mapping tool Free and open source Simplifies storage of object data in a relational database Removes the need to write and maintain.
Advertisements

Michael Pizzo Software Architect Data Programmability Microsoft Corporation.
Foundations of Relational Implementation n Defining Relational Data n Relational Data Manipulation n Relational Algebra.
IiWAS2002, Bandung, Indonesia Teaching and Learning Databases Dr. Stéphane Bressan National University of Singapore.
D. Düllmann - IT/DB LCG - POOL Project1 POOL Release Plan for 2003 Dirk Düllmann LCG Application Area Meeting, 5 th March 2003.
David Adams ATLAS DIAL Distributed Interactive Analysis of Large datasets David Adams BNL March 25, 2003 CHEP 2003 Data Analysis Environment and Visualization.
Understanding Networked Applications: A First Course Chapter 15 by David G. Messerschmitt.
David Adams ATLAS DIAL Distributed Interactive Analysis of Large datasets David Adams BNL June 23, 2003 GAE workshop Caltech.
Fundamentals, Design, and Implementation, 9/e Chapter 16 Object-Oriented Database Processing.
10/3/2000SIMS 257: Database Management -- Ray Larson Relational Algebra and Calculus University of California, Berkeley School of Information Management.
MySql In Action Step by step method to create your own database.
2003 April 151 Data Centres: Connecting to the Real World Clive Page.
DBMS Lecture 9  Object Database Management Group –12 Rules for an OODBMS –Components of the ODMG standard  OODBMS Object Model Schema  OO Data Model.
Databases C HAPTER Chapter 10: Databases2 Databases and Structured Fields  A database is a collection of information –Typically stored as computer.
LINQ Boot Camp ADO.Net Entity Framework Presenter : Date : Mahesh Moily Nov 26, 2009.
Software Development Stephenson College. Classic Life Cycle.
David Adams ATLAS DIAL Distributed Interactive Analysis of Large datasets David Adams BNL July 15, 2003 LCG Analysis RTAG CERN.
Summary of the Persistency RTAG Report (heavily based on David Malon’s slides) - and some personal remarks Dirk Düllmann CCS Meeting,
David Adams ATLAS ATLAS Distributed Analysis David Adams BNL March 18, 2004 ATLAS Software Workshop Grid session.
A User’s Introduction to the Grand Challenge Software STAR-GC Workshop Oct 1999 D. Zimmerman.
David Adams ATLAS DIAL status David Adams BNL July 16, 2003 ATLAS GRID meeting CERN.
Database Design and Management CPTG /23/2015Chapter 12 of 38 Functions of a Database Store data Store data School: student records, class schedules,
Datasets on the GRID David Adams PPDG All Hands Meeting Catalogs and Datasets session June 11, 2003 BNL.
NOVA Networked Object-based EnVironment for Analysis P. Nevski, A. Vaniachine, T. Wenaus NOVA is a project to develop distributed object oriented physics.
David Adams ATLAS ADA, ARDA and PPDG David Adams BNL June 28, 2004 PPDG Collaboration Meeting Williams Bay, Wisconsin.
Data Queries Selecting features in ArcMap Data queries  Important part of a GIS project Can be a part of your data preparation or final analysis  Data.
BIS Database Systems School of Management, Business Information Systems, Assumption University A.Thanop Somprasong Chapter # 8 Advanced SQL.
Information Building and Retrieval Using MySQL Track 3 : Basic Course in Database.
Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.
1 GCA Application in STAR GCA Collaboration Grand Challenge Architecture and its Interface to STAR Sasha Vaniachine presenting for the Grand Challenge.
SkimData and Replica Catalogue Alessandra Forti BaBar Collaboration Meeting November 13 th 2002 skimData based replica catalogue RLS (Replica Location.
ATLAS Offline Database Architecture for Time-varying Data, with Requirements for the Common Project David M. Malon LCG Conditions Database Workshop CERN,
David Adams ATLAS DIAL/ADA JDL and catalogs David Adams BNL December 4, 2003 ATLAS software workshop Production session CERN.
The POOL Persistency Framework POOL Project Review Introduction & Overview Dirk Düllmann, IT-DB & LCG-POOL LCG Application Area Internal Review October.
MyGrid/Taverna Provenance Daniele Turi University of Manchester OMII f2f Meeting, London, 19-20/4/06.
 CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian.
Integration of the ATLAS Tag Database with Data Management and Analysis Components Caitriana Nicholson University of Glasgow 3 rd September 2007 CHEP,
Session 1 Module 1: Introduction to Data Integrity
A Flexible Distributed Event-level Metadata System for ATLAS David Malon*, Jack Cranshaw, Kristo Karr (Argonne), Julius Hrivnac, Arthur Schaffer (LAL Orsay)
D. Duellmann - IT/DB LCG - POOL Project1 The LCG Pool Project and ROOT I/O Dirk Duellmann What is Pool? Component Breakdown Status and Plans.
Selecting features in ArcMap
Object storage and object interoperability
Data Placement Intro Dirk Duellmann WLCG TEG Workshop Amsterdam 24. Jan 2012.
Andrea Valassi (CERN IT-DB)CHEP 2004 Poster Session (Thursday, 30 September 2004) 1 HARP DATA AND SOFTWARE MIGRATION FROM TO ORACLE Authors: A.Valassi,
David Adams ATLAS ATLAS-ARDA strategy and priorities David Adams BNL October 21, 2004 ARDA Workshop.
David Adams ATLAS Datasets for the Grid and for ATLAS David Adams BNL September 24, 2003 ATLAS Software Workshop Database Session CERN.
Some Notes on Logical File Names and Related Interfaces David Malon ATLAS Database Group LHC Persistence Workshop 5 June 2002.
10 1 Chapter 10 - A Transaction Management Database Systems: Design, Implementation, and Management, Rob and Coronel.
Summary of persistence discussions with LHCb and LCG/IT POOL team David Malon Argonne National Laboratory Joint ATLAS, LHCb, LCG/IT meeting.
David Adams ATLAS ATLAS Distributed Analysis and proposal for ATLAS-LHCb system David Adams BNL March 22, 2004 ATLAS-LHCb-GANGA Meeting.
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
Database Issues Peter Chochula 7 th DCS Workshop, June 16, 2003.
D. Duellmann, IT-DB POOL Status1 POOL Persistency Framework - Status after a first year of development Dirk Düllmann, IT-DB.
POOL Collections and their Relational Incarnations, with Notes on Performance David M. Malon LCG Applications Area Meeting Geneva, Switzerland 24 September.
Retele de senzori Curs 2 - 1st edition UNIVERSITATEA „ TRANSILVANIA ” DIN BRAŞOV FACULTATEA DE INGINERIE ELECTRICĂ ŞI ŞTIINŢA CALCULATOARELOR.
LECTURE TWO Introduction to Databases: Data models Relational database concepts Introduction to DDL & DML.
Generating XML Data from a Database Eugenia Fernandez IUPUI.
Evaluation of the C++ binding to the Oracle Database System Dirk Geppert and Krzysztof Nienartowicz, IT/DB CERN IT Fellow Seminar November 20, 2002.
POOL Based CMS Framework Bill Tanenbaum US-CMS/Fermilab 04/June/2003.
David Adams ATLAS DIAL Distributed Interactive Analysis of Large datasets David Adams BNL May 19, 2003 BNL Technology Meeting.
Data Integrity & Indexes / Session 1/ 1 of 37 Session 1 Module 1: Introduction to Data Integrity Module 2: Introduction to Indexes.
Introduction The concept of a web framework originates from the basic idea that every web application obtains its foundations from a similar set of guidelines.
(on behalf of the POOL team)
Database Systems: Design, Implementation, and Management Tenth Edition
POOL: Component Overview and use of the File Catalog
Dirk Düllmann CERN Openlab storage workshop 17th March 2003
Translation of ER-diagram into Relational Schema
Metadata Framework as the basis for Metadata-driven Architecture
Contents Preface I Introduction Lesson Objectives I-2
Database Systems: Design, Implementation, and Management Tenth Edition
Presentation transcript:

An RTAG View of Event Collections, and Early Implementations David Malon ATLAS Database Group LHC Persistence Workshop 5 June 2002

Event Collection Types n Implicit (by containment): –“whatever events are in this file, or in this sequence of files” n Explicit: –“this specific list of events” –may consist of selected events from this file, a few from that file, … n Project should support both kinds of collections n RTAG also proposed that explicit collections be queryable (like tag databases)

Implicit collections n Assumption: Implicit collections are created implicitly--simply writing a sequence of events results in an event collection n Providing the file or (file list) as input should suffice to support iteration n Collection may be named and cataloged, but this is an additional, optional, step n Input event iterator interface should be the same as for explicit collections

Explicit collections n Equivalent to a list of “pointers” to events n Because the project will deliver externalizable Refs encapsulating possibly technology-specific persistent addresses, RTAG proposal is an implementation that uses these Refs n Explicit collections behave like a variable-length list of Refs to experiment-specific event entry points n Collections are created explicitly, either by experiment’s framework or by user: Ref to event entry point is inserted into a collection n Natural starting point for an input interface, then, is something like an STL input iterator n Is this a reasonable common input interface for both explicit and implicit collections?

Queryable collections n In an explicit collection, optional “tag” data may also be associated with an event n Ref + tag data are inserted into an explicit collection n Mental model is of a relational table whose columns are the tag attributes, with one column containing a Ref to the event n Propose to implement explicit collections both in the ROOT layer and in the relational layer n Resulting collection would allow preselection of events of interest, but could also be used directly for analysis n An architectural view is that tags are data exported from the event (event-level metadata)

Tag specification interface n Tag definition requires provision of a list of property names and types n Many projects and technologies have proposed interfaces (generic tags, addItem methods for ntuples, SQL CREATE TABLE syntax, XML, …) n Specific choice is perhaps not so important, but interface should be the same for event tags, collection metadata, file metadata, … n Propose to support types in the approximate intersection of MySQL types and ROOT types

Query interface n Should not invent a new query language n Propose a strict, very limited subset of SQL-xx, likely to be supportable in most technologies –Predicates applied to single tags –Predicates are boolean combinations of range queries on attributes –Comparison operators (, =, !=, …) applied to single attribute (column) names—no arithmetic –Logical operators (AND, OR, NOT), and grouping ()

Collection services for the common project n Early project implementation of collections allows us to support a user view of input/output specification at a level “higher” than files, even when our implementation is file-based n Relational implementations allow the project to do early prototyping of facilities that make nontrivial use of relational capabilities: querying, indexing, server-side selection, …

A sample scenario n A production job produces an explicit (tag) collection, instantiated in a ROOT file n N production jobs produce N such files n A concatenation step produces an explicit union of these tag tables to create a SINGLE relational table, which is indexed to support fast SQL predicate-based selection

Collection-level operations n Collection creation, naming, registration n Subset selection (satisfying SQL predicate) n Copying –E.g., from one technology to another n Unions—declarative, and explicit concatenation n …other collection operations and services as required in later releases

Optional potential extensions n Most physics processing is “for each:” “for each event that satisfies my condition, do …” –Order is unimportant, as long as all qualifying events are processed n A U.S. Grand Challenge Project in support of the STAR experiment delivered order-optimized iteration n Simplified view: –sort events that satisfy your condition into groups according to the file(s) they are in –deliver first the groups of events whose files are already cached –initiate prefetching of files for events whose data are not already in the cache (remote? On tape?) n Step one should be “easy” if we can map Refs to file ids; the rest can come later (…or not…)

Volunteering… n Argonne/ATLAS is prepared to volunteer... –to deliver initial implementations of event collections and collection services in a reasonable timeframe (this summer?) –to ensure that deliverables meet reasonable requirements of the four experiments –to do prototyping/benchmarking of relational capabilities (indexing, …) –to partner with others with similar interests n It is clear that many others have done related work (LHCb, BaBar, ROOT team, IT/DB, DESY, …); we are interested partly because the work is valuable to us-- ATLAS has not done tag database prototyping to date