2007-11-22 Joe Foster 1 Two questions about datasets: –How do you find datasets with the processes, cuts, conditions you need for your analysis? –How do.

Presentation transcript:

Joe Foster 1
Two questions about datasets:
–How do you find datasets with the processes, cuts, conditions you need for your analysis?
–How do you find out what is in a particular dataset?
General answers:
–Go for coffee and ask.
–Web pages.
–Databases.
Different solutions for each experiment. Examples:
–Babar
–D0
–ATLAS
Experiment Metadata

Joe Foster 2 Babar Bookkeeping Dave Bailey

Joe Foster 3
Why?
The bookkeeping system keeps track of data that have successfully passed a chain of checks and are declared good for use by users.
The information is organized in datasets.
The idea of a dataset is that users don't need to know about production details such as good and bad runs or releases.
Production systems insert data directly into the bookkeeping.
The information in the tables is self-consistent; users shouldn't need to look for information in other systems.
The history of each dataset is maintained.
There is support for merged collections (produced from more than one run).

Joe Foster 4
How?
Information held in dedicated databases
–Oracle at SLAC
–MySQL at sites around the world
Database keeps track of data that is available and also what is on disk at each site
–The “on disk” information is local to each site
–Consistent user experience everywhere using Perl scripts to query the database (hides SQL queries)
–Structure is held in database schema (table relationships)
All databases are “open access”, so users at any site can query the database at another site to check the status of files and see what’s available locally
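
The idea of a script layer that hides SQL from the user can be sketched as follows. This is only an illustration in Python with an in-memory SQLite database; the actual system uses Perl scripts against Oracle and MySQL, and the table names, column names, and dataset names below are hypothetical, not the Babar schema.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the real bookkeeping tables.
SCHEMA = """
CREATE TABLE collections (name TEXT PRIMARY KEY, dataset TEXT NOT NULL);
CREATE TABLE local_files (collection TEXT REFERENCES collections(name),
                          on_disk INTEGER NOT NULL);
"""


def open_bookkeeping(rows, local):
    """Create an in-memory stand-in for a site's bookkeeping database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO collections VALUES (?, ?)", rows)
    conn.executemany("INSERT INTO local_files VALUES (?, ?)", local)
    return conn


def collections_in(conn, dataset, on_disk_only=False):
    """Return the collection names in a dataset; the caller never writes SQL."""
    sql = "SELECT c.name FROM collections c"
    if on_disk_only:
        # The "on disk" information is local to the site being queried.
        sql += " JOIN local_files f ON f.collection = c.name AND f.on_disk = 1"
    sql += " WHERE c.dataset = ? ORDER BY c.name"
    return [row[0] for row in conn.execute(sql, (dataset,))]


conn = open_bookkeeping(
    rows=[("coll-001", "example-dataset"),
          ("coll-002", "example-dataset"),
          ("coll-003", "other-dataset")],
    local=[("coll-001", 1), ("coll-002", 0)],
)
available = collections_in(conn, "example-dataset")
local_only = collections_in(conn, "example-dataset", on_disk_only=True)
```

Because every site exposes the same query functions, a user sees the same interface whichever site's database is being asked.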

Joe Foster 5
Using the Database
Important point is that ALL tools are database driven
–E.g. copying data from SLAC to Manchester
Mark the data in the local database for import
Import data
–Process queries database to find out what to get
–Updates the status of files when successfully copied
Make data available to users
–Once imported, data is uploaded to, in our case, xrootd on the Tier2
–Status updated in the database to reflect this
Users can now query the local database to see what is available
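
The import steps on this slide amount to a small state machine driven by the status stored in the database. A minimal sketch, assuming a plain dictionary in place of the local database; the status names and the `copy_file` and `upload` callbacks are hypothetical stand-ins for the real transfer and xrootd upload tools.

```python
from enum import Enum


class Status(Enum):
    REMOTE = "remote"        # known to bookkeeping, not requested locally
    MARKED = "marked"        # marked in the local database for import
    COPIED = "copied"        # successfully copied to the local site
    AVAILABLE = "available"  # uploaded to the local storage system


def mark_for_import(db, names):
    """Step 1: mark the data in the local database for import."""
    for name in names:
        if db[name] is Status.REMOTE:
            db[name] = Status.MARKED


def run_import(db, copy_file):
    """Step 2: ask the database what to get; update status on success only."""
    for name, status in list(db.items()):
        if status is Status.MARKED and copy_file(name):
            db[name] = Status.COPIED


def publish(db, upload):
    """Step 3: upload imported data and record that it is available."""
    for name, status in list(db.items()):
        if status is Status.COPIED and upload(name):
            db[name] = Status.AVAILABLE


def available(db):
    """What a user query against the local database would report."""
    return sorted(n for n, s in db.items() if s is Status.AVAILABLE)


db = {"file-a": Status.REMOTE, "file-b": Status.REMOTE, "file-c": Status.REMOTE}
mark_for_import(db, ["file-a", "file-b"])
run_import(db, copy_file=lambda name: name != "file-b")  # pretend file-b fails
publish(db, upload=lambda name: True)
```

A failed copy leaves the file in the marked state, so a later import run picks it up again without any extra bookkeeping.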

Joe Foster 6 Details...

Joe Foster 7
Experiment Metadata in ATLAS
Computing model is Grid based.
Scope of this section:
–Still only MC data.
–Conditions and calibration databases not covered.
–Finding datasets for given process, cuts.
–Finding contents, software version, provenance of a dataset.
Sources of information
–Colleagues.
–Dataset names: trig1_misal1_mc T1_McAtNlo_Jimmy.recon.AOD.v
–Grid tools: DQ2
–Web pages: DC3 Requests, Panda Monitor, Atlas Metadata Interface, CSC reco datasets

Joe Foster 8
Info in Dataset Names
Example: trig1_misal1_mc T1_McAtNlo_Jimmy.recon.AOD.v
Logical File Name of the dataset. Convention not really standard, but usually helps.
Labels on the parts of the name:
–Trigger info
–Misalignments applied
–MC release 12
–Run number
–Trigger?
–Generators
–Reconstructed
–Analysis Object Data
–Recons version
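
Splitting such a name into labelled fields can be sketched as below. The example name on the slide appears with gaps in this transcript, so the digits in the sample string are zero placeholders, and the six-field dot-separated layout and the field names are assumptions made for illustration. As the slide notes, the convention is not really standard, so the parser refuses names that do not fit.

```python
# Assumed layout (hypothetical field names): six dot-separated fields.
FIELDS = ("project", "run_number", "physics_short",
          "step", "data_format", "version")


def split_dataset_name(name):
    """Split a dataset name into labelled fields, or raise if it does not fit."""
    parts = name.split(".")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


# Placeholder digits (zeros) stand in for the parts missing from the slide text.
example = "trig1_misal1_mc00.000000.T1_McAtNlo_Jimmy.recon.AOD.v00000000"
info = split_dataset_name(example)

# The first field packs several tags separated by underscores.
project_tags = info["project"].split("_")
```

Here `info["data_format"]` is `"AOD"` and `project_tags` holds the trigger, misalignment, and MC release tags in order.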

Joe Foster 9
Don Quijote 2 (DQ2)
Line mode interface to ATLAS Grid tasks and datasets.
–Search and list by logical file name of dataset. Example: dq2_ls valid1_misal1_mc *
Lists dataset name, and optionally names its files + size. Doesn’t tell you what is in the files.
–Download datasets for local processing: dq2_get
Some sites don’t recognise ATLAS credentials yet!
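
The kind of wildcard listing described here can be imitated locally to show what a name-only search does and does not return. This sketch does not call DQ2; the catalog contents, dataset names, and file sizes are made up, and Python's `fnmatch` is used only as a stand-in for the tool's own pattern matching.

```python
from fnmatch import fnmatchcase

# Hypothetical catalog: dataset name -> list of (file name, size in bytes).
catalog = {
    "valid1_misal1_sampleA": [("file1.root", 100), ("file2.root", 250)],
    "valid1_misal1_sampleB": [("file1.root", 75)],
    "trig1_misal1_sampleC": [],
}


def list_matching(catalog, pattern):
    """Return the dataset names that match a shell-style wildcard pattern."""
    return sorted(name for name in catalog if fnmatchcase(name, pattern))


def total_size(catalog, dataset):
    """Sum the file sizes of one dataset (the optional 'files + size' view)."""
    return sum(size for _, size in catalog[dataset])


matches = list_matching(catalog, "valid1_misal1_*")
size_a = total_size(catalog, "valid1_misal1_sampleA")
```

The listing gives names and sizes only; nothing in it describes the physics content of the files, which is why the other sources on the following slides are needed.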

Joe Foster 10
WWW: ATLAS Computing Commissioning Requests
Pages for
–High priority samples
–Standard model + calibration
–Beyond standard model + Higgs
–Single Particles
For each request for MC production:
–Process
–Category + subcategory
–Cuts
–Filter efficiency
–Cross section
–Number of events
–Simulated luminosity
–Generator
–Data set number ( = ‘run number’). search dataset names
–Requester
–Link to documentation, etc.
No info on individual datasets.
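
Several of the listed quantities are related: the simulated luminosity follows from the number of events, the cross section, and the filter efficiency. A minimal sketch, assuming the cross section is quoted before the generator filter and the event count after it; the request pages may use a different convention, and the numbers below are invented.

```python
def simulated_luminosity(n_events, cross_section_pb, filter_efficiency):
    """Equivalent integrated luminosity in pb^-1.

    Assumes n_events counts events after the filter and cross_section_pb
    is the cross section before the filter, so the effective cross
    section is cross_section_pb * filter_efficiency.
    """
    if cross_section_pb <= 0 or not 0 < filter_efficiency <= 1:
        raise ValueError("cross section must be > 0 and efficiency in (0, 1]")
    return n_events / (cross_section_pb * filter_efficiency)


# Invented numbers: 10000 events, 100 pb, 50% filter efficiency.
lumi = simulated_luminosity(10_000, 100.0, 0.5)
```

With these inputs the effective cross section is 50 pb, so the sample corresponds to 200 pb^-1.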

Joe Foster 11
PANDA Monitor
PANDA: Production ANd Distributed Analysis system
–System for running jobs and data access over the Grid.
–Designed by US ATLAS for OSG, but now has links to many sites on LCG, NorduGrid, etc.
Panda monitor is a web interface to a big database.
–Task and job monitoring: subtasks at different sites, input task (provenance), datasets produced, configuration data, status
–Dataset catalog: list of Logical File Names, replicas
–Dataset replication (‘subscriptions’).
–Fairly flexible search options.
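
Provenance in this sense means following each dataset back through the task that produced it to that task's input. A minimal sketch of that walk, assuming each task records one input and one output dataset; the `Task` class, the task names, and the dataset names are hypothetical and do not reflect the Panda database layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Task:
    name: str
    input_dataset: Optional[str]  # None for the first step in the chain
    output_dataset: str


def provenance(tasks, dataset):
    """Return task names from the one that made `dataset` back to the start."""
    produced_by = {t.output_dataset: t for t in tasks}
    chain, seen = [], set()
    current = dataset
    while current in produced_by and current not in seen:
        seen.add(current)  # guard against accidental cycles in the records
        task = produced_by[current]
        chain.append(task.name)
        current = task.input_dataset
    return chain


tasks = [
    Task("evgen-task", None, "ds.evgen"),
    Task("simul-task", "ds.evgen", "ds.simul"),
    Task("recon-task", "ds.simul", "ds.recon"),
]
history = provenance(tasks, "ds.recon")
```

Following the input-task links in the monitor by hand traces the same chain one page at a time.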

Joe Foster 12
Labels: LFN of dataset, Task name, Provenance, Subtask ID, Configuration

Joe Foster 13 Part of Panda Dataset Listing Subtask ID

Joe Foster 14
Atlas Metadata Interface (AMI)
‘Official’ ATLAS metadata interface.
–Links many data sources in a flexible way.
–Dataset search on properties of the data, or just dataset name.
–Links from search results to, e.g., provenance.

Joe Foster 15 AMI Advanced Search

Joe Foster 16 AMI Search Result

Joe Foster 17 AMI Provenance Search Result

Joe Foster 18
CSC reco datasets
Lists of current and recent Computing System Commissioning (CSC) tasks for many physics processes.
Some provenance information listed too.
Links to Panda monitor for subtasks.
Quick and easy way to find recent data.

Joe Foster 19 CSC reco datasets pages Run numbers

Joe Foster 20 CSC reco Results for Run 5200 Links to Panda monitor for subtasks Task names