DIAL: Distributed Interactive Analysis of Large Datasets
David Adams, BNL
May 19, 2003, BNL Technology Meeting


David Adams, ATLAS. May 19, 2003. DIAL, BNL technology meeting.

Contents
- Goals of DIAL
- What is DIAL?
- Design
- Applications
- Results and Tasks
- Schedulers
- Datasets
- Exchange format
- Status
- Development plans
- GRID requirements

Goals of DIAL
1. Demonstrate the feasibility of interactive analysis of large datasets
– How much data can we analyze interactively?
2. Set requirements for GRID services
– Datasets, schedulers, jobs, results, resource discovery, authentication, allocation, …
3. Provide ATLAS with a useful analysis tool
– For current and upcoming data challenges
– Real-world deliverable
– Another experiment would show generality

What is DIAL?
- Distributed: data and processing
- Interactive: iterative processing with prompt response (seconds rather than hours)
- Analysis: fill histograms, select events, …
- Large datasets: any event data (not just ntuples or tags)

What is DIAL? (cont)
- DIAL provides a connection between:
– an interactive analysis framework (fitting, presentation graphics, …), e.g. ROOT, and
– a data processing application natural to the data of interest, e.g. athena for ATLAS
- DIAL distributes processing among sites, farms, and nodes to give the user the desired response time


Design
DIAL has the following components:
- Dataset: describes the data of interest, organized into events
- Application: the event loop providing access to the data
- Result
- Task: an empty result plus code to process each event
- Scheduler: distributes processing and combines results
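These components can be sketched as minimal Python classes (an illustrative sketch only: DIAL itself is C++, and every name and field below is an assumption, not the actual DIAL interface):

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    """Describes the data of interest, organized into events."""
    event_ids: list
    files: list                      # logical files holding the event data

@dataclass
class Application:
    """The event loop providing access to the data, e.g. athena."""
    name: str
    version: str
    shared_libs: list = field(default_factory=list)

@dataclass
class Result:
    """Filled during processing, e.g. a histogram or event list."""
    data: dict = field(default_factory=dict)

@dataclass
class Task:
    """An empty result plus code to process each event."""
    empty_result: Result
    code: str                        # source built by the application

class Scheduler:
    """Distributes processing and combines results."""
    def submit(self, app: Application, task: Task, ds: Dataset) -> Result:
        raise NotImplementedError    # concrete schedulers implement this

app = Application("athena", "1.0.0", ["libRawData"])   # hypothetical version
```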

[Diagram: job flow between the user analysis framework (e.g. ROOT), the scheduler, and the jobs]
1. Create or locate a Dataset
2. Select an Application (e.g. athena)
3. Create or select a Task (Result plus Code)
4. Select a Scheduler
5. submit(app, tsk, ds)
6. Split the dataset (Dataset 1, Dataset 2)
7. Create the sub-jobs (Job 1, Job 2)
8. run(app, tsk, ds1), run(app, tsk, ds2)
9. Each job fills its Result
10. Gather the results

Applications
The current application specification is:
- Name, e.g. athena
- Version
- List of shared libraries, e.g. libRawData, libInnerDetectorReco

Applications (cont)
- Each DIAL compute node provides an application description database
– File-based; location specified by an environment variable
– Indexed by application name and version
- The application description includes:
– Location of the executable
– Run-time environment (shared library path, …)
– Command to build a shared library from task code
- Conventions are defined by the ChildScheduler
– A different scheduler could change the conventions
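A sketch of such a file-based database: a JSON file whose location comes from an environment variable, keyed by application name and version. The variable name `DIAL_APP_DB`, the JSON format, and all field names are assumptions for illustration, not the actual ChildScheduler conventions:

```python
import json
import os

def load_app_db(path=None):
    """Load the per-node application description database."""
    # Location specified by an environment variable ('DIAL_APP_DB' is
    # a hypothetical name for illustration).
    path = path or os.environ.get("DIAL_APP_DB", "app_db.json")
    with open(path) as f:
        return json.load(f)

def describe(db, name, version):
    """Look up an application description by name and version."""
    return db[name][version]

# Hypothetical database content for one node:
db = {
    "athena": {
        "1.0.0": {                     # hypothetical version string
            "executable": "/opt/atlas/bin/athena",
            "environment": {"LD_LIBRARY_PATH": "/opt/atlas/lib"},
            "build_task": "g++ -shared -fPIC -o libtask.so task.cxx",
        }
    }
}

desc = describe(db, "athena", "1.0.0")
```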

Results and Tasks
- The Result is filled during processing; examples:
– Histogram
– Event list
– File
- The Task is provided by the user: an empty result plus the code to fill it
– Language and interface depend on the application
– May need to be compiled
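As an illustration of the pairing, the sketch below builds an empty histogram result plus user fill code, then runs them over a toy event loop (pure Python for readability; a real DIAL task is code compiled into the processing application, and the `jet_pt` field is invented):

```python
class HistogramResult:
    """A minimal 'result': a fixed-bin histogram, empty at creation."""
    def __init__(self, nbins, lo, hi):
        self.nbins, self.lo, self.hi = nbins, lo, hi
        self.bins = [0] * nbins

    def fill(self, x):
        if self.lo <= x < self.hi:
            width = (self.hi - self.lo) / self.nbins
            self.bins[int((x - self.lo) / width)] += 1

    def add(self, other):
        """Merge another result, as a scheduler would when gathering."""
        self.bins = [a + b for a, b in zip(self.bins, other.bins)]

def task_code(event, result):
    """User-supplied fill code: histogram a (hypothetical) jet pT."""
    result.fill(event["jet_pt"])

# The 'event loop' the application would provide:
result = HistogramResult(nbins=10, lo=0.0, hi=100.0)
for event in [{"jet_pt": 12.0}, {"jet_pt": 47.5}, {"jet_pt": 250.0}]:
    task_code(event, result)   # the third event falls outside the range
```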

Schedulers
A DIAL scheduler provides the means to:
- Submit a job
- Terminate a job
- Monitor a job
– Status
– Events processed
– Partial results
- Verify the availability of an application
- Install and verify the presence of a task for a given application

Schedulers (cont)
- Schedulers form a hierarchy corresponding to that of the compute nodes (grid, site, farm, node)
- Each scheduler splits a job into sub-jobs and distributes these over lower-level schedulers
- The lowest-level ChildScheduler starts processes to carry out the sub-jobs
- Each scheduler concatenates the results of its sub-jobs
- The user may enter the hierarchy at any level (client-server communication)
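The hierarchy can be sketched as a scheduler that splits a job across lower-level schedulers and concatenates their results, with a ChildScheduler at the bottom doing the actual processing (an illustrative sketch; the interfaces and the stand-in "processing" are assumptions):

```python
class ChildScheduler:
    """Lowest level: processes its share of events directly."""
    def submit(self, events):
        # Stand-in for running the application's event loop:
        return [e * 2 for e in events]

class Scheduler:
    """Splits a job over lower-level schedulers and gathers results."""
    def __init__(self, children):
        self.children = children

    def submit(self, events):
        n = len(self.children)
        # Split the event list into one sub-job per child scheduler...
        subjobs = [events[i::n] for i in range(n)]
        # ...then concatenate the sub-results.
        results = []
        for child, sub in zip(self.children, subjobs):
            results.extend(child.submit(sub))
        return results

# A two-level hierarchy: a farm scheduler over two node schedulers.
farm = Scheduler([ChildScheduler(), ChildScheduler()])
out = farm.submit([1, 2, 3, 4])
```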


Datasets
- Datasets specify the event data to be processed
- Datasets provide the following:
– List of event identifiers
– Content, e.g. raw data, refit tracks, cone=0.3 jets, …
– Means to locate the data:
> List of logical files where the data can be found
> Mapping from event ID and content to a file and the location in that file where the data may be found
- Example follows

[Figure: dataset example]

Datasets (cont)
- The user may specify the content of interest
- A dataset plus this content restriction is another dataset
- Event data for the new dataset are located in a subset of the files required for the original
– Only this subset is required for processing
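Combining the last two slides, a dataset's (event ID, content) to file-location mapping and the content restriction can be sketched as follows (the field names, location encoding, and `lfn:` strings are illustrative assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    """Event IDs, content labels, and a map to file locations."""
    event_ids: list
    content: list                    # e.g. ["raw", "tracks"]
    # (event_id, content) -> (logical file, location within the file)
    locations: dict = field(default_factory=dict)

    def files(self):
        """The logical files needed to read this dataset."""
        return sorted({f for f, _ in self.locations.values()})

    def restrict(self, wanted):
        """Content restriction: the result is itself a dataset."""
        locs = {k: v for k, v in self.locations.items() if k[1] in wanted}
        kept = [c for c in self.content if c in wanted]
        return Dataset(self.event_ids, kept, locs)

ds = Dataset(
    event_ids=[1, 2],
    content=["raw", "tracks"],
    locations={
        (1, "raw"): ("lfn:run1.raw", 0),
        (1, "tracks"): ("lfn:run1.reco", 0),
        (2, "raw"): ("lfn:run2.raw", 0),
        (2, "tracks"): ("lfn:run1.reco", 512),
    },
)

# Restricting to "raw" needs only a subset of the original files:
raw_only = ds.restrict(["raw"])
```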


Datasets (cont)
- Distributed analysis requires a means to divide a dataset into sub-datasets
– A sub-dataset is itself a dataset
- Do not split the data from any one event
- Split along file boundaries
– Jobs can be assigned where the files are already present
– This split is most likely done at the grid level
- Different events from one file may be assigned to different jobs to speed processing
– This split is likely done at the farm level
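Splitting along file boundaries can be sketched as grouping a dataset's events by the file that holds them, so each sub-dataset touches only one file and its sub-job can be sent where that file already resides (illustrative; the mapping shape is an assumption):

```python
from collections import defaultdict

def split_by_file(event_to_file):
    """Group event IDs by logical file; each group is a sub-dataset.

    event_to_file: dict mapping event ID -> logical file holding its data.
    Events from the same file stay together, so a sub-job can be assigned
    to a site that already has that file.
    """
    groups = defaultdict(list)
    for event_id, lfn in sorted(event_to_file.items()):
        groups[lfn].append(event_id)
    return dict(groups)

subsets = split_by_file({
    101: "lfn:run1.data",
    102: "lfn:run1.data",
    201: "lfn:run2.data",
})
```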


Exchange format
- DIAL components are exchanged between:
– User and scheduler
– Scheduler and scheduler
– Scheduler and application executable
- Components have an XML representation
- The exchange mechanism can be:
– C++ objects
– SOAP
– Files
- The mechanism is defined by the scheduler
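As a sketch of what one such XML representation might look like, here is a hypothetical encoding of the application specification from the earlier slide (the element and attribute names are invented for illustration; the actual DIAL schema is not shown in the slides):

```xml
<!-- Hypothetical XML for an application description -->
<application name="athena" version="1.0.0">
  <shared_library>libRawData</shared_library>
  <shared_library>libInnerDetectorReco</shared_library>
</application>
```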

Status
- All DIAL components are in place
– Release 0.20 made in March
- But the scheduler is very simple
– Only the local ChildScheduler is implemented
– Grid, site, farm and client-server schedulers are not yet implemented
- More details in the CHEP paper

Status (cont)
- Datasets are implemented as a separate system
- Implementations:
– ATLAS AthenaRoot file
> Exists
> Holds Monte Carlo generator information
– ATLAS combined-ntuple hbook file
> Under development
– Athena-Pool files
> When they are available

Status (cont)
- DIAL and dataset classes have been imported into ROOT
- ROOT can be used as the interactive user interface
– All DIAL and dataset classes and methods are available at the command prompt
– The DIAL and dataset libraries must be loaded
- Import done with ACLiC
- Only preliminary testing done
- Need to add a result for TH1 and any other classes of interest

Status (cont)
- No application is integrated to process jobs
– Except the test program dialproc, which counts events
- Plans for ATLAS:
– dialpaw to run PAW to process the combined ntuple
> Under development
– Athena to process Athena-Pool event data files
> When Athena-Pool is available later this year
– Perhaps a ROOT backend to process ntuples
> Or is this better handled with PROOF?
> Or use PROOF to implement a farm scheduler?

Status (cont)
- An interface for logical files was added recently
- Includes an abstract interface for a file catalog
– A local directory catalog has been implemented
– Plan to add an AFS catalog
> Good enough for immediate ATLAS needs
– Eventually add Magda and/or RLS

Status (cont)
Results:
- Existing implementations
– Counter
– Event ID list
- Under development
– HBOOK (logical) file holding histograms
- Planned
– Collection of other results
– ROOT objects (TH1, …)
– AIDA instead of ROOT?

Development plans
(Items highlighted in red on the original slides are required for a useful ATLAS tool; those in green, to use it to analyze Athena-Pool data.)
Schedulers:
- Client-server schedulers
- Farm scheduler
– Allows a large-scale test
- Site and grid schedulers
– GRID integration
– Interact with dataset, file and replica catalogs
– Authentication and authorization

Development plans (cont)
Datasets:
- Hbook combined ntuple
– In development
- Interface to ATLAS POOL event collections
– Expected in summer
- ROOT ntuples??
Applications:
- PAW (with a C++ wrapper)
- Athena for ATLAS
- ROOT??

Development plans (cont)
Analysis environment:
- The ROOT implementation needs testing
- JAS?
– Java-based
– Add a DIAL Java binding?
- One or more of the LHC Python buses?
– SEAL, Athena-ASK, GANGA, …
– These can use ROOT via PyROOT
– Import DIAL into Python
> With SWIG or Boost

GRID requirements
A major goal of DIAL is to identify components and services to share with:
- Other distributed interactive analysis projects
– PROOF, JAS, …
- Distributed batch projects
– Production
– Analysis
- Non-HEP event-oriented problems
– Data organized into a collection of “events” that are each processed in the same way

GRID requirements (cont)
Candidates for shared components include:
- Dataset
– Event ID list
– Content
– File mapping
– Splitting
- Application
– Specification
– Installation
- Task
– Transformation = Application + Task

GRID requirements (cont)
Shared components (cont):
- Job
– Specification (application, task, dataset)
– Response time
– Hierarchy (split into sub-jobs)
> DAG?
- Scheduler
– Accepts job submission
– Splits, submits and monitors sub-jobs
– Gathers and concatenates results
– Returns status, including results and partial results

GRID requirements (cont)
Shared components (cont):
- Logical files
– Catalogs
– Replication
- Authentication and authorization
- Resource location and allocation
– Data, processing and matching

GRID requirements (cont)
- An important aspect is latency
- An interactive system provides a means for the user to specify the maximum acceptable response time
- All actions must take place within this time:
– Locate data and resources
– Splitting and matchmaking
– Job submission
– Gathering of results
- Longer latency is acceptable for the first pass over a dataset
– Record state for later passes
– Must still be able to adjust to changing conditions

GRID requirements (cont)
- Interactive and batch processing must share resources
– Sharing implies more available resources for both
- Interactive use varies significantly with:
– Time of day
– Time to the next conference
– Discovery of interesting events
- An interactive request must be able to preempt long-running batch jobs
– But allocation is determined by sites, experiments, …