1 ATLAS DIAL: Distributed Interactive Analysis of Large Datasets
David Adams – BNL
September 16, 2005, DOSAR meeting

2 Contents
DIAL project
Implementation
Components
–Dataset
–Transformation
–Job
–Scheduler
–Catalogs
–User interfaces
Status
Interactivity
Current results
Conclusions
More information
Contributors

3 DIAL project
DIAL = Distributed Interactive Analysis of Large datasets
Goal is to demonstrate the feasibility of doing interactive analysis of large data samples
Analysis means production of analysis objects (e.g. histograms or ntuples) from HEP event data
Interactive means that the user request is processed in a few minutes
Large is whatever is available and useful for physics studies
–How large can we go?
Approach is to distribute processing over many nodes using the inherent parallelism of HEP data
Assume each event can be independently processed
Each node runs an independent job to process a subset of the events
Results from each job are merged into the overall result
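The split/process/merge model above can be sketched as follows. This is an illustrative Python sketch (DIAL itself is C++ with a Python binding); `process_subset`, `split` and `merge` are hypothetical helper names, not DIAL API, and the "analysis" is a stand-in histogram fill.

```python
# Event-parallel analysis sketch: each "node" independently histograms a
# subset of events; the partial results are merged into the overall result.
from collections import Counter

def process_subset(events):
    """Independent job: fill a histogram from one subset of events."""
    hist = Counter()
    for e in events:
        hist[e % 10] += 1   # stand-in for a real analysis quantity
    return hist

def split(events, n_jobs):
    """Partition the event list into n_jobs disjoint subsets."""
    return [events[i::n_jobs] for i in range(n_jobs)]

def merge(results):
    """Combine per-job histograms; valid because events are independent."""
    total = Counter()
    for h in results:
        total.update(h)
    return total

events = list(range(1000))
partial = [process_subset(s) for s in split(events, n_jobs=4)]
overall = merge(partial)
assert overall == process_subset(events)  # same answer as a serial run
```

The key property exploited here is that the merge of independently processed subsets reproduces the serial result, which is what makes the node count a free parameter.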

4 Implementation
DIAL software provides a generic end-to-end framework for distributed analysis
Generic means few assumptions about the data to be processed or the application that carries out the processing
–Experiment and user must provide extensions to the system
Able to support a wide range of processing systems
–Local processing using batch systems such as LSF, Condor and PBS
–Grid processing using different flavors of CE or WMS
Provides friendly user interfaces
–Integration with ROOT
–Python binding (from GANGA)
–GUI for job submission and monitoring (from GANGA)
Mostly written in C++
Releases packaged 3-4 times/year

5 Components
AJDL (Abstract Job Definition Language)
–Dataset and transformation (application + task)
–Job
–C++ interface and XML representation for each
Scheduler
–Handles job processing
Catalogs
–Hold job definition objects and their metadata
Web services
User interfaces
Monitoring and accounting

6 Dataset
Generic data description includes
Description of the content (type of data)
–E.g. reconstructed objects, electrons, jets with cone size 0.7, …
–Number of events and their IDs
Data location
–Typically a list of logical file names
List of constituent datasets
–Model is inherently hierarchical
DIAL provides the following classes
Dataset – defines the common interface
GenericDataset – provides data and XML representation
TextDataset – embedded collection of text files
SingleFileDataset – data location is a single file
EventMergeDataset – collection of single-file event datasets
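The hierarchical model above can be sketched in Python. This is a hypothetical illustration of the class roles listed on the slide (the real classes are C++, and `SingleFileEventDataset`, `logical_files` and `event_ids` are invented names for the sketch):

```python
# Hierarchical dataset sketch: a large sample is a collection of
# constituent datasets; queries recurse over the constituents.
class Dataset:
    """Common interface: content description, file list, constituents."""
    def logical_files(self): raise NotImplementedError
    def event_ids(self): raise NotImplementedError

class SingleFileEventDataset(Dataset):
    """Data location is a single (logical) file."""
    def __init__(self, lfn, event_ids):
        self._lfn, self._ids = lfn, list(event_ids)
    def logical_files(self): return [self._lfn]
    def event_ids(self): return list(self._ids)

class EventMergeDataset(Dataset):
    """A large sample described as single-file event datasets."""
    def __init__(self, constituents):
        self.constituents = list(constituents)
    def logical_files(self):
        return [f for d in self.constituents for f in d.logical_files()]
    def event_ids(self):
        return [i for d in self.constituents for i in d.event_ids()]

d = EventMergeDataset([
    SingleFileEventDataset("lfn:aod.0001.pool.root", [1, 2, 3]),
    SingleFileEventDataset("lfn:aod.0002.pool.root", [4, 5]),
])
assert len(d.logical_files()) == 2
assert d.event_ids() == [1, 2, 3, 4, 5]
```

Because the model is recursive, a scheduler can split an EventMergeDataset along constituent boundaries without knowing anything experiment-specific.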

7 Dataset (cont)
Current approach for experiment-specific event datasets
Add a GenericDataset subclass that is able to read experiment files and fill the appropriate data
Use EventMergeDataset to describe large collections
DIAL can carry out processing using the GenericDataset interface and implementation
Alternatively, the experiment may provide its own implementation of the Dataset interface
–If GenericDataset is not sufficient
–Processing components then require this implementation

8 Transformation
A transformation acts on a dataset to produce another dataset
Fundamental user interaction with the system
Transformation has two components
Application describes the action to take
Task provides data to configure the application
Task is a collection of named text files
Application holds two scripts
run – carries out the transformation
–Input is dataset.xml and output is result.xml
–Has access to the transformation
build_task – creates transformation data from the files in the task
–E.g. compiles to build a library
–Build is platform specific
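The application/task contract above can be sketched in memory. This is an illustrative model, not the DIAL implementation: the real `run` and `build_task` are scripts exchanging dataset.xml and result.xml on disk, and every name here is hypothetical.

```python
# Transformation sketch: an application (run + build_task) configured by
# a task, which is a collection of named text files.
class Application:
    """Holds the two scripts, modeled here as callables."""
    def __init__(self, run, build_task):
        self.run, self.build_task = run, build_task

class Transformation:
    def __init__(self, application, task_files):
        self.application = application
        self.task_files = dict(task_files)   # named text files
        self.built = None
    def build(self):
        # build_task turns the task's files into runnable data,
        # e.g. compiling source into a library (platform specific)
        self.built = self.application.build_task(self.task_files)
    def apply(self, dataset):
        # run maps an input dataset to a result dataset
        return self.application.run(self.built, dataset)

app = Application(
    run=lambda built, ds: {"histogram": [built(e) for e in ds]},
    build_task=lambda files: eval(files["selector.py"]),
)
t = Transformation(app, {"selector.py": "lambda e: e * 2"})
t.build()
assert t.apply([1, 2, 3]) == {"histogram": [2, 4, 6]}
```

The point of the split is that the task (user code) travels as plain text and is built once per platform, while run stays generic.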

9 Job
The Job class holds the following data:
Job definition: application, task, dataset and job preferences
Status of the corresponding job
–Initialized, running, done, failed or killed
Other history information
–Compute node, native ID, start and stop times, return code, …
Job provides an interface to
Start or submit the job
Update the status and history information
Kill the job
Fetch the result (output dataset)
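The state and interface listed above can be sketched as a small state machine. This is a hypothetical Python illustration (the real Job is C++; field and method names here are invented to mirror the slide):

```python
# Job sketch: definition + status + history, with start/update/kill/result.
INITIALIZED, RUNNING, DONE, FAILED, KILLED = (
    "initialized", "running", "done", "failed", "killed")

class Job:
    def __init__(self, application, task, dataset, preferences=None):
        self.definition = (application, task, dataset, preferences)
        self.state = INITIALIZED
        self.history = {}       # compute node, native ID, times, return code
        self._result = None     # output dataset, set when done
    def start(self):
        assert self.state == INITIALIZED
        self.state = RUNNING
    def update(self, state, **history):
        """Refresh status and history from the processing system."""
        self.state = state
        self.history.update(history)
    def kill(self):
        if self.state in (INITIALIZED, RUNNING):
            self.state = KILLED
    def result(self):
        return self._result if self.state == DONE else None

j = Job("app", "task", "dataset")
j.start()
j.update(DONE, return_code=0)
assert j.state == DONE and j.history["return_code"] == 0
```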

10 Job (cont)
Subclasses
–Provide connection to the underlying processing system
–Implement these methods
–Currently support fork, LSF and Condor
ScriptedJob
–Subclass that implements these methods by calling a script
–Scripts may then be written to support different processing systems
–Alternative to providing a subclass
–Added in release 1.20
Job copies
–Jobs may be copied locally or remotely
–Copy includes only the base class:
–All common data available but the job cannot be started, updated or killed
–Use the scheduler for these interactions

11 Scheduler
Scheduler carries out processing
The following interface is defined
add_task builds the task
–Input: application, task
–Output: job ID
submit creates and starts a job
–Input: application, task, dataset and job preferences
–Output: job ID
job(JobId) returns a reference to the job
–Or to a copy if the scheduler is remote
kill(JobId) to kill a job
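The four-method interface above can be sketched as follows; this is an illustrative stand-in for the C++ interface, with invented internals (a dict of job records and an ID counter), not the DIAL implementation:

```python
# Scheduler sketch: add_task / submit / job / kill, all keyed by job ID.
import itertools

class LocalScheduler:
    def __init__(self):
        self._jobs = {}
        self._ids = itertools.count(1)
    def add_task(self, application, task):
        """Build the task; returns a job ID for the build job."""
        jid = next(self._ids)
        self._jobs[jid] = {"def": (application, task), "state": "built"}
        return jid
    def submit(self, application, task, dataset, preferences=None):
        """Create and start a job; returns its job ID."""
        jid = next(self._ids)
        self._jobs[jid] = {"def": (application, task, dataset, preferences),
                           "state": "running"}
        return jid
    def job(self, jid):
        # a reference here; a remote scheduler would return a copy
        return self._jobs[jid]
    def kill(self, jid):
        self._jobs[jid]["state"] = "killed"

s = LocalScheduler()
jid = s.submit("app", "task", "dataset")
assert s.job(jid)["state"] == "running"
s.kill(jid)
assert s.job(jid)["state"] == "killed"
```

Because every interaction goes through a job ID, the same client code works whether the scheduler is a local fork, a batch queue, or a remote web service.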

12 Scheduler (cont)
Subclasses implement this interface
LocalScheduler
–Uses a single job type and queue to handle processing, e.g. LSF
MasterScheduler
–Splits the input dataset
–Uses an instance of LocalScheduler to process subjobs
–Merges results from subjobs
Analysis service
–Web service wrapper for Scheduler
–This service may run any scheduler with any job type
–WsClientScheduler subclass acts as a client to the service
–Same interface for interacting with local fork, local batch or a remote analysis service
Typical mode of operation is to use a remote analysis service
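The MasterScheduler flow above (split, delegate, merge) can be sketched as follows. Names and internals are hypothetical; the worker here is a trivial stand-in that "processes" a sub-dataset by summing it, where DIAL would run a real subjob through a LocalScheduler.

```python
# MasterScheduler sketch: split the input dataset, delegate each piece
# to an underlying scheduler, then merge the subjob results.
class FakeLocalScheduler:
    """Stand-in worker: 'processes' a sub-dataset by summing it."""
    def submit(self, dataset):
        return sum(dataset)          # result of one subjob

class MasterScheduler:
    def __init__(self, worker, n_subjobs):
        self.worker, self.n = worker, n_subjobs
    def submit(self, dataset):
        subsets = [dataset[i::self.n] for i in range(self.n)]   # split
        results = [self.worker.submit(s) for s in subsets]      # subjobs
        return sum(results)                                     # merge

m = MasterScheduler(FakeLocalScheduler(), n_subjobs=4)
assert m.submit(list(range(100))) == sum(range(100))
```

Since MasterScheduler itself implements the Scheduler interface, it can sit in front of any LocalScheduler without the client noticing.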

13 Scheduler (cont)
Typical structure of a job running on a MasterScheduler

14 Scheduler (cont)
Scheduler hierarchy
Plan to add a subclass to forward requests to a selected scheduler based on job characteristics
This will make it possible to create a tree (or web) of analysis services
–Should allow us to scale to arbitrarily large data samples (limited only by available computing resources)
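The planned forwarding behavior can be sketched as a routing scheduler. This is purely illustrative of the idea on the slide (a subclass choosing a child service from job characteristics); the routing rule, the `Leaf` class and all names are invented for the sketch.

```python
# Forwarding-scheduler sketch: pick a child analysis service based on a
# job characteristic (here, dataset size), building a tree of services.
class Leaf:
    """Stand-in child service that just reports who handled the job."""
    def __init__(self, name): self.name = name
    def submit(self, dataset): return (self.name, len(dataset))

class ForwardingScheduler:
    def __init__(self, routes):
        self.routes = routes     # ordered (predicate, child scheduler) pairs
    def submit(self, dataset):
        for predicate, child in self.routes:
            if predicate(dataset):
                return child.submit(dataset)
        raise RuntimeError("no scheduler accepts this job")

tree = ForwardingScheduler([
    (lambda d: len(d) < 100, Leaf("local-batch")),
    (lambda d: True, Leaf("remote-service")),
])
assert tree.submit(range(10)) == ("local-batch", 10)
assert tree.submit(range(1000)) == ("remote-service", 1000)
```

Because children can themselves be ForwardingSchedulers, the structure composes into the tree (or web) of services the slide describes.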

15 Scheduler (cont)
Example of a possible scheduler tree for ATLAS

16 Catalogs
DIAL provides support for different types of catalogs
Repositories to hold instances of application, task, dataset and job
Selection catalogs associate names and metadata with instances of these objects
And others
MySQL implementations
–Populated with ATLAS transformations and Rome AOD datasets

17 User interfaces
ROOT
–Almost all DIAL classes are available at the ROOT command line
–User can access catalogs, create job definitions, and submit and monitor jobs
Python
–Interface available in PyDIAL
–Most DIAL classes are available as wrapped Python classes
–Work done by GANGA using LCG tools
GUI
–There is a GUI that supports job specification, submission and monitoring
–Provided by GANGA using the PyDIAL interface

18 (no transcript text)

19 (no transcript text)

20 Status
Releases
–Most of the previous is available in the current release (1.20)
–Release 1.30 expected next month
–Release with service hierarchy in January
Use in ATLAS
–Ambition is to use DIAL to define a common interface for distributed analysis with connections to different systems
–See figure
–ATLAS distributed analysis is under review
–Also want to continue with the original goal of providing a system with interactive response for appropriate jobs
–Integration with the new USATLAS PANDA project

21 Possible analysis service model for ATLAS

22 Status (cont)
Use in other experiments
–Some interest but no serious use that I know of
–Current releases depend on ATLAS software but I am willing to build a version without that dependency
–Not difficult
–A generic system like this might be of interest to smaller experiments that would like to have the capabilities but do not have the resources to build a dedicated system

23 Interactivity
What does it mean for a system to be interactive?
Literal meaning suggests the user is able to directly communicate with jobs and subjobs
Most users will be satisfied if their jobs complete quickly; they do not need to interact with subjobs or care if some other agent is doing so
–Important exception is the capability to kill a job and all its subjobs
Interactivity (responsiveness) imposes requirements on the processing system (batch, WMS, …)
–High job rates (> 1 Hz)
–Low submit-to-result job latencies (< 1 minute)
–High data input rate from SE to farm (> 10 MB/job)

24 Current results
ATLAS generated data samples for the Rome physics meeting in June
Data includes AOD (Analysis Object Data), a summary of the reconstructed event data
–AOD size is 100 kB/event
All Rome AOD datasets are available in DIAL
An interactive analysis service is deployed at BNL using a special LSF queue for job submission
The following plot shows processing times for datasets of various sizes
–LSF SUSY (green triangles) is a “real AOD analysis” producing histograms
–LSF big produces ntuples; the extra time is for ntuple merging (not parallelized)
–Largest jobs take 3 hours to run serially and are processed in about 10 minutes with the DIAL interactive service

25 (no transcript text)

26 Conclusions
DIAL analysis service running at BNL
–Serves clients from anywhere
–Processing done locally
–Rome AOD datasets available
–Robust and responsive since June
Next steps
–Similar service running at UTA (using PBS)
–Service to select between these
–BNL service to connect to an OSG CE (in place of local LSF)
–Integration with the new ATLAS data management system
–Integration with PANDA

27 More information
DIAL home page: http://www.usatlas.bnl.gov/~dladams/dial
ADA (ATLAS Distributed Analysis) home page: http://www.usatlas.bnl.gov/ADA
Current DIAL release: http://www.usatlas.bnl.gov/~dladams/dial/releases/1.20

28 Contributors
GANGA: Karl Harrison, Alvin Tan
DIAL: David Adams, Wensheng Deng, Tadashi Maeno, Vinay Sambamurthy, Nagesh Chetan, Chitra Kannan
ARDA: Dietrich Liko
ATPROD: Frederic Brochu, Alessandro De Salvo
AMI: Solveig Albrand, Jerome Fulachier
ADA: (plus those above) Farida Fassi, Christian Haeberli, Hong Ma, Grigori Rybkine, Hyunwoo Kim

