David Adams ATLAS ATLAS-ARDA strategy and priorities David Adams BNL October 21, 2004 ARDA Workshop.

Slides:



Advertisements
Similar presentations
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Services Abderrahman El Kharrim
Advertisements

David Adams ATLAS DIAL Distributed Interactive Analysis of Large datasets David Adams BNL March 25, 2003 CHEP 2003 Data Analysis Environment and Visualization.
David Adams ATLAS DIAL Distributed Interactive Analysis of Large datasets David Adams BNL June 23, 2003 GAE workshop Caltech.
QCDgrid Technology James Perry, George Beckett, Lorna Smith EPCC, The University Of Edinburgh.
Don Quijote Data Management for the ATLAS Automatic Production System Miguel Branco – CERN ATC
Workload Management WP Status and next steps Massimo Sgaravatto INFN Padova.
David Adams ATLAS DIAL Distributed Interactive Analysis of Large datasets David Adams BNL July 15, 2003 LCG Analysis RTAG CERN.
INFSO-RI Enabling Grids for E-sciencE gLite Data Management Services - Overview Mike Mineter National e-Science Centre, Edinburgh.
David Adams ATLAS ATLAS Distributed Analysis David Adams BNL March 18, 2004 ATLAS Software Workshop Grid session.
K. Harrison CERN, 20th April 2004 AJDL interface and LCG submission - Overview of AJDL - Using AJDL from Python - LCG submission.
INFSO-RI Enabling Grids for E-sciencE Logging and Bookkeeping and Job Provenance Services Ludek Matyska (CESNET) on behalf of the.
QCDGrid Progress James Perry, Andrew Jackson, Stephen Booth, Lorna Smith EPCC, The University Of Edinburgh.
How to Install and Use the DQ2 User Tools US ATLAS Tier2 workshop at IU June 20, Bloomington, IN Marco Mambelli University of Chicago.
David Adams ATLAS AJDL: Analysis Job Description Language David Adams BNL December 15, 2003 PPDG Collaboration Meeting LBL.
ATLAS DIAL: Distributed Interactive Analysis of Large Datasets David Adams – BNL September 16, 2005 DOSAR meeting.
David Adams ATLAS DIAL status David Adams BNL July 16, 2003 ATLAS GRID meeting CERN.
David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN.
Grid Workload Management Massimo Sgaravatto INFN Padova.
ILDG Middleware Status Chip Watson ILDG-6 Workshop May 12, 2005.
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES Data Replication Service Sandeep Chandra GEON Systems Group San Diego Supercomputer Center.
Datasets on the GRID David Adams PPDG All Hands Meeting Catalogs and Datasets session June 11, 2003 BNL.
Web: Minimal Metadata for Data Services Through DIALOGUE Neil Chue Hong AHM2007.
David Adams ATLAS ADA, ARDA and PPDG David Adams BNL June 28, 2004 PPDG Collaboration Meeting Williams Bay, Wisconsin.
INFSO-RI Enabling Grids for E-sciencE ATLAS Distributed Analysis A. Zalite / PNPI.
David Adams ATLAS Architecture for ATLAS Distributed Analysis David Adams BNL March 25, 2004 ATLAS Distributed Analysis Meeting.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE middleware: gLite Data Management EGEE Tutorial 23rd APAN Meeting, Manila Jan.
Enabling Grids for E-sciencE Introduction Data Management Jan Just Keijser Nikhef Grid Tutorial, November 2008.
Replica Management Services in the European DataGrid Project Work Package 2 European DataGrid.
ARDA Prototypes Andrew Maier CERN. ARDA WorkshopAndrew Maier, CERN2 Overview ARDA in a nutshell –Experiments –Middleware Experiment prototypes (basic.
Metadata Mòrag Burgon-Lyon University of Glasgow.
David Adams ATLAS DIAL/ADA JDL and catalogs David Adams BNL December 4, 2003 ATLAS software workshop Production session CERN.
INFSO-RI Enabling Grids for E-sciencE gLite Data Management and Interoperability Peter Kunszt (JRA1 DM Cluster) 2 nd EGEE Conference,
David Adams ATLAS ADA: ATLAS Distributed Analysis David Adams BNL June 7, 2004 BNL Technology Meeting.
US LHC OSG Technology Roadmap May 4-5th, 2005 Welcome. Thank you to Deirdre for the arrangements.
David Adams ATLAS ATLAS Distributed Analysis David Adams BNL September 30, 2004 CHEP2004 Track 5: Distributed Computing Systems and Experiences.
DGC Paris WP2 Summary of Discussions and Plans Peter Z. Kunszt And the WP2 team.
D. Adams, D. Liko, K...Harrison, C. L. Tan ATLAS ATLAS Distributed Analysis: Current roadmap David Adams – DIAL/PPDG/BNL Dietrich Liko – ARDA/EGEE/CERN.
EGEE is a project funded by the European Union under contract IST “Interfacing to the gLite Prototype” Andrew Maier / CERN LCG-SC2, 13 August.
6/23/2005 R. GARDNER OSG Baseline Services 1 OSG Baseline Services In my talk I’d like to discuss two questions:  What capabilities are we aiming for.
David Adams ATLAS DIAL: Distributed Interactive Analysis of Large datasets David Adams BNL August 5, 2002 BNL OMEGA talk.
INFSO-RI Enabling Grids for E-sciencE Ganga 4 – The Ganga Evolution Andrew Maier.
David Adams ATLAS DIAL Distributed Interactive Analysis of Large datasets David Adams BNL November 17, 2003 SC2003 Phoenix.
David Adams ATLAS ATLAS distributed data management David Adams BNL February 22, 2005 Database working group ATLAS software workshop.
K. Harrison CERN, 22nd September 2004 GANGA: ADA USER INTERFACE - Ganga release status - Job-Options Editor - Python support for AJDL - Job Builder - Python.
David Adams ATLAS ATLAS Distributed Analysis: Overview David Adams BNL December 8, 2004 Distributed Analysis working group ATLAS software workshop.
ATLAS-specific functionality in Ganga - Requirements for distributed analysis - ATLAS considerations - DIAL submission from Ganga - Graphical interfaces.
David Adams ATLAS Datasets for the Grid and for ATLAS David Adams BNL September 24, 2003 ATLAS Software Workshop Database Session CERN.
Grid Workload Management (WP 1) Massimo Sgaravatto INFN Padova.
1 A Scalable Distributed Data Management System for ATLAS David Cameron CERN CHEP 2006 Mumbai, India.
ATLAS Distributed Analysis Dietrich Liko IT/GD. Overview  Some problems trying to analyze Rome data on the grid Basics Metadata Data  Activities AMI.
Enabling Grids for E-sciencE CMS/ARDA activity within the CMS distributed system Julia Andreeva, CERN On behalf of ARDA group CHEP06.
INFSO-RI Enabling Grids for E-sciencE gLite Test and Certification Effort Nick Thackray CERN.
David Adams ATLAS ATLAS Distributed Analysis (ADA) David Adams BNL December 5, 2003 ATLAS software workshop CERN.
Finding Data in ATLAS. May 22, 2009Jack Cranshaw (ANL)2 Starting Point Questions What is the latest reprocessing of cosmics? Are there are any AOD produced.
D.Spiga, L.Servoli, L.Faina INFN & University of Perugia CRAB WorkFlow : CRAB: CMS Remote Analysis Builder A CMS specific tool written in python and developed.
David Adams ATLAS ATLAS Distributed Analysis and proposal for ATLAS-LHCb system David Adams BNL March 22, 2004 ATLAS-LHCb-GANGA Meeting.
INFSO-RI Enabling Grids for E-sciencE Ganga 4 Technical Overview Jakub T. Moscicki, CERN.
ATLAS Distributed Analysis DISTRIBUTED ANALYSIS JOBS WITH THE ATLAS PRODUCTION SYSTEM S. González D. Liko
David Adams ATLAS AJDL: Abstract Job Description Language David Adams BNL June 29, 2004 PPDG Collaboration Meeting Williams Bay.
David Adams ATLAS ADA: ATLAS Distributed Analysis David Adams BNL December 15, 2003 PPDG Collaboration Meeting LBL.
Breaking the frontiers of the Grid R. Graciani EGI TF 2012.
Acronyms GAS - Grid Acronym Soup, LCG - LHC Computing Project EGEE - Enabling Grids for E-sciencE.
Joe Foster 1 Two questions about datasets: –How do you find datasets with the processes, cuts, conditions you need for your analysis? –How do.
INFSO-RI Enabling Grids for E-sciencE Padova site report Massimo Sgaravatto On behalf of the JRA1 IT-CZ Padova group.
Seven things you should know about Ganga K. Harrison (University of Cambridge) Distributed Analysis Tutorial ATLAS Software & Computing Workshop, CERN,
EGEE is a project funded by the European Union under contract IST Report from the PTF Fabrizio Pacini Datamat S.p.a. Milan, IT-CZ JRA1 meeting,
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Introduction Salma Saber Electronic.
David Adams ATLAS DIAL Distributed Interactive Analysis of Large datasets David Adams BNL May 19, 2003 BNL Technology Meeting.
LCG middleware and LHC experiments ARDA project
Presentation transcript:

David Adams ATLAS ATLAS-ARDA strategy and priorities David Adams BNL October 21, 2004 ARDA Workshop

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Contents Introduction Key concepts Components ARDA prototype Integration Strategy Workload management Data management Package management Catalog services Service discovery Service infrastructure Priorities Final comments

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Introduction ATLAS Distributed Analysis (ADA) Distributed data, processing and users Supports all distributed activities –Classical analysis: event data  histograms –User-level production –Easy to add new activity (transformation) Strong emphasis on provenance tracking –Automatic, complete and accurate Easy to use –ROOT, python, GUI and command line submission and monitoring Analysis service insulates users from processing systems –Same look and feel for local or grid processing

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Key concepts Dataset Describes a collection of data –E.g. a collection of reconstructed events, –A collection of histograms, or … Transformation Defines an operation to be performed on the data –Dataset  Dataset Application + task (user configuration of application) Job Instance of a transformation Typical user request processed as a collection of sub-jobs –Same transformation acting on sub-datasets –Master job includes splitting of input dataset and merging of output State and partial results available during processing

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Key concepts (cont) Transformation

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Components Analysis service Web service interface for submitting and monitoring jobs Multiple implementations to handle various processing systems –Including gLite Can be used hierarchically to scale to large number of jobs Catalog services Interfaces for recording datasets, transformations and jobs And metadata associated with these Provide provenance tracking Clients Make use of analysis and catalog services ROOT and python with more possible Common interfaces enable mix and match of clients and services

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Components (cont)

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, ARDA prototype End-to-end system Based on the gLite middleware High-level strategy Keep the common ATLAS structure –Clients –Catalog services Add gLite analysis service based on the gLite WMS –Same user view for submission and monitoring –Easy to compare with other options (LCG, LSF, OSG, …) Make use of the other GLite services or interfaces needed for effective use of the WMS –E.g. data management, package management, monitoring, … Consider adoption of other useful gLite services and interfaces

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Integration Important to understand integration with other grids Nordugrid, OSG, vanilla globus, … Including sites not associated with ATLAS or LHC Options 1.Everyone deploys the gLite services 2.GLite services only deployed at EGEE sites 3.Some combination of 1 and 2 In any case, data must be shared Encourage adoption of common standards here SRM may become a common standard RLS appears to be failure –ATLAS manages this by adding Don Quijote Maintaining a common security infrastructure is very helpful

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Strategy The following sections provide specifics on the following Workload management Data management Package management Catalog services Service discovery Service infrastructure

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Workload management ATLAS jobs are first handled by an analysis service Job specified by transformation and input dataset Processing includes the following steps Split input dataset Define a job for each sub-dataset Submit jobs to WMS Monitor them As each finishes, append to the current result Make use of the gLite WMS To process sub-jobs and possibly to carry out merging We must construct appropriate JDL to apply an ADA transformation –Where is JDL defined?

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Workload management (cont) Transformation does the following Locate and setup required software –Need package management Fetch required input data –Specified by dataset –LFN or GUID used to locate physical files Run software Save output data –Move to SE –Record in replica catalog –Construct output dataset All done using generic interfaces Same transformation used for local processing and other grids User or AS may provide preferences, e.g. which SE or RC

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Data management Local file management GLite has adopted SRM 1.1 for local file management (SE) ATLAS endorses this decision and is starting in the same way High priority (and possible ARDA activity) to integrate SRM into Don Quijote Replica catalogs GLite interfaces for file and replica management are complex ATLAS will keep the Don Quijote as the interface over gLite and other useful catalogs –At present NG, LCG, Grid3 and gLite Need to understand what interface is be presented to gLite WMS –Avoid copying all data to gLite or –Implementing the full gLite DM interface –Unless we adopt Fireman as the ATLAS catalog

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Data management (cont) Bulk file transfers ATLAS is developing reliable transfer software for DC2 Not yet clear how this connects to the gLite transfer service or the transfer capabilities in the present SRM implementations Posix-like I/O for logical files Do not require this –To support as many sites as possible Might use it where available –Is there a performance gain over copy and run?

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Package management Transformations need to locate software ATLAS is defining a simple command line interface pkgmanager for accessing and installing software packages ATLASDIR=`pkg_locate atlas ` There are two implementations: –One based on pacman and –One enabling users to register pre-installed software Soon add one to combine any number of managers Need means to deploy at all sites –Then arbitrary transformation can run at any site willing to install all required software GLite provide PackMan service interface Requires a job ID to find package location Might be able to provide a wrapper exposing the above interface

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Catalog services ADA has identified three catalog interfaces: XML Repository –Stores XML strings indexed by ID –ID is a string Selection catalog –Stores ID’s indexed by key-value pairs Replica catalog –Stores replica ID list indexed by logical ID’s –Not restricted to files Now these are C++ class interfaces Considering adding web service interface –Support clients in other languages –Add server-side functionality

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Catalog services (cont) GLite Fireman is both too complex and too restrictive Includes metadata, ACL’s, directories,… Requires ID be a GUID, replica is a SURL GLite metadata catalog is close to ADA selection catalog Like to add query for schema, Support for basic data types (string, int, float), and Extensible schema (depending on entry)

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Service discovery We need a mechanism for service discovery UDDI looks pretty complex Clarens has put up a discovery service ATLAS is evaluating it This is missing entirely in gLite We would like to see support for site-based services Perhaps a discovery service running at each site –Still have to discover the site discovery service Convention to specify the site with which a machine is associated –Machines away from sites might choose an association –E.g. my laptop prefers CERN for storage and job submission

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Service infrastructure There are many service infrastructure issues Starting and stopping services Monitoring Discovery Security –Gridmap or authorization service –Delegation –Transport vs. message level security ATLAS would like to understand how gLite is addressing these issues To understand how to deploy our services And integrate them with the gLite services

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Priorities 1.Workload management service Web service interface released and deployed (gLite) –And all supporting services With significant compute resources Corresponding analysis service (ARDA) 2. SRM Integrated into Don Quijote (ARDA) DQ integration into DIAL (DIAL) 3. Fireman Web service released and deployed (gLite) –Catalog gLite data –Candidate for single ATLAS catalog Integrated into DQ (ARDA) –So data is available to clients and other grids

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Priorities (cont) 4. DM interface to gLite WMS for data on other grids 5. Package management Command line interface (gLite-NG-OSG or ATLAS) –For jobs on worker nodes Common WSDL for ADA and gLite –To perform or check installations before job submission 6. Service Infrastructure Security Discovery Monitoring 7. Catalog services Should be able to agree on metadata catalog

David Adams ATLAS ARDA Workshop Atlas-ARDA strategy/prioritiesOct 21, Final comments Optimistic that ATLAS-ARDA will build a useful analysis system on gLite Need release and deployment of high priority gLite services –Couple months in advance of delivering end-to-end system Focus on delivering baseline interface and implementation –Extras and performance enhancements come later –GLite, ARDA and ATLAS Still much needed –From gLite, ARDA and ATLAS GLite design has improved Little direct feedback to comments but Documents reflect our comments Need to coordinate plans for non-EGEE sites