Event Service
Intro, plans, issues and objectives for today
Torre Wenaus, BNL
US ATLAS S&C/PS Workshop, Aug 21, 2014

Event Service
A new approach to event processing: stream events in a near-continuous way, both at input and output, through a worker node that may have a very short (or a very long) lifetime
Decouple processing from the chunkiness of files, and data locality considerations – potentially WAN latency insensitive
Potential to make processing completely asynchronous from the flow of input data to the worker
Light footprint on the WN; if it disappears we lose little
WNs with very short lifetime: great for opportunistic resources
– The sand filling up the “full” jar of rocks in HPCs
– More examples of using others’ excess capacity: Amazon spot market, volunteer computing (BOINC, created here)
2014/8/21 Torre Wenaus, BNL 2
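To make the streaming model concrete, here is a minimal sketch, assuming hypothetical endpoint names and helper functions (the DISPATCHER URL, getEventRanges/updateEventRange endpoints, process_events and stage_out are illustrative only, not the actual PanDA/pilot API): the worker pulls small event ranges rather than whole files, ships each output off the node immediately, and reports completion, so losing the worker node loses at most the range in flight.

import requests

DISPATCHER = "https://panda.example.org/dispatcher"   # placeholder URL, not the real server

def run_worker(job_id):
    while True:
        # Ask the dispatcher for the next small event range instead of a whole input file
        resp = requests.get(DISPATCHER + "/getEventRanges", params={"jobId": job_id}).json()
        event_ranges = resp.get("eventRanges", [])
        if not event_ranges:
            break                                      # nothing left; the worker can exit
        for er in event_ranges:
            output_file = process_events(er)           # payload step (e.g. AthenaMP simulation)
            stage_out(output_file)                     # stream the output off the node right away
            requests.post(DISPATCHER + "/updateEventRange",
                          data={"eventRangeID": er["id"], "status": "finished"})

def process_events(event_range):
    # Stand-in for the real payload; returns the path of the produced output
    return "/tmp/out_%s.root" % event_range["id"]

def stage_out(path):
    # Stand-in for the asynchronous output stager (e.g. upload to an object store)
    pass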

Event Service architecture, Spring/Summer 2014 (schematic)
[Diagram: PanDA/JEDI with its JEDI event table; dispatcher getJob request returning an event service job; getEvents returning run#/ev#/guid lists; Event Index service and POOL file catalog; token scatterer and token extractor producing event retrieval tokens; pilot runEvent module managing the event loop; AthenaMP payload workers on the worker node; event data retrieval via xrootd from data repositories (CERN, BNL SE); output stagers writing to an object store staging area; events-complete notification triggering PanDA merge jobs for output aggregation.]
2014/8/21 Torre Wenaus, BNL
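As a small illustration of the records flowing through this diagram (the run#/ev#/guid lists returned by getEvents, and the event retrieval tokens produced by the token extractor), here is a hypothetical sketch; all class, field and catalog names are invented for the example and are not the actual JEDI or pilot data structures.

from dataclasses import dataclass

@dataclass
class EventRef:
    run_number: int        # run# from the JEDI event table
    event_number: int      # ev#
    guid: str              # GUID of the file holding the event

@dataclass
class EventToken:
    ref: EventRef
    token: str             # retrieval token, e.g. an xrootd URL usable for direct access

def extract_tokens(event_refs, guid_to_url):
    # Token extractor: map each GUID-based reference to a concrete retrieval token,
    # using an event-index / file-catalog style lookup (here just a dict)
    return [EventToken(ref=r, token=guid_to_url[r.guid]) for r in event_refs]

# Example: a getEvents response listing two events from the same input file
catalog = {"A1B2C3D4": "root://some.storage.example//sim/EVNT.pool.root"}  # made-up URL
refs = [EventRef(run_number=222222, event_number=1001, guid="A1B2C3D4"),
        EventRef(run_number=222222, event_number=1002, guid="A1B2C3D4")]
tokens = extract_tokens(refs, catalog)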

Event Service in US ATLAS
A fruitful US-dominated collaboration in ATLAS
– Across the labs
– Across core and distributed software
– Soon, across software and facilities as we deploy
First target: simulation production
– Relatively simple workflow, I/O and metadata
– Relatively easy to port to target platforms like HPC
– Gives the biggest gain: most of our cycles go to simu
Beyond that: let’s worry first about simulation production (but potential is there for less CPU-intensive workloads also, especially if we exploit asynchronous data delivery)
2014/8/21 Torre Wenaus, BNL 4

Status
Transitioning from end-to-end testing to pre-production testing
Formerly a best-effort activity, now a high-profile one with important, dated deliverables
Some of which are imminent:
– Opening of opportunity for Amazon spot market large-scale deployment
– Supercomputing 2014 demo sponsored by DOE ASCR, preliminary to new ASCR funding rounds
– Rise of HPCs in ATLAS: lots of capacity, full of rocks but not sand – is up and running, but feeling the lack of ES
2014/8/21 Torre Wenaus, BNL 5

Today
Survey the technical status, with emphasis on outstanding issues and near/midterm plans
Then, in part 2, focus on the early deployment targets
– Preproduction on the grid
– Amazon spot market
– HPCs
– Emerge from today with timelines and milestones
In a splinter at the end of the day, continue discussion on the biggest outstanding technical question mark: ES on HPCs
– We have a concept, ‘Yoda’, that we want to move from concept to defined development activity today
2014/8/21 Torre Wenaus, BNL 6

Concept for MPI-based event service extension for HPCs
2014/8/21 Torre Wenaus, BNL 7
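As a sketch of the concept only (not a design), the following mpi4py fragment illustrates the kind of MPI master/worker pattern such an extension implies: rank 0 acts as a lightweight, JEDI-like manager inside the HPC, handing event ranges to the other ranks, which run the payload and report back, so the compute nodes need no outbound network connection. All names and structure here are assumptions for illustration, not the eventual implementation.

from mpi4py import MPI

WORK_TAG, RESULT_TAG, STOP_TAG = 1, 2, 3

def yoda(comm, event_ranges):
    # Rank 0: hand out event ranges until none remain, then tell workers to stop
    pending = list(event_ranges)
    active_workers = comm.Get_size() - 1
    status = MPI.Status()
    while active_workers > 0:
        # Each incoming message is either a "ready" announcement or a finished-range report;
        # a real manager would record the result and trigger output staging/merging here
        comm.recv(source=MPI.ANY_SOURCE, tag=RESULT_TAG, status=status)
        worker_rank = status.Get_source()
        if pending:
            comm.send(pending.pop(0), dest=worker_rank, tag=WORK_TAG)
        else:
            comm.send(None, dest=worker_rank, tag=STOP_TAG)
            active_workers -= 1

def worker(comm):
    # Worker rank: announce readiness, then loop processing assigned event ranges
    comm.send("ready", dest=0, tag=RESULT_TAG)
    status = MPI.Status()
    while True:
        event_range = comm.recv(source=0, tag=MPI.ANY_TAG, status=status)
        if status.Get_tag() == STOP_TAG:
            break
        result = "processed-" + str(event_range)   # stand-in for the real payload step
        comm.send(result, dest=0, tag=RESULT_TAG)

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    if comm.Get_rank() == 0:
        yoda(comm, ["range-%d" % i for i in range(100)])
    else:
        worker(comm)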

Supplemental – SC14 demo
2014/8/21 Torre Wenaus, BNL 8

Objectives and Impact
– Demonstrate a new fine grained approach to HEP event processing
– Agile & efficient in exploiting potentially short-lived resources: HPC hole-filling, spot market clouds, volunteer computing
– Data flows utilize cloud data repositories with no data locality or pre-staging requirements
– Minimize use of costly storage in favor of strongly leveraging powerful networks
– Granular data processing on HPCs using an Event Service
[Figure caption] Schematic of the ‘Yoda’ extension of PanDA’s JEDI dynamic job manager into HPCs as an MPI-based service. Being implemented by the BNL, LBNL and UT Arlington based team that has implemented the event service and ported PanDA to HPC (ORNL Titan). The Yoda work’s first target is NERSC HPCs.
Features
– Applicable to any scientific processing that can be finely partitioned
– ASCR leverage: HPCs, powerful network, cloud data stores, exascale workload management (ASCR supported BigPanDA, 1.3 exabytes processed in ‘13)
– BNL, LBNL, ANL, ORNL, PNNL involved
Technical Achievements
– Successful full chain test of Event Service on real ATLAS simulation workloads; entering pre-production testing
– Data flows to cloud object stores, currently CERN, adding BNL
– PanDA now HPC capable: running at OLCF Titan
– Now extending PanDA’s new JEDI dynamic job manager into a mini MPI version as an event service manager for HPCs – ‘Yoda’
2014/8/21 Torre Wenaus, BNL 9

Why the ES for an ASCR demo
The US ATLAS core software teams at three DOE labs (BNL, ANL, LBNL) have for the past ~year been collaborating on developing an ‘Event Service’ (ES) for ATLAS, leveraging their ATLAS core expertise:
– BNL - distributed computing, workload management (PanDA)
– ANL - event I/O
– LBNL - multiprocessing core framework
The ES is a new approach to HEP event processing: fine grained, agile in exploiting potentially short-lived opportunistic resources (such as HPC hole-filling), heavily leveraging excellent networking, and designed to integrate well with data repositories ‘in the cloud’ (e.g. object stores)
It leverages ASCR strengths: HPCs, powerful networking, ‘virtual data centers’ (whatever they are, the ES can make good use of them)
It leverages a project ASCR is already paying for: BigPanDA, extending PanDA beyond ATLAS to HPCs, intelligent networking and the Exascale community
It leverages an existing collaboration between four labs – in addition to the three mentioned, ORNL participates through their (ALICE-directed) participation in the BigPanDA project, and PNNL (Belle II) is interested also
2014/8/21 Torre Wenaus, BNL 10