PERFORMANCE AND ANALYSIS WORKFLOW ISSUES US ATLAS Distributed Facility Workshop 13-14 November 2012, Santa Cruz.

PERFORMANCE AND ANALYSIS WORKFLOW ISSUES US ATLAS Distributed Facility Workshop, 13-14 November 2012, Santa Cruz

IMPORTANCE OF ANALYSIS JOBS
- The number of analysis jobs is increasing
- Production jobs are mostly CPU limited, well controlled, hopefully optimized, and can be monitored through other, already existing systems
- About analysis jobs we know very little; they could potentially be inefficient and wreak havoc on storage elements and networks. They have twice the failure rate of production jobs
13/11/2012 ILIJA VUKOTIC 2

ANALYSIS QUEUES PERFORMANCE
Idea
- Find out what the performance of ATLAS analysis jobs on the grid is
- There is no framework that everybody uses and that could be instrumented
- Understand the numbers: each site has its hard limits in terms of storage, CPUs, network, software
- Improve:
  - ATLAS software
  - ATLAS files and the way we use them
  - Sites' configurations
Requirements:
- Monitoring framework
- Tests as simple, realistic, accessible, and versatile as possible
- Running on most of the resources we have
- Fast turnaround
- Test codes that are the "recommended way to do it"
- Web interface for the most important indicators
13/11/2012 ILIJA VUKOTIC 3

TEST FRAMEWORK
Components: HammerCloud submits the tests; results go to an Oracle DB at CERN and are published on a web site; configuration and test scripts are kept in SVN.
- Continuous
  - Job performance
    - Generic ROOT IO scripts (see the sketch after this slide)
    - Realistic analysis jobs
  - Site performance
  - Site optimization
- One-off
  - new releases (Athena, ROOT)
  - new features, fixes
- All T2D sites (currently 42 sites)
- Large number of monitored parameters
- Central database
- Wide range of visualization tools
13/11/2012 ILIJA VUKOTIC 4
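The generic ROOT IO scripts themselves are kept in SVN and are not shown here; below is only a minimal sketch of what such a read test might look like, assuming PyROOT and placeholder file and tree names (not the actual HammerCloud scripts):

```python
# Hypothetical sketch of a generic ROOT IO read test; file and tree names are placeholders.
import time
import ROOT

def read_test(filename, treename="physics"):
    """Open a file, loop over the tree, and report wall time and bytes read from storage."""
    f = ROOT.TFile.Open(filename)            # works for local paths and xrootd/dcap URLs alike
    tree = f.Get(treename)
    n_events = int(tree.GetEntries())

    start = time.time()
    for i in range(n_events):
        tree.GetEntry(i)                     # forces decompression of the active branches
    wall = time.time() - start

    return {"file": filename,
            "events": n_events,
            "wall_s": wall,
            "read_MB": f.GetBytesRead() / 1e6}

if __name__ == "__main__":
    print(read_test("root://some.server//path/to/test_file.root"))
```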

TEST FRAMEWORK
Pilot numbers obtained from the PanDA DB
- 5-50 jobs per day per site
- Each job runs at least 24 tests
  - 5 read modes + 1 full analysis job
  - over 4 different files
- Takes data on machine status
- Cross-referenced to the PanDA DB
- Currently 2 million results in the DB, published on the web site
13/11/2012 ILIJA VUKOTIC 5
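A sketch of how such a per-job wrapper could iterate over read modes and test files and attach a snapshot of the machine state to each result; the read-mode names, file list, and result layout below are illustrative assumptions, not the actual test configuration:

```python
# Hypothetical test wrapper; read modes, file names, and the result layout are placeholders.
import json
import os
import time

READ_MODES = ["full", "sparse_1pct", "sparse_10pct", "ttc_on", "ttc_off"]   # 5 read modes
TEST_FILES = ["test_file_1.root", "test_file_2.root",
              "test_file_3.root", "test_file_4.root"]                       # 4 different files

def machine_status():
    """Coarse snapshot of the worker node state from /proc (Linux)."""
    with open("/proc/loadavg") as f:
        load1 = float(f.read().split()[0])
    meminfo = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":")
            meminfo[key] = int(value.split()[0])        # values in kB
    return {"load1": load1,
            "mem_free_kB": meminfo.get("MemFree", 0),
            "swap_free_kB": meminfo.get("SwapFree", 0)}

results = []
for mode in READ_MODES:
    for fname in TEST_FILES:
        t0 = time.time()
        # ... run the actual ROOT IO test here (see the previous sketch) ...
        results.append({"mode": mode, "file": fname,
                        "wall_s": time.time() - t0,
                        "node": os.uname()[1],
                        "status": machine_status()})

print(json.dumps(results, indent=2))         # in reality the results go to the Oracle DB
```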

SUMMARY RESULTS: Setup times 13/11/2012 ILIJA VUKOTIC 6

SUMMARY RESULTS: Stage-in 13/11/2012 ILIJA VUKOTIC 7. Space for improvement: 60 s = 41 MB/s. The Fix (see appendix).

SUMMARY RESULTS: Execution time 13/11/2012 ILIJA VUKOTIC 8. GPFS not mounted – can't run in direct mode.

SUMMARY RESULTS: Stage-out 13/11/2012 ILIJA VUKOTIC 9

SUMMARY RESULTS: Total time = setup + stage-in + execution + stage-out [s], as measured by the pilot 13/11/2012 ILIJA VUKOTIC 10

SUMMARY – GOING DEEPER: CPU efficiency
- Measures only the event loop
- Defined as CPU time / wall time
- Keep in mind: a very slow machine can have very high CPU efficiency
- All you want to do is make it as high as possible
FACTS:
1. Unless doing bootstrapping or some other unusual calculation, the user's code is negligible compared to unzipping.
2. ROOT can unzip at 40 MB/s.
13/11/2012 ILIJA VUKOTIC 11
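As a concrete illustration of the definition, a minimal sketch of measuring CPU efficiency around an event loop; the loop body here is a placeholder that simply simulates I/O wait:

```python
# Sketch: CPU efficiency = CPU time / wall time, measured around the event loop only.
import os
import time

def event_loop_efficiency(process_event, events):
    wall0 = time.time()
    cpu0 = sum(os.times()[:2])              # user + system CPU time of this process
    for ev in events:
        process_event(ev)
    wall = time.time() - wall0
    cpu = sum(os.times()[:2]) - cpu0
    return cpu / wall if wall > 0 else 0.0

# A loop dominated by (simulated) I/O wait has low CPU efficiency,
# even though the machine itself may be perfectly fast.
print("CPU efficiency: %.2f" % event_loop_efficiency(lambda ev: time.sleep(0.001), range(200)))
```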

SUMMARY – GOING DEEPER: CPU efficiency 13/11/2012 ILIJA VUKOTIC 12. Direct access site.

GOING DEEPER – CASE OF SWITCH STACKING 13/11/2012 ILIJA VUKOTIC 13. Test files are local to both the UC and IU sites. The lower band is IU. Only part of the machines are affected (the best ones).

GOING DEEPER – CASE OF SWITCH STACKING
We check CPU efficiency vs.:
- Load
- Network in/out
- Memory
- Swap
(A sketch of sampling the network counters follows below.)
13/11/2012 ILIJA VUKOTIC 14
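For the network part, a minimal sketch of sampling the in/out byte counters from /proc/net/dev to correlate with CPU efficiency; the interface name and sampling interval are assumptions:

```python
# Sketch: sample network byte counters to correlate with CPU efficiency; 'eth0' is a placeholder.
import time

def net_bytes(iface="eth0"):
    """Return (rx_bytes, tx_bytes) for one interface from /proc/net/dev."""
    with open("/proc/net/dev") as f:
        for line in f:
            if line.strip().startswith(iface + ":"):
                fields = line.split(":")[1].split()
                return int(fields[0]), int(fields[8])   # received bytes, transmitted bytes
    raise ValueError("interface not found: " + iface)

rx0, tx0 = net_bytes()
time.sleep(10)                                          # sampling interval
rx1, tx1 = net_bytes()
# bytes accumulated over 10 s, converted to MB/s
print("in: %.1f MB/s, out: %.1f MB/s" % ((rx1 - rx0) / 10e6, (tx1 - tx0) / 10e6))
```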

CASE OF SWITCH STACKING
- The machines can do much better, as seen in copy2scratch mode.
- A drained node is as bad as a busy one.
- Manual checks show connections to servers much below 1 Gbps.
- Stack performance depends on its configuration (software) and on what is connected where.
- Optimal switch stacking is not exactly trivial. I suspect a lot of sites have the same issues. NET2 and BNL show a very similar pattern. It will be investigated down to the bottom.
13/11/2012 ILIJA VUKOTIC 15

Finally
- Two big issues discovered; just that was worth the effort
- A bunch of smaller problems with queues and misconfigurations found and solved
FUTURE
- Fix the remaining issues
- Investigate virtual queues
- Per-site web interface
- Automatic procedure to follow performance
- Automatic mailing
- Investigate non-US sites
13/11/2012 ILIJA VUKOTIC 16

WORKFLOW ISSUES
For most users this is the workflow:
- Skimming/slimming data
  - usually prun and no complex code
  - often filter_and_merge.py (a conceptual sketch follows below)
- Merging data
  - only a part of the people do it
  - unclear how to do it on the grid
  - moving small files around is very inefficient
- Getting data locally
  - DaTRI requests to the USA are processed slowly
  - most people use dq2-get
- Storing it locally
  - not much space in Tier-3s
  - accessing data from localgroupdisk
- Analyzing data
  - mostly local queues
  - rarely PROOF
  - people are willing to wait a few hours and merge results manually
13/11/2012 ILIJA VUKOTIC 17
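The skim/slim step mentioned above amounts to dropping events and branches from an ntuple. Below is a minimal PyROOT sketch of that concept only, not the interface of filter_and_merge.py; the tree name, kept branches, and selection are placeholders:

```python
# Hypothetical skim/slim sketch; tree name, kept branches, and the selection are placeholders.
import ROOT

def skim_slim(in_files, out_file, treename="physics",
              keep_branches=("el_pt", "el_eta", "jet_pt"),
              selection="el_pt[0] > 20000"):
    chain = ROOT.TChain(treename)
    for f in in_files:
        chain.Add(f)

    # Slim: keep only the requested branches.
    chain.SetBranchStatus("*", 0)
    for b in keep_branches:
        chain.SetBranchStatus(b, 1)

    # Skim: copy only the events passing the selection into a new, smaller file.
    out = ROOT.TFile(out_file, "RECREATE")
    skimmed = chain.CopyTree(selection)
    skimmed.Write()
    out.Close()

skim_slim(["input_D3PD.root"], "skimmed.root")
```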

SLIM SKIM SERVICE
Idea: establish a service to which users submit the parameters of their skim & slim job; it opportunistically uses CPUs and FAX as the data source and delivers an optimized dataset.
Practically:
- WebUI to submit a request and follow job progress
- Oracle DB as the backend
- Currently UC3 will be used for processing
- Output data will be dq2-put into MWT2
Work has started. Performance and turnaround time are what will make or break this service.
13/11/2012 ILIJA VUKOTIC 18
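As a sketch only, the kind of request record the WebUI might hand to the backend; every field name and value here is an assumption, not the actual schema:

```python
# Hypothetical skim&slim request record; all field names and values are illustrative.
from collections import namedtuple

SkimRequest = namedtuple("SkimRequest", [
    "user",            # grid user submitting the request
    "input_dataset",   # dataset to be read, served through FAX
    "selection",       # event-level cut string
    "branches",        # branches to keep in the slimmed output
    "output_dataset",  # name under which the result is dq2-put into MWT2
    "status",          # e.g. SUBMITTED / RUNNING / DONE / FAILED
])

req = SkimRequest(user="someuser",
                  input_dataset="some.input.NTUP.dataset",
                  selection="el_pt[0] > 20000",
                  branches=["el_pt", "el_eta", "jet_pt"],
                  output_dataset="user.someuser.skim.v1",
                  status="SUBMITTED")
print(req)
```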

APPENDIX: The Fix
- timer_command.py is part of the pilot3 code and is used very often in all of the transforms.
- It serves to start any command as a subprocess and kill it if it has not finished before a given timeout. Not exactly trivial.
- For some commands it was waiting 60 seconds even when the command had already finished.
- It was also trying to close all possible file descriptors before executing the child process. That could take from 0.5 s to a few tens of seconds depending on the site's settings. Fixed in the latest pilot version.
- Total effect estimate:
  - a quarter of computing time is spent on analysis jobs
  - the average analysis job takes less than 30 minutes
  - the fix speeds up a job by 3 minutes on average, about 10%
  - applied to 40 Tier-2s, the fix is equivalent to adding one full Tier-2 of capacity
13/11/2012 ILIJA VUKOTIC 19
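Not the pilot3 implementation, but a minimal sketch of the intended behaviour after the fix: poll the child and return as soon as it exits, kill it only when the timeout is genuinely exceeded, and do not iterate over every possible file descriptor before the exec:

```python
# Sketch of a timer_command-style helper; this is not the actual pilot3 code.
import os
import signal
import subprocess
import time

def run_with_timeout(cmd, timeout, poll_interval=1.0):
    """Run cmd in a shell; return (exit_code, timed_out)."""
    proc = subprocess.Popen(cmd, shell=True, preexec_fn=os.setsid)
    deadline = time.time() + timeout
    while time.time() < deadline:
        if proc.poll() is not None:           # child finished: return immediately
            return proc.returncode, False     # (no fixed 60-second wait)
        time.sleep(poll_interval)
    os.killpg(proc.pid, signal.SIGTERM)       # timeout exceeded: kill the whole process group
    proc.wait()
    return proc.returncode, True

if __name__ == "__main__":
    print(run_with_timeout("sleep 2", timeout=10))   # returns after about 2 s, not 10
```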