SkimSlimService ENABLING NEW WAYS (Ilija Vukotic, 2/18/13)

Problems of the Current Analysis Model
- Unsustainable in the long run (higher luminosity, no faster CPUs).
- Physicists get no feedback on the resources they use.
- Long running times.
- Only a very small percentage of people want to, or know how to, optimize their code.
- IT people are not happy when someone submits 10k jobs that run at 1% efficiency for days, producing 10k files of 100 MB each.
- Huge load on the people doing DPD production; frequent errors, slow turnaround.
- Nobody wants to care about dataset sizes, registrations, DDM transfers, approvals.
- This is the moment to make changes.

(R)evolution of ATLAS data formats
- Original plan (6 years ago): ESD < 500 kB/ev (1k branches), AOD < 100 kB/ev (500 branches). Athena used for everything.
- 4 years ago: ESD < 1500 kB/ev (8k branches), AOD < 500 kB/ev (4k branches). Athena + ARA.
- 3 years ago: ESD < 1800 kB/ev (10k branches), AOD < 1000 kB/ev (7k branches), D3PD < 20 kB/ev (…k branches). Athena + ARA + ROOT.
- Today: ESD < 1800 kB/ev, AOD < 1000 kB/ev, D3PD < 200 kB/ev. Athena + ARA + ROOT + Mana + RootCore + Event…
- Proposals for the future: ESD < 1800 kB/ev, AOD < 1000 kB/ev, GODZILLA D3PDs, structured D3PDs, D3PD, TAG?! Athena + ARA + ROOT + Mana + RootCore + Event…

Problems with ATLAS data formats
- ESD: large and kept only for a short time; used only for special studies.
- AOD: too large; needs Athena/ARA/Mana; slow to start up; nobody made it user friendly.
- D3PD: a lot of them, in a flat format. Too large (in sum much larger than AOD). Expensive to produce and store, inefficient to read. Could be reduced by at least 60%, but nobody knows who needs what. Effectively usable only from grid jobs.
- Skim/slim D3PD: takes up to a week to produce on the grid. People make them larger than necessary to avoid doing it twice. Files are usually too small for efficient transport and storage, thus requiring merging that can't be done on the grid.

What does a physicist want?
- Full freedom to do analysis, in whatever language they want.
- Not to be forced to use complex frameworks with hundreds of libraries, 20-minute compilations, etc.
- Not to be forced to think about computing farms, queues, data transfers, job efficiency, …
- To get results in no time.

Idea
Let a small number of highly experienced physicists, together with IT staff, handle the big data; they can do it efficiently. Move the majority of physicists away from 100 TB-scale data to ~100 GB data: sufficiently small for transport, you can analyze it anywhere, even on your laptop. However inefficient your code, you won't spend too many resources and will still get results back in a reasonable time.

How would it work
- Use FAX to access all the data without the overhead of staging, using optimally situated replicas (a possible optimization: production D3PDs pre-placed at just a few sites, maybe even just one).
- Physicists request a skim/slim through a web service; a few variables could be added in flight.
- The produced datasets are registered in the name of the requester and delivered to the requested site.
- All in 1-2 hours. This is essential: only then will people skim/slim down to just the variables they need, without worrying "what if I forget something I'll need?". A sketch of what such a request might look like follows below.
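Purely as an illustration of the request step above, a request might carry a payload like the one below. This is a sketch only: the endpoint URL and every field name are assumptions, not the real SkimSlimService interface.

```python
# Hypothetical sketch of a skim/slim request; the endpoint URL and all field
# names are assumptions, not the actual SkimSlimService API.
import requests

request = {
    "requester":   "some.user",                              # grid identity of the requester
    "input":       "data12_8TeV.*.NTUP_SMWZ.*",              # dataset pattern to skim/slim
    "branches":    ["el_pt", "el_eta", "mu_pt", "mu_eta"],   # variables to keep (slim)
    "selection":   "el_n >= 2 || mu_n >= 2",                 # event-level cut (skim)
    "destination": "MWT2_UC_LOCALGROUPDISK",                 # site the output is delivered to
}

r = requests.post("https://skimslim.example.cern.ch/api/requests", json=request)
print(r.status_code, r.text)
```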

Would it work?
- A couple of hundred dedicated cores, freed up from all the personal, inefficient slims/skims done with prun.
- Highly optimized code.
- Since we know which branches (variables) people actually use, we know what is useless in the original D3PDs, so we can produce them much smaller.
- If a bug is found in D3PD production, there is no new global redistribution; some problems can even be fixed in place without a new production.
- If we find it useful, we can split/merge/reorganize D3PDs without anyone noticing.
- We could later even go for a completely different underlying big-data format: Godzilla D3PDs, merged AOD/D3PD, Hadoop!
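To make the skim/slim step concrete, here is a minimal PyROOT sketch of the kind of branch-level slimming plus event-level skimming such jobs perform. The file names, the tree name "physics", the branch list and the cut are illustrative assumptions; the service itself uses a modified filter-and-merge.py (see the SkimSlimService slide).

```python
# Minimal PyROOT skim/slim sketch: keep only a few branches (slim) and only
# events passing a selection (skim). Names and the cut are illustrative.
import ROOT

keep_branches = ["el_pt", "el_eta", "mu_pt", "mu_eta", "MET_RefFinal_et"]
selection = "MET_RefFinal_et > 25000"   # hypothetical event-level cut (MeV)

fin = ROOT.TFile.Open("input_NTUP_SMWZ.root")
tree = fin.Get("physics")

# Slim: switch every branch off, then re-enable only the ones we keep.
tree.SetBranchStatus("*", 0)
for name in keep_branches:
    tree.SetBranchStatus(name, 1)

fout = ROOT.TFile("skimslim_output.root", "RECREATE")
skimmed = tree.CopyTree(selection)      # skim: copies only selected events/branches
skimmed.Write()
fout.Close()
fin.Close()
```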

SkimSlimService
- Web site at CERN: receives requests and shows their status.
- Handmade server [1]: receives the web queries and collects info on datasets, files, trees, branches.
- Executor at UC3 [1]: gets tasks from the DB, creates and submits Condor SkimSlim jobs [2], and makes and registers the resulting datasets [3].
- Oracle DB at CERN: stores requests, splits them into tasks, and serves as the backend for the web site.
[1] We have no dedicated resources for this; I used UC3, but any queue that has CVMFS will suffice.
[2] A modified version of filter-and-merge.py is used.
[3] Currently under my name, as I don't have a production role.
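As a rough sketch of the executor step, a task pulled from the DB could be turned into a Condor job along these lines. The submit-file contents, the wrapper script name and its arguments are assumptions, not the actual executor code.

```python
# Sketch of how the UC3 executor might turn a DB task into a Condor job.
# skimslim_wrapper.sh and its arguments are hypothetical; the real jobs wrap
# a modified filter-and-merge.py.
import subprocess

def submit_skimslim_job(task_id, input_files, branch_file, out_dataset):
    submit = """universe   = vanilla
executable = skimslim_wrapper.sh
arguments  = {inputs} {branches} {outds}
output     = logs/task_{tid}.out
error      = logs/task_{tid}.err
log        = logs/task_{tid}.log
queue
""".format(inputs=",".join(input_files), branches=branch_file,
           outds=out_dataset, tid=task_id)

    submit_path = "task_{0}.sub".format(task_id)
    with open(submit_path, "w") as f:
        f.write(submit)
    # condor_submit must be available on the submit host.
    subprocess.check_call(["condor_submit", submit_path])
```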


Test run results
Used the datasets and the skim/slim code of our largest user: a worst-case scenario. All of the SMWZ 2012 data and MC, 185 TB -> 10 TB (300 branches). Missing in FAX: 24 datasets (~3.5%).

List file           Datasets
data.Egamma.txt     284
data.Muons.txt      288
mc.Alpgen.txt        63
mc.Herwig.txt         3
mc.Pythia8.txt       28
mc.Sherpa.txt        19
mc.all.txt            9
Total               694

Test run results (continued)
- CPU efficiency: ~75% when the data is local; between 10% and 50% with remote data (6.25 MB/s gives 100% efficiency).
- All of SMWZ requires 8600 CPU hours; this can be done in 2 hours by pooling unused resources.
- Could have one service in the EU and one in the US to avoid trans-Atlantic traffic.
- It is easy to deploy the service on anything that mounts CVMFS (UC3, UCT3, UCT2, OSG, EC2).
- On EC2: assuming small instances, ~$500; with micro instances and spot pricing, ~$100. But result delivery is ~$1k (10 TB * $0.12/GB).
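A quick sanity check of the numbers above, a sketch using only the figures quoted on this slide:

```python
# Back-of-the-envelope check of the figures quoted above.
cpu_hours  = 8600                     # total CPU hours for all of SMWZ
wall_hours = 2                        # target turnaround
print("cores needed: ~%.0f" % (cpu_hours / float(wall_hours)))        # ~4300 cores

output_tb    = 10                     # size of the delivered result
price_per_gb = 0.12                   # EC2 egress price quoted on the slide
print("delivery cost: ~$%.0f" % (output_tb * 1024 * price_per_gb))    # ~$1229, i.e. ~1 k$
```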

Conclusion
A fully functional system has been produced that you may use now. Still to be done:
- Polish it.
- Market it.
- Push it politically (essential).

Reserve (backup slides)

What is FAX?
- A number of ATLAS sites have made their storage accessible from outside using the xrootd protocol [1].
- It has a mechanism that gets you a file if it exists anywhere in the federation.
- All kinds of sites: xrootd, dCache, DPM, Lustre, GPFS.
- Read only.
- You need a grid proxy to use it.
- Instructions: (link on the original slide)
[Diagram: redirector hierarchy with a global redirector, regional redirectors (EU, UK) and endpoints such as AGLT2, MWT2, SLAC, Oxford, QMUL.]
[1] CMS has a very similar system, which they call AAA.
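A minimal usage sketch, assuming a valid grid proxy and using placeholder redirector and path names (the real instructions are linked from the slide):

```python
# Minimal sketch of reading a file through FAX with PyROOT. Requires a valid
# grid proxy (voms-proxy-init). The redirector host and the file path are
# placeholders, not real endpoints.
import ROOT

url = "root://some-fax-redirector.example.org//atlas/dq2/user/.../file.root"
f = ROOT.TFile.Open(url)            # the redirector chain locates a replica for us
if f and not f.IsZombie():
    tree = f.Get("physics")         # the tree name is an assumption
    print("entries: %d" % tree.GetEntries())
    f.Close()
```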

FAX today
We want all the T1s and T2s included. Adding new sites weekly; currently 31:
AGLT2, BNL-ATLAS, BU_ATLAS_TIER2, CERN-PROD, DESY-HH, INFN-FRASCATI, INFN-NAPOLI-ATLAS, INFN-ROMA1, JINR-LCG2, LRZ-LMU, MPPMU, MWT2, OU_OCHEP_SWT2, PRAGUELCG2, RAL-LCG2, RU-PROTVINO-IHEP, SWT2_CPB, UKI-LT2-QMUL, UKI-NORTHGRID-LANCS-HEP, UKI-NORTHGRID-LIV-HEP, UKI-NORTHGRID-MAN-HEP, UKI-SCOTGRID-ECDF, UKI-SCOTGRID-GLASGOW, UKI-SOUTHGRID-CAM-HEP, UKI-SOUTHGRID-OX-HEP, WT2, WUPPERTALPROD, GRIF-LAL, GRIF-IRFU, GRIF-LPNHE, IN2P3-LAPP.

Does it work?
YES!*
* For the most part. There is a lot of redundancy in the system: we have ~2.5 copies of popular datasets.

What is it good for?
- Failover when a grid job has a problem with an input file. IT: fewer failed jobs. Physicist: fewer failed jobs.
- Diskless Tier2s. IT: easier upgrades, more availability. Physicist: more CPU resources.
- Diskless Tier3s. IT: simpler and cheaper. Physicist: more CPU resources, effectively more disk space.
- Less data movement. The global LFN simplifies scripts and enables storage sharing between nearby sites.
- University queues and Amazon, Google, Microsoft clouds: easily spin up more workers.
- Optimizing applications: full information on who is reading what, and how efficiently.

How does it work?
- Quite a complex system.
- A lot of people involved.
- A lot of development.
- Takes time to deploy.
- Takes time to work out the kinks.

What can I do today?
- Access data on T2 disks (localgroupdisk, userdisk, …). If a file is not there, the job won't fail; the file will come from elsewhere (see the failover sketch below).
- Run jobs at UCT2/UCT3 and access data anywhere in FAX.
- Use frun:
◦ if you have data processed at 10 sites all over the world,
◦ want to merge them,
◦ or want to submit jobs where queues are short.
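A minimal sketch of the failover idea in user code, assuming placeholder local and FAX paths:

```python
# Sketch of the failover idea on this slide: try the local copy first and
# fall back to FAX if it is missing. Both paths below are placeholders.
import os
import ROOT

local_path = "/share/t3data/user/somefile.root"
fax_url = "root://some-fax-redirector.example.org//atlas/dq2/user/somefile.root"

path = local_path if os.path.exists(local_path) else fax_url
f = ROOT.TFile.Open(path)
if not f or f.IsZombie():
    raise IOError("could not open %s" % path)
print("reading %s" % path)
```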

Full Dress Rehearsal
A week of stress testing all of the FAX endpoints. While we have continuous monitoring of standard user accesses (ROOT, xrdcp), to really stress the system one has to submit jobs to the grid, so realistic jobs were submitted both manually and automatically. We had more problems with the tests than with FAX:
◦ late distribution of the test datasets to the endpoints (TB-size datasets);
◦ high load due to the winter conferences did not help;
◦ jobs running on a grid node are an entirely different game because of the limited proxy they use.
We found and addressed a number of issues:
◦ new VOMS libraries developed;
◦ settings at several sites corrected;
◦ a new pilot version.
Conclusion: we broke nothing (storages, LFCs, links, servers, monitoring). As soon as all the observed problems are fixed, we'll hit harder.

FAX – remaining to be done
Near future:
- Further expansion: next in line are the French and Spanish clouds.
- Improving the robustness of all elements of the system.
- Improving documentation, giving tutorials, user support.
Months:
- Move to Rucio.
- Optimization: making the network smart so that it provides the fastest transfers.
- Integration with other network services.

Foogle.com
New internet search engine! Say NO to IE, Firefox, Chrome! Terminal based! From the inventors of the WWW!
Simple to use:
- Learn a few simple things (shell scripts, PBS/Condor macros, Python, ROOT and C++, LaTeX, …).
- Write a few hundred pages of code.
- Process the crawler data and rewrite it in a new way. Move it.
- Rewrite the original format into a new, different one.
- Rewrite again. Move it.
- Code to find the page.
- Compile your page to ps/pdf.
- Show!
(In ATLAS terms: RAW -> ESD, ESD -> AOD, AOD -> D3PD, D3PD -> slimmed D3PD, slimmed D3PD -> ntuple for final analysis, final analysis.)