
efi.uchicago.edu ci.uchicago.edu FAX Dress Rehearsal Status Report Ilija Vukotic on behalf of the atlas-adc-federated-xrootd working group Computation and Enrico Fermi Institutes University of Chicago Software & Computing Workshop March 13, 2013

efi.uchicago.edu ci.uchicago.edu 2 All the slides were made by Rob Gardner; I just updated them. He can't be here due to the OSG All Hands meeting.

efi.uchicago.edu ci.uchicago.edu 3 Data federation goals
Create a common ATLAS namespace across all storage sites, accessible from anywhere (illustrated in the sketch below)
Make access to data easy to use and homogeneous
Identified initial use cases:
– Failover from stage-in problems with local storage
– Gain access to more CPUs using WAN direct read access
  o Allow brokering to Tier 2s with partial datasets
  o Opportunistic resources without local ATLAS storage
– Use as a caching mechanism at sites to reduce local data management tasks
  o Eliminate cataloging, consistency checking, deletion services
A WAN data access group was formed in ATLAS to determine use cases & requirements on the infrastructure
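To make the common-namespace goal concrete, here is a minimal sketch of how a client could address a file through a federation redirector instead of a site-specific endpoint. The redirector host, port and the /atlas/... path layout are placeholders for illustration, not the actual FAX configuration.

```python
import subprocess

# Hypothetical redirector host and namespace path, for illustration only;
# the real FAX endpoints and path conventions are defined by the federation.
REDIRECTOR = "glrd.example.org:1094"
GLOBAL_PATH = "/atlas/dq2/data12_8TeV/NTUP_SMWZ/some_dataset/some_file.root"

def fax_copy(local_dest):
    """Copy one file through the federation: the redirector locates a site
    holding a replica and the client is redirected there transparently."""
    url = "root://%s/%s" % (REDIRECTOR, GLOBAL_PATH)
    subprocess.run(["xrdcp", url, local_dest], check=True)

if __name__ == "__main__":
    fax_copy("/tmp/test_file.root")
```

The point of the common namespace is that the same URL works from any site, so the client never needs to know where the replica actually lives.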

efi.uchicago.edu ci.uchicago.edu 4 Implications for Production & Analysis
Behind the scenes in the Panda + Pilot systems:
– Recover from stage-in to local disk failures (sketched below)
– This is in production at a few sites
Development coming to allow advanced brokering which includes network performance:
– Would mean jobs no longer require the dataset to be complete at a site
– Allows "diskless" compute sites
Ability to use non-WLCG resources:
– "Off-grid" analysis clusters
– Opportunistic resources
– Cloud resources
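The stage-in failover described above can be pictured with the following sketch: try the local copy first and fall back to the federation only on failure. This is an illustrative outline, not the actual pilot code; the helper name, redirector URL and paths are made up.

```python
import subprocess

FAX_REDIRECTOR = "root://glrd.example.org:1094/"   # hypothetical redirector URL

def stage_in(local_source, global_lfn, dest):
    """Try the site-local copy first; on failure, retry through the
    federation so the job can still run with a remote replica."""
    for source in (local_source, FAX_REDIRECTOR + global_lfn):
        result = subprocess.run(["xrdcp", "-f", source, dest])
        if result.returncode == 0:
            return source            # report which source actually worked
    raise RuntimeError("stage-in failed locally and via the federation")
```

Keeping the fallback inside the stage-in step means the payload itself never needs to know whether it read a local or a remote replica.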

efi.uchicago.edu ci.uchicago.edu 5 FDR testing elements Starting the week of January 21 we have been following a bottom-up approach which builds stability in the lower layers. Layers, from the bottom up: basic functionality (continuous), network cost matrix (continuous), HammerCloud & WAN-FDR jobs (programmatic), at-large users; complexity increases toward the top.

efi.uchicago.edu ci.uchicago.edu 6 Site Metrics
"Connectivity" – copy and read test matrices
HC runs with modest job numbers
– Stage-in & direct read
– Local, nearby, far-away
Load tests
– For well-functioning sites only
– Graduated tests: 50, 100, 200 jobs vs. various numbers of files (see the sketch after this slide)
– Will notify the site and/or list when these are launched
Results
– Simple job efficiency
– Wallclock, number of files, CPU %, event rate
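A graduated load test of the kind listed above could be driven by something as simple as the sketch below, which launches N concurrent copies against one endpoint and records the success rate and wall-clock time. The endpoint, file paths and summary format are placeholders; the real tests are submitted as HammerCloud/WAN-FDR jobs.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

ENDPOINT = "root://fax-endpoint.example.org:1094"                   # placeholder host
TEST_FILES = ["/atlas/test/file%02d.root" % i for i in range(10)]   # placeholder paths

def one_copy(task):
    """Copy a single test file and report (success, wallclock seconds)."""
    idx, path = task
    start = time.time()
    rc = subprocess.run(["xrdcp", "-f", ENDPOINT + path,
                         "/tmp/loadtest_%d.root" % idx]).returncode
    return rc == 0, time.time() - start

def load_test(n_jobs):
    """Run n_jobs concurrent copies and summarise efficiency and wallclock."""
    tasks = [(i, TEST_FILES[i % len(TEST_FILES)]) for i in range(n_jobs)]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        results = list(pool.map(one_copy, tasks))
    ok = sum(1 for success, _ in results if success)
    mean_t = sum(t for _, t in results) / len(results)
    print("jobs=%d  efficiency=%.0f%%  mean wallclock=%.1fs"
          % (n_jobs, 100.0 * ok / n_jobs, mean_t))

if __name__ == "__main__":
    for n in (50, 100, 200):   # graduated test sizes from the slide
        load_test(n)
```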

efi.uchicago.edu ci.uchicago.edu 7 Probes, integrated with AGIS
– Direct xrdcp copy of test files
– Copy using the regional redirector
At the start of the FDR: 22 sites. Currently: 32 sites. The redirection network touches six clouds (DE, FR, IT, RU, UK, US) plus CERN; redirectors are ready for the ES and Asia regions.

efi.uchicago.edu ci.uchicago.edu 8 Basic redirection functionality
– Direct access from clients to sites
– Redirection to non-local data ("upstream")
– Redirection from central redirectors to the site ("downstream")
A host at CERN runs a set of probes against the sites (sketched below).
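A minimal sketch of such a probe host is shown below: it attempts a copy for each of the three redirection modes and records pass/fail. All hostnames and file paths are invented for illustration; the production probes are integrated with AGIS.

```python
import subprocess

# All hostnames and paths below are placeholders for illustration only.
PROBES = {
    "direct":     "root://xrootd.site.example.org:1094//atlas/test/probe.root",
    "upstream":   "root://xrootd.site.example.org:1094//atlas/test/probe_at_other_site.root",
    "downstream": "root://regional-redirector.example.org:1094//atlas/test/probe.root",
}

def run_probes():
    """Return a {probe_name: passed} map mirroring the three redirection checks."""
    status = {}
    for name, url in PROBES.items():
        rc = subprocess.run(["xrdcp", "-f", url, "/tmp/probe_%s.root" % name]).returncode
        status[name] = (rc == 0)
    return status

if __name__ == "__main__":
    print(run_probes())
```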

efi.uchicago.edu ci.uchicago.edu 9 Redirectors – regional and global (service monitor)

efi.uchicago.edu ci.uchicago.edu 10 Connectivity matrix The survey revealed complex security dependencies on the various VOMS and xrootd clients found at sites

efi.uchicago.edu ci.uchicago.edu 11 Cost matrix measurements Cost-of-access: (pairwise network links, storage load, etc.)
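As an illustration of how pairwise measurements might be reduced to a cost matrix, the sketch below groups repeated timings by (source, destination) and takes the median. The site names and numbers are invented; the real cost matrix is built from the continuous measurements.

```python
from collections import defaultdict
from statistics import median

# Each record: (source_site, destination_queue, read_time_seconds).
# In the real system these come from the continuous probe jobs;
# the values below are invented for illustration.
measurements = [
    ("MWT2",  "ANALY_SLAC", 12.3),
    ("MWT2",  "ANALY_SLAC", 14.1),
    ("AGLT2", "ANALY_SLAC",  9.8),
]

def cost_matrix(records):
    """Reduce repeated timings to one 'cost' per (source, destination) pair."""
    grouped = defaultdict(list)
    for src, dst, seconds in records:
        grouped[(src, dst)].append(seconds)
    return {pair: median(times) for pair, times in grouped.items()}

print(cost_matrix(measurements))
```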

efi.uchicago.edu ci.uchicago.edu 12 Comparing local to wide area performance [plot: read time (s) vs. ping time (ms), local access marked] Each site can check its connectivity and IO performance for copy and direct read

efi.uchicago.edu ci.uchicago.edu 13 Programmatic HammerCloud tests
Defined a set of HammerCloud tests that probe the infrastructure and collect measures of various data access patterns
Set up by Johannes and Federica using Higgs → WW and a SUSY D3PD analysis:
– (ROOT 5.30) HWW analysis code which analyzes NTUP_SMWZ
– (ROOT 5.34) HWW analysis code which analyzes NTUP_SMWZ
– (ROOT 5.32) SUSY analysis code which analyzes NTUP_SUSYSKIM (p1328, p1329)

efi.uchicago.edu ci.uchicago.edu 14 HammerCloud testing
– Pre-placed, site-unique SUSY and Higgs datasets at all sites
– Realistic, typical analysis templates for the SUSY D3PD maker and Higgs analysis
– New pilot equipped for stage-in or direct access with XRootD
– Choose ANALY queue and redirector
Submission runs for both modes:
– Phase 1: Local performance
– Phase 2: Nearby performance (e.g. within a cloud)
– Phase 3: Far-away performance

efi.uchicago.edu ci.uchicago.edu 15 Test datasets
SUSY: 12 datasets of the form data12_8TeV.….physics_JetTauEtmiss.merge.NTUP_SUSYSKIM.r4065_p1278_p1328_p1329_tid….…_00
SMWZ: 10 datasets of the form data12_8TeV.….physics_Muons.merge.NTUP_SMWZ.f479_m1228_p1067_p1141_tid….…_00
Each of these datasets is copied to a version with site-specific names, so as to automatically test redirection access and to provide a benchmark comparison.

efi.uchicago.edu ci.uchicago.edu 16 Test dataset distribution Both sets of test datasets have been distributed to most sites, with small amounts of cleanup left. These datasets will be used to gather reference benchmarks for the various access configurations.

efi.uchicago.edu ci.uchicago.edu 17 Queue configurations
This turns out to be the hardest part. Providing federated XRootD access exposes the full extent of the heterogeneity of sites in terms of schedconfig queue parameters.
Each site's "copysetup" parameters seem to differ, and specific parameter settings need to be tried in the HammerCloud job submission scripts using –overwriteQueuedata.
Amazingly, in spite of this a good fraction of sites are FAX-functional.

efi.uchicago.edu ci.uchicago.edu 18 First phase of HC tests: local access
HC run – HWW code with regular SMWZ input, FAX directIO, production-version pilots. This is for access to local data, but via direct-access xrootd.
Results:
– 26 sites in the test
– 16 sites with job successes
– 3 sites where no job started/finished during the test (CERN, ROMA1, OU_OCHEP_SWT2)
– 1 site does not have the input data (GLASGOW)
– 1 site blacklisted (FZU)
– 1 site used xrdcp instead of directIO (BNL)
– 4 sites with 100% failures (ECDF, IHEP, JINR, LANCS)
– 4 sites with both job successes and failures (FRASCATI, NAPOLI, LRZ, RAL)
– LRZ again experienced xrootd crashes
– SLAC jobs finally succeed
(Johannes, 3 weeks ago)

efi.uchicago.edu ci.uchicago.edu 19 HC efficiencies for selected sites

efi.uchicago.edu ci.uchicago.edu 20 First phase of HC tests: local access
HC run – HWW code with regular SMWZ input, FAX directIO, production-version pilots. This is for access to local data, but with xrdcp to scratch.
Results:
– 28 sites in the test
– 17 sites with job successes
– 12 sites with actual xrdcp job successes
– 7 sites used directIO: AGLT2, LRZ, MPPMU, MWT2, SLAC, SWT2_CPB, WUPPERTAL
– 3 sites with all job failures: IHEP, JINR, SWT2_CPB
– 3 sites with no jobs started during the test: ECDF, CAM, CERN
– 1 site with a black-listed ANALY queue: OU_OCHEP_SWT2
– 2 sites with no input data: LANCS, GRIF-LAL
(Johannes, 2 weeks ago)

efi.uchicago.edu ci.uchicago.edu 21 Systematic FDR load tests in progress
Adapted the WAN framework for specific FDR load tests:
– Choose the analysis queue & FAX server sites, the number of jobs and the number of files
– Choose the access type: copy files, or direct ROOT access (10% of events, 30 MB client cache; see the sketch below)
– Record timings at CERN
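The "direct ROOT access (10% of events, 30 MB client cache)" mode can be sketched in PyROOT as below. The redirector URL, file path and tree name are assumptions, and the real framework records its timings centrally rather than printing them.

```python
import time
import ROOT

# Placeholders: the real tests take redirector, file and tree from the job configuration.
URL  = "root://regional-redirector.example.org:1094//atlas/test/NTUP_SMWZ.root"
TREE = "physics"

def direct_read(url=URL, tree_name=TREE, fraction=0.10, cache_bytes=30 * 1024 * 1024):
    """Read a fraction of the events over the WAN with a TTree read cache enabled."""
    start = time.time()
    f = ROOT.TFile.Open(url)
    if not f or f.IsZombie():
        raise IOError("cannot open %s" % url)
    tree = f.Get(tree_name)
    tree.SetCacheSize(cache_bytes)        # 30 MB client-side cache, as in the FDR tests
    n_read = int(tree.GetEntries() * fraction)
    for i in range(n_read):               # touch 10% of the events
        tree.GetEntry(i)
    elapsed = time.time() - start
    print("read %d events in %.1f s (%.1f events/s)" % (n_read, elapsed, n_read / elapsed))
    f.Close()

if __name__ == "__main__":
    direct_read()
```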

efi.uchicago.edu ci.uchicago.edu 22 Systematic FDR load tests in progress Drill down: individual job lists + links back to Panda logs

efi.uchicago.edu ci.uchicago.edu 23 Systematic FDR load tests in progress US cloud results: 10 jobs × 10 SMWZ files (~50 GB), CPU limited. Factors affecting the spreads: pairwise network latency, throughput, storage "busyness".

efi.uchicago.edu ci.uchicago.edu 24 Systematic FDR load tests in progress US cloud results

efi.uchicago.edu ci.uchicago.edu 25 Systematic FDR load tests in progress EU cloud results

efi.uchicago.edu ci.uchicago.edu 26 Systematic FDR load tests in progress EU cloud results [tables: events/s and MB/s for each source/destination pair among BNL-ATLAS, CERN-PROD, ECDF, ROMA1 and QMUL]

efi.uchicago.edu ci.uchicago.edu 27 Controlled site "load" testing Two sites in the IT cloud being read by jobs running at CERN

efi.uchicago.edu ci.uchicago.edu 28 Federated traffic seen in the WLCG dashboard

efi.uchicago.edu ci.uchicago.edu 29 Federation traffic
– Modest levels now; will grow when in production
– Oxford and ECDF switched to xrootd for local traffic
– Prague users reading from EOS
– Co-located Tier 3 client → Tier 2 server

efi.uchicago.edu ci.uchicago.edu 30 Studies from Shuwei Ye at BNL Comparing wall and CPU times for access from a Tier 3 to datasets at BNL, NET2 and RAL (only BNL results shown). Concludes that a nearby redirector reduces the time to process (validating the ATLAS redirection model). The usual performance hit is seen for "long reach" datasets over slow networks (to RAL). More systematic studies to come.

efi.uchicago.edu ci.uchicago.edu 31 ATLAS throughputs (from US) FAX traffic a tiny fraction of the total ATLAS throughput (for now)

efi.uchicago.edu ci.uchicago.edu 32 By destination (FTS + FAX)

efi.uchicago.edu ci.uchicago.edu 33 FAX by source cloud

efi.uchicago.edu ci.uchicago.edu 34 FAX by destination cloud

efi.uchicago.edu ci.uchicago.edu 35 Daily FAX transfers (annotation: UDP collector down)

efi.uchicago.edu ci.uchicago.edu 36 Conclusions
The FDR has been a good exercise in exposing a number of site & system integration issues:
– Site-specific client differences → limited proxy check not always working
– Non-uniform copysetup parameters in schedconfig for sites
– Lack of fault checking in the rungen script for read failures
– Tweaks necessary to brokering to allow sending jobs to sites missing datasets
In spite of this, much progress:
– New functionality in the pilot to handle global paths without using dq2-client & forcing Python 2.6 compatibility at all sites
– First phase of programmatic HC stress testing nearing completion (local site access)
– Some FAX accesses from Tier 3s
– Test datasets in place
Next steps:
– Programmatic HC stress tests for regional data access (Phase 2)
– Address the remaining integration issues above & continue to validate sites
– Recruit early-adopting users and gather their feedback
– Outsource monitoring services where possible to WLCG, including central UDP collectors, availability probes, etc.
– Global and Rucio namespace mapping, development of a new N2N module (see the sketch below)
– Set a timeframe for an ATLAS requirement of federating xrootd services at sites
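The "global and Rucio namespace mapping / new N2N module" item refers to translating a federation-wide logical name into the physical path used by a site's storage. A minimal Python illustration of the idea is below, using a made-up local prefix and a Rucio-style deterministic hash; the production N2N is an XRootD site plugin, so treat this purely as a sketch of the mapping logic.

```python
import hashlib

SITE_PREFIX = "/pnfs/example.org/data/atlas/rucio"   # made-up local storage prefix

def n2n(scope, name, prefix=SITE_PREFIX):
    """Map a global scope:name to a site-local physical path using a
    Rucio-style deterministic hash (sketch; not the production plugin)."""
    md5 = hashlib.md5(("%s:%s" % (scope, name)).encode()).hexdigest()
    return "%s/%s/%s/%s/%s" % (prefix, scope, md5[0:2], md5[2:4], name)

# Example with a hypothetical file name:
print(n2n("data12_8TeV", "NTUP_SMWZ.01234567._000001.root.1"))
```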

efi.uchicago.edu ci.uchicago.edu 37 Thanks A hearty thanks goes out to all the members of the atlas-adc- federated-xrootd group, especially site admins and providers of redirection & monitoring infrastructure Special thanks to Johannes and Federica for preparing HC FAX analysis stress test templates and detailed reporting on test results Simone & Hiro for test dataset distribution & Simone for getting involved in HC testing Paul, John, Jose for pilot and wrapper changes Rob for testing and pushing us all Wei for doggedly tracking down xrootd security issues & other site problems & Andy for getting ATLAS’ required features into xrootd releases

efi.uchicago.edu ci.uchicago.edu 38 EXTRA SLIDES

efi.uchicago.edu ci.uchicago.edu 39 Data federated (1) Top 100 sites used by ATLAS (bold=FAX accessible) * Includes tape, which we do not federate

efi.uchicago.edu ci.uchicago.edu 40 Data federated (2) Top 100 sites used by ATLAS (bold=FAX accessible) [chart annotations: GRIF-LAL, IN2P3-LAPP]

efi.uchicago.edu ci.uchicago.edu 41 Data federated (3) Top 100 sites used by ATLAS (bold=FAX accessible)

efi.uchicago.edu ci.uchicago.edu 42 Full SMWZ DATA+MC coverage (>96% of total 694 datasets) Average number of replicas ~2.5

efi.uchicago.edu ci.uchicago.edu 43 SkimSlimService: a FAX killer app
– Free physicists from dealing with big data
– Free IT professionals from dealing with physicists; let them deal with what they do best – big data
– Efficiently use available resources (over-the-pledge, OSG, ANALY queues, EC2)

efi.uchicago.edu ci.uchicago.edu 44 How it works
– Use FAX to access all the data without the overhead of staging; use optimally situated replicas (possible optimization: production D3PDs pre-placed at just a few sites, maybe even just one)
– Physicists request a skim/slim through a web service and can add a few variables in flight
– Produced datasets are registered in the name of the requester and delivered to the requested site
– Since all of the data is available in FAX, one can skim not only production D3PDs but any flat ntuple, or run multi-pass SkimSlims
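What the skim/slim step itself amounts to can be sketched with PyROOT as below: drop events that fail a selection (skim) and keep only the requested branches (slim), reading the input directly over FAX. The service actually uses a modified filter-and-merge.py; the URL, tree name, branch names and cut here are invented placeholders.

```python
import ROOT

def skim_slim(input_url, output_path, tree_name, keep_branches, selection):
    """Copy only the requested branches (slim) and the events passing the
    selection (skim) from a file read directly over the federation."""
    fin = ROOT.TFile.Open(input_url)
    tree = fin.Get(tree_name)
    tree.SetBranchStatus("*", 0)                  # drop everything ...
    for branch in keep_branches:                  # ... then re-enable what was requested
        tree.SetBranchStatus(branch, 1)
    fout = ROOT.TFile(output_path, "RECREATE")
    skimmed = tree.CopyTree(selection)            # apply the event selection
    skimmed.Write()
    fout.Close()
    fin.Close()

# Hypothetical usage: URL, tree, branch names and cut are placeholders.
skim_slim("root://regional-redirector.example.org:1094//atlas/test/NTUP_SMWZ.root",
          "slimmed.root", "physics",
          ["RunNumber", "EventNumber", "mu_pt"], "mu_pt[0] > 20000.")
```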

efi.uchicago.edu ci.uchicago.edu 45 SSS – QoS
A timely result is paramount! Several levels of service depending on the size of the input and output data and on importance. Example:
1. <1 TB (input+output) – 2-hour service. This one is essential, as only in this case will people skim/slim to only the variables they need, without worrying "what if I forget something I'll need".
2. … TB – 6 hours
3. … TB – 24 hours
4. Extra-fast delivery: on EC2, but it comes with a price tag
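The tiering could be expressed as a simple lookup like the sketch below. Only the <1 TB, 2-hour level is stated explicitly on the slide; the boundaries for the middle tiers are placeholders.

```python
def service_tier(input_tb, output_tb):
    """Return a target turnaround in hours for a skim/slim request.
    Only the <1 TB / 2-hour level comes from the slide; the middle-tier
    boundaries below are placeholders."""
    total = input_tb + output_tb
    if total < 1:
        return 2          # the essential fast path from the slide
    elif total < 10:      # placeholder boundary
        return 6
    elif total < 100:     # placeholder boundary
        return 24
    return None           # beyond the normal tiers: negotiate (e.g. paid EC2 burst)

print(service_tier(0.4, 0.1))   # -> 2
```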

efi.uchicago.edu ci.uchicago.edu 46 SkimSlimService
– Web site at CERN: receives requests and shows their status
– OracleDB at CERN: stores requests, splits them into tasks, serves as a backend for the web site
– Handmade server (1): receives web queries, collects info on datasets, files, trees, branches
– Executor at UC3 (1): gets tasks from the DB, creates and submits condor SkimSlim jobs (2), makes and registers the resulting datasets (3)
(1) We have no dedicated resources for this; I used UC3, but any queue that has cvmfs will suffice.
(2) A modified version of filter-and-merge.py is used.
(3) Currently under my name, as I don't have a production role.
