Towards FAX usability
Rob Gardner, Ilija Vukotic
Computation and Enrico Fermi Institutes, University of Chicago
US ATLAS Facilities, UC Santa Cruz, November 13, 2012

Slide 2: Progress
- Ongoing issues in development & operations
- FAX in the pilot: first use case working now; later, Panda brokering
- Programmatic WAN testing
- FAX as support for portable, lightweight applications on opportunistic resources
Most of these slides are from the WLCG Storage Federations working group report and from ATLAS SW week (esp. from Paul).

Slide 3: Recall our goals
- Common ATLAS namespace across all storage sites, accessible from anywhere
- Easy-to-use, homogeneous access to data
- Identified initial use cases:
  - Failover from stage-in problems with the local SE (now implemented, in production at several sites)
  - Gain access to more CPUs using WAN direct read access: allow brokering to Tier 2s with partial datasets; use opportunistic resources without local ATLAS storage
  - Use as a caching mechanism at sites to reduce local data management tasks (eliminate cataloging, consistency checking, deletion services)
- WAN data access group formed in ATLAS to determine use cases & requirements on the infrastructure

Slide 4: Development & integration, operations

Slide 5: Federated access, federated site deployment
- The model has been to work with ATLAS contacts from each cloud:
  - US: the Tier 1 and all Tier 2 centers; separately, a number of off-grid Tier 3 sites
  - UK: four Tier 2 sites; working on N2N for Castor at RAL
  - DE: three Tier 2 sites plus a CZ site; gearing up for more
  - RU: two Tier 2 sites federated
  - IT: deploying now!
  - EOS, but with concerns about I/O load from WAN accesses
- Network of redirectors and peering established; gaining practical operational experience

Slide 6: Sites are used in developing the federation
- Sites deploy a tandem of xrootd services, as easy as an Apache web server, in principle
- But the software, while a proven storage technology, requires additional development to become a federating technology:
  - Experiment- and site-specific file lookup service (i.e. N2N)
  - Customizations for backend storage types
  - Various 3rd-party wide-area monitoring services (UCSD collector, ActiveMQ, dashboards)
  - Security for read-only access: missing initially; still need GSI proxy validation
  - Standardizing monitoring metrics requires further development
  - Status monitoring and alert systems for operations (RSV/Nagios)
  - New WLCG service definition (in GOCDB, OIM), similar to perfSONAR or Squid
  - Integration into the ATLAS information system (AGIS)
  - Development in production & analysis systems: pilot & site movers
  - Accounting & caching will require more development, integration, testing, ...
- Good news: many of these obstacles have been addressed in the R&D phase, and by CMS and ALICE before us. We benefit from vigorous development by many groups working on various aspects of federation (AAA, XrootD & dCache teams, Dashboard, ...)

Slide 7: On-going work & issues
- SSB and WLCG transfer dashboard with cost-matrix decision algorithm
- Xrootd instabilities seen in the UK cloud, perhaps related to N2N blocking at LFC
- FAX extensions to the ATLAS information system AGIS
- Need the new monitoring f-stream at all sites
- Stand-alone cmsd for dCache sites
- xrootd.org repository & EPEL policy (site guidance, esp. DPM)
- Several dCache-specific issues, and many releases under test ( , 2.2.4, ...); f-stream, proper stat response and checksum support from dCache xrootd doors
- Moving US sites to read-only LFC
- Starting to federate sites in Italy
- SLC6 issues: X509 and VOMS attribute checking
- Will update the UDP collector service with the f-stream format when available
- Functional testing probes & publishing into ActiveMQ and dashboards
- Monitoring will have to be validated at all stages
- FAX-enabled pilot site mover in production at several Tier 2s
- Documentation for users & site admins

Slide 8: Functional status & cost performance
There are many more components, as discussed at the Lyon storage federations workshop in September.

Slide 9: FAX dashboard – sites transfer matrix

Slide 10: WLCG
- We have an ongoing set of meetings with two WLCG working groups:
  - WLCG Federated Data Storage (F. Furano, chair): explores issues generally and assesses the approach taken by each of the LHC experiments
  - WLCG Xrootd task force (D. Giordano): a new group forming to coordinate deployment and operations; it will seek to engage the grid infrastructure providers (EGI/EMI, OSG) for support
- Both of these will help bring FAX into normal production operations

Slide 11: In the pilot, in Panda

Slide 12: Pilot capabilities – from SW week

Slide 13: Pilot capabilities, cont.

Slide 14: Pilot capabilities, cont.

Slide 15: WAN performance

Slide 16: Testing FAX
[Diagram of the FAX test architecture: HammerCloud exercises sites, doors, ports, roles, protocols and paths; test code and release settings come from SVN; test datasets are read; results (ping, copy time, read times) go to an Oracle DB and are published to a web site and the SSB.]

Slide 17: HC-based FAX tests
- HammerCloud submits 1 job/day to each of the "client" nodes. The client node is the one using the data; it is an ANALY queue.
- All the "server" sites have one and the same dataset. Server sites are the ones delivering the data.
- Each job, every half hour and in parallel:
  - pings all of the "server" sites
  - copies a file from a site (xrdcp/dccp)
  - reads the file from a ROOT script
  - uploads all the results to an Oracle DB at CERN
- Results are shown on a web page, and are also given in JSON format to the SSB: http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview#currentView=Network+Measurements&highlight=false

Slide 18: Testing details
- Test file: a standard ATLAS 760 MB D3PD with 5k branches and 13k events
- Measurements:
  - "direct copy": time (in seconds) for xrdcp to the site
  - "read time": time required to read 10% randomly selected consecutive events, using the default TTreeCache of 30 MB
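To make the "read time" metric concrete, here is a minimal ROOT macro sketch that mirrors it: open the test file over xrootd, enable a 30 MB TTreeCache, read a consecutive 10% slice of the events, and print the elapsed wall-clock time. The redirector host, file path, and tree name ("physics") are illustrative assumptions, not values taken from these slides.

    // readtime_sketch.C -- a sketch only; all names below are placeholders.
    // Run with: root -l -b -q readtime_sketch.C
    #include "TFile.h"
    #include "TTree.h"
    #include "TStopwatch.h"
    #include <cstdio>

    void readtime_sketch() {
      // Hypothetical gLFN via a FAX redirector (host, port and path are illustrative)
      TFile *f = TFile::Open(
          "root://myRedirector:1094//atlas/dq2/some/dataset/NTUP_SMWZ.root");
      if (!f || f->IsZombie()) { printf("open failed\n"); return; }

      // D3PD tree name is an assumption ("physics" is common for ATLAS D3PDs)
      TTree *t = (TTree*)f->Get("physics");
      if (!t) { printf("tree not found\n"); return; }

      t->SetCacheSize(30 * 1024 * 1024);  // 30 MB TTreeCache, as in the test
      t->AddBranchToCache("*", kTRUE);    // cache all branches

      Long64_t n     = t->GetEntries();
      Long64_t first = n / 2;             // arbitrary starting point
      Long64_t last  = first + n / 10;    // consecutive 10% slice

      TStopwatch sw;
      sw.Start();
      for (Long64_t i = first; i < last && i < n; ++i)
        t->GetEntry(i);                   // pulls the baskets through the cache
      sw.Stop();

      printf("read %lld events in %.1f s\n", (long long)(last - first), sw.RealTime());
      f->Close();
    }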

Slide 19: For jobs at MWT2 (client location) [plots: ping, copy time]

Slide 20: For jobs at MWT2 (client location) [plots: ping, read time]

Slide 21: For jobs at MWT2 (client location) [plots: ping, CPU efficiency]

Slide 22: Jobs at SWT2_CPB (client location) [plots: ping, copy time]

Slide 23: Jobs at SWT2_CPB (client location) [plots: ping, read time]

Slide 24: At-large use of FAX

Slide 25: How to use it? Part I
- Datasets should be registered:
  - All grid-produced datasets are registered automatically, whether they are part of official production or simply the result of a user's job.
  - If files are not registered, it is trivial to do so; a very detailed description of how to do this is given.
- Have your ATLAS grid certificate and make a proxy:
    source /afs/cern.ch/project/gd/LCG-share/current_3.2/etc/profile.d/grid_env.sh
    voms-proxy-init -voms atlas
- Set up DQ2:
    source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.zsh
- Make sure your code uses TTreeCache!
- (For the CVMFS version of this setup, see the next slide.)

Slide 26: CVMFS environment setup
- Set up the environment:
    export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
    alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'
    export ALRB_localConfigDir=$HOME/localConfig
    setupATLAS
- Make a proxy:
    localSetupGLite
    voms-proxy-init -voms atlas
- Set up DQ2:
    localSetupDQ2Client

Slide 27: How to use it? Part II
- Check that the datasets exist at one of the federated sites:
    dq2-ls -r myDataSetName
- Find the gLFNs of the input datasets:
  - Find the closest redirector to the compute site; a list is available.
  - Then make a file with the list of all the gLFNs:
    export STORAGEPREFIX=root://closestRedirector:port/
    dq2-list-files -p data12_8TeV.physics_Muons.recon.DESD_ZMUMU.f437_m716_f437 > my_list_of_gLFNS.txt

Slide 28: How to use it? Part III
- From ROOT:
    TFile *f = TFile::Open("root://myRedirector:port//atlas/dq2/user/ilijav/HCtest/user.ilijav.HCtest.1/group.test.hc.NTUP_SMWZ.root");
- From prun: instead of giving the --inDS myDataset option, provide --pfnList my_list_of_gLFNS.txt
- To copy files locally:
    xrdcp root://xrddc.mwt2.org:1096//atlas/dq2/user/ilijav/HCtest/user.ilijav.HCtest.1/group.test.hc.NTUP_SMWZ.root /tmp/myLocalCopy.root
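Complementing the one-liner above, a hedged sketch of what "make sure your code uses TTreeCache" (Part I) can look like when reading over FAX: enable only the branches that are actually needed and register them with a 30 MB TTreeCache, so that far less data crosses the WAN. The redirector, path, tree name and branch names are placeholders, not taken from these slides.

    // fax_read_sketch.C -- a sketch only; every name below is a placeholder.
    // Run with: root -l -b -q fax_read_sketch.C
    #include "TFile.h"
    #include "TTree.h"
    #include <cstdio>
    #include <vector>

    void fax_read_sketch() {
      // Hypothetical gLFN via a FAX redirector
      TFile *f = TFile::Open(
          "root://myRedirector:1094//atlas/dq2/user/someuser/some.dataset/NTUP_SMWZ.root");
      if (!f || f->IsZombie()) { printf("open failed\n"); return; }

      TTree *t = (TTree*)f->Get("physics");     // tree name is an assumption
      if (!t) { printf("tree not found\n"); return; }

      // Read only what is needed over the WAN: disable all, re-enable a few branches
      t->SetBranchStatus("*", 0);
      t->SetBranchStatus("el_pt", 1);           // placeholder branch names
      t->SetBranchStatus("el_eta", 1);

      // TTreeCache prefetches the registered branches in large, WAN-friendly reads
      t->SetCacheSize(30 * 1024 * 1024);        // 30 MB, as in the FAX tests
      t->AddBranchToCache("el_pt", kTRUE);
      t->AddBranchToCache("el_eta", kTRUE);

      std::vector<float> *el_pt = 0, *el_eta = 0;  // D3PD branches are typically vectors
      t->SetBranchAddress("el_pt", &el_pt);
      t->SetBranchAddress("el_eta", &el_eta);

      Long64_t n = t->GetEntries();
      for (Long64_t i = 0; i < n; ++i) {
        t->GetEntry(i);
        // ... analysis using el_pt / el_eta ...
      }
      f->Close();
    }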

Slide 29: Conclusions
- FAX usability inches forward, but with growing pains due to:
  - standardizing metrics
  - dCache components
  - many more sites and SEs
- The first pilot bits are in production at a few sites
- Co-located Tier 3 users are using FAX doors for LOCALGROUPDISK analysis
- Offers storage access from opportunistic or cloud resources
- Offers a "diskless" use case, which would be very attractive to sites for storage-administration purposes
- How to robustify it and attract users?
  - Continue to get feedback from co-located Tier 3s
  - Same answers as always: demonstrators?