Data Federation with Xrootd Wei Yang US ATLAS Computing Facility meeting Southern Methodist University, Oct 11-12, 2011.

What is Federated ATLAS Xrootd (FAX)? (Wei's limited understanding)
- Data access via a single entrance (using Xrootd's redirection technology)
  - Read data directly over the WAN
  - Read data via a local storage cache
  - Works with dCache, Xrootd, and POSIX storage systems
- Started at the Amsterdam WLCG workshop in 2009
  - Involves experts from the US Tier 1, Tier 2s, and many Tier 3s
  - Workshop at Chicago, Sept 12-13
  - Bi-weekly phone meeting
  - Twiki pages for T1/T2/T3 instructions, meeting notes, and a site configuration repository
  - Support from ADC on dq2 tools, and from Dubna on monitoring
- The software is ready and provides the needed functions, almost

Use Cases
- Read a dataset directly over the WAN
  - Figure out the content of a dataset and the file paths using dq2-ls
  - Just read the files from the global redirector (see the example after this list)
- Pulling data from FAX
  - Run jobs from laptops and desktops
  - Permanently "diskless" sites, or "diskless" mode during site maintenance
- Bring data to a local Tier 3 Xrootd disk (cache)
  - Use dq2-get/xprep to bring a dataset to the local disk cache, directly or via Xrootd's FRM, or
  - Just read files against the local Tier 3 Xrootd storage and let Xrootd's FRM fetch the data from FAX
  - Users sharing non-DDM data between sites
  - Old, untouched files are purged when space is needed, removing the need to manage the space
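A minimal sketch of the direct-WAN-read use case. The redirector hostname, dataset name, and global path below are placeholders, not real FAX endpoints, and the exact global path layout may differ:

    # List the files (and GUIDs) of a dataset with the dq2 client
    dq2-ls -f user.someuser.mydataset/

    # Copy one of those files directly from the FAX global redirector over the WAN;
    # "glrd.example.org" and the /atlas/dq2/... global path are placeholders
    xrdcp root://glrd.example.org:1094//atlas/dq2/user.someuser.mydataset/NTUP.root /tmp/NTUP.root

A ROOT-based job can also open the same root:// URL directly instead of copying the file first.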

How FAX Works
[Diagram: users either read over the WAN through the Global Redirector, which federates Tier 1 / Tier 2 disk storage, or read via a local Tier 3 disk cache; a common, unique file path (the Global Name Space) ties the federation together.]
- Federating Tier 3s is functionally easier
- Bringing in Tier 1s and Tier 2s requires more work; they are the big data sources

Tier 3 FAX Components
[Diagram: a user runs dq2-get/xprep on a dataset, which hands the local Xrootd/FRM a list of files with GUIDs and checksums to pull from FAX; when a user later reads a file that is missing locally, the FRM fetches it (by GUID and checksum) from FAX into the disk cache. See the example below.]
- /atlas/dq2: disk cache managed by the FRM
- /atlas/local: locally managed space
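From the user's side, the cache-based read can be as simple as copying (or opening) the file against the local Tier 3 Xrootd server; if the file is not yet in the /atlas/dq2 cache, Xrootd holds the request while the FRM stages it in from FAX. A minimal illustration, with a placeholder hostname and path:

    # Read against the local Tier 3 Xrootd storage; "t3-xrd.example.org" and the
    # file path are placeholders. If the file is missing from the /atlas/dq2
    # cache, the request is held while the FRM fetches it from the federation.
    xrdcp root://t3-xrd.example.org//atlas/dq2/user.someuser.mydataset/NTUP.root /tmp/NTUP.root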

Tier 1 and Tier 2 Components
[Diagram: at a Tier 1 / Tier 2 site, an Xrootd front end sits in front of the site storage and registers with the Global Redirector; an N2N (name-to-name) plugin consults the LFC to translate the Global Name Space path (plus GUID) into the site-specific file path. A rough command-line analogue is sketched below.]
- N2N is difficult to implement, and may be a performance bottleneck because of the LFC
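For orientation, here is a rough by-hand analogue of what the N2N translation has to do; the actual N2N is an Xrootd plugin, and the dataset name, GUID, and LFC namespace layout shown here are placeholders/assumptions:

    # A client asks the site for a file by its global (FAX) name, e.g.
    #   /atlas/dq2/mc10_7TeV.12345.mydataset/NTUP.root
    # The N2N plugin heuristically builds candidate LFC names and looks them up,
    # roughly what these lcg-util queries do manually:
    lcg-lr lfn:/grid/atlas/dq2/mc10_7TeV.12345.mydataset/NTUP.root
    # ...or, as a fallback, by GUID:
    lcg-lr guid:01234567-89ab-cdef-0123-456789abcdef
    # The returned SURL is then reduced to the site-specific storage path that the
    # Xrootd data server exports. Each lookup costs a fraction of a second (the
    # ~0.3 s penalty mentioned later), hence the concern about LFC load.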

Some stage-in test results
- The three failed files do exist in the federation; a later xrdcp from the global redirector fetched them successfully. This seems to suggest that a single retry will likely avoid the failures.
- The low failure rate indicates a reasonably solid architecture (when not under stress!)
- After dq2-get/xprep, reads of those files are put on hold by Xrootd until they are ready
- The FRM stage-in script is trimmed to the absolute minimum, effectively just one xrdcp (see the sketch below)
- All files exist in SWT2_CPB's LFC and can be found by the current N2N heuristic algorithms
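A minimal sketch of such an FRM stage-in script. The argument convention (set by the site's frm.xfr.copycmd line) and the redirector hostname are assumptions and must match the site's actual FRM configuration:

    #!/bin/sh
    # Minimal FRM stage-in script: fetch one file from the FAX federation.
    # Assumed argument convention (configured via the site's frm.xfr.copycmd):
    #   $1 = global logical file name, e.g. /atlas/dq2/<dataset>/<file>
    #   $2 = local physical destination path in the disk cache
    # "glrd.example.org" is a placeholder for the real global redirector.
    LFN="$1"
    DST="$2"
    xrdcp -f "root://glrd.example.org:1094/${LFN}" "${DST}"
    exit $?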

What is still needed
Global Name Space
- A global storage path schema to locate files at Tier 3s without an LFC (supported by ADC, see DQ2 support below)
- Heuristic algorithms to locate files at Tier 1/2s via the LFC; can be extended to search the LFC using GUIDs (the ~0.3 second penalty seems acceptable)
DQ2 client tools support
- dq2-get can trigger the Tier 3 Xrootd storage to fetch data from FAX (instructions are documented)
- dq2-ls can print dataset contents in the global name space, and can also be used to check local storage for missing files (see the sketch below)
  - The global path isn't quite right yet; working with ADC
  - Comes with other extra, unwanted verification features
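A rough sketch of the "check local storage for missing files" idea, testing each file of a dataset against the local Tier 3 Xrootd server. The hostname, path prefix, and file names are placeholders (in practice the list would come from dq2-ls -f), and the output matching is only illustrative:

    #!/bin/sh
    # Check a local Tier 3 Xrootd server for files of a dataset that are missing.
    # "t3-xrd.example.org", the /atlas/dq2 prefix, and the file names are placeholders.
    DATASET="user.someuser.mydataset"
    for f in NTUP._00001.root NTUP._00002.root; do
        # "existfile" is a subcommand of the classic xrd command-line client shipped
        # with Xrootd 3.x; its exact output wording may differ, so treat this as a sketch.
        if xrd t3-xrd.example.org existfile "/atlas/dq2/${DATASET}/${f}" 2>/dev/null | grep -qi "exists"; then
            echo "present: $f"
        else
            echo "MISSING: $f"
        fi
    done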

Xrootd
- Long-awaited version 3.1.0 is in rc3 (target release date 10/17)
- Name-to-Name (N2N) translation works for the proxy
  - Function needed at Tier 1/2s; same for the next item
- N2N translation works for checksums
  - For Xrootd storage, a chain of checksum events works, from user, to proxy, to backend server
  - A plug-in for dCache storage will be worked out
  - Not an issue for GPFS or Lustre (POSIX)
- Dual NIC binding in a certain security module
- Informing users about FRM stage-in failures --- non-ideal methods
- X509: hopefully by the end of this year?
Documentation and Configuration repository
- Comprehensive documentation for Tier 3s to set up Xrootd, and for Tier 1/2s to set up Xrootd in various modes with N2N
- Site xrootd, proxy, and FRM configurations
- Working on standardizing the FRM script

CMSd for dCache Xrootd door
- A special cmsd to work directly with the dCache Xrootd door
- An "authorization" plugin for the dCache Xrootd door to convert GFN->LFN
- A caching mechanism to serve as an N2N cache
- Work with PROOF clusters
Panda Analysis Queue on top of FAX
- Just started looking into this: ANALY_MWT2_FAX-pbs
- No LFC; can Panda brokerage handle this?
- Many possible operational modes, including completely "diskless"
Monitoring
- Had a discussion with the Dubna group on monitoring metrics
- A daemon to collect local Xrootd summary info and feed it to Ganglia
- An application that works with the xrootd summary stream: need to decide how to present the fetched data