Scalla/xrootd Andrew Hanushevsky SLAC National Accelerator Laboratory Stanford University 29-October-09 ATLAS Tier 3 Meeting at ANL

2 Outline
System Overview: what it's made of and how it works
Opportunistic Clustering: batch nodes as data providers
Expansive Clustering: federation for speed and fault tolerance
The Virtual Mass Storage System: fullness vs. simplification

3 Full Scalla/xrootd Overview
Diagram: an xrootd cluster (each machine running xrootd, cmsd, and cnsd, with one machine acting as the redirector) supports more than 200K data servers. Clients use the xrootd protocol for random I/O, while a Grid protocol handles sequential bulk I/O. An SRM machine (BeStMan over FUSE/xrootdFS) manages Grid-SE transfers, with Globus ftpd running with or without xrootdFS. Legend: X = xrootd, C = cmsd, N = cnsd; xrootd + cmsd is the minimum for a cluster, and the rest is needed only for SRM support.

4 The Components
xrootd: provides actual data access
cmsd: glues multiple xrootd's into a cluster
cnsd: glues multiple name spaces into one name space
BeStMan: provides the SRM v2+ interface and functions
FUSE: exports xrootd as a file system for BeStMan
GridFTP: Grid data access, either via FUSE or the POSIX preload library
This might not be needed for typical Tier 3 sites!

5 Getting to xrootd hosted data
Via the ROOT framework: automatic when files are named root://...; manually, use the TXNetFile() object (note: a plain TFile() object will not work with xrootd!)
xrdcp: the native copy command
POSIX preload library: allows POSIX-compliant applications to use xrootd
gridFTP
BeStMan (SRM add-on): srmcp for SRM-to-SRM copies
FUSE: Linux only, xrootd as a mounted file system
(The native set is a simple add; the full Grid set is intensive.)
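As a hedged illustration of the native access path above (the redirector host and port match the configuration shown on a later slide, but the file path is hypothetical):

  # Copy a file out of the cluster with the native copy command
  xrdcp root://redirector.slac.stanford.edu:1094//atlas/data/myfile.root /tmp/myfile.root

The same root://... URL can be handed to a ROOT-based analysis job, which then reads the file in place instead of copying it.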

6 Cluster Maneuvering
Diagram: a client application (e.g., xrdcp root://R//foo /tmp) issues open("/foo") to the redirector R; (1) R asks its data servers "Who has /foo?"; (2) server B answers "I do!"; (3) R tells the client "Try B"; (4) the client re-issues open("/foo") directly against server B. The xrootd system does all of these steps automatically, without application (user) intervention!

7 Corresponding Configuration File

# General section that applies to all servers
all.export /atlas
if redirector.slac.stanford.edu
   all.role manager
else
   all.role server
fi
all.manager redirector.slac.stanford.edu 3121

# Cluster management specific configuration
cms.allow *.slac.stanford.edu

# xrootd specific configuration
xrootd.fslib /opt/xrootd/prod/lib/libXrdOfs.so
xrootd.port 1094
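A hedged sketch of how such a file might be put to use (the config and log paths are hypothetical): both daemons read the same file, and the if/else block selects the role that applies to the host they run on.

  # On every node, manager and data servers alike (hypothetical paths)
  xrootd -c /etc/xrootd/xrootd.cf -l /var/log/xrootd/xrootd.log &
  cmsd   -c /etc/xrootd/xrootd.cf -l /var/log/xrootd/cmsd.log &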

8 File Discovery Considerations
The redirector does not have a catalog of files. It always asks each server and caches the answers in memory for a "while", so it won't ask again when queried about a past lookup. This allows real-time configuration changes, and clients never see the disruption. It does have some side-effects: the lookup takes less than a millisecond when files exist, but much longer when a requested file does not exist!

9 Handling Missing Files
Diagram: the client (e.g., xrdcp root://R//foo /tmp) issues open("/foo") to the redirector R; (1) R asks its data servers "Who has /foo?"; (2) every server answers "Nope!"; the file is deemed not to exist if there is no positive response after 5 seconds.

10 Missing File Considerations
The system is optimized for the "file exists" case, which is the classic analysis situation, so there is a penalty for going after missing files. Aren't new files, by definition, missing? Yes, but that involves writing data, and the system is optimized for reading data, so creating a new file will suffer a 5-second delay. The delay can be minimized by using the xprep command, which primes the redirector's file memory cache ahead of time.
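A minimal sketch of priming the cache before writing, reusing the redirector from the earlier configuration slide; the path is hypothetical and the exact xprep option set varies by release, so check its help output:

  # Tell the redirector about a file we are about to create
  # (option syntax may differ between xrootd releases)
  xprep -w redirector.slac.stanford.edu:1094 /atlas/user/me/newfile.root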

11 Why Do It This Way?
Simple, lightweight, and ultra-scalable. Ideal for opportunistic clustering, e.g., leveraging batch worker disk space, and an ideal fit with PROOF analysis. Has the R³ property (Real-Time Reality Representation): it allows ad hoc changes, so you can add and remove servers and files without fussing and restart anything in any order at any time. Ideal for expansive clustering, e.g., cluster federation & globalization, virtual mass storage systems, and torrent transfers.

12 Opportunistic Clustering: Clustered Storage System Leveraging Batch Node Disks
Xrootd is extremely efficient with machine resources: ultra low CPU usage with a memory footprint of roughly 20 to 80 MB, so it is ideal to cluster just about anything.
Diagram: a redirector (cmsd + xrootd) clusters both dedicated file servers and batch nodes, with each batch node running cmsd + xrootd alongside its batch job.

13 Opportunistic Clustering Caveats
Using batch worker node storage is problematic: the storage services must compete with actual batch jobs, which at best may lead to highly variable response time and at worst may lead to erroneous redirector responses. Additional tuning will be required; normally you need to renice the cmsd and xrootd:
As root: renice -n -10 -p cmsd_pid
As root: renice -n -5 -p xroot_pid
You must not overload the batch worker node, especially if exporting local work space.
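For example, a hedged one-liner form of the same adjustment, assuming the daemons are literally named cmsd and xrootd and that pidof is available on the node:

  # Raise the scheduling priority of the clustering and data-serving daemons
  renice -n -10 -p $(pidof cmsd)
  renice -n  -5 -p $(pidof xrootd)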

14 Opportunistic Clustering & PROOF
PROOF, the Parallel ROOT Facility, is layered on xrootd and is a good architecture for "map/reduce" processing. Batch nodes provide the PROOF infrastructure: reserve and use them for interactive PROOF (the batch scheduler must have a drain/reserve feature), use the nodes as a parallel batch facility (good for co-locating the application with the data), or use the nodes as data providers for other purposes.

15 PROOF Analysis Results (Sergey Panitkin)
See also Akira's talk on the "Panda oriented" ROOT analysis comparison at the Jamboree: indico.cern.ch/getFile.py/access?contribId=10&sessionId=0&resId=0&materialId=slides&confId=

16 Expansive Clustering
Xrootd can create ad hoc cross-domain clusters, which makes it easy to federate multiple sites; this is the ALICE model of data management. It provides a mechanism for "regional" data sharing: get missing data from close by before resorting to dq2get. The architecture allows this to be automated and demand driven, which implements a Virtual Mass Storage System.

17 Virtual Mass Storage System
Diagram: the SLAC, UTA, and UOM xrootd clusters (each running cmsd + xrootd with "all.role manager" and "all.manager meta atlas.bnl.gov:1312") federate under a BNL meta-manager configured with "all.role meta manager", so that root://atlas.bnl.gov/ includes the SLAC, UOM, and UTA xroot clusters. Meta managers can be geographically replicated!
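Pulling the directives off the diagram into one place, a hedged sketch of the federation configuration (the hostname and port are the ones shown on the slide; adapt them to your own meta-manager):

  # At the meta-manager (BNL in the diagram)
  all.role meta manager
  all.manager meta atlas.bnl.gov:1312

  # At each participating site redirector (SLAC, UTA, UOM in the diagram)
  all.role manager
  all.manager meta atlas.bnl.gov:1312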

18 What's Good About This?
Missing files are fetched in a timely manner, reverting to dq2get when the file is not in the regional cluster. Sites can participate in an ad hoc manner; the cluster manager sorts out what's available. Real-time WAN access can be used when appropriate, and the WAN transfer rate can be significantly increased by using torrent-style copying.

19 Torrents & Federated Clusters
Diagram: the same BNL meta-manager federation of the SLAC, UTA, and UOM clusters ("all.role meta manager" at BNL, "all.role manager" plus "all.manager meta atlas.bnl.gov:1312" at the sites), with /myfile present at more than one site; a client running "xrdcp -x xroot://atlas.bnl.gov//myfile /tmp" pulls the file from all available sources at once. Meta managers can be geographically replicated!

20 Improved WAN Transfer
The xrootd already supports parallel TCP paths, giving a significant improvement in WAN transfer rate; this is specified as xrdcp -S num. Xtreme copy mode uses multiple data sources; this is specified as xrdcp -x. Example transfers to CERN:
1 source (.de): 12 MB/sec (1 stream)
1 source (.us): 19 MB/sec (15 streams)
4 sources (3 x .de + .ru): 27 MB/sec (1 stream each)
4 sources + parallel streams: 42 MB/sec (15 streams each)
5 sources (3 x .de + .it + .ro): 54 MB/sec (15 streams each)
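As a hedged illustration of the two options (the federation host is the one on the previous slide; the file paths are hypothetical):

  # Parallel TCP streams from a single source
  xrdcp -S 15 root://redirector.slac.stanford.edu:1094//atlas/data/myfile.root /tmp/myfile.root

  # Xtreme copy: pull pieces from every federated source that has the file
  xrdcp -x xroot://atlas.bnl.gov//atlas/data/myfile.root /tmp/myfile.root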

21 Expansive Clustering Caveats
Federation & globalization are easy if the federated servers are not blocked by a firewall (no ALICE xroot servers are behind a firewall). There are alternatives: implement firewall exceptions (you need to fix all server ports) or use proxy mechanisms (easy for some services, more difficult for others). All of these have been tried in various forms; the site's specific situation dictates the appropriate approach.

22 Summary Monitoring
Summary monitoring information is needed in almost any setting. Xrootd can auto-report summary statistics: specify the xrd.report configuration directive and the data is sent to one or two locations. Use the provided mpxstats as the feeder program; it multiplexes the streams and parses the xml into key-value pairs, so it can be paired with any existing monitoring framework: Ganglia, GRIS, Nagios, MonALISA, and perhaps more.

23 Summary Monitoring Setup
Diagram: every data server carries "xrd.report monhost:1999 all every 15s" in its configuration; the monitoring host (monhost) runs mpxstats on port 1999 and feeds the parsed statistics into ganglia.
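A hedged sketch of the two halves of that setup (the host name monhost and port 1999 come from the slide; the mpxstats option names are assumptions, so verify them against its help output):

  # In every data server's configuration file (directive from the slide)
  xrd.report monhost:1999 all every 15s

  # On monhost: multiplex the incoming streams and emit key-value pairs
  # suitable for ganglia (option names assumed; verify with mpxstats -h)
  mpxstats -p 1999 -f flat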

24 Putting It All Together
Diagram: the basic xrootd cluster consists of data nodes (each running xrootd + cmsd) and a manager node (xrootd + cmsd). Adding a name-space xrootd with cnsd and an SRM node (BeStMan, xrootdFS, gridFTP) yields LHC Grid access.

25 Can't We Simplify This?
The cnsd is present for XrootdFS support (it provides the composite name space for the "ls" command); FUSE is present for XrootdFS support; XrootdFS & FUSE are there for BeStMan support; BeStMan is there for SRM support; and SRM is there for push-type grid data management. But dq2get is a pull function and only needs gridFTP. Answer: Yes! This can be simplified.

26 Tearing It All Apart
Diagram: the SRM node (BeStMan, gridFTP, xrootdFS, cnsd) is dropped; the basic xrootd cluster (data nodes and a manager node, each running xrootd + cmsd) plus a dq2get node (gridFTP + the POSIX preload library) gives simple Grid access. Even more effective if using a VMSS.

27 In Conclusion...
Xrootd is a lightweight data access system, suitable for resource-constrained environments (human as well as hardware) and geared specifically for efficient data analysis. It supports various clustering models, e.g., PROOF, batch node clustering, and WAN clustering, and has the potential to greatly simplify Tier 3 deployments. It is distributed as part of the OSG VDT and is also part of the CERN root distribution. Visit the project web site for more information.

28 Acknowledgements
Software Contributors
Alice: Derek Feichtinger
CERN: Fabrizio Furano, Andreas Peters
Fermi/GLAST: Tony Johnson (Java)
Root: Gerri Ganis, Bertrand Bellenet, Fons Rademakers
SLAC: Tofigh Azemoon, Jacek Becla, Andrew Hanushevsky, Wilko Kroeger
BeStMan LBNL: Alex Sim, Junmin Gu, Vijaya Natarajan (BeStMan team)
Operational Collaborators: BNL, CERN, FZK, IN2P3, RAL, SLAC, UVIC, UTA
Partial Funding: US Department of Energy Contract DE-AC02-76SF00515 with Stanford University