Scalla Advancements: xrootd/cmsd (f.k.a. olbd)
Fabrizio Furano, CERN – IT/PSS
Andrew Hanushevsky, Stanford Linear Accelerator Center
US Atlas Tier 2/3 Workshop, Stanford University/SLAC, 28-November-07

Outline
- Current Elaborations
  - Composite Cluster Name Space
  - POSIX file system access via FUSE + xrootd
- New Developments
  - Cluster Management Service (cmsd)
  - Cluster globalization
- Conclusion

The Distributed Name Space
- Scalla implements a distributed name space
  - Very scalable and efficient
  - Sufficient for data analysis
- Some users and applications (e.g., SRM) rely on a centralized name space
  - This spurred the development of a Composite Name Space (cnsd) add-on
  - Simplest solution with the least entanglement

Composite Cluster Name Space
[Slide diagram: a client talks to the redirector (manager); each data server runs a cnsd that forwards name space events (open/trunc, mkdir, mv, rm, rmdir) to a common name space xrootd listening on port 2094]
- Data server configuration: ofs.notify closew, create, mkdir, mv, rm, rmdir |/opt/xrootd/etc/cnsd
- opendir() refers to the directory structure maintained by xrootd:2094

cnsd Specifics
- Servers direct name space actions to common xrootd(s)
  - The common xrootd maintains the composite name space
  - Typically, these run on the redirector nodes
- Name space is replicated in the file system
  - No external database needed
  - Small disk footprint
- Deployed at SLAC for Atlas
  - Still needs synchronization utilities, more documentation, and packaging
  - See Wei Yang for details
- A similar MySQL-based system is being developed by CERN/Atlas (Annabelle Leung)

Data System vs File System
- Scalla is a data access system
  - Some users/applications want file system semantics
  - More transparent, but often far less scalable
- For years users have asked: can Scalla create a file system experience?
- The answer: it can, to a degree that may be good enough
  - We relied on FUSE to show how
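For contrast with the file-system view developed on the next slides, this is roughly what native Scalla data access looks like from ROOT. A minimal sketch only; the redirector host and file path are hypothetical placeholders.

    // Native xrootd access from a ROOT macro: TFile::Open() selects the
    // xrootd client (TXNetFile) for root:// URLs and follows the
    // redirector to whichever data server holds the file.
    #include "TFile.h"
    #include "TError.h"

    void read_via_xrootd()
    {
       TFile *f = TFile::Open("root://redirector.example.org:1094//atlas/data/file.root");
       if (!f || f->IsZombie()) {
          Error("read_via_xrootd", "could not open the file through the redirector");
          return;
       }
       f->ls();      // list the objects stored in the file
       f->Close();
       delete f;
    }

No file system mount and no special privileges are needed: the application loads the xrootd client plugin and speaks the protocol directly.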

FUSE
- What is FUSE?
  - FUSE: Filesystem in Userspace
  - Used to implement a file system in a user-space program
  - Linux 2.4 and 2.6 only
  - Refer to the FUSE web site for details
- FUSE can be used to provide xrootd access
  - Looks like a mounted file system
- SLAC and FZK have xrootd-based versions of this
  - Wei Yang at SLAC: tested and practically fully functional
  - Andreas Petzold at FZK: in alpha test, not fully functional yet

XrootdFS (Linux/FUSE/Xrootd)
[Slide diagram: on the client host, an application calls the POSIX file system interface; the kernel FUSE module hands requests to the user-space FUSE/Xroot interface (xrootd POSIX client), which forwards them (opendir, create, mkdir, mv, rm, rmdir) to the redirector xrootd:1094 and the name space xrootd:2094 on the redirector host]
- Should run cnsd on the servers to capture non-FUSE events

XrootdFS Performance
- Server: Sun V20z, RHEL4, 2x 2.2 GHz AMD Opteron, 4 GB RAM, 1 Gbit/sec Ethernet
- Client: VA Linux 1220, RHEL3, 2x 866 MHz Pentium 3, 1 GB RAM, 100 Mbit/sec Ethernet
- Unix dd, globus-url-copy and uberftp: 5-7 MB/sec with 128 KB I/O block size
- Unix cp: 0.9 MB/sec with 4 KB I/O block size
- Conclusion: better for some things than others

Why XrootdFS?
- Makes some things much simpler
  - Most SRM implementations run transparently
  - Avoids pre-load library worries
- But impacts other things
  - Performance is limited
    - Kernel-FUSE interactions are not cheap
    - Rapid file creation (e.g., tar) is limited
  - FUSE must be administratively installed to be used
    - Difficult if that involves many machines (e.g., batch workers)
    - Easier if it involves an SE node (i.e., an SRM gateway)
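The transparency argument is that an unmodified POSIX application works against an XrootdFS mount exactly as it would against a local disk. A minimal sketch, assuming a hypothetical mount point /xrootdfs; every call below passes through the kernel FUSE module and the FUSE/Xroot interface shown above, which is also where the performance cost comes from.

    // Plain POSIX I/O against an XrootdFS mount; no xrootd pre-load
    // library and no application changes are needed.
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdio>

    int main()
    {
       int fd = open("/xrootdfs/atlas/data/file.root", O_RDONLY);  // hypothetical path
       if (fd < 0) { perror("open"); return 1; }

       char buf[128 * 1024];          // large blocks amortize the FUSE overhead
       ssize_t n;                     // (cf. the dd vs cp numbers above)
       while ((n = read(fd, buf, sizeof(buf))) > 0)
          ;                           // process the data here

       close(fd);
       return 0;
    }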

Next Generation Clustering
- Cluster Management Service (cmsd)
  - Functionally replaces olbd
- Compatible with the olbd config file
  - Unless you are using deprecated directives
- Straightforward migration
  - Either run olbd or cmsd everywhere
- Currently in alpha test phase
  - Available in CVS head
  - Documentation on the web site

cmsd Advantages
- Much lower latency
  - New, very extensible protocol
  - Better fault detection and recovery
- Added functionality
  - Global clusters
  - Authentication
  - Server selection can include a space utilization metric
  - Uniform handling of opaque information
  - Cross-protocol messages to better scale xproof clusters
- Better implementation for reduced maintenance cost

Cluster Globalization
[Slide diagram: the SLAC, UTA, and UOM xrootd/cmsd clusters each run "all.role manager" and subscribe with "all.manager meta atlas.bnl.gov:1312" to a BNL xrootd/cmsd meta manager configured with "all.role meta manager"; root://atlas.bnl.gov/ therefore includes the SLAC, UOM, and UTA xroot clusters]
- Meta managers can be geographically replicated!
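From the client side, globalization is just another redirector URL. A minimal sketch, assuming a hypothetical file path; the meta manager address root://atlas.bnl.gov/ is the one in the diagram above, and the redirection to whichever subscribed cluster (SLAC, UOM, UTA) holds the file happens inside the xrootd client.

    // Opening a file through the global (meta manager) redirector.
    #include "TFile.h"
    #include <cstdio>

    void open_via_meta_manager()
    {
       TFile *f = TFile::Open("root://atlas.bnl.gov//atlas/user/somefile.root");
       if (f && !f->IsZombie())
          printf("opened %s via the global redirector\n", f->GetName());
       if (f) { f->Close(); delete f; }
    }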

Why Globalize?
- Uniform view of participating clusters
- Can easily deploy a virtual MSS
  - Included as part of the existing MPS framework
- Try out real-time WAN access
  - You really don't need data everywhere!
- ALICE is slowly moving in this direction
  - The non-uniform name space is an obstacle
  - Slowly changing the old approach
  - Some workarounds could be possible, though

Virtual MSS
- Powerful mechanism to increase reliability
  - Data replication load is widely distributed
  - Multiple sites are available for recovery
- Allows virtually unattended operation
  - Based on BaBar experience with a real MSS
- Automatic restore after a server failure
  - Missing files in one cluster are fetched from another
  - Typically the fastest one that has the file really online
- File (pre)fetching on demand
  - Can be transformed into a third-party copy once cmsd is deployed
- Practically no need to track file location
  - But this does not preclude the need for metadata repositories

The Virtual MSS Realized
[Slide diagram: the same SLAC, UTA, and UOM clusters ("all.role manager", "all.manager meta atlas.bnl.gov:1312") subscribed to the BNL meta manager ("all.role meta manager")]
- Missing a file? Ask the global meta manager and get it from any other collaborating cluster
- Local clients still continue to work
- Note: the security hats will likely require you to use xrootd native proxy support

Dumb WAN Access*
- Setup: client at CERN, data at SLAC
  - 164 ms RTT, available bandwidth < 100 Mb/s
- Test 1: read a large ROOT tree (~300 MB, 200k interactions)
  - Expected time: 38000 s (latency) + 750 s (data) + CPU ➙ 10 hrs!
  - Each synchronous interaction pays the full 164 ms round trip
- Test 2: draw a histogram from that tree data (6k interactions)
  - Measured time: 20 min
- Using xrootd with WAN optimizations disabled

* Federico Carminati, The ALICE Computing Status and Readiness, LHCC, November 2007

Smart WAN Access*
- Exploit the xrootd WAN optimizations
  - TCP multi-streaming: up to 15x improvement in WAN data throughput
  - The ROOT TTreeCache provides the hints on "future" data accesses
  - TXNetFile/XrdClient "slide through", keeping the network pipeline full
  - Data transfer goes on in parallel with computation
  - Throughput improvement comparable to "batch" file-copy tools
    - 70-80% improvement, and we are doing a live analysis, not a file copy!
- Test 1 actual time: seconds
  - Compared to 30 seconds using a Gb LAN
  - Very favorable for sparsely used files
- Test 2 actual time: 7-8 seconds
  - Comparable to LAN performance
  - 100x improvement over dumb WAN access (i.e., 20 minutes)

* Federico Carminati, The ALICE Computing Status and Readiness, LHCC, November 2007
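The TTreeCache part of this can be seen directly in user code: once a cache is attached to the tree, the upcoming branch reads are known in advance and the xrootd client can fetch them as a few large vectored requests instead of thousands of synchronous round trips. A minimal sketch, assuming hypothetical URL, tree, and branch names and a reasonably recent ROOT API; the multi-streaming side is configured in the xrootd client settings rather than in user code.

    // Reading a remote tree over the WAN with the TTreeCache enabled.
    #include "TFile.h"
    #include "TTree.h"

    void wan_read_with_cache()
    {
       TFile *f = TFile::Open("root://redirector.example.org//data/bigtree.root");
       if (!f || f->IsZombie()) return;

       TTree *t = nullptr;
       f->GetObject("Events", t);             // "Events" is a hypothetical tree name
       if (!t) { f->Close(); return; }

       t->SetCacheSize(30 * 1024 * 1024);     // 30 MB TTreeCache
       t->AddBranchToCache("*", kTRUE);       // cache all branches
       t->SetCacheLearnEntries(100);          // learn the access pattern first

       const Long64_t n = t->GetEntries();
       for (Long64_t i = 0; i < n; ++i)
          t->GetEntry(i);                     // reads are served from the cache

       f->Close();
    }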

Conclusion
- Scalla is a robust framework
  - Elaborative: Composite Name Space, XrootdFS
  - Extensible: cluster globalization
- Many opportunities to enhance data analysis
  - Speed and efficiency

Acknowledgements
- Software collaborators
  - INFN/Padova: Alvise Dorigo
  - ROOT: Fons Rademakers, Gerri Ganis (security), Bertrand Bellenet (Windows)
  - ALICE: Fabrizio Furano (client), Derek Feichtinger, Guenter Kickinger
  - CERN: Andreas Peters (Castor)
  - STAR/BNL: Pavel Jakl
  - Cornell: Gregory Sharp
  - SLAC: Jacek Becla, Tofigh Azemoon, Wilko Kroeger, Bill Weeks
  - BaBar: Pete Elmer (packaging)
- Operational collaborators: BNL, CNAF, FZK, INFN, IN2P3, RAL, SLAC
- Funding: US Department of Energy Contract DE-AC02-76SF00515 with Stanford University