Xrootd explained: Cooperation among ALICE SEs

Presentation transcript:

Xrootd explained: Cooperation among ALICE SEs
Current status and directions (the ALICE global redirector)
Fabrizio Furano, CERN IT/GS
ALICE Offline Week, October 2008
http://savannah.cern.ch/projects/xrootd
http://xrootd.slac.stanford.edu

Scalla/Xrootd: most famous basic features
- No weird configuration requirements: setup complexity scales with the complexity of the requirements; no strange software dependencies; highly customizable.
- Fault tolerance.
- High, scalable transaction rate: open many files per second; double the system and double the rate. No databases for filesystem-like functions (would you put one in front of your laptop's file system? How long would the boot take?). No known limitations in size or total global throughput of the repository.
- Very low CPU usage on the servers.
- Happy with many clients per server (thousands), but check their bandwidth consumption against the disk/network performance.
- WAN friendly (client + protocol + server): enables efficient remote POSIX-like direct data access through the WAN.
- WAN friendly (server clusters): huge WAN-wide repositories can be set up by aggregating remote clusters, or by making them cooperate.

Basic working principle
[Diagram: clients contact an xrootd/cmsd redirector, which routes each request to one of the xrootd/cmsd data servers.]
A small two-level cluster like this can hold up to 64 servers; the aggregation is P2P-like.
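To make the two roles concrete, here is a minimal, purely illustrative sketch of the configuration behind such a two-level cluster. The directive names are standard xrootd/cmsd configuration, but the hostname, port and paths are placeholders, not ALICE production values.

    # --- on the redirector (manager) node ---
    all.role manager
    all.manager redirector.example.org:1213
    all.export /data/alice

    # --- on each data server ---
    all.role server
    all.manager redirector.example.org:1213
    all.export /data/alice
    oss.localroot /xrddata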

Using the WAN needs a design
Technically, various things can be done with WANs and xrootd:
- the client can read/write efficiently;
- remote clusters can be grouped into one meta-cluster;
- remote xrootd-based clusters can fetch files from each other.
In ALICE these possibilities are going to be exploited in a coordinated way to:
- give robustness to Tier-2 data access now, without breaking the computing model;
- suggest new ideas/tricks for quick and robust data movement (Tier-2s) and for maintenance, SE backups, ...

Cluster globalization
Up to now, xrootd clusters could be populated:
- with xrdcp from an external machine, or
- by writing to the backend store (e.g. CASTOR/DPM/HPSS); FTD in ALICE currently uses the first option.
This gives load and resource problems: all the external traffic of the site goes through one machine close to the destination cluster.
If a file is missing or lost (because of a disk and/or catalogue screwup), the result is a job failure and manual intervention is needed. With 10^7 online files, finding the source of a problem can be VERY tricky.
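For reference, the first of these two population methods is just a copy through the site redirector with xrdcp; a hedged one-liner, where the redirector host and the destination path are invented for illustration:

    xrdcp /data/local/AliESDs.root \
          root://redirector.example.org:1094//alice/sim/2008/run12345/AliESDs.root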

Missing files?
But what if the missing file is available in a neighboring cluster? What if the cluster were able to detect that (in a few milliseconds) and fetch the file at maximum speed? After all, it was just a "hole" in the repository.
In the near future, this highly robust way to move files could be used as a transport mechanism, always coordinated by FTD.

Virtual Mass Storage System: what's that?
Basic idea: when a request for a missing file arrives at cluster X, X assumes that the file ought to be there and tries to get it from the collaborating clusters, starting with the fastest one. The MPS (MSS interface) layer can do that in a very robust way. Note that X itself is part of the game, and is composed of many servers.
In practice, each cluster considers the set of ALL the others as one very big online MSS. This is much easier than it seems, and it is slowly going into production for ALICE at Tier-2s.
NOTE: it is up to the computing model to decide where to send the jobs (which read the files).
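As a rough illustration of what "getting it from the collaborating clusters" amounts to, here is a conceptual stage-in helper of the kind the MPS layer can invoke for a missing file. This is a sketch, not the actual script shipped with the ALICE setup; the global redirector URL and the local path are assumptions.

    #!/bin/sh
    # $1 = logical file name that was requested locally but is missing
    LFN="$1"
    GLOBAL="root://alirdr.cern.ch:1094"   # the collaborating clusters, seen as one big MSS
    LOCALROOT="/xrddata"                  # where this data server keeps its files

    mkdir -p "$(dirname "$LOCALROOT/$LFN")"
    # Fetch the "hole" from whichever cluster the global redirector resolves it to
    xrdcp -f "$GLOBAL/$LFN" "$LOCALROOT/$LFN"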

The virtual MSS in ALICE
[Diagram: the ALICE global redirector (all.role meta manager; all.manager meta alirdr.cern.ch:1312) federates the xrootd/cmsd clusters at CERN, GSI, Prague, NIHAM, SUBATECH and any other collaborating site.]
Local clients work normally. Missing a file? Ask the global meta-manager and get it from any other collaborating cluster.
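The two directives quoted in the diagram belong to the global redirector itself; on a site redirector, joining the global cluster is (schematically) a matter of declaring the meta-manager in addition to the usual manager role. A hedged sketch, where only the alirdr.cern.ch line is taken from the slide:

    # on the ALICE global redirector
    all.role meta manager
    all.manager meta alirdr.cern.ch:1312

    # on a site redirector (GSI, CERN, Prague, ...)
    all.role manager
    all.manager meta alirdr.cern.ch:1312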

The ALICE::CERN::SE July trick
All of the ALICE conditions data is in ALICE::CERN::SE: five machines running pure xrootd, and all the jobs access it, from everywhere. There was the need to refurbish that old cluster (two years old) with completely new hardware. It is a VERY critical and VERY stable service: stop it and every job stops.
The trick is a particular way to use the same pieces of the vMSS in order to phase out an old SE, moving its content (possibly many TBs) to the new one; rsync cannot sync 3 servers into 5, nor fix the organization of the files.
Advantages:
- files are spread evenly, so load balancing is effective;
- the more heavily used files are typically fetched first;
- no service downtime: the jobs did not stop. The server downtime was 10-20 minutes, and the client-side fault tolerance made the jobs retry without trouble.
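In vMSS terms the trick is simple: the new, empty SE is configured like any other collaborating cluster, fetching misses via the global redirector, to which the old, full SE is still subscribed; every file a job asks for and does not find locally is therefore pulled over from the old SE on first access. A hedged sketch using the installer parameter names introduced later in this talk (all values are placeholders):

    MANAGERHOST=new-se-redirector.cern.ch     # the new SE, starting empty
    VMSS_SOURCE=root://alirdr.cern.ch:1094/   # misses resolve, via the global redirector, to the old SE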

The ALICE::CERN::SE July trick
[Diagram: the load (Grid, Shuttle, other) hits the ALICE global redirector, which federates the new SE (starting empty) and the old ALICE::CERN::SE (full).]

Virtual MSS
The mechanism is there, fully "boxed": the new setup does everything that is needed.
A side effect: pointing an application to the "area" global redirector gives a complete, load-balanced, low-latency view of all the repository. An application using the "smart" WAN mode can just run. Probably a full-scale production/analysis won't do that now, but what about a small debug analysis on a laptop?
After all, HEP sometimes just copies everything, useful or not. I cannot say that in some years we will not have a more powerful WAN infrastructure, and using it to copy more useless data just looks ugly. If a web browser can do it, why not a HEP application? It looks just a little more difficult, and is better used with a clear design in mind.
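To make "pointing an app to the global redirector" concrete, here is a minimal ROOT/C++ sketch; the file path is invented for illustration, and whether direct WAN access is appropriate still depends on the design considerations above.

    // read_remote.C -- open a file through a redirector with ROOT
    #include "TFile.h"

    void read_remote()
    {
       // TFile::Open understands root:// URLs; the redirector picks a data server.
       // The path below is a placeholder, not a real ALICE file.
       TFile *f = TFile::Open("root://alirdr.cern.ch//alice/data/some/file.root");
       if (!f || f->IsZombie()) {
          printf("could not open the remote file\n");
          return;
       }
       f->ls();      // list the objects stored in the file
       f->Close();
    }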

The tests at GSI and Subatech
A small test to evaluate:
- the robustness of the transfer mechanism;
- the illusion of having a unique huge repository (NB: from AliEn it is not available... you must use the AliEn tools);
- the automated fetching from one site to another.
So, a functional test among "friend" sites, but with the system in production (by the way, it "knows" all the ALICE OCDB storage).

The tests at GSI and Subatech
Test: a small cluster at GSI and one at Subatech join the global game, plus a list of files (816 files, 1.5 TB in total).
- Ask those clusters to fetch the files from CERN.
- Delete some files from one of them.
- Verify that the files are still globally accessible by pointing an application to the global redirector.
- Access the files in a local cluster, like a job would, and verify that they are immediately restored if absent, because they "ought to be there" by definition.
Basically, it is a functional test.
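The kind of checks this implies can be expressed with plain xrdcp commands; a hedged sketch, where the site redirector hostname and the file name are invented:

    # the file is reachable through the global redirector, wherever it actually sits
    xrdcp root://alirdr.cern.ch:1094//alice/ocdb/some/file.root /tmp/check_global.root

    # after deleting the local copy at GSI, reading through the site redirector
    # triggers the vMSS fetch and then succeeds
    xrdcp root://gsi-redirector.example.org:1094//alice/ocdb/some/file.root /tmp/check_local.root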

The tests at GSI and Subatech
Very exciting results: no transfer failures reported.
- Some minor setup issues remain to be properly fixed (root-made setups, combinations of trailing slashes).
- An application pointed to the global redirector just works, immediately, but this was limited to online files; some issues on the meta-manager are to be investigated.
- An application pointed to where the file "disappeared" works after the delay of fetching the file from the neighbor site.
Performance:
- just 4 concurrent transfers brought about 80 MB/s at GSI (2 data servers) and 40-45 MB/s at Subatech (1 data server);
- 10 transfers at GSI brought more than 100 MB/s.
Both sites have a Gb WAN; completely filling it up does not look like a problem, and it looks like we saturated the disks.

Scalla/Xrootd simple setup
The xrd-installer setup (originally developed by A. Peters and D. Feichtinger) has been refurbished to include all the items discussed. Usage is the same as before:
- log in as the xrootd user (being root is not necessary and adds complexity);
- create a directory in $HOME;
- download the installer script:
  wget http://project-arda-dev.web.cern.ch/project-arda-dev/xrootd/tarballs/installbox/xrd-installer
- run it:
  xrd-installer --install --prefix /home/xrduser/myxrootdinstall
Then there are two small files containing the parameters: one for the token authorization library (if used), and one for the rest (about 10 parameters). Geeks can also use the internal xrootd config file template and have access to really everything; this is needed only for very particular things.

Setup: important parameters
- VMSS_SOURCE: where this cluster tries to fetch files from, in case they are absent.
- LOCALPATHPFX: the prefix of the namespace which has to be made "facultative" (not needed by everybody).
- LOCALROOT: the local place (relative to the mounted disks) where all the data is put/kept by the xrootd server.
- OSSCACHE: your server probably has more than one disk to use for storing data; here you list the mountpoints to aggregate.
- MANAGERHOST: the redirector's name.
- SERVERONREDIRECTOR: is this machine both a server and a redirector?
- OFSLIB: are you using the ALICE security model?
- METAMGRHOST, METAMGRPORT: host and port number of the meta-manager (default: the ALICE global redirector).
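For orientation, here is what one of the two small parameter files might look like. The parameter names are the ones listed above; every value is a placeholder to be adapted to the site, and the exact value formats may differ.

    MANAGERHOST=redirector.mysite.org         # the site redirector
    SERVERONREDIRECTOR=0                      # this redirector is not also a data server
    LOCALROOT=/xrddata                        # where the daemons keep the data on disk
    OSSCACHE="/data1 /data2"                  # additional mountpoints to aggregate
    LOCALPATHPFX=/alice/cern.ch/              # namespace prefix made "facultative"
    OFSLIB=TokenAuthzOfs                      # ALICE security model (token authorization)
    VMSS_SOURCE=root://alirdr.cern.ch:1094/   # where missing files are fetched from
    METAMGRHOST=alirdr.cern.ch                # the ALICE global redirector
    METAMGRPORT=1312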

Conclusion
We spoke about the status of WAN access and WAN-wide clusters; but the WAN could also be a generic ADSL line (which has quite a high latency). We like the idea of being able to access data efficiently and without pain, no matter from where, and this is quite difficult to achieve with typical distributed file systems.
WAN-wide clusters and inter-cluster cooperation, with reference to the ALICE data model: a lot of work is going on.
- Assistance, integration, debugging, deployments.
- Setup robustness to config "styles" and combinations: different persons assume different things (slashes, spaces, ...).
- New fixes/features.
- Documentation to update on the website (!)

Acknowledgements
Old and new software collaborators:
- Andy Hanushevsky, Fabrizio Furano (client side), Alvise Dorigo
- ROOT: Fons Rademakers, Gerri Ganis (security), Bertrand Bellenot (MS Windows porting)
- Derek Feichtinger, Andreas Peters, Guenter Kickinger
- STAR/BNL: Pavel Jakl, Jerome Lauret
- Cornell: Gregory Sharp
- Subatech: Jean-Michel Barbet
- GSI: Kilian Schwarz
- SLAC: Jacek Becla, Tofigh Azemoon, Wilko Kroeger, Bill Weeks
- Peter Elmer
Operational collaborators: BNL, CERN, CNAF, FZK, INFN, IN2P3, RAL, SLAC