
1 Xrootd usage @ LHC
An up-to-date technical survey about xrootd-based storage solutions
F. Furano (CERN IT-DM)

2 Outline
Intro – main use cases in the storage arena
Pure xrootd (generic)
Pure xrootd @ LHC
– The ATLAS@SLAC way
– The ALICE way
CASTOR2
Roadmap
Conclusions

3 Introduction and use cases

4 The historical problem: data access
Physics experiments rely on rare events and statistics
– Huge amounts of data are needed to get a significant number of events
– The typical data store can reach 5-10 PB... now
Millions of files, thousands of concurrent clients
– The transaction rate is very high: O(10^3) file opens/sec per cluster is not uncommon (average, not peak)
– Traffic sources: local GRID site, local batch system, WAN
– Up to O(10^4) clients per server!
If these requirements are not met, the outcome is crashes, instability, workarounds and the "need" for crazy things.
What is needed is scalable, high-performance direct data access
– No imposed limits on performance, size or connectivity
– Higher performance, supports WAN direct data access
– Avoids worker-node under-utilization
– No need for inefficient local copies if they are not needed: do we fetch entire websites to browse one page?

5 The Challenges
LHC user analysis (batch data access via remote access protocols: root, dcap, rfio, ...)
– Boundary conditions: GRID environment (GSI authentication, user-space deployment) and computer-centre environment (Kerberos, admin deployment)
– High I/O load, moderate namespace load
– Many clients, O(1000-10000)
– Sequential and sparse file access
– Basic analysis (today): RAW, ESD; advanced analysis (tomorrow): ESD, AOD, Ntuples, histograms
T0/T3 @ CERN (interactive data access via mounted file systems)
– The preferred interface is a mounted file system (MFS): easy, intuitive, fast response, standard applications
– Moderate I/O load, high namespace load (compilation, software-startup searches)
– Fewer clients, O(#users)

6 Main requirement
Data access has to work reliably at the desired scale
– This also means: it must not waste resources

7 A simple use case
I am a physicist, waiting for the results of my analysis jobs
– Many bunches, several outputs, saved e.g. to an SE at CERN
– My laptop is configured to show histograms etc. with ROOT
– I leave for a conference; the jobs finish while I am on the plane
– Once there, I want to simply draw the results from my home directory, and save my new histograms in the same place
– I have no time to lose tweaking things to get a copy of everything; I lose track of copies in the confusion
– I want to leave things where they are; I know nothing about things to tweak
What can I expect? Can I do it? (A minimal sketch of what this looks like with ROOT follows below.)
F. Furano, A. Hanushevsky - Scalla/xrootd WAN globalization tools: where we are. (CHEP09)
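A hedged illustration, not taken from the slides: with direct xrootd access, drawing a remote result and writing a new histogram next to it is just a matter of opening root:// URLs from ROOT, with no manual copy involved. The host name, file paths and histogram names below are invented for the example.

```cpp
// ROOT macro sketch: read a result directly from a remote SE and write a new
// histogram back, both over xrootd. URLs, paths and object names are made up.
#include "TFile.h"
#include "TH1F.h"

void draw_and_save() {
   // Draw a histogram produced by the jobs, straight from the SE (no copy).
   TFile *in = TFile::Open("root://se.example.cern.ch//store/user/jdoe/results.root");
   if (!in || in->IsZombie()) return;
   TH1F *h = (TH1F*) in->Get("h_mass");      // "h_mass" is a hypothetical name
   if (h) {
      h->SetDirectory(nullptr);              // detach so it survives closing the file
      h->Draw();
   }
   in->Close();

   // Save a new histogram in the same remote place, again over xrootd.
   TFile *out = TFile::Open("root://se.example.cern.ch//store/user/jdoe/new_histos.root", "RECREATE");
   if (out && !out->IsZombie()) {
      TH1F *hnew = new TH1F("h_new", "my new histogram", 100, 0., 10.);
      hnew->Write();
      out->Close();                          // the output file owns and deletes hnew
   }
}
```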

8 Another use case
ALICE analysis on the GRID
– Each job reads ~100-150 MB from ALICE::CERN::SE
– These are conditions data accessed directly, not file copies, i.e. VERY efficient: one job reads only what it needs; it just works, with no workarounds
– At 10-20 MB/s it takes 5-10 s (the most common case); at 5 MB/s it takes about 20 s; at 1 MB/s it takes about 100 s
Sometimes data are accessed elsewhere
– AliEn can save a job by letting it read data from a different site, with very good performance
– Quite often the results are written/merged elsewhere
F. Furano, A. Hanushevsky - Scalla/xrootd WAN globalization tools: where we are. (CHEP09)

9 Pure Xrootd

10 xrootd Plugin Architecture
(Diagram: the pluggable layers of an xrootd server)
– Protocol driver (XRD)
– Protocol, 1 of n (xrootd)
– File system (ofs, sfs, alice, etc.)
– Authentication (gsi, krb5, etc.) and authorization (name based)
– lfn2pfn prefix encoding
– Storage system (oss, drm/srm, etc.)
– Clustering (cmsd)
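To make the layering concrete, here is a minimal, purely illustrative xrootd configuration sketch showing where some of those plugins are selected at run time. The host names, paths and the custom oss library are assumptions, not part of the presentation; the exact directive set should be checked against the xrootd release in use.

```
# Illustrative only: a data-server node of a small xrootd cluster.
xrd.port 1094                              # protocol driver (XRD) listening port
all.export /data                           # part of the namespace served by this cluster
all.role server                            # this node serves data
all.manager manager.example.org:1213       # clustering layer: report to this cmsd manager
ofs.osslib /opt/xrootd/lib/libMyOss.so     # storage-system (oss) plugin - hypothetical library
sec.protocol gsi                           # authentication layer (GSI); krb5 is also possible
ofs.authorize                              # enable the authorization layer
```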

11 The client side
Fault tolerance in data access
– Meets WAN requirements, reduces job mortality
Connection multiplexing (authenticated sessions)
– Up to 65536 parallel r/w requests at once per client process
– Up to 32767 open files per client process
– Opens bunches of up to O(1000) files at once, in parallel
– Full support for huge bulk prestages
Smart r/w caching
– Supports normal readahead and "informed prefetching"
Asynchronous background writes
– Boosts writing performance in LAN/WAN
Sophisticated integration with ROOT
– Reads the "right" chunks in advance while the application computes the preceding ones
– Boosts read performance in LAN/WAN (up to the same order)
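A hedged sketch of how this looks from a user's ROOT session: the client-side cache and read-ahead can be tuned through ROOT's XNet.* settings (the exact key names and values below are assumptions to check against your ROOT/xrootd versions), and all files opened on the same server share one multiplexed, authenticated connection per client process.

```cpp
// Sketch: tune the xrootd client from ROOT, then open several remote files.
// Host and paths are hypothetical.
#include "TEnv.h"
#include "TFile.h"

void tune_and_open() {
   // Client-side cache and read-ahead, set before any remote file is opened.
   gEnv->SetValue("XNet.ReadCacheSize", 30000000);   // ~30 MB cache (assumed key name)
   gEnv->SetValue("XNet.ReadAheadSize", 1000000);    // ~1 MB read-ahead (assumed key name)

   // Both files live on the same server, so they share one multiplexed
   // physical connection from this client process.
   TFile *f1 = TFile::Open("root://xrd.example.org//store/data/run0001.root");
   TFile *f2 = TFile::Open("root://xrd.example.org//store/data/run0002.root");

   // ... the analysis would read from f1/f2 here ...

   if (f2) f2->Close();
   if (f1) f1->Close();
}
```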

12 The Xrootd "protocol"
The XRootD protocol is a good one: efficient, clean, supports fault tolerance, etc.
– It does not do any magic, however: it does not multiply your resources and it does not overcome hardware bottlenecks, BUT it allows true usage of the hardware resources
One of the aims of the project is still software quality, in the carefully crafted pieces of software that come with the distribution.
What makes the difference with Scalla/XRootD is:
– Implementation details (performance + robustness); bad performance can hurt robustness, and vice versa
– The Scalla software architecture (scalability + performance + robustness), designed to fit the HEP requirements; you need a clean design to insert it into
Born with efficient direct access in mind, but with the requirements of high-performance computing: copy-like access becomes a particular case.

13 Pure Xrootd @ LHC

14 The ATLAS@SLAC way with XROOTD
Pure Xrootd + an Xrootd-based "filesystem" extension (FUSE)
Adapters to talk to BeStMan SRM and GridFTP
More details in A. Hanushevsky's talk @ CHEP09
(Diagram: an xrootd/cmsd/cnsd cluster exposed through a FUSE adapter, with GridFTP and SRM sitting at the firewall.)

15 The ALICE way with XROOTD
Pure Xrootd + the ALICE strong-authorization plugin
No difference between T1 and T2 (only size and QoS)
WAN-wide globalized deployment, very efficient direct data access
CASTOR at Tier-0 serving data; pure Xrootd serving conditions data to the GRID jobs
"Old" DPM+Xrootd at several Tier-2s
(Diagram: each xrootd site, e.g. GSI or CERN, is a globalized cluster joined through its xrootd/cmsd pair to the ALICE global redirector. Local clients work normally at each site. Missing a file? The site asks the global redirector, gets redirected to the right collaborating cluster, and fetches it immediately - a "virtual mass storage system" built on data globalization. A smart client could also point directly at the global redirector; a sketch of that idea follows below.)
More details and complete info in "Scalla/Xrootd WAN globalization tools: where we are." @ CHEP09
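A hedged sketch of the "smart client" idea from the diagram: try the local site's redirector first and fall back to the global redirector if the file is not available locally. The host names are hypothetical; in the real deployment the site's cmsd can also pull a missing file through the virtual mass storage system by itself, so clients normally never need to do this.

```cpp
// "Smart client" sketch: local redirector first, ALICE global redirector as
// fallback. Host names are invented for illustration.
#include "TFile.h"
#include "TString.h"

TFile *open_with_fallback(const char *lfn) {
   TString local  = TString("root://xrd-redirector.local.example.org/") + lfn;
   TString global = TString("root://alice-global-redirector.example.org/") + lfn;

   TFile *f = TFile::Open(local);
   if (!f || f->IsZombie()) {        // not found / not reachable at the local site
      delete f;
      f = TFile::Open(global);       // let the global redirector locate a replica
   }
   return f;
}
```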

16 CASTOR2
Putting everything together @ Tier-0/1s

17 The CASTOR way
The client connects to a redirector node
The redirector asks CASTOR where the file is
The client then connects directly to the node holding the data
CASTOR handles tape in the background
(Diagram: the client asks the redirector to open file X; the redirector asks CASTOR "where is X?" and is told "on C"; the client is redirected to disk server C; the tape backend triggers migrations and recalls as needed. Credits: S. Ponce (IT-DM))

18 CASTOR 2.1.8: Improving Latency - Read
First focus: file (read) open latencies
(Chart, October 2008 estimate: read-open latencies on a 1-1000 ms logarithmic scale for Castor 2.1.7 (rfio), Castor 2.1.8 (xroot) and Castor 2.1.9 (xroot), compared against the network latency limit. Credits: A. Peters (IT-DM))

19 CASTOR 2.1.8: Improving Latency - Metadata Read
Next focus: metadata (read) latencies
(Chart, October 2008 estimate: stat latencies on a 1-1000 ms logarithmic scale for Castor 2.1.7, 2.1.8 and 2.1.9, compared against the network latency limit. Credits: A. Peters (IT-DM))

20 Prototype Architecture: XCFS overview (xroot + FUSE)
(Diagram: POSIX access to /xcfs from a generic client application goes through glibc, the kernel VFS and /dev/fuse into the xcfsd FUSE daemon, which uses the XROOT POSIX and client libraries (libXrdPosix, libXrdClient). The metadata server runs an xrootd daemon with the catalogue filesystem and strong-authentication plugins (libXrdCatalogFs/libXrdCatalogOfs, libXrdCatalogAuthz, libXrdSec/libXrdSecUnix) on top of an XFS name-space provider, and hands out capabilities for the disk servers, which run plain xrootd daemons serving the data. ROOT plugs in directly via the remote access protocol. Credits: A. Peters (IT-DM))
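As a hedged illustration of what "POSIX access to /xcfs" means for a generic application (the file path below is invented): the program simply uses open/read on the mounted namespace and the FUSE daemon translates the calls into xrootd requests behind the scenes.

```cpp
// Generic application reading through the XCFS FUSE mount with plain POSIX I/O.
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
   const char *path = "/xcfs/user/jdoe/histos.root";   // hypothetical path in the mount
   int fd = open(path, O_RDONLY);                      // ordinary POSIX open
   if (fd < 0) { perror("open"); return 1; }
   char buf[4096];
   ssize_t n = read(fd, buf, sizeof buf);              // served via xrootd under the hood
   std::printf("read %zd bytes from %s\n", n, path);
   close(fd);
   return 0;
}
```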

21 Early Prototype Evaluation: Metadata Performance
– File creation*: ~1,000/s
– File rewrite: ~2,400/s
– File read: ~2,500/s
– Rm: ~3,000/s
– Readdir/stat access: Σ = 70,000/s
* These values were measured by executing shell commands on 216 mount clients. Creation performance decreases as the namespace fills on a spinning medium; using an XFS filesystem over a DRBD block device in a high-availability setup, file-creation performance stabilizes at ~400/s (20 million files in the namespace).
Credits: A. Peters (IT-DM)

22 Network usage (or waste!)
Network traffic is an important factor: it has to match the ratio IO(CPU server) / IO(disk server).
– Too much unneeded traffic means fewer clients supported (serious bottleneck: 1 client works well, 100-1000 clients do not work at all)
– Lustre does not disable readahead during forward-seeking access and transfers the complete file if reads are found in the buffer cache (the readahead window starts at 1 MB and scales up to 40 MB)
XCFS/Lustre/NFS4 network volume without read-ahead is based on 4 KB pages in Linux
– Most requests are not page aligned and result in additional pages being transferred (average read size 4 KB), hence they transfer twice as much data (but XCFS can now skip this); see the sketch below
– A second execution plays no real role for analysis, since datasets are usually bigger than the client buffer cache
Credits: A. Peters (IT-DM) - ACAT2008
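A tiny worked example of the page-alignment point, added for clarity: a ~4 KB read that does not start on a page boundary touches one extra 4 KB page, so a stream of such reads moves roughly twice the requested volume over the network.

```cpp
// Page-alignment arithmetic: bytes actually transferred for one unaligned read.
#include <cstdio>

int main() {
   const unsigned long long kPage = 4096;            // Linux page size used by the page cache
   unsigned long long offset = 1000, length = 4096;  // a typical unaligned ~4 KB read
   unsigned long long first = offset / kPage;
   unsigned long long last  = (offset + length - 1) / kPage;
   unsigned long long transferred = (last - first + 1) * kPage;  // whole pages go over the wire
   std::printf("requested %llu bytes, transferred %llu bytes (%.1fx)\n",
               length, transferred, (double) transferred / (double) length);
   return 0;   // prints: requested 4096 bytes, transferred 8192 bytes (2.0x)
}
```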

23 CASTOR 2.1.8-6: Cross-Pool Redirection
Why is that useful?
– Users can access data by LFN without specifying the stager
– Users are automatically directed to 'their' pool with write permissions
Example configuration (diagram: a meta-manager sits in front of the name space and the xrootd/cmsd managers of the T0 and T3 stagers):
– T3 pool subscribed r/w for /castor/user and /castor/cms/user/
– T0 pool subscribed read-only for /castor and /castor/cms/data
There are even more possibilities if a part of the namespace can be assigned to individual pools for write operations.
Credits: A. Peters (IT-DM)

24 Towards a Production Version: Further Improvements - Security
A GSI/VOMS authentication plugin prototype has been developed, based on pure OpenSSL
– additionally using code from mod_ssl & libgridsite
– significantly faster than the GLOBUS implementation
After the security workshop with A. Hanushevsky, a virtual socket layer was introduced into the xrootd authentication plugin base, to allow socket-oriented authentication over the xrootd protocol layer
– The final version should be based on OpenSSL and the VOMS library

25 The roadmap

26 XROOT Roadmap @ CERN
XROOT is strategic for scalable analysis support with CASTOR at CERN / T1s; other file access protocols will be supported until they become obsolete
CASTOR
– Secure RFIO has been released in 2.1.8; the deployment impact in terms of CPU may be significant
– Secure XROOT is the default in 2.1.8 (Kerberos or X509); a lower CPU cost than rfio is expected due to the session model
– No plans to provide unauthenticated access via XROOT

27 XROOTD Roadmap
CASTOR
– Secure RFIO has been released in 2.1.8; the deployment impact in terms of CPU may be significant
– Secure XROOT is the default in 2.1.8 (Kerberos or X509); a lower CPU cost than rfio is expected due to the session model
– No plans to provide unauthenticated access via XROOT
DPM
– Support for authentication via xrootd is scheduled; certification starts at the beginning of July
dCache
– Relies on a custom full re-implementation of the XROOTD protocol
– The protocol docs have been updated by A. Hanushevsky
– In contact with the CASTOR/DPM teams to add authentication/authorisation on the server side
– Evaluating a common client plug-in / security protocol

28 Conclusion
A very dense roadmap, with many technical details
Heading for
– Solid, high-performance data access, for production and analysis
– More advanced user-analysis scenarios
– The need to match existing architectures, protocols and workarounds

29 Thank you
Questions?

