XROOTD Tutorial Part 1 Introduction and basic concepts Fabrizio Furano.


1 XROOTD Tutorial Part 1 Introduction and basic concepts Fabrizio Furano

2 Purpose
- A basic tutorial for present and future sysadmins
  - Should be useful for other roles as well
- There are many, many ideas around xrootd; we cannot cover everything, so we start from the beginning
- Goals:
  - Knowing what we are talking about
  - Doing a couple of exercises
  - Being able to face the effort of setting up a cluster
  - Being able to solve problems (support people)
XROOTD tutorial - GridKA school 2010

3 Outline
- What’s that?
- The original distribution (vanilla xrootd)
  - What/where is it, how to do simple things with it
  - Exercise: setting up a personal data server (1 hr)
- The bundles
  - Philosophy
  - Let’s take one
  - What does it do in general
  - Exercise: setting up our cluster (1 hr)
  - Exercise: doing something with it (30 min)
- Conclusion and other directions
  - E.g. vMSS, SRM compliance

4 Xrd for dummies
- A plugin loader, whose default set of plugins does…
  - …storage aggregation (disks/machines/sites)
    - Aggregating means hiding the distribution behind a unique entry point
  - High performance data access through a specialized client
    - Smart design, modern protocols, timeouts, “infinite” scalability, fault tolerance, …
  - NO databases, the file systems already know enough about their content
- Fully plugin based
  - All the hooks that are needed by serious app developers
- Alone it does basic things
  - The power comes from the configurability and the adaptability to HEP and HPC requirements

5 xrootd Plugin Architecture
[Diagram of the plugin components. Labels:]
- Protocol Driver (XRD)
- Protocol (1 of n) (xrootd, xproofd, etc.)
- File System (ofs, sfs, alice, etc.)
- Authorization (default, alice, etc.)
- Authentication (gsi, krb5, etc.)
- Clustering (cmsd)
- Storage System (oss, drm/srm, etc.)
- lfn2pfn prefix encoding

6 How does it work (1/2)
- A single server aggregates mountpoints
[Diagram: two clients talk to one xrootd server, which sits on top of several mount points]
- The xrootd server EXPORTs a unique name space, e.g. /mydata/a/b/c. There’s no trace of the mountpoints here
- Mount points, i.e. FAST local storage (although fragmented): /dataX /dataY /dataZ /data..

7 How does it work (2/2)
- A redirector aggregates up to 64 servers
  - Many redirectors (called supervisors) can aggregate up to 200K servers
[Diagram: two clients and four nodes, each node running a cmsd/xrootd pair. Caption: a small 2-level cluster, P2P-like, which can hold up to 64 servers]
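
The scaling claim above can be checked with a little arithmetic. The following is a minimal sketch, assuming an idealized tree in which every redirector or supervisor fills all 64 of its slots; the function name is invented for illustration.

```python
# Sketch: capacity of a cmsd tree, assuming every redirector or supervisor
# fills all 64 of its slots (an idealized, uniform tree).
FAN_OUT = 64  # maximum subscribers per redirector, as stated in the slides


def max_data_servers(levels: int, fan_out: int = FAN_OUT) -> int:
    """Maximum number of data servers below `levels` layers of redirectors."""
    if levels < 1:
        raise ValueError("a cluster needs at least one redirector level")
    return fan_out ** levels


if __name__ == "__main__":
    for levels in (1, 2, 3):
        print(levels, max_data_servers(levels))
```

Under this assumption one level holds 64 servers, two levels 4,096 and three levels 262,144, which is the same order of magnitude as the 200K figure quoted on the slide.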

8 TCP daemons
- These are all high performance TCP servers, living in one port each (normally)
- 1094/TCP: standard data access port. This must be visible to the applications/users, possibly from outside the site
  - All the servers must be reachable by the apps
- All the servers must be configured as… uhm… servers!
  - Max available number of file descriptors is often server-unfriendly
  - Thousands of clients per server can happen often
  - If more clients need to be accommodated → problems/OS choppiness
- The port related to the internal clustering protocol is less important. Applications/users do not use it. A common port for this is 3122/TCP
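
One quick way to see whether a machine is "configured as a server" is to look at its file descriptor limit. This is a minimal sketch for Unix-like systems using Python's standard `resource` module; the rule of one descriptor per client plus a small reserve is a simplifying assumption, not a figure from xrootd.

```python
import resource


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def can_accommodate(expected_clients: int, soft_limit: int, reserve: int = 64) -> bool:
    """Rough check: assume one descriptor per client plus `reserve` for the daemon itself."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return expected_clients + reserve <= soft_limit


if __name__ == "__main__":
    soft, hard = fd_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("room for 5000 clients:", can_accommodate(5000, soft))
```

A server that also keeps data files open for each client needs more than one descriptor per client, so the real requirement is higher than this estimate.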

9 Name translation
- LFN = Logical File Name
  - It’s the filename in the EXPORTED namespace
  - As it is read/written by the applications
- PFN = Physical File Name
  - It’s the INTERNAL filename
  - The file as it is stored in the mountpoints
  - NOT visible to the applications; they don’t need it.
  - Only the sysadmin knows it

10 LFN → PFN mapping (1/2)
- Simple and fast, just a string mapping
- Please remember that the apps DO NOT SEE this
- Let’s suppose that we have only one mountpoint, /mnt/data1:
  - PFN = <string>/LFN
  - E.g. /mydata/myfile.dat → /mnt/data1/mydata/myfile.dat
- The string is called LOCALROOT
  - It usually is a mountpoint with an additional directory
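
The mapping described above is plain string concatenation. Here is a minimal sketch of the idea; the function name is invented for illustration and this is not xrootd's own code.

```python
def lfn_to_pfn(localroot: str, lfn: str) -> str:
    """Map a logical file name to a physical one by prefixing LOCALROOT."""
    if not lfn.startswith("/"):
        raise ValueError("an LFN is an absolute path in the exported namespace")
    # Drop a trailing slash on LOCALROOT so the result has exactly one separator.
    return localroot.rstrip("/") + lfn


if __name__ == "__main__":
    # The example from the slide: a single mountpoint /mnt/data1.
    print(lfn_to_pfn("/mnt/data1", "/mydata/myfile.dat"))
```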

11 LFN → PFN mapping (2/2)
- LOCALROOT is one of the best friends of security
  - It means that no application has access to any directory in the machine that does not begin with this prefix
  - In other words: every data file stored will have a private path starting with it
  - So you know where the stuff goes
  - And that nobody will mess with it
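
The prefix property can be illustrated with a small check. This is only a sketch of the idea, not how xrootd enforces it: it normalizes the would-be physical path and tests whether it still lies under LOCALROOT. The paths in the example are hypothetical.

```python
import posixpath


def stays_inside_localroot(localroot: str, lfn: str) -> bool:
    """True if the physical path built from `lfn` cannot escape `localroot`."""
    root = posixpath.normpath(localroot)
    # normpath resolves "." and ".." components purely as text, without touching the disk.
    pfn = posixpath.normpath(root + "/" + lfn)
    return posixpath.commonpath([root, pfn]) == root


if __name__ == "__main__":
    root = "/mnt/data1/xrdnamespace"  # hypothetical LOCALROOT
    print(stays_inside_localroot(root, "/mydata/myfile.dat"))
    print(stays_inside_localroot(root, "/mydata/../../../etc/passwd"))
```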

12 Mountpoints for data
- ALWAYS store your data in a SUBDIRECTORY
  - It’s easier to rename/move/maintain
  - Like: /mnt/data01/xrddata
  - /mnt/data01 IS A VERY BAD CHOICE
  - /home/xrootd IS EVEN WORSE
- In case of hw replacements/failures these are your best friends, KEEP THEM SIMPLE AND PRACTICAL
- The user running the xrootd daemon must have rwx access to them (possibly own them)

13 Aggregating mountpoints
- We aggregate several mountpoints into one server by giving the xrootd daemon one more piece of information
  - Yes, the list of the dirs to aggregate, what else?
  - This is called “Cache File System”
- When given this information, a server will slightly change the way it places files around
  - LOCALROOT will still hold the filenames, but they are symlinks in this case
  - The various dirs hold the data files, with the names slightly modified (but still recognizable)
- In practice LOCALROOT hosts the “catalogue”, or, better, the “namespace”
  - And it can always be reconstructed in case of disasters
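
The layout described above (symlinks in LOCALROOT, data files elsewhere) can be mimicked in a temporary directory. This is a toy sketch: the renaming rule (replacing "/" with "%") and the placement rule (fewest files first) are assumptions made for the demo, not a description of what the server actually does.

```python
import os
import tempfile


def place_file(localroot, data_dirs, lfn, payload):
    """Write `payload` into one data dir and symlink it from the namespace."""
    # Hypothetical naming rule: keep the LFN recognizable by replacing '/' with '%'.
    data_name = lfn.replace("/", "%")
    # Hypothetical placement rule: pick the data directory holding the fewest files.
    target_dir = min(data_dirs, key=lambda d: len(os.listdir(d)))
    data_path = os.path.join(target_dir, data_name)
    with open(data_path, "wb") as fh:
        fh.write(payload)
    # The namespace entry carries the true file name and points at the data file.
    link_path = localroot.rstrip("/") + lfn
    os.makedirs(os.path.dirname(link_path), exist_ok=True)
    os.symlink(data_path, link_path)
    return link_path, data_path


def demo():
    """Build a throwaway namespace with two data dirs and place two files."""
    base = tempfile.mkdtemp()
    localroot = os.path.join(base, "xrdnamespace")
    data_dirs = [os.path.join(base, "data1"), os.path.join(base, "data2")]
    for d in [localroot] + data_dirs:
        os.makedirs(d)
    return [place_file(localroot, data_dirs, lfn, b"x" * 10)
            for lfn in ("/mydata/a.dat", "/mydata/b.dat")]


if __name__ == "__main__":
    for link, data in demo():
        print(link, "->", data)
```

Because the namespace holds only symlinks, it is small and can be rebuilt from the data directories as long as the renamed data files still encode their LFN.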

14 Aggregating mountpoints
- Again: DO NOT PUT DATA STRAIGHT INTO MOUNTPOINTS
- Create a directory in each of them. In the case of the cache filesystem something like:
  - /xrddata
- A good name for the localroot one is
  - /xrdnamespace
- Of course, one of the mountpoints will contain BOTH the localroot (which acts as a namespace) AND one dir of data

15 The root user (1/2)
- Simple rule (the same as Apache): an xrootd/cmsd daemon REFUSES TO START AS ROOT.
- So, you always need a proper user for it to run (most people use ‘xrootd’)
  - It MUST have rwx access to the data mountpoints, possibly owning them
  - In theory it does not need a $HOME; in practice, in the more sophisticated setups there’s always some plugin that needs it.
  - Hence, for us it’s as if it’s needed. Let’s do it.

16 The root user (2/2)
- In practice:
  - Root is used only to set up the machine, create partitions/mountpoints etc.
  - The setup of the vanilla package can be anywhere, including problematic places like /usr/bin/xrootd or /opt/xrootd or /usr/bin etc.
  - The setup of the more sophisticated bundles is generally done in /home/xrootd
  - Some sysadmins stick to /usr or /opt or love to put everything into an RPM package.
- The setup and the HOME must be on a LOCAL DRIVE, so everything works even if the machine is temporarily disconnected

17 The server machine
- It MUST always work, hence:
  - Avoid dependencies on useless things
  - E.g. AFS/NFS homes… NO! $HOME must be a local and separate partition, different from the one hosting the data
  - This aids sleep…
- In general, it must be able to survive arbitrarily long network disconnections
  - Once reconnected it has to work without intervention
  - One of the consequences of the xrootd fault tolerance mechanism is that the traffic may come almost immediately after the reconnection
- Every relaxation of these is the responsibility of the sysadmin
  - Being called at night is generally not fun

18 Where to get it
- Let’s stick to the vanilla tarball for the moment
- 2 places:
  - The original repo at SLAC
  - The Savannah repo at CERN
    - https://savannah.cern.ch/projects/xrootd

19 Pre-requirements
- A working development environment (g++, libs, etc.)
  - Yum: gcc, gcc-c++, zlib-devel
- The servers don’t need anything special to compile
  - Some plugins do!
  - E.g. Kerberos, X509 etc…
  - The configure.classic script disables everything for which the requirements are not met
  - For the moment we just want to do an exercise, we don’t need strange things (we will)
- Locate the latest stable tarball on the website(s)

20 Download and unpack

21 Configure/Compile it

22 Start it manually
- Let’s start our personal server: xrootd [-d]

23 It’s already working
- As a single, non-clustered server
- By default:
  - It exports /tmp
  - No LFN/PFN translation (identity function)
  - Prints the log to stdout
- With -d we started it in DEBUG mode, so it’s quite verbose
  - Familiarize yourself with the log

24 URL format
- root://HOST/ABSOLUTEPATH
- HOST → host1[,host2,…hostN][:port]
  - A random host is chosen if there are alternatives
  - Each hostname can be DNS-aliased
  - NB this is not DNS round-robin
- ABSOLUTEPATH is an absolute path, hence it starts with ‘/’
- Hence, a URL looks like:
  - root://myhost//mypath/myfile
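
The URL grammar above explains the double slash after the host: one slash separates HOST from the path, and the second is the first character of the absolute path. The following minimal sketch parses that grammar as written on the slide. The function names are invented, the default port of 1094 is taken from the "TCP daemons" slide, and IPv6 literals are not handled.

```python
import random


def parse_root_url(url: str, default_port: int = 1094):
    """Split root://host1[,host2,...][:port]/ABSOLUTEPATH into (hosts, port, path)."""
    scheme = "root://"
    if not url.startswith(scheme):
        raise ValueError("not a root:// URL")
    hostpart, sep, path = url[len(scheme):].partition("/")
    if not sep or not path.startswith("/"):
        raise ValueError("the path must be absolute, e.g. root://host//dir/file")
    hosts_str, _, port_str = hostpart.partition(":")
    hosts = hosts_str.split(",")
    if any(not h for h in hosts):
        raise ValueError("empty host name")
    port = int(port_str) if port_str else default_port
    return hosts, port, path


def pick_host(hosts):
    """Choose one host at random when alternatives are listed."""
    return random.choice(hosts)


if __name__ == "__main__":
    hosts, port, path = parse_root_url("root://myhost//mypath/myfile")
    print(pick_host(hosts), port, path)
```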

25 Xrdcp
- It’s the xrootd data copy app
- Basic usage: xrdcp <source> <dest>
- Where <source> and <dest> can be:
  - Local pathnames, e.g. /home/furano/mydata.txt
  - root: URLs, e.g. root://host//mydata.txt

26 Xrdcp: the basics
- It’s a data copy program, with several features
- The easiest way to test a new server/cluster: just read/write into it and then manually check the presence of the files
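
A write-then-read test like the one described above is easy to script. This sketch only prepares the test file and the two command lines; it does not run them, since that needs a machine with xrdcp installed and a reachable server. The host name and LFN in the example are placeholders.

```python
import os
import tempfile


def make_test_file(size_bytes: int = 10 * 1024 * 1024) -> str:
    """Create a temporary file filled with random bytes and return its path."""
    fd, path = tempfile.mkstemp(suffix=".dat")
    with os.fdopen(fd, "wb") as fh:
        fh.write(os.urandom(size_bytes))
    return path


def xrdcp_roundtrip_commands(local_path: str, host: str, lfn: str):
    """Return the argv lists for writing a file and reading it back to /dev/null."""
    # The LFN starts with '/', which produces the double slash after the host.
    remote = f"root://{host}/{lfn}"
    return [["xrdcp", local_path, remote], ["xrdcp", remote, "/dev/null"]]


if __name__ == "__main__":
    path = make_test_file()
    for argv in xrdcp_roundtrip_commands(path, "myhost", "/mydata/test.dat"):
        print(" ".join(argv))
```

Each argv list can be passed to `subprocess.run(argv, check=True)` once a server is up; after that, checking the files on the server side is still a manual step.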

27 The config file [xrootd.cf]
- So far we have had just a simple personal server. Good to play with, useless in a serious site…
- We need to configure it, clusterize etc.
- The syntax is described in the docs on the website
  - Let’s have a quick look
- TONS of options may be specified, to accommodate the weirdest requirements
- Let’s start from the very basic ones:
  - export: allows a directory prefix to be exported (by default only /tmp is exported)
  - oss.localroot: configures the LFN → PFN translation
  - oss.cache: specifies the mountpoints to aggregate
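
To make the three directives concrete, here is a small sketch that renders a candidate config file as text. Several things in it are assumptions that should be checked against the documentation for the installed version: the `all.` prefix on the export directive, the `oss.cache <group> <path>` argument order, and the group name `public`. The paths are hypothetical.

```python
def render_config(export_prefix, localroot, data_dirs, cache_group="public"):
    """Render a minimal single-server config as a string (directive syntax is assumed)."""
    lines = [
        f"all.export {export_prefix}",   # assumed spelling of the export directive
        f"oss.localroot {localroot}",    # prefix used for the LFN -> PFN translation
    ]
    # One oss.cache line per data directory to aggregate (assumed argument order).
    lines += [f"oss.cache {cache_group} {d}" for d in data_dirs]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_config(
        "/mydata",
        "/scratch/someuser/xrdnamespace",                       # hypothetical path
        ["/scratch/someuser/data1", "/scratch/someuser/data2"],  # hypothetical paths
    ))
```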

28 Localroot, PFN, LFN

29 The cache file system
- Ugly historical name, actually it’s not a cache at all(!)
- It’s the mechanism used to aggregate partitions
  - The true file name is put as a symlink into the LOCALROOT
  - The data file (slightly renamed) is put into the appropriate data partition
  - The link points to the data file

30 Using partitions [oss.cache]

31 The ‘xrd’ command line (1/4)
- A UI that gathers together all the functionalities that are not related to data read/write, e.g.
  - Stat: gives info about a file (size, date etc.)
  - Locatesingle: finds the first replica of a file in the cluster (used by PROOF to optimize its scheduling)
  - Locateall: finds all the replicas of a file
  - Dirlist: lists the content of a directory
  - Rm: try to guess…
- The easiest thing to do is to start it and request ‘help’

32 The ‘xrd’ command line (2/4)
- A true example. Enabling the debug mode we discover why a data server seems broken from outside
- In practice we are not able to connect because the firewall is closed

33 The ‘xrd’ command line (3/4)

34 The ‘xrd’ command line (4/4)
- We can use it in scripts
- Just put the command+args in the command line:
  - xrd host[:port] cmd arg1 arg2 … argN
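
The command-line form above is easy to assemble from a script. This sketch only builds the argument list following the `xrd host[:port] cmd arg1 … argN` pattern; the helper name is invented, and the subcommand used in the example (`stat`) is assumed to be spelled in lowercase, as the slides only list it capitalized.

```python
def xrd_argv(host, cmd, *args, port=None):
    """Build the argv list for: xrd host[:port] cmd arg1 ... argN."""
    endpoint = host if port is None else f"{host}:{port}"
    return ["xrd", endpoint, cmd, *map(str, args)]


if __name__ == "__main__":
    # Hypothetical host and file name.
    print(xrd_argv("myhost", "stat", "/mydata/myfile.dat", port=1094))
```

On a machine where the `xrd` tool is installed, the list can be handed to `subprocess.run(argv, capture_output=True, text=True)` and the output parsed by the script.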

35 Directories and exports
- It may seem philosophical, but ‘pure’ xrootd handles directories in a funny way
  - Remember: everything was designed to optimize the frequent case, i.e. open/read/write
- A directory in practice is not quite an entity
  - It’s more similar to a string that prefixes a filename
- This means that the ‘xrd’ command line does its best to FAKE a directory structure that may not exist exactly in that form
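
The "directory as a prefix" idea can be shown with a flat list of file names. The sketch below derives a directory listing purely from string prefixes; it illustrates the concept and is not how the `xrd` tool is implemented.

```python
def fake_dirlist(filenames, directory):
    """List the immediate children of `directory`, inferred from path prefixes only."""
    prefix = directory.rstrip("/") + "/"
    entries = set()
    for name in filenames:
        if name.startswith(prefix):
            first, sep, _rest = name[len(prefix):].partition("/")
            if first:
                # A trailing slash marks something that merely prefixes other names.
                entries.add(first + "/" if sep else first)
    return sorted(entries)


if __name__ == "__main__":
    files = ["/mydata/a/b/c", "/mydata/a/d", "/mydata/e"]
    print(fake_dirlist(files, "/mydata"))
    print(fake_dirlist(files, "/mydata/a"))
```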

36 Basic clustering
- Cmsd daemons clusterize into a tree-shaped network
- Xrootd daemons talk to their cmsd counterpart
- Redirector machine
  - Manager
  - Supervisor
  - Meta-manager
- Data server machine

37 How clusters work
- Dynamic subscription, p2p-like protocol, no static lists
- Servers are given the name of the redirector that administers their cell (max 64)
- Redirectors may be managers or supervisors (= sub-managers) to create huge clusters
- The protocol can pause/redirect clients explicitly and gracefully

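38 How clusters work
[Diagram: a client, a redirector (head node) and data servers A, B and C forming a cluster]
- Client to redirector: open file X
- Redirector to data servers A, B, C: who has file X?
- Data server C: I have
- Redirector to client: go to C
- Client to data server C: open file X
- 2nd open X: the redirector answers “go to C” right away, because redirectors cache file locations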
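
The message exchange above can be modelled in a few lines. This is a toy, single-process sketch with invented class names: it asks the servers one by one, whereas a real cluster does this over the network, and it only aims to show the effect of the location cache on the second open.

```python
class DataServer:
    """A data server that knows which files it holds."""

    def __init__(self, name, files):
        self.name = name
        self.files = set(files)

    def has(self, lfn):
        return lfn in self.files


class Redirector:
    """A head node that locates files and remembers where it found them."""

    MAX_SERVERS = 64  # cell size stated in the slides

    def __init__(self):
        self.servers = []
        self.location_cache = {}
        self.queries_sent = 0

    def subscribe(self, server):
        if len(self.servers) >= self.MAX_SERVERS:
            raise RuntimeError("cell is full; a supervisor is needed")
        self.servers.append(server)

    def locate(self, lfn):
        """Return the name of a server holding `lfn`, or None."""
        if lfn in self.location_cache:
            return self.location_cache[lfn]  # answered from the cache
        for server in self.servers:
            self.queries_sent += 1           # "who has file X?"
            if server.has(lfn):              # "I have"
                self.location_cache[lfn] = server.name
                return server.name           # "go to C"
        return None


if __name__ == "__main__":
    head = Redirector()
    for name, files in (("A", []), ("B", []), ("C", ["X"])):
        head.subscribe(DataServer(name, files))
    print(head.locate("X"), head.queries_sent)
    print(head.locate("X"), head.queries_sent)
```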

39 A word about security
- Plugins are trivial to load, that’s not the big deal
  - XrdSec already has a good number of them, covering most of the cases (SSS, krb4/5, X509, UNIX, ALICE tokens…)
- Less trivial is to configure them and match their protocol’s infrastructure
  - That’s not really xrootd stuff

40 Authentication/Authorization
- Xrootd splits them off completely
- XrdSec plugins
  - How to authenticate a client
- XrdAcc plugins
  - What to do with the authenticated client, apply permissions, etc.
- In this tutorial we don’t have time to deal with that. Security alone deserves more than one tutorial.
- BUT… in HEP there are common practices and standard configurations
  - Often common things in the same group/experiment

41 An exercise (1h30’)
- Download the source tarball, compile it and start it in single server mode.
- Configure a private single server exporting the namespace “/mydata”
  - The data namespace must be stored in the dir /scratch/ /xrdnamespace
  - And the data files into
    - /scratch/ /data1/
    - /scratch/ /data2
- Write a 10MB data file with LFN /mydata/ using xrdcp
- Read it back to /dev/null, with xrdcp
- Verify (as a sysadmin) the correctness of the symlink and of the data file
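
The last step of the exercise, checking the symlinks as a sysadmin, can also be scripted. This is a minimal sketch with an invented function name: it walks a namespace directory and reports entries that are not symlinks or whose target is missing. It does not compare file contents.

```python
import os


def check_namespace(localroot):
    """Return a list of (path, problem) for namespace entries that look wrong."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(localroot):
        for name in filenames:
            entry = os.path.join(dirpath, name)
            if not os.path.islink(entry):
                problems.append((entry, "not a symlink"))
            elif not os.path.isfile(entry):  # isfile follows the link
                problems.append((entry, "dangling symlink"))
    return problems


if __name__ == "__main__":
    import sys
    # Pass the localroot directory as the first argument.
    for path, problem in check_namespace(sys.argv[1]):
        print(problem + ":", path)
```

An empty result means every namespace entry is a symlink that resolves to an existing file; comparing sizes or checksums with the original 10MB file remains a separate check.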

